Virtual Retail-ity: State-of-the-Art Machine Learning for Remote Shopping

A look at the current state of machine learning applications in retail with a focus on virtual try-on techniques
Created on July 27 | Last edited on September 22



Introduction

Have you ever been frustrated with the online shopping experience? Maybe you've ordered multiple sizes of the same clothing item just to ensure you get one that fits? Perhaps you've decided not to purchase something online because you weren't sure about the right size? If you answered yes to any of these questions, this project is for you.

ML in Retail

The use of machine learning in the retail industry has matured rapidly in the last decade. Retailers now leverage traditional machine learning for recommendation systems, basket composition analysis, and just-in-time incentives (e.g., checkout coupons).
Time series modeling has also impacted the industry with applications in SKU-level sales modeling, stocking/supply modeling, and staffing estimation. More recently, deep learning techniques, specifically the use of computer vision, have been used in several applications like customer intent/mood classification, customer journey modeling, and remote shopping applications.
Surges in online sales have made remote shopping a key initiative for many retailers. With the growth of curbside pickup and grocery/goods delivery services in recent years, retailers have endeavored to improve the remote shopping experience as competition between brick-and-mortar and online retailers has heated up. A major catalyst for many of these services was the restrictions and shopper health concerns brought on by COVID-19.

Advances in Remote Shopping

Several retailers have publicized their use of cutting-edge techniques and mediums for improving the remote shopping experience. The use of virtual reality (VR) to give users the ability to shop virtually "in-store" has been widely documented by large retailers such as Walmart.
Remote displays of shelf contents for online browsing and subsequent delivery or pickup have been used in practice for a number of years. However, VR/AR techniques for improving the remote shopping experience are unlikely to see considerable uptake in the next 5-10 years, with the low penetration of VR/AR hardware among the general public being the limiting factor.
Shopping for clothing and accessories has struggled to shift from brick-and-mortar stores to online outlets. Where it has, namely the purchase of clothing from online retailers such as Amazon, retailers have been plagued by costs stemming from excessive returns, poorly sized items, and items not matching their online images. One technique that shows promise, particularly for clothing and accessory retailers, is virtual try-on: giving shoppers the ability to try on clothing and accessories using their own bodies as the subject.



M3D-VTON

A quick survey of the literature on the state of the art in virtual try-on surfaced a paper and model detailing a technique called Monocular-to-3D Virtual Try-On (M3D-VTON).
In a nutshell, M3D-VTON takes a 2-dimensional front-facing image of a person and an image of a clothing item and creates a 3-dimensional representation of the person wearing that clothing item.

The M3D-VTON technique consists of three embedded modules that focus on different aspects of the problem.
At a high level there is the:
  • Monocular Prediction Module (MPM) that focuses on understanding the shape of the person and their body parts and creates a 2-dimensional warped person-clothing image

  • Depth Refinement Module (DRM) which generates a 3-dimensional representation of the person-clothing image

  • Texture Fusion Module (TFM) which fuses the warped clothing texture with the person image and creates the final 3-dimensional representation of the person-clothing image
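To make the data flow between the three modules concrete, here is a minimal sketch in NumPy. The function bodies are placeholders standing in for the actual neural networks (the real M3D-VTON code is not reproduced here), but the shapes of what each stage consumes and produces follow the description above: MPM aligns the clothing to the person in 2D and estimates an initial depth map, DRM refines that depth, and TFM fuses the textures and lifts the result to a colored 3D point cloud.

```python
import numpy as np

def monocular_prediction(person_img, clothing_img):
    """MPM stand-in: produce a 2D warped person-clothing image and an
    initial per-pixel depth estimate (placeholder math, not the real model)."""
    warped = 0.5 * person_img + 0.5 * clothing_img       # placeholder "warp"
    initial_depth = person_img.mean(axis=-1)             # placeholder depth map
    return warped, initial_depth

def depth_refinement(warped, initial_depth):
    """DRM stand-in: refine the initial depth using the warped image."""
    return initial_depth + 0.1 * warped.mean(axis=-1)    # placeholder refinement

def texture_fusion(warped, person_img, refined_depth):
    """TFM stand-in: fuse textures and emit a colored 3D point cloud,
    one (x, y, z, r, g, b) row per pixel."""
    fused = 0.5 * warped + 0.5 * person_img
    h, w, _ = fused.shape
    ys, xs = np.mgrid[0:h, 0:w]
    points = np.stack([xs.ravel(), ys.ravel(), refined_depth.ravel()], axis=1)
    colors = fused.reshape(-1, 3)
    return np.hstack([points, colors])                   # shape (h*w, 6)

person = np.random.rand(64, 48, 3)
clothing = np.random.rand(64, 48, 3)

warped, depth = monocular_prediction(person, clothing)
refined = depth_refinement(warped, depth)
cloud = texture_fusion(warped, person, refined)
print(cloud.shape)  # (3072, 6): one colored 3D point per pixel
```

The point of the sketch is the pipeline shape: each module's output is exactly the next module's input, which is what makes lineage tracking (discussed below) so useful.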

Here's a deeper analysis of M3D-VTON and an additional resource comparing it with two other virtual try-on methods:


W&B Artifacts and Tables

It turns out that W&B Artifacts and Tables are excellent tools for tracking the inputs and outputs of complex modeling techniques such as M3D-VTON.
Across these modules, more than 20 images, masks, body-joint keypoints, body segmentations, and point-cloud objects are created or consumed. Artifacts is an invaluable tool for tracking the lineage of these intermediate products throughout the pipeline.

With complex modeling techniques that involve so much rich media, Tables become a natural tool for logging that media and visualizing the entire data and modeling pipeline. The ability to log both raw image objects and 3D point clouds lets users see all of their inputs and outputs in a single table.

Outtakes

