SBX Robotics: Synthetic Training Data & Scene Composition with Tables

Exploring the impact of scene composition on segmentation model performance with the new Tables view in W&B. Made by Artem @ SBX Robotics using W&B
Did you know that scene composition has a large impact on model performance in computer vision projects?
Imagine you're trying to identify groceries piled inside a shopping basket from a wide-angle camera. Collecting this data in the real world can take countless hours of piling products into baskets and capturing images, not to mention the cost of labeling, data cleaning, and sourcing the items themselves!
It is tempting to bootstrap these types of projects with freely available datasets like D2S or by labeling generic images from the web. Unfortunately, even if the datasets feature the same items, performance will be limited because the images do not match your perspective, scene composition, or camera configuration.
At SBX Robotics we're experts in using synthetic data to bootstrap and improve deep learning vision systems, so we decided to run an experiment to show just how much of an impact scene composition can have on model performance, and take W&B's new Tables feature for a spin!
In this demo we will:
  1. Take a look at a well-performing segmentation model trained to identify grocery items on a tabletop.
  2. Test the same model on a validation set with the same items but a different scene composition: bin-picking.
  3. Craft a new dataset with the same items arranged in a bin-picking scene, train a benchmark model, and compare the results with step 2.

1. Tabletop Dataset & Model

We start with a real validation set of grocery items on a tabletop, captured with various camera settings in various lighting conditions.
Then we create a synthetic dataset of these same items in tabletop settings, complete with object segmentation annotations.
With synthetic data, we can vary lighting and camera parameters more aggressively to produce a more robust model.
Next, we train a benchmark model on the synthetic data (Mask R-CNN from TorchVision) and test it on the real validation data.
We have published a Synthetic Data Tutorial featuring 10,000 free training samples, training code, and a video walkthrough so you can reproduce this model on your own.
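For reference, here is a minimal sketch of the kind of TorchVision setup this implies; it assumes torchvision ≥ 0.13, and the class count is an illustrative placeholder rather than our actual label map:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

NUM_CLASSES = 1 + 10  # hypothetical: background + 10 grocery item classes

def build_model(num_classes: int):
    # Start from a COCO-pretrained Mask R-CNN and swap both heads
    # so they predict our grocery classes instead of COCO's.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")

    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)
    return model

model = build_model(NUM_CLASSES)
```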

Tabletop model: inferences

At first glance, the results look very good!
The new Tables feature in Weights & Biases allows us to take a deeper look at the model performance per image.
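Logging a table like this takes only a few lines. Below is a hedged sketch of the pattern; validation_results and its field names are hypothetical placeholders, not our actual pipeline:

```python
import wandb

run = wandb.init(project="sbx-tables-demo")  # hypothetical project name

# One row per validation image: the image with predicted masks overlaid,
# plus the per-image metrics we want to sort and filter by in the UI.
table = wandb.Table(columns=["image", "tp", "fp", "fn", "f1"])

for sample in validation_results:  # hypothetical per-image results iterable
    masked = wandb.Image(
        sample["image"],
        masks={"predictions": {"mask_data": sample["pred_mask"],
                               "class_labels": sample["class_labels"]}},
    )
    table.add_data(masked, sample["tp"], sample["fp"], sample["fn"], sample["f1"])

run.log({"tabletop_validation": table})
```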

Tabletop model: aggregate results

With a few clicks, we can aggregate that data into histograms, showing near-perfect performance on most validation samples and highlighting that "double-detection" false positives pose the biggest issue.

2. Testing the tabletop model on bin-picking data

Now that we have a baseline model (tabletop), we can ask our question:
Does the performance of our tabletop model hold up when it is tested on a validation set of items piled into a bin or a shopping basket?

New dataset: bin-picking

Same items, similar camera & lighting variations, but with a drastically different scene composition:
Let's take a look at the inferences using the exact same Tables setup, sorted from the lowest F1 score upward, to see the core issues.
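A quick note on scoring: the report doesn't spell out our exact matching rules, but a per-image F1 can be computed roughly as sketched below, greedily matching predicted masks to ground-truth masks at an assumed IoU threshold of 0.5:

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    # a, b: boolean instance masks of the same shape
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def image_f1(pred_masks, gt_masks, iou_thresh=0.5):
    # Greedy one-to-one matching: each ground-truth mask can satisfy at
    # most one prediction; unmatched predictions count as false positives
    # (this is where "double detections" hurt the score).
    matched, tp = set(), 0
    for p in pred_masks:
        best_iou, best_j = 0.0, None
        for j, g in enumerate(gt_masks):
            iou = mask_iou(p, g)
            if j not in matched and iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= iou_thresh:
            matched.add(best_j)
            tp += 1
    fp, fn = len(pred_masks) - tp, len(gt_masks) - tp
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0
```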

The tabletop model's performance drops significantly when tested on the bin-picking dataset.

We can glean a few more insights with a simple "Group By" aggregation:

What happened to the performance?

Taking a deeper look at the inferences, we form a hypothesis: the bin-picking scene introduces perspectives on the items that the tabletop model was never exposed to. For instance, our maple syrup bottle looks different upright than from the top or on its side, and a pile of items creates novel poses, occlusions, and shadows.

3. Synthetic bin-picking dataset

We can test our scene-composition hypothesis by generating another synthetic training dataset from the same 3D assets behind the tabletop dataset, with a few core changes to more closely match the new bin-picking validation data:
Some samples from our synthetic bin-picking training data
We can now follow the same procedure outlined in our Synthetic Data Tutorial to train a benchmark model on this new dataset and test it against the bin-picking validation data.
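The tutorial walks through the full procedure; as a rough sketch, the TorchVision training loop looks something like this (bin_picking_dataset and the hyperparameters are illustrative assumptions, not our production settings):

```python
import torch
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = build_model(NUM_CLASSES).to(device)  # same builder as sketched above
optimizer = torch.optim.SGD(model.parameters(), lr=0.005,
                            momentum=0.9, weight_decay=5e-4)

# bin_picking_dataset is a hypothetical Dataset yielding (image, target)
# pairs in the TorchVision detection format.
loader = DataLoader(bin_picking_dataset, batch_size=2, shuffle=True,
                    collate_fn=lambda batch: tuple(zip(*batch)))

model.train()
for epoch in range(10):
    for images, targets in loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)  # dict of losses in train mode
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```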
Since we've already loaded the aggregate data into a Weights & Biases Table, we can start by comparing the two models side by side:
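One way to set that up, sketched below with hypothetical variable names: log both models' results under the same table key from separate runs, so the W&B UI can align the tables for comparison:

```python
import wandb

# Log each model's per-image results under the same table key from two
# separate runs; W&B can then line the tables up side by side.
for model_name, results in [("tabletop_model", tabletop_results),
                            ("bin_picking_model", bin_results)]:  # hypothetical
    run = wandb.init(project="sbx-tables-demo", name=model_name, reinit=True)
    table = wandb.Table(columns=["image_id", "f1"])
    for r in results:
        table.add_data(r["image_id"], r["f1"])
    run.log({"bin_validation": table})
    run.finish()
```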

Results: Bin-picking model vs. tabletop model performance on bin-picking validation data

The impact is quite drastic:

Inference images: Bin-picking model vs. tabletop model

Looking at the un-aggregated data for every image allows us to visualize the inferences side by side.

Bin picking inference video

Instead of looking at boring pictures, why not see the model in action on a video sequence?
See the full video here: SBX Kitchen Bin Unload on Vimeo

Conclusion

Scene composition has a large impact on transfer learning.
We tested two similar synthetic training datasets created with the same 3D assets arranged in different scenes: tabletop and bin-picking. A well-performing model trained and tested on the tabletop scene performed quite poorly on the bin-picking dataset. Modifying the synthetic data generator to more closely match the bin-picking scene improved the mean F1 score by 30%.
This experiment highlights the power of synthetic data: a simple change in the data generator had a material impact on model performance. By removing data collection, annotation, and cleaning as blockers, computer vision engineers can focus on building the best vision systems for their business.

About SBX Robotics

Working on a computer vision problem? We can help.
At SBX Robotics we are experts in using synthetic data to bootstrap and improve computer vision systems.
Our clients send us ~25 images from their production setting, and we generate 25,000 synthetic training samples proven to work on the original validation data. All of our datasets ship with:

Ready to try synthetic data for your project?

Use this link to submit 25-50 images from your production setting, or contact us at info@sbxrobotics.com.
Mention "Weights & Biases Tables Tutorial" for 20% off your first synthetic dataset.
If you want to experiment with synthetic data on your own, try our Synthetic Data Tutorial with a free 10,000 sample synthetic dataset, training code, and instructional videos to help you train and evaluate a model.
You can find all of the code and models used to generate this report in this Colab Notebook.
Thanks for reading!
~Team SBX