Image Masks for Semantic Segmentation Using Weights & Biases
This article explains how to log and explore semantic segmentation masks, and how to interactively visualize models' predictions with Weights & Biases.
Created on April 15 | Last edited on October 10
When working on semantic segmentation, you can interactively visualize your models' predictions in Weights & Biases. This article explores just one use case to give you a feel for the possibilities; the API for logging image masks is intuitive.
Below, I explain the interaction controls for this tool and include some examples to try, drawn from four model variants that you can read about in detail in this article. Of course, this is helpful for many domains besides self-driving: medical imaging, satellite data, microscopic slides, and more.
Interactive, Stateless, and Precise Visualization
With this tool, I can interact with all of the predictions and the ground truth as separate layers in my browser, without needing to track, save, or restore a bunch of different views of the same image. I can understand a model's behavior on different classes, relative to the ground truth, much faster and more precisely. Finally, I can share these insights much more easily with others by saving my view in a report like this one. Below, I just toggled the "car" class to see that initially, the model predicts the humans in the foreground are cars, but by the end of training, it correctly identifies them as "person".

Before this tool, I was analyzing results through a single, fixed composite view. It's hard to remember which colors correspond to which class, hard to distinguish small details, and easy to confuse predictions when the colors of the label mask and the ground truth image all perceptually combine to similar hues:

I separated these out into side-by-side masks, which helped somewhat but still required a lot of cognitive overhead to visually diff the images. Plus, when I discovered something in this view, it was hard to save that exact visual to share with someone else.

Interaction Controls
If you click on the Settings icon in the top left corner of a media panel, you will see this pop-up menu for interacting with the images:

- mask type selection: toggle the eye icon to the left of the mask name to turn the mask types on or off. This lets you compare a model's predictions and the correct answer on the same image.
- class selection: if you click on a class label like "road", you can toggle it on and off, independently across the mask types.
- class opacity: if you hover over a class label, you'll see a slider to adjust the opacity of that class. This can help you visually pick out which subregions of the image correspond to that label, and setting the opacity low lets you see all the detail of the original image.
- class search: to see a class label that doesn't fit in the menu, type it in the search bar to the right of the mask name. For these models, the full label set is: 'road', 'sidewalk', 'building', 'wall', 'fence', 'pole', 'traffic light', 'traffic sign', 'vegetation', 'terrain', 'sky', 'person', 'rider', 'car', 'truck', 'bus', 'train', 'motorcycle', 'bicycle', and 'void' (see the sketch below for this set expressed as a class_labels mapping).
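For reference, these labels correspond to the integer class indices stored in the logged masks. Here is a minimal sketch of how such a class_labels dictionary (used in the API walkthrough below) could be built; the exact index order here is an assumption for illustration and must match whatever indices your model actually outputs.

# Map integer class indices to human-readable labels.
# NOTE: the index order is illustrative, not taken from these specific models.
segmentation_classes = [
    'road', 'sidewalk', 'building', 'wall', 'fence', 'pole',
    'traffic light', 'traffic sign', 'vegetation', 'terrain', 'sky',
    'person', 'rider', 'car', 'truck', 'bus', 'train',
    'motorcycle', 'bicycle', 'void'
]
class_labels = {i: name for i, name in enumerate(segmentation_classes)}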
Examples To Try
- Prediction improvement over time: this shows predictions for 4 different model variants (one per row) on the same image at the start (left column) and end (right column) of 20 epochs of training. You can see how the predictions improve over time and that some classes are easier to learn than others.
- Final predictions for 4 models (one per column): this shows 5 example predictions for each of the 4 models. You can see how different models compare overall (e.g. the rightmost model learns much blockier regions than all the others) and how they compare within particular classes (e.g. only the middle two models find any "trucks", no model finds a "bus", and the third model from the left mislabels the bus as a "truck").
UNet Models
More Detailed API Walkthrough
Example: Semantic Segmentation for Self-Driving Cars
I train a U-Net in fast.ai to identify 20 different categories relevant to driving scenes: car, road, person, bus, etc. The training data is from the Berkeley Deep Drive 100K, and you can read more details in this article. After each training epoch, I test the latest version of the model on a subset of images from the validation set and log the results as follows:
- for each validation image original_image, the model returns a prediction prediction_mask: this has the same height and width as the raw image, but instead of RGB pixel values it contains, for each pixel, an integer corresponding to the most likely of the 20 class labels
- for reference, I also log the label for that validation image: the ground_truth_mask, which contains the correct class label for each pixel of the raw image
- the class_labels are a dictionary of the form {0: "car", 1: "road", 2: "person", ...}
- with these components, I call:
wandb.log({
    "my_image_key": wandb.Image(original_image, masks={
        "predictions": {
            "mask_data": prediction_mask,
            "class_labels": class_labels
        },
        "ground_truth": {
            "mask_data": ground_truth_mask,
            "class_labels": class_labels
        }
    })
})
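Putting the pieces together, here is a minimal sketch of what the per-epoch logging loop could look like. The training and prediction helpers (train_one_epoch, predict_scores), the project name, and the assumed shape of the model output are illustrative placeholders rather than the fast.ai code used for these models; only the wandb.log / wandb.Image call mirrors the snippet above, and class_labels is the dictionary described earlier.

import numpy as np
import wandb

# Project name is illustrative.
wandb.init(project="semantic-segmentation")

for epoch in range(num_epochs):
    train_one_epoch(model)  # hypothetical training step

    for original_image, ground_truth_mask in validation_subset:
        # Assume the model returns per-pixel class scores of shape (num_classes, H, W);
        # argmax over the class dimension yields an integer label per pixel.
        scores = predict_scores(model, original_image)  # hypothetical helper
        prediction_mask = np.argmax(scores, axis=0)

        wandb.log({
            "my_image_key": wandb.Image(original_image, masks={
                "predictions": {
                    "mask_data": prediction_mask,
                    "class_labels": class_labels
                },
                "ground_truth": {
                    "mask_data": ground_truth_mask,
                    "class_labels": class_labels
                }
            })
        })

After each epoch, the logged images appear in a media panel like the one described above, with the prediction and ground truth masks available as toggleable layers.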