
Deep Drive


Semantic segmentation for scene parsing





A self-driving car must understand the road and its environment the way a human would from the driver's seat. One promising computer vision approach is semantic segmentation: parsing visual scenes from a car's dashboard camera into relevant objects (cars, pedestrians, traffic signs), foreground (road, sidewalk), and background (sky, buildings). Semantic segmentation annotates an image with object types, labeling each meaningful subregion as a tree, bus, cyclist, and so on. For a given dashboard photo, this means assigning every pixel to one of these classes.
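To make the per-pixel labeling concrete, here is a minimal sketch of dense prediction with a fully convolutional network. The specific model (torchvision's FCN with a ResNet-50 backbone) and class count are illustrative assumptions, not the exact architecture behind the runs in this report:

```python
import torch
import torchvision

# Illustrative stand-in: any fully convolutional segmentation model works here.
num_classes = 20  # e.g. road, sidewalk, car, pedestrian, sky, building, ...
model = torchvision.models.segmentation.fcn_resnet50(num_classes=num_classes)
model.eval()

# A batch of dashboard-camera frames: (batch, channels, height, width).
images = torch.randn(1, 3, 360, 640)

with torch.no_grad():
    logits = model(images)["out"]     # (1, num_classes, 360, 640): one score per class per pixel
    label_map = logits.argmax(dim=1)  # (1, 360, 640): one class id per pixel
```

Training then amounts to a per-pixel cross-entropy loss between these logits and the ground-truth label map.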

Below are two columns of examples, each showing the raw image, the model's prediction, and the correct (ground truth) labels. Buildings are orange, cars are pink, the road is cobalt blue, and pedestrians are beige. In the left column, the model can't differentiate between a pedestrian and a rider on a bicycle (magenta and cyan in the ground truth, both beige in the prediction). Note how the hazy conditions in the right column blur the model's predictions around the boundaries between dashboard and road, or vehicle and road.
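The colored maps above are produced by looking up each pixel's predicted class id in a fixed palette. A minimal sketch of that rendering step; the class ids and RGB values are illustrative, not the exact palette used in these figures:

```python
import numpy as np

# Illustrative palette: class id -> RGB color (stand-ins for the report's colors).
PALETTE = np.array([
    [230, 150,  60],  # 0: building (orange)
    [235,  90, 170],  # 1: car (pink)
    [ 40,  90, 200],  # 2: road (cobalt blue)
    [225, 200, 160],  # 3: pedestrian (beige)
], dtype=np.uint8)

def colorize(label_map: np.ndarray) -> np.ndarray:
    """Turn an (H, W) array of class ids into an (H, W, 3) RGB image."""
    return PALETTE[label_map]
```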

[Panel: Run set (2 runs)]


Example segmentation maps




Model predictions
[Panel: Run set (2 runs)]


Reproduce & extend existing work




[Panel: Run set (398 runs)]


Which objects matter most?




[Panel: Run set (3 runs)]


Comparing per-class accuracies




[Panel: Run set (3 runs)]
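As a rough sketch of what a per-class accuracy chart measures: for each class, take the pixels that truly belong to that class and compute the fraction the model labeled correctly. The helper below is an illustrative assumption about the metric, not the exact evaluation code behind these panels:

```python
import numpy as np

def per_class_accuracy(pred: np.ndarray, target: np.ndarray, num_classes: int) -> dict:
    """Fraction of each class's ground-truth pixels that were predicted correctly."""
    accs = {}
    for c in range(num_classes):
        mask = target == c            # pixels whose true label is class c
        if mask.any():
            accs[c] = float((pred[mask] == c).mean())
    return accs
```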


ResNet is too broad, AlexNet too detailed




[Panel: Run set (2 runs)]


Comparing encoder variants




[Panel: Run set (2 runs)]


First experiments: increase weight decay, decrease learning rate




[Panels: All manual runs (107 runs); First manual sweep (10 runs)]
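The first manual runs varied weight decay and learning rate by hand, logging each configuration as its own run. A minimal sketch of that loop, assuming a wandb setup; the training function, value grids, and project name are hypothetical:

```python
import wandb

def train(config) -> float:
    """Stand-in for the real training loop; returns a validation metric."""
    # ... build the segmentation model, train with config.learning_rate and
    # config.weight_decay, evaluate on the validation set ...
    return 0.0  # placeholder

# Manually chosen grid: lower learning rates, higher weight decay.
for lr in [1e-3, 5e-4, 1e-4]:
    for wd in [1e-4, 5e-4, 1e-3]:
        run = wandb.init(project="deep-drive-segmentation",
                         config={"learning_rate": lr, "weight_decay": wd},
                         reinit=True)
        wandb.log({"val_accuracy": train(run.config)})
        run.finish()
```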




Hyperparameter Sweep Insights




[Panel: Run set (398 runs)]




Manual vs Automated Sweeps




[Panel: Runs by sweep (398 runs)]
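For the automated side, a W&B sweep samples configurations and launches runs on its own. A minimal sketch of such a sweep over the same two hyperparameters; the metric name, value lists, and project name are assumptions, not the exact sweep behind these 398 runs:

```python
import wandb

sweep_config = {
    "method": "random",
    "metric": {"name": "val_accuracy", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"values": [1e-2, 1e-3, 5e-4, 1e-4]},
        "weight_decay": {"values": [0.0, 1e-4, 5e-4, 1e-3]},
    },
}

def train():
    run = wandb.init()  # the sweep agent injects the sampled hyperparameters
    # ... train with run.config.learning_rate and run.config.weight_decay ...
    wandb.log({"val_accuracy": 0.0})  # placeholder for the real validation metric
    run.finish()

sweep_id = wandb.sweep(sweep_config, project="deep-drive-segmentation")
wandb.agent(sweep_id, function=train, count=20)  # launch e.g. 20 runs
```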


Comparing manual and automated sweeps




[Panel: Runs by sweep (398 runs)]


Insights from sweeps




[Panel: Runs by sweep (398 runs)]