Deep Drive
Semantic segmentation for scene parsing
A self-driving car must functionally understand the road and its environment the way a human would from the driver's seat. One promising computer vision approach is semantic segmentation: parsing visual scenes from a car's dashboard camera into relevant objects (cars, pedestrians, traffic signs), foreground (road, sidewalk), and background (sky, buildings). Semantic segmentation annotates an image with object types, labeling meaningful subregions as a tree, a bus, a cyclist, and so on. For a given dashboard photo, this means labeling every pixel as belonging to one of these subregions.
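To make the per-pixel labeling concrete, here is a minimal sketch of how a segmentation model produces a class label for every pixel. It uses an off-the-shelf torchvision model (FCN-ResNet50) purely as an illustration; the model, classes, and file path are assumptions, not the system described in this article.

```python
# Illustrative only: an off-the-shelf segmentation model assigning a class to every pixel.
import torch
from torchvision.models.segmentation import fcn_resnet50, FCN_ResNet50_Weights
from PIL import Image

weights = FCN_ResNet50_Weights.DEFAULT
model = fcn_resnet50(weights=weights).eval()
preprocess = weights.transforms()

image = Image.open("dashboard_frame.jpg").convert("RGB")  # hypothetical input frame
batch = preprocess(image).unsqueeze(0)                     # shape: (1, 3, H, W)

with torch.no_grad():
    logits = model(batch)["out"]                           # shape: (1, num_classes, H, W)

# Each pixel is labeled with the class whose score is highest.
labels = logits.argmax(dim=1)                              # shape: (1, H, W), one class ID per pixel
print(labels.shape, labels.unique())
```

The key point is the output shape: instead of a single label for the whole image, the model returns a grid of class scores the same size as the input, and taking the argmax over the class dimension yields one label per pixel.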
Below are examples in two columns; each column shows the raw image, the model's prediction, and the ground-truth labels. Buildings are orange, cars are pink, the road is cobalt blue, and pedestrians are beige. In the left column, the model cannot differentiate between a pedestrian and a rider on a bicycle (magenta and cyan in the ground truth, both beige in the prediction). Note how the hazy conditions in the right column make the model's predictions blurry around the boundaries between dashboard and road, or between vehicle and road.
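The colored overlays described above are just a rendering of the per-pixel class IDs. A small sketch of that step is below; the palette here is an assumed stand-in for the article's actual color scheme.

```python
# Illustrative color-coding of a per-pixel label map (palette is assumed, not the article's).
import numpy as np
from PIL import Image

PALETTE = {
    0: (0, 0, 0),        # unlabeled
    1: (255, 165, 0),    # building -> orange
    2: (255, 105, 180),  # car -> pink
    3: (0, 71, 171),     # road -> cobalt blue
    4: (245, 245, 220),  # pedestrian -> beige
}

def colorize(label_map: np.ndarray) -> Image.Image:
    """Turn an (H, W) array of integer class IDs into an RGB image."""
    h, w = label_map.shape
    rgb = np.zeros((h, w, 3), dtype=np.uint8)
    for class_id, color in PALETTE.items():
        rgb[label_map == class_id] = color
    return Image.fromarray(rgb)

# Toy example: a 2x3 label map with road, car, and building pixels.
toy = np.array([[3, 3, 2],
                [1, 1, 0]], dtype=np.uint8)
colorize(toy).save("toy_segmentation.png")
```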