
Status Report - Feb 2022

A sample Autonomous Vehicle Project Status Report and Analysis
Created on February 18 | Last edited on March 18

Current Objectives

The objectives of the current iteration are:
  • Establish a baseline Semantic Segmentation model.
  • Iteratively improve upon the baseline's performance.
  • Tune the hyperparameters and select the best-performing backbone architecture.
  • Analyze the performance of the models based on the established metrics as well as the number of parameters (i.e., the size of the model).
  • Identify the most suitable model to be deployed in production.

Dataset Overview

We are using the Cambridge-driving Labeled Video Database or CamVid to train our model. It contains a collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes.


We are using Weights & Biases Artifacts, which makes it easy and convenient to store and version our datasets. Creating a new version of the dataset and fetching a particular version each take only a couple of lines of code:
import wandb

# Create a new version of the dataset Artifact
with wandb.init() as run:
    artifact = wandb.Artifact('camvid-dataset', type='dataset')
    artifact.add_dir(dataset_path)  # dataset_path points to the local dataset directory
    run.log_artifact(artifact)

# Fetch a particular version of the dataset Artifact
with wandb.init() as run:
    artifact = run.use_artifact('camvid-dataset:v0', type='dataset')
    artifact_dir = artifact.download()
We use Tables in our Weights & Biases workspace to visualize and explore our images and segmentation labels. The table also contains the number of pixels of each class present in each image, which is useful for filtering images that contain a given class.
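Below is a minimal sketch of how such a table can be logged. The class-label map and the helper that yields image/mask pairs are illustrative assumptions, not our exact code:
import wandb

# Hypothetical subset of the 32 CamVid classes, mapped as {class_id: name}
CLASS_LABELS = {0: "road", 1: "pedestrian", 2: "sidewalk", 3: "car"}

with wandb.init() as run:
    table = wandb.Table(columns=["image"] + list(CLASS_LABELS.values()))
    for image, mask in load_camvid_pairs():  # assumed helper yielding (image, label-mask) pairs
        overlay = wandb.Image(image, masks={"ground_truth": {"mask_data": mask, "class_labels": CLASS_LABELS}})
        # Per-class pixel counts let us filter the table for images containing a given class
        pixel_counts = [int((mask == class_id).sum()) for class_id in CLASS_LABELS]
        table.add_data(overlay, *pixel_counts)
    run.log({"camvid-visualization": table})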

CamVid Visualization


Learning Objective

The model is supposed to learn a per-pixel annotation of a scene captured from the point of view of the autonomous agent. It needs to segment each pixel of a given scene into one of 32 relevant categories, such as road, pedestrian, sidewalk, and car. You can click on any of the segmented images in the table shown above to open an interactive interface for exploring the segmentation result and its categories.



Baseline Experiments

For the baseline experiments, we decided to use a simple UNet-inspired architecture with ResNet50, VGG19, and MobileNetV2 backbones, which, in spite of being quite easy to implement, is also quite robust in terms of performance. We also incorporated the Chained Residual Pooling layer proposed by the authors of the RefineNet architecture, so that the model can capture background context from a large image region by efficiently pooling features with multiple window sizes and fusing them together with residual connections and learnable weights. We performed the baseline experiments with Focal Loss; a brief summary of our experiments with the baseline models and the loss function is shown in the panel below.
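For reference, here is a minimal PyTorch sketch of the Chained Residual Pooling block described above; the number of stages, the pooling window, and the channel handling follow the RefineNet paper and may differ slightly from our implementation:
import torch
import torch.nn as nn

class ChainedResidualPooling(nn.Module):
    # Chain of (max-pool -> conv) stages fused with the input via residual summation
    def __init__(self, channels: int, n_stages: int = 2):
        super().__init__()
        self.relu = nn.ReLU(inplace=True)
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.MaxPool2d(kernel_size=5, stride=1, padding=2),  # large pooling window, spatial size preserved
                nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            )
            for _ in range(n_stages)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(x)
        out = x
        for stage in self.stages:
            x = stage(x)   # each stage pools and convolves the previous stage's output
            out = out + x  # residual fusion with learnable convolution weights
        return out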

Baseline Experiments


Edge Cases

For safety reasons, the following classes are our top priority:
  • Pedestrian 🚶‍♂️
  • Bicyclist 🚴‍♂️
  • Car 🚗
  • Heavy Vehicles 🚌
  • Traffic Light 🚥
We use wandb.Table to log the inputs and predictions of our models along with per-class metrics. In particular, we log the Dice coefficient for each class, which approaches 1 when the predicted mask matches the ground truth and 0 when it does not. We can filter and sort the tables dynamically to visualize where our models fail. Let us examine the performance of the baseline models on these classes:

Baseline Runs

Note that the models fail to detect the high-priority classes in many of the images, resulting in a large number of edge cases. We added a conditional filter to the tables so that only the cases where the model detects all the high-priority classes are shown.
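For reference, the per-class Dice score and the corresponding table logging can be sketched as follows; the class ids, the validation-pair loader, and the prediction helper are illustrative assumptions:
import numpy as np
import wandb

def dice_per_class(pred_mask, true_mask, class_id, eps=1e-7):
    # Dice = 2 * |P ∩ G| / (|P| + |G|) for a single class; 1 means a perfect match
    pred = pred_mask == class_id
    true = true_mask == class_id
    intersection = np.logical_and(pred, true).sum()
    return (2.0 * intersection + eps) / (pred.sum() + true.sum() + eps)

# Hypothetical ids of the high-priority classes in the CamVid label map
HIGH_PRIORITY = {"pedestrian": 1, "bicyclist": 2, "car": 3, "heavy_vehicle": 4, "traffic_light": 5}

with wandb.init() as run:
    table = wandb.Table(columns=["image", "prediction"] + list(HIGH_PRIORITY))
    for image, true_mask in load_validation_pairs():  # assumed helper yielding (image, label-mask) pairs
        pred_mask = model_predict(image)               # assumed helper returning an (H, W) label mask
        scores = [dice_per_class(pred_mask, true_mask, cid) for cid in HIGH_PRIORITY.values()]
        pred_image = wandb.Image(image, masks={"prediction": {"mask_data": pred_mask}})
        table.add_data(wandb.Image(image), pred_image, *scores)
    run.log({"baseline-edge-cases": table})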

Hyperparameter Tuning

In order to improve on the baseline, we need to select not only the best model but also the best set of hyperparameters to train it with. This, in spite of being quite a daunting task, was made easy by Weights & Biases Sweeps, which let us run a Bayesian hyperparameter search with the goal of minimizing the model's loss on the validation dataset.
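The sweep configuration looked roughly like the sketch below; the exact parameter names, value ranges, project name, and the train function are illustrative assumptions rather than our exact configuration:
import wandb

# Bayesian search over the backbone and training hyperparameters, minimizing validation loss
sweep_config = {
    "method": "bayes",
    "metric": {"name": "valid_loss", "goal": "minimize"},
    "parameters": {
        "backbone": {"values": ["resnet34", "resnet50", "vgg19", "mobilenetv2"]},
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-2},
        "weight_decay": {"distribution": "log_uniform_values", "min": 1e-6, "max": 1e-2},
        "batch_size": {"values": [4, 8, 16]},
        "image_resize_factor": {"values": [2, 4]},
    },
}

sweep_id = wandb.sweep(sweep_config, project="camvid-segmentation")  # project name is a placeholder
wandb.agent(sweep_id, function=train)  # `train` reads hyperparameters from wandb.config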
From the sweep runs, we can compare the performance of the models across backbones and hyperparameter settings, and identify which model performs best according to our pre-determined metrics.


Hyperparameter Search using Sweep


Key Insights from the Sweep

  • Lower learning rate and lower weight decay result in better foreground accuracy and Dice scores.
  • Batch size and image resize factor have strong positive correlations with the metrics.
  • The VGG-based backbones might not be a good option for the final model, since they are prone to vanishing gradients.
  • The ResNet backbones result in the best overall performance with respect to the metrics.
  • A ResNet34 or ResNet50 backbone should be chosen for the final model, given their strong metrics and faster inference times compared to the other backbones.


Result of Best Runs from the Sweep
Baseline Experiments



Final Training

To finalize the model for this iteration, we decided to train UNets with ResNet50 and ResNet34 backbones using both the fine-tune and fit-one-cycle policies, with the best sets of hyperparameters found for these two backbones in the sweep.
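A minimal sketch of the final training setup, assuming a fastai-style pipeline (consistent with the fine-tune and fit-one-cycle policies mentioned above); the data-loading arguments, epoch counts, and hyperparameter values are placeholders standing in for the best values found by the sweep:
from fastai.vision.all import (
    SegmentationDataLoaders, unet_learner, resnet34, resnet50, FocalLossFlat
)

# Placeholder data loading: fnames, label_func, and codes depend on the CamVid setup
dls = SegmentationDataLoaders.from_label_func(
    dataset_path, fnames, label_func, codes=codes, bs=8
)

for backbone in (resnet34, resnet50):
    learn = unet_learner(dls, backbone, loss_func=FocalLossFlat(axis=1))
    learn.fine_tune(10, base_lr=1e-4, wd=1e-4)  # fine-tune policy
    # Alternatively, train with the one-cycle policy:
    # learn.unfreeze()
    # learn.fit_one_cycle(10, lr_max=1e-4, wd=1e-4)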

Final Training Experiments

Final Graph View for the Dataset Artifact

All the code used in our experiments is available at https://github.com/soumik12345/Wandb-Status-Report-Template

Next Steps

Improving the Semantic Segmentation Model

  • Collect more data, especially containing the classes with the highest priority, to improve the model's performance on edge cases.
  • Experiment with a weighted cross-entropy loss in order to tackle the imbalance in the distribution of high-priority classes in our dataset.
  • Evaluate the model's performance with a multi-objective loss function: a weighted sum of cross-entropy, Focal, and Dice losses (see the sketch after this list).
  • Experiment with more recent architectures such as DeepLabV3+, Bilateral Segmentation Network, Swin Transformer, etc.
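Below is a minimal PyTorch sketch of such a multi-objective loss, assuming raw logits of shape (N, C, H, W) and integer label masks of shape (N, H, W); the loss weights and the focal gamma are placeholders to be tuned:
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiObjectiveLoss(nn.Module):
    # Weighted sum of cross-entropy, focal, and soft Dice losses (weights are placeholders)
    def __init__(self, ce_weight=1.0, focal_weight=1.0, dice_weight=1.0, gamma=2.0, eps=1e-7):
        super().__init__()
        self.ce_weight, self.focal_weight, self.dice_weight = ce_weight, focal_weight, dice_weight
        self.gamma, self.eps = gamma, eps

    def forward(self, logits, target):
        ce = F.cross_entropy(logits, target, reduction="none")  # per-pixel cross-entropy
        pt = torch.exp(-ce)                                      # probability of the true class
        focal = ((1 - pt) ** self.gamma * ce).mean()             # focal loss

        probs = F.softmax(logits, dim=1)
        one_hot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
        intersection = (probs * one_hot).sum(dim=(0, 2, 3))
        union = probs.sum(dim=(0, 2, 3)) + one_hot.sum(dim=(0, 2, 3))
        dice = 1 - ((2 * intersection + self.eps) / (union + self.eps)).mean()  # soft Dice loss

        return self.ce_weight * ce.mean() + self.focal_weight * focal + self.dice_weight * dice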

Semantic Segmentation along with Depth Estimation

Our autonomous vehicle agent needs a complete 3D perception of the world surrounding it. Besides segmentation, we need a model that can estimate the depth of a given scene. This could be achieved by creating two separate models for semantic segmentation and depth estimation, but the models are expected to run in real time in production, possibly on an on-board computer with limited computational resources. To overcome this constraint, we chose a model that can simultaneously perform semantic segmentation and depth estimation using a single shared backbone. We can use the model in this open-source project as our baseline for this task.
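As a rough illustration of the shared-backbone idea (not the actual implementation from the referenced project), a single encoder can feed two lightweight heads, one producing segmentation logits and one producing a depth map; the encoder choice and layer sizes below are placeholders:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision

class SegDepthNet(nn.Module):
    # One shared encoder with two task-specific heads: segmentation and depth
    def __init__(self, num_classes=32):
        super().__init__()
        self.encoder = torchvision.models.mobilenet_v2(weights="DEFAULT").features  # shared backbone
        self.seg_head = nn.Sequential(
            nn.Conv2d(1280, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, kernel_size=1),
        )
        self.depth_head = nn.Sequential(
            nn.Conv2d(1280, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, kernel_size=1),
        )

    def forward(self, x):
        size = x.shape[-2:]
        features = self.encoder(x)  # computed once, shared by both heads
        seg = F.interpolate(self.seg_head(features), size=size, mode="bilinear", align_corners=False)
        depth = F.interpolate(self.depth_head(features), size=size, mode="bilinear", align_corners=False)
        return seg, depth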

HydraNet Inference

Sample 3D Point Clouds reconstructed from the predicted depth maps using Weights & Biases Rich Media Format
