
Autonomous Vehicle: Project Status Report and Analysis

A sample Autonomous Vehicle Project Status Report and Analysis
Created on February 9 | Last edited on February 14

Iteration Objectives

The objectives of the current iteration are to:
  • Establish a baseline Semantic Segmentation model.
  • Iteratively improve the model's performance.
  • Tweak the various hyper-parameters and select the best-performing backbone architecture.
  • Analyze the performance of each model against the established metrics as well as its parameter count (model size).
  • Identify the most suitable model to deploy in production.


Dataset Overview

We are using the Cambridge-driving Labeled Video Database (CamVid) to train our model. It contains a collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes.
We are using Artifacts by Weights & Biases, which makes it easy and convenient to store and version our datasets. Creating a new version of the dataset and fetching a particular version each take only a couple of lines of code:
# Create an Artifact
import wandb

with wandb.init(job_type="upload-dataset") as run:
    artifact = wandb.Artifact('camvid-dataset', type='dataset')
    artifact.add_dir(dataset_path)  # local directory containing the CamVid files
    run.log_artifact(artifact)

# Fetch an Artifact
with wandb.init(job_type="train") as run:
    artifact = run.use_artifact('camvid-dataset:v0', type='dataset')
    artifact_dir = artifact.download()
We also utilised the Table widget in our Weights & Biases workspace to visualize and explore our data.
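Below is a minimal sketch of how samples can be logged to a W&B Table; the column names and the random placeholder arrays are illustrative, not taken from our actual pipeline:

import numpy as np
import wandb

with wandb.init(job_type="explore-dataset") as run:
    table = wandb.Table(columns=["image", "label"])
    for _ in range(4):
        frame = np.random.randint(0, 255, (360, 480, 3), dtype=np.uint8)  # placeholder frame
        labels = np.random.randint(0, 32, (360, 480), dtype=np.uint8)     # placeholder per-pixel labels
        table.add_data(wandb.Image(frame), wandb.Image(labels))
    run.log({"camvid-samples": table})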




Learning Objective

The model is supposed to learn a per-pixel annotation of a scene captured from the point of view of the autonomous agent. It needs to categorise or segment each pixel of a given scene into one of 32 relevant categories such as road, pedestrian, sidewalk, or car. You can click on any of the segmented images in the table shown above to open an interactive interface for inspecting the segmentation results and categories.
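The interactive overlay comes from logging images together with their masks. Here is a hedged sketch, where the four classes are a small subset of the 32 CamVid categories and the arrays are random placeholders:

import numpy as np
import wandb

class_labels = {0: "road", 1: "pedestrian", 2: "sidewalk", 3: "car"}  # subset of the 32 classes

with wandb.init(job_type="visualize") as run:
    frame = np.random.randint(0, 255, (360, 480, 3), dtype=np.uint8)      # placeholder frame
    prediction = np.random.randint(0, 4, (360, 480), dtype=np.uint8)      # placeholder prediction
    run.log({
        "segmentation": wandb.Image(
            frame,
            masks={"predictions": {"mask_data": prediction, "class_labels": class_labels}},
        )
    })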



Baseline Model

For the baseline model, we decided to use a simple architecture inspired by UNet with a MobileNetV2 backbone, which, despite being quite easy to implement and understand, is also quite robust in terms of performance. We also incorporated the Chained Residual Pooling layer proposed by the creators of the RefineNet architecture, so that our model can capture background context from a large image region by efficiently pooling features with multiple window sizes and fusing them together with residual connections and learnable weights. A rough sketch of such a block appears below. We experimented with three loss functions: Categorical Cross-Entropy, Focal Loss, and Dice Loss. We attach a brief summary of our experiments with the baseline models and the loss functions.
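This sketch loosely follows the RefineNet paper and assumes a TensorFlow/Keras pipeline; the number of stages, the pool size, and the kernel size are assumptions rather than our exact settings:

import tensorflow as tf
from tensorflow.keras import layers

def chained_residual_pooling(x, stages=2, pool_size=5):
    filters = x.shape[-1]  # keep the channel count so the residual sums line up
    x = layers.ReLU()(x)
    out = x
    pooled = x
    for _ in range(stages):
        # Stride-1 pooling grows the effective window at each stage without
        # shrinking the feature map; the conv weights make the fusion learnable.
        pooled = layers.MaxPool2D(pool_size=pool_size, strides=1, padding="same")(pooled)
        pooled = layers.Conv2D(filters, 3, padding="same", use_bias=False)(pooled)
        out = layers.Add()([out, pooled])
    return out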




Hyper-parameter Tuning

In order to improve on the baseline model, we need to select not only the best model to begin with but also the best set of hyper-parameters to train it with. This, despite being quite a daunting task, was made easy for us by Weights & Biases Sweeps, which let us employ a Bayesian hyper-parameter search with the goal of minimizing the model's loss on the validation dataset.
From the experiments run using Sweeps, we can compare the performance of models with various backbones and different sets of hyper-parameters, and see which model performs best according to our pre-determined metrics.
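A sketch of the sweep setup is below; the project name, parameter names, ranges, and backbone list are illustrative rather than the exact values from our sweep, and `train` stands in for our training entry point:

import wandb

sweep_config = {
    "method": "bayes",  # Bayesian search
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "backbone": {"values": ["mobilenetv2", "resnet50", "vgg16", "vgg19"]},
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [8, 16, 32]},
    },
}

sweep_id = wandb.sweep(sweep_config, project="camvid")
wandb.agent(sweep_id, function=train)  # `train` is a placeholder for our training function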



Model Selection

From the Parallel Coordinates Plot of the sweeps attached in the aforementioned panel, we can see that a few models with VGG16 and VGG19 backbones result in NaN validation losses, which might have been caused by vanishing gradients. Hence it might be wise for us to avoid VGG-based backbones for our final model. We also need to check which model gives us the best trade-off between the number of parameters and performance on our metrics, since we need the model to run as fast as possible in a production environment without a significant drop in accuracy and overall performance.
As we can see, the models with the smallest and largest numbers of parameters do not hold up with respect to foreground accuracy and Dice score. The best performance comes from the 15th run of the sweep, which has a MobileNetV2 backbone; luckily, it is a very lightweight model that can easily be deployed to production.
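The same conclusion can be pulled programmatically with the W&B public API; the "entity/camvid/sweep_id" path and the config and summary keys below are placeholders:

import wandb

api = wandb.Api()
sweep = api.sweep("entity/camvid/sweep_id")  # placeholder sweep path
best = sweep.best_run()  # ranked by the sweep's optimization metric
print(best.name, best.config.get("backbone"), best.summary["val_loss"])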





Next Steps

Improving the Segmentation Model

Now that we have set up a training and evaluation pipeline for Semantic Segmentation using the tools provided by Weights & Biases, we are looking forward to making our model's performance more robust on real-life test data. We plan to collect more data, retrain the model, and identify edge cases in an iterative manner.

Semantic Segmentation along with Depth Estimation

Our autonomous vehicle agent needs a complete 3D perception of the world surrounding it. We need a model that can estimate the overall depth of a given scene in addition to segmenting it. This could be achieved by creating two separate models for semantic segmentation and depth estimation, but the models are expected to run in real time in production, possibly on an on-board computer with limited computational resources. To overcome this problem, we chose a model that can simultaneously perform semantic segmentation and depth estimation using a single shared backbone. We can use the model in this open-source project as our baseline for this task.
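A minimal sketch of the shared-backbone idea, assuming a Keras setup with a MobileNetV2 backbone; the decoder is deliberately simplified, and the layer sizes, losses, and input resolution are assumptions rather than our final design:

import tensorflow as tf
from tensorflow.keras import layers, Model

backbone = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights=None
)
features = backbone.output  # shared 7x7 feature map

x = layers.Conv2D(128, 3, padding="same", activation="relu")(features)
x = layers.UpSampling2D(32)(x)  # naive upsampling back to input resolution

segmentation = layers.Conv2D(32, 1, activation="softmax", name="segmentation")(x)  # 32 classes
depth = layers.Conv2D(1, 1, name="depth")(x)  # per-pixel depth regression

model = Model(inputs=backbone.input, outputs=[segmentation, depth])
model.compile(
    optimizer="adam",
    loss={"segmentation": "categorical_crossentropy", "depth": "mse"},
)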

Sample 3D reconstruction via point clouds derived from predicted depth maps, visualized using the Weights & Biases rich media format.

