
A System of Record for Autonomous Driving Machine Learning Models

A look at the most useful Weights & Biases features for autonomous vehicle companies
We often hear from companies building autonomous vehicle software about their machine learning technology stacks. These stacks usually involve multiple applications and data sources reflecting different stages of the workflow, such as data exploration, model training, evaluation, and model lifecycle management.
This fragmentation creates significant maintenance costs and a lot of effort for people trying to reconcile data across those different sources. And that assumes the optimistic scenario in which data is actually centralized in those systems, rather than saved in various local or distributed stores.
On top of the excess cost and lost productivity, this kind of landscape makes it hard to perform critical activities such as comparing and evaluating models, debugging ML pipelines, and auditing ML systems.
Fortunately, there is a better way.
Weights & Biases - a System of Record for all your ML workstreams
Adopting W&B as a system of record for machine learning generates savings for our customers, who are often able to sunset multiple disconnected applications. Having the data in one place significantly improves the observability and reproducibility of your ML pipelines. Beyond that, collaboration is much easier: everyone has access to the same data, so conversations focus on insights rather than on reconciling data sources.
In the following case study, we'll show how to realize these benefits by adopting W&B in an autonomous vehicle context.

Explore Data

We are going to train a 3D segmentation model on LiDAR data from the KITTI Vision Benchmark Suite. In order to iterate quickly, we will sample 10% of the dataset (2000 examples).
First, let’s take a look at our training data. We will log a sample of images and labels to a W&B Table, which allows us to interactively explore the dataset. Then, we'll visualize the 2D projections of the annotated points, their depth and intensity, as well as the 3D point cloud.
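As a rough sketch, logging such a table with the W&B Python SDK could look like the snippet below. The project name and the randomly generated arrays are illustrative stand-ins for real KITTI camera frames and LiDAR scans.

```python
import numpy as np
import wandb

# Minimal sketch: log an exploration table with camera images and 3D point clouds.
# Random arrays stand in for real KITTI data; replace them with your own loaders.
run = wandb.init(project="kitti-segmentation", job_type="explore-data")

table = wandb.Table(columns=["scene_id", "camera_image", "point_cloud"])

for scene_id in ["000000", "000001", "000002"]:
    image = np.random.randint(0, 255, size=(375, 1242, 3), dtype=np.uint8)  # camera frame
    points = np.random.uniform(-20.0, 20.0, size=(4096, 3))                 # x, y, z LiDAR points
    table.add_data(scene_id, wandb.Image(image), wandb.Object3D(points))

run.log({"kitti_sample": table})
run.finish()
```

Once logged, the Table can be filtered, grouped, and sorted in the UI, which is how views like the one below are built.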


Views like this make it easy to see that, for example, we likely have insufficient motorcycle data and would need to collect a bit more.

Train Your Models and Compare Experiments

Many of our customers run long and compute-intensive training jobs and want to see how the training is doing. With W&B, you can monitor your training progress from any location by logging into your dashboard.
The example charts below show the training progress via the training and validation loss as well as the mean intersection over union (mIoU) metric. Different lines correspond to different experiments we conducted (you can see more details on these experiments, including the model architecture and configuration, by expanding the run set below the charts).
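Instrumenting a training loop for this kind of monitoring only takes a few calls to wandb.log. In the sketch below, the metric values are simulated stand-ins; in a real pipeline you would log the losses and mIoU computed by your own training and evaluation code.

```python
import random
import wandb

# Sketch of instrumenting a training loop. The metric values are simulated;
# in practice you would log your real losses and mIoU after each epoch.
run = wandb.init(
    project="kitti-segmentation",
    config={"architecture": "pointnet", "learning_rate": 1e-3, "epochs": 5},
)

for epoch in range(run.config.epochs):
    train_loss = 1.0 / (epoch + 1) + random.random() * 0.05  # stand-in value
    val_loss = 1.2 / (epoch + 1) + random.random() * 0.05    # stand-in value
    mean_iou = max(0.0, 1.0 - val_loss)                      # stand-in value
    run.log({
        "epoch": epoch,
        "train/loss": train_loss,
        "val/loss": val_loss,
        "val/mean_iou": mean_iou,
    })

run.finish()
```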


Evaluation and Model Registry

How do you make sense of hundreds or thousands of experiments?
A simple way is to look at summary metrics, but often you may want to dig deeper. In W&B, you can interactively explore and compare models’ performance across multiple metrics. For example, in the screenshot below you can see that while model v2 is better in terms of the overall IoU metric, model v0 performs better at recognizing pedestrians.
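Comparisons like this work best when each evaluation run records its per-class metrics as summary values. A minimal sketch, with dummy numbers standing in for real results:

```python
import wandb

# Sketch: record per-class IoU as summary metrics so runs can be compared
# side by side in the W&B UI. The values below are dummy placeholders.
run = wandb.init(project="kitti-segmentation", job_type="evaluate")

run.summary["iou/overall"] = 0.71
run.summary["iou/car"] = 0.83
run.summary["iou/pedestrian"] = 0.54
run.summary["iou/cyclist"] = 0.49

run.finish()
```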

How do you decide which model is promoted to staging or production?
We often see highly manual processes: evaluating models by hand, tracking the results across many documents, and keeping models in disconnected storage. This can become messy!
With W&B Model Registry, you can put things in order: your models will be linked to the training runs that produced them, the evaluation runs that contain test metrics, and dataset versions through Artifacts. You’ll be able to easily promote the best model to production and document the full lineage of that model.
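As a rough sketch, logging a trained checkpoint and linking it to the registry could look like the snippet below. The artifact name, checkpoint path, and registered model name are assumptions; adapt them to your own project.

```python
import wandb

# Sketch: log a model checkpoint as an artifact and link it to the Model Registry.
# The names and paths below are illustrative, not fixed conventions.
run = wandb.init(project="kitti-segmentation", job_type="register-model")

model_artifact = wandb.Artifact("segmentation-model", type="model")
model_artifact.add_file("checkpoints/best_model.pt")  # your trained checkpoint
run.log_artifact(model_artifact)

# Linking attaches this artifact version to a registered model, where it can
# be promoted with aliases such as "staging" or "production".
run.link_artifact(model_artifact, "model-registry/kitti-3d-segmentation")

run.finish()
```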
Still, sometimes metrics are not sufficient and you may want to explore model predictions on selected test cases. In the table below, we put image labels and predictions on the validation dataset side by side to see where the model is having problems. This can inform our choices about how to improve the model.
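One way to build such a table is with mask overlays on logged images. The sketch below uses randomly generated images and masks as stand-ins for real validation frames and model outputs.

```python
import numpy as np
import wandb

# Sketch: an evaluation table with ground-truth and predicted segmentation masks
# overlaid on camera images. Random arrays stand in for real data and predictions.
class_labels = {0: "background", 1: "car", 2: "pedestrian", 3: "cyclist"}

run = wandb.init(project="kitti-segmentation", job_type="evaluate")
table = wandb.Table(columns=["scene_id", "ground_truth", "prediction"])

for scene_id in ["000000", "000001"]:
    image = np.random.randint(0, 255, size=(375, 1242, 3), dtype=np.uint8)
    gt_mask = np.random.randint(0, 4, size=(375, 1242))
    pred_mask = np.random.randint(0, 4, size=(375, 1242))
    table.add_data(
        scene_id,
        wandb.Image(image, masks={"ground_truth": {"mask_data": gt_mask, "class_labels": class_labels}}),
        wandb.Image(image, masks={"prediction": {"mask_data": pred_mask, "class_labels": class_labels}}),
    )

run.log({"validation_predictions": table})
run.finish()
```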



Version Datasets

Once your model pipeline reaches some level of maturity, many experiments will involve changes to the training data, such as adding rare or difficult examples. Tracking model lineage across multiple dataset versions can be hard, but W&B offers a lightweight way to do it. Artifacts track your dataset versions with de-duplication (so you can keep your storage costs under control), and this also works if you use your own storage solution.
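A minimal sketch of logging and later consuming a dataset version is shown below; the artifact name and directory path are illustrative. If your data lives in your own bucket, you can swap add_dir for add_reference to track it by reference instead of uploading it.

```python
import wandb

# Sketch: version a dataset directory as an artifact. Each log_artifact call
# creates a new version only when files change; unchanged files are deduplicated.
run = wandb.init(project="kitti-segmentation", job_type="upload-dataset")

dataset = wandb.Artifact("kitti-lidar-sample", type="dataset")
dataset.add_dir("data/kitti_sample/")  # or: dataset.add_reference("s3://your-bucket/kitti_sample/")
run.log_artifact(dataset)
run.finish()

# Later, a training run declares which version it consumes, recording lineage:
train_run = wandb.init(project="kitti-segmentation", job_type="train")
dataset_dir = train_run.use_artifact("kitti-lidar-sample:latest").download()
```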
In the diagram below, you can see how W&B Artifacts track the entire lineage of the model, from the initial dataset through data splitting and model training, all the way to evaluation.


Collaborate

A big benefit of having a system of record is how easy collaboration becomes when everyone can look at the same data. W&B makes it even easier with Reports: interactive documents that can include your plots and tables alongside insights, updates, or summaries. Anyone reading a report can go back to the source data to run additional checks or validations, which is a huge productivity boost for our customers! In fact, this article is a W&B Report.

Conclusion

For a deeper dive, feel free to check out the other Reports in our autonomous vehicle section. And if you'd like to get in touch to schedule a demo, we'd love to hear from you.
