Anomalous Data in Your Autonomous Data
This article shows how exploratory data analyses can be made easily shareable in minutes with Weights & Biases.
The autonomous vehicle space is evolving quickly: benchmark datasets and the techniques used to create decision intelligence on those datasets are constantly changing and improving. Still, sometimes it's worth re-examining the foundational datasets that allowed for early 'leaps' forward in machine learning.
One such collection is the Berkeley DeepDrive 100K (BDD100K) dataset. This dataset is, as its name suggests, a collection of one hundred thousand examples of training data – short video recordings, specifically – designed around fully autonomous vehicle prediction tasks. BDD100K also includes several smaller subsets made up of still images, and the images or videos can be used to train models on specific tasks such as lane detection, object detection and localization, and semantic segmentation.
In this report, we'll look at why real-world, non-simulated datasets like BDD100K are particularly valuable and how you can find and leverage corner cases to improve autonomous vehicle model performance.
First though, let's talk a bit about the data we're using today:
Data Provenance: Why Use Datasets like BDD100K?
The BDD100K dataset was collected from phone cameras to allow for data collection under a variety of real-world conditions.
It leverages lower-cost instrumentation (a.k.a. a single smartphone's camera setup) as compared to more expensive, multimodal sensor suites: stereo cameras, thermal imaging, night vision, and the various 'ranging' imaging systems (RADAR, LiDAR, and SONAR). By fusing the outputs of those sensors and training machine learning models on that multimodal perception of the driving environment, a model becomes much more robust to the driving scenarios that are typically most challenging for autonomous vehicles. Think poor-contrast scenes (too dark or too light), precipitation in its various forms (fog, snow, and rain), etc.
Relying solely on visible-light cameras can be thought of as analogous to how a human perceives the world. This is in contrast to, say, LiDAR sensors, with their wide field of view, centimeter-level accuracy, object detection and recognition at long distances, and ability to 'see' in poor-contrast environments (too light, too dark). After all, the human brain can't see in the dark, can't perceive a full 180º of scenery on the horizontal plane, and can't tell you whether the ball in the road is 12 cm or 14 cm tall.
Thus, if you're attempting to model visual perception of a driving environment in a fashion similar to how the human eye takes in information, a pure mono-camera configuration – like the one used to create the BDD100K dataset – is recommended.

An early data collection process involving a mobile phone with built-in accelerometer, gyroscope and GPS, which evolved into the data seen in the BDD100K dataset
While there's no shortage of extremely high-fidelity AV datasets collected under near-optimal conditions – sunny locales with little to no unexpected pedestrian or other non-car traffic – these datasets add little value to an autonomous vehicle's self-driving model. Training on billions of kilometres of 'easy' data only means that when the model is presented with a 'hard' scenario it has never seen before, it's likely to perform in suboptimal ways. In other words, if a model's never seen a tricky edge case, you can't expect it to perform well when it comes across one in the real world.
Cue the BDD100K dataset: gathered using mobile phone cameras, under suboptimal environmental conditions, in densely populated urban areas with large amounts of unpredictable pedestrian and vehicular traffic, the BDD100K dataset aims to fill a need faced by autonomous vehicle companies and their data providers. Namely, the need for 'challenging' data on which to train a self-driving vehicle model. After all, encountering such a scenario is not a matter of if but when.
And when a self-driving vehicle encounters these rare, adverse scenarios? The hope is that it will have already seen similarly challenging examples. At that point, the model will either fail elegantly and safely – requesting that the human driver briefly take control of the wheel – or it will succeed in maintaining automated control of the vehicle.
Nexar's cofounder and CTO Bruno Fernandez-Ruiz has this to say about the BDD100K dataset, the data collection method, and the challenges faced by AV model builders who have 'too clean' of training data:
By corner cases and edge conditions, I mean highly unusual events that happen on the road or conditions that aren’t standard. This could range from unusual and extreme weather to collapsed power lines and more. In contrast, the solutions by others in the industry focus on high fidelity data collection, and as a result are not as easily accessible, not widely deployed, and observe and learn from a limited dataset, 10 times smaller than Nexar, that exhibits reduced variation. This is the reason why these companies end up using simulation to develop and test their algorithms, while Nexar and BDD100K can work with actual real field data.
Programmatically Understanding Anomalous Data
Knowing why anomalous data is particularly valuable in this domain, let's look at how we can discover and use it. Here, we'll leverage some work from researchers at ETH Zürich – namely, a model that predicts, at the pixel level, the anomalous regions in our complex driving scenes – and display the results in a Weights & Biases Table below, our tool for data exploration.
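We won't reproduce the ETH Zürich model here, but as a rough sketch of the idea, one common baseline scores each pixel by the uncertainty of a semantic segmentation model's softmax output: pixels the model can't confidently assign to any known class tend to be the anomalous ones. The `seg_model` and input tensor below are hypothetical stand-ins:

```python
import numpy as np
import torch
import torch.nn.functional as F

def pixel_anomaly_map(logits: torch.Tensor) -> np.ndarray:
    """Convert per-pixel class logits of shape (1, C, H, W) into an (H, W) anomaly map.

    Heuristic: 1 - max softmax probability. Pixels the segmentation model
    can't confidently assign to any known class score closer to 1.
    """
    probs = F.softmax(logits, dim=1)         # (1, C, H, W)
    max_prob, _ = probs.max(dim=1)           # (1, H, W)
    return (1.0 - max_prob).squeeze(0).cpu().numpy()

# Hypothetical usage: `seg_model` is any semantic segmentation network.
# logits = seg_model(image_tensor)           # image_tensor: (1, 3, H, W)
# heatmap = pixel_anomaly_map(logits)        # overlay on the frame to inspect
```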
Tables let us log more than images, text data, or videos: essentially any form of data you may want to display in columnar format works. Below, you'll see some examples, like predicted semantic segmentation and perceptual difference. Tables are interactive, so simply click one of the images below to examine it in greater detail.
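Here's a minimal sketch of logging such a table, assuming you already have image paths, predicted segmentation masks, and anomaly heatmaps on hand (the project name, class labels, and `predictions` list are placeholders):

```python
import numpy as np
import wandb

CLASS_LABELS = {0: "background", 1: "road", 2: "vehicle"}  # example label map

run = wandb.init(project="bdd100k-anomalies")  # placeholder project name

predictions = []  # populate with (path, seg_mask, heatmap) tuples from your model

table = wandb.Table(columns=["file", "image", "anomaly_heatmap"])
for path, seg_mask, heatmap in predictions:
    # Overlay the predicted segmentation mask (2D int array) on the raw frame.
    img = wandb.Image(
        path,
        masks={"predicted": {"mask_data": seg_mask, "class_labels": CLASS_LABELS}},
    )
    # Render the float-valued heatmap (values in [0, 1]) as a grayscale image.
    heat = wandb.Image((heatmap * 255).astype(np.uint8))
    table.add_data(path, img, heat)

run.log({"bdd100k_examples": table})
run.finish()
```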
With just a few lines of code, you can call out examples where your model may find anomalous regions in images, like we've done above with the heatmap-like visualizations of some BDD100K examples in less-than-ideal weather. You can also highlight examples in your training data that may require re-annotation – examples that may cause your model to learn inaccurate representations. In fact, we've outlined those below in the Drivable Area mask images. For additional information on quickly and easily utilizing our dynamic bounding box functionality, which works well with object localization algorithms (a small sketch follows below), check out the W&B documentation and tutorials.
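As a hedged sketch of that bounding box functionality, here's how a single predicted box can be logged with W&B's box overlay format (the file path, class map, and coordinates are made-up examples):

```python
import wandb

run = wandb.init(project="bdd100k-anomalies")  # placeholder project name

class_labels = {0: "car", 1: "pedestrian"}  # example label map

img = wandb.Image(
    "frame_0001.jpg",  # placeholder image path
    boxes={
        "predictions": {
            "box_data": [
                {
                    # Coordinates are fractions of image width/height by default.
                    "position": {"minX": 0.10, "maxX": 0.35, "minY": 0.40, "maxY": 0.80},
                    "class_id": 0,
                    "box_caption": "car (0.92)",
                    "scores": {"confidence": 0.92},
                }
            ],
            "class_labels": class_labels,
        }
    },
)
run.log({"localization": img})
run.finish()
```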
Anomalous Data Observations for Practitioners
While exploring anomalous images in the BDD dataset using some visual similarity tools along with Weights & Biases Tables, we noticed some data collection issues that may be relevant to practitioners in the AV, robotic path-finding, or automated driver-assist spaces who want to make use of this task-diverse dataset:
- Although the data creators state that the dataset was gathered in New York City, Berkeley, San Francisco, and the Bay Area, you may be surprised to find some other cities in different countries represented in the dataset. We'll give you a hint: not all palm trees in this dataset are located in California!
- Ground truths may not be what they seem. If you intend to use this data as training data or ground-truth data and take the annotated drivable area markings and/or road signage at face value, you may be disappointed. Building off of the first bullet point: what looks like a road sign in New York City or the greater Bay Area may appear very different in the other palm-tree locale.
- Beware the class imbalance! Yes, there are only a few hundred examples of foggy, snowy, or otherwise inclement weather, but it may behoove you to trial some meta-learning or other methods of identifying incorrectly classified data. Not only may the weather attribute have error-prone labeling, but the timeofday attribute deserves a second look as well (see the sketch after this list). Granted, if you're utilizing sensor fusion, then many of these incorrect metadata attributes won't apply to you, but if you're developing pure CV models using mono cameras and want to preprocess overly dark or light images to optimize contrast levels, then it makes sense to double-check the classes that were provided by the BDD100K data annotators.
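As a starting point for that double-check, here's a small sketch that tallies the weather and time-of-day attributes straight from the BDD100K image-labels JSON. The file path is a placeholder, and we're assuming the standard BDD100K label schema in which each entry carries an attributes dict with "weather", "scene", and "timeofday" keys:

```python
import json
from collections import Counter

# Placeholder path: point this at your local copy of the BDD100K labels file.
with open("bdd100k_labels_images_train.json") as f:
    labels = json.load(f)

# Tally how often each attribute value appears across the training split.
weather = Counter(item["attributes"]["weather"] for item in labels)
timeofday = Counter(item["attributes"]["timeofday"] for item in labels)

print("weather classes:  ", weather.most_common())
print("timeofday classes:", timeofday.most_common())
```

A skewed tail in either tally (for example, only a few hundred foggy frames) is a quick signal that those classes deserve manual inspection before you trust them as training labels.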
Ongoing Contests Using the BDD100K Dataset
As an AV practitioner or CV enthusiast, you may be interested in participating in the ECCV 2022 BDD100K Challenge.
Make use of Weights & Biases Tables, Experiment Tracking and more to help you quickly build, report on, and iterate on your best-performing models!