
The Berkeley DeepDrive (BDD100K) Dataset

The BDD100K dataset is the largest and most diverse driving video dataset with 100,000 videos annotated for 10 different perception tasks in autonomous driving.

What Is The BDD100K Dataset?

The BDD100K dataset (the Berkeley DeepDrive dataset) is the largest and most diverse driving video dataset, with 100,000 videos annotated for 10 different perception tasks in autonomous driving. These tasks include road object detection and lane detection.
This crowd-sourced dataset contains high-resolution images and GPS/IMU data covering diverse scene types such as city streets, residential areas, and highways in varying weather conditions recorded at different times of the day.
The image frame at the 10th second of each video is annotated for image tasks, and entire sequences are used for tracking tasks. BDD100K covers realistic driving scenarios and captures more of the “long tail” of appearance variation and pose configuration for the categories of interest, released in a scalable annotation format.
Before we dive in, here's what we'll be covering:

What We're Covering About BDD100K

  • General Info About The BDD100K Dataset
  • Supported Tasks Of The BDD100K Dataset: image tagging, lane detection, drivable area segmentation, road object detection, semantic segmentation, panoptic segmentation, multi-object tracking, multi-object tracking and segmentation (MOTS), and pose estimation

General Info About The BDD100K Dataset

Dataset Structure

Supported Tasks Of The BDD100K Dataset

Here's a quick list of tasks supported by the BDD100K dataset.

Image Tagging

The BDD100K dataset provides image-level annotations for six weather conditions, six scene types, and three distinct times of day. It contains a large portion of annotations for extreme weather conditions, such as snow and rain. Additionally, the dataset contains an approximately equal number of day and night videos.
Getting a bit more granular, this dataset contains:
  • Weather: clear, overcast, snowy, rainy, foggy, partly cloudy, undefined
  • Scene: tunnel, residential, parking lot, city streets, gas stations, highway, undefined
  • Time of day: daytime, night, dawn/dusk, undefined
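
If you want to inspect these tag distributions yourself, here is a minimal sketch, assuming the Scalabel-style label JSON used by the detection split, where each image record carries an attributes dictionary with weather, scene, and timeofday keys; the file path is hypothetical:
```python
import json
from collections import Counter

# Hypothetical path to a BDD100K detection label file; the exact directory
# layout depends on which download you are working with.
LABELS_JSON = "bdd100k/labels/det_20/det_train.json"

def count_image_tags(path):
    """Tally the weather / scene / time-of-day tags across all annotated images."""
    with open(path) as f:
        frames = json.load(f)  # a list of per-image records

    tallies = {key: Counter() for key in ("weather", "scene", "timeofday")}
    for frame in frames:
        attributes = frame.get("attributes", {})
        for key, counter in tallies.items():
            counter[attributes.get(key, "undefined")] += 1
    return tallies

for key, counter in count_image_tags(LABELS_JSON).items():
    print(key, dict(counter))
```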



Lane Detection

Lane detection is the task of detecting lanes on a road from a camera. It's essential for many aspects of autonomous driving, such as lane-based navigation and high-definition (HD) map modeling.
The rich annotations in the BDD100K dataset mark lane labels for three distinct sub-tasks:

Lane Categories

0: crosswalk
1: double other
2: double white
3: double yellow
4: road curb
5: single other
6: single white
7: single yellow
8: background

Lane Directions

0: parallel
1: vertical
2: background

Lane Styles

0: solid
1: dashed
2: background
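
For convenience, the three ID schemes above translate directly into lookup tables; a minimal sketch:
```python
# Lookup tables mirroring the three lane ID lists above.
LANE_CATEGORIES = {
    0: "crosswalk", 1: "double other", 2: "double white", 3: "double yellow",
    4: "road curb", 5: "single other", 6: "single white", 7: "single yellow",
    8: "background",
}
LANE_DIRECTIONS = {0: "parallel", 1: "vertical", 2: "background"}
LANE_STYLES = {0: "solid", 1: "dashed", 2: "background"}

def describe_lane(category_id, direction_id, style_id):
    """Turn the three numeric lane labels into a human-readable string."""
    return (f"{LANE_STYLES[style_id]} {LANE_CATEGORIES[category_id]}, "
            f"direction: {LANE_DIRECTIONS[direction_id]}")

print(describe_lane(7, 0, 1))  # -> "dashed single yellow, direction: parallel"
```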

Drivable Area Segmentation

In addition to lane-level annotations, the BDD100K dataset is also curated for drivable area segmentation tasks. Specifically, there are annotations for two different categories in the dataset.
  • Directly Drivable Area: The directly drivable area is the region the driver is currently driving on. It is also the region where the driver has priority over other cars, or the right of way.
  • Alternatively Drivable Area: The alternatively drivable area is a region the driver is not currently driving on but could move into by changing lanes.
Although the directly and alternatively drivable areas are visually indistinguishable, they are functionally different and require algorithms to recognize blocking objects and scene context.
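
As a rough illustration of how these masks might be consumed, here is a minimal sketch that measures how much of an image each drivable-area class covers. Both the mask path and the pixel-value-to-class mapping are assumptions, so check the official BDD100K documentation for the exact encoding:
```python
import numpy as np
from PIL import Image

# Assumed encoding of the single-channel drivable-area masks; verify against
# the official BDD100K documentation before relying on it.
DRIVABLE_CLASSES = {0: "directly drivable", 1: "alternatively drivable", 2: "background"}

def drivable_area_fractions(mask_path):
    """Return the fraction of image pixels covered by each drivable-area class."""
    mask = np.array(Image.open(mask_path))
    total = mask.size
    return {name: float((mask == value).sum()) / total
            for value, name in DRIVABLE_CLASSES.items()}

# Hypothetical mask path.
print(drivable_area_fractions("bdd100k/labels/drivable/masks/train/example.png"))
```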

Road Object Detection

The frame at the 10th second of each video is annotated with bounding boxes for 10 object categories that are common in the autonomous driving domain. This results in 100K images with the following 2-D object annotations:
1: pedestrian
2: rider
3: car
4: truck
5: bus
6: train
7: motorcycle
8: bicycle
9: traffic light
10: traffic sign
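
A minimal sketch of pulling the 2-D boxes for one category, assuming the Scalabel-style detection JSON in which each frame carries a labels list with category and box2d fields; the file path is hypothetical:
```python
import json

# Hypothetical path to a BDD100K detection label file.
DET_JSON = "bdd100k/labels/det_20/det_val.json"

def boxes_for_category(path, category="traffic light"):
    """Collect (image name, x1, y1, x2, y2) tuples for one object category."""
    with open(path) as f:
        frames = json.load(f)

    boxes = []
    for frame in frames:
        for label in frame.get("labels") or []:  # some frames have no labels
            if label["category"] == category and "box2d" in label:
                box = label["box2d"]
                boxes.append((frame["name"], box["x1"], box["y1"], box["x2"], box["y2"]))
    return boxes

print(len(boxes_for_category(DET_JSON)))
```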

Semantic Segmentation

In image segmentation, an image has two main components: things and stuff.
Things correspond to countable objects in an image (e.g., people, flowers, birds, animals, etc.) while stuff represents uncountable regions (or repeating patterns) of similar texture (e.g., road, sky, and grass).
Pixel-level semantic segmentation annotations are available for 10K images in the dataset. However, for legacy reasons, not all of these images have corresponding videos, so this set is not a strict subset of the 100K images, even though there is significant overlap.
0: road
1: sidewalk
2: building
3: wall
4: fence
5: pole
6: traffic light
7: traffic sign
8: vegetation
9: terrain
10: sky
11: person
12: rider
13: car
14: truck
15: bus
16: train
17: motorcycle
18: bicycle
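
A minimal sketch of counting pixels per class in one annotation mask, assuming single-channel masks whose pixel values are the class IDs listed above (with unlabeled pixels stored as an ignore value such as 255); the mask path is hypothetical:
```python
import numpy as np
from PIL import Image

# The 19 semantic segmentation classes listed above, indexed by pixel value.
SEM_SEG_CLASSES = [
    "road", "sidewalk", "building", "wall", "fence", "pole", "traffic light",
    "traffic sign", "vegetation", "terrain", "sky", "person", "rider", "car",
    "truck", "bus", "train", "motorcycle", "bicycle",
]

def class_pixel_counts(mask_path):
    """Count pixels per semantic class in a single annotation mask."""
    mask = np.array(Image.open(mask_path))
    values, counts = np.unique(mask, return_counts=True)
    return {SEM_SEG_CLASSES[v]: int(c) for v, c in zip(values, counts)
            if v < len(SEM_SEG_CLASSES)}  # skip ignore values such as 255

print(class_pixel_counts("bdd100k/labels/sem_seg/masks/train/example.png"))
```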

Panoptic Segmentation

The same 10K images also contain unified image segmentation annotations where each pixel in a scene is assigned a semantic label and a unique instance identifier. The discrepancy in overlapping annotations is resolved by favoring the object instance, as the priority is to identify each thing rather than stuff.
The following panoptic segmentation labels are available in the dataset. Labels 0-30 represent stuff, while labels 31-40 represent things.
0: unlabeled
1: dynamic
2: ego vehicle
3: ground
4: static
5: parking
6: rail track
7: road
8: sidewalk
9: bridge
10: building
11: fence
12: garage
13: guard rail
14: tunnel
15: wall
16: banner
17: billboard
18: lane divider
19: parking sign
20: pole
21: polegroup
22: street light
23: traffic cone
24: traffic device
25: traffic light
26: traffic sign
27: traffic sign frame
28: terrain
29: vegetation
30: sky
31: person
32: rider
33: bicycle
34: bus
35: car
36: caravan
37: motorcycle
38: trailer
39: train
40: truck
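
A tiny helper encoding that stuff-versus-things split by ID range, exactly as stated above:
```python
# Panoptic label IDs 0-30 are "stuff" and 31-40 are "things" (see the list above).
THING_IDS = set(range(31, 41))   # person ... truck
STUFF_IDS = set(range(0, 31))    # unlabeled ... sky

def is_thing(label_id: int) -> bool:
    """Return True for countable object classes (IDs 31-40)."""
    return label_id in THING_IDS

print(is_thing(35), is_thing(30))  # car -> True, sky -> False
```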

Multi-Object Tracking

To aid in understanding the temporal association between objects in videos, the BDD100K dataset includes 2,000 videos with about 400K frames. Each video is approximately 40 seconds long and annotated at 5 fps, resulting in approximately 200 frames per video. There are 130.6K track identities and 3.3M bounding boxes annotated for the first eight classes of the object detection task.
The dataset presents complicated occlusion and reappearing patterns, with 49,418 occurrences of occlusion, or roughly one occurrence every 3.51 tracks.
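
A minimal sketch of measuring track lengths in a single video, assuming a Scalabel-style per-video label file in which each frame record has a labels list and every label carries a persistent track id; the path is hypothetical:
```python
import json
from collections import defaultdict

# Hypothetical path to one video's tracking labels.
TRACK_JSON = "bdd100k/labels/box_track_20/train/some_video.json"

def track_lengths(path):
    """Map each track identity to the number of frames it appears in."""
    with open(path) as f:
        frames = json.load(f)

    lengths = defaultdict(int)
    for frame in frames:
        for label in frame.get("labels") or []:
            lengths[label["id"]] += 1
    return dict(lengths)

lengths = track_lengths(TRACK_JSON)
print(f"{len(lengths)} tracks, longest spans {max(lengths.values())} frames")
```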

Multi-Object Tracking and Segmentation

In autonomous vehicle development, multi-object tracking and segmentation (MOTS) aims to segment and track multiple objects in crowded scenes. Rich and dense annotations are provided for 90 videos with over 14K frames and 129K annotations. The same eight classes are annotated as in the object tracking task.

Pose Estimation

Humans and pedestrians in the BDD100K dataset are annotated with 18 keypoints each to aid pose estimation and detection. The dataset includes joint annotations for around 10K frames out of the 100K sampled frames.

0: head
1: neck
2: right_shoulder
3: right_elbow
4: right_wrist
5: left_shoulder
6: left_elbow
7: left_wrist
8: right_hip
9: right_knee
10: right_ankle
11: left_hip
12: left_knee
13: left_ankle
14: right_hand
15: left_hand
16: right_foot
17: left_foot
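
For visualization, the joint list above can be paired with a skeleton of edges to draw between keypoints. The names come straight from the list; the specific edges below are an illustrative assumption rather than part of the official annotation format:
```python
# The 18 joint names listed above, indexed by keypoint ID.
KEYPOINTS = [
    "head", "neck", "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist", "right_hip", "right_knee",
    "right_ankle", "left_hip", "left_knee", "left_ankle", "right_hand",
    "left_hand", "right_foot", "left_foot",
]

# An illustrative skeleton (pairs of keypoint IDs to connect when drawing);
# these edges are an assumption for plotting, not part of the labels themselves.
SKELETON = [
    (0, 1),                                  # head - neck
    (1, 2), (2, 3), (3, 4), (4, 14),         # right arm down to the hand
    (1, 5), (5, 6), (6, 7), (7, 15),         # left arm down to the hand
    (1, 8), (8, 9), (9, 10), (10, 16),       # right side down to the foot
    (1, 11), (11, 12), (12, 13), (13, 17),   # left side down to the foot
]

def edge_names(skeleton=SKELETON):
    """Return the skeleton as readable (joint, joint) name pairs."""
    return [(KEYPOINTS[a], KEYPOINTS[b]) for a, b in skeleton]

print(edge_names()[:3])
```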

