The Berkeley DeepDrive (BDD100K) Dataset
The BDD100K dataset is the largest and most diverse driving video dataset with 100,000 videos annotated for 10 different perception tasks in autonomous driving.
What Is The BDD100K Dataset?
The BDD100K dataset (Berkeley DeepDrive dataset) is the largest and most diverse driving video dataset with 100,000 videos annotated for 10 different perception tasks in autonomous driving. These tasks include road object detection and lane detection.
This crowd-sourced dataset contains high-resolution images and GPS/IMU data covering diverse scene types such as city streets, residential areas, and highways in varying weather conditions recorded at different times of the day.
The image frame at the 10th second in each video is annotated for image tasks, and entire sequences are used for tracking tasks. BDD100K covers realistic driving scenarios and captures more of the “long tail” of appearance variation and pose configuration for the categories of interest, with annotations released in a scalable format.
Before we dive in, here's what we'll be covering:
What We're Covering About BDD100K
- What Is The BDD100K Dataset?
- What We're Covering About BDD100K
- General Info About The BDD100K Dataset
- Dataset Structure
- Supported Tasks Of The BDD100K Dataset
- Image Tagging
- Lane Detection
- Drivable Area Segmentation
- Road Object Detection
- Semantic Segmentation
- Panoptic Segmentation
- Multi-Object Tracking
- Multi-Object Tracking and Segmentation
- Pose Estimation
- Recommended Reading
General Info About The BDD100K Dataset
Dataset Structure
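At a high level, most of the image-level tasks share a single annotation layout: a JSON file with one entry per annotated image, each carrying the image name, frame-level attributes, and a list of labels. Below is a minimal loading sketch; the file path is hypothetical and the field names shown in the comments are assumptions that should be verified against the official BDD100K documentation.

```python
import json

# Hypothetical path to a local copy of a BDD100K image-label file.
LABEL_FILE = "bdd100k/labels/bdd100k_labels_images_val.json"

with open(LABEL_FILE) as f:
    frames = json.load(f)  # expected: a list with one entry per annotated image

# Each entry is assumed to look roughly like:
# {
#   "name": "<image file name>",
#   "attributes": {"weather": "...", "scene": "...", "timeofday": "..."},
#   "labels": [{"category": "...", "box2d": {...}, ...}, ...]
# }
first = frames[0]
print(first["name"], first["attributes"], len(first.get("labels", [])))
```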
Supported Tasks Of The BDD100K Dataset
Here's a quick list of tasks supported by the BDD100K dataset.
Image Tagging
The BDD100K dataset provides image-level annotations for six weather conditions, six scene types, and three distinct times of day. A large portion of the annotations cover extreme weather conditions, such as snow and rain. Additionally, the dataset contains approximately equal numbers of day and night annotations.
Getting a bit more granular, this dataset contains:
- Weather: clear, overcast, snowy, rainy, foggy, partly cloudy, undefined
- Scene: tunnel, residential, parking lot, city streets, gas stations, highway, undefined
- Time of day: daytime, night, dawn/dusk, undefined
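To make the tagging annotations concrete, the snippet below tallies the three tag families from a label file in the layout sketched earlier; the path and field names are assumptions to confirm against your copy of the dataset.

```python
import json
from collections import Counter

LABEL_FILE = "bdd100k/labels/bdd100k_labels_images_val.json"  # hypothetical path

with open(LABEL_FILE) as f:
    frames = json.load(f)

# Tally the three image-level tag families described above.
weather = Counter(f["attributes"].get("weather", "undefined") for f in frames)
scene = Counter(f["attributes"].get("scene", "undefined") for f in frames)
timeofday = Counter(f["attributes"].get("timeofday", "undefined") for f in frames)

print("Weather:", weather.most_common())
print("Scene:", scene.most_common())
print("Time of day:", timeofday.most_common())
```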

Lane Detection
Lane detection is the task of detecting lanes on a road from a camera. It's essential for many aspects of autonomous driving, such as lane-based navigation and high-definition (HD) map modeling.
The rich annotations in the BDD100K dataset mark lane labels for three distinct sub-tasks:
Lane Categories
- 0: crosswalk
- 1: double other
- 2: double white
- 3: double yellow
- 4: road curb
- 5: single other
- 6: single white
- 7: single yellow
- 8: background
Lane Directions
- 0: parallel
- 1: vertical
- 2: background
Lane Styles
- 0: solid
- 1: dashed
- 2: background
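The three lane sub-task label spaces above can be captured as simple lookup tables. The sketch below builds them directly from the lists above; how the IDs are packed into the released lane masks should be checked against the official toolkit.

```python
# Lookup tables for the three lane sub-tasks, using the IDs listed above.
LANE_CATEGORIES = {
    0: "crosswalk", 1: "double other", 2: "double white", 3: "double yellow",
    4: "road curb", 5: "single other", 6: "single white", 7: "single yellow",
    8: "background",
}
LANE_DIRECTIONS = {0: "parallel", 1: "vertical", 2: "background"}
LANE_STYLES = {0: "solid", 1: "dashed", 2: "background"}


def describe_lane(category_id: int, direction_id: int, style_id: int) -> str:
    """Turn the three per-lane IDs into a human-readable description."""
    return (f"{LANE_STYLES[style_id]} {LANE_CATEGORIES[category_id]} "
            f"({LANE_DIRECTIONS[direction_id]} to the ego vehicle)")


print(describe_lane(3, 0, 0))  # -> "solid double yellow (parallel to the ego vehicle)"
```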
Drivable Area Segmentation
In addition to lane level annotations, the BDD100K dataset is also curated for drivable area segmentation tasks. Specifically, there are annotations for two different categories in the dataset.
- Directly Drivable Area: The directly drivable area is the region the driver is currently driving on. It is also the region where the driver has priority over other cars, i.e., the right of way.
- Alternatively Drivable Area: The alternatively drivable area is a region the driver is not currently driving on but could reach by changing lanes.
Although the directly and alternatively drivable areas are visually indistinguishable, they are functionally different and require algorithms to recognize blocking objects and scene context.
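A common first step is to overlay the drivable-area mask on the image. The sketch below assumes the mask is a single-channel PNG whose pixel values are 0 (directly drivable), 1 (alternatively drivable), and 2 (background); both the path and these class IDs are assumptions to verify against the official documentation.

```python
import numpy as np
from PIL import Image

# Hypothetical path; assumed IDs: 0 = direct, 1 = alternative, 2 = background.
MASK_FILE = "bdd100k/labels/drivable/masks/val/example.png"

mask = np.array(Image.open(MASK_FILE))

# Paint the direct area red and the alternative area blue on a black canvas.
overlay = np.zeros((*mask.shape, 3), dtype=np.uint8)
overlay[mask == 0] = (255, 0, 0)   # directly drivable
overlay[mask == 1] = (0, 0, 255)   # alternatively drivable

Image.fromarray(overlay).save("drivable_overlay.png")
print({name: int((mask == i).sum()) for i, name in
       enumerate(["direct", "alternative", "background"])})
```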
Road Object Detection
The frames at the 10th second in the videos are annotated with bounding boxes for 10 object categories common in the autonomous driving domain. This results in 100K images with the following 2D object annotations:
- 1: pedestrian
- 2: rider
- 3: car
- 4: truck
- 5: bus
- 6: train
- 7: motorcycle
- 8: bicycle
- 9: traffic light
- 10: traffic sign
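The sketch below pulls 2D boxes out of a detection label file in the JSON layout sketched earlier; the path and field names (`box2d`, `category`) are assumptions to confirm against the toolkit.

```python
import json

LABEL_FILE = "bdd100k/labels/bdd100k_labels_images_val.json"  # hypothetical path

with open(LABEL_FILE) as f:
    frames = json.load(f)

# Collect (image, category, box) triples for the 10 detection classes.
boxes = []
for frame in frames:
    for label in frame.get("labels", []):
        box = label.get("box2d")
        if box is None:  # skip non-box labels such as lane polylines
            continue
        boxes.append((frame["name"], label["category"],
                      (box["x1"], box["y1"], box["x2"], box["y2"])))

print(f"{len(boxes)} boxes across {len(frames)} images")
print(boxes[:3])
```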
Semantic Segmentation
In image segmentation, an image has two main components: things and stuff.
Things correspond to countable objects in an image (e.g., people, flowers, birds, animals, etc.) while stuff represents uncountable regions (or repeating patterns) of similar texture (e.g., road, sky, and grass).
Pixel-level semantic segmentation annotations (covering both stuff and thing classes) are available for 10K images in the dataset. However, for legacy reasons, not all of these images have corresponding videos, so this set is not a strict subset of the 100K images, although there is significant overlap.
- 0: road
- 1: sidewalk
- 2: building
- 3: wall
- 4: fence
- 5: pole
- 6: traffic light
- 7: traffic sign
- 8: vegetation
- 9: terrain
- 10: sky
- 11: person
- 12: rider
- 13: car
- 14: truck
- 15: bus
- 16: train
- 17: motorcycle
- 18: bicycle
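Using the IDs listed above, a quick sanity check is to count how many pixels each class occupies in a mask. The sketch below assumes the mask is a single-channel PNG storing these IDs directly, with a hypothetical path; verify both against the official docs.

```python
import numpy as np
from PIL import Image

SEMSEG_CLASSES = [
    "road", "sidewalk", "building", "wall", "fence", "pole", "traffic light",
    "traffic sign", "vegetation", "terrain", "sky", "person", "rider", "car",
    "truck", "bus", "train", "motorcycle", "bicycle",
]

MASK_FILE = "bdd100k/labels/sem_seg/masks/val/example.png"  # hypothetical path
mask = np.array(Image.open(MASK_FILE))

# Report the pixel count for every class ID present in the mask.
ids, counts = np.unique(mask, return_counts=True)
for class_id, count in zip(ids, counts):
    name = SEMSEG_CLASSES[class_id] if class_id < len(SEMSEG_CLASSES) else "ignore"
    print(f"{class_id:3d} {name:14s} {count} px")
```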
Panoptic Segmentation
The same 10K images also contain unified image segmentation annotations where each pixel in a scene is assigned a semantic label and a unique instance identifier. The discrepancy in overlapping annotations is resolved by favoring the object instance, as the priority is to identify each thing rather than stuff.
The following panoptic segmentation classes are available in the dataset. Labels 0-30 represent stuff, while labels 31-40 represent things.
- 0: unlabeled
- 1: dynamic
- 2: ego vehicle
- 3: ground
- 4: static
- 5: parking
- 6: rail track
- 7: road
- 8: sidewalk
- 9: bridge
- 10: building
- 11: fence
- 12: garage
- 13: guard rail
- 14: tunnel
- 15: wall
- 16: banner
- 17: billboard
- 18: lane divider
- 19: parking sign
- 20: pole
- 21: polegroup
- 22: street light
- 23: traffic cone
- 24: traffic device
- 25: traffic light
- 26: traffic sign
- 27: traffic sign frame
- 28: terrain
- 29: vegetation
- 30: sky
- 31: person
- 32: rider
- 33: bicycle
- 34: bus
- 35: car
- 36: caravan
- 37: motorcycle
- 38: trailer
- 39: train
- 40: truck
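Because stuff occupies IDs 0-30 and things occupy IDs 31-40, separating the two is a one-line check. The sketch below is built directly from the table above.

```python
PANOPTIC_CLASSES = [
    "unlabeled", "dynamic", "ego vehicle", "ground", "static", "parking",
    "rail track", "road", "sidewalk", "bridge", "building", "fence", "garage",
    "guard rail", "tunnel", "wall", "banner", "billboard", "lane divider",
    "parking sign", "pole", "polegroup", "street light", "traffic cone",
    "traffic device", "traffic light", "traffic sign", "traffic sign frame",
    "terrain", "vegetation", "sky", "person", "rider", "bicycle", "bus", "car",
    "caravan", "motorcycle", "trailer", "train", "truck",
]


def is_thing(label_id: int) -> bool:
    """Labels 0-30 are stuff; labels 31-40 are countable things."""
    return label_id >= 31


things = [name for i, name in enumerate(PANOPTIC_CLASSES) if is_thing(i)]
stuff = [name for i, name in enumerate(PANOPTIC_CLASSES) if not is_thing(i)]
print(f"{len(stuff)} stuff classes, {len(things)} thing classes")
```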
Multi-Object Tracking
To aid in understanding the temporal association between objects in videos, the BDD100K dataset includes 2,000 videos with about 400K frames. Each video is approximately 40 seconds and annotated at 5 fps, resulting in approximately 200 frames per video. There are 130.6K track identities and 3.3M bounding boxes annotated for the first 8 classes in the object detection task.
The dataset presents complicated occlusion and reappearing patterns, with 49,418 occurrences of occlusion, or roughly one occlusion every 3.51 tracks.
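Tracking labels are released per video, with each box carrying a persistent identity. The sketch below groups boxes by that identity, assuming a per-video JSON list of frames whose labels carry an `id` field; both the path and that assumption should be checked against the toolkit.

```python
import json
from collections import defaultdict

TRACK_FILE = "bdd100k/labels/box_track/val/example_video.json"  # hypothetical path

with open(TRACK_FILE) as f:
    frames = json.load(f)  # one entry per annotated frame, in temporal order

# Map each track identity to the list of (frame index, box) it appears in.
tracks = defaultdict(list)
for frame_index, frame in enumerate(frames):
    for label in frame.get("labels", []):
        box = label.get("box2d")
        if box is not None:
            tracks[label["id"]].append((frame_index, box))

lengths = sorted((len(obs) for obs in tracks.values()), reverse=True)
print(f"{len(tracks)} tracks; longest spans {lengths[0]} frames")
```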
Multi-Object Tracking and Segmentation
In autonomous vehicle development, multi-object tracking and segmentation (MOTS) aims to segment and track multiple objects in crowded scenes. Rich and dense annotations are provided for 90 videos with over 14K frames and 129K annotations. The same 8 classes are annotated as in the object tracking task.
Pose Estimation
Humans and pedestrians in the BDD100K dataset are annotated with 18 keypoints to aid in pose estimation and detection. The dataset includes joint annotations for around 10K frames out of the 100,000 sampled frames.
- 0: head
- 1: neck
- 2: right_shoulder
- 3: right_elbow
- 4: right_wrist
- 5: left_shoulder
- 6: left_elbow
- 7: left_wrist
- 8: right_hip
- 9: right_knee
- 10: right_ankle
- 11: left_hip
- 12: left_knee
- 13: left_ankle
- 14: right_hand
- 15: left_hand
- 16: right_foot
- 17: left_foot
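The 18 keypoint names above map one-to-one onto whatever coordinate array the pose labels provide. A minimal pairing sketch (with made-up coordinates, since the exact on-disk format should be checked against the toolkit) looks like this:

```python
KEYPOINT_NAMES = [
    "head", "neck", "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist", "right_hip", "right_knee",
    "right_ankle", "left_hip", "left_knee", "left_ankle", "right_hand",
    "left_hand", "right_foot", "left_foot",
]


def keypoints_to_dict(coords):
    """Pair an 18-element list of (x, y, visibility) triples with the names above."""
    if len(coords) != len(KEYPOINT_NAMES):
        raise ValueError(f"expected {len(KEYPOINT_NAMES)} keypoints, got {len(coords)}")
    return dict(zip(KEYPOINT_NAMES, coords))


# Toy example with fabricated coordinates, just to show the shape of the output.
example = keypoints_to_dict([(i * 10.0, i * 5.0, 1) for i in range(18)])
print(example["head"], example["left_foot"])
```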
Recommended Reading
Object Detection for Autonomous Vehicles (A Step-by-Step Guide)
Digging into object detection and perception for autonomous vehicles using YOLOv5 and Weights & Biases
The Semantic KITTI Dataset
Semantic KITTI is a large semantic segmentation and scene understanding dataset developed for LiDAR-based autonomous driving. But what is it, and what is it for?
The Waymo Open Dataset
The Waymo Open Dataset is a perception and motion planning video dataset for self-driving cars. It's composed of a perception dataset and a motion planning dataset.
The PandaSet Dataset
PandaSet is a high-quality autonomous driving dataset that boasts the highest number of annotated objects among 3D scene understanding datasets.
The nuScenes Dataset
nuScenes is a large-scale 3D perception dataset for autonomous driving provided by Motional. The dataset has 3D bounding boxes for 1,000 scenes.
The Woven Planet (Lyft) Level 5 Dataset
In this article, we'll be exploring the Woven Planet (Lyft) Level 5 dataset. We'll look at what it is, as well as the autonomous vehicle tasks and techniques it supports.