
The Berkeley DeepDrive (BDD100K) Dataset

The BDD100K dataset is the largest and most diverse driving video dataset with 100,000 videos annotated for 10 different perception tasks in autonomous driving.

What Is The BDD100K Dataset?

The BDD100K dataset (the Berkeley DeepDrive dataset) is the largest and most diverse driving video dataset, with 100,000 videos annotated for 10 different perception tasks in autonomous driving. These tasks include road object detection and lane detection.
This crowd-sourced dataset contains high-resolution images and GPS/IMU data covering diverse scene types such as city streets, residential areas, and highways in varying weather conditions recorded at different times of the day.
The image frame at the 10th second of each video is annotated for image tasks, and entire sequences are used for tracking tasks. BDD100K covers realistic driving scenarios and captures more of the “long tail” of appearance variation and pose configuration for the categories of interest, released in a scalable annotation format.
Before we dive in, here's what we'll be covering:

What We're Covering About BDD100K

  • General Info About The BDD100K Dataset
  • Supported Tasks Of The BDD100K Dataset: image tagging, lane detection, drivable area segmentation, road object detection, semantic segmentation, panoptic segmentation, multi-object tracking, multi-object tracking and segmentation (MOTS), and pose estimation

General Info About The BDD100K Dataset

Dataset Structure

Supported Tasks Of The BDD100K Dataset

Here's a quick list of tasks supported by the BDD100K dataset.

Image Tagging

The BDD100K dataset provides image-level annotations for six weather conditions, six scene types, and three distinct times of day. It contains a large portion of annotations for extreme weather conditions, such as snow and rain. Additionally, the dataset contains an approximately equal number of day and night videos.
Getting a bit more granular, this dataset contains:
  • Weather: clear, overcast, snowy, rainy, foggy, partly cloudy, undefined
  • Scene: tunnel, residential, parking lot, city streets, gas stations, highway, undefined
  • Time of day: daytime, night, dawn/dusk, undefined
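
If you want to inspect these tag distributions yourself, here is a minimal sketch, assuming the Scalabel-style label JSON used by the detection split, where each image record carries an attributes dictionary with weather, scene, and timeofday keys; the file path is hypothetical:
```python
import json
from collections import Counter

# Hypothetical path to a BDD100K detection label file; the exact directory
# layout depends on which download you are working with.
LABELS_JSON = "bdd100k/labels/det_20/det_train.json"

def count_image_tags(path):
    """Tally the weather / scene / time-of-day tags across all annotated images."""
    with open(path) as f:
        frames = json.load(f)  # a list of per-image records

    tallies = {key: Counter() for key in ("weather", "scene", "timeofday")}
    for frame in frames:
        attributes = frame.get("attributes", {})
        for key, counter in tallies.items():
            counter[attributes.get(key, "undefined")] += 1
    return tallies

for key, counter in count_image_tags(LABELS_JSON).items():
    print(key, dict(counter))
```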



Lane Detection

Lane detection is the task of detecting lanes on a road from a camera. It's essential for many aspects of autonomous driving, such as lane-based navigation and high-definition (HD) map modeling.
The rich annotations in the BDD100K dataset mark lane labels for three distinct sub-tasks:

Lane Categories

0: crosswalk
1: double other
2: double white
3: double yellow
4: road curb
5: single other
6: single white
7: single yellow
8: background

Lane Directions

0: parallel
1: vertical
2: background

Lane Styles

0: solid
1: dashed
2: background
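
For convenience, the three ID schemes above translate directly into lookup tables; a minimal sketch:
```python
# Lookup tables mirroring the three lane ID lists above.
LANE_CATEGORIES = {
    0: "crosswalk", 1: "double other", 2: "double white", 3: "double yellow",
    4: "road curb", 5: "single other", 6: "single white", 7: "single yellow",
    8: "background",
}
LANE_DIRECTIONS = {0: "parallel", 1: "vertical", 2: "background"}
LANE_STYLES = {0: "solid", 1: "dashed", 2: "background"}

def describe_lane(category_id, direction_id, style_id):
    """Turn the three numeric lane labels into a human-readable string."""
    return (f"{LANE_STYLES[style_id]} {LANE_CATEGORIES[category_id]}, "
            f"direction: {LANE_DIRECTIONS[direction_id]}")

print(describe_lane(7, 0, 1))  # -> "dashed single yellow, direction: parallel"
```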

Drivable Area Segmentation

In addition to lane-level annotations, the BDD100K dataset is also curated for drivable area segmentation tasks. Specifically, there are annotations for two different categories in the dataset.
  • Directly Drivable Area: The directly drivable area is the region the driver is currently driving on. It is also the region where the driver has priority over other cars, or the right of way.
  • Alternatively Drivable Area: The alternatively drivable area is a region the driver is not currently driving on but could move into by changing lanes.
Although the directly and alternatively drivable areas are visually indistinguishable, they are functionally different and require algorithms to recognize blocking objects and scene context.
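
As a rough illustration of how these masks might be consumed, here is a minimal sketch that measures how much of an image each drivable-area class covers. Both the mask path and the pixel-value-to-class mapping are assumptions, so check the official BDD100K documentation for the exact encoding:
```python
import numpy as np
from PIL import Image

# Assumed encoding of the single-channel drivable-area masks; verify against
# the official BDD100K documentation before relying on it.
DRIVABLE_CLASSES = {0: "directly drivable", 1: "alternatively drivable", 2: "background"}

def drivable_area_fractions(mask_path):
    """Return the fraction of image pixels covered by each drivable-area class."""
    mask = np.array(Image.open(mask_path))
    total = mask.size
    return {name: float((mask == value).sum()) / total
            for value, name in DRIVABLE_CLASSES.items()}

# Hypothetical mask path.
print(drivable_area_fractions("bdd100k/labels/drivable/masks/train/example.png"))
```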

Road Object Detection

The frame at the 10th second of each video is annotated with bounding boxes for 10 object categories that are common in the autonomous driving domain. This results in 100K images with the following 2-D object annotations:
1: pedestrian
2: rider
3: car
4: truck
5: bus
6: train
7: motorcycle
8: bicycle
9: traffic light
10: traffic sign
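
A minimal sketch of pulling the 2-D boxes for one category, assuming the Scalabel-style detection JSON in which each frame carries a labels list with category and box2d fields; the file path is hypothetical:
```python
import json

# Hypothetical path to a BDD100K detection label file.
DET_JSON = "bdd100k/labels/det_20/det_val.json"

def boxes_for_category(path, category="traffic light"):
    """Collect (image name, x1, y1, x2, y2) tuples for one object category."""
    with open(path) as f:
        frames = json.load(f)

    boxes = []
    for frame in frames:
        for label in frame.get("labels") or []:  # some frames have no labels
            if label["category"] == category and "box2d" in label:
                box = label["box2d"]
                boxes.append((frame["name"], box["x1"], box["y1"], box["x2"], box["y2"]))
    return boxes

print(len(boxes_for_category(DET_JSON)))
```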

Semantic Segmentation

In image segmentation, an image has two main components: things and stuff.
Things correspond to countable objects in an image (e.g., people, flowers, birds, animals, etc.) while stuff represents uncountable regions (or repeating patterns) of similar texture (e.g., road, sky, and grass).
Pixel-level semantic segmentation annotations are available for 10K images in the dataset. However, for legacy reasons, not all of these images have corresponding videos, so this set is not a strict subset of the 100K images, even though there is significant overlap.
0: road
1: sidewalk
2: building
3: wall
4: fence
5: pole
6: traffic light
7: traffic sign
8: vegetation
9: terrain
10: sky
11: person
12: rider
13: car
14: truck
15: bus
16: train
17: motorcycle
18: bicycle
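
A minimal sketch of counting pixels per class in one annotation mask, assuming single-channel masks whose pixel values are the class IDs listed above (with unlabeled pixels stored as an ignore value such as 255); the mask path is hypothetical:
```python
import numpy as np
from PIL import Image

# The 19 semantic segmentation classes listed above, indexed by pixel value.
SEM_SEG_CLASSES = [
    "road", "sidewalk", "building", "wall", "fence", "pole", "traffic light",
    "traffic sign", "vegetation", "terrain", "sky", "person", "rider", "car",
    "truck", "bus", "train", "motorcycle", "bicycle",
]

def class_pixel_counts(mask_path):
    """Count pixels per semantic class in a single annotation mask."""
    mask = np.array(Image.open(mask_path))
    values, counts = np.unique(mask, return_counts=True)
    return {SEM_SEG_CLASSES[v]: int(c) for v, c in zip(values, counts)
            if v < len(SEM_SEG_CLASSES)}  # skip ignore values such as 255

print(class_pixel_counts("bdd100k/labels/sem_seg/masks/train/example.png"))
```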

Panoptic Segmentation

The same 10K images also contain unified image segmentation annotations where each pixel in a scene is assigned a semantic label and a unique instance identifier. The discrepancy in overlapping annotations is resolved by favoring the object instance, as the priority is to identify each thing rather than stuff.
The following panoptic segmentation labels are available in the dataset. Labels 0-30 represent stuff, while labels 31-40 represent things.
0: unlabeled
1: dynamic
2: ego vehicle
3: ground
4: static
5: parking
6: rail track
7: road
8: sidewalk
9: bridge
10: building
11: fence
12: garage
13: guard rail
14: tunnel
15: wall
16: banner
17: billboard
18: lane divider
19: parking sign
20: pole
21: polegroup
22: street light
23: traffic cone
24: traffic device
25: traffic light
26: traffic sign
27: traffic sign frame
28: terrain
29: vegetation
30: sky
31: person
32: rider
33: bicycle
34: bus
35: car
36: caravan
37: motorcycle
38: trailer
39: train
40: truck
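
A tiny helper encoding that stuff-versus-things split by ID range, exactly as stated above:
```python
# Panoptic label IDs 0-30 are "stuff" and 31-40 are "things" (see the list above).
THING_IDS = set(range(31, 41))   # person ... truck
STUFF_IDS = set(range(0, 31))    # unlabeled ... sky

def is_thing(label_id: int) -> bool:
    """Return True for countable object classes (IDs 31-40)."""
    return label_id in THING_IDS

print(is_thing(35), is_thing(30))  # car -> True, sky -> False
```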

Multi-Object Tracking

To aid in understanding the temporal association between objects in videos, the BDD100K dataset includes 2,000 videos with about 400K frames. Each video is approximately 40 seconds long and annotated at 5 fps, resulting in approximately 200 frames per video. There are 130.6K track identities and 3.3M bounding boxes annotated for the first eight classes of the object detection task.
The dataset presents complicated occlusion and reappearing patterns, with 49,418 occurrences of occlusion, or roughly one occurrence every 3.51 tracks.
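
A minimal sketch of measuring track lengths in a single video, assuming a Scalabel-style per-video label file in which each frame record has a labels list and every label carries a persistent track id; the path is hypothetical:
```python
import json
from collections import defaultdict

# Hypothetical path to one video's tracking labels.
TRACK_JSON = "bdd100k/labels/box_track_20/train/some_video.json"

def track_lengths(path):
    """Map each track identity to the number of frames it appears in."""
    with open(path) as f:
        frames = json.load(f)

    lengths = defaultdict(int)
    for frame in frames:
        for label in frame.get("labels") or []:
            lengths[label["id"]] += 1
    return dict(lengths)

lengths = track_lengths(TRACK_JSON)
print(f"{len(lengths)} tracks, longest spans {max(lengths.values())} frames")
```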

Multi-Object Tracking and Segmentation

In autonomous vehicle development, multi-object tracking and segmentation (MOTS) aims to segment and track multiple objects in crowded scenes. Rich and dense annotations are provided for 90 videos with over 14K frames and 129K annotations. The same eight classes are annotated as in the object tracking task.

Pose Estimation

Humans and pedestrians in the BDD100K dataset are annotated with 18 keypoints each to aid pose estimation and detection. The dataset includes joint annotations for around 10K frames out of the 100K sampled frames.

0: head
1: neck
2: right_shoulder
3: right_elbow
4: right_wrist
5: left_shoulder
6: left_elbow
7: left_wrist
8: right_hip
9: right_knee
10: right_ankle
11: left_hip
12: left_knee
13: left_ankle
14: right_hand
15: left_hand
16: right_foot
17: left_foot
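
For visualization, the joint list above can be paired with a skeleton of edges to draw between keypoints. The names come straight from the list; the specific edges below are an illustrative assumption rather than part of the official annotation format:
```python
# The 18 joint names listed above, indexed by keypoint ID.
KEYPOINTS = [
    "head", "neck", "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist", "right_hip", "right_knee",
    "right_ankle", "left_hip", "left_knee", "left_ankle", "right_hand",
    "left_hand", "right_foot", "left_foot",
]

# An illustrative skeleton (pairs of keypoint IDs to connect when drawing);
# these edges are an assumption for plotting, not part of the labels themselves.
SKELETON = [
    (0, 1),                                  # head - neck
    (1, 2), (2, 3), (3, 4), (4, 14),         # right arm down to the hand
    (1, 5), (5, 6), (6, 7), (7, 15),         # left arm down to the hand
    (1, 8), (8, 9), (9, 10), (10, 16),       # right side down to the foot
    (1, 11), (11, 12), (12, 13), (13, 17),   # left side down to the foot
]

def edge_names(skeleton=SKELETON):
    """Return the skeleton as readable (joint, joint) name pairs."""
    return [(KEYPOINTS[a], KEYPOINTS[b]) for a, b in skeleton]

print(edge_names()[:3])
```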

