The nuScenes Dataset
nuScenes is a large-scale 3D perception dataset for autonomous driving provided by Motional. The dataset has 3D bounding boxes for 1,000 scenes.
What Is the nuScenes Dataset?
In the nuScenes Dataset, each scene is 20 seconds long and shows a diverse and interesting set of driving maneuvers, traffic situations, and unexpected behaviors. Scenes are annotated with accurate 3D bounding boxes for 23 object classes at 2 Hz.
This results in a total of 28,130 training samples, 6,019 validation samples, and 6,008 testing samples. nuScenes is the first large-scale dataset to provide data from the entire sensor suite of an autonomous vehicle (6 cameras, 1 LIDAR, 5 RADAR, GPS, IMU).
What We're Covering About nuScenes
What Is the nuScenes Dataset?
What We're Covering About nuScenes
General Info About the nuScenes Dataset
Dataset Structure
Supported Tasks of the nuScenes Dataset
3D Object Detection
3D Object Tracking
Motion Prediction
LiDAR Segmentation
Panoptic Segmentation and Tracking
Recommended Reading
General Info About the nuScenes Dataset
Dataset Structure
The dataset is provided as a relational database with multiple annotation and source tables covering calibration, maps, vehicle coordinates, and more.
A sample_data record looks as follows:
sample_data {"token": <str> -- Unique record identifier."sample_token": <str> -- Foreign key. Sample to which this sample_data is associated."ego_pose_token": <str> -- Foreign key."calibrated_sensor_token": <str> -- Foreign key."filename": <str> -- Relative path to data-blob on disk."fileformat": <str> -- Data file format."width": <int> -- If the sample data is an image, this is the image width in pixels."height": <int> -- If the sample data is an image, this is the image height in pixels."timestamp": <int> -- Unix time stamp."is_key_frame": <bool> -- True if sample_data is part of key_frame, else False."next": <str> -- Foreign key. Sample data from the same sensor that follows this in time. Empty if end of scene."prev": <str> -- Foreign key. Sample data from the same sensor that precedes this in time. Empty if start of scene.}
and a sample annotation looks like this:
sample_annotation {"token": <str> -- Unique record identifier."sample_token": <str> -- Foreign key. NOTE: this points to a sample NOT a sample_data since annotations are done on the sample level taking all relevant sample_data into account."instance_token": <str> -- Foreign key. Which object instance is this annotating. An instance can have multiple annotations over time."attribute_tokens": <str> [n] -- Foreign keys. List of attributes for this annotation. Attributes can change over time, so they belong here, not in the instance table."visibility_token": <str> -- Foreign key. Visibility may also change over time. If no visibility is annotated, the token is an empty string."translation": <float> [3] -- Bounding box location in meters as center_x, center_y, center_z."size": <float> [3] -- Bounding box size in meters as width, length, height."rotation": <float> [4] -- Bounding box orientation as quaternion: w, x, y, z."num_lidar_pts": <int> -- Number of lidar points in this box. Points are counted during the lidar sweep identified with this sample."num_radar_pts": <int> -- Number of radar points in this box. Points are counted during the radar sweep identified with this sample. This number is summed across all radar sensors without any invalid point filtering."next": <str> -- Foreign key. Sample annotation from the same object instance that follows this in time. Empty if this is the last annotation for this object."prev": <str> -- Foreign key. Sample annotation from the same object instance that precedes this in time. Empty if this is the first annotation for this object.}
More details on the data format, including all the relevant tables, can be found in the data format section of the nuScenes documentation.
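To make the relational structure concrete, here is a minimal sketch of querying these tables with the nuscenes-devkit Python package. The dataroot path is an assumption; it should point at wherever you extracted the dataset (the v1.0-mini split is enough for experimenting):

from nuscenes.nuscenes import NuScenes

# Load the relational database into memory.
nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes', verbose=True)

# Each sample is an annotated keyframe (2 Hz).
my_sample = nusc.sample[0]

# Follow the foreign keys: fetch the front-camera sample_data record
# and one of the sample_annotation records attached to this sample.
cam_front_data = nusc.get('sample_data', my_sample['data']['CAM_FRONT'])
annotation = nusc.get('sample_annotation', my_sample['anns'][0])

print(cam_front_data['filename'])  # relative path to the image blob on disk
print(annotation['translation'])   # bounding box center in meters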
Supported Tasks of the nuScenes Dataset
The following are the supported tasks of the nuScenes Dataset:
3D Object Detection
3D Object Detection places a bounding box around objects from 10 categories and estimates a set of attributes and the current velocity vector for each. For every class, the number of annotations decreases with increasing range from the ego vehicle, and the annotation density per range varies by class (see the sketch after the class list below).
The following objects are annotated for this task:
1: barrier
2: bicycle
3: bus
4: car
5: construction_vehicle
6: motorcycle
7: pedestrian
8: traffic_cone
9: trailer
10: truck
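The falloff with range is easy to verify directly. Below is a rough sketch, again assuming the nuscenes-devkit and a local copy of the dataset, that measures each box's horizontal distance from the ego vehicle at the annotated keyframe and bins the counts:

import numpy as np
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes')

ranges = []
for ann in nusc.sample_annotation:
    # Each annotation belongs to a sample (keyframe); use the lidar
    # sample_data of that keyframe to look up the ego pose.
    sample = nusc.get('sample', ann['sample_token'])
    lidar_sd = nusc.get('sample_data', sample['data']['LIDAR_TOP'])
    ego_pose = nusc.get('ego_pose', lidar_sd['ego_pose_token'])

    # Horizontal (x-y) distance between box center and ego vehicle.
    delta = np.array(ann['translation'][:2]) - np.array(ego_pose['translation'][:2])
    ranges.append(np.linalg.norm(delta))

# Bucket annotations into 10 m range bins to see the falloff.
counts, edges = np.histogram(ranges, bins=np.arange(0, 110, 10))
for lo, c in zip(edges[:-1], counts):
    print(f'{lo:3.0f}-{lo + 10:3.0f} m: {c} annotations')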
3D Object Tracking
3D Object Tracking is a natural progression from object detection. In addition to detecting objects in scenes, the task is to track these objects across time. The nuScenes Dataset provides object annotations for 7 different categories.
The number of annotations decreases with increasing radius from the ego vehicle, and the number of annotations per radius varies by class. Static object classes such as barrier, traffic_cone, and construction_vehicle are not included in this task (a sketch of reconstructing ground-truth tracks follows the class list below).
The following object annotations are present for this task:
1: bicycle
2: bus
3: car
4: motorcycle
5: pedestrian
6: trailer
7: truck
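Ground-truth tracks can be reconstructed by walking the next pointers in the sample_annotation table: every annotation of the same physical object shares an instance_token. A minimal sketch, assuming the same devkit setup as above:

from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes')

# Pick an object instance and follow its annotation chain through time.
instance = nusc.instance[0]
ann_token = instance['first_annotation_token']

track = []
while ann_token:  # 'next' is an empty string at the end of the track
    ann = nusc.get('sample_annotation', ann_token)
    track.append(ann['translation'])
    ann_token = ann['next']

print(f"Instance {instance['token'][:8]}... was annotated "
      f"{len(track)} times across the scene.")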
Motion Prediction
The Motion Prediction task aims to predict the future trajectories of agents in a scene as a sequence of x-y locations. With the nuScenes Dataset, the generated predictions are 6 seconds long and sampled at 2 Hz.
Models for this task can be evaluated using Minimum Average Displacement Error over k (minADE_k), Minimum Final Displacement Error over k (minFDE_k), and Miss Rate At 2 meters over k (MissRate_2_k).
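These metrics are straightforward to compute from k candidate trajectories and the ground truth. Here is a rough NumPy sketch of the definitions; the array shapes and function name are illustrative assumptions, not the official evaluation code:

import numpy as np

def prediction_metrics(preds, gt, miss_threshold=2.0):
    """Compute minADE_k, minFDE_k, and a per-sample miss indicator.

    preds: (k, T, 2) array of k candidate x-y trajectories.
    gt:    (T, 2) ground-truth trajectory (T = 12 for 6 s at 2 Hz).
    """
    # Pointwise L2 displacement of every candidate at every timestep: (k, T)
    dists = np.linalg.norm(preds - gt[None], axis=-1)

    min_ade = dists.mean(axis=1).min()  # minADE_k: best average displacement
    min_fde = dists[:, -1].min()        # minFDE_k: best final displacement
    # A miss occurs when even the best candidate's maximum pointwise
    # displacement exceeds the threshold (2 m); averaging this indicator
    # over the dataset gives MissRate_2_k.
    miss = float(dists.max(axis=1).min() > miss_threshold)
    return min_ade, min_fde, miss

# Toy example: 5 candidates over a 12-step horizon (6 s at 2 Hz).
rng = np.random.default_rng(0)
gt = np.cumsum(np.ones((12, 2)), axis=0)
preds = gt[None] + rng.normal(scale=0.5, size=(5, 12, 2))
print(prediction_metrics(preds, gt))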
LiDAR Segmentation
The goal of LiDAR Segmentation is to predict the category of every point in a set of 3D LiDAR point clouds. There are 16 categories (10 foreground classes and 6 background classes). Annotations are provided for the following classes:
1: barrier
2: bicycle
3: bus
4: car
5: construction_vehicle
6: motorcycle
7: pedestrian
8: traffic_cone
9: trailer
10: truck
11: driveable_surface
12: other_flat
13: sidewalk
14: terrain
15: manmade
16: vegetation
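Per-point labels are stored as one uint8 class index per lidar point in separate .bin files, referenced from a lidarseg table. A minimal sketch of reading them, assuming the nuScenes-lidarseg extension has been downloaded alongside the main dataset (otherwise the lidarseg table is absent):

import os
import numpy as np
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes')

# Each lidarseg record points at the labels for one keyframe point cloud.
lidarseg = nusc.lidarseg[0]

# One uint8 class index per point in the corresponding lidar sweep.
labels = np.fromfile(os.path.join(nusc.dataroot, lidarseg['filename']),
                     dtype=np.uint8)
classes, counts = np.unique(labels, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))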
Panoptic Segmentation and Tracking
The goal of Panoptic Segmentation is to predict the semantic category of every point plus an instance ID for things, focusing on static frames. Panoptic Tracking additionally enforces temporal coherence and point-level associations over time.
For both tasks, there are 16 categories (10 thing and 6 stuff classes). Panoptic Quality (PQ) is the primary ranking metric for panoptic segmentation, and the Panoptic Tracking (PAT) metric is the primary ranking metric for panoptic tracking.
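For reference, PQ combines segmentation quality and recognition quality over matched segments, where a predicted segment counts as a true positive when its IoU with a ground-truth segment exceeds 0.5. A small self-contained sketch of the definition (the function name is illustrative):

def panoptic_quality(tp_ious, num_fp, num_fn):
    """PQ = (sum of IoUs over TP matches) / (|TP| + 0.5*|FP| + 0.5*|FN|).

    tp_ious: IoU of each matched (predicted, ground-truth) segment pair.
    """
    tp = len(tp_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    return sum(tp_ious) / denom if denom > 0 else 0.0

# Toy example: 3 matched segments, 1 false positive, 2 false negatives.
print(panoptic_quality([0.9, 0.8, 0.7], num_fp=1, num_fn=2))  # ~0.533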
Recommended Reading
The Berkeley Deep Drive (BDD100K) Dataset
The BDD100K dataset is the largest and most diverse driving video dataset with 100,000 videos annotated for 10 different perception tasks in autonomous driving.
The Semantic KITTI Dataset
SemanticKITTI is a large semantic segmentation and scene understanding dataset developed for LiDAR-based autonomous driving. But what is it, and what is it for?
The PandaSet Dataset
PandaSet is a high-quality autonomous driving dataset that boasts the largest number of annotated objects among 3D scene understanding datasets.
The Waymo Open Dataset
The Waymo Open Dataset is a perception and motion planning video dataset for self-driving cars. It's composed of perception and motion planning datasets.
The Woven Planet (Lyft) Level 5 Dataset
In this article, we'll be exploring the Woven Planet (Lyft) Level 5 dataset. We'll look at what it is as well as the autonomous vehicle tasks and techniques it supports.
The Many Datasets of Autonomous Driving
Below we'll explore the datasets used to train autonomous driving systems to perform the various tasks required of them.