An Exploration of Lyft's Self Driving Car Dataset
The goal of this report is to introduce the reader to the Lyft self-driving car dataset, and empower them with enough knowledge to start training a model.
Table Of Contents
- Reproducing This Analysis
- The Lyft Dataset
- The Goal
- The 9 Classes In The Lyft Dataset
- The Lyft Dataset
- Let's Dig A Little Deeper
- Lyft SDK's Data Visualization Methods
- The U-Net Model
- The Insights and Next Steps
- Key Insights
Reproducing This Analysis
Here are some links you might find useful as you start to explore the Lyft self-driving car dataset.
The Lyft Dataset
The Lyft dataset is composed of raw sensor camera and LiDAR inputs as perceived by a fleet of multiple, high-end, autonomous vehicles in a bounded geographic area. The dataset also includes high-quality, human-labeled 3D bounding boxes of traffic agents, and an underlying HD spatial semantic map.

The Goal
The goal is to predict the bounding volumes and classes of all the objects in each scene of the test dataset.
For example, for the sample_token 97ce3ab08ccbc0baae0267cbf8d4da947e1f11ae1dbcb80c3f4408784cd9170c, we might predict the presence of a car (with confidence 1.0) and a bus (with confidence 0.5). The prediction, with all predicted objects on a single line, would look like this:
97ce3ab08ccbc0baae0267cbf8d4da947e1f11ae1dbcb80c3f4408784cd9170c,1.0 2742.152625996093 673.1631800662494 -18.6561112411676 1.834 4.609 1.648 2.619835541569646 car 0.5 2728.9634555684484 657.8296521874645 -18.54676216218047 1.799 4.348 1.728 -0.5425527100619654 bus
Note that each object's confidence value is inserted before its center_x center_y center_z width length height yaw class_name fields.
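To make the format concrete, here is a minimal sketch of how one row of a submission might be assembled. The helper name and dictionary keys are my own placeholders, not part of the competition tooling; only the field order follows the format described above.

```python
# Hypothetical helper: assemble one prediction string from a list of boxes.
# Per object: confidence, then the 7 box numbers, then the class name.
def format_prediction_string(boxes):
    parts = []
    for b in boxes:
        parts += [
            str(b["confidence"]),
            str(b["center_x"]), str(b["center_y"]), str(b["center_z"]),
            str(b["width"]), str(b["length"]), str(b["height"]),
            str(b["yaw"]),
            b["class_name"],
        ]
    return " ".join(parts)

row = format_prediction_string([
    {"confidence": 1.0, "center_x": 2742.15, "center_y": 673.16, "center_z": -18.66,
     "width": 1.834, "length": 4.609, "height": 1.648, "yaw": 2.62, "class_name": "car"},
    {"confidence": 0.5, "center_x": 2728.96, "center_y": 657.83, "center_z": -18.55,
     "width": 1.799, "length": 4.348, "height": 1.728, "yaw": -0.54, "class_name": "bus"},
])
print(row)
```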
The 9 Classes In The Lyft Dataset
Lyft has 9 classes of objects in the dataset:
- bus
- bicycle
- emergency_vehicle
- truck
- car
- motorcycle
- animal
- pedestrian
- other_vehicle
As we can see below, car is the most dominant class. This means our models will have to handle imbalanced classes and learn to detect rarer classes like pedestrians and motorcycles from far fewer examples.
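If you want to compute these counts yourself, here is a sketch using pandas. It assumes train.csv has a space-separated PredictionString column in which every object takes 8 fields ending with its class name; check the column names in your local copy.

```python
import pandas as pd
from collections import Counter

# Count how often each class appears in the training annotations.
train = pd.read_csv("train.csv")  # path is a placeholder

counts = Counter()
for pred in train["PredictionString"].dropna():
    fields = pred.split()
    counts.update(fields[7::8])  # every 8th field is a class_name

print(counts.most_common())
```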

The Lyft Dataset
The dataset contains many interlocking tables and formats: image and lidar files, bounding box annotations, and JSON metadata files.
There are 55,000 human-labeled 3D annotated frames of traffic agents. In addition, the dataset contains bitstreams from 7 cameras and 3 lidar sensors, and an HD spatial semantic map with 4,000 lane segments, 197 crosswalks, 60 stop signs, 54 parking zones, 8 speed bumps, and 11 speed humps.
1. Image and Lidar files
The images and lidar files all correspond to a sample in sample_data.json, and the sample_token from sample_data.json is the primary identifier used for the train and test samples.
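Here is a minimal sketch of loading the dataset with the lyft-dataset-sdk and following a sample_data record back to its parent sample. The data_path and json_path values are placeholders for wherever you unpacked the data, and the channel field is a shortcut the SDK adds when it loads the tables (as in the nuScenes devkit it is based on).

```python
from lyft_dataset_sdk.lyftdataset import LyftDataset

# Paths are placeholders: point them at your local copy of the train split.
level5data = LyftDataset(
    data_path="./train",        # folder containing images/, lidar/, maps/
    json_path="./train/data",   # folder containing the JSON tables
    verbose=True,
)

# Every record in sample_data.json carries the token of its parent sample.
sd = level5data.sample_data[0]
parent_sample = level5data.get("sample", sd["sample_token"])
print(sd["channel"], "->", parent_sample["token"])
```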
2. Bounding Box Annotations
The annotations in train.csv are in the following format: center_x center_y center_z width length height yaw class_name (a sketch that turns these fields into 3D box corners follows this list).
- center_x, center_y and center_z are the world coordinates of the center of the 3D bounding volume.
- width, length and height are the dimensions of the volume.
- yaw is the rotation of the volume around the z axis (x is left/right, y is forward/back, and z is up/down), i.e. the direction the front of the vehicle or bounding box points while on the ground.
- class_name is the type of object contained by the bounding volume.
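To see how these seven numbers define a box in space, here is a small numpy sketch that turns one annotation into its 8 world-frame corners. The axis convention (length along the heading, width across it) is my reading of the fields above, so treat it as an assumption and verify against the SDK's own box utilities.

```python
import numpy as np

def box_corners(center_x, center_y, center_z, width, length, height, yaw):
    """Return the 8 world-frame corners of a 3D bounding volume."""
    # Corners in the box's own frame, centered at the origin.
    x = np.array([ 1,  1, -1, -1,  1,  1, -1, -1]) * (length / 2)
    y = np.array([ 1, -1, -1,  1,  1, -1, -1,  1]) * (width / 2)
    z = np.array([-1, -1, -1, -1,  1,  1,  1,  1]) * (height / 2)
    corners = np.vstack([x, y, z])

    # Rotate around the z (up) axis by yaw, then translate to the box center.
    c, s = np.cos(yaw), np.sin(yaw)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return (rot_z @ corners + np.array([[center_x], [center_y], [center_z]])).T

# One car from the example prediction string above.
print(box_corners(2742.15, 673.16, -18.66, 1.834, 4.609, 1.648, 2.62))
```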
3. JSON files
In addition to image files and bounding box data, the train and test sets also come with the following JSON files:
- scene - A 25-45 second snippet of a car's journey.
- sample - An annotated snapshot of a scene at a particular timestamp.
- sample_data - Data collected from a particular sensor.
- sample_annotation - An annotated instance of an object of interest.
- instance - Enumeration of all object instances we observed.
- category - Taxonomy of object categories (e.g. vehicle, human).
- attribute - Property of an instance that can change while the category remains the same.
- visibility - (currently not used)
- sensor - A specific sensor type.
- calibrated_sensor - Definition of a particular sensor as calibrated on a particular vehicle.
- ego_pose - Ego vehicle poses at a particular timestamp.
- log - Log information from which the data was extracted.
- map - Map data that is stored as binary semantic masks from a top-down view.
The JSON files all contain single tables with identifying `tokens` that can be used to join with other files / tables.
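As a quick illustration of those joins (using the level5data object loaded earlier), the sketch below walks from a scene to its first sample, and from there to one camera record and one annotation. The data/anns shortcuts and the category_name field are reverse indexes the SDK adds when it loads the tables, following the nuScenes-style schema.

```python
# Tables join on tokens: scene -> sample -> sample_data / sample_annotation.
my_scene = level5data.scene[0]
my_sample = level5data.get("sample", my_scene["first_sample_token"])

# sample["data"] maps sensor channel -> sample_data token,
# sample["anns"] lists the sample_annotation tokens for this snapshot.
front_cam = level5data.get("sample_data", my_sample["data"]["CAM_FRONT"])
first_ann = level5data.get("sample_annotation", my_sample["anns"][0])

print(front_cam["filename"])
print(first_ann["category_name"], first_ann["translation"])
```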

Let's Dig A Little Deeper
The complexity of the dataset and the sheer abundance of JSON files might make it seem intimidating. I found it helpful to visualize the contents of the JSON files, which helped me build an intuition for how best to train my models.
Let's look at a few of the files. The time spent here will pay dividends once you start model training.
1. Sample
An annotated snapshot of a scene at a particular timestamp. Let's visualize the first annotated sample in this scene.
Instead of looking at camera and lidar data separately, we can also project the lidar pointcloud into camera images:
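A sketch of both views, assuming level5data is loaded as above: render_sample draws every sensor for the sample, and render_pointcloud_in_image overlays the lidar points on one camera. The keyword name follows the SDK's nuScenes-style signature, so adjust it if your version differs.

```python
# First annotated sample of the first scene.
my_sample_token = level5data.scene[0]["first_sample_token"]

# All cameras and lidars, with annotations drawn on top.
level5data.render_sample(my_sample_token)

# Lidar pointcloud projected into the front camera image.
level5data.render_pointcloud_in_image(my_sample_token, camera_channel="CAM_FRONT")
```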

2. Sample_Data
Data collected from a particular sensor.
The dataset contains data that is collected from a full sensor suite. Hence, for each snapshot of a scene, Lyft provides references to a family of data that is collected from these sensors.
Let's render the sample_data from a particular sensor.
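For example, picking out the top lidar of the same sample and rendering just that sensor (LIDAR_TOP is an assumption; swap in whichever channel you want to inspect):

```python
# Render a single sensor's sample_data record, annotations included.
my_sample = level5data.get("sample", level5data.scene[0]["first_sample_token"])
level5data.render_sample_data(my_sample["data"]["LIDAR_TOP"])
```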

3. Sample_Annotation
An annotated instance of an object of interest.
sample_annotation refers to any **bounding box defining the position of an object seen in a sample**. All location data is given with respect to the global coordinate system. Let's examine an example from the sample above.
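A small sketch of pulling one annotation out of that sample and rendering it. The translation, size, and rotation fields are the raw schema, while category_name is a shortcut the SDK adds at load time.

```python
# Grab the first annotation token of the sample and inspect / render it.
my_sample = level5data.get("sample", level5data.scene[0]["first_sample_token"])
ann_token = my_sample["anns"][0]
ann = level5data.get("sample_annotation", ann_token)

print(ann["category_name"], ann["translation"], ann["size"])
level5data.render_annotation(ann_token)
```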

4. Instance
Object instances are instances that need to be detected or tracked by an AV (e.g. a particular vehicle or pedestrian). Lyft generally tracks an instance across different frames in a particular scene, but it does not track them across different scenes. An instance record keeps track of its first and last annotation tokens. Let's visualize them.
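Here is a sketch of how to do that from an instance record, assuming the nuScenes-style field names the Lyft SDK inherits.

```python
# Follow one object instance from its first to its last annotated frame.
instance = level5data.instance[0]
print("annotated", instance["nbr_annotations"], "times")

level5data.render_annotation(instance["first_annotation_token"])
level5data.render_annotation(instance["last_annotation_token"])
```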
First annotated sample of this instance

Last annotated sample of this instance:
Notice the position of the bounding box is different.

Lyft SDK's Data Visualization Methods
Finally, before we move on to the models, I want to share the data visualization methods that come with the lyft-dataset-sdk. These are immensely useful in visualizing model inputs and outputs.
List Methods
There are 3 list methods.
- list_categories() lists all categories, counts and statistics of width/length/height in meters and aspect ratio.
- list_attributes() lists all attributes and counts.
- list_scenes() lists all scenes in the loaded DB.
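All three are single calls on the loaded level5data object and print their summaries directly:

```python
# Quick dataset summaries printed to the console.
level5data.list_categories()
level5data.list_attributes()
level5data.list_scenes()
```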
Render Methods
- level5data.render_pointcloud_in_image() - render the lidar pointcloud projected into a camera image
- level5data.render_sample() - render all annotations across all sample data for a given sample
- level5data.render_sample_data() - render data from a particular sensor; it can also aggregate the point clouds from multiple sweeps to get a denser point cloud
- level5data.render_annotation() - render a specific annotation
- level5data.render_scene_channel(my_scene_token, 'CAM_FRONT') - render a full scene as a video (optionally for a specific channel)
- level5data.render_egoposes_on_map() - visualize all scenes on the map for a particular location
Example of level5data.render_egoposes_on_map()

The U-Net Model
I trained a simplified version of the U-Net model on this dataset.
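For readers who want a concrete starting point, here is a minimal Keras sketch of a U-Net in this spirit. It is not the exact model I trained; the input shape and class count are placeholders, but the base_filters, depth, activation, and batch_norm knobs mirror the baseline hyperparameters listed below.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_unet(input_shape=(336, 336, 3), num_classes=10,
               base_filters=16, depth=2, activation="relu", batch_norm=False):
    """A minimal U-Net sketch; input_shape and num_classes are placeholders
    (e.g. 9 object classes + background for a BEV-style target)."""

    def conv_block(x, filters):
        # Two 3x3 convolutions, optionally followed by batch norm.
        for _ in range(2):
            x = layers.Conv2D(filters, 3, padding="same", activation=activation)(x)
            if batch_norm:
                x = layers.BatchNormalization()(x)
        return x

    inputs = keras.Input(shape=input_shape)
    x, skips = inputs, []

    # Encoder: conv blocks with downsampling, keeping skip connections.
    for d in range(depth):
        x = conv_block(x, base_filters * 2 ** d)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)

    x = conv_block(x, base_filters * 2 ** depth)  # bottleneck

    # Decoder: upsample, concatenate the matching skip, then conv again.
    for d in reversed(range(depth)):
        x = layers.Conv2DTranspose(base_filters * 2 ** d, 2, strides=2, padding="same")(x)
        x = layers.concatenate([x, skips[d]])
        x = conv_block(x, base_filters * 2 ** d)

    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return keras.Model(inputs, outputs)
```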
Baseline
I started with a baseline model with the following attributes (a config sketch follows this list):
- base_filters: 16
- epochs: 5
- train_size: 100
- learning_rate: 0.00001
- activation: relu
- optimizer: adam
- batch_norm: False
- depth: 2
- batch_size: 16
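Since this is a W&B report, here is a sketch of how that baseline can be expressed as a config dict and logged with wandb, reusing the build_unet sketch from above. The project name is a placeholder.

```python
import wandb
from tensorflow import keras

# The baseline hyperparameters above, logged as the run's config.
config = dict(
    base_filters=16, epochs=5, train_size=100, learning_rate=1e-5,
    activation="relu", optimizer="adam", batch_norm=False,
    depth=2, batch_size=16,
)
wandb.init(project="lyft-unet", config=config)  # project name is a placeholder

# build_unet is the sketch from the previous section.
model = build_unet(base_filters=config["base_filters"], depth=config["depth"],
                   activation=config["activation"], batch_norm=config["batch_norm"])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=config["learning_rate"]),
              loss="categorical_crossentropy", metrics=["accuracy"])
```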
Experiments with Hyperparameters
I then tried a few different values for each of these hyperparameters (you can explore them in the run sets below).
<-- Analysis of experiments coming soon -->
U-Net
The U-Net architecture.
<-- Explain choice of U-Net architecture -->

The Insights and Next Steps
Next Steps
Lavanya
Done
- Train baseline model
- Measure effect of changing learning rate, batch size, optimizers, activation functions and other hyperparams
Next Steps
- Add analysis of hyperparameter experiments
- Explain my choice of U-net architecture
- Add key insights
- Run a sweep
- Create visualizations (plot runtime vs. epochs, plot runtime vs. datapoints, visualize predictions - which classes were consistently misclassified?)
- Dive deeper into misclassified cars (does the model miss cars of specific dimensions? is it bad at recognizing cars in a particular part of its field of vision?)
- Train with base_filters and batch_size > 32, and with more training examples (I ran out of memory and haven't had a chance to run this model on a beefier machine yet; these experiments used 100 training examples for 5 epochs each. It might be interesting to run experiments on more training examples for more epochs and see whether the results hold up.)
Stacey
- Try model architectures other than U-Net (it might be fun to try a couple of different model architectures and build on each other’s insights!)
Key Insights