An Exploration of Lyft's Self Driving Car Dataset
The goal of this report is to introduce the reader to the Lyft self-driving car dataset, and empower them with enough knowledge to start training a model.
Table Of Contents
- Reproducing This Analysis
- The Lyft Dataset
- The Goal
- The 9 Classes In The Lyft Dataset
- The Lyft Dataset
- Let's Dig A Little Deeper
- Lyft SDK's Data Visualization Methods
- The U-Net Model
- The Insights and Next Steps
- Key Insights
Reproducing This Analysis
Here are some links you might find useful as you start to explore the Lyft self-driving car dataset.
The Lyft Dataset
The Lyft dataset is composed of raw sensor camera and LiDAR inputs as perceived by a fleet of multiple, high-end, autonomous vehicles in a bounded geographic area. The dataset also includes high-quality, human-labeled 3D bounding boxes of traffic agents, and an underlying HD spatial semantic map.

The Goal
The goal is to predict the bounding volumes and classes of all the objects in each scene of the test dataset.
For example, for the sample_token 97ce3ab08ccbc0baae0267cbf8d4da947e1f11ae1dbcb80c3f4408784cd9170c, we might predict the presence of a car (with confidence 1.0) and a bus (with confidence 0.5). The prediction, with all predicted objects on a single line, would look like this:
97ce3ab08ccbc0baae0267cbf8d4da947e1f11ae1dbcb80c3f4408784cd9170c,1.0 2742.152625996093 673.1631800662494 -18.6561112411676 1.834 4.609 1.648 2.619835541569646 car 0.5 2728.9634555684484 657.8296521874645 -18.54676216218047 1.799 4.348 1.728 -0.5425527100619654 bus
Note that each object's confidence value is inserted before its center_x center_y center_z width length height yaw class_name fields.
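To make the format concrete, here is a minimal sketch of how one row of a submission might be assembled. The helper name and dictionary keys are my own placeholders, not part of the competition tooling; only the field order follows the format described above.

```python
# Hypothetical helper: assemble one prediction string from a list of boxes.
# Per object: confidence, then the 7 box numbers, then the class name.
def format_prediction_string(boxes):
    parts = []
    for b in boxes:
        parts += [
            str(b["confidence"]),
            str(b["center_x"]), str(b["center_y"]), str(b["center_z"]),
            str(b["width"]), str(b["length"]), str(b["height"]),
            str(b["yaw"]),
            b["class_name"],
        ]
    return " ".join(parts)

row = format_prediction_string([
    {"confidence": 1.0, "center_x": 2742.15, "center_y": 673.16, "center_z": -18.66,
     "width": 1.834, "length": 4.609, "height": 1.648, "yaw": 2.62, "class_name": "car"},
    {"confidence": 0.5, "center_x": 2728.96, "center_y": 657.83, "center_z": -18.55,
     "width": 1.799, "length": 4.348, "height": 1.728, "yaw": -0.54, "class_name": "bus"},
])
print(row)
```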
The 9 Classes In The Lyft Dataset
Lyft has 9 classes of objects in the dataset:
- bus
- bicycle
- emergency_vehicle
- truck
- car
- motorcycle
- animal
- pedestrian
- other_vehicle
As we can see below, car is the most dominant class. This means our models will have to handle imbalanced classes and learn to detect rarer classes like pedestrians and motorcycles from far fewer examples.
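If you want to compute these counts yourself, here is a sketch using pandas. It assumes train.csv has a space-separated PredictionString column in which every object takes 8 fields ending with its class name; check the column names in your local copy.

```python
import pandas as pd
from collections import Counter

# Count how often each class appears in the training annotations.
train = pd.read_csv("train.csv")  # path is a placeholder

counts = Counter()
for pred in train["PredictionString"].dropna():
    fields = pred.split()
    counts.update(fields[7::8])  # every 8th field is a class_name

print(counts.most_common())
```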

The Lyft Dataset
The dataset contains many interlocking tables and formats: image and lidar files, bounding box annotations, and JSON metadata files.
There are 55,000 human-labeled 3D annotated frames of traffic agents. In addition, the dataset contains bitstreams from 7 cameras and 3 lidar sensors, and an HD spatial semantic map with 4,000 lane segments, 197 crosswalks, 60 stop signs, 54 parking zones, 8 speed bumps, and 11 speed humps.
1. Image and Lidar files
The images and lidar files all correspond to a sample in sample_data.json, and the sample_token from sample_data.json is the primary identifier used for the train and test samples.
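Here is a minimal sketch of loading the dataset with the lyft-dataset-sdk and following a sample_data record back to its parent sample. The data_path and json_path values are placeholders for wherever you unpacked the data, and the channel field is a shortcut the SDK adds when it loads the tables (as in the nuScenes devkit it is based on).

```python
from lyft_dataset_sdk.lyftdataset import LyftDataset

# Paths are placeholders: point them at your local copy of the train split.
level5data = LyftDataset(
    data_path="./train",        # folder containing images/, lidar/, maps/
    json_path="./train/data",   # folder containing the JSON tables
    verbose=True,
)

# Every record in sample_data.json carries the token of its parent sample.
sd = level5data.sample_data[0]
parent_sample = level5data.get("sample", sd["sample_token"])
print(sd["channel"], "->", parent_sample["token"])
```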
2. Bounding Box Annotations
The annotations in train.csv are in the following format: center_x center_y center_z width length height yaw class_name (a sketch that turns these fields into 3D box corners follows this list).
- center_x, center_y and center_z are the world coordinates of the center of the 3D bounding volume.
- width, length and height are the dimensions of the volume.
- yaw is the rotation of the volume around the z axis (x is left/right, y is forward/back, and z is up/down), i.e. the direction the front of the vehicle or bounding box points while on the ground.
- class_name is the type of object contained by the bounding volume.
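To see how these seven numbers define a box in space, here is a small numpy sketch that turns one annotation into its 8 world-frame corners. The axis convention (length along the heading, width across it) is my reading of the fields above, so treat it as an assumption and verify against the SDK's own box utilities.

```python
import numpy as np

def box_corners(center_x, center_y, center_z, width, length, height, yaw):
    """Return the 8 world-frame corners of a 3D bounding volume."""
    # Corners in the box's own frame, centered at the origin.
    x = np.array([ 1,  1, -1, -1,  1,  1, -1, -1]) * (length / 2)
    y = np.array([ 1, -1, -1,  1,  1, -1, -1,  1]) * (width / 2)
    z = np.array([-1, -1, -1, -1,  1,  1,  1,  1]) * (height / 2)
    corners = np.vstack([x, y, z])

    # Rotate around the z (up) axis by yaw, then translate to the box center.
    c, s = np.cos(yaw), np.sin(yaw)
    rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return (rot_z @ corners + np.array([[center_x], [center_y], [center_z]])).T

# One car from the example prediction string above.
print(box_corners(2742.15, 673.16, -18.66, 1.834, 4.609, 1.648, 2.62))
```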
3. JSON files
In addition to image files and bounding box data, the train and test sets also come with the following JSON files:
- scene - A 25-45 second snippet of a car's journey.
- sample - An annotated snapshot of a scene at a particular timestamp.
- sample_data - Data collected from a particular sensor.
- sample_annotation - An annotated instance of an object of interest.
- instance - Enumeration of all object instances we observed.
- category - Taxonomy of object categories (e.g. vehicle, human).
- attribute - Property of an instance that can change while the category remains the same.
- visibility - (currently not used)
- sensor - A specific sensor type.
- calibrated_sensor - Definition of a particular sensor as calibrated on a particular vehicle.
- ego_pose - Ego vehicle poses at a particular timestamp.
- log - Log information from which the data was extracted.
- map - Map data that is stored as binary semantic masks from a top-down view.
The JSON files all contain single tables with identifying `tokens` that can be used to join with other files / tables.
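As a quick illustration of those joins (using the level5data object loaded earlier), the sketch below walks from a scene to its first sample, and from there to one camera record and one annotation. The data/anns shortcuts and the category_name field are reverse indexes the SDK adds when it loads the tables, following the nuScenes-style schema.

```python
# Tables join on tokens: scene -> sample -> sample_data / sample_annotation.
my_scene = level5data.scene[0]
my_sample = level5data.get("sample", my_scene["first_sample_token"])

# sample["data"] maps sensor channel -> sample_data token,
# sample["anns"] lists the sample_annotation tokens for this snapshot.
front_cam = level5data.get("sample_data", my_sample["data"]["CAM_FRONT"])
first_ann = level5data.get("sample_annotation", my_sample["anns"][0])

print(front_cam["filename"])
print(first_ann["category_name"], first_ann["translation"])
```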

Let's Dig A Little Deeper
The complexity of the dataset and the sheer abundance of JSON files might make it seem intimidating. I found it helpful to visualize the contents of the JSON files, which helped me build an intuition for how best to train my models.
Let's look at a few of the files. The time spent here will pay dividends once you start model training.
1. Sample
An annotated snapshot of a scene at a particular timestamp. Let's visualize the first annotated sample in this scene.
Instead of looking at camera and lidar data separately, we can also project the lidar pointcloud into camera images:
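A sketch of both views, assuming level5data is loaded as above: render_sample draws every sensor for the sample, and render_pointcloud_in_image overlays the lidar points on one camera. The keyword name follows the SDK's nuScenes-style signature, so adjust it if your version differs.

```python
# First annotated sample of the first scene.
my_sample_token = level5data.scene[0]["first_sample_token"]

# All cameras and lidars, with annotations drawn on top.
level5data.render_sample(my_sample_token)

# Lidar pointcloud projected into the front camera image.
level5data.render_pointcloud_in_image(my_sample_token, camera_channel="CAM_FRONT")
```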

2. Sample_Data
Data collected from a particular sensor.
The dataset contains data that is collected from a full sensor suite. Hence, for each snapshot of a scene, Lyft provides references to a family of data that is collected from these sensors.
Let's render the sample_data from a particular sensor.
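For example, picking out the top lidar of the same sample and rendering just that sensor (LIDAR_TOP is an assumption; swap in whichever channel you want to inspect):

```python
# Render a single sensor's sample_data record, annotations included.
my_sample = level5data.get("sample", level5data.scene[0]["first_sample_token"])
level5data.render_sample_data(my_sample["data"]["LIDAR_TOP"])
```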

3. Sample_Annotation
An annotated instance of an object of interest.
sample_annotation refers to any **bounding box defining the position of an object seen in a sample**. All location data is given with respect to the global coordinate system. Let's examine an example from the sample above.
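A small sketch of pulling one annotation out of that sample and rendering it. The translation, size, and rotation fields are the raw schema, while category_name is a shortcut the SDK adds at load time.

```python
# Grab the first annotation token of the sample and inspect / render it.
my_sample = level5data.get("sample", level5data.scene[0]["first_sample_token"])
ann_token = my_sample["anns"][0]
ann = level5data.get("sample_annotation", ann_token)

print(ann["category_name"], ann["translation"], ann["size"])
level5data.render_annotation(ann_token)
```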

4. Instance
Object instances are instances that need to be detected or tracked by an AV (e.g. a particular vehicle or pedestrian). Lyft generally tracks an instance across different frames in a particular scene, but it does not track them across different scenes. An instance record keeps track of its first and last annotation tokens. Let's visualize them.
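Here is a sketch of how to do that from an instance record, assuming the nuScenes-style field names the Lyft SDK inherits.

```python
# Follow one object instance from its first to its last annotated frame.
instance = level5data.instance[0]
print("annotated", instance["nbr_annotations"], "times")

level5data.render_annotation(instance["first_annotation_token"])
level5data.render_annotation(instance["last_annotation_token"])
```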
First annotated sample of this instance

Last annotated sample of this instance:
Notice the position of the bounding box is different.

Lyft SDK's Data Visualization Methods
Finally, before we move on to the models, I want to share the data visualization methods that come with the lyft-dataset-sdk. These are immensely useful in visualizing model inputs and outputs.
List Methods
There are 3 list methods.
- list_categories() lists all categories, counts and statistics of width/length/height in meters and aspect ratio.
- list_attributes() lists all attributes and counts.
- list_scenes() lists all scenes in the loaded DB.
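All three are single calls on the loaded level5data object and print their summaries directly:

```python
# Quick dataset summaries printed to the console.
level5data.list_categories()
level5data.list_attributes()
level5data.list_scenes()
```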
Render Methods
- level5data.render_pointcloud_in_image() - render the lidar pointcloud projected into a camera image
- level5data.render_sample() - render all annotations across all sample data for a given sample
- level5data.render_sample_data() - render data from a particular sensor; it can also aggregate the point clouds from multiple sweeps to get a denser point cloud
- level5data.render_annotation() - render a specific annotation
- level5data.render_scene_channel(my_scene_token, 'CAM_FRONT') - render a full scene as a video (optionally for a specific channel)
- level5data.render_egoposes_on_map() - visualize all scenes on the map for a particular location
Example of level5data.render_egoposes_on_map()

The U-Net Model
I trained a simplified version of the U-Net model on this dataset.
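For readers who want a concrete starting point, here is a minimal Keras sketch of a U-Net in this spirit. It is not the exact model I trained; the input shape and class count are placeholders, but the base_filters, depth, activation, and batch_norm knobs mirror the baseline hyperparameters listed below.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_unet(input_shape=(336, 336, 3), num_classes=10,
               base_filters=16, depth=2, activation="relu", batch_norm=False):
    """A minimal U-Net sketch; input_shape and num_classes are placeholders
    (e.g. 9 object classes + background for a BEV-style target)."""

    def conv_block(x, filters):
        # Two 3x3 convolutions, optionally followed by batch norm.
        for _ in range(2):
            x = layers.Conv2D(filters, 3, padding="same", activation=activation)(x)
            if batch_norm:
                x = layers.BatchNormalization()(x)
        return x

    inputs = keras.Input(shape=input_shape)
    x, skips = inputs, []

    # Encoder: conv blocks with downsampling, keeping skip connections.
    for d in range(depth):
        x = conv_block(x, base_filters * 2 ** d)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)

    x = conv_block(x, base_filters * 2 ** depth)  # bottleneck

    # Decoder: upsample, concatenate the matching skip, then conv again.
    for d in reversed(range(depth)):
        x = layers.Conv2DTranspose(base_filters * 2 ** d, 2, strides=2, padding="same")(x)
        x = layers.concatenate([x, skips[d]])
        x = conv_block(x, base_filters * 2 ** d)

    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return keras.Model(inputs, outputs)
```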
Baseline
I started with a baseline model with the following attributes (a config sketch follows this list):
- base_filters: 16
- epochs: 5
- train_size: 100
- learning_rate: 0.00001
- activation: relu
- optimizer: adam
- batch_norm: False
- depth: 2
- batch_size: 16
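Since this is a W&B report, here is a sketch of how that baseline can be expressed as a config dict and logged with wandb, reusing the build_unet sketch from above. The project name is a placeholder.

```python
import wandb
from tensorflow import keras

# The baseline hyperparameters above, logged as the run's config.
config = dict(
    base_filters=16, epochs=5, train_size=100, learning_rate=1e-5,
    activation="relu", optimizer="adam", batch_norm=False,
    depth=2, batch_size=16,
)
wandb.init(project="lyft-unet", config=config)  # project name is a placeholder

# build_unet is the sketch from the previous section.
model = build_unet(base_filters=config["base_filters"], depth=config["depth"],
                   activation=config["activation"], batch_norm=config["batch_norm"])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=config["learning_rate"]),
              loss="categorical_crossentropy", metrics=["accuracy"])
```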
Experiments with Hyperparameters
I then tried a few different values for each of these hyperparameters (you can explore them in the run sets below).
<-- Analysis of experiments coming soon -->
U-Net
The U-Net architecture.
<-- Explain choice of U-Net architecture -->

The Insights and Next Steps
Next Steps
Lavanya
Done
- Train baseline model
- Measure effect of changing learning rate, batch size, optimizers, activation functions and other hyperparams
Next Steps
- Add analysis of hyperparameter experiments
- Explain my choice of U-net architecture
- Add key insights
- Run a sweep
- Create visualizations (plot runtime vs. epochs, plot runtime vs. datapoints, visualize predictions - which classes were consistently misclassified?)
- Dive deeper into misclassified cars (does the model miss cars of specific dimensions? is it bad at recognizing cars in a particular part of its field of vision?)
- Train with base_filters and batch_size > 32, and with more training examples (I ran out of memory and haven't had a chance to run this model on a beefier machine yet; these experiments used 100 training examples for 5 epochs each. It might be interesting to run experiments on more training examples for more epochs and see whether the results hold up.)
Stacey
- Try model architectures other than U-Net (it might be fun to try a couple of different model architectures and build on each other’s insights!)
Key Insights