Debugging a Self-Driving RC Car

How I gained insights about my model in production. Made by Armand du Parc Locmaria using Weights & Biases
A few months ago I created a YouTube video explaining what a regression task in machine learning is. To do so I built an NVIDIA Jetracer autonomous RC car and trained it to drive on a masking tape racetrack.
Unfortunately, I didn't log much at the time. I ran into weird bugs (like why does it keep crashing into the wall 🙃🙃) and was left with more questions than before I started the project. Did I have enough compute to drive faster? How long does it take to infer one image? Am I using the GPU to its full capacity?
NVIDIA Jetracer RC Car with a W&B license plate
In this report I'll start by explaining how I monitored the car's control policy. Then we'll take a look at an issue I resolved using W&B. Finally I'll dig into a run log to gain insights on what's actually happening in production!
My hope is that this will give you ideas about how to leverage W&B in small robotics hobby projects! Let's get started.

Monitoring the Car's Control Policy

First of all: how does this car drive and what's the control policy?
The car's POV along with the annotated track center
The system is simple. We are using a neural net to predict the center of the racetrack from the car's camera. We then pass it to our rule-based control policy:
def control_policy(road_center, config):
    x, y = road_center                          # predicted track center; x is the horizontal offset
    steering = x * STEERING_GAIN                # steer toward the predicted center
    throttle = config.throttle * THROTTLE_GAIN
    return throttle, steering
As you can see, we take the value of the center along the x axis and make that our steering signal. Effectively this means that we steer to the right when the center is on the right and to the left when the center is on the left.
If you look closely you'll see we're multiplying x by a STEERING_GAIN variable. It's a magic number that we tune by hand to make the car steer more or less aggressively.
If it's too high, the car is going to over-steer and zig-zag. If it's too low, it's going to miss turns. To evaluate the STEERING_GAIN, I log data from the car's Inertial Measurement Unit (IMU).
Here, I'm displaying the yaw rate (gyroscope_z) as well as the lateral acceleration (accelerometer_x). If the gain were well-tuned, the lateral acceleration and yaw rate would stay close to 0 outside of turns (the big spikes). This would mean the car is not zig-zagging.
Here I would say we're not doing too bad! We can clearly see the acceleration resulting from the turns and see that it's pretty minimal outside of those.
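To make this concrete, here's a minimal sketch of what the drive-and-log loop could look like. The read_camera, predict_center, and read_imu helpers and the car object are hypothetical stand-ins rather than the actual Jetracer API; only the logged metric names match the charts above.

import time
import wandb

run = wandb.init(project="jetracer", config={"throttle": 0.5, "fps": 10})

while True:
    image = read_camera()                               # grab a frame from the onboard camera
    road_center = predict_center(image)                 # neural net predicts (x, y) of the track center
    throttle, steering = control_policy(road_center, run.config)
    car.throttle, car.steering = throttle, steering     # send commands to the motors

    gyro, accel = read_imu()                            # raw IMU readings
    wandb.log({
        "gyroscope_z": gyro[2],                         # yaw rate: spikes in turns, ~0 on straights
        "accelerometer_x": accel[0],                    # lateral acceleration
        "steering": steering,
    })
    time.sleep(1 / run.config.fps)                      # pace the loop at the configured frame-rate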

How to Debug a Self-Driving RC Car

The short answer is: log as many metrics as you can.
For me, the key was to log the 'Time per Frame.' This is the time it takes for the model to analyze one image.
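In practice this is just a timer wrapped around the forward pass. A minimal sketch, where predict_center is a hypothetical stand-in for the model inference call:

import time
import wandb

start = time.perf_counter()
road_center = predict_center(image)        # model inference on one camera frame
elapsed = time.perf_counter() - start
wandb.log({"Time per Frame": elapsed})     # seconds spent analyzing this frame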
See, the first time I worked on this project I spent a few days struggling to get the car to drive by itself. I would collect and annotate data, then train a model, deploy it to the car, watch the car crash into a wall, and go back to collecting data.
My first intuition was that the model wasn't good enough, causing the car to go off track. It took me quite some time to realize that the issue wasn't the model but the compute power of the onboard computer.
On the left, you can see the model's predictions, which look good: the predicted center is indeed in the middle of the track.
On the right you can see the time it takes to infer one image: it's about 0.03 s. This means the maximum frame-rate we could run the model at is 1/0.03 ≈ 33 frames per second.
But looking at the run's config, the frame-rate is set to 65 fps. In other words, we're trying to analyze too many images per second! For it to work, the Time per Frame would need to be at most 1/65 s ≈ 0.015 s. This limit is represented by the orange line on the graph.
Being above the limit meant that the computer was quickly falling behind and thus steering too late.
Dropping the frame-rate to 10 frames per second, comfortably below the 33 fps ceiling, did the trick! You can now see that the Time per Frame is below the limit.
Side note: The spikes in Time per Frame are caused by the periodic logging of frames.
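The arithmetic behind the orange line is simple enough to log and check on the car itself. A small sketch, assuming elapsed holds the measured Time per Frame from the snippet above:

fps = run.config.fps                       # configured frame-rate (65 in the failing run, 10 after the fix)
frame_budget = 1.0 / fps                   # maximum time we can spend per frame (the orange line)

wandb.log({"Time per Frame": elapsed, "Frame Budget": frame_budget})
if elapsed > frame_budget:
    print(f"Falling behind: {elapsed:.3f}s per frame > {frame_budget:.3f}s budget")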

Answering Questions About the Model in Production

After the first iteration of this project, I had quite a few questions.
What's nice is that logging a few things during one single run was enough to answer most of these!
What if we wanted to drive FASTER? We would need to increase the frame-rate so that the car doesn't miss important information at higher speeds. To increase the frame-rate, we need to know: what is the inference time? Are we using the GPU to its full capacity?
Referring back to the debugging section above, the inference time for the current model is 0.03s. So the maximum frame-rate is 33 fps.
Looking at the GPU Usage over time, it looks like the GPU is not busy 100% of the time. It is common for "under the hood" operations (reading from disk, moving memory, launching GPU kernels) to take up most of the time in many workflows, rather than the matrix math itself. This means we should definitely take a look at Performance Tuning and Profiling to understand what is slowing us down and whether there is room for optimization!
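An easy first check before reaching for a full profiler is to time the GPU work in isolation. A minimal sketch, assuming a PyTorch model on the Jetson (torch.cuda.synchronize makes sure we measure the kernels actually finishing, not just their asynchronous launch):

import time
import torch

def timed_inference(model, image_tensor):
    torch.cuda.synchronize()                   # wait for any pending GPU work
    start = time.perf_counter()
    with torch.no_grad():
        prediction = model(image_tensor)       # forward pass on the GPU
    torch.cuda.synchronize()                   # wait for the kernels to finish
    return prediction, time.perf_counter() - start

If the GPU time comes back much smaller than the overall Time per Frame, the gap is preprocessing, memory copies, and logging rather than the matrix math, which matches what the GPU usage chart suggests.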
Still, we could increase the frame-rate to drive faster.
But hold on, isn't increasing the frame-rate going to draw more power? What's the power consumption? Are we close to the battery's limits and running the risk of the computer shutting down?
Looking at the Power Consumption graph above, we can see that it stays roughly constant at around 5 W. The battery can output a maximum of 10 W, so we're well below the limit. We should be able to handle higher frame-rates!
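Power can be logged the same way as every other metric. A small sketch, assuming a Jetson-style board that exposes its onboard power monitor through sysfs; the exact path below is an assumption and varies by board and JetPack version:

import wandb

# Assumed sysfs node for the onboard INA power monitor; check your board/JetPack version.
POWER_NODE = "/sys/bus/i2c/drivers/ina3221x/6-0040/iio:device0/in_power0_input"

def read_power_watts():
    with open(POWER_NODE) as f:
        return int(f.read()) / 1000.0          # the node reports milliwatts

wandb.log({"Power Consumption": read_power_watts()})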
Driving faster also means less room for error! Are our predictions on unseen data precise enough?
Again, going back to the debugging section: looking at images from the run, we can see that the model gives satisfying results.
So, good news: with a bit of work, we should be able to drive faster!
I hope this report was useful! You can check out the YouTube video for a different take on the project! Or this report that goes over building the pipeline using W&B Artifacts.