Training Reproducible Robots with W&B

How I'm using W&B in my Robot Training Workflow. Made by Armand du Parc Locmaria using Weights & Biases
I want to build robots to fight entropy so we can have autocratic solarpunk farms and abolish scarcity.
Chobani solarpunk ad
I don't know much about robot training. So building the RL "hello world" cartpole environment and training it seemed like a good first step and a decent challenge.
My friend Pierre chose to embark on this journey with me, and we gave ourselves 48 hours to build the robot.
In this report, I'll share how I'm using Artifacts to reduce wear and tear on the robot, how Experiment Tracking works as an easy-to-set-up robot telemetry system, and how using W&B helps with reproducibility and teamwork.
Assembling the robot
Incredibly, the hardware engineering part went smoothly. We managed to get the robot working under our time limit. But I foolishly believed the software and ML engineering would take us a few hours tops.
No, no, no.
What was supposed to be a weekend project is starting to look more like a research project 😬😬. This means I couldn't escape good engineering practices for very long. I had to leverage W&B to move forward with this project.
Training in progress. We went for a rotary version of the CartPole. It seemed easier to build and had a smaller footprint.

Using Artifacts to Reduce Wear and Tear on the Robot

Repairing the robot is a bit costly and also a bit annoying. Plus, I want to avoid hurting it too much (also, we wouldn't want it coming back for us, would we?). To do so, I need to make sure I'm using the robot only when truly necessary. I need to keep track of the outcomes of each training run, and I want every second the robot is on to pay dividends.
To train this robot, I'm using the SAC (Soft Actor-Critic) algorithm, which uses a replay buffer. You can think of the replay buffer as the memory of the robot. It is a collection of environment transitions gathered while the robot is on.
We then use those transitions to train a policy. The policy is like the brain of the robot. It takes in environment observations and outputs robot actions.
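To make the memory-and-brain picture a little more concrete, here is a minimal sketch of a replay buffer in plain Python. It is a simplified illustration, not the buffer that SB3 uses internally; the `Transition` fields and the `ReplayBuffer` class are names made up for this example.

```python
import random
from collections import deque
from typing import List, NamedTuple, Sequence


class Transition(NamedTuple):
    """One step of interaction with the environment (illustrative fields)."""
    obs: Sequence[float]
    action: Sequence[float]
    reward: float
    next_obs: Sequence[float]
    done: bool


class ReplayBuffer:
    """Fixed-size memory: the oldest transitions are dropped once it is full."""

    def __init__(self, capacity: int) -> None:
        self._storage = deque(maxlen=capacity)

    def add(self, transition: Transition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)


# Usage: store a few fake transitions, then draw a batch to train the policy on.
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(Transition([float(step)], [0.0], 1.0, [float(step + 1)], False))
batch = buffer.sample(batch_size=2)
```

Every transition in that buffer cost real robot time to collect, which is why losing it hurts far more than losing simulated data.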
It is crucial to keep track of both the replay buffer and the policy because, again, they are expensive to collect, and reusing them spares the robot. To do so I'm using Artifacts. Artifacts is W&B's version-controlled cloud storage. It lets us keep track of data lineage: using it, we always know which policy was trained on which version of the environment and with which hyper-parameters!
Saving the policy (or the replay buffer) is as simple as:
    model_path = f"runs/{}/models/"  # SB3 syntax to save policy
    artifact = wandb.Artifact("sac_model", type="model")
    artifact.add_file(model_path)
    wandb.log_artifact(artifact)
After this, I can retrieve the replay buffer to do some further training, or the policy to deploy it on the robot!
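Retrieving a logged artifact might look roughly like the sketch below. It assumes the artifact was logged under the name `sac_model` as above; the `model_filename` argument and the `deploy` job type are hypothetical and depend on how the files were actually saved.

```python
from pathlib import Path


def artifact_reference(name: str, alias: str = "latest") -> str:
    """Build the "name:alias" string W&B uses to address an artifact version."""
    return f"{name}:{alias}"


def download_policy(project: str, model_filename: str) -> Path:
    """Download the policy artifact and return the local path to the model file.

    `model_filename` is hypothetical: pass whatever file name was added
    to the artifact when it was logged.
    """
    import wandb  # imported lazily so the helper above works without wandb

    run = wandb.init(project=project, job_type="deploy")
    artifact = run.use_artifact(artifact_reference("sac_model"), type="model")
    local_dir = artifact.download()  # directory the artifact files were saved to
    run.finish()
    return Path(local_dir) / model_filename
```

The returned path can then be handed to SB3, for example with `SAC.load(...)`. A replay buffer stored with `model.save_replay_buffer(...)` can be restored the same way with `model.load_replay_buffer(...)`. Because the run calls `use_artifact`, W&B also records which run consumed which version of the policy.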
Learn more about Artifacts in this video or try our Artifacts quickstart notebook on Google Colab →

Using Experiment Tracking as a Robot Telemetry System

During this project, I encountered plenty of confusing bugs. For example: check out the robot deciding to spin out of control.
Robot losing its mind, followed by me pressing the emergency cutoff switch (red button).
That is very weird.
To debug this I need to know what's happening onboard. What do system metrics look like? Are we using too many resources on the onboard computer? What actions is the agent taking? Are we reading the sensors correctly?
After all, I want the robot to be self-contained. If it had legs I would like to be able to release it in the wild for it to learn to walk on its own. This means that all the computations (notably NN computations) happen on the onboard computer. At the time of the bug, I was using a Raspberry Pi which, as it turns out, wasn't powerful enough to run my code. This caused my control loop to hang. It meant the last action (for example full speed to the right 😬) kept being executed and this caused the robot to spin fast without anything to stop it.
I found out about this issue by using Experiment Tracking as an easy-to-set-up robot telemetry system. Essentially, experiment tracking allows us to log anything to W&B servers. It then provides us with a nice dashboard keeping track of our relevant metrics. Usually, we log metrics such as losses and accuracies. But we are completely free to log everything and anything, such as the state of our robot!
This turns W&B Experiment Tracking into a telemetry dashboard that barely requires any setup:
    wandb.init(project="furuta")  # init wandb run
    [...]
    obs, reward, done, info = env.step(action)
    wandb.log(info)  # log our pendulum's state
For me, logging the "loop time" (the time from one action to the next) greatly helped make sure the code was running at the desired control frequency and not hanging.
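One way to measure that loop time is sketched below. This is an illustration rather than the exact code running on the robot: `LoopTimer` and `with_loop_time` are hypothetical helpers, and `info` is assumed to be a plain dictionary of numbers.

```python
import time
from typing import Callable, Dict, Optional


class LoopTimer:
    """Measures the time elapsed between consecutive control-loop iterations."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self._last: Optional[float] = None

    def tick(self) -> Optional[float]:
        """Return seconds since the previous tick, or None on the first call."""
        now = self._clock()
        elapsed = None if self._last is None else now - self._last
        self._last = now
        return elapsed


def with_loop_time(info: Dict[str, float], timer: LoopTimer) -> Dict[str, float]:
    """Return a copy of `info` with a `loop_time` entry once one is available."""
    metrics = dict(info)
    loop_time = timer.tick()
    if loop_time is not None:
        metrics["loop_time"] = loop_time
    return metrics


# In the control loop, right after env.step(action):
#     wandb.log(with_loop_time(info, timer))
```

A monotonic clock such as `time.perf_counter` is used so that system clock adjustments on the onboard computer don't distort the measurement. A hanging control loop then shows up as a spike in the `loop_time` chart.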
Logging very frequently like this could cause performance issues, so feel free to check out our Limits & Performance docs if you're having any trouble.
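If logging at every control step turns out to be too much, one possible workaround is to buffer metrics locally and send an aggregate every few steps. The sketch below is an assumption on my part, not something the W&B docs prescribe; `ThrottledLogger` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class ThrottledLogger:
    """Buffers per-step metrics and emits their mean every `every` steps."""

    def __init__(self, log_fn: Callable[[Dict[str, float]], None], every: int) -> None:
        self._log_fn = log_fn
        self._every = every
        self._buffer: Dict[str, List[float]] = defaultdict(list)
        self._steps = 0

    def log(self, metrics: Dict[str, float]) -> None:
        for key, value in metrics.items():
            self._buffer[key].append(value)
        self._steps += 1
        if self._steps % self._every == 0:
            # Send one averaged data point instead of `every` separate ones.
            self._log_fn({k: sum(v) / len(v) for k, v in self._buffer.items()})
            self._buffer.clear()


# Usage with W&B (50 is an arbitrary choice):
#     logger = ThrottledLogger(wandb.log, every=50)
#     logger.log(info)
```

Averaging hides short spikes, so for a metric like the loop time it may be more useful to send the maximum over the window instead of the mean.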

Now, on to the next problem. During the exploration phase (where the robot tries out actions and observes their associated rewards) I use what's called generalized State Dependent Exploration (gSDE). When training robots in simulation, we usually sample independent Gaussian noise at every step during the exploration phase. Unfortunately, a real robot can't execute these actions because they are not temporally correlated: the motor physically can't jump from full power clockwise to full power counter-clockwise from one step to the next. As a consequence, the system acts as a low-pass filter and smooths the noise away, which makes exploration difficult or even ineffective. Furthermore, the rapid switching can damage the actuators.
gSDE solves these issues by deriving the exploration actions from the system state (from the policy features, to be precise). As a result, if the system state doesn't change, the action doesn't change.
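The difference between the two kinds of noise can be sketched in a few lines of plain Python. This is a toy illustration of the idea, not SB3's implementation: the noise is a linear function of some state features, using weights that are drawn once and then held fixed. All names and numbers below are made up for the example.

```python
import random
from typing import List, Sequence


def sample_exploration_matrix(n_features: int, n_actions: int, std: float,
                              rng: random.Random) -> List[List[float]]:
    """Draw the noise weights once; they stay fixed until they are resampled."""
    return [[rng.gauss(0.0, std) for _ in range(n_actions)]
            for _ in range(n_features)]


def state_dependent_noise(features: Sequence[float],
                          matrix: Sequence[Sequence[float]]) -> List[float]:
    """Noise is linear in the features: noise_j = sum_i features_i * matrix_ij."""
    n_actions = len(matrix[0])
    return [sum(f * row[j] for f, row in zip(features, matrix))
            for j in range(n_actions)]


rng = random.Random(0)
matrix = sample_exploration_matrix(n_features=3, n_actions=1, std=0.5, rng=rng)

state = [0.1, -0.4, 0.7]
# Same state, same weights: the exploration noise is identical at every step.
same_state_noise = [state_dependent_noise(state, matrix) for _ in range(3)]
# A nearby state gives nearby noise, so the motor command changes smoothly.
nearby_noise = state_dependent_noise([0.1, -0.4, 0.71], matrix)
# Independent Gaussian noise, for comparison: a fresh draw at every step.
independent_noise = [[rng.gauss(0.0, 0.5)] for _ in range(3)]
```

In SB3, this kind of exploration is switched on by passing `use_sde=True` when creating the SAC model.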
During one exploration experiment, the robot went from taking random actions as expected to taking no action at all. Can you spot what's wrong?
Looking at the logs, the pendulum angle reads 0. Upon further investigation, it turned out that the cable to the angle sensor broke. This meant no reading for the pendulum angle, which meant less variation in the state, which meant less variation in the action! After fixing the broken cable, everything went back to normal!
Learn more about Experiment Tracking by exploring our demo dashboard→

Logging Everything Helps With Reproducibility

The Furuta pendulum is a common robot to build: there are tons of demos of it on YouTube. But oddly, reproducing these demos has been OSINT work, a treasure hunt even. We had to gather pieces of the puzzle from all over the internet, from technical documentation to YouTube demos by way of semi-hidden git repositories.
This is fun but doesn't help with the reproducibility crisis.
Me and Pierre scouring the internet for any info on how to build these things.
Using W&B as a system of record for experiments, trained policies, replay buffers, and code puts more pieces of the puzzle in one place, in a structured manner. I hope that this will help make this project reproducible!

Logging Everything Makes it Easier to Get Help

Since I'm learning a lot with this project, I have to make a lot of assumptions. Most of them turn out to be wrong, some of them to be right. But in both cases, they give me precious insights into what to try next and what to ask people on my team who know a little bit more than I do.
W&B Teams make it easy to share findings in reports, have a centralized dashboard for each of your experiments and look for feedback from others on your team.
Run logs give useful context to the person that's helping you:
Getting precious help


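In this report we saw how you can use Artifacts to reduce wear and tear on a robot, use Experiment Tracking as a robot telemetry system, and log everything to help with reproducibility and to make it easier to get help.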
I hope you found this useful! If you are interested in RL applied to robotics I recommend watching this presentation by Antonin Raffin. It covers relevant tricks to work with robots!