Open RL Benchmark (new)

Open RL Benchmark by CleanRL is a comprehensive, interactive, and reproducible benchmark of deep Reinforcement Learning (RL) algorithms. It uses Weights & Biases to track the experiment data of popular deep RL algorithms (e.g. DQN, PPO, DDPG, TD3) in a variety of games (e.g. Atari, Mujoco, PyBullet, Procgen, Griddly, MicroRTS). The experiment data includes information critical for reproducibility, such as hyperparameters, the exact command to reproduce results, system metrics, logs, requirements.txt, source code, training metrics, and videos of the agents playing the game.
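As a rough illustration of how this kind of tracking is wired up, here is a minimal sketch using the Weights & Biases Python client. The project name, hyperparameters, and metric key below are placeholders for illustration, not CleanRL's actual configuration:

```python
import wandb

# Placeholder hyperparameters for illustration only.
config = {
    "env_id": "BreakoutNoFrameskip-v4",
    "learning_rate": 2.5e-4,
    "total_timesteps": 10_000_000,
}

run = wandb.init(
    project="open-rl-benchmark-demo",  # placeholder project name
    config=config,      # hyperparameters are stored with the run
    save_code=True,     # upload the launching script for reproducibility
    monitor_gym=True,   # auto-upload videos recorded by gym's video wrappers
)

# During training, log metrics against the global environment step.
for global_step in range(1, 4):
    wandb.log({"charts/episodic_return": 10.0 * global_step}, step=global_step)

run.finish()
```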

Why this project

Because there are other important things we want to check in an RL experiment in addition to the learning curves. We might also want to know, for example, how many hours the experiment ran, what the GPU and CPU utilization was, how the losses progressed, or what behavior the agents learned throughout training.
In Open RL Benchmark, we give users access to all of this information in an interactive dashboard, as shown in the following demo:


We want to provide this experience for as many deep RL algorithms and games as possible. If you share this vision, consider checking out our contribution guide.

What are the results

Open RL Benchmark has over 1,000 experiments, including runs from other projects, which would be overwhelming to present in a single report. Instead, we present the results in separate reports. Please click on the links below to access them.

Atari results, Mujoco results, PyBullet results, Procgen results, Griddly results, MicroRTS results, Slimevolleygym results, PySC2 results, Other results.


As a quick demo, the following panel shows the performance of four deep RL algorithms (Apex-DQN, PPO, C51, and DQN) in the Atari game Breakout, where an episodic return of 400 is considered decent.


[Interactive panel: BreakoutNoFrameskip-v4 (14 runs), Ant-v2 (34 runs), SlimeVolleySelfPlayEnv-v0 (2 runs)]
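If you would rather work with the raw data behind panels like this one, the same runs can be fetched programmatically through the wandb public API. A minimal sketch follows; the entity/project path and metric key are placeholders, so substitute the actual benchmark project:

```python
import wandb

api = wandb.Api()

# Placeholder entity/project path; point this at the actual benchmark project.
runs = api.runs(
    "some-entity/some-project",
    filters={"config.env_id": "BreakoutNoFrameskip-v4"},
)

for run in runs:
    # run.config holds the hyperparameters, run.summary the final metrics.
    print(run.name, run.summary.get("charts/episodic_return"))

# A single run's history can be pulled as a pandas DataFrame for plotting.
one_run = next(iter(runs))
history = one_run.history(keys=["charts/episodic_return"])
print(history.head())
```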

Feel free to play around with the panel above to check out our results on other algorithms or games, as shown in the demo below: