Homepage Demo

This article presents first runs and observations from the SafeLife benchmark in Weights & Biases, which was developed in collaboration with the Partnership on AI (PAI).
Created on January 20 | Last edited on January 20

Logged Examples

The line plots below show these metrics over the course of training. Note that the x-axis for these plots needs to be training/steps and not the default Steps, which only counts wandb.log calls. The bar charts report the final averages over the benchmark levels. Below the charts, you can click on individual runset tabs to show or hide each group of agents by task type (append, prune, or navigate) independently. Note that scores are not directly comparable across task types. After a few quick test runs, I tried DQN instead of PPO (every DQN run scored worse), then modified some of the PPO hyperparameters.
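To make the x-axis point concrete, here is a minimal sketch of why the two counters diverge. It is plain Python and does not call wandb. The function name `make_log_records`, the `update` key, and the value of 2048 environment steps per update are assumptions for illustration and are not taken from the SafeLife training code. Each record stands in for one logging call: the default step is just the index of the call, while `training/steps` is a value carried inside the logged data.

```python
from typing import Dict, Iterator


def make_log_records(steps_per_update: int, num_updates: int) -> Iterator[Dict[str, int]]:
    """Yield one record per logging call, each carrying its own training-step count.

    Hypothetical helper: `steps_per_update` is the number of environment
    steps collected between two consecutive logging calls.
    """
    training_steps = 0
    for update in range(1, num_updates + 1):
        training_steps += steps_per_update
        # In a real run, a dict like this would be passed to wandb.log(...).
        yield {"training/steps": training_steps, "update": update}


# Assumed numbers for illustration only.
records = list(make_log_records(steps_per_update=2048, num_updates=4))

for log_call_index, record in enumerate(records):
    # log_call_index plays the role of the default step (one per logging call);
    # record["training/steps"] is the quantity the plots should use on the x-axis.
    print(log_call_index, record["training/steps"])
```

With these assumed numbers, four logging calls cover 8192 training steps, so plotting against the call index would compress the x-axis and make runs that log at different frequencies look misaligned. In W&B the x-axis can be switched to a logged key such as training/steps in the chart settings; `wandb.define_metric` with its `step_metric` argument is another way to declare this, though I have not checked which approach the benchmark code uses.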

[Charts: line plots with x-axis training/steps (200k to 1M) and y-axis from 0 to 1. Runset tabs: append (4), prune (5), navigate (6), test runs (2), DQN (3).]