Homepage Demo
This article covers first runs and observations on the SafeLife benchmark, which was developed in collaboration with the Partnership on AI (PAI), as logged in Weights & Biases.
Created on January 20|Last edited on January 20
Logged Examples
The line plots below show these metrics over the course of training. Note that the x-axis for these plots needs to be training/steps rather than the default Steps, which counts wandb.log calls. The bar charts report the final averages from the benchmark levels.
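The x-axis note above can also be handled at logging time, so the charts default to training/steps without changing each panel by hand. The following is a minimal sketch, not the code used for these runs: the metric name training/reward, the project name safelife-demo, and the helpers build_row and log_rows are hypothetical. It assumes the wandb package is installed and relies on wandb.define_metric with its step_metric argument.

```python
def build_row(env_steps, metrics, step_key="training/steps"):
    """Return one log row that carries its own x-axis value."""
    row = {step_key: env_steps}
    row.update(metrics)
    return row


def log_rows(rows, step_key="training/steps", project="safelife-demo"):
    """Log rows to W&B with step_key as the default x-axis.

    Returns False without logging if wandb is not installed.
    The project name is a placeholder (assumption).
    """
    try:
        import wandb
    except ImportError:
        return False
    # mode="disabled" keeps this sketch from contacting the server;
    # drop it to record a real run.
    run = wandb.init(project=project, mode="disabled")
    # Register the custom step, then plot every metric against it.
    wandb.define_metric(step_key)
    wandb.define_metric("*", step_metric=step_key)
    for row in rows:
        wandb.log(row)
    run.finish()
    return True


# Hypothetical values, only to show the shape of a logged row.
rows = [
    build_row(steps, {"training/reward": reward})
    for steps, reward in [(1000, 0.1), (2000, 0.4)]
]
```

Calling log_rows(rows) would send these rows to a run. Because every row carries its own training/steps value, the plots no longer depend on how many times wandb.log was called.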
Below the charts, you can click on individual runset tabs to show or hide each group of agents by task type (append, prune, or navigate) independently. Note that scores are not directly comparable across task types. After some quick tests, I tried DQN instead of PPO (all of the DQN runs scored worse), then modified some of the PPO hyperparameters.