Homepage Demo

This article covers first runs and observations from the SafeLife benchmark in Weights & Biases, which was developed in collaboration with the Partnership on AI (PAI).

Stacey Svetlichnaya

Created on January 20 | Last edited on January 20

Logged Examples

The line plots below show these metrics over the course of training. Note that the x-axis for these plots needs to be training/steps and not the default Steps, which tracks wandb.log steps. The bar charts report the final averages from the benchmark levels.
Below the charts, you can click on individual runset tabs to show or hide each group of agents by task type (append, prune, or navigate) independently. Note that scores are not directly comparable across task types. After some fast tests, I tried using DQN instead of PPO (the DQN runs were all worse), then modified some of the PPO hyperparameters.
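To make the x-axis note concrete, here is a minimal sketch of one way to have charts default to training/steps instead of the wandb.log step counter. It relies on `define_metric` with a `step_metric` argument, which the wandb library provides. The helper names `make_rows` and `log_rows`, the project name, and the sample values are hypothetical and only for illustration; this is not necessarily how the SafeLife training code logs its metrics.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def make_rows(samples: Iterable[Tuple[int, float]]) -> List[Dict[str, Any]]:
    """Turn (training_steps, reward) pairs into dicts keyed by the chart metric names."""
    return [
        {"training/steps": steps, "training/reward": reward}
        for steps, reward in samples
    ]


def log_rows(
    rows: Iterable[Dict[str, Any]],
    define_metric: Callable[..., Any],
    log: Callable[[Dict[str, Any]], Any],
) -> None:
    """Declare training/steps as the x-axis for training/* metrics, then log each row."""
    # Register the custom step metric itself.
    define_metric("training/steps")
    # Plot every metric under training/ against training/steps by default.
    define_metric("training/*", step_metric="training/steps")
    for row in rows:
        # Each row carries its own training/steps value alongside the reward.
        log(row)


def main() -> None:
    """Example wiring against wandb (assumes `pip install wandb`)."""
    import wandb

    # Project name is a placeholder; offline mode avoids needing a login.
    run = wandb.init(project="safelife-demo", mode="offline")
    rows = make_rows([(1000, 0.1), (2000, 0.2)])  # made-up values, not real results
    log_rows(rows, run.define_metric, run.log)
    run.finish()
```

Calling `main()` writes an offline run. Passing `define_metric` and `log` in as arguments keeps the helper independent of a live wandb session, so it can be checked with simple stand-in functions.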
﻿
[Line plot: training/reward (y-axis, 0 to 1) against training/steps (x-axis, up to 1M). Runs in legend: append_p_0.05_v_0.4, append_2xlr_vf_0.75, append_1M, append_6M, prune_6M, prune_1M_permute, prune_1M_2x_lr, prune_1M_2x_env, prune-spawn_1M.]

[Runset tabs: append (4), prune (5), navigate (6), test runs (2), DQN (3).]
