PPO with add-ons on Cartpole-1
I added some features to PPO and tested them on CartPole.
Created on February 26|Last edited on February 26
This environment is small, so the values found here probably do not carry over to bigger environments. Here, a lambda close to 1 works better, probably because the variance of the collected experience is very limited.
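To make the role of lambda concrete, here is a minimal sketch that assumes the lambda above is the GAE (Generalized Advantage Estimation) parameter commonly used with PPO; the report does not say so explicitly. The function name, argument names, and default values are hypothetical and not taken from the code behind these sweeps.

```python
def gae_advantages(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Sketch of GAE over one rollout (hypothetical helper, pure Python).

    rewards, values, dones: per-step lists of equal length.
    last_value: value estimate for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    next_value = last_value
    running = 0.0
    # Walk the rollout backwards, accumulating discounted TD errors.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrap across an episode boundary.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # lam scales how much of the later TD errors is carried back.
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

With lam close to 1 the estimate approaches the full Monte Carlo return minus the value baseline (lower bias, higher variance); with lam close to 0 it reduces to the one-step TD error. If the returns have little variance, as suggested above, the variance cost of a high lam is small.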
The surprising result is that normalizing the advantage apparently made little difference here.
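For reference, advantage normalization usually means standardizing the advantages of a batch to zero mean and unit standard deviation before computing the policy loss. The sketch below shows that common form; whether the sweeps here normalize per batch or per minibatch is not stated, and the function name and `eps` value are assumptions.

```python
import math


def normalize_advantages(advantages, eps=1e-8):
    """Standardize a batch of advantages (hypothetical helper, pure Python)."""
    n = len(advantages)
    mean = sum(advantages) / n
    # Population variance of the batch.
    var = sum((a - mean) ** 2 for a in advantages) / n
    std = math.sqrt(var)
    # eps guards against division by zero when all advantages are equal.
    return [(a - mean) / (std + eps) for a in advantages]
```

One possible reading of the result: in a small environment with limited variance the raw advantages may already be on a reasonable scale, so rescaling them changes the updates very little.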
Section 1
[Sweep panels: wuh9z9u8 1 (15), wuh9z9u8 2 (0), wuh9z9u8 (15)]