PPO with add-ons on Cartpole-1

I added some features to PPO and evaluated them on CartPole.

Created on February 26|Last edited on February 26


This environment is small, so the values found here probably do not transfer to bigger environments. Here, a lambda (gae_lambda) close to 1 works better, probably because the variance of the experience is very limited.
The surprising result is that normalizing the advantage apparently had little effect here.
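To make the two knobs discussed above concrete, here is a minimal NumPy sketch of Generalized Advantage Estimation with an optional normalization step. It is not the code used for this sweep: the function names `compute_gae` and `normalize_advantages`, the `gamma` default, and the rollout layout (one flat trajectory with per-step `dones` flags) are assumptions made for illustration.

```python
import numpy as np


def compute_gae(rewards, values, last_value, dones, gamma=0.99, gae_lambda=0.95):
    """Illustrative GAE over a single rollout (hypothetical helper).

    dones[t] is 1.0 if the episode ended after step t, else 0.0.
    last_value is the value estimate for the state following the rollout.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)

    advantages = np.zeros_like(rewards)
    next_value = last_value
    running = 0.0
    # Walk backwards so each step can reuse the discounted sum of later steps.
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t]
        # One-step TD error.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # gae_lambda = 0 keeps only delta; gae_lambda = 1 sums all later deltas.
        running = delta + gamma * gae_lambda * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def normalize_advantages(advantages, eps=1e-8):
    """Rescale advantages to zero mean and unit standard deviation."""
    advantages = np.asarray(advantages, dtype=np.float64)
    return (advantages - advantages.mean()) / (advantages.std() + eps)
```

With `gae_lambda` near 1 the estimate leans on the observed returns (less bias, more variance), which fits the observation above if CartPole rollouts are low-variance to begin with. Normalization only rescales and shifts the advantages within a batch, so it would plausibly matter less when their scale is already stable.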
Section 1
[Parallel coordinates plot. Axes: rewards_auc (3,950 to 4,400), gae_lambda (0.80 to 1.00), clip_range (0.10 to 0.19), normalize_advant... (false/true), Step (1,650 to 2,300)]
Sweep: wuh9z9u8 115
Sweep: wuh9z9u8 20
Sweep: wuh9z9u8 15
