[WIP] APO on Gym MuJoCo
APO performance on 3 seeds, 1M steps each
Created on June 22 | Last edited on June 27
I found that APO is sensitive to gae-lambda, so for each env I first try the values [0.8, 0.9, 0.95, 0.99] for 500k steps and pick the best one for the 1M-step runs shown here (sketched below). All other hyperparameters are left at the defaults in the code (TL;DR: they are identical to PPO's). As for PPO, reference runs are available only for the v2 envs, so I will show PPO runs on the same seeds.
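A minimal sketch of how the sweep can be launched (my addition, not from the report): it assumes avg_ppo_continuous_action.py takes the same CLI flags as CleanRL's ppo_continuous_action.py (--env-id, --seed, --total-timesteps, --gae-lambda, --track), and the env list and seed values are placeholders.

```python
# Hypothetical launcher for the 500k-step gae-lambda sweep over [0.8, 0.9, 0.95, 0.99].
# Assumes avg_ppo_continuous_action.py exposes the same CLI flags as CleanRL's
# ppo_continuous_action.py; seeds below are placeholders.
import subprocess

ENVS = ["Swimmer-v3", "HalfCheetah-v3", "Ant-v3", "Walker2d-v3", "Hopper-v3", "Humanoid-v3"]
GAE_LAMBDAS = [0.8, 0.9, 0.95, 0.99]
SEEDS = [1, 2, 3]  # 3 seeds per config, as in the report

for env_id in ENVS:
    for gae_lambda in GAE_LAMBDAS:
        for seed in SEEDS:
            subprocess.run(
                [
                    "python", "avg_ppo_continuous_action.py",
                    "--env-id", env_id,
                    "--seed", str(seed),
                    "--gae-lambda", str(gae_lambda),
                    "--total-timesteps", "500000",  # short sweep run
                    "--track",  # log to W&B so runs show up in this report
                ],
                check=True,
            )
```

The best gae-lambda per env is then rerun with the same command but --total-timesteps 1000000 for the plots below.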
P.S. I expect APO to do better on Swimmer, HalfCheetah, and Ant, but worse on Hopper and Walker2d, since those have unsafe states, which the average-reward setting doesn't handle by default.
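For context on the P.S. (my summary of the average-reward setting, not verified against the avg_ppo code): the average-reward criterion optimizes the long-run reward rate rather than a discounted return,

$$\rho(\pi) \;=\; \lim_{T \to \infty} \frac{1}{T}\,\mathbb{E}_\pi\!\left[\sum_{t=0}^{T-1} r_t\right],$$

which presumes a continuing task, so terminal "unsafe" states presumably need extra handling on top of it.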
Swimmer-v3
gae-lambda: 0.99
[Panels: videos. Run sets: CleanRL's avg_ppo_continuous_action.py (3 runs), CleanRL's ppo_continuous_action.py (10 runs)]
HalfCheetah-v3
gae-lambda: 0.9
[Run sets: APO (3 runs), PPO (10 runs)]
Ant-v3
gae-lambda: 0.8
[Run sets: APO 0.8 (3 runs), PPO (10 runs)]
Walker2d-v3
TODO
[Run sets: APO (3 runs), PPO (3 runs)]
Hopper-v3
TODO
[Run sets: APO 0.99 (3 runs), PPO (3 runs)]
Humanoid-v3
TODO