PPO vs RecurrentPPO (aka PPO LSTM) on environments with masked velocity (SB3 Contrib)
This report checks that PPO with a recurrent network (RecurrentPPO) actually works.
Hyperparameters were tuned for PPO, so they are probably not optimal for RecurrentPPO (PPO LSTM).
PPO with frame-stacking (giving a history of observations as input) is usually quite competitive, if not better, and faster than RecurrentPPO. Still, on some environments the recurrent version makes a difference, currently on CarRacing-v0 and LunarLanderNoVel-v2.
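To make the two setups concrete, here is a minimal sketch using Stable-Baselines3 and SB3 Contrib. It is not the benchmark code: the NoVel environment IDs are registered by the benchmark scripts (see the link below), so CartPole-v1 is used as a stand-in, and the hyperparameters are illustrative defaults.

```python
from sb3_contrib import RecurrentPPO
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecFrameStack

# RecurrentPPO: the LSTM can recover the masked velocity from the
# history of observations it has already seen.
env = make_vec_env("CartPole-v1", n_envs=8)  # stand-in for CartPoleNoVel-v1
recurrent_model = RecurrentPPO("MlpLstmPolicy", env, verbose=1)
recurrent_model.learn(total_timesteps=100_000)

# PPO with frame stacking: concatenating the last n_stack observations
# gives the feed-forward policy a short, fixed-size history instead.
stacked_env = VecFrameStack(make_vec_env("CartPole-v1", n_envs=8), n_stack=2)
frame_stack_model = PPO("MlpPolicy", stacked_env, verbose=1)
frame_stack_model.learn(total_timesteps=100_000)
```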
Note: on some environments, the critic LSTM needs to be disabled (e.g. on the CarRacing and Atari envs, possibly because of frame stacking), as it slows down training and hinders performance.
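In SB3 Contrib this is controlled through the policy keyword arguments. A minimal sketch, assuming a recent sb3_contrib version where the recurrent policies accept enable_critic_lstm:

```python
from sb3_contrib import RecurrentPPO

# Keep the LSTM in the actor only; the critic falls back to a plain
# feed-forward network, which trains faster on these environments.
model = RecurrentPPO(
    "MlpLstmPolicy",
    "Pendulum-v1",  # stand-in; the benchmark uses the NoVel variants
    policy_kwargs=dict(enable_critic_lstm=False),
    verbose=1,
)
model.learn(total_timesteps=100_000)
```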
Instructions to reproduce the results are in the documentation: https://sb3-contrib.readthedocs.io/en/master/modules/ppo_recurrent.html#how-to-replicate-the-results
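The NoVel environments used throughout the report simply hide the velocity components of the observation, so the agent has to infer them from past observations. A hypothetical, minimal version of such a wrapper (using the gymnasium API and CartPole-v1 indices, purely for illustration; the actual wrapper and environment registration are described in the documentation linked above) could look like this:

```python
import gymnasium as gym
import numpy as np


class MaskVelocityWrapper(gym.ObservationWrapper):
    """Zero out the velocity entries of the observation (illustrative only).

    The indices below are for CartPole-v1, whose observation is
    [x, x_dot, theta, theta_dot]; other envs have their own velocity indices.
    """

    def __init__(self, env: gym.Env, velocity_indices=(1, 3)):
        super().__init__(env)
        self.velocity_indices = list(velocity_indices)

    def observation(self, observation: np.ndarray) -> np.ndarray:
        observation = observation.copy()
        observation[self.velocity_indices] = 0.0
        return observation


env = MaskVelocityWrapper(gym.make("CartPole-v1"))
```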
PPO LSTM vs PPO (no FrameStack)
PendulumNoVel-v1
[Charts: Run set (4 runs), Run set 2 (3 runs)]
LunarLanderNoVel-v2
[Charts: Run set 2 (3 runs), Run set 3 (3 runs)]
CartPoleNoVel-v1
[Charts: Run set (18 runs), Run set 2 (10 runs)]
MountainCarContinuousNoVel-v0
[Charts: Run set (10 runs)]
LSTM vs FrameStacking
Note: n_stack=2
PendulumNoVel-v1 (n_stack=2)
[Charts: Run set (11 runs), Run set 2 (3 runs), Run set 3 (3 runs)]
LunarLanderNoVel-v2 (n_stack=2)
[Charts: Run set 2 (4 runs), Run set 3 (3 runs)]
CartPoleNoVel-v1 (n_stack=2)
[Charts: Run set 2 (6 runs), Run set 3 (6 runs)]
MountainCarContinuousNoVel-v0 (n_stack=2)
[Charts: Run set 2 (4 runs), Run set 3 (6 runs)]
CarRacing-v0 (n_stack=2)
Note: frame stacking is also enabled for PPO LSTM here, but the critic LSTM is disabled (for faster training).
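A rough sketch of that configuration (illustrative only: CarRacing-v0 comes from older gym versions, and the actual runs follow the replication instructions linked above, with tuned hyperparameters):

```python
from sb3_contrib import RecurrentPPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecFrameStack

# Frame stacking is applied on top of the recurrent policy as well,
# and the critic LSTM is turned off to speed up training.
env = VecFrameStack(make_vec_env("CarRacing-v0", n_envs=8), n_stack=2)
model = RecurrentPPO(
    "CnnLstmPolicy",
    env,
    policy_kwargs=dict(enable_critic_lstm=False),
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
```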
[Charts: Run set 2 (1 run), Run set 3 (1 run)]
Hi, nice work. Do you happen to have the code for this?