
PPO vs RecurrentPPO (aka PPO LSTM) on environments with masked velocity (SB3 Contrib)

This report checks that PPO with a recurrent network (RecurrentPPO) actually works.
Hyperparameters were tuned for PPO, so they are probably not optimal for RecurrentPPO (PPO LSTM).
PPO with frame-stacking (feeding a history of observations as input) is usually quite competitive, if not better, and faster than recurrent PPO. Still, on some environments there is a difference, currently on CarRacing-v0 and LunarLanderNoVel-v2.
Note: on some environments the critic LSTM needs to be disabled (on CarRacing, and on Atari because of frame-stacking?), as it slows down training and hinders performance.
RL Zoo (training scripts): https://github.com/DLR-RM/rl-baselines3-zoo
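As a rough sketch of the setup (not the actual RL Zoo code: the MaskVelocityWrapper below and its velocity_indices argument are hypothetical stand-ins for the Zoo's NoVel environment registration, and the hyperparameters are untuned defaults), training RecurrentPPO from SB3 Contrib on a velocity-masked environment looks roughly like this:

```python
import numpy as np
import gymnasium as gym  # older SB3 versions used `import gym` instead
from sb3_contrib import RecurrentPPO


class MaskVelocityWrapper(gym.ObservationWrapper):
    """Zero out the velocity entries of the observation.

    Hypothetical stand-in for the NoVel environments registered by the RL Zoo.
    """

    def __init__(self, env, velocity_indices):
        super().__init__(env)
        self.velocity_indices = velocity_indices

    def observation(self, obs):
        obs = np.array(obs, dtype=np.float32)
        obs[self.velocity_indices] = 0.0
        return obs


# Pendulum-v1 observations are [cos(theta), sin(theta), theta_dot]:
# masking index 2 hides the angular velocity, so the agent needs memory.
env = MaskVelocityWrapper(gym.make("Pendulum-v1"), velocity_indices=[2])

model = RecurrentPPO("MlpLstmPolicy", env, verbose=1)
model.learn(total_timesteps=50_000)
```

Because the velocity is hidden, a single observation no longer determines the state, which is what makes memory (LSTM) or frame-stacking necessary here.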



PPO LSTM vs PPO (no FrameStack)

PendulumNoVel-v1



[Chart: Run set (4 runs) vs Run set 2 (3 runs)]


LunarLanderNoVel-v2



[Chart: Run set 2 (3 runs) vs Run set 3 (3 runs)]


CartPoleNoVel-v1


[Chart: Run set (18 runs) vs Run set 2 (10 runs)]



MountainCarContinuousNoVel-v0



[Chart: Run set (10 runs)]



LSTM vs FrameStacking

Note: n_stack=2
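For reference, a minimal sketch of the frame-stacking baseline, assuming the same hypothetical velocity-masked environment as above and untuned hyperparameters: the last two observations are stacked with VecFrameStack, which is what n_stack=2 refers to.

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack

# Stack the last 2 observations (n_stack=2) so that plain PPO can infer
# velocities from the difference between consecutive frames.
venv = DummyVecEnv([lambda: gym.make("Pendulum-v1")])  # the NoVel variant in the report
venv = VecFrameStack(venv, n_stack=2)

model = PPO("MlpPolicy", venv, verbose=1)
model.learn(total_timesteps=50_000)
```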

PendulumNoVel-v1 (n_stack=2)




[Chart: Run set (11 runs), Run set 2 (3 runs), Run set 3 (3 runs)]


LunarLanderNoVel-v2 (n_stack=2)




[Chart: Run set 2 (4 runs), Run set 3 (3 runs), Run set 3 (4 runs)]


CartPoleNoVel-v1 (n_stack=2)


[Chart: Run set 2 (6 runs), Run set 3 (10 runs), Run set 3 (6 runs)]



MountainCarContinuousNoVel-v0 (n_stack=2)



[Charts: Run set 2 (4 runs), Run set 3 (6 runs)]


CarRacing-v0 (n_stack=2)

Note: frame-stacking is also enabled for PPO LSTM here, but the critic LSTM is disabled (faster training).
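A minimal sketch of that configuration, with illustrative hyperparameters only (the tuned values live in the RL Zoo configs, and CarRacing-v0 is the env id used at the time of these runs; newer Gym/Gymnasium releases renamed it): frame-stacking is applied on top of the vectorized environment, and the critic LSTM is switched off via policy_kwargs.

```python
from sb3_contrib import RecurrentPPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecFrameStack

# CarRacing with 2-frame stacking; only the actor keeps a recurrent state,
# the value function falls back to a plain feed-forward head.
venv = make_vec_env("CarRacing-v0", n_envs=2)
venv = VecFrameStack(venv, n_stack=2)

model = RecurrentPPO(
    "CnnLstmPolicy",
    venv,
    policy_kwargs=dict(enable_critic_lstm=False),
    verbose=1,
)
model.learn(total_timesteps=100_000)
```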


[Chart: Run set 2 (1 run) vs Run set 3 (1 run)]
