PPO vs RecurrentPPO (aka PPO LSTM) on environments with masked velocity (SB3 Contrib)
This report checks that PPO with a recurrent network (RecurrentPPO) actually works.
Hyperparameters were tuned for PPO, so they are probably not optimal for RecurrentPPO (PPO LSTM).
PPO with frame-stacking (giving a history of observations as input) is usually quite competitive, if not better, and faster than RecurrentPPO. Still, on some environments the recurrent version makes a difference, currently on CarRacing-v0 and LunarLanderNoVel-v2.
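To make the two setups concrete, here is a minimal sketch using Stable-Baselines3 and SB3 Contrib. It is not the benchmark code: the NoVel environment IDs are registered by the benchmark scripts (see the link below), so CartPole-v1 is used as a stand-in, and the hyperparameters are illustrative defaults.

```python
from sb3_contrib import RecurrentPPO
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecFrameStack

# RecurrentPPO: the LSTM can recover the masked velocity from the
# history of observations it has already seen.
env = make_vec_env("CartPole-v1", n_envs=8)  # stand-in for CartPoleNoVel-v1
recurrent_model = RecurrentPPO("MlpLstmPolicy", env, verbose=1)
recurrent_model.learn(total_timesteps=100_000)

# PPO with frame stacking: concatenating the last n_stack observations
# gives the feed-forward policy a short, fixed-size history instead.
stacked_env = VecFrameStack(make_vec_env("CartPole-v1", n_envs=8), n_stack=2)
frame_stack_model = PPO("MlpPolicy", stacked_env, verbose=1)
frame_stack_model.learn(total_timesteps=100_000)
```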
Note: on some environments, the critic LSTM needs to be disabled (e.g. on the CarRacing and Atari envs, possibly because of frame stacking), as it slows down training and hinders performance.
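In SB3 Contrib this is controlled through the policy keyword arguments. A minimal sketch, assuming a recent sb3_contrib version where the recurrent policies accept enable_critic_lstm:

```python
from sb3_contrib import RecurrentPPO

# Keep the LSTM in the actor only; the critic falls back to a plain
# feed-forward network, which trains faster on these environments.
model = RecurrentPPO(
    "MlpLstmPolicy",
    "Pendulum-v1",  # stand-in; the benchmark uses the NoVel variants
    policy_kwargs=dict(enable_critic_lstm=False),
    verbose=1,
)
model.learn(total_timesteps=100_000)
```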
Instructions to reproduce the results are in the documentation: https://sb3-contrib.readthedocs.io/en/master/modules/ppo_recurrent.html#how-to-replicate-the-results
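The NoVel environments used throughout the report simply hide the velocity components of the observation, so the agent has to infer them from past observations. A hypothetical, minimal version of such a wrapper (using the gymnasium API and CartPole-v1 indices, purely for illustration; the actual wrapper and environment registration are described in the documentation linked above) could look like this:

```python
import gymnasium as gym
import numpy as np


class MaskVelocityWrapper(gym.ObservationWrapper):
    """Zero out the velocity entries of the observation (illustrative only).

    The indices below are for CartPole-v1, whose observation is
    [x, x_dot, theta, theta_dot]; other envs have their own velocity indices.
    """

    def __init__(self, env: gym.Env, velocity_indices=(1, 3)):
        super().__init__(env)
        self.velocity_indices = list(velocity_indices)

    def observation(self, observation: np.ndarray) -> np.ndarray:
        observation = observation.copy()
        observation[self.velocity_indices] = 0.0
        return observation


env = MaskVelocityWrapper(gym.make("CartPole-v1"))
```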
PPO LSTM vs PPO (no FrameStack)
PendulumNoVel-v1
[Charts: Run set (4 runs), Run set 2 (3 runs)]
LunarLanderNoVel-v2
[Charts: Run set 2 (3 runs), Run set 3 (3 runs)]
CartPoleNoVel-v1
[Charts: Run set (18 runs), Run set 2 (10 runs)]
MountainCarContinuousNoVel-v0
[Charts: Run set (10 runs)]
LSTM vs FrameStacking
Note: n_stack=2
PendulumNoVel-v1 (n_stack=2)
[Charts: Run set (11 runs), Run set 2 (3 runs), Run set 3 (3 runs)]
LunarLanderNoVel-v2 (n_stack=2)
[Charts: Run set 2 (4 runs), Run set 3 (3 runs)]
CartPoleNoVel-v1 (n_stack=2)
[Charts: Run set 2 (6 runs), Run set 3 (6 runs)]
MountainCarContinuousNoVel-v0 (n_stack=2)
[Charts: Run set 2 (4 runs), Run set 3 (6 runs)]
CarRacing-v0 (n_stack=2)
Note: frame stacking is also enabled for PPO LSTM here, but the critic LSTM is disabled (for faster training).
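A rough sketch of that configuration (illustrative only: CarRacing-v0 comes from older gym versions, and the actual runs follow the replication instructions linked above, with tuned hyperparameters):

```python
from sb3_contrib import RecurrentPPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecFrameStack

# Frame stacking is applied on top of the recurrent policy as well,
# and the critic LSTM is turned off to speed up training.
env = VecFrameStack(make_vec_env("CarRacing-v0", n_envs=8), n_stack=2)
model = RecurrentPPO(
    "CnnLstmPolicy",
    env,
    policy_kwargs=dict(enable_critic_lstm=False),
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
```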
[Charts: Run set 2 (1 run), Run set 3 (1 run)]
Hi, nice work. Do you happen to have the code for this?