
LSTM no framestack

Created on November 10|Last edited on March 7

Newest pilot run after fixing:

  1. Store the initial hidden state for the rollouts
  2. Apply mask inside of LSTM (link)
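The two fixes above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code from either implementation: the function name `lstm_rollout`, the array shapes, and the toy sizes are all assumptions made for the example. It shows the done mask zeroing the hidden and cell state inside the recurrence at every step, and the initial state being kept so that the training pass replays the rollout from the same state.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_rollout(xs, dones, state, wx, wh, b):
    """Run an LSTM over a rollout, resetting state where an episode ended.

    xs:    (T, N, nin) inputs for T steps and N environments
    dones: (T, N), 1.0 where the episode ended just before step t
    state: (h, c), each of shape (N, nh)
    """
    h, c = state
    hs = []
    for x, d in zip(xs, dones):
        keep = (1.0 - d)[:, None]
        h, c = h * keep, c * keep          # mask applied inside the LSTM loop
        z = x @ wx + h @ wh + b
        i, f, o, g = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        hs.append(h)
    return np.stack(hs), (h, c)

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
T, N, nin, nh = 4, 2, 3, 5
wx = 0.1 * rng.standard_normal((nin, 4 * nh))
wh = 0.1 * rng.standard_normal((nh, 4 * nh))
b = np.zeros(4 * nh)
xs = rng.standard_normal((T, N, nin))
dones = np.zeros((T, N))
dones[2, 0] = 1.0                           # env 0 starts a new episode at step 2

# Store the state the rollout started from, then reuse it for the training pass.
initial_state = (np.ones((N, nh)), np.ones((N, nh)))
hs_rollout, next_state = lstm_rollout(xs, dones, initial_state, wx, wh, b)
hs_train, _ = lstm_rollout(xs, dones, initial_state, wx, wh, b)

# For comparison: starting the training pass from zeros instead.
zero_state = (np.zeros((N, nh)), np.zeros((N, nh)))
hs_wrong, _ = lstm_rollout(xs, dones, zero_state, wx, wh, b)
```

Replaying from `initial_state` reproduces the rollout's hidden states exactly, whereas starting from zeros gives different outputs until an episode boundary resets the state.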
In the plot below, the orange line is from openai/baselines and the green line is mine.

[Plot: Episodic Return (0 to 300) vs. Step (2M to 8M)]



New Pilot run after fixing

1. Layer initialization: the original implementation initializes the LSTM weights orthogonally with scale 1 and the biases to 0 (https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/a2c/utils.py#L84-L86).
2. LSTM size: by default, there are 128 LSTM units (https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/common/models.py#L187), whereas our implementation used 512 for Atari and 64 for the memory env.
3. Finally, I had applied the LSTM only to the actor and not to the critic, which was an implementation error.
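As a rough illustration of the three points above, here is a NumPy sketch of SVD-based orthogonal initialization with scale 1 and zero biases for a 128-unit LSTM, with both the actor and critic heads reading the same LSTM output. The helper `ortho_init`, the input size, the number of actions, and the head scales are assumptions for the example, not values taken from either codebase.

```python
import numpy as np

def ortho_init(shape, scale=1.0, rng=None):
    """Return a (semi-)orthogonal matrix of the given 2-D shape, times `scale`."""
    rng = rng or np.random.default_rng()
    a = rng.standard_normal(shape)
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    q = u if u.shape == tuple(shape) else vt   # pick the factor with the right shape
    return scale * q

rng = np.random.default_rng(0)
nin, nh, n_actions = 512, 128, 4   # nin and n_actions are illustrative only

# LSTM parameters: orthogonal weights with scale 1, biases set to 0.
wx = ortho_init((nin, 4 * nh), 1.0, rng)
wh = ortho_init((nh, 4 * nh), 1.0, rng)
b = np.zeros(4 * nh)

# Stand-in for the LSTM output of 8 environments at one step.
h = rng.standard_normal((8, nh))

# Actor and critic both consume the same LSTM output (head scales are assumed).
w_pi = ortho_init((nh, n_actions), 0.01, rng)
w_v = ortho_init((nh, 1), 1.0, rng)
logits = h @ w_pi    # policy logits, shape (8, n_actions)
value = h @ w_v      # value estimates, shape (8, 1)
```

With the LSTM applied only to the actor, the critic would be computed from the pre-LSTM features and would see no memory of past observations, which matters when frames are not stacked.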




I used my fork, Runnable Baselines, to reproduce this work. Note in particular that I set frame_stack_size = 1 to match the memory setting:
frame_stack_size = 1
env = make_vec_env(env_id, env_type, nenv, seed, gamestate=args.gamestate, reward_scale=args.reward_scale)
env = VecFrameStack(env, frame_stack_size)
This matches the ppo_atari_lstm_noframestack setting in the following report

To give it a run, try:
WANDB_PROJECT=cleanrl \
WANDB_ENTITY=cleanrl \
OPENAI_LOGDIR=$PWD/runs \
OPENAI_LOG_FORMAT=tensorboard \
python -m baselines.run --alg=ppo2 \
--env=BreakoutNoFrameskip-v4 \
--network cnn_lstm \
--num_env 8 --track
