LSTM no framestack
Created on November 10|Last edited on March 7
Newest pilot run, after fixing:
- Store the initial hidden state for the rollouts
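To make the fix above concrete, here is a minimal sketch of why the initial hidden state has to be stored. It uses a toy scalar recurrent cell in place of a real LSTM, and every name and value in it (`cell`, `unroll`, the rollout data) is hypothetical. It only illustrates that replaying a rollout during the update reproduces the collected outputs when it starts from the stored state, and not when it starts from zeros.

```python
import math

def cell(h, x, done):
    """Toy recurrent cell (stand-in for an LSTM); resets the state at episode start."""
    if done:
        h = 0.0
    return math.tanh(0.5 * h + x)

def unroll(h0, xs, dones):
    """Run the cell over a sequence starting from h0; return outputs and final state."""
    h, outs = h0, []
    for x, d in zip(xs, dones):
        h = cell(h, x, d)
        outs.append(h)
    return outs, h

# Two consecutive rollouts of 4 steps each (hypothetical inputs and done flags).
rollouts = [
    ([0.1, -0.3, 0.7, 0.2], [True, False, False, False]),
    ([0.5, 0.4, -0.6, 0.9], [False, False, True, False]),
]

next_h = 0.0
for xs, dones in rollouts:
    initial_h = next_h  # snapshot taken BEFORE collecting the rollout
    collected, next_h = unroll(initial_h, xs, dones)

    # Update phase: replay the same sequence from the stored snapshot ...
    replayed, _ = unroll(initial_h, xs, dones)
    # ... versus (incorrectly) replaying it from a zero state.
    from_zero, _ = unroll(0.0, xs, dones)

print(replayed == collected)   # replay from the snapshot matches the rollout
print(from_zero == collected)  # replay from zeros does not (second rollout)
```

In the second rollout the episode continues from the first one, so the zero-state replay disagrees with the collected outputs until the next episode reset.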
In the plots below, the orange line is openai/baselines and the green line is my implementation.
New pilot run, after fixing:
1. Layer initialization: the original implementation uses orthogonal initialization with a weight scale of 1 and a bias of 0 for the LSTM parameters
([https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/a2c/utils.py#L84-L86](https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/a2c/utils.py#L84-L86))
2. By default, there are 128 LSTM units (https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/common/models.py#L187), whereas our implementation had 512 for Atari and 64 for the memory env.
3. Finally, I had applied the LSTM only to the actor and not to the critic, which was an implementation error.
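The three fixes above can be sketched together in NumPy. This is an illustration under stated assumptions, not the code from either implementation. The SVD-based `ortho_init` helper is my own, and `n_features`, `n_actions`, the batch of 8 and the head scales are placeholder values. It shows orthogonal LSTM weights with scale 1 and zero biases, 128 hidden units, and both the actor and critic heads reading the same LSTM output.

```python
import numpy as np

def ortho_init(shape, scale=1.0, rng=None):
    """Return a matrix of the given 2-D shape with orthonormal rows or columns, times scale."""
    shape = tuple(shape)
    rng = np.random.default_rng() if rng is None else rng
    a = rng.normal(0.0, 1.0, shape)
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    q = u if u.shape == shape else vt  # pick the factor with the requested shape
    return (scale * q).astype(np.float32)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_features, n_hidden, n_actions = 512, 128, 4  # n_features, n_actions: assumed values

# LSTM parameters: orthogonal weights with scale 1, biases set to 0.
wx = ortho_init((n_features, 4 * n_hidden), 1.0, rng)
wh = ortho_init((n_hidden, 4 * n_hidden), 1.0, rng)
b = np.zeros(4 * n_hidden, dtype=np.float32)

def lstm_step(x, h, c):
    """One LSTM step for a batch: x is (batch, n_features), h and c are (batch, n_hidden)."""
    z = x @ wx + h @ wh + b
    i, f, o, u = np.split(z, 4, axis=1)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(u)
    h = sigmoid(o) * np.tanh(c)
    return h, c

# Actor and critic heads both consume the LSTM output (head scales are placeholders).
w_actor = ortho_init((n_hidden, n_actions), 1.0, rng)
w_critic = ortho_init((n_hidden, 1), 1.0, rng)

x = rng.normal(size=(8, n_features)).astype(np.float32)  # e.g. features for 8 envs
h = np.zeros((8, n_hidden), dtype=np.float32)
c = np.zeros_like(h)
h, c = lstm_step(x, h, c)
logits = h @ w_actor   # shape (8, n_actions)
value = h @ w_critic   # shape (8, 1)
```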
I used my fork, Runnable Baselines, to reproduce this work. Note in particular that I changed frame_stack_size to 1 to match the memory setting.
    frame_stack_size = 1
    env = make_vec_env(env_id, env_type, nenv, seed, gamestate=args.gamestate, reward_scale=args.reward_scale)
    env = VecFrameStack(env, frame_stack_size)
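To illustrate what a stack size of 1 implies, here is a toy sketch. The `ToyFrameStack` class and the frame labels are hypothetical stand-ins, not the `VecFrameStack` implementation. With a stack of 4 each observation carries the last four frames, while with a stack of 1 it carries only the current frame, so any history has to come from the LSTM state.

```python
from collections import deque

class ToyFrameStack:
    """Keeps the last k frames; a simplified stand-in, not VecFrameStack itself."""

    def __init__(self, k, blank="_"):
        self.frames = deque([blank] * k, maxlen=k)

    def observe(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(frame)
        return list(self.frames)

stream = ["f0", "f1", "f2", "f3", "f4"]  # hypothetical frame labels

stack4 = ToyFrameStack(4)
stack1 = ToyFrameStack(1)
obs4 = [stack4.observe(f) for f in stream]
obs1 = [stack1.observe(f) for f in stream]

print(obs4[-1])  # ['f1', 'f2', 'f3', 'f4']
print(obs1[-1])  # ['f4']
```

The fork uses the second case, `frame_stack_size = 1`.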
This matches the ppo_atari_lstm_noframestack setting in the following report
To give it a run, try:

    WANDB_PROJECT=cleanrl \
    WANDB_ENTITY=cleanrl \
    OPENAI_LOGDIR=$PWD/runs \
    OPENAI_LOG_FORMAT=tensorboard \
    python -m baselines.run --alg=ppo2 \
    --env=BreakoutNoFrameskip-v4 \
    --network cnn_lstm \
    --num_env 8 --track