MountainCar-v0__dqn__2__1649474206
MountainCar-v0 Regression Investigation
Created on April 8 | Last edited on April 9
OK, it looks like, for our particular set of hyperparameters, turning off handle_timeout_termination made it work…
  target_network.load_state_dict(q_network.state_dict())

  rb = ReplayBuffer(
-     args.buffer_size, envs.single_observation_space, envs.single_action_space, device=device, handle_timeout_termination=False,
+

  start_time = time.time()
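To make the effect of this flag concrete, here is a minimal sketch of how it could change the one-step DQN target. It assumes ReplayBuffer follows the Stable-Baselines3 convention, where enabling handle_timeout_termination masks the stored done flag for transitions that ended only because of the time limit. The function `td_target`, the constant `GAMMA`, and the sample values are hypothetical and are not taken from dqn.py.

```python
import numpy as np

GAMMA = 0.99  # discount factor; an assumed value for illustration only


def td_target(reward, next_q_max, done, timeout, handle_timeout_termination):
    """One-step DQN target for a batch of transitions.

    done:    1.0 where the episode ended (for any reason), else 0.0.
    timeout: 1.0 where the episode ended only because of the time limit.
    """
    if handle_timeout_termination:
        # Assumed behavior: a time-limit ending is treated as non-terminal,
        # so the target still bootstraps from the next state's value.
        done = done * (1.0 - timeout)
    return reward + GAMMA * next_q_max * (1.0 - done)


# Hypothetical MountainCar-like transition: reward -1, episode cut off by the time limit.
reward = np.array([-1.0])
next_q_max = np.array([-50.0])
done = np.array([1.0])
timeout = np.array([1.0])

target_masked = td_target(reward, next_q_max, done, timeout, True)
target_unmasked = td_target(reward, next_q_max, done, timeout, False)
print(target_masked, target_unmasked)
```

In MountainCar-v0, an episode that never reaches the goal ends at the 200-step time limit, so under this assumption the flag changes the target for the final transition of every such episode. The sketch only illustrates the mechanism; it does not establish why one setting works better for these hyperparameters.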
Run set (2 runs)
Looks like the replay buffer is the culprit...?
DQN's performance on MountainCar-v0 has regressed. Based on the runs below, it looks like the replay buffer is what caused the performance difference.
Run set (2 runs)
Run set (3 runs)
dqn.py performance regression
The runs below show that our dqn.py (with source code here) can no longer solve MountainCar-v0, whereas the older version, dqn_old.py (with source code here), still can.
Run set (6 runs)