
MountainCar-v0 Regression Investigation

Created on April 8|Last edited on April 9
OK, it looks like turning off handle_timeout_termination made it work, at least for our particular set of hyperparameters.

Code diff between the two runs (lines 141-146 shown; the unchanged lines before and after are collapsed):

      target_network.load_state_dict(q_network.state_dict())

      rb = ReplayBuffer(
-         args.buffer_size, envs.single_observation_space, envs.single_action_space, device=device, handle_timeout_termination=False,
+         args.buffer_size, envs.single_observation_space, envs.single_action_space, device=device
      )
      start_time = time.time()
[Figure: two charts plotted against global_step (0 to 1.5M), one with a y-axis range of 0.2 to 0.8 and one with a y-axis range of -200 to -100; run set of 2 runs]

Looks like the replay buffer is the culprit...?
We have a regression in DQN's performance on MountainCar-v0. Based on the results below, it looks like the replay buffer is what caused the performance difference.
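To make the role of handle_timeout_termination concrete, here is a minimal sketch of how the flag can change a one-step DQN target. It assumes the ReplayBuffer above is the Stable-Baselines3 one, which (as far as I understand it) masks the done flag with a stored time-limit flag when sampling if handle_timeout_termination is on. The td_target helper, the gamma value, and the example numbers are all hypothetical and are not taken from dqn.py.

```python
def td_target(reward, next_q_max, done, timeout, gamma=0.99,
              handle_timeout_termination=True):
    """One-step DQN target for a single transition (illustrative helper)."""
    if handle_timeout_termination:
        # Assumed behaviour: an episode cut off by the time limit is not a
        # true terminal state, so the done flag is cleared and the target
        # keeps bootstrapping from the next state's value.
        done = done * (1.0 - timeout)
    return reward + gamma * next_q_max * (1.0 - done)


# Hypothetical transition: the last step of a MountainCar-v0 episode that
# ended only because the time limit was reached.
reward, next_q_max = -1.0, -50.0

with_handling = td_target(reward, next_q_max, done=1.0, timeout=1.0,
                          handle_timeout_termination=True)
without_handling = td_target(reward, next_q_max, done=1.0, timeout=1.0,
                             handle_timeout_termination=False)

print("target with timeout handling:   ", with_handling)
print("target without timeout handling:", without_handling)
```

In MountainCar-v0 most early episodes end on the time limit rather than at the goal, so this flag affects the target of a large share of "done" transitions, which may be why it matters so much for this environment.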

[Charts: run set of 2 runs]

[Charts: run set of 3 runs]

dqn.py performance regression

The runs below show that our dqn.py (with source code here) can no longer solve MountainCar-v0, whereas the older version, dqn_old.py (with source code here), still can.

[Charts: run set of 6 runs]