MountainCar-v0 Regression Investigation

Created on April 8 | Last edited on April 9

Comment: Ok, looks like for our particular set of hyperparameters, turning off handle_timeout_termination (passing handle_timeout_termination=False to ReplayBuffer) made it work…

Diff of dqn.py between the two runs (lines 1-140 and the 59 lines after this hunk are identical):

--- MountainCar-v0__dqn__2__1649474206
+++ MountainCar-v0__dqn__2__1649455932
     target_network.load_state_dict(q_network.state_dict())
     rb = ReplayBuffer(
-        args.buffer_size, envs.single_observation_space, envs.single_action_space, device=device, handle_timeout_termination=False,
+        args.buffer_size, envs.single_observation_space, envs.single_action_space, device=device
     )
     start_time = time.time()
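The only change in the diff above is the handle_timeout_termination argument. Assuming the ReplayBuffer in dqn.py is Stable-Baselines3's, our understanding is that, when the flag is on (the default), the buffer records a per-transition timeout from info["TimeLimit.truncated"] and masks it out of the done flag it returns at sampling time, so a time-limit cutoff is not treated as a terminal state. The sketch below mirrors only that idea; TinyReplayBuffer and its methods are hypothetical and are not the SB3 API.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

# Stored tuple: (obs, next_obs, reward, done, timeout)
Transition = Tuple[Any, Any, float, bool, bool]


@dataclass
class TinyReplayBuffer:
    """Hypothetical FIFO buffer mimicking a handle_timeout_termination switch."""

    capacity: int
    handle_timeout_termination: bool = True
    storage: List[Transition] = field(default_factory=list)

    def add(self, obs: Any, next_obs: Any, reward: float, done: bool, info: Dict[str, Any]) -> None:
        # Old-style gym TimeLimit wrappers flag a cutoff via info["TimeLimit.truncated"].
        timeout = bool(info.get("TimeLimit.truncated", False))
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)  # evict the oldest transition
        self.storage.append((obs, next_obs, reward, done, timeout))

    def effective_done(self, done: bool, timeout: bool) -> float:
        if self.handle_timeout_termination:
            # A time-limit cutoff is not a true terminal state: mask it out
            # so the TD target keeps bootstrapping from s'.
            return float(done) * (1.0 - float(timeout))
        # Switch off: every done, including cutoffs, is treated as terminal.
        return float(done)

    def sample(self, batch_size: int) -> List[Tuple[Any, Any, float, float]]:
        batch = random.sample(self.storage, batch_size)
        return [(o, n, r, self.effective_done(d, t)) for o, n, r, d, t in batch]


# Step 200 of a MountainCar-v0 episode: cut off by the time limit, goal not reached.
on = TinyReplayBuffer(capacity=10, handle_timeout_termination=True)
off = TinyReplayBuffer(capacity=10, handle_timeout_termination=False)
for buf in (on, off):
    buf.add(obs=None, next_obs=None, reward=-1.0, done=True, info={"TimeLimit.truncated": True})
print(on.sample(1)[0][3], off.sample(1)[0][3])  # 0.0 1.0
```

The same truncated transition comes back as non-terminal (0.0) with the flag on and as terminal (1.0) with it off, which is the behavioral difference between the two runs' buffers.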
[Charts (Run set2): charts/epsilon and charts/episodic_return (charts/episode_reward) vs. global_step for MountainCar-v0__dqn__2__1649474206 and MountainCar-v0__dqn__2__1649455932]

Looks like the replay buffer is the culprit...?
We have a regression in DQN's performance on MountainCar-v0. Based on the results below, it looks like the replay buffer may be what caused the performance difference.
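Why would the replay buffer matter this much? In DQN, the done flag returned by the buffer decides whether the TD target bootstraps from the target network. Below is a minimal sketch of the standard one-step target, y = r + gamma * (1 - done) * max_a Q_target(s', a), using plain floats. The gamma value and the Q-values are made-up illustrations, not numbers from these runs, and td_target is a hypothetical helper.

```python
from typing import Sequence


def td_target(reward: float, done: float, next_q_values: Sequence[float], gamma: float = 0.99) -> float:
    """One-step DQN target: r + gamma * (1 - done) * max_a Q_target(s', a)."""
    return reward + gamma * (1.0 - done) * max(next_q_values)


# Hypothetical target-network outputs for MountainCar-v0's 3 actions at s'.
next_q = [-50.0, -48.0, -52.0]

# Time-limit cutoff at step 200; MountainCar-v0 gives -1 reward on every step.
bootstrapped = td_target(-1.0, done=0.0, next_q_values=next_q)  # cutoff masked out
terminal = td_target(-1.0, done=1.0, next_q_values=next_q)      # cutoff treated as terminal
print(bootstrapped, terminal)
```

With the cutoff masked out, the target stays tied to the estimate for s' (about -48.52 here); treated as terminal, it collapses to -1. Which of the two trains better under a given set of hyperparameters is an empirical question, and this sketch alone does not explain the regression.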

[Charts: Run set2]

[Charts: Run set3]

DQN.py performance regression

The charts below show that our dqn.py (source code here) can no longer solve MountainCar-v0, whereas the older version, dqn_old.py (source code here), solves it.

[Charts: Run set6]
