Playing Atari Pong with DQN and PPO
This experiment was done for an assignment in a graduate-level course in reinforcement learning (RL). The code implementations are based on the CleanRL project (Huang et al., 2022).
Deep Q-Network (DQN)

(Mnih et al., 2013)
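As a reminder of the update the DQN run is optimising, below is a minimal, self-contained sketch of the one-step TD loss from Mnih et al. (2013). It is illustrative only: the network, batch shapes, and values (a small MLP over an 8-dimensional observation, 6 actions, gamma = 0.99, random tensors in place of a replay-buffer sample) are placeholders and not the actual CleanRL dqn_atari.py configuration, which uses the Nature CNN over stacked 84x84 Atari frames.

```python
import torch
import torch.nn as nn

# Placeholder sizes and discount; not the Atari CNN setup used in the runs.
n_obs, n_actions, gamma = 8, 6, 0.99

q_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())  # target network starts as a frozen copy

# A fake minibatch of (s, a, r, s', done) transitions standing in for a replay-buffer sample.
batch = 32
obs = torch.randn(batch, n_obs)
actions = torch.randint(n_actions, (batch, 1))
rewards = torch.randn(batch)
next_obs = torch.randn(batch, n_obs)
dones = torch.zeros(batch)

# One-step TD target: r + gamma * max_a' Q_target(s', a'), cut off at terminal states.
with torch.no_grad():
    target_max = target_net(next_obs).max(dim=1).values
    td_target = rewards + gamma * target_max * (1.0 - dones)

# Q(s, a) for the actions actually taken, and the TD loss minimised by gradient descent.
q_sa = q_net(obs).gather(1, actions).squeeze(1)
loss = nn.functional.mse_loss(q_sa, td_target)
loss.backward()
```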
Proximal Policy Optimization (PPO) with Clipped Surrogate Loss

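Since this section's runs use the clipped surrogate loss, here is a minimal sketch of that objective (Schulman et al., 2017). The tensors are placeholders for illustration, not the exact CleanRL ppo_atari.py code path: there, the log-probabilities and advantages come from the rollout buffer, advantages are computed with GAE and normalised per minibatch, and the full loss also includes value and entropy terms.

```python
import torch

# Clipping range epsilon; an assumed value, not necessarily the one used in these runs.
clip_coef = 0.1

new_logprobs = torch.randn(64, requires_grad=True)  # log pi_theta(a|s) under the current policy
old_logprobs = torch.randn(64)                      # log pi_theta_old(a|s) recorded during the rollout
advantages = torch.randn(64)                        # advantage estimates A_t

# Probability ratio r_t(theta) = pi_theta(a|s) / pi_theta_old(a|s).
ratio = (new_logprobs - old_logprobs).exp()

# L^CLIP = E[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)]; we minimise its negative.
unclipped = ratio * advantages
clipped = torch.clamp(ratio, 1.0 - clip_coef, 1.0 + clip_coef) * advantages
policy_loss = -torch.min(unclipped, clipped).mean()
policy_loss.backward()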
Attempt 1: with learning rate = 2.5e-3
[W&B panel: training episodic return (1 run)]
Attempt 2: with learning rate = 2.5e-4
This attempt lowers the learning rate by a factor of 10, aiming to reduce the variance in the training episodic return observed in Attempt 1.
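Concretely, the two attempts differ only in the Adam step size. A minimal sketch of that configuration, assuming a CleanRL-style setup (Adam with eps = 1e-5 and a placeholder network standing in for the actor-critic agent):

```python
import torch
import torch.nn as nn

# Placeholder network; stands in for the actual actor-critic agent.
agent = nn.Linear(4, 2)

learning_rate = 2.5e-4  # Attempt 2 (Attempt 1 used 2.5e-3)
optimizer = torch.optim.Adam(agent.parameters(), lr=learning_rate, eps=1e-5)
```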
[W&B panel: training episodic return (1 run)]
Comparison
[W&B panel: comparison across the 3 runs]
References
- Huang, S., Dossa, R. F. J., Ye, C., Braga, J., Chakraborty, D., Mehta, K., & Araújo, J. G. M. (2022). CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms. Journal of Machine Learning Research, 23(274), 1–18.
- Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning (No. arXiv:1312.5602). arXiv. https://doi.org/10.48550/arXiv.1312.5602
- Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms (No. arXiv:1707.06347). arXiv. https://arxiv.org/abs/1707.06347v2