Playing Atari Pong with DQN and PPO
This experiment was done for an assignment in a graduate-level course in reinforcement learning (RL). The code implementations are based on the CleanRL project (Huang et al., 2022).
Deep Q-Network (DQN)

(Mnih et al., 2013)
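As a reminder of the update the DQN run is optimising, below is a minimal, self-contained sketch of the one-step TD loss from Mnih et al. (2013). It is illustrative only: the network, batch shapes, and values (a small MLP over an 8-dimensional observation, 6 actions, gamma = 0.99, random tensors in place of a replay-buffer sample) are placeholders and not the actual CleanRL dqn_atari.py configuration, which uses the Nature CNN over stacked 84x84 Atari frames.

```python
import torch
import torch.nn as nn

# Placeholder sizes and discount; not the Atari CNN setup used in the runs.
n_obs, n_actions, gamma = 8, 6, 0.99

q_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())  # target network starts as a frozen copy

# A fake minibatch of (s, a, r, s', done) transitions standing in for a replay-buffer sample.
batch = 32
obs = torch.randn(batch, n_obs)
actions = torch.randint(n_actions, (batch, 1))
rewards = torch.randn(batch)
next_obs = torch.randn(batch, n_obs)
dones = torch.zeros(batch)

# One-step TD target: r + gamma * max_a' Q_target(s', a'), cut off at terminal states.
with torch.no_grad():
    target_max = target_net(next_obs).max(dim=1).values
    td_target = rewards + gamma * target_max * (1.0 - dones)

# Q(s, a) for the actions actually taken, and the TD loss minimised by gradient descent.
q_sa = q_net(obs).gather(1, actions).squeeze(1)
loss = nn.functional.mse_loss(q_sa, td_target)
loss.backward()
```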
Proximal Policy Optimization (PPO) with Clipped Surrogate Loss

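Since this section's runs use the clipped surrogate loss, here is a minimal sketch of that objective (Schulman et al., 2017). The tensors are placeholders for illustration, not the exact CleanRL ppo_atari.py code path: there, the log-probabilities and advantages come from the rollout buffer, advantages are computed with GAE and normalised per minibatch, and the full loss also includes value and entropy terms.

```python
import torch

# Clipping range epsilon; an assumed value, not necessarily the one used in these runs.
clip_coef = 0.1

new_logprobs = torch.randn(64, requires_grad=True)  # log pi_theta(a|s) under the current policy
old_logprobs = torch.randn(64)                      # log pi_theta_old(a|s) recorded during the rollout
advantages = torch.randn(64)                        # advantage estimates A_t

# Probability ratio r_t(theta) = pi_theta(a|s) / pi_theta_old(a|s).
ratio = (new_logprobs - old_logprobs).exp()

# L^CLIP = E[min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)]; we minimise its negative.
unclipped = ratio * advantages
clipped = torch.clamp(ratio, 1.0 - clip_coef, 1.0 + clip_coef) * advantages
policy_loss = -torch.min(unclipped, clipped).mean()
policy_loss.backward()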
Attempt 1: with learning rate = 2.5e-3
[W&B panel: training episodic return (1 run)]
Attempt 2: with learning rate = 2.5e-4
This attempt lowers the learning rate by a factor of 10, aiming to reduce the variance in the training episodic return observed in Attempt 1.
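Concretely, the two attempts differ only in the Adam step size. A minimal sketch of that configuration, assuming a CleanRL-style setup (Adam with eps = 1e-5 and a placeholder network standing in for the actor-critic agent):

```python
import torch
import torch.nn as nn

# Placeholder network; stands in for the actual actor-critic agent.
agent = nn.Linear(4, 2)

learning_rate = 2.5e-4  # Attempt 2 (Attempt 1 used 2.5e-3)
optimizer = torch.optim.Adam(agent.parameters(), lr=learning_rate, eps=1e-5)
```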
[W&B panel: training episodic return (1 run)]
Comparison
[W&B panel: comparison across the 3 runs]
References
- Huang, S., Dossa, R. F. J., Ye, C., Braga, J., Chakraborty, D., Mehta, K., & Araújo, J. G. M. (2022). CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms. Journal of Machine Learning Research, 23(274), 1–18.
- Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with Deep Reinforcement Learning (No. arXiv:1312.5602). arXiv. https://doi.org/10.48550/arXiv.1312.5602
- Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal Policy Optimization Algorithms (No. arXiv:1707.06347). arXiv. https://arxiv.org/abs/1707.06347v2