Deep Reinforcement Learning

OpenAI's gym and the Cartpole Environment

The OpenAI gym is an API built to make environment simulation and interaction simple for reinforcement learning. It also ships with a number of built-in environments (e.g. Atari games, classic control problems, etc.).

One such classic control problem is Cartpole, in which a cart carrying an inverted pendulum must be controlled so that the pendulum stays upright. The reward mechanics are described on the gym page for this environment (linked below).

https://gym.openai.com/envs/CartPole-v1/
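
To make the interaction model concrete, here is a minimal sketch of the gym loop for Cartpole using the classic gym API, with a random policy standing in for a trained agent:

```python
import gym

# A minimal sketch of the gym interaction loop (classic gym API), using a
# random policy in place of a trained agent.
env = gym.make("CartPole-v1")
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()          # 0 = push left, 1 = push right
    obs, reward, done, info = env.step(action)  # +1 reward per step upright
    total_reward += reward
print(f"Episode finished with total reward {total_reward}")
```

A DQN agent replaces `env.action_space.sample()` with an epsilon-greedy choice over its Q-network's outputs.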

Results of Applying DQN to the Cartpole Problem

As part of the first stream in the series I mentioned above, I put together a model for solving the Cartpole environment and got it training with a set of parameters that felt right from past experience. A learning rate somewhere between 1e-3 and 1e-4 usually works well. I set the epsilon decay factor so that epsilon would hit its minimum value of 5% by about half a million steps.
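
The post doesn't pin down the exact shape of the schedule, but assuming a per-step exponential decay, the decay factor can be derived directly from the start value, the floor, and the step budget (a sketch; `eps_start`, `eps_min`, and `decay_steps` are illustrative names, not from the original code):

```python
# Derive the per-step decay factor, assuming exponential decay.
eps_start = 1.0        # initial exploration rate
eps_min = 0.05         # the 5% floor mentioned above
decay_steps = 500_000  # roughly half a million environment steps

# Solve eps_start * factor ** decay_steps == eps_min for the per-step factor.
decay_factor = (eps_min / eps_start) ** (1.0 / decay_steps)

def epsilon_at(step: int) -> float:
    """Exploration rate after `step` environment steps."""
    return max(eps_min, eps_start * decay_factor ** step)

print(epsilon_at(0))        # 1.0
print(epsilon_at(250_000))  # ~0.22, halfway through the schedule
print(epsilon_at(500_000))  # ~0.05, the floor
```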

One thing I haven't mentioned so far is the target model. DQN tends to be very unstable unless you keep two copies of the same model: an online copy that is updated every batch, and a target copy that is updated only rarely. How often that update happens becomes another hyperparameter; I set it to every 5,000 epochs (which equals 50,000 steps in the environment).
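
As a sketch of the bookkeeping involved (assuming PyTorch, since the post doesn't specify a framework; `q_net`, `target_net`, and `maybe_sync_target` are illustrative names):

```python
import copy
import torch.nn as nn

# Two-network DQN setup: the online network trains every batch, while a
# frozen copy only gets refreshed periodically. Cartpole has 4 observation
# values and 2 actions.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = copy.deepcopy(q_net)  # frozen copy used to compute targets
target_net.eval()

TARGET_UPDATE_EPOCHS = 5_000  # the update interval described above

def maybe_sync_target(epoch: int) -> None:
    """Copy the online network's weights into the target network periodically."""
    if epoch % TARGET_UPDATE_EPOCHS == 0:
        target_net.load_state_dict(q_net.state_dict())
```

The target network is the one used to compute the Bellman targets, r + gamma * max_a' Q_target(s', a'), so the values the online network regresses toward only shift when the copy happens, which is what stabilizes training.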

Hyperparameter Sweep

Conclusion

Reinforcement learning is a very interesting idea, and in the past few years it has become even more powerful through the use of deep learning and modern hardware. DQN makes for a relatively pain-free starting point for beginners, who can focus on the simpler environments in OpenAI's gym.