A Brief Overview of DQN Training
This report discusses the results of training DQN agents on the CartPole environment.
Training Loss
One interesting phenomenon to watch out for is that the training loss actually increases the longer the network trains, but it does so in cycles. A good indicator that your training code is bug-free is that the loss decreases steadily between syncs of the target network and the policy network, then jumps when they synchronize. The graph here shows a run where the sync interval is very long (~100 episodes), and you can clearly see the loss spike and then decay within each cycle.
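If you want to reproduce this loss pattern, here is a minimal, self-contained sketch of the relevant part of the training loop. It is not the exact code behind these runs; the names (`policy_net`, `target_net`, `SYNC_EVERY`), the project name, and the dummy batch are purely illustrative, and a real run would sample the batch from a replay buffer.

```python
import torch
import torch.nn as nn
import wandb

obs_dim, n_actions = 4, 2          # CartPole-sized; values are illustrative
GAMMA, SYNC_EVERY = 0.99, 100      # SYNC_EVERY = target-network sync interval (episodes)

policy_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(policy_net.state_dict())   # start the two networks in sync
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)

def td_loss(states, actions, rewards, next_states, dones):
    # Q(s, a) for the actions actually taken, from the policy network
    q_sa = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Bootstrapped target from the frozen target network
    with torch.no_grad():
        next_q = target_net(next_states).max(1).values
        target = rewards + GAMMA * next_q * (1 - dones)
    return nn.functional.smooth_l1_loss(q_sa, target)

wandb.init(project="dqn-overview", mode="offline")     # project name is hypothetical

for episode in range(1000):
    # A real run would sample this batch from a replay buffer; random tensors
    # keep the sketch self-contained.
    states, next_states = torch.randn(32, obs_dim), torch.randn(32, obs_dim)
    actions = torch.randint(0, n_actions, (32,))
    rewards, dones = torch.ones(32), torch.zeros(32)

    loss = td_loss(states, actions, rewards, next_states, dones)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    wandb.log({"training_loss": loss.item(), "episode": episode})

    # Between syncs the target is fixed, so the loss should trend downward;
    # copying the weights below moves the target, and the loss typically spikes
    # again, which produces the cyclic pattern seen in the panel above.
    if episode % SYNC_EVERY == 0:
        target_net.load_state_dict(policy_net.state_dict())
```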
Test Results
The first panel shows the agent before it begins training; notice how it fails almost immediately. The second panel shows the model after training for ~1000 episodes; you can see it ever so slightly correcting the orientation of the pole until it ultimately fails.
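The clips themselves just come from rolling out the greedy (argmax-Q) policy in the environment. A rough sketch of such an evaluation rollout, assuming the classic four-value `gym` step API and the `policy_net` from the earlier sketch:

```python
import gym
import torch

def evaluate(policy_net, episodes=5):
    """Roll out the greedy policy and return the length of each episode."""
    env = gym.make("CartPole-v1")
    lengths = []
    for _ in range(episodes):
        obs, done, steps = env.reset(), False, 0
        while not done:
            with torch.no_grad():
                q_values = policy_net(torch.as_tensor(obs, dtype=torch.float32))
            obs, reward, done, info = env.step(q_values.argmax().item())
            steps += 1
        lengths.append(steps)
    env.close()
    return lengths
```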
Parameter Importance
Here is a neat feature of W&B that I love: we can visualize the importance of all the hyperparameters across our runs (keep in mind, this works a lot better when you have a large number of runs). I ran a few experiments where I varied the sync time, the epsilon decay rate, and the gamma value. Let's look at the effect each of these parameters has on our model.
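As an aside, the easiest way to get "a whole bunch of runs" for this panel is a W&B sweep over the same three parameters. Here is a rough sketch; the parameter names, value ranges, and project name are illustrative, not the exact ones behind these runs:

```python
import wandb

sweep_config = {
    "method": "random",
    "metric": {"name": "training_loss", "goal": "minimize"},
    "parameters": {
        "sync_every": {"values": [10, 100, 1000]},     # target-network sync interval
        "eps_decay":  {"values": [200, 1000, 5000]},   # epsilon decay constant
        "gamma":      {"values": [0.9, 0.99, 0.999]},  # discount factor
    },
}

sweep_id = wandb.sweep(sweep_config, project="dqn-overview")
# wandb.agent(sweep_id, function=train)  # `train` would wrap the training loop sketched above
```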
Sync Time
We can immediately notice that increasing the sync time slows down training but makes it more stable, while decreasing the sync time makes the network highly unstable. The baseline parameters I chose here are from PyTorch's DQN tutorial. The higher sync value also prevents the loss from exploding.
Epsilon Decay Rate
We can observe that changing the epsilon decay rate has a very subtle effect on our training process: all runs behave quite similarly to one another, although the sudden spike in training loss occurs at different points in time. Since the effects are so mild, we would need many more runs to properly understand this parameter.
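For context, the schedule being varied here is the usual exponential decay (the same form used in the PyTorch DQN tutorial the baseline comes from); a small sketch with illustrative constants:

```python
import math

EPS_START, EPS_END, EPS_DECAY = 0.9, 0.05, 1000  # illustrative values

def epsilon(step):
    """Probability of taking a random action after `step` environment steps."""
    return EPS_END + (EPS_START - EPS_END) * math.exp(-step / EPS_DECAY)

# A larger EPS_DECAY keeps the agent exploring (acting randomly) for longer,
# which is one plausible reason the loss spike shows up at different times.
```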
Gamma
The effects of gamma are subtle but clearly visible in our training process. A change in the gamma value actually changes the stability of training (observe the loss diagrams). My hypothesis is that gamma values have a very narrow range in which they don't destabilize training, but we would need many more runs to confirm this statistically.
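One quick way to build intuition for why the usable range might be narrow: gamma sets an effective planning horizon of roughly 1 / (1 - gamma) steps, so small changes near 1 move that horizon by a lot. A tiny illustrative check:

```python
# Effective horizon ~ 1 / (1 - gamma): rewards further away than this are
# discounted to near zero, so gamma = 0.9, 0.99, 0.999 imply very different
# horizons even though the values look close together.
for gamma in (0.9, 0.99, 0.999):
    print(f"gamma={gamma}: horizon ~ {1 / (1 - gamma):.0f} steps")
```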