PPO on Cartpole
I wanted to run a first test of PPO to see what works and what doesn't.
Results
What worked best:
- a low clipping ratio (0.12)
- a small policy estimator (around 80 parameters seems to be enough)
- a policy estimator learning rate that isn't too high (above 0.009 is too high; 0.008 seemed just right)
- not too many layers in the policy estimator
- a moderate memory buffer size: 1000, as I found on the internet, seems just fine
- more epochs are better
The parameters of the function approximator seem not to matter that much, so I'd better not make it too computationally heavy. A small loss sketch with these settings follows.
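To make the settings above concrete, here is a minimal sketch of PPO's clipped surrogate loss, assuming PyTorch and a CartPole-sized network (4 observations, 2 actions). The constants mirror the values found by the sweep, but the network shape and the helper name `ppo_clip_loss` are my own illustration, not the exact code used in these runs.

```python
import torch
import torch.nn as nn

# Values mirroring the findings above (assumed, not the exact sweep config).
CLIP_EPS = 0.12        # low clipping ratio
POLICY_LR = 0.008      # just below the 0.009 threshold that was too high
BUFFER_SIZE = 1000     # memory buffer size
EPOCHS = 10            # more epochs worked better

# A tiny policy network for CartPole (4 observations -> 2 actions),
# kept small so the parameter count stays near the ~80 mentioned above.
policy = nn.Sequential(
    nn.Linear(4, 10),
    nn.Tanh(),
    nn.Linear(10, 2),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=POLICY_LR)


def ppo_clip_loss(obs, actions, old_log_probs, advantages):
    """Clipped surrogate objective from the PPO paper (to be minimized)."""
    logits = policy(obs)
    dist = torch.distributions.Categorical(logits=logits)
    log_probs = dist.log_prob(actions)

    # Probability ratio between the new policy and the one that collected the data.
    ratio = torch.exp(log_probs - old_log_probs)

    # Clipping keeps a single update from moving the policy too far.
    clipped = torch.clamp(ratio, 1.0 - CLIP_EPS, 1.0 + CLIP_EPS)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```

In training, this loss would be minimized for `EPOCHS` passes over a buffer of `BUFFER_SIZE` transitions before collecting new data; a lower `CLIP_EPS` simply tightens how far each pass can move the policy.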
[Sweep panels: mdkjy86c, 62 runs]