
PPO on Cartpole

I wanted to run a first test of PPO on CartPole to see what works and what doesn't.

Results



What worked best:

  • a low clipping coefficient (0.12)
  • a small policy estimator (about 80 parameters seems to be enough)
  • a policy estimator learning rate that is not too high (above 0.009 is too high; 0.008 seemed just right)
  • not too many layers in the policy estimator
  • a moderate memory buffer size (1000, a value commonly recommended online, works well)
  • more epochs are better

The hyperparameters of the value function approximator don't seem to matter much, so it's better not to make it too computationally heavy. The best settings are summarized in the sketch below.
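
For reference, here is a minimal sketch of the PPO clipped objective with these settings plugged in, assuming a PyTorch implementation; the network architecture, epoch count, and function names are illustrative choices of mine, not the exact code behind these runs.

```python
import torch
import torch.nn as nn

# Settings from the sweep above; EPOCHS is an assumed placeholder value.
CLIP_EPS = 0.12        # low clipping coefficient
POLICY_LR = 0.008      # policy estimator learning rate
BUFFER_SIZE = 1000     # memory buffer size
EPOCHS = 10            # PPO epochs per update (illustrative)

# A tiny policy estimator: CartPole has 4 observations and 2 actions,
# so one small hidden layer already gives roughly the ~80 parameters above.
policy = nn.Sequential(
    nn.Linear(4, 10),
    nn.Tanh(),
    nn.Linear(10, 2),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=POLICY_LR)

def ppo_clip_loss(states, actions, old_log_probs, advantages):
    """Clipped surrogate objective, negated so it can be minimized."""
    dist = torch.distributions.Categorical(logits=policy(states))
    ratio = torch.exp(dist.log_prob(actions) - old_log_probs)  # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - CLIP_EPS, 1 + CLIP_EPS) * advantages
    return -torch.min(unclipped, clipped).mean()

def update(states, actions, old_log_probs, advantages):
    # Run several epochs over the transitions collected in the buffer.
    for _ in range(EPOCHS):
        loss = ppo_clip_loss(states, actions, old_log_probs, advantages)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The key lever is `CLIP_EPS`: the lower value keeps each policy update close to the data-collecting policy, which matches the observation that a small clipping coefficient worked best here.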

[Charts from sweep mdkjy86c, 62 runs]