PPO on Cartpole
I wanted to run a first test of PPO to see what works and what doesn't.
Results
What worked best:
- a low clipping ratio (0.12)
- a small policy estimator (around 80 parameters seems to be enough)
- a policy estimator learning rate that isn't too high (above 0.009 is too high; 0.008 seemed just right)
- not too many layers in the policy estimator
- a moderate memory buffer size: 1000, as I found on the internet, seems just fine
- more epochs are better
The parameters of the function approximator seem not to matter that much, so I'd better not make it too computationally heavy. A small loss sketch with these settings follows.
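To make the settings above concrete, here is a minimal sketch of PPO's clipped surrogate loss, assuming PyTorch and a CartPole-sized network (4 observations, 2 actions). The constants mirror the values found by the sweep, but the network shape and the helper name `ppo_clip_loss` are my own illustration, not the exact code used in these runs.

```python
import torch
import torch.nn as nn

# Values mirroring the findings above (assumed, not the exact sweep config).
CLIP_EPS = 0.12        # low clipping ratio
POLICY_LR = 0.008      # just below the 0.009 threshold that was too high
BUFFER_SIZE = 1000     # memory buffer size
EPOCHS = 10            # more epochs worked better

# A tiny policy network for CartPole (4 observations -> 2 actions),
# kept small so the parameter count stays near the ~80 mentioned above.
policy = nn.Sequential(
    nn.Linear(4, 10),
    nn.Tanh(),
    nn.Linear(10, 2),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=POLICY_LR)


def ppo_clip_loss(obs, actions, old_log_probs, advantages):
    """Clipped surrogate objective from the PPO paper (to be minimized)."""
    logits = policy(obs)
    dist = torch.distributions.Categorical(logits=logits)
    log_probs = dist.log_prob(actions)

    # Probability ratio between the new policy and the one that collected the data.
    ratio = torch.exp(log_probs - old_log_probs)

    # Clipping keeps a single update from moving the policy too far.
    clipped = torch.clamp(ratio, 1.0 - CLIP_EPS, 1.0 + CLIP_EPS)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```

In training, this loss would be minimized for `EPOCHS` passes over a buffer of `BUFFER_SIZE` transitions before collecting new data; a lower `CLIP_EPS` simply tightens how far each pass can move the policy.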
[Sweep panels: mdkjy86c, 62 runs]