
Proximal Policy Optimization (PPO) Experiments

Here we present three experiments with the PPO RL algorithm, each trained for a different number of timesteps (2,000, 5,000, and 10,000). The experiments are shown separately without smoothing, and finally all three together with smoothing applied.
Created on February 28 | Last edited on May 5
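
The three runs below differ only in their total training budget. As a minimal sketch of such a setup, assuming Stable-Baselines3 and a Gymnasium CartPole environment (the report names neither the library nor the environment, so both are illustrative assumptions):

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Train a fresh PPO agent for each timestep budget, mirroring the
# three separate experiments in this report. "CartPole-v1" is a
# hypothetical environment choice, not one stated in the report.
for total_timesteps in (2_000, 5_000, 10_000):
    env = gym.make("CartPole-v1")
    model = PPO("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=total_timesteps)
    model.save(f"ppo_{total_timesteps}_steps")
```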

Experiment with 2,000 timesteps



[Chart: Reward (0–800) vs. Episode (0–1.2k) — run set: 142]


Experiment with 5,000 timesteps


[Chart: Reward vs. Episode — run set: 142]


Experiment with 10,000 timesteps


[Chart: Reward vs. Episode — run set: 142]


Final chart with all experiments


[Chart: Reward vs. Episode, all three experiments with smoothing — run set: 3]
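
The combined chart above applies smoothing to the reward curves. As a minimal sketch of the exponentially weighted moving average that W&B chart smoothing uses; the smoothing weight of 0.9 is an assumption, not a value stated in the report:

```python
def smooth(rewards, weight=0.9):
    """Exponentially weighted moving average over an episode-reward curve."""
    smoothed, last = [], rewards[0]
    for r in rewards:
        # Blend the previous smoothed value with the new raw reward.
        last = weight * last + (1 - weight) * r
        smoothed.append(last)
    return smoothed

# Example: smooth a noisy episode-reward curve before plotting.
rewards = [10, 200, 50, 400, 120, 500]
print(smooth(rewards))
```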


Specs of Experiments


[Table: per-run configuration of the three experiments — run set: 3]