
Proximal Policy Optimization (PPO) Experiments

Here we present three experiments with the PPO RL algorithm, each trained for a different number of timesteps (2,000, 5,000, and 10,000). The experiments are shown separately without smoothing, and finally all three together with smoothing applied.
Created on February 28 | Last edited on May 5
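
The three runs below differ only in their total training budget. As a minimal sketch of such a setup, assuming Stable-Baselines3 and a Gymnasium CartPole environment (the report names neither the library nor the environment, so both are illustrative assumptions):

```python
import gymnasium as gym
from stable_baselines3 import PPO

# Train a fresh PPO agent for each timestep budget, mirroring the
# three separate experiments in this report. "CartPole-v1" is a
# hypothetical environment choice, not one stated in the report.
for total_timesteps in (2_000, 5_000, 10_000):
    env = gym.make("CartPole-v1")
    model = PPO("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=total_timesteps)
    model.save(f"ppo_{total_timesteps}_steps")
```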

Experiment with 2,000 timesteps



[Chart: Reward (0–800) vs. Episode (0–1.2k) — run set: 142]


Experiment with 5,000 timesteps


[Chart: Reward vs. Episode — run set: 142]


Experiment with 10,000 timesteps


[Chart: Reward vs. Episode — run set: 142]


Final chart with all experiments


[Chart: Reward vs. Episode, all three experiments with smoothing — run set: 3]
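
The combined chart above applies smoothing to the reward curves. As a minimal sketch of the exponentially weighted moving average that W&B chart smoothing uses; the smoothing weight of 0.9 is an assumption, not a value stated in the report:

```python
def smooth(rewards, weight=0.9):
    """Exponentially weighted moving average over an episode-reward curve."""
    smoothed, last = [], rewards[0]
    for r in rewards:
        # Blend the previous smoothed value with the new raw reward.
        last = weight * last + (1 - weight) * r
        smoothed.append(last)
    return smoothed

# Example: smooth a noisy episode-reward curve before plotting.
rewards = [10, 200, 50, 400, 120, 500]
print(smooth(rewards))
```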


Specs of Experiments


[Table: per-run configuration of the three experiments — run set: 3]