Train policy (negative KL divergence)
Costa
Created on June 23 | Last edited on June 23
This actually reproduces the negative KL divergence issue!!!
[Figure: ppo/objective/score, objective/scores. x-axis: Time (minutes), 50 to 250; y-axis: -0.5 to 2.]
[Figure: objective/scores. x-axis: global_step, 100 to 600; y-axis: -0.9 to -0.4.]
[Figure: objective/kl, ppo/objective/kl. x-axis: Step, 500 to 2k; y-axis: -1200 to 0.]
Legend: Ours, OAI