costa-huang / cleanRL
Train policy
Costa, created on June 23, last edited on July 12
[Chart: ppo/objective/score and objective/scores vs. time (minutes)]
[Chart: objective/entropy and ppo/objective/entropy vs. time (minutes)]
[Chart: objective/kl_coef and ppo/objective/kl_coef vs. time (minutes)]
[Chart: objective/non_score_reward vs. step]
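The kl_coef and non_score_reward panels are easier to read with the KL penalty in mind. The sketch below assumes these runs follow the adaptive KL controller used in OpenAI's lm-human-preferences: the coefficient is nudged toward a target KL, and the non-score reward is the per-token KL between policy and reference model, scaled by the coefficient and negated. The class and function names, and the values in the usage lines (init_kl_coef=0.15, target=6.0, horizon=10000), are illustrative assumptions, not settings taken from this report.

```python
import numpy as np


class AdaptiveKLController:
    """Proportional controller that moves the KL coefficient toward a target KL.

    Assumption: modeled on the adaptive controller in lm-human-preferences;
    this is a hypothetical sketch, not the code behind these runs.
    """

    def __init__(self, init_kl_coef: float, target: float, horizon: int):
        self.value = init_kl_coef
        self.target = target
        self.horizon = horizon

    def update(self, current_kl: float, n_steps: int) -> float:
        # Relative error against the target, clipped to +/-20% so a single
        # noisy batch cannot swing the coefficient too far.
        proportional_error = np.clip(current_kl / self.target - 1, -0.2, 0.2)
        mult = 1 + proportional_error * n_steps / self.horizon
        self.value *= mult
        return self.value


def non_score_reward(logprobs: np.ndarray, ref_logprobs: np.ndarray,
                     kl_coef: float) -> np.ndarray:
    # Per-token KL estimate (log-ratio of policy to reference model),
    # turned into a penalty by scaling with kl_coef and negating.
    kl = logprobs - ref_logprobs
    return -kl_coef * kl


if __name__ == "__main__":
    ctl = AdaptiveKLController(init_kl_coef=0.15, target=6.0, horizon=10000)
    # KL above target: the coefficient grows slightly.
    print(ctl.update(current_kl=12.0, n_steps=64))
    print(non_score_reward(np.array([-1.0, -2.0]), np.array([-1.5, -2.0]), ctl.value))
```

Because the error is clipped and scaled by n_steps / horizon, the coefficient drifts slowly over many updates rather than jumping, which is the kind of gradual movement one would expect in the kl_coef chart under this assumption.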
Run sets:
- torch adam 1e-5 (1 run)
- openai (40 runs)
- openai2
- torch adam 5e-4 (1 run)
- torch adam 5e-4 batch_size=64 (1 run)
- Ours (3 runs)
- openai (1 run)
- openai2