next: sample temperature tldr
Costa
Created on September 27 | Last edited on September 30
[Plot panels, each with global_step on the x-axis:]
ppo/episode
rouge/rougeL
objective/kl
objective/score_total
test/accuracy
objective/scores
objective/kl_coef
Run sets:
new set (2)
with reference response logging (1)
direct reference response score normalization (1)
old (1)
correct (1)