pipeline correct reward index
Costa · cleanRL project (costa-huang)
Created on December 7 | Last edited on December 31
[Figure: training curves plotted against Step (roughly 0 to 2k). Panels: objective/scores, objective/score_total (shown in two panels), ppo/val/ratio, ppo/policy/approxkl_avg, objective/kl.]
Run sets (number of runs in each):
gpt2: 1
gpt2-xl: 1
original: 5
gpt2 fix value: 2
fix value with masked mean: 1
pipeline with white spaces: 1