openrlbenchmark / lm_human_preference_details
pipeline
Costa | Created on September 21 | Last edited on September 22
Figure: objective/scores, ppo/mean_scores vs. Step
Figure: objective/kl vs. Step
Our repro of OAI's codebase + TRL's sentiment pipeline (4 runs)
TRL + sentiment pipeline (5 runs)
TRL + gpt2-xl + sentiment pipeline (5 runs)