Sequence-wise v. token-wise mean KL
Sorry
Created on April 19 | Last edited on April 19
Section 1
[Figure: reward/mean vs. Step (x-axis ticks 0 to 6k; y-axis ticks -1.4 to -0.8). Run: ppo_hh/pythia-6B-static-sft/7gpus:main]