Online DPO experiments for TL;DR summarisation
Lewis Tunstall
Created on August 28 | Last edited on August 29
Section 1
[Figure: train/val/contain_eos_token (y-axis) against train/global_step (x-axis). Runs shown: pythia-2.8b-deduped-tldr-online-dpo, pythia-1b-deduped-tldr-online-dpo.]