RLHF w/ different base models

Created on September 11 | Last edited on September 12

exp_name: train_policy_accelerate
task.policy.initial_model: 124M
meta: 1h 48m, 3mo 10d 16h 8m 22s, 2m 20s

config
run: 10, 300, 5.5
["/tmp/save/train_policy/testdesc-2309211533","/tmp/save/train_policy/testdesc-2309212027","/tmp/save/train_policy/testdesc-2309220121","/tmp/save/train_policy/testdesc-2309220614","/tmp/save/train_policy/testdesc-2309221107","/tmp/save/train_policy/testdesc-2309221601","/tmp/save/train_policy/testdesc-2309222056","/tmp/save/train_policy/testdesc-2309230149","/tmp/save/train_policy/testdesc-2309230642","/tmp/save/train_policy/testdesc-2309231136"]
task, policy: 124M, 0.7, 0.7, 0.7, gpt2, cerebras/Cerebras-GPT-111M, true, true, false, train_policy_accelerate, train_policy_accelerate, 0, 0
["models/train_both_accelerate__10__1693354724/policy.pt","models/train_both_accelerate__1__1693354719/policy.pt","models/train_both_accelerate__2__1693354723/policy.pt","models/train_both_accelerate__3__1693354722/policy.pt","models/train_both_accelerate__4__1693354720/policy.pt","models/train_both_accelerate__5__1693354722/policy.pt","models/train_both_accelerate__6__1693354723/policy.pt","models/train_both_accelerate__7__1693354719/policy.pt","models/train_both_accelerate__8__1693354721/policy.pt","models/train_both_accelerate__9__1693354723/policy.pt"]
models/train_both_accelerate__1__1694095504/policy.pt
5.5, 1, sentiment, true, true, true, true, lm_human_preference_details, cleanrl

summary
_wandb: 6185, 13193.25, 134
elapsed, steps: 46896
-
[Line chart: objective/scores]
[Line chart: objective/kl]
Run sets:
Ours + gpt2: 10
openai/lm-human-preferences: 40
Ours + CerebrasGPT: 1
Run set 4: 0