data efficiency hyperparams

Created on June 22 | Last edited on September 8



grouped runs


[Figure: two panels plotted against run_progress]
150m opt hp (0)
300m opt hp (0)
600m opt hp (4)
1.4b opt hp (7)
1.4b, 419M (3)
1.4b, 209M (3)
Run set 7 (0)


overparametrized wd


1.4b, 209M, lr0.003 (3)
1.4b, 209M, lr0.001 (10)
Run set 3 (10)



overparametrized wd (ensembles)


wd6.40 (532)
wd3.20 (532)
wd1.6 (532)
wd0.8 (532)



1.4b ensembles (based on rough optimal hps)


1.4b, 1.7B seed tokens (1779)
1.4b, 838M seed tokens (1779)
1.4b, 419M seed tokens (1779)
1.4b, 209M seed tokens (1779)



1.4b, 1.7B seed tokens (532)
1.4b, 838m (532)
1.4b, 419m (532)
1.4b, 209m (532)



1.4b, 200m tokens, 16x epoched, ensembles


Run set (1779)



Run set (532)



1.4b, 200m tokens, 32x and 16x, 6.4 wd


x32 (1779)
x16-wd6.4 (1779)
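The last two sections repeat a roughly 200M-token set 16x or 32x. This is a back-of-the-envelope sketch of the total tokens processed. It assumes that each "x" means one full pass over the set and takes the 200M figure from the section titles rather than from run logs. The helper name `total_tokens` is hypothetical.

```python
# Hypothetical helper: total tokens processed when a fixed token set
# is repeated for a given number of epochs (assumes full passes only).
def total_tokens(unique_tokens: int, epochs: int) -> int:
    return unique_tokens * epochs


unique = 200_000_000  # "200m tokens" as stated in the section titles
for epochs in (16, 32):
    print(f"{epochs}x: {total_tokens(unique, epochs) / 1e9:.1f}B tokens")
```

Under these assumptions, this prints 3.2B tokens for 16x and 6.4B tokens for 32x. The repeated data therefore covers the same unique 200M tokens while the compute budget doubles.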