Standard scaling runs

Created on August 5 | Last edited on August 14

[Plot: eval/dclm/loss vs. trainer.train_batch_size]
[Plot: eval/dclm/loss vs. trainer.num_train_steps]
[Plot: eval/dclm/loss vs. optimizer.weight_decay]
batch size (4)
epoching (8)
epoching w/ weight decay (8)
weight decay (24)
chinchilla single runs (4)