
New training script, pp=2, mp=1, regular Adam

16 GPUs (wandb reports 8, but ranks run from 0 to 15), nhidden=1024, num_layers=24, sparsity on. Loss is unusually low.
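The parallel layout can be sanity-checked with a little arithmetic. Under the usual 3D-parallel decomposition, world_size = pp × mp × dp, so 16 ranks with pp=2 and mp=1 leave a data-parallel degree of 16 / (2 × 1) = 8. That happens to match the GPU count wandb displays, though nothing in this report establishes that the two are related. The sketch below assumes that decomposition and a rank ordering with pipeline stages outermost and model parallelism innermost (a common convention, not confirmed for this script); the `ParallelLayout` class is hypothetical and not part of the training script.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper: world_size = pp * dp * mp (assumed decomposition)."""

    world_size: int
    pp: int  # pipeline-parallel degree
    mp: int  # model (tensor)-parallel degree

    @property
    def dp(self) -> int:
        """Data-parallel degree implied by the other three numbers."""
        if self.world_size % (self.pp * self.mp) != 0:
            raise ValueError("world_size must be divisible by pp * mp")
        return self.world_size // (self.pp * self.mp)

    def coords(self, rank: int) -> Tuple[int, int, int]:
        """Map a global rank to (pipe_stage, dp_index, mp_index).

        Assumed ordering: pipeline stage outermost, model-parallel innermost.
        """
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} out of range")
        stage, rest = divmod(rank, self.dp * self.mp)
        dp_idx, mp_idx = divmod(rest, self.mp)
        return stage, dp_idx, mp_idx


layout = ParallelLayout(world_size=16, pp=2, mp=1)
print(layout.dp)  # 8
print([r for r in range(16) if layout.coords(r)[0] == 0])  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Under this assumed ordering, ranks 0 to 7 hold the first pipeline stage and ranks 8 to 15 the second, each stage replicated 8 ways for data parallelism.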
Created on February 27 | Last edited on February 27

Section 1


[Charts: three panels plotted against Step (0 to about 5k); panel titles and y-axis labels were not preserved.]
Run set: 16


