New training script, pp=2, mp=1, regular Adam
16 GPUs (wandb reports 8, but ranks run from 0 to 15), nhidden=1024, num_layers=24, sparsity on. Unusually low loss.
Created on February 27 | Last edited on February 27
Section 1
Run set: 16