1.8B vs. 1.1B trainable parameters
Half the effective batch size, because 8 nodes are used
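A rough sketch of why node count changes the effective batch size, assuming pure data parallelism with a fixed micro batch size per GPU. The micro batch size of 16 is taken from `mbs16` in the run name; 8 GPUs per node, no gradient accumulation, and the 16-node comparison point are assumptions, not values stated in these notes.

```python
def effective_batch_size(micro_batch: int, nodes: int,
                         gpus_per_node: int = 8, grad_accum: int = 1) -> int:
    """Samples per optimizer step under pure data parallelism.

    gpus_per_node=8 and grad_accum=1 are assumed defaults for illustration.
    """
    return micro_batch * nodes * gpus_per_node * grad_accum


# mbs=16 comes from the run name; the node counts compared are illustrative.
batch_8_nodes = effective_batch_size(micro_batch=16, nodes=8)
batch_16_nodes = effective_batch_size(micro_batch=16, nodes=16)  # hypothetical baseline

print(batch_8_nodes, batch_16_nodes)
```

With everything else held fixed, halving the number of nodes halves the number of data-parallel ranks and therefore the samples consumed per optimizer step.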
Section 1
[Line charts: train_step_timing (s); reduced_train_loss]
Run: crossbmg4egc_lhmain3cross_oci_FC-GPT_llama_tiny_canaryset_b6s4kf-ASR-AST_lr1e-4wd1e-3_CosineAnnealing_warmup2000_minlr1e-6_gbs1024_mbs16_ep200
467
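The run name above encodes several hyperparameters. A minimal sketch of pulling them out with regular expressions follows; the helper `parse_run_name` is hypothetical, and reading `wd` as weight decay and `ep` as epochs is an assumption based on common naming, not something these notes confirm.

```python
import re

RUN_NAME = (
    "crossbmg4egc_lhmain3cross_oci_FC-GPT_llama_tiny_canaryset_b6s4kf-ASR-AST_"
    "lr1e-4wd1e-3_CosineAnnealing_warmup2000_minlr1e-6_gbs1024_mbs16_ep200"
)

# Matches values such as 1e-4 or 2.5e-3.
SCI = r"(\d+(?:\.\d+)?e-?\d+)"

# Hypothetical field names; each pattern captures the value after its tag.
PATTERNS = {
    "lr": (r"(?<![A-Za-z])lr" + SCI, float),   # lookbehind skips the "lr" in "minlr"
    "wd": (r"wd" + SCI, float),                # assumed: weight decay
    "min_lr": (r"minlr" + SCI, float),
    "warmup_steps": (r"warmup(\d+)", int),
    "global_batch_size": (r"gbs(\d+)", int),
    "micro_batch_size": (r"mbs(\d+)", int),
    "epochs": (r"_ep(\d+)", int),              # assumed: epochs
}


def parse_run_name(name: str) -> dict:
    """Return the hyperparameters found in a run name; missing tags are skipped."""
    parsed = {}
    for key, (pattern, cast) in PATTERNS.items():
        match = re.search(pattern, name)
        if match:
            parsed[key] = cast(match.group(1))
    return parsed


config = parse_run_name(RUN_NAME)
print(config)
```

Parsing the name this way makes it easier to compare runs in a table rather than reading the tags by eye.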