w/ and w/o LLM-based canary

Created on January 25 | Last edited on January 27
1.8B vs. 1.1B trainable params
Half the effective batch size, because 8 nodes were used
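
The batch-size note can be checked with a little arithmetic. The sketch below assumes the usual data-parallel relation (effective batch = micro-batch per GPU x GPUs per node x nodes x gradient-accumulation steps). The function name, the 8 GPUs per node, and the 16-node comparison point are illustrative assumptions, not values stated in this report; the micro-batch of 16 is taken from `mbs16` in the run name.

```python
def effective_batch_size(micro_batch: int, gpus_per_node: int,
                         nodes: int, grad_accum: int = 1) -> int:
    """Samples consumed per optimizer step under plain data parallelism."""
    return micro_batch * gpus_per_node * nodes * grad_accum


# Assumed values for illustration only: 8 GPUs per node, no accumulation.
on_8_nodes = effective_batch_size(micro_batch=16, gpus_per_node=8, nodes=8)
on_16_nodes = effective_batch_size(micro_batch=16, gpus_per_node=8, nodes=16)

# With everything else fixed, halving the node count halves the batch.
print(on_8_nodes, on_16_nodes)
```

If the micro-batch and accumulation steps are held fixed, the effective batch size scales linearly with the node count, which is why a run on half the nodes sees half the batch per step.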

Section 1


[Line chart panels; one plots train_loss. x-axis: Time (hours), 5 to 30. y-axis ranges: 0.4 to 1.4 and 1 to 6.]
Run: crossbmg4egc_lhmain3cross_oci_FC-GPT_llama_tiny_canaryset_b6s4kf-ASR-AST_lr1e-4wd1e-3_CosineAnnealing_warmup2000_minlr1e-6_gbs1024_mbs16_ep200
467
Run set 2
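
The run name encodes the learning-rate settings (`lr1e-4`, `CosineAnnealing`, `warmup2000`, `minlr1e-6`). As a rough sketch of what such a schedule typically looks like, the function below uses a linear warmup followed by cosine decay. The linear warmup shape, the function name `lr_at`, and the `total_steps` value are assumptions for illustration; the report does not state them, and the training framework's actual scheduler may differ.

```python
import math


def lr_at(step: int, total_steps: int, peak_lr: float = 1e-4,
          min_lr: float = 1e-6, warmup_steps: int = 2000) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to min_lr."""
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must exceed warmup_steps")
    if step < warmup_steps:
        # Assumed linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# total_steps=10_000 is a made-up horizon, used only to show the shape.
for s in (0, 1000, 2000, 6000, 10_000):
    print(s, lr_at(s, total_steps=10_000))
```

The rate peaks at the end of warmup and reaches the minimum at the final step; with a different `total_steps` the decay phase stretches or shrinks accordingly.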