With and without LLM-based Canary

Created on January 25 | Last edited on January 27

Trainable params: 1.8 B vs. 1.1 B
Half the effective batch size, because 8 nodes were used
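
The batch-size note above can be illustrated with a minimal sketch, assuming pure data parallelism, where the effective batch size is the micro batch size times the number of GPUs times the gradient accumulation steps. The micro batch size of 16 comes from the run name (mbs16). The figure of 8 GPUs per node, the 16-node comparison, and the function name effective_batch_size are assumptions made for illustration and are not stated in this report.

```python
def effective_batch_size(micro_batch_size: int,
                         num_nodes: int,
                         gpus_per_node: int,
                         grad_accum_steps: int = 1) -> int:
    """Effective (global) batch size under pure data parallelism.

    Assumes every GPU is one data-parallel rank (no tensor or
    pipeline parallelism), which is an assumption of this sketch.
    """
    return micro_batch_size * num_nodes * gpus_per_node * grad_accum_steps


# mbs16 is taken from the run name; 8 GPUs per node and the 16-node
# baseline are hypothetical values chosen only for illustration.
eight_nodes = effective_batch_size(micro_batch_size=16, num_nodes=8, gpus_per_node=8)
sixteen_nodes = effective_batch_size(micro_batch_size=16, num_nodes=16, gpus_per_node=8)

print(eight_nodes, sixteen_nodes, eight_nodes / sixteen_nodes)
```

With the micro batch size and accumulation steps held fixed, halving the node count halves the effective batch size, which is the relationship the note describes.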
Section 1
train_loss
(No data shown: none of the selected runs logged train_loss.)
train_step_timing in s
[Line chart. x-axis: Time (hours), ticks 5 to 30; y-axis: step time in seconds, ticks 0.4 to 1.4.]
reduced_train_loss
[Line chart. x-axis: Time (hours), ticks 5 to 30; y-axis: ticks 1 to 6.]
Run: crossbmg4egc_lhmain3cross_oci_FC-GPT_llama_tiny_canaryset_b6s4kf-ASR-AST_lr1e-4wd1e-3_CosineAnnealing_warmup2000_minlr1e-6_gbs1024_mbs16_ep200467
Run set 2