pretraining efforts

Created on March 27|Last edited on March 27

Across a variety of configurations, there is no phasic difference in the achieved test loss. Val loss is best for smaller ratios, which suggests that some trials are harder. OTOH, val saturates before eval, so we might be overfitting?

So the ball is in adaptation's court, if we really believe in pretraining.
Section 1
[Plot: eval_loss vs. trainer/global_step]
[Plot: eval_loss vs. epoch]
[Plot: val_loss, val_infill_loss vs. trainer/global_step]
Run set: 5 runs
