pretraining efforts
Created on March 27 | Last edited on March 27
Across a variety of configurations, there is no phasic difference in the achieved test loss. Val loss is best for smaller ratios, which suggests that some harder trials exist. On the other hand, val saturates before eval, so we might be overfitting?
So the ball is in adaptation's court, if we really believe in pretraining.
Section 1
Run set (5)