916 Tootsie Deeper Spoonbill
Created on April 25|Last edited on May 12
Building on the partial success of Hypnotic Spoonbill, we eventually determined that the low LR was interacting weirdly with the very large norms of the LM head, which led to the increasing loss. We decided to add z-loss (=1e-4) for the final phase to see if that helps. It does! A quick test (focused-spoonbill; purple) seemed to avoid the loss increase, so we ran a new version of spoonbill (deeper-spoonbill-2; lime green), and that fixed it.
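For readers unfamiliar with z-loss, here is a minimal sketch of the idea in plain Python. It assumes the common formulation, an auxiliary penalty of weight * log(Z)^2 added to the cross-entropy, where Z is the softmax normalizer, and it assumes that 1e-4 above is that weight. The function names are hypothetical and this is not the training code used for these runs.

```python
import math


def log_sum_exp(logits):
    """Numerically stable log of the softmax normalizer, log(Z)."""
    m = max(logits)
    return m + math.log(sum(math.exp(x - m) for x in logits))


def cross_entropy_with_z_loss(logits, target, z_loss_weight=1e-4):
    """Cross-entropy for one token plus a z-loss penalty.

    Assumed formulation: z_loss = z_loss_weight * log(Z)**2.
    Returns (total, cross_entropy, z_loss).
    """
    log_z = log_sum_exp(logits)
    cross_entropy = log_z - logits[target]
    z_loss = z_loss_weight * log_z ** 2
    return cross_entropy + z_loss, cross_entropy, z_loss


# Shifting every logit by a constant leaves the cross-entropy unchanged,
# but the z-loss grows, so the penalty pulls log(Z) back toward zero.
small = cross_entropy_with_z_loss([2.0, 1.0, 0.0], target=0)
large = cross_entropy_with_z_loss([102.0, 101.0, 100.0], target=0)
print("cross-entropy:", small[1], large[1])
print("z-loss:       ", small[2], large[2])
```

Because the cross-entropy alone is indifferent to a uniform shift of the logits, nothing else stops the output layer from drifting to large values; the z-loss term is one way to penalize that drift.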
Training Results
Adding z-loss fixes the training trajectory (purple and lime green):
This set of panels contains runs from a private project, which cannot be shown in this report
Analysis of Norms
You can see the lm_head going haywire here. Gray (spoonbill-norms-2) and orange (spoonbill-2) are the same run, except that gray has norm tracking turned on. Magenta is the same again, but with z-loss. As you can see, z-loss fixed the lm_head's exploding gradient.
Interestingly, the gradient only exploded once the LR got low enough, and the loss increase came quite some time after the gradient exploded.
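As a rough illustration of what norm tracking involves, the sketch below computes the L2 norm of each parameter and of its gradient so they can be logged per step. The metric names and the dict-of-nested-lists layout are assumptions made for the example, not the logging code behind these panels.

```python
import math


def l2_norm(matrix):
    """L2 (Frobenius) norm of a matrix given as a list of rows."""
    return math.sqrt(sum(v * v for row in matrix for v in row))


def norm_metrics(params, grads):
    """Per-parameter weight and gradient norms, keyed by hypothetical metric names."""
    metrics = {}
    for name, weights in params.items():
        metrics[f"params/norm/{name}"] = l2_norm(weights)
        metrics[f"grad/norm/{name}"] = l2_norm(grads[name])
    return metrics


# Toy values only; a real lm_head is a (vocab x hidden) matrix.
params = {"lm_head": [[3.0, 4.0], [0.0, 0.0]]}
grads = {"lm_head": [[0.6, 0.8], [0.0, 0.0]]}
for key, value in norm_metrics(params, grads).items():
    print(key, value)
```

Logging the two norms separately per parameter is what makes it possible to see that the gradient blew up well before the loss moved.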