Tootsie 8B Main Phase ("original-ocelot")
See https://github.com/stanford-crfm/marin/issues/600 for the narrative.
Big Idea:
- (gray run) Train the core Tootsie DCLM mix to ≈2.76T tokens using WSD-S and a 4M-token batch size.
- (red run) Afterwards, switch to a 12M-token batch size (on larger hardware) with plain WSD (no decay cycles), using an exponential moving average (EMA) of the weights as our proxy for model quality (see the sketch after this list).
Note that we skipped the cooldown on this run and only use the checkpoint at roughly 3.75e12 tokens.
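To make the schedule and the EMA proxy concrete, here is a minimal sketch in JAX. This is not the actual Marin/Levanter code: the function names, hyperparameters, and the EMA decay value are illustrative assumptions. WSD-S additionally interleaves short decay/rewarmup cycles during the stable phase, which the sketch omits; with no cooldown (decay_frac=0), the EMA of the weights stands in for a cooled-down checkpoint.

```python
# Minimal sketch (illustrative, not the Marin/Levanter implementation) of the
# two ingredients above: a WSD (warmup-stable-decay) learning-rate schedule
# and an exponential moving average (EMA) of the weights used as an eval proxy.
import jax
import jax.numpy as jnp


def wsd_schedule(step, peak_lr=1e-3, warmup_steps=1_000,
                 total_steps=100_000, decay_frac=0.0):
    """WSD: linear warmup, long stable phase, optional linear decay at the end.

    With decay_frac=0.0 there is no cooldown, as in the red run, where the
    EMA of the weights substitutes for a cooled-down checkpoint.
    """
    decay_steps = int(total_steps * decay_frac)
    stable_end = total_steps - decay_steps
    warmup = peak_lr * step / warmup_steps
    decay = peak_lr * (total_steps - step) / max(decay_steps, 1)
    return jnp.where(step < warmup_steps, warmup,
                     jnp.where(step < stable_end, peak_lr, decay))


def ema_update(ema_params, params, beta=0.999):
    """One EMA step over a pytree of weights: ema <- beta*ema + (1-beta)*params."""
    return jax.tree_util.tree_map(
        lambda e, p: beta * e + (1.0 - beta) * p, ema_params, params)
```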
Lineage Runs
(Run panels live in a private W&B project and are not visible in this report.)