Tootsie 8B dessert v1 ("zircon-badger")
See https://github.com/stanford-crfm/marin/issues/600 for the full narrative.
Big Idea:
- Train the core Tootsie DCLM mix to 3.7T tokens
- Cool down on the Dolmino HQ data (without synthetic math or Flan) to 4.8T tokens
- Add Flan and synthetic math (while maintaining the rest of the mix) for another 200B tokens at a low LR (sketched below)
Caveat: Flan was oversampled, and many of the synthetic math datasets were inadvertently excluded.
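To make the schedule concrete, here is a minimal Python sketch of the three phases as a cumulative token-budget table. The phase names, mix labels, the 5.0T final budget, and the helper function are illustrative assumptions, not the actual Marin training config; only the 3.7T / 4.8T / +200B milestones come from the plan above.

```python
# Hypothetical sketch of the three-phase token schedule described above.
# Phase names, mix labels, and the lookup helper are assumptions for
# illustration; they do not mirror the real Marin configuration.

PHASES = [
    {
        "name": "core",
        "mix": "DCLM baseline mix",
        "end_tokens": 3.7e12,  # core mix trains to 3.7T tokens
    },
    {
        "name": "cooldown",
        "mix": "Dolmino HQ (no synthetic math, no Flan)",
        "end_tokens": 4.8e12,  # cooldown runs to 4.8T tokens
    },
    {
        "name": "dessert",
        "mix": "Dolmino HQ + Flan + synthetic math, low LR",
        "end_tokens": 5.0e12,  # ~200B additional tokens (assumed endpoint)
    },
]


def phase_for(token_count: float) -> dict:
    """Return the phase whose cumulative token budget covers token_count."""
    for phase in PHASES:
        if token_count < phase["end_tokens"]:
            return phase
    return PHASES[-1]


if __name__ == "__main__":
    print(phase_for(4.0e12)["name"])  # -> "cooldown"
```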
Lineage Runs
(The lineage run panels come from a private project and cannot be shown in this report.)