Tootsie 8B dessert v1 ("zircon-badger")
See https://github.com/stanford-crfm/marin/issues/600 for narrative
Created on March 12 | Last edited on May 12
Big Idea:
- Core Tootsie DCLM mix to 3.7T tokens
- Cooldown on Dolmino HQ data (without synthetic math or Flan) to 4.8T tokens
- Add Flan and synthetic math (while maintaining the rest of the mix) for another 200B tokens at low LR (sketched below)
Note: Flan ended up oversampled, and many of the synthetic math datasets were inadvertently excluded.
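A minimal sketch of the three-phase schedule above, written as a plain Python structure. The token budgets follow the bullets; the dataset names, mixture weights, and LR labels are illustrative assumptions, not the actual Marin/Levanter config for this run.

```python
# Hypothetical sketch of the zircon-badger token/data schedule.
# Mixture weights and LR values are placeholders, not the real config.
PHASES = [
    {
        "name": "core",                  # main pretraining on the DCLM mix
        "end_tokens": 3.7e12,            # run to ~3.7T tokens
        "mixture": {"dclm": 1.0},
        "lr": "main schedule",
    },
    {
        "name": "cooldown",              # cooldown on Dolmino HQ data
        "end_tokens": 4.8e12,            # ... to ~4.8T tokens
        "mixture": {"dolmino_hq": 1.0},  # no synthetic math, no Flan
        "lr": "decaying",
    },
    {
        "name": "dessert",               # final low-LR top-up
        "end_tokens": 5.0e12,            # roughly +200B more tokens
        "mixture": {"dolmino_hq": 0.8, "flan": 0.1, "synth_math": 0.1},  # placeholder weights
        "lr": "low, constant",
    },
]
```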
Lineage Runs
Run set: 4 runs