Tootsie 8B dessert v2 ("fiery-hippo")
See https://github.com/stanford-crfm/marin/issues/600 for narrative
Created on March 12 | Last edited on May 12
Big Idea:
- Core Tootsie DCLM mix to 3.7T tokens
- Cooldown on Dolmino HQ data (without synth math or Flan) to 4.8T tokens
- Add Flan and synth math (keeping the rest of the mix) for another 200B tokens at low LR
(Fixed relative to zircon-badger)
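The three phases above imply cumulative token boundaries at 3.7T, 4.8T, and 5.0T (4.8T plus the final 200B). A minimal sketch of that arithmetic follows. The `Phase` class, the phase names, and the helper functions are hypothetical and exist only for illustration; this is not the marin configuration format, and it does not model the actual mixture weights or learning rates.

```python
from dataclasses import dataclass

B = 10**9  # one billion tokens; integers avoid float rounding


@dataclass(frozen=True)
class Phase:
    """Hypothetical record: a phase name and the cumulative token count where it ends."""
    name: str
    end_tokens: int


def build_schedule() -> list[Phase]:
    # Boundaries taken from the "Big Idea" list; names are illustrative only.
    core_end = 3_700 * B                 # core DCLM mix to 3.7T tokens
    cooldown_end = 4_800 * B             # Dolmino HQ cooldown to 4.8T tokens
    final_end = cooldown_end + 200 * B   # Flan + synth math for another 200B
    return [
        Phase("core_dclm_mix", core_end),
        Phase("dolmino_hq_cooldown", cooldown_end),
        Phase("flan_synthmath_low_lr", final_end),
    ]


def phase_lengths(schedule: list[Phase]) -> dict[str, int]:
    """Return the number of tokens spent inside each phase."""
    lengths = {}
    start = 0
    for phase in schedule:
        lengths[phase.name] = phase.end_tokens - start
        start = phase.end_tokens
    return lengths


if __name__ == "__main__":
    for name, tokens in phase_lengths(build_schedule()).items():
        print(f"{name}: {tokens / B:.0f}B tokens")
```

Under these assumptions the cooldown phase covers 1.1T tokens and the whole run ends at 5.0T tokens.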
Lineage Runs