
Tootsie 8B Main Phase ("original-ocelot")

See https://github.com/stanford-crfm/marin/issues/600 for narrative
Created on March 12 | Last edited on May 12

Big Idea:

  • (gray run) Trained the core Tootsie DCLM mix to ≈2.76T tokens using WSD-S and a 4M-token batch size.
  • (red run) Switched to a 12M-token batch size afterwards (on larger hardware) using plain WSD (no decay cycles), with the exponential moving average (EMA) of the weights as our proxy for model quality. A sketch of both the schedule and the EMA update appears below.

Note that we dropped the cooldown on this run and only use the checkpoint at roughly 3.75e12 (≈3.75T) tokens.
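
To make the two ingredients concrete, here is a minimal sketch, assuming a JAX setup, of a WSD learning-rate schedule and an EMA weight update. The function names, peak learning rate, warmup length, and decay constant are illustrative assumptions, not the hyperparameters of the actual run, and this is not the Marin/Levanter implementation.

```python
import jax
import jax.numpy as jnp

def wsd_schedule(step, peak_lr=1e-3, warmup_steps=1_000,
                 total_steps=100_000, decay_frac=0.0):
    """Warmup-Stable-Decay (WSD) learning rate.

    Linear warmup to peak_lr, then a long stable phase; with
    decay_frac=0.0 there is no final cooldown, mirroring the red
    run above, which skipped the cooldown entirely. All constants
    are placeholders.
    """
    decay_steps = int(total_steps * decay_frac)
    stable_end = total_steps - decay_steps
    warmup = peak_lr * step / warmup_steps
    # Linear decay to zero over the final decay_steps (unused when decay_frac=0).
    decay = peak_lr * jnp.maximum(total_steps - step, 0) / max(decay_steps, 1)
    return jnp.where(step < warmup_steps, warmup,
                     jnp.where(step < stable_end, peak_lr, decay))

def ema_update(ema_params, params, decay=0.999):
    """One step of an exponential moving average over model weights.

    With no cooldown, the EMA weights (not the raw weights) serve
    as the proxy for model quality.
    """
    return jax.tree_util.tree_map(
        lambda e, p: decay * e + (1.0 - decay) * p, ema_params, params)
```

At evaluation time the EMA parameters would be swapped in for the raw parameters. The WSD-S variant used in the gray run instead interleaves short decay/rewarm cycles during the stable phase, which this sketch omits.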

Lineage Runs


[Panels omitted: this set of panels contains runs from a private project and cannot be shown in this report.]


