
Tootsie 8B Main Phase ("original-ocelot")

See https://github.com/stanford-crfm/marin/issues/600 for narrative
Created on March 12 | Last edited on May 12

Big Idea:

  • (gray run) Trained the core Tootsie DCLM mix to ≈2.76T tokens using WSD-S and a 4M-token batch size.
  • (red run) Switched to a 12M-token batch size afterwards (on larger hardware) using plain WSD (no decay cycles), with the exponential moving average (EMA) of the weights as our proxy for model quality. A sketch of both the schedule and the EMA update appears below.

Note that we dropped the cooldown on this run and only use the checkpoint at roughly 3.75e12 (≈3.75T) tokens.
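
To make the two ingredients concrete, here is a minimal sketch, assuming a JAX setup, of a WSD learning-rate schedule and an EMA weight update. The function names, peak learning rate, warmup length, and decay constant are illustrative assumptions, not the hyperparameters of the actual run, and this is not the Marin/Levanter implementation.

```python
import jax
import jax.numpy as jnp

def wsd_schedule(step, peak_lr=1e-3, warmup_steps=1_000,
                 total_steps=100_000, decay_frac=0.0):
    """Warmup-Stable-Decay (WSD) learning rate.

    Linear warmup to peak_lr, then a long stable phase; with
    decay_frac=0.0 there is no final cooldown, mirroring the red
    run above, which skipped the cooldown entirely. All constants
    are placeholders.
    """
    decay_steps = int(total_steps * decay_frac)
    stable_end = total_steps - decay_steps
    warmup = peak_lr * step / warmup_steps
    # Linear decay to zero over the final decay_steps (unused when decay_frac=0).
    decay = peak_lr * jnp.maximum(total_steps - step, 0) / max(decay_steps, 1)
    return jnp.where(step < warmup_steps, warmup,
                     jnp.where(step < stable_end, peak_lr, decay))

def ema_update(ema_params, params, decay=0.999):
    """One step of an exponential moving average over model weights.

    With no cooldown, the EMA weights (not the raw weights) serve
    as the proxy for model quality.
    """
    return jax.tree_util.tree_map(
        lambda e, p: decay * e + (1.0 - decay) * p, ema_params, params)
```

At evaluation time the EMA parameters would be swapped in for the raw parameters. The WSD-S variant used in the gray run instead interleaves short decay/rewarm cycles during the stable phase, which this sketch omits.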

Lineage Runs


[Panels omitted: this set of panels contains runs from a private project and cannot be shown in this report.]


