Tootsie 8B dessert v1 ("zircon-badger")
See https://github.com/stanford-crfm/marin/issues/600 for narrative
Created on March 12 | Last edited on May 12
Big Idea:
- Core Tootsie DCLM mix to 3.7T tokens
- Cooldown on Dolmino HQ data (without synthetic math or Flan) to 4.8T tokens
- Add Flan and synthetic math (while maintaining the rest of the mix) for another 200B tokens at low LR (sketched below)
Note: Flan ended up oversampled, and many of the synthetic math datasets were inadvertently excluded.
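A minimal sketch of the three-phase schedule above, written as a plain Python structure. The token budgets follow the bullets; the dataset names, mixture weights, and LR labels are illustrative assumptions, not the actual Marin/Levanter config for this run.

```python
# Hypothetical sketch of the zircon-badger token/data schedule.
# Mixture weights and LR values are placeholders, not the real config.
PHASES = [
    {
        "name": "core",                  # main pretraining on the DCLM mix
        "end_tokens": 3.7e12,            # run to ~3.7T tokens
        "mixture": {"dclm": 1.0},
        "lr": "main schedule",
    },
    {
        "name": "cooldown",              # cooldown on Dolmino HQ data
        "end_tokens": 4.8e12,            # ... to ~4.8T tokens
        "mixture": {"dolmino_hq": 1.0},  # no synthetic math, no Flan
        "lr": "decaying",
    },
    {
        "name": "dessert",               # final low-LR top-up
        "end_tokens": 5.0e12,            # roughly +200B more tokens
        "mixture": {"dolmino_hq": 0.8, "flan": 0.1, "synth_math": 0.1},  # placeholder weights
        "lr": "low, constant",
    },
]
```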
Lineage Runs
Run set: 4 runs