
#934 Phoenix Cooldown Mix Ablation

Created on May 18 | Last edited on May 18
We've run many ablations on cooldowns, but none of them carefully controlled for the possible effects of changing the data mix mid-run. We also now have access to a large new pool of pretraining data from Nemotron, plus several other datasets we think might help.

Hypothesis or Goal


Concretely, using our annealing setup, we want to test which of the following data mixes leads to better cooldown results:

  • Our original pretraining data mix (DCLM + StarCoder)
  • Nemotron + Code
  • Nemotron + Code + Dolmino
  • Nemotron + Code + Dolmino + our other data

For the purposes of this experiment, we define "better" along a few different axes:

  • Overall Paloma loss (lower is better)
  • Tulu3_flat_llama_tokenized_as_validation loss, i.e. NLL on instruction data (lower is better)
  • MMLU accuracy (higher is better)
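One way to make "better across several axes" concrete is to record the direction of each metric and compare the mixes axis by axis, rather than collapsing everything into a single score. The sketch below is a minimal illustration that assumes the final metrics for each run have already been collected into a plain dict; the metric keys (`paloma_loss`, `tulu3_val_nll`, `mmlu_acc`) and both function names are hypothetical, not names from our eval pipeline.

```python
# Hypothetical metric keys; True means a lower value is better.
LOWER_IS_BETTER = {
    "paloma_loss": True,     # overall Paloma loss
    "tulu3_val_nll": True,   # NLL on the Tulu3 instruction validation set
    "mmlu_acc": False,       # MMLU accuracy
}


def best_mix_per_axis(results: dict[str, dict[str, float]]) -> dict[str, str]:
    """For each axis, return the name of the mix with the best value.

    `results` maps a mix name (e.g. "Nemotron + Code") to its metrics.
    Axes that no run reports are skipped.
    """
    winners = {}
    for metric, lower_better in LOWER_IS_BETTER.items():
        scored = [(mix, m[metric]) for mix, m in results.items() if metric in m]
        if not scored:
            continue
        pick = min if lower_better else max
        winners[metric] = pick(scored, key=lambda item: item[1])[0]
    return winners


def improvement_over_baseline(
    results: dict[str, dict[str, float]], baseline: str
) -> dict[str, dict[str, float]]:
    """Signed change of each non-baseline mix relative to `baseline`.

    Values are oriented so that a positive number is always an improvement:
    a drop in loss and a rise in accuracy both come out positive.
    """
    base = results[baseline]
    out = {}
    for mix, metrics in results.items():
        if mix == baseline:
            continue
        out[mix] = {}
        for metric, lower_better in LOWER_IS_BETTER.items():
            if metric in metrics and metric in base:
                delta = metrics[metric] - base[metric]
                out[mix][metric] = -delta if lower_better else delta
    return out
```

Keeping the axes separate avoids choosing arbitrary weights between, say, Paloma loss and MMLU accuracy: a mix that beats the original DCLM + StarCoder baseline on every axis is a clear improvement, while a split result suggests looking at the individual panels more closely.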



Section 1


This set of panels contains runs from a private project, which cannot be shown in this report.