#934 Phoenix Cooldown Mix Ablation
Created on April 21|Last edited on May 18
We've run several cooldown ablations, but none of them carefully controlled for the possible impact of changing the data mix mid-run. We also now have access to a large new pool of pretraining data from Nemotron, plus a bunch of other data we think might be good.
Hypothesis or Goal
Concretely, we want to use our annealing setup to test which of the following data mixes leads to better cooldown results:
- Our Original Pretraining Data Mix (DCLM + StarCoder)
- Nemotron, Code
- Nemotron, Code, Dolmino
- Nemotron, Code, Dolmino, our other data
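The last three mixes are nested: each one adds a single source to the one before it, so a difference between adjacent arms can be attributed to that source. A minimal sketch of that structure follows. The arm keys (`original`, `nemotron_code`, and so on) and the helper `added_sources` are hypothetical names chosen for illustration, and no mixing weights are included because the report does not state any.

```python
# Source labels are copied from the list above. The arm keys are hypothetical
# names, and no mixing weights are assumed because the report gives none.
COOLDOWN_ARMS = {
    "original": ["DCLM", "StarCoder"],
    "nemotron_code": ["Nemotron", "Code"],
    "nemotron_code_dolmino": ["Nemotron", "Code", "Dolmino"],
    "nemotron_code_dolmino_other": ["Nemotron", "Code", "Dolmino", "other data"],
}


def added_sources(arms, baseline_arm, order):
    """For each arm in `order`, list the sources it adds relative to the previous arm."""
    added = {}
    previous = set(arms[baseline_arm])
    for name in order:
        current = set(arms[name])
        added[name] = sorted(current - previous)
        previous = current
    return added


# Walk the nested arms in order, starting from the Nemotron + Code arm.
NESTED_ORDER = ["nemotron_code_dolmino", "nemotron_code_dolmino_other"]
print(added_sources(COOLDOWN_ARMS, "nemotron_code", NESTED_ORDER))
```

The original mix shares no source labels with the other arms, so it acts as a baseline rather than as a step in the nested sequence.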
For the purposes of this experiment, we will define "better" along three axes:
- Overall Paloma Loss
- Tulu3_flat_llama_tokenized_as_validation Loss (NLL on Instruction Data)
- MMLU Accuracy
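Because the three axes point in different directions (lower is better for both losses, higher is better for accuracy), one mix may not win on all of them. One possible way to compare runs is sketched below. The `CooldownResult` class, its field names, and both helper functions are hypothetical, and the numbers in the demo are placeholders rather than measurements from these runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CooldownResult:
    """Final metrics for one cooldown run (hypothetical container)."""
    name: str
    paloma_loss: float  # lower is better
    tulu3_nll: float    # lower is better
    mmlu_acc: float     # higher is better


def best_per_axis(results):
    """Return the name of the best run on each axis."""
    return {
        "paloma_loss": min(results, key=lambda r: r.paloma_loss).name,
        "tulu3_nll": min(results, key=lambda r: r.tulu3_nll).name,
        "mmlu_acc": max(results, key=lambda r: r.mmlu_acc).name,
    }


def dominates(a, b):
    """True if `a` is no worse than `b` on every axis and strictly better on at least one."""
    no_worse = (
        a.paloma_loss <= b.paloma_loss
        and a.tulu3_nll <= b.tulu3_nll
        and a.mmlu_acc >= b.mmlu_acc
    )
    strictly_better = (
        a.paloma_loss < b.paloma_loss
        or a.tulu3_nll < b.tulu3_nll
        or a.mmlu_acc > b.mmlu_acc
    )
    return no_worse and strictly_better


# Placeholder values for illustration only; these are NOT results from the report.
demo = [
    CooldownResult("mix_a", paloma_loss=3.0, tulu3_nll=1.2, mmlu_acc=0.40),
    CooldownResult("mix_b", paloma_loss=2.9, tulu3_nll=1.3, mmlu_acc=0.42),
]
print(best_per_axis(demo))
print(dominates(demo[1], demo[0]))
```

In the placeholder demo neither run dominates the other, which is the case where the three axes have to be weighed against each other rather than read off a single ranking.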
Section 1
This set of panels contains runs from a private project, which cannot be shown in this report