Plot: QKNorm vs None
Niklas Muennighoff. Created on June 23, last edited on August 29.
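The two runs compared in this report differ only in the `-qknorm` suffix of the run name. As a rough illustration of what QK normalization usually refers to, the sketch below normalizes the query and key vectors before the scaled dot product. It assumes RMS normalization without a learned gain, applied to a single attention head. The functions `rms_norm`, `attention_scores`, and `attention_weights` are hypothetical names. This is not the OLMoE training code, which may use a different norm, learned parameters, and batched tensors.

```python
import math


def rms_norm(x, eps=1e-6):
    """Rescale a vector to unit root-mean-square (no learned gain, for simplicity)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def softmax(scores):
    """Numerically stable softmax over a list of logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_scores(q, keys, qk_norm=False):
    """Scaled dot-product logits of one query against a list of keys.

    With qk_norm=True, the query and every key are RMS-normalized first, so each
    has Euclidean norm at most sqrt(d) and every logit satisfies |score| <= sqrt(d).
    """
    d = len(q)
    if qk_norm:
        q = rms_norm(q)
        keys = [rms_norm(k) for k in keys]
    return [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]


def attention_weights(q, keys, qk_norm=False):
    """Softmax attention weights of one query over a list of keys."""
    return softmax(attention_scores(q, keys, qk_norm))


if __name__ == "__main__":
    q = [1.0, 0.5, -0.5, 0.2]
    keys = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    big_q = [100.0 * v for v in q]  # same direction, 100x larger activations
    print("no norm, q:     ", attention_weights(q, keys))
    print("no norm, 100*q: ", attention_weights(big_q, keys))
    print("qk norm, q:     ", attention_weights(q, keys, qk_norm=True))
    print("qk norm, 100*q: ", attention_weights(big_q, keys, qk_norm=True))
```

Without normalization, scaling the query by 100 drives the attention weights toward one-hot. With `qk_norm=True` the weights are essentially unchanged, because RMS normalization removes the overall scale of the query and keys and bounds the logits.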
[Figure: eval/c4_en-validation/CrossEntropyLoss vs. Step for olmoe17-8x1b-fullshard-swiglu-wrapb-k2-init (no QKNorm) and olmoe17-8x1b-fullshard-swiglu-wrapb-k2-init-qknorm (QKNorm)]
[Figure: train/CrossEntropyLoss vs. Step for olmoe17-8x1b-fullshard-swiglu-wrapb-k2-init (no QKNorm) and olmoe17-8x1b-fullshard-swiglu-wrapb-k2-init-qknorm (QKNorm)]