Plot: QKNorm revisited
olmoe17-8x1b-final-eps-noqk uses no QK-Norm, but does use RMSNorm with learnable weights.
olmoe17-8x1b-final-eps uses non-parametric QK-Norm and RMSNorm with learnable weights.
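The report does not include code. The following is a minimal pure-Python sketch of the architectural difference between the two runs. It assumes that "non-parametric QK-Norm" means RMSNorm applied to each query and key vector with no learnable gain, and that "RMSNorm with weights" means RMSNorm with a learnable per-dimension gain. The function names and the eps default are illustrative and do not come from the OLMoE codebase.

```python
import math
from typing import List, Optional

Vector = List[float]


def rms_norm(x: Vector, weight: Optional[Vector] = None, eps: float = 1e-5) -> Vector:
    """RMSNorm: x / sqrt(mean(x^2) + eps), optionally scaled by a gain.

    weight=None is the non-parametric variant (no learnable scale);
    passing a weight vector gives the "RMSNorm with weights" variant.
    The eps default here is an illustrative assumption.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    normed = [v / rms for v in x]
    if weight is None:
        return normed
    return [g * v for g, v in zip(weight, normed)]


def attention_logit(q: Vector, k: Vector, qk_norm: bool, eps: float = 1e-5) -> float:
    """Scaled dot-product attention logit for a single query/key pair."""
    if qk_norm:
        # Non-parametric QK-Norm (assumed): normalize q and k with no learned gain.
        q = rms_norm(q, eps=eps)
        k = rms_norm(k, eps=eps)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))
```

For example, with q = k = [10.0] * 4, attention_logit returns 200.0 without QK-Norm and about 2.0 with it. After normalization, each vector's squared norm is at most d, so by Cauchy-Schwarz the logit is bounded by sqrt(d) no matter how large the raw activations become.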
Niklas Muennighoff
Created on July 6 | Last edited on August 29
[Plot: eval/c4_en-validation/CrossEntropyLoss vs. Step (runs: olmoe17-8x1b-final-eps-noqk, olmoe17-8x1b-final-eps)]
[Plot: train/CrossEntropyLoss vs. Step (runs: olmoe17-8x1b-final-eps-noqk, olmoe17-8x1b-final-eps)]