Plot: QKNorm revisited
olmoe17-8x1b-final-eps-noqk uses no QK-Norm, but uses RMSNorm with weights.
olmoe17-8x1b-final-eps uses non-parametric QK-Norm and RMSNorm with weights.
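
To make the difference between the two runs concrete, here is a minimal pure-Python sketch. It is not the training code behind these runs. It assumes that "non-parametric QK-Norm" means RMS-normalizing each query and key vector over the head dimension with no learnable gain, and that "RMSNorm with weights" means the same normalization followed by a per-dimension gain. The function names and the eps value are hypothetical.

```python
import math

def rms_norm(x, weight=None, eps=1e-5):
    """RMS-normalize a vector. weight=None is the non-parametric variant (assumed)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    normed = [v / rms for v in x]
    if weight is None:
        return normed
    # Parametric variant: multiply by a per-dimension gain.
    return [w * v for w, v in zip(weight, normed)]

def attention_logit(q, k, qk_norm=False):
    """Scaled dot-product logit for a single query/key pair."""
    if qk_norm:
        # Assumed QK-Norm: normalize q and k before the dot product, no gain.
        q, k = rms_norm(q), rms_norm(k)
    dot = sum(a * b for a, b in zip(q, k))
    return dot / math.sqrt(len(q))

q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]
q_big = [10.0 * v for v in q]

# Without QK-Norm the logit grows with the query's magnitude.
print(attention_logit(q, k), attention_logit(q_big, k))
# With QK-Norm the logit is nearly unchanged when the query is rescaled.
print(attention_logit(q, k, qk_norm=True), attention_logit(q_big, k, qk_norm=True))
```

Under these assumptions, the QK-Norm variant bounds the attention logits regardless of how large the query and key activations become, which is the usual motivation for comparing runs with and without it.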
Created on July 6 | Last edited on August 29