MoE vs Dense
Lucas Nestler
Created on November 1 | Last edited on November 1
[Figure: Loss/Median64 vs Tokens. x-axis: Speed/Tokens Seen, ticks from 20G to 100G; y-axis: Loss/Median64, ticks from 0.7 to 0.95.]
Group: tied-moe-modulo (run set: 13)