620 Int8 Training
Jason Wang
Created on February 23 | Last edited on February 28
Integrate the int8 training feature from the AQT library into levanter/haliax.
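For context, the core trick in int8 training is to quantize both operands of a matmul to int8 on the fly, run the dot product with int32 accumulation (which the TPU's int8 units can accelerate), and rescale the result back to the working dtype. Below is a minimal JAX sketch of that pattern; it is illustrative only, not AQT's actual implementation, and the helper names are hypothetical:

```python
import jax
import jax.numpy as jnp

def quantize_int8(x: jax.Array):
    """Symmetric per-tensor quantization: map [-amax, amax] onto [-127, 127]."""
    amax = jnp.max(jnp.abs(x)) + 1e-6  # dynamic range; eps avoids divide-by-zero
    scale = amax / 127.0
    q = jnp.clip(jnp.round(x / scale), -127, 127).astype(jnp.int8)
    return q, scale

def int8_matmul(lhs: jax.Array, rhs: jax.Array) -> jax.Array:
    """Quantize -> int8 dot with int32 accumulation -> dequantize."""
    lhs_q, lhs_scale = quantize_int8(lhs)
    rhs_q, rhs_scale = quantize_int8(rhs)
    acc = jax.lax.dot(lhs_q, rhs_q, preferred_element_type=jnp.int32)
    return acc.astype(jnp.float32) * (lhs_scale * rhs_scale)

x = jax.random.normal(jax.random.PRNGKey(0), (128, 256))
w = jax.random.normal(jax.random.PRNGKey(1), (256, 512))
print(jnp.max(jnp.abs(int8_matmul(x, w) - x @ w)))  # small quantization error
```

AQT layers more machinery on top of this (calibration, optional stochastic rounding, and custom gradients so the backward matmuls can also run in int8), which is what the configs compared below control.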
v5e-256 (eu-west4)
1.4B
The int8 training loss matches the baseline (32/16 mixed-precision training) loss, with <1% difference.
The naive default int8 config (magenta) performs poorly (40% MFU).
MaxText's default int8 config (green) outperforms the baseline (60% vs. 57% MFU); see the config sketch below.
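One plausible reading of the gap between the two configs is how much of the backward pass gets quantized. A hedged sketch of selecting configs through AQT's Python API follows; the module path and signatures are from `aqt.jax.v2` as I recall them and may differ between AQT versions:

```python
# Assumed API: aqt.jax.v2.config; verify names against the AQT release you use.
from aqt.jax.v2 import config as aqt_config

# "Naive" config: int8 for the forward and both backward matmuls alike
# (the slow, 40% MFU magenta run above).
naive_cfg = aqt_config.fully_quantized(fwd_bits=8, bwd_bits=8)

# MaxText-style config_v4: int8 forward and dlhs gradient matmul, drhs left
# unquantized (the green run that beat the baseline).
maxtext_cfg = aqt_config.config_v4(fwd_bits=8, dlhs_bits=8, drhs_bits=None)
```

The chosen config is then wired in as the model's dot_general, so every matmul in the transformer picks up the quantized implementation.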
[Chart: train/loss]
[Chart: throughput/tokens_per_second]
[Chart: throughput/mfu]
8B
The int8 training loss again matches the baseline loss, with <1% difference.
Int8 (magenta) significantly outperforms the baseline (72% vs. 61% MFU).
Throughput gets a 17.4% bump!
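A quick consistency check on those two numbers (my arithmetic, not from the report): for a fixed model and hardware, MFU is proportional to tokens per second, so the MFU ratio should match the throughput bump:

$$\frac{72}{61} \approx 1.18$$

i.e. roughly an 18% speedup, which lines up with the measured 17.4% given that the MFU values are rounded to whole percents.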
Multislice (2x v5e-256)
As expected, int8 delivers a 14.7% increase in throughput.
v4-256 (us-central2)
Int8 training doesn't work on the v4-256.