
620 Int8 Training

Integrate the int8 training feature from the AQT library into levanter/haliax.
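
To make the idea concrete, here is a minimal pure-JAX sketch of int8 matmul quantization: quantize both operands to int8 with per-tensor scales, multiply with int32 accumulation, then rescale. This is illustrative only; the actual integration uses AQT's quantized dot_general, which also handles calibration, stochastic rounding, and the backward pass.

```python
import jax
import jax.numpy as jnp

def int8_quantize(x: jax.Array):
    """Symmetric per-tensor int8 quantization; returns int8 values and a scale."""
    scale = jnp.max(jnp.abs(x)) / 127.0
    q = jnp.clip(jnp.round(x / scale), -127, 127).astype(jnp.int8)
    return q, scale

def int8_matmul(a: jax.Array, b: jax.Array) -> jax.Array:
    """Quantize both operands, multiply in int8 with int32 accumulation, rescale."""
    qa, sa = int8_quantize(a)
    qb, sb = int8_quantize(b)
    acc = jax.lax.dot(qa, qb, preferred_element_type=jnp.int32)
    return acc.astype(jnp.float32) * (sa * sb)

# Example: compare against the full-precision matmul.
key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
a = jax.random.normal(key_a, (128, 256), dtype=jnp.float32)
b = jax.random.normal(key_b, (256, 512), dtype=jnp.float32)
print(jnp.max(jnp.abs(int8_matmul(a, b) - a @ b)))  # small quantization error
```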

v5e-256 (eu-west4)

1.4B

  • The int8 training loss matches the baseline (32/16-bit mixed-precision training) to within 1%.
  • The naive default int8 config (magenta) performs poorly (40% MFU).
  • MaxText's default int8 config (green) outperforms the baseline (60% vs. 57% MFU); see the config sketch below.
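
The gap between the two int8 configs comes down to how the AQT quantized dot_general is configured. The sketch below is hedged: the exact entry points and the precise settings behind the "naive" and MaxText configs are assumptions based on the public google/aqt repo, not confirmed from these runs.

```python
# A hedged sketch of two ways to configure AQT's quantized dot_general.
# Names follow the public aqt.jax.v2 API; treat the exact functions and
# keyword arguments as assumptions that may differ by AQT version.
from aqt.jax.v2 import config as aqt_config

# Quantize all three matmuls of each layer (forward, grad-wrt-activations,
# grad-wrt-weights) to int8.
all_int8_cfg = aqt_config.config_v4(fwd_bits=8, dlhs_bits=8, drhs_bits=8)

# MaxText-style: leave the grad-wrt-weights (drhs) matmul unquantized,
# a common way to recover throughput without hurting training quality.
maxtext_like_cfg = aqt_config.config_v4(fwd_bits=8, dlhs_bits=8, drhs_bits=None)

# Either config is then injected as the dot_general used by the model's
# linear layers (e.g. via AQT's Flax wrappers or haliax's Linear plumbing).
```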

[Line charts: train/loss, throughput/tokens_per_second, throughput/mfu]


8B

  • The int8 training loss matches the baseline to within 1%.
  • Int8 (magenta) significantly outperforms the baseline (72% vs. 61% MFU).
  • Throughput gets a 17.4% bump (see the quick check below).
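
As a quick consistency check on these numbers (assuming MFU is reported against the same hardware peak for both runs, so it scales directly with tokens/sec):

```python
# Sanity check on the 8B numbers: if both MFU figures use the same hardware
# peak, the MFU ratio should match the tokens/sec ratio.
baseline_mfu, int8_mfu = 0.61, 0.72
implied_gain = int8_mfu / baseline_mfu - 1.0
print(f"gain implied by MFU: {implied_gain:.1%}")  # ~18.0%
# Reported throughput bump: 17.4%; the small gap is consistent with the MFU
# percentages being rounded.
```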




Multislice (2x v5e-256)

  • As expected, the gains carry over to multislice: a 14.7% increase in throughput.



v4-256 (us-central2)

  • Int8 training does not work on v4-256.