
Benchmark on Splash Attention

Created on May 24 | Last edited on May 25
Comparing Splash Attention against Flash Attention:
  • Training throughput improves from 2.8M tokens/sec to 3.2M tokens/sec, roughly a 14% increase.
  • The training and eval loss curves of the two runs also match closely. Numerically, the Splash Attention run is even slightly better.
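As a quick sanity check on the throughput numbers, the arithmetic below converts them into a relative speedup and the matching reduction in wall-clock time for a fixed amount of training data. Only the two throughput values come from the runs; the 1B-token budget is an arbitrary illustrative assumption.

```python
# Throughput values reported for the two runs, in tokens per second.
flash_tokens_per_sec = 2.8e6
splash_tokens_per_sec = 3.2e6

# Relative throughput gain of Splash Attention over Flash Attention.
speedup = splash_tokens_per_sec / flash_tokens_per_sec
throughput_gain_pct = (speedup - 1.0) * 100.0

# Wall-clock time for an illustrative (assumed) budget of 1B tokens.
token_budget = 1e9
flash_seconds = token_budget / flash_tokens_per_sec
splash_seconds = token_budget / splash_tokens_per_sec

# Fraction of training time saved for the same number of tokens.
time_saved_pct = (1.0 - flash_tokens_per_sec / splash_tokens_per_sec) * 100.0

print(f"speedup: {speedup:.3f}x (+{throughput_gain_pct:.1f}% tokens/sec)")
print(f"1B tokens: {flash_seconds:.1f}s vs {splash_seconds:.1f}s "
      f"({time_saved_pct:.1f}% less time)")
```

Note that a ~14.3% throughput gain corresponds to a 12.5% cut in training time for the same token count, since time scales with the inverse of throughput.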

Section 1


[Line charts: train/loss and throughput/tokens_per_second]
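Closely matching loss curves are what one would expect if both kernels compute exact (not approximate) softmax attention and differ only in how the work is tiled. The NumPy sketch below is not either kernel's implementation; it only illustrates, under that assumption, the blockwise online-softmax idea that Flash-style kernels build on, and checks that it reproduces naive attention up to floating-point round-off. Shapes, block size, and function names are hypothetical choices for illustration.

```python
import numpy as np

def naive_attention(q, k, v):
    # Standard softmax(q k^T / sqrt(d)) v, materializing the full score matrix.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def blockwise_attention(q, k, v, block_size=16):
    # Visits keys/values one block at a time, keeping per query row a running
    # max (m), running softmax denominator (l) and unnormalized output (acc).
    scale = 1.0 / np.sqrt(q.shape[-1])
    n_q = q.shape[0]
    m = np.full((n_q, 1), -np.inf)
    l = np.zeros((n_q, 1))
    acc = np.zeros((n_q, v.shape[-1]))
    for start in range(0, k.shape[0], block_size):
        k_blk = k[start:start + block_size]
        v_blk = v[start:start + block_size]
        s = q @ k_blk.T * scale
        m_new = np.maximum(m, s.max(axis=-1, keepdims=True))
        # Rescale earlier partial sums to the new running max (0 on first block).
        correction = np.exp(m - m_new)
        p = np.exp(s - m_new)
        l = l * correction + p.sum(axis=-1, keepdims=True)
        acc = acc * correction + p @ v_blk
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
q = rng.standard_normal((64, 32))
k = rng.standard_normal((128, 32))
v = rng.standard_normal((128, 32))
max_abs_diff = np.abs(naive_attention(q, k, v) - blockwise_attention(q, k, v)).max()
print(f"max |naive - blockwise| = {max_abs_diff:.2e}")
```

The two results should agree to round-off level, because the blockwise version only reorders the same sums. In real training, different summation orders and precisions between kernels can still introduce small numerical differences between runs.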