
Benchmark on Splash Attention

Created on May 24|Last edited on May 25
  • Comparing Splash Attention and Flash Attention:
  • Training throughput improves from 2.8M tokens/sec to 3.2M tokens/sec (roughly a 14% speedup).
  • The training and eval loss curves of the two runs also match closely; numerically, Splash Attention even appears slightly better.
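The close match of the loss curves is expected: Flash and Splash Attention compute the same scaled dot-product attention and differ only in how the computation is tiled and scheduled on the accelerator. As a reference, here is a minimal NumPy sketch of the attention both kernels implement (shapes, names, and the single-head layout are illustrative, not taken from the benchmark runs):

```python
import numpy as np

def attention(q, k, v, causal=True):
    """Reference scaled dot-product attention (single head).

    Flash and Splash Attention produce this same result; they differ
    only in tiling and memory access, not in the math.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)              # (T, T) attention logits
    if causal:
        T = scores.shape[0]
        future = np.triu(np.ones((T, T), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)  # mask future positions
    # numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
T, d = 8, 4
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
out = attention(q, k, v)
```

With a causal mask, the first position can only attend to itself, so `out[0]` equals `v[0]` exactly; that is a handy sanity check when swapping attention kernels.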

Section 1


This set of panels contains runs from a private project, which cannot be shown in this report



