Benchmark on Splash Attention
Created on May 24 | Last edited on May 25
- Comparing Splash Attention and Flash Attention:
- Training throughput improves from 2.8M tokens/sec to 3.2M tokens/sec (a ~14% speedup).
- The training and eval loss curves of the two runs also match closely; numerically, the Splash Attention run even appears slightly better.
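Matching loss curves are expected because both kernels compute the same mathematical operation, causal softmax attention, and differ only in how they tile and fuse the computation on the accelerator. As a sanity reference (not the actual Splash or Flash kernel, and function names here are hypothetical), a minimal NumPy sketch of that operation looks like:

```python
import numpy as np

def reference_attention(q, k, v, causal=True):
    """Naive softmax attention; the baseline output an optimized kernel must match."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)  # (..., seq, seq)
    if causal:
        seq = scores.shape[-1]
        future = np.triu(np.ones((seq, seq), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)  # mask out future positions
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 8))  # (batch, seq, head_dim)
k = rng.standard_normal((2, 4, 8))
v = rng.standard_normal((2, 4, 8))
out = reference_attention(q, k, v)
print(out.shape)  # (2, 4, 8)
```

Comparing an optimized kernel's output against a reference like this (e.g. with `np.allclose` at a loose tolerance) is a quick way to confirm that a throughput win does not come from computing something different.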
Section 1
This set of panels contains runs from a private project, which cannot be shown in this report