Experiments: LSH-attention evaluation speed
Motivation
Claim: The number of hashes in a model with LSH attention can be increased at evaluation time to produce more accurate results. Evaluation speed, measured in seconds per step, grows with the number of hashes but not with increased sequence length.
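To make the first part of the claim concrete, here is a minimal sketch, assuming a hypothetical `LSHAttention` module (this is not the reformer_fastai API): the number of hash rounds is a runtime hyperparameter with no learned weights attached to it, so it can be raised after training for more accurate (but slower) evaluation.

```python
# Hedged sketch: `LSHAttention` and `set_n_hashes` are hypothetical names used
# for illustration only, not the reformer_fastai implementation.
import torch.nn as nn

class LSHAttention(nn.Module):
    def __init__(self, n_hashes=4, bucket_size=64):
        super().__init__()
        self.n_hashes = n_hashes        # hash rounds per forward pass (no weights depend on it)
        self.bucket_size = bucket_size  # tokens per LSH bucket
    # ... hashing and chunked attention omitted ...

def set_n_hashes(model, n_hashes):
    # Raise the number of hash rounds for evaluation only; since no parameters
    # depend on n_hashes, this is safe to change after training.
    for m in model.modules():
        if isinstance(m, LSHAttention):
            m.n_hashes = n_hashes

# set_n_hashes(trained_model, 8)  # e.g. train with 4 hashes, evaluate with 8
```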
To verify the claim we test evaluation time on the synthetic task, with hyperparameters as indicated in the right part of Figure 5 of the Reformer paper. See our documentation for experiment details: https://arampacha.github.io/reformer_fastai/experiment.speed-lsh_synthetic-task.html. Our results are summarized in the figure below:
We were unable to complete the longest sequence lengths for full attention due to out-of-memory errors on a single GPU. The results for the shorter sequences mostly match the paper, although our full-attention model appears somewhat faster relative to LSH attention than reported there.
All our results are slightly faster than those reported in the paper. This could be due to our method of measurement, architecture choices, or hardware differences.
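To illustrate how the measurement method can affect such numbers, below is a hedged sketch of timing seconds per step on GPU (`model` and `batch` are placeholders, not names from our codebase). Whether one synchronizes CUDA before reading the clock, and whether warm-up steps are discarded, can shift the measured time noticeably.

```python
# Hedged sketch of a seconds-per-step measurement; assumes `model` and `batch`
# are supplied by the caller.
import time
import torch

@torch.no_grad()
def seconds_per_step(model, batch, n_steps=10, warmup=3):
    model.eval()
    for _ in range(warmup):          # warm-up: cudnn autotuning, memory allocation, caching
        model(batch)
    if torch.cuda.is_available():
        torch.cuda.synchronize()     # ensure queued kernels finish before timing starts
    start = time.perf_counter()
    for _ in range(n_steps):
        model(batch)
    if torch.cuda.is_available():
        torch.cuda.synchronize()     # include all asynchronous GPU work in the measurement
    return (time.perf_counter() - start) / n_steps
```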