
The Problem with Quadratic Attention in Transformer Architectures

This report provides a brief overview of the problem with vanilla self-attention and explains its quadratic nature.
Attention forms the basis of modern-day NLP and is the key ingredient behind the success of transformer architectures. Not only is the notion of attention simple and intuitive to understand, but it also provides great interpretability, something perhaps overlooked in this day and age. However, when we try to use these models for inference, a barrier arises: the quadratic nature of the all-important attention layers.
In this article, we'll look at why this quadratic nature emerges and explore the various ways in which academics and companies have tried to solve the problem.

Why Quadratic?

NOTE: I won't reinvent the wheel and explain attention again; instead, we'll refer to other articles that cover the paradigm in detail.

In the vanilla ("standard") attention mechanism, every token computes an attention weight for every other token in the sequence. That means there are $n$ such weights per token, where $n$ is the number of tokens in the sequence. In total, we therefore have to compute $n \times n$ weights, leading to $\mathcal{O}(n^2)$ space and time complexity.
From basic complexity theory, we can see why this is troublesome, especially now that sequence lengths have grown to tens and even hundreds of thousands of tokens. So how do we fix this quadratic complexity?
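To make the $n \times n$ cost concrete, here is a minimal NumPy sketch of full self-attention for a single head. It is illustrative only: there is no batching, no learned projections, and the sequence length and head dimension are arbitrary choices rather than values from any particular model.

```python
# Minimal single-head self-attention sketch (NumPy), illustrating where
# the O(n^2) cost comes from: the (n, n) score matrix.
import numpy as np

def vanilla_attention(Q, K, V):
    """Q, K, V: (n, d) arrays for a sequence of n tokens with head dimension d."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (n, n) matrix -> O(n^2) memory and time
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (n, d) output

n, d = 4096, 64                                   # illustrative sizes
Q = K = V = np.random.randn(n, d).astype(np.float32)
out = vanilla_attention(Q, K, V)
print(out.shape)  # (4096, 64) -- but the intermediate (4096, 4096) score matrix dominates memory
```

Doubling the sequence length quadruples the size of the score matrix, which is exactly the scaling problem the methods below try to avoid.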

Proposed Solutions

In recent years, many "linear attention" solutions have been proposed. Let's look at some of them in a bit of detail:
  • Reformer (The Efficient Transformer): This method uses locality-sensitive hashing (LSH) to reduce complexity. Reformer uses a hash function to bucket/chunk related tokens together, matching similar vectors with each other and thereby avoiding a redundant search over the entire sequence. Attention is then applied within these much smaller chunks, reducing the quadratic attention to almost linear complexity!
Figure 1: Locality Sensitive Hashing as used in Reformer.
  • Sparse Transformer: Perhaps the simplest way to reduce quadratic attention, this implementation by OpenAI limits the set of tokens that a particular token can attend to, reducing the complexity to $\mathcal{O}(n \sqrt{n})$.
Figure 2: Comparison of Vanilla Attention, Strided (Sparse) Attention, and Fixed Attention.
  • BigBird: Another formulation, from Google Research, which aims to provide a linear implementation of attention by combining several attention patterns: a few tokens learn representations globally by attending across the whole sequence, while the rest learn representations in local neighborhoods.
Figure 3: Comparison of Random Attention, Windowed Attention, Global Attention and BigBird Attention.
  • Longformer: This formulation uses the classic sliding-window technique to make attention linear in the sequence length for a fixed window width. Every token only has visibility over a fixed-width window around it, which reduces the complexity; other variants, such as dilated sliding-window attention, have also been proposed. A small sketch of a sliding-window mask follows this list.
Figure 4: Comparison of Vanilla, Sliding, Dilated Sliding and Global + Sliding Attention.
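To see how these sparse patterns cut the cost, here is a rough sketch of a Longformer-style sliding-window mask. The sequence length and window width are arbitrary illustrative values, and `sliding_window_mask` is a hypothetical helper written for this article, not an API from any of the papers or libraries above.

```python
# Sketch of a sliding-window attention mask: each token may only attend to
# tokens within w/2 positions of itself, so the number of allowed pairs grows
# roughly as n * w instead of n * n.
import numpy as np

def sliding_window_mask(n, w):
    """Boolean (n, n) mask: True where token i is allowed to attend to token j."""
    idx = np.arange(n)
    return np.abs(idx[:, None] - idx[None, :]) <= w // 2

n, w = 16, 4                 # illustrative sizes
mask = sliding_window_mask(n, w)
print(mask.sum())            # roughly n * (w + 1) allowed positions, rather than n * n
print(mask.sum() / n**2)     # fraction of the full attention matrix actually used
```

Combining such a local mask with a handful of globally-attending tokens, as BigBird and Longformer's global attention do, keeps the overall cost linear in the sequence length while still letting information propagate across the whole sequence.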

Summary

In this article, we learnt why vanilla attention is quadratic in nature and how various papers over the years have tried to fix this problem.
If you want more reports covering the math behind attention, let us know in the comments down below or on our forum ✨!
Check out these other reports on Fully Connected covering other fundamental topics like Attention and Quantisation.
