On the Relationship Between Self-Attention and Convolutional Layers

Our submission to the ML Reproducibility Challenge 2020. Original paper: "On the Relationship between Self-Attention and Convolutional Layers" by Jean-Baptiste Cordonnier, Andreas Loukas, and Martin Jaggi, accepted at ICLR 2020.
Nishant Prabhu

Reproducibility Summary

Scope of Reproducibility

In this report, we perform a detailed study on the paper On the Relationship between Self-Attention and Convolutional Layers, which provides theoretical and experimental evidence that self-attention layers can behave like convolutional layers.
The paper does not obtain state-of-the-art performance but rather answers an interesting question: do self-attention layers process images in a similar manner to convolutional layers?
This has inspired many recent works which propose fully attentional models for image recognition. We focus on experimentally validating the claims of the original paper, and our inferences from the results led us to propose a new variant of the attention operation - Hierarchical Attention. The proposed method shows significantly improved performance with fewer parameters, validating our hypothesis.
To facilitate further study, all the code used in our experiments is publicly available here.

Methodology

We implement the original paper [1] from scratch in PyTorch and refer to the authors' source code for verification.
In our experiments involving SAN [2], we use the official implementation since it provides faster CUDA kernels, while we implement ViT [3] from scratch, referring to the authors' source code. We then incorporate our proposed hierarchical operation into all three methods for comparison.
For all the experiments mentioned in this report, we use the CIFAR10 dataset to benchmark model performance on the image classification task. Experiments involving smaller models, namely ResNet18 and Quadratic embedding, were trained on an 8GB NVIDIA RTX 2060 GPU. Learned embedding-based models were trained on 16GB NVIDIA V100 virtual GPUs rented from Amazon Web Services (AWS). Each training run for the smaller models required around 20 hours, larger ones took over 2 days, while the corresponding hierarchical versions required around 10 hours to converge.

Results

We were able to reproduce all the results from the paper within 1% of the reported value, hence validating the claims of the original paper. However, there seem to be some differences in the attention figures which lead to interesting insights and the proposed Hierarchical Attention. In the case of ViT and SAN, we do not have a comparative baseline as the corresponding papers do not evaluate performance on the CIFAR10 dataset (without pre-training).

What was Easy

We did not face any major challenges in reproducing this paper. The paper is well written and complete in providing the necessary information to conduct all the experiments.

What was Difficult

Most of the code in the official implementation appears to be borrowed from a repository maintained by HuggingFace, which also brought along a lot of unnecessary code, making it difficult to read and understand quickly. Further, the training time for each run is substantial, which made it difficult for us to experiment with multiple datasets and hyperparameter settings.

Communication with Original Authors

We have tried contacting the authors regarding the differences in the attention figures since the code for the same was not available on the repository for verification. However, we have not received any response as of our submission.

1 Introduction

In computer vision, convolutional architectures [4, 5, 6, 7] have dominated across various image recognition tasks like classification, segmentation, etc. However, they have some limitations, such as a lack of rotation invariance and an inability to aggregate information based on the image content.
This has inspired researchers to explore a different design space and introduce models with interesting new capabilities. Self-attention based networks, in particular Transformers [8], have become the model of choice for various natural language processing (NLP) tasks. The major difference between Transformers and previous methods, such as recurrent neural networks and convolutional neural networks (CNN), is that the former can simultaneously learn to attend to various parts of the input sequence. To utilize this capacity to learn meaningful interdependencies, many recent works have tried to incorporate self-attention, some even replacing convolutions entirely [11, 12] in networks for vision tasks.
The central claim of the paper "On the Relationship between Self-Attention and Convolutional Layers" [1] is that a multi-head self-attention layer with a sufficient number of heads and a suitable relative positional encoding can express any convolutional layer, and that self-attention layers trained on image classification do, in practice, learn to behave like convolutional layers.
In this study, we perform a detailed analysis of the various experiments outlined in the paper. We observe certain differences from the original paper which lead to interesting insights. We go beyond verifying the claims by trying to solve the observed problems and propose a novel attention operation, which we refer to as Hierarchical Attention (HA). We incorporate HA in various existing architectures [1, 11, 12] and our detailed experiments suggest significantly improved performance (\approx 5\%) with roughly (1/5)^{th} the number of parameters.

1.1 Outline of this Report

We structure the report as follows:
  1. Section 2 formally introduces the attention operation, followed by the fundamental principle of the Transformer.
  2. We validate the claims made by the paper in Section 3. We visualize the attention patterns of the suggested models and compare them with those mentioned in the paper, commenting on any similarities or differences.
  3. We introduce a novel Hierarchical Attention operation described in Section 4. We compare the modified operation on various methods and empirically show significant performance gains.
  4. Section 5 concludes and suggests possible improvements for future research.

2 Fundamentals

In this section, we introduce the attention operation, first in terms of its origin in NLP and how it can be extended for images. We also provide a short introduction to Transformers since most methods described in this report heavily rely on it.

2.1 Attention Operation

Attention was first proposed for NLP, where the goal is to focus on a subset of important words. Relations between inputs can thus be used to capture context and higher-order dependencies. The attention matrix A assigns a score between each of the N queries Q and the N_k keys K, indicating which parts of the input sequence to focus on; \sigma is an activation function (generally softmax(.)).
A(Q,K) = \sigma(QK^T)
While in NLP each element in the sequence corresponds to a word, the same idea is applicable to any sequence of N discrete objects, such as the pixels of an image. A key property of the self-attention model described above is that it is equivariant to permutations of the input: shuffling the N input tokens simply shuffles the outputs accordingly, so the model has no notion of position. This is problematic in cases where the order actually matters, as in images. Hence, a positional encoding is learned for each token in the sequence and added before the self-attention operation, i.e., the Q and K vectors are derived from the sum of the input X and the positional encoding P.
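As a concrete illustration, the minimal PyTorch sketch below computes the attention matrix of this section with a learned absolute positional encoding added to the inputs before the query/key projections. The dimensions and projection matrices (D, W_q, W_k) are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Sketch: attention scores A(Q, K) = softmax(Q K^T) where Q and K are derived
# from the sum of the input tokens X and a learned positional encoding P.
N, D = 16, 64                                 # sequence length, embedding dimension (illustrative)
X = torch.randn(N, D)                         # input tokens (e.g. flattened pixels)
P = torch.randn(N, D, requires_grad=True)     # learned absolute positional encoding

W_q = torch.randn(D, D) / D ** 0.5            # query projection (placeholder weights)
W_k = torch.randn(D, D) / D ** 0.5            # key projection (placeholder weights)

Q = (X + P) @ W_q                             # queries from X + P
K = (X + P) @ W_k                             # keys from X + P
A = F.softmax(Q @ K.T, dim=-1)                # N x N attention matrix
```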

2.2 Transformer Attention

The transformer network is an extension of the attention mechanism based on the Multi-Head Attention (MHA) operation. Rather than computing the attention once, the MHA operation computes it multiple times in parallel, once per head. This helps the transformer jointly attend to the different information derived from each head.
The outputs of these heads are concatenated and projected onto the final output dimension. A transformer layer additionally contains residual connections and layer normalization. The overall operation can be summarized as:
\textrm{MHA}(Q,K,V) = \textrm{concat}_{h=1}^{N_h}\left[A(Q_h,K_h)\,V_h\right]W_{out}\\ H = \textrm{LayerNorm}\left(X + \textrm{MHA}(Q,K,V)\right), \quad \textrm{Transformer}(X) = \textrm{LayerNorm}\left(H + \textrm{MLP}(H)\right)
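A minimal PyTorch sketch of one such transformer layer is shown below, using torch.nn.MultiheadAttention for the MHA block. Note that nn.MultiheadAttention splits the output dimension across heads (D_h = D_{out}/N_h), unlike the concatenation of full-dimension heads discussed in Section 3.4; the hyper-parameters here are purely illustrative and do not match the paper's models.

```python
import torch
import torch.nn as nn

class TransformerLayer(nn.Module):
    """Sketch of a post-norm transformer layer: MHA, residual, layer norm, MLP."""
    def __init__(self, d_model=64, n_heads=8, d_mlp=256):
        super().__init__()
        self.mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, d_mlp), nn.GELU(),
                                 nn.Linear(d_mlp, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                   # x: (batch, N, d_model)
        h, _ = self.mha(x, x, x)            # self-attention: Q, K, V all come from x
        x = self.norm1(x + h)               # residual connection + layer norm
        x = self.norm2(x + self.mlp(x))     # MLP block with residual + layer norm
        return x
```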

2.3 Positional Encoding for Images

There are two types of positional encodings used in transformer-based architectures: absolute and relative encoding. Absolute encodings assign a (fixed or learned) vector P_p to every pixel p whereas the relative positional encoding [13] considers only the position difference between the query pixel (pixel we compute the representation of) and the key pixel (pixel we attend to).
The authors of the paper have elegantly proved how the attention operation mimics a convolution. The main result is the following:
A multi-head self-attention layer with N_h heads of dimension D_h, output dimension D_{out}, and a relative positional encoding of dimension D_p \geq 3 can express any convolutional layer of kernel size \sqrt{N_h} \times \sqrt{N_h} with \min\left(D_{h}, D_{out}\right) output channels.
In this proposed construction, the attention scores of each head must attend to different relative pixel shifts within a kernel (Lemma 1 from the paper). The above condition is satisfied for the relative positional encoding referred to as Quadratic Encoding. However, experiments suggest that a relative position encoding learned by a neural network (Learned Relative Position Encoding) can also satisfy the conditions of the lemma. We strongly urge the readers to refer to the original paper to get a complete understanding.
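To make the quadratic encoding more concrete, the hedged sketch below builds the Gaussian-like, content-independent attention pattern over relative pixel shifts that this construction relies on: each head h has a learnable centre \Delta_h and width parameter \alpha_h, and its score for a relative shift \delta is -\alpha_h\|\delta - \Delta_h\|^2 before the softmax. The variable names and the grid size are ours, not the authors' code.

```python
import torch
import torch.nn.functional as F

# Quadratic (Gaussian) positional attention sketch: scores depend only on the
# relative shift delta = pos(key) - pos(query), the head centre and its width.
H, W, n_heads = 8, 8, 9
ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
pos = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()    # (N, 2) pixel coordinates

delta = pos[None, :, :] - pos[:, None, :]      # (N, N, 2) relative shifts, key minus query
center = torch.randn(n_heads, 2)               # learnable Delta_h, one centre per head
alpha = torch.ones(n_heads)                    # learnable width parameter per head

# scores[h, q, k] = -alpha_h * || delta_{q,k} - Delta_h ||^2
scores = -alpha[:, None, None] * ((delta[None] - center[:, None, None, :]) ** 2).sum(-1)
attn = F.softmax(scores, dim=-1)               # (n_heads, N, N) Gaussian-like attention
```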

3 Reproducibility

The aim of this section is to validate the results claimed by the paper - to examine whether self-attention layers in practice do actually learn to operate like convolutional layers when trained on the standard image classification task. For all our experiments mentioned in this report, we use the CIFAR10 dataset to benchmark the performance of the model.

3.1 Dataset

The CIFAR-10 dataset [14] consists of 60,000 color images of size 32\times 32 split across 10 classes. There are 6,000 images per class, split into 5,000 training and 1,000 validation samples.
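For reference, a minimal way to obtain the dataset with torchvision is sketched below; the normalization statistics are the commonly used CIFAR-10 values, and the absence of augmentation is an illustrative assumption rather than our exact training pipeline.

```python
import torchvision
import torchvision.transforms as T

# Sketch: load CIFAR-10 train/validation splits with a simple normalization.
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.247, 0.243, 0.261)),  # common CIFAR-10 statistics
])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
val_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                       download=True, transform=transform)
```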

3.2 Experiments and Results

The results mentioned in the paper use a fully-attentional model consisting of 6 multi-head self-attention layers, each with 9 heads. In all the experiments, the input image undergoes a 2\times 2 down-sampling operation to reduce its size. The final image vector is derived by average-pooling the representations from the last layer and is then passed to a linear layer for classification. Please refer to Table 4 (Appendix) for a detailed list of hyper-parameters used in each experiment.
We closely follow the official implementation and were able to reproduce all the results within 1% of the reported values. Table 1 compares the results mentioned in the paper with the ones obtained using our implementation. Fig. 1 plots the test accuracy on CIFAR10 at every 10 epochs for each model, and it is quite evident that fully convolutional networks like ResNet18 tend to converge faster. The following subsections describe these results in detail.
Figure 1: Test performance on CIFAR10 at every 10 epochs. (a) Models with 9 heads; (b) models with 16 heads.
Table 1: Test accuracy (paper vs ours) on CIFAR10 and model sizes; 9 heads
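A minimal sketch of the classification pipeline described at the start of this subsection is given below, reusing the TransformerLayer sketch from Section 2.2. The 2\times 2 down-sampling, the 6 layers, the average pooling and the linear classifier follow the paper's description; the strided convolution used for down-sampling, the absolute (rather than relative) positional encoding, and all dimensions are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class FullyAttentionalClassifier(nn.Module):
    """Sketch: 2x2 down-sampling, 6 self-attention layers, average pooling, linear head."""
    def __init__(self, d_model=64, n_layers=6, n_heads=8, n_classes=10):
        super().__init__()
        self.embed = nn.Conv2d(3, d_model, kernel_size=2, stride=2)   # 2x2 down-sampling (assumption)
        self.pos = nn.Parameter(torch.zeros(1, 16 * 16, d_model))     # 32x32 image -> 16x16 tokens
        self.layers = nn.ModuleList(TransformerLayer(d_model, n_heads)  # sketch from Section 2.2
                                    for _ in range(n_layers))
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, img):                               # img: (B, 3, 32, 32)
        x = self.embed(img).flatten(2).transpose(1, 2)    # (B, N=256, d_model)
        x = x + self.pos                                  # add positional encoding
        for layer in self.layers:
            x = layer(x)
        return self.head(x.mean(dim=1))                   # average-pool over pixels, classify
```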

3.3.1 Quadratic Encoding

The authors show that the attention probabilities in the quadratic positional encoding are similar to an isotropic bivariate Gaussian distribution with bounded support. Hence to validate their claims, all the attention matrices in the model are replaced with these Gaussian priors, with learnable parameters to determine the center and width of each attention head.
Further, this is extended to a non-isotropic distribution over pixel positions as it might be interesting to see if the model would learn to attend to such groups of pixels - thus forming unseen representations in CNNs. Fig. 2 visualizes the attention centers for each head for all the layers and at different epochs.
After optimization, we can see that the heads attend to a specific pixel of the image forming a grid around the query pixel. This confirms the intuition that self-attention applied to images learns convolution-like filters around the query pixel. Also, it can be seen that the initial layers (1-2) focus on local patterns while the deeper layers (3-6) attend to larger patterns by positioning the center of attention further from the queried pixel position. Fig. 2b shows that the network did learn non-isotropic attention patterns, especially in the last layers. However, there is no performance improvement suggesting that it is not particularly helpful in practice.
(a) Isotropic Gaussian parameterization (b) Non-isotropic Gaussian parameterization
Figure 2: Centers of attention of each attention head (different colors) for all 6 layers (columns) at various training epochs (rows). The central black square is the query pixel, whereas solid and dotted circles represent the 50% and 90% percentiles of each Gaussian, respectively.

3.3.2 Learned Relative Positional Encoding

In this experiment, the authors try to study the positional encoding generally used in fully-attentional models [11].
The positional encoding vector for each row and column pixel shift is learned. The final relative position encoding of a key pixel with a query pixel is derived as the concatenation of row and column shift embeddings. First, the authors completely discard the input data and compute the attention weights solely with the derived encoding (Learned embedding w/o content).
Fig. 3a visualizes the attention probabilities for a given query pixel, confirming the hypothesis that even when left to learn positional encoding from randomly initialized vectors, certain self-attention heads learn to attend to individual pixels while the others learn non-localized patterns and long-range dependencies. In another setting (Learned embedding w/ content), both the positional and content-based attention information is used which corresponds to a full-blown stand-alone self-attention model. Fig. 3b visualizes the attention probabilities for a given query pixel in this setting and it is interesting to note that even when left to learn the encoding from the data, some attention heads exploit positional information like CNNs while the others focus on the content.
In Fig. 4, we visualize the attention probabilities averaged across an entire batch of images to understand the focus of each head and remove dependency on the input image for both experiments.
(a) Learned embedding w/o content (b) Learned embedding w/ content.
Figure 3: Attention probabilities for a model with 6 layers (rows) and 9 heads (columns) using learned relative positional encoding (with and without content). The query pixel (red square) is on the frog head.
(a) Learned embedding w/o content (b) Learned embedding w/ content
Figure 4: Attention probabilities for a model with 6 layers (rows) and 9 heads (columns) using learned relative positional encoding (with and without content). Attention maps are averaged over 50 test images to display head behavior and remove the dependence on the input content. The query pixel is in the center of the image.
Average Attention Visualization: The authors of the original paper visualize the attention probabilities for a single image, or across a batch of images, for a specific query pixel. A single query pixel does not convey where the model focuses across the entire image, and it is not practical to plot individual figures for every query pixel. Hence, we also visualize the attention probabilities using what we refer to as Average Attention, to identify which portions of the entire image the model attends to.
Given a softmax-normalized attention matrix of size N\times N, every row represents the relationship between a query pixel and all other pixels. First, every row is divided by its sum to ensure that the values are on the same scale. Then, for each pixel, the mean of the attention it receives over all query rows is computed to determine its importance value. If a pixel is strongly attended to by multiple query pixels, its importance value will be higher, indicating that the model has a stronger focus on that pixel. We describe the operation mathematically below. Fig. 5 visualizes the average attention for the learned embedding with and without content. In Fig. 5a, since the content data is discarded, the model clearly focuses on positional patterns, while in Fig. 5b, the model attends to both positional and content information. We visualize additional figures in Sections B.1 - B.3 (Appendix).
\alpha_{i,j} = \textrm{softmax}_j(\alpha_{i,j}) = \frac{\exp(\alpha_{i,j})}{\sum_{k} \exp(\alpha_{i,k})}
\tilde{\alpha}_{i,j} = \frac{\alpha_{i,j}}{\sum_{k} \alpha_{i,k}}
\textrm{Avg Attn}_{j} = \frac{1}{N}\sum_{i} \tilde{\alpha}_{i,j}
(a) Learned embedding w/o content (b) Learned embedding w/ content
Figure 5: Average Attention visualization for a model with 6 layers (rows) and 9 heads (columns) using learned relative positional encoding (with and without content).
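The computation above amounts to averaging each column of the (row-normalized) attention matrix; a small sketch of how we compute it for one head is given below, where the 16\times 16 grid size is assumed for illustration (a 32\times 32 image after 2\times 2 down-sampling).

```python
import torch

def average_attention(attn: torch.Tensor) -> torch.Tensor:
    """attn: (N, N) row-stochastic attention matrix; returns (N,) per-pixel importance."""
    attn = attn / attn.sum(dim=-1, keepdim=True)   # re-normalize rows (no-op if already softmaxed)
    return attn.mean(dim=0)                        # mean attention each key pixel receives over queries

# Example: importance map for one head, reshaped onto the 16x16 pixel grid.
A = torch.softmax(torch.randn(256, 256), dim=-1)
importance = average_attention(A).reshape(16, 16)
```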

3.4 Increasing the Number of Heads

As per the analogy derived between self-attention and convolutions, the number of heads is directly related to the kernel size of a convolution operation. Hence, we increase the number of heads from 9 to 16. It is important to note that, unlike the general procedure of setting D_h = D_{out}/N_h in transformer-based architectures, the paper suggests concatenating heads of dimension D_h = D_{out}, since the effective number of learned filters is \min(D_h, D_{out}). Given the limited compute, we reduced D_{out} from 400 to 256 while increasing the number of heads to 16. As seen in Table 2, there is no significant impact on the model's performance. However, the model takes longer to converge due to the increased number of parameters (Fig. 1b). We visualize the attention probabilities in Sections B.4 - B.6 (Appendix).
Table 2: Test accuracy on CIFAR10; 16 heads

3.5 Additional Observations

3.5.1 Inductive Biases in Transformers

As seen from the results in Table 1, the fully-attentional model utilizing learnable embeddings with image content performs poorly compared to the other methods. As mentioned in [3], transformers lack biases inherent to CNNs, such as translation equivariance and retention of the 2D neighborhood structure. Only the Multi-Layer Perceptron (MLP) layers used in these methods are local and translation equivariant, while the self-attention layers are global.
Even in NLP, almost 75-90% of predictions remain correct when the input words are randomly shuffled [15]. This suggests that transformer-based methods do not sufficiently capture spatial information even with positional encodings, and require a large amount of training data to do so. This could explain the improved performance of Learned embedding w/o content and Quadratic embedding, where the attention matrices are directly replaced with positional information.

3.5.2 Over-expressive Power of Attention Matrices

The most important step of the self-attention operation is the generation of the attention matrix of size N\times N. In NLP, the value of N tends to be small (<100) in most cases, as we are dealing with words in a sentence. On the contrary, images when flattened result in very long sequences of pixels, creating large attention matrices. These attention matrices can therefore be sparse, and the model has a strong tendency to focus on very high-level information. This can lead to over-fitting and has been discussed in the context of point clouds, where the number of points is very large (>1000). We observe the same in our experiments: as seen in Figs. 3b, 4b, and 5b, the attention heads in the last 2 layers are very sparse and do not capture any information. This can also be seen in the case of Quadratic encoding (Fig. 2), where certain attention heads focus on "non-intuitive" portions of the image, e.g. a thin strip of pixels, or attend uniformly across a large patch of pixels. A simple and naive way to overcome this problem is to reduce the number of heads or layers, but this is not effective as the model loses its capacity to learn strong features.

4 Hierarchical Attention

Given the problems described above, we need to propose a method that can avoid the over-expressive nature of independent attention heads while still being able to learn and derive strong features from the input image across layers. We now introduce a novel attention operation which we refer to as Hierarchical Attention (HA) operation. In the following sections, we explain the core idea behind the operation and perform detailed experiments to showcase its effectiveness.

4.1 Methodology

In most transformer-based methods, independent self-attention layers are stacked sequentially and the output of one is passed on to the next to derive the Q, K, and V vectors. This allows each attention head to freely attend to specific features and derive a better representation.
Deviating from these methods, the HA operation updates only the Q and V vectors after each attention block, while K remains the same. Further, the weights are shared across these attention layers, inducing the transformer to iteratively refine its representation across layers. This can be considered analogous to an unrolled recurrent neural network (RNN), as we sequentially improve the representation across layers based on the previous hidden state. This helps the model hierarchically learn complex features by focusing on the corresponding portions of the K vector and aggregating the required information from the V vector. Figs. 6 and 7 visualize the normal attention and the hierarchical attention operations, respectively. For the sake of simplicity, we do not visualize the inner workings of the transformer, such as residual connections and layer normalization. This is a very simple yet effective method that can be easily adapted to any existing attention-based network, as described in the next section.
Figure 6: Normal Attention (Scaled Dot Product). The projection weights in each layer are different and independently learned (different color). Q, K, and V vectors are updated in each layer.
Figure 7: Hierarchical Attention (HA). The projection weights are shared across all the layers. Only Q, V vectors are updated while the K remains the same in each layer.
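A hedged PyTorch sketch of the HA operation, as summarized in Fig. 7, is given below: one set of projection weights is reused for every layer, K is derived from the input once and kept fixed, and only the Q and V inputs track the refined representation. The residual/normalization placement and all dimensions are illustrative assumptions, not our exact implementation.

```python
import torch
import torch.nn as nn

class HierarchicalAttention(nn.Module):
    """Sketch of Hierarchical Attention: shared weights, fixed K, refined Q and V."""
    def __init__(self, d_model=64, n_heads=8, n_layers=6):
        super().__init__()
        self.n_layers = n_layers
        # A single attention block and MLP reused across all "layers" (shared weights).
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                                # x: (B, N, d_model)
        k = x                                            # K is derived from the input once and kept fixed
        for _ in range(self.n_layers):
            h, _ = self.attn(query=x, key=k, value=x)    # only Q and V come from the refined representation
            x = self.norm1(x + h)
            x = self.norm2(x + self.mlp(x))              # iterative refinement with shared weights
        return x
```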
As mentioned earlier, there has been a lot of recent interest in replacing convolutions with attention layers. Hence, we choose two other popular and related papers, "Exploring Self-attention for Image Recognition" [2] and "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" [3], and apply the proposed HA operation to demonstrate its effectiveness. For better understanding, we briefly introduce these papers in the following subsections.

4.2 Pairwise and Patchwise Self-Attention (SAN)

Introduced by [2], pairwise self-attention is essentially a general representation of the self-attention operation. It is fundamentally a set operation: it does not attach stationary weights to specific locations and is invariant to permutation and cardinality. The paper presents a number of variants of pairwise attention that have greater expressive power than dot-product attention. Specifically, the weight computation does not collapse the channel dimension, allowing the feature aggregation to adapt to each channel. It can be mathematically formulated as follows:
y_i = \sum_{j \in \mathcal{R}(i)} \alpha(x_i, x_j) \odot \beta\left({x_j}\right) \\ \alpha(x_i, x_j) = \gamma({\delta(x_i, x_j)})
Here, i is the spatial index of the feature vector x_i, \delta(.) is a relation function whose output is mapped onto a weight vector by \gamma(.), and \mathcal{R}(i) is the local footprint of position i. The adaptive weight vectors \alpha(x_i,x_j) aggregate the feature vectors obtained from \beta(.). An important point to note is that \delta can produce vectors of a different dimension than \beta, allowing for a more expressive weight construction.
Patchwise self-attention is a variant of the pairwise operation in which the pair (x_i, x_j) in the weight computation is replaced by the patch of feature vectors x_{\mathcal{R}(i)}, which allows the weight vector to incorporate information from all the feature vectors in the patch. The equations are hence rewritten as:
y_i = \sum_{j \in \mathcal{R}(i)} \alpha(x_{\mathcal{R}(i)})_j \odot \beta\left({x_j}\right) \\ \alpha(x_{\mathcal{R}(i)}) = \gamma({\delta(x_{\mathcal{R}(i)})})
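A hedged sketch of the pairwise formulation above is given below, with subtraction as the relation function \delta, a small MLP as \gamma, and a linear map as \beta. For brevity the footprint \mathcal{R}(i) is taken to be all positions and the weights are softmax-normalized over it, whereas the official SAN code restricts the footprint to a local window and uses more elaborate \delta and \gamma.

```python
import torch
import torch.nn as nn

class PairwiseSelfAttention(nn.Module):
    """Sketch of pairwise self-attention: per-channel adaptive weights (no channel collapse)."""
    def __init__(self, dim=32):
        super().__init__()
        self.gamma = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))  # gamma(.)
        self.beta = nn.Linear(dim, dim)                                                  # beta(.)

    def forward(self, x):                                   # x: (N, dim)
        delta = x[:, None, :] - x[None, :, :]               # delta(x_i, x_j): pairwise differences, (N, N, dim)
        alpha = torch.softmax(self.gamma(delta), dim=1)     # per-channel weights, normalized over j (our choice)
        return (alpha * self.beta(x)[None, :, :]).sum(dim=1)  # y_i = sum_j alpha(x_i, x_j) ⊙ beta(x_j)
```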

4.3 Vision Transformer (ViT)

The Vision Transformer [3] has shown that reliance on convolutions is not necessary, and that a pure transformer can match or outperform convolution-based techniques when pre-trained on large amounts of data. In this method, an image is split into patches, which are projected onto another representation using a trainable layer and then passed through a standard stack of transformer operations as described earlier.
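The patch-embedding front end can be sketched as follows; the 2\times 2 patch size matches the ViT configuration we train on CIFAR10, while d_model and the use of a strided convolution to patchify and project in one step are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Sketch: split a CIFAR10 image into non-overlapping 2x2 patches and linearly
# project each patch to d_model, yielding the token sequence fed to the transformer.
patch, d_model = 2, 64
to_tokens = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)   # patchify + project in one step

img = torch.randn(1, 3, 32, 32)                      # one CIFAR10 image
tokens = to_tokens(img).flatten(2).transpose(1, 2)   # (1, 256 patches, d_model)
```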

4.4 Results

To validate our intuition and demonstrate the effectiveness of the proposed method, we incorporate HA into all the methods described above without modifying the overall structure of the architectures. Table 3 compares the accuracy of each model with its corresponding HA variant. We see significant improvements in performance (at least a 5% gain) in each case while reducing the number of parameters to roughly (1/5)^{th} of the original model. As mentioned earlier, transformers require substantial training data to perform as well as convolution-based architectures. When pre-trained on large datasets (14M-300M images), transformer-based architectures achieve excellent performance and transfer well to tasks with fewer data points [3]. However, for all the experiments in this report, we only focus on training these models from scratch on the CIFAR10 dataset.
Table 3: Comparison between models using normal SA and HA. Wall times are average inference times in milliseconds for the models over 300 iterations.
In Fig. 8a, we visualize the attention probabilities for a given query pixel. The relationship between self-attention and convolutions is striking: the model attends to distinct pixels at a fixed shift from the query pixel, reproducing the receptive field of the convolution operation. The initial layers attend to local patterns while the deeper layers focus on larger patterns positioned further away from the query pixel. Similarly, in Fig. 8b, the attention heads of the last two layers are no longer sparse and capture more information. These visualizations qualitatively support the proposed operation and its improved performance. We also visualize the attention probabilities for SAN and ViT in Sections B.7-B.8 and B.9-B.10 (Appendix), respectively.
Figure 8: Attention probabilities for a model with 6 layers (rows) and 9 heads (columns) using Hierarchical Learned 2D embedding w/ content. (a) Attention probabilities for a given query pixel (red square on the frog head); (b) Average Attention visualization.
To summarize, the hierarchical operation shares the projection weights across all layers and iteratively refines only the Q and V vectors while keeping K fixed (Fig. 7).

5 Conclusion

In this report, we study the application of self-attention to image recognition, specifically image classification. We validate the original paper's claims by performing detailed experiments on the CIFAR10 dataset and were able to reproduce all the results from the paper within 1% of the reported values. However, there are some differences in the attention figures, which lead to interesting insights and the proposed Hierarchical Attention. To validate our hypothesis, we perform detailed experiments incorporating HA into various methods, which significantly improves performance while reducing the number of parameters. These preliminary results raise several questions: Do we actually need multiple independent layers in large transformers? Does the improved performance also translate to large datasets and to other image recognition tasks such as object detection and image segmentation? We would like to answer these questions and provide a more rigorous understanding of the proposed method in future work.

References

[1] Jean-Baptiste Cordonnier, Andreas Loukas, and Martin Jaggi. On the relationship between self-attention and convolutional layers. CoRR, abs/1911.03584, 2019.
[2] Hengshuang Zhao, Jiaya Jia, and Vladlen Koltun. Exploring self-attention for image recognition, 2020.
[3] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale, 2020.
[4] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. In Yoshua Bengio and Yann LeCun, editors, 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, 2015.
[5] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott E. Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. CoRR, abs/1409.4842, 2014.
[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
[7] Jie Hu, Li Shen, Samuel Albanie, Gang Sun, and Enhua Wu. Squeeze-and-excitation networks, 2019.
[8] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2017.
[9] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers, 2020.
[10] Xiaolong Wang, Ross B. Girshick, Abhinav Gupta, and Kaiming He. Non-local neural networks. CoRR, abs/1711.07971, 2017.
[11] Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, and Jonathon Shlens. Stand-alone self-attention in vision models. CoRR, abs/1906.05909, 2019.
[12] Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, and Liang-Chieh Chen. Axial-deeplab: Stand-alone axial-attention for panoptic segmentation, 2020.
[13] Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc V. Le, and Ruslan Salakhutdinov. Transformer-xl: Attentive language models beyond a fixed-length context. CoRR, abs/1901.02860, 2019.
[14] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.
[15] Thang M. Pham, Trung Bui, Long Mai, and Anh Nguyen. Out of order: How important is the sequential order of words in a sentence in natural language understanding tasks?, 2020.

Appendix

A Hyper-parameters

Table 4: Hyperparameter configuration for all experiments. SA refers to normal self-attention and HA refers to Hierarchical Attention. The momentum and weight decay for SGD were set to 0.9 and 0.0001 respectively for all experiments.

B Attention visualization

We present more examples of visualizing attention in various models.

B.1 Learned embedding with content, 9 heads

(a) Attention probabilities for a given query pixel (red square is on the dog head) (b) Average Attention visualization. Figure 9: Attention probabilities for a model with 6 layers (rows) and 9 heads (columns) using learned embedding w/ content.
(a) Attention probabilities for a given query pixel (red square is on the horse head) (b) Average Attention visualization. Figure 10: Attention probabilities for a model with 6 layers (rows) and 9 heads (columns) using learned embedding w/ content.

B.2 Learned embedding without content, 9 heads

(a) Attention probabilities for a given query pixel (red square is on the dog head) (b) Average Attention visualization. Figure 11: Attention probabilities for a model with 6 layers (rows) and 9 heads (columns) using learned embedding w/o content.
(a) Attention probabilities for a given query pixel (red square is on the horse head) (b) Average Attention visualization. Figure 12: Attention probabilities for a model with 6 layers (rows) and 9 heads (columns) using learned embedding w/o content.

B.3 Hierarchical learned embedding with content, 9 heads

(a) Attention probabilities for a given query pixel (red square is on the dog head) (b) Average Attention visualization. Figure 13: Attention probabilities for a model with 6 layers (rows) and 9 heads (columns) using hierarchical learned embedding w/ content.
(a) Attention probabilities for a given query pixel (red square is on the horse head) (b) Average Attention visualization. Figure 14: Attention probabilities for a model with 6 layers (rows) and 9 heads (columns) using hierarchical learned embedding w/ content.

B.4 Learned embedding with content, 16 heads

(a) Attention probabilities for a given query pixel. The query pixel (red square) is on the frog head.
(b) Average Attention visualization Figure 14: Attention probabilities for a model with 6 layers (rows) and 16 heads (columns) using learned embedding w/ content.

B.5 Learned embedding without content, 16 heads

(a) Attention probabilities for a given query pixel. The query pixel (red square) is on the frog head.
(b) Average Attention visualization Figure 15: Attention probabilities for a model with 6 layers (rows) and 16 heads (columns) using learned embedding w/o content.

B.6 Hierarchical learned embedding with content, 16 heads

(a) Attention probabilities for a given query pixel. The query pixel (red square) is on the frog head.
(b) Average Attention visualization Figure 16: Attention probabilities for a model with 6 layers (rows) and 16 heads (columns) using hierarchical learned embedding w/ content.

B.7 Hierarchical SAN pairwise

Figure 17: Average attention probabilities for a model with 4 layers (rows) and 9 heads (columns) using hierarchical SAN Pairwise.

B.8 Hierarchical SAN patchwise

Figure 18: Average attention probabilities for a model with 4 layers (rows) and 9 heads (columns) using hierarchical SAN Patchwise.

B.9 Vision transformer (ViT)

(a) Attention probabilities for a given query pixel (red square is on the cat's ear). (b) Average Attention visualization Figure 19: Attention probabilities for a model with 6 layers (rows) and 9 heads (columns) using VIT with patch size 2\times 2.
(a) Attention probabilities for a given query pixel (red square is on the dog's snout). (b) Average Attention visualization Figure 20: Attention probabilities for a model with 6 layers (rows) and 9 heads (columns) using VIT with patch size 2\times 2.

B.10 Hierarchical Vision transformer (ViT)

(a) Attention probabilities for a given query pixel (red square is on the cat's ear). (b) Average Attention visualization Figure 21: Attention probabilities for a model with 6 layers (rows) and 9 heads (columns) using hierarchical VIT with patch size 2\times 2.
(a) Attention probabilities for a given query pixel (red square is on the dog's snout). (b) Average Attention visualization Figure 22: Attention probabilities for a model with 6 layers (rows) and 9 heads (columns) using hierarchical VIT with patch size 2\times 2.

C WandB Training Logs

We provide training logs for all our experiments here for the reader's reference.