Visualizing The Effect of Attention on Gradient Flow Using Custom Charts

A look into gradient propagation in attentive recurrent models with the Weights & Biases custom charts feature. Made by Kyle Goyette using Weights & Biases.


Attentive recurrent models can be painful to inspect. I wanted to find out how the structure of learned attention mechanisms interacts with gradient flow in sequential models, so I created the visualization below. Hovering over any point shows the strength of the connection between the selected time step and every other time step. From this, I could clearly see how my model used attention to learn to solve the denoise task, and how gradients flowed through the learned structure.
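To make the hover interaction concrete, here is a minimal sketch of the lookup it implies: given a matrix of connection strengths between time steps, pick one step and read off its link to every other step. This is not the code behind the chart. It assumes that causal softmax attention weights stand in for "connection strength" (the actual visualization may use a different quantity, such as gradient norms), and `causal_attention`, `connections_for_step`, and the toy `scores` are hypothetical names and values made up for illustration.

```python
import math


def causal_attention(scores):
    """Row-wise softmax where step t may only attend to steps 0..t."""
    weights = []
    for t, row in enumerate(scores):
        visible = row[: t + 1]
        peak = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Future steps are masked out, so their weight is zero.
        weights.append([e / total for e in exps] + [0.0] * (len(row) - t - 1))
    return weights


def connections_for_step(weights, t):
    """Strength of the link between step t and every other step."""
    strengths = {}
    for other in range(len(weights)):
        if other == t:
            continue
        # Earlier steps: how much t attends to them.
        # Later steps: how much they attend back to t.
        strengths[other] = weights[t][other] if other < t else weights[other][t]
    return strengths


# Toy raw scores (assumed): steps 1 and 3 score step 0 highly.
scores = [
    [0.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
]
weights = causal_attention(scores)
hovered = connections_for_step(weights, 0)  # "hover" over time step 0
for step, strength in hovered.items():
    print(f"step 0 <-> step {step}: {strength:.3f}")
```

In this toy setup, step 3 is linked to step 0 more strongly than step 2 is, which is the kind of pattern the hover view is meant to surface at a glance.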

