Visualizing The Effect of Attention on Gradient Flow Using Custom Charts
In this article, we take a look at gradient propagation in attentive recurrent models using the Weights & Biases Custom Charts feature.
Attentive recurrent models can be painful to inspect. I was interested in finding out how the structure of learned attention mechanisms interacted with gradient flow in sequential models.
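To ground this, here is a minimal sketch (PyTorch, with a toy architecture I'm assuming for illustration, not the exact model behind this report) of one way to measure how much gradient reaches each time step: retain the gradient on the recurrent encoder's per-step hidden states, backpropagate a loss, and read off the per-step gradient norms.

```python
import torch
import torch.nn as nn

class AttentiveRNN(nn.Module):
    """Toy GRU encoder that self-attends over its own hidden states."""
    def __init__(self, input_dim=8, hidden_dim=32):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.rnn = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, input_dim)  # per-step denoised output

    def forward(self, x):
        hs, _ = self.rnn(x)                           # (batch, T, hidden)
        hs.retain_grad()                              # keep per-step gradients
        scores = hs @ hs.transpose(1, 2) / self.hidden_dim ** 0.5
        attn = scores.softmax(dim=-1)                 # (batch, T, T)
        attended = attn @ hs                          # (batch, T, hidden)
        return self.head(attended), attn, hs

model = AttentiveRNN()
noisy = torch.randn(4, 20, 8)                         # batch of noisy sequences
clean = torch.randn(4, 20, 8)                         # placeholder clean targets
y_hat, attn, hs = model(noisy)
nn.functional.mse_loss(y_hat, clean).backward()

# L2 norm of the gradient reaching each time step's hidden state,
# averaged over the batch: a direct readout of gradient flow per step.
grad_per_step = hs.grad.norm(dim=-1).mean(dim=0)      # shape (T,)
print(grad_per_step)
```

In a plain recurrent model, `grad_per_step` tends to decay as you move away from the time steps the loss depends on; attention adds shortcut connections that let gradient jump directly to strongly attended steps.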
To see this directly, I created the visualization below. Hovering over any point in the chart shows the strength of the connection between the selected time step and every other time step. From this, I could clearly see how my model leveraged attention to solve the denoising task, and how gradients flowed through the learned structure.
[Interactive Custom Chart panel (one run set): attention connection strengths between time steps, shown on hover]
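For a chart like this, the attention data has to be logged first. Below is a hedged sketch of how the attention matrix from the snippet above could be logged as a `wandb.Table`; the project name and column names are my own placeholders, not this report's actual setup.

```python
import wandb

run = wandb.init(project="attention-gradient-flow")   # placeholder project name

# One row per (query step, key step) pair, using the first
# sequence in the batch from the snippet above.
table = wandb.Table(columns=["query_step", "key_step", "weight"])
attn_matrix = attn[0].detach()                        # (T, T)
for q in range(attn_matrix.shape[0]):
    for k in range(attn_matrix.shape[1]):
        table.add_data(q, k, attn_matrix[q, k].item())

run.log({"attention": table})
run.finish()
```

A Custom Chart's Vega spec can then bind to this table to draw the grid of time-step pairs and wire the hover selection to highlight the selected step's connections.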