
An Introduction to Graph Attention Networks

This article provides a beginner-friendly introduction to Graph Attention Networks (GATs), which bring the attention mechanism from deep learning to graph-structured data.

Attention is the unquestioned king of the natural language domain. Alongside self-supervised learning, it is perhaps one of the simplest yet most impactful learning paradigms driving the current AI revolution. But attention doesn't stop with natural language, and no, I'm not talking about vision!
In this article, we'll introduce the notion of attention in graph neural networks through Graph Attention Networks (GATs)!

How does Attention work in Graph Neural Nets?

Attention has a simple principle in language: in its most basic form, it computes a similarity between every pair of words in a sentence. It translates similarly to graphs! In its simplest form, attention in graph nets computes a similarity score between two node representations, i.e.,
$$\huge h_u = \phi\left(x_u, \, \bigoplus_{v \in \mathcal{N}_u} \psi(x_u, x_v)\right)$$

where ψ computes the attention score between the two node representations, ⊕ is a permutation-invariant aggregation (e.g. a sum) over the neighbourhood N_u, and φ produces the updated representation h_u of node u.
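To make the formula concrete, here is a minimal sketch of this attentional aggregation in PyTorch. The function and argument names (`attentional_aggregation`, `score_fn`, `update_fn`) are illustrative stand-ins for ψ and φ, not part of any particular library:

```python
import torch

def attentional_aggregation(x, neighbors, score_fn, update_fn):
    """One generic attentional aggregation step: h_u = update(x_u, sum_v alpha_uv * x_v).

    x         : [num_nodes, dim] node feature matrix
    neighbors : dict {u: list of neighbour indices v} defining N_u
    score_fn  : callable(x_u, x_v) -> scalar score, playing the role of psi
    update_fn : callable(x_u, aggregated) -> new representation, playing the role of phi
    """
    h = []
    for u in range(x.size(0)):
        x_v = x[neighbors[u]]                                     # [deg(u), dim]
        scores = torch.stack([score_fn(x[u], xv) for xv in x_v])  # [deg(u)]
        alpha = torch.softmax(scores, dim=0)                      # normalize over the neighbourhood
        agg = (alpha.unsqueeze(-1) * x_v).sum(dim=0)              # weighted sum plays the role of the aggregation
        h.append(update_fn(x[u], agg))
    return torch.stack(h)

# Toy usage: dot-product scores and a simple additive update
x = torch.randn(4, 8)
neighbors = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
score = lambda xu, xv: (xu * xv).sum()
update = lambda xu, agg: torch.relu(xu + agg)
h = attentional_aggregation(x, neighbors, score, update)          # [4, 8]
```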
The notion of attention in graph learning was (arguably) introduced by Veličković et al. in Graph Attention Networks. The idea is to compute the hidden representation of each node in the graph by "attending" over its neighbours.

This approach considers only the "first-order" neighbours, i.e. the nodes one hop away from the root node under consideration, thereby injecting some structural information into the proposed attention mechanism.
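Here is a minimal single-head sketch of such a layer in PyTorch. It is a simplified take on the mechanism described in the paper, not the reference implementation: the unnormalized score for a pair of transformed node features is a LeakyReLU of a learned linear map over their concatenation, and scores are softmax-normalized over each node's one-hop neighbourhood.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleHeadGATLayer(nn.Module):
    """A minimal single-head GAT-style layer (a simplified sketch)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # shared linear transform applied to every node
        self.a = nn.Linear(2 * out_dim, 1, bias=False)   # attention vector acting on [z_u || z_v]
        self.leaky_relu = nn.LeakyReLU(0.2)

    def forward(self, x, adj):
        # x:   [N, in_dim]  node features
        # adj: [N, N]       adjacency matrix with self-loops (1 where v is a neighbour of u)
        z = self.W(x)                                    # [N, out_dim]
        N = z.size(0)
        # Unnormalized scores e_uv = LeakyReLU(a^T [z_u || z_v]) for every pair (u, v)
        z_u = z.unsqueeze(1).expand(N, N, -1)
        z_v = z.unsqueeze(0).expand(N, N, -1)
        e = self.leaky_relu(self.a(torch.cat([z_u, z_v], dim=-1))).squeeze(-1)  # [N, N]
        # Mask out non-neighbours so the softmax runs only over the one-hop neighbourhood
        e = e.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(e, dim=1)                  # attention coefficients alpha_uv
        return F.elu(alpha @ z)                          # h_u = ELU(sum_v alpha_uv * z_v)

# Toy usage on a 4-node graph (self-loops keep every softmax well-defined)
x = torch.randn(4, 16)
adj = torch.eye(4)
adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = 1.0
h = SingleHeadGATLayer(16, 8)(x, adj)                    # [4, 8]
```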
One might ask: why use attention in graph nets? For the same reasons we use it in sequence-based tasks! Instead of hand-crafting features and telling the network what to learn by baking in a bias, such as the local receptive field of convolutions in GCNs or aggregating information from all nodes in the neighbourhood in MPGNNs, attention lets the algorithm itself decide which parts of the input to focus on and learn from.
There are also other benefits compared to the other paradigms, some key advantages being:
  • Efficiency! The same reason attention took over from convolutions in other domains: the multi-head formulation of attention parallelizes well on GPUs, making it extremely efficient compared to its counterparts (see the sketch after this list).
  • Because attention enforces no fixed aggregation scheme, the algorithm decides which parts of the input to focus on, which improves both learning capacity and interpretability.
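As a usage-level sketch, the multi-head version is available off the shelf in libraries such as PyTorch Geometric. The snippet below assumes torch_geometric is installed and uses its GATConv layer; the hyperparameters (8 heads, 0.6 dropout) loosely follow the original paper's transductive setup but are purely illustrative here. Each head is computed independently, which is what makes the layer easy to batch on a GPU.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv  # assumes PyTorch Geometric is installed

class GAT(torch.nn.Module):
    """Two-layer multi-head GAT for node classification (illustrative hyperparameters)."""

    def __init__(self, in_dim, hidden_dim, num_classes, heads=8):
        super().__init__()
        # First layer: `heads` attention heads whose outputs are concatenated
        self.conv1 = GATConv(in_dim, hidden_dim, heads=heads, dropout=0.6)
        # Output layer: a single head producing class logits
        self.conv2 = GATConv(hidden_dim * heads, num_classes, heads=1, concat=False, dropout=0.6)

    def forward(self, x, edge_index):
        x = F.elu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

# Toy usage: 4 nodes with 16 features and a small bidirectional edge list
x = torch.randn(4, 16)
edge_index = torch.tensor([[0, 1, 1, 2, 0, 3],
                           [1, 0, 2, 1, 3, 0]])   # [2, num_edges]
model = GAT(in_dim=16, hidden_dim=8, num_classes=3)
logits = model(x, edge_index)                     # [4, 3]
```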
NOTE: To get an overview of the other paradigms, please refer to the related reports linked at the end of this article.


Summary

In this article, we learned about the attention framework used in graph neural networks and its benefits compared to the other paradigms. To see the full suite of W&B features, please check out this short 5-minute guide.
If you want more reports covering graph neural networks with code implementations, let us know in the comments below or on our forum ✨!
Check out these other reports on Fully Connected covering other graph neural network topics and ideas.