
An Introduction to Graph Attention Networks

This article provides a beginner-friendly introduction to Graph Attention Networks (GATs), which bring the attention mechanism from deep learning to graph-structured data.

Attention is the unquestioned king of the natural language domain. Alongside self-supervised learning, it is perhaps one of the simplest yet most impactful learning paradigms driving the current AI revolution. But attention doesn't stop with natural language, and no, I'm not talking about vision!
In this article, we'll introduce the notion of attention in graph neural networks through Graph Attention Networks (GATs)!

How does Attention work in Graph Neural Nets?

Attention has a simple principle in language: in its most basic form, it computes a similarity between every pair of words in a sentence. It translates similarly to graphs! In its simplest form, attention in graph nets computes a similarity score between two node representations, i.e.,
$$\huge h_u = \phi\left(x_u, \, \bigoplus_{v \in \mathcal{N}_u} \psi(x_u, x_v)\right)$$

where ψ computes the attention score between the two node representations, ⊕ is a permutation-invariant aggregation (e.g. a sum) over the neighbourhood N_u, and φ produces the updated representation h_u of node u.
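To make the formula concrete, here is a minimal sketch of this attentional aggregation in PyTorch. The function and argument names (`attentional_aggregation`, `score_fn`, `update_fn`) are illustrative stand-ins for ψ and φ, not part of any particular library:

```python
import torch

def attentional_aggregation(x, neighbors, score_fn, update_fn):
    """One generic attentional aggregation step: h_u = update(x_u, sum_v alpha_uv * x_v).

    x         : [num_nodes, dim] node feature matrix
    neighbors : dict {u: list of neighbour indices v} defining N_u
    score_fn  : callable(x_u, x_v) -> scalar score, playing the role of psi
    update_fn : callable(x_u, aggregated) -> new representation, playing the role of phi
    """
    h = []
    for u in range(x.size(0)):
        x_v = x[neighbors[u]]                                     # [deg(u), dim]
        scores = torch.stack([score_fn(x[u], xv) for xv in x_v])  # [deg(u)]
        alpha = torch.softmax(scores, dim=0)                      # normalize over the neighbourhood
        agg = (alpha.unsqueeze(-1) * x_v).sum(dim=0)              # weighted sum plays the role of the aggregation
        h.append(update_fn(x[u], agg))
    return torch.stack(h)

# Toy usage: dot-product scores and a simple additive update
x = torch.randn(4, 8)
neighbors = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
score = lambda xu, xv: (xu * xv).sum()
update = lambda xu, agg: torch.relu(xu + agg)
h = attentional_aggregation(x, neighbors, score, update)          # [4, 8]
```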
The notion of attention in graph learning was (arguably) introduced by Veličković et al. in Graph Attention Networks. The idea is to compute the hidden representation of each node in the graph by "attending" over its neighbours.

This approach considers only the "first-order" neighbours, i.e. the nodes one hop away from the root node under consideration, thereby injecting some structural information into the proposed attention mechanism.
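Here is a minimal single-head sketch of such a layer in PyTorch. It is a simplified take on the mechanism described in the paper, not the reference implementation: the unnormalized score for a pair of transformed node features is a LeakyReLU of a learned linear map over their concatenation, and scores are softmax-normalized over each node's one-hop neighbourhood.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SingleHeadGATLayer(nn.Module):
    """A minimal single-head GAT-style layer (a simplified sketch)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # shared linear transform applied to every node
        self.a = nn.Linear(2 * out_dim, 1, bias=False)   # attention vector acting on [z_u || z_v]
        self.leaky_relu = nn.LeakyReLU(0.2)

    def forward(self, x, adj):
        # x:   [N, in_dim]  node features
        # adj: [N, N]       adjacency matrix with self-loops (1 where v is a neighbour of u)
        z = self.W(x)                                    # [N, out_dim]
        N = z.size(0)
        # Unnormalized scores e_uv = LeakyReLU(a^T [z_u || z_v]) for every pair (u, v)
        z_u = z.unsqueeze(1).expand(N, N, -1)
        z_v = z.unsqueeze(0).expand(N, N, -1)
        e = self.leaky_relu(self.a(torch.cat([z_u, z_v], dim=-1))).squeeze(-1)  # [N, N]
        # Mask out non-neighbours so the softmax runs only over the one-hop neighbourhood
        e = e.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(e, dim=1)                  # attention coefficients alpha_uv
        return F.elu(alpha @ z)                          # h_u = ELU(sum_v alpha_uv * z_v)

# Toy usage on a 4-node graph (self-loops keep every softmax well-defined)
x = torch.randn(4, 16)
adj = torch.eye(4)
adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = 1.0
h = SingleHeadGATLayer(16, 8)(x, adj)                    # [4, 8]
```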
One might ask: why use attention in graph nets? For the same reasons we use it in sequence-based tasks! Instead of hand-crafting features and telling the network what to learn by baking in a bias, such as the local receptive field of convolutions in GCNs or aggregating information from all nodes in the neighbourhood in MPGNNs, attention lets the algorithm itself decide which parts of the input to focus on and learn from.
There are also other benefits compared to the other paradigms, some key advantages being:
  • Efficiency! The same reason attention took over from convolutions in other domains: the multi-head formulation of attention parallelizes well on GPUs, making it extremely efficient compared to its counterparts (see the sketch after this list).
  • Because attention enforces no fixed aggregation scheme, the algorithm decides which parts of the input to focus on, which improves both learning capacity and interpretability.
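As a usage-level sketch, the multi-head version is available off the shelf in libraries such as PyTorch Geometric. The snippet below assumes torch_geometric is installed and uses its GATConv layer; the hyperparameters (8 heads, 0.6 dropout) loosely follow the original paper's transductive setup but are purely illustrative here. Each head is computed independently, which is what makes the layer easy to batch on a GPU.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv  # assumes PyTorch Geometric is installed

class GAT(torch.nn.Module):
    """Two-layer multi-head GAT for node classification (illustrative hyperparameters)."""

    def __init__(self, in_dim, hidden_dim, num_classes, heads=8):
        super().__init__()
        # First layer: `heads` attention heads whose outputs are concatenated
        self.conv1 = GATConv(in_dim, hidden_dim, heads=heads, dropout=0.6)
        # Output layer: a single head producing class logits
        self.conv2 = GATConv(hidden_dim * heads, num_classes, heads=1, concat=False, dropout=0.6)

    def forward(self, x, edge_index):
        x = F.elu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

# Toy usage: 4 nodes with 16 features and a small bidirectional edge list
x = torch.randn(4, 16)
edge_index = torch.tensor([[0, 1, 1, 2, 0, 3],
                           [1, 0, 2, 1, 3, 0]])   # [2, num_edges]
model = GAT(in_dim=16, hidden_dim=8, num_classes=3)
logits = model(x, edge_index)                     # [4, 3]
```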
NOTE: To get an overview of the other paradigms, please refer to the related reports linked at the end of this article.


Summary

In this article, we learned about the attention framework used in graph neural networks and its benefits compared to the other paradigms. To see the full suite of W&B features, please check out this short 5-minute guide.
If you want more reports covering graph neural networks with code implementations, let us know in the comments below or on our forum ✨!
Check out these other reports on Fully Connected covering other graph neural network topics and ideas.