
A Brief Introduction to Graph Attention Networks

This article provides a brief overview of the Graph Attention Networks architecture, complete with code examples in PyTorch Geometric and interactive visualizations using W&B.
In this article, we'll briefly go over the graph attention networks (GAT) architecture proposed in the paper Graph Attention Networks by Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio.
This is a foundational model of the graph attention network paradigm (arguably the first paper in this line of work), inspired by work on graph convolutional networks and by the attention literature from the sequence-to-sequence paradigm.
There are three main classes of graph neural networks: message passing graph neural networks, graph convolutional networks, and graph attention networks. For a brief overview of the three paradigms, you can refer to the following blogs:

Here's what we'll be covering:

Table of Contents

How to Implement Graph Attention Networks
Implementing the Model
Results
Summary

Let's get going!

How to Implement Graph Attention Networks

In our article on residual gated graph convolution networks, we looked into the problem of variable-sized graphs. Models like residual gated graph convolution networks (and their much simpler predecessor, gated graph convolution networks) employ gated recurrent units to allow the model to learn even on graphs of variable size. Still: we can borrow one more essential idea from the sequence-to-sequence paradigm to improve our models. And yes, the answer is indeed attention!
Simply put: we learn mappings between inputs that allow our models to pay attention to what needs to be learned. For a more thorough introduction to attention, you can refer to these articles:



As beautifully summed up by the authors:
"The idea is to compute the hidden representations of each node in the graph, by attending over its neighbours, following a self-attention strategy"
As discussed in the introductory blog, in its simplest form, attention in graph networks calculates the similarity between two node representations using the following formulation:
$$h_u = \phi\left(x_u,\; \bigoplus_{v \in \mathcal{N}_u} \psi(x_u, x_v)\right)$$

where $\psi$ is the attention mechanism which calculates the attention score.
The attention mechanism described in the paper is as follows:
$$h_u = \alpha_{uu} W_1 x_u + \sum_{v \in \mathcal{N}_u} \alpha_{uv} W_2 x_v$$

where the attention coefficients are calculated using:
$$\alpha_{uv} = \frac{\exp\left(\text{LeakyReLU}\left(w^{T} [W x_u \,\|\, W x_v]\right)\right)}{\sum_{k \in \mathcal{N}_u \cup \{u\}} \exp\left(\text{LeakyReLU}\left(w^{T} [W x_u \,\|\, W x_k]\right)\right)}$$
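
To make the mechanism concrete, here is a minimal sketch (not the library implementation) of how the attention coefficients and the updated representation of a single node could be computed with plain PyTorch. The tensor shapes and the neighbourhood are toy assumptions:

import torch
import torch.nn.functional as F

# Toy setup: node u with two neighbours, input dim 4, hidden dim 8
x_u = torch.randn(4)
neighbours = [torch.randn(4), torch.randn(4)]

W = torch.randn(8, 4)   # shared linear transformation
w = torch.randn(16)     # attention vector applied to [W x_u || W x_v]

# Unnormalised scores e_uv = LeakyReLU(w^T [W x_u || W x_v]),
# including the self-loop term that produces alpha_uu
candidates = [x_u] + neighbours
scores = torch.stack([
    F.leaky_relu(w @ torch.cat([W @ x_u, W @ x_v]), negative_slope=0.2)
    for x_v in candidates
])

# Softmax over the neighbourhood yields the attention coefficients alpha_uv
alpha = torch.softmax(scores, dim=0)

# The new representation h_u is the attention-weighted sum of transformed features
h_u = torch.stack([a * (W @ x_v) for a, x_v in zip(alpha, candidates)]).sum(dim=0)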


Implementing the Model

As with other models discussed in the series, we turn to PyTorch Geometric again, which provides an implementation of the attention mechanism outlined in the paper as the GATConv layer.
Let's walk through a minimal example implementation:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

DROPOUT_RATE: float = 0.5

class GATv1(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels, heads):
        super().__init__()
        # First layer: multi-head attention; head outputs are concatenated,
        # so the next layer receives hidden_channels * heads features
        self.conv1 = GATConv(in_channels, hidden_channels, heads=heads, dropout=DROPOUT_RATE)
        # Output layer: a single head producing the class logits
        self.conv2 = GATConv(hidden_channels * heads, out_channels, heads=1,
                             concat=False, dropout=DROPOUT_RATE)

    def forward(self, x, edge_index):
        x = F.dropout(x, p=DROPOUT_RATE, training=self.training)
        x = self.conv1(x, edge_index).relu()
        x = F.dropout(x, p=DROPOUT_RATE, training=self.training)
        x = self.conv2(x, edge_index)
        return x
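
Here is a quick usage sketch, assuming the Planetoid loader from PyTorch Geometric for the Cora dataset; the hidden dimension and number of heads are illustrative values rather than tuned settings:

from torch_geometric.datasets import Planetoid

dataset = Planetoid(root="data/Cora", name="Cora")
data = dataset[0]

model = GATv1(
    in_channels=dataset.num_features,   # 1433 bag-of-words features per paper
    hidden_channels=8,                  # illustrative hidden dimension
    out_channels=dataset.num_classes,   # 7 paper categories
    heads=8,
)

out = model(data.x, data.edge_index)    # logits of shape [num_nodes, num_classes]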

Results

We train several models for 50 epochs to perform node classification on the Cora dataset, using the minimal model implementation above, and report the training loss and accuracy to compare the effect of the hidden dimension on overall performance.
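
For completeness, a minimal training-loop sketch with W&B logging along these lines might look like the following; it continues from the usage snippet above, and the project name, learning rate, and weight decay are placeholders rather than the exact settings behind the runs shown below:

import wandb

wandb.init(project="gat-cora", config={"hidden_channels": 8, "heads": 8, "epochs": 50})

optimizer = torch.optim.Adam(model.parameters(), lr=0.005, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(50):
    model.train()
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = criterion(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

    # Log training loss and accuracy on the labelled training nodes
    pred = out.argmax(dim=-1)
    acc = (pred[data.train_mask] == data.y[data.train_mask]).float().mean()
    wandb.log({"train/loss": loss.item(), "train/accuracy": acc.item(), "epoch": epoch})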

[W&B panel: training loss and accuracy curves for the run set (3 runs)]


Summary

In this article, we learned about the Graph Attention Network architecture, along with code and interactive visualizations. To see the full suite of W&B features, please check out this short 5-minute guide.
If you want more reports covering graph neural networks with code implementations, let us know in the comments below or on our forum ✨!
Check out these other reports on Fully Connected covering other graph neural network-based topics and ideas.
