
An Introduction to Message Passing Graph Neural Networks

This article provides a beginner-friendly introduction to Message Passing Graph Neural Networks (MPGNNs), which apply deep learning paradigms to graph-structured data.

Recently, Graph Neural Networks (GNNs) have emerged as the de facto standard for geometry-heavy applications such as molecule property prediction, drug interaction prediction, gene expression prediction, and social network analysis.
In this article, we'll look into the popular Message Passing Graph Neural Network paradigm, a type of spatial network (as opposed to spectral networks). Let's dig in.

What Are Graph Neural Nets?

Graph Neural Networks make it possible to apply modern deep learning paradigms (such as augmentations) to graph-structured data, something classical deep learning methods can't do easily. They have wide-ranging uses, especially in geometry-heavy applications such as molecule property prediction.
Given a graph $G = (V, E, X, W)$, where $V$ is the set of vertices, $E$ is the set of edges, $X$ are the node attributes, and $W$ are the edge attributes, there are broadly two types of tasks: node-level tasks (such as classifying individual nodes) and graph-level tasks (such as predicting a property of the whole graph).
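As a concrete toy example, here is one simple way such a graph could be represented in plain Python with NumPy (the shapes and variable names are purely illustrative):

```python
import numpy as np

# A toy undirected graph G = (V, E, X, W) with 4 nodes
V = [0, 1, 2, 3]                       # vertices
E = [(0, 1), (1, 2), (2, 3), (3, 0)]   # edges
X = np.random.randn(len(V), 8)         # node attributes: one 8-dim vector per node
W = np.random.randn(len(E), 3)         # edge attributes: one 3-dim vector per edge

# Adjacency list, handy for looking up the neighborhood N(v) later on
neighbors = {v: [] for v in V}
for u, v in E:
    neighbors[u].append(v)
    neighbors[v].append(u)
```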

Message Passing Paradigm

This type of network follows an iterative scheme of updating node representations based on information aggregated from nearby nodes. Here's how it works:
Suppose $h^{(t)}_{v}$ represents the node embedding for some vertex $v$ at iteration $t$. The paradigm can then be broken down into three parts (a minimal code sketch follows the list):
  1. Initialization
    $$h^{(0)}_{v} = x_v \quad \forall v \in V$$
    
  2. Aggregation
    $$m^{(t)}_{v} = f^{(t)}_{\text{Agg}}\left(h^{(t-1)}_{v}, \{ h^{(t-1)}_{u} : u \in \mathcal{N}(v) \}\right) \quad 1 \leq t < T$$
    
  3. Transformation
    $$h^{(t)}_{v} = f_{\text{Update}}\left(h^{(t-1)}_{v}, m^{(t)}_{v}\right)$$
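Putting the three steps together, here is a minimal NumPy sketch of the paradigm, reusing the `neighbors` adjacency dict from the toy graph above; a fixed mean aggregation and a fixed tanh update stand in for the learnable $f_{\text{Agg}}$ and $f_{\text{Update}}$ of a real model:

```python
import numpy as np

def message_passing(X, neighbors, T=3):
    """Run T rounds of message passing over node features X (shape [num_nodes, d])."""
    # 1. Initialization: h_v^(0) = x_v
    h = {v: X[v] for v in neighbors}

    for t in range(1, T):
        m, h_new = {}, {}
        for v, nbrs in neighbors.items():
            # 2. Aggregation: f_Agg(h_v^(t-1), {h_u^(t-1) : u in N(v)})
            #    Here: a permutation-invariant mean over v and its neighbors
            m[v] = np.stack([h[v]] + [h[u] for u in nbrs]).mean(axis=0)
            # 3. Transformation: f_Update(h_v^(t-1), m_v^(t))
            #    Here: a fixed nonlinearity; real models use a learned weight matrix
            h_new[v] = np.tanh(h[v] + m[v])
        h = h_new
    return h

final_embeddings = message_passing(X, neighbors)  # uses X and neighbors from the snippet above
```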
    
The aggregation function plays a very important role here and has driven much of the research over the past couple of years. It is usually permutation invariant over the neighborhood, so the ordering of neighbors doesn't affect the result (and the GNN layer as a whole is permutation equivariant). Its main job is to gather information from a node's neighborhood and pass it on to the next layer.
NOTE: There's a slight difference in terminology here. The GNN depth refers to how many hops we take from a particular node: a 1-hop depth looks at all nodes that are one edge away, while a 2-hop depth covers all nodes (subgraphs) that are two edges away from the node under consideration. One iteration, i.e. aggregating over a t-hop neighborhood, is termed a layer.
💡 In the literature, the same scheme is usually expressed more compactly as:
$$h_u = \phi\left( x_u , \bigoplus_{v \in \mathcal{N}(u)} \psi(x_u, x_v)\right)$$
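Libraries such as PyTorch Geometric expose exactly this compact form: you implement $\psi$ as a message function, pick $\oplus$ as the aggregation, and implement $\phi$ as an update. Below is a rough sketch along those lines (the layer name and dimensions are made up for illustration, and it assumes PyTorch Geometric is installed):

```python
import torch
from torch_geometric.nn import MessagePassing

class PhiPsiConv(MessagePassing):
    def __init__(self, in_dim, out_dim):
        super().__init__(aggr='add')                            # ⊕: a permutation-invariant sum
        self.psi = torch.nn.Linear(2 * in_dim, out_dim)         # ψ(x_u, x_v)
        self.phi = torch.nn.Linear(in_dim + out_dim, out_dim)   # φ(x_u, aggregated messages)

    def forward(self, x, edge_index):
        # x: [num_nodes, in_dim], edge_index: [2, num_edges]
        return self.propagate(edge_index, x=x)

    def message(self, x_i, x_j):
        # ψ applied to every (receiver x_i, sender x_j) pair along an edge
        return self.psi(torch.cat([x_i, x_j], dim=-1))

    def update(self, aggr_out, x):
        # φ combines the node's own features with the aggregated neighborhood message
        return self.phi(torch.cat([x, aggr_out], dim=-1))
```

Swapping `aggr='add'` for `'mean'` or `'max'` changes $\oplus$ without touching the rest of the layer.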

Let's have a look at some aggregation and transformation functions from popular Message Passing-based Graph Neural Networks:
  • In the foundational paper "Semi-Supervised Classification with Graph Convolutional Networks" by Thomas N. Kipf and Max Welling, they used a mean aggregation and a simple MLP (both aggregation schemes in this list are sketched in code below):
    $$\begin{aligned} m_{v}^{(t)} &= \frac{1}{|\mathcal{N}(v)| + 1} \sum_{u\in\mathcal{N}(v)} h^{(t)}_{u} \\ h^{(t+1)}_{v} &= \sigma\left(W^{(t)} \cdot m^{(t)}_{v}\right) \end{aligned}$$
    
  • In "How Powerful are Graph Neural Networks?" by Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka, they used simple summation and an MLP:
    $$\begin{aligned} m_{v}^{(t)} &= \sum_{u\in\mathcal{N}(v)} h^{(t)}_{u} \\ h^{(t+1)}_{v} &= \sigma\left(W^{(t)} \cdot m^{(t)}_{v}\right) \end{aligned}$$
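To make the difference between the two aggregations concrete, here is a rough dense-matrix sketch of one layer of each in PyTorch; `A` is the adjacency matrix and `H` the current node embeddings, and the code mirrors the simplified equations above rather than the exact formulations in the papers:

```python
import torch

def gcn_style_layer(H, A, W):
    """Simplified Kipf & Welling layer: neighbor sum scaled by 1 / (|N(v)| + 1), then a linear map."""
    deg = A.sum(dim=1, keepdim=True) + 1.0   # |N(v)| + 1 for every node
    M = (A @ H) / deg                        # mean-style aggregation m_v
    return torch.relu(M @ W)                 # sigma(W * m_v), with sigma = ReLU here

def gin_style_layer(H, A, W):
    """Simplified GIN-style layer: plain neighbor sum, then a linear map."""
    M = A @ H                                # sum aggregation m_v
    return torch.relu(M @ W)

# Tiny usage example: a 4-node cycle graph with 8-dimensional embeddings
A = torch.tensor([[0., 1., 0., 1.],
                  [1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [1., 0., 1., 0.]])
H = torch.randn(4, 8)
W = torch.randn(8, 8)
print(gcn_style_layer(H, A, W).shape, gin_style_layer(H, A, W).shape)  # torch.Size([4, 8]) twice
```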
    

Summary

In this article, we learned about the Message Passing framework used in Graph Neural Networks and some of its common forms in the literature. To see the full suite of W&B features, please check out this short 5-minute guide.
If you want more reports covering graph neural networks with code implementations, let us know in the comments below or on our forum ✨!
Check out these other reports on Fully Connected covering other Graph Neural Network-based topics and ideas.