
Introduction to Graph Neural Networks

Interested in Graph Neural Networks and want a roadmap on how to get started? In this article, we'll give a brief outline of the field and share blogs and resources!
Over the past few years, there has been growing interest in the use of graph data in machine learning. Why is that? Well, graphs provide an excellent mathematical structure for representing molecules (leading to groundbreaking research like AlphaFold) and many other kinds of networks. They have also recently emerged as the key meta-structure for everything: modalities such as vision, text and speech can be seen as special cases of graphs, and so Graph Representation Learning has become increasingly important.
In this article we will provide a brief overview of the field based on the excellent survey paper Everything is Connected: Graph Neural Networks by Petar Veličković and share key ideas, important notation and links to other relevant blogs.
Source: Figure 1 of Everything is Connected: Graph Neural Networks by Petar Veličković

Important Notation


  • Let's start by defining a simple and modular description of a graph. A graph is defined as a tuple of sets $\mathcal{G} = (\mathcal{V}, \mathcal{E})$: a set of nodes $\mathcal{V}$ and a set of edges $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ consisting of pairs of nodes that are connected. Both nodes and edges can have properties which are of interest to us.
  • Every node $u \in \mathcal{V}$ is said to have a $k$-dimensional feature $x_u \in \mathbb{R}^k$. If we stack the features of all the nodes, we get the (node) feature matrix $X = [x_1, \ldots, x_{|\mathcal{V}|}]^{\top}$.
  • There are a number of ways in which we could store information about the edges $\mathcal{E}$; the most common is an adjacency matrix $A \in \mathbb{R}^{|\mathcal{V}| \times |\mathcal{V}|}$, where

$$a_{uv} = \begin{cases} 1 & (u,v) \in \mathcal{E} \\ 0 & (u,v) \notin \mathcal{E} \end{cases}$$

  • The graph structure also gives every node a notion of locality, usually described as the nodes surrounding, or in the neighborhood of, some node $u$:

$$\mathcal{N}_u = \{\, v \mid (u,v) \in \mathcal{E} \lor (v,u) \in \mathcal{E} \,\}$$

And the multiset of all neighborhood features, $X_{\mathcal{N}_u}$, can be defined as:

$$X_{\mathcal{N}_u} = \{\{\, x_v \mid v \in \mathcal{N}_u \,\}\}$$

  • We define a local function, i.e. the function which computes the features of a local region of the graph, as

$$h_u = \phi(x_u, X_{\mathcal{N}_u})$$

A short code sketch of this notation follows below.
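To make this notation concrete, here is a minimal NumPy sketch for a small toy graph. The graph, feature dimension, and helper names below are illustrative assumptions, not something prescribed by the survey:

```python
import numpy as np

# A toy undirected graph with |V| = 4 nodes (illustrative only).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
num_nodes, k = 4, 3  # k = feature dimension

# Node feature matrix X of shape (|V|, k).
X = np.random.randn(num_nodes, k)

# Adjacency matrix A, with a_uv = 1 if (u, v) is an edge, else 0.
A = np.zeros((num_nodes, num_nodes))
for u, v in edges:
    A[u, v] = A[v, u] = 1  # undirected: store both directions

def neighborhood(u):
    """N_u: all nodes v with an edge to or from u."""
    return np.nonzero(A[u])[0]

def neighborhood_features(u):
    """X_{N_u}: the (multi)set of neighbor features, stacked as rows."""
    return X[neighborhood(u)]

def phi(u):
    """A simple permutation-invariant local function: concatenate x_u with the sum of its neighbors' features."""
    return np.concatenate([X[u], neighborhood_features(u).sum(axis=0)])

h_u = phi(2)  # latent feature h_u for node u = 2
```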


Equivariance and Invariance

Most modern-day geometric deep learning relies on exploiting the underlying symmetries of our natural world. These symmetries are exploited through various properties; we shall discuss two such properties here, taking permutation as an example of a symmetry operation:
  • Invariance: "shuffling the input doesn't change the output." A function is said to be permutation invariant if shuffling the order of the inputs leaves the output unchanged. In the case of graphs, this is realised by using a permutation matrix $P$ to reorder the nodes, which permutes the adjacency matrix as $PAP^{\top}$. Thus, permutation invariance can be formalized as:

$$f(PX, PAP^{\top}) = f(X, A)$$

  • Equivariance: "shuffling the input also shuffles the output." A function is said to be permutation equivariant if shuffling the order of the inputs shuffles the output in the same way. Similarly to the above, we can formalise permutation equivariance as:

$$F(PX, PAP^{\top}) = P\,F(X, A)$$

A quick numerical check of both properties is sketched below.
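As a sanity check, the snippet below verifies the two identities numerically for a sum-pooling readout $f$ and a neighborhood-aggregation map $F$; both functions and the random graph are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3
X = rng.standard_normal((n, k))                 # node features
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.maximum(A, A.T)                          # symmetric adjacency

# A random permutation matrix P.
P = np.eye(n)[rng.permutation(n)]

def f(X, A):
    """Permutation-invariant graph readout: sum-pool the aggregated features."""
    return (A @ X).sum(axis=0)

def F(X, A):
    """Permutation-equivariant node-wise aggregation of neighbor features."""
    return A @ X

# f(PX, PAP^T) == f(X, A)      (invariance)
assert np.allclose(f(P @ X, P @ A @ P.T), f(X, A))

# F(PX, PAP^T) == P F(X, A)    (equivariance)
assert np.allclose(F(P @ X, P @ A @ P.T), P @ F(X, A))
```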

Other symmetries, such as shift and rotation, are also observed; they have profound implications in domains such as vision and molecular modeling.

Graph Neural Networks

Defining the aforementioned local function is of key importance in graph representation learning, and much of the field revolves around designing good permutation-invariant local functions $\phi$ that exhibit the desired symmetry and computational properties. Most methods can be grouped into three broad classes, namely Graph Convolutional Networks, Graph Attentional Networks and Message Passing Graph Neural Networks. To learn more, I'd recommend going through the introductory articles and key methods in each family below; a minimal code sketch of each flavor is also included under the corresponding sections.
  1. Convolutional
$$h_u = \phi\left(x_u, \bigoplus_{v \in \mathcal{N}_u} c_{vu}\, \psi(x_v)\right)$$

  2. Attentional
$$h_u = \phi\left(x_u, \bigoplus_{v \in \mathcal{N}_u} a(x_u, x_v)\, \psi(x_v)\right)$$

  3. Message Passing
$$h_u = \phi\left(x_u, \bigoplus_{v \in \mathcal{N}_u} \psi(x_u, x_v)\right)$$


Graph Convolutional Networks (GCN)
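As an illustration of the convolutional flavor, here is a minimal GCN-style layer in PyTorch with fixed normalization coefficients $c_{vu} = 1/\sqrt{d_u d_v}$ (roughly in the spirit of Kipf & Welling). The class name, dense adjacency input and hyperparameters are illustrative choices, not a reference implementation:

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Convolutional flavor: neighbors are aggregated with fixed weights c_vu."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)  # shared transform psi

    def forward(self, X, A):
        # Add self-loops so x_u itself contributes to h_u.
        A_hat = A + torch.eye(A.size(0), device=A.device)
        # Symmetric normalization: c_vu = 1 / sqrt(d_u * d_v).
        deg = A_hat.sum(dim=1)
        D_inv_sqrt = torch.diag(deg.pow(-0.5))
        A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
        # Sum-aggregate transformed neighbor features, then apply the update phi (here: ReLU).
        return torch.relu(A_norm @ self.linear(X))

# Usage: X is (num_nodes, in_dim), A is a dense (num_nodes, num_nodes) adjacency.
X = torch.randn(4, 8)
A = torch.tensor([[0., 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]])
h = GCNLayer(8, 16)(X, A)  # -> shape (4, 16)
```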



Graph Attentional Networks (GAT)
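For the attentional flavor, here is a minimal single-head layer loosely in the spirit of GAT, where the aggregation weights $a(x_u, x_v)$ are computed from the features themselves. Again, this is a sketch with assumed names and a dense adjacency, not a reference implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Attentional flavor: aggregation weights a(x_u, x_v) are learned from the features."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)  # psi
        self.attn = nn.Linear(2 * out_dim, 1)     # scores a(x_u, x_v)

    def forward(self, X, A):
        H = self.linear(X)                        # (N, out_dim)
        N = H.size(0)
        # Pairwise attention logits from concatenated [h_u || h_v].
        h_u = H.unsqueeze(1).expand(N, N, -1)
        h_v = H.unsqueeze(0).expand(N, N, -1)
        e = F.leaky_relu(self.attn(torch.cat([h_u, h_v], dim=-1))).squeeze(-1)
        # Mask out non-edges (keeping self-loops) before a softmax over each neighborhood.
        mask = (A + torch.eye(N, device=A.device)) > 0
        e = e.masked_fill(~mask, float("-inf"))
        alpha = torch.softmax(e, dim=1)           # each row sums to 1 over N_u
        return torch.relu(alpha @ H)              # weighted sum, then the update phi

X = torch.randn(4, 8)
A = torch.tensor([[0., 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]])
h = GATLayer(8, 16)(X, A)  # -> shape (4, 16)
```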



Message Passing Graph Neural Networks (MPGNN)
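Finally, a minimal message-passing layer, where a message function $\psi$ acts on both the sender and receiver features before aggregation. The edge-list format and all names below are assumptions made for this sketch:

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """Message-passing flavor: messages psi(x_u, x_v) are computed per edge, then aggregated."""

    def __init__(self, in_dim, msg_dim, out_dim):
        super().__init__()
        self.psi = nn.Sequential(nn.Linear(2 * in_dim, msg_dim), nn.ReLU())        # message function
        self.phi = nn.Sequential(nn.Linear(in_dim + msg_dim, out_dim), nn.ReLU())  # update function

    def forward(self, X, edge_index):
        # edge_index is a (2, num_edges) tensor of (sender v, receiver u) pairs.
        v, u = edge_index
        # Compute a message for every edge from the pair (x_u, x_v).
        messages = self.psi(torch.cat([X[u], X[v]], dim=-1))
        # Aggregate messages per receiver node with a sum (the oplus operator).
        agg = torch.zeros(X.size(0), messages.size(-1), device=X.device)
        agg.index_add_(0, u, messages)
        # Update: h_u = phi(x_u, aggregated messages).
        return self.phi(torch.cat([X, agg], dim=-1))

X = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 2, 2], [1, 0, 0, 3]])  # edges v -> u
h = MessagePassingLayer(8, 16, 16)(X, edge_index)  # -> shape (4, 16)
```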



Summary

In this article we attempted to provide a simple notation for graphs and introduce key methods in Graph Representation Learning. We also looked at some important concepts such as Permutation Invariance and Equivariance.
If you want more reports covering graph neural networks with code implementations, let us know in the comments below or on our community discord ✨!
Check out these other reports on Fully Connected covering other Graph Neural Network-based topics and ideas.
To see the full suite of W&B features, please check out this short 5-minute guide.

Further Reading

