
Graph Neural Networks (GNNs) with Learnable Structural and Positional Representations

An in-depth breakdown of "Graph Neural Networks with Learnable Structural and Positional Representations" by Vijay Prakash Dwivedi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio and Xavier Bresson.
Created on January 21 | Last edited on February 24

Read the Paper | Official Implementation | W&B Implementation | W&B Dashboard




🧐 Motivation

Most GNNs are designed with a message-passing mechanism that builds node representations by aggregating local neighbourhood information. This class of GNNs is fundamentally structural, i.e. a node's representation depends only on the local structure of the graph.
As a consequence, most nodes in a graph do not carry any canonical positional information. This lack of positional information is a significant limitation: it lowers the representation power of the model, which cannot differentiate between simple graph symmetries.
Figure 1: The typical structure of a Graph Neural Network (GNN). Taking a molecule as an example, the node features $h_i, h_j$ can represent the atom type (e.g. Hydrogen or Oxygen) and the edge features $e_{ij}$ can represent the bond type (e.g. covalent or ionic).
For example, two atoms in a molecule with the same neighbourhood are expected to have similar representations. However, this can be limiting: the positions of the two atoms in the molecule are distinct, and their roles may be different. As a consequence, popular MP-GNNs (Message Passing Graph Neural Networks) fail to differentiate two nodes with the same 1-hop local structure. This is now well understood in the context of the equivalence of MP-GNNs with the Weisfeiler-Lehman (WL) test for graph isomorphism.
To mitigate these issues, the authors propose a new framework called LSPE (Learnable Structural and Positional Embeddings) that can be used with any MP-GNN to learn both structural and positional representations at the same time.
These limitations can be partially alleviated by:
  1. Stacking multiple layers: this helps propagate information from a node across multiple hops, but remains deficient for long-distance nodes because of "over-squashing".
  2. Applying higher-order GNNs: these compute higher-order node-tuple aggregations, as in WL-based GNNs, though such models are computationally expensive.
  3. Considering PE for nodes/edges: this provides global positional attributes for the nodes in a graph and can help the network identify sub-structures.

🙇‍♂️ Method


Standard Message Passing Graph Neural Networks (MP-GNNs)

A standard Message Passing Graph Neural Network layer can be written as follows:
$$
\begin{aligned}
h^{l+1}_{i} &= f_h\big(\, h^{l}_{i},\ \{ h^l_j \}_{j \in \mathcal{N}_i},\ e^l_{ij} \,\big) \\
e^{l+1}_{ij} &= f_e\big(\, h^l_i,\ h^l_j,\ e^l_{ij} \,\big)
\end{aligned}
$$

where $h^{l+1}_i, h^l_i$ are the node structural embeddings, $\mathcal{N}_i$ denotes the neighbourhood of node $i$, $e^{l+1}_{ij}, e^l_{ij}$ are the edge features, and $f_h, f_e$ are functions with learnable parameters.
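To make the abstract update concrete, here is a minimal sketch of one such layer in PyTorch. This is not the authors' implementation: $f_h$ and $f_e$ are taken to be small MLPs, and the neighbourhood is aggregated with a simple sum of edge-conditioned messages, which is just one of many valid choices.

```python
# A minimal sketch (not the authors' exact architecture) of one MP-GNN layer
# implementing h_i^{l+1} = f_h(h_i, {h_j}, e_ij) and e_ij^{l+1} = f_e(h_i, h_j, e_ij).
import torch
import torch.nn as nn


class MPGNNLayer(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        # f_h and f_e are assumed to be small MLPs here; the paper leaves them
        # architecture-specific (GatedGCN, PNA, Transformer, ...).
        self.f_h = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d))
        self.f_e = nn.Sequential(nn.Linear(3 * d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, h, e, edge_index):
        # h: [N, d] node embeddings, e: [E, d] edge embeddings,
        # edge_index: [2, E] holding (source j, target i) for every edge
        src, dst = edge_index
        # message from neighbour j to node i, modulated by the edge feature e_ij
        msg = torch.relu(h[src] + e)
        # sum-aggregate the messages over the neighbourhood N_i
        agg = torch.zeros_like(h).index_add_(0, dst, msg)
        h_new = self.f_h(torch.cat([h, agg], dim=-1))             # h_i^{l+1}
        e_new = self.f_e(torch.cat([h[dst], h[src], e], dim=-1))  # e_ij^{l+1}
        return h_new, e_new
```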

Input Features and Initialization

The initial node and edge embeddings are produced by a linear embedding of the available input node and edge features.
$$
\begin{aligned}
h^{l=0}_i &= LL_h(h^{in}_{i}) = A^{0} h^{in}_i + a^0 \ \in \mathbb{R}^d \\
e^{l=0}_{ij} &= LL_e(e^{in}_{ij}) = B^0 e^{in}_{ij} + b^0 \ \in \mathbb{R}^d
\end{aligned}
$$

where:
  • $h^{in}_i \in \mathbb{R}^{d_v}$ and $e^{in}_{ij} \in \mathbb{R}^{d_e}$ are the raw input node and edge features,
  • $A^0 \in \mathbb{R}^{d \times d_v}$, $B^0 \in \mathbb{R}^{d \times d_e}$, and $a^0, b^0 \in \mathbb{R}^d$ are learnable.
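In code, this initialization is nothing more than two linear layers, one for nodes and one for edges. A minimal sketch follows; the feature dimensions are illustrative assumptions, not tied to a particular dataset.

```python
# A minimal sketch of the input embedding step: LL_h and LL_e are plain linear
# layers mapping raw node/edge features into R^d.
import torch
import torch.nn as nn

d_v, d_e, d = 28, 4, 64           # illustrative feature/embedding sizes
h_in = torch.randn(10, d_v)       # 10 nodes with raw features h_i^in
e_in = torch.randn(30, d_e)       # 30 edges with raw features e_ij^in

LL_h = nn.Linear(d_v, d)          # implements A^0 h_i^in + a^0
LL_e = nn.Linear(d_e, d)          # implements B^0 e_ij^in + b^0

h0, e0 = LL_h(h_in), LL_e(e_in)   # h_i^{l=0}, e_ij^{l=0} in R^d
```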

Positional Encoding



Existing MP-GNNs incorporate Positional Encodings (PE) into the input node features by concatenation.
$$
\begin{aligned}
h^{l+1}_{i} &= f_h\big(\, h^{l}_{i},\ \{ h^l_j \}_{j \in \mathcal{N}_i},\ e^l_{ij} \,\big) \\
e^{l+1}_{ij} &= f_e\big(\, h^l_i,\ h^l_j,\ e^l_{ij} \,\big)
\end{aligned}
$$

with the initialization:
$$
\begin{aligned}
h^{l=0}_{i} &= LL_h\!\left(\begin{bmatrix} h^{in}_{i} \\ p^{in}_{i} \end{bmatrix}\right) = D^0 \begin{bmatrix} h^{in}_{i} \\ p^{in}_i \end{bmatrix} + d^0 \\
e^{l=0}_{ij} &= LL_e (e^{in}_{ij}) = B^0\, e^{in}_{ij} + b^0
\end{aligned}
$$
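Below is a hedged sketch of how such an input-level PE is typically obtained and attached: we compute the first $k$ non-trivial eigenvectors of the symmetric-normalized graph Laplacian (the Laplacian PE used in the paper) and concatenate them with the raw node features before the first linear embedding. The dense-matrix eigendecomposition and the toy random graph are simplifying assumptions for illustration only.

```python
# A minimal sketch of Laplacian PE concatenated to the input node features.
import torch
import torch.nn as nn


def laplacian_pe(adj: torch.Tensor, k: int) -> torch.Tensor:
    # adj: [N, N] dense adjacency matrix; returns the k smallest non-trivial eigenvectors
    deg = adj.sum(dim=1)
    d_inv_sqrt = torch.where(deg > 0, deg.pow(-0.5), torch.zeros_like(deg))
    lap = torch.eye(adj.size(0)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigval, eigvec = torch.linalg.eigh(lap)   # eigenvalues in ascending order
    return eigvec[:, 1:k + 1]                 # drop the trivial constant eigenvector


N, d_v, k, d = 8, 28, 4, 64                   # toy sizes, chosen for illustration
adj = (torch.rand(N, N) > 0.7).float()
adj = ((adj + adj.T) > 0).float().fill_diagonal_(0)   # symmetric, no self-loops
h_in = torch.randn(N, d_v)
p_in = laplacian_pe(adj, k)                   # p_i^in, fixed (not learned) here

LL_h = nn.Linear(d_v + k, d)                  # D^0 [h_i^in ; p_i^in] + d^0
h0 = LL_h(torch.cat([h_in, p_in], dim=-1))    # h_i^{l=0}
```

Note that Laplacian eigenvectors are only defined up to a sign, which is why they are usually randomly sign-flipped during training when used as fixed input PE.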


⭐️ ⭐️ Decoupling Position and Structure in MP-GNNs ⭐️ ⭐️



$$
\begin{aligned}
h^{l+1}_{i} &= f_h\!\left(\begin{bmatrix} h^l_i \\ p^l_i \end{bmatrix},\ \left\{ \begin{bmatrix} h^l_j \\ p^l_j \end{bmatrix} \right\}_{j \in \mathcal{N}_i},\ e^{l}_{ij} \right) \\
e^{l+1}_{ij} &= f_e\big(\, h^l_i,\ h^l_j,\ e^{l}_{ij} \,\big) \\
p^{l+1}_{i} &= f_p\big(\, p^l_i,\ \{ p^l_{j} \}_{j \in \mathcal{N}_i},\ e^{l}_{ij} \,\big)
\end{aligned}
$$

with the initialization:
$$
p^{l=0}_{i} = LL_p(\, p^{PE}_{i} \,) = C^0 p^{PE}_{i} + c^0 \ \in \mathbb{R}^d
$$
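The sketch below (again not the authors' exact GatedGCN-LSPE or PNA-LSPE layers) shows what the decoupling looks like in code: the structural channel `h` is updated from the concatenated $[h\,;\,p]$ vectors of the node and its neighbours, the edge channel is unchanged, and the positional channel `p` gets its own message-passing function `f_p`. The edge-gated sum aggregation and the bounded `tanh` on the positional update are design assumptions on my part; inputs are assumed to be already embedded to dimension $d$ (with `p` initialized from $LL_p(p^{PE})$ as above).

```python
# A minimal sketch of one decoupled LSPE layer with separate h, p and e channels.
import torch
import torch.nn as nn


class LSPELayer(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        self.f_h = nn.Sequential(nn.Linear(4 * d, d), nn.ReLU(), nn.Linear(d, d))
        self.f_e = nn.Sequential(nn.Linear(3 * d, d), nn.ReLU(), nn.Linear(d, d))
        self.f_p = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, h, p, e, edge_index):
        # h, p: [N, d]; e: [E, d]; edge_index: [2, E] with (source j, target i)
        src, dst = edge_index
        hp = torch.cat([h, p], dim=-1)                        # [h_i ; p_i], shape [N, 2d]

        # structural update: messages carry the neighbour's [h_j ; p_j], gated by e_ij
        msg_h = hp[src] * torch.sigmoid(e).repeat(1, 2)
        agg_h = torch.zeros_like(hp).index_add_(0, dst, msg_h)
        h_new = self.f_h(torch.cat([hp, agg_h], dim=-1))      # f_h([h_i;p_i], {[h_j;p_j]}, e_ij)

        # edge update is unchanged: f_e(h_i, h_j, e_ij)
        e_new = self.f_e(torch.cat([h[dst], h[src], e], dim=-1))

        # positional update: f_p(p_i, {p_j}, e_ij), kept separate from the h channel;
        # tanh bounds the learned PE (a design choice in this sketch)
        msg_p = p[src] * torch.sigmoid(e)
        agg_p = torch.zeros_like(p).index_add_(0, dst, msg_p)
        p_new = torch.tanh(self.f_p(torch.cat([p, agg_p], dim=-1)))
        return h_new, p_new, e_new
```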


Positional Loss



$$
\text{Loss} = \text{Loss}_{\text{Task}}\!\left( \begin{bmatrix} h^{l=L} \\ p^{l=L} \end{bmatrix} \right) + \alpha\, \text{Loss}_{\text{LapEig}}\big(p^{l=L}\big)
$$
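The report only states the combined objective above, so here is a hedged sketch of one common way to write the Laplacian-eigenvector term: a Rayleigh-quotient-style trace that pushes the learned positional vectors towards the low-frequency eigenvectors of the graph Laplacian, plus an orthonormality penalty so the $k$ learned dimensions do not collapse onto each other. The weighting (`alpha`, `lam`) and normalization follow the paper only in spirit.

```python
# A hedged sketch of the combined objective with a trace-based positional regularizer.
import torch


def positional_loss(p: torch.Tensor, lap: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    # p: [N, k] learned positional embeddings at the final layer
    # lap: [N, N] graph Laplacian of the same graph
    k = p.shape[1]
    smoothness = torch.trace(p.T @ lap @ p) / k         # small when p spans low-frequency eigenvectors
    ortho = ((p.T @ p - torch.eye(k)) ** 2).sum() / k   # keep the k learned dimensions orthonormal
    return smoothness + lam * ortho


def total_loss(task_loss, p, lap, alpha: float = 0.1):
    # Loss = Loss_Task + alpha * Loss_LapEig, as in the equation above
    return task_loss + alpha * positional_loss(p, lap)
```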

Figure 2: The proposed method, which incorporates learnable Positional Embeddings (PE) alongside the standard structural representation to learn canonical positional information.

📈 Experiments

For the purposes of this report, we use the ZINC dataset for graph regression, comparing various architectures and positional embedding types. As the plots show, the actual and learned eigenvectors for the trained models look very similar, suggesting that this methodology does lead to the learning of canonical positional information.
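If you want to reproduce this eigenvector comparison in your own W&B dashboard, a small sketch like the following works; `log_eigvec_comparison` is a hypothetical helper, and the project name is an assumption.

```python
# A small sketch of logging true-vs-learned eigenvector plots to W&B with matplotlib.
import matplotlib.pyplot as plt
import wandb

wandb.init(project="gnn-lspe")  # project name is an assumption


def log_eigvec_comparison(true_vec, learned_vec, idx: int):
    # true_vec, learned_vec: 1-D arrays for a single graph and a single PE dimension
    fig, ax = plt.subplots()
    ax.plot(true_vec, label=f"Laplacian eigenvector {idx}")
    ax.plot(learned_vec, label=f"learned positional dim {idx}")
    ax.set_xlabel("node index")
    ax.legend()
    wandb.log({f"eigvec_comparison/{idx}": wandb.Image(fig)})
    plt.close(fig)
```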



✌️ Conclusion

In this report we explored "Graph Neural Networks with Learnable Structural and Positional Representations" by Vijay Prakash Dwivedi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio and Xavier Bresson, in which the authors propose a new method to incorporate canonical positional information, initialized from Laplacian eigenvectors, into the model training paradigm.
This report will also be followed by a full Graph Neural Network series (from scratch to SOTA) in the coming months, so stay tuned to Fully Connected for that! Also check out the other reports on Fully Connected covering other hot topics in Graph Neural Networks.
If you want to cite the paper in your own research, please use the following BibTeX:
```bibtex
@article{DBLP:journals/corr/abs-2110-07875,
  author     = {Vijay Prakash Dwivedi and
                Anh Tuan Luu and
                Thomas Laurent and
                Yoshua Bengio and
                Xavier Bresson},
  title      = {Graph Neural Networks with Learnable Structural and Positional Representations},
  journal    = {CoRR},
  volume     = {abs/2110.07875},
  year       = {2021},
  url        = {https://arxiv.org/abs/2110.07875},
  eprinttype = {arXiv},
  eprint     = {2110.07875},
  timestamp  = {Fri, 22 Oct 2021 13:33:09 +0200},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2110-07875.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}
```
