
A Brief Introduction to Mixture Model Networks (MoNet)

This article provides an overview of the Mixture Model Networks (MoNet) architecture, with code examples in PyTorch Geometric and interactive visualizations using W&B.
In this article, we'll briefly go over the mixture model networks (MoNet) architecture proposed in the paper Geometric deep learning on graphs and manifolds using mixture model CNNs by Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Jan Svoboda, and Michael M. Bronstein.
This is one of the fundamental models in the graph attention network paradigm: it is inspired by work in spectral graph theory, yet offers a different definition of the convolution operation from the one used in the spectral domain. If you'd like to dig into our associated Colab, you can find that link here:
There are three main classes of graph neural network models, namely message-passing graph neural networks, graph convolutional networks, and graph attention networks. For a brief overview of the three paradigms, you can refer to the following blog posts:


Definition of MoNet

Strictly speaking, MoNet comes under the banner of spatial models as opposed to spectral models. The definition of convolution we have discussed so far in our introductory article on graph convolutional nets is based on approximations of the definition of convolution carried over from the Euclidean domain.
Arguably, the definition of convolution in spatial models is more true to its original intention: spatial methods define the convolution operation as template matching over small local "patches."
In particular, these models propose a way to construct patches and perform operations as a function of the local graph or manifold. From a wider perspective, however, the method relies on local patches and then computes a similarity between a template and each patch (template matching), so it's generally grouped under the banner of graph attention networks.

MoNet Method

Just as we explained graph attention networks (GAT) as an extension of a general formulation, we'll do the same here. The general formula for computing intermediate representations in attention-based graph neural networks is:
\huge h_u = \phi(\, x_u, \, \oplus_{v \in \mathcal{N}_u} \psi(x_u, x_v))
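
To make the roles of \psi, \oplus and \phi concrete before specialising to MoNet, here is a minimal, hypothetical sketch of this general update in plain PyTorch, using a simple sum as the aggregator \oplus. The function names and the (u, v) edge convention are illustrative choices for this sketch, not part of any library.

import torch

def attention_gnn_update(x, edge_index, psi, phi):
    # x: [num_nodes, d] node features
    # edge_index: [2, num_edges], interpreted here as (centre node u, neighbour v)
    # psi(x_u, x_v): per-edge message / attention term; phi(x_u, aggregated): node update
    u, v = edge_index
    messages = psi(x[u], x[v])                                  # [num_edges, d_msg]
    aggregated = torch.zeros(x.size(0), messages.size(1),
                             dtype=messages.dtype).index_add_(0, u, messages)
    return phi(x, aggregated)

# Illustrative placeholder choices for psi and phi:
psi = lambda x_u, x_v: x_v            # pass the neighbour's features through unchanged
phi = lambda x_u, agg: x_u + agg      # combine each node with its aggregated messages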

In the case of mixture model networks (MoNet), the attention mechanism (i.e. \psi) and the wider update rule are as follows:
\huge h_u = \displaystyle \frac{1}{|\mathcal{N}_u|} \sum_{v \in \mathcal{N}_u} \frac{1}{K} \sum_{k=1}^{K} w_k(e_{uv}) \odot W x_v

Very complicated looking, I know! The K here is the number of kernels, so we just focus on w_k(e_{uv}) \odot W x_v, where e_{uv} are the edge features. It's worth mentioning that mixture model networks introduced and studied a family of weighting functions represented as a mixture of Gaussian kernels.

Family of Functions of Gaussian Kernels

\huge w_k(e) = \exp\left( -\frac{1}{2} (e - \mu_k)^{T} \Sigma_k^{-1} (e - \mu_k) \right)
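
To make the two equations above concrete, here is a small illustrative sketch in plain PyTorch of how the Gaussian kernel weights w_k(e) and the MoNet update could be computed. It assumes a diagonal covariance \Sigma_k (a common simplification), a single shared weight matrix W exactly as written in the update rule above, and introduces mu and sigma as hypothetical learnable parameters. The actual GMMConv layer in PyTorch Geometric is more involved, so treat this purely as a reading aid for the equations.

import torch

def gaussian_kernel_weights(e, mu, sigma):
    # e:     [num_edges, dim]  edge features (pseudo-coordinates)
    # mu:    [K, dim]          kernel means
    # sigma: [K, dim]          per-dimension standard deviations (diagonal covariance assumed)
    diff = e.unsqueeze(1) - mu.unsqueeze(0)                                     # [num_edges, K, dim]
    return torch.exp(-0.5 * (diff ** 2 / sigma.unsqueeze(0) ** 2).sum(dim=-1))  # [num_edges, K]

def monet_update(x, edge_index, e, W, mu, sigma):
    # x: [num_nodes, in_dim], W: [in_dim, out_dim]
    # edge_index: [2, num_edges], interpreted here as (centre node u, neighbour v)
    u, v = edge_index
    w = gaussian_kernel_weights(e, mu, sigma)          # w_k(e_uv) for all edges and kernels
    msg = x[v] @ W                                     # W x_v for every edge
    weighted = w.mean(dim=1, keepdim=True) * msg       # (1/K) sum_k w_k(e_uv) * W x_v
    out = torch.zeros(x.size(0), msg.size(1), dtype=msg.dtype).index_add_(0, u, weighted)
    deg = torch.zeros(x.size(0), dtype=msg.dtype).index_add_(0, u, torch.ones(u.size(0), dtype=msg.dtype))
    return out / deg.clamp(min=1).unsqueeze(1)         # normalise by |N(u)|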



Implementing the MoNet Model

As with other models discussed in this series, we turn to PyTorch Geometric again for an implementation of the attention mechanism outlined above, which the library provides as GMMConv.
Let's walk through a minimal example implementation:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GMMConv


class MoNet(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        # dim=2 matches 2-dimensional edge pseudo-coordinates;
        # kernel_size=16 is the number of Gaussian kernels K.
        self.conv1 = GMMConv(in_channels, hidden_channels, dim=2, kernel_size=16)
        self.conv2 = GMMConv(hidden_channels, out_channels, dim=2, kernel_size=16)

    def forward(self, data):
        x, edge_index, edge_attr = data.x, data.edge_index, data.edge_attr
        x = F.dropout(x, p=0.5, training=self.training)
        x = F.elu(self.conv1(x, edge_index, edge_attr))
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index, edge_attr)
        return F.log_softmax(x, dim=1)

MoNet Model: Training Results

We train a few models for 50 epochs to perform node classification on the Cora dataset, using the minimal model implementation above, and report the training loss and accuracy, comparing the effect of the hidden dimension on overall performance.
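
For reference, here is a rough sketch of how such a training run could be set up with PyTorch Geometric; the exact configuration behind our logged runs isn't reproduced here. Since Cora has no native edge features, the sketch builds 2-dimensional degree-based pseudo-coordinates in the spirit of the original paper, and the hidden size of 16 is just an illustrative choice.

import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.utils import degree

dataset = Planetoid(root='data/Cora', name='Cora')
data = dataset[0]

# Build pseudo-coordinates e_uv = (deg(u)^-1/2, deg(v)^-1/2) for every edge.
row, col = data.edge_index
deg = degree(col, data.num_nodes).clamp(min=1)
data.edge_attr = torch.stack([deg[row].pow(-0.5), deg[col].pow(-0.5)], dim=-1)

model = MoNet(dataset.num_features, 16, dataset.num_classes)  # hidden dimension is illustrative
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

model.train()
for epoch in range(50):
    optimizer.zero_grad()
    out = model(data)
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

In the actual runs, the loss and accuracy at each epoch are also logged to W&B (e.g. with wandb.log), which is what produces the interactive charts in this section.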

[Interactive W&B charts: training loss and accuracy for a run set of 3 runs, comparing hidden dimensions]


Summary

In this article, we learned about the Mixture Model Networks (MoNet) architecture, along with code and interactive visualizations. To see the full suite of W&B features, please check out this short 5-minute guide.
If you want more reports covering graph neural networks with code implementations, let us know in the comments below or on our forum ✨!
Check out these other reports on Fully Connected covering other graph neural network topics and ideas.
