
A Brief Introduction to Mixture Model Networks (MoNet)

This article provides an overview of the Mixture Model Networks (MoNet) architecture, with code examples in PyTorch Geometric and interactive visualizations using W&B.
In this article, we'll briefly go over the mixture model networks (MoNet) architecture proposed in the paper Geometric deep learning on graphs and manifolds using mixture model CNNs by Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Jan Svoboda, and Michael M. Bronstein.
This is one of the fundamental models in the graph attention network paradigm: it is inspired by work in spectral graph theory, yet offers a different definition of the convolution operation from the one used in the spectral domain. If you'd like to dig into our associated Colab, you can find that link here:
There are three main classes of graph neural network models, namely message-passing graph neural networks, graph convolutional networks, and graph attention networks. For a brief overview of the three paradigms, you can refer to the following blog posts:


Definition of MoNet

Strictly speaking, MoNet comes under the banner of spatial models as opposed to spectral models. The definition of convolution we have discussed so far in our introductory article on graph convolutional nets is based on approximations of the definition of convolution carried over from the Euclidean domain.
Arguably, the definition of convolution in spatial models is more true to its original intention: spatial methods define the convolution operation as template matching over small local "patches."
In particular, these models propose a way to construct patches and perform operations as a function of the local graph or manifold. From a wider perspective, however, the method relies on local patches and then computes a similarity between a template and each patch (template matching), so it's generally grouped under the banner of graph attention networks.

MoNet Method

Just as we explained graph attention networks (GAT) as an extension of a general formulation, we'll do the same here. The general formula for computing intermediate representations in attention-based graph neural networks is:
\huge h_u = \phi(\, x_u, \, \oplus_{v \in \mathcal{N}_u} \psi(x_u, x_v))
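
To make the roles of \psi, \oplus and \phi concrete before specialising to MoNet, here is a minimal, hypothetical sketch of this general update in plain PyTorch, using a simple sum as the aggregator \oplus. The function names and the (u, v) edge convention are illustrative choices for this sketch, not part of any library.

import torch

def attention_gnn_update(x, edge_index, psi, phi):
    # x: [num_nodes, d] node features
    # edge_index: [2, num_edges], interpreted here as (centre node u, neighbour v)
    # psi(x_u, x_v): per-edge message / attention term; phi(x_u, aggregated): node update
    u, v = edge_index
    messages = psi(x[u], x[v])                                  # [num_edges, d_msg]
    aggregated = torch.zeros(x.size(0), messages.size(1),
                             dtype=messages.dtype).index_add_(0, u, messages)
    return phi(x, aggregated)

# Illustrative placeholder choices for psi and phi:
psi = lambda x_u, x_v: x_v            # pass the neighbour's features through unchanged
phi = lambda x_u, agg: x_u + agg      # combine each node with its aggregated messages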

In the case of mixture model networks (MoNet), the attention mechanism (i.e. \psi) and the wider update rule are as follows:
\huge h_u = \displaystyle \frac{1}{|\mathcal{N}_u|} \sum_{v \in \mathcal{N}_u} \frac{1}{K} \sum_{k=1}^{K} w_k(e_{uv}) \odot W x_v

Very complicated looking, I know! The K here is the number of kernels, so we just focus on w_k(e_{uv}) \odot W x_v, where e_{uv} are the edge features. It's worth mentioning that mixture model networks introduced and studied a family of weighting functions represented as a mixture of Gaussian kernels.

Family of Functions of Gaussian Kernels

\huge w_k(e) = \exp\left( -\frac{1}{2} (e - \mu_k)^{T} \Sigma_k^{-1} (e - \mu_k) \right)
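
To make the two equations above concrete, here is a small illustrative sketch in plain PyTorch of how the Gaussian kernel weights w_k(e) and the MoNet update could be computed. It assumes a diagonal covariance \Sigma_k (a common simplification), a single shared weight matrix W exactly as written in the update rule above, and introduces mu and sigma as hypothetical learnable parameters. The actual GMMConv layer in PyTorch Geometric is more involved, so treat this purely as a reading aid for the equations.

import torch

def gaussian_kernel_weights(e, mu, sigma):
    # e:     [num_edges, dim]  edge features (pseudo-coordinates)
    # mu:    [K, dim]          kernel means
    # sigma: [K, dim]          per-dimension standard deviations (diagonal covariance assumed)
    diff = e.unsqueeze(1) - mu.unsqueeze(0)                                     # [num_edges, K, dim]
    return torch.exp(-0.5 * (diff ** 2 / sigma.unsqueeze(0) ** 2).sum(dim=-1))  # [num_edges, K]

def monet_update(x, edge_index, e, W, mu, sigma):
    # x: [num_nodes, in_dim], W: [in_dim, out_dim]
    # edge_index: [2, num_edges], interpreted here as (centre node u, neighbour v)
    u, v = edge_index
    w = gaussian_kernel_weights(e, mu, sigma)          # w_k(e_uv) for all edges and kernels
    msg = x[v] @ W                                     # W x_v for every edge
    weighted = w.mean(dim=1, keepdim=True) * msg       # (1/K) sum_k w_k(e_uv) * W x_v
    out = torch.zeros(x.size(0), msg.size(1), dtype=msg.dtype).index_add_(0, u, weighted)
    deg = torch.zeros(x.size(0), dtype=msg.dtype).index_add_(0, u, torch.ones(u.size(0), dtype=msg.dtype))
    return out / deg.clamp(min=1).unsqueeze(1)         # normalise by |N(u)|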



Implementing the MoNet Model

As with other models discussed in this series, we turn to PyTorch Geometric again for an implementation of the attention mechanism outlined above, which the library provides as GMMConv.
Let's walk through a minimal example implementation:
import torch
import torch.nn.functional as F
from torch_geometric.nn import GMMConv


class MoNet(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        # dim=2 matches 2-dimensional edge pseudo-coordinates;
        # kernel_size=16 is the number of Gaussian kernels K.
        self.conv1 = GMMConv(in_channels, hidden_channels, dim=2, kernel_size=16)
        self.conv2 = GMMConv(hidden_channels, out_channels, dim=2, kernel_size=16)

    def forward(self, data):
        x, edge_index, edge_attr = data.x, data.edge_index, data.edge_attr
        x = F.dropout(x, p=0.5, training=self.training)
        x = F.elu(self.conv1(x, edge_index, edge_attr))
        x = F.dropout(x, p=0.5, training=self.training)
        x = self.conv2(x, edge_index, edge_attr)
        return F.log_softmax(x, dim=1)

MoNet Model: Training Results

We train a few models for 50 epochs to perform node classification on the Cora dataset, using the minimal model implementation above, and report the training loss and accuracy, comparing the effect of the hidden dimension on overall performance.
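
For reference, here is a rough sketch of how such a training run could be set up with PyTorch Geometric; the exact configuration behind our logged runs isn't reproduced here. Since Cora has no native edge features, the sketch builds 2-dimensional degree-based pseudo-coordinates in the spirit of the original paper, and the hidden size of 16 is just an illustrative choice.

import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.utils import degree

dataset = Planetoid(root='data/Cora', name='Cora')
data = dataset[0]

# Build pseudo-coordinates e_uv = (deg(u)^-1/2, deg(v)^-1/2) for every edge.
row, col = data.edge_index
deg = degree(col, data.num_nodes).clamp(min=1)
data.edge_attr = torch.stack([deg[row].pow(-0.5), deg[col].pow(-0.5)], dim=-1)

model = MoNet(dataset.num_features, 16, dataset.num_classes)  # hidden dimension is illustrative
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

model.train()
for epoch in range(50):
    optimizer.zero_grad()
    out = model(data)
    loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()

In the actual runs, the loss and accuracy at each epoch are also logged to W&B (e.g. with wandb.log), which is what produces the interactive charts in this section.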

[Interactive W&B charts: training loss and accuracy for a run set of 3 runs, comparing hidden dimensions]


Summary

In this article, we learned about the Mixture Model Networks (MoNet) architecture, along with code and interactive visualizations. To see the full suite of W&B features, please check out this short 5-minute guide.
If you want more reports covering graph neural networks with code implementations, let us know in the comments below or on our forum ✨!
Check out these other reports on Fully Connected covering other graph neural network topics and ideas.
