10. Deep Generative Models for Graphs
Graph Generation
The previous chapters were about encoding graph structure: generating node embeddings based on the local network neighbourhood. Graph neural networks differ from traditional neural networks in that we want the network to capture both feature information and graph structure. In this chapter we go in the other direction, from embeddings back to graphs.
- Graph encoders: graph → embeddings
- Graph decoders: embeddings → graph
Problems with Graph Generation
Our goal is to generate realistic graphs. In the earlier reports we had methods of generating null models for an existing real graph.
Synthetic graph generation is useful for understanding the graph formation process, identifying abnormal parts of a graph, predicting new structures, simulating novel graph structures and, in some cases, completing a graph when only partial graph data is available. It also has concrete use cases: generating a new kind of molecule that cures a certain disease, or, given a class of molecules (say toxic vs. non-toxic), generating new molecules that belong to the desired class.
Graph generation can be divided into two parts: identifying a good model that represents a graph, and generating a graph once we have that model.
There are two kinds of generation tasks:
- Realistic graph generation: from a purely ML point of view, all we care about is a black box that generates graphs similar to the given ones.
- Goal-directed graph generation: generate graphs that optimize a certain objective function, e.g. find a new molecule that satisfies a given property such as non-toxicity.
In drug discovery, for example, we might want to complete an incomplete molecule so that it has certain disease-fighting, therapeutic properties, or we might want to make it more soluble.
Why is graph generation hard? Naively, a graph of $n$ nodes has $n^2$ adjacency values: with 1K nodes we already need to generate 1 million values, so the output space is large and variable in size. The same graph also has many different adjacency matrix representations. Finally, graphs have complex long-range dependencies: while adding a new node or edge to an existing graph, a slight difference in edge placement might change the properties of the whole graph, so the existence of an edge may depend on the entire graph.
ML Basics for Graph Generation
In ML terms, in a graph generation task we are given a set of real graphs sampled from a data distribution $p_{data}(G)$; our goal is to capture this distribution and mimic it to generate new graphs. We need to learn a distribution $p_{model}(G)$ and also be able to sample from it.
- $p_{data}(x)$ is the data distribution, which is never known to us other than through the existing samples of graphs
- $p_{model}(x; \theta)$ is the model, parameterized by $\theta$, that we use to approximate $p_{data}(x)$
Our goal is to get $p_{model}$ close to $p_{data}$ and then sample from $p_{model}$.
We use maximum likelihood to build a model that maximises the likelihood of our data
$$\theta^* = \arg\max_\theta \; \mathbb{E}_{x \sim p_{data}} \log p_{model}(x \mid \theta)$$
The most common approach to sampling from a complex distribution is to sample $z_i$ from a simple noise distribution and then transform this noise using some function $f(\cdot)$ to get $x_i$ from our complex distribution. We use a deep neural network as $f(\cdot)$ and train it on the graph data we have.
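As a minimal sketch of this idea (the network below is just a stand-in for the learned $f(\cdot)$, and all sizes are arbitrary illustrations):

```python
import torch
import torch.nn as nn

# A stand-in for the learned transformation f(.): a small MLP mapping noise z
# from a simple distribution to a point x in the target space.
f = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))

z = torch.randn(8, 16)   # z_i sampled from a simple noise distribution N(0, I)
x = f(z)                 # x_i = f(z_i): samples from the (learned) complex distribution
```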
In this report we discuss auto-regressive generative models, which means generation happens sequentially and each next step depends on what we have generated so far. If we consider a generated graph and break its generation into a sequence of intermediate steps, then by applying the chain rule, without loss of generality, we get
$$p_{model}(x; \theta) = \prod_{t=1}^n p_{model}(x_t \mid x_1, \cdots, x_{t-1}; \theta)$$
Every next action depends on all the previous actions. Here $x_t$ is the $t$-th action, such as adding a node or adding the edges that connect this node to the graph.
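A toy numerical illustration of the chain-rule factorization, with made-up per-step conditionals:

```python
import numpy as np

# The joint probability of a sequence of actions is the product of per-step
# conditionals; the numbers below are purely illustrative.
cond_probs = [0.9, 0.7, 0.5]             # p(x_1), p(x_2|x_1), p(x_3|x_1,x_2)
joint = np.prod(cond_probs)              # p(x_1, x_2, x_3) = 0.315
log_joint = np.sum(np.log(cond_probs))   # in practice we work in log space
print(joint, np.exp(log_joint))          # both give 0.315
```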
GraphRNN
GraphRNN uses recurrent neural networks as the auto-regressive generative model: whatever we have generated so far is fed back into the RNN to produce the next step.
A sample sequential generation of a graph looks like the image below.
$\pi$ represents the ordering of the nodes, and $S^{\pi}$ the corresponding sequence of node and edge additions. If we fix the ordering of the nodes, the sequence of edges is unique for a graph. The ordering of nodes can be random.
We need two generative processes: one to add new nodes, and a second to add the edges (if any) that connect the new node to the previous nodes, i.e. a node-level generative process and an edge-level generative process. At the node level, $\pi$ is a sequence of nodes; at the edge level, $S^{\pi}$ is a sequence of sequences, where $S_n^{\pi}$ is the sequence of edges connecting node $n$ to the graph generated so far.
$$S^{\pi} = (S_1^{\pi}, S_2^{\pi}, \cdots)$$
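A minimal sketch (not the original implementation) of how $S^{\pi}$ can be read off an adjacency matrix for a chosen ordering $\pi$:

```python
import numpy as np

def edge_sequences(adj, order):
    """For every node, in the order pi, record the binary vector of edges back
    to the nodes placed before it. adj is a symmetric 0/1 adjacency matrix."""
    adj = np.asarray(adj)[np.ix_(order, order)]     # relabel nodes according to pi
    return [adj[i, :i].tolist() for i in range(1, len(order))]

# Toy example: the path graph 0-1-2 under the ordering pi = (0, 1, 2)
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
print(edge_sequences(adj, [0, 1, 2]))               # [[1], [0, 1]]
```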
The two generative processes will look like the image below with the adjacency matrix representation
We have transformed the graph generation problem into a sequence-of-sequences generation problem. We use RNNs to model both generation processes: a node-level RNN and an edge-level RNN, and the two are related. The node-level RNN generates the initial state for the edge-level RNN; the edge-level RNN generates the edges for the new node and then updates the state of the node-level RNN.
A single cell of an RNN has the following components and outputs:
- $s_t$: state of the RNN after time $t$
- $x_t$: input of the RNN at time $t$
- $y_t$: output of the RNN at time $t$
- $W, U, V$: parameters of the RNN
- $\sigma$: non-linearity
$$s_t = \sigma(W \cdot x_t + U \cdot s_{t-1}) \quad \text{and} \quad y_t = V \cdot s_t$$
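A numpy sketch of this cell update; the dimensions (input 4, hidden 8, output 1) are illustrative assumptions:

```python
import numpy as np

# A single RNN cell step matching s_t = sigma(W·x_t + U·s_{t-1}), y_t = V·s_t.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
U = rng.normal(size=(8, 8))
V = rng.normal(size=(1, 8))
sigma = np.tanh                         # one common choice of non-linearity

def rnn_cell(x_t, s_prev):
    s_t = sigma(W @ x_t + U @ s_prev)   # new hidden state
    y_t = V @ s_t                       # output at time t
    return s_t, y_t

s_t, y_t = rnn_cell(np.ones(4), np.zeros(8))
```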
We chain these RNN cells, using the output of one cell as the input of the next, to generate a sequence. For the start and end of a sequence we use special tokens, SOS and EOS: the SOS token serves as the initial state and initial input of the first cell, and we stop generating when a cell outputs EOS. If we think of $x$ and $y$ directly as nodes and edges, this is a deterministic generative model. We make it stochastic by interpreting $y_t$ as a probability: $y_t$ parameterizes a Bernoulli distribution, we sample from it, and the sample is fed as the input of the next RNN cell.
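A sketch of this stochastic generation loop; the GRU cell, the sizes and the simple "a sampled 0 means EOS" rule are assumptions for illustration, not the exact GraphRNN setup:

```python
import torch
import torch.nn as nn

# y_t is interpreted as a Bernoulli probability, a value is sampled from it,
# and the sample becomes the input of the next cell.
cell = nn.GRUCell(input_size=1, hidden_size=16)
readout = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())

x = torch.ones(1, 1)            # SOS token used as the first input
s = torch.zeros(1, 16)          # initial hidden state
sequence = []
for _ in range(20):             # cap the number of generation steps
    s = cell(x, s)
    y = readout(s)                      # y_t in (0, 1): Bernoulli parameter
    sample = torch.bernoulli(y)         # stochastic step: sample the action
    if sample.item() == 0:              # treat a sampled 0 as EOS here
        break
    sequence.append(int(sample.item()))
    x = sample                          # feed the sample as the next input
```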
Once we have the model, nodes and edges come from the outputs of the RNN cells. We train the model using the principle called teacher forcing: we initialize the RNNs with random weights, and then, instead of sampling from the output of each cell to use as input for the next cell, we feed in the actual sequence from our graph data. At each cell we also compute a loss between the output probability and the actual value in the sequence. We back-propagate through the whole sequence of RNN cells, minimizing this loss to find the parameters $W$, $U$ and $V$.
In our case we use the binary cross entropy loss $L = -\left[ y_t^* \log(y_t) + (1 - y_t^*) \log(1 - y_t) \right]$, where $y_t^*$ is the real value from the actual sequence and $y_t$ is the predicted probability.
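A minimal sketch of one teacher-forcing training step on a toy binary sequence (a stand-in for one $S_n^{\pi}$); the GRU cell and all sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Ground-truth values are fed as inputs, and the loss compares each predicted
# probability y_t with the true value y_t^* using binary cross entropy.
cell = nn.GRUCell(input_size=1, hidden_size=16)
readout = nn.Sequential(nn.Linear(16, 1), nn.Sigmoid())
optimizer = torch.optim.Adam(list(cell.parameters()) + list(readout.parameters()), lr=1e-3)

true_seq = torch.tensor([1., 0., 1., 1.])   # an observed edge sequence from the data
x = torch.ones(1, 1)                        # SOS input
s = torch.zeros(1, 16)
loss = 0.0
for y_star in true_seq:
    s = cell(x, s)
    y = readout(s)                                          # predicted probability y_t
    loss = loss + nn.functional.binary_cross_entropy(y, y_star.view(1, 1))
    x = y_star.view(1, 1)                                   # teacher forcing: feed the true value

loss.backward()                             # back-propagate through the whole sequence
optimizer.step()
```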
This is how the two sequences of RNNs look while training
One major issue with this is that any node can connect to any prior node, there might be too many steps for edge generation in edge-level RNN. We need to generate the full adjacency matrix. We solve this by using BFS node ordering. So we make sure while training on a graph, we order the nodes in a way, any new node added will have edges to the most recent added nodes. We reduce the O(n!)O(n!) number of node orderings to the number of distinct BFS orderings. We also reduce the number of previous nodes to look at. Below you can see how BFS orderings will make the GraphRNN much more tractable.
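A minimal sketch of computing such a BFS node ordering with networkx (the grid graph below is just a toy example):

```python
import networkx as nx

# Relabel nodes in the order a breadth-first search visits them, so that every
# newly added node only connects back to recently added nodes.
def bfs_order(G, start):
    order = [start]
    for _, v in nx.bfs_edges(G, start):   # edges in BFS traversal order
        order.append(v)
    return order

G = nx.grid_2d_graph(3, 3)                # toy example: a 3x3 grid graph
pi = bfs_order(G, (0, 0))
print(pi)                                 # the 9 grid nodes in BFS order
```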
Evaluating Generated Graphs
Our goal is to define the similarity of two graphs, but since there is no efficient graph isomorphism test that can be applied to any class of graphs, we use visual similarity as a basic check and graph statistics as a second test.
Visual comparison of various generative models.
As we can see, GraphRNN performs much better than other generative models on grid structures and also does well on ego networks and graphs with communities.
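To make the statistics-based check concrete, here is a minimal sketch comparing the degree distribution and average clustering coefficient of a "real" graph and a stand-in "generated" graph (both graphs below are synthetic placeholders; GraphRNN itself compares such statistics with more careful distances such as MMD):

```python
import networkx as nx
import numpy as np

real = nx.connected_caveman_graph(4, 6)             # a graph with community structure
generated = nx.gnp_random_graph(24, 0.2, seed=0)    # stand-in for a model sample

def degree_hist(G, max_degree=12):
    degrees = [d for _, d in G.degree()]
    hist, _ = np.histogram(degrees, bins=range(max_degree + 1), density=True)
    return hist

print("degree histogram (real):     ", degree_hist(real))
print("degree histogram (generated):", degree_hist(generated))
print("avg clustering:", nx.average_clustering(real), nx.average_clustering(generated))
```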
Applications
One application is learning a model that can generate valid, realistic molecules with a high value of a given chemical property. We want to generate graphs that optimize a given objective (such as drug-likeness), obey underlying rules (such as chemical valency), and are learned from realistic examples.
Graph Convolutional Policy Network (GCPN)
GCPN uses a graph neural network to capture the state of what has been generated so far; this state carries structural information and enables a validity check at each state transition. Reinforcement learning optimizes intermediate or expected rewards, and adversarial training imitates the examples of molecules in a given dataset.
Chapters
- Introduction, Structure of Graphs
- Properties of Networks and Random Graph Models
- Motifs and Structural Roles in Networks
- Community Structure in Networks
- Spectral Clustering
- Message Passing and Node Classification
- Graph Representation Learning
- Graph Neural Networks
- Graph Neural Networks - Pytorch Geometric
- Deep Generative Models for Graphs
- Link Analysis: PageRank
- Network Effects and Cascading Behaviour
- Probabilistic Contagion and Models of Influence
- Influence Maximization in Networks
- Outbreak Detection in Networks
- Network Evolution
- Reasoning over Knowledge Graphs
- Limitations of Graph Neural Networks
- Applications of Graph Neural Networks