3.Motifs and Structural Roles in Networks
Subgraphs
Subnetworks or subgraphs are the building blocks of networks. Based on the kind of subgraphs present in the network we can characterize or discriminate a network.
Below image contains all the possible(non-isomorphic) directed subgraphs of size 3
For each subgraph in the above image, our graph might over represent it or under represent it based on some significance metric. Then our graph will have a significance profile based on all the 13 possible subgraphs of size 3.
The significance profile of a graph will be 13-d vector with how well each size 3 subgraph is represented with in it.
Here you can see how significance profile of various kinds of networks looks like. And networks from same domain have similar significance profiles.
From the correlation matrix, and the significance profiles, you can see how all the gene regulation networks have the same profile, all the neuron related networks and other networks from similar domains have similar profiles and are highly correlated.
Network Motifs: Simple Building Blocks of Complex Networks R. Milo et al Superfamilies of Evolved and Designed Networks R. Milo et al
Network Motifs
Recurring, significant patterns of interconnections. Pattern is a small induced subgraph. The pattern will be significant if its found in the network more frequent than expected, i.e, in a randomly generated network that has similar properties to our own network.
We need motifs to understand how networks work. Help us predict operation and reaction of a network in a given situation.
Feed-forward loops are found in networks of neurons, where they neutralize "biological noise" Parallel loops are found in food webs Single-input modules are found in gene control networks.
Induced subgraph of a graph G is a graph, formed from a subset of X of the vertices of graph G and all of the edges connecting pairs of vertices in subset X. While calculating the frequency we allow overlapping of motifs. For a sample randomly generated graph G here is the motif frequency. Ignore the motifs with isolated nodes in this picture.
Significance of a Motif
The significance of a motif is defined by how much more it appears in real network than in a random network.
Motifs are over represented in a network when compared to randomized networks. ZiZ_i captures the statistical significance of motif i
Zi=Nireal−N‾irandstd(Nirand)\displaystyle Z_i = \frac{N_i^{real} - \overline N_i^{rand}}{std(N_i^{rand})}
NirealN_i^{real} is number of subgraphs of type i in the real network GrealG^{real}
NirandN_i^{rand} is number of subgraphs of type i in the randomized network GrandG^{rand}
And the network significance profile(SP) is defined as SPi=Zi(∑jZj2)1/2\displaystyle SP_i = \frac{Z_i}{(\sum_j{Z_j^2})^{1/2}}
SP is a vector of normalized Z-scores, SP emphasizes relative significance of subgraphs. Important for comparison of networks of different sizes. As larger networks display higher Z-scores
Random Graph Generation
Now we have to generate the randomized network based on the real network, that we can compare. We use this as the "null" model for a given real world network. So our goal is to generate a random graph with the same degree sequence k1,k2,⋯ ,knk_1, k_2, \cdots, k_n of our real network. Once we generate GrandG^rand which has the same degree sequence as GrealG^{real} we can get the significance profile.
Configuration Model/Spokes
Based on the degree of our nodes, create mini-nodes for each of our original node. If node i has degree kik_i the i node has kik_i mini nodes in them. And we randomly pair up the mini-nodes. And after all the mini-nodes are paired up. We get our resulting randomized graph by connecting our parent nodes if their respective mini nodes formed an edge. Sort of like folding in bipartite graphs. Below you can see how the nodes A, B, C and D with degrees 3, 4, 2, and 1 form a graph. During this process we ignore if there are any double edges or self-loops in the final step.
Switching
Alternative to spokes model is you just pick two edges A→BA \rarr B and C→DC \rarr D at random and exchange edges only if no multiple edges or self edges are generated. So the original edges will now be A→DA \rarr D and C→BC \rarr B
Once we repeat the switching step Q.∣E∣Q.|E| times. The resultant graph will be with nodes of same degrees but randomly rewired. Typical values of Q are 100 for the process to converge.
Variations on Motif Concept
Canonical definition:
- Directed and undirected
- Colored and uncolored
- Temporal and static motifs
Variations on the Concept
- Different frequency concepts
- Different significance metrics
- Under-representation (anti-motifs)
- Different constraints for the null model
Experiments: Detecting Motifs
In this paper Network Motifs: Simple Building Blocks of Complex Networks R. Milo et al you can see Z-scores of motifs in various networks.
Graphlets: Node feature vectors
From paper Biological network comparison using graphlet degree distribution
Graphlets are connected non-isomorphic subgraphs. Below you can see graphlets from 2,3, 4 and 5 node subgraphs.
We use graphlets to obtain a node-level subgraph metric. The position of the node in the graphlet is what matters.
Extending the node degree concept, we can get a graphlet degree vector count for each node. The number of graphlets a given node touches.
Automorphism Orbits
For a node uu of graph G, the automorphism orbit of uu is Orb(u)={v∈V(G);v=f(u) for some f=Aut(G)}Orb(u) = \lbrace v \in V(G); v = f(u) \text{ for some } f = Aut(G)\rbrace
The Aut denotes an automorphism group of G, i.e., an isomorphism from G to itself.
This is a bit confusing, you can look at the image below to see how a,b,c,da, b, c, d are orbit positions for the node vv in graph GG. You assume vv in those positions and count the number of instances that graphlet touches node vv
The node vv can assume any of the positions a,b,da, b, d in the graphlets. But not cc Hence in the graphlet degree vector you put in 0 for orbit position cc
Graphlet Degree Vector (GDV)
Graphlet degree vector counts the number of graphlets that a node touches at a particular orbit position. Considering graphlets on 2 to 5 nodes we get a total of 73 orbit positions. Hence GDV will be a vector of 73 coordinates. It is a signature of a given node that describes the topology of node's neighbourhood. Captures its interconnectivities out to a distance of 4 hops.
You can compare GDV of two nodes and get a highly constraining measure of local topological similarity between them. Maybe you can use it as your node feature vector??
Finding Motifs and Graphlets
Finding size-k motifs/graphlets requires solving two challenges. Enumerating all size-k connected subgraphs and Counting number of occurrences of each subgraph type.
Just knowing if a certain subgraph exists in a graph is a hard computational problem. Subgraph isomorphism is NP-complete.
Computation time grows exponentially as the size of the motif/graphlet increases. Feasible motif size is usually small (3 to 8)
Algorithms
- Exact subgraph enumeration (ESU) [Wernicke 2006]
- Kavosh [Kashani 2009]
- Subgraph sampling [Kashtan]
Exact Subgraph Enumeration(ESU)
TODO
Structural roles in networks
Roles are functions of nodes in a Network. Roles are species in ecosystem networks, individuals in companies. And roles are measured by structural behaviours.
- Centers of stars
- members of cliques
- peripheral nodes, etc.
Roles and groups/communities are not the same in a network. A collection of nodes that have similar positions in a network. Roles are based on the similarity of ties between subsets of nodes. Group or a community is formed based on adjacency, proximity or reachability. Nodes with the same role need not be in direct, or even indirect interaction with each other. Two different communities might have two different nodes doing the same role.
Roles →\rarr group of nodes with similar structural properties Community →\rarr group of nodes that are well connected to each other.
Roles and communities are complimentary.
Consider the social network of a University. Roles: Professors, Staff, Students Communities: CS Dept, EE Dept and etc.
Formal Definition of Roles
Structural equivalence: Nodes uu and vv are structurally equivalent if they have the same relationships to all other nodes.
Nodes uu and vv are structurally equivalent, for all the other nodes kk, node uu has tie to kk iff node vv has tie to kk
TODO: umm what is tie here? need more clarity
Node 3 and 4 are structurally equivalent in the below graph
Discovering Structural Roles in Networks
Following are few ml problems where figuring out roles can be useful
Task | Example Application |
---|---|
Role Query | Identify individuals with similar behaviour to a known target |
Role Outliers | Identify individuals with unusual behaviour |
Role Dynamics | Identify unusual changes in behaviour |
Identity Resolution | Identify, de-anonymize, individuals in a network |
Role Transfer | Use knowledge of one network to make predictions in another |
Network Comparison | Compute similarity of networks, determine compatibility for knowledge transfer |
RolX
Automatic discovery of nodes' structural roles in networks [Henderson 2011b]
This is an unsupervised learning approach, with no requirement of prior knowledge, assigns a mixed-membership of roles to each node. Also scales linearly wrt no of edges.
Gist of this method is generate node features, and apply clustering algos on these node features to extract roles distribution of each node.
Node features can be degree, mean weight, no of edges in egonet, mean clustering coefficient of neighbours, no of edges leaving/entering egonet.
Neighbourhood features: What is a node's connectivity pattern?
Recursive features: To what kind of nodes is a node connected?
Aggregate features of a node and use them to generate new recursive features.
- Local features
- Egonet features
- Recursive features
Use set of current node features to generate additional features, recursive features grow exponentially with each recursive iteration. Prune highly correlated features.
Rolx is good old machine learning, where you do feature engineering on nodes and classify them into different roles using various unsupervised techniques.
Chapters
- Introduction, Structure of Graphs
- Properties of Networks and Random Graph Models
- Motifs and Structural Roles in Networks
- Community Structure in Networks
- Spectral Clustering
- Message Passing and Node Classification
- Graph Representation Learning
- Graph Neural Networks
- Graph Neural Networks - Pytorch Geometric
- Deep Generative Models for Graphs
- Link Analysis: PageRank
- Network Effects and Cascading Behaviour
- Probabilistic Contagion and Models of Influence
- Influence Maximization in Networks
- Outbreak Detection in Networks
- Network Evolution
- Reasoning over Knowledge Graphs
- Limitations of Graph Neural Networks
- Applications of Graph Neural Networks