Skip to main content

3.Motifs and Structural Roles in Networks

Created on December 21|Last edited on January 13

Subgraphs

Subnetworks or subgraphs are the building blocks of networks. Based on the kind of subgraphs present in the network we can characterize or discriminate a network.

Below image contains all the possible(non-isomorphic) directed subgraphs of size 3

subgraphs of size 3

For each subgraph in the above image, our graph might over represent it or under represent it based on some significance metric. Then our graph will have a significance profile based on all the 13 possible subgraphs of size 3.

The significance profile of a graph will be 13-d vector with how well each size 3 subgraph is represented with in it.

Here you can see how significance profile of various kinds of networks looks like. And networks from same domain have similar significance profiles.

significance profiles of various networks

Correlation matrix

From the correlation matrix, and the significance profiles, you can see how all the gene regulation networks have the same profile, all the neuron related networks and other networks from similar domains have similar profiles and are highly correlated.

Network Motifs: Simple Building Blocks of Complex Networks R. Milo et al Superfamilies of Evolved and Designed Networks R. Milo et al



Network Motifs

Recurring, significant patterns of interconnections. Pattern is a small induced subgraph. The pattern will be significant if its found in the network more frequent than expected, i.e, in a randomly generated network that has similar properties to our own network.

We need motifs to understand how networks work. Help us predict operation and reaction of a network in a given situation.

Feed-forward loops are found in networks of neurons, where they neutralize "biological noise" Parallel loops are found in food webs Single-input modules are found in gene control networks.

Induced subgraph of a graph G is a graph, formed from a subset of X of the vertices of graph G and all of the edges connecting pairs of vertices in subset X. While calculating the frequency we allow overlapping of motifs. For a sample randomly generated graph G here is the motif frequency. Ignore the motifs with isolated nodes in this picture.

Significance of a Motif

The significance of a motif is defined by how much more it appears in real network than in a random network.

Motifs are over represented in a network when compared to randomized networks. ZiZ_i captures the statistical significance of motif i

Zi=Nireal−N‾irandstd(Nirand)\displaystyle Z_i = \frac{N_i^{real} - \overline N_i^{rand}}{std(N_i^{rand})}

NirealN_i^{real} is number of subgraphs of type i in the real network GrealG^{real}

NirandN_i^{rand} is number of subgraphs of type i in the randomized network GrandG^{rand}

And the network significance profile(SP) is defined as SPi=Zi(∑jZj2)1/2\displaystyle SP_i = \frac{Z_i}{(\sum_j{Z_j^2})^{1/2}}

SP is a vector of normalized Z-scores, SP emphasizes relative significance of subgraphs. Important for comparison of networks of different sizes. As larger networks display higher Z-scores

Random Graph Generation

Now we have to generate the randomized network based on the real network, that we can compare. We use this as the "null" model for a given real world network. So our goal is to generate a random graph with the same degree sequence k1,k2,⋯ ,knk_1, k_2, \cdots, k_n of our real network. Once we generate GrandG^rand which has the same degree sequence as GrealG^{real} we can get the significance profile.

Configuration Model/Spokes

Based on the degree of our nodes, create mini-nodes for each of our original node. If node i has degree kik_i the i node has kik_i mini nodes in them. And we randomly pair up the mini-nodes. And after all the mini-nodes are paired up. We get our resulting randomized graph by connecting our parent nodes if their respective mini nodes formed an edge. Sort of like folding in bipartite graphs. Below you can see how the nodes A, B, C and D with degrees 3, 4, 2, and 1 form a graph. During this process we ignore if there are any double edges or self-loops in the final step. spokes

Switching

Alternative to spokes model is you just pick two edges A→BA \rarr B and C→DC \rarr D at random and exchange edges only if no multiple edges or self edges are generated. So the original edges will now be A→DA \rarr D and C→BC \rarr B

Once we repeat the switching step Q.∣E∣Q.|E| times. The resultant graph will be with nodes of same degrees but randomly rewired. Typical values of Q are 100 for the process to converge.

Variations on Motif Concept

Canonical definition:

  • Directed and undirected
  • Colored and uncolored
  • Temporal and static motifs

Variations on the Concept

  • Different frequency concepts
  • Different significance metrics
  • Under-representation (anti-motifs)
  • Different constraints for the null model

Experiments: Detecting Motifs

In this paper Network Motifs: Simple Building Blocks of Complex Networks R. Milo et al you can see Z-scores of motifs in various networks.

experiments



Graphlets: Node feature vectors

From paper Biological network comparison using graphlet degree distribution

Graphlets are connected non-isomorphic subgraphs. Below you can see graphlets from 2,3, 4 and 5 node subgraphs.

graphlets

We use graphlets to obtain a node-level subgraph metric. The position of the node in the graphlet is what matters.

Extending the node degree concept, we can get a graphlet degree vector count for each node. The number of graphlets a given node touches.

Automorphism Orbits

For a node uu of graph G, the automorphism orbit of uu is Orb(u)={v∈V(G);v=f(u) for some f=Aut(G)}Orb(u) = \lbrace v \in V(G); v = f(u) \text{ for some } f = Aut(G)\rbrace

The Aut denotes an automorphism group of G, i.e., an isomorphism from G to itself. This is a bit confusing, you can look at the image below to see how a,b,c,da, b, c, d are orbit positions for the node vv in graph GG. You assume vv in those positions and count the number of instances that graphlet touches node vv orbit positions The node vv can assume any of the positions a,b,da, b, d in the graphlets. But not cc Hence in the graphlet degree vector you put in 0 for orbit position cc graphlet degree vector

Graphlet Degree Vector (GDV)

Graphlet degree vector counts the number of graphlets that a node touches at a particular orbit position. Considering graphlets on 2 to 5 nodes we get a total of 73 orbit positions. Hence GDV will be a vector of 73 coordinates. It is a signature of a given node that describes the topology of node's neighbourhood. Captures its interconnectivities out to a distance of 4 hops.

You can compare GDV of two nodes and get a highly constraining measure of local topological similarity between them. Maybe you can use it as your node feature vector??



Finding Motifs and Graphlets

Finding size-k motifs/graphlets requires solving two challenges. Enumerating all size-k connected subgraphs and Counting number of occurrences of each subgraph type.

Just knowing if a certain subgraph exists in a graph is a hard computational problem. Subgraph isomorphism is NP-complete.

Computation time grows exponentially as the size of the motif/graphlet increases. Feasible motif size is usually small (3 to 8)

Algorithms

  • Exact subgraph enumeration (ESU) [Wernicke 2006]
  • Kavosh [Kashani 2009]
  • Subgraph sampling [Kashtan]

Exact Subgraph Enumeration(ESU)

TODO



Structural roles in networks

Roles are functions of nodes in a Network. Roles are species in ecosystem networks, individuals in companies. And roles are measured by structural behaviours.

  • Centers of stars
  • members of cliques
  • peripheral nodes, etc.

Roles and groups/communities are not the same in a network. A collection of nodes that have similar positions in a network. Roles are based on the similarity of ties between subsets of nodes. Group or a community is formed based on adjacency, proximity or reachability. Nodes with the same role need not be in direct, or even indirect interaction with each other. Two different communities might have two different nodes doing the same role.

Roles →\rarr group of nodes with similar structural properties Community →\rarr group of nodes that are well connected to each other.

Roles and communities are complimentary.

Consider the social network of a University. Roles: Professors, Staff, Students Communities: CS Dept, EE Dept and etc.

Roles vs Communities

Formal Definition of Roles

Structural equivalence: Nodes uu and vv are structurally equivalent if they have the same relationships to all other nodes.

Nodes uu and vv are structurally equivalent, for all the other nodes kk, node uu has tie to kk iff node vv has tie to kk

TODO: umm what is tie here? need more clarity

Node 3 and 4 are structurally equivalent in the below graph roles smol graph



Discovering Structural Roles in Networks

Following are few ml problems where figuring out roles can be useful

TaskExample Application
Role QueryIdentify individuals with similar behaviour to a known target
Role OutliersIdentify individuals with unusual behaviour
Role DynamicsIdentify unusual changes in behaviour
Identity ResolutionIdentify, de-anonymize, individuals in a network
Role TransferUse knowledge of one network to make predictions in another
Network ComparisonCompute similarity of networks, determine compatibility for knowledge transfer

RolX

Automatic discovery of nodes' structural roles in networks [Henderson 2011b]

This is an unsupervised learning approach, with no requirement of prior knowledge, assigns a mixed-membership of roles to each node. Also scales linearly wrt no of edges.

Gist of this method is generate node features, and apply clustering algos on these node features to extract roles distribution of each node.

Node features can be degree, mean weight, no of edges in egonet, mean clustering coefficient of neighbours, no of edges leaving/entering egonet.

Neighbourhood features: What is a node's connectivity pattern?

Recursive features: To what kind of nodes is a node connected?

Aggregate features of a node and use them to generate new recursive features.

  • Local features
  • Egonet features
  • Recursive features

Use set of current node features to generate additional features, recursive features grow exponentially with each recursive iteration. Prune highly correlated features.

Rolx is good old machine learning, where you do feature engineering on nodes and classify them into different roles using various unsupervised techniques.