4. Community Structure in Networks
Communities
We studied roles, which cluster nodes based on their structural position within the network, even though those nodes might be in very different parts of the network. Communities, on the other hand, or the notion of community structure in networks, is essentially about identifying sets of nodes that are densely connected with each other. Our goal is to identify partitions, groups, or communities that are tightly knit within the network.
Granovetter
Now we will try to understand how information flows through the network. Mark Granovetter, a social scientist, found that people learn about new jobs mostly from acquaintances rather than close friends. One simple explanation is that people usually have more acquaintances than friends. But is there a better, more general explanation for this phenomenon?
The structural perspective on friendship is that if two nodes have a common friend, they are more likely to become friends themselves. On top of that, the interpersonal perspective is that friendship between two people can be strong or weak. Structurally embedded edges are also socially strong, with strong bonds, while long-range edges spanning different parts of the network are socially weak. But structurally embedded edges carry redundant information, whereas long-range edges let us gather diverse information from different parts of the network.
- Clustered nodes with strong edges → strong friendship
- Weak bonds to distant parts of the network → more diverse information
Triadic closure implies that two nodes having a common friend are more likely to become friends themselves, and hence leads to a high clustering coefficient.
Another empirical study, by Bearman and Moody, shows that teenage girls with a low clustering coefficient are more likely to contemplate suicide.
Granovetter's observations came from an empirical study in the 1960s, but a more recent study from 2007 by Onnela et al. examined the phone call network of 20% of some European country's population, with phone call frequency as edge weights. They define the edge overlap $O_{ij}$ between connected nodes $i$ and $j$ as the ratio of the number of neighbors that $i$ and $j$ have in common to the total number of nodes in their combined neighborhoods.
$$O_{ij} = \frac{|(N(i) \cap N(j))\backslash \lbrace i, j\rbrace|}{|(N(i) \cup N(j))\backslash \lbrace i, j\rbrace|}$$
N(i) is the set of neighbors of node i.
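As a rough illustration, edge overlap can be computed directly from adjacency sets. The helper below is a minimal sketch (the function and variable names are my own):

```python
def edge_overlap(adj, i, j):
    """O_ij: shared neighbors of i and j over their combined
    neighborhood, excluding i and j themselves."""
    ni = adj[i] - {i, j}
    nj = adj[j] - {i, j}
    union = ni | nj
    if not union:  # isolated edge: treat as a local bridge
        return 0.0
    return len(ni & nj) / len(union)

# A triangle (0, 1, 2) with a pendant node 3 hanging off node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(edge_overlap(adj, 0, 1))  # 1.0: fully embedded edge
print(edge_overlap(adj, 2, 3))  # 0.0: local bridge
```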
Overlap is zero when the edge between $i$ and $j$ is a local bridge. As $O_{ij}$ goes from 0 to 1, the edge's structural embeddedness gets stronger. Edge overlap represents how strongly an edge is embedded within the network structurally.
When edge overlap is plotted against edge strength, they found that stronger edges have higher overlap. The red dots are for the null network with permuted strengths, where the same strengths are randomly shuffled and reassigned to the existing edges.
The following plots show how the size of the giant component changes as edges are removed in order of their edge strength and their edge overlap respectively.
Red curves are for edges removed from low to high strength, and black curves for removal from high to low; likewise for the edge overlap plot. In both cases, the network decomposes faster when low-strength/low-overlap edges are removed first, and the effect is more pronounced in the edge overlap plot: as edges with low overlap keep getting removed, the giant component disappears faster. This shows how weak ties hold the whole network together and help diverse information reach all corners of the network, even though locally they appear weakly embedded. This gives some validation to Granovetter's hypothesis.
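This deletion experiment is easy to reproduce in miniature. Below is a sketch assuming networkx, reusing the karate club graph in place of the (non-public) phone-call network; overlap is scored on the original graph, as in the study:

```python
import networkx as nx

def giant_component_curve(G, score, low_first=True):
    """Remove edges in order of `score(u, v)` and track the relative
    size of the largest connected component after each removal."""
    H = G.copy()
    order = sorted(H.edges(), key=lambda e: score(*e), reverse=not low_first)
    sizes = []
    for u, v in order:
        H.remove_edge(u, v)
        sizes.append(max(len(c) for c in nx.connected_components(H)) / len(G))
    return sizes

def overlap(G, u, v):
    nu, nv = set(G[u]) - {u, v}, set(G[v]) - {u, v}
    return len(nu & nv) / len(nu | nv) if nu | nv else 0.0

G = nx.karate_club_graph()
low = giant_component_curve(G, lambda u, v: overlap(G, u, v), low_first=True)
high = giant_component_curve(G, lambda u, v: overlap(G, u, v), low_first=False)
# Both curves end at 1/34 once every edge has been removed.
print(low[-1], high[-1])
```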
Finding Network Communities
We want to detect communities given the network structure, and ideally the detected clusters would then correspond to real-world groups.
Zachary's Karate club network
This is a famous graph dataset, like MNIST for image classification, and you can find it in most Python graph libraries. It is a dataset of a karate club's 34 members and the relationships within the club. When a rivalry came up between the coach and one of the club members, the club partitioned into two groups, one going with the original coach and one with the club member. It is a popular dataset for node classification, community detection, etc.
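For example, with networkx the dataset is one call away, including each member's post-split faction as a node attribute:

```python
import networkx as nx

G = nx.karate_club_graph()
print(G.number_of_nodes(), G.number_of_edges())  # 34 78
# networkx stores each member's post-split faction as a node attribute.
print(G.nodes[0]["club"], G.nodes[33]["club"])   # Mr. Hi Officer
```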
NCAA Football clubs
Modularity
Modularity $Q$ is a measure of how well a network is partitioned into communities. A partition of a network can be thought of as a mapping from every node to its respective community. Given a partitioning of the network into disjoint groups $s \in S$:
$$Q \propto \sum_{s\in S} \big[\text{(no. of edges within } s) - \text{(expected no. of edges within } s)\big]$$
We have to build a null model to get the expected number of edges within a group $s$, similar to the spokes model we used earlier: a new network with the same degree distribution but uniformly random edge connections. Given a network $G$ on $n$ nodes and $m$ edges, we construct a rewired network $G'$. Unlike the spokes model, we allow $G'$ to be a multigraph, so two nodes might have more than one edge between them. By doing this we don't actually have to build the network $G'$; we can get the expected number of edges within group $s$ analytically.
The expected number of edges between nodes $i$ and $j$ of degrees $k_i$ and $k_j$ is $\displaystyle \frac{k_ik_j}{2m}$.
We can verify this by calculating the expected total number of edges in $G'$ and making sure it is the same as $m$:
$$\text{Total expected edges in } G' = \frac12\sum_{i \in N}\sum_{j \in N} \frac{k_ik_j}{2m} = \frac12 \cdot \frac{1}{2m}\sum_{i \in N} k_i\Big(\sum_{j \in N} k_j\Big) = \frac{1}{4m}\cdot 2m\cdot 2m = m$$
Now that we have the expected number of edges between $i$ and $j$ in the null model, the modularity of graph $G$ with partition $S$ is
$$Q(G, S) = \frac{1}{2m}\sum_{s \in S}\sum_{i \in s}\sum_{j \in s}\bigg(A_{ij} - \frac{k_ik_j}{2m}\bigg)$$
Using the indicator function $\delta(c_i, c_j)$ that returns 1 if nodes $i$ and $j$ are in the same community and 0 otherwise, we can also write the modularity $Q$ as follows.
$$Q = \frac{1}{2m} \sum_{ij} \bigg[A_{ij} - \frac{k_ik_j}{2m}\bigg]\delta(c_i, c_j)$$
We use $2m$ as a normalizing constant to keep the modularity between $-1$ and $1$. Hence $-1 \leq Q \leq 1$.
When $Q$ is positive, the number of edges within groups exceeds the expected number. A $Q$ between 0.3 and 0.7 indicates significant community structure.
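The indicator-function form of the formula can be checked with a direct translation. A minimal pure-Python sketch (names are my own), run on two triangles joined by a single edge:

```python
def modularity(adj, communities):
    """Q = (1/2m) * sum_ij [A_ij - k_i k_j / 2m] * delta(c_i, c_j).
    adj: dict node -> set of neighbors (undirected, no self-loops)."""
    deg = {u: len(vs) for u, vs in adj.items()}
    two_m = sum(deg.values())
    comm = {u: ci for ci, c in enumerate(communities) for u in c}
    q = 0.0
    for u in adj:
        for v in adj:  # the double sum visits each pair twice, as in the formula
            if comm[u] == comm[v]:
                a_uv = 1 if v in adj[u] else 0
                q += a_uv - deg[u] * deg[v] / two_m
    return q / two_m

# Two triangles joined by a single edge: a clear two-community structure.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(modularity(adj, [{0, 1, 2}, {3, 4, 5}]))  # 5/14 ≈ 0.357
```

Putting the whole graph in one community gives $Q = 0$, since the edge count then exactly matches the null-model expectation.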
Now our goal is to identify communities by maximizing modularity. Find the partition with the maximum modularity.
Louvain Algorithm
We use the Louvain algorithm to find a partition of a given network with a high modularity score; finding the partition with the maximum modularity is NP-hard. Louvain is a greedy algorithm with a run time of $O(n \log n)$ that supports weighted graphs and also provides hierarchical communities. It is a fast algorithm with high-modularity output.
The Louvain algorithm greedily maximizes modularity, starting with each node as its own community. In Phase 1, modularity is optimized by allowing local changes to node-community memberships; in Phase 2, the identified communities are aggregated into super-nodes to build a new network. These two phases are repeated iteratively until no further increase in modularity is possible.
Phase 1
We start with each node as its own distinct community. For each node $i$, the algorithm performs two calculations:
- Compute the modularity delta $\Delta Q$ of putting node $i$ into the community of some neighbor $j$
- Move $i$ to the community of the node $j$ that yields the largest gain in $\Delta Q$
We perform Phase 1 until no movement yields a gain. Technically, the order in which the nodes are visited affects the output of the algorithm, but empirically the node ordering does not have a significant effect on the overall modularity obtained. We only ever make moves that increase $\Delta Q$, and nodes only join communities of nodes they already share an edge with.
So how do we compute the $\Delta Q$ of moving node $i$ to community $C$?
It has two components: $\Delta Q(i \rarr C)$, the modularity difference from putting node $i$ into $C$, and $\Delta Q(D \rarr i)$, the modularity difference from removing node $i$ from its current community $D$.
So $\Delta Q = \Delta Q(i \rarr C) + \Delta Q(D \rarr i)$: every modularity change includes removing a node from an existing community and putting it into a different one.
$$\Delta Q(i\rarr C) = \bigg[ \frac{\sum_{in} + k_{i,in}}{2m} - \bigg(\frac{\sum_{tot} + k_i}{2m}\bigg)^2 \bigg] - \bigg[\frac{\sum_{in}}{2m} - \bigg(\frac{\sum_{tot}}{2m}\bigg)^2 - \bigg(\frac{k_i}{2m}\bigg)^2\bigg]$$
Where
- $\sum_{in}$ is the sum of link weights between nodes in community $C$
- $\sum_{tot}$ is the sum of link weights of nodes in community $C$
- $k_{i,in}$ is the sum of link weights between node $i$ and community $C$
- $k_i$ is the sum of all link weights (the degree, for an unweighted graph) of node $i$
To understand $\Delta Q(i \rarr C)$, we abstract the network into three components: the node $i$, the community $C$, and the rest of the network.
$$\bigg[ \frac{\sum_{in} + k_{i,in}}{2m} - \bigg(\frac{\sum_{tot} + k_i}{2m}\bigg)^2 \bigg] \quad \text{modularity contribution after merging node } i$$

$$\bigg[\frac{\sum_{in}}{2m} - \bigg(\frac{\sum_{tot}}{2m}\bigg)^2 - \bigg(\frac{k_i}{2m}\bigg)^2\bigg] \quad \text{modularity contribution before merging node } i$$
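The two bracketed terms translate directly into code. Below is a minimal sketch of the gain computation, with the four sums passed in as precomputed numbers (the function and argument names are my own):

```python
def delta_q_gain(sigma_in, sigma_tot, k_i, k_i_in, m):
    """Delta Q(i -> C) for moving node i into community C.
    sigma_in:  sum of link weights between nodes inside C
    sigma_tot: sum of link weights of nodes in C
    k_i:       weighted degree of node i
    k_i_in:    sum of link weights between i and nodes in C
    m:         total edge weight of the graph"""
    after = (sigma_in + k_i_in) / (2 * m) - ((sigma_tot + k_i) / (2 * m)) ** 2
    before = sigma_in / (2 * m) - (sigma_tot / (2 * m)) ** 2 - (k_i / (2 * m)) ** 2
    return after - before

print(delta_q_gain(sigma_in=10, sigma_tot=14, k_i=4, k_i_in=2, m=50))  # ≈ 0.0088
```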
TODO: more details on Phase 1 of Louvain algorithm.
Phase 2
The communities obtained in the first phase are contracted into super-nodes and the network is created accordingly.
- Super nodes are connected if there is at least one edge between the nodes of the corresponding communities.
- And the weight of the edge between the two super-nodes is the sum of the weights from all edges between their corresponding communities.
And after this new network of supernodes is constructed, we run Phase 1 again.
Notes
The Louvain algorithm is not an exact algorithm: since finding the best partition is NP-hard, we might not get the best possible partition of the graph. It is an approximate algorithm, but it is very effective and fast, and it scales to large networks.
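For reference, networkx ships an implementation of the algorithm. A minimal sketch on the karate club graph (the seed fixes the node-visiting order, which can otherwise change the result slightly):

```python
import networkx as nx

G = nx.karate_club_graph()
# Greedy modularity maximization via the Louvain method.
communities = nx.community.louvain_communities(G, seed=42)
print(len(communities))
print(nx.community.modularity(G, communities))  # typically around 0.4
```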
Detecting Overlapping Communities: BigCLAM
In reality, not all communities are disjoint: a node might belong to more than one community, which calls for a more general form of community detection. For example, your ego network on Facebook might include a friend who is in both your high school community and your university community.
For this we define a generative model for graphs based on node community affiliations, called the Community Affiliation Graph Model (AGM). Given a graph $G$, we assume that $G$ was generated by an AGM, then find the AGM parameters that could best have generated $G$; this way we discover the communities.
Community - Affiliation Graph Model (AGM)
We assume a bipartite graph with the communities $C$ as one set of nodes and the nodes $V$ of graph $G$ as the other set, with memberships $M$ as the edges between them. The AGM generates the graph $G$ from this bipartite graph. The model also has parameters $p_a, p_b, \cdots$, where $p_a$ is the probability that any two members of community $A$ have an edge between them. Once we fix the bipartite graph (nodes $V$, communities $C$, memberships $M$) and the probabilities $p_a, p_b, \cdots$, we have an Affiliation Graph Model that can generate our graph $G$. This is somewhat like folding a bipartite graph, but with edges dependent on probabilities.
Given parameters $(V, C, M, \lbrace p_c \rbrace)$:
- Nodes in community $c$ connect to each other by flipping a coin with probability $p_c$.
- Nodes that belong to multiple communities get multiple coin flips: if they miss the first time, they get another chance through the next community, so $p(u, v) = 1 - \prod_{c \in M_u \cap M_v}(1 - p_c)$. The more communities $u$ and $v$ have in common, the more likely there is an edge between them.
- If nodes $u$ and $v$ have no communities in common, then $p(u, v) = 0$. We resolve this by having a background epsilon community that every node is a member of, which makes it possible for two nodes to have an edge between them even if they share no other community.
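The generative process above can be sketched in a few lines. Here `eps` stands in for the background epsilon-community probability; the function and parameter names are my own:

```python
import random

def agm_generate(memberships, p, eps=1e-4, seed=0):
    """Generate edges under the AGM.
    memberships: dict node -> set of community ids (M)
    p:           dict community id -> within-community edge probability
    eps:         background epsilon-community probability"""
    rng = random.Random(seed)
    nodes = sorted(memberships)
    edges = set()
    for idx, u in enumerate(nodes):
        for v in nodes[idx + 1:]:
            # One coin flip per shared community, plus the epsilon community.
            miss = 1 - eps
            for c in memberships[u] & memberships[v]:
                miss *= 1 - p[c]
            if rng.random() < 1 - miss:
                edges.add((u, v))
    return edges

memberships = {0: {"A"}, 1: {"A"}, 2: {"A", "B"}, 3: {"B"}}
print(agm_generate(memberships, p={"A": 0.9, "B": 0.9}))
```

With `eps=0` and `p_c=1`, shared-community pairs are always linked and all other pairs never are, which matches the coin-flip description directly.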
The flexibility of AGM allows us to have all kinds of communities, with no overlap, with overlap, or hierarchical.
So now, given a graph $G$, our goal is to find the model $F$ with the above parameters $(V, C, M, \lbrace p_c \rbrace)$: the affiliation graph $M$, the number of communities $C$, and the parameters $p_c$.
We fit the model $F$ to our graph $G$ and find the parameters using maximum likelihood estimation.
Given the real graph $G$, find the model/parameters $F = \argmax_F P(G|F)$. For this we need to be able to compute the probability $P(G|F)$, and then search over $F$ for the best possible fit using techniques like gradient descent.
Graph likelihood $P(G|F)$
Given $G$ and $F$, we calculate the likelihood that $F$ generated $G$: $P(G|F)$.
Say $F$ gives the edge probability matrix and $G$ is our adjacency matrix, like below:
$$F = \begin{pmatrix} 0.25 & 0.10 & 0.10 & 0.04\\ 0.05 & 0.15 & 0.02 & 0.06\\ 0.05 & 0.02 & 0.15 & 0.06\\ 0.01 & 0.03 & 0.03 & 0.09\end{pmatrix} \quad G=\begin{pmatrix} 1 & 0 & 1 & 1\\ 0 & 1 & 0 & 1\\ 1 & 0 & 1 & 1\\ 1 & 1 & 1 & 1\end{pmatrix}$$
$$P(G|F) = \prod_{(u, v)\in G}P(u, v) \prod_{(u,v) \notin G}(1 - P(u, v))$$
i.e., the product of the likelihoods of the edges in the graph and of the edges not in the graph, given the edge probabilities $P(u, v)$.
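A direct translation of this likelihood, treating `P` as the pairwise edge-probability matrix and `A` as the adjacency matrix, iterating over unordered node pairs (a sketch; names are my own):

```python
def graph_likelihood(P, A):
    """P(G|F): product over node pairs of P(u,v) for edges
    and 1 - P(u,v) for non-edges."""
    n = len(A)
    likelihood = 1.0
    for u in range(n):
        for v in range(u + 1, n):
            p = P[u][v]
            likelihood *= p if A[u][v] else (1 - p)
    return likelihood

# Tiny example: three nodes, a single edge (0, 1), uniform p = 0.5.
P = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
A = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(graph_likelihood(P, A))  # 0.5 * 0.5 * 0.5 = 0.125
```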
We now assume memberships have strengths. Every node $u$ has a membership vector $F_u$, a row vector that tells us how strongly $u$ belongs to each community. $F_{uA}$ is the membership strength of node $u$ in community $A$; if it is zero, then $u$ is not a member of $A$.
Now our goal is to find the community membership vector for every node.
BigCLAM Model
Given the community membership vectors above, the probability of nodes $u$ and $v$ linking is proportional to the strength of their shared memberships:
$$P(u, v) = 1 - \exp(-F_u \cdot F_v^T)$$
The strength of the link between $u$ and $v$ is proportional to how closely their community memberships align.
Given a network $G(V, E)$, we maximize the objective below, which is the logarithm of the likelihood function $P(G|F)$ we defined earlier.
$$l(F) = \sum_{(u, v) \in E} \log(1 - \exp(-F_uF_v^T)) - \sum_{(u, v) \notin E} F_uF_v^T$$
Optimization steps:
- Start with a random $F$
- Update $F_{uC}$ for node $u$ while fixing the memberships of all other nodes
- Each update takes time linear in the degree of $u$
Gradient ascent
$$\nabla l(F_u) = \sum_{v \in \mathcal{N}(u)} F_v \frac{\exp(-F_uF_v^T)}{1 - \exp(-F_uF_v^T)} - \sum_{v \notin \mathcal{N}(u)} F_v$$

We perform gradient ascent using the above gradient, making small changes to $F$ that increase the log-likelihood.

Naively, the second sum runs over all non-neighbors of $u$, which is slow; by caching $\sum_v F_v$ we can use the identity below to make each update linear in the degree of $u$:

$$\sum_{v \notin \mathcal{N}(u)} F_v = \sum_v F_v - F_u - \sum_{v \in \mathcal{N}(u)} F_v$$
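Putting the gradient and the caching identity together, here is a sketch of one coordinate-ascent step in numpy; the learning rate and the nonnegativity clamp are my own assumptions, not from the notes:

```python
import numpy as np

def log_likelihood(F, adj):
    """l(F) from the BigCLAM objective, summed over unordered node pairs."""
    ll = 0.0
    for u in range(len(F)):
        for v in range(u + 1, len(F)):
            dot = F[u] @ F[v]
            ll += np.log(1 - np.exp(-dot)) if v in adj[u] else -dot
    return ll

def update_row(F, adj, u, lr=0.01):
    """One gradient-ascent step on F_u, holding all other rows fixed."""
    neigh = sorted(adj[u])
    grad = np.zeros(F.shape[1])
    for v in neigh:
        dot = F[u] @ F[v]
        grad += F[v] * np.exp(-dot) / (1 - np.exp(-dot))
    # Cached sum over non-neighbors: sum_v F_v - F_u - sum_{v in N(u)} F_v
    grad -= F.sum(axis=0) - F[u] - F[neigh].sum(axis=0)
    F[u] = np.maximum(F[u] + lr * grad, 1e-6)  # keep memberships nonnegative

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
rng = np.random.default_rng(0)
F = rng.uniform(0.1, 0.5, size=(4, 2))
before = log_likelihood(F, adj)
update_row(F, adj, 0)
print(log_likelihood(F, adj) > before)  # True: the ascent step improved l(F)
```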
Chapters
- Introduction, Structure of Graphs
- Properties of Networks and Random Graph Models
- Motifs and Structural Roles in Networks
- Community Structure in Networks
- Spectral Clustering
- Message Passing and Node Classification
- Graph Representation Learning
- Graph Neural Networks
- Graph Neural Networks - Pytorch Geometric
- Deep Generative Models for Graphs
- Link Analysis: PageRank
- Network Effects and Cascading Behaviour
- Probabilistic Contagion and Models of Influence
- Influence Maximization in Networks
- Outbreak Detection in Networks
- Network Evolution
- Reasoning over Knowledge Graphs
- Limitations of Graph Neural Networks
- Applications of Graph Neural Networks