4. Community Structure in Networks
Communities
We studied roles, which cluster nodes based on their structural position within the network, even though those nodes might be in very different parts of the network. Communities, on the other hand, or the notion of community structure in networks, is essentially about identifying sets of nodes that are densely connected with each other. Our goal is to identify partitions, groups, or communities that are tightly knit within the network.
Granovetter
Now we will try to understand how information flows through the network. Mark Granovetter, a social scientist, found that people learn about new jobs mostly from acquaintances rather than close friends. One simple explanation is that people usually have more acquaintances than friends. But is there a better, more general explanation for this phenomenon?
The structural perspective on friendship is that if two nodes have a common friend, they are more likely to become friends themselves. On top of that, the interpersonal perspective is that friendship between two people can be strong or weak. Structurally embedded edges are also socially strong, with strong bonds, while long-range edges spanning different parts of the network are socially weak. But structurally embedded edges carry redundant information, whereas long-range edges let us gather diverse information from different parts of the network.
- Clustered nodes with strong edges → strong friendship
- Weak bonds to distant parts of the network → more diverse information
Triadic closure implies that two nodes having a common friend are more likely to become friends themselves, and hence leads to a high clustering coefficient.
Another empirical study, by Bearman and Moody, shows that teenage girls with a low clustering coefficient are more likely to contemplate suicide.
Granovetter's observations came from an empirical study in the 1960s, but a more recent study from 2007 by Onnela et al. examined the phone call network of 20% of some European country's population, with phone call frequency as edge weights. They define the edge overlap $O_{ij}$ between connected nodes $i$ and $j$ as the ratio of the number of neighbors that $i$ and $j$ have in common to the total number of nodes in their combined neighborhoods.
$$O_{ij} = \frac{|(N(i) \cap N(j))\backslash \lbrace i, j\rbrace|}{|(N(i) \cup N(j))\backslash \lbrace i, j\rbrace|}$$
N(i) is the set of neighbors of node i.
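As a rough illustration, edge overlap can be computed directly from adjacency sets. The helper below is a minimal sketch (the function and variable names are my own):

```python
def edge_overlap(adj, i, j):
    """O_ij: shared neighbors of i and j over their combined
    neighborhood, excluding i and j themselves."""
    ni = adj[i] - {i, j}
    nj = adj[j] - {i, j}
    union = ni | nj
    if not union:  # isolated edge: treat as a local bridge
        return 0.0
    return len(ni & nj) / len(union)

# A triangle (0, 1, 2) with a pendant node 3 hanging off node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(edge_overlap(adj, 0, 1))  # 1.0: fully embedded edge
print(edge_overlap(adj, 2, 3))  # 0.0: local bridge
```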
Overlap is zero when the edge between $i$ and $j$ is a local bridge. As $O_{ij}$ goes from 0 to 1, the edge's structural embeddedness gets stronger. Edge overlap represents how strongly an edge is embedded within the network structurally.
When edge overlap is plotted against edge strength, they found that stronger edges have higher overlap. The red dots are for the null network with permuted strengths, where the same strengths are randomly shuffled and reassigned to the existing edges.
The following plots show how the size of the giant component changes as edges are removed in order of their edge strength and their edge overlap respectively.
Red curves are for edges removed from low to high strength, and black curves for removal from high to low; likewise for the edge overlap plot. In both cases, the network decomposes faster when low-strength/low-overlap edges are removed first, and the effect is more pronounced in the edge overlap plot: as edges with low overlap keep getting removed, the giant component disappears faster. This shows how weak ties hold the whole network together and help diverse information reach all corners of the network, even though locally they appear weakly embedded. This gives some validation to Granovetter's hypothesis.
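This deletion experiment is easy to reproduce in miniature. Below is a sketch assuming networkx, reusing the karate club graph in place of the (non-public) phone-call network; overlap is scored on the original graph, as in the study:

```python
import networkx as nx

def giant_component_curve(G, score, low_first=True):
    """Remove edges in order of `score(u, v)` and track the relative
    size of the largest connected component after each removal."""
    H = G.copy()
    order = sorted(H.edges(), key=lambda e: score(*e), reverse=not low_first)
    sizes = []
    for u, v in order:
        H.remove_edge(u, v)
        sizes.append(max(len(c) for c in nx.connected_components(H)) / len(G))
    return sizes

def overlap(G, u, v):
    nu, nv = set(G[u]) - {u, v}, set(G[v]) - {u, v}
    return len(nu & nv) / len(nu | nv) if nu | nv else 0.0

G = nx.karate_club_graph()
low = giant_component_curve(G, lambda u, v: overlap(G, u, v), low_first=True)
high = giant_component_curve(G, lambda u, v: overlap(G, u, v), low_first=False)
# Both curves end at 1/34 once every edge has been removed.
print(low[-1], high[-1])
```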
Finding Network Communities
We want to detect communities given the network structure, and ideally the detected clusters would then correspond to real-world groups.
Zachary's Karate club network
This is a famous graph dataset, like MNIST for image classification, and you can find it in most Python graph libraries. It is a dataset of a karate club's 34 members and the relationships within the club. When a rivalry came up between the coach and one of the club members, the club partitioned into two groups, one going with the original coach and one with the club member. It is a popular dataset for node classification, community detection, etc.
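For example, with networkx the dataset is one call away, including each member's post-split faction as a node attribute:

```python
import networkx as nx

G = nx.karate_club_graph()
print(G.number_of_nodes(), G.number_of_edges())  # 34 78
# networkx stores each member's post-split faction as a node attribute.
print(G.nodes[0]["club"], G.nodes[33]["club"])   # Mr. Hi Officer
```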
NCAA Football clubs
Modularity
Modularity $Q$ is a measure of how well a network is partitioned into communities. A partition of a network can be thought of as a mapping from every node to its respective community. Given a partitioning of the network into disjoint groups $s \in S$:
$$Q \propto \sum_{s\in S} \big[\text{(no. of edges within } s) - \text{(expected no. of edges within } s)\big]$$
We have to build a null model to get the expected number of edges within a group $s$, similar to the spokes model we used earlier: a new network with the same degree distribution but uniformly random edge connections. Given a network $G$ on $n$ nodes and $m$ edges, we construct a rewired network $G'$. Unlike the spokes model, we allow $G'$ to be a multigraph, so two nodes might have more than one edge between them. By doing this we don't actually have to build the network $G'$; we can get the expected number of edges within group $s$ analytically.
The expected number of edges between nodes $i$ and $j$ of degrees $k_i$ and $k_j$ is $\displaystyle \frac{k_ik_j}{2m}$.
We can verify this by calculating the expected total number of edges in $G'$ and making sure it is the same as $m$:
$$\text{Total expected edges in } G' = \frac12\sum_{i \in N}\sum_{j \in N} \frac{k_ik_j}{2m} = \frac12 \cdot \frac{1}{2m}\sum_{i \in N} k_i\Big(\sum_{j \in N} k_j\Big) = \frac{1}{4m}\cdot 2m\cdot 2m = m$$
Now that we have the expected number of edges between $i$ and $j$ in the null model, the modularity of graph $G$ with partition $S$ is
$$Q(G, S) = \frac{1}{2m}\sum_{s \in S}\sum_{i \in s}\sum_{j \in s}\bigg(A_{ij} - \frac{k_ik_j}{2m}\bigg)$$
Using the indicator function $\delta(c_i, c_j)$ that returns 1 if nodes $i$ and $j$ are in the same community and 0 otherwise, we can also write the modularity $Q$ as follows.
$$Q = \frac{1}{2m} \sum_{ij} \bigg[A_{ij} - \frac{k_ik_j}{2m}\bigg]\delta(c_i, c_j)$$
We use $2m$ as a normalizing constant to keep the modularity between $-1$ and $1$. Hence $-1 \leq Q \leq 1$.
When $Q$ is positive, the number of edges within groups exceeds the expected number. A $Q$ between 0.3 and 0.7 indicates significant community structure.
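The indicator-function form of the formula can be checked with a direct translation. A minimal pure-Python sketch (names are my own), run on two triangles joined by a single edge:

```python
def modularity(adj, communities):
    """Q = (1/2m) * sum_ij [A_ij - k_i k_j / 2m] * delta(c_i, c_j).
    adj: dict node -> set of neighbors (undirected, no self-loops)."""
    deg = {u: len(vs) for u, vs in adj.items()}
    two_m = sum(deg.values())
    comm = {u: ci for ci, c in enumerate(communities) for u in c}
    q = 0.0
    for u in adj:
        for v in adj:  # the double sum visits each pair twice, as in the formula
            if comm[u] == comm[v]:
                a_uv = 1 if v in adj[u] else 0
                q += a_uv - deg[u] * deg[v] / two_m
    return q / two_m

# Two triangles joined by a single edge: a clear two-community structure.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(modularity(adj, [{0, 1, 2}, {3, 4, 5}]))  # 5/14 ≈ 0.357
```

Putting the whole graph in one community gives $Q = 0$, since the edge count then exactly matches the null-model expectation.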
Now our goal is to identify communities by maximizing modularity. Find the partition with the maximum modularity.
Louvain Algorithm
We use the Louvain algorithm to find a partition of a given network with a high modularity score; finding the partition with the maximum modularity is NP-hard. Louvain is a greedy algorithm with a run time of $O(n \log n)$ that supports weighted graphs and also provides hierarchical communities. It is a fast algorithm with high-modularity output.
The Louvain algorithm greedily maximizes modularity, starting with each node as its own community. In Phase 1, modularity is optimized by allowing local changes to node-community memberships; in Phase 2, the identified communities are aggregated into super-nodes to build a new network. These two phases are repeated iteratively until no further increase in modularity is possible.
Phase 1
We start with each node as its own distinct community. For each node $i$, the algorithm performs two calculations:
- Compute the modularity delta $\Delta Q$ of putting node $i$ into the community of some neighbor $j$
- Move $i$ to the community of the node $j$ that yields the largest gain in $\Delta Q$
We perform Phase 1 until no movement yields a gain. Technically, the order in which the nodes are visited affects the output of the algorithm, but empirically the node ordering does not have a significant effect on the overall modularity obtained. We only ever make moves that increase $\Delta Q$, and nodes only join communities of nodes they already share an edge with.
So how do we compute the $\Delta Q$ of moving node $i$ to community $C$?
It has two components: $\Delta Q(i \rarr C)$, the modularity difference from putting node $i$ into $C$, and $\Delta Q(D \rarr i)$, the modularity difference from removing node $i$ from its current community $D$.
So $\Delta Q = \Delta Q(i \rarr C) + \Delta Q(D \rarr i)$: every modularity change includes removing a node from an existing community and putting it into a different one.
$$\Delta Q(i\rarr C) = \bigg[ \frac{\sum_{in} + k_{i,in}}{2m} - \bigg(\frac{\sum_{tot} + k_i}{2m}\bigg)^2 \bigg] - \bigg[\frac{\sum_{in}}{2m} - \bigg(\frac{\sum_{tot}}{2m}\bigg)^2 - \bigg(\frac{k_i}{2m}\bigg)^2\bigg]$$
Where
- $\sum_{in}$ is the sum of link weights between nodes in community $C$
- $\sum_{tot}$ is the sum of link weights of nodes in community $C$
- $k_{i,in}$ is the sum of link weights between node $i$ and community $C$
- $k_i$ is the sum of all link weights (the degree, for an unweighted graph) of node $i$
To understand $\Delta Q(i \rarr C)$, we abstract the network into three components: the node $i$, the community $C$, and the rest of the network.
$$\bigg[ \frac{\sum_{in} + k_{i,in}}{2m} - \bigg(\frac{\sum_{tot} + k_i}{2m}\bigg)^2 \bigg] \quad \text{modularity contribution after merging node } i$$

$$\bigg[\frac{\sum_{in}}{2m} - \bigg(\frac{\sum_{tot}}{2m}\bigg)^2 - \bigg(\frac{k_i}{2m}\bigg)^2\bigg] \quad \text{modularity contribution before merging node } i$$
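The two bracketed terms translate directly into code. Below is a minimal sketch of the gain computation, with the four sums passed in as precomputed numbers (the function and argument names are my own):

```python
def delta_q_gain(sigma_in, sigma_tot, k_i, k_i_in, m):
    """Delta Q(i -> C) for moving node i into community C.
    sigma_in:  sum of link weights between nodes inside C
    sigma_tot: sum of link weights of nodes in C
    k_i:       weighted degree of node i
    k_i_in:    sum of link weights between i and nodes in C
    m:         total edge weight of the graph"""
    after = (sigma_in + k_i_in) / (2 * m) - ((sigma_tot + k_i) / (2 * m)) ** 2
    before = sigma_in / (2 * m) - (sigma_tot / (2 * m)) ** 2 - (k_i / (2 * m)) ** 2
    return after - before

print(delta_q_gain(sigma_in=10, sigma_tot=14, k_i=4, k_i_in=2, m=50))  # ≈ 0.0088
```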
TODO: more details on Phase 1 of Louvain algorithm.
Phase 2
The communities obtained in the first phase are contracted into super-nodes and the network is created accordingly.
- Super nodes are connected if there is at least one edge between the nodes of the corresponding communities.
- And the weight of the edge between the two super-nodes is the sum of the weights from all edges between their corresponding communities.
And after this new network of supernodes is constructed, we run Phase 1 again.
Notes
The Louvain algorithm is not an exact algorithm: since finding the best partition is NP-hard, we might not get the best possible partition of the graph. It is an approximate algorithm, but it is very effective and fast, and it scales to large networks.
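For reference, networkx ships an implementation of the algorithm. A minimal sketch on the karate club graph (the seed fixes the node-visiting order, which can otherwise change the result slightly):

```python
import networkx as nx

G = nx.karate_club_graph()
# Greedy modularity maximization via the Louvain method.
communities = nx.community.louvain_communities(G, seed=42)
print(len(communities))
print(nx.community.modularity(G, communities))  # typically around 0.4
```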
Detecting Overlapping Communities: BigCLAM
In reality, not all communities are disjoint: a node might belong to more than one community, which calls for a more general form of community detection. For example, your ego network on Facebook might include a friend who is in both your high school community and your university community.
For this we define a generative model for graphs based on node community affiliations, called the Community Affiliation Graph Model (AGM). Given a graph $G$, we assume that $G$ was generated by an AGM, then find the AGM parameters that could best have generated $G$; this way we discover the communities.
Community - Affiliation Graph Model (AGM)
We assume a bipartite graph with the communities $C$ as one set of nodes and the nodes $V$ of graph $G$ as the other set, with memberships $M$ as the edges between them. The AGM generates the graph $G$ from this bipartite graph. The model also has parameters $p_a, p_b, \cdots$, where $p_a$ is the probability that any two members of community $A$ have an edge between them. Once we fix the bipartite graph (nodes $V$, communities $C$, memberships $M$) and the probabilities $p_a, p_b, \cdots$, we have an Affiliation Graph Model that can generate our graph $G$. This is somewhat like folding a bipartite graph, but with edges dependent on probabilities.
Given parameters $(V, C, M, \lbrace p_c \rbrace)$:
- Nodes in community $c$ connect to each other by flipping a coin with probability $p_c$.
- Nodes that belong to multiple communities get multiple coin flips: if they miss the first time, they get another chance through the next community, so $p(u, v) = 1 - \prod_{c \in M_u \cap M_v}(1 - p_c)$. The more communities $u$ and $v$ have in common, the more likely there is an edge between them.
- If nodes $u$ and $v$ have no communities in common, then $p(u, v) = 0$. We resolve this by having a background epsilon community that every node is a member of, which makes it possible for two nodes to have an edge between them even if they share no other community.
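The generative process above can be sketched in a few lines. Here `eps` stands in for the background epsilon-community probability; the function and parameter names are my own:

```python
import random

def agm_generate(memberships, p, eps=1e-4, seed=0):
    """Generate edges under the AGM.
    memberships: dict node -> set of community ids (M)
    p:           dict community id -> within-community edge probability
    eps:         background epsilon-community probability"""
    rng = random.Random(seed)
    nodes = sorted(memberships)
    edges = set()
    for idx, u in enumerate(nodes):
        for v in nodes[idx + 1:]:
            # One coin flip per shared community, plus the epsilon community.
            miss = 1 - eps
            for c in memberships[u] & memberships[v]:
                miss *= 1 - p[c]
            if rng.random() < 1 - miss:
                edges.add((u, v))
    return edges

memberships = {0: {"A"}, 1: {"A"}, 2: {"A", "B"}, 3: {"B"}}
print(agm_generate(memberships, p={"A": 0.9, "B": 0.9}))
```

With `eps=0` and `p_c=1`, shared-community pairs are always linked and all other pairs never are, which matches the coin-flip description directly.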
The flexibility of AGM allows us to have all kinds of communities, with no overlap, with overlap, or hierarchical.
So now, given a graph $G$, our goal is to find the model $F$ with the above parameters $(V, C, M, \lbrace p_c \rbrace)$: the affiliation graph $M$, the number of communities $C$, and the parameters $p_c$.
We fit the model $F$ to our graph $G$ and find the parameters using maximum likelihood estimation.
Given the real graph $G$, find the model/parameters $F = \argmax_F P(G|F)$. For this we need to be able to compute the probability $P(G|F)$, and then search over $F$ for the best possible fit using techniques like gradient descent.
Graph likelihood $P(G|F)$
Given $G$ and $F$, we calculate the likelihood that $F$ generated $G$: $P(G|F)$.
Say $F$ gives the edge probability matrix and $G$ is our adjacency matrix, like below:
$$F = \begin{pmatrix} 0.25 & 0.10 & 0.10 & 0.04\\ 0.05 & 0.15 & 0.02 & 0.06\\ 0.05 & 0.02 & 0.15 & 0.06\\ 0.01 & 0.03 & 0.03 & 0.09\end{pmatrix} \quad G=\begin{pmatrix} 1 & 0 & 1 & 1\\ 0 & 1 & 0 & 1\\ 1 & 0 & 1 & 1\\ 1 & 1 & 1 & 1\end{pmatrix}$$
$$P(G|F) = \prod_{(u, v)\in G}P(u, v) \prod_{(u,v) \notin G}(1 - P(u, v))$$
i.e., the product of the likelihoods of the edges in the graph and of the edges not in the graph, given the edge probabilities $P(u, v)$.
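A direct translation of this likelihood, treating `P` as the pairwise edge-probability matrix and `A` as the adjacency matrix, iterating over unordered node pairs (a sketch; names are my own):

```python
def graph_likelihood(P, A):
    """P(G|F): product over node pairs of P(u,v) for edges
    and 1 - P(u,v) for non-edges."""
    n = len(A)
    likelihood = 1.0
    for u in range(n):
        for v in range(u + 1, n):
            p = P[u][v]
            likelihood *= p if A[u][v] else (1 - p)
    return likelihood

# Tiny example: three nodes, a single edge (0, 1), uniform p = 0.5.
P = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
A = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
print(graph_likelihood(P, A))  # 0.5 * 0.5 * 0.5 = 0.125
```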
We now assume memberships have strengths. Every node $u$ has a membership vector $F_u$, a row vector that tells us how strongly $u$ belongs to each community. $F_{uA}$ is the membership strength of node $u$ in community $A$; if it is zero, then $u$ is not a member of $A$.
Now our goal is to find the community membership vector for every node.
BigCLAM Model
Given the community membership vectors above, the probability of nodes $u$ and $v$ linking is proportional to the strength of their shared memberships:
$$P(u, v) = 1 - \exp(-F_u \cdot F_v^T)$$
The strength of the link between $u$ and $v$ is proportional to how closely their community memberships align.
Given a network $G(V, E)$, we maximize the objective below, which is the logarithm of the likelihood function $P(G|F)$ we defined earlier.
$$l(F) = \sum_{(u, v) \in E} \log(1 - \exp(-F_uF_v^T)) - \sum_{(u, v) \notin E} F_uF_v^T$$
Optimization steps:
- Start with a random $F$
- Update $F_{uC}$ for node $u$ while fixing the memberships of all other nodes
- Each update takes time linear in the degree of $u$
Gradient ascent
$$\nabla l(F_u) = \sum_{v \in \mathcal{N}(u)} F_v \frac{\exp(-F_uF_v^T)}{1 - \exp(-F_uF_v^T)} - \sum_{v \notin \mathcal{N}(u)} F_v$$

We perform gradient ascent using the above gradient, making small changes to $F$ that increase the log-likelihood.

Naively, the second sum runs over all non-neighbors of $u$, which is slow; by caching $\sum_v F_v$ we can use the identity below to make each update linear in the degree of $u$:

$$\sum_{v \notin \mathcal{N}(u)} F_v = \sum_v F_v - F_u - \sum_{v \in \mathcal{N}(u)} F_v$$
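Putting the gradient and the caching identity together, here is a sketch of one coordinate-ascent step in numpy; the learning rate and the nonnegativity clamp are my own assumptions, not from the notes:

```python
import numpy as np

def log_likelihood(F, adj):
    """l(F) from the BigCLAM objective, summed over unordered node pairs."""
    ll = 0.0
    for u in range(len(F)):
        for v in range(u + 1, len(F)):
            dot = F[u] @ F[v]
            ll += np.log(1 - np.exp(-dot)) if v in adj[u] else -dot
    return ll

def update_row(F, adj, u, lr=0.01):
    """One gradient-ascent step on F_u, holding all other rows fixed."""
    neigh = sorted(adj[u])
    grad = np.zeros(F.shape[1])
    for v in neigh:
        dot = F[u] @ F[v]
        grad += F[v] * np.exp(-dot) / (1 - np.exp(-dot))
    # Cached sum over non-neighbors: sum_v F_v - F_u - sum_{v in N(u)} F_v
    grad -= F.sum(axis=0) - F[u] - F[neigh].sum(axis=0)
    F[u] = np.maximum(F[u] + lr * grad, 1e-6)  # keep memberships nonnegative

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
rng = np.random.default_rng(0)
F = rng.uniform(0.1, 0.5, size=(4, 2))
before = log_likelihood(F, adj)
update_row(F, adj, 0)
print(log_likelihood(F, adj) > before)  # True: the ascent step improved l(F)
```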
Chapters
- Introduction, Structure of Graphs
- Properties of Networks and Random Graph Models
- Motifs and Structural Roles in Networks
- Community Structure in Networks
- Spectral Clustering
- Message Passing and Node Classification
- Graph Representation Learning
- Graph Neural Networks
- Graph Neural Networks - Pytorch Geometric
- Deep Generative Models for Graphs
- Link Analysis: PageRank
- Network Effects and Cascading Behaviour
- Probabilistic Contagion and Models of Influence
- Influence Maximization in Networks
- Outbreak Detection in Networks
- Network Evolution
- Reasoning over Knowledge Graphs
- Limitations of Graph Neural Networks
- Applications of Graph Neural Networks