14. Influence Maximization in Networks
In this section we discuss the problem of identifying the seed nodes in a network that maximize influence, so that as many nodes as possible adopt a certain behaviour.
An example of influence is the Kate Middleton effect: the trend effect Kate Middleton has on UK fashion. The Kate effect may be worth £1 billion to the UK fashion industry. Identifying influential nodes in a network, such as Kate, is very useful. Influential people often have many friends, but that does not mean high-degree people are always highly influential.
Influence Maximization
Given a directed graph G, find the k seeds that are the most influential, i.e. that maximize the number of influenced people.
We have two classical propagation models.
- Linear Threshold Model
- Independent Cascade Model
Linear Threshold Model
Every node v has a random threshold $\theta_v \sim U[0, 1]$ which determines how much combined influence v's neighbours must exert to convince v to adopt the behaviour.
A node v is influenced by each neighbour w according to a weight $b_{v,w}$ such that $\sum_{w \text{ neighbour of } v} b_{v,w} \leq 1$.
If the sum of these influences from active neighbours is above the threshold $\theta_v$, the node adopts the behaviour:
$$\sum_{w \text{ active neighbour of } v} b_{v,w} \geq \theta_v$$
Starting from the initially active nodes, we activate every node whose threshold is exceeded by the influence weights of its active neighbours, and we repeat the process each time new active nodes appear.
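A minimal sketch of one way to simulate the Linear Threshold Model in Python; the graph representation (`neighbours`, weights `b`) and the seed set are illustrative assumptions, not from the lecture:

```python
import random

def linear_threshold(neighbours, b, seeds):
    """neighbours[v]: list of v's neighbours; b[(v, w)]: influence of w on v."""
    theta = {v: random.random() for v in neighbours}  # theta_v ~ U[0, 1]
    active = set(seeds)
    changed = True
    while changed:  # repeat as long as new active nodes appear
        changed = False
        for v in neighbours:
            if v in active:
                continue
            # total influence of v's currently active neighbours
            influence = sum(b.get((v, w), 0) for w in neighbours[v] if w in active)
            if influence >= theta[v]:
                active.add(v)
                changed = True
    return active
```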
Independent Cascade Model
We have a finite directed graph G = (V, E). A set S of nodes starts out with the new behaviour and is active. Each edge (v, w) has a probability $p_{vw}$ that signifies with what probability node w gets activated by node v. If node v is active, it gets one chance to make w active, succeeding with probability $p_{vw}$. The ordering of the activation attempts does not matter: if u and v are both active at the same time, it does not matter which one tries to activate w first.
When a node v becomes active, it activates each out-neighbour w with probability $p_{vw}$, and these activations spread through the network.
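A minimal sketch of one run of the Independent Cascade Model, assuming the graph is given as out-neighbour lists and per-edge probabilities `p[(v, w)]`:

```python
import random

def independent_cascade(out_neighbours, p, seeds):
    """Simulate one cascade; each active node gets one chance per out-edge."""
    active = set(seeds)
    frontier = list(seeds)  # newly active nodes that have not tried their edges yet
    while frontier:
        v = frontier.pop()
        for w in out_neighbours.get(v, []):
            if w not in active and random.random() < p[(v, w)]:
                active.add(w)  # v succeeded with probability p_vw
                frontier.append(w)
    return active  # the final set of activated nodes
```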
Now the problem is to find the initial set S of size k that influences the largest set of nodes: the most influential set S of size k. The expected cascade size $f(S)$ is the expected total number of activated nodes we end up with when starting from S.
We consider the expected cascade size because this is a probabilistic model: nodes get activated based on edge probabilities. So $f(S) = \frac{1}{|I|}\sum_{\text{random realizations } i} f_i(S)$, i.e. we estimate the expected number of active nodes by sampling several realizations of the same process and counting the total number of nodes activated starting from S in each.
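Building on the `independent_cascade` sketch above, estimating $f(S)$ is then just an average over many random runs (the realization count here is an illustrative choice):

```python
def estimate_spread(out_neighbours, p, S, num_realizations=10_000):
    """Monte Carlo estimate of f(S) = (1/|I|) * sum_i f_i(S)."""
    total = 0
    for _ in range(num_realizations):  # each run re-flips every edge coin
        total += len(independent_cascade(out_neighbours, p, S))
    return total / num_realizations
```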
The influence maximisation problem we end up with is $\max_{|S| = k} f(S)$, which is NP-complete: we would have to enumerate all possible subsets S and compute the expected cascade size of each. Instead we use an approximation algorithm to find the most influential set of k nodes. For some inputs the algorithm will not find the globally optimal set, but we can prove that it finds a set S with $f(S) \geq 0.63 \cdot f(OPT)$, where OPT is the globally optimal set. In other words, we are guaranteed to find an initial set S that activates at least 63% as many nodes as the globally optimal set.
Greedy Hill Climbing Algorithm
Consider greedy hill climbing to find S. Each node u has an influence set $X_u = \lbrace v_1, v_2, \cdots \rbrace$ containing all the nodes that will eventually become active if we activate u. At each iteration i we activate the node u that gives the largest marginal gain: $\max_u f(S_{i-1} \cup \lbrace u \rbrace)$.
Starting with $S_0 = \lbrace \rbrace$, at each step we pick the node u that maximizes $f(S_{i-1} \cup \lbrace u \rbrace)$ and add this most influential node to $S_{i-1}$ to get $S_i$. We repeat this process for $i = 1 \cdots k$.
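A sketch of the greedy loop on top of the `estimate_spread` helper above; in practice one would use fewer realizations per candidate, or lazy evaluation, to keep this tractable:

```python
def greedy_hill_climbing(nodes, out_neighbours, p, k, num_realizations=1000):
    """Pick k seeds, each time adding the node with the largest marginal gain."""
    S = set()
    for _ in range(k):
        gains = {u: estimate_spread(out_neighbours, p, S | {u}, num_realizations)
                 for u in nodes if u not in S}
        S.add(max(gains, key=gains.get))  # argmax_u f(S_{i-1} ∪ {u})
    return S
```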
Hill climbing produces a solution S where $f(S) \geq (1 - 1/e) \cdot f(OPT)$. This claim holds for all functions that are monotone and submodular.
Our function in the case of influence maximisation is monotone, as activating a new node and adding it to our initial set can only lead to more activations than the original S. In our case, if $S \subseteq T$ then $f(S) \leq f(T)$, and $f(\lbrace \rbrace) = 0$.
Adding an element to a set gives less improvement than adding it to one of that set's subsets. This is the submodularity property: $\forall S \subseteq T: f(S \cup \lbrace u \rbrace) - f(S) \geq f(T \cup \lbrace u \rbrace) - f(T)$. The gain of adding a node u to a set T is never more than the gain of adding u to a subset S of T. The more we already have, the less we gain by adding something; the returns are diminishing.
Submodular Functions
To show that a function $f(\cdot)$ is submodular, we have to prove
$$\forall S \subseteq T: f(S \cup \lbrace u \rbrace) - f(S) \geq f(T \cup \lbrace u \rbrace) - f(T)$$
The gain of adding a node to a larger set is smaller than the gain of adding the same node to a subset of that set.
A non-negative combination of submodular functions is also submodular: if $f_1, f_2, \cdots$ are submodular and $c_1, c_2, \cdots \geq 0$, then $F(x) = \sum_i c_i \cdot f_i(x)$ is also a submodular function.
Also, consider sets $X_1, \cdots, X_m$ and let $f(S) = |\cup_{k \in S} X_k|$ be the size of the union of the sets $X_k, k \in S$. Then $f(S)$ is submodular: the more sets you already have, the less new area a given set covers. See the numerical check below.
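A small numerical check of this coverage property, with made-up sets:

```python
# Hypothetical sets X_1, X_2, X_3
X = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}

def coverage(S):
    """f(S) = size of the union of the sets X_k, k in S."""
    covered = set()
    for k in S:
        covered |= X[k]
    return len(covered)

S, T, u = {1}, {1, 2}, 3  # S is a subset of T
gain_S = coverage(S | {u}) - coverage(S)  # adds c, d, e -> gain 3
gain_T = coverage(T | {u}) - coverage(T)  # adds only d, e -> gain 2
assert gain_S >= gain_T  # diminishing returns
```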
Proof that the function $f(S)$ in Influence Maximization is Submodular
Influence maximization is an instance of the set cover problem: $f(S)$ is the size of the union of the sets of nodes influenced by the active set S. But in our case $f(S)$ is the outcome of a stochastic process, so we need to be a bit careful.
We use the principle of deferred decisions: we create many parallel possible worlds and then average over them. $f(S)$ is an expectation over several initializations of the random process, and we treat this expectation as an average over separate deterministic instances, obtained once the edges have been picked. We flip all the coins at the beginning, record which edges fired successfully, and get a deterministic graph of all the nodes influenced starting from S; each such influenced set is one parallel deterministic world.
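A sketch of this coin-flipping step: sample one possible world up front, after which influence is plain deterministic reachability (same graph representation as in the earlier sketches):

```python
import random

def sample_possible_world(out_neighbours, p):
    """Flip every edge coin once; keep only the edges that fired (live edges)."""
    return {v: [w for w in ws if random.random() < p[(v, w)]]
            for v, ws in out_neighbours.items()}

def reachable(live, S):
    """Nodes reachable from S by live-edge paths: the influence set in this world."""
    seen, stack = set(S), list(S)
    while stack:
        v = stack.pop()
        for w in live.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen
```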
The influence set $X_u^i$ of node u is the set of nodes reachable by live-edge paths from u, i.e. the nodes activated from u in the i-th realization.
Say $f_i(S)$ is the i-th realization of $f(S)$; in an example graph it might look like this:
$f_1(S = \lbrace a, b \rbrace) = |\lbrace a, f, c, g \rbrace \cup \lbrace b, c \rbrace| = 5$ and
$f_2(S = \lbrace a, b \rbrace) = |\lbrace a, f, c, g \rbrace \cup \lbrace d, e, h \rbrace| = 7$
From this we can see that $f_i(S) = |\cup_{v \in S} X_v^i|$, where the $X_v^i$ are sets and $f_i(S)$ is the size of their union. By the coverage property of submodular functions above, each $f_i(S)$ is submodular.
And the expected influence set size is $f(S) = \frac{1}{|I|} \sum_{i \in I} f_i(S)$; by the first property of submodular functions we discussed, since each $f_i$ is submodular, their non-negatively weighted combination $f(S)$ is also submodular.
In summary, to find the most influential set S of size k with the largest expected cascade size $f(S)$, we generate multiple realizations i, each of which is a possible influenced world, and take the expectation over them.
So we want to solve $\arg\max_{|S| = k} f(S)$, where $f(S) = \frac{1}{|I|}\sum_{i \in I} f_i(S)$.
Proof that Hill Climbing gives near optimal Solutions
When $f(S)$ is monotone and submodular, hill climbing produces an active set S where $f(S) \geq (1 - \frac{1}{e}) \cdot f(OPT)$, i.e. $f(S) \geq 0.63 \cdot f(OPT)$.
Hill climbing starts with $S_0 = \lbrace \rbrace$ and produces sets $S_1, S_2, \cdots, S_k$ by greedily adding one element to S at a time, each time the one that gives us the most active nodes.
TODO: The proof is in the handout.
This gives us a worst-case bound: no matter the input data, hill climbing will do no worse than activating 63% as many nodes as the optimal solution.
Evaluating $f(S)$
How to compute $f(S)$ exactly and efficiently is still an open question, but we can get good estimates by simulation: repeating the diffusion process often enough (polynomially many times in n). If we achieve a $(1 \pm \epsilon)$-approximation to $f(S)$, the greedy algorithm becomes a $(1 - 1/e - \epsilon)$-approximation, by generalizing the Nemhauser-Wolsey proof.
Say for a given network G we repeat the process 10,000 times: we flip a coin for each edge and determine the influence sets under coin-flip realization i. Each node u is then associated with 10,000 influence sets $X_u^i$. The complexity of the greedy algorithm is $O(k \cdot n \cdot R \cdot m)$, where n is the number of nodes in G, k is the size of the initial activated set S, R is the number of simulated worlds and m is the number of edges in G.
Experimental Data
As an example cascade process, consider a collaboration network from arXiv high-energy physics theory, with 10,748 nodes and 53,000 edges, where we try to model the spread of new scientific terminology or of a new research area.
They compared the Independent Cascade Model with other basic heuristics, considering two cases of the Independent Cascade Model: one where each edge has a uniform activation probability, and one where each edge from v to w has probability 1/deg(w) of activating w.
They simulated the process 10,000 times for each targeted set, re-choosing the edge outcomes randomly every time, and compared with the following three common heuristics:
- Degree centrality: Picking nodes with the highest degree
- Closeness centrality: Picking nodes in the "center" of the network
- Random nodes: Pick a random set of nodes.
In the images below you can see how the Independent Cascade Model performed with respect to the above heuristics when it comes to maximizing $f(S)$.
Sketch-based Algorithms
Sketch-based Influence Maximization and Computation: Scaling up with Guarantees. CIKM 2014
The problem with the greedy hill climbing algorithm is that we have to generate R possible worlds and identify the k nodes with the largest influence in these possible worlds, and for any node set, evaluating its influence in a possible world takes $O(m)$ time, where m is the number of edges. Using sketches we can reduce the estimation time from $O(m)$ to $O(1)$.
We do this by computing a small structure per node, called a sketch, that summarizes the node, and estimating the node's influence from this sketch. We then run influence maximisation using these estimates.
We take one possible deterministic network $G^{(i)}$ from one of the realizations and give each node a uniform random number from [0, 1]. We then compute the rank of each node v, which is the smallest number among the nodes that v can reach. For example, say u and v are two nodes with assigned random numbers 0.4 and 0.3 respectively, and there is an edge from u to v; the rank of u then becomes 0.3 because of this directed edge. Intuitively, if a node can reach a lot of nodes and is very influential, its rank gets smaller and smaller, so an influential node will have a small rank. We can therefore use the rank of any node v to estimate its influence in a possible world $G^{(i)}$. Since the nodes are assigned random numbers initially, we repeat this process of finding node ranks multiple times and obtain multiple ranks per node, keeping say the c smallest rank values for each node, with the node numbers held fixed across the worlds.
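A sketch of computing a single such rank per node in one possible world; processing nodes in increasing order of their random value and walking the reversed live edges is one standard way to do this efficiently:

```python
import random

def reachability_ranks(nodes, out_neighbours):
    """rank[v] = smallest random value among the nodes v can reach."""
    value = {v: random.random() for v in nodes}
    in_neighbours = {v: [] for v in nodes}  # reverse the live-edge graph
    for v, ws in out_neighbours.items():
        for w in ws:
            in_neighbours[w].append(v)
    rank = {}
    # Visit nodes from smallest value up; the first value that reaches a node
    # backwards is, by construction, the smallest value that node can reach.
    for v in sorted(nodes, key=value.get):
        stack = [v]
        while stack:
            x = stack.pop()
            if x in rank:
                continue  # already reached by an even smaller value
            rank[x] = value[v]
            stack.extend(in_neighbours[x])
    return rank
```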
Sketch-based Greedy Algorithm
Generate a number of possible worlds and construct reachability sketches for all nodes, keeping the c smallest ranks for each node. Then run greedy influence maximization: whenever greedy asks for the influence of a node set S, check the ranks and add the node u with the smallest rank value. After u is chosen, find the set of nodes influenced by u, mark them as infected, and remove their ranks from the other nodes' sketches. A simplified sketch follows.
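A very loose sketch of that loop, simplified to counting not-yet-covered rank values (the actual paper uses a cardinality estimator based on the c-th smallest rank; `sketches` mapping each node to the set of rank values it reaches is an assumed input):

```python
def sketch_greedy(sketches, k):
    """sketches[u]: set of rank values reachable from u across possible worlds."""
    S, covered = [], set()
    for _ in range(k):
        # Estimated marginal influence of u: ranks it reaches not yet covered
        u = max((v for v in sketches if v not in S),
                key=lambda v: len(sketches[v] - covered))
        S.append(u)
        covered |= sketches[u]  # these ranks no longer count for other nodes
    return S
```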
This algorithm guarantees that the expected running time is near-linear in the number of possible worlds. Also, when c is large, it provides a $1 - 1/e - \epsilon$ approximation with respect to the possible worlds considered.
This algorithm is much faster than the greedy hill climbing algorithm, but it does not provide an approximation guarantee on the true expected influence.
Experiments
In the image below you can see how various influence maximisation algorithms and heuristics perform on various networks: greedy hill climbing, IRIE (a state-of-the-art heuristic), degree-based heuristics and the sketch-based algorithm.
We can see that the sketch-based approach achieves the same performance as greedy in a fraction of the time.
Open questions
- More realistic viral marketing: different marketing actions increase the likelihood of initial activation for several nodes at once.
- Trade-offs between generality and feasibility when it comes to maximising influence with initial seeds
- Dealing with negative influence, such as critical comments, and how it fits into influencing the network; our models only considered influence working in one direction.