14. Influence Maximization in Networks
In this section we discuss the problem of identifying the seed nodes in a network that maximize influence, so that as many nodes as possible adopt a certain behaviour.
An example of influence is the Kate Middleton effect: the trend effect Kate Middleton has on UK fashion. The Kate effect may be worth £1 billion to the UK fashion industry. Identifying influential nodes in a network, such as Kate, is very useful. Influential people often have many friends, but that does not mean high-degree people are always highly influential.
Influence Maximization
Given a directed graph G, find the k seeds that are the most influential, i.e. that maximize the number of influenced people.
We have two classical propagation models.
- Linear Threshold Model
- Independent Cascade Model
Linear Threshold Model
Every node v has a random threshold $\theta_v \sim U[0, 1]$ which determines how much combined influence v's neighbours must exert to convince v to adopt the behaviour.
A node v is influenced by each neighbour w according to a weight $b_{v,w}$ such that $\sum_{w \text{ neighbour of } v} b_{v,w} \leq 1$.
If the sum of these influences from active neighbours is above the threshold $\theta_v$, the node adopts the behaviour:
$$\sum_{w \text{ active neighbour of } v} b_{v,w} \geq \theta_v$$
Starting from the initially active nodes, we activate every node whose threshold is exceeded by the influence weights of its active neighbours, and we repeat the process each time new active nodes appear.
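A minimal sketch of one way to simulate the Linear Threshold Model in Python; the graph representation (`neighbours`, weights `b`) and the seed set are illustrative assumptions, not from the lecture:

```python
import random

def linear_threshold(neighbours, b, seeds):
    """neighbours[v]: list of v's neighbours; b[(v, w)]: influence of w on v."""
    theta = {v: random.random() for v in neighbours}  # theta_v ~ U[0, 1]
    active = set(seeds)
    changed = True
    while changed:  # repeat as long as new active nodes appear
        changed = False
        for v in neighbours:
            if v in active:
                continue
            # total influence of v's currently active neighbours
            influence = sum(b.get((v, w), 0) for w in neighbours[v] if w in active)
            if influence >= theta[v]:
                active.add(v)
                changed = True
    return active
```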
Independent Cascade Model
We have a finite directed graph G = (V, E). A set S of nodes starts out with the new behaviour and is active. Each edge (v, w) has a probability $p_{vw}$ that signifies with what probability node w gets activated by node v. If node v is active, it gets one chance to make w active, succeeding with probability $p_{vw}$. The ordering of the activation attempts does not matter: if u and v are both active at the same time, it does not matter which one tries to activate w first.
When a node v becomes active, it activates each out-neighbour w with probability $p_{vw}$, and these activations spread through the network.
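A minimal sketch of one run of the Independent Cascade Model, assuming the graph is given as out-neighbour lists and per-edge probabilities `p[(v, w)]`:

```python
import random

def independent_cascade(out_neighbours, p, seeds):
    """Simulate one cascade; each active node gets one chance per out-edge."""
    active = set(seeds)
    frontier = list(seeds)  # newly active nodes that have not tried their edges yet
    while frontier:
        v = frontier.pop()
        for w in out_neighbours.get(v, []):
            if w not in active and random.random() < p[(v, w)]:
                active.add(w)  # v succeeded with probability p_vw
                frontier.append(w)
    return active  # the final set of activated nodes
```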
Now the problem is to find the initial set S of size k that influences the largest set of nodes: the most influential set S of size k. The expected cascade size $f(S)$ is the expected total number of activated nodes we end up with when starting from S.
We consider the expected cascade size because this is a probabilistic model: nodes get activated based on edge probabilities. So $f(S) = \frac{1}{|I|}\sum_{\text{random realizations } i} f_i(S)$, i.e. we estimate the expected number of active nodes by sampling several realizations of the same process and counting the total number of nodes activated starting from S in each.
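Building on the `independent_cascade` sketch above, estimating $f(S)$ is then just an average over many random runs (the realization count here is an illustrative choice):

```python
def estimate_spread(out_neighbours, p, S, num_realizations=10_000):
    """Monte Carlo estimate of f(S) = (1/|I|) * sum_i f_i(S)."""
    total = 0
    for _ in range(num_realizations):  # each run re-flips every edge coin
        total += len(independent_cascade(out_neighbours, p, S))
    return total / num_realizations
```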
The influence maximisation problem we end up with is $\max_{|S| = k} f(S)$, which is NP-complete: we would have to enumerate all possible subsets S and compute the expected cascade size of each. Instead we use an approximation algorithm to find the most influential set of k nodes. For some inputs the algorithm will not find the globally optimal set, but we can prove that it finds a set S with $f(S) \geq 0.63 \cdot f(OPT)$, where OPT is the globally optimal set. In other words, we are guaranteed to find an initial set S that activates at least 63% as many nodes as the globally optimal set.
Greedy Hill Climbing Algorithm
Consider greedy hill climbing to find S. Each node u has an influence set $X_u = \lbrace v_1, v_2, \cdots \rbrace$ containing all the nodes that will eventually become active if we activate u. At each iteration i we activate the node u that gives the largest marginal gain: $\max_u f(S_{i-1} \cup \lbrace u \rbrace)$.
Starting with $S_0 = \lbrace \rbrace$, at each step we pick the node u that maximizes $f(S_{i-1} \cup \lbrace u \rbrace)$ and add this most influential node to $S_{i-1}$ to get $S_i$. We repeat this process for $i = 1 \cdots k$.
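A sketch of the greedy loop on top of the `estimate_spread` helper above; in practice one would use fewer realizations per candidate, or lazy evaluation, to keep this tractable:

```python
def greedy_hill_climbing(nodes, out_neighbours, p, k, num_realizations=1000):
    """Pick k seeds, each time adding the node with the largest marginal gain."""
    S = set()
    for _ in range(k):
        gains = {u: estimate_spread(out_neighbours, p, S | {u}, num_realizations)
                 for u in nodes if u not in S}
        S.add(max(gains, key=gains.get))  # argmax_u f(S_{i-1} ∪ {u})
    return S
```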
Hill climbing produces a solution S where $f(S) \geq (1 - 1/e) \cdot f(OPT)$. This claim holds for all functions that are monotone and submodular.
Our function in the case of influence maximisation is monotone, as activating a new node and adding it to our initial set can only lead to more activations than the original S. In our case, if $S \subseteq T$ then $f(S) \leq f(T)$, and $f(\lbrace \rbrace) = 0$.
Adding an element to a set gives less improvement than adding it to one of that set's subsets. This is the submodularity property: $\forall S \subseteq T: f(S \cup \lbrace u \rbrace) - f(S) \geq f(T \cup \lbrace u \rbrace) - f(T)$. The gain of adding a node u to a set T is never more than the gain of adding u to a subset S of T. The more we already have, the less we gain by adding something; the returns are diminishing.
Submodular Functions
To show that a function $f(\cdot)$ is submodular, we have to prove
$$\forall S \subseteq T: f(S \cup \lbrace u \rbrace) - f(S) \geq f(T \cup \lbrace u \rbrace) - f(T)$$
The gain of adding a node to a larger set is smaller than the gain of adding the same node to a subset of that set.
A non-negative combination of submodular functions is also submodular: if $f_1, f_2, \cdots$ are submodular and $c_1, c_2, \cdots \geq 0$, then $F(x) = \sum_i c_i \cdot f_i(x)$ is also a submodular function.
Also, consider sets $X_1, \cdots, X_m$ and let $f(S) = |\cup_{k \in S} X_k|$ be the size of the union of the sets $X_k, k \in S$. Then $f(S)$ is submodular: the more sets you already have, the less new area a given set covers. See the numerical check below.
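A small numerical check of this coverage property, with made-up sets:

```python
# Hypothetical sets X_1, X_2, X_3
X = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}

def coverage(S):
    """f(S) = size of the union of the sets X_k, k in S."""
    covered = set()
    for k in S:
        covered |= X[k]
    return len(covered)

S, T, u = {1}, {1, 2}, 3  # S is a subset of T
gain_S = coverage(S | {u}) - coverage(S)  # adds c, d, e -> gain 3
gain_T = coverage(T | {u}) - coverage(T)  # adds only d, e -> gain 2
assert gain_S >= gain_T  # diminishing returns
```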
Proof that the function $f(S)$ in Influence Maximization is Submodular
Influence maximization is an instance of the set cover problem: $f(S)$ is the size of the union of the sets of nodes influenced by the active set S. But in our case $f(S)$ is the outcome of a stochastic process, so we need to be a bit careful.
We use the principle of deferred decisions: we create many parallel possible worlds and then average over them. $f(S)$ is an expectation over several initializations of the random process, and we treat this expectation as an average over separate deterministic instances, obtained once the edges have been picked. We flip all the coins at the beginning, record which edges fired successfully, and get a deterministic graph of all the nodes influenced starting from S; each such influenced set is one parallel deterministic world.
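A sketch of this coin-flipping step: sample one possible world up front, after which influence is plain deterministic reachability (same graph representation as in the earlier sketches):

```python
import random

def sample_possible_world(out_neighbours, p):
    """Flip every edge coin once; keep only the edges that fired (live edges)."""
    return {v: [w for w in ws if random.random() < p[(v, w)]]
            for v, ws in out_neighbours.items()}

def reachable(live, S):
    """Nodes reachable from S by live-edge paths: the influence set in this world."""
    seen, stack = set(S), list(S)
    while stack:
        v = stack.pop()
        for w in live.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen
```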
The influence set $X_u^i$ of node u is the set of nodes reachable by live-edge paths from u, i.e. the nodes activated from u in the i-th realization.
Say $f_i(S)$ is the i-th realization of $f(S)$; in an example graph it might look like this:
$f_1(S = \lbrace a, b \rbrace) = |\lbrace a, f, c, g \rbrace \cup \lbrace b, c \rbrace| = 5$ and
$f_2(S = \lbrace a, b \rbrace) = |\lbrace a, f, c, g \rbrace \cup \lbrace d, e, h \rbrace| = 7$
From this we can see that $f_i(S) = |\cup_{v \in S} X_v^i|$, where the $X_v^i$ are sets and $f_i(S)$ is the size of their union. By the coverage property of submodular functions above, each $f_i(S)$ is submodular.
And the expected influence set size is $f(S) = \frac{1}{|I|} \sum_{i \in I} f_i(S)$; by the first property of submodular functions we discussed, since each $f_i$ is submodular, their non-negatively weighted combination $f(S)$ is also submodular.
In summary, to find the most influential set S of size k with the largest expected cascade size $f(S)$, we generate multiple realizations i, each of which is a possible influenced world, and take the expectation over them.
So we want to solve $\arg\max_{|S| = k} f(S)$, where $f(S) = \frac{1}{|I|}\sum_{i \in I} f_i(S)$.
Proof that Hill Climbing gives near optimal Solutions
When $f(S)$ is monotone and submodular, hill climbing produces an active set S where $f(S) \geq (1 - \frac{1}{e}) \cdot f(OPT)$, i.e. $f(S) \geq 0.63 \cdot f(OPT)$.
Hill climbing starts with $S_0 = \lbrace \rbrace$ and produces sets $S_1, S_2, \cdots, S_k$ by greedily adding one element to S at a time, each time the one that gives us the most active nodes.
TODO: The proof is in the handout.
This gives us a worst-case bound: no matter the input data, hill climbing will do no worse than activating 63% as many nodes as the optimal solution.
Evaluating $f(S)$
How to compute $f(S)$ exactly and efficiently is still an open question, but we can get good estimates by simulation: repeating the diffusion process often enough (polynomially many times in n). If we achieve a $(1 \pm \epsilon)$-approximation to $f(S)$, the greedy algorithm becomes a $(1 - 1/e - \epsilon)$-approximation, by generalizing the Nemhauser-Wolsey proof.
Say for a given network G we repeat the process 10,000 times: we flip a coin for each edge and determine the influence sets under coin-flip realization i. Each node u is then associated with 10,000 influence sets $X_u^i$. The complexity of the greedy algorithm is $O(k \cdot n \cdot R \cdot m)$, where n is the number of nodes in G, k is the size of the initial activated set S, R is the number of simulated worlds and m is the number of edges in G.
Experimental Data
As an example cascade process, consider a collaboration network from arXiv high-energy physics theory, with 10,748 nodes and 53,000 edges, where we try to model the spread of new scientific terminology or of a new research area.
They compared the Independent Cascade Model with other basic heuristics, considering two cases of the Independent Cascade Model: one where each edge has a uniform activation probability, and one where each edge from v to w has probability 1/deg(w) of activating w.
They simulated the process 10,000 times for each targeted set, re-choosing the edge outcomes randomly every time, and compared with the following three common heuristics:
- Degree centrality: Picking nodes with the highest degree
- Closeness centrality: Picking nodes in the "center" of the network
- Random nodes: Pick a random set of nodes.
In the images below you can see how the Independent Cascade Model performed with respect to the above heuristics when it comes to maximizing $f(S)$.
Sketch-based Algorithms
Sketch-based Influence Maximization and Computation: Scaling up with Guarantees. CIKM 2014
The problem with the greedy hill climbing algorithm is that we have to generate R possible worlds and identify the k nodes with the largest influence in these possible worlds, and for any node set, evaluating its influence in a possible world takes $O(m)$ time, where m is the number of edges. Using sketches we can reduce the estimation time from $O(m)$ to $O(1)$.
We do this by computing a small structure per node, called a sketch, that summarizes the node, and estimating the node's influence from this sketch. We then run influence maximisation using these estimates.
We take one possible deterministic network $G^{(i)}$ from one of the realizations and give each node a uniform random number from [0, 1]. We then compute the rank of each node v, which is the smallest number among the nodes that v can reach. For example, say u and v are two nodes with assigned random numbers 0.4 and 0.3 respectively, and there is an edge from u to v; the rank of u then becomes 0.3 because of this directed edge. Intuitively, if a node can reach a lot of nodes and is very influential, its rank gets smaller and smaller, so an influential node will have a small rank. We can therefore use the rank of any node v to estimate its influence in a possible world $G^{(i)}$. Since the nodes are assigned random numbers initially, we repeat this process of finding node ranks multiple times and obtain multiple ranks per node, keeping say the c smallest rank values for each node, with the node numbers held fixed across the worlds.
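A sketch of computing a single such rank per node in one possible world; processing nodes in increasing order of their random value and walking the reversed live edges is one standard way to do this efficiently:

```python
import random

def reachability_ranks(nodes, out_neighbours):
    """rank[v] = smallest random value among the nodes v can reach."""
    value = {v: random.random() for v in nodes}
    in_neighbours = {v: [] for v in nodes}  # reverse the live-edge graph
    for v, ws in out_neighbours.items():
        for w in ws:
            in_neighbours[w].append(v)
    rank = {}
    # Visit nodes from smallest value up; the first value that reaches a node
    # backwards is, by construction, the smallest value that node can reach.
    for v in sorted(nodes, key=value.get):
        stack = [v]
        while stack:
            x = stack.pop()
            if x in rank:
                continue  # already reached by an even smaller value
            rank[x] = value[v]
            stack.extend(in_neighbours[x])
    return rank
```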
Sketch-based Greedy Algorithm
Generate a number of possible worlds and construct reachability sketches for all nodes, keeping the c smallest ranks for each node. Then run greedy influence maximization: whenever greedy asks for the influence of a node set S, check the ranks and add the node u with the smallest rank value. After u is chosen, find the set of nodes influenced by u, mark them as infected, and remove their ranks from the other nodes' sketches. A simplified sketch follows.
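A very loose sketch of that loop, simplified to counting not-yet-covered rank values (the actual paper uses a cardinality estimator based on the c-th smallest rank; `sketches` mapping each node to the set of rank values it reaches is an assumed input):

```python
def sketch_greedy(sketches, k):
    """sketches[u]: set of rank values reachable from u across possible worlds."""
    S, covered = [], set()
    for _ in range(k):
        # Estimated marginal influence of u: ranks it reaches not yet covered
        u = max((v for v in sketches if v not in S),
                key=lambda v: len(sketches[v] - covered))
        S.append(u)
        covered |= sketches[u]  # these ranks no longer count for other nodes
    return S
```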
This algorithm guarantees that the expected running time is near-linear in the number of possible worlds. Also, when c is large, it provides a $1 - 1/e - \epsilon$ approximation with respect to the possible worlds considered.
This algorithm is much faster than the greedy hill climbing algorithm, but it does not provide an approximation guarantee on the true expected influence.
Experiments
In the image below you can see how various influence maximisation algorithms and heuristics perform on various networks: greedy hill climbing, IRIE (a state-of-the-art heuristic), degree-based heuristics and the sketch-based algorithm.
We can see that the sketch-based approach achieves the same performance as greedy in a fraction of the time.
Open questions
- More realistic viral marketing: different marketing actions increase the likelihood of initial activation for several nodes at once.
- Trade-offs between generality and feasibility when it comes to maximising influence with initial seeds
- Dealing with negative influence, such as critical comments, and how it fits into influencing the network; our models only considered influence working in one direction.