Journal of Machine Learning Research 21 (2020) 1-36. Submitted 3/18; Revised 10/19; Published 02/20.

Generalized Nonbacktracking Bounds on the Influence in Independent Cascade Models

Emmanuel Abbe (eabbe@princeton.edu), Program in Applied and Computational Mathematics and Department of Electrical Engineering, Princeton University, Princeton, NJ 08540, USA

Sanjeev Kulkarni (kulkarni@princeton.edu), Department of Electrical Engineering, Princeton University, Princeton, NJ 08540, USA

Eun Jee Lee (ejlee.mail@gmail.com), Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ 08540, USA

Editor: Bert Huang

Abstract. This paper develops deterministic upper and lower bounds on the influence measure in a network, more precisely, the expected number of nodes that a seed set can influence in the independent cascade model. In particular, our bounds exploit $r$-nonbacktracking walks and Fortuin-Kasteleyn-Ginibre (FKG) type inequalities, and are computed by message passing algorithms. Further, we provide parameterized versions of the bounds that control the trade-off between efficiency and accuracy. Finally, the tightness of the bounds is illustrated on various network models.

Keywords: Influence Estimation, Nonbacktracking Walk, Message Passing, Social Networks, Independent Cascade Model

1. Introduction

Social interaction is a building block of society. Through interacting with each other, people exchange information, ideas, and opinions, and influence one another. Influence propagation studies how influence spreads in a social network, starting from the initially influenced nodes, called seeds.
Studying how influence spreads in a network allows us to answer many important questions in a broad range of fields, such as viral marketing (Leskovec et al., 2007), sociology (Granovetter, 1978; Lopez-Pintado and Watts, 2008; Watts, 2002), communication (Khelil et al., 2002), epidemiology (Shulgin et al., 1998), and social network analysis (Yang and Counts, 2010). With the advent of the Internet and the availability of large data sets on social networks, influence propagation has become a center of interest in the machine learning and data mining communities. These communities have proposed several mathematical models that abstract influence propagation in the real world, and one of the most widely adopted models is the independent cascade model (ICM). In this model, a network is defined as a directed graph $G = (V, E)$ where every edge is assigned a transmission probability. Each edge decides its state, live or not, with a Bernoulli trial whose success rate equals its transmission probability. The influence propagation then starts from the initially influenced nodes, called the seeds, and spreads along the live edges: a node is influenced if it is reachable from the seeds via live edges. While various important quantities can be extracted by studying the ICM, the most fundamental and essential problem in influence propagation is to find the influence, the expected number of influenced nodes, given a network and a set of initially influenced nodes. Unfortunately, computing the influence of a network in the ICM is proven to be #P-hard (Wang et al., 2012), and thus many previous works have used Monte Carlo (MC) simulations to estimate the true influence (Kempe et al., 2003; Chen et al., 2009).

(c) 2020 Emmanuel Abbe, Sanjeev Kulkarni, Eun Jee Lee. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v21/18-112.html.
Despite its simplicity, approximating the influence via MC simulations can be computationally expensive. MC approximation requires $O(|V|^2)$ trials to concentrate, and each trial requires $O(|V| + |E|)$ computation. Considering that modern real-world networks often have many millions of nodes, this amount of computation quickly becomes unreasonable. To overcome the limitations of Monte Carlo simulations, many researchers have taken both algorithmic and theoretical approaches to approximating the influence of a given seed set in a network. Chen and Teng (Chen and Teng, 2017) provided a probabilistic guarantee on estimating the influence of a single seed with a relative error bound, with expected running time $O(\ell(|V| + |E|)|V| \log |V| / \varepsilon^2)$, such that with probability $1 - 1/n^{\ell}$, for every node $v$, the computed influence of $v$ has relative error at most $\varepsilon$. Draief et al. (Draief et al., 2006) introduced an upper bound on the influence of any set of seeds by using the spectral radius of the adjacency matrix. Tighter upper bounds were later suggested in (Lemonnier et al., 2014), which relate the ratio of influenced nodes in a network to the spectral radius of the so-called Hazard matrix. Further improved upper bounds, which account for sensitive edges, were introduced in (Lee et al., 2016). In contrast, there has been little work on finding a tight lower bound on the influence. An exception is the work by Khim et al. (Khim et al., 2016), where the lower bound is obtained by considering only the influence through paths whose length is at most a parameter $m$. In this article, we extend the results of (Abbe et al., 2017) and propose upper and lower bounds on the influence using nonbacktracking walks and Fortuin-Kasteleyn-Ginibre (FKG) type inequalities.
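As a point of reference for the MC approach discussed above, the live-edge Monte Carlo estimator can be sketched in a few lines. This is our own illustrative sketch, not code from the paper; the representation (an edge list with a dictionary of transmission probabilities) and all function names are assumptions.

```python
import random

def mc_influence(nodes, edges, B, seeds, trials=20000, rng=None):
    """Monte Carlo estimate of the influence of `seeds` under IC(G, B).

    `edges` is a list of directed pairs (u, v); B[(u, v)] is the
    transmission probability of edge (u, v).  Each trial samples a
    live-edge graph and counts the nodes reachable from the seed set.
    """
    rng = rng or random.Random(0)
    total = 0
    for _ in range(trials):
        # Decide the state of every edge with an independent Bernoulli trial.
        live = {v: [] for v in nodes}
        for (u, v) in edges:
            if rng.random() < B[(u, v)]:
                live[u].append(v)
        # Count the nodes reachable from the seeds along live edges.
        reached, stack = set(seeds), list(seeds)
        while stack:
            u = stack.pop()
            for v in live[u]:
                if v not in reached:
                    reached.add(v)
                    stack.append(v)
        total += len(reached)
    return total / trials
```

On the directed path $a \to b \to c$ with transmission probability $0.5$ on both edges, the exact influence of $\{a\}$ is $1 + p + p^2 = 1.75$, and the estimate concentrates around this value as the number of trials grows, illustrating the $O(|V|^2)$-trials cost discussed above.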
The FKG inequality is useful for proving correlation between non-decreasing functions, which is instrumental in influence propagation since the probability that a node is influenced is non-decreasing with respect to the partial order on the random variables describing the states of the edges. The bounds can be efficiently obtained by message passing algorithms. This shows that nonbacktracking walks can also impact influence propagation, making another case for the use of nonbacktracking walks in graph and network problems, as in (Krzakala et al., 2013; Karrer et al., 2014; Bordenave et al., 2015; Abbe and Sandon, 2015), discussed later in the paper. Next, we show that the proposed upper bound is monotone and submodular. This property allows us to use the upper bound, rather than some approximation, when greedily finding a set of seeds that maximizes the influence. Further, we provide parameterized versions of the bounds that can adjust the trade-off between the efficiency and the accuracy of the bounds.

2. Background

We introduce here the independent cascade model and provide background for the main results.

Definition 1 (Independent Cascade Model). Consider a directed graph $G = (V, E)$ with $|V| = n$, a transmission probability matrix $B \in [0, 1]^{n \times n}$, and a seed set $S_0 \subseteq V$. For any $u \in V$, let $N^+(u)$ be the set of out-neighbors of node $u$. The independent cascade model $\mathrm{IC}(G, B)$ sequentially generates the influenced set $S_t \subseteq V$ starting from the seed set $S_0$ for each step $t \geq 1$ as follows. At the beginning of step $t$, $S_t$ is initialized to be an empty set. Then, each node $u \in S_{t-1}$ attempts to influence each $v \in N^+(u) \setminus \cup_{i=0}^{t-1} S_i$ with probability $B_{uv}$, i.e., node $u$ influences its uninfluenced out-neighbor $v$ with probability $B_{uv}$. If $v$ is influenced as a result of the attempts, $v$ is added to $S_t$. At the end of step $t$, $S_t$ contains all nodes that are influenced during step $t$. The process stops at step $t = T$ if $S_T = \emptyset$.
The set of influenced nodes at the end of the propagation is defined as $S = \cup_{t=0}^{T-1} S_t$. We often refer to an edge $(u, v)$ as being live if node $u$ influences node $v$. The independent cascade model is equivalent to the live-edge graph model, where the influence happens at once, rather than sequentially. The live-edge graph model first decides the state of every edge with a Bernoulli trial, i.e., edge $(u, v)$ becomes live independently with probability $B_{uv}$. Then, the set of influenced nodes is defined as the set of nodes that are reachable from at least one of the seeds via live edges.

Definition 2 (Influence). The expected number of nodes that are influenced at the end of the propagation process is called the influence (rather than the expected influence, with a slight abuse of terminology) of a seed set $S_0$ on $\mathrm{IC}(G, B)$, and is defined as $\sigma(S_0) = \sum_{v \in V} P(v \text{ is influenced})$.

It is shown in (Wang et al., 2012) that computing the influence $\sigma(S_0)$ in the independent cascade model $\mathrm{IC}(G, B)$ is #P-hard, even with a single seed, i.e., $|S_0| = 1$.

Next, we define nonbacktracking (NB) walks on a directed graph. Nonbacktracking walks have been used for studying the characteristics of networks. To the best of our knowledge, the use of NB walks in the context of epidemics was first introduced in the paper of Karrer and Newman (Karrer and Newman, 2010) and later applied to percolation in (Karrer et al., 2014). In particular, Karrer et al. reformulate the spread of influence as a message passing process and demonstrate how the resulting equations can be used to calculate an upper bound on the number of nodes that are susceptible at a given time. As we shall see, we take a different approach to the use of NB walks, which focuses on the effective contribution of a node in influencing another node and accumulates such contributions to obtain upper and lower bounds. More recently, nonbacktracking walks have been used for community detection (Krzakala et al., 2013; Bordenave et al., 2015; Abbe and Sandon, 2015).
Definition 3 (Nonbacktracking Walk). Let $G = (V, E)$ be a directed graph. A nonbacktracking walk of length $k$ is defined as $w^{(k)} = (v_0, v_1, \ldots, v_k)$, where $v_i \in V$ and $(v_{i-1}, v_i) \in E$ for all $i \in [k]$, and $v_{i-1} \neq v_{i+1}$ for all $i \in [k-1]$.

We next recall a key inequality introduced by Fortuin et al. (Fortuin et al., 1971).

Theorem 1 (FKG Inequality). Let $(\Gamma, \preceq)$ be a distributive lattice, where $\Gamma$ is a finite partially ordered set, ordered by $\preceq$, and let $\mu$ be a positive measure on $\Gamma$ satisfying the following condition: for all $x, y \in \Gamma$,
$$\mu(x \wedge y)\,\mu(x \vee y) \geq \mu(x)\,\mu(y),$$
where $x \wedge y = \max\{z \in \Gamma : z \preceq x, z \preceq y\}$ and $x \vee y = \min\{z \in \Gamma : x \preceq z, y \preceq z\}$. Let $f$ and $g$ be both increasing (or both decreasing) functions on $\Gamma$. Then,
$$\Big(\sum_{x \in \Gamma} \mu(x)\Big)\Big(\sum_{x \in \Gamma} f(x)g(x)\mu(x)\Big) \geq \Big(\sum_{x \in \Gamma} f(x)\mu(x)\Big)\Big(\sum_{x \in \Gamma} g(x)\mu(x)\Big).$$

3. Nonbacktracking Upper Bounds

In this section, we present three types of nonbacktracking upper bounds on the influence in the independent cascade model and explain the motivations and intuitions behind the bounds. The bounds use nonbacktracking walks and FKG inequalities and are computed efficiently by message passing algorithms. The proposed upper bounds listed below have varying efficiency and accuracy trade-offs.

- Nonbacktracking upper bound (NB-UB)
- Tunable nonbacktracking upper bound (t-NB-UB)
- r-nonbacktracking upper bound (r-NB-UB)

In particular, the nonbacktracking upper bound on a network based on a graph $G(V, E)$ runs in $O(|V|^2 + |V||E|)$, whereas Monte Carlo simulation would require $O(|V|^3 + |V|^2|E|)$ computations without knowing the variance of the influence, which is harder to estimate than the influence. The reason for the large computational complexity of MC is that in order to ensure that the standard error of the estimation does not grow with respect to $|V|$, MC requires $O(|V|^2)$ trials. Hence, for large networks, where MC may not be feasible, our algorithms can still provide bounds on the influence.
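Definition 3 is straightforward to check programmatically. The following sketch (our own illustration, with hypothetical names) verifies that a candidate walk uses only edges of $E$ and never immediately backtracks:

```python
def is_nonbacktracking(walk, edges):
    """Check Definition 3: each consecutive pair of the walk is a
    directed edge, and the walk never immediately revisits the node
    it just left (v_{i-1} != v_{i+1})."""
    edge_set = set(edges)
    # Every step (v_{i-1}, v_i) must be a directed edge of the graph.
    if any((walk[i - 1], walk[i]) not in edge_set for i in range(1, len(walk))):
        return False
    # No immediate backtrack: v_{i-1} != v_{i+1} for all valid i.
    return all(walk[i - 1] != walk[i + 1] for i in range(1, len(walk) - 1))
```

For instance, on the bidirected triangle on $\{a, b, c\}$, the walk $(a, b, c)$ is nonbacktracking while $(a, b, a)$ is not.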
Furthermore, the proposed upper bound is monotone and submodular, so it can be used in place of MC approximations in greedy algorithms for the influence maximization problem. Details of the greedy algorithm for influence maximization can be found in (Kempe et al., 2003). The tunable nonbacktracking upper bound better accounts for the sensitive (as defined in (Lee et al., 2016)) edges near the seeds. However, for large values of $t$, it may result in exponential computational complexity for dense networks. The r-nonbacktracking upper bound generalizes the nonbacktracking upper bound by considering $r$-nonbacktracking walks, i.e., avoiding cycles of length $r$ rather than just backtracking, and runs in $O((|V| + |E|)|V|^{r-1})$.

This section is organized as follows. First, Sections 3.1 through 3.3 present the definition of each proposed upper bound, followed by an efficient algorithm to compute the bound and its computational complexity. Next, Section 3.4 shows how each upper bound is computed on a small example network and compares the bounds with the influence.

3.1. Nonbacktracking Upper Bound (NB-UB)

We start by defining the following terms for the independent cascade model $\mathrm{IC}(G, B)$, where $G = (V, E)$ and $|V| = n$.

Definition 4. For any $v \in V$, we define the set of in-neighbors $N^-(v) = \{u \in V : (u, v) \in E\}$ and the set of out-neighbors $N^+(v) = \{u \in V : (v, u) \in E\}$.

Definition 5. For any $v \in V$ and $l \in [n-1]$, the set $P_l(S_0 \to v)$ is defined as the set of all paths of length $l$ from any seed $s \in S_0$ to $v$. We say a path $P$ is live iff every edge in $P$ is live. For $l = 0$, we define $P_0(S_0 \to v)$ as the set (of size one) containing the zero-length path consisting of node $v$, and declare the path $P \in P_0(S_0 \to v)$ live iff $v \in S_0$.

Definition 6. For any $v \in V$ and $l \in \{0, \ldots, n-1\}$, we define
$$p(v) = P(v \text{ is influenced}),$$
$$p_l(v) = P\big(\cup_{P \in P_l(S_0 \to v)} \{P \text{ is live}\}\big),$$
$$p_l(u \to v) = P\big(\cup_{P \in P_l(S_0 \to u),\, P \not\ni v} \{P \text{ is live and edge } (u, v) \text{ is live}\}\big).$$

In other words, $p_l(v)$ is the probability that node $v$ is influenced by live paths of length $l$, i.e., there exists a live path of length $l$ from a seed to $v$, and $p_l(u \to v)$ is the probability that $v$ is influenced by node $u$ with live paths of length $l + 1$, i.e., there exists a live path of length $l + 1$ from a seed to $v$ that ends with the edge $(u, v)$.

Lemma 1. For any $v \in V$,
$$p(v) \leq 1 - \prod_{l=0}^{n-1}(1 - p_l(v)). \quad (1)$$
For any $v \in V$ and $l \in [n-1]$,
$$p_l(v) \leq 1 - \prod_{u \in N^-(v)}(1 - p_{l-1}(u \to v)). \quad (2)$$

Lemma 1, which can be proved by FKG inequalities, suggests that given $p_{l-1}(u \to v)$, we may compute an upper bound on the influence. Ideally, $p_{l-1}(u \to v)$ could be computed by considering all paths of length $l$ that end with $(u, v)$. However, this results in exponential complexity $O(n^l)$, as $l$ goes up to $n - 1$. Thus, in Definition 7, we present an efficient way to compute an upper bound $\mathrm{UB}_{l-1}(u \to v)$ on $p_{l-1}(u \to v)$, which in turn gives an upper bound $\mathrm{UB}_l(v)$ on $p_l(v)$, with the following recursion.

Definition 7. For all $l \in \{0, \ldots, n-1\}$ and $u, v \in V$ such that $(u, v) \in E$, $\mathrm{UB}_l(u) \in [0, 1]$ and $\mathrm{UB}_l(u \to v) \in [0, 1]$ are defined recursively as follows.

Initial condition: For every $s \in S_0$, $s^+ \in N^+(s)$, $u \in V \setminus S_0$, and $v \in N^+(u)$,
$$\mathrm{UB}_0(s) = 1, \quad \mathrm{UB}_0(s \to s^+) = B_{ss^+}, \quad (3)$$
$$\mathrm{UB}_0(u) = 0, \quad \mathrm{UB}_0(u \to v) = 0. \quad (4)$$

Recursion: For every $l \in [n-1]$, $s \in S_0$, $s^+ \in N^+(s)$, $s^- \in N^-(s)$, $u \in V \setminus S_0$, and $v \in N^+(u) \setminus S_0$,
$$\mathrm{UB}_l(s) = 0, \quad \mathrm{UB}_l(s \to s^+) = 0, \quad \mathrm{UB}_l(s^- \to s) = 0, \quad (5)$$
$$\mathrm{UB}_l(u) = 1 - \prod_{w \in N^-(u)} (1 - \mathrm{UB}_{l-1}(w \to u)), \quad (6)$$
$$\mathrm{UB}_l(u \to v) = \begin{cases} B_{uv}\Big(1 - \dfrac{1 - \mathrm{UB}_l(u)}{1 - \mathrm{UB}_{l-1}(v \to u)}\Big), & \text{if } v \in N^-(u), \\ B_{uv}\,\mathrm{UB}_l(u), & \text{otherwise.} \end{cases} \quad (7)$$

Equation (5) follows from the fact that for any seed node $s \in S_0$ and for all $l > 0$, the probabilities $p_l(s) = 0$, $p_l(s \to s^+) = 0$, and $p_l(s^- \to s) = 0$. A naive way to compute $\mathrm{UB}_l(u \to v)$ is $\mathrm{UB}_l(u \to v) = B_{uv}\,\mathrm{UB}_l(u)$, but this results in an extremely loose bound due to backtracking. For a tighter bound, we use nonbacktracking in Equation (7), i.e., when computing $\mathrm{UB}_l(u \to v)$, we ignore the contribution of $\mathrm{UB}_{l-1}(v \to u)$.
Finally, an upper bound on the probability that node $v$ is influenced at the end of the propagation is computed as $\mathrm{UB}(v) = 1 - \prod_{l=0}^{n-1}(1 - \mathrm{UB}_l(v))$. The following theorem shows that summing $\mathrm{UB}(v)$ over $v \in V$ yields an upper bound on the influence.

Theorem 2 (Nonbacktracking Upper Bound (NB-UB)). For any independent cascade model $\mathrm{IC}(G, B)$ and seed set $S_0 \subseteq V$,
$$\sigma(S_0) \leq \sum_{v \in V}\Big(1 - \prod_{l=0}^{n-1}(1 - \mathrm{UB}_l(v))\Big) =: \sigma^+(S_0), \quad (8)$$
where $\mathrm{UB}_l(v)$ is obtained recursively as in Definition 7.

The nonbacktracking upper bound is monotone and submodular. Therefore, it can be used in place of approximations of the influence in various greedy algorithms for the influence maximization problem.

Theorem 3. For any independent cascade model $\mathrm{IC}(G, B)$, the nonbacktracking upper bound $\sigma^+ : 2^V \to [0, |V|]$ is monotone and submodular.

Next, we present the Nonbacktracking Upper Bound (NB-UB) algorithm, which computes $\mathrm{UB}_l(v)$ and $\mathrm{UB}_l(u \to v)$ by message passing. At the $l$-th iteration, the variables in NB-UB have the following meanings.

- $S_l$ is the set of nodes that are considered in the computation at the $l$-th iteration.
- $M_{\mathrm{curr}}(v) = \{(u, \mathrm{UB}_{l-1}(u \to v)) : u \text{ is an in-neighbor of } v \text{ and } u \in S_{l-1}\}$ contains the current messages to node $v$ and their sources, where the source is an in-neighbor $u$ of $v$ that is in $S_{l-1}$ and the message is the upper bound $\mathrm{UB}_{l-1}(u \to v)$.
- $M_{\mathrm{Src}}(v) = \{u : u \text{ is an in-neighbor of } v \text{ and } u \in S_{l-1}\}$ is the set of in-neighbors of $v$ in $S_{l-1}$.
- $M_{\mathrm{curr}}(v)[u] = \mathrm{UB}_{l-1}(u \to v)$ is the current message from $u$ to $v$.
- $M_{\mathrm{next}}(v) = \{(u, \mathrm{UB}_l(u \to v)) : u \text{ is an in-neighbor of } v \text{ and } u \in S_l\}$ contains the messages to node $v$ and their sources for the algorithm to use in the next iteration.
Algorithm 1: Nonbacktracking Upper Bound (NB-UB)

  Initialize: UB_l(v) = 0 for all 0 <= l <= n-1 and v in V
  Initialize: insert (s, 1) into M_next(s) for all s in S_0
  for l = 0 to n-1 do
      for u in S_l do
          M_curr(u) = M_next(u); clear M_next(u)
          UB_l(u) = ProcessIncomingMsgUB(M_curr(u))
      for u in S_l do
          for v in N^+(u) \ S_0 do
              S_{l+1}.insert(v)
              if v in M_Src(u) then
                  UB_l(u -> v) = GenerateOutgoingMsgUB(M_curr(u)[v], UB_l(u), B_uv)
              else
                  UB_l(u -> v) = GenerateOutgoingMsgUB(0, UB_l(u), B_uv)
              M_next(v).insert((u, UB_l(u -> v)))
  Output: UB_l(u) for all l, u

At the beginning, every seed node $s \in S_0$ is initialized such that $M_{\mathrm{curr}}(s) = \{(s, 1)\}$ in order to satisfy the initial condition $\mathrm{UB}_0(s) = 1$. In the $l$-th iteration, every node $u$ in $S_l$ is processed as follows. First, ProcessIncomingMsgUB($M_{\mathrm{curr}}(u)$) computes $\mathrm{UB}_l(u)$ as in Equation (6). Second, $u$ passes a message to each neighbor $v \in N^+(u) \setminus S_0$ along the edge $(u, v)$, and $v$ stores (inserts) the message in $M_{\mathrm{next}}(v)$ in preparation for the next iteration. The message contains 1) the source of the message, $u$, and 2) the upper bound $\mathrm{UB}_l(u \to v)$, which is computed as in Equation (7) by the function GenerateOutgoingMsgUB. Finally, the algorithm outputs $\mathrm{UB}_l(u)$ for all $u \in V$ and $l \in \{0, \ldots, n-1\}$, and the upper bound $\sigma^+(S_0)$ is computed by Equation (8).

Computational complexity: Notice that for each iteration $l \in \{0, \ldots, n-1\}$, the algorithm accesses at most $n$ nodes, and for each node $v$, the functions ProcessIncomingMsgUB and GenerateOutgoingMsgUB are computed in $O(\deg(v))$ and $O(1)$, respectively. Therefore, the worst-case computational complexity is $O(|V|^2 + |V||E|)$.

3.2. Tunable Nonbacktracking Upper Bound (t-NB-UB)

One key observation about NB-UB is that when computing the nonbacktracking upper bound $\mathrm{UB}_l(v)$ recursively in $l$, the computations with small $l$ are more important than the ones with larger $l$, since $\mathrm{UB}_l(v)$ is a function of $\mathrm{UB}_{l-1}(u)$ for some $u \in V$.
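The recursion of Definition 7, which Algorithm 1 evaluates by message passing, can be sketched more directly as follows. This is an illustrative reimplementation under our own conventions (a dictionary $B$ of directed-edge probabilities and dense iteration over all edges, rather than message queues), not the authors' code; when $\mathrm{UB}_{l-1}(v \to u) = 1$ the correction of Equation (7) is undefined, and the sketch falls back to the naive bound.

```python
def nb_upper_bound(nodes, B, seeds):
    """Sketch of NB-UB (Definition 7).  `B` maps directed edges (u, v)
    to transmission probabilities; returns the bound of Equation (8)."""
    n = len(nodes)
    in_nbrs = {v: [u for (u, w) in B if w == v] for v in nodes}
    # Initial condition, Equations (3)-(4).
    ub_node = {u: (1.0 if u in seeds else 0.0) for u in nodes}
    ub_edge = {(u, v): (B[(u, v)] if u in seeds else 0.0) for (u, v) in B}
    miss = {v: 1.0 - ub_node[v] for v in nodes}  # running product of (1 - UB_l(v))
    for l in range(1, n):
        prev_edge = ub_edge
        ub_node, ub_edge = {}, {}
        for u in nodes:
            if u in seeds:
                ub_node[u] = 0.0  # Equation (5)
            else:
                prod = 1.0
                for w in in_nbrs[u]:
                    prod *= 1.0 - prev_edge[(w, u)]
                ub_node[u] = 1.0 - prod  # Equation (6)
        for (u, v) in B:
            if u in seeds or v in seeds:
                ub_edge[(u, v)] = 0.0  # Equation (5)
            elif (v, u) in B and prev_edge[(v, u)] < 1.0:
                # Equation (7): discount the mass that arrived through (v, u).
                ub_edge[(u, v)] = B[(u, v)] * (
                    1.0 - (1.0 - ub_node[u]) / (1.0 - prev_edge[(v, u)]))
            else:
                # Naive case, also used when the correction is undefined.
                ub_edge[(u, v)] = B[(u, v)] * ub_node[u]
        for v in nodes:
            miss[v] *= 1.0 - ub_node[v]
    return sum(1.0 - miss[v] for v in nodes)  # Equation (8)
```

On the directed path $a \to b \to c$ the bound is exact ($1 + p + p^2$), and on the bidirected pair $a \leftrightarrow b$ the nonbacktracking correction removes the spurious echo $b \to a$, giving exactly $1 + p$.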
In other words, computing loose bounds $\mathrm{UB}_l(v)$ for smaller $l$ hurts the accuracy of the bound on the influence more than computing loose bounds for larger $l$, because the effect of a loose $\mathrm{UB}_l(v)$ with smaller $l$ builds up more in the final bound $\sigma^+(S_0)$. Therefore, we propose a parameterized upper bound, t-NB-UB, that computes $p_l(v)$ exactly for $l \leq t$. As shown in Algorithm 2, t-NB-UB takes the parameter $t$, which indicates the maximum length of the paths that the algorithm enumerates to compute the exact probability of influence, rather than an upper bound on it. That is, the algorithm computes the probability $p_{\leq t}(u)$ that node $u$ is influenced by live paths whose length is less than or equal to $t$:
$$p_{\leq t}(u) = P\big(\cup_{P \in \cup_{i=0}^{t} P_i(S_0 \to u)} \{P \text{ is live}\}\big). \quad (9)$$
Then, we start the (non-parameterized) NB-UB algorithm from $l = t + 1$ with the new initial conditions: for all $u \in V$ and $v \in N^+(u)$,
$$\mathrm{UB}_t(u) = p_{\leq t}(u), \quad (10)$$
$$\mathrm{UB}_t(u \to v) = p_t(u \to v). \quad (11)$$
Finally, the upper bound t-NB-UB is computed as $\sum_{v \in V}\big(1 - \prod_{l=t}^{n-1}(1 - \mathrm{UB}_l(v))\big)$.

Algorithm 2: Tunable nonbacktracking upper bound (t-NB-UB)

  Parameter: non-negative integer t <= n-1
  Initialize: UB_l(v) = 0 for all t <= l <= n-1 and v in V
  for u in V do
      UB_t(u) = p_{<=t}(u)
      for v in N^+(u) \ S_0 do
          if p_t(u -> v) > 0 then
              S_{t+1}.insert(v)
              M_next(v).insert((u, p_t(u -> v)))
  for l = t+1 to n-1 do
      for u in S_l do
          M_curr(u) = M_next(u); clear M_next(u)
          UB_l(u) = ProcessIncomingMsgUB(M_curr(u))
      for u in S_l do
          for v in N^+(u) \ S_0 do
              S_{l+1}.insert(v)
              if v in M_Src(u) then
                  UB_l(u -> v) = GenerateOutgoingMsgUB(M_curr(u)[v], UB_l(u), B_uv)
              else
                  UB_l(u -> v) = GenerateOutgoingMsgUB(0, UB_l(u), B_uv)
              M_next(v).insert((u, UB_l(u -> v)))
  Output: UB_l(u) for all l in {t, t+1, ..., n-1}, u in V

For larger values of $t$, the algorithm results in tighter upper bounds, while the computational complexity may increase exponentially for dense networks. Thus, this method is most applicable to sparse networks, where the degree of each node is bounded.

3.3. r-Nonbacktracking Upper Bound (r-NB-UB)

In this section, we generalize the nonbacktracking upper bound by considering $r$-nonbacktracking walks, i.e., avoiding cycles of length $r$ rather than just backtracks. The $r$-nonbacktracking upper bound is not necessarily equal to the t-NB-UB, which computes $p_{\leq t}(v)$ for all $v \in V$ exactly, even in the case $r = t$.

Definition 8. Consider a directed graph $G = (V, E)$. For any integer $r \geq 1$ and node $u \in V$, let $P_{r-1}(\cdot \to u)$ be the set of all paths of length $r-1$ ending at node $u$, and $P_{r-1}(u \to \cdot)$ be the set of all paths of length $r-1$ starting from node $u$. To simplify notation, we write $P_{r-1}(S \to \cdot)$ for a set of nodes $S$ to denote the set of all paths of length $r-1$ starting from any one of the nodes in $S$. For two paths of the same length, $P = (v_1, \ldots, v_r)$ and $Q = (u_1, \ldots, u_r)$, we write $P \to Q$ when $v_{i+1} = u_i$ for all $i \in [r-1]$ and $v_1 \neq u_r$. In other words, $P \to Q$ iff $(v_1, v_2 = u_1, \ldots, v_r = u_{r-1}, u_r)$ is a path of length $r$. For two paths $P = (v_0, \ldots, v_r)$ and $Q = (u_1, \ldots, u_r)$, we write $Q \sqsubset P$ when $v_i = u_i$ for all $i \in [r]$. That is, $Q$ is the subpath of $P$ obtained by removing the first node $v_0$ from $P$. Also, let $P \setminus Q = (v_0, v_1)$ be the edge in path $P$ but not in path $Q$.

Recall that NB-UB recursively computes the upper bounds $\mathrm{UB}_l(u)$ on $p_l(u)$ and $\mathrm{UB}_l(u \to v)$ on $p_l(u \to v)$. Similarly, r-NB-UB recursively computes the upper bounds $\mathrm{UB}_l(u)$ and $\mathrm{UB}_l(u, P)$, as defined in Definition 9. $\mathrm{UB}_l(u, P)$ is analogous to $\mathrm{UB}_l(u \to v)$: $\mathrm{UB}_l(u, P)$ provides the effective influence of $u$ along the path $P$, whereas $\mathrm{UB}_l(u \to v)$ gives the effective influence of $u$ along the edge $(u, v)$.

Definition 9 (r-Nonbacktracking Upper Bound (r-NB-UB)). For all $l \in \{0, \ldots, n-1\}$, $u \in V$ and $P \in P_{r-1}(u \to \cdot)$, we define $\mathrm{UB}_l(u) \in [0, 1]$ and $\mathrm{UB}_l(u, P) \in [0, 1]$ as follows, using the function $f$ defined later in Definition 10.

Initial condition: For every $l \leq r-1$ and $u \in V$,
$$\mathrm{UB}_l(u) = f\big(\cup_{P \in P_l(S_0 \to u)}(P, 1)\big). \quad (12)$$
For every $P \in P_{r-1}(s \to \cdot)$ and $s \in S_0$, $\mathrm{UB}_{r-1}(s, P) = 1.$
(13)

Recursion: For every $l \in \{r, \ldots, n-1\}$, $s \in S_0$, $u, x \in V \setminus S_0$, $v \in N^+(u) \setminus S_0$, and $Q \in P_{r-1}(v \to \cdot)$,
$$\mathrm{UB}_l(s) = 0, \quad \mathrm{UB}_l(s, P) = 0 \text{ for any path } P, \quad (14)$$
$$\mathrm{UB}_l(v, Q) = 1 - \prod_{P \to Q} (1 - B_{uv}\,\mathrm{UB}_{l-1}(u, P)), \quad (15)$$
$$\mathrm{UB}_l(x) = f\big(\cup_{\{P \in P_{r-1}(y \to x) : y \in V \setminus S_0\}}(P, \mathrm{UB}_l(y, P))\big). \quad (16)$$

Equation (12) computes the upper bound on the probability that node $u$ is influenced by at least one path of length $l$, for each $l \leq r-1$. To provide the initial values, we set $\mathrm{UB}_{r-1}(s, P) = 1$, indicating that the probability of a seed node $s$ being influenced equals 1. In the recursion steps, Equation (15) computes $\mathrm{UB}_l(v, Q)$, the upper bound on the probability that the starting node $v$ of path $Q$ is influenced. Note that in general $\mathrm{UB}_l(v, Q) \neq \mathrm{UB}_{l-(r-1)}(v)$, since $\mathrm{UB}_l(v, Q)$ is the upper bound on the probability that node $v$ is influenced by a path of length $l-(r-1)$ that does not contain any node in $Q$, whereas $\mathrm{UB}_{l-(r-1)}(v)$ is the upper bound on the probability that node $v$ is influenced by any path of length $l-(r-1)$. Note that given the path $Q$, the starting node $v$ is determined. We use $\mathrm{UB}_l(v, Q)$ instead of $\mathrm{UB}_l(Q)$ in order to emphasize that the bound is for node $v$ of path $Q$. Finally, Equation (16) computes $\mathrm{UB}_l(x)$, the upper bound on the probability that node $x$ is influenced by at least one $r$-nonbacktracking walk of length $l$, via the function $f$. In the following, we define the function $f$.

Definition 10. Consider a directed graph $G = (V, E)$ and a non-negative integer $r$. The function $f$ takes as input $\mathcal{P}^{(r)} = \{P_1, \ldots, P_m\}$ and $x^{(r)} = \{x^{(r)}_{P_1}, \ldots, x^{(r)}_{P_m}\}$, where $\mathcal{P}^{(r)}$ is a set of length-$r$ paths in $G$ ending at the same node, say $v$, and $x^{(r)}$ is the set of real numbers in $[0, 1]$ associated with the paths. When the input is the empty set, $f(\emptyset, \emptyset) = 0$. Otherwise, $f(\mathcal{P}^{(r)}, x^{(r)})$ computes the output by the following recursion.
For every $i \in [r]$, from $i = r$ down to $i = 1$,
$$\mathcal{P}^{(i-1)} = \{P : P \sqsubset P', P' \in \mathcal{P}^{(i)}\}, \quad (17)$$
and for every $P \in \mathcal{P}^{(i-1)}$,
$$x^{(i-1)}_P = 1 - \prod_{P' \in \{P' : P \sqsubset P', P' \in \mathcal{P}^{(i)}\}} \big(1 - B_e\, x^{(i)}_{P'}\big), \quad (18)$$
where $e = P' \setminus P$ and $B_e$ is the transmission probability of edge $e$. By the definition of $\mathcal{P}^{(r)}$, the set $\mathcal{P}^{(0)}$ has only one element, the path $P_0 = (v)$ of length 0 consisting of node $v$. Therefore, at the end of the recursion, $i = 1$, the function computes the final output $x^{(0)}_{P_0} \in [0, 1]$.

We next illustrate how to compute the function $f$ in a small example. Consider a directed graph $G$ where all edges have the same transmission probability $p$. Let the function $f$ take the inputs $\mathcal{P}^{(3)} = \{(a, b, c, d), (a, f, c, d), (e, f, c, d)\}$ and $x^{(3)} = \{1, 1, 1\}$. Then, $\mathcal{P}^{(2)} = \{(b, c, d), (f, c, d)\}$, since $(b, c, d) \sqsubset (a, b, c, d)$ and $(f, c, d) \sqsubset (a, f, c, d), (e, f, c, d)$. Likewise, $\mathcal{P}^{(1)} = \{(c, d)\}$, as $(c, d) \sqsubset (b, c, d), (f, c, d)$, and $\mathcal{P}^{(0)} = \{(d)\}$, as $(d) \sqsubset (c, d)$. Figure 1 shows the sets $\mathcal{P}^{(3)}, \ldots, \mathcal{P}^{(0)}$ in a radix tree structure. With the radix tree, we can easily find the set $\{P' : P \sqsubset P', P' \in \mathcal{P}^{(i)}\}$ by accessing the descendants of the path $P$. For example, the path $P = (f, c, d)$ in the radix tree has two descendants, $a$ and $e$, corresponding to the paths $(a, f, c, d)$ and $(e, f, c, d)$, respectively. This gives the set $\{P' : (f, c, d) \sqsubset P', P' \in \mathcal{P}^{(3)}\} = \{(a, f, c, d), (e, f, c, d)\}$.

Figure 1: An example of the radix tree structure.

The function $f$ computes the output recursively from $i = 3$ to $i = 1$:
$$x^{(2)}_{(b,c,d)} = 1 - \prod_{P' \in \{(a,b,c,d)\}} (1 - p\,x^{(3)}_{P'}) = p,$$
$$x^{(2)}_{(f,c,d)} = 1 - \prod_{P' \in \{(a,f,c,d), (e,f,c,d)\}} (1 - p\,x^{(3)}_{P'}) = 2p - p^2,$$
$$x^{(1)}_{(c,d)} = 1 - \prod_{P' \in \{(b,c,d), (f,c,d)\}} (1 - p\,x^{(2)}_{P'}) = 3p^2 - p^3 - 2p^4 + p^5,$$
$$x^{(0)}_{(d)} = 1 - \prod_{P' \in \{(c,d)\}} (1 - p\,x^{(1)}_{P'}) = 3p^3 - p^4 - 2p^5 + p^6.$$

We use the function $f$ to compute the upper bound $\mathrm{UB}_l(v)$ in each step, instead of using Equation (2) of Lemma 1 as in NB-UB.
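Under the simplifying assumption of the example above (a single transmission probability $p$ for every edge), the recursion (17)-(18) can be sketched as follows; the function name and input conventions are ours, not the paper's.

```python
def f_bound(paths, x, p):
    """Sketch of the function f of Definition 10 for a graph in which
    every edge has the same transmission probability p.  `paths` are
    tuples of nodes sharing a common end node; x[i] is the value
    attached to paths[i]."""
    level = dict(zip(paths, x))
    # Peel off the first node of every path, Equations (17)-(18).
    while len(next(iter(level))) > 1:
        miss = {}
        for P, val in level.items():
            child = P[1:]  # the subpath Q with Q ⊏ P
            # Accumulate the product of (1 - B_e * x_P) over the parents of Q.
            miss[child] = miss.get(child, 1.0) * (1.0 - p * val)
        level = {Q: 1.0 - m for Q, m in miss.items()}
    # P^(0) holds the single zero-length path (v,); return its value.
    return next(iter(level.values()))
```

On the example input $\mathcal{P}^{(3)} = \{(a, b, c, d), (a, f, c, d), (e, f, c, d)\}$ with $x^{(3)} = \{1, 1, 1\}$, this reproduces the output $3p^3 - p^4 - 2p^5 + p^6$ computed above.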
This results in a tighter upper bound, since Equation (2) bounds the union of the events that node $v$ is influenced by at least one of the walks by treating each walk independently, whereas $f$ accounts for the dependencies among the walks. This function also ensures that the $r$-nonbacktracking upper bound is smaller than or equal to the $r'$-nonbacktracking upper bound when $r > r'$.

Theorem 4. For any independent cascade model $\mathrm{IC}(G, B)$, a seed set $S_0 \subseteq V$, and an integer $r \geq 2$,
$$\sigma(S_0) \leq \sum_{v \in V}\Big(1 - \prod_{l=0}^{n-1}(1 - \mathrm{UB}_l(v))\Big) =: \sigma^+_r(S_0) \leq \sigma^+(S_0), \quad (19)$$
where $\mathrm{UB}_l(v)$ is obtained recursively as in Definition 9.

Next, we present the r-Nonbacktracking Upper Bound (r-NB-UB) algorithm, which computes $\mathrm{UB}_l(v)$ and $\mathrm{UB}_l(v, P)$ by message passing. At the $l$-th iteration, the variables in r-NB-UB have the following meanings.

- $S_l$ represents the set of nodes that are used in the computations at the $l$-th iteration.
- $M_{\mathrm{curr}}(u) = \{(P, \mathrm{UB}_{l-1}(\cdot, P)) : P \in P_{r-1}(\cdot \to u) \text{ and } u \in S_{l-1}\}$ is the set of messages that node $u$ received in the previous step. Each message contains an $(r-1)$-length path ending at node $u$ and the bound associated with the path.
- $M_{\mathrm{next}}(u) = \{(P, \mathrm{UB}_l(\cdot, P)) : P \in P_{r-1}(\cdot \to u) \text{ and } u \in S_l\}$ is the set of messages that node $u$ receives in the current iteration. Each message contains an $(r-1)$-length path ending at $u$ and the bound associated with the path.

At the beginning, the first $r-1$ upper bounds for each node are computed, as well as $\mathrm{UB}_{r-1}(s, P)$ for all $s \in S_0$ and $P \in P_{r-1}(S_0 \to \cdot)$, to satisfy the initial condition in Definition 9. In each $l$-th iteration, the algorithm processes every node $u$ in $S_l$ as follows. First, $f(M_{\mathrm{curr}}(u))$ computes the upper bound $\mathrm{UB}_l(u)$ as in Equation (16), with $f$ defined in Definition 10. Second, $u$ transmits a message to each neighbor $v \in N^+(u) \setminus S_0$, and $v$ stores (inserts) the message in $M_{\mathrm{next}}(v)$ for the next iteration.
The message contains 1) the path $Q$ of length $r-1$ ending at node $v$ which satisfies $P \to Q$, and 2) $\mathrm{UB}_l(v_2, Q)$, which is computed as in Equation (15) by the function rGenerateOutgoingMsgUB. Note that $v_2$ is the starting node of path $Q$, which is the second node of path $P$. Finally, the algorithm outputs $\mathrm{UB}_l(u)$ for all $u \in V$ and $l \in \{0, \ldots, n-1\}$, and the upper bound $\sigma^+_r(S_0)$ is computed by Equation (19).

Algorithm 3: r-nonbacktracking upper bound (r-NB-UB)

  Parameter: integer 2 <= r <= n-1
  Initialize: UB_l(v) = 0 for all 0 <= l <= n-1 and v in V
  for l = 0 to r-2 do
      UB_l(u) = f(U_{P in P_l(S_0 -> u)} (P, 1))
  for s in S_0 do
      for P = (p_1, ..., p_r = u) in P_{r-1}(s -> .) do
          UB_{r-1}(s, P) = 1
          M_next(u).insert((P, UB_{r-1}(s, P)))
          S_{r-1}.insert(u)
  for l = r-1 to n-1 do
      for u in S_l do
          M_curr(u) = M_next(u); clear M_next(u)
          UB_l(u) = f(M_curr(u))
      for u in S_l do
          for Q = (q_1, ..., q_r = v) in {Q : P -> Q, P in P_{r-1}(. -> u)} do
              UB_l(q_1, Q) = rGenerateOutgoingMsgUB(M_curr(u))
              M_next(v).insert((Q, UB_l(q_1, Q)))
              S_{l+1}.insert(v)
  Output: UB_l(u) for all 0 <= l <= n-1, u in V

Computational complexity: To compute $f$, we first generate a radix tree from the input paths and then compute the output from the leaves. Generating the radix tree takes $O(rM)$, where $r-1$ is the length of the paths and $M$ is the number of paths, since inserting a path into a radix tree takes $O(r)$. Then, the computation proceeds from the leaves to the root, accessing each tree node once. Therefore, computing $f$ takes $O(rM)$. Note that for every $l \geq r$ and $u \in V$, there are at most $\deg(u)|V|^{r-2}$ elements in $M_{\mathrm{curr}}(u)$, since $M_{\mathrm{curr}}(u)$ may contain all paths of length $r-1$ ending at node $u$. Therefore, the function $f$ takes $O(\deg(u)|V|^{r-2})$. To find the set $\{Q : P \to Q, P \in P_{r-1}(\cdot \to u)\}$, for every $P \in P_{r-1}(\cdot \to u)$ we check whether appending an out-neighbor of node $u$ to the end of path $P$ forms a path of length $r$. This takes $O(\deg(p_r))$. We pre-compute this for every path of length $r-1$, which takes $O((|E| + |V|)|V|^{r-1})$.
In the algorithm, we compute the function $f$ at most $(n - r)n$ times and pre-compute the relationships $P \to Q$ once, resulting in an overall computational complexity of $O((|E| + |V|)|V|^{r-1})$.

3.4. Comparisons of the Upper Bounds

In this section, we compare the three types of nonbacktracking upper bounds, NB-UB, t-NB-UB, and r-NB-UB, on a small independent cascade model. The model $\mathrm{IC}(G, B)$ is defined on a bidirected graph $G = (V, E)$, where $V = \{a, b, c, d, e\}$, $S_0 = \{a\}$, and every edge has the same transmission probability $p$.

Figure 2: The step-wise illustration of the NB-UB algorithm on the example network.

In Figure 2, we describe how the NB-UB algorithm runs on the network $\mathrm{IC}(G, B)$. Since $|V| = 5$, the algorithm iterates from $l = 0$ to $l = 4$. In each step, it processes the nodes in the set $S_l$ and passes the messages generated by them. For example, at $l = 3$, the nodes in $S_3 = \{c, d, e\}$ are processed. In the previous iteration $l = 2$, node $c$ sent the message $(c, \mathrm{UB}_2(c \to d))$ to $d$, and node $d$ sent the message $(d, \mathrm{UB}_2(d \to c))$ to $c$ and $(d, \mathrm{UB}_2(d \to e))$ to $e$. Thus,
$$M_{\mathrm{curr}}(c) = \{(d, \mathrm{UB}_2(d \to c))\} = \{(d, p^3)\},$$
$$M_{\mathrm{curr}}(d) = \{(c, \mathrm{UB}_2(c \to d))\} = \{(c, p^3)\},$$
$$M_{\mathrm{curr}}(e) = \{(d, \mathrm{UB}_2(d \to e))\} = \{(d, p^3)\},$$
and the nodes $c$, $d$, $e$ are processed as follows.
First, compute the upper bounds at step $l = 3$:
$$\mathrm{UB}_3(c) = \mathrm{ProcessIncomingMsgUB}(M_{\mathrm{curr}}(c)) = p^3,$$
$$\mathrm{UB}_3(d) = \mathrm{ProcessIncomingMsgUB}(M_{\mathrm{curr}}(d)) = p^3,$$
$$\mathrm{UB}_3(e) = \mathrm{ProcessIncomingMsgUB}(M_{\mathrm{curr}}(e)) = p^3.$$
Next, each node computes the messages for its out-neighbors:
$$\mathrm{UB}_3(c \to b) = \mathrm{GenerateOutgoingMsgUB}(0, \mathrm{UB}_3(c), B_{cb}) = B_{cb}\Big(1 - \frac{1 - \mathrm{UB}_3(c)}{1 - 0}\Big) = p^4,$$
$$\mathrm{UB}_3(c \to d) = \mathrm{GenerateOutgoingMsgUB}(\mathrm{UB}_2(d \to c), \mathrm{UB}_3(c), B_{cd}) = B_{cd}\Big(1 - \frac{1 - \mathrm{UB}_3(c)}{1 - \mathrm{UB}_2(d \to c)}\Big) = 0,$$
$$\mathrm{UB}_3(d \to b) = \mathrm{GenerateOutgoingMsgUB}(0, \mathrm{UB}_3(d), B_{db}) = B_{db}\Big(1 - \frac{1 - \mathrm{UB}_3(d)}{1 - 0}\Big) = p^4,$$
$$\mathrm{UB}_3(d \to c) = \mathrm{GenerateOutgoingMsgUB}(\mathrm{UB}_2(c \to d), \mathrm{UB}_3(d), B_{dc}) = B_{dc}\Big(1 - \frac{1 - \mathrm{UB}_3(d)}{1 - \mathrm{UB}_2(c \to d)}\Big) = 0,$$
$$\mathrm{UB}_3(d \to e) = \mathrm{GenerateOutgoingMsgUB}(0, \mathrm{UB}_3(d), B_{de}) = B_{de}\Big(1 - \frac{1 - \mathrm{UB}_3(d)}{1 - 0}\Big) = p^4,$$
$$\mathrm{UB}_3(e \to d) = \mathrm{GenerateOutgoingMsgUB}(\mathrm{UB}_2(d \to e), \mathrm{UB}_3(e), B_{ed}) = B_{ed}\Big(1 - \frac{1 - \mathrm{UB}_3(e)}{1 - \mathrm{UB}_2(d \to e)}\Big) = 0. \quad (20)$$
Then, node $c$ sends the message $(c, \mathrm{UB}_3(c \to b))$ to $b$, and node $d$ sends the messages $(d, \mathrm{UB}_3(d \to b))$ to $b$ and $(d, \mathrm{UB}_3(d \to e))$ to $e$, concluding the $l = 3$ step.

Figure 3: The step-wise illustration of the t-NB-UB algorithm on the example network.

Next, Figure 3 shows the process of t-NB-UB on the same network with $t = 3$. As the first step, t-NB-UB finds all paths from the seed $a$ of length less than or equal to 3. Then, for each node $u \in \{a, b, c, d, e\}$, it computes the exact probability $\mathrm{UB}_3(u)$ that the node is influenced through a path of length less than or equal to 3. For example,
$$\mathrm{UB}_3(d) = P(\text{path } (a, b, d) \text{ is live or path } (a, b, c, d) \text{ is live}) = p^2 + p^3 - p^4.$$
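The message computations above use only Equation (7); the following one-line sketch (our own naming, not the paper's code) reproduces the values $p^4$ and $0$ obtained at step $l = 3$:

```python
def generate_outgoing_msg_ub(incoming, ub_u, b_uv):
    """Equation (7): the message u -> v discounts the contribution that
    previously arrived at u through the reverse edge (v, u).
    `incoming` is UB_{l-1}(v -> u), `ub_u` is UB_l(u)."""
    return b_uv * (1.0 - (1.0 - ub_u) / (1.0 - incoming))
```

With $p = 0.3$ and $\mathrm{UB}_3(c) = p^3$, calling it with incoming message $0$ gives $p \cdot p^3 = p^4$, and calling it with incoming message $\mathrm{UB}_2(d \to c) = p^3$ gives $0$, matching the step above.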
To initialize the process, t NB-UB computes UB3(u → v), the probability that node v is influenced directly by u with a live path of length 4 from the seed a. For node d, there is a path (a, b, c, d) of length 3 from the seed, so we compute UB3(d → v) for its neighbors {b, c, e}. Notice that there is no length-4 path from a that ends with (d, b), nor a length-4 path from a that ends with (d, c). However, (a, b, c, d, e) is a path from a ending with (d, e). Thus,

UB3(d → b) = 0
UB3(d → c) = 0
UB3(d → e) = P(Path (a, b, c, d, e) is live) = p^4.

After the initialization, t NB-UB repeats the same process as NB-UB from l = t + 1 to l = n − 1.

Figure 4: The step-wise illustration of the r NB-UB algorithm on the example network.

In Figure 4, we illustrate r NB-UB in the same example network. For the initialization, r NB-UB computes UBi(v) for all i ∈ {0, . . . , r − 2} = {0, 1} and v ∈ {a, b, c, d, e}. Since there is only one length-0 path from the seed a, we have P = (a), UB0(a) = 1, and UB0(v) = 0 for v ∈ {b, c, d, e}. Similarly, UB1(b) = p and UB1(v) = 0 for v ∈ {a, c, d, e}. Then, the algorithm finds all paths from the seed a with length r − 1 = 2 and sends the message (P, UB2(a, P) = 1) to the end node of each path. In this network, there are two paths of length 2, (a, b, c) and (a, b, d). Thus, r NB-UB sends ((a, b, c), 1) to node c and ((a, b, d), 1) to node d.
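The initialization step, enumerating all simple paths of length exactly r − 1 starting at the seed, can be sketched with a short depth-first search. This is our own illustrative code, not the paper's; the adjacency list encodes the bidirected example graph of Figure 4.

```python
# Sketch of the r NB-UB initialization: list all simple paths with exactly
# `length` edges starting from the seed node.
adj = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b", "d"],
       "d": ["b", "c", "e"], "e": ["d"]}

def paths_of_length(start, length):
    """Return all simple paths with `length` edges starting at `start`."""
    out = []
    def dfs(path):
        if len(path) - 1 == length:
            out.append(tuple(path))
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:  # simple paths only: no repeated nodes
                dfs(path + [nxt])
    dfs([start])
    return out

print(paths_of_length("a", 2))  # [('a', 'b', 'c'), ('a', 'b', 'd')]
```

For r = 3 this recovers exactly the two length-2 paths (a, b, c) and (a, b, d) that receive the initial messages.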
From l = r − 1 to n − 1, each node v in the set Sl first computes the upper bound UBl(v) by the function f in Definition 10. For example, at l = 2, the nodes in S2 = {c, d} compute the following upper bounds. Since both Mcurr(c) = {((a, b, c), 1)} and Mcurr(d) = {((a, b, d), 1)} have only one element each,

UB2(c) = 1 − (1 − p(1 − (1 − p))) = p^2
UB2(d) = 1 − (1 − p(1 − (1 − p))) = p^2.

Next, each node in the set Sl checks whether appending an out-neighbor to the path of a previously received message forms a path or not. If so, it sends a message to that neighbor with the updated path and value. For example, node c has the message ((a, b, c), 1) from the l = 1 step. Of node c's two out-neighbors b and d, appending d forms a path but appending b does not. Thus, node c sends the message ((b, c, d), p) to node d, where (b, c, d) is the length-2 sub-path of the path (a, b, c, d) ending at node d. Similarly, d sends the message ((b, d, c), p) to node c and ((b, d, e), p) to node e, concluding the l = 2 step.

Now, we compare the three upper bounds, NB-UB, t NB-UB, and r NB-UB, on the example graph G to the exact influence σ.

σ = 1 + p + (p^2 + p^3 − p^4) + (p^2 + p^3 − p^4) + (p^3 + p^4 − p^5) = 1 + p + 2p^2 + 3p^3 − p^4 − p^5
σ+ = 1 + (p + 2p^4 − 2p^5 − p^8 + p^9) + (p^2 + p^3 − p^5) + (p^2 + p^3 − p^5) + (p^3 + p^4 − p^7) = 1 + p + 2p^2 + 3p^3 + 3p^4 − 4p^5 − p^7 − p^8 + p^9
σ+_t = 1 + p + (p^2 + p^3 − p^4) + (p^2 + p^3 − p^4) + (p^3 + p^4 − p^7) = 1 + p + 2p^2 + 3p^3 − p^4 − p^7
σ+_r = 1 + p + (p^2 + p^3 − p^5) + (p^2 + p^3 − p^5) + (p^3 + p^4 − p^7) = 1 + p + 2p^2 + 3p^3 + p^4 − 2p^5 − p^7

The tunable upper bound σ+_t with t = 3 results in the tightest bound. t NB-UB computes the influence probability for nodes a, b, c and d exactly, since they can only be influenced by a path of length less than or equal to 3. The r-nonbacktracking bound σ+_r with r = 3 avoids all triangles when computing the bound. Thus, it only considers paths rather than walks. However, r NB-UB is still not exact, because it assumes independence among the events that a node is influenced by a length-l path, for all l ≤ n − 1.
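The ordering of the four closed-form expressions above can be sanity-checked numerically on a grid of transmission probabilities. This is our own check, not part of the paper; the polynomial coefficients are transcribed from the comparison above.

```python
# Verify numerically that sigma <= sigma_t^+ <= sigma_r^+ <= sigma^+ on [0, 1]
# for the example network's closed-form expressions.
def sigma(p):    return 1 + p + 2*p**2 + 3*p**3 - p**4 - p**5
def sigma_t(p):  return 1 + p + 2*p**2 + 3*p**3 - p**4 - p**7
def sigma_r(p):  return 1 + p + 2*p**2 + 3*p**3 + p**4 - 2*p**5 - p**7
def sigma_ub(p): return (1 + p + 2*p**2 + 3*p**3 + 3*p**4 - 4*p**5
                         - p**7 - p**8 + p**9)

for i in range(101):
    p = i / 100
    assert sigma(p) <= sigma_t(p) + 1e-12    # influence <= tunable bound
    assert sigma_t(p) <= sigma_r(p) + 1e-12  # tunable <= r-nonbacktracking
    assert sigma_r(p) <= sigma_ub(p) + 1e-12 # r-nonbacktracking <= NB-UB
print("ordering verified on [0, 1]")
```

The gaps factor cleanly, e.g. σ+_t − σ = p^5 − p^7 ≥ 0 and σ+ − σ+_r = (1 − p)(2p^4 − p^8) ≥ 0 on [0, 1], which is why the assertions hold on the whole grid.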
The nonbacktracking bound σ+ results in the worst bound. The bound not only assumes the same independence as r NB-UB, but also considers all walks that include the triangle formed by nodes b, c and d.

4. Nonbacktracking Lower Bounds

In this section, we introduce a nonbacktracking lower bound on the influence in the independent cascade model. The bound uses nonbacktracking walks and is computed efficiently by message passing algorithms in O(|V| + |E|). Similar to the nonbacktracking upper bound, we also present a parameterized version of the lower bound, the tunable nonbacktracking lower bound. Furthermore, from the proposed upper σ+ and lower σ− bounds on the expected number of influenced nodes, we can compute an upper bound on the variance given by (σ+ − σ−)^2/4. This could be used to estimate the number of computations needed by MC simulations. Computing the upper bound on the variance with the proposed bounds can be done in O(|V|^2 + |V||E|), whereas computing the variance with MC requires O(|V|^5 + |V|^4|E|).

This section is organized as follows. First, Sections 4.1 and 4.2 give the definitions of the proposed lower bounds and efficient algorithms to compute them. Next, Section 4.3 shows how each lower bound is computed on a small example network and compares the bounds with the influence.

4.1. Nonbacktracking Lower Bound (NB-LB)

A naive way to compute a lower bound on the influence of a seed set S0 in a network IC(G, B) is to reduce the network to a (spanning) tree network by removing edges. Then, since there is a unique path from one node to another, we can compute the influence of the tree network, which is a lower bound on the influence of the original network, in O(|V|). We take this approach of generating a subnetwork from the original network, yet we avoid the significant gap between the bound and the influence by considering the following directed acyclic subnetwork, in which there is no backtracking walk.
Definition 11 (Min-distance Directed Acyclic Subnetwork). Consider an independent cascade model IC(G, B), where G = (V, E) and |V| = n, and a seed set S0 ⊆ V. Let d(S0, v) := min_{s∈S0} d(s, v), i.e., the minimum distance from a seed in S0 to v. A minimum-distance directed acyclic subnetwork (MDAS), IC(G′, B′), where G′ = (V′, E′), is obtained as follows.

V′ = {v1, ..., vn} is an ordered set of nodes such that for every i < j, d(S0, vi) ≤ d(S0, vj).
E′ = {(vi, vj) ∈ E : i < j}, i.e., E′ is obtained from E by removing every edge whose source node comes later in the order than its destination node.
B′_{vivj} = B_{vivj} if (vi, vj) ∈ E′, and B′_{vivj} = 0 otherwise.

If there are multiple ordered sets of vertices satisfying the condition, we may choose one arbitrarily. While any of the ordered sets satisfying the condition results in a lower bound, the tightness of the bound can vary depending on the ordered set chosen for the computation. This is further explained in Section 4.3 with an example.

For any k ∈ [n], let p(vk) be the probability that vk ∈ V′ is influenced in the MDAS IC(G′, B′). Since p(vk) is the probability of the union of the events that an in-neighbor ui ∈ N−(vk) influences vk, p(vk) can be computed by the principle of inclusion and exclusion. Thus, we may compute a lower bound on p(vk), using the Bonferroni inequalities, if we know the probability that in-neighbors u and v both influence vk, for every pair u, v ∈ N−(vk). However, computing such probabilities can take O(k^k). Hence, we present LB(vk), which efficiently computes a lower bound on p(vk) by the following recursion.

Definition 12. For all vk ∈ V′, LB(vk) ∈ [0, 1] is defined by the recursion on k as follows.

Initial condition: For every vs ∈ S0,

LB(vs) = 1. (21)

Recursion: For every vk ∈ V′ \ S0,

LB(vk) = Σ_{i=1}^{m*} B′_{ui vk} LB(ui)(1 − Σ_{j=1}^{i−1} B′_{uj vk}), (22)

where N−(vk) = {u1, . . . , um} is the ordered set of in-neighbors of vk in IC(G′, B′) and m* = max{m′ ≤ m : Σ_{j=1}^{m′−1} B′_{uj vk} ≤ 1}.

Remark.
Since the i-th summand in Equation (22) can reuse Σ_{j=1}^{i−2} B′_{uj vk}, which was already computed for the (i − 1)-th summand, to compute Σ_{j=1}^{i−1} B′_{uj vk}, the summation takes at most O(deg(vk)). Note that Equation (22) is formulated so that the computational complexity is minimized. However, we may use other formulations that also achieve the lower bound property.

Theorem 5 (Nonbacktracking Lower Bound (NB-LB)). For any independent cascade model IC(G, B), a seed set S0 and its MDAS IC(G′, B′),

σ(S0) ≥ Σ_{vk∈V′} LB(vk) =: σ−(S0), (23)

where LB(vk) is obtained recursively as in Definition 12.

Next, we present the Nonbacktracking Lower Bound (NB-LB) algorithm, which efficiently computes LB(vk). At the k-th iteration, the variable in NB-LB represents the following:

M(vk) = {(LB(vj), B′_{vj vk}) : vj is an in-neighbor of vk}, the set of pairs (incoming message from an in-neighbor vj to vk, transmission probability of the edge (vj, vk)).

Algorithm 4 Nonbacktracking Lower Bound (NB-LB)
Input: directed acyclic network IC(G′, B′)
Initialize: σ− = 0
Initialize: Insert (1, 1) to M(vi) for all vi ∈ S0
for k = 1 to n do
  LB(vk) = Process Incoming Msg LB(M(vk))
  σ− += LB(vk)
  for vl ∈ N+(vk) \ S0 do
    M(vl).insert((LB(vk), B′_{vk vl}))

At the beginning, every seed node s ∈ S0 is initialized such that M(s) = {(1, 1)} in order to satisfy the initial condition, LB(s) = 1. At each k-th iteration, node vk is processed as follows. First, LB(vk) is computed as in Equation (22) by the function Process Incoming Msg LB and added to σ−. Second, vk passes the message (LB(vk), B′_{vk vl}) to each out-neighbor vl ∈ N+(vk) \ S0, and vl stores (inserts) it in M(vl). Finally, the algorithm outputs σ−, the lower bound on the influence.

Computational complexity: Obtaining an arbitrary directed acyclic subnetwork from the original network takes O(|V| + |E|). Next, the algorithm iterates through the nodes V′ = {v1, . . . , vn}. For each node vk, Process Incoming Msg LB takes O(deg(vk)) and vk sends messages to its out-neighbors in O(deg(vk)).
Hence, the worst case computational complexity is O(|V| + |E|).

4.2. Tunable Nonbacktracking Lower Bound (t NB-LB)

The first step of tunable NB-LB is the same as NB-LB, which is to order the vertex set as V′ = {v1, . . . , vn} so that d(S0, vi) ≤ d(S0, vj) for every i < j. Next, given a non-negative integer parameter t ≤ n, we obtain a t-size subnetwork IC(G[Vt], B[Vt]) and a new seed set S0 ∩ Vt, where G[Vt] is the vertex-induced subgraph induced by the set of nodes Vt = {v1, . . . , vt}, and B[Vt] is the corresponding transmission probability matrix. For each vi ∈ Vt, we compute the exact probability pt(vi) that node vi is influenced from the new seed set S0 ∩ Vt in the subnetwork IC(G[Vt], B[Vt]). Then, we start the (non-parameterized) NB-LB algorithm from k = t + 1 with the new initial condition:

for all k ≤ t, LB(vk) = pt(vk). (24)

Finally, t NB-LB computes the lower bound as Σ_{vk∈V′} LB(vk).

Algorithm 5 Tunable nonbacktracking lower bound (t NB-LB)
parameter: non-negative integer t ≤ n
Initialize: σ− = 0
for k = 1 to t do
  LB(vk) = pt(vk)
  σ− += LB(vk)
  for vi ∈ N+(vk) ∩ {vj : j > t} do
    M(vi).insert((LB(vk), B′_{vk vi}))
for k = t + 1 to n do
  LB(vk) = Process Incoming Msg LB(M(vk))
  σ− += LB(vk)
  for vi ∈ N+(vk) \ S0 do
    M(vi).insert((LB(vk), B′_{vk vi}))

For larger t, the algorithm results in a tighter bound. However, the computational complexity may increase exponentially with respect to t. To avoid such a large computational cost, the algorithm can adopt Monte Carlo simulations on the subnetwork. This modification, however, yields probabilistic lower bounds rather than theoretically guaranteed ones. Nonetheless, it can still give a significant improvement, because Monte Carlo simulations on a smaller network require less computation to stabilize the estimation.

4.3.
Illustration of Nonbacktracking Lower Bounds (NB-LB)

In Figure 5, we show the step-wise lower bound computation by NB-LB on a small network IC(G, B) defined on a bidirected graph G = (V, E), where V = {a, b, c, d}, S0 = {a}, and every edge has the same transmission probability p. For each k, Table 1 shows the values of the key variables, M(vk), LB(vk), and (LB(vk), B′_{vk vl}) for the out-neighbors vl ∈ N+(vk) \ S0, and shows the change in σ−.

Figure 5: The step-wise illustration of NB-LB on the example network.

k = 1: M(v1) = {(1, 1)}, LB(v1) = 1, N+(v1) \ S0 = {b, c}, sends (1, p) to b and c; σ− = 1.
k = 2: M(v2) = {(1, p)}, LB(v2) = p, N+(v2) \ S0 = {c}, sends (p, p) to c; σ− = 1 + p.
k = 3: M(v3) = {(1, p), (p, p)}, LB(v3) = p + p^2 − p^3, N+(v3) \ S0 = {d}, sends (p + p^2 − p^3, p) to d; σ− = 1 + 2p + p^2 − p^3.
k = 4: M(v4) = {(p + p^2 − p^3, p)}, LB(v4) = p^2 + p^3 − p^4; σ− = 1 + 2p + 2p^2 − p^4.

Table 1: The values of the key variables in NB-LB on the example network in Figure 5.

We obtain the MDAS from the network as follows. Since d(S0, a) = 0, d(S0, b) = d(S0, c) = 1, and d(S0, d) = 2, we order the vertices as {v1 = a, v2 = b, v3 = c, v4 = d} so that d(S0, vi) ≤ d(S0, vj) for every i < j. The NB-LB algorithm processes the nodes {v1 = a, v2 = b, v3 = c, v4 = d} sequentially. For example, at k = 3, node c is processed. During the step k = 1, node a sent the message (LB(a), B′_ac) to node c, and at k = 2, node b sent the message (LB(b), B′_bc) to node c. Thus, node c has the messages

M(c) = {(LB(a), B′_ac), (LB(b), B′_bc)} = {(1, p), (p, p)}.

Then, it computes LB(c) with the function Process Incoming Msg LB:

LB(c) = Process Incoming Msg LB(M(c)) = B′_ac LB(a) + B′_bc LB(b)(1 − B′_ac) = p + p^2 − p^3.

Recall that σ− = 1 + p at the end of the iteration k = 2. Thus, at k = 3, σ− = 1 + p + LB(c) = 1 + 2p + p^2 − p^3. Next, since N+(c) \ S0 = {d}, node c sends the message (LB(c), B′_cd) = (p + p^2 − p^3, p) to node d, concluding the process of the k = 3 step.
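The message passing above can be sketched in a few lines of Python. This is a minimal sketch of Algorithm 4 under the recursion of Definition 12, not the authors' implementation; the variable names and the encoding of the MDAS of Figure 5 are ours.

```python
# Minimal sketch of NB-LB (Algorithm 4) with the recursion of Definition 12.
def nb_lb(order, in_edges, seeds, B):
    """order: nodes in MDAS order; in_edges[v]: ordered in-neighbors of v;
    B[(u, v)]: transmission probability of edge (u, v) in the MDAS."""
    LB = {s: 1.0 for s in seeds}          # initial condition LB(s) = 1
    for v in order:
        if v in seeds:
            continue
        lb, partial = 0.0, 0.0            # partial = sum_{j<i} B'_{u_j v}
        for u in in_edges[v]:
            if partial > 1:               # truncation index m* of Definition 12
                break
            lb += B[(u, v)] * LB[u] * (1 - partial)
            partial += B[(u, v)]
        LB[v] = lb
    return LB, sum(LB.values())

# The MDAS of Figure 5 with uniform transmission probability p = 0.5.
p = 0.5
B = {e: p for e in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]}
LB, sigma_lb = nb_lb(["a", "b", "c", "d"],
                     {"b": ["a"], "c": ["a", "b"], "d": ["c"]}, {"a"}, B)
print(LB["c"], sigma_lb)  # 0.625 (= p + p^2 - p^3) and 2.4375
```

At p = 0.5 the run reproduces the Table 1 values: LB(c) = p + p^2 − p^3 = 0.625 and σ− = 1 + 2p + 2p^2 − p^4 = 2.4375.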
At k = 4, the algorithm computes LB(d) = p(p + p^2 − p^3) = p^2 + p^3 − p^4 and the final lower bound:

σ− = 1 + 2p + p^2 − p^3 + LB(d) = 1 + 2p + 2p^2 − p^4.

For this example network, there is another MDAS that satisfies the condition in Definition 11, with the ordered vertices {v1 = a, v2 = c, v3 = b, v4 = d}. We can also obtain a lower bound using this MDAS, which is 1 + 2p + 2p^2 − p^3. Notice that this value differs from the lower bound computed above but is still a lower bound on the expected number of influenced nodes.

Next, we compute t NB-LB on the same network with t = 3. First, for k = 1 to k = 3, the probabilities that nodes a, b, c are influenced are computed exactly:

LB(a) = 1
LB(b) = p + p^2 − p^3
LB(c) = p + p^2 − p^3.

Then, for k = 4, the algorithm computes the message M(d) = {(LB(c), B′_cd)} and, in turn, the lower bound LB(d) = B′_cd LB(c) = p(p + p^2 − p^3). Thus, the final lower bound is σ−_t = 1 + 2p + 3p^2 − p^3 − p^4.

Finally, we compare the lower bounds to the influence.

σ = 1 + 2p + 3p^2 − p^3 − p^4
σ− = 1 + 2p + 2p^2 − p^4
σ−_t = 1 + 2p + 3p^2 − p^3 − p^4

As expected, t NB-LB results in a tighter lower bound than NB-LB. In fact, σ−_t equals the influence, since node d can only be influenced through node c, whose probability of being influenced is computed exactly in t NB-LB with t = 3.

5. Experimental Results

In this section, we evaluate NB-UB and NB-LB in independent cascade models on a variety of classical synthetic networks.

Network Generation. We consider 4 classical random graph models with the following parameters: Erdős–Rényi random graphs ER(n = 1000, p = 0.003), scale-free networks SF(n = 1000, α = 2.5), random regular graphs Reg(n = 1000, d = 3), and random tree graphs with power-law degree distributions T(n = 1000, α = 3). For each graph model, we generate 100 networks IC(G, p·A) and a seed as follows.
The bidirected graph G is defined on the largest connected component of a graph drawn from the graph model, the seed node s is a randomly selected vertex, and A is the adjacency matrix of G. The corresponding IC model has the same transmission probability p for every edge.

Evaluation of Bounds. For each generated network, we compute the following quantities for each p ∈ {0.1, 0.2, . . . , 0.9}.

σmc: the estimation of the influence with 10^6 Monte Carlo simulations.
σ+: the upper bound obtained by NB-UB.
σ+_spec: the spectral upper bound of (Lemonnier et al., 2014).
σ−: the lower bound obtained by NB-LB.
σ−_prob: the probabilistic lower bound obtained by 10 Monte Carlo simulations.

To our knowledge, there has not been a tight lower bound for general networks. The lower bound introduced in (Khim et al., 2016) is efficient and tight for tree networks. However, for general networks, since it considers all paths of length k ∈ {0, 1, ..., m} for some m ∈ [n], its computational complexity is O(|V|^m), which is greater than the computational complexity of the nonbacktracking lower bound for any m > 2. Therefore, the probabilistic lower bound is chosen for the experiments. The sample size of 10 is chosen so that the cost of the MC simulations exceeds the computational complexity of the NB-LB algorithm.

Figure 6: (a) The average relative gap of the upper bounds: NB-UB and the spectral upper bound in (Lemonnier et al., 2014) for various types of networks. (b) The average relative gap of the lower bounds: NB-LB and the probabilistic lower bound computed by MC simulations for various types of networks.

In Figure 6, we compare the average relative gap of the bounds for every network model and for each transmission probability, where the true value is assumed to be σmc.
For example, the average relative gap of NB-UB for 100 Erdős–Rényi networks {Ni}_{i=1}^{100} with transmission probability p is computed by

(1/100) Σ_{i∈[100]} (σ+[Ni] − σmc[Ni]) / σmc[Ni],

where σ+[Ni] and σmc[Ni] denote NB-UB and the MC estimation, respectively, for the network Ni.

Results. Figure 6 shows that NB-UB outperforms the upper bound in (Lemonnier et al., 2014) for the Erdős–Rényi and random bidirected 3-regular networks, and performs comparably for the scale-free networks. Also, NB-LB gives tighter bounds than the MC bounds on the Erdős–Rényi, scale-free, and random regular networks when the transmission probability is small, p < 0.4. NB-UB and NB-LB compute the exact influence for the tree networks, since both algorithms avoid backtracking walks. Next, we show the bounds on exemplary networks.

5.1. Upper Bounds

Figure 7: The figure compares various upper bounds on the influence in the 3-regular network in Section 5.1. The MC upper bounds are computed with various simulation sizes and shown with the data points indicated with MC(N), where N is the number of simulations. The spectral upper bound in (Lemonnier et al., 2014) is shown in the red line, and NB-UB is shown in the green line.

In order to illustrate the typical behavior of the bounds, we have chosen the network in Figure 7 as follows. First, we generate 100 random 3-regular graphs G with 1000 nodes and assign a random seed s. Then, the corresponding IC model is defined as IC(G, B = p·A). For each network, we compute NB-UB and the MC estimation. Then, we compute a score for each network, defined as the sum of the squared differences between the upper bounds and the MC estimations over the transmission probabilities p ∈ {0.1, 0.2, . . . , 0.9}.
Finally, the graph whose score is the median of all 100 scores is chosen for Figure 7. In Figure 7, we compare 1) the upper bound introduced in (Lemonnier et al., 2014) and 2) the probabilistic upper bounds obtained by Monte Carlo simulations with 99% confidence level, to NB-UB. The MC upper bounds are computed with the various sample sizes N ∈ {5, 10, 30, 300, 3000}. It is evident from the figure that a larger sample size provides a tighter probabilistic upper bound. NB-UB outperforms the bound of (Lemonnier et al., 2014) and the probabilistic MC bounds when the transmission probability is relatively small. Further, it shows a similar trend as the MC simulations with large sample sizes.

Next, we show r-nonbacktracking upper bounds on a small world graph G = SW(n, k, q), where n = 1000 is the number of nodes in the network, k = 4 is the number of neighbors of each node, and q = 0.01 is the re-wiring probability. The IC model on the small world graph G is defined as IC(G, B = p·A), where A is the adjacency matrix of G, p is the transmission probability for every edge, and the seed node s is chosen arbitrarily.

Figure 8: The figure compares r-nonbacktracking upper bounds with various values of r in a small world network. The MC estimation with simulation size 10^6 is computed for comparison.

In Figure 8, we compare the r-nonbacktracking upper bounds with r ∈ {2, 3, 4, 5} to the MC estimation. For p < 0.32, all upper bounds are very close to the influence, and for p > 0.42 the upper bounds with r ≤ 5 become close to the maximum value, 1000. Therefore, we show the bounds for p ∈ [0.32, 0.42]. Note that when r = 2, the r-nonbacktracking upper bound (r NB-UB) is the nonbacktracking upper bound (NB-UB). By avoiding r = 3 backtracking walks (triangles) in addition to backtracking walks, the upper bound is significantly improved. The same applies to the r = 4, 5 nonbacktracking bounds. As r increases, the bounds get tighter and closer to the estimated influence.
It is worth noting that while r NB-UB yields a deterministic, as opposed to probabilistic, upper bound, its computational complexity for r ≥ 4 exceeds the complexity of the MC approximation.

5.2. Lower Bounds

Figure 9: The figure shows lower bounds on the influence of a scale-free network in Section 5.2. The probabilistic lower bounds are obtained by MC with various simulation sizes. The data points indicated with MC(N) are obtained by N simulations. NB-LB is shown in the green line.

In this section, we show lower bounds on an exemplary network. We adopt a selection process similar to that described in Section 5.1, but with scale-free networks with 3000 nodes and α = 2.5. We compare the probabilistic lower bounds obtained by MC with 99% confidence level to NB-LB. The bounds from Monte Carlo simulations are computed with various sample sizes N ∈ {5, 12, 30, 300, 3000}, which correspond to a constant, log(|V|), 0.01|V|, 0.1|V|, and |V|. NB-LB outperforms the probabilistic bounds by MC with small sample sizes. Recall that the computational complexity of NB-LB is O(|V| + |E|), which is the computational complexity of a constant number of Monte Carlo simulations. Figure 9 shows that NB-LB is tighter than the probabilistic lower bounds of the same computational complexity.

6. Conclusion

We propose both upper and lower bounds on the influence in independent cascade models and provide algorithms to efficiently compute the bounds. The proposed upper bound is monotone and submodular. We extend the results by proposing tunable bounds, which can adjust the trade-off between efficiency and accuracy, and by using r-nonbacktracking walks instead of (2-)nonbacktracking walks. Finally, the tightness and the performance of the bounds are shown with the experimental results.
Acknowledgments

The authors thank Colin Sandon for helpful discussions. This research was partly supported by the NSF CAREER Award CCF-1552131, the ARO grant W911NF-16-1-0051, and the NSF Center for Science of Information (CSoI) Grant CCF-0939370.

Appendix A. Proof of Theorem 2

We start by defining the following events for the independent cascade model IC(G, B), where G = (V, E) and |V| = n, and a seed set S0 ⊆ V.

Definition 13. For any u, v ∈ V, l ∈ {0, . . . , n − 1}, and S ⊆ V, we define

A(v) = {v is influenced}
Al(v) = ∪_{P∈Pl(S0→v)} {P is live}
Al(u → v) = ∪_{P∈Pl(S0→u), P→v} {P is live and edge (u, v) is live}
A_{l,S}(v) = ∪_{P∈Pl(S0→v): P does not include any w∈S} {P is live}.

In other words, Al(v) is the event that node v is influenced by live paths of length l, Al(u → v) is the event that v is influenced by node u with live paths of length l + 1, i.e., there exists a live path of length l + 1 from a seed to v that ends with the edge (u, v), and A_{l,S}(v) is the event that node v is influenced by length-l live paths which do not include any node in S. Note that for any pair of events (A, B), we use the notation AB for A ∩ B.

Lemma 1. For any v ∈ V,

p(v) ≤ 1 − Π_{l=0}^{n−1} (1 − pl(v)).

For any v ∈ V and l ∈ {0, . . . , n − 1},

p_{l+1}(v) ≤ 1 − Π_{u∈N−(v)} (1 − pl(u → v)).

Proof. Recall that p(v) = P(A(v)), pl(v) = P(Al(v)), and pl(u → v) = P(Al(u → v)).

p(v) = P(∪_{l=0}^{n−1} Al(v)) = 1 − P(∩_{l=0}^{n−1} Al(v)^C) ≤ 1 − Π_{l=0}^{n−1} P(Al(v)^C) (25)
     = 1 − Π_{l=0}^{n−1} (1 − pl(v)).

Equation (25) follows from the positive correlation among the events Al(v)^C, which can be proved by the FKG inequality. Similarly,

p_{l+1}(v) = P(∪_{u∈N−(v)} Al(u → v)) = 1 − P(∩_{u∈N−(v)} Al(u → v)^C) ≤ 1 − Π_{u∈N−(v)} P(Al(u → v)^C) = 1 − Π_{u∈N−(v)} (1 − pl(u → v)). (26)

Theorem 2. For any independent cascade model IC(G, B) and a seed set S0 ⊆ V,

σ(S0) ≤ Σ_{v∈V} (1 − Π_{l=0}^{n−1} (1 − UBl(v))) =: σ+(S0), (27)

where UBl(v) is obtained recursively as in Definition 7.

Proof. We provide a proof by induction. The initial condition, for l = 0, can be easily checked.
For every s ∈ S0, s+ ∈ N+(s), u ∈ V \ S0, and v ∈ N+(u),

p0(s) = 1 = UB0(s)
p0(s → s+) = B_{ss+} = UB0(s → s+)
p0(u) = 0 = UB0(u)
p0(u → v) = 0 = UB0(u → v).

For each l ≤ L, assume that pl(v) ≤ UBl(v) and pl(u → v) ≤ UBl(u → v) for all u, v ∈ V. Since pl(s) = pl(s → s+) = pl(s− → s) = 0 for every l ≥ 1, s ∈ S0, s+ ∈ N+(s), and s− ∈ N−(s), it is sufficient to show p_{L+1}(v) ≤ UB_{L+1}(v) and p_{L+1}(u → v) ≤ UB_{L+1}(u → v) for all u ∈ V \ S0 and v ∈ N+(u). For any v ∈ V \ S0,

p_{L+1}(v) = P(∪_{u∈N−(v)} Euv A_{L,{v}}(u)),

where Euv denotes the event that edge (u, v) is live, i.e., P(Euv) = Buv. Thus,

p_{L+1}(v) = 1 − P(∩_{u∈N−(v)} (Euv A_{L,{v}}(u))^C)
           ≤ 1 − Π_{u∈N−(v)} (1 − P(Euv A_{L,{v}}(u))) (28)
           ≤ 1 − Π_{u∈N−(v)} (1 − p_L(u → v))
           ≤ 1 − Π_{u∈N−(v)} (1 − UB_L(u → v)) = UB_{L+1}(v), (29)

where Equation (28) is obtained by the positive correlation among the events Euv A_{L,{v}}(u), and Equation (29) comes from the induction hypothesis.

For any v ∈ V \ S0 and w ∈ N+(v),

p_{L+1}(v → w) = P(Evw A_{L+1,{w}}(v)) = Bvw P(A_{L+1,{w}}(v)). (30)

Equation (30) follows from the independence between the events Evw and A_{L+1,{w}}(v). If w ∈ N−(v),

p_{L+1}(v → w) = Bvw P(∪_{u∈N−(v)\{w}} Euv A_{L,{v,w}}(u))
              ≤ Bvw (1 − Π_{u∈N−(v)\{w}} (1 − P(Euv A_{L,{v,w}}(u))))
              ≤ Bvw (1 − Π_{u∈N−(v)\{w}} (1 − p_L(u → v))) (31)
              ≤ Bvw (1 − Π_{u∈N−(v)\{w}} (1 − UB_L(u → v))).

Equation (31) holds, since the two events satisfy Euv A_{L,{v,w}}(u) ⊆ Euv A_{L,{v}}(u). Recall that, if w ∈ N−(v),

UB_{L+1}(v → w) = Bvw (1 − (1 − UB_{L+1}(v))/(1 − UB_L(w → v))) = Bvw (1 − Π_{u∈N−(v)\{w}} (1 − UB_L(u → v))).

Thus, p_{L+1}(v → w) ≤ UB_{L+1}(v → w) for all w ∈ N+(v) ∩ N−(v). If w ∉ N−(v),

p_{L+1}(v → w) = Bvw P(∪_{u∈N−(v)} Euv A_{L,{v,w}}(u)) ≤ Bvw (1 − Π_{u∈N−(v)} (1 − UB_L(u → v))) = Bvw UB_{L+1}(v) = UB_{L+1}(v → w).

Hence, p_{L+1}(v → w) ≤ UB_{L+1}(v → w) for all w ∈ N+(v), concluding the induction. Finally, by Lemma 1,

σ(S0) = Σ_{v∈V} p(v) ≤ Σ_{v∈V} (1 − Π_{l=0}^{n−1} (1 − pl(v))) ≤ Σ_{v∈V} (1 − Π_{l=0}^{n−1} (1 − UBl(v))) = σ+(S0).

Appendix B. Proof of Theorem 3

Theorem 3. For any independent cascade model IC(G, B), the nonbacktracking upper bound σ+ : 2^V → [0, |V|] is monotone and submodular.
Recall that

σ+(S0) = Σ_{v∈V} (1 − Π_{l∈[n−1]} (1 − UBl(v)))
UBl(u) = 1 − Π_{w∈N−(u)} (1 − UB_{l−1}(w → u))
UBl(u → v) = Buv (1 − Π_{w∈N−(u)\{v}} (1 − UB_{l−1}(w → u))).

Since a summation of submodular functions is also submodular, it remains to prove that (1 − Π_{l∈[n−1]}(1 − UBl(v))) is submodular. Note that Be and N−(v) for any e ∈ E and v ∈ V are independent of the seed set, and that UB0(u → v) for any u, v ∈ V is constant, so it is equivalent to prove the following statement.

Claim 1. If fi(S) ∈ [0, 1] for all i ∈ [m] is monotone and submodular, then 1 − Π_{i∈[m]}(1 − fi(S)) is also monotone and submodular.

Proof. We provide a proof by induction. The initial condition holds for k = 1. Assume that 1 − Π_{i∈[k−1]}(1 − fi(S)) is monotone (non-decreasing) and submodular for some k ≤ m. Let g_{k−1}(A) = 1 − Π_{i∈[k−1]}(1 − fi(A)). Then, for any A ⊆ B ⊆ V and s ∈ V,

g_{k−1}(A ∪ {s}) − g_{k−1}(A) ≥ g_{k−1}(B ∪ {s}) − g_{k−1}(B) (32)
f_k(A ∪ {s}) − f_k(A) ≥ f_k(B ∪ {s}) − f_k(B). (33)

By definition,

g_k(A) = 1 − (1 − g_{k−1}(A))(1 − f_k(A)). (34)

g_k is monotone, as

g_k(A ∪ {s}) = 1 − (1 − g_{k−1}(A ∪ {s}))(1 − f_k(A ∪ {s})) ≥ 1 − (1 − g_{k−1}(A))(1 − f_k(A)) = g_k(A). (35)

Next, to prove submodularity, we want to show:

(1 − g_{k−1}(A))(1 − f_k(A)) − (1 − g_{k−1}(A ∪ {s}))(1 − f_k(A ∪ {s})) ≥ (1 − g_{k−1}(B))(1 − f_k(B)) − (1 − g_{k−1}(B ∪ {s}))(1 − f_k(B ∪ {s})). (36)

To simplify, for any i ∈ [m] and S ⊆ V, let ḡ_i(S) = 1 − g_i(S) and f̄_i(S) = 1 − f_i(S). Then, Equations (32) and (33) become

ḡ_{k−1}(A) − ḡ_{k−1}(A ∪ {s}) ≥ ḡ_{k−1}(B) − ḡ_{k−1}(B ∪ {s})
f̄_k(A) − f̄_k(A ∪ {s}) ≥ f̄_k(B) − f̄_k(B ∪ {s}),

and Equation (36) is equivalent to

ḡ_{k−1}(A) f̄_k(A) − ḡ_{k−1}(A ∪ {s}) f̄_k(A ∪ {s}) ≥ ḡ_{k−1}(B) f̄_k(B) − ḡ_{k−1}(B ∪ {s}) f̄_k(B ∪ {s}).

Let δg = ḡ_{k−1}(B) − ḡ_{k−1}(B ∪ {s}) and δf = f̄_k(B) − f̄_k(B ∪ {s}). Note that, as f and g are non-decreasing, δg ≥ 0 and δf ≥ 0. Then,

ḡ_{k−1}(A) f̄_k(A) − ḡ_{k−1}(A ∪ {s}) f̄_k(A ∪ {s})
≥ (ḡ_{k−1}(A ∪ {s}) + δg)(f̄_k(A ∪ {s}) + δf) − ḡ_{k−1}(A ∪ {s}) f̄_k(A ∪ {s})
= δf δg + δg f̄_k(A ∪ {s}) + δf ḡ_{k−1}(A ∪ {s})
≥ δf δg + δg f̄_k(B ∪ {s}) + δf ḡ_{k−1}(B ∪ {s})
= ḡ_{k−1}(B) f̄_k(B) − ḡ_{k−1}(B ∪ {s}) f̄_k(B ∪ {s}).

Appendix C.
Proof of Theorem 4

To prove Theorem 4, we first introduce the following events in IC(G, B), where G = (V, E) and |V| = n.

Definition 14. For any v ∈ V and l ∈ {0, . . . , n − 1}, we define

A(v) = {v is influenced}
A(P) = {P is live}
Al(v) = ∪_{P∈Pl(S0→v)} {P is live} = ∪_{P∈Pl(S0→v)} A(P).

For any 2 ≤ r ≤ n and P ∈ P_{r−1}(v→), let Al(v, P) be the event that node v is influenced by a live path of length l ≥ r − 1 that does not include any node in the path P. Next, we recall Definition 9 and Theorem 4.

Definition 9. For all l ∈ {0, . . . , n − 1}, u ∈ V and P ∈ P_{r−1}(u→), UBl(u) ∈ [0, 1] and UBl(u, P) ∈ [0, 1] are defined recursively as follows.

Initial condition: For every l ≤ r − 1 and u ∈ V,

UBl(u) = f(∪_{P∈Pl(S0→u)} (P, 1)).

For every P ∈ P_{r−1}(s→) and s ∈ S0,

UB_{r−1}(s, P) = 1.

Recursion: For every l ∈ {r, . . . , n − 1}, s ∈ S0, u, x ∈ V \ S0, v ∈ N+(u) \ S0, and Q ∈ P_{r−1}(v→),

UBl(s) = 0, UBl(s, P) = 0 for any path P
UBl(v, Q) = 1 − Π_{P→Q} (1 − Buv UB_{l−1}(u, P))
UBl(x) = f(∪_{P∈{P∈P_{r−1}(y→x) : y∈V\S0}} (P, UBl(y, P))).

Theorem 4. For any independent cascade model IC(G, B), a seed set S0 ⊆ V and an integer r ≥ 2,

σ(S0) ≤ Σ_{v∈V} (1 − Π_{l=0}^{n−1} (1 − UBl(v))) =: σ+_r(S0) ≤ σ+(S0), (37)

where UBl(v) is obtained recursively as in Definition 9.

Proof. First, let p(v) = P(A(v)), p(P) = P(A(P)), pl(v) = P(Al(v)), and pl(v, P) = P(Al(v, P)). By Lemma 1, it is sufficient to show pl(v) ≤ UBl(v) for all v ∈ V and l ∈ {0, 1, . . . , n − 1}. We provide a proof by induction on l. The initial condition can be easily checked. For every l ≤ r − 1, u ∈ V, v ∈ N+(u) and s ∈ S0,

pl(u) = P(∪_{P∈Pl(S0→u)} A(P)) ≤ f(∪_{P∈Pl(S0→u)} (P, 1)) = UBl(u).

For every P ∈ P_{r−1}(s→) and s ∈ S0, p_{r−1}(s, P) = 1 = UB_{r−1}(s, P). Next, for each l ≤ L, assume that pl(v) ≤ UBl(v) and pl(v, P) ≤ UBl(v, P) for every v ∈ V and P ∈ P_{r−1}(v→). For every l ≥ r, s ∈ S0 and P ∈ P_{r−1}(s→), pl(s) = 0 and pl(s, P) = 0, since a seed cannot be influenced again by other seeds and cannot influence other nodes after the first step l = 0.
Therefore, to conclude the proof, it is sufficient to show p_{L+1}(v) ≤ UB_{L+1}(v) and p_{L+1}(v, P) ≤ UB_{L+1}(v, P) for every v ∈ V \ S0 and P ∈ P_{r−1}(v→).

For any u, v ∈ V \ S0, P ∈ P_{r−1}(u→), and Q ∈ P_{r−1}(v→) such that P → Q, let P = (u = u1, . . . , uk) and Q = (v = v1, . . . , vk). Recall that if P → Q, then (u, v, u3 = v2, . . . , uk = v_{k−1}, vk) is a path of length r. Then, for any P → Q, AL(u, P) contains all events in which an in-neighbor u of node v is influenced by a live path that does not contain any node in the path P = (u, v, v2, . . . , v_{k−1}). Notice that the event A_{L+1}(v, Q) is the union of the events that an in-neighbor u of node v is influenced by a live path that does not contain any node in the path Q = (v, v2, . . . , vk) and the edge (u, v) is live. In other words,

A_{L+1}(v, Q) = ∪_{u∈N−(v)} Euv A_L(u, P_Q),

where P_Q = (u, v, u3 = v2, . . . , uk = v_{k−1}, vk) and Euv is the event that edge (u, v) is live. Thus, A_{L+1}(v, Q) ⊆ ∪_{P→Q} Euv A_L(u, P), which implies p_{L+1}(v, Q) ≤ P(∪_{P→Q} Euv A_L(u, P)). By the positive correlation among the events Euv A_L(u, P),

p_{L+1}(v, Q) ≤ 1 − Π_{P→Q} (1 − Buv P(A_L(u, P))) ≤ 1 − Π_{P→Q} (1 − Buv UB_L(u, P)) = UB_{L+1}(v, Q), (38)

where Equation (38) follows from the induction hypothesis.

For any x ∈ V \ S0,

p_{L+1}(x) = P(∪_{P∈P_{L+1}(S0→x)} A(P))
          ≤ P(∪_{P∈{P∈P_{r−1}(y→x) : y∈V\S0}} A_{L+1}(y, P) A(P)) (39)
          ≤ f(∪_{P∈{P∈P_{r−1}(y→x) : y∈V\S0}} (P, p_{L+1}(y, P))) (40)
          ≤ f(∪_{P∈{P∈P_{r−1}(y→x) : y∈V\S0}} (P, UB_{L+1}(y, P))) (41)
          = UB_{L+1}(x).

Equation (39) follows from ∪_{P∈P_{L+1}(S0→x)} A(P) ⊆ ∪_{P∈{P∈P_{r−1}(y→x) : y∈V\S0}} A_{L+1}(y, P), since the right hand side also includes events in which y is influenced by a walk rather than a path. Equation (40) holds because the function f computes an upper bound rather than the exact probability of the input events, and Equation (41) comes from the induction hypothesis. Therefore, p_{L+1}(v, Q) ≤ UB_{L+1}(v, Q) and p_{L+1}(x) ≤ UB_{L+1}(x), concluding the proof by induction. Then, by Lemma 1,

σ(S0) ≤ Σ_{v∈V} (1 − Π_{l=0}^{n−1} (1 − pl(v))) ≤ Σ_{v∈V} (1 − Π_{l=0}^{n−1} (1 − UBl(v))) = σ+_r(S0).

Appendix D. Proof of Theorem 5

Theorem 5.
For any independent cascade model IC(G, B), a seed set $S_0$, and their MDAS IC(G', B'),
$$\sigma(S_0) \ge \sum_{v_k \in V'} \mathrm{LB}(v_k) =: \sigma^-(S_0), \qquad (42)$$
where $\mathrm{LB}(v_k)$ is obtained recursively as in Definition 12.

Proof. We proceed by induction. For any $v_k \in V'$, let $A(v_k)$ be the event that node $v_k$ is influenced in the MDAS IC(G', B'), and for every edge $(v_j, v_k)$, let $E_{v_j v_k}$ be the event that edge $(v_j, v_k)$ is live, i.e., $\mathbb{P}(E_{v_j v_k}) = B'_{v_j v_k}$. Recall that $p(v_k) = \mathbb{P}(A(v_k))$.

The initial condition $k = 1$ holds, since $p(v_1) = 1 \ge \mathrm{LB}(v_1) = 1$ ($v_1$ is a seed). Now, for every $k \le K$, assume $p(v_k) \ge \mathrm{LB}(v_k)$. Then, for node $v_{K+1}$,
$$p(v_{K+1}) = \mathbb{P}\Bigl(\bigcup_{v_j \in N^-(v_{K+1})} E_{v_j v_{K+1}} \cap A(v_j)\Bigr).$$
We re-label the vertices in $N^-(v_{K+1}) = \{u_1, \dots, u_{m(K+1)}\}$, where $m(K+1) = \text{in-deg}(v_{K+1})$, and let $Q_i^{K+1} = B'_{u_i v_{K+1}}$. Then, for any integer $m \le m(K+1)$,
$$\begin{aligned}
p(v_{K+1}) &= \mathbb{P}\Bigl(\bigcup_{i=1}^{m(K+1)} E_{u_i v_{K+1}} \cap A(u_i)\Bigr) \ge \mathbb{P}\Bigl(\bigcup_{i=1}^{m} E_{u_i v_{K+1}} \cap A(u_i)\Bigr) \\
&\ge \sum_{i=1}^{m} \mathbb{P}\bigl(E_{u_i v_{K+1}} \cap A(u_i)\bigr) - \sum_{i=1}^{m} \sum_{j=1}^{i-1} \mathbb{P}\bigl(E_{u_i v_{K+1}} \cap A(u_i) \cap E_{u_j v_{K+1}} \cap A(u_j)\bigr) &(43) \\
&= \sum_{i=1}^{m} Q_i^{K+1} \mathbb{P}(A(u_i)) - \sum_{i=1}^{m} \sum_{j=1}^{i-1} Q_i^{K+1} Q_j^{K+1} \mathbb{P}\bigl(A(u_i) \cap A(u_j)\bigr) &(44) \\
&\ge \sum_{i=1}^{m} Q_i^{K+1} \mathbb{P}(A(u_i)) \Bigl(1 - \sum_{j=1}^{i-1} Q_j^{K+1}\Bigr). &(45)
\end{aligned}$$
Equation (43) follows from the principle of inclusion and exclusion. Equation (44) results from the independence between the event that an edge ending at $v_{K+1}$ is live and the event that a node $u_i$ is influenced. Equation (45) holds since $\mathbb{P}(A(u_i)) \ge \mathbb{P}\bigl(A(u_i) \cap A(u_j)\bigr)$.

Now, define $m^* = \max\{m \le m(K+1) : \sum_{j=1}^{m-1} Q_j^{K+1} \le 1\}$. Then,
$$\sum_{i=1}^{m^*} Q_i^{K+1} \mathbb{P}(A(u_i)) \Bigl(1 - \sum_{j=1}^{i-1} Q_j^{K+1}\Bigr) \ge \sum_{i=1}^{m^*} Q_i^{K+1} \mathrm{LB}(u_i) \Bigl(1 - \sum_{j=1}^{i-1} Q_j^{K+1}\Bigr) = \mathrm{LB}(v_{K+1}). \qquad (46)$$
Equation (46) follows since $1 - \sum_{j=1}^{i-1} Q_j^{K+1} \ge 0$ for all $i \le m^*$ by the definition of $m^*$. Thus, $p(v_i) \ge \mathrm{LB}(v_i)$ for all $v_i \in V'$, concluding the induction proof. Finally,
$$\sigma(S_0) \ge \sum_{i=1}^{|V'|} p(v_i) \ge \sum_{i=1}^{|V'|} \mathrm{LB}(v_i) = \sigma^-(S_0). \qquad (47)$$
Equation (47) holds since $\sum_{i=1}^{|V'|} p(v_i)$ equals the influence of the subnetwork IC(G', B').

References

Emmanuel Abbe and Colin Sandon. Detection in the stochastic block model with multiple clusters: Proof of the achievability conjectures, acyclic BP, and the information-computation gap.
arXiv preprint arXiv:1512.09080, 2015.

Emmanuel Abbe, Sanjeev Kulkarni, and Eun Jee Lee. Nonbacktracking bounds on the influence in independent cascade models. In Advances in Neural Information Processing Systems 30, pages 1407–1416, 2017.

Charles Bordenave, Marc Lelarge, and Laurent Massoulié. Non-backtracking spectrum of random graphs: Community detection and non-regular Ramanujan graphs. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 1347–1357. IEEE, 2015.

Wei Chen and Shang-Hua Teng. Interplay between social influence and network centrality: A comparative study on Shapley centrality and single-node-influence centrality. In Proceedings of the 26th International Conference on World Wide Web, pages 967–976. International World Wide Web Conferences Steering Committee, 2017.

Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 199–208. ACM, 2009.

Moez Draief, Ayalvadi Ganesh, and Laurent Massoulié. Thresholds for virus spread on networks. In Proceedings of the 1st International Conference on Performance Evaluation Methodologies and Tools, page 51. ACM, 2006.

Cees M. Fortuin, Pieter W. Kasteleyn, and Jean Ginibre. Correlation inequalities on some partially ordered sets. Communications in Mathematical Physics, 22(2):89–103, 1971.

Mark Granovetter. Threshold models of collective behavior. American Journal of Sociology, pages 1420–1443, 1978.

Brian Karrer and Mark E. J. Newman. Message passing approach for general epidemic models. Physical Review E, 82(1):016101, 2010.

Brian Karrer, M. E. J. Newman, and Lenka Zdeborová. Percolation on sparse networks. Physical Review Letters, 113(20):208702, 2014.

David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network.
In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 137–146. ACM, 2003.

Abdelmajid Khelil, Christian Becker, Jing Tian, and Kurt Rothermel. An epidemic model for information diffusion in MANETs. In Proceedings of the 5th ACM International Workshop on Modeling Analysis and Simulation of Wireless and Mobile Systems, pages 54–60. ACM, 2002.

Justin T. Khim, Varun Jog, and Po-Ling Loh. Computing and maximizing influence in linear threshold and triggering models. In Advances in Neural Information Processing Systems, pages 4538–4546, 2016.

Florent Krzakala, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, and Pan Zhang. Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences, 110(52):20935–20940, 2013.

Eun Jee Lee, S. Kamath, E. Abbe, and S. R. Kulkarni. Spectral bounds for independent cascade model with sensitive edges. In 2016 Annual Conference on Information Science and Systems (CISS), pages 649–653, March 2016. doi: 10.1109/CISS.2016.7460579.

Rémi Lemonnier, Kevin Scaman, and Nicolas Vayatis. Tight bounds for influence in diffusion networks and application to bond percolation and epidemiology. In Advances in Neural Information Processing Systems, pages 846–854, 2014.

Jure Leskovec, Lada A. Adamic, and Bernardo A. Huberman. The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1):5, 2007.

Dunia Lopez-Pintado and Duncan J. Watts. Social influence, binary decisions and collective dynamics. Rationality and Society, 20(4):399–443, 2008.

Boris Shulgin, Lewi Stone, and Zvia Agur. Pulse vaccination strategy in the SIR epidemic model. Bulletin of Mathematical Biology, 60(6):1123–1148, 1998.

Chi Wang, Wei Chen, and Yajun Wang. Scalable influence maximization for independent cascade model in large-scale social networks. Data Mining and Knowledge Discovery, 25(3):545–576, 2012.

Duncan J. Watts.
A simple model of global cascades on random networks. Proceedings of the National Academy of Sciences, 99(9):5766–5771, 2002.

Jiang Yang and Scott Counts. Predicting the speed, scale, and range of information diffusion in Twitter. 2010.