Journal of Machine Learning Research 21 (2020) 1-36. Submitted 3/18; Revised 10/19; Published 02/20.

Generalized Nonbacktracking Bounds on the Influence in Independent Cascade Models

Emmanuel Abbe (eabbe@princeton.edu), Program in Applied and Computational Mathematics and Department of Electrical Engineering, Princeton University, Princeton, NJ 08540, USA

Sanjeev Kulkarni (kulkarni@princeton.edu), Department of Electrical Engineering, Princeton University, Princeton, NJ 08540, USA

Eun Jee Lee (ejlee.mail@gmail.com), Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ 08540, USA

Editor: Bert Huang

Abstract. This paper develops deterministic upper and lower bounds on the influence measure in a network, more precisely, the expected number of nodes that a seed set can influence in the independent cascade model. In particular, our bounds exploit $r$-nonbacktracking walks and Fortuin-Kasteleyn-Ginibre (FKG) type inequalities, and are computed by message passing algorithms. Further, we provide parameterized versions of the bounds that control the trade-off between efficiency and accuracy. Finally, the tightness of the bounds is illustrated on various network models.

Keywords: Influence Estimation, Nonbacktracking Walk, Message Passing, Social Networks, Independent Cascade Model

1. Introduction

Social interaction is a building block of society. Through interacting with each other, people exchange information, ideas, and opinions, and influence one another. Influence propagation studies how influence spreads in a social network, starting from the initially influenced nodes, called seeds.
Studying how influence spreads in a network allows us to answer many important questions in a broad range of fields, such as viral marketing (Leskovec et al., 2007), sociology (Granovetter, 1978; Lopez-Pintado and Watts, 2008; Watts, 2002), communication (Khelil et al., 2002), epidemiology (Shulgin et al., 1998), and social network analysis (Yang and Counts, 2010). With the advent of the Internet and the availability of large data sets on social networks, influence propagation has become a center of interest in the machine learning and data mining communities. These communities have proposed several mathematical models that abstract influence propagation in the real world, and one of the most widely adopted models is the independent cascade model (ICM). In this model, a network is defined as a directed graph $G = (V, E)$ where every edge is assigned a transmission probability. Each edge decides its state, live or not, with a Bernoulli trial whose success rate equals its transmission probability. The influence propagation then starts from the initially influenced nodes, called the seeds, and spreads along the live edges: a node is influenced if it is reachable from the seeds via live edges. While various important quantities can be extracted by studying the ICM, the most fundamental and essential problem in influence propagation is to find the influence, the expected number of influenced nodes, given a network and a set of initially influenced nodes. Unfortunately, computing the influence of a network in the ICM is proven to be #P-hard (Wang et al., 2012), and thus many previous works have used Monte Carlo (MC) simulations to estimate the true influence (Kempe et al., 2003; Chen et al., 2009).

(c) 2020 Emmanuel Abbe, Sanjeev Kulkarni, Eun Jee Lee. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v21/18-112.html.
Despite its simplicity, approximating the influence via MC simulations can be computationally expensive. MC approximation requires $O(|V|^2)$ trials to concentrate, and each trial requires $O(|V| + |E|)$ computation. Considering that modern real-world networks often have many millions of nodes, this amount of computation quickly becomes unreasonable. To overcome the limitations of Monte Carlo simulations, many researchers have taken both algorithmic and theoretical approaches to approximating the influence of a given seed set in a network. Chen and Teng (Chen and Teng, 2017) provided a probabilistic guarantee on estimating the influence of a single seed with a relative error bound, with expected running time $O(\ell(|V| + |E|)|V| \log |V| / \varepsilon^2)$, such that with probability $1 - 1/n^{\ell}$, for every node $v$, the computed influence of $v$ has relative error at most $\varepsilon$. Draief et al. (Draief et al., 2006) introduced an upper bound on the influence of any set of seeds by using the spectral radius of the adjacency matrix. Tighter upper bounds were later suggested in (Lemonnier et al., 2014), which relate the ratio of influenced nodes in a network to the spectral radius of the so-called Hazard matrix. Further improved upper bounds, which account for sensitive edges, were introduced in (Lee et al., 2016). In contrast, there has been little work on finding a tight lower bound on the influence. An exception is the work by Khim et al. (Khim et al., 2016), where the lower bound is obtained by considering only the influence through paths whose length is at most a parameter $m$. In this article, we extend the results of (Abbe et al., 2017) and propose upper and lower bounds on the influence using nonbacktracking walks and Fortuin-Kasteleyn-Ginibre (FKG) type inequalities.
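As a point of reference for the MC approach discussed above, the live-edge Monte Carlo estimator can be sketched in a few lines. This is our own illustrative sketch, not code from the paper; the representation (an edge list with a dictionary of transmission probabilities) and all function names are assumptions.

```python
import random

def mc_influence(nodes, edges, B, seeds, trials=20000, rng=None):
    """Monte Carlo estimate of the influence of `seeds` under IC(G, B).

    `edges` is a list of directed pairs (u, v); B[(u, v)] is the
    transmission probability of edge (u, v).  Each trial samples a
    live-edge graph and counts the nodes reachable from the seed set.
    """
    rng = rng or random.Random(0)
    total = 0
    for _ in range(trials):
        # Decide the state of every edge with an independent Bernoulli trial.
        live = {v: [] for v in nodes}
        for (u, v) in edges:
            if rng.random() < B[(u, v)]:
                live[u].append(v)
        # Count the nodes reachable from the seeds along live edges.
        reached, stack = set(seeds), list(seeds)
        while stack:
            u = stack.pop()
            for v in live[u]:
                if v not in reached:
                    reached.add(v)
                    stack.append(v)
        total += len(reached)
    return total / trials
```

On the directed path $a \to b \to c$ with transmission probability $0.5$ on both edges, the exact influence of $\{a\}$ is $1 + p + p^2 = 1.75$, and the estimate concentrates around this value as the number of trials grows, illustrating the $O(|V|^2)$-trials cost discussed above.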
The FKG inequality is useful for proving correlation between non-decreasing functions, which is instrumental in influence propagation since the probability that a node is influenced is non-decreasing with respect to the partial order on the random variables describing the states of the edges. The bounds can be efficiently obtained by message passing algorithms. This shows that nonbacktracking walks can also impact influence propagation, making another case for the use of nonbacktracking walks in graph and network problems, as in (Krzakala et al., 2013; Karrer et al., 2014; Bordenave et al., 2015; Abbe and Sandon, 2015), discussed later in the paper. Next, we show that the proposed upper bound is monotone and submodular. This property allows us to use the upper bound, rather than some approximation, when greedily finding a set of seeds that maximizes the influence. Further, we provide parameterized versions of the bounds that can adjust the trade-off between the efficiency and the accuracy of the bounds.

2. Background

We introduce here the independent cascade model and provide background for the main results.

Definition 1 (Independent Cascade Model). Consider a directed graph $G = (V, E)$ with $|V| = n$, a transmission probability matrix $B \in [0, 1]^{n \times n}$, and a seed set $S_0 \subseteq V$. For any $u \in V$, let $N^+(u)$ be the set of out-neighbors of node $u$. The independent cascade model $\mathrm{IC}(G, B)$ sequentially generates the influenced set $S_t \subseteq V$ starting from the seed set $S_0$ for each step $t \geq 1$ as follows. At the beginning of step $t$, $S_t$ is initialized to be an empty set. Then, each node $u \in S_{t-1}$ attempts to influence each $v \in N^+(u) \setminus \cup_{i=0}^{t-1} S_i$ with probability $B_{uv}$, i.e., node $u$ influences its uninfluenced out-neighbor $v$ with probability $B_{uv}$. If $v$ is influenced as a result of the attempts, $v$ is added to $S_t$. At the end of step $t$, $S_t$ contains all nodes that are influenced during step $t$. The process stops at step $t = T$ if $S_T = \emptyset$.
The set of influenced nodes at the end of the propagation is defined as $S = \cup_{t=0}^{T-1} S_t$. We often refer to an edge $(u, v)$ as being live if node $u$ influences node $v$. The independent cascade model is equivalent to the live-edge graph model, where the influence happens at once, rather than sequentially. The live-edge graph model first decides the state of every edge with a Bernoulli trial, i.e., edge $(u, v)$ becomes live independently with probability $B_{uv}$. Then, the set of influenced nodes is defined as the set of nodes that are reachable from at least one of the seeds via live edges.

Definition 2 (Influence). The expected number of nodes that are influenced at the end of the propagation process is called the influence (rather than the expected influence, with a slight abuse of terminology) of a seed set $S_0$ on $\mathrm{IC}(G, B)$, and is defined as $\sigma(S_0) = \sum_{v \in V} P(v \text{ is influenced})$.

It is shown in (Wang et al., 2012) that computing the influence $\sigma(S_0)$ in the independent cascade model $\mathrm{IC}(G, B)$ is #P-hard, even with a single seed, i.e., $|S_0| = 1$.

Next, we define nonbacktracking (NB) walks on a directed graph. Nonbacktracking walks have been used for studying the characteristics of networks. To the best of our knowledge, the use of NB walks in the context of epidemics was first introduced in the paper of Karrer and Newman (Karrer and Newman, 2010) and later applied to percolation in (Karrer et al., 2014). In particular, Karrer et al. reformulate the spread of influence as a message passing process and demonstrate how the resulting equations can be used to calculate an upper bound on the number of nodes that are susceptible at a given time. As we shall see, we take a different approach to the use of NB walks, which focuses on the effective contribution of a node in influencing another node and accumulates such contributions to obtain upper and lower bounds. More recently, nonbacktracking walks have been used for community detection (Krzakala et al., 2013; Bordenave et al., 2015; Abbe and Sandon, 2015).
Definition 3 (Nonbacktracking Walk). Let $G = (V, E)$ be a directed graph. A nonbacktracking walk of length $k$ is defined as $w^{(k)} = (v_0, v_1, \ldots, v_k)$, where $v_i \in V$ and $(v_{i-1}, v_i) \in E$ for all $i \in [k]$, and $v_{i-1} \neq v_{i+1}$ for all $i \in [k-1]$.

We next recall a key inequality introduced by Fortuin et al. (Fortuin et al., 1971).

Theorem 1 (FKG Inequality). Let $(\Gamma, \preceq)$ be a distributive lattice, where $\Gamma$ is a finite partially ordered set, ordered by $\preceq$, and let $\mu$ be a positive measure on $\Gamma$ satisfying the following condition: for all $x, y \in \Gamma$,
$$\mu(x \wedge y)\,\mu(x \vee y) \geq \mu(x)\,\mu(y),$$
where $x \wedge y = \max\{z \in \Gamma : z \preceq x, z \preceq y\}$ and $x \vee y = \min\{z \in \Gamma : x \preceq z, y \preceq z\}$. Let $f$ and $g$ be both increasing (or both decreasing) functions on $\Gamma$. Then,
$$\Big(\sum_{x \in \Gamma} \mu(x)\Big)\Big(\sum_{x \in \Gamma} f(x)g(x)\mu(x)\Big) \geq \Big(\sum_{x \in \Gamma} f(x)\mu(x)\Big)\Big(\sum_{x \in \Gamma} g(x)\mu(x)\Big).$$

3. Nonbacktracking Upper Bounds

In this section, we present three types of nonbacktracking upper bounds on the influence in the independent cascade model and explain the motivations and intuitions behind the bounds. The bounds use nonbacktracking walks and FKG inequalities and are computed efficiently by message passing algorithms. The proposed upper bounds listed below have varying efficiency and accuracy trade-offs.

- Nonbacktracking upper bound (NB-UB)
- Tunable nonbacktracking upper bound (t-NB-UB)
- r-nonbacktracking upper bound (r-NB-UB)

In particular, the nonbacktracking upper bound on a network based on a graph $G(V, E)$ runs in $O(|V|^2 + |V||E|)$, whereas Monte Carlo simulation would require $O(|V|^3 + |V|^2|E|)$ computations without knowing the variance of the influence, which is harder to estimate than the influence. The reason for the large computational complexity of MC is that in order to ensure that the standard error of the estimation does not grow with respect to $|V|$, MC requires $O(|V|^2)$ trials. Hence, for large networks, where MC may not be feasible, our algorithms can still provide bounds on the influence.
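Definition 3 is straightforward to check programmatically. The following sketch (our own illustration, with hypothetical names) verifies that a candidate walk uses only edges of $E$ and never immediately backtracks:

```python
def is_nonbacktracking(walk, edges):
    """Check Definition 3: each consecutive pair of the walk is a
    directed edge, and the walk never immediately revisits the node
    it just left (v_{i-1} != v_{i+1})."""
    edge_set = set(edges)
    # Every step (v_{i-1}, v_i) must be a directed edge of the graph.
    if any((walk[i - 1], walk[i]) not in edge_set for i in range(1, len(walk))):
        return False
    # No immediate backtrack: v_{i-1} != v_{i+1} for all valid i.
    return all(walk[i - 1] != walk[i + 1] for i in range(1, len(walk) - 1))
```

For instance, on the bidirected triangle on $\{a, b, c\}$, the walk $(a, b, c)$ is nonbacktracking while $(a, b, a)$ is not.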
Furthermore, the proposed upper bound is monotone and submodular, so it can be used in place of MC approximations in greedy algorithms for the influence maximization problem. Details of the greedy algorithm for influence maximization can be found in (Kempe et al., 2003). The tunable nonbacktracking upper bound better accounts for the sensitive (as defined in (Lee et al., 2016)) edges near the seeds. However, for large values of $t$, it may result in exponential computational complexity for dense networks. The r-nonbacktracking upper bound generalizes the nonbacktracking upper bound by considering $r$-nonbacktracking walks, i.e., avoiding cycles of length $r$ rather than just backtracking, and runs in $O((|V| + |E|)|V|^{r-1})$.

This section is organized as follows. First, Sections 3.1 through 3.3 present the definition of each proposed upper bound, followed by an efficient algorithm to compute the bound and its computational complexity. Next, Section 3.4 shows how each upper bound is computed on a small example network and compares the bounds with the influence.

3.1. Nonbacktracking Upper Bound (NB-UB)

We start by defining the following terms for the independent cascade model $\mathrm{IC}(G, B)$, where $G = (V, E)$ and $|V| = n$.

Definition 4. For any $v \in V$, we define the set of in-neighbors $N^-(v) = \{u \in V : (u, v) \in E\}$ and the set of out-neighbors $N^+(v) = \{u \in V : (v, u) \in E\}$.

Definition 5. For any $v \in V$ and $l \in [n-1]$, the set $P_l(S_0 \to v)$ is defined as the set of all paths of length $l$ from any seed $s \in S_0$ to $v$. We say a path $P$ is live iff every edge in $P$ is live. For $l = 0$, we define $P_0(S_0 \to v)$ as the set (of size one) containing the zero-length path consisting of node $v$, and declare the path $P \in P_0(S_0 \to v)$ live iff $v \in S_0$.

Definition 6. For any $v \in V$ and $l \in \{0, \ldots, n-1\}$, we define
$$p(v) = P(v \text{ is influenced}),$$
$$p_l(v) = P\big(\cup_{P \in P_l(S_0 \to v)} \{P \text{ is live}\}\big),$$
$$p_l(u \to v) = P\big(\cup_{P \in P_l(S_0 \to u),\, P \not\ni v} \{P \text{ is live and edge } (u, v) \text{ is live}\}\big).$$

In other words, $p_l(v)$ is the probability that node $v$ is influenced by live paths of length $l$, i.e., there exists a live path of length $l$ from a seed to $v$, and $p_l(u \to v)$ is the probability that $v$ is influenced by node $u$ with live paths of length $l + 1$, i.e., there exists a live path of length $l + 1$ from a seed to $v$ that ends with the edge $(u, v)$.

Lemma 1. For any $v \in V$,
$$p(v) \leq 1 - \prod_{l=0}^{n-1}(1 - p_l(v)). \quad (1)$$
For any $v \in V$ and $l \in [n-1]$,
$$p_l(v) \leq 1 - \prod_{u \in N^-(v)}(1 - p_{l-1}(u \to v)). \quad (2)$$

Lemma 1, which can be proved by FKG inequalities, suggests that given $p_{l-1}(u \to v)$, we may compute an upper bound on the influence. Ideally, $p_{l-1}(u \to v)$ could be computed by considering all paths of length $l$ that end with $(u, v)$. However, this results in exponential complexity $O(n^l)$, as $l$ goes up to $n - 1$. Thus, in Definition 7, we present an efficient way to compute an upper bound $\mathrm{UB}_{l-1}(u \to v)$ on $p_{l-1}(u \to v)$, which in turn gives an upper bound $\mathrm{UB}_l(v)$ on $p_l(v)$, with the following recursion.

Definition 7. For all $l \in \{0, \ldots, n-1\}$ and $u, v \in V$ such that $(u, v) \in E$, $\mathrm{UB}_l(u) \in [0, 1]$ and $\mathrm{UB}_l(u \to v) \in [0, 1]$ are defined recursively as follows.

Initial condition: For every $s \in S_0$, $s^+ \in N^+(s)$, $u \in V \setminus S_0$, and $v \in N^+(u)$,
$$\mathrm{UB}_0(s) = 1, \quad \mathrm{UB}_0(s \to s^+) = B_{ss^+}, \quad (3)$$
$$\mathrm{UB}_0(u) = 0, \quad \mathrm{UB}_0(u \to v) = 0. \quad (4)$$

Recursion: For every $l \in [n-1]$, $s \in S_0$, $s^+ \in N^+(s)$, $s^- \in N^-(s)$, $u \in V \setminus S_0$, and $v \in N^+(u) \setminus S_0$,
$$\mathrm{UB}_l(s) = 0, \quad \mathrm{UB}_l(s \to s^+) = 0, \quad \mathrm{UB}_l(s^- \to s) = 0, \quad (5)$$
$$\mathrm{UB}_l(u) = 1 - \prod_{w \in N^-(u)} (1 - \mathrm{UB}_{l-1}(w \to u)), \quad (6)$$
$$\mathrm{UB}_l(u \to v) = \begin{cases} B_{uv}\Big(1 - \dfrac{1 - \mathrm{UB}_l(u)}{1 - \mathrm{UB}_{l-1}(v \to u)}\Big), & \text{if } v \in N^-(u), \\ B_{uv}\,\mathrm{UB}_l(u), & \text{otherwise.} \end{cases} \quad (7)$$

Equation (5) follows from the fact that for any seed node $s \in S_0$ and for all $l > 0$, the probabilities $p_l(s) = 0$, $p_l(s \to s^+) = 0$, and $p_l(s^- \to s) = 0$. A naive way to compute $\mathrm{UB}_l(u \to v)$ is $\mathrm{UB}_l(u \to v) = B_{uv}\,\mathrm{UB}_l(u)$, but this results in an extremely loose bound due to backtracking. For a tighter bound, we use nonbacktracking in Equation (7), i.e., when computing $\mathrm{UB}_l(u \to v)$, we ignore the contribution of $\mathrm{UB}_{l-1}(v \to u)$.
Finally, an upper bound on the probability that node $v$ is influenced at the end of the propagation is computed as $\mathrm{UB}(v) = 1 - \prod_{l=0}^{n-1}(1 - \mathrm{UB}_l(v))$. The following theorem shows that summing $\mathrm{UB}(v)$ over $v \in V$ yields an upper bound on the influence.

Theorem 2 (Nonbacktracking Upper Bound (NB-UB)). For any independent cascade model $\mathrm{IC}(G, B)$ and seed set $S_0 \subseteq V$,
$$\sigma(S_0) \leq \sum_{v \in V}\Big(1 - \prod_{l=0}^{n-1}(1 - \mathrm{UB}_l(v))\Big) =: \sigma^+(S_0), \quad (8)$$
where $\mathrm{UB}_l(v)$ is obtained recursively as in Definition 7.

The nonbacktracking upper bound is monotone and submodular. Therefore, it can be used in place of approximations of the influence in various greedy algorithms for the influence maximization problem.

Theorem 3. For any independent cascade model $\mathrm{IC}(G, B)$, the nonbacktracking upper bound $\sigma^+ : 2^V \to [0, |V|]$ is monotone and submodular.

Next, we present the Nonbacktracking Upper Bound (NB-UB) algorithm, which computes $\mathrm{UB}_l(v)$ and $\mathrm{UB}_l(u \to v)$ by message passing. At the $l$-th iteration, the variables in NB-UB have the following meanings.

- $S_l$ is the set of nodes that are considered in the computation at the $l$-th iteration.
- $M_{\mathrm{curr}}(v) = \{(u, \mathrm{UB}_{l-1}(u \to v)) : u \text{ is an in-neighbor of } v \text{ and } u \in S_{l-1}\}$ contains the current messages to node $v$ and their sources, where the source is an in-neighbor $u$ of $v$ that is in $S_{l-1}$ and the message is the upper bound $\mathrm{UB}_{l-1}(u \to v)$.
- $M_{\mathrm{Src}}(v) = \{u : u \text{ is an in-neighbor of } v \text{ and } u \in S_{l-1}\}$ is the set of in-neighbors of $v$ in $S_{l-1}$.
- $M_{\mathrm{curr}}(v)[u] = \mathrm{UB}_{l-1}(u \to v)$ is the current message from $u$ to $v$.
- $M_{\mathrm{next}}(v) = \{(u, \mathrm{UB}_l(u \to v)) : u \text{ is an in-neighbor of } v \text{ and } u \in S_l\}$ contains the messages to node $v$ and their sources for the algorithm to use in the next iteration.
Algorithm 1: Nonbacktracking Upper Bound (NB-UB)

  Initialize: UB_l(v) = 0 for all 0 <= l <= n-1 and v in V
  Initialize: insert (s, 1) into M_next(s) for all s in S_0
  for l = 0 to n-1 do
      for u in S_l do
          M_curr(u) = M_next(u); clear M_next(u)
          UB_l(u) = ProcessIncomingMsgUB(M_curr(u))
      for u in S_l do
          for v in N^+(u) \ S_0 do
              S_{l+1}.insert(v)
              if v in M_Src(u) then
                  UB_l(u -> v) = GenerateOutgoingMsgUB(M_curr(u)[v], UB_l(u), B_uv)
              else
                  UB_l(u -> v) = GenerateOutgoingMsgUB(0, UB_l(u), B_uv)
              M_next(v).insert((u, UB_l(u -> v)))
  Output: UB_l(u) for all l, u

At the beginning, every seed node $s \in S_0$ is initialized such that $M_{\mathrm{curr}}(s) = \{(s, 1)\}$ in order to satisfy the initial condition $\mathrm{UB}_0(s) = 1$. In the $l$-th iteration, every node $u$ in $S_l$ is processed as follows. First, ProcessIncomingMsgUB($M_{\mathrm{curr}}(u)$) computes $\mathrm{UB}_l(u)$ as in Equation (6). Second, $u$ passes a message to each neighbor $v \in N^+(u) \setminus S_0$ along the edge $(u, v)$, and $v$ stores (inserts) the message in $M_{\mathrm{next}}(v)$ in preparation for the next iteration. The message contains 1) the source of the message, $u$, and 2) the upper bound $\mathrm{UB}_l(u \to v)$, which is computed as in Equation (7) by the function GenerateOutgoingMsgUB. Finally, the algorithm outputs $\mathrm{UB}_l(u)$ for all $u \in V$ and $l \in \{0, \ldots, n-1\}$, and the upper bound $\sigma^+(S_0)$ is computed by Equation (8).

Computational complexity: Notice that for each iteration $l \in \{0, \ldots, n-1\}$, the algorithm accesses at most $n$ nodes, and for each node $v$, the functions ProcessIncomingMsgUB and GenerateOutgoingMsgUB are computed in $O(\deg(v))$ and $O(1)$, respectively. Therefore, the worst-case computational complexity is $O(|V|^2 + |V||E|)$.

3.2. Tunable Nonbacktracking Upper Bound (t-NB-UB)

One key observation about NB-UB is that when computing the nonbacktracking upper bound $\mathrm{UB}_l(v)$ recursively in $l$, the computations with small $l$ are more important than the ones with larger $l$, since $\mathrm{UB}_l(v)$ is a function of $\mathrm{UB}_{l-1}(u)$ for some $u \in V$.
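The recursion of Definition 7, which Algorithm 1 evaluates by message passing, can be sketched more directly as follows. This is an illustrative reimplementation under our own conventions (a dictionary $B$ of directed-edge probabilities and dense iteration over all edges, rather than message queues), not the authors' code; when $\mathrm{UB}_{l-1}(v \to u) = 1$ the correction of Equation (7) is undefined, and the sketch falls back to the naive bound.

```python
def nb_upper_bound(nodes, B, seeds):
    """Sketch of NB-UB (Definition 7).  `B` maps directed edges (u, v)
    to transmission probabilities; returns the bound of Equation (8)."""
    n = len(nodes)
    in_nbrs = {v: [u for (u, w) in B if w == v] for v in nodes}
    # Initial condition, Equations (3)-(4).
    ub_node = {u: (1.0 if u in seeds else 0.0) for u in nodes}
    ub_edge = {(u, v): (B[(u, v)] if u in seeds else 0.0) for (u, v) in B}
    miss = {v: 1.0 - ub_node[v] for v in nodes}  # running product of (1 - UB_l(v))
    for l in range(1, n):
        prev_edge = ub_edge
        ub_node, ub_edge = {}, {}
        for u in nodes:
            if u in seeds:
                ub_node[u] = 0.0  # Equation (5)
            else:
                prod = 1.0
                for w in in_nbrs[u]:
                    prod *= 1.0 - prev_edge[(w, u)]
                ub_node[u] = 1.0 - prod  # Equation (6)
        for (u, v) in B:
            if u in seeds or v in seeds:
                ub_edge[(u, v)] = 0.0  # Equation (5)
            elif (v, u) in B and prev_edge[(v, u)] < 1.0:
                # Equation (7): discount the mass that arrived through (v, u).
                ub_edge[(u, v)] = B[(u, v)] * (
                    1.0 - (1.0 - ub_node[u]) / (1.0 - prev_edge[(v, u)]))
            else:
                # Naive case, also used when the correction is undefined.
                ub_edge[(u, v)] = B[(u, v)] * ub_node[u]
        for v in nodes:
            miss[v] *= 1.0 - ub_node[v]
    return sum(1.0 - miss[v] for v in nodes)  # Equation (8)
```

On the directed path $a \to b \to c$ the bound is exact ($1 + p + p^2$), and on the bidirected pair $a \leftrightarrow b$ the nonbacktracking correction removes the spurious echo $b \to a$, giving exactly $1 + p$.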
In other words, computing loose bounds $\mathrm{UB}_l(v)$ for smaller $l$ hurts the accuracy of the bound on the influence more than computing loose bounds for larger $l$, because the effect of a loose $\mathrm{UB}_l(v)$ with smaller $l$ builds up more in the final bound $\sigma^+(S_0)$. Therefore, we propose a parameterized upper bound, t-NB-UB, that computes $p_l(v)$ exactly for $l \leq t$. As shown in Algorithm 2, t-NB-UB takes the parameter $t$, which indicates the maximum length of the paths that the algorithm enumerates to compute the exact probability of influence, rather than an upper bound on it. That is, the algorithm computes the probability $p_{\leq t}(u)$ that node $u$ is influenced by live paths whose length is less than or equal to $t$:
$$p_{\leq t}(u) = P\big(\cup_{P \in \cup_{i=0}^{t} P_i(S_0 \to u)} \{P \text{ is live}\}\big). \quad (9)$$
Then, we start the (non-parameterized) NB-UB algorithm from $l = t + 1$ with the new initial conditions: for all $u \in V$ and $v \in N^+(u)$,
$$\mathrm{UB}_t(u) = p_{\leq t}(u), \quad (10)$$
$$\mathrm{UB}_t(u \to v) = p_t(u \to v). \quad (11)$$
Finally, the upper bound t-NB-UB is computed as $\sum_{v \in V}\big(1 - \prod_{l=t}^{n-1}(1 - \mathrm{UB}_l(v))\big)$.

Algorithm 2: Tunable nonbacktracking upper bound (t-NB-UB)

  Parameter: non-negative integer t <= n-1
  Initialize: UB_l(v) = 0 for all t <= l <= n-1 and v in V
  for u in V do
      UB_t(u) = p_{<=t}(u)
      for v in N^+(u) \ S_0 do
          if p_t(u -> v) > 0 then
              S_{t+1}.insert(v)
              M_next(v).insert((u, p_t(u -> v)))
  for l = t+1 to n-1 do
      for u in S_l do
          M_curr(u) = M_next(u); clear M_next(u)
          UB_l(u) = ProcessIncomingMsgUB(M_curr(u))
      for u in S_l do
          for v in N^+(u) \ S_0 do
              S_{l+1}.insert(v)
              if v in M_Src(u) then
                  UB_l(u -> v) = GenerateOutgoingMsgUB(M_curr(u)[v], UB_l(u), B_uv)
              else
                  UB_l(u -> v) = GenerateOutgoingMsgUB(0, UB_l(u), B_uv)
              M_next(v).insert((u, UB_l(u -> v)))
  Output: UB_l(u) for all l in {t, t+1, ..., n-1}, u in V

For larger values of $t$, the algorithm results in tighter upper bounds, while the computational complexity may increase exponentially for dense networks. Thus, this method is most applicable to sparse networks, where the degree of each node is bounded.

3.3. r-Nonbacktracking Upper Bound (r-NB-UB)

In this section, we generalize the nonbacktracking upper bound by considering $r$-nonbacktracking walks, i.e., avoiding cycles of length $r$ rather than just backtracks. The $r$-nonbacktracking upper bound is not necessarily equal to the t-NB-UB, which computes $p_{\leq t}(v)$ for all $v \in V$ exactly, even in the case $r = t$.

Definition 8. Consider a directed graph $G = (V, E)$. For any integer $r \geq 1$ and node $u \in V$, let $P_{r-1}(\cdot \to u)$ be the set of all paths of length $r-1$ ending at node $u$, and $P_{r-1}(u \to \cdot)$ be the set of all paths of length $r-1$ starting from node $u$. To simplify notation, we write $P_{r-1}(S \to \cdot)$ for a set of nodes $S$ to denote the set of all paths of length $r-1$ starting from any one of the nodes in $S$. For two paths of the same length, $P = (v_1, \ldots, v_r)$ and $Q = (u_1, \ldots, u_r)$, we write $P \to Q$ when $v_{i+1} = u_i$ for all $i \in [r-1]$ and $v_1 \neq u_r$. In other words, $P \to Q$ iff $(v_1, v_2 = u_1, \ldots, v_r = u_{r-1}, u_r)$ is a path of length $r$. For two paths $P = (v_0, \ldots, v_r)$ and $Q = (u_1, \ldots, u_r)$, we write $Q \sqsubset P$ when $v_i = u_i$ for all $i \in [r]$. That is, $Q$ is the subpath of $P$ obtained by removing the first node $v_0$ from $P$. Also, let $P \setminus Q = (v_0, v_1)$ be the edge in path $P$ but not in path $Q$.

Recall that NB-UB recursively computes the upper bounds $\mathrm{UB}_l(u)$ on $p_l(u)$ and $\mathrm{UB}_l(u \to v)$ on $p_l(u \to v)$. Similarly, r-NB-UB recursively computes the upper bounds $\mathrm{UB}_l(u)$ and $\mathrm{UB}_l(u, P)$, as defined in Definition 9. $\mathrm{UB}_l(u, P)$ is analogous to $\mathrm{UB}_l(u \to v)$: $\mathrm{UB}_l(u, P)$ provides the effective influence of $u$ along the path $P$, whereas $\mathrm{UB}_l(u \to v)$ gives the effective influence of $u$ along the edge $(u, v)$.

Definition 9 (r-Nonbacktracking Upper Bound (r-NB-UB)). For all $l \in \{0, \ldots, n-1\}$, $u \in V$ and $P \in P_{r-1}(u \to \cdot)$, we define $\mathrm{UB}_l(u) \in [0, 1]$ and $\mathrm{UB}_l(u, P) \in [0, 1]$ as follows, using the function $f$ defined later in Definition 10.

Initial condition: For every $l \leq r-1$ and $u \in V$,
$$\mathrm{UB}_l(u) = f\big(\cup_{P \in P_l(S_0 \to u)}(P, 1)\big). \quad (12)$$
For every $P \in P_{r-1}(s \to \cdot)$ and $s \in S_0$, $\mathrm{UB}_{r-1}(s, P) = 1.$
(13)

Recursion: For every $l \in \{r, \ldots, n-1\}$, $s \in S_0$, $u, x \in V \setminus S_0$, $v \in N^+(u) \setminus S_0$, and $Q \in P_{r-1}(v \to \cdot)$,
$$\mathrm{UB}_l(s) = 0, \quad \mathrm{UB}_l(s, P) = 0 \text{ for any path } P, \quad (14)$$
$$\mathrm{UB}_l(v, Q) = 1 - \prod_{P \to Q} (1 - B_{uv}\,\mathrm{UB}_{l-1}(u, P)), \quad (15)$$
$$\mathrm{UB}_l(x) = f\big(\cup_{\{P \in P_{r-1}(y \to x) : y \in V \setminus S_0\}}(P, \mathrm{UB}_l(y, P))\big). \quad (16)$$

Equation (12) computes the upper bound on the probability that node $u$ is influenced by at least one path of length $l$, for each $l \leq r-1$. To provide the initial values, we set $\mathrm{UB}_{r-1}(s, P) = 1$, indicating that the probability of a seed node $s$ being influenced equals 1. In the recursion steps, Equation (15) computes $\mathrm{UB}_l(v, Q)$, the upper bound on the probability that the starting node $v$ of path $Q$ is influenced. Note that in general $\mathrm{UB}_l(v, Q) \neq \mathrm{UB}_{l-(r-1)}(v)$, since $\mathrm{UB}_l(v, Q)$ is the upper bound on the probability that node $v$ is influenced by a path of length $l-(r-1)$ that does not contain any node in $Q$, whereas $\mathrm{UB}_{l-(r-1)}(v)$ is the upper bound on the probability that node $v$ is influenced by any path of length $l-(r-1)$. Note that given the path $Q$, the starting node $v$ is determined. We use $\mathrm{UB}_l(v, Q)$ instead of $\mathrm{UB}_l(Q)$ in order to emphasize that the bound is for node $v$ of path $Q$. Finally, Equation (16) computes $\mathrm{UB}_l(x)$, the upper bound on the probability that node $x$ is influenced by at least one $r$-nonbacktracking walk of length $l$, via the function $f$. In the following, we define the function $f$.

Definition 10. Consider a directed graph $G = (V, E)$ and a non-negative integer $r$. The function $f$ takes as input $\mathcal{P}^{(r)} = \{P_1, \ldots, P_m\}$ and $x^{(r)} = \{x^{(r)}_{P_1}, \ldots, x^{(r)}_{P_m}\}$, where $\mathcal{P}^{(r)}$ is a set of length-$r$ paths in $G$ ending at the same node, say $v$, and $x^{(r)}$ is the set of real numbers in $[0, 1]$ associated with the paths. When the input is the empty set, $f(\emptyset, \emptyset) = 0$. Otherwise, $f(\mathcal{P}^{(r)}, x^{(r)})$ computes the output by the following recursion.
For every $i \in [r]$, from $i = r$ down to $i = 1$,
$$\mathcal{P}^{(i-1)} = \{P : P \sqsubset P', P' \in \mathcal{P}^{(i)}\}, \quad (17)$$
and for every $P \in \mathcal{P}^{(i-1)}$,
$$x^{(i-1)}_P = 1 - \prod_{P' \in \{P' : P \sqsubset P', P' \in \mathcal{P}^{(i)}\}} \big(1 - B_e\, x^{(i)}_{P'}\big), \quad (18)$$
where $e = P' \setminus P$ and $B_e$ is the transmission probability of edge $e$. By the definition of $\mathcal{P}^{(r)}$, the set $\mathcal{P}^{(0)}$ has only one element, the path $P_0 = (v)$ of length 0 consisting of node $v$. Therefore, at the end of the recursion, $i = 1$, the function computes the final output $x^{(0)}_{P_0} \in [0, 1]$.

We next illustrate how to compute the function $f$ in a small example. Consider a directed graph $G$ where all edges have the same transmission probability $p$. Let the function $f$ take the inputs $\mathcal{P}^{(3)} = \{(a, b, c, d), (a, f, c, d), (e, f, c, d)\}$ and $x^{(3)} = \{1, 1, 1\}$. Then, $\mathcal{P}^{(2)} = \{(b, c, d), (f, c, d)\}$, since $(b, c, d) \sqsubset (a, b, c, d)$ and $(f, c, d) \sqsubset (a, f, c, d), (e, f, c, d)$. Likewise, $\mathcal{P}^{(1)} = \{(c, d)\}$, as $(c, d) \sqsubset (b, c, d), (f, c, d)$, and $\mathcal{P}^{(0)} = \{(d)\}$, as $(d) \sqsubset (c, d)$. Figure 1 shows the sets $\mathcal{P}^{(3)}, \ldots, \mathcal{P}^{(0)}$ in a radix tree structure. With the radix tree, we can easily find the set $\{P' : P \sqsubset P', P' \in \mathcal{P}^{(i)}\}$ by accessing the descendants of the path $P$. For example, the path $P = (f, c, d)$ in the radix tree has two descendants, $a$ and $e$, corresponding to the paths $(a, f, c, d)$ and $(e, f, c, d)$, respectively. This gives the set $\{P' : (f, c, d) \sqsubset P', P' \in \mathcal{P}^{(3)}\} = \{(a, f, c, d), (e, f, c, d)\}$.

Figure 1: An example of the radix tree structure.

The function $f$ computes the output recursively from $i = 3$ to $i = 1$:
$$x^{(2)}_{(b,c,d)} = 1 - \prod_{P' \in \{(a,b,c,d)\}} (1 - p\,x^{(3)}_{P'}) = p,$$
$$x^{(2)}_{(f,c,d)} = 1 - \prod_{P' \in \{(a,f,c,d), (e,f,c,d)\}} (1 - p\,x^{(3)}_{P'}) = 2p - p^2,$$
$$x^{(1)}_{(c,d)} = 1 - \prod_{P' \in \{(b,c,d), (f,c,d)\}} (1 - p\,x^{(2)}_{P'}) = 3p^2 - p^3 - 2p^4 + p^5,$$
$$x^{(0)}_{(d)} = 1 - \prod_{P' \in \{(c,d)\}} (1 - p\,x^{(1)}_{P'}) = 3p^3 - p^4 - 2p^5 + p^6.$$

We use the function $f$ to compute the upper bound $\mathrm{UB}_l(v)$ in each step, instead of using Equation (2) of Lemma 1 as in NB-UB.
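Under the simplifying assumption of the example above (a single transmission probability $p$ for every edge), the recursion (17)-(18) can be sketched as follows; the function name and input conventions are ours, not the paper's.

```python
def f_bound(paths, x, p):
    """Sketch of the function f of Definition 10 for a graph in which
    every edge has the same transmission probability p.  `paths` are
    tuples of nodes sharing a common end node; x[i] is the value
    attached to paths[i]."""
    level = dict(zip(paths, x))
    # Peel off the first node of every path, Equations (17)-(18).
    while len(next(iter(level))) > 1:
        miss = {}
        for P, val in level.items():
            child = P[1:]  # the subpath Q with Q ⊏ P
            # Accumulate the product of (1 - B_e * x_P) over the parents of Q.
            miss[child] = miss.get(child, 1.0) * (1.0 - p * val)
        level = {Q: 1.0 - m for Q, m in miss.items()}
    # P^(0) holds the single zero-length path (v,); return its value.
    return next(iter(level.values()))
```

On the example input $\mathcal{P}^{(3)} = \{(a, b, c, d), (a, f, c, d), (e, f, c, d)\}$ with $x^{(3)} = \{1, 1, 1\}$, this reproduces the output $3p^3 - p^4 - 2p^5 + p^6$ computed above.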
This results in a tighter upper bound, since Equation (2) bounds the union of the events that node $v$ is influenced by at least one of the walks by treating each walk independently, whereas $f$ accounts for the dependencies among the walks. This function also ensures that the $r$-nonbacktracking upper bound is smaller than or equal to the $r'$-nonbacktracking upper bound when $r > r'$.

Theorem 4. For any independent cascade model $\mathrm{IC}(G, B)$, a seed set $S_0 \subseteq V$, and an integer $r \geq 2$,
$$\sigma(S_0) \leq \sum_{v \in V}\Big(1 - \prod_{l=0}^{n-1}(1 - \mathrm{UB}_l(v))\Big) =: \sigma^+_r(S_0) \leq \sigma^+(S_0), \quad (19)$$
where $\mathrm{UB}_l(v)$ is obtained recursively as in Definition 9.

Next, we present the r-Nonbacktracking Upper Bound (r-NB-UB) algorithm, which computes $\mathrm{UB}_l(v)$ and $\mathrm{UB}_l(v, P)$ by message passing. At the $l$-th iteration, the variables in r-NB-UB have the following meanings.

- $S_l$ represents the set of nodes that are used in the computations at the $l$-th iteration.
- $M_{\mathrm{curr}}(u) = \{(P, \mathrm{UB}_{l-1}(\cdot, P)) : P \in P_{r-1}(\cdot \to u) \text{ and } u \in S_{l-1}\}$ is the set of messages that node $u$ received in the previous step. Each message contains an $(r-1)$-length path ending at node $u$ and the bound associated with the path.
- $M_{\mathrm{next}}(u) = \{(P, \mathrm{UB}_l(\cdot, P)) : P \in P_{r-1}(\cdot \to u) \text{ and } u \in S_l\}$ is the set of messages that node $u$ receives in the current iteration. Each message contains an $(r-1)$-length path ending at $u$ and the bound associated with the path.

At the beginning, the first $r-1$ upper bounds for each node are computed, as well as $\mathrm{UB}_{r-1}(s, P)$ for all $s \in S_0$ and $P \in P_{r-1}(S_0 \to \cdot)$, to satisfy the initial condition in Definition 9. In each $l$-th iteration, the algorithm processes every node $u$ in $S_l$ as follows. First, $f(M_{\mathrm{curr}}(u))$ computes the upper bound $\mathrm{UB}_l(u)$ as in Equation (16), with $f$ defined in Definition 10. Second, $u$ transmits a message to each neighbor $v \in N^+(u) \setminus S_0$, and $v$ stores (inserts) the message in $M_{\mathrm{next}}(v)$ for the next iteration.
The message contains 1) the path $Q$ of length $r-1$ ending at node $v$ which satisfies $P \to Q$, and 2) $\mathrm{UB}_l(v_2, Q)$, which is computed as in Equation (15) by the function rGenerateOutgoingMsgUB. Note that $v_2$ is the starting node of path $Q$, which is the second node of path $P$. Finally, the algorithm outputs $\mathrm{UB}_l(u)$ for all $u \in V$ and $l \in \{0, \ldots, n-1\}$, and the upper bound $\sigma^+_r(S_0)$ is computed by Equation (19).

Algorithm 3: r-nonbacktracking upper bound (r-NB-UB)

  Parameter: integer 2 <= r <= n-1
  Initialize: UB_l(v) = 0 for all 0 <= l <= n-1 and v in V
  for l = 0 to r-2 do
      UB_l(u) = f(U_{P in P_l(S_0 -> u)} (P, 1))
  for s in S_0 do
      for P = (p_1, ..., p_r = u) in P_{r-1}(s -> .) do
          UB_{r-1}(s, P) = 1
          M_next(u).insert((P, UB_{r-1}(s, P)))
          S_{r-1}.insert(u)
  for l = r-1 to n-1 do
      for u in S_l do
          M_curr(u) = M_next(u); clear M_next(u)
          UB_l(u) = f(M_curr(u))
      for u in S_l do
          for Q = (q_1, ..., q_r = v) in {Q : P -> Q, P in P_{r-1}(. -> u)} do
              UB_l(q_1, Q) = rGenerateOutgoingMsgUB(M_curr(u))
              M_next(v).insert((Q, UB_l(q_1, Q)))
              S_{l+1}.insert(v)
  Output: UB_l(u) for all 0 <= l <= n-1, u in V

Computational complexity: To compute $f$, we first generate a radix tree from the input paths and then compute the output from the leaves. Generating the radix tree takes $O(rM)$, where $r-1$ is the length of the paths and $M$ is the number of paths, since inserting a path into a radix tree takes $O(r)$. Then, the computation proceeds from the leaves to the root, accessing each tree node once. Therefore, computing $f$ takes $O(rM)$. Note that for every $l \geq r$ and $u \in V$, there are at most $\deg(u)|V|^{r-2}$ elements in $M_{\mathrm{curr}}(u)$, since $M_{\mathrm{curr}}(u)$ may contain all paths of length $r-1$ ending at node $u$. Therefore, the function $f$ takes $O(\deg(u)|V|^{r-2})$. To find the set $\{Q : P \to Q, P \in P_{r-1}(\cdot \to u)\}$, for every $P \in P_{r-1}(\cdot \to u)$ we check whether appending an out-neighbor of node $u$ to the end of path $P$ forms a path of length $r$. This takes $O(\deg(p_r))$. We pre-compute this for every path of length $r-1$, which takes $O((|E| + |V|)|V|^{r-1})$.
In the algorithm, we compute the function $f$ at most $(n - r)n$ times and pre-compute the relationships $P \to Q$ once, resulting in an overall computational complexity of $O((|E| + |V|)|V|^{r-1})$.

3.4. Comparisons of the Upper Bounds

In this section, we compare the three types of nonbacktracking upper bounds, NB-UB, t-NB-UB, and r-NB-UB, on a small independent cascade model. The model $\mathrm{IC}(G, B)$ is defined on a bidirected graph $G = (V, E)$, where $V = \{a, b, c, d, e\}$, $S_0 = \{a\}$, and every edge has the same transmission probability $p$.

Figure 2: The step-wise illustration of the NB-UB algorithm on the example network.

In Figure 2, we describe how the NB-UB algorithm runs on the network $\mathrm{IC}(G, B)$. Since $|V| = 5$, the algorithm iterates from $l = 0$ to $l = 4$. In each step, it processes the nodes in the set $S_l$ and passes the messages generated by them. For example, at $l = 3$, the nodes in $S_3 = \{c, d, e\}$ are processed. In the previous iteration $l = 2$, node $c$ sent the message $(c, \mathrm{UB}_2(c \to d))$ to $d$, and node $d$ sent the message $(d, \mathrm{UB}_2(d \to c))$ to $c$ and $(d, \mathrm{UB}_2(d \to e))$ to $e$. Thus,
$$M_{\mathrm{curr}}(c) = \{(d, \mathrm{UB}_2(d \to c))\} = \{(d, p^3)\},$$
$$M_{\mathrm{curr}}(d) = \{(c, \mathrm{UB}_2(c \to d))\} = \{(c, p^3)\},$$
$$M_{\mathrm{curr}}(e) = \{(d, \mathrm{UB}_2(d \to e))\} = \{(d, p^3)\},$$
and the nodes $c$, $d$, $e$ are processed as follows.
First, compute the upper bounds at step $l = 3$:
$$\mathrm{UB}_3(c) = \mathrm{ProcessIncomingMsgUB}(M_{\mathrm{curr}}(c)) = p^3,$$
$$\mathrm{UB}_3(d) = \mathrm{ProcessIncomingMsgUB}(M_{\mathrm{curr}}(d)) = p^3,$$
$$\mathrm{UB}_3(e) = \mathrm{ProcessIncomingMsgUB}(M_{\mathrm{curr}}(e)) = p^3.$$
Next, each node computes the messages for its out-neighbors:
$$\mathrm{UB}_3(c \to b) = \mathrm{GenerateOutgoingMsgUB}(0, \mathrm{UB}_3(c), B_{cb}) = B_{cb}\Big(1 - \frac{1 - \mathrm{UB}_3(c)}{1 - 0}\Big) = p^4,$$
$$\mathrm{UB}_3(c \to d) = \mathrm{GenerateOutgoingMsgUB}(\mathrm{UB}_2(d \to c), \mathrm{UB}_3(c), B_{cd}) = B_{cd}\Big(1 - \frac{1 - \mathrm{UB}_3(c)}{1 - \mathrm{UB}_2(d \to c)}\Big) = 0,$$
$$\mathrm{UB}_3(d \to b) = \mathrm{GenerateOutgoingMsgUB}(0, \mathrm{UB}_3(d), B_{db}) = B_{db}\Big(1 - \frac{1 - \mathrm{UB}_3(d)}{1 - 0}\Big) = p^4,$$
$$\mathrm{UB}_3(d \to c) = \mathrm{GenerateOutgoingMsgUB}(\mathrm{UB}_2(c \to d), \mathrm{UB}_3(d), B_{dc}) = B_{dc}\Big(1 - \frac{1 - \mathrm{UB}_3(d)}{1 - \mathrm{UB}_2(c \to d)}\Big) = 0,$$
$$\mathrm{UB}_3(d \to e) = \mathrm{GenerateOutgoingMsgUB}(0, \mathrm{UB}_3(d), B_{de}) = B_{de}\Big(1 - \frac{1 - \mathrm{UB}_3(d)}{1 - 0}\Big) = p^4,$$
$$\mathrm{UB}_3(e \to d) = \mathrm{GenerateOutgoingMsgUB}(\mathrm{UB}_2(d \to e), \mathrm{UB}_3(e), B_{ed}) = B_{ed}\Big(1 - \frac{1 - \mathrm{UB}_3(e)}{1 - \mathrm{UB}_2(d \to e)}\Big) = 0. \quad (20)$$
Then, node $c$ sends the message $(c, \mathrm{UB}_3(c \to b))$ to $b$, and node $d$ sends the messages $(d, \mathrm{UB}_3(d \to b))$ to $b$ and $(d, \mathrm{UB}_3(d \to e))$ to $e$, concluding the $l = 3$ step.

Figure 3: The step-wise illustration of the t-NB-UB algorithm on the example network.

Next, Figure 3 shows the process of t-NB-UB on the same network with $t = 3$. As the first step, t-NB-UB finds all paths from the seed $a$ of length less than or equal to 3. Then, for each node $u \in \{a, b, c, d, e\}$, it computes the exact probability $\mathrm{UB}_3(u)$ that the node is influenced through a path of length less than or equal to 3. For example,
$$\mathrm{UB}_3(d) = P(\text{path } (a, b, d) \text{ is live or path } (a, b, c, d) \text{ is live}) = p^2 + p^3 - p^4.$$
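The message computations above use only Equation (7); the following one-line sketch (our own naming, not the paper's code) reproduces the values $p^4$ and $0$ obtained at step $l = 3$:

```python
def generate_outgoing_msg_ub(incoming, ub_u, b_uv):
    """Equation (7): the message u -> v discounts the contribution that
    previously arrived at u through the reverse edge (v, u).
    `incoming` is UB_{l-1}(v -> u), `ub_u` is UB_l(u)."""
    return b_uv * (1.0 - (1.0 - ub_u) / (1.0 - incoming))
```

With $p = 0.3$ and $\mathrm{UB}_3(c) = p^3$, calling it with incoming message $0$ gives $p \cdot p^3 = p^4$, and calling it with incoming message $\mathrm{UB}_2(d \to c) = p^3$ gives $0$, matching the step above.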
To initialize the process, t NB-UB computes UB3(u → v), the probability that node v is influenced directly by u with a live path of length 4 from the seed a. For node d, there is a path (a, b, c, d) of length 3 from the seed, so we compute UB3(d → v) for its neighbors {b, c, e}. Notice that there is no length-4 path from a that ends with (d, b), nor a length-4 path from a that ends with (d, c). However, (a, b, c, d, e) is a path from a ending with (d, e). Thus,

UB3(d → b) = 0
UB3(d → c) = 0
UB3(d → e) = P(Path (a, b, c, d, e) is live) = p^4.

After the initialization, t NB-UB repeats the same process as NB-UB from l = t + 1 to l = n − 1.

Figure 4: The step-wise illustration of the r NB-UB algorithm on the example network.

In Figure 4, we illustrate r NB-UB in the same example network. For the initialization, r NB-UB computes UBi(v) for all i ∈ {0, . . . , r − 2} = {0, 1} and v ∈ {a, b, c, d, e}. Since there is only one length-0 path from the seed a, we have P = (a), UB0(a) = 1, and UB0(v) = 0 for v ∈ {b, c, d, e}. Similarly, UB1(b) = p and UB1(v) = 0 for v ∈ {a, c, d, e}. Then, the algorithm finds all paths from the seed a with length r − 1 = 2 and sends the message (P, UB2(a, P) = 1) to the end node of each path. In this network, there are two paths of length 2, (a, b, c) and (a, b, d). Thus, r NB-UB sends ((a, b, c), 1) to node c and ((a, b, d), 1) to node d.
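The initialization step, enumerating all simple paths of length exactly r − 1 starting at the seed, can be sketched with a short depth-first search. This is our own illustrative code, not the paper's; the adjacency list encodes the bidirected example graph of Figure 4.

```python
# Sketch of the r NB-UB initialization: list all simple paths with exactly
# `length` edges starting from the seed node.
adj = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b", "d"],
       "d": ["b", "c", "e"], "e": ["d"]}

def paths_of_length(start, length):
    """Return all simple paths with `length` edges starting at `start`."""
    out = []
    def dfs(path):
        if len(path) - 1 == length:
            out.append(tuple(path))
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:  # simple paths only: no repeated nodes
                dfs(path + [nxt])
    dfs([start])
    return out

print(paths_of_length("a", 2))  # [('a', 'b', 'c'), ('a', 'b', 'd')]
```

For r = 3 this recovers exactly the two length-2 paths (a, b, c) and (a, b, d) that receive the initial messages.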
From l = r − 1 to n − 1, each node v in the set Sl first computes the upper bound UBl(v) by the function f in Definition 10. For example, at l = 2, the nodes in S2 = {c, d} compute the following upper bounds. Since both Mcurr(c) = {((a, b, c), 1)} and Mcurr(d) = {((a, b, d), 1)} have only one element each,

UB2(c) = 1 − (1 − p(1 − (1 − p))) = p^2
UB2(d) = 1 − (1 − p(1 − (1 − p))) = p^2.

Next, each node in the set Sl checks whether appending an out-neighbor to the path of a previously received message forms a path or not. If so, it sends a message to that neighbor with the updated path and value. For example, node c has the message ((a, b, c), 1) from the l = 1 step. Of node c's two out-neighbors b and d, appending d forms a path but appending b does not. Thus, node c sends the message ((b, c, d), p) to node d, where (b, c, d) is the length-2 sub-path of the path (a, b, c, d) ending at node d. Similarly, d sends the message ((b, d, c), p) to node c and ((b, d, e), p) to node e, concluding the l = 2 step.

Now, we compare the three upper bounds, NB-UB, t NB-UB, and r NB-UB, on the example graph G to the exact influence σ.

σ = 1 + p + (p^2 + p^3 − p^4) + (p^2 + p^3 − p^4) + (p^3 + p^4 − p^5) = 1 + p + 2p^2 + 3p^3 − p^4 − p^5
σ+ = 1 + (p + 2p^4 − 2p^5 − p^8 + p^9) + (p^2 + p^3 − p^5) + (p^2 + p^3 − p^5) + (p^3 + p^4 − p^7) = 1 + p + 2p^2 + 3p^3 + 3p^4 − 4p^5 − p^7 − p^8 + p^9
σ+_t = 1 + p + (p^2 + p^3 − p^4) + (p^2 + p^3 − p^4) + (p^3 + p^4 − p^7) = 1 + p + 2p^2 + 3p^3 − p^4 − p^7
σ+_r = 1 + p + (p^2 + p^3 − p^5) + (p^2 + p^3 − p^5) + (p^3 + p^4 − p^7) = 1 + p + 2p^2 + 3p^3 + p^4 − 2p^5 − p^7

The tunable upper bound σ+_t with t = 3 results in the tightest bound. t NB-UB computes the influence probability for nodes a, b, c and d exactly, since they can only be influenced by a path of length less than or equal to 3. The r-nonbacktracking bound σ+_r with r = 3 avoids all triangles when computing the bound. Thus, it only considers paths rather than walks. However, r NB-UB is still not exact, because it assumes independence among the events that a node is influenced by a length-l path, for all l ≤ n − 1.
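The ordering of the four closed-form expressions above can be sanity-checked numerically on a grid of transmission probabilities. This is our own check, not part of the paper; the polynomial coefficients are transcribed from the comparison above.

```python
# Verify numerically that sigma <= sigma_t^+ <= sigma_r^+ <= sigma^+ on [0, 1]
# for the example network's closed-form expressions.
def sigma(p):    return 1 + p + 2*p**2 + 3*p**3 - p**4 - p**5
def sigma_t(p):  return 1 + p + 2*p**2 + 3*p**3 - p**4 - p**7
def sigma_r(p):  return 1 + p + 2*p**2 + 3*p**3 + p**4 - 2*p**5 - p**7
def sigma_ub(p): return (1 + p + 2*p**2 + 3*p**3 + 3*p**4 - 4*p**5
                         - p**7 - p**8 + p**9)

for i in range(101):
    p = i / 100
    assert sigma(p) <= sigma_t(p) + 1e-12    # influence <= tunable bound
    assert sigma_t(p) <= sigma_r(p) + 1e-12  # tunable <= r-nonbacktracking
    assert sigma_r(p) <= sigma_ub(p) + 1e-12 # r-nonbacktracking <= NB-UB
print("ordering verified on [0, 1]")
```

The gaps factor cleanly, e.g. σ+_t − σ = p^5 − p^7 ≥ 0 and σ+ − σ+_r = (1 − p)(2p^4 − p^8) ≥ 0 on [0, 1], which is why the assertions hold on the whole grid.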
The nonbacktracking bound σ+ results in the worst bound. The bound not only assumes the same independence as r NB-UB, but also considers all walks that include the triangle formed by nodes b, c and d.

4. Nonbacktracking Lower Bounds

In this section, we introduce a nonbacktracking lower bound on the influence in the independent cascade model. The bound uses nonbacktracking walks and is computed efficiently by message passing algorithms in O(|V| + |E|). Similar to the nonbacktracking upper bound, we also present a parameterized version of the lower bound, the tunable nonbacktracking lower bound. Furthermore, from the proposed upper σ+ and lower σ− bounds on the expected number of influenced nodes, we can compute an upper bound on the variance given by (σ+ − σ−)^2/4. This could be used to estimate the number of computations needed by MC simulations. Computing the upper bound on the variance with the proposed bounds can be done in O(|V|^2 + |V||E|), whereas computing the variance with MC requires O(|V|^5 + |V|^4|E|).

This section is organized as follows. First, Sections 4.1 and 4.2 give the definitions of the proposed lower bounds and efficient algorithms to compute them. Next, Section 4.3 shows how each lower bound is computed on a small example network and compares the bounds with the influence.

4.1. Nonbacktracking Lower Bound (NB-LB)

A naive way to compute a lower bound on the influence of a seed set S0 in a network IC(G, B) is to reduce the network to a (spanning) tree network by removing edges. Then, since there is a unique path from one node to another, we can compute the influence of the tree network, which is a lower bound on the influence of the original network, in O(|V|). We take this approach of generating a subnetwork from the original network, yet we avoid the significant gap between the bound and the influence by considering the following directed acyclic subnetwork, in which there is no backtracking walk.
Definition 11 (Min-distance Directed Acyclic Subnetwork). Consider an independent cascade model IC(G, B), where G = (V, E) and |V| = n, and a seed set S0 ⊆ V. Let d(S0, v) := min_{s∈S0} d(s, v), i.e., the minimum distance from a seed in S0 to v. A minimum-distance directed acyclic subnetwork (MDAS), IC(G′, B′), where G′ = (V′, E′), is obtained as follows.

V′ = {v1, ..., vn} is an ordered set of nodes such that for every i < j, d(S0, vi) ≤ d(S0, vj).
E′ = {(vi, vj) ∈ E : i < j}, i.e., E′ is obtained from E by removing every edge whose source node comes later in the order than its destination node.
B′_{vivj} = B_{vivj} if (vi, vj) ∈ E′, and B′_{vivj} = 0 otherwise.

If there are multiple ordered sets of vertices satisfying the condition, we may choose one arbitrarily. While any of the ordered sets satisfying the condition results in a lower bound, the tightness of the bound can vary depending on the ordered set chosen for the computation. This is further explained in Section 4.3 with an example.

For any k ∈ [n], let p(vk) be the probability that vk ∈ V′ is influenced in the MDAS IC(G′, B′). Since p(vk) is the probability of the union of the events that an in-neighbor ui ∈ N−(vk) influences vk, p(vk) can be computed by the principle of inclusion and exclusion. Thus, we may compute a lower bound on p(vk), using the Bonferroni inequalities, if we know the probability that in-neighbors u and v both influence vk, for every pair u, v ∈ N−(vk). However, computing such probabilities can take O(k^k). Hence, we present LB(vk), which efficiently computes a lower bound on p(vk) by the following recursion.

Definition 12. For all vk ∈ V′, LB(vk) ∈ [0, 1] is defined by the recursion on k as follows.

Initial condition: For every vs ∈ S0,

LB(vs) = 1. (21)

Recursion: For every vk ∈ V′ \ S0,

LB(vk) = Σ_{i=1}^{m*} B′_{ui vk} LB(ui)(1 − Σ_{j=1}^{i−1} B′_{uj vk}), (22)

where N−(vk) = {u1, . . . , um} is the ordered set of in-neighbors of vk in IC(G′, B′) and m* = max{m′ ≤ m : Σ_{j=1}^{m′−1} B′_{uj vk} ≤ 1}.

Remark.
Since the i-th summand in Equation (22) can reuse Σ_{j=1}^{i−2} B′_{uj vk}, which was already computed for the (i − 1)-th summand, to compute Σ_{j=1}^{i−1} B′_{uj vk}, the summation takes at most O(deg(vk)). Note that Equation (22) is formulated so that the computational complexity is minimized. However, we may use other formulations that also achieve the lower bound property.

Theorem 5 (Nonbacktracking Lower Bound (NB-LB)). For any independent cascade model IC(G, B), a seed set S0 and its MDAS IC(G′, B′),

σ(S0) ≥ Σ_{vk∈V′} LB(vk) =: σ−(S0), (23)

where LB(vk) is obtained recursively as in Definition 12.

Next, we present the Nonbacktracking Lower Bound (NB-LB) algorithm, which efficiently computes LB(vk). At the k-th iteration, the variable in NB-LB represents the following:

M(vk) = {(LB(vj), B′_{vj vk}) : vj is an in-neighbor of vk}, the set of pairs (incoming message from an in-neighbor vj to vk, transmission probability of the edge (vj, vk)).

Algorithm 4 Nonbacktracking Lower Bound (NB-LB)
Input: directed acyclic network IC(G′, B′)
Initialize: σ− = 0
Initialize: Insert (1, 1) to M(vi) for all vi ∈ S0
for k = 1 to n do
  LB(vk) = Process Incoming Msg LB(M(vk))
  σ− += LB(vk)
  for vl ∈ N+(vk) \ S0 do
    M(vl).insert((LB(vk), B′_{vk vl}))

At the beginning, every seed node s ∈ S0 is initialized such that M(s) = {(1, 1)} in order to satisfy the initial condition, LB(s) = 1. At each k-th iteration, node vk is processed as follows. First, LB(vk) is computed as in Equation (22) by the function Process Incoming Msg LB and added to σ−. Second, vk passes the message (LB(vk), B′_{vk vl}) to each out-neighbor vl ∈ N+(vk) \ S0, and vl stores (inserts) it in M(vl). Finally, the algorithm outputs σ−, the lower bound on the influence.

Computational complexity: Obtaining an arbitrary directed acyclic subnetwork from the original network takes O(|V| + |E|). Next, the algorithm iterates through the nodes V′ = {v1, . . . , vn}. For each node vk, Process Incoming Msg LB takes O(deg(vk)) and vk sends messages to its out-neighbors in O(deg(vk)).
Hence, the worst case computational complexity is O(|V| + |E|).

4.2. Tunable Nonbacktracking Lower Bound (t NB-LB)

The first step of tunable NB-LB is the same as NB-LB, which is to order the vertex set as V′ = {v1, . . . , vn} so that d(S0, vi) ≤ d(S0, vj) for every i < j. Next, given a non-negative integer parameter t ≤ n, we obtain a t-size subnetwork IC(G[Vt], B[Vt]) and a new seed set S0 ∩ Vt, where G[Vt] is the vertex-induced subgraph induced by the set of nodes Vt = {v1, . . . , vt}, and B[Vt] is the corresponding transmission probability matrix. For each vi ∈ Vt, we compute the exact probability pt(vi) that node vi is influenced from the new seed set S0 ∩ Vt in the subnetwork IC(G[Vt], B[Vt]). Then, we start the (non-parameterized) NB-LB algorithm from k = t + 1 with the new initial condition:

for all k ≤ t, LB(vk) = pt(vk). (24)

Finally, t NB-LB computes the lower bound as Σ_{vk∈V′} LB(vk).

Algorithm 5 Tunable nonbacktracking lower bound (t NB-LB)
parameter: non-negative integer t ≤ n
Initialize: σ− = 0
for k = 1 to t do
  LB(vk) = pt(vk)
  σ− += LB(vk)
  for vi ∈ N+(vk) ∩ {vj : j > t} do
    M(vi).insert((LB(vk), B′_{vk vi}))
for k = t + 1 to n do
  LB(vk) = Process Incoming Msg LB(M(vk))
  σ− += LB(vk)
  for vi ∈ N+(vk) \ S0 do
    M(vi).insert((LB(vk), B′_{vk vi}))

For larger t, the algorithm results in a tighter bound. However, the computational complexity may increase exponentially with respect to t. To avoid such a large computational cost, the algorithm can adopt Monte Carlo simulations on the subnetwork. This modification, however, yields probabilistic lower bounds rather than theoretically guaranteed ones. Nonetheless, it can still give a significant improvement, because Monte Carlo simulations on a smaller network require less computation to stabilize the estimation.

4.3.
Illustration of Nonbacktracking Lower Bounds (NB-LB)

In Figure 5, we show the step-wise lower bound computation by NB-LB on a small network IC(G, B) defined on a bidirected graph G = (V, E), where V = {a, b, c, d}, S0 = {a}, and every edge has the same transmission probability p. For each k, Table 1 shows the values of the key variables, M(vk), LB(vk), and (LB(vk), B′_{vk vl}) for the out-neighbors vl ∈ N+(vk) \ S0, and shows the change in σ−.

Figure 5: The step-wise illustration of NB-LB on the example network.

k = 1: M(v1) = {(1, 1)}, LB(v1) = 1, N+(v1) \ S0 = {b, c}, sends (1, p) to b and c; σ− = 1.
k = 2: M(v2) = {(1, p)}, LB(v2) = p, N+(v2) \ S0 = {c}, sends (p, p) to c; σ− = 1 + p.
k = 3: M(v3) = {(1, p), (p, p)}, LB(v3) = p + p^2 − p^3, N+(v3) \ S0 = {d}, sends (p + p^2 − p^3, p) to d; σ− = 1 + 2p + p^2 − p^3.
k = 4: M(v4) = {(p + p^2 − p^3, p)}, LB(v4) = p^2 + p^3 − p^4; σ− = 1 + 2p + 2p^2 − p^4.

Table 1: The values of the key variables in NB-LB on the example network in Figure 5.

We obtain the MDAS from the network as follows. Since d(S0, a) = 0, d(S0, b) = d(S0, c) = 1, and d(S0, d) = 2, we order the vertices as {v1 = a, v2 = b, v3 = c, v4 = d} so that d(S0, vi) ≤ d(S0, vj) for every i < j. The NB-LB algorithm processes the nodes {v1 = a, v2 = b, v3 = c, v4 = d} sequentially. For example, at k = 3, node c is processed. During the step k = 1, node a sent the message (LB(a), B′_ac) to node c, and at k = 2, node b sent the message (LB(b), B′_bc) to node c. Thus, node c has the messages

M(c) = {(LB(a), B′_ac), (LB(b), B′_bc)} = {(1, p), (p, p)}.

Then, it computes LB(c) with the function Process Incoming Msg LB:

LB(c) = Process Incoming Msg LB(M(c)) = B′_ac LB(a) + B′_bc LB(b)(1 − B′_ac) = p + p^2 − p^3.

Recall that σ− = 1 + p at the end of the iteration k = 2. Thus, at k = 3, σ− = 1 + p + LB(c) = 1 + 2p + p^2 − p^3. Next, since N+(c) \ S0 = {d}, node c sends the message (LB(c), B′_cd) = (p + p^2 − p^3, p) to node d, concluding the process of the k = 3 step.
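The message passing above can be sketched in a few lines of Python. This is a minimal sketch of Algorithm 4 under the recursion of Definition 12, not the authors' implementation; the variable names and the encoding of the MDAS of Figure 5 are ours.

```python
# Minimal sketch of NB-LB (Algorithm 4) with the recursion of Definition 12.
def nb_lb(order, in_edges, seeds, B):
    """order: nodes in MDAS order; in_edges[v]: ordered in-neighbors of v;
    B[(u, v)]: transmission probability of edge (u, v) in the MDAS."""
    LB = {s: 1.0 for s in seeds}          # initial condition LB(s) = 1
    for v in order:
        if v in seeds:
            continue
        lb, partial = 0.0, 0.0            # partial = sum_{j<i} B'_{u_j v}
        for u in in_edges[v]:
            if partial > 1:               # truncation index m* of Definition 12
                break
            lb += B[(u, v)] * LB[u] * (1 - partial)
            partial += B[(u, v)]
        LB[v] = lb
    return LB, sum(LB.values())

# The MDAS of Figure 5 with uniform transmission probability p = 0.5.
p = 0.5
B = {e: p for e in [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]}
LB, sigma_lb = nb_lb(["a", "b", "c", "d"],
                     {"b": ["a"], "c": ["a", "b"], "d": ["c"]}, {"a"}, B)
print(LB["c"], sigma_lb)  # 0.625 (= p + p^2 - p^3) and 2.4375
```

At p = 0.5 the run reproduces the Table 1 values: LB(c) = p + p^2 − p^3 = 0.625 and σ− = 1 + 2p + 2p^2 − p^4 = 2.4375.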
At k = 4, the algorithm computes LB(d) = p(p + p^2 − p^3) = p^2 + p^3 − p^4 and the final lower bound:

σ− = 1 + 2p + p^2 − p^3 + LB(d) = 1 + 2p + 2p^2 − p^4.

For this example network, there is another MDAS that satisfies the condition in Definition 11, with the ordered vertices {v1 = a, v2 = c, v3 = b, v4 = d}. We can also obtain a lower bound using this MDAS, which is 1 + 2p + 2p^2 − p^3. Notice that this value differs from the lower bound computed above but is still a lower bound on the expected number of influenced nodes.

Next, we compute t NB-LB on the same network with t = 3. First, for k = 1 to k = 3, the probabilities that nodes a, b, c are influenced are computed exactly:

LB(a) = 1
LB(b) = p + p^2 − p^3
LB(c) = p + p^2 − p^3.

Then, for k = 4, the algorithm computes the message M(d) = {(LB(c), B′_cd)} and, in turn, the lower bound LB(d) = B′_cd LB(c) = p(p + p^2 − p^3). Thus, the final lower bound is σ−_t = 1 + 2p + 3p^2 − p^3 − p^4.

Finally, we compare the lower bounds to the influence.

σ = 1 + 2p + 3p^2 − p^3 − p^4
σ− = 1 + 2p + 2p^2 − p^4
σ−_t = 1 + 2p + 3p^2 − p^3 − p^4

As expected, t NB-LB results in a tighter lower bound than NB-LB. In fact, σ−_t equals the influence, since node d can only be influenced through node c, whose probability of being influenced is computed exactly in t NB-LB with t = 3.

5. Experimental Results

In this section, we evaluate NB-UB and NB-LB in independent cascade models on a variety of classical synthetic networks.

Network Generation. We consider 4 classical random graph models with the following parameters: Erdős–Rényi random graphs ER(n = 1000, p = 0.003), scale-free networks SF(n = 1000, α = 2.5), random regular graphs Reg(n = 1000, d = 3), and random tree graphs with power-law degree distributions T(n = 1000, α = 3). For each graph model, we generate 100 networks IC(G, p·A) and a seed as follows.
The bidirected graph G is defined on the largest connected component of a graph drawn from the graph model, the seed node s is a randomly selected vertex, and A is the adjacency matrix of G. The corresponding IC model has the same transmission probability p for every edge.

Evaluation of Bounds. For each generated network, we compute the following quantities for each p ∈ {0.1, 0.2, . . . , 0.9}.

σmc: the estimation of the influence with 10^6 Monte Carlo simulations.
σ+: the upper bound obtained by NB-UB.
σ+_spec: the spectral upper bound of (Lemonnier et al., 2014).
σ−: the lower bound obtained by NB-LB.
σ−_prob: the probabilistic lower bound obtained by 10 Monte Carlo simulations.

To our knowledge, there has not been a tight lower bound for general networks. The lower bound introduced in (Khim et al., 2016) is efficient and tight for tree networks. However, for general networks, since it considers all paths of length k ∈ {0, 1, ..., m} for some m ∈ [n], its computational complexity is O(|V|^m), which is greater than the computational complexity of the nonbacktracking lower bound for any m > 2. Therefore, the probabilistic lower bound is chosen for the experiments. The sample size of 10 is chosen so that the cost of the MC simulations exceeds the computational complexity of the NB-LB algorithm.

Figure 6: (a) The average relative gap of the upper bounds: NB-UB and the spectral upper bound in (Lemonnier et al., 2014) for various types of networks. (b) The average relative gap of the lower bounds: NB-LB and the probabilistic lower bound computed by MC simulations for various types of networks.

In Figure 6, we compare the average relative gap of the bounds for every network model and for each transmission probability, where the true value is assumed to be σmc.
For example, the average relative gap of NB-UB for 100 Erdős–Rényi networks {Ni}_{i=1}^{100} with transmission probability p is computed by

(1/100) Σ_{i∈[100]} (σ+[Ni] − σmc[Ni]) / σmc[Ni],

where σ+[Ni] and σmc[Ni] denote NB-UB and the MC estimation, respectively, for the network Ni.

Results. Figure 6 shows that NB-UB outperforms the upper bound in (Lemonnier et al., 2014) for the Erdős–Rényi and random bidirected 3-regular networks, and performs comparably for the scale-free networks. Also, NB-LB gives tighter bounds than the MC bounds on the Erdős–Rényi, scale-free, and random regular networks when the transmission probability is small, p < 0.4. NB-UB and NB-LB compute the exact influence for the tree networks, since both algorithms avoid backtracking walks. Next, we show the bounds on exemplary networks.

5.1. Upper Bounds

Figure 7: The figure compares various upper bounds on the influence in the 3-regular network in Section 5.1. The MC upper bounds are computed with various simulation sizes and shown with the data points indicated with MC(N), where N is the number of simulations. The spectral upper bound in (Lemonnier et al., 2014) is shown in the red line, and NB-UB is shown in the green line.

In order to illustrate the typical behavior of the bounds, we have chosen the network in Figure 7 as follows. First, we generate 100 random 3-regular graphs G with 1000 nodes and assign a random seed s. Then, the corresponding IC model is defined as IC(G, B = p·A). For each network, we compute NB-UB and the MC estimation. Then, we compute a score for each network, defined as the sum of the squared differences between the upper bounds and the MC estimations over the transmission probabilities p ∈ {0.1, 0.2, . . . , 0.9}.
Finally, the graph whose score is the median of all 100 scores is chosen for Figure 7. In Figure 7, we compare 1) the upper bound introduced in (Lemonnier et al., 2014) and 2) the probabilistic upper bounds obtained by Monte Carlo simulations with 99% confidence level, to NB-UB. The MC upper bounds are computed with the various sample sizes N ∈ {5, 10, 30, 300, 3000}. It is evident from the figure that a larger sample size provides a tighter probabilistic upper bound. NB-UB outperforms the bound of (Lemonnier et al., 2014) and the probabilistic MC bounds when the transmission probability is relatively small. Further, it shows a similar trend as the MC simulations with large sample sizes.

Next, we show r-nonbacktracking upper bounds on a small world graph G = SW(n, k, q), where n = 1000 is the number of nodes in the network, k = 4 is the number of neighbors of each node, and q = 0.01 is the re-wiring probability. The IC model on the small world graph G is defined as IC(G, B = p·A), where A is the adjacency matrix of G, p is the transmission probability for every edge, and the seed node s is chosen arbitrarily.

Figure 8: The figure compares r-nonbacktracking upper bounds with various values of r in a small world network. The MC estimation with simulation size 10^6 is computed for comparison.

In Figure 8, we compare the r-nonbacktracking upper bounds with r ∈ {2, 3, 4, 5} to the MC estimation. For p < 0.32, all upper bounds are very close to the influence, and for p > 0.42 the upper bounds with r ≤ 5 become close to the maximum value, 1000. Therefore, we show the bounds for p ∈ [0.32, 0.42]. Note that when r = 2, the r-nonbacktracking upper bound (r NB-UB) is the nonbacktracking upper bound (NB-UB). By avoiding r = 3 backtracking walks (triangles) in addition to backtracking walks, the upper bound is significantly improved. The same applies to the r = 4, 5 nonbacktracking bounds. As r increases, the bounds get tighter and closer to the estimated influence.
It is worth noting that while r NB-UB yields a deterministic, as opposed to probabilistic, upper bound, its computational complexity for r ≥ 4 exceeds the complexity of the MC approximation.

5.2. Lower Bounds

Figure 9: The figure shows lower bounds on the influence of a scale-free network in Section 5.2. The probabilistic lower bounds are obtained by MC with various simulation sizes. The data points indicated with MC(N) are obtained by N simulations. NB-LB is shown in the green line.

In this section, we show lower bounds on an exemplary network. We adopt a selection process similar to that described in Section 5.1, but with scale-free networks with 3000 nodes and α = 2.5. We compare the probabilistic lower bounds obtained by MC with 99% confidence level to NB-LB. The bounds from Monte Carlo simulations are computed with various sample sizes N ∈ {5, 12, 30, 300, 3000}, which correspond to a constant, log(|V|), 0.01|V|, 0.1|V|, and |V|. NB-LB outperforms the probabilistic bounds by MC with small sample sizes. Recall that the computational complexity of NB-LB is O(|V| + |E|), which is the computational complexity of a constant number of Monte Carlo simulations. Figure 9 shows that NB-LB is tighter than the probabilistic lower bounds of the same computational complexity.

6. Conclusion

We propose both upper and lower bounds on the influence in independent cascade models and provide algorithms to efficiently compute the bounds. The proposed upper bound is monotone and submodular. We extend the results by proposing tunable bounds, which can adjust the trade-off between efficiency and accuracy, and by using r-nonbacktracking walks instead of (2-)nonbacktracking walks. Finally, the tightness and the performance of the bounds are shown with the experimental results.
Acknowledgments

The authors thank Colin Sandon for helpful discussions. This research was partly supported by the NSF CAREER Award CCF-1552131, the ARO grant W911NF-16-1-0051, and the NSF Center for Science of Information (CSoI) Grant CCF-0939370.

Appendix A. Proof of Theorem 2

We start by defining the following events for the independent cascade model IC(G, B), where G = (V, E) and |V| = n, and a seed set S0 ⊆ V.

Definition 13. For any u, v ∈ V, l ∈ {0, . . . , n − 1}, and S ⊆ V, we define

A(v) = {v is influenced}
Al(v) = ∪_{P∈Pl(S0→v)} {P is live}
Al(u → v) = ∪_{P∈Pl(S0→u), P→v} {P is live and edge (u, v) is live}
A_{l,S}(v) = ∪_{P∈Pl(S0→v): P does not include any w∈S} {P is live}.

In other words, Al(v) is the event that node v is influenced by live paths of length l, Al(u → v) is the event that v is influenced by node u with live paths of length l + 1, i.e., there exists a live path of length l + 1 from a seed to v that ends with the edge (u, v), and A_{l,S}(v) is the event that node v is influenced by length-l live paths which do not include any node in S. Note that for any pair of events (A, B), we use the notation AB for A ∩ B.

Lemma 1. For any v ∈ V,

p(v) ≤ 1 − Π_{l=0}^{n−1} (1 − pl(v)).

For any v ∈ V and l ∈ {0, . . . , n − 1},

p_{l+1}(v) ≤ 1 − Π_{u∈N−(v)} (1 − pl(u → v)).

Proof. Recall that p(v) = P(A(v)), pl(v) = P(Al(v)), and pl(u → v) = P(Al(u → v)).

p(v) = P(∪_{l=0}^{n−1} Al(v)) = 1 − P(∩_{l=0}^{n−1} Al(v)^C) ≤ 1 − Π_{l=0}^{n−1} P(Al(v)^C) (25)
     = 1 − Π_{l=0}^{n−1} (1 − pl(v)).

Equation (25) follows from the positive correlation among the events Al(v)^C, which can be proved by the FKG inequality. Similarly,

p_{l+1}(v) = P(∪_{u∈N−(v)} Al(u → v)) = 1 − P(∩_{u∈N−(v)} Al(u → v)^C) ≤ 1 − Π_{u∈N−(v)} P(Al(u → v)^C) = 1 − Π_{u∈N−(v)} (1 − pl(u → v)). (26)

Theorem 2. For any independent cascade model IC(G, B) and a seed set S0 ⊆ V,

σ(S0) ≤ Σ_{v∈V} (1 − Π_{l=0}^{n−1} (1 − UBl(v))) =: σ+(S0), (27)

where UBl(v) is obtained recursively as in Definition 7.

Proof. We provide a proof by induction. The initial condition, for l = 0, can be easily checked.
For every s ∈ S0, s+ ∈ N+(s), u ∈ V \ S0, and v ∈ N+(u),

p0(s) = 1 = UB0(s)
p0(s → s+) = B_{ss+} = UB0(s → s+)
p0(u) = 0 = UB0(u)
p0(u → v) = 0 = UB0(u → v).

For each l ≤ L, assume that pl(v) ≤ UBl(v) and pl(u → v) ≤ UBl(u → v) for all u, v ∈ V. Since pl(s) = pl(s → s+) = pl(s− → s) = 0 for every l ≥ 1, s ∈ S0, s+ ∈ N+(s), and s− ∈ N−(s), it is sufficient to show p_{L+1}(v) ≤ UB_{L+1}(v) and p_{L+1}(u → v) ≤ UB_{L+1}(u → v) for all u ∈ V \ S0 and v ∈ N+(u). For any v ∈ V \ S0,

p_{L+1}(v) = P(∪_{u∈N−(v)} Euv A_{L,{v}}(u)),

where Euv denotes the event that edge (u, v) is live, i.e., P(Euv) = Buv. Thus,

p_{L+1}(v) = 1 − P(∩_{u∈N−(v)} (Euv A_{L,{v}}(u))^C)
           ≤ 1 − Π_{u∈N−(v)} (1 − P(Euv A_{L,{v}}(u))) (28)
           ≤ 1 − Π_{u∈N−(v)} (1 − p_L(u → v))
           ≤ 1 − Π_{u∈N−(v)} (1 − UB_L(u → v)) = UB_{L+1}(v), (29)

where Equation (28) is obtained by the positive correlation among the events Euv A_{L,{v}}(u), and Equation (29) comes from the induction hypothesis.

For any v ∈ V \ S0 and w ∈ N+(v),

p_{L+1}(v → w) = P(Evw A_{L+1,{w}}(v)) = Bvw P(A_{L+1,{w}}(v)). (30)

Equation (30) follows from the independence between the events Evw and A_{L+1,{w}}(v). If w ∈ N−(v),

p_{L+1}(v → w) = Bvw P(∪_{u∈N−(v)\{w}} Euv A_{L,{v,w}}(u))
              ≤ Bvw (1 − Π_{u∈N−(v)\{w}} (1 − P(Euv A_{L,{v,w}}(u))))
              ≤ Bvw (1 − Π_{u∈N−(v)\{w}} (1 − p_L(u → v))) (31)
              ≤ Bvw (1 − Π_{u∈N−(v)\{w}} (1 − UB_L(u → v))).

Equation (31) holds, since the two events satisfy Euv A_{L,{v,w}}(u) ⊆ Euv A_{L,{v}}(u). Recall that, if w ∈ N−(v),

UB_{L+1}(v → w) = Bvw (1 − (1 − UB_{L+1}(v))/(1 − UB_L(w → v))) = Bvw (1 − Π_{u∈N−(v)\{w}} (1 − UB_L(u → v))).

Thus, p_{L+1}(v → w) ≤ UB_{L+1}(v → w) for all w ∈ N+(v) ∩ N−(v). If w ∉ N−(v),

p_{L+1}(v → w) = Bvw P(∪_{u∈N−(v)} Euv A_{L,{v,w}}(u)) ≤ Bvw (1 − Π_{u∈N−(v)} (1 − UB_L(u → v))) = Bvw UB_{L+1}(v) = UB_{L+1}(v → w).

Hence, p_{L+1}(v → w) ≤ UB_{L+1}(v → w) for all w ∈ N+(v), concluding the induction. Finally, by Lemma 1,

σ(S0) = Σ_{v∈V} p(v) ≤ Σ_{v∈V} (1 − Π_{l=0}^{n−1} (1 − pl(v))) ≤ Σ_{v∈V} (1 − Π_{l=0}^{n−1} (1 − UBl(v))) = σ+(S0).

Appendix B. Proof of Theorem 3

Theorem 3. For any independent cascade model IC(G, B), the nonbacktracking upper bound σ+ : 2^V → [0, |V|] is monotone and submodular.
Recall that

σ+(S0) = Σ_{v∈V} (1 − Π_{l∈[n−1]} (1 − UBl(v)))
UBl(u) = 1 − Π_{w∈N−(u)} (1 − UB_{l−1}(w → u))
UBl(u → v) = Buv (1 − Π_{w∈N−(u)\{v}} (1 − UB_{l−1}(w → u))).

Since a summation of submodular functions is also submodular, it remains to prove that (1 − Π_{l∈[n−1]}(1 − UBl(v))) is submodular. Note that Be and N−(v) for any e ∈ E and v ∈ V are independent of the seed set, and that UB0(u → v) for any u, v ∈ V is constant, so it is equivalent to prove the following statement.

Claim 1. If fi(S) ∈ [0, 1] for all i ∈ [m] is monotone and submodular, then 1 − Π_{i∈[m]}(1 − fi(S)) is also monotone and submodular.

Proof. We provide a proof by induction. The initial condition holds for k = 1. Assume that 1 − Π_{i∈[k−1]}(1 − fi(S)) is monotone (non-decreasing) and submodular for some k ≤ m. Let g_{k−1}(A) = 1 − Π_{i∈[k−1]}(1 − fi(A)). Then, for any A ⊆ B ⊆ V and s ∈ V,

g_{k−1}(A ∪ {s}) − g_{k−1}(A) ≥ g_{k−1}(B ∪ {s}) − g_{k−1}(B) (32)
f_k(A ∪ {s}) − f_k(A) ≥ f_k(B ∪ {s}) − f_k(B). (33)

By definition,

g_k(A) = 1 − (1 − g_{k−1}(A))(1 − f_k(A)). (34)

g_k is monotone, as

g_k(A ∪ {s}) = 1 − (1 − g_{k−1}(A ∪ {s}))(1 − f_k(A ∪ {s})) ≥ 1 − (1 − g_{k−1}(A))(1 − f_k(A)) = g_k(A). (35)

Next, to prove submodularity, we want to show:

(1 − g_{k−1}(A))(1 − f_k(A)) − (1 − g_{k−1}(A ∪ {s}))(1 − f_k(A ∪ {s})) ≥ (1 − g_{k−1}(B))(1 − f_k(B)) − (1 − g_{k−1}(B ∪ {s}))(1 − f_k(B ∪ {s})). (36)

To simplify, for any i ∈ [m] and S ⊆ V, let ḡ_i(S) = 1 − g_i(S) and f̄_i(S) = 1 − f_i(S). Then, Equations (32) and (33) become

ḡ_{k−1}(A) − ḡ_{k−1}(A ∪ {s}) ≥ ḡ_{k−1}(B) − ḡ_{k−1}(B ∪ {s})
f̄_k(A) − f̄_k(A ∪ {s}) ≥ f̄_k(B) − f̄_k(B ∪ {s}),

and Equation (36) is equivalent to

ḡ_{k−1}(A) f̄_k(A) − ḡ_{k−1}(A ∪ {s}) f̄_k(A ∪ {s}) ≥ ḡ_{k−1}(B) f̄_k(B) − ḡ_{k−1}(B ∪ {s}) f̄_k(B ∪ {s}).

Let δg = ḡ_{k−1}(B) − ḡ_{k−1}(B ∪ {s}) and δf = f̄_k(B) − f̄_k(B ∪ {s}). Note that, as f and g are non-decreasing, δg ≥ 0 and δf ≥ 0. Then,

ḡ_{k−1}(A) f̄_k(A) − ḡ_{k−1}(A ∪ {s}) f̄_k(A ∪ {s})
≥ (ḡ_{k−1}(A ∪ {s}) + δg)(f̄_k(A ∪ {s}) + δf) − ḡ_{k−1}(A ∪ {s}) f̄_k(A ∪ {s})
= δf δg + δg f̄_k(A ∪ {s}) + δf ḡ_{k−1}(A ∪ {s})
≥ δf δg + δg f̄_k(B ∪ {s}) + δf ḡ_{k−1}(B ∪ {s})
= ḡ_{k−1}(B) f̄_k(B) − ḡ_{k−1}(B ∪ {s}) f̄_k(B ∪ {s}).

Appendix C.
Proof of Theorem 4

To prove Theorem 4, we first introduce the following events in IC(G, B), where G = (V, E) and |V| = n.

Definition 14. For any v ∈ V and l ∈ {0, . . . , n − 1}, we define

A(v) = {v is influenced}
A(P) = {P is live}
Al(v) = ∪_{P∈Pl(S0→v)} {P is live} = ∪_{P∈Pl(S0→v)} A(P).

For any 2 ≤ r ≤ n and P ∈ P_{r−1}(v→), let Al(v, P) be the event that node v is influenced by a live path of length l ≥ r − 1 that does not include any node in the path P. Next, we recall Definition 9 and Theorem 4.

Definition 9. For all l ∈ {0, . . . , n − 1}, u ∈ V and P ∈ P_{r−1}(u→), UBl(u) ∈ [0, 1] and UBl(u, P) ∈ [0, 1] are defined recursively as follows.

Initial condition: For every l ≤ r − 1 and u ∈ V,

UBl(u) = f(∪_{P∈Pl(S0→u)} (P, 1)).

For every P ∈ P_{r−1}(s→) and s ∈ S0,

UB_{r−1}(s, P) = 1.

Recursion: For every l ∈ {r, . . . , n − 1}, s ∈ S0, u, x ∈ V \ S0, v ∈ N+(u) \ S0, and Q ∈ P_{r−1}(v→),

UBl(s) = 0, UBl(s, P) = 0 for any path P
UBl(v, Q) = 1 − Π_{P→Q} (1 − Buv UB_{l−1}(u, P))
UBl(x) = f(∪_{P∈{P∈P_{r−1}(y→x) : y∈V\S0}} (P, UBl(y, P))).

Theorem 4. For any independent cascade model IC(G, B), a seed set S0 ⊆ V and an integer r ≥ 2,

σ(S0) ≤ Σ_{v∈V} (1 − Π_{l=0}^{n−1} (1 − UBl(v))) =: σ+_r(S0) ≤ σ+(S0), (37)

where UBl(v) is obtained recursively as in Definition 9.

Proof. First, let p(v) = P(A(v)), p(P) = P(A(P)), pl(v) = P(Al(v)), and pl(v, P) = P(Al(v, P)). By Lemma 1, it is sufficient to show pl(v) ≤ UBl(v) for all v ∈ V and l ∈ {0, 1, . . . , n − 1}. We provide a proof by induction on l. The initial condition can be easily checked. For every l ≤ r − 1, u ∈ V, v ∈ N+(u) and s ∈ S0,

pl(u) = P(∪_{P∈Pl(S0→u)} A(P)) ≤ f(∪_{P∈Pl(S0→u)} (P, 1)) = UBl(u).

For every P ∈ P_{r−1}(s→) and s ∈ S0, p_{r−1}(s, P) = 1 = UB_{r−1}(s, P). Next, for each l ≤ L, assume that pl(v) ≤ UBl(v) and pl(v, P) ≤ UBl(v, P) for every v ∈ V and P ∈ P_{r−1}(v→). For every l ≥ r, s ∈ S0 and P ∈ P_{r−1}(s→), pl(s) = 0 and pl(s, P) = 0, since a seed cannot be influenced again by other seeds and cannot influence other nodes after the first step l = 0.
Therefore, to conclude the proof, it is sufficient to show p_{L+1}(v) ≤ UB_{L+1}(v) and p_{L+1}(v, P) ≤ UB_{L+1}(v, P) for every v ∈ V \ S0 and P ∈ P_{r−1}(v→).

For any u, v ∈ V \ S0, P ∈ P_{r−1}(u→), and Q ∈ P_{r−1}(v→) such that P → Q, let P = (u = u1, . . . , uk) and Q = (v = v1, . . . , vk). Recall that if P → Q, then (u, v, u3 = v2, . . . , uk = v_{k−1}, vk) is a path of length r. Then, for any P → Q, AL(u, P) contains all events in which an in-neighbor u of node v is influenced by a live path that does not contain any node in the path P = (u, v, v2, . . . , v_{k−1}). Notice that the event A_{L+1}(v, Q) is the union of the events that an in-neighbor u of node v is influenced by a live path that does not contain any node in the path Q = (v, v2, . . . , vk) and the edge (u, v) is live. In other words,

A_{L+1}(v, Q) = ∪_{u∈N−(v)} Euv A_L(u, P_Q),

where P_Q = (u, v, u3 = v2, . . . , uk = v_{k−1}, vk) and Euv is the event that edge (u, v) is live. Thus, A_{L+1}(v, Q) ⊆ ∪_{P→Q} Euv A_L(u, P), which implies p_{L+1}(v, Q) ≤ P(∪_{P→Q} Euv A_L(u, P)). By the positive correlation among the events Euv A_L(u, P),

p_{L+1}(v, Q) ≤ 1 − Π_{P→Q} (1 − Buv P(A_L(u, P))) ≤ 1 − Π_{P→Q} (1 − Buv UB_L(u, P)) = UB_{L+1}(v, Q), (38)

where Equation (38) follows from the induction hypothesis.

For any x ∈ V \ S0,

p_{L+1}(x) = P(∪_{P∈P_{L+1}(S0→x)} A(P))
          ≤ P(∪_{P∈{P∈P_{r−1}(y→x) : y∈V\S0}} A_{L+1}(y, P) A(P)) (39)
          ≤ f(∪_{P∈{P∈P_{r−1}(y→x) : y∈V\S0}} (P, p_{L+1}(y, P))) (40)
          ≤ f(∪_{P∈{P∈P_{r−1}(y→x) : y∈V\S0}} (P, UB_{L+1}(y, P))) (41)
          = UB_{L+1}(x).

Equation (39) follows from ∪_{P∈P_{L+1}(S0→x)} A(P) ⊆ ∪_{P∈{P∈P_{r−1}(y→x) : y∈V\S0}} A_{L+1}(y, P), since the right hand side also includes events in which y is influenced by a walk rather than a path. Equation (40) holds because the function f computes an upper bound rather than the exact probability of the input events, and Equation (41) comes from the induction hypothesis. Therefore, p_{L+1}(v, Q) ≤ UB_{L+1}(v, Q) and p_{L+1}(x) ≤ UB_{L+1}(x), concluding the proof by induction. Then, by Lemma 1,

σ(S0) ≤ Σ_{v∈V} (1 − Π_{l=0}^{n−1} (1 − pl(v))) ≤ Σ_{v∈V} (1 − Π_{l=0}^{n−1} (1 − UBl(v))) = σ+_r(S0).

Appendix D. Proof of Theorem 5

Theorem 5.
For any independent cascade model IC(G, B), a seed set $S_0$, and their MDAS IC(G', B'),
$$\sigma(S_0) \ge \sum_{v_k \in V'} \mathrm{LB}(v_k) =: \sigma^-(S_0), \qquad (42)$$
where $\mathrm{LB}(v_k)$ is obtained recursively as in Definition 12.

Proof. We proceed by induction. For any $v_k \in V'$, let $A(v_k)$ be the event that node $v_k$ is influenced in the MDAS IC(G', B'), and for every edge $(v_j, v_k)$, let $E_{v_j v_k}$ be the event that edge $(v_j, v_k)$ is live, i.e., $\mathbb{P}(E_{v_j v_k}) = B'_{v_j v_k}$. Recall that $p(v_k) = \mathbb{P}(A(v_k))$.

The initial condition $k = 1$ holds, since $p(v_1) = 1 \ge \mathrm{LB}(v_1) = 1$ ($v_1$ is a seed). Now, for every $k \le K$, assume $p(v_k) \ge \mathrm{LB}(v_k)$. Then, for node $v_{K+1}$,
$$p(v_{K+1}) = \mathbb{P}\Bigl(\bigcup_{v_j \in N^-(v_{K+1})} E_{v_j v_{K+1}} \cap A(v_j)\Bigr).$$
We re-label the vertices in $N^-(v_{K+1}) = \{u_1, \dots, u_{m(K+1)}\}$, where $m(K+1) = \text{in-deg}(v_{K+1})$, and let $Q_i^{K+1} = B'_{u_i v_{K+1}}$. Then, for any integer $m \le m(K+1)$,
$$\begin{aligned}
p(v_{K+1}) &= \mathbb{P}\Bigl(\bigcup_{i=1}^{m(K+1)} E_{u_i v_{K+1}} \cap A(u_i)\Bigr) \ge \mathbb{P}\Bigl(\bigcup_{i=1}^{m} E_{u_i v_{K+1}} \cap A(u_i)\Bigr) \\
&\ge \sum_{i=1}^{m} \mathbb{P}\bigl(E_{u_i v_{K+1}} \cap A(u_i)\bigr) - \sum_{i=1}^{m} \sum_{j=1}^{i-1} \mathbb{P}\bigl(E_{u_i v_{K+1}} \cap A(u_i) \cap E_{u_j v_{K+1}} \cap A(u_j)\bigr) &(43) \\
&= \sum_{i=1}^{m} Q_i^{K+1} \mathbb{P}(A(u_i)) - \sum_{i=1}^{m} \sum_{j=1}^{i-1} Q_i^{K+1} Q_j^{K+1} \mathbb{P}\bigl(A(u_i) \cap A(u_j)\bigr) &(44) \\
&\ge \sum_{i=1}^{m} Q_i^{K+1} \mathbb{P}(A(u_i)) \Bigl(1 - \sum_{j=1}^{i-1} Q_j^{K+1}\Bigr). &(45)
\end{aligned}$$
Equation (43) follows from the principle of inclusion and exclusion. Equation (44) results from the independence between the event that an edge ending at $v_{K+1}$ is live and the event that a node $u_i$ is influenced. Equation (45) holds since $\mathbb{P}(A(u_i)) \ge \mathbb{P}\bigl(A(u_i) \cap A(u_j)\bigr)$.

Now, define $m^* = \max\{m \le m(K+1) : \sum_{j=1}^{m-1} Q_j^{K+1} \le 1\}$. Then,
$$\sum_{i=1}^{m^*} Q_i^{K+1} \mathbb{P}(A(u_i)) \Bigl(1 - \sum_{j=1}^{i-1} Q_j^{K+1}\Bigr) \ge \sum_{i=1}^{m^*} Q_i^{K+1} \mathrm{LB}(u_i) \Bigl(1 - \sum_{j=1}^{i-1} Q_j^{K+1}\Bigr) = \mathrm{LB}(v_{K+1}). \qquad (46)$$
Equation (46) follows since $1 - \sum_{j=1}^{i-1} Q_j^{K+1} \ge 0$ for all $i \le m^*$ by the definition of $m^*$. Thus, $p(v_i) \ge \mathrm{LB}(v_i)$ for all $v_i \in V'$, concluding the induction proof. Finally,
$$\sigma(S_0) \ge \sum_{i=1}^{|V'|} p(v_i) \ge \sum_{i=1}^{|V'|} \mathrm{LB}(v_i) = \sigma^-(S_0). \qquad (47)$$
Equation (47) holds since $\sum_{i=1}^{|V'|} p(v_i)$ equals the influence of the subnetwork IC(G', B').

References

Emmanuel Abbe and Colin Sandon. Detection in the stochastic block model with multiple clusters: Proof of the achievability conjectures, acyclic BP, and the information-computation gap.
arXiv preprint arXiv:1512.09080, 2015.

Emmanuel Abbe, Sanjeev Kulkarni, and Eun Jee Lee. Nonbacktracking bounds on the influence in independent cascade models. In Advances in Neural Information Processing Systems 30, pages 1407–1416, 2017.

Charles Bordenave, Marc Lelarge, and Laurent Massoulié. Non-backtracking spectrum of random graphs: Community detection and non-regular Ramanujan graphs. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 1347–1357. IEEE, 2015.

Wei Chen and Shang-Hua Teng. Interplay between social influence and network centrality: A comparative study on Shapley centrality and single-node-influence centrality. In Proceedings of the 26th International Conference on World Wide Web, pages 967–976. International World Wide Web Conferences Steering Committee, 2017.

Wei Chen, Yajun Wang, and Siyu Yang. Efficient influence maximization in social networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 199–208. ACM, 2009.

Moez Draief, Ayalvadi Ganesh, and Laurent Massoulié. Thresholds for virus spread on networks. In Proceedings of the 1st International Conference on Performance Evaluation Methodologies and Tools, page 51. ACM, 2006.

Cees M. Fortuin, Pieter W. Kasteleyn, and Jean Ginibre. Correlation inequalities on some partially ordered sets. Communications in Mathematical Physics, 22(2):89–103, 1971.

Mark Granovetter. Threshold models of collective behavior. American Journal of Sociology, pages 1420–1443, 1978.

Brian Karrer and Mark E. J. Newman. Message passing approach for general epidemic models. Physical Review E, 82(1):016101, 2010.

Brian Karrer, M. E. J. Newman, and Lenka Zdeborová. Percolation on sparse networks. Physical Review Letters, 113(20):208702, 2014.

David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network.
In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 137–146. ACM, 2003.

Abdelmajid Khelil, Christian Becker, Jing Tian, and Kurt Rothermel. An epidemic model for information diffusion in MANETs. In Proceedings of the 5th ACM International Workshop on Modeling Analysis and Simulation of Wireless and Mobile Systems, pages 54–60. ACM, 2002.

Justin T. Khim, Varun Jog, and Po-Ling Loh. Computing and maximizing influence in linear threshold and triggering models. In Advances in Neural Information Processing Systems, pages 4538–4546, 2016.

Florent Krzakala, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, and Pan Zhang. Spectral redemption in clustering sparse networks. Proceedings of the National Academy of Sciences, 110(52):20935–20940, 2013.

Eun Jee Lee, S. Kamath, E. Abbe, and S. R. Kulkarni. Spectral bounds for independent cascade model with sensitive edges. In 2016 Annual Conference on Information Science and Systems (CISS), pages 649–653, March 2016. doi: 10.1109/CISS.2016.7460579.

Rémi Lemonnier, Kevin Scaman, and Nicolas Vayatis. Tight bounds for influence in diffusion networks and application to bond percolation and epidemiology. In Advances in Neural Information Processing Systems, pages 846–854, 2014.

Jure Leskovec, Lada A. Adamic, and Bernardo A. Huberman. The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1):5, 2007.

Dunia Lopez-Pintado and Duncan J. Watts. Social influence, binary decisions and collective dynamics. Rationality and Society, 20(4):399–443, 2008.

Boris Shulgin, Lewi Stone, and Zvia Agur. Pulse vaccination strategy in the SIR epidemic model. Bulletin of Mathematical Biology, 60(6):1123–1148, 1998.

Chi Wang, Wei Chen, and Yajun Wang. Scalable influence maximization for independent cascade model in large-scale social networks. Data Mining and Knowledge Discovery, 25(3):545–576, 2012.

Duncan J. Watts.
A simple model of global cascades on random networks. Proceedings of the National Academy of Sciences, 99(9):5766–5771, 2002.

Jiang Yang and Scott Counts. Predicting the speed, scale, and range of information diffusion in Twitter. 2010.