# Model-Independent Online Learning for Influence Maximization

Sharan Vaswani 1, Branislav Kveton 2, Zheng Wen 2, Mohammad Ghavamzadeh 3, Laks V.S. Lakshmanan 1, Mark Schmidt 1

1 University of British Columbia, 2 Adobe Research, 3 DeepMind (the work was done when the author was with Adobe Research). Correspondence to: Sharan Vaswani.

Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. Copyright 2017 by the author(s).

**Abstract.** We consider influence maximization (IM) in social networks, which is the problem of maximizing the number of users who become aware of a product by selecting a set of seed users to expose the product to. While prior work assumes a known model of information diffusion, we propose a novel parametrization that not only makes our framework agnostic to the underlying diffusion model, but is also statistically efficient to learn from data. We give a corresponding monotone, submodular surrogate function, and show that it is a good approximation to the original IM objective. We also consider the case of a new marketer looking to exploit an existing social network, while simultaneously learning the factors governing information propagation. For this, we propose a pairwise-influence semi-bandit feedback model and develop a LinUCB-based bandit algorithm. Our model-independent analysis shows that our regret bound has a better dependence on the size of the network than previous work. Experimental evaluation suggests that our framework is robust to the underlying diffusion model and can efficiently learn a near-optimal solution.

## 1. Introduction

The aim of viral marketing is to spread awareness about a specific product via word-of-mouth information propagation over a social network. More precisely, marketers (agents) aim to select a fixed number of influential users (called seeds) and provide them with free products or discounts. They assume that these users will influence their neighbours and, transitively, other users in the social network to adopt the product. Information thus propagates across the network as more users adopt or become aware of the product. The marketer has a budget on the number of free products and must choose seeds so as to maximize the influence spread, which is the expected number of users who become aware of the product. This problem is referred to as influence maximization (IM).

Existing solutions to the IM problem require as input the underlying diffusion model, which describes how information propagates through the network. The IM problem has been studied under various probabilistic diffusion models, such as the independent cascade (IC) and linear threshold (LT) models (Kempe et al., 2003). Under these common models, there has been substantial work on developing efficient heuristics and approximation algorithms (Chen et al., 2009; Leskovec et al., 2007; Goyal et al., 2011a;b; Tang et al., 2014; 2015). Unfortunately, knowledge of the underlying diffusion model and its parameters is essential for the existing IM algorithms to perform well. For example, Du et al. (2014) empirically showed that misspecification of the diffusion model can lead to choosing bad seeds and, consequently, to a low spread. In practice, it is not clear how to choose from amongst the increasing number of plausible diffusion models (Kempe et al., 2003; Gomez Rodriguez et al., 2012; Li et al., 2013).
Even if we are able to choose a diffusion model according to some prior information, the number of parameters for these models scales with the size of the network (for example, it is equal to the number of edges for both the IC and LT models), and it is not clear how to set them. Goyal et al. (2011a) showed that, even when assuming the IC or LT model, correct knowledge of the model parameters is critical to choosing good seeds that lead to a large spread. Some papers try to learn these parameters from past propagation data (Saito et al., 2008; Goyal et al., 2010; Netrapalli & Sanghavi, 2012). In practice, however, such data is hard to obtain, and the large number of parameters makes this learning challenging.

To overcome these difficulties, we propose a novel parametrization for the IM problem in terms of pairwise reachability probabilities (Section 2). This parametrization depends only on the state of the network after the information diffusion has taken place. Since it does not depend on how information diffuses, it is agnostic to the underlying diffusion model. To select seeds based on these reachability probabilities, we propose a monotone and submodular surrogate objective function based on the notion of maximum reachability (Section 3). Our surrogate function can be optimized efficiently and is a good approximation to the IM objective; we theoretically bound the quality of this approximation. Our parametrization may be of independent interest to the IM community.

Next, we consider learning how to choose good seeds in an online setting. Specifically, we focus on the case of a new marketer looking to exploit an existing network to market their product. They need to choose a good seed set, while simultaneously learning the factors affecting information propagation. This motivates the learning framework of IM semi-bandits (Vaswani et al., 2015; Chen et al., 2016; Wen et al., 2017). In these works, the marketer performs IM over multiple rounds and learns about the factors governing the diffusion on the fly. Each round corresponds to an IM attempt for the same or similar products, and each attempt incurs a loss in the influence spread (measured in terms of cumulative regret) because of the lack of knowledge about the diffusion process. The aim is to minimize the cumulative regret incurred across multiple such rounds. This leads to the classic exploration-exploitation trade-off, where the marketer must choose seeds that either improve their knowledge about the diffusion process (exploration) or lead to a large expected spread (exploitation). Note that all previous works on IM semi-bandits assume the IC model.

We propose a novel semi-bandit feedback model based on pairwise influence (Section 4). Our feedback model is weaker than the edge-level feedback proposed in (Chen et al., 2016; Wen et al., 2017). Under this feedback, we formulate the IM semi-bandit as a linear bandit problem and propose a scalable LinUCB-based algorithm (Section 5). We bound the cumulative regret of this algorithm (Section 6) and show that our regret bound has the optimal dependence on the time horizon, is linear in the cardinality of the seed set, and, as compared to the previous literature, has a better dependence on the size of the network. In Section 7, we describe how to construct features based on the graph Laplacian eigenbasis and describe a practical implementation of our algorithm.
Finally, in Section 8, we empirically evaluate our proposed algorithm on a real-world network and show that it is statistically efficient and robust to the underlying diffusion model.

## 2. Influence Maximization

The IM problem is characterized by the triple $(G, \mathcal{C}, D)$, where $G$ is a directed graph encoding the topology of the social network, $\mathcal{C}$ is the collection of feasible seed sets, and $D$ is the underlying diffusion model. Specifically, $G = (V, E)$, where $V = \{1, 2, \ldots, n\}$ and $E$ are the node and edge sets of $G$, with cardinalities $n = |V|$ and $m = |E|$, respectively. The collection of feasible seed sets $\mathcal{C}$ is determined by a cardinality constraint on the sets and possibly some combinatorial constraints (e.g., matroid constraints) that rule out some subsets of $V$. This implies that $\mathcal{C} \subseteq \{S \subseteq V : |S| \le K\}$ for some $K \le n$.

The diffusion model $D$ specifies the stochastic process under which influence is propagated across the social network once a seed set $S \in \mathcal{C}$ is selected. Without loss of generality, we assume that all stochasticity in $D$ is encoded in a random vector $\mathbf{w}$, referred to as the diffusion random vector. Note that throughout this paper, we denote vectors in bold case. We assume that each diffusion has a corresponding $\mathbf{w}$, sampled independently from an underlying probability distribution $\mathbb{P}$ specific to the diffusion model. For the widely-used IC and LT models, $\mathbf{w}$ is an $m$-dimensional binary vector encoding edge activations for all the edges in $E$, and $\mathbb{P}$ is parametrized by $m$ influence probabilities, one for each edge. Once $\mathbf{w}$ is sampled, we use $D(\mathbf{w})$ to refer to the particular realization of the diffusion model $D$. Note that by definition, $D(\mathbf{w})$ is deterministic conditioned on $\mathbf{w}$.

Given the above definitions, an IM attempt can be described as follows: the marketer first chooses a seed set $S \in \mathcal{C}$, and then nature independently samples a diffusion random vector $\mathbf{w} \sim \mathbb{P}$. The influenced nodes in the diffusion are completely determined by $S$ and $D(\mathbf{w})$. We use the indicator $\mathbf{1}(S, v, D(\mathbf{w})) \in \{0, 1\}$ to denote whether node $v$ is influenced under the seed set $S$ and the particular realization $D(\mathbf{w})$. For a given $(G, D)$, once a seed set $S \in \mathcal{C}$ is chosen, for each node $v \in V$ we use $F(S, v)$ to denote the probability that $v$ is influenced under the seed set $S$, i.e., $F(S, v) = \mathbb{E}\left[\mathbf{1}(S, v, D(\mathbf{w}))\right]$, where the expectation is over all possible realizations $D(\mathbf{w})$. We denote by $F(S) = \sum_{v \in V} F(S, v)$ the expected number of nodes that are influenced when the seed set $S$ is chosen. The aim of the IM problem is to maximize $F(S)$ subject to the constraint $S \in \mathcal{C}$, i.e., to find $S^* \in \arg\max_{S \in \mathcal{C}} F(S)$.

Although IM is an NP-hard problem in general, under common diffusion models such as IC and LT, the objective function $F(S)$ is monotone and submodular, and thus a near-optimal solution can be computed in polynomial time using a greedy algorithm (Nemhauser et al., 1978). In this work, we assume that $D$ is any diffusion model satisfying the following monotonicity assumption:

**Assumption 1.** For any node $v \in V$ and any subsets $S_1 \subseteq S_2 \subseteq V$, $F(S_1, v) \le F(S_2, v)$, i.e., $F(S, v)$ is monotone in $S$.

Note that all progressive diffusion models (models where a user, once influenced, cannot change their state), including those in (Kempe et al., 2003; Gomez Rodriguez et al., 2012; Li et al., 2013), satisfy Assumption 1.
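To make the setup above concrete, the following is a minimal Python sketch of estimating $F(S)$ by Monte Carlo simulation under the IC model, used here only as one example of a diffusion model $D$ satisfying Assumption 1. The adjacency-dictionary encoding of $G$, the function names `simulate_ic` and `spread`, and the toy probabilities are our own illustrative choices, not part of the paper.

```python
import random

def simulate_ic(graph, seeds, prob):
    """One realization D(w) of the independent cascade (IC) model:
    each edge (u, v) is activated independently with probability prob[(u, v)].
    Returns the set of influenced nodes (always a superset of the seeds)."""
    influenced = set(seeds)
    frontier = list(seeds)
    while frontier:
        u = frontier.pop()
        for v in graph.get(u, []):
            if v not in influenced and random.random() < prob[(u, v)]:
                influenced.add(v)
                frontier.append(v)
    return influenced

def spread(graph, seeds, prob, num_samples=1000):
    """Monte Carlo estimate of F(S), the expected number of influenced nodes."""
    total = sum(len(simulate_ic(graph, seeds, prob)) for _ in range(num_samples))
    return total / num_samples

# Toy usage: a 4-node line graph 0 -> 1 -> 2 -> 3 with uniform probabilities.
graph = {0: [1], 1: [2], 2: [3]}
prob = {(0, 1): 0.5, (1, 2): 0.5, (2, 3): 0.5}
print(spread(graph, {0}, prob))  # roughly 1 + 0.5 + 0.25 + 0.125 = 1.875
```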
## 3. Surrogate Objective

We now motivate and propose a surrogate objective for the IM problem based on the notion of maximal pairwise reachability. We start by defining some useful notation. For any set $S \subseteq V$ and any set of pairwise probabilities $p : V \times V \to [0, 1]$, for all nodes $v \in V$, we define

$$f(S, v, p) = \max_{u \in S} p_{u,v}, \quad (2)$$

where $p_{u,v}$ is the pairwise probability associated with the ordered node pair $(u, v)$. We further define $f(S, p) = \sum_{v \in V} f(S, v, p)$. Note that for all $p$, $f(S, p)$ is always monotone and submodular in $S$ (Krause & Golovin, 2012).

For any pair of nodes $u, v \in V$, we define the pairwise reachability from $u$ to $v$ as $p^*_{u,v} = F(\{u\}, v)$, i.e., the probability that $v$ will be influenced if $u$ is the only seed node, under graph $G$ and diffusion model $D$. Throughout this paper, we use source node and seed interchangeably and refer to the nodes not in the seed set $S$ as target nodes. We define $f(S, v, p^*) = \max_{u \in S} p^*_{u,v}$ as the maximal pairwise reachability from the seed set $S$ to the target node $v$. Our proposed surrogate objective for the IM problem is $f(S, p^*) = \sum_{v \in V} f(S, v, p^*)$. Based on this objective, an approximate solution $\tilde{S}$ to the IM problem can be obtained by maximizing $f(S, p^*)$ under the constraint $S \in \mathcal{C}$:

$$\tilde{S} \in \arg\max_{S \in \mathcal{C}} f(S, p^*). \quad (3)$$

Recall that $S^*$ is the optimal solution to the IM problem. To quantify the quality of the surrogate, we define the surrogate approximation factor as $\rho = f(\tilde{S}, p^*)/F(S^*)$. The following theorem (proved in Appendix A) gives upper and lower bounds on $\rho$:

**Theorem 1.** For any graph $G$, seed set $S \in \mathcal{C}$, and diffusion model $D$ satisfying Assumption 1: (1) $f(S, p^*) \le F(S)$; (2) if $F(S)$ is submodular in $S$, then $1/K \le \rho \le 1$.

The above theorem implies that for any progressive model satisfying Assumption 1, maximizing $f(S, p^*)$ is equivalent to maximizing a lower bound on the true spread $F(S)$. For both the IC and LT models, $F(S)$ is both monotone and submodular, and the approximation factor $\rho$ can be bounded from below by $1/K$. In Section 8, we empirically show that in cases of practical interest, $f(S, p^*)$ is a good approximation to $F(S)$ and that $\rho$ is much larger than $1/K$.

Finally, note that solving $\tilde{S} \in \arg\max_{S \in \mathcal{C}} f(S, p^*)$ exactly might be computationally intractable, and thus we need to compute a near-optimal solution using an approximation algorithm. In this paper, we refer to such approximation algorithms as oracles, to distinguish them from learning algorithms. Let ORACLE be a specific oracle and let $\hat{S} = \text{ORACLE}(G, \mathcal{C}, p)$ be the seed set it outputs. For any $\alpha \in [0, 1]$, we say that ORACLE is an $\alpha$-approximation algorithm if, for all $p : V \times V \to [0, 1]$, $f(\hat{S}, p) \ge \alpha \max_{S \in \mathcal{C}} f(S, p)$. For our particular case, since $f(S, p^*)$ is submodular, a valid oracle is the greedy algorithm, which gives an $\alpha = 1 - 1/e$ approximation (Nemhauser et al., 1978). Hence, given the knowledge of $p^*$, we can obtain an approximate solution to the IM problem without knowing the exact underlying diffusion model.
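As a concrete illustration of such an oracle, here is a minimal NumPy sketch of the greedy algorithm for maximizing $f(S, p) = \sum_{v} \max_{u \in S} p_{u,v}$ under a cardinality constraint $|S| \le K$; since $f$ is monotone and submodular in $S$, this is a valid $\alpha = 1 - 1/e$ approximation oracle. The function name and the dense-matrix representation of $p$ are our own choices, not the paper's.

```python
import numpy as np

def greedy_oracle(p, K):
    """Greedy maximization of f(S, p) = sum_v max_{u in S} p[u, v] s.t. |S| <= K.
    p is an (n x n) array of pairwise probabilities p[u, v]."""
    n = p.shape[0]
    seeds = []
    best = np.zeros(n)  # current coverage of each target v: max_{u in S} p[u, v]
    for _ in range(K):
        # Marginal gain of adding source u: sum_v max(0, p[u, v] - best[v]).
        gains = np.maximum(p - best, 0.0).sum(axis=1)
        gains[seeds] = -np.inf  # never pick the same seed twice
        u = int(np.argmax(gains))
        seeds.append(u)
        best = np.maximum(best, p[u])
    return seeds
```

Each iteration adds the source node with the largest marginal gain in coverage; the $1 - 1/e$ guarantee of Nemhauser et al. (1978) applies to any monotone submodular objective maximized this way.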
## 4. Influence Maximization Semi-Bandits

We now focus on the case of a new marketer trying to learn the pairwise reachabilities by repeatedly interacting with the network. We describe the observable feedback (Section 4.2) and the learning framework (Section 4.3).

### 4.1. Influence Maximization Semi-Bandits

In an influence maximization semi-bandit problem, the agent (marketer) knows both $G$ and $\mathcal{C}$, but does not know the diffusion model $D$. Specifically, the agent knows neither the form of $D$ (for instance, whether $D$ is the IC or LT model) nor its parameters (for instance, the influence probabilities in the IC or LT model). Consider a scenario in which the agent interacts with the social network for $T$ rounds. At each round $t \in \{1, \ldots, T\}$, the agent first chooses a seed set $S_t \in \mathcal{C}$ based on its prior knowledge and past observations, and then nature independently samples a diffusion random vector $\mathbf{w}_t \sim \mathbb{P}$. Influence thus diffuses in the social network from $S_t$ according to $D(\mathbf{w}_t)$. The agent's reward at round $t$ is the number of influenced nodes, $r_t = \sum_{v \in V} \mathbf{1}(S_t, v, D(\mathbf{w}_t))$. Recall that by definition, $\mathbb{E}[r_t \mid S_t] = F(S_t)$. After each such IM attempt, the agent observes the pairwise influence feedback (described next) and uses it to improve the subsequent IM attempts. The agent's objective is to maximize the expected cumulative reward across the $T$ rounds, i.e., to maximize $\mathbb{E}\left[\sum_{t=1}^{T} r_t\right]$. This is equivalent to minimizing the cumulative regret, defined in Section 6.

### 4.2. Pairwise Influence Feedback Model

We propose a novel IM semi-bandit feedback model, referred to as pairwise influence feedback. Under this feedback model, at the end of each round $t$, the agent observes $\mathbf{1}(\{u\}, v, D(\mathbf{w}_t))$ for all $u \in S_t$ and all $v \in V$. In other words, it observes whether or not $v$ would have been influenced if the agent had selected $S = \{u\}$ as the seed set, under the realization $D(\mathbf{w}_t)$. This form of semi-bandit feedback is plausible in most IM scenarios. For example, on sites like Facebook, we can identify the user who influenced another user to share or like an article, and thus can transitively trace the propagation back to the seed that started the diffusion. Note that our assumption is strictly weaker than (and implied by) edge-level semi-bandit feedback (Chen et al., 2016; Wen et al., 2017): from edge-level feedback, we can identify the edges along which the diffusion travelled, and thus determine whether a particular source node is responsible for activating a target node. However, from pairwise feedback, it is impossible to infer a unique edge-level feedback.

### 4.3. Linear Generalization

Parametrizing the problem in terms of reachability probabilities results in $O(n^2)$ parameters that need to be learned. Without any structural assumptions, this becomes intractable for large networks. To develop statistically efficient algorithms for large-scale IM semi-bandits, we make a linear generalization assumption similar to (Wen et al., 2015; 2017). Assume that each node $v \in V$ is associated with two vectors of dimension $d$: the seed (source) weight $\theta^*_v \in \mathbb{R}^d$ and the target feature $x_v \in \mathbb{R}^d$. We assume the target features are known, whereas the source weights are unknown and must be learned. The linear generalization assumption is that, for all pairs $u, v \in V$, the reachability probability $p^*_{u,v}$ is well approximated by $\langle \theta^*_u, x_v \rangle$. Let $X \in \mathbb{R}^{n \times d}$ denote the target feature matrix whose rows are the target features $x_v^\top$. Note that the tabular case $X = I_n$ (with $d = n$) imposes no generalization structure across target nodes and recovers the original parametrization.

## 5. Algorithm

We now describe our algorithm, Diffusion-Independent LinUCB (DILinUCB), whose pseudocode is given in Algorithm 1.

**Algorithm 1** Diffusion-Independent LinUCB (DILinUCB)

1: **Input:** graph $G$, feasible seed sets $\mathcal{C}$, oracle ORACLE, target feature matrix $X$, parameters $c, \lambda, \sigma > 0$
2: Initialize $\Sigma_{u,0} \leftarrow \lambda I_d$, $b_{u,0} \leftarrow \mathbf{0}$, $\hat{\theta}_{u,0} \leftarrow \mathbf{0}$ for all $u \in V$, and UCBs $p_{u,v} \leftarrow 1$ for all $u, v \in V$
3: **for** $t = 1$ to $T$ **do**
4: Choose $S_t \leftarrow \text{ORACLE}(G, \mathcal{C}, p)$
5: **for** $u \in S_t$ **do**
6: Get pairwise influence feedback $y_{u,t}$
7: $b_{u,t} \leftarrow b_{u,t-1} + X^\top y_{u,t}$
8: $\Sigma_{u,t} \leftarrow \Sigma_{u,t-1} + \sigma^{-2} X^\top X$
9: $\hat{\theta}_{u,t} \leftarrow \sigma^{-2} \Sigma_{u,t}^{-1} b_{u,t}$
10: $p_{u,v} \leftarrow \mathrm{Proj}_{[0,1]}\left(\langle \hat{\theta}_{u,t}, x_v \rangle + c\,\|x_v\|_{\Sigma_{u,t}^{-1}}\right)$, $\forall v \in V$
11: **end for**
12: **for** $u \notin S_t$ **do**
13: $b_{u,t} = b_{u,t-1}$
14: $\Sigma_{u,t} = \Sigma_{u,t-1}$
15: **end for**
16: **end for**

DILinUCB can be used in combination with any diffusion model $D$ satisfying Assumption 1. The only requirement to apply DILinUCB is that the IM semi-bandit provides the pairwise influence feedback described in Section 4.2. The inputs to DILinUCB include the network topology $G$, the collection of feasible seed sets $\mathcal{C}$, the optimization algorithm ORACLE, the target feature matrix $X$, and three algorithm parameters $c, \lambda, \sigma > 0$. The parameter $\lambda$ is a regularization parameter, whereas $\sigma$ is proportional to the noise in the observations and hence controls the learning rate.
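The following is a minimal NumPy sketch of Algorithm 1 under the linear generalization above. The callables `oracle` (e.g., the greedy sketch in Section 3) and `get_feedback` are hypothetical stand-ins for ORACLE and the semi-bandit environment, and the naive per-update matrix inversion ignores the computational optimizations discussed in Section 7.3.

```python
import numpy as np

def dilinucb(X, T, oracle, get_feedback, c=1.0, lam=1.0, sigma=1.0):
    """Sketch of DILinUCB. X is the (n x d) target feature matrix.
    oracle(p) returns a seed set for the current UCB matrix p;
    get_feedback(S_t) returns {u: y_u} for u in S_t, where y_u in {0,1}^n
    indicates the nodes that u alone would have influenced this round."""
    n, d = X.shape
    Sigma = [lam * np.eye(d) for _ in range(n)]  # Gram matrix per source u
    b = [np.zeros(d) for _ in range(n)]
    p = np.ones((n, n))                          # optimistic initial UCBs
    chosen = []
    for t in range(T):
        S_t = oracle(p)
        chosen.append(S_t)
        for u, y_u in get_feedback(S_t).items():
            b[u] += X.T @ y_u
            Sigma[u] += X.T @ X / sigma**2
            Sigma_inv = np.linalg.inv(Sigma[u])
            theta_hat = Sigma_inv @ b[u] / sigma**2
            # Confidence radius c * ||x_v||_{Sigma^{-1}} for every target v.
            radius = c * np.sqrt(np.einsum('ij,jk,ik->i', X, Sigma_inv, X))
            p[u] = np.clip(X @ theta_hat + radius, 0.0, 1.0)
    return chosen
```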
For each source node $u \in V$ and time $t$, we define the Gram matrix $\Sigma_{u,t} \in \mathbb{R}^{d \times d}$ and the vector $b_{u,t} \in \mathbb{R}^d$ as in Algorithm 1: they summarize the pairwise feedback observed for source $u$ up to round $t$, $\hat{\theta}_{u,t}$ is the corresponding regularized least-squares estimate of $\theta^*_u$, and line 10 computes an upper confidence bound (UCB) $p_{u,v}$ on each reachability probability $p^*_{u,v}$.

## 6. Analysis

Since the agent optimizes the surrogate objective (with approximation factor $\rho$) through an $\alpha$-approximation oracle, we measure its performance by the $\rho\alpha$-scaled cumulative regret $R^{\rho\alpha}(T) = \sum_{t=1}^{T} \mathbb{E}\left[F(S^*) - F(S_t)/(\rho\alpha)\right]$, following the scaled-regret convention of Wen et al. (2017).

**Theorem 2.** For any $\lambda, \sigma > 0$, any feature matrix $X$, any $\alpha$-approximation oracle ORACLE, and any $c$ satisfying

$$c \ge \frac{1}{\sigma}\sqrt{d \log\left(1 + \frac{nT}{d\lambda\sigma^2}\right) + 2\log\left(n^2 T\right)} + \sqrt{\lambda}\,\max_{u \in V}\|\theta^*_u\|_2, \quad (5)$$

if we apply DILinUCB with input (ORACLE, $X$, $c$, $\lambda$, $\sigma$), then its $\rho\alpha$-scaled cumulative regret is upper-bounded as

$$R^{\rho\alpha}(T) \le \frac{2cn^2}{\rho\alpha}\sqrt{dKT \log\left(1 + \frac{nT}{d\lambda\sigma^2}\right)}.$$

For the tabular case $X = I$, we obtain the tighter bound $R^{\rho\alpha}(T) \le \frac{2cn^2}{\rho\alpha}\sqrt{KT \log\left(1 + \frac{T}{\lambda\sigma^2}\right)}$.

Recall that $\rho$ specifies the quality of the surrogate approximation. Notice that if we choose $\lambda = \sigma = 1$ and choose $c$ such that Inequality (5) is tight, then our regret bound is $\tilde{O}(n^2 d\sqrt{KT}/(\rho\alpha))$ for a general feature matrix $X$, and $\tilde{O}(n^{2.5}\sqrt{KT}/(\rho\alpha))$ in the tabular case. Here $\tilde{O}$ hides logarithmic factors.

We now briefly discuss the tightness of our regret bounds. First, note that the $O(1/\rho)$ factor is due to the surrogate objective approximation discussed in Section 3, and the $O(1/\alpha)$ factor is due to the fact that ORACLE is an $\alpha$-approximation algorithm. Second, the $\tilde{O}(\sqrt{T})$-dependence on time is near-optimal, and the $\tilde{O}(\sqrt{K})$-dependence on the cardinality of the seed sets is standard in the combinatorial semi-bandit literature (Kveton et al., 2015). Third, for general $X$, the $\tilde{O}(d)$-dependence on the feature dimension is standard in the linear bandit literature (Dani et al., 2008; Wen et al., 2015). To explain the $\tilde{O}(n^2)$ factor in this case, notice that one $O(n)$ factor is due to the magnitude of the reward (the reward ranges from 0 to $n$, rather than from 0 to 1), whereas one $\tilde{O}(\sqrt{n})$ factor is due to the statistical dependence of the pairwise reachabilities. Assuming statistical independence between these reachabilities (similar to Chen et al. (2016)), we could shave off this $\tilde{O}(\sqrt{n})$ factor; however, this assumption is unrealistic in practice. Another $\tilde{O}(\sqrt{n})$ factor is due to the fact that we learn a separate $\theta_u$ for each source node $u$ (i.e., there is no generalization across the source nodes). Finally, for the tabular case $X = I$, the dependence on $d$ no longer exists, but there is another $\tilde{O}(\sqrt{n})$ factor due to the fact that there is no generalization across target nodes.

We conclude this section by sketching the proof of Theorem 2 (the detailed proof is available in Appendix B and Appendix C). We define the good event as

$$\mathcal{F} = \left\{ \left|x_v^\top\left(\hat{\theta}_{u,t-1} - \theta^*_u\right)\right| \le c\,\|x_v\|_{\Sigma_{u,t-1}^{-1}}, \ \forall u, v \in V, \ \forall t \le T \right\},$$

and the bad event $\bar{\mathcal{F}}$ as the complement of $\mathcal{F}$. We then decompose the $\rho\alpha$-scaled regret $R^{\rho\alpha}(T)$ over $\mathcal{F}$ and $\bar{\mathcal{F}}$: on $\mathcal{F}$, the per-round regret is controlled by the confidence radii, while the total regret on $\bar{\mathcal{F}}$ is at most $nT\,P(\bar{\mathcal{F}})$, where $P(\bar{\mathcal{F}})$ is the probability of $\bar{\mathcal{F}}$. The regret bounds in Theorem 2 are then derived based on worst-case bounds on the sums of the confidence radii $\|x_v\|_{\Sigma_{u,t-1}^{-1}}$ (Appendix B.2), and a bound on $P(\bar{\mathcal{F}})$ based on the self-normalized bound for matrix-valued martingales developed in Theorem 3 (Appendix C).

## 7. Practical Implementation

In this section, we briefly discuss how to implement our proposed algorithm, DILinUCB, in practical semi-bandit IM problems. Specifically, we discuss how to construct features in Section 7.1, how to enhance the practical performance of DILinUCB based on Laplacian regularization in Section 7.2, and how to implement DILinUCB computationally efficiently in real-world problems in Section 7.3.

### 7.1. Target Feature Construction

Although DILinUCB is applicable with any target feature matrix $X$, in practice its performance is highly dependent on the quality of $X$. In this subsection, we motivate and propose a systematic feature construction approach based on the unweighted Laplacian matrix of the network topology $G$. For all $u \in V$, let $p^*_u \in$