# modelindependent_online_learning_for_influence_maximization__b8cd20f6.pdf

Model-Independent Online Learning for Inﬂuence Maximization

Sharan Vaswani 1 Branislav Kveton 2 Zheng Wen 2 Mohammad Ghavamzadeh 3 Laks V.S. Lakshmanan 1

Mark Schmidt 1

We consider inﬂuence maximization (IM) in social networks, which is the problem of maximizing the number of users that become aware of a product by selecting a set of seed users to expose the product to. While prior work assumes a known model of information diffusion, we propose a novel parametrization that not only makes our framework agnostic to the underlying diffusion model, but also statistically efﬁcient to learn from data. We give a corresponding monotone, submodular surrogate function, and show that it is a good approximation to the original IM objective. We also consider the case of a new marketer looking to exploit an existing social network, while simultaneously learning the factors governing information propagation. For this, we propose a pairwise-inﬂuence semi-bandit feedback model and develop a Lin UCB-based bandit algorithm. Our model-independent analysis shows that our regret bound has a better (as compared to previous work) dependence on the size of the network. Experimental evaluation suggests that our framework is robust to the underlying diffusion model and can efﬁciently learn a near-optimal solution.

1. Introduction The aim of viral marketing is to spread awareness about a speciﬁc product via word-of-mouth information propagation over a social network. More precisely, marketers (agents) aim to select a ﬁxed number of inﬂuential users (called seeds) and provide them with free products or discounts. They assume that these users will inﬂuence their neighbours and, transitively, other users in the social network to adopt the product. This will thus result in information propagating across the network as more users adopt or become aware of the product. The marketer has a budget on the number of free products and must choose seeds

1University of British Columbia 2Adobe Research 3Deep Mind (The work was done when the author was with Adobe Research). Correspondence to: Sharan Vaswani <sharanv@cs.ubc.ca>.

Proceedings of the 34 th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. Copyright 2017 by the author(s).

in order to maximize the inﬂuence spread which is the expected number of users that become aware of the product. This problem is referred to as inﬂuence maximization (IM).

Existing solutions to the IM problem require as input, the underlying diffusion model which describes how information propagates through the network. The IM problem has been studied under various probabilistic diffusion models such as independent cascade (IC) and linear threshold (LT) models (Kempe et al., 2003). Under these common models, there has been substantial work on developing efﬁcient heuristics and approximation algorithms (Chen et al., 2009; Leskovec et al., 2007; Goyal et al., 2011b;a; Tang et al., 2014; 2015).

Unfortunately, knowledge of the underlying diffusion model and its parameters is essential for the existing IM algorithms to perform well. For example, Du et al. (2014) empirically showed that misspeciﬁcation of the diffusion model can lead to choosing bad seeds and consequently to a low spread. In practice, it is not clear how to choose from amongst the increasing number of plausible diffusion models (Kempe et al., 2003; Gomez Rodriguez et al., 2012; Li et al., 2013). Even if we are able to choose a diffusion model according to some prior information, the number of parameters for these models scales with the size of the network (for example, it is equal to the number of edges for both the IC and LT models) and it is not clear how to set these. Goyal et al. (2011a) showed that even when assuming the IC or LT model, correct knowledge of the model parameters is critical to choosing good seeds that lead to a large spread. Some papers try to learn these parameters from past propagation data (Saito et al., 2008; Goyal et al., 2010; Netrapalli & Sanghavi, 2012). However in practice, such data is hard to obtain and the large number of parameters makes this learning challenging.

To overcome these difﬁculties, we propose a novel parametrization for the IM problem in terms of pairwise reachability probabilities (Section 2). This parametrization depends only on the state of the network after the information diffusion has taken place. Since it does not depend on how information diffuses, it is agnostic to the underlying diffusion model. To select seeds based on these reachability probabilities, we propose a monotone and submodular surrogate objective function based on the notion of maximum reachability (Section 3). Our surrogate function can be optimized efﬁciently and is a a good approximation to

Model-Independent Online Learning for Inﬂuence Maximization

the IM objective. We theoretically bound the quality of this approximation. Our parametrization may be of independent interest to the IM community.

Next, we consider learning how to choose good seeds in an online setting. Speciﬁcally, we focus on the case of a new marketer looking to exploit an existing network to market their product. They need to choose a good seed set, while simultaneously learning the factors affecting information propagation. This motivates the learning framework of IM semi-bandits (Vaswani et al., 2015; Chen et al., 2016; Wen et al., 2017). In these works, the marketer performs IM over multiple rounds and learns about the factors governing the diffusion on the ﬂy. Each round corresponds to an IM attempt for the same or similar products. Each attempt incurs a loss in the inﬂuence spread (measured in terms of cumulative regret) because of the lack of knowledge about the diffusion process. The aim is to minimize the cumulative regret incurred across multiple such rounds. This leads to the classic exploration-exploitation trade-off where the marketer must either choose seeds that either improve their knowledge about the diffusion process ( exploration ) or ﬁnd a seed set that leads to a large expected spread ( exploitation ). Note that all previous works on IM semi-bandits assume the IC model.

We propose a novel semi-bandit feedback model based on pairwise inﬂuence (Section 4). Our feedback model is weaker than the edge-level feedback proposed in (Chen et al., 2016; Wen et al., 2017). Under this feedback, we formulate IM semi-bandit as a linear bandit problem and propose a scalable Lin UCB-based algorithm (Section 5). We bound the cumulative regret of this algorithm (Section 6) and show that our regret bound has the optimal dependence on the time horizon, is linear in the cardinality of the seed set, and as compared to the previous literature, has a better dependence on the size of the network. In Section 7, we describe how to construct features based on the graph Laplacian eigenbasis and describe a practical implementation of our algorithm. Finally, in Section 8, we empirically evaluate our proposed algorithm on a real-world network and show that it is statistically efﬁcient and robust to the underlying diffusion model.

2. Inﬂuence Maximization

The IM problem is characterized by the triple (G, C, D), where G is a directed graph encoding the topology of the social network, C is the collection of feasible seed sets, and D is the underlying diffusion model. Speciﬁcally, G = (V, E), where V = {1, 2, . . . , n} and E are the node and edge sets of G, with cardinalities n = |V| and m = |E|, respectively. The collection of feasible seed sets C is determined by a cardinality constraint on the sets and possibly some combinatorial constraints (e.g. matroid constraints) that rule out some subsets of V. This implies that C {S V : |S| K}, for some K n. The diffusion model D speciﬁes the stochastic process under which

inﬂuence is propagated across the social network once a seed set S 2 C is selected. Without loss of generality, we assume that all stochasticity in D is encoded in a random vector w, referred to as the diffusion random vector. Note that throughout this paper, we denote vectors in bold case. We assume that each diffusion has a corresponding w sampled independently from an underlying probability distribution P speciﬁc to the diffusion model. For the widelyused models IC and LT, w is an m-dimensional binary vector encoding edge activations for all the edges in E, and P is parametrized by m inﬂuence probabilities, one for each edge. Once w is sampled, we use D(w) to refer to the particular realization of the diffusion model D. Note that by deﬁnition, D(w) is deterministic, conditioned on w.

Given the above deﬁnitions, an IM attempt can be described as: the marketer ﬁrst chooses a seed set S 2 C and then nature independently samples a diffusion random vector w P. Note that the inﬂuenced nodes in the diffusion are completely determined by S and D(w). We use the indicator 1

2 {0, 1} to denote if the node v is inﬂuenced under the seed set S and the particular realization D(w). For a given (G, D), once a seed set S C is chosen, for each node v 2 V, we use F(S, v) to denote the probability that v is inﬂuenced under the seed set S, i.e.,

F(S, v) = E

where the expectation is over all possible realizations D(w). We denote by F(S) = P

v2V F(S, v), the expected number of nodes that are inﬂuenced when the seed set S is chosen. The aim of the IM problem is to maximize F(S) subject to the constraint S 2 C, i.e., to ﬁnd S 2 arg max S2C F(S). Although IM is an NP-hard problem in general, under common diffusion models such as IC and LT, the objective function F(S) is monotone and submodular, and thus, a near-optimal solution can be computed in polynomial time using a greedy algorithm (Nemhauser et al., 1978). In this work, we assume that D is any diffusion model satisfying the following monotonicity assumption:

Assumption 1. For any v 2 V and any subsets S1 S2 V, if F(S1, v) F(S2, v), then F(S, v) is monotone in S. Note that all progressive diffusion models (models where once the user is inﬂuenced, they can not change their state), including those in (Kempe et al., 2003; Gomez Rodriguez et al., 2012; Li et al., 2013) satisfy Assumption 1.

3. Surrogate Objective

We now motivate and propose a surrogate objective for the IM problem based on the notion of maximal pairwise reachability. We start by deﬁning some useful notation. For any set S V and any set of pairwise probabilities p : V V ! [0, 1], for all nodes v 2 V, we deﬁne

f(S, v, p) = maxu2S pu,v (2)

Model-Independent Online Learning for Inﬂuence Maximization

where pu,v is the pairwise probability associated with the ordered node pair (u, v). We further deﬁne f(S, p) = P

v2V f(S, v, p). Note that for all p, f(S, p) is always monotone and submodular in S (Krause & Golovin, 2012).

For any pair of nodes u, v 2 V, we deﬁne the pairwise reachability from u to v as p

u,v = F({u}, v), i.e., the probability that v will be inﬂuenced, if u is the only seed node under graph G and diffusion model D. Throughout this paper, we use source node and seed interchangeably and refer to the nodes not in the seed set S as target nodes. We deﬁne f(S, v, p ) = maxu2S p

u,v as the maximal pairwise reachability from the seed set S to the target node v.

Our proposed surrogate objective for the IM problem is f(S, p ) = P

v2V f(S, v, p ). Based on this objective, an approximate solution e S to the IM problem can be obtained by maximizing f(S, p ) under the constraint S 2 C,

e S 2 arg max S2C f(S, p ) (3)

Recall that S is the optimal solution to the IM problem. To quantify the quality of the surrogate, we deﬁne the surrogate approximation factor as = f( e S, p )/F(S ). The following theorem, (proved in Appendix A) states that we can obtain the following upper and lower bounds on :

Theorem 1. For any graph G, seed set S 2 C, and diffusion model D satisfying Assumption 1,

1 f(S, p ) F(S),

2 If F(S) is submodular in S, then 1/K 1.

The above theorem implies that for any progressive model satisfying Assumption 1, maximizing f(S, p ) is equivalent to maximizing a lower-bound on the true spread F(S). For both IC and LT models, F(S) is both monotone and submodular, and the approximation factor can be bounded from below by 1/K. In Section 8, we empirically show that in cases of practical interest, f(S, p ) is a good approximation to F(S) and that is much larger than 1/K.

Finally, note that solving e S 2 arg max S2C f(S, p ) exactly might be computationally intractable and thus we need to compute a near-optimal solution based on an approximation algorithm. In this paper, we refer to such approximation algorithms as oracles to distinguish them from learning algorithms. Let ORACLE be a speciﬁc oracle and let b S

= ORACLE(G, C, p) be the seed set output by it. For any 2 [0, 1], we say that ORACLE is an -approximation algorithm if for all p : V V ! [0, 1], f( b S, p) max S2C f(S, p). For our particular case, since f(S, p ) is submodular, a valid oracle is the greedy algorithm which gives an = 1 1/e approximation (Nemhauser et al., 1978). Hence, given the knowledge of p , we can obtain an approcimate solution to the IM problem without knowing the exact underlying diffusion model.

4. Inﬂuence Maximization Semi-Bandits

We now focus on the case of a new marketer trying to learn the pairwise reachabilities by repeatedly interacting with the network. We describe the observable feedback (Section 4.2) and the learning framework (Section 4.3).

4.1. Inﬂuence Maximization Semi-Bandits

In an inﬂuence maximization semi-bandit problem, the agent (marketer) knows both G and C, but does not know the diffusion model D. Speciﬁcally, the agent knows neither the model of D, for instance whether D is the IC or LT model; nor its parameters, for instance the inﬂuence probabilities in the IC or LT model. Consider a scenario in which the agent interacts with the social network for T rounds. At each round t 2 {1, . . . , T}, the agent ﬁrst chooses a seed set St 2 C based on its prior knowledge and past observations and then nature independently samples a diffusion random vector wt P. Inﬂuence thus diffuses in the social network from St according to D(wt). The agent s reward at round t is the number of the inﬂuenced nodes

St, v, D(wt)

Recall that by deﬁnition, E [rt|St, D(wt)] = F (St). After each such IM attempt, the agent observes the pairwise inﬂuence feedback (described next) and uses it to improve the subsequent IM attempts. The agent s objective is to maximize the expected cumulative reward across the T rounds, i.e., to maximize E

. This is equivalent to minimizing the cumulative regret deﬁned subsequently.

4.2. Pairwise Inﬂuence Feedback Model

We propose a novel IM semi-bandit feedback model referred to as pairwise inﬂuence feedback. Under this feedback model, at the end of each round t, the agent observes 1

{u}, v, D(wt)

for all u 2 St and all v 2 V. In other words, it observes whether or not v would be inﬂuenced, if the agent selects S = {u} as the seed set under the diffusion model D(wt). This form of semi-bandit feedback is plausible in most IM scenarios. For example, on sites like Facebook, we can identify the user who inﬂuenced another user to share or like an article, and thus, can transitively trace the propagation to the seed which started the diffusion. Note that our assumption is strictly weaker than (and implied by) edge level semi-bandit feedback (Chen et al., 2016; Wen et al., 2017): from edge level feedback, we can identify the edges along which the diffusion travelled, and thus, determine whether a particular source node is responsible for activating a target node. However, from pairwise feedback, it is impossible to infer a unique edge level feedback.

4.3. Linear Generalization

Parametrizing the problem in terms of reachability probabilities results in O(n2) parameters that need to be

Model-Independent Online Learning for Inﬂuence Maximization

learned. Without any structural assumptions, this becomes intractable for large networks. To develop statistically efﬁcient algorithms for large-scale IM semi-bandits, we make a linear generalization assumption similar to (Wen et al., 2015; 2017). Assume that each node v 2 V is associated with two vectors of dimension d, the seed (source) weight

v 2 <d and the target feature xv 2 <d. We assume that the target feature xv is known, whereas

v is unknown and needs to be learned. The linear generalization assumption is stated as:

Assumption 2. For all u, v 2 V, p

u,v can be well approximated by the inner product of

u and xv, i.e.,

Note that for the tabular case (the case without generalization across p

u,v), we can always choose xv = ev 2 <n and

u,1, . . . , p

T , where ev is an n-dimensional indicator vector with the v-th element equal to 1 and all other elements equal to 0. However, in this case d = n, which is not desirable. Constructing target features when d n is non-trivial. We discuss a feature construction approach based on the unweighted graph Laplacian in Section 7. We use matrix X 2 <d n to encode the target features. Specifically, for v = 1, . . . , n, the v-th column of X is set as xv. Note that X = I 2 <n n in the tabular case.

Finally, note that under Assumption 2, estimating the reachability probabilities becomes equivalent to estimating n (one for each source) d-dimensional weight vectors. This implies that Assumption 2 reduces the number of parameters to learn from O(n2) to O(dn), and thus, is important for developing statistically efﬁcient algorithms for largescale IM semi-bandits.

4.4. Performance Metric

We benchmark the performance of an IM semi-bandit algorithm by comparing its spread against the attainable inﬂuence assuming perfect knowledge of D. Since it is NP-hard to compute the optimal seed set even when with perfect knowledge, similar to (Wen et al., 2017; Chen et al., 2016), we measure the performance of an IM semi-bandit algorithm by scaled cumulative regret. Speciﬁcally, if St is the seed set selected by the IM semi-bandit algorithm at round t, for any 2 (0, 1), the -scaled cumulative regret R (T) in the ﬁrst T rounds is deﬁned as

R (T) = T F(S ) 1

5. Algorithm

In this section, we propose a Lin UCB-based IM semibandit algorithm, called diffusion-independent Lin UCB (DILin UCB), whose pseudocode is in Algorithm 1. As its name suggests, DILin UCB is applicable to IM semi-bandits

Algorithm 1 Diffusion-Independent Lin UCB (DILin UCB)

1: Input: G = (V, E), C, oracle ORACLE, target feature

matrix X 2 Rd n, algorithm parameters c, λ, σ > 0 2: Initialize u,0 λId, bu,0 0, b u,0 0 for all

u 2 V, and UCB pu,v 1 for all u, v 2 V 3: for t = 1 to T do 4: Choose St ORACLE (G, C, p) 5: for u 2 St do 6: Get pairwise inﬂuence feedback yu,t 7: bu,t bu,t 1 + Xyu,t 8: u,t u,t 1 + σ 2XXT

9: b u,t σ 2 1

10: pu,v Proj[0,1]

hb u,txvi + ckxvk 1

, 8v 2 V 11: end for 12: for u 62 St do 13: bu,t = bu,t 1 14: u,t = u,t 1 15: end for 16: end for

with any diffusion model D satisfying Assumption ]refassum:monotone. The only requirement to apply DILin UCB is that the IM semi-bandit provides the pairwise inﬂuence feedback described in Section 4.2.

The inputs to DILin UCB include the network topology G, the collection of the feasible sets C, the optimization algorithm ORACLE, the target feature matrix X, and three algorithm parameters c, λ, σ > 0. The parameter λ is a regularization parameter whereas σ is proportional to the noise in the observations and hence controls the learning rate. For each source node u 2 V and time t, we deﬁne the Gram matrix u,t 2 <d d, and bu,t 2 <d as the vector summarizing the past propagations from u. The vector u,t is the source weight estimate for node u at round t. The mean reachability probability from u to v is given by hb u,t, xvi,

whereas its variance is given as kxvk 1

u,txv. Note that u and bu are sufﬁcient statistics for computing UCB estimates pu,v for all v 2 V. The parameter c trades off the mean and variance in the UCB estimates and thus controls the degree of optimism of the algorithm.

All the Gram matrices are initialized to λId, where Id denotes the d-dimensional identity matrix whereas the vectors bu,0 and u,0 are set to d-dimensional all-zeros vectors. At each round t, DILin UCB ﬁrst uses the existing UCB estimates to compute the seed set St based on the given oracle ORACLE (line 4 of Algorithm 1). Then, it observes the pairwise reachability vector yu,t for all the selected seeds in St. The vector yu,t is an n-dimensional column vector such that yu,t(v) = 1 ({u}, v, D(wt)) indicating whether node v is reachable from the source u at round t. Finally, for each of the K selected seeds u 2 St, DILin UCB up-

Model-Independent Online Learning for Inﬂuence Maximization

dates the sufﬁcient statistics (lines 7 and 8 of Algorithm 1) and the UCB estimates (line 10 of Algorithm 1). Here, Proj[0,1][ ] projects a real number onto the [0, 1] interval.

6. Regret Bound

In this section, we derive a regret bound for DILin UCB, under (1) Assumption 1, (2) perfect linear generalization i.e. p

u, xvi for all u, v 2 V, and (3) the assumption that ||xv||2 1 for all v 2 V. Notice that (2) is the standard assumption for linear bandit analysis (Dani et al., 2008), and (3) can always be satisﬁed by rescaling the target features. Our regret bound is stated below:

Theorem 2. For any λ, σ > 0, any feature matrix X, any -approximation oracle ORACLE, and any c satisfying

+ 2 log (n2T) +

if we apply DILin UCB with input (ORACLE, X, c, λ, σ), then its -scaled cumulative regret is upper-bounded as

1 + n T dλσ2

For the tabular case X = I, we obtain a tighter bound

Recall that speciﬁes the quality of the surrogate approximation. Notice that if we choose λ = σ = 1, and choose c s.t. Inequality 5 is tight, then our regret bound is e O(n2d

KT/( )) for general feature matrix X, and e O(n2.5p

KT/( )) in the tabular case. Here the e O hides log factors. We now brieﬂy discuss the tightness of our regret bounds. First, note that the O(1/ ) factor is due to the surrogate objective approximation discussed in Section 3, and the O(1/ ) factor is due to the fact that ORACLE is an -approximation algorithm. Second, note that the e O(

T)-dependence on time is near-optimal, and the e O(

K)-dependence on the cardinality of the seed sets is standard in the combinatorial semi-bandit literature (Kveton et al., 2015). Third, for general X, notice that the

e O(d)-dependence on feature dimension is standard in linear bandit literature (Dani et al., 2008; Wen et al., 2015). To explain the e O(n2) factor in this case, notice that one O(n) factor is due to the magnitude of the reward (the reward is from 0 to n, rather than 0 to 1), whereas one e O(pn) factor is due to the statistical dependence of the pairwise reachabilities. Assuming statistical independence between these reachabilities (similar to Chen et al. (2016)), we can shave off this e O(pn) factor. However this assumption is unrealistic in practice. Another e O(pn) is due to the fact that we learn one

u for each source node u (i.e. there is no generalization across the source nodes). Finally, for the

tabular case X = I, the dependence on d no longer exists, but there is another e O(pn) factor due to the fact that there is no generalization across target nodes.

We conclude this section by sketching the proof for Theorem 2 (the detailed proof is available in Appendix B and Appendix C). We deﬁne the good event as

u)| ckxvk 1

u,t 1 8u, v 2 V, t T},

and the bad event F as the complement of F. We then decompose the -scaled regret R (T) over F and F, and obtain the following inequality:

where P(F) is the probability of F. The regret bounds in Theorem 2 are derived based on worst-case bounds on PT

u,t 1 (Appendix B.2), and a

bound on P(F) based on the self-normalized bound for matrix-valued martingales developed in Theorem 3 (Appendix C).

7. Practical Implementation

In this section, we brieﬂy discuss how to implement our proposed algorithm, DILin UCB, in practical semi-bandit IM problems. Speciﬁcally, we will discuss how to construct features in Section 7.1, how to enhance the practical performance of DILin UCB based on Laplacian regularization in Section 7.2, and how to implement DILin UCB computationally efﬁciently in real-world problems in Section 7.3.

7.1. Target Feature Construction

Although DILin UCB is applicable with any target feature matrix X, in practice, its performance is highly dependent on the quality of X. In this subsection, we motivate and propose a systematic feature construction approach based on the unweighted Laplacian matrix of the network topology G. For all u 2 V, let p

u 2 <n be the vector encoding the reachabilities from the seed u to all the target nodes v 2 V. Intuitively, p

u tends to be a smooth graph function in the sense that target nodes close to each other (e.g., in the same community) tend to have similar reachabilities from u. From (Belkin et al., 2006; Valko et al., 2014), we know that a smooth graph function (in this case, the reachability from a source) can be expressed as a linear combination of eigenvectors of the weighted Laplacian of the network. In our case, the edge weights correspond to inﬂuence probabilities and are unknown in the IM semi-bandit setting. However, we use the above intuition to construct target features based on the unweighted Laplacian of G. Speciﬁcally, for a given d = 1, 2, . . . , n, we set the feature matrix X to be the bottom d eigenvectors (associated with d smallest eigenvalues) of the unweighted Laplacian of G. Other approaches to construct target features include the neighbour-

Model-Independent Online Learning for Inﬂuence Maximization

hood preserving node-level features as described in (Grover & Leskovec, 2016; Perozzi et al., 2014). We leave the investigation of other feature construction approaches to future work.

7.2. Laplacian Regularization

One limitation of our proposed DILin UCB algorithm is that it does not generalize across the seed nodes u. Speciﬁcally, it needs to learn the source node feature

u for each source node u separately, which is inefﬁcient for largescale semi-bandit IM problems. Similar to target features, the source features also tend to be smooth in the sense that k

u2k2 is small if nodes u1 and u2 are adjacent. We use this idea to design a prior which ties together the source features for different nodes, and hence transfers information between them. This idea of Laplacian regularization has been used in multi-task learning (Evgeniou et al., 2005) and for contextual-bandits in (Cesa-Bianchi et al., 2013; Vaswani et al., 2017). Speciﬁcally, at each round t, we compute b u,t by minimizing the following objective w.r.t u:

(yu,j XT u)2 + λ2

|| u1 u2||2

where λ2 0 is the regularization parameter. The implementation details are provided in Appendix D.

7.3. Computational Complexity

We now characterize the computational complexity of DILin UCB, and discuss how to implement it efﬁciently. Note that at each time t, DILin UCB needs to ﬁrst compute a solution St based on ORACLE, and then update the UCBs. Since u,t is positive semi-deﬁnite, the linear system in line 9 of Algorithm 1 can be solved using conjugate gradient in O(d2) time. It is straightforward to see the computational complexity to update the UCBs is O(Knd2). The computational complexity to compute St is dependent on ORACLE. For the classical setting in which C = {S V : |S| K} and ORACLE is the greedy algorithm, the computational complexity is O(Kn). To speed this up, we use the idea of lazy evaluations for submodular maximization proposed in (Minoux, 1978; Leskovec et al., 2007). It is known that this results in improved running time in practice.

8. Experiments

8.1. Empirical Veriﬁcation of Surrogate Objective

In this subsection, we empirically verify that the surrogate f(S, p ) proposed in Section 3 is a good approximation of the true IM objective F(S). We conduct our tests on random Kronecker graphs, which are known to capture many properties of real-world social networks (Leskovec et al., 2010). Speciﬁcally, we generate a social network instance

0 5 10 15 20 25 30 35 0

F (S) f(S, p )

0 5 10 15 20 25 30 35 0

g ) UB LB f( Sg, p )

Figure 1. Experimental veriﬁcation of surrogate objective.

(G, D) as follows: we randomly sample G as a Kronecker graph with n = 256 and sparsity equal to 0.03 1 (Leskovec et al., 2005). We choose D as the IC model and sample each of its inﬂuence probabilities independently from the uniform distribution U(0, 0.1). Note that this range of inﬂuence probabilities is guided by the empirical evidence in (Goyal et al., 2010; Barbieri et al., 2013). To weaken the dependence on a particular instance, all the results in this subsection are averaged over 10 randomly generated instances.

We ﬁrst numerically estimate the pairwise reachabilities p for all 10 instances based on social network simulation. In a simulation, we randomly sample a seed set S with cardinality K between 1 and 35, and record the pairwise inﬂuence indicator yu(v) from each source u 2 S to each target node v in this simulation. The reachability p

u,v is estimated by averaging the yu(v) values across 50k such simulations.

Based on the p so estimated, we compare f(S, p ) and F(S) as K, the seed set cardinality, varies from 2 to 35. For each K and each social network instance, we randomly sample 100 seed sets S with cardinality K. Then, we evaluate f(S, p ) based on the estimated p ; and numerically evaluate F(S) by averaging results of 500 inﬂuence simulations (diffusions). For each K, we average both F(S) and f(S, p ) across the random seed sets in each instance as well as across the 10 instances. We plot the average F(S) and f(S, p ) as a function of K in Figure 1(a). The plot shows that f(S) is a good lower bound on the true expected spread F(S), especially for low K.

Finally, we empirically quantify the surrogate approximation factor . As before, we vary K from 2 to 35 and average across 10 instances. Let = 1 e 1. For each instance and each K, we ﬁrst use the estimated p and the greedy algorithm to ﬁnd an -approximation solution e Sg to the surrogate problem max S f(S, p ). We then use the state-of-the-art IM algorithm (Tang et al., 2014) to compute an -approximation solution S

g to the IM problem max S F(S). Since F(S

g) F(S ) (Nemhauser et al.,

g)/ is an upper bound on F(S ).

From Theorem 1, LB

g)/K F(S )/K is a lower

1Based on the sparsity of typical social networks.

Model-Independent Online Learning for Inﬂuence Maximization

bound on f( e S, p ). We plot the average values (over 10 instances) of F(S

g), f( e Sg, p ), UB and LB against K in Figure 1(b). We observe that the difference in spreads does not increase rapidly with K. Although is lower-bounded with

1 K , in practice for all K 2 [2, 35], f( e Sg,p ) F (S g ) 0.55. This shows that in practice, our surrogate approximation is reasonable even for large K.

8.2. Performance of DILin UCB

We now demonstrate the performances of variants of DILin UCB and compare them with the start of the art. We choose the social network topology G as a subgraph of the Facebook network available at (Leskovec & Krevl, 2014), which consists of n = 4k nodes and m = 88k edges. Since true diffusion model is unavailable, we assume the diffusion model D is either an IC model or an LT model, and sample the edge inﬂuence probabilities independently from the uniform distribution U(0, 0.1). We also choose T = 5k rounds.

We compare DILin UCB against the CUCB algorithm (Chen et al., 2016) in both the IC model and the LT model, with K = 10. CUCB (referred to as CUCB(K) in plots) assumes the IC model, edge-level feedback and learns the inﬂuence probability for each edge independently. We demonstrate the performance of three variants of DILin UCB - the tabular case with X = I, independent estimation for each source node using target features (Algorithm 1) and Laplacian regularized estimation with target features (Appendix D). In the subsequent plots, to emphasize the dependence on K and d, these are referred to as TAB(K), I(K,d) and L(K,d) respectively. We construct features as described in Section 7.1. Similar to spectral clustering (Von Luxburg, 2007), the gap in the eigenvalues of the unweighted Laplacian can be used to choose the number of eigenvectors d. In our case, we choose the bottom d = 50 eigenvectors for constructing target features and show the effect of varying d in the next experiment. Similar to (Gentile et al., 2014), all hyper-parameters for our algorithm are set using an initial validation set of 500 rounds. The best validation performance was observed for λ = 10 4 and σ = 1.

We now brieﬂy discuss the performance metrics used in this section. For all S V and all t = 1, 2 . . ., we deﬁne rt(S) = P

v2V I (S, v, D(wt)), which is the realized reward at time t if S is chosen at that time. One performance metric is the per-step reward. Speciﬁcally, in one simulation, the per-step reward at time t is deﬁned as

t . Another performance metric is the cumulative regret. Since it is computationally intractable to derive S , our regret is measured with respect to S

g, the - approximation solution discussed in Section 8.1. In one simulation, the cumulative regret at time t is deﬁned as R(t) = Pt

. All the subsequent results are averaged across 5 independent simulations.

Figures 2(a) and 2(b) show the cumulative regret when the

underlying diffusion model is IC and LT, respectively. We have the following observations: (i) As compared to CUCB, the cumulative regret increases at a slower rate for all variants of DILin UCB, under both the IC and LT models, and for both the tabular case and case with features. (ii) Exploiting target features (linear generalization) in DILin UCB leads to a much smaller cumulative regret. (iii) CUCB is not robust to model misspeciﬁcation: it has a near linear cumulative regret under LT model. (iv) Laplacian regularization has little effect on the cumulative regret in these two cases. These observations clearly demonstrate the two main advantages of DILin UCB: it is both statistically efﬁcient and robust to diffusion model misspeciﬁcation. To explain (iv), we argue that the current combination of T, K, d and n results in sufﬁcient feedback for independent estimation to perform well and hence it is difﬁcult to observe any additional beneﬁt of Laplacian regularization. We provide additional evidence for this argument in the next experiment.

In Figure 3(a), we quantify the effect of varying d when the underlying diffusion model is IC and make the following observations: (i) The cumulative regret for both d = 10 and d = 100 is higher than that for d = 50. (ii) Laplacian regularization leads to observably lower cumulative regret when d = 100. Observation (iii) implies that d = 10 does not provide enough expressive power for linear generalization across the nodes of the network, whereas it is relatively difﬁcult to estimate 100-dimensional

u vectors within 5k rounds. Observation (iv) implies that tying source node estimates together imposes an additional bias which becomes important while learning higher dimensional coefﬁcients. This shows the potential beneﬁt of using Laplacian regularization for larger networks, where we will need higher d for linear generalization across nodes. We obtain similar results under the LT model.

In Figures 3(b) and 3(c), we show the effect of varying K on the per-step reward. We compare CUCB and the independent version of our algorithm when the underlying model is IC and LT. We make the following observations: (i) For both IC and LT, the per-step reward for all methods increases with K. (ii) For the IC model, the perstep reward for our algorithm is higher than CUCB when K = {5, 10, 20}, but the difference in the two spreads decreases with K. For K = 50, CUCB outperforms our algorithm. (iii) For the LT model, the per-step reward of our algorithm is substantially higher than CUCB for all K. Observation (i) is readily explained since both IC and LT are progressive models, and satisfy Assumption 1. To explain (ii), note that CUCB is correctly speciﬁed for the IC model. As K becomes higher, more edges become active and CUCB observes more feedback. It is thus able to learn more efﬁciently, leading to a higher per-step reward compared to our algorithm when K = 50. Observation (iii) again demonstrates that CUCB is not robust to diffusion model misspeciﬁcation, while DILin UCB is.

Model-Independent Online Learning for Inﬂuence Maximization

0 1000 2000 3000 4000 5000 0

Number of rounds

Cumulative Regret

CUCB(10) TAB(10) I(10,50) L(10,50)

(a) IC Model

0 1000 2000 3000 4000 5000 0

Number of rounds

Cumulative Regret

CUCB(10) TAB(10) I(10,50) L(10,50)

(b) LT Model

Figure 2. Comparing DILin UCB and CUCB on the Facebook subgraph with K = 10.

0 1000 2000 3000 4000 5000 0

Number of rounds

Cumulative Regret

CUCB(10) I(10,10) L(10,10) I(10,50) L(10,50) I(10,100) L(10,100)

(a) Effect of d in IC

0 1000 2000 3000 4000 5000 0

Number of rounds

Per step Reward

CUCB(5) CUCB(20) CUCB(50) I(5,50) I(20,50) I(50,50)

(b) Effect of K in IC

0 1000 2000 3000 4000 5000 0

Number of rounds

Per step Reward

CUCB(5) CUCB(20) CUCB(50) I(5,50) I(20,50) I(50,50)

(c) Effect of K in LT

Figure 3. Effects of varying d or K.

9. Related Work

IM semi-bandits have been studied in several recent papers (Wen et al., 2017; Chen et al., 2016; Vaswani et al., 2015; Carpentier & Valko, 2016). Chen et al. (2016) studied IM semi-bandit under edge-level feedback and the IC diffusion model. They formulated it as a combinatorial multi-armed bandit problem and proposed a UCB algorithm (CUCB). They only consider the tabular case, and derive an O(n3) regret bound that also depends on the reciprocal of the minimum observation probability p of an edge. This can be problematic in for example, a line graph with L edges where all edge weights are 0.5. Then 1/p is 2L 1, implying an exponentially large regret. Moreover, they assume that source nodes inﬂuence the target nodes independently, which is not true in most practical social networks. In contrast, both our algorithm and analysis are diffusion independent, and our analysis does not require the independent inﬂuence assumption made in (Chen et al., 2016). Our regret bound is O(n2.5) in the tabular case and O(n2d) in the general linear bandit case. Vaswani et al. (2015) use "-greedy and Thompson sampling algorithms for a different and more challenging feedback model, where the learning agent observes inﬂuenced nodes but not the edges. They do not give any theoretical guarantees. Concurrent to our work, Wen et al. (2017) consider a linear generalization model across edges and prove regret bounds under

edge-level feedback. Note that all of the above papers assume the IC diffusion model.

Carpentier & Valko (2016); Fang & Tao (2014) consider a simpler local model of inﬂuence, in which information does not transitively diffuse across the network. Lei et al. (2015) consider the related, but different, problem of maximizing the number of unique activated nodes across multiple rounds. They do not provide any theoretical analysis.

10. Conclusion

In this paper, we described a novel model-independent parametrization and a corresponding surrogate objective function for the IM problem. We used this parametrization to propose DILin UCB, a diffusion-independent learning algorithm for IM semi-bandits. We conjecture that with an appropriate generalization across source nodes, it may be possible to get a more statistically efﬁcient algorithm and get rid of an additional O(pn) factor in the regret bound. In the future, we hope to address alternate bandit algorithms such Thompson sampling, and feedback models such as node-level in Vaswani et al. (2015).

Acknowledgements: This research was supported by the Natural Sciences and Engineering Research Council of Canada.

Model-Independent Online Learning for Inﬂuence Maximization

Abbasi-Yadkori, Yasin, P al, D avid, and Szepesv ari, Csaba.

Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems, pp. 2312 2320, 2011.

Barbieri, Nicola, Bonchi, Francesco, and Manco, Giuseppe. Topic-aware social inﬂuence propagation models. Knowledge and information systems, 37(3): 555 584, 2013.

Belkin, Mikhail, Niyogi, Partha, and Sindhwani, Vikas.

Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of machine learning research, 7(Nov):2399 2434, 2006.

Carpentier, Alexandra and Valko, Michal. Revealing graph

bandits for maximizing local inﬂuence. In International Conference on Artiﬁcial Intelligence and Statistics, 2016.

Cesa-Bianchi, Nicolo, Gentile, Claudio, and Zappella, Gio-

vanni. A gang of bandits. In Advances in Neural Information Processing Systems, pp. 737 745, 2013.

Chen, Wei, Wang, Yajun, and Yang, Siyu. Efﬁcient inﬂuence maximization in social networks. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 199 208. ACM, 2009.

Chen, Wei, Wang, Yajun, Yuan, Yang, and Wang, Qin-

shi. Combinatorial multi-armed bandit and its extension to probabilistically triggered arms. Journal of Machine Learning Research, 17(50):1 33, 2016.

Dani, Varsha, Hayes, Thomas P, and Kakade, Sham M.

Stochastic linear optimization under bandit feedback. In COLT, pp. 355 366, 2008.

Du, Nan, Liang, Yingyu, Balcan, Maria-Florina, and

Song, Le. Inﬂuence Function Learning in Information Diffusion Networks. Journal of Machine Learning Research, 32:2016 2024, 2014. URL http://machinelearning.wustl.edu/mlpapers/ papers/icml2014c2{_}du14.

Evgeniou, Theodoros, Micchelli, Charles A, and Pontil,

Massimiliano. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6(Apr): 615 637, 2005.

Fang, Meng and Tao, Dacheng. Networked bandits with

disjoint linear payoffs. In Internattional Conference on Knowledge Discovery and Data Mining, 2014.

Gentile, Claudio, Li, Shuai, and Zappella, Giovanni. On-

line clustering of bandits. In Proceedings of the 31st International Conference on Machine Learning (ICML14), pp. 757 765, 2014.

Gomez Rodriguez, M, Sch olkopf, B, Pineau, Langford J,

et al. Inﬂuence maximization in continuous time diffusion networks. In 29th International Conference on Machine Learning (ICML 2012), pp. 1 8. International Machine Learning Society, 2012.

Goyal, Amit, Bonchi, Francesco, and Lakshmanan, Laks VS. Learning inﬂuence probabilities in social networks. In Proceedings of the third ACM international conference on Web search and data mining, pp. 241 250. ACM, 2010.

Goyal, Amit, Bonchi, Francesco, and Lakshmanan, Laks VS. A data-based approach to social inﬂuence maximization. Proceedings of the VLDB Endowment, 5(1):73 84, 2011a.

Goyal, Amit, Lu, Wei, and Lakshmanan, Laks VS. Sim-

path: An efﬁcient algorithm for inﬂuence maximization under the linear threshold model. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pp. 211 220. IEEE, 2011b.

Grover, Aditya and Leskovec, Jure. node2vec: Scalable

feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855 864. ACM, 2016.

Hestenes, Magnus Rudolph and Stiefel, Eduard. Methods

of conjugate gradients for solving linear systems, volume 49. 1952.

Kempe, David, Kleinberg, Jon, and Tardos, Eva. Maxi-

mizing the spread of inﬂuence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 137 146. ACM, 2003.

Krause, Andreas and Golovin, Daniel. Submodular func-

tion maximization. Tractability: Practical Approaches to Hard Problems, 3(19):8, 2012.

Kveton, Branislav, Wen, Zheng, Ashkan, Azin, and

Szepesvari, Csaba. Tight regret bounds for stochastic combinatorial semi-bandits. In AISTATS, 2015.

Lei, Siyu, Maniu, Silviu, Mo, Luyi, Cheng, Reynold, and

Senellart, Pierre. Online inﬂuence maximization. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, August 10-13, 2015, pp. 645 654, 2015.

Leskovec, Jure and Krevl, Andrej. SNAP Datasets: Stan-

ford large network dataset collection. http://snap. stanford.edu/data, June 2014.

Leskovec, Jure, Krause, Andreas, Guestrin, Carlos, Falout-

sos, Christos, Van Briesen, Jeanne, and Glance, Natalie.

Model-Independent Online Learning for Inﬂuence Maximization

Cost-effective outbreak detection in networks. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 420 429. ACM, 2007.

Leskovec, Jure, Chakrabarti, Deepayan, Kleinberg, Jon,

Faloutsos, Christos, and Ghahramani, Zoubin. Kronecker graphs: An approach to modeling networks. The Journal of Machine Learning Research, 11:985 1042, 2010.

Leskovec, Jurij, Chakrabarti, Deepayan, Kleinberg, Jon,

and Faloutsos, Christos. Realistic, mathematically tractable graph generation and evolution, using kronecker multiplication. In European Conference on Principles of Data Mining and Knowledge Discovery, pp. 133 145. Springer, 2005.

Li, Yanhua, Chen, Wei, Wang, Yajun, and Zhang, Zhi-

Li. Inﬂuence diffusion dynamics and inﬂuence maximization in social networks with friend and foe relationships. In Proceedings of the sixth ACM international conference on Web search and data mining, pp. 657 666. ACM, 2013.

Minoux, Michel. Accelerated greedy algorithms for maxi-

mizing submodular set functions. In Optimization Techniques, pp. 234 243. Springer, 1978.

Nemhauser, George L, Wolsey, Laurence A, and Fisher,

Marshall L. An analysis of approximations for maximizing submodular set functions. Mathematical Programming, 14(1):265 294, 1978.

Netrapalli, Praneeth and Sanghavi, Sujay. Learning the graph of epidemic cascades. In ACM SIGMETRICS Performance Evaluation Review, volume 40, pp. 211 222. ACM, 2012.

Perozzi, Bryan, Al-Rfou, Rami, and Skiena, Steven. Deep-

walk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 701 710. ACM, 2014.

Saito, Kazumi, Nakano, Ryohei, and Kimura, Masahiro.

Prediction of information diffusion probabilities for independent cascade model. In Knowledge-Based Intelligent Information and Engineering Systems, pp. 67 75. Springer, 2008.

Tang, Youze, Xiao, Xiaokui, and Yanchen, Shi. Inﬂuence maximization: Near-optimal time complexity meets practical efﬁciency. 2014.

Tang, Youze, Shi, Yanchen, and Xiao, Xiaokui. Inﬂuence maximization in near-linear time: A martingale approach. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD 15, pp. 1539 1554, 2015. ISBN 978-1-4503-2758-9.

Valko, Michal, Munos, R emi, Kveton, Branislav, and

Koc ak, Tom aˇs. Spectral bandits for smooth graph functions. In 31th International Conference on Machine Learning, 2014.

Vaswani, Sharan, Lakshmanan, Laks. V. S., and Mark

Schmidt. Inﬂuence maximization with bandits. Technical report, http://arxiv.org/abs/1503.00024, 2015. URL http://arxiv.org/abs/1503.00024.

Vaswani, Sharan, Schmidt, Mark, and Lakshmanan, Laks.

Horde of bandits using gaussian markov random ﬁelds. In Artiﬁcial Intelligence and Statistics, pp. 690 699, 2017.

Von Luxburg, Ulrike. A tutorial on spectral clustering. Statistics and computing, 17(4):395 416, 2007.

Wen, Zheng, Kveton, Branislav, and Ashkan, Azin. Efﬁ-

cient learning in large-scale combinatorial semi-bandits. In Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, 2015.

Wen, Zheng, Kveton, Branislav, Valko, Michal, and

Vaswani, Sharan. Online inﬂuence maximization under independent cascade model with semi-bandit feedback. ar Xiv preprint ar Xiv:1605.06593v2, 2017.