# Online Influence Maximization under Linear Threshold Model

Shuai Li^1, Fang Kong^1, Kejie Tang^1, Qizhi Li^1, Wei Chen^2
^1 Shanghai Jiao Tong University  ^2 Microsoft Research
{shuaili8,fangkong,tangkj00,qizhili}@sjtu.edu.cn, weic@microsoft.com

34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.

Abstract

Online influence maximization (OIM) is a popular problem in social networks to learn influence propagation model parameters and maximize the influence spread at the same time. Most previous studies focus on the independent cascade (IC) model under edge-level feedback. In this paper, we address OIM under the linear threshold (LT) model. Because node activations in the LT model are due to the aggregated effect of all active neighbors, it is more natural to model OIM with node-level feedback. This brings new challenges for online learning, since we only observe the aggregated effect of groups of nodes, and the groups themselves are random. Based on the linear structure of node activations, we incorporate ideas from linear bandits and design an algorithm LT-LinUCB that is consistent with the observed feedback. By proving the group observation modulated (GOM) bounded smoothness property, a novel result bounding the influence difference in terms of the random observations, we provide a regret of order $O(\mathrm{poly}(m)\sqrt{T})$, where $m$ is the number of edges and $T$ is the number of rounds. This is the first theoretical result of this order for OIM under the LT model. In the end, we also provide an algorithm OIM-ETC with regret bound $O(\mathrm{poly}(m)\,T^{2/3})$, which is model-independent, simple, and has less requirement on online feedback and offline computation.

1 Introduction

Social networks play an important role in spreading information in people's lives. In viral marketing, companies wish to broadcast their products by making use of the network structure and the characteristics of influence propagation. Specifically, they want to provide free products to selected users (seed nodes), let them advertise through the network, and maximize the purchases. There is a budget on the free products, and the goal of the companies is to select the optimal seed set to maximize the influence spread. This problem is called influence maximization (IM) [19] and has a wide range of applications including recommendation systems, link prediction and information diffusion. In the IM problem, the social network is usually modeled as a directed graph with nodes representing users and directed edges representing influence relationships between users. IM studies how to select a seed set under a given influence propagation model to maximize the influence spread when the weights are known. The independent cascade (IC) model and the linear threshold (LT) model [19] are the two most widely used models to characterize influence propagation in a social network, and both models use weights on edges as model parameters. In many real applications, however, the weights are usually unknown in advance. For example, in viral marketing, it is unrealistic to assume that the companies know the influence abilities beforehand. A possible solution is to learn those parameters from diffusion data collected in the past [6, 36]. But this method lacks the ability of adaptive learning based on the need of influence maximization.
This motivates the studies on the online influence maximization (OIM) problem [28, 10, 11, 47, 49, 50, 45, 44], where the learner tries to estimate model parameters and maximize influence in an iterative manner. The studies on OIM are based on the multi-armed bandit (MAB) problem, which is a classical online learning framework and has been well studied in the literature [27]. The MAB problem is formulated as a T-round game between a learner and the environment. In each round, the learner needs to decide which action to play and the environment then reveals a reward according to the chosen action. The objective of the learner is to accumulate as much reward as possible. An MAB algorithm needs to deal with the tradeoff between exploration and exploitation: whether the learner should try actions that have not been explored well yet (exploration) or focus on the action with the best performance so far (exploitation). Two algorithms, the explore-then-commit (ETC) [15] and the upper confidence bound (UCB) [4], are widely followed in the stochastic MAB setting, where the reward of each action follows an unknown but fixed distribution. Most existing works on OIM focus on the IC model under edge-level feedback [10, 11, 47, 49, 50], where the information propagates independently between pairs of users and the learner can observe the liveness of an individual edge as long as its source node is influenced. The independence assumption makes the formulation simple but a bit unrealistic. Often in real scenarios, the influence propagations are correlated with each other. The LT model is usually used to model the herd behavior that a person is more likely to be influenced if many of her friends are influenced [7, 17, 20]. Thus for the LT model, it is more natural to use node-level feedback, where we only observe the node activations, since it is hard to pinpoint which neighbor or neighbors actually contribute to an activation in a herd behavior. In this paper, we first formulate the OIM problem under the LT model with node-level feedback and distill effective information based on the feedback. The main challenge is that only the aggregated group effect on node activations can be observed and the aggregated groups are also random. Based on the linear structure of the LT model, we incorporate the idea of linear bandits and propose the LT-LinUCB algorithm, whose update mechanism is consistent with the distilled information. By proving group observation modulated (GOM) bounded smoothness, a key property on the influence spread under two different weight vectors, we can bound the regret. Such a property is similar to the triggering probability modulated (TPM) bounded smoothness condition under the IC model with edge-level feedback [47], but the derivation in our case under node-level feedback is more difficult. The regret is of order $O(\mathrm{poly}(m)\sqrt{T}\log T)$, where $m$ is the number of edges and $T$ is the number of rounds. Our LT-LinUCB is the first OIM algorithm under the LT model that achieves a regret of this order. Finally, we give the OIM-ETC algorithm, which applies to both the IC and LT models with node-level feedback. Though simple, it has less requirement on the observed feedback and the offline computation, and it achieves a distribution-independent regret bound of $O(\mathrm{poly}(m)\,T^{2/3})$ and a distribution-dependent bound of $O(\mathrm{poly}(m)\log(T)/\Delta^2)$.

Related Work

The problem of IM was first proposed as a discrete optimization problem by Kempe et al. [19]. Since then, various aspects of IM have been extensively studied (see [9, 31] for surveys in this area).
The two most popular models in this field are the IC and LT models. The former assumes that the influence between pairs of users is independent, and the latter characterizes the herd behavior. Some works [46, 18, 19, 42] study the IC model and some [12, 16, 19, 42] study the LT model. They all assume the weights on the edges are known and focus on model properties and approximate solutions. We treat these as the offline setting. When the weight vectors are unknown, Chen et al. [11, 47] study the problem in the online setting, selecting seed sets as well as learning the parameters. They study the IC model with edge-level feedback, propose the CUCB algorithm, and show that CUCB achieves distribution-dependent and distribution-independent regret bounds of $O(\mathrm{poly}(m)\log(T))$ and $O(\mathrm{poly}(m)\sqrt{T})$, respectively. Later, Wen et al. [49] consider the large-scale setting and assume the edge probability is a linear function of the edge's feature vector. They provide a LinUCB-based algorithm with $O(dm\sqrt{nT}\ln(T))$ worst-case regret, where $d$ is the feature dimension and $n$ is the number of nodes. Wu et al. [50] assume that each edge probability can be decomposed as the product of the influence probability of the start node and the susceptibility probability of the end node, motivated by network assortativity. All these works study the IC model with edge-level feedback. Vaswani et al. [44] use a heuristic objective function for OIM and propose a model-independent algorithm under pairwise feedback, which records whether a node is influenced by a seed node or not. This applies to both IC and LT, and the feedback scheme is more relaxed than edge-level feedback. Unfortunately, the heuristic objective has no theoretical approximation guarantee. Also, Vaswani et al. [45] study the IC model with node-level feedback, analyzing the estimation gap with respect to edge-level feedback, but provide no regret analysis. A report [43] studies the LT model with node-level feedback via optimization approaches but without theoretical guarantees. There is another work [26] studying the problem of linear multi-resource allocation, which can be formulated as a bipartite LT model. But they assume every node in the left partition (resources) is selected, and the algorithm needs to assign allocations for each pair of left node and right node (tasks), representing the corresponding allocation of resources to tasks. Thus the problem is different from our OIM. The OIM problem under LT has been open for several years. We are the first to provide a reasonable formulation with an algorithm of regret $O(\mathrm{poly}(m)\sqrt{T}\log T)$. OIM is a variant of combinatorial MAB (CMAB) [10, 22], where in each round the learner selects a combination of (base) arms. Most works [25, 24] study the stochastic setting with a linear objective and semi-bandit feedback, where the learner can observe the rewards of the selected base arms and the reward of the action is a linear function of these base arms' rewards. CMAB in the stochastic setting with a linear objective and bandit feedback, where only the linear reward of the selected combination can be observed, is a special case of linear bandits. In the linear bandit setting, the learner selects a vector each round and the reward is a linear function of the selected vector action [3]. The most popular method to solve it is to construct confidence ellipsoids [14, 1, 38]. There are also works [8, 13] for CMAB in the adversarial setting with bandit feedback.
But OIM is different: its objective function is non-linear and depends on unchosen and probabilistically triggered base arms. OIM is related to the problem of online learning with graph feedback [2], where the learner can observe the feedback of unchosen arms based on the graph structure. Though some of these works study random graphs [33, 29, 21], where the set of observed arms is random, the settings are different. Under graph feedback, the observations of unchosen arms are additional and the reward only depends on the chosen arms, while under OIM the additional observations also contribute to the reward. Cascading bandits [23, 30] also consider triggering on the selected list of arms, and the triggering follows the order of the list. Compared with the graph feedback and OIM settings, its triggering graph is determined by the learning agent, not the adversary. As a generalization of graph feedback, partial monitoring [5] is also related to OIM. Most works in this direction, if applied directly to the OIM setting, are inefficient due to the exponentially large action space. Lin et al. [32] study a combinatorial version of partial monitoring, and their algorithm provides a regret of order $O(\mathrm{poly}(m)\,T^{2/3}\log T)$ for OIM with LT. Our OIM-ETC, however, has regret bounds of $O(\mathrm{poly}(m)\,T^{2/3})$ (better in the order of $T$) as well as a problem-dependent bound $O(\mathrm{poly}(m)\log T)$.

2 Problem Setting

This section describes the setting of online influence maximization (OIM) under the linear threshold (LT) model. The IM problem characterizes how to choose the seed nodes to maximize the influence spread on a social network. The network is usually represented by a directed graph $G = (V, E)$, where $V$ is the set of users and $E$ is the set of relationships between users. Each edge $e$ is associated with a weight $w(e) \in [0, 1]$. For example, an edge $e = (u, v) =: e_{u,v}$ could represent that user $v$ follows user $u$ on Twitter, and $w(e)$ represents the influence ability of user $u$ on user $v$. Denote $w = (w(e))_{e \in E}$ to be the weight vector and $n = |V|$, $m = |E|$, $D$ to be the node number, edge number and diameter respectively, where the diameter of the graph is defined as the maximum (directed) distance between pairs of nodes in any connected component. Let $N(v) = N^{\mathrm{in}}(v)$ be the set of all incoming neighbors of $v$, shortened as in-neighbors. Recall that under the IC model, each edge is independently live with probability equal to its associated weight, and a node is influenced if there is a directed path connecting it from a seed node in the realized graph. Compared to the IC model, the LT model does not require the strong independence assumption and describes the joint influence of the active in-neighbors on a user, reflecting the herd behavior that often occurs in real life [7, 17, 20]. We now describe in detail the diffusion process under the LT model. Suppose the seed set is $S$. In the beginning, each node $v$ is assigned a threshold $\theta_v$, which is independently and uniformly drawn from $[0, 1]$ and characterizes the susceptibility level of node $v$. Denote $\theta = (\theta_v)_{v \in V}$ to be the threshold vector. Let $S_\tau$ be the set of activated nodes by the end of time step $\tau$. At time $\tau = 0$, only nodes in the seed set are activated: $S_0 = S$. At time $\tau + 1$ with $\tau \ge 0$, any node $v \notin S_\tau$ that has not been activated yet will be activated if the aggregated influence of its active in-neighbors exceeds its threshold: $\sum_{u \in N(v) \cap S_\tau} w(e_{u,v}) \ge \theta_v$. Such a diffusion process lasts at most $D$ time steps. The size of the influenced node set is denoted as $r(S, w, \theta) = |S_D|$.
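To make the diffusion dynamics concrete, here is a minimal simulation sketch; the dictionary-based graph representation, the function name, and the tiny example graph are our own illustration, not the paper's code.

```python
import random

def simulate_lt(in_neighbors, w, seeds, D):
    """One LT diffusion: in_neighbors[v] lists all u with an edge (u, v),
    w[(u, v)] is the edge weight, seeds is the seed set S.
    Returns the set of nodes active after at most D steps."""
    theta = {v: random.random() for v in in_neighbors}   # thresholds drawn uniformly from [0, 1]
    active = set(seeds)
    for _ in range(D):
        newly = set()
        for v in in_neighbors:
            if v in active:
                continue
            # aggregated weight of currently active in-neighbors of v
            mass = sum(w[(u, v)] for u in in_neighbors[v] if u in active)
            if mass >= theta[v]:
                newly.add(v)
        if not newly:
            break
        active |= newly
    return active

# Example: a 3-node chain s -> a -> b; r(S, w, theta) = len(active)
in_nb = {'s': [], 'a': ['s'], 'b': ['a']}
w = {('s', 'a'): 0.8, ('a', 'b'): 0.5}
print(len(simulate_lt(in_nb, w, {'s'}, D=2)))
```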
Let $r(S, w) = \mathbb{E}_\theta[r(S, w, \theta)]$ be the influence spread of seed set $S$, where the expectation is taken over all random thresholds $\theta_v$. The IM problem is to find the seed set $S$ of size at most $K$ under weight vector $w$ that maximizes the influence spread, $\max_{S \in \mathcal{A}} r(S, w)$, where $\mathcal{A} = \{S \subseteq V : |S| \le K\}$ is the action set for the seed nodes. We also adopt the usual assumption that $\sum_{u \in N(v)} w(e_{u,v}) \le 1$ for any $v \in V$. This assumption makes LT have an equivalent live-edge graph formulation like the IC model [19, 9]. The graph $G$ and seed size $K$ will be omitted when the context is clear. Here we emphasize that the model parameters are the weights $w$, while the threshold vector $\theta$ is not a model parameter (it follows the uniform distribution). The (offline) IM problem is NP-hard under the LT model, but it can be approximately solved [19, 42]. For a fixed weight vector $w$, let $S^{\mathrm{Opt}}_w$ be an optimal seed set and $\mathrm{Opt}_w$ be its corresponding influence spread: $S^{\mathrm{Opt}}_w \in \operatorname*{argmax}_{S \in \mathcal{A}} r(S, w)$ and $\mathrm{Opt}_w = r(S^{\mathrm{Opt}}_w, w)$. Let Oracle be an (offline) oracle that outputs a solution given the weight vector as input. Then for $\alpha, \beta \in [0, 1]$, the Oracle is an $(\alpha, \beta)$-approximation if $\mathbb{P}\big(r(S', w) \ge \alpha \cdot \mathrm{Opt}_w\big) \ge \beta$, where $S' = \mathrm{Oracle}(w)$ is a solution returned by the Oracle for the weight vector $w$. Note that when $\alpha = \beta = 1$ the oracle is exact.

The online version is to maximize the influence spread when the weight vector (the model parameter) $w = (w(e))_{e \in E}$ is unknown. In each round $t$, the learner selects a seed set $S_t$, receives the observations and then updates itself accordingly. For the type of observations, previous works on IC mostly assume edge-level feedback: the learner can observe the outgoing edges of each active node [11, 49, 50]. But for the LT model, it is not very realistic to assume the learner can observe which in-neighbor influences the target user, since the LT model characterizes the aggregate influence of a crowd. So we consider a more realistic node-level feedback² in this paper: the learner can only observe the influence diffusion process on node sets $S_{t,0}, \ldots, S_{t,\tau}, \ldots$ in round $t$. The objective of OIM is to minimize the cumulative $\alpha\beta$-scaled regret [10, 49] over $T$ rounds:
$$R(T) = \mathbb{E}\left[\sum_{t=1}^{T}\big(\alpha\beta\cdot \mathrm{Opt}_{w} - r(S_t, w)\big)\right], \qquad (1)$$
where the expectation is over the randomness of the threshold vectors and the output of the adopted offline oracle in each round. Throughout this paper, we will use round $t$ to denote a step in online learning and use time $\tau$ of round $t$ to denote an influence diffusion step of seed set $S_t$ in round $t$.

² One may think of the node-level feedback as knowing only the set of nodes activated by the end of the diffusion process. We refer to this as (partial) node-level feedback and ours as (full) node-level feedback. This naming comes from [35].

3 LT-LinUCB Algorithm

In this section, we show how to distill effective information from the feedback and propose a LinUCB-type algorithm, LT-LinUCB, for OIM under LT. For each node $v \in V$, denote $w_v = (w(e_{u,v}))_{u \in N(v)}$ to be the weight vector of its incoming edges. Let $\chi(e_{u,v}) \in \{0,1\}^{|N(v)|}$ be the one-hot representation of the edge $e_{u,v}$ over all of $v$'s incoming edges $\{e_{u,v} : u \in N(v)\}$, that is, its $e'$-entry is 1 if and only if $e' = e_{u,v}$. Then $w(e_{u,v}) = \chi(e_{u,v})^{\top} w_v$. For a subset of edges $E' \subseteq \{e_{u,v} : u \in N(v)\}$, we define $\chi(E') := \sum_{e \in E'} \chi(e) \in \{0,1\}^{|N(v)|}$ to be the vector whose $e$-entry is 1 if and only if $e \in E'$. Here we abuse the notation so that $\chi(\{e\}) = \chi(e)$. With this notation, the weight sum of the edges in $E'$ is simply written as $\chi(E')^{\top} w_v$. A subset $V' \subseteq N(v)$ of $v$'s in-neighbors can activate $v$ if the weight sum of the associated edges exceeds the threshold, that is, $\chi(E')^{\top} w_v \ge \theta_v$ with $E' = \{e_{u,v} : u \in V'\}$.
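As a quick illustration of this linear structure (our own example with made-up weights, not from the paper), the activation test for a group of active in-neighbors is just an inner product with a binary indicator vector:

```python
import numpy as np

# Node v has in-neighbors [u1, u2, u3]; w_v stacks the incoming-edge weights
w_v = np.array([0.3, 0.4, 0.2])      # w(e_{u1,v}), w(e_{u2,v}), w(e_{u3,v}); sums to at most 1
chi = np.array([1, 0, 1])            # chi(E') for the active subset {u1, u3}
theta_v = 0.45                       # threshold drawn uniformly from [0, 1]

activated = chi @ w_v >= theta_v     # chi(E')^T w_v >= theta_v
print(chi @ w_v, activated)          # ~0.5, True
```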
Fix a diffusion process $S_0, S_1, \ldots, S_\tau, \ldots$, where the seed set is $S_0$. For each node $v$, define
$$\tau_1(v) := \min\{\tau = 0, \ldots, D : N(v) \cap S_\tau \ne \emptyset\} \qquad (2)$$
as the earliest time step at which node $v$ has active in-neighbors. In particular, we set $\tau_1(v) = D + 1$ if node $v$ has no active in-neighbor before the diffusion ends. For any $\tau \ge \tau_1(v)$, further define
$$E_\tau(v) := \{e_{u,v} : u \in N(v) \cap S_\tau\} \qquad (3)$$
as the set of incoming edges associated with active in-neighbors of $v$ at time step $\tau$. Recall that the learner can only observe the aggregated influence ability of a node's active in-neighbors. Let $\tau_2(v)$ represent the time step at which node $v$ is influenced ($\tau_2(v) > \tau_1(v)$), which means that $v$'s active in-neighbors at time $\tau_2(v) - 1$ succeed in influencing it but those at time $\tau_2(v) - 2$ fail (with the convention $E_{-1}(v) = \emptyset$). Thus the definition of $\tau_2(v)$ can be written as
$$\tau_2(v) := \min\{\tau = 0, \ldots, D : \chi(E_{\tau-2}(v))^{\top} w_v < \theta_v \le \chi(E_{\tau-1}(v))^{\top} w_v\}. \qquad (4)$$
For consistency, we set $\tau_2(v) = D + 1$ if node $v$ is not influenced by the time the information diffusion ends. Based on the definitions of $\tau_1(v)$ and $\tau_2(v)$, node $v$ is not influenced at any time $\tau \in (\tau_1(v), \tau_2(v))$, which means that the set of active in-neighbors of $v$ at time step $\tau - 1$ fails to activate it. According to the rule of information diffusion under the LT model, the event that $E' \subseteq \{e_{u,v} : u \in N(v)\}$ fails to activate $v$ is equivalent to $\chi(E')^{\top} w_v < \theta_v$, which happens with probability $1 - \chi(E')^{\top} w_v$ since $\theta_v$ is uniformly drawn from $[0, 1]$. Similarly, the event that $E'$ succeeds in activating $v$ is equivalent to $\chi(E')^{\top} w_v \ge \theta_v$, which happens with probability $\chi(E')^{\top} w_v$. So for a node $v$ with active in-neighbors: if $v$ is not influenced at time step $\tau$ ($\tau_1(v) < \tau < \tau_2(v)$), then the set of $v$'s active in-neighbors by $\tau - 1$ fails to activate it, and we can use $(\chi(E_{\tau-1}(v)), 0)$ to update our belief on the unknown weight vector $w_v$; if $v$ is influenced at time step $\tau_2(v)$, then the set of $v$'s active in-neighbors by $\tau_2(v) - 1$ succeeds in activating it, and we can use $(\chi(E_{\tau_2(v)-1}(v)), 1)$ to update our belief on $w_v$; if $v$ is finally not influenced, then all of its active in-neighbors (by time step $D$) fail to activate it, and we can use $(\chi(E_{\tau_2(v)-1}(v)), 0)$ to update $w_v$, since $\tau_2(v)$ is defined as $D + 1$ in this case. Note that all of these events are correlated (they are based on the same $\theta_v$), thus we can choose at most one of them to update $w_v$ for a node $v$ with active in-neighbors. If $v$ has no active in-neighbors, we have no observation on $w_v$ and can update nothing.
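To connect these three cases with the algorithm's update, here is a minimal sketch (our own illustration; the trace and graph containers are hypothetical) that extracts $\tau_1(v)$, $\tau_2(v)$ and the single training pair $(E_\tau(v), y)$ that one observed diffusion contributes for a node $v$:

```python
import random

def distill_observation(trace, in_neighbors, v, D):
    """trace[tau] is the active set S_tau of one observed diffusion (tau = 0, 1, ...).
    Returns one training pair (E_tau(v), y) for node v, or None if nothing is observed."""
    def E(tau):
        # E_tau(v): incoming edges of v whose source is active by time step tau
        S_tau = trace[min(tau, len(trace) - 1)]        # active sets are monotone, reuse the last one
        return {(u, v) for u in in_neighbors[v] if u in S_tau}

    tau1 = next((t for t in range(D + 1) if E(t)), D + 1)   # first step with active in-neighbors
    hit = [t for t in range(len(trace)) if v in trace[t]]
    tau2 = hit[0] if hit else D + 1                         # step at which v itself becomes active
    if tau1 > D or tau2 <= tau1:                            # no usable observation (e.g. v is a seed)
        return None
    tau = random.choice(range(tau1, tau2))                  # uniform over {tau_1(v), ..., tau_2(v) - 1}
    y = 1 if (tau == tau2 - 1 and tau2 <= D) else 0         # label 1 only for the step that activated v
    return E(tau), y
```

Only one time step is used per node because, as noted above, all of these events depend on the same threshold $\theta_v$ and are therefore correlated.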
Figure 1: An example of a diffusion process starting from $S = \{s\}$ under LT. The upper part describes an influence diffusion process, where yellow nodes represent the nodes influenced by the current time. The lower part describes the sets $E_{\tau_1}$ and $E_{\tau_2-1}$, where we use blue (red) color to represent the edges and the associated active in-neighbors in $E_{\tau_1}$ ($E_{\tau_2-1}$, respectively) for the objective black node.

Figure 1 gives an example of a diffusion process and the definitions of the edge sets $E_{\tau_1}$ and $E_{\tau_2-1}$. The diffusion process is illustrated by the upper four figures, where the set $S_\tau$ of influenced nodes by time $\tau$ is colored yellow. The lower five columns represent the sets $E_{\tau_1}$, $E_{\tau_2-1}$ for different nodes. For example, node $v_7$ has active in-neighbors starting from $\tau = 1$, thus $\tau_1(v_7) = 1$ and $E_{\tau_1(v_7)}(v_7) = \{e_{u,v_7} : u \in N(v_7) \cap S_1\} = \{e_{v_1,v_7}, e_{v_2,v_7}\}$. And $v_7$ is influenced at $\tau = 3$, thus $\tau_2(v_7) = 3$ and $E_{\tau_2(v_7)-1}(v_7) = \{e_{u,v_7} : u \in N(v_7) \cap S_2\} = \{e_{v_1,v_7}, e_{v_2,v_7}, e_{v_5,v_7}\}$. Node $v_6$ has no active in-neighbors, thus $\tau_1(v_6) = \tau_2(v_6) = D + 1$, and both its $E_{\tau_1(v_6)}(v_6)$ and $E_{\tau_2(v_6)-1}(v_6)$ are empty sets.

The above describes how to distill key observations from a diffusion under the LT model and also explains the update rule in the design of the algorithm. Denote $\tau_1, \tau_2, E_\tau$ at round $t$ as $\tau_{t,1}, \tau_{t,2}, E_{t,\tau}$, and the diffusion process at round $t$ as $S_{t,0}, \ldots, S_{t,\tau}, \ldots$. Here we slightly abuse the notation $S$ to represent both the seed set and the spread set in a round when the context is clear. Our algorithm LT-LinUCB is given in Algorithm 1. It maintains the Gramian matrix $M_v$ and the moment vector $b_v$ of regressand by regressors to store the information for $w_v$. At each round $t$, the learner first computes the confidence ellipsoid for $w_v$ based on the current information (line 4); see the following lemma.

Lemma 1. Given $\{(A_t, y_t)\}_{t=1}^{\infty}$ with $A_t \in \{0,1\}^N$ and $y_t \in \{0,1\}$ a Bernoulli random variable with $\mathbb{E}[y_t \mid A_1, y_1, \ldots, A_{t-1}, y_{t-1}, A_t] = A_t^{\top} w_v$, let $M_t = I + \sum_{s=1}^{t} A_s A_s^{\top}$, $b_t = \sum_{s=1}^{t} y_s A_s$ and $\hat{w}_t = M_t^{-1} b_t$ be the (ridge) linear regression estimator. Then with probability at least $1 - \delta$, for all $t \ge 1$, it holds that $w_v$ lies in the confidence set
$$\left\{ w' \in [0,1]^N : \|w' - \hat{w}_t\|_{M_t} \le \sqrt{N \log(1 + tN) + 2\log\tfrac{1}{\delta}} + \sqrt{N} \right\}.$$

This lemma is a direct corollary of [1, Theorem 2] on the concentration of the weight vector $w_v$. Thus when $\rho_{t,v} \ge \sqrt{|N(v)| \log(1 + t|N(v)|) + 2\log\frac{1}{\delta}} + \sqrt{|N(v)|}$, the true weight vector $w_v$ lies in the confidence set $C_{t,v}$ (line 4) for any $t$ with probability at least $1 - \delta$.

Algorithm 1 LT-LinUCB
1: Input: Graph G = (V, E); seed set cardinality K; exploration parameters ρ_{t,v} > 0 for all t, v; offline oracle PairOracle
2: Initialize: M_{0,v} ← I ∈ R^{|N(v)|×|N(v)|}, b_{0,v} ← 0 ∈ R^{|N(v)|×1}, ŵ_{0,v} ← 0 ∈ R^{|N(v)|×1} for any node v ∈ V
3: for t = 1, 2, 3, ... do
4:   Compute the confidence ellipsoid C_{t,v} = {w'_v ∈ [0,1]^{|N(v)|×1} : ||w'_v − ŵ_{t−1,v}||_{M_{t−1,v}} ≤ ρ_{t,v}} for any node v ∈ V
5:   Compute the pair (S_t, w_t) by PairOracle with confidence set C_t = {C_{t,v}}_{v∈V} and seed set cardinality K
6:   Select the seed set S_t and observe the feedback
7:   // Update
8:   for node v ∈ V do
9:     Initialize A_{t,v} ← 0 ∈ R^{|N(v)|×1}, y_{t,v} ← 0 ∈ R
10:    Uniformly randomly choose τ ∈ {τ' : τ_{t,1}(v) ≤ τ' ≤ τ_{t,2}(v) − 1}
11:    if v is influenced and τ = τ_{t,2}(v) − 1 then
12:      A_{t,v} = χ(E_{t,τ}(v)), y_{t,v} = 1
13:    else if τ = τ_{t,1}(v), ..., τ_{t,2}(v) − 2, or τ = τ_{t,2}(v) − 1 but v is not influenced then
14:      A_{t,v} = χ(E_{t,τ}(v)), y_{t,v} = 0
15:    end if
16:    M_{t,v} ← M_{t−1,v} + A_{t,v} A_{t,v}^⊤,  b_{t,v} ← b_{t−1,v} + y_{t,v} A_{t,v},  ŵ_{t,v} = M_{t,v}^{−1} b_{t,v}
17:  end for
18: end for
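As a complement to the pseudocode, here is a minimal numpy sketch of the per-node statistics of lines 9-16 and the confidence radius of Lemma 1 (variable and class names are ours; the PairOracle of line 5 is left abstract and is not sketched here):

```python
import numpy as np

class NodeStats:
    """Ridge-regression statistics kept by LT-LinUCB for one node v (N = |N(v)|)."""
    def __init__(self, N):
        self.M = np.eye(N)                  # Gramian M_v = I + sum_s A_s A_s^T
        self.b = np.zeros(N)                # moment vector b_v = sum_s y_s A_s
        self.w_hat = np.zeros(N)            # ridge estimate of w_v

    def radius(self, t, delta):
        # confidence radius of Lemma 1: sqrt(N log(1 + tN) + 2 log(1/delta)) + sqrt(N)
        N = len(self.b)
        return np.sqrt(N * np.log(1 + t * N) + 2 * np.log(1 / delta)) + np.sqrt(N)

    def update(self, A, y):
        # line 16 of Algorithm 1: M <- M + A A^T, b <- b + y A, w_hat = M^{-1} b
        self.M += np.outer(A, A)
        self.b += y * A
        self.w_hat = np.linalg.solve(self.M, self.b)

# e.g. node v with 3 in-neighbors; the active group {u1, u3} failed to activate v
stats = NodeStats(3)
stats.update(np.array([1.0, 0.0, 1.0]), 0)
print(stats.w_hat, stats.radius(t=1, delta=0.05))
```

The confidence ellipsoid $C_{t,v}$ of line 4 is then $\{w' : \|w' - \hat{w}_{t-1,v}\|_{M_{t-1,v}} \le \rho_{t,v}\}$ with $\rho_{t,v}$ set to this radius; the joint maximization of line 5 over all such ellipsoids is delegated to the PairOracle.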
Given the confidence set $C_v$ for $w_v$, the algorithm expects to select the seed set by solving the weight-constrained influence maximization (WCIM) problem
$$\operatorname*{argmax}_{(S, w') : S \in \mathcal{A},\, w' \in C}\; r(S, w'). \qquad (5)$$
This (offline) optimization problem turns out to be highly nontrivial. Since we want to focus more on the online learning solution, we defer the full discussion of the offline optimization, including its general difficulty and our proposed approximate algorithms for certain graph classes such as directed acyclic graphs, to Appendix B. Suppose its best solution is $(S^{\mathrm{POpt}}_{C}, w^{\mathrm{POpt}}_{C})$, where P stands for "pair". Let PairOracle be an offline oracle that solves this optimization problem. We say PairOracle is an $(\alpha, \beta)$-approximation oracle if $\mathbb{P}\big(r(S', w') \ge \alpha \cdot r(S^{\mathrm{POpt}}_{C}, w^{\mathrm{POpt}}_{C})\big) \ge \beta$, where $(S', w')$ is an output of the oracle when the confidence set is $C$. The algorithm then runs with the seed set output by the PairOracle under the confidence set $C_t = \{C_{t,v}\}_{v \in V}$ (line 5). After observing the diffusion process (line 6), for each node $v$ with active in-neighbors, we uniformly randomly choose a time step $\tau \in \{\tau_1(v), \ldots, \tau_2(v) - 1\}$ and use the active in-neighbors at that step to update (line 10). Specifically, if $v$ is influenced and $\tau = \tau_2(v) - 1$, then the set of active in-neighbors at time step $\tau$ succeeds in activating $v$, and we use $(\chi(E_{t,\tau}(v)), 1)$ to update (line 12); if $\tau = \tau_1(v), \ldots, \tau_2(v) - 2$, or $\tau = \tau_2(v) - 1$ but node $v$ is not influenced, then the set of active in-neighbors at $\tau$ fails to activate node $v$, and we use $(\chi(E_{t,\tau}(v)), 0)$ to update (line 14). These updates are consistent with the distilled observations we obtain for nodes with active in-neighbors. For a node $v$ without active in-neighbors, we have no observation on $w_v$ and do not update it, since the set $\{\tau' : \tau_1(v) \le \tau' \le \tau_2(v) - 1\}$ is empty in this case. For example, in Figure 1, node $v_7$ has active in-neighbors from $\tau_1(v_7) = 1$ and is influenced at $\tau_2(v_7) = 3$. LT-LinUCB will uniformly randomly choose $\tau \in \{1, 2\}$ (line 10). It updates with $(A_{v_7} = \chi(E_1(v_7)), y_{v_7} = 0)$ if $\tau = 1$ (line 14) and with $(A_{v_7} = \chi(E_2(v_7)), y_{v_7} = 1)$ otherwise (line 12). Nodes $v_1, v_2, v_3$ all have $\tau_1 = 0$ and $\tau_2 = 1$. Thus for these three nodes, the algorithm chooses $\tau = 0$ (line 10) and updates with $(A_v = \chi(E_0(v)), y_v = 1)$ (line 12). Node $v_4$ has active in-neighbors from $\tau_1(v_4) = 1$ but is not influenced in the end; the algorithm will uniformly randomly choose $\tau \in \{1, 2, \ldots, D\}$ and update with $(A_{v_4} = \chi(E_\tau(v_4)), y_{v_4} = 0)$ (line 14). Node $v_6$ has no active in-neighbors, so we have no observation for its weight vector and do not update it.

3.1 Regret Analysis

We now provide the group observation modulated (GOM) bounded smoothness property for the LT model, an important relationship between the influence spreads under two weight vectors. It plays a crucial role in the regret analysis: it states that the difference of the influence spread $r(S, w)$ under two weight vectors can be bounded in terms of the weight differences on the distilled observed edge sets under one weight vector. It is conceptually similar to the triggering probability modulated (TPM) bounded smoothness condition under the IC model with edge-level feedback [47], but its derivation and usage are quite different. For the seed set $S$, define the set of all nodes related to a node $v$, $V_{S,v}$, to be the set of nodes that lie on any path from $S$ to $v$ in graph $G$.

Theorem 1 (GOM bounded smoothness). For any two weight vectors $w, w' \in [0,1]^m$ with $\sum_{u \in N(v)} w(e_{u,v}) \le 1$ for every $v$, the difference of their influence spreads for any seed set $S$ can be bounded as
$$|r(S, w') - r(S, w)| \le \mathbb{E}\left[\sum_{v \in V \setminus S}\; \sum_{u \in V_{S,v}}\; \sum_{\tau = \tau_1(u)}^{\tau_2(u) - 1} \Big| \sum_{e \in E_\tau(u)} \big(w'(e) - w(e)\big) \Big|\right], \qquad (6)$$
where the definitions of $\tau_1(u)$, $\tau_2(u)$ and $E_\tau(u)$ are all under the weight vector $w$, and the expectation is taken over the randomness of the thresholds on the nodes.

This theorem connects the reward difference with the weight differences on the distilled observations, which are also the information used to update the algorithm (lines 7-17). It links the effective observations, the updates of the algorithm, and the regret analysis. The proof needs to deal with an intricate dependency among activation events and is put in Appendix A.1 due to the space constraint.
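To see how Theorem 1 enters the analysis, here is a heuristic per-round outline (our own loose sketch, not the paper's exact derivation; the rigorous argument, including the factors contributed by the random choice of a single update step, $\gamma(G)$ and $D$, is in Appendix A). Step (i) is optimism, on the event that $w \in C_t$ and the PairOracle succeeds (probability at least $\beta$); step (ii) is the GOM bound of Theorem 1 applied to $w$ and $w_t$; step (iii) is Cauchy-Schwarz in the $M_{t-1,u}$ norm together with $w_u, w_{t,u} \in C_{t,u}$:
$$\alpha\,\mathrm{Opt}_w - r(S_t, w) \;\overset{\text{(i)}}{\le}\; r(S_t, w_t) - r(S_t, w) \;\overset{\text{(ii)}}{\le}\; \mathbb{E}\Big[\sum_{v \in V\setminus S_t}\sum_{u \in V_{S_t,v}}\sum_{\tau=\tau_1(u)}^{\tau_2(u)-1}\big|\chi(E_{t,\tau}(u))^{\top}(w_{t,u}-w_u)\big|\Big] \;\overset{\text{(iii)}}{\le}\; 2\rho_t\,\mathbb{E}\Big[\sum_{v,u,\tau}\big\|\chi(E_{t,\tau}(u))\big\|_{M_{t-1,u}^{-1}}\Big].$$
Summing over $t$ and controlling the elliptical-potential terms $\|\chi(E_{t,\tau}(u))\|_{M_{t-1,u}^{-1}}$ is what produces the $\sqrt{T}$-type dependence in Theorem 2 below.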
For seed set $S \in \mathcal{A}$ and node $u \in V \setminus S$, define $N_{S,u} := \sum_{v \in V \setminus S} \mathbb{1}\{u \in V_{S,v}\} \le n - K$ to be the number of nodes that $u$ is relevant to. Then for the vector $N_S = (N_{S,u})_{u \in V}$, define the upper bound of its $L_2$-norm over all feasible seed sets,
$$\gamma(G) := \max_{S \in \mathcal{A}} \|N_S\|_2 \le (n - K)\sqrt{n} = O(n^{3/2}),$$
which is a constant related to the graph. Then we have the following regret bound.

Theorem 2. Suppose LT-LinUCB runs with an $(\alpha, \beta)$-approximation PairOracle and parameter $\rho_{t,v} \ge \sqrt{n\log(1 + tn) + 2\log\frac{1}{\delta}} + \sqrt{n}$ for any node $v \in V$. Then the $\alpha\beta$-scaled regret satisfies
$$R(T) \le 2\rho_T\, \gamma(G)\, D\, \sqrt{mnT\log(1 + T)/\log(1 + n)} + n\delta T(n - K). \qquad (7)$$
When $\delta = 1/(n\sqrt{T})$, $R(T) \le C\,\gamma(G)\, D\, n\sqrt{mT}\log(T)$ for some universal constant $C$. Due to space limits, the proof and the detailed discussions, as well as the values of $\gamma(G)$, are put in Appendix A.

4 The Explore-then-Commit Algorithm

This section presents the explore-then-commit (ETC) algorithm for OIM. Though simple, it is efficient and model-independent, applying to both the LT and IC models with less requirement on feedback and offline computation. Recall that under the LT model, a node $v$ is activated if the sum of weights from active in-neighbors exceeds the threshold $\theta_v$, which is uniformly drawn from $[0, 1]$. Since the feedback is node-level, if the activated node $v$ has more than one active in-neighbor, then we can only observe the group influence effect of her active in-neighbors instead of each single in-neighbor. A simple way to overcome this limitation and directly observe the single weight $w(e_{u,v})$ is to select a single seed $\{u\}$ and take only the first-step influence as feedback; this yields our OIM-ETC algorithm (Algorithm 2), the ETC algorithm for the OIM problem.

Algorithm 2 OIM-ETC
1: Input: G = (V, E), seed size K, exploration budget k, time horizon T, offline oracle Oracle
2: for s ∈ [k], u ∈ V do
3:   Choose {u} as the seed set
4:   X_s(e_{u,v}) := 1{v is activated} for any v ∈ N^out(u)
5: end for
6: Compute ŵ(e) := (1/k) Σ_{s=1}^{k} X_s(e) for any e ∈ E
7: Ŝ = Oracle(ŵ)
8: for the remaining T − nk rounds do
9:   Choose Ŝ as the seed set
10: end for

Our OIM-ETC takes the exploration budget $k$ as an input parameter such that each node $u$ is selected as the (single) seed for $k$ rounds (line 3). For each round in which $u$ is the seed, each outgoing neighbor (shortened as out-neighbor) $v \in N^{\mathrm{out}}(u)$ will be activated in the first step with probability $\mathbb{P}(w(e_{u,v}) \ge \theta_v) = w(e_{u,v})$, since the threshold $\theta_v$ is independently and uniformly drawn from $[0, 1]$. Thus the first-step node-level feedback is actually edge-level feedback, and we can observe the independent edges from the first-step feedback (line 4). Since each node is selected $k$ times, we have $k$ observations of Bernoulli random variables with expectation $w(e_{u,v})$ in this exploration phase. We then take the empirical estimate $\hat{w}(e)$ of each $w(e)$ (line 6) after the exploration and run with the seed set output by the offline Oracle (line 7) for the remaining $T - nk$ exploitation rounds (line 9). We assume the offline Oracle is an $(\alpha, \beta)$-approximation. Since OIM-ETC only needs the first step of the diffusion process and calls the usual IM oracle only once, it is efficient and has fewer requirements.
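A compact sketch of this procedure (our own illustration; `first_step_activations` stands in for running one round with seed set {u} and observing only the first diffusion step, and `oracle` for any offline IM solver):

```python
def oim_etc(nodes, out_neighbors, k, oracle, first_step_activations):
    """Explore-then-commit for OIM: estimate each edge weight from k single-seed
    rounds, then commit to the oracle's seed set for the remaining rounds."""
    counts = {(u, v): 0.0 for u in nodes for v in out_neighbors[u]}
    for _ in range(k):                                    # exploration: n * k rounds in total
        for u in nodes:
            activated = first_step_activations(u)         # first-step feedback of seed set {u}
            for v in out_neighbors[u]:
                counts[(u, v)] += float(v in activated)   # Bernoulli sample with mean w(e_{u,v})
    w_hat = {e: c / k for e, c in counts.items()}         # empirical edge-weight estimates
    return oracle(w_hat)                                   # committed seed set for the remaining rounds
```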
By selecting a reasonable $k$, we can derive good regret bounds. Before that we need two definitions.

Definition 1 (Bad seed set). A seed set $S$ is bad if $r(S, w) < \alpha \cdot \mathrm{Opt}_w$. The set of bad seed sets is $S_B := \{S \mid r(S, w) < \alpha \cdot \mathrm{Opt}_w\}$.

Definition 2 (Gaps of bad seed sets). For a bad seed set $S \in S_B$, its gap is defined as $\Delta_S := \alpha \cdot \mathrm{Opt}_w - r(S, w)$. The maximum and minimum gaps are defined as
$$\Delta_{\max} := \alpha \cdot \mathrm{Opt}_w - \min\{r(S, w) \mid S \in S_B\}, \qquad (8)$$
$$\Delta_{\min} := \alpha \cdot \mathrm{Opt}_w - \max\{r(S, w) \mid S \in S_B\}. \qquad (9)$$

Theorem 3. With an exploration budget of $k = O\big(m^2 n^2 \log(T)/\Delta_{\min}^2\big)$, the $\alpha\beta$-scaled regret of our OIM-ETC algorithm over $T$ rounds satisfies
$$R(T) \le \min\Big\{ T\Delta_{\max},\; n\Delta_{\max} + 2 m^2 n^3 \Delta_{\max}\log(T)/\Delta_{\min}^2 \Big\}.$$
With $k = \big\lceil 3.9\,(m^2 T/n)^{2/3} \big\rceil$, the $\alpha\beta$-scaled regret of OIM-ETC over $T$ rounds satisfies
$$R(T) \le 3.9\,(mn)^{4/3} T^{2/3} + 1 = O\big((mn)^{4/3} T^{2/3}\big).$$

The proof of the problem-dependent bound follows routine ideas for ETC algorithms, but the proof of the problem-independent bound is new. The proofs and discussions are put in Appendix C.

5 Conclusion

In this paper, we formulate the problem of OIM under the LT model with node-level feedback and design how to distill effective information from the observations. We prove a novel GOM bounded smoothness property for the spread function, which relates the limited observations, the algorithm updates and the regret analysis. We propose the LT-LinUCB algorithm, provide rigorous theoretical analysis, and show a competitive regret bound of $O(\mathrm{poly}(m)\sqrt{T}\ln(T))$. Our LT-LinUCB is the first algorithm for the LT model with such a regret order. Besides, we design the OIM-ETC algorithm with theoretical analysis of its distribution-dependent and distribution-independent regret bounds. The algorithm is efficient, applies to both the LT and IC models, and has fewer requirements on feedback and offline computation. In studying OIM under the LT model, we encounter the optimization problem of weight-constrained influence maximization (WCIM). Revisiting an (offline) optimization problem by relaxing some fixed parameter to an element of a convex set is expected to be common in online learning, so we believe this problem could be of independent interest. The OIM problem under the IC model with node-level feedback is also an interesting direction for future work. Our regret analysis goes through thanks to the linearity of the LT model, but the local triggering is nonlinear in the IC model, so we expect more challenges in the design and analysis for the IC model with node-level feedback. Applying Thompson sampling to influence maximization is also an interesting future direction, but it could be challenging, since Thompson sampling may not work well with offline approximation oracles, as pointed out in [48].

Acknowledgement

This work is sponsored by the Shanghai Sailing Program. We thank Chihao Zhang for valuable discussions.

Broader Impact

Spread happens everywhere, whether a company wants to advertise its products, a group wants to seek attention, or a virus like COVID-19 travels between places. The modelling of a spread is usually based on a directed graph and some diffusion assumptions. The linear threshold (LT) model, considered in our paper, is one of the most popular such models. Influence maximization is the problem of pursuing the widest spread, while the online version is to adaptively achieve this goal in an interactive manner with no knowledge of the parameters. Compared with application research, theoretical analysis provides guarantees for the designed algorithms. Theoretical work usually accounts for extreme cases to make sure the algorithm truly has a good property, while in application research the possible experiments are always limited and usually do not reflect corner cases. The latter might be an issue if we want to transfer the model to unseen scenarios, e.g., autonomous driving. Also, among our derivations, we reveal a class of optimization problems (i.e., WCIM), which we expect to arise frequently when the community solves (offline) problems in an online manner without knowledge of the parameters.
This class, as a theoretical problem, is highly non-trivial and can introduce new problems to the optimization area. The hardness of this arising problem also reflects the difficulty of transforming offline or heuristic solutions into online or automatic ones, as in the AutoML direction and the topic of automatically tuning hyper-parameters.

References

[1] Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems, pages 2312-2320, 2011.
[2] Noga Alon, Nicolo Cesa-Bianchi, Ofer Dekel, and Tomer Koren. Online learning with feedback graphs: Beyond bandits. In Annual Conference on Learning Theory, volume 40. Microtome Publishing, 2015.
[3] Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3:397-422, 2002.
[4] Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multi-armed bandit problem. Machine Learning, 47(2-3):235-256, 2002.
[5] Gábor Bartók, Dean P Foster, Dávid Pál, Alexander Rakhlin, and Csaba Szepesvári. Partial monitoring: classification, regret bounds, and algorithms. Mathematics of Operations Research, 39(4):967-997, 2014.
[6] Simon Bourigault, Sylvain Lamprier, and Patrick Gallinari. Representation learning for information diffusion through social networks: An embedded cascade model. In Proceedings of the 9th ACM International Conference on Web Search and Data Mining, pages 573-582, 2016.
[7] Damon Centola and Michael Macy. Complex contagions and the weakness of long ties. American Journal of Sociology, 113(3):702-734, 2007.
[8] Nicolo Cesa-Bianchi and Gábor Lugosi. Combinatorial bandits. Journal of Computer and System Sciences, 78(5):1404-1422, 2012.
[9] Wei Chen, Laks V. S. Lakshmanan, and Carlos Castillo. Information and Influence Propagation in Social Networks. Morgan & Claypool Publishers, 2013.
[10] Wei Chen, Yajun Wang, and Yang Yuan. Combinatorial multi-armed bandit: General framework, results and applications. In Proceedings of the 30th International Conference on Machine Learning, pages 151-159, 2013.
[11] Wei Chen, Yajun Wang, Yang Yuan, and Qinshi Wang. Combinatorial multi-armed bandit and its extension to probabilistically triggered arms. The Journal of Machine Learning Research, 17(1):1746-1778, 2016.
[12] Wei Chen, Yifei Yuan, and Li Zhang. Scalable influence maximization in social networks under the linear threshold model. In Proceedings of the 2010 IEEE International Conference on Data Mining, pages 88-97, 2010.
[13] Richard Combes, Mohammad Sadegh Talebi Mazraeh Shahi, Alexandre Proutiere, et al. Combinatorial bandits revisited. In Advances in Neural Information Processing Systems, pages 2116-2124, 2015.
[14] Varsha Dani, Thomas P Hayes, and Sham M Kakade. Stochastic linear optimization under bandit feedback. Citeseer, 2008.
[15] Aurélien Garivier, Tor Lattimore, and Emilie Kaufmann. On explore-then-commit strategies. In Advances in Neural Information Processing Systems, pages 784-792, 2016.
[16] Amit Goyal, Wei Lu, and Laks V. S. Lakshmanan. SIMPATH: An efficient algorithm for influence maximization under the linear threshold model. In Proceedings of the 2011 IEEE 11th International Conference on Data Mining, pages 211-220, 2011.
[17] Mark Granovetter. Threshold models of collective behavior. American Journal of Sociology, 83(6):1420-1443, 1978.
[18] Kyomin Jung, Wooram Heo, and Wei Chen. IRIE: Scalable and robust influence maximization in social networks.
In Proceedings of the 2012 IEEE 12th International Conference on Data Mining, pages 918-923, 2012.
[19] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 137-146, 2003.
[20] Elias Boutros Khalil, Bistra Dilkina, and Le Song. Scalable diffusion-aware optimization of network topology. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1226-1235, 2014.
[21] Tomáš Kocák, Gergely Neu, and Michal Valko. Online learning with Erdős-Rényi side-observation graphs. In Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence, pages 339-346, 2016.
[22] Fang Kong, Qizhi Li, and Shuai Li. Survey on online influence maximization. Computer Science, 47(5):7-13, 2020.
[23] Branislav Kveton, Csaba Szepesvari, Zheng Wen, and Azin Ashkan. Cascading bandits: Learning to rank in the cascade model. In Proceedings of the 32nd International Conference on Machine Learning, pages 767-776, 2015.
[24] Branislav Kveton, Zheng Wen, Azin Ashkan, Hoda Eydgahi, and Brian Eriksson. Matroid bandits: Fast combinatorial optimization with learning. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, pages 420-429, 2014.
[25] Branislav Kveton, Zheng Wen, Azin Ashkan, and Csaba Szepesvari. Tight regret bounds for stochastic combinatorial semi-bandits. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, pages 535-543, 2015.
[26] Tor Lattimore, Koby Crammer, and Csaba Szepesvári. Linear multi-resource allocation with semi-bandit feedback. In Advances in Neural Information Processing Systems, pages 964-972, 2015.
[27] Tor Lattimore and Csaba Szepesvári. Bandit Algorithms. Cambridge University Press, 2020.
[28] Siyu Lei, Silviu Maniu, Luyi Mo, Reynold Cheng, and Pierre Senellart. Online influence maximization. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 645-654, 2015.
[29] Shuai Li, Wei Chen, Zheng Wen, and Kwong-Sak Leung. Stochastic online learning with probabilistic graph feedback. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, 2020.
[30] Shuai Li, Baoxiang Wang, Shengyu Zhang, and Wei Chen. Contextual combinatorial cascading bandits. In Proceedings of the 33rd International Conference on Machine Learning, pages 1245-1253, 2016.
[31] Yuchen Li, Ju Fan, Yanhao Wang, and Kian-Lee Tan. Influence maximization on social graphs: A survey. IEEE Transactions on Knowledge and Data Engineering, 30(10):1852-1872, 2018.
[32] Tian Lin, Bruno Abrahao, Robert Kleinberg, John Lui, and Wei Chen. Combinatorial partial monitoring game with linear feedback and its applications. In Proceedings of the 31st International Conference on Machine Learning, pages 901-909, 2014.
[33] Fang Liu, Swapna Buccapatnam, and Ness Shroff. Information directed sampling for stochastic bandits with graph feedback. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018.
[34] Elchanan Mossel and Sebastien Roch. Submodularity of influence in social networks: From local to global. SIAM Journal on Computing, 39(6):2176-2188, 2010.
[35] Harikrishna Narasimhan, David C Parkes, and Yaron Singer. Learnability of influence in networks. In Advances in Neural Information Processing Systems, pages 3186-3194, 2015.
[36] Praneeth Netrapalli and Sujay Sanghavi.
Learning the graph of epidemic cascades. In Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, pages 211-222, 2012.
[37] Parikshit Ram and Alexander G Gray. Maximum inner-product search using cone trees. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 931-939, 2012.
[38] Paat Rusmevichientong and John N Tsitsiklis. Linearly parameterized bandits. Mathematics of Operations Research, 35(2):395-411, 2010.
[39] Fumin Shen, Wei Liu, Shaoting Zhang, Yang Yang, and Heng Tao Shen. Learning binary codes for maximum inner product search. In Proceedings of the IEEE International Conference on Computer Vision, pages 4148-4156, 2015.
[40] Anshumali Shrivastava and Ping Li. Asymmetric LSH (ALSH) for sublinear time maximum inner product search (MIPS). In Advances in Neural Information Processing Systems, pages 2321-2329, 2014.
[41] Anthony Man-Cho So, Yinyu Ye, and Jiawei Zhang. A unified theorem on SDP rank reduction. Mathematics of Operations Research, 33(4):910-920, 2008.
[42] Youze Tang, Yanchen Shi, and Xiaokui Xiao. Influence maximization in near-linear time: A martingale approach. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1539-1554, 2015.
[43] Sharan Vaswani and Nayantara Duttachoudhury. Learning influence diffusion probabilities under the linear threshold model. GitHub pages, 2013. https://vaswanis.github.io/social_networks_report.pdf.
[44] Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks V. S. Lakshmanan, and Mark Schmidt. Model-independent online learning for influence maximization. In Proceedings of the 34th International Conference on Machine Learning, pages 3530-3539, 2017.
[45] Sharan Vaswani, Laks V. S. Lakshmanan, Mark Schmidt, et al. Influence maximization with bandits. arXiv preprint arXiv:1503.00024, 2015.
[46] Chi Wang, Wei Chen, and Yajun Wang. Scalable influence maximization for independent cascade model in large-scale social networks. Data Mining and Knowledge Discovery, 25(3):545-576, 2012.
[47] Qinshi Wang and Wei Chen. Improving regret bounds for combinatorial semi-bandits with probabilistically triggered arms and its applications. In Advances in Neural Information Processing Systems, pages 1161-1171, 2017.
[48] Siwei Wang and Wei Chen. Thompson sampling for combinatorial semi-bandits. In Proceedings of the 35th International Conference on Machine Learning, pages 5114-5122, 2018.
[49] Zheng Wen, Branislav Kveton, Michal Valko, and Sharan Vaswani. Online influence maximization under independent cascade model with semi-bandit feedback. In Advances in Neural Information Processing Systems, pages 3022-3032, 2017.
[50] Qingyun Wu, Zhige Li, Huazheng Wang, Wei Chen, and Hongning Wang. Factorization bandits for online influence maximization. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 636-646, 2019.