# Online Influence Maximization under Linear Threshold Model

Shuai Li^1, Fang Kong^1, Kejie Tang^1, Qizhi Li^1, Wei Chen^2
^1 Shanghai Jiao Tong University  ^2 Microsoft Research
{shuaili8,fangkong,tangkj00,qizhili}@sjtu.edu.cn, weic@microsoft.com

34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.

Abstract

Online influence maximization (OIM) is a popular problem in social networks to learn influence propagation model parameters and maximize the influence spread at the same time. Most previous studies focus on the independent cascade (IC) model under edge-level feedback. In this paper, we address OIM under the linear threshold (LT) model. Because node activations in the LT model are due to the aggregated effect of all active neighbors, it is more natural to model OIM with node-level feedback. This brings new challenges for online learning, since we only observe the aggregated effect of groups of nodes, and the groups themselves are random. Based on the linear structure of node activations, we incorporate ideas from linear bandits and design an algorithm LT-LinUCB that is consistent with the observed feedback. By proving the group observation modulated (GOM) bounded smoothness property, a novel result bounding the influence difference in terms of the random observations, we provide a regret of order $O(\mathrm{poly}(m)\sqrt{T})$, where $m$ is the number of edges and $T$ is the number of rounds. This is the first theoretical result of this order for OIM under the LT model. In the end, we also provide an algorithm OIM-ETC with regret bound $O(\mathrm{poly}(m)\,T^{2/3})$, which is model-independent, simple, and has less requirement on online feedback and offline computation.

1 Introduction

Social networks play an important role in spreading information in people's lives. In viral marketing, companies wish to broadcast their products by making use of the network structure and the characteristics of influence propagation. Specifically, they want to provide free products to selected users (seed nodes), let them advertise through the network, and maximize the purchases. There is a budget on the free products, and the goal of the companies is to select the optimal seed set to maximize the influence spread. This problem is called influence maximization (IM) [19] and has a wide range of applications including recommendation systems, link prediction and information diffusion. In the IM problem, the social network is usually modeled as a directed graph with nodes representing users and directed edges representing influence relationships between users. IM studies how to select a seed set under a given influence propagation model to maximize the influence spread when the weights are known. The independent cascade (IC) model and the linear threshold (LT) model [19] are the two most widely used models to characterize influence propagation in a social network, and both models use weights on edges as model parameters. In many real applications, however, the weights are usually unknown in advance. For example, in viral marketing, it is unrealistic to assume that the companies know the influence abilities beforehand. A possible solution is to learn those parameters from diffusion data collected in the past [6, 36]. But this method lacks the ability of adaptive learning based on the need of influence maximization.
This motivates the studies on the online influence maximization (OIM) problem [28, 10, 11, 47, 49, 50, 45, 44], where the learner tries to estimate model parameters and maximize influence in an iterative manner. The studies on OIM are based on the multi-armed bandit (MAB) problem, which is a classical online learning framework and has been well studied in the literature [27]. The MAB problem is formulated as a T-round game between a learner and the environment. In each round, the learner needs to decide which action to play and the environment then reveals a reward according to the chosen action. The objective of the learner is to accumulate as much reward as possible. An MAB algorithm needs to deal with the tradeoff between exploration and exploitation: whether the learner should try actions that have not been explored well yet (exploration) or focus on the action with the best performance so far (exploitation). Two algorithms, the explore-then-commit (ETC) [15] and the upper confidence bound (UCB) [4], are widely followed in the stochastic MAB setting, where the reward of each action follows an unknown but fixed distribution. Most existing works on OIM focus on the IC model under edge-level feedback [10, 11, 47, 49, 50], where the information propagates independently between pairs of users and the learner can observe the liveness of an individual edge as long as its source node is influenced. The independence assumption makes the formulation simple but a bit unrealistic. Often in real scenarios, the influence propagations are correlated with each other. The LT model is usually used to model the herd behavior that a person is more likely to be influenced if many of her friends are influenced [7, 17, 20]. Thus for the LT model, it is more natural to use node-level feedback, where we only observe the node activations, since it is hard to pinpoint which neighbor or neighbors actually contribute to an activation in a herd behavior. In this paper, we first formulate the OIM problem under the LT model with node-level feedback and distill effective information based on the feedback. The main challenge is that only the aggregated group effect on node activations can be observed and the aggregated groups are also random. Based on the linear structure of the LT model, we incorporate the idea of linear bandits and propose the LT-LinUCB algorithm, whose update mechanism is consistent with the distilled information. By proving group observation modulated (GOM) bounded smoothness, a key property on the influence spread under two different weight vectors, we can bound the regret. Such a property is similar to the triggering probability modulated (TPM) bounded smoothness condition under the IC model with edge-level feedback [47], but the derivation in our case under node-level feedback is more difficult. The regret is of order $O(\mathrm{poly}(m)\sqrt{T}\log T)$, where $m$ is the number of edges and $T$ is the number of rounds. Our LT-LinUCB is the first OIM algorithm under the LT model that achieves a regret of this order. Finally, we give the OIM-ETC algorithm, which applies to both the IC and LT models with node-level feedback. Though simple, it has less requirement on the observed feedback and the offline computation, and it achieves a distribution-independent regret bound of $O(\mathrm{poly}(m)\,T^{2/3})$ and a distribution-dependent bound of $O(\mathrm{poly}(m)\log(T)/\Delta^2)$.

Related Work

The problem of IM was first proposed as a discrete optimization problem by Kempe et al. [19]. Since then, various aspects of IM have been extensively studied (see [9, 31] for surveys in this area).
The two most popular models in this field are the IC and LT models. The former assumes that the influence between pairs of users is independent, and the latter characterizes the herd behavior. Some works [46, 18, 19, 42] study the IC model and some [12, 16, 19, 42] study the LT model. They all assume the weights on the edges are known and focus on model properties and approximate solutions. We treat these as the offline setting. When the weight vectors are unknown, Chen et al. [11, 47] study the problem in the online setting, selecting seed sets as well as learning the parameters. They study the IC model with edge-level feedback, propose the CUCB algorithm, and show that CUCB achieves distribution-dependent and distribution-independent regret bounds of $O(\mathrm{poly}(m)\log(T))$ and $O(\mathrm{poly}(m)\sqrt{T})$, respectively. Later, Wen et al. [49] consider the large-scale setting and assume the edge probability is a linear function of the edge's feature vector. They provide a LinUCB-based algorithm with $O(dm\sqrt{nT}\ln(T))$ worst-case regret, where $d$ is the feature dimension and $n$ is the number of nodes. Wu et al. [50] assume that each edge probability can be decomposed as the product of the influence probability of the start node and the susceptibility probability of the end node, motivated by network assortativity. All these works study the IC model with edge-level feedback. Vaswani et al. [44] use a heuristic objective function for OIM and propose a model-independent algorithm under pairwise feedback, which records whether a node is influenced by a seed node or not. This applies to both IC and LT, and the feedback scheme is more relaxed than edge-level feedback. Unfortunately, the heuristic objective has no theoretical approximation guarantee. Also, Vaswani et al. [45] study the IC model with node-level feedback, analyzing the estimation gap with respect to edge-level feedback, but provide no regret analysis. A report [43] studies the LT model with node-level feedback via optimization approaches but without theoretical guarantees. There is another work [26] studying the problem of linear multi-resource allocation, which can be formulated as a bipartite LT model. But they assume every node in the left partition (resources) is selected, and the algorithm needs to assign allocations for each pair of left node and right node (tasks), representing the corresponding allocation of resources to tasks. Thus the problem is different from our OIM. The OIM problem under LT has been open for several years. We are the first to provide a reasonable formulation with an algorithm of regret $O(\mathrm{poly}(m)\sqrt{T}\log T)$. OIM is a variant of combinatorial MAB (CMAB) [10, 22], where in each round the learner selects a combination of (base) arms. Most works [25, 24] study the stochastic setting with a linear objective and semi-bandit feedback, where the learner can observe the rewards of the selected base arms and the reward of the action is a linear function of these base arms' rewards. CMAB in the stochastic setting with a linear objective and bandit feedback, where only the linear reward of the selected combination can be observed, is a special case of linear bandits. In the linear bandit setting, the learner selects a vector each round and the reward is a linear function of the selected vector action [3]. The most popular method to solve it is to construct confidence ellipsoids [14, 1, 38]. There are also works [8, 13] for CMAB in the adversarial setting with bandit feedback.
But OIM is different: its objective function is non-linear and depends on unchosen and probabilistically triggered base arms. OIM is related to the problem of online learning with graph feedback [2], where the learner can observe the feedback of unchosen arms based on the graph structure. Though some of these works study random graphs [33, 29, 21], where the set of observed arms is random, the settings are different. Under graph feedback, the observations of unchosen arms are additional and the reward only depends on the chosen arms, while under OIM the additional observations also contribute to the reward. Cascading bandits [23, 30] also consider triggering on the selected list of arms, and the triggering follows the order of the list. Compared with the graph feedback and OIM settings, its triggering graph is determined by the learning agent, not the adversary. As a generalization of graph feedback, partial monitoring [5] is also related to OIM. Most works in this direction, if applied directly to the OIM setting, are inefficient due to the exponentially large action space. Lin et al. [32] study a combinatorial version of partial monitoring, and their algorithm provides a regret of order $O(\mathrm{poly}(m)\,T^{2/3}\log T)$ for OIM with LT. Our OIM-ETC, however, has regret bounds of $O(\mathrm{poly}(m)\,T^{2/3})$ (better in the order of $T$) as well as a problem-dependent bound $O(\mathrm{poly}(m)\log T)$.

2 Problem Setting

This section describes the setting of online influence maximization (OIM) under the linear threshold (LT) model. The IM problem characterizes how to choose the seed nodes to maximize the influence spread on a social network. The network is usually represented by a directed graph $G = (V, E)$, where $V$ is the set of users and $E$ is the set of relationships between users. Each edge $e$ is associated with a weight $w(e) \in [0, 1]$. For example, an edge $e = (u, v) =: e_{u,v}$ could represent that user $v$ follows user $u$ on Twitter, and $w(e)$ represents the influence ability of user $u$ on user $v$. Denote $w = (w(e))_{e \in E}$ to be the weight vector and $n = |V|$, $m = |E|$, $D$ to be the node number, edge number and diameter respectively, where the diameter of the graph is defined as the maximum (directed) distance between pairs of nodes in any connected component. Let $N(v) = N^{\mathrm{in}}(v)$ be the set of all incoming neighbors of $v$, shortened as in-neighbors. Recall that under the IC model, each edge is independently live with probability equal to its associated weight, and a node is influenced if there is a directed path connecting it from a seed node in the realized graph. Compared to the IC model, the LT model does not require the strong independence assumption and describes the joint influence of the active in-neighbors on a user, reflecting the herd behavior that often occurs in real life [7, 17, 20]. We now describe in detail the diffusion process under the LT model. Suppose the seed set is $S$. In the beginning, each node $v$ is assigned a threshold $\theta_v$, which is independently and uniformly drawn from $[0, 1]$ and characterizes the susceptibility level of node $v$. Denote $\theta = (\theta_v)_{v \in V}$ to be the threshold vector. Let $S_\tau$ be the set of activated nodes by the end of time step $\tau$. At time $\tau = 0$, only nodes in the seed set are activated: $S_0 = S$. At time $\tau + 1$ with $\tau \ge 0$, any node $v \notin S_\tau$ that has not been activated yet will be activated if the aggregated influence of its active in-neighbors exceeds its threshold: $\sum_{u \in N(v) \cap S_\tau} w(e_{u,v}) \ge \theta_v$. Such a diffusion process lasts at most $D$ time steps. The size of the influenced node set is denoted as $r(S, w, \theta) = |S_D|$.
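To make the diffusion dynamics concrete, here is a minimal simulation sketch; the dictionary-based graph representation, the function name, and the tiny example graph are our own illustration, not the paper's code.

```python
import random

def simulate_lt(in_neighbors, w, seeds, D):
    """One LT diffusion: in_neighbors[v] lists all u with an edge (u, v),
    w[(u, v)] is the edge weight, seeds is the seed set S.
    Returns the set of nodes active after at most D steps."""
    theta = {v: random.random() for v in in_neighbors}   # thresholds drawn uniformly from [0, 1]
    active = set(seeds)
    for _ in range(D):
        newly = set()
        for v in in_neighbors:
            if v in active:
                continue
            # aggregated weight of currently active in-neighbors of v
            mass = sum(w[(u, v)] for u in in_neighbors[v] if u in active)
            if mass >= theta[v]:
                newly.add(v)
        if not newly:
            break
        active |= newly
    return active

# Example: a 3-node chain s -> a -> b; r(S, w, theta) = len(active)
in_nb = {'s': [], 'a': ['s'], 'b': ['a']}
w = {('s', 'a'): 0.8, ('a', 'b'): 0.5}
print(len(simulate_lt(in_nb, w, {'s'}, D=2)))
```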
Let $r(S, w) = \mathbb{E}_\theta[r(S, w, \theta)]$ be the influence spread of seed set $S$, where the expectation is taken over all random thresholds $\theta_v$. The IM problem is to find the seed set $S$ of size at most $K$ under weight vector $w$ that maximizes the influence spread, $\max_{S \in \mathcal{A}} r(S, w)$, where $\mathcal{A} = \{S \subseteq V : |S| \le K\}$ is the action set for the seed nodes. We also adopt the usual assumption that $\sum_{u \in N(v)} w(e_{u,v}) \le 1$ for any $v \in V$. This assumption makes LT have an equivalent live-edge graph formulation like the IC model [19, 9]. The graph $G$ and seed size $K$ will be omitted when the context is clear. Here we emphasize that the model parameters are the weights $w$, while the threshold vector $\theta$ is not a model parameter (it follows the uniform distribution). The (offline) IM problem is NP-hard under the LT model, but it can be approximately solved [19, 42]. For a fixed weight vector $w$, let $S^{\mathrm{Opt}}_w$ be an optimal seed set and $\mathrm{Opt}_w$ be its corresponding influence spread: $S^{\mathrm{Opt}}_w \in \operatorname*{argmax}_{S \in \mathcal{A}} r(S, w)$ and $\mathrm{Opt}_w = r(S^{\mathrm{Opt}}_w, w)$. Let Oracle be an (offline) oracle that outputs a solution given the weight vector as input. Then for $\alpha, \beta \in [0, 1]$, the Oracle is an $(\alpha, \beta)$-approximation if $\mathbb{P}\big(r(S', w) \ge \alpha \cdot \mathrm{Opt}_w\big) \ge \beta$, where $S' = \mathrm{Oracle}(w)$ is a solution returned by the Oracle for the weight vector $w$. Note that when $\alpha = \beta = 1$ the oracle is exact.

The online version is to maximize the influence spread when the weight vector (the model parameter) $w = (w(e))_{e \in E}$ is unknown. In each round $t$, the learner selects a seed set $S_t$, receives the observations and then updates itself accordingly. For the type of observations, previous works on IC mostly assume edge-level feedback: the learner can observe the outgoing edges of each active node [11, 49, 50]. But for the LT model, it is not very realistic to assume the learner can observe which in-neighbor influences the target user, since the LT model characterizes the aggregate influence of a crowd. So we consider a more realistic node-level feedback² in this paper: the learner can only observe the influence diffusion process on node sets $S_{t,0}, \ldots, S_{t,\tau}, \ldots$ in round $t$. The objective of OIM is to minimize the cumulative $\alpha\beta$-scaled regret [10, 49] over $T$ rounds:
$$R(T) = \mathbb{E}\left[\sum_{t=1}^{T}\big(\alpha\beta\cdot \mathrm{Opt}_{w} - r(S_t, w)\big)\right], \qquad (1)$$
where the expectation is over the randomness of the threshold vectors and the output of the adopted offline oracle in each round. Throughout this paper, we will use round $t$ to denote a step in online learning and use time $\tau$ of round $t$ to denote an influence diffusion step of seed set $S_t$ in round $t$.

² One may think of the node-level feedback as knowing only the set of nodes activated by the end of the diffusion process. We refer to this as (partial) node-level feedback and ours as (full) node-level feedback. This naming comes from [35].

3 LT-LinUCB Algorithm

In this section, we show how to distill effective information from the feedback and propose a LinUCB-type algorithm, LT-LinUCB, for OIM under LT. For each node $v \in V$, denote $w_v = (w(e_{u,v}))_{u \in N(v)}$ to be the weight vector of its incoming edges. Let $\chi(e_{u,v}) \in \{0,1\}^{|N(v)|}$ be the one-hot representation of the edge $e_{u,v}$ over all of $v$'s incoming edges $\{e_{u,v} : u \in N(v)\}$, that is, its $e'$-entry is 1 if and only if $e' = e_{u,v}$. Then $w(e_{u,v}) = \chi(e_{u,v})^{\top} w_v$. For a subset of edges $E' \subseteq \{e_{u,v} : u \in N(v)\}$, we define $\chi(E') := \sum_{e \in E'} \chi(e) \in \{0,1\}^{|N(v)|}$ to be the vector whose $e$-entry is 1 if and only if $e \in E'$. Here we abuse the notation so that $\chi(\{e\}) = \chi(e)$. With this notation, the weight sum of the edges in $E'$ is simply written as $\chi(E')^{\top} w_v$. A subset $V' \subseteq N(v)$ of $v$'s in-neighbors can activate $v$ if the weight sum of the associated edges exceeds the threshold, that is, $\chi(E')^{\top} w_v \ge \theta_v$ with $E' = \{e_{u,v} : u \in V'\}$.
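As a quick illustration of this linear structure (our own example with made-up weights, not from the paper), the activation test for a group of active in-neighbors is just an inner product with a binary indicator vector:

```python
import numpy as np

# Node v has in-neighbors [u1, u2, u3]; w_v stacks the incoming-edge weights
w_v = np.array([0.3, 0.4, 0.2])      # w(e_{u1,v}), w(e_{u2,v}), w(e_{u3,v}); sums to at most 1
chi = np.array([1, 0, 1])            # chi(E') for the active subset {u1, u3}
theta_v = 0.45                       # threshold drawn uniformly from [0, 1]

activated = chi @ w_v >= theta_v     # chi(E')^T w_v >= theta_v
print(chi @ w_v, activated)          # ~0.5, True
```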
Fix a diffusion process $S_0, S_1, \ldots, S_\tau, \ldots$, where the seed set is $S_0$. For each node $v$, define
$$\tau_1(v) := \min\{\tau = 0, \ldots, D : N(v) \cap S_\tau \ne \emptyset\} \qquad (2)$$
as the earliest time step at which node $v$ has active in-neighbors. In particular, we set $\tau_1(v) = D + 1$ if node $v$ has no active in-neighbor before the diffusion ends. For any $\tau \ge \tau_1(v)$, further define
$$E_\tau(v) := \{e_{u,v} : u \in N(v) \cap S_\tau\} \qquad (3)$$
as the set of incoming edges associated with active in-neighbors of $v$ at time step $\tau$. Recall that the learner can only observe the aggregated influence ability of a node's active in-neighbors. Let $\tau_2(v)$ represent the time step at which node $v$ is influenced ($\tau_2(v) > \tau_1(v)$), which means that $v$'s active in-neighbors at time $\tau_2(v) - 1$ succeed in influencing it but those at time $\tau_2(v) - 2$ fail (with the convention $E_{-1}(v) = \emptyset$). Thus the definition of $\tau_2(v)$ can be written as
$$\tau_2(v) := \min\{\tau = 0, \ldots, D : \chi(E_{\tau-2}(v))^{\top} w_v < \theta_v \le \chi(E_{\tau-1}(v))^{\top} w_v\}. \qquad (4)$$
For consistency, we set $\tau_2(v) = D + 1$ if node $v$ is not influenced by the time the information diffusion ends. Based on the definitions of $\tau_1(v)$ and $\tau_2(v)$, node $v$ is not influenced at any time $\tau \in (\tau_1(v), \tau_2(v))$, which means that the set of active in-neighbors of $v$ at time step $\tau - 1$ fails to activate it. According to the rule of information diffusion under the LT model, the event that $E' \subseteq \{e_{u,v} : u \in N(v)\}$ fails to activate $v$ is equivalent to $\chi(E')^{\top} w_v < \theta_v$, which happens with probability $1 - \chi(E')^{\top} w_v$ since $\theta_v$ is uniformly drawn from $[0, 1]$. Similarly, the event that $E'$ succeeds in activating $v$ is equivalent to $\chi(E')^{\top} w_v \ge \theta_v$, which happens with probability $\chi(E')^{\top} w_v$. So for a node $v$ with active in-neighbors: if $v$ is not influenced at time step $\tau$ ($\tau_1(v) < \tau < \tau_2(v)$), then the set of $v$'s active in-neighbors by $\tau - 1$ fails to activate it, and we can use $(\chi(E_{\tau-1}(v)), 0)$ to update our belief on the unknown weight vector $w_v$; if $v$ is influenced at time step $\tau_2(v)$, then the set of $v$'s active in-neighbors by $\tau_2(v) - 1$ succeeds in activating it, and we can use $(\chi(E_{\tau_2(v)-1}(v)), 1)$ to update our belief on $w_v$; if $v$ is finally not influenced, then all of its active in-neighbors (by time step $D$) fail to activate it, and we can use $(\chi(E_{\tau_2(v)-1}(v)), 0)$ to update $w_v$, since $\tau_2(v)$ is defined as $D + 1$ in this case. Note that all of these events are correlated (they are based on the same $\theta_v$), thus we can choose at most one of them to update $w_v$ for a node $v$ with active in-neighbors. If $v$ has no active in-neighbors, we have no observation on $w_v$ and can update nothing.
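To connect these three cases with the algorithm's update, here is a minimal sketch (our own illustration; the trace and graph containers are hypothetical) that extracts $\tau_1(v)$, $\tau_2(v)$ and the single training pair $(E_\tau(v), y)$ that one observed diffusion contributes for a node $v$:

```python
import random

def distill_observation(trace, in_neighbors, v, D):
    """trace[tau] is the active set S_tau of one observed diffusion (tau = 0, 1, ...).
    Returns one training pair (E_tau(v), y) for node v, or None if nothing is observed."""
    def E(tau):
        # E_tau(v): incoming edges of v whose source is active by time step tau
        S_tau = trace[min(tau, len(trace) - 1)]        # active sets are monotone, reuse the last one
        return {(u, v) for u in in_neighbors[v] if u in S_tau}

    tau1 = next((t for t in range(D + 1) if E(t)), D + 1)   # first step with active in-neighbors
    hit = [t for t in range(len(trace)) if v in trace[t]]
    tau2 = hit[0] if hit else D + 1                         # step at which v itself becomes active
    if tau1 > D or tau2 <= tau1:                            # no usable observation (e.g. v is a seed)
        return None
    tau = random.choice(range(tau1, tau2))                  # uniform over {tau_1(v), ..., tau_2(v) - 1}
    y = 1 if (tau == tau2 - 1 and tau2 <= D) else 0         # label 1 only for the step that activated v
    return E(tau), y
```

Only one time step is used per node because, as noted above, all of these events depend on the same threshold $\theta_v$ and are therefore correlated.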
Figure 1: An example of a diffusion process starting from $S = \{s\}$ under LT. The upper part describes an influence diffusion process, where yellow nodes represent the nodes influenced by the current time. The lower part describes the sets $E_{\tau_1}$ and $E_{\tau_2-1}$, where we use blue (red) color to represent the edges and the associated active in-neighbors in $E_{\tau_1}$ ($E_{\tau_2-1}$, respectively) for the objective black node.

Figure 1 gives an example of a diffusion process and the definitions of the edge sets $E_{\tau_1}$ and $E_{\tau_2-1}$. The diffusion process is illustrated by the upper four figures, where the set $S_\tau$ of influenced nodes by time $\tau$ is colored yellow. The lower five columns represent the sets $E_{\tau_1}$, $E_{\tau_2-1}$ for different nodes. For example, node $v_7$ has active in-neighbors starting from $\tau = 1$, thus $\tau_1(v_7) = 1$ and $E_{\tau_1(v_7)}(v_7) = \{e_{u,v_7} : u \in N(v_7) \cap S_1\} = \{e_{v_1,v_7}, e_{v_2,v_7}\}$. And $v_7$ is influenced at $\tau = 3$, thus $\tau_2(v_7) = 3$ and $E_{\tau_2(v_7)-1}(v_7) = \{e_{u,v_7} : u \in N(v_7) \cap S_2\} = \{e_{v_1,v_7}, e_{v_2,v_7}, e_{v_5,v_7}\}$. Node $v_6$ has no active in-neighbors, thus $\tau_1(v_6) = \tau_2(v_6) = D + 1$, and both its $E_{\tau_1(v_6)}(v_6)$ and $E_{\tau_2(v_6)-1}(v_6)$ are empty sets.

The above describes how to distill key observations from a diffusion under the LT model and also explains the update rule in the design of the algorithm. Denote $\tau_1, \tau_2, E_\tau$ at round $t$ as $\tau_{t,1}, \tau_{t,2}, E_{t,\tau}$, and the diffusion process at round $t$ as $S_{t,0}, \ldots, S_{t,\tau}, \ldots$. Here we slightly abuse the notation $S$ to represent both the seed set and the spread set in a round when the context is clear. Our algorithm LT-LinUCB is given in Algorithm 1. It maintains the Gramian matrix $M_v$ and the moment vector $b_v$ of regressand by regressors to store the information for $w_v$. At each round $t$, the learner first computes the confidence ellipsoid for $w_v$ based on the current information (line 4); see the following lemma.

Lemma 1. Given $\{(A_t, y_t)\}_{t=1}^{\infty}$ with $A_t \in \{0,1\}^N$ and $y_t \in \{0,1\}$ a Bernoulli random variable with $\mathbb{E}[y_t \mid A_1, y_1, \ldots, A_{t-1}, y_{t-1}, A_t] = A_t^{\top} w_v$, let $M_t = I + \sum_{s=1}^{t} A_s A_s^{\top}$, $b_t = \sum_{s=1}^{t} y_s A_s$ and $\hat{w}_t = M_t^{-1} b_t$ be the (ridge) linear regression estimator. Then with probability at least $1 - \delta$, for all $t \ge 1$, it holds that $w_v$ lies in the confidence set
$$\left\{ w' \in [0,1]^N : \|w' - \hat{w}_t\|_{M_t} \le \sqrt{N \log(1 + tN) + 2\log\tfrac{1}{\delta}} + \sqrt{N} \right\}.$$

This lemma is a direct corollary of [1, Theorem 2] on the concentration of the weight vector $w_v$. Thus when $\rho_{t,v} \ge \sqrt{|N(v)| \log(1 + t|N(v)|) + 2\log\frac{1}{\delta}} + \sqrt{|N(v)|}$, the true weight vector $w_v$ lies in the confidence set $C_{t,v}$ (line 4) for any $t$ with probability at least $1 - \delta$.

Algorithm 1 LT-LinUCB
1: Input: Graph G = (V, E); seed set cardinality K; exploration parameters ρ_{t,v} > 0 for all t, v; offline oracle PairOracle
2: Initialize: M_{0,v} ← I ∈ R^{|N(v)|×|N(v)|}, b_{0,v} ← 0 ∈ R^{|N(v)|×1}, ŵ_{0,v} ← 0 ∈ R^{|N(v)|×1} for any node v ∈ V
3: for t = 1, 2, 3, ... do
4:   Compute the confidence ellipsoid C_{t,v} = {w'_v ∈ [0,1]^{|N(v)|×1} : ||w'_v − ŵ_{t−1,v}||_{M_{t−1,v}} ≤ ρ_{t,v}} for any node v ∈ V
5:   Compute the pair (S_t, w_t) by PairOracle with confidence set C_t = {C_{t,v}}_{v∈V} and seed set cardinality K
6:   Select the seed set S_t and observe the feedback
7:   // Update
8:   for node v ∈ V do
9:     Initialize A_{t,v} ← 0 ∈ R^{|N(v)|×1}, y_{t,v} ← 0 ∈ R
10:    Uniformly randomly choose τ ∈ {τ' : τ_{t,1}(v) ≤ τ' ≤ τ_{t,2}(v) − 1}
11:    if v is influenced and τ = τ_{t,2}(v) − 1 then
12:      A_{t,v} = χ(E_{t,τ}(v)), y_{t,v} = 1
13:    else if τ = τ_{t,1}(v), ..., τ_{t,2}(v) − 2, or τ = τ_{t,2}(v) − 1 but v is not influenced then
14:      A_{t,v} = χ(E_{t,τ}(v)), y_{t,v} = 0
15:    end if
16:    M_{t,v} ← M_{t−1,v} + A_{t,v} A_{t,v}^⊤,  b_{t,v} ← b_{t−1,v} + y_{t,v} A_{t,v},  ŵ_{t,v} = M_{t,v}^{−1} b_{t,v}
17:  end for
18: end for
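As a complement to the pseudocode, here is a minimal numpy sketch of the per-node statistics of lines 9-16 and the confidence radius of Lemma 1 (variable and class names are ours; the PairOracle of line 5 is left abstract and is not sketched here):

```python
import numpy as np

class NodeStats:
    """Ridge-regression statistics kept by LT-LinUCB for one node v (N = |N(v)|)."""
    def __init__(self, N):
        self.M = np.eye(N)                  # Gramian M_v = I + sum_s A_s A_s^T
        self.b = np.zeros(N)                # moment vector b_v = sum_s y_s A_s
        self.w_hat = np.zeros(N)            # ridge estimate of w_v

    def radius(self, t, delta):
        # confidence radius of Lemma 1: sqrt(N log(1 + tN) + 2 log(1/delta)) + sqrt(N)
        N = len(self.b)
        return np.sqrt(N * np.log(1 + t * N) + 2 * np.log(1 / delta)) + np.sqrt(N)

    def update(self, A, y):
        # line 16 of Algorithm 1: M <- M + A A^T, b <- b + y A, w_hat = M^{-1} b
        self.M += np.outer(A, A)
        self.b += y * A
        self.w_hat = np.linalg.solve(self.M, self.b)

# e.g. node v with 3 in-neighbors; the active group {u1, u3} failed to activate v
stats = NodeStats(3)
stats.update(np.array([1.0, 0.0, 1.0]), 0)
print(stats.w_hat, stats.radius(t=1, delta=0.05))
```

The confidence ellipsoid $C_{t,v}$ of line 4 is then $\{w' : \|w' - \hat{w}_{t-1,v}\|_{M_{t-1,v}} \le \rho_{t,v}\}$ with $\rho_{t,v}$ set to this radius; the joint maximization of line 5 over all such ellipsoids is delegated to the PairOracle.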
Given the confidence set $C_v$ for $w_v$, the algorithm expects to select the seed set by solving the weight-constrained influence maximization (WCIM) problem
$$\operatorname*{argmax}_{(S, w') : S \in \mathcal{A},\, w' \in C}\; r(S, w'). \qquad (5)$$
This (offline) optimization problem turns out to be highly nontrivial. Since we want to focus more on the online learning solution, we defer the full discussion of the offline optimization, including its general difficulty and our proposed approximate algorithms for certain graph classes such as directed acyclic graphs, to Appendix B. Suppose its best solution is $(S^{\mathrm{POpt}}_{C}, w^{\mathrm{POpt}}_{C})$, where P stands for "pair". Let PairOracle be an offline oracle that solves this optimization problem. We say PairOracle is an $(\alpha, \beta)$-approximation oracle if $\mathbb{P}\big(r(S', w') \ge \alpha \cdot r(S^{\mathrm{POpt}}_{C}, w^{\mathrm{POpt}}_{C})\big) \ge \beta$, where $(S', w')$ is an output of the oracle when the confidence set is $C$. The algorithm then runs with the seed set output by the PairOracle under the confidence set $C_t = \{C_{t,v}\}_{v \in V}$ (line 5). After observing the diffusion process (line 6), for each node $v$ with active in-neighbors, we uniformly randomly choose a time step $\tau \in \{\tau_1(v), \ldots, \tau_2(v) - 1\}$ and use the active in-neighbors at that step to update (line 10). Specifically, if $v$ is influenced and $\tau = \tau_2(v) - 1$, then the set of active in-neighbors at time step $\tau$ succeeds in activating $v$, and we use $(\chi(E_{t,\tau}(v)), 1)$ to update (line 12); if $\tau = \tau_1(v), \ldots, \tau_2(v) - 2$, or $\tau = \tau_2(v) - 1$ but node $v$ is not influenced, then the set of active in-neighbors at $\tau$ fails to activate node $v$, and we use $(\chi(E_{t,\tau}(v)), 0)$ to update (line 14). These updates are consistent with the distilled observations we obtain for nodes with active in-neighbors. For a node $v$ without active in-neighbors, we have no observation on $w_v$ and do not update it, since the set $\{\tau' : \tau_1(v) \le \tau' \le \tau_2(v) - 1\}$ is empty in this case. For example, in Figure 1, node $v_7$ has active in-neighbors from $\tau_1(v_7) = 1$ and is influenced at $\tau_2(v_7) = 3$. LT-LinUCB will uniformly randomly choose $\tau \in \{1, 2\}$ (line 10). It updates with $(A_{v_7} = \chi(E_1(v_7)), y_{v_7} = 0)$ if $\tau = 1$ (line 14) and with $(A_{v_7} = \chi(E_2(v_7)), y_{v_7} = 1)$ otherwise (line 12). Nodes $v_1, v_2, v_3$ all have $\tau_1 = 0$ and $\tau_2 = 1$. Thus for these three nodes, the algorithm chooses $\tau = 0$ (line 10) and updates with $(A_v = \chi(E_0(v)), y_v = 1)$ (line 12). Node $v_4$ has active in-neighbors from $\tau_1(v_4) = 1$ but is not influenced in the end; the algorithm will uniformly randomly choose $\tau \in \{1, 2, \ldots, D\}$ and update with $(A_{v_4} = \chi(E_\tau(v_4)), y_{v_4} = 0)$ (line 14). Node $v_6$ has no active in-neighbors, so we have no observation for its weight vector and do not update it.

3.1 Regret Analysis

We now provide the group observation modulated (GOM) bounded smoothness property for the LT model, an important relationship between the influence spreads under two weight vectors. It plays a crucial role in the regret analysis: it states that the difference of the influence spread $r(S, w)$ under two weight vectors can be bounded in terms of the weight differences on the distilled observed edge sets under one weight vector. It is conceptually similar to the triggering probability modulated (TPM) bounded smoothness condition under the IC model with edge-level feedback [47], but its derivation and usage are quite different. For the seed set $S$, define the set of all nodes related to a node $v$, $V_{S,v}$, to be the set of nodes that lie on any path from $S$ to $v$ in graph $G$.

Theorem 1 (GOM bounded smoothness). For any two weight vectors $w, w' \in [0,1]^m$ with $\sum_{u \in N(v)} w(e_{u,v}) \le 1$ for every $v$, the difference of their influence spreads for any seed set $S$ can be bounded as
$$|r(S, w') - r(S, w)| \le \mathbb{E}\left[\sum_{v \in V \setminus S}\; \sum_{u \in V_{S,v}}\; \sum_{\tau = \tau_1(u)}^{\tau_2(u) - 1} \Big| \sum_{e \in E_\tau(u)} \big(w'(e) - w(e)\big) \Big|\right], \qquad (6)$$
where the definitions of $\tau_1(u)$, $\tau_2(u)$ and $E_\tau(u)$ are all under the weight vector $w$, and the expectation is taken over the randomness of the thresholds on the nodes.

This theorem connects the reward difference with the weight differences on the distilled observations, which are also the information used to update the algorithm (lines 7-17). It links the effective observations, the updates of the algorithm, and the regret analysis. The proof needs to deal with an intricate dependency among activation events and is put in Appendix A.1 due to the space constraint.
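To see how Theorem 1 enters the analysis, here is a heuristic per-round outline (our own loose sketch, not the paper's exact derivation; the rigorous argument, including the factors contributed by the random choice of a single update step, $\gamma(G)$ and $D$, is in Appendix A). Step (i) is optimism, on the event that $w \in C_t$ and the PairOracle succeeds (probability at least $\beta$); step (ii) is the GOM bound of Theorem 1 applied to $w$ and $w_t$; step (iii) is Cauchy-Schwarz in the $M_{t-1,u}$ norm together with $w_u, w_{t,u} \in C_{t,u}$:
$$\alpha\,\mathrm{Opt}_w - r(S_t, w) \;\overset{\text{(i)}}{\le}\; r(S_t, w_t) - r(S_t, w) \;\overset{\text{(ii)}}{\le}\; \mathbb{E}\Big[\sum_{v \in V\setminus S_t}\sum_{u \in V_{S_t,v}}\sum_{\tau=\tau_1(u)}^{\tau_2(u)-1}\big|\chi(E_{t,\tau}(u))^{\top}(w_{t,u}-w_u)\big|\Big] \;\overset{\text{(iii)}}{\le}\; 2\rho_t\,\mathbb{E}\Big[\sum_{v,u,\tau}\big\|\chi(E_{t,\tau}(u))\big\|_{M_{t-1,u}^{-1}}\Big].$$
Summing over $t$ and controlling the elliptical-potential terms $\|\chi(E_{t,\tau}(u))\|_{M_{t-1,u}^{-1}}$ is what produces the $\sqrt{T}$-type dependence in Theorem 2 below.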
For seed set $S \in \mathcal{A}$ and node $u \in V \setminus S$, define $N_{S,u} := \sum_{v \in V \setminus S} \mathbb{1}\{u \in V_{S,v}\} \le n - K$ to be the number of nodes that $u$ is relevant to. Then for the vector $N_S = (N_{S,u})_{u \in V}$, define the upper bound of its $L_2$-norm over all feasible seed sets,
$$\gamma(G) := \max_{S \in \mathcal{A}} \|N_S\|_2 \le (n - K)\sqrt{n} = O(n^{3/2}),$$
which is a constant related to the graph. Then we have the following regret bound.

Theorem 2. Suppose LT-LinUCB runs with an $(\alpha, \beta)$-approximation PairOracle and parameter $\rho_{t,v} \ge \sqrt{n\log(1 + tn) + 2\log\frac{1}{\delta}} + \sqrt{n}$ for any node $v \in V$. Then the $\alpha\beta$-scaled regret satisfies
$$R(T) \le 2\rho_T\, \gamma(G)\, D\, \sqrt{mnT\log(1 + T)/\log(1 + n)} + n\delta T(n - K). \qquad (7)$$
When $\delta = 1/(n\sqrt{T})$, $R(T) \le C\,\gamma(G)\, D\, n\sqrt{mT}\log(T)$ for some universal constant $C$. Due to space limits, the proof and the detailed discussions, as well as the values of $\gamma(G)$, are put in Appendix A.

4 The Explore-then-Commit Algorithm

This section presents the explore-then-commit (ETC) algorithm for OIM. Though simple, it is efficient and model-independent, applying to both the LT and IC models with less requirement on feedback and offline computation. Recall that under the LT model, a node $v$ is activated if the sum of weights from active in-neighbors exceeds the threshold $\theta_v$, which is uniformly drawn from $[0, 1]$. Since the feedback is node-level, if the activated node $v$ has more than one active in-neighbor, then we can only observe the group influence effect of her active in-neighbors instead of each single in-neighbor. A simple way to overcome this limitation and directly observe the single weight $w(e_{u,v})$ is to select a single seed $\{u\}$ and take only the first-step influence as feedback; this yields our OIM-ETC algorithm (Algorithm 2), the ETC algorithm for the OIM problem.

Algorithm 2 OIM-ETC
1: Input: G = (V, E), seed size K, exploration budget k, time horizon T, offline oracle Oracle
2: for s ∈ [k], u ∈ V do
3:   Choose {u} as the seed set
4:   X_s(e_{u,v}) := 1{v is activated} for any v ∈ N^out(u)
5: end for
6: Compute ŵ(e) := (1/k) Σ_{s=1}^{k} X_s(e) for any e ∈ E
7: Ŝ = Oracle(ŵ)
8: for the remaining T − nk rounds do
9:   Choose Ŝ as the seed set
10: end for

Our OIM-ETC takes the exploration budget $k$ as an input parameter such that each node $u$ is selected as the (single) seed for $k$ rounds (line 3). For each round in which $u$ is the seed, each outgoing neighbor (shortened as out-neighbor) $v \in N^{\mathrm{out}}(u)$ will be activated in the first step with probability $\mathbb{P}(w(e_{u,v}) \ge \theta_v) = w(e_{u,v})$, since the threshold $\theta_v$ is independently and uniformly drawn from $[0, 1]$. Thus the first-step node-level feedback is actually edge-level feedback, and we can observe the independent edges from the first-step feedback (line 4). Since each node is selected $k$ times, we have $k$ observations of Bernoulli random variables with expectation $w(e_{u,v})$ in this exploration phase. We then take the empirical estimate $\hat{w}(e)$ of each $w(e)$ (line 6) after the exploration and run with the seed set output by the offline Oracle (line 7) for the remaining $T - nk$ exploitation rounds (line 9). We assume the offline Oracle is an $(\alpha, \beta)$-approximation. Since OIM-ETC only needs the first step of the diffusion process and calls the usual IM oracle only once, it is efficient and has fewer requirements.
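A compact sketch of this procedure (our own illustration; `first_step_activations` stands in for running one round with seed set {u} and observing only the first diffusion step, and `oracle` for any offline IM solver):

```python
def oim_etc(nodes, out_neighbors, k, oracle, first_step_activations):
    """Explore-then-commit for OIM: estimate each edge weight from k single-seed
    rounds, then commit to the oracle's seed set for the remaining rounds."""
    counts = {(u, v): 0.0 for u in nodes for v in out_neighbors[u]}
    for _ in range(k):                                    # exploration: n * k rounds in total
        for u in nodes:
            activated = first_step_activations(u)         # first-step feedback of seed set {u}
            for v in out_neighbors[u]:
                counts[(u, v)] += float(v in activated)   # Bernoulli sample with mean w(e_{u,v})
    w_hat = {e: c / k for e, c in counts.items()}         # empirical edge-weight estimates
    return oracle(w_hat)                                   # committed seed set for the remaining rounds
```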
By selecting a reasonable $k$, we can derive good regret bounds. Before that we need two definitions.

Definition 1 (Bad seed set). A seed set $S$ is bad if $r(S, w) < \alpha \cdot \mathrm{Opt}_w$. The set of bad seed sets is $S_B := \{S \mid r(S, w) < \alpha \cdot \mathrm{Opt}_w\}$.

Definition 2 (Gaps of bad seed sets). For a bad seed set $S \in S_B$, its gap is defined as $\Delta_S := \alpha \cdot \mathrm{Opt}_w - r(S, w)$. The maximum and minimum gaps are defined as
$$\Delta_{\max} := \alpha \cdot \mathrm{Opt}_w - \min\{r(S, w) \mid S \in S_B\}, \qquad (8)$$
$$\Delta_{\min} := \alpha \cdot \mathrm{Opt}_w - \max\{r(S, w) \mid S \in S_B\}. \qquad (9)$$

Theorem 3. With an exploration budget of $k = O\big(m^2 n^2 \log(T)/\Delta_{\min}^2\big)$, the $\alpha\beta$-scaled regret of our OIM-ETC algorithm over $T$ rounds satisfies
$$R(T) \le \min\Big\{ T\Delta_{\max},\; n\Delta_{\max} + 2 m^2 n^3 \Delta_{\max}\log(T)/\Delta_{\min}^2 \Big\}.$$
With $k = \big\lceil 3.9\,(m^2 T/n)^{2/3} \big\rceil$, the $\alpha\beta$-scaled regret of OIM-ETC over $T$ rounds satisfies
$$R(T) \le 3.9\,(mn)^{4/3} T^{2/3} + 1 = O\big((mn)^{4/3} T^{2/3}\big).$$

The proof of the problem-dependent bound follows routine ideas for ETC algorithms, but the proof of the problem-independent bound is new. The proofs and discussions are put in Appendix C.

5 Conclusion

In this paper, we formulate the problem of OIM under the LT model with node-level feedback and design how to distill effective information from the observations. We prove a novel GOM bounded smoothness property for the spread function, which relates the limited observations, the algorithm updates and the regret analysis. We propose the LT-LinUCB algorithm, provide rigorous theoretical analysis, and show a competitive regret bound of $O(\mathrm{poly}(m)\sqrt{T}\ln(T))$. Our LT-LinUCB is the first algorithm for the LT model with such a regret order. Besides, we design the OIM-ETC algorithm with theoretical analysis of its distribution-dependent and distribution-independent regret bounds. The algorithm is efficient, applies to both the LT and IC models, and has fewer requirements on feedback and offline computation. In studying OIM under the LT model, we encounter the optimization problem of weight-constrained influence maximization (WCIM). Revisiting an (offline) optimization problem by relaxing some fixed parameter to an element of a convex set is expected to be common in online learning, so we believe this problem could be of independent interest. The OIM problem under the IC model with node-level feedback is also an interesting direction for future work. Our regret analysis goes through thanks to the linearity of the LT model, but the local triggering is nonlinear in the IC model, so we expect more challenges in the design and analysis for the IC model with node-level feedback. Applying Thompson sampling to influence maximization is also an interesting future direction, but it could be challenging, since Thompson sampling may not work well with offline approximation oracles, as pointed out in [48].

Acknowledgement

This work is sponsored by the Shanghai Sailing Program. We thank Chihao Zhang for valuable discussions.

Broader Impact

Spread happens everywhere, whether a company wants to advertise its products, a group wants to seek attention, or a virus like COVID-19 travels between places. The modelling of a spread is usually based on a directed graph and some diffusion assumptions. The linear threshold (LT) model, considered in our paper, is one of the most popular such models. Influence maximization is the problem of pursuing the widest spread, while the online version is to adaptively achieve this goal in an interactive manner with no knowledge of the parameters. Compared with application research, theoretical analysis provides guarantees for the designed algorithms. Theoretical work usually accounts for extreme cases to make sure the algorithm truly has a good property, while in application research the possible experiments are always limited and usually do not reflect corner cases. The latter might be an issue if we want to transfer the model to unseen scenarios, e.g., autonomous driving. Also, among our derivations, we reveal a class of optimization problems (i.e., WCIM), which we expect to arise frequently when the community solves (offline) problems in an online manner without knowledge of the parameters.
This class, as a theoretical problem, is highly non-trivial and can introduce new problems to the optimization area. The hardness of this arising problem also reflects the difficulty of transforming offline or heuristic solutions into online or automatic ones, as in the AutoML direction and the topic of automatically tuning hyper-parameters.

References

[1] Yasin Abbasi-Yadkori, Dávid Pál, and Csaba Szepesvári. Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems, pages 2312-2320, 2011.
[2] Noga Alon, Nicolo Cesa-Bianchi, Ofer Dekel, and Tomer Koren. Online learning with feedback graphs: Beyond bandits. In Annual Conference on Learning Theory, volume 40. Microtome Publishing, 2015.
[3] Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3:397-422, 2002.
[4] Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multi-armed bandit problem. Machine Learning, 47(2-3):235-256, 2002.
[5] Gábor Bartók, Dean P Foster, Dávid Pál, Alexander Rakhlin, and Csaba Szepesvári. Partial monitoring: classification, regret bounds, and algorithms. Mathematics of Operations Research, 39(4):967-997, 2014.
[6] Simon Bourigault, Sylvain Lamprier, and Patrick Gallinari. Representation learning for information diffusion through social networks: An embedded cascade model. In Proceedings of the 9th ACM International Conference on Web Search and Data Mining, pages 573-582, 2016.
[7] Damon Centola and Michael Macy. Complex contagions and the weakness of long ties. American Journal of Sociology, 113(3):702-734, 2007.
[8] Nicolo Cesa-Bianchi and Gábor Lugosi. Combinatorial bandits. Journal of Computer and System Sciences, 78(5):1404-1422, 2012.
[9] Wei Chen, Laks V. S. Lakshmanan, and Carlos Castillo. Information and Influence Propagation in Social Networks. Morgan & Claypool Publishers, 2013.
[10] Wei Chen, Yajun Wang, and Yang Yuan. Combinatorial multi-armed bandit: General framework, results and applications. In Proceedings of the 30th International Conference on Machine Learning, pages 151-159, 2013.
[11] Wei Chen, Yajun Wang, Yang Yuan, and Qinshi Wang. Combinatorial multi-armed bandit and its extension to probabilistically triggered arms. The Journal of Machine Learning Research, 17(1):1746-1778, 2016.
[12] Wei Chen, Yifei Yuan, and Li Zhang. Scalable influence maximization in social networks under the linear threshold model. In Proceedings of the 2010 IEEE International Conference on Data Mining, pages 88-97, 2010.
[13] Richard Combes, Mohammad Sadegh Talebi Mazraeh Shahi, Alexandre Proutiere, et al. Combinatorial bandits revisited. In Advances in Neural Information Processing Systems, pages 2116-2124, 2015.
[14] Varsha Dani, Thomas P Hayes, and Sham M Kakade. Stochastic linear optimization under bandit feedback. Citeseer, 2008.
[15] Aurélien Garivier, Tor Lattimore, and Emilie Kaufmann. On explore-then-commit strategies. In Advances in Neural Information Processing Systems, pages 784-792, 2016.
[16] Amit Goyal, Wei Lu, and Laks V. S. Lakshmanan. SIMPATH: An efficient algorithm for influence maximization under the linear threshold model. In Proceedings of the 2011 IEEE 11th International Conference on Data Mining, pages 211-220, 2011.
[17] Mark Granovetter. Threshold models of collective behavior. American Journal of Sociology, 83(6):1420-1443, 1978.
[18] Kyomin Jung, Wooram Heo, and Wei Chen. IRIE: Scalable and robust influence maximization in social networks.
In Proceedings of the 2012 IEEE 12th International Conference on Data Mining, pages 918-923, 2012.
[19] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 137-146, 2003.
[20] Elias Boutros Khalil, Bistra Dilkina, and Le Song. Scalable diffusion-aware optimization of network topology. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1226-1235, 2014.
[21] Tomáš Kocák, Gergely Neu, and Michal Valko. Online learning with Erdős-Rényi side-observation graphs. In Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence, pages 339-346, 2016.
[22] Fang Kong, Qizhi Li, and Shuai Li. Survey on online influence maximization. Computer Science, 47(5):7-13, 2020.
[23] Branislav Kveton, Csaba Szepesvari, Zheng Wen, and Azin Ashkan. Cascading bandits: Learning to rank in the cascade model. In Proceedings of the 32nd International Conference on Machine Learning, pages 767-776, 2015.
[24] Branislav Kveton, Zheng Wen, Azin Ashkan, Hoda Eydgahi, and Brian Eriksson. Matroid bandits: Fast combinatorial optimization with learning. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, pages 420-429, 2014.
[25] Branislav Kveton, Zheng Wen, Azin Ashkan, and Csaba Szepesvari. Tight regret bounds for stochastic combinatorial semi-bandits. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics, pages 535-543, 2015.
[26] Tor Lattimore, Koby Crammer, and Csaba Szepesvári. Linear multi-resource allocation with semi-bandit feedback. In Advances in Neural Information Processing Systems, pages 964-972, 2015.
[27] Tor Lattimore and Csaba Szepesvári. Bandit Algorithms. Cambridge University Press, 2020.
[28] Siyu Lei, Silviu Maniu, Luyi Mo, Reynold Cheng, and Pierre Senellart. Online influence maximization. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 645-654, 2015.
[29] Shuai Li, Wei Chen, Zheng Wen, and Kwong-Sak Leung. Stochastic online learning with probabilistic graph feedback. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, 2020.
[30] Shuai Li, Baoxiang Wang, Shengyu Zhang, and Wei Chen. Contextual combinatorial cascading bandits. In Proceedings of the 33rd International Conference on Machine Learning, pages 1245-1253, 2016.
[31] Yuchen Li, Ju Fan, Yanhao Wang, and Kian-Lee Tan. Influence maximization on social graphs: A survey. IEEE Transactions on Knowledge and Data Engineering, 30(10):1852-1872, 2018.
[32] Tian Lin, Bruno Abrahao, Robert Kleinberg, John Lui, and Wei Chen. Combinatorial partial monitoring game with linear feedback and its applications. In Proceedings of the 31st International Conference on Machine Learning, pages 901-909, 2014.
[33] Fang Liu, Swapna Buccapatnam, and Ness Shroff. Information directed sampling for stochastic bandits with graph feedback. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, 2018.
[34] Elchanan Mossel and Sebastien Roch. Submodularity of influence in social networks: From local to global. SIAM Journal on Computing, 39(6):2176-2188, 2010.
[35] Harikrishna Narasimhan, David C Parkes, and Yaron Singer. Learnability of influence in networks. In Advances in Neural Information Processing Systems, pages 3186-3194, 2015.
[36] Praneeth Netrapalli and Sujay Sanghavi.
Learning the graph of epidemic cascades. In Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, pages 211-222, 2012.
[37] Parikshit Ram and Alexander G Gray. Maximum inner-product search using cone trees. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 931-939, 2012.
[38] Paat Rusmevichientong and John N Tsitsiklis. Linearly parameterized bandits. Mathematics of Operations Research, 35(2):395-411, 2010.
[39] Fumin Shen, Wei Liu, Shaoting Zhang, Yang Yang, and Heng Tao Shen. Learning binary codes for maximum inner product search. In Proceedings of the IEEE International Conference on Computer Vision, pages 4148-4156, 2015.
[40] Anshumali Shrivastava and Ping Li. Asymmetric LSH (ALSH) for sublinear time maximum inner product search (MIPS). In Advances in Neural Information Processing Systems, pages 2321-2329, 2014.
[41] Anthony Man-Cho So, Yinyu Ye, and Jiawei Zhang. A unified theorem on SDP rank reduction. Mathematics of Operations Research, 33(4):910-920, 2008.
[42] Youze Tang, Yanchen Shi, and Xiaokui Xiao. Influence maximization in near-linear time: A martingale approach. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1539-1554, 2015.
[43] Sharan Vaswani and Nayantara Duttachoudhury. Learning influence diffusion probabilities under the linear threshold model. GitHub pages, 2013. https://vaswanis.github.io/social_networks_report.pdf.
[44] Sharan Vaswani, Branislav Kveton, Zheng Wen, Mohammad Ghavamzadeh, Laks V. S. Lakshmanan, and Mark Schmidt. Model-independent online learning for influence maximization. In Proceedings of the 34th International Conference on Machine Learning, pages 3530-3539, 2017.
[45] Sharan Vaswani, Laks V. S. Lakshmanan, Mark Schmidt, et al. Influence maximization with bandits. arXiv preprint arXiv:1503.00024, 2015.
[46] Chi Wang, Wei Chen, and Yajun Wang. Scalable influence maximization for independent cascade model in large-scale social networks. Data Mining and Knowledge Discovery, 25(3):545-576, 2012.
[47] Qinshi Wang and Wei Chen. Improving regret bounds for combinatorial semi-bandits with probabilistically triggered arms and its applications. In Advances in Neural Information Processing Systems, pages 1161-1171, 2017.
[48] Siwei Wang and Wei Chen. Thompson sampling for combinatorial semi-bandits. In Proceedings of the 35th International Conference on Machine Learning, pages 5114-5122, 2018.
[49] Zheng Wen, Branislav Kveton, Michal Valko, and Sharan Vaswani. Online influence maximization under independent cascade model with semi-bandit feedback. In Advances in Neural Information Processing Systems, pages 3022-3032, 2017.
[50] Qingyun Wu, Zhige Li, Huazheng Wang, Wei Chen, and Hongning Wang. Factorization bandits for online influence maximization. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 636-646, 2019.