# Robust Influence Maximization for Hyperparametric Models

Dimitris Kalimeris¹, Gal Kaplun¹, Yaron Singer¹

¹Department of Computer Science, Harvard University, Cambridge, MA, USA. Correspondence to: Dimitris Kalimeris, Gal Kaplun.

*Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).*

## Abstract

In this paper we study the problem of robust influence maximization in the independent cascade model under a hyperparametric assumption. In social networks users influence and are influenced by individuals with similar characteristics, and as such they are associated with some features. A recent surge of research in influence maximization focuses on the case where the edge probabilities on the graph are not arbitrary but are generated as a function of the features of the users and a global hyperparameter. We propose a model where the objective is to maximize the worst-case number of influenced users over all possible values of that hyperparameter. We provide theoretical results showing that proper robust optimization in our model is NP-hard, and an algorithm that achieves improper robust optimization. We make use of sampling-based techniques and of the renowned multiplicative weight updates algorithm. Additionally, we validate our method empirically and show that it outperforms state-of-the-art robust influence maximization techniques.

## 1. Introduction

In this paper we study robust influence maximization for hyperparametric diffusion models. First studied by Domingos and Richardson (Domingos & Richardson, 2001) and later elegantly formulated in seminal work by Kempe, Kleinberg, and Tardos (Kempe et al., 2003), influence maximization is the algorithmic task of selecting a small set of individuals who can effectively spread information in a network. The problem we formulate and address in this paper pertains to influence maximization for cases in which the information spread model in the network is subject to some uncertainty.

The most well studied model for information spread is the celebrated Independent Cascade (IC) model. In this model the social network is modeled by a graph, and every pair of nodes $u, v$ connected by an edge $e = (u, v) \in E$ is associated with a probability $p_e$ that quantifies the probability of $u$ spreading information to $v$. Information spread in this model stochastically progresses from a set of nodes that initiates information to the rest of the nodes in the network, as dictated by the graph topology and the probabilities encoded on the edges. Influence maximization is then the algorithmic task of selecting a fixed set of individuals that maximizes the expected number of nodes that receive information. More formally, given a graph $G = (V, E, p)$ where $V$ is the set of nodes, $E$ is the set of edges, $p \in [0, 1]^{|E|}$ is the vector of edge probabilities, and a parameter $k \le |V|$, influence maximization is the optimization problem:

$$\max_{S : |S| \le k} f_p(S)$$

where $f_p(S)$ is the expected number of nodes in the network that receive information when $S$ is the initial set of nodes that spreads information. In their seminal work, Kempe, Kleinberg, and Tardos proved that when $p$ is known, influence maximization can be reduced to monotone submodular maximization under a cardinality constraint. Consequently, a simple greedy algorithm that iteratively selects the node whose marginal influence is approximately maximal obtains a solution that is arbitrarily close to a $1 - 1/e$ factor of optimal (Nemhauser et al., 1978).
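As a concrete illustration of this reduction, the sketch below estimates $f_p$ by Monte Carlo simulation of independent cascades and runs the greedy loop on top of it. It is a minimal sketch, not the implementation used in the paper: the graph representation (adjacency dict), the trial count, and the helper names `simulate_ic` and `greedy_im` are our own illustrative choices.

```python
import random

def simulate_ic(adj, probs, seed_set, trials=200):
    """Monte Carlo estimate of f_p(S): the expected number of nodes
    activated by an independent cascade started from seed_set.
    adj: dict node -> list of out-neighbors; probs: dict (u, v) -> p_e."""
    total = 0
    for _ in range(trials):
        active = set(seed_set)
        frontier = list(seed_set)
        while frontier:
            u = frontier.pop()
            for v in adj[u]:
                # each newly active node gets one independent attempt per edge
                if v not in active and random.random() < probs[(u, v)]:
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / trials

def greedy_im(adj, probs, k):
    """Iteratively add the node of (approximately) maximal marginal
    influence; for a monotone submodular f_p this greedy loop gives a
    (1 - 1/e)-approximation, up to Monte Carlo estimation error."""
    S = set()
    for _ in range(k):
        gains = {v: simulate_ic(adj, probs, S | {v}) for v in adj if v not in S}
        S.add(max(gains, key=gains.get))
    return S
```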
### 1.1. Influence maximization under model uncertainty

In recent years there has been growing concern regarding the sensitivity of influence maximization to model uncertainty (Goyal et al., 2011; Adiga et al., 2013). In particular, small perturbations of or uncertainty about the probability vector can have dramatic effects on the quality of a solution (see the example illustrated in Figure 1), and even approximating the stability of an instance to small perturbations is intractable (He & Kempe, 2015).

*Figure 1. This graph illustrates the sensitivity of the influence maximization task to minor differences in the diffusion probabilities, and hence the necessity of robust optimization. If all the diffusion probabilities are 1, selecting the node at the beginning of the chain maximizes the influence. However, if the edge probabilities are $1 - \epsilon$ it is optimal to select the node at the end of the chain.*

### 1.2. Influence maximization under different models

The sensitivity to small errors that we mentioned in the previous subsection is one issue. The second issue one needs to consider is that we might actually have several models we want to optimize over. For example, consider the case where a clothing company wants to advertise shirts and sweaters. The probabilities in the graph will be slightly different for each product, but we expect the set of influential nodes to be more or less the same. Hence, it makes sense to try to identify a set of nodes that is influential for all the underlying products, i.e. robust for the different models we care about (where each model induces different probabilities).

**Robust influence maximization.** To account for model uncertainty there has recently been a growing body of literature on robust influence maximization (He & Kempe, 2016; Chen et al., 2016a; 2017; Ohsaka & Yoshida, 2017; Anari et al., 2019) where the goal is to find a set of nodes whose influence is maximal over an entire set $P$ of models:

$$\arg\max_{S : |S| \le k} \min_{p \in P} f_p(S)$$

For a general set of models $P$ it is easy to see that the robust influence maximization problem is either trivial or intractable. Specifically, if we have confidence intervals for the diffusion probabilities, i.e. $p_e \in [c_e^-, c_e^+]$ for all edges $e$, then $\max_S \min_{p} f_p(S)$ simplifies into $\max_S f_{p^-}(S)$ where $p^- = (c_e^-)_{e \in E}$, due to the monotonicity of $f_p$ in $p$. In general, however, $P$ can be exponentially large and the problem is intractable. Natural approaches like discretization, sampling, or maximin optimization over the influence function fail for two computational reasons: 1) the space $[0,1]^{|E|}$ has dimension $|E|$, so discretizing or sampling it requires exponentially many points, and 2) the influence function is far from concave-convex, the form of function that is amenable to max-min optimization. To circumvent these difficulties, previous work on robust influence maximization has taken two different approaches. The first approach solves the max-min objective, but assuming that the number of models is polynomial in the size of the problem (e.g. (Chen et al., 2017; Anari et al., 2019)).
The second approach focuses on the robust ratio

$$\rho(S) := \min_{p \in P} \frac{f_p(S)}{f_p(S^*_p)}$$

where $S^*_p$ denotes the optimal solution for $f_p$ (He & Kempe, 2016; Chen et al., 2016a). This is a natural direction, but it comes with the caveat of not optimizing for the total number of nodes that are influenced. Indeed, optimizing for the robust ratio is not the right objective to consider: it can be up to a factor of $n$ worse than the true robust solution, as proved in the following lemma (proof in Appendix A) and illustrated in Figure 2.

*Figure 2. Optimizing for the robust ratio may cost a multiplicative factor of $n$ in the worst case. Here, $n$ is the number of blue nodes and we consider $k = 1$. In $f_1$ the optimal solution is node $v$, while in $f_2$ it is $u$. The table shows the robust ratios and the expected numbers of influenced nodes for nodes $u$ and $v$ in the form (ratio, expected). The ratio objective will choose node $u$, influencing only 1 node in expectation. However, the direct approach will select node $v$, for $n$ influenced nodes in expectation. Maximizing the robust ratio is not the right objective to consider.*

**Lemma 1.** *Let $P$ be a set of influence functions. Consider the solutions to the objectives $\hat{S}_r = \arg\max_{S : |S| \le k} \rho(S)$ and $\hat{S}_v = \arg\max_{S : |S| \le k} \min_{p \in P} f_p(S)$. There exists a set of influence functions $P$ for which $\min_p f_p(\hat{S}_r) = \frac{1}{n} \min_p f_p(\hat{S}_v)$, and this approximation ratio is tight.*

**Robust optimization in hyperparametric models.** A recent line of work in influence maximization and learning in networks explores the interaction of correlations between edge probabilities and influence spread (Wen et al., 2015; Vaswani et al., 2017; Aral & Dhillon, 2018; Kalimeris et al., 2018). Specifically, it restricts the hypothesis class of IC by imposing correlations on the way the probabilities are created. It assumes that each node in the network is associated with some features encoding information about it, for example social (age, etc.) or graph-related (degree, pagerank, etc.) characteristics. The influence probability between two nodes is a function of their features and a global low-dimensional hyperparameter $\theta$. Such approaches have been shown to have sample complexity that depends only on the dimension $d$ and, importantly, have been shown to be highly predictive on real information spread data collected from Facebook, even with hyperparametric models parameterized by a small number of dimensions (Kalimeris et al., 2018). Intuitively, instead of searching for individual probabilities we need to search for a hyperparameter in a much smaller space, and once we find the right one it pins down all the influence probabilities. Low-dimensional hyperparametric models circumvent the hardness associated with continuous spaces, as they impose structure, and the complexity of the influence model is largely determined by the dimension $d$. The main question we address in this paper can be informally stated as follows:

*Is there a computationally efficient algorithm to perform robust optimization for hyperparametric models?*

### 1.3. Main result

In this paper we show that, in contrast to general influence models, the hyperparametric approach enables tractable solutions to the robust influence maximization problem. At a high level, we show that a simple sampling procedure yields an efficient reduction from continuous to discrete robust influence maximization, which allows us to approximate the value $\max\min f$ instead of the ratio.
### 1.4. Paper Organization

We begin by formalizing the robust influence maximization problem and the hyperparametric model in Section 2. In Section 3 we describe the main technical result of the paper, which allows reducing the continuous robust optimization problem to a discrete problem. In Section 4 we use this reduction and introduce the Hyperparametric Influence Robust Optimizer (HIRO) algorithm for robust influence maximization. In Section 5 we provide a strong hardness result that shows the NP-hardness of robust optimization, even in the sense of bi-criteria approximation. Finally, we evaluate the empirical performance of our algorithm in Section 6.

## 2. Preliminaries

A social network is modeled by a graph $G = (V, E)$ where $V$ is the set of individuals in the network and $E$ represents the friendships between them. We use $n$ and $m$ to denote $|V|$ and $|E|$ respectively. One of the core models for diffusion, the process through which information flows between the nodes of $G$, is the Independent Cascade model, popularized in the seminal work of (Kempe et al., 2003).

**The Independent Cascade (IC) model.** The IC model describes a discrete-step stochastic process through which diffusion spreads from a set of initially active individuals to the rest of the nodes in the network. Each node can be active or inactive, and each edge $e \in E$ in the network is associated with some probability $p_e$. All nodes begin as inactive and at time step $t = 0$ a subset of nodes $S \subseteq V$, called the seed set, is chosen and becomes active. At every time step $t + 1$, every node $u$ that became active at time step $t$ attempts to influence each of its non-active neighbors $v$ independently, and succeeds with probability $p_{(u,v)}$.

**Influence functions.** For a given graph $G = (V, E)$ and vector of probabilities $p \in [0, 1]^m$, the influence function $f_p : 2^V \to \mathbb{R}$ measures the expected number of nodes that become influenced in the graph $G$ for a seed set $S \subseteq V$:

$$f_p(S) = \sum_{A \subseteq E} \Big( \prod_{e \in A} p_e \prod_{e \notin A} (1 - p_e) \Big)\, r_A(S)$$

where $r_A(S)$ denotes the number of nodes that are reachable in $G$ from $S$ using only edges from $A$. An important property of $f_p$ is that it is monotone submodular for any $p$.

**Influence maximization.** For a given influence function $f_p : 2^{[n]} \to \mathbb{R}$ and value $k \le n$, influence maximization is the optimization problem $\max_{S : |S| \le k} f_p(S)$. The problem is NP-hard, but since the influence function is monotone and submodular (Kempe et al., 2003), a simple greedy algorithm which iteratively selects nodes whose marginal contribution is largest obtains a solution that is a $1 - 1/e$ factor of the optimal solution (Nemhauser et al., 1978), and this approximation ratio is optimal unless P = NP (Feige, 1998).

**Robust influence maximization.** Given a graph $G = (V, E)$ and a set of probability vectors $P$, the goal of robust influence maximization is to find a solution of size $k$ that has high value for every possible influence function $f_p$ that can be generated by $p \in P$:

$$\max_{S : |S| \le k} \min_{p \in P} f_p(S)$$

There are two sources of inapproximability known for this problem. First, since it is a generalization of influence maximization, it is NP-hard to get any approximation better than $1 - 1/e$, and we are therefore satisfied with solutions that are approximately optimal. The second source of inapproximability is due to the fact that the solution space $\{S : |S| \le k\}$ is highly non-convex, which makes it intractable to obtain proper solutions (Figure 3). In particular, it is NP-hard to find a set of size $k$ that obtains an approximation better than $O(\log(n))$ for the robust optimization problem (Chen et al., 2017).
For this reason, we seek bi-criteria approximations. A solution $\hat{S}$ is an $(\alpha, \beta)$ bi-criteria approximation to the max-min solution of size $k$ if $\beta |\hat{S}| \le k$ and:

$$\min_{p \in P} f_p(\hat{S}) \ge \alpha \max_{S : |S| \le k} \min_{p \in P} f_p(S)$$

Due to these sources of hardness, the gold standard for robust influence maximization is $\big(1 - 1/e,\ \Omega(\log^{-1}(n))\big)$ bi-criteria approximations (Chen et al., 2017; Krause et al., 2008; He & Kempe, 2016; Anari et al., 2019).

**Hyperparametric influence models.** A hyperparametric model $\mathcal{H} : \Theta \times \mathcal{X} \to [0, 1]$ restricts the traditional IC model by imposing correlations between the probabilities of different edges. Each edge $e = (u, v)$ is associated with a $d$-dimensional feature vector $x_e \in \mathcal{X} \subseteq [-1, 1]^d$ encoding information about its endpoints. The probability of $u$ influencing $v$ is a function of the features $x_{(u,v)}$ and a global hyperparameter $\theta \in \Theta \subseteq [-B, B]^d$, for some constant $B > 0$. That is, $p_e = \mathcal{H}(\theta, x_e)$. The most standard hyperparametric models are Generalized Linear Models (GLMs), for which:

$$\mathcal{H}(\theta, x_e) = h(\theta^\top x_e) + \xi_e$$

where $\xi_e$ is drawn from some bounded distribution. To ease the presentation, throughout this paper we treat the model as if $\xi_e = 0$; the results carry over when $\xi_e$ is drawn from a distribution with mean 0 and bounded support. Our results hold for a family of generalized linear models that we define later, which includes the standard choices for $h$: linear, sigmoid, and logarithm functions.

*Figure 3. The green node denotes the optimal solution in each case. Any deterministic solution has robust influence of 1 (only the selected node). However, the distribution selecting uniformly from $\{u, v\}$ (an improper solution) has expected influence of $n/2$.*

**Hyperparametric Influence Robust Optimization.** For a given hyperparametric model $\mathcal{H}$, set of features $\{x_e\}_{e \in E}$ and hyperparameter space $\Theta \subseteq [-B, B]^d$, the set of possible diffusion probability vectors is:

$$P_{\mathcal{H}} := \big\{ \{\mathcal{H}(\theta, x_e)\}_{e \in E} : \theta \in \Theta \big\}$$

and robust influence maximization then reduces to finding $\hat{S}$ such that:

$$\min_{p \in P_{\mathcal{H}}} f_p(\hat{S}) \ge \alpha \max_{S : |S| \le k} \min_{p \in P_{\mathcal{H}}} f_p(S)$$

Throughout the paper we prove theorems for general feature spaces and hyperparametric models. When the hyperparametric model $\mathcal{H}$ and the set of features are clear from context, it will be convenient to write $f_\theta$ instead of $f_p$, where $p$ is the probability vector generated by the hyperparametric model. In this abbreviated notation, the guarantee above becomes:

$$\min_{\theta \in \Theta} f_\theta(\hat{S}) \ge \alpha \max_{S : |S| \le k} \min_{\theta \in \Theta} f_\theta(S)$$

## 3. Robust Optimization

In this section we describe the main result of the paper. We show that for an extremely broad class of hyperparametric models, robust influence maximization is computationally tractable. More specifically, we show that for generalized linear hyperparametric models that are 1-Lipschitz, for any $\epsilon > 0$, a natural sampling procedure from the hyperparameter space $\Theta = [-B, B]^d$ generates $\ell \in O\big(d \left(\frac{nmBd}{\epsilon}\right)^d \log \frac{nmBd}{\epsilon\delta}\big)$ influence functions $f_1, \ldots, f_\ell$ such that with probability $1 - \delta$, robust continuous optimization over $\Theta$ reduces to robust discrete optimization over the functions $\{f_i\}_{i=1}^{\ell}$. That is, an algorithm that returns $\hat{S} \subseteq V$ with $|\hat{S}| \le k$ satisfying:

$$\min_{i \in [\ell]} f_i(\hat{S}) \ge \alpha \max_{S : |S| \le k} \min_{i \in [\ell]} f_i(S)$$

implies the existence of an algorithm for which:

$$\min_{\theta \in \Theta} f_\theta(\hat{S}) \ge \alpha \max_{S : |S| \le k} \min_{\theta \in \Theta} f_\theta(S) - \epsilon.$$

This reduction from the continuous space of infinitely many functions $\{f_\theta\}_{\theta \in \Theta}$ to polynomially many functions $\{f_i\}_{i \in [\ell]}$ is handled in two steps. We first prove that influence functions generated by a class of hyperparametric models that we call *stable* have bounded Lipschitzness (Section 3.1). Using this property, we then prove that sampling polynomially many functions from the hyperparametric model suffices to obtain an approximation to the robust objective (Section 3.2). Finally, we show how to produce near-optimal solutions to the robust optimization problem defined on $\{f_\theta\}_{\theta \in \Theta}$ by implementing a best-response oracle on a set of sampled functions $\{f_i\}_{i \in [\ell]}$ using a Multiplicative Weight Updates (MWU) procedure (Section 4).
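To make the sampling step concrete before the analysis, here is a minimal sketch of how a sampled hyperparameter $\theta$ pins down a full probability vector under a stable GLM (the sigmoid choice of $h$), and how drawing several values of $\theta$ yields the proxy family $f_1, \ldots, f_\ell$. The feature handling and the function names are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def edge_probabilities(theta, features):
    """Instantiate p_e = H(theta, x_e) = h(theta . x_e) for every edge.
    features: dict edge -> d-dimensional feature vector in [-1, 1]^d."""
    return {e: sigmoid(sum(t * x for t, x in zip(theta, xe)))
            for e, xe in features.items()}

def sample_models(features, d, B, num_samples):
    """Draw theta u.a.r. from [-B, B]^d; each sample yields one
    probability vector, i.e. one member f_i of the finite proxy family."""
    models = []
    for _ in range(num_samples):
        theta = [random.uniform(-B, B) for _ in range(d)]
        models.append(edge_probabilities(theta, features))
    return models
```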
### 3.1. Stability implies Lipschitzness

We now prove that $f_\theta$ is $L$-Lipschitz for $L \in \mathrm{poly}(n)$ if the hyperparametric model that generates it is stable.

**Definition 1.** *A hyperparametric model $\mathcal{H} : \Theta \times \mathcal{X} \to [0, 1]$ is stable if it is a generalized linear model that is 1-Lipschitz with respect to the $\ell_1$ norm, i.e. for $\theta \in \Theta$, $x_e \in \mathcal{X}$ we have $\mathcal{H}(\theta, x_e) = h(\theta^\top x_e)$ and for every $\theta, \theta' \in \Theta$:*

$$|h(\theta^\top x_e) - h(\theta'^\top x_e)| \le \|\theta - \theta'\|_1$$

It is easy to verify that the hyperparametric models used in the influence maximization literature are stable: the linear $\mathcal{H}(\theta, x_e) = \theta^\top x_e$, logistic $\mathcal{H}(\theta, x_e) = \frac{1}{1 + \exp(-\theta^\top x_e)}$, and probit $\mathcal{H}(\theta, x_e) = \Phi(\theta^\top x_e)$ models, where $\Phi$ is the CDF of the standard Gaussian, appear in (Wen et al., 2015; Vaswani et al., 2017; Kalimeris et al., 2018) and are all stable. Intuitively, stable hyperparametric models are not sensitive to small changes of the hyperparameter. Hence, despite the fact that a modification of $\theta$ affects the probabilities on all edges, the difference is not very large and we are able to bound the absolute change in the influence function. The following two lemmas are inspired by (Chen et al., 2016a) and (Chen et al., 2016b), who proved similar results for non-hyperparametric models.

**Lemma 2.** *Assume that the hyperparametric model is stable. Then the influence function $f_\theta$ is Lipschitz with respect to the $\ell_1$ norm with Lipschitz constant $L = nm$.*

*Proof.* Since $\mathcal{H}$ is stable, for every $e \in E$ and every two edge probabilities $p_e, p'_e$ produced by the hyperparametric model with parameters $\theta, \theta' \in \Theta$:

$$|p_e - p'_e| = |\mathcal{H}(\theta, x_e) - \mathcal{H}(\theta', x_e)| = |h(\theta^\top x_e) - h(\theta'^\top x_e)| \le \|\theta - \theta'\|_1.$$

Hence, $\|\theta - \theta'\|_1 \le \epsilon$ implies $|p_e - p'_e| \le \epsilon$. Now notice that the influence function is monotone with respect to the diffusion probabilities. As a result, the maximum change that can occur under the constraint $|p_e - p'_e| \le \epsilon$ is when $p'_e = \min\{p_e + \epsilon, 1\}$ (or $p'_e = \max\{0, p_e - \epsilon\}$) for all $e \in E$. We focus on the first case; the second is identical. Fix a seed set $S$. When the probability of an edge $e = (u, v)$ is increased from $p_e$ to $p_e + \epsilon$, the activation probability of every node reachable from $S$ through $u$ increases by at most $\epsilon$. There are at most $n$ nodes reachable through $u$, so the total change in the influence of the set $S$ in that case is at most $n\epsilon$. Applying the same argument for each edge in the network gives the desired bound of $nm\epsilon$. ∎

In our method, the Lipschitz parameter is polynomially related to the complexity of implementing a best-response oracle in the MWU procedure. The fact that the Lipschitz constant $L$ is polynomial in the number of nodes $n$ is, therefore, a crucial property, as it makes the robust optimization problem computationally tractable.

**Tightness of the Lipschitz constant.** The Lipschitz constant of the influence function polynomially determines the computational complexity of our method. As we show, the bound $L = nm$ is tight in the worst case over all possible graphs (proof deferred to Appendix A).

**Lemma 3.** *There is a graph for which any influence function generated by a stable non-trivial¹ generalized linear model that is continuous has Lipschitz constant $L = nm$.*

¹ $h : \mathbb{R} \to [\lambda, 1 - \lambda]$ is non-trivial if it is continuous and surjective for some $x_e \in \mathcal{X}$.
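For intuition, one can empirically compare the observed change in influence against the $nm\|\theta - \theta'\|_1$ bound of Lemma 2. The sketch below does so by Monte Carlo, reusing `simulate_ic` and `edge_probabilities` from the earlier sketches; it is an illustrative check, not part of the paper's method.

```python
def lipschitz_gap(adj, features, theta, theta_prime, seed_set, n, m):
    """Return (observed gap, Lemma 2 bound). The observed quantity
    |f_theta(S) - f_theta'(S)| is itself a Monte Carlo estimate, so the
    comparison is approximate; the bound is n * m * ||theta - theta'||_1."""
    f1 = simulate_ic(adj, edge_probabilities(theta, features), seed_set)
    f2 = simulate_ic(adj, edge_probabilities(theta_prime, features), seed_set)
    l1 = sum(abs(a - b) for a, b in zip(theta, theta_prime))
    return abs(f1 - f2), n * m * l1
```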
### 3.2. Covering Θ via sampling

The Lipschitz property from Lemma 2 is useful because it bounds the change in the estimate of the influence function $f_\theta$ when we approximate it with another function $f_{\theta'}$ for $\theta, \theta'$ that are $\epsilon$-close, i.e. $\|\theta - \theta'\|_1 \le \epsilon$. Thus, if we can construct a set of representative parameters $\Theta_\epsilon$ such that for every $\theta \in \Theta$ there is some $\theta' \in \Theta_\epsilon$ that is $\epsilon$-close with respect to the $\ell_1$ norm, we can reduce the robust optimization problem over the infinite hyperparameter space to the finite case. We now show that this can be done using a reasonable number of samples from $\Theta$.

**Lemma 4.** *Let $\Theta = [-B, B]^d$ and $\epsilon, \delta > 0$. If we sample a set $\Theta_\epsilon$ of size $s \in O\big(d \left(\frac{Bd}{\epsilon}\right)^d \log \frac{Bd}{\epsilon\delta}\big)$ uniformly at random from $\Theta$, then with probability at least $1 - \delta$, for any $\theta \in \Theta$ there exists $\theta' \in \Theta_\epsilon$ such that $\|\theta - \theta'\|_1 \le \epsilon$.*

*Proof.* It suffices to partition the hypercube $[-B, B]^d$ into $\ell_1$ balls of radius $\epsilon$ and bound the number of points we need to sample in order to have at least one point in each $\ell_1$ ball. Since every point in $\Theta$ is covered by a ball and every two points in the same ball are close, a sample containing a point in every ball is a representative set of parameters that are $\epsilon$-close w.r.t. the $\ell_1$ distance to any parameter in $\Theta$. It is well known that $r = \left(\frac{2Bd}{\epsilon}\right)^d$ balls of $\ell_1$ radius $\epsilon$ suffice to cover the $d$-dimensional hypercube $[-B, B]^d$. Let $b_1, \ldots, b_r$ denote the covering balls and $\theta_1, \ldots, \theta_s$ be points drawn uniformly at random from $\Theta = [-B, B]^d$. Then:

$$\Pr[\exists b_j \text{ such that } \forall i,\ \theta_i \notin b_j] \le \sum_{j} \Pr[\forall i,\ \theta_i \notin b_j] \le r \cdot \Pr[\forall i,\ \theta_i \notin b_1]$$

Using $s \in O\big(d \left(\frac{Bd}{\epsilon}\right)^d \log \frac{Bd}{\epsilon\delta}\big)$ samples we get:

$$\Pr[\exists b_j \text{ such that } \forall i,\ \theta_i \notin b_j] \le \delta$$

Thus, since $\Theta \subseteq \cup_j b_j$ and the radius of the balls is $\epsilon$, for every $\theta \in \Theta$ there exists a $\theta' \in \{\theta_i\}_{i \in [s]}$ that is $\epsilon$-close w.r.t. the $\ell_1$ norm, with probability at least $1 - \delta$. ∎

Furthermore, we can extend the notion of covering a convex space $\Theta$ to the coverage of a family of functions.

**Definition 2.** *Let $\mathcal{F} = \{f_\theta : 2^V \to \mathbb{R} \mid \theta \in \Theta\}$ be a family of influence functions and $\mathcal{F}_\epsilon \subseteq \mathcal{F}$ such that $|\mathcal{F}_\epsilon| < \infty$. We say that $\mathcal{F}_\epsilon$ $\epsilon$-covers $\mathcal{F}$ if for any $f \in \mathcal{F}$ there exists an $f_\epsilon \in \mathcal{F}_\epsilon$ such that for any $S \subseteq V$: $|f(S) - f_\epsilon(S)| \le \epsilon$.*

The following corollary is obtained by fusing Lemmas 2 and 4 with Definition 2.

**Corollary 1.** *Let $\mathcal{F} = \{f_\theta : 2^V \to \mathbb{R} \mid \theta \in \Theta\}$ be the family of influence functions and let $\mathcal{F}_\epsilon$ be sampled uniformly at random from $\mathcal{F}$, such that $|\mathcal{F}_\epsilon| \in O\big(d \left(\frac{LBd}{\epsilon}\right)^d \log \frac{LBd}{\epsilon\delta}\big)$. Then $\mathcal{F}_\epsilon$ $\epsilon$-covers $\mathcal{F}$ with probability at least $1 - \delta$.*
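A minimal sketch of Lemma 4's sampling step is below. The constant hidden by the $O(\cdot)$ is not specified in the statement, so the sketch uses a coupon-collector style count of $r \log(r/\delta)$ samples over the $r$ covering balls, which matches the lemma's bound up to constants; treat the exact count as an assumption.

```python
import math
import random

def cover_sample_size(B, d, eps, delta):
    """Samples sufficient (up to constants) to hit every l1-ball of an
    eps-cover of [-B, B]^d with probability at least 1 - delta."""
    r = (2.0 * B * d / eps) ** d          # number of covering l1-balls
    return math.ceil(r * math.log(r / delta))

def sample_theta_cover(B, d, eps, delta):
    """Draw Theta_eps u.a.r. from [-B, B]^d as in Lemma 4."""
    s = cover_sample_size(B, d, eps, delta)
    return [[random.uniform(-B, B) for _ in range(d)] for _ in range(s)]
```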
## 4. Reducing Continuous to Discrete RO

In the previous section we proved that since the influence function is Lipschitz, the infinite family of functions $\mathcal{F} = \{f_\theta \mid \theta \in \Theta\}$ can be well-approximated by the finite (and, crucially, polynomially-sized) family $\mathcal{F}_\epsilon = \{f_{\theta_\epsilon} \mid \theta_\epsilon \in \Theta_\epsilon\}$, where $\Theta_\epsilon$ is obtained by sampling u.a.r. from $\Theta$. Hence, the task of robust influence maximization in the hyperparametric model intuitively reduces to finding a procedure that performs robust optimization on a finite set of functions. The following lemma formalizes this intuition by stating that, under mild conditions, $\alpha$-approximate continuous robust optimization reduces to $\alpha$-approximate discrete robust optimization. The proof is deferred to Appendix A.

**Lemma 5.** *Let $\mathcal{F} = \{f_\theta : 2^V \to \mathbb{R} \mid \theta \in \Theta \subseteq [-B, B]^d\}$ be a family of influence functions. Consider a family $\mathcal{F}_\epsilon \subseteq \mathcal{F}$ s.t. $\mathcal{F}_\epsilon$ $\epsilon$-covers $\mathcal{F}$. Then $\alpha$-approximate robust optimization on $\mathcal{F}$ reduces to $\alpha$-approximate robust optimization on $\mathcal{F}_\epsilon$. That is, an algorithm that returns $\hat{S} \subseteq V$ such that:*

$$\min_{f_\theta \in \mathcal{F}_\epsilon} f_\theta(\hat{S}) \ge \alpha \max_{S \subseteq V} \min_{f_\theta \in \mathcal{F}_\epsilon} f_\theta(S)$$

*implies an algorithm that for any $\epsilon > 0$ returns $\hat{S} \subseteq V$ s.t.:*

$$\min_{f_\theta \in \mathcal{F}} f_\theta(\hat{S}) \ge \alpha \max_{S \subseteq V} \min_{f_\theta \in \mathcal{F}} f_\theta(S) - \epsilon.$$

**The HIRO algorithm.** Given the reduction from the continuous hyperparametric problem to the discrete one, we can now describe the Hyperparametric Influence Robust Optimizer (HIRO), Algorithm 1, which gives an optimal bi-criteria approximation for robust influence maximization in the hyperparametric setting.

**Algorithm 1** HIRO: Hyperparametric Influence Robust Optimizer
- **Input:** $G = (V, E)$, $\{x_e\}_{e \in E}$, $\mathcal{H} : \Theta \times \mathcal{X} \to [0, 1]$, $\epsilon$, $\delta$
- $T \leftarrow O\big(\frac{d(\log n + \log\log(1/\delta))}{\epsilon^2}\big)$, $\eta \leftarrow \sqrt{\log(\ell)/(2T)}$
- $f_1, \ldots, f_\ell \leftarrow \mathrm{SAMPLE}(G, \mathcal{H}, \Theta, \{x_e\}_{e \in E})$
- $w_1[1], \ldots, w_1[\ell] \leftarrow 1/\ell, \ldots, 1/\ell$
- **for** each time step $t \in [T]$ **do**
  - $S_t \leftarrow \mathrm{GREEDY}\big(\sum_{i=1}^{\ell} w_t[i] f_i\big)$
  - **for** each $i \in [\ell]$ **do** update $w_{t+1}[i]$ multiplicatively with learning rate $\eta$, placing higher weight on functions with low value $f_i(S_t)$, and renormalize **end for**
- **end for**
- **Output:** select $S$ u.a.r. from $\{S_1, S_2, \ldots, S_T\}$

HIRO takes as input a graph $G$, the edge features $\{x_e\}_{e \in E}$, the hyperparametric model $(\mathcal{H}, \Theta)$ that dictates the edge probabilities, and an error parameter $\epsilon$ that controls the quality of the returned solution. The output is a set of $k$ nodes. It starts by sampling $\ell$ points from $\Theta$, thus constructing $\ell$ different influence functions that serve as a proxy for the continuous problem. It assigns uniform weights to these functions and runs MWU for a number of steps that depends on $\epsilon$, where in each step higher emphasis is placed on the functions with poor historical performance. In every iteration it optimizes a convex combination of the functions, which is possible since all $f_i$ are monotone submodular and hence amenable to optimization with the GREEDY algorithm. It keeps each outcome as a candidate solution. In the end, one of the candidate solutions is returned u.a.r. The intuition is that since each solution is good for some iteration of the algorithm (i.e. for some specific weighting of the $f_i$), in expectation the returned solution is good for all the functions that performed poorly in some iteration of the MWU, and hence robust.

**Theorem 1.** *HIRO with error parameter $\epsilon$ runs in time $\mathrm{poly}(n, 1/\epsilon, \log(1/\delta))$ and returns the uniform distribution $U$ over solutions $\{S_1, \ldots, S_T\}$, where $T \in O\big(\frac{d(\log n + \log\log(1/\delta))}{\epsilon^2}\big)$, such that with probability at least $1 - \delta$:*

$$\min_{\theta \in \Theta} \mathbb{E}_{S \sim U}[f_\theta(S)] \ge \left(1 - \frac{1}{e}\right) \max_{S : |S| \le k} \min_{\theta \in \Theta} f_\theta(S) - 2\epsilon.$$

*Proof.* From Corollary 1 we know that constructing $\ell$ different influence functions $f_1, \ldots, f_\ell$ by sampling $\ell \in O\big(d \left(\frac{LBd}{\epsilon}\right)^d \log \frac{LBd}{\epsilon\delta}\big)$ points u.a.r. from $\Theta$ yields an $\epsilon$-cover of $\Theta$ with probability at least $1 - \delta$. Since $L = nm$ (Lemma 2) and $d$ is constant according to the main assumption of the hyperparametric model, constructing the cover takes polynomial time. (Chen et al., 2017) proved that for influence functions $f_1, \ldots, f_\ell$, the MWU procedure, run for $T$ iterations with learning rate $\eta = \sqrt{\log(\ell)/(2T)}$ and using GREEDY as an approximate best-response oracle, returns the uniform distribution over $T$ solutions s.t.:

$$\min_{i \in [\ell]} \mathbb{E}_{\hat{S}}[f_i(\hat{S})] \ge \left(1 - \frac{1}{e}\right) \max_{S} \min_{i \in [\ell]} f_i(S) - O\Big(\sqrt{\tfrac{\log \ell}{T}}\Big)$$

By applying Lemma 5 we get:

$$\min_{\theta \in \Theta} \mathbb{E}_{\hat{S} \sim U}[f_\theta(\hat{S})] \ge \left(1 - \frac{1}{e}\right) \max_{S} \min_{\theta \in \Theta} f_\theta(S) - \epsilon - O\Big(\sqrt{\tfrac{\log \ell}{T}}\Big)$$

For $\ell \in O\big(d \left(\frac{LBd}{\epsilon}\right)^d \log \frac{LBd}{\epsilon\delta}\big)$ we get the desired bound by setting $T \in O\big(\frac{d(\log n + \log\log(1/\delta))}{\epsilon^2}\big)$ as required, since $L \cdot B$ is polynomial in $n$. ∎

The result of the previous theorem implies the following bi-criteria approximation guarantee, obtained by returning $\hat{S} = \cup_{t=1}^{T} S_t$ instead of the uniform distribution over $\{S_1, \ldots, S_T\}$.

**Corollary 2.** *HIRO with error parameter $\epsilon$ returns a set $\hat{S}$ of size at most $O\big(\frac{d(\log n + \log\log(1/\delta))}{\epsilon^2}\big) \cdot k$ which, with probability at least $1 - \delta$, satisfies:*

$$\min_{\theta \in \Theta} f_\theta(\hat{S}) \ge \left(1 - \frac{1}{e}\right) \max_{S : |S| \le k} \min_{\theta \in \Theta} f_\theta(S) - 2\epsilon.$$
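A minimal sketch of HIRO's main loop is below, reusing `sample_models` and `simulate_ic` from the earlier sketches. The printed Algorithm 1 elides the exact weight update, so the sketch uses the standard MWU rule with payoffs normalized by $n$, which emphasizes functions on which the current solution performs poorly; that rule, like the helper names, is an assumption rather than the authors' exact implementation.

```python
import math
import random

def weighted_greedy(adj, models, w, k):
    """Greedy on F(S) = sum_i w[i] * f_i(S); F is monotone submodular,
    so this is a (1 - 1/e)-approximate best response (up to MC error)."""
    def F(S):
        return sum(wi * simulate_ic(adj, probs, S)
                   for wi, probs in zip(w, models))
    S = set()
    for _ in range(k):
        S.add(max((v for v in adj if v not in S), key=lambda v: F(S | {v})))
    return S

def hiro(adj, features, d, B, n, k, num_funcs, T):
    """Sketch of Algorithm 1: sample the proxy functions, run MWU with
    greedy best responses, return one candidate solution u.a.r."""
    models = sample_models(features, d, B, num_funcs)   # f_1, ..., f_l
    l = len(models)
    w = [1.0 / l] * l
    eta = math.sqrt(math.log(l) / (2 * T)) if l > 1 else 0.0
    solutions = []
    for _ in range(T):
        S_t = weighted_greedy(adj, models, w, k)
        solutions.append(S_t)
        # down-weight functions where S_t already does well, so that
        # poorly served functions dominate the next iteration
        scores = [simulate_ic(adj, probs, S_t) / n for probs in models]
        w = [wi * math.exp(-eta * s) for wi, s in zip(w, scores)]
        total = sum(w)
        w = [wi / total for wi in w]
    return random.choice(solutions)   # the improper (randomized) solution
```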
## 5. Lower Bound

In this section we prove that a structural assumption, such as the hyperparametric restriction of the IC model, is vital for robust influence maximization, since otherwise we might need to sample a number of functions that is exponential in $n$. We do so by providing a strong hardness result: define the family of functions $\mathcal{F}_P = \{f_p \mid p \in P\}$ for some finite $P$. We prove that the problem $\max_{S : |S| \le k} \min_{p \in P} f_p(S)$ is NP-hard to approximate within any constant factor, even with a reasonable bi-criteria approximation. This is in sharp contrast with the main result about hyperparametric robust influence maximization that we proved in Theorem 1 and Corollary 2, since under the hyperparametric assumption $P$ is always of polynomial size and hence we only need to increase our budget by a factor of $O(\log n)$. In Theorem 2 we formalize the impossibility result. We provide a proof sketch; the full proof is in Appendix B.

**Theorem 2.** *Let $G$ be a graph with $n$ nodes and $m$ edges, and $\delta, \epsilon > 0$. There is no algorithm that finds a set $\hat{S}$ of size $|\hat{S}| \le (1 - \delta) \ln|P| \cdot k$ and achieves an approximation factor better than $O\big(\frac{1}{n^{1-\epsilon}}\big)$ for the problem $\max_{S : |S| \le k} \min_{p \in P} f_p(S)$, where $P \subseteq \{\lambda, 1 - \lambda\}^m$, $\lambda = o(1)$, and $|P| \in \Omega(\mathrm{poly}(n))$, unless P = NP.*

This essentially means that if our problem requires a cover containing more than a polynomial number of functions, then the best bi-criteria approximation we can hope for must contain significantly more than $k$ nodes.

*Proof sketch.* Our reduction is based on (He & Kempe, 2016). The authors there prove the hardness of a different version of robust influence maximization, where the objective is to approximate well the individual optima of a set of different influence functions instead of influencing as many nodes as possible in the worst case. We reduce from GAP SET COVER. The use of the gap version of the problem is to show that the problem remains NP-hard even if we augment the budget of nodes by a factor of roughly $\log|P|$. Given an instance of gap set cover, we construct a bipartite graph on $n$ nodes, $m$ edges, and $\mathrm{poly}(n)$ different probability sets corresponding to the different influence functions, such that when there is a set cover of size at most $k$, there exists a seed set $S$ for which all the influence functions have high value, while when there is no set cover of size $\log n \cdot k$, there is at least one influence function that has low value for any seed set, even seed sets of size $\log n \cdot k$. The asymptotic difference between the values of the objective functions, with and without the set cover, enables deciding the gap set cover instance. ∎

## 6. Experiments

To measure the empirical performance of the robust hyperparametric approach we conducted four sets of experiments: (1) First, we examine how many functions are empirically required to sufficiently cover $\mathcal{F} = \{f_\theta \mid \theta \in \Theta\}$ and observe that in practice far fewer are needed than the theoretical worst-case upper bound of Corollary 1. (2) We examine the rate of convergence of HIRO to the robust solution and show substantially faster convergence than Theorem 1 suggests. (3) We benchmark the performance of HIRO against other methods for robust influence maximization and observe that it consistently outperforms all previous methods. (4) We measure how well a seed set of size $k$ performs compared to a seed set with a slightly augmented budget, per the bi-criteria approximation guarantee (Corollary 2).
**Graphs.** We generated four different synthetic networks using standard random graph models to analyze the impact of topological variations among different social networks. All networks were generated with $n = 500$ vertices. We used the Barabási-Albert preferential attachment model, where each new node is connected to 4 (preferentially high-degree) existing ones; the Watts-Strogatz small-world model, where each node is connected to 5 nodes in the ring topology and the probability of rewiring an edge is $3/n$; the Erdős-Rényi random graph model with edge-construction probability $p = 3/n$; and the configuration model with a power-law degree distribution with $\alpha = 2$. For a more detailed description of these models please refer to Appendix C.

**Hyperparametric model.** We used the sigmoid function as the hyperparametric model to determine the diffusion probabilities, i.e. $h(\theta^\top x_e) = \frac{1}{1 + \exp(-\theta^\top x_e)}$, as in (Kalimeris et al., 2018). We generated $d$ random features in $[-1, 1]$ for every edge. We used $d = 5$; however, our results are consistent across a large range of dimensions $d$ and feature-generating techniques, such as normal or uniform distributions over the unit hypercube $[-1, 1]^d$ and its discrete analog $\{-1, 1\}^d$. We sampled $\Theta_\epsilon = \{\theta_1, \ldots, \theta_\ell\}$ from $\Theta = [-1, 1]^d$ and generated the family of influence functions $\mathcal{F}_\epsilon = \{f_i \mid \theta_i \in \Theta_\epsilon\}$ for $\ell = 20$. In addition, we set $T = 10$ HIRO iterations, with the exception of Experiments 1 and 2, where $\ell$ and $T$ respectively are the free variable.

**Benchmarks.** We benchmark the performance of HIRO against the following algorithms:

- **Random Seed:** We select $k$ nodes u.a.r.; to account for variance, we average over the solutions from 100 trials.
- **Top-$k$ Degree nodes:** In this benchmark we choose the $k$ highest-degree nodes in the graph.
- **Random Greedy:** In robust optimization problems, a typical approach is to randomize over the best strategies against every possible influence function. Consequently, this method runs the GREEDY algorithm on every function $f_i$ for $\theta_i \in \Theta_\epsilon$ and chooses one of the outputs uniformly at random.
- **LUGREEDY:** We compare against Algorithm 2 in (Chen et al., 2016a) and refer to it as LUGREEDY. LUGREEDY is an algorithm developed specifically for robust influence maximization. It is oblivious to the structure of the model and only accounts for the confidence interval of each edge. LUGREEDY produces two solutions by running the GREEDY algorithm twice, over the lower and the upper boundaries of the confidence intervals. Then the algorithm chooses the better of the two solutions, assuming the lower boundaries are the true probabilities. LUGREEDY achieves the best approximation for the robust ratio objective mentioned in the introduction (a minimal sketch follows below).
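For reference, here is a minimal sketch of the LUGREEDY benchmark as described above (not the authors' code), reusing `greedy_im` and `simulate_ic` from the earlier sketches; `lower` and `upper` map each edge to the endpoints of its confidence interval.

```python
def lugreedy(adj, lower, upper, k):
    """Run greedy on the lower- and upper-bound probabilities, then keep
    whichever seed set scores higher when the lower bounds are treated
    as the true probabilities."""
    s_low = greedy_im(adj, lower, k)
    s_high = greedy_im(adj, upper, k)
    if simulate_ic(adj, lower, s_low) >= simulate_ic(adj, lower, s_high):
        return s_low
    return s_high
```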
*Figure 4. Comparison to benchmark solutions of the robust problem.*

*Figure 5. Illustration of the gap between a regular seed set size and the bi-criteria augmented budget.*

### 6.1. Experimental results

We perform 50 trials for each experiment and plot the mean and standard deviation in the figures.

**Experiment 1.** (Figure 6, Appendix C) The goal of this experiment is to understand the sample complexity of the hyperparameter, i.e. how many functions we need to sample to approximate the robust solution accurately. We sample $\ell = 50$ values from $\Theta$ as a benchmark (this can be seen as a test/validation set). In that way we create a cover $\mathcal{F}_\epsilon$ that serves as a proxy for $\mathcal{F} = \{f_\theta \mid \theta \in \Theta\}$. We plot $\min_{1 \le i \le \ell} f_i(S_r)$, where $S_r$ is the result of running HIRO over $r \in \{1, 10, 20, 30, 40, 50\}$ functions. We plot three such trials with different seed sizes $k \in \{10, 25, 50\}$. Figure 6, deferred to the appendix, demonstrates that in practice the sample complexity plateaus, even though in theory increasing the number of functions should bring about an increase in the value of the robust solution.

**Experiment 2.** (Figure 7, Appendix C) We show that HIRO converges rapidly in practice; that is, even after a constant number of iterations, HIRO achieves strong results. We run the experiment three times, for different values of $k \in \{10, 25, 50\}$. For each trial, we run HIRO for $T \in \{1, 5, 10, 15\}$ iterations. Figure 7, found in the appendix, illustrates this inquiry. With the exception of the small-world model, which exhibits slow growth, the plots indeed seem to converge quickly for all values of $k$.

**Experiment 3.** (Figure 4) This experiment compares HIRO with the benchmarks and illustrates the consistency of our algorithm. We plot the value of each benchmark as a function of the size of the seed set. We can see that across all generative models and proposed benchmarks, even after few iterations, our algorithm performs at the very top, with the lowest variance. Other benchmarks are competitive when the seed set is small, but as the seed set grows so does the performance gap over the best heuristic.

**Experiment 4.** (Figure 5) We evaluate the gap in the value of the robust solution between a seed set of size $k$ and a seed set of size $\beta \cdot k \log n$. Recall that the HIRO algorithm chooses at random among $T$ solutions, each of size $k$. Instead, we take the union of these solutions and return a subset of the union of size $\beta \cdot k \log n$, where $\beta \in (0, 1]$. Here we choose $k = 10$, and we report the results in Figure 5.

## 7. Conclusion

In this paper we proposed a new formulation of robust influence maximization by utilizing a very broad class of hyperparametric models. We provided an efficient reduction from continuous to discrete robust influence maximization and an optimal and computationally tractable algorithm for the problem in terms of bi-criteria approximation. We empirically assessed its performance and found that it consistently surpasses state-of-the-art methods.

## References

Adiga, A., Kuhlman, C., Mortveit, H., and Kumar, A. Sensitivity of diffusion dynamics to network uncertainty. In *AAAI Conference on Artificial Intelligence*, 2013.

Anari, N., Haghtalab, N., Naor, J. S., Pokutta, S., Singh, M., and Torrico, A. Structured robust submodular maximization: offline and online algorithms. In *The 22nd International Conference on Artificial Intelligence and Statistics (AISTATS)*, 2019.

Aral, S. and Dhillon, P. Social influence maximization under empirical influence models. *Nature Human Behaviour*, pp. 375–382, 2018.

Chen, R., Lucier, B., Singer, Y., and Syrgkanis, V. Robust optimization for non-convex objectives. In *Annual Conference on Neural Information Processing Systems (NIPS)*, 2017.

Chen, W., Lin, T., Tan, Z., Zhao, M., and Zhou, X. Robust influence maximization. In *Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 2016a.

Chen, W., Wang, Y., and Yuan, Y. Combinatorial multi-armed bandit and its extension to probabilistically triggered arms. *Journal of Machine Learning Research*, 17:1–33, 2016b.

Dinur, I. and Steurer, D. Analytical approach to parallel repetition. In *Proceedings of the 45th ACM Symposium on Theory of Computing (STOC)*, pp. 624–633, 2014.

Domingos, P. and Richardson, M. Mining the network value of customers. In *Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, pp. 57–66. ACM, 2001.
Feige, U. A threshold of ln n for approximating set cover. *Journal of the ACM*, 45(4):634–652, 1998.

Goyal, A., Bonchi, F., and Lakshmanan, L. V. S. A data-based approach to social influence maximization. *Proceedings of the VLDB Endowment*, 5(1):73–84, 2011.

He, X. and Kempe, D. Stability of influence maximization. *CoRR*, abs/1501.04579, 2015. URL http://arxiv.org/abs/1501.04579.

He, X. and Kempe, D. Robust influence maximization. In *Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 2016.

Kalimeris, D., Singer, Y., Subbian, K., and Weinsberg, U. Learning diffusion using hyperparameters. In *International Conference on Machine Learning (ICML)*, 2018.

Karp, R. M. Reducibility among combinatorial problems. In R. E. Miller and J. W. Thatcher (eds.), *Complexity of Computer Computations*, pp. 85–103. New York: Plenum, 1972.

Kempe, D., Kleinberg, J., and Tardos, E. Maximizing the spread of influence through a social network. In *Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, pp. 137–146. ACM, 2003.

Krause, A., McMahan, H. B., Guestrin, C., and Gupta, A. Robust submodular observation selection. *Journal of Machine Learning Research*, 9:2761–2801, 2008.

Nemhauser, G. L., Wolsey, L. A., and Fisher, M. L. An analysis of approximations for maximizing submodular set functions. *Mathematical Programming*, 14(1):265–294, 1978.

Ohsaka, N. and Yoshida, Y. Portfolio optimization for influence spread. In *Proceedings of the 26th International Conference on World Wide Web (WWW 2017)*, Perth, Australia, April 3–7, 2017, pp. 977–985, 2017. doi: 10.1145/3038912.3052628. URL https://doi.org/10.1145/3038912.3052628.

Vaswani, S., Kveton, B., Wen, Z., Ghavamzadeh, M., Lakshmanan, L. V. S., and Schmidt, M. Model-independent online learning for influence maximization. In *International Conference on Machine Learning (ICML)*, 2017.

Wen, Z., Kveton, B., and Ashkan, A. Efficient learning in large-scale combinatorial semi-bandits. In *International Conference on Machine Learning (ICML)*, pp. 1113–1122, 2015.