# Robust Influence Maximization for Hyperparametric Models

Dimitris Kalimeris¹, Gal Kaplun¹, Yaron Singer¹

¹Department of Computer Science, Harvard University, Cambridge, MA, USA. Correspondence to: Dimitris Kalimeris, Gal Kaplun.

*Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).*

## Abstract

In this paper we study the problem of robust influence maximization in the independent cascade model under a hyperparametric assumption. In social networks users influence and are influenced by individuals with similar characteristics, and as such they are associated with some features. A recent surge of research in influence maximization focuses on the case where the edge probabilities on the graph are not arbitrary but are generated as a function of the features of the users and a global hyperparameter. We propose a model where the objective is to maximize the worst-case number of influenced users over all possible values of that hyperparameter. We provide theoretical results showing that proper robust optimization in our model is NP-hard, and an algorithm that achieves improper robust optimization. We make use of sampling-based techniques and of the renowned multiplicative weight updates algorithm. Additionally, we validate our method empirically and show that it outperforms state-of-the-art robust influence maximization techniques.

## 1. Introduction

In this paper we study robust influence maximization for hyperparametric diffusion models. First studied by Domingos and Richardson (Domingos & Richardson, 2001) and later elegantly formulated in seminal work by Kempe, Kleinberg, and Tardos (Kempe et al., 2003), influence maximization is the algorithmic task of selecting a small set of individuals who can effectively spread information in a network. The problem we formulate and address in this paper pertains to influence maximization for cases in which the information spread model in the network is subject to some uncertainty.

The most well studied model for information spread is the celebrated Independent Cascade (IC) model. In this model the social network is modeled by a graph, and every pair of nodes $u, v$ connected by an edge $e = (u, v) \in E$ is associated with a probability $p_e$ that quantifies the probability of $u$ spreading information to $v$. Information spread in this model stochastically progresses from a set of nodes that initiates information to the rest of the nodes in the network, as dictated by the graph topology and the probabilities encoded on the edges. Influence maximization is then the algorithmic task of selecting a fixed set of individuals that maximizes the expected number of nodes that receive information. More formally, given a graph $G = (V, E, p)$ where $V$ is the set of nodes, $E$ is the set of edges, $p \in [0, 1]^{|E|}$ is the vector of edge probabilities, and a parameter $k \le |V|$, influence maximization is the optimization problem:

$$\max_{S : |S| \le k} f_p(S)$$

where $f_p(S)$ is the expected number of nodes in the network that receive information when $S$ is the initial set of nodes that spreads information. In their seminal work, Kempe, Kleinberg, and Tardos proved that when $p$ is known, influence maximization can be reduced to monotone submodular maximization under a cardinality constraint. Consequently, a simple greedy algorithm that iteratively selects the node whose marginal influence is approximately maximal obtains a solution that is arbitrarily close to a $1 - 1/e$ factor of optimal (Nemhauser et al., 1978).
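As a concrete illustration of this reduction, the sketch below estimates $f_p$ by Monte Carlo simulation of independent cascades and runs the greedy loop on top of it. It is a minimal sketch, not the implementation used in the paper: the graph representation (adjacency dict), the trial count, and the helper names `simulate_ic` and `greedy_im` are our own illustrative choices.

```python
import random

def simulate_ic(adj, probs, seed_set, trials=200):
    """Monte Carlo estimate of f_p(S): the expected number of nodes
    activated by an independent cascade started from seed_set.
    adj: dict node -> list of out-neighbors; probs: dict (u, v) -> p_e."""
    total = 0
    for _ in range(trials):
        active = set(seed_set)
        frontier = list(seed_set)
        while frontier:
            u = frontier.pop()
            for v in adj[u]:
                # each newly active node gets one independent attempt per edge
                if v not in active and random.random() < probs[(u, v)]:
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / trials

def greedy_im(adj, probs, k):
    """Iteratively add the node of (approximately) maximal marginal
    influence; for a monotone submodular f_p this greedy loop gives a
    (1 - 1/e)-approximation, up to Monte Carlo estimation error."""
    S = set()
    for _ in range(k):
        gains = {v: simulate_ic(adj, probs, S | {v}) for v in adj if v not in S}
        S.add(max(gains, key=gains.get))
    return S
```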
### 1.1. Influence maximization under model uncertainty

In recent years there has been growing concern regarding the sensitivity of influence maximization to model uncertainty (Goyal et al., 2011; Adiga et al., 2013). In particular, small perturbations of or uncertainty about the probability vector can have dramatic effects on the quality of a solution (see the example illustrated in Figure 1), and even approximating the stability of an instance to small perturbations is intractable (He & Kempe, 2015).

*Figure 1. This graph illustrates the sensitivity of the influence maximization task to minor differences in the diffusion probabilities, and hence the necessity of robust optimization. If all the diffusion probabilities are 1, selecting the node at the beginning of the chain maximizes the influence. However, if the edge probabilities are $1 - \epsilon$ it is optimal to select the node at the end of the chain.*

### 1.2. Influence maximization under different models

The sensitivity to small errors that we mentioned in the previous subsection is one issue. The second issue one needs to consider is that we might actually have several models we want to optimize over. For example, consider the case where a clothing company wants to advertise shirts and sweaters. The probabilities in the graph will be slightly different for each product, but we expect the set of influential nodes to be more or less the same. Hence, it makes sense to try to identify a set of nodes that is influential for all the underlying products, i.e. robust for the different models we care about (where each model induces different probabilities).

**Robust influence maximization.** To account for model uncertainty there has recently been a growing body of literature on robust influence maximization (He & Kempe, 2016; Chen et al., 2016a; 2017; Ohsaka & Yoshida, 2017; Anari et al., 2019) where the goal is to find a set of nodes whose influence is maximal over an entire set $P$ of models:

$$\arg\max_{S : |S| \le k} \min_{p \in P} f_p(S)$$

For a general set of models $P$ it is easy to see that the robust influence maximization problem is either trivial or intractable. Specifically, if we have confidence intervals for the diffusion probabilities, i.e. $p_e \in [c_e^-, c_e^+]$ for all edges $e$, then $\max_S \min_{p} f_p(S)$ simplifies into $\max_S f_{p^-}(S)$ where $p^- = (c_e^-)_{e \in E}$, due to the monotonicity of $f_p$ in $p$. In general, however, $P$ can be exponentially large and the problem is intractable. Natural approaches like discretization, sampling, or maximin optimization over the influence function fail for two computational reasons: 1) the space $[0,1]^{|E|}$ has dimension $|E|$, so discretizing or sampling it requires exponentially many points, and 2) the influence function is far from concave-convex, the form of function that is amenable to max-min optimization. To circumvent these difficulties, previous work on robust influence maximization has taken two different approaches. The first approach solves the max-min objective, but assuming that the number of models is polynomial in the size of the problem (e.g. (Chen et al., 2017; Anari et al., 2019)).
The second approach focuses on the robust ratio

$$\rho(S) := \min_{p \in P} \frac{f_p(S)}{f_p(S^*_p)}$$

where $S^*_p$ denotes the optimal solution for $f_p$ (He & Kempe, 2016; Chen et al., 2016a). This is a natural direction, but it comes with the caveat of not optimizing for the total number of nodes that are influenced. Indeed, optimizing for the robust ratio is not the right objective to consider: it can be up to a factor of $n$ worse than the true robust solution, as proved in the following lemma (proof in Appendix A) and illustrated in Figure 2.

*Figure 2. Optimizing for the robust ratio may cost a multiplicative factor of $n$ in the worst case. Here, $n$ is the number of blue nodes and we consider $k = 1$. In $f_1$ the optimal solution is node $v$, while in $f_2$ it is $u$. The table shows the robust ratios and the expected numbers of influenced nodes for nodes $u$ and $v$ in the form (ratio, expected). The ratio objective will choose node $u$, influencing only 1 node in expectation. However, the direct approach will select node $v$, for $n$ influenced nodes in expectation. Maximizing the robust ratio is not the right objective to consider.*

**Lemma 1.** *Let $P$ be a set of influence functions. Consider the solutions to the objectives $\hat{S}_r = \arg\max_{S : |S| \le k} \rho(S)$ and $\hat{S}_v = \arg\max_{S : |S| \le k} \min_{p \in P} f_p(S)$. There exists a set of influence functions $P$ for which $\min_p f_p(\hat{S}_r) = \frac{1}{n} \min_p f_p(\hat{S}_v)$, and this approximation ratio is tight.*

**Robust optimization in hyperparametric models.** A recent line of work in influence maximization and learning in networks explores the interaction of correlations between edge probabilities and influence spread (Wen et al., 2015; Vaswani et al., 2017; Aral & Dhillon, 2018; Kalimeris et al., 2018). Specifically, it restricts the hypothesis class of IC by imposing correlations on the way the probabilities are created. It assumes that each node in the network is associated with some features encoding information about it, for example social (age, etc.) or graph-related (degree, pagerank, etc.) characteristics. The influence probability between two nodes is a function of their features and a global low-dimensional hyperparameter $\theta$. Such approaches have been shown to have sample complexity that depends only on the dimension $d$ and, importantly, have been shown to be highly predictive on real information spread data collected from Facebook, even with hyperparametric models parameterized by a small number of dimensions (Kalimeris et al., 2018). Intuitively, instead of searching for individual probabilities we need to search for a hyperparameter in a much smaller space, and once we find the right one it pins down all the influence probabilities. Low-dimensional hyperparametric models circumvent the hardness associated with continuous spaces, as they impose structure, and the complexity of the influence model is largely determined by the dimension $d$. The main question we address in this paper can be informally stated as follows:

*Is there a computationally efficient algorithm to perform robust optimization for hyperparametric models?*

### 1.3. Main result

In this paper we show that, in contrast to general influence models, the hyperparametric approach enables tractable solutions to the robust influence maximization problem. At a high level, we show that a simple sampling procedure yields an efficient reduction from continuous to discrete robust influence maximization, which allows us to approximate the value $\max\min f$ instead of the ratio.
### 1.4. Paper Organization

We begin by formalizing the robust influence maximization problem and the hyperparametric model in Section 2. In Section 3 we describe the main technical result of the paper, which allows reducing the continuous robust optimization problem to a discrete problem. In Section 4 we use this reduction and introduce the Hyperparametric Influence Robust Optimizer (HIRO) algorithm for robust influence maximization. In Section 5 we provide a strong hardness result that shows the NP-hardness of robust optimization, even in the sense of bi-criteria approximation. Finally, we evaluate the empirical performance of our algorithm in Section 6.

## 2. Preliminaries

A social network is modeled by a graph $G = (V, E)$ where $V$ is the set of individuals in the network and $E$ represents the friendships between them. We use $n$ and $m$ to denote $|V|$ and $|E|$ respectively. One of the core models for diffusion, the process through which information flows between the nodes of $G$, is the Independent Cascade model, popularized in the seminal work of (Kempe et al., 2003).

**The Independent Cascade (IC) model.** The IC model describes a discrete-step stochastic process through which diffusion spreads from a set of initially active individuals to the rest of the nodes in the network. Each node can be active or inactive, and each edge $e \in E$ in the network is associated with some probability $p_e$. All nodes begin as inactive and at time step $t = 0$ a subset of nodes $S \subseteq V$, called the seed set, is chosen and becomes active. At every time step $t + 1$, every node $u$ that became active at time step $t$ attempts to influence each of its non-active neighbors $v$ independently, and succeeds with probability $p_{(u,v)}$.

**Influence functions.** For a given graph $G = (V, E)$ and vector of probabilities $p \in [0, 1]^m$, the influence function $f_p : 2^V \to \mathbb{R}$ measures the expected number of nodes that become influenced in the graph $G$ for a seed set $S \subseteq V$:

$$f_p(S) = \sum_{A \subseteq E} \Big( \prod_{e \in A} p_e \prod_{e \notin A} (1 - p_e) \Big)\, r_A(S)$$

where $r_A(S)$ denotes the number of nodes that are reachable in $G$ from $S$ using only edges from $A$. An important property of $f_p$ is that it is monotone submodular for any $p$.

**Influence maximization.** For a given influence function $f_p : 2^{[n]} \to \mathbb{R}$ and value $k \le n$, influence maximization is the optimization problem $\max_{S : |S| \le k} f_p(S)$. The problem is NP-hard, but since the influence function is monotone and submodular (Kempe et al., 2003), a simple greedy algorithm which iteratively selects nodes whose marginal contribution is largest obtains a solution that is a $1 - 1/e$ factor of the optimal solution (Nemhauser et al., 1978), and this approximation ratio is optimal unless P = NP (Feige, 1998).

**Robust influence maximization.** Given a graph $G = (V, E)$ and a set of probability vectors $P$, the goal of robust influence maximization is to find a solution of size $k$ that has high value for every possible influence function $f_p$ that can be generated by $p \in P$:

$$\max_{S : |S| \le k} \min_{p \in P} f_p(S)$$

There are two sources of inapproximability known for this problem. First, since it is a generalization of influence maximization, it is NP-hard to get any approximation better than $1 - 1/e$, and we are therefore satisfied with solutions that are approximately optimal. The second source of inapproximability is due to the fact that the solution space $\{S : |S| \le k\}$ is highly non-convex, which makes it intractable to obtain proper solutions (Figure 3). In particular, it is NP-hard to find a set of size $k$ that obtains an approximation better than $O(\log(n))$ for the robust optimization problem (Chen et al., 2017).
For this reason, we seek bi-criteria approximations. A solution $\hat{S}$ is an $(\alpha, \beta)$ bi-criteria approximation to the max-min solution of size $k$ if $\beta |\hat{S}| \le k$ and:

$$\min_{p \in P} f_p(\hat{S}) \ge \alpha \max_{S : |S| \le k} \min_{p \in P} f_p(S)$$

Due to these sources of hardness, the gold standard for robust influence maximization is $\big(1 - 1/e,\ \Omega(\log^{-1}(n))\big)$ bi-criteria approximations (Chen et al., 2017; Krause et al., 2008; He & Kempe, 2016; Anari et al., 2019).

**Hyperparametric influence models.** A hyperparametric model $\mathcal{H} : \Theta \times \mathcal{X} \to [0, 1]$ restricts the traditional IC model by imposing correlations between the probabilities of different edges. Each edge $e = (u, v)$ is associated with a $d$-dimensional feature vector $x_e \in \mathcal{X} \subseteq [-1, 1]^d$ encoding information about its endpoints. The probability of $u$ influencing $v$ is a function of the features $x_{(u,v)}$ and a global hyperparameter $\theta \in \Theta \subseteq [-B, B]^d$, for some constant $B > 0$. That is, $p_e = \mathcal{H}(\theta, x_e)$. The most standard hyperparametric models are Generalized Linear Models (GLMs), for which:

$$\mathcal{H}(\theta, x_e) = h(\theta^\top x_e) + \xi_e$$

where $\xi_e$ is drawn from some bounded distribution. To ease the presentation, throughout this paper we treat the model as if $\xi_e = 0$; the results carry over when $\xi_e$ is drawn from a distribution with mean 0 and bounded support. Our results hold for a family of generalized linear models that we define later, which includes the standard choices for $h$: linear, sigmoid, and logarithm functions.

*Figure 3. The green node denotes the optimal solution in each case. Any deterministic solution has robust influence of 1 (only the selected node). However, the distribution selecting uniformly from $\{u, v\}$ (an improper solution) has expected influence of $n/2$.*

**Hyperparametric Influence Robust Optimization.** For a given hyperparametric model $\mathcal{H}$, set of features $\{x_e\}_{e \in E}$ and hyperparameter space $\Theta \subseteq [-B, B]^d$, the set of possible diffusion probability vectors is:

$$P_{\mathcal{H}} := \big\{ \{\mathcal{H}(\theta, x_e)\}_{e \in E} : \theta \in \Theta \big\}$$

and robust influence maximization then reduces to finding $\hat{S}$ such that:

$$\min_{p \in P_{\mathcal{H}}} f_p(\hat{S}) \ge \alpha \max_{S : |S| \le k} \min_{p \in P_{\mathcal{H}}} f_p(S)$$

Throughout the paper we prove theorems for general feature spaces and hyperparametric models. When the hyperparametric model $\mathcal{H}$ and the set of features are clear from context, it will be convenient to write $f_\theta$ instead of $f_p$, where $p$ is the probability vector generated by the hyperparametric model. In this abbreviated notation, the guarantee above becomes:

$$\min_{\theta \in \Theta} f_\theta(\hat{S}) \ge \alpha \max_{S : |S| \le k} \min_{\theta \in \Theta} f_\theta(S)$$

## 3. Robust Optimization

In this section we describe the main result of the paper. We show that for an extremely broad class of hyperparametric models, robust influence maximization is computationally tractable. More specifically, we show that for generalized linear hyperparametric models that are 1-Lipschitz, for any $\epsilon > 0$, a natural sampling procedure from the hyperparameter space $\Theta = [-B, B]^d$ generates $\ell \in O\big(d \left(\frac{nmBd}{\epsilon}\right)^d \log \frac{nmBd}{\epsilon\delta}\big)$ influence functions $f_1, \ldots, f_\ell$ such that with probability $1 - \delta$, robust continuous optimization over $\Theta$ reduces to robust discrete optimization over the functions $\{f_i\}_{i=1}^{\ell}$. That is, an algorithm that returns $\hat{S} \subseteq V$ with $|\hat{S}| \le k$ satisfying:

$$\min_{i \in [\ell]} f_i(\hat{S}) \ge \alpha \max_{S : |S| \le k} \min_{i \in [\ell]} f_i(S)$$

implies the existence of an algorithm for which:

$$\min_{\theta \in \Theta} f_\theta(\hat{S}) \ge \alpha \max_{S : |S| \le k} \min_{\theta \in \Theta} f_\theta(S) - \epsilon.$$

This reduction from the continuous space of infinitely many functions $\{f_\theta\}_{\theta \in \Theta}$ to polynomially many functions $\{f_i\}_{i \in [\ell]}$ is handled in two steps. We first prove that influence functions generated by a class of hyperparametric models that we call *stable* have bounded Lipschitzness (Section 3.1). Using this property, we then prove that sampling polynomially many functions from the hyperparametric model suffices to obtain an approximation to the robust objective (Section 3.2). Finally, we show how to produce near-optimal solutions to the robust optimization problem defined on $\{f_\theta\}_{\theta \in \Theta}$ by implementing a best-response oracle on a set of sampled functions $\{f_i\}_{i \in [\ell]}$ using a Multiplicative Weight Updates (MWU) procedure (Section 4).
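To make the sampling step concrete before the analysis, here is a minimal sketch of how a sampled hyperparameter $\theta$ pins down a full probability vector under a stable GLM (the sigmoid choice of $h$), and how drawing several values of $\theta$ yields the proxy family $f_1, \ldots, f_\ell$. The feature handling and the function names are illustrative assumptions, not the authors' implementation.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def edge_probabilities(theta, features):
    """Instantiate p_e = H(theta, x_e) = h(theta . x_e) for every edge.
    features: dict edge -> d-dimensional feature vector in [-1, 1]^d."""
    return {e: sigmoid(sum(t * x for t, x in zip(theta, xe)))
            for e, xe in features.items()}

def sample_models(features, d, B, num_samples):
    """Draw theta u.a.r. from [-B, B]^d; each sample yields one
    probability vector, i.e. one member f_i of the finite proxy family."""
    models = []
    for _ in range(num_samples):
        theta = [random.uniform(-B, B) for _ in range(d)]
        models.append(edge_probabilities(theta, features))
    return models
```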
### 3.1. Stability implies Lipschitzness

We now prove that $f_\theta$ is $L$-Lipschitz for $L \in \mathrm{poly}(n)$ if the hyperparametric model that generates it is stable.

**Definition 1.** *A hyperparametric model $\mathcal{H} : \Theta \times \mathcal{X} \to [0, 1]$ is stable if it is a generalized linear model that is 1-Lipschitz with respect to the $\ell_1$ norm, i.e. for $\theta \in \Theta$, $x_e \in \mathcal{X}$ we have $\mathcal{H}(\theta, x_e) = h(\theta^\top x_e)$ and for every $\theta, \theta' \in \Theta$:*

$$|h(\theta^\top x_e) - h(\theta'^\top x_e)| \le \|\theta - \theta'\|_1$$

It is easy to verify that the hyperparametric models used in the influence maximization literature are stable: the linear $\mathcal{H}(\theta, x_e) = \theta^\top x_e$, logistic $\mathcal{H}(\theta, x_e) = \frac{1}{1 + \exp(-\theta^\top x_e)}$, and probit $\mathcal{H}(\theta, x_e) = \Phi(\theta^\top x_e)$ models, where $\Phi$ is the CDF of the standard Gaussian, appear in (Wen et al., 2015; Vaswani et al., 2017; Kalimeris et al., 2018) and are all stable. Intuitively, stable hyperparametric models are not sensitive to small changes of the hyperparameter. Hence, despite the fact that a modification of $\theta$ affects the probabilities on all edges, the difference is not very large and we are able to bound the absolute change in the influence function. The following two lemmas are inspired by (Chen et al., 2016a) and (Chen et al., 2016b), who proved similar results for non-hyperparametric models.

**Lemma 2.** *Assume that the hyperparametric model is stable. Then the influence function $f_\theta$ is Lipschitz with respect to the $\ell_1$ norm with Lipschitz constant $L = nm$.*

*Proof.* Since $\mathcal{H}$ is stable, for every $e \in E$ and every two edge probabilities $p_e, p'_e$ produced by the hyperparametric model with parameters $\theta, \theta' \in \Theta$:

$$|p_e - p'_e| = |\mathcal{H}(\theta, x_e) - \mathcal{H}(\theta', x_e)| = |h(\theta^\top x_e) - h(\theta'^\top x_e)| \le \|\theta - \theta'\|_1.$$

Hence, $\|\theta - \theta'\|_1 \le \epsilon$ implies $|p_e - p'_e| \le \epsilon$. Now notice that the influence function is monotone with respect to the diffusion probabilities. As a result, the maximum change that can occur under the constraint $|p_e - p'_e| \le \epsilon$ is when $p'_e = \min\{p_e + \epsilon, 1\}$ (or $p'_e = \max\{0, p_e - \epsilon\}$) for all $e \in E$. We focus on the first case; the second is identical. Fix a seed set $S$. When the probability of an edge $e = (u, v)$ is increased from $p_e$ to $p_e + \epsilon$, the activation probability of every node reachable from $S$ through $u$ increases by at most $\epsilon$. There are at most $n$ nodes reachable through $u$, so the total change in the influence of the set $S$ in that case is at most $n\epsilon$. Applying the same argument for each edge in the network gives the desired bound of $nm\epsilon$. ∎

In our method, the Lipschitz parameter is polynomially related to the complexity of implementing a best-response oracle in the MWU procedure. The fact that the Lipschitz constant $L$ is polynomial in the number of nodes $n$ is, therefore, a crucial property, as it makes the robust optimization problem computationally tractable.

**Tightness of the Lipschitz constant.** The Lipschitz constant of the influence function polynomially determines the computational complexity of our method. As we show, the bound $L = nm$ is tight in the worst case over all possible graphs (proof deferred to Appendix A).

**Lemma 3.** *There is a graph for which any influence function generated by a stable non-trivial¹ generalized linear model that is continuous has Lipschitz constant $L = nm$.*

¹ $h : \mathbb{R} \to [\lambda, 1 - \lambda]$ is non-trivial if it is continuous and surjective for some $x_e \in \mathcal{X}$.
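For intuition, one can empirically compare the observed change in influence against the $nm\|\theta - \theta'\|_1$ bound of Lemma 2. The sketch below does so by Monte Carlo, reusing `simulate_ic` and `edge_probabilities` from the earlier sketches; it is an illustrative check, not part of the paper's method.

```python
def lipschitz_gap(adj, features, theta, theta_prime, seed_set, n, m):
    """Return (observed gap, Lemma 2 bound). The observed quantity
    |f_theta(S) - f_theta'(S)| is itself a Monte Carlo estimate, so the
    comparison is approximate; the bound is n * m * ||theta - theta'||_1."""
    f1 = simulate_ic(adj, edge_probabilities(theta, features), seed_set)
    f2 = simulate_ic(adj, edge_probabilities(theta_prime, features), seed_set)
    l1 = sum(abs(a - b) for a, b in zip(theta, theta_prime))
    return abs(f1 - f2), n * m * l1
```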
### 3.2. Covering Θ via sampling

The Lipschitz property from Lemma 2 is useful because it bounds the change in the estimate of the influence function $f_\theta$ when we approximate it with another function $f_{\theta'}$ for $\theta, \theta'$ that are $\epsilon$-close, i.e. $\|\theta - \theta'\|_1 \le \epsilon$. Thus, if we can construct a set of representative parameters $\Theta_\epsilon$ such that for every $\theta \in \Theta$ there is some $\theta' \in \Theta_\epsilon$ that is $\epsilon$-close with respect to the $\ell_1$ norm, we can reduce the robust optimization problem over the infinite hyperparameter space to the finite case. We now show that this can be done using a reasonable number of samples from $\Theta$.

**Lemma 4.** *Let $\Theta = [-B, B]^d$ and $\epsilon, \delta > 0$. If we sample a set $\Theta_\epsilon$ of size $s \in O\big(d \left(\frac{Bd}{\epsilon}\right)^d \log \frac{Bd}{\epsilon\delta}\big)$ uniformly at random from $\Theta$, then with probability at least $1 - \delta$, for any $\theta \in \Theta$ there exists $\theta' \in \Theta_\epsilon$ such that $\|\theta - \theta'\|_1 \le \epsilon$.*

*Proof.* It suffices to partition the hypercube $[-B, B]^d$ into $\ell_1$ balls of radius $\epsilon$ and bound the number of points we need to sample in order to have at least one point in each $\ell_1$ ball. Since every point in $\Theta$ is covered by a ball and every two points in the same ball are close, a sample containing a point in every ball is a representative set of parameters that are $\epsilon$-close w.r.t. the $\ell_1$ distance to any parameter in $\Theta$. It is well known that $r = \left(\frac{2Bd}{\epsilon}\right)^d$ balls of $\ell_1$ radius $\epsilon$ suffice to cover the $d$-dimensional hypercube $[-B, B]^d$. Let $b_1, \ldots, b_r$ denote the covering balls and $\theta_1, \ldots, \theta_s$ be points drawn uniformly at random from $\Theta = [-B, B]^d$. Then:

$$\Pr[\exists b_j \text{ such that } \forall i,\ \theta_i \notin b_j] \le \sum_{j} \Pr[\forall i,\ \theta_i \notin b_j] \le r \cdot \Pr[\forall i,\ \theta_i \notin b_1]$$

Using $s \in O\big(d \left(\frac{Bd}{\epsilon}\right)^d \log \frac{Bd}{\epsilon\delta}\big)$ samples we get:

$$\Pr[\exists b_j \text{ such that } \forall i,\ \theta_i \notin b_j] \le \delta$$

Thus, since $\Theta \subseteq \cup_j b_j$ and the radius of the balls is $\epsilon$, for every $\theta \in \Theta$ there exists a $\theta' \in \{\theta_i\}_{i \in [s]}$ that is $\epsilon$-close w.r.t. the $\ell_1$ norm, with probability at least $1 - \delta$. ∎

Furthermore, we can extend the notion of covering a convex space $\Theta$ to the coverage of a family of functions.

**Definition 2.** *Let $\mathcal{F} = \{f_\theta : 2^V \to \mathbb{R} \mid \theta \in \Theta\}$ be a family of influence functions and $\mathcal{F}_\epsilon \subseteq \mathcal{F}$ such that $|\mathcal{F}_\epsilon| < \infty$. We say that $\mathcal{F}_\epsilon$ $\epsilon$-covers $\mathcal{F}$ if for any $f \in \mathcal{F}$ there exists an $f_\epsilon \in \mathcal{F}_\epsilon$ such that for any $S \subseteq V$: $|f(S) - f_\epsilon(S)| \le \epsilon$.*

The following corollary is obtained by fusing Lemmas 2 and 4 with Definition 2.

**Corollary 1.** *Let $\mathcal{F} = \{f_\theta : 2^V \to \mathbb{R} \mid \theta \in \Theta\}$ be the family of influence functions and let $\mathcal{F}_\epsilon$ be sampled uniformly at random from $\mathcal{F}$, such that $|\mathcal{F}_\epsilon| \in O\big(d \left(\frac{LBd}{\epsilon}\right)^d \log \frac{LBd}{\epsilon\delta}\big)$. Then $\mathcal{F}_\epsilon$ $\epsilon$-covers $\mathcal{F}$ with probability at least $1 - \delta$.*
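A minimal sketch of Lemma 4's sampling step is below. The constant hidden by the $O(\cdot)$ is not specified in the statement, so the sketch uses a coupon-collector style count of $r \log(r/\delta)$ samples over the $r$ covering balls, which matches the lemma's bound up to constants; treat the exact count as an assumption.

```python
import math
import random

def cover_sample_size(B, d, eps, delta):
    """Samples sufficient (up to constants) to hit every l1-ball of an
    eps-cover of [-B, B]^d with probability at least 1 - delta."""
    r = (2.0 * B * d / eps) ** d          # number of covering l1-balls
    return math.ceil(r * math.log(r / delta))

def sample_theta_cover(B, d, eps, delta):
    """Draw Theta_eps u.a.r. from [-B, B]^d as in Lemma 4."""
    s = cover_sample_size(B, d, eps, delta)
    return [[random.uniform(-B, B) for _ in range(d)] for _ in range(s)]
```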
## 4. Reducing Continuous to Discrete RO

In the previous section we proved that since the influence function is Lipschitz, the infinite family of functions $\mathcal{F} = \{f_\theta \mid \theta \in \Theta\}$ can be well-approximated by the finite (and, crucially, polynomially-sized) family $\mathcal{F}_\epsilon = \{f_{\theta_\epsilon} \mid \theta_\epsilon \in \Theta_\epsilon\}$, where $\Theta_\epsilon$ is obtained by sampling u.a.r. from $\Theta$. Hence, the task of robust influence maximization in the hyperparametric model intuitively reduces to finding a procedure that performs robust optimization on a finite set of functions. The following lemma formalizes this intuition by stating that, under mild conditions, $\alpha$-approximate continuous robust optimization reduces to $\alpha$-approximate discrete robust optimization. The proof is deferred to Appendix A.

**Lemma 5.** *Let $\mathcal{F} = \{f_\theta : 2^V \to \mathbb{R} \mid \theta \in \Theta \subseteq [-B, B]^d\}$ be a family of influence functions. Consider a family $\mathcal{F}_\epsilon \subseteq \mathcal{F}$ s.t. $\mathcal{F}_\epsilon$ $\epsilon$-covers $\mathcal{F}$. Then $\alpha$-approximate robust optimization on $\mathcal{F}$ reduces to $\alpha$-approximate robust optimization on $\mathcal{F}_\epsilon$. That is, an algorithm that returns $\hat{S} \subseteq V$ such that:*

$$\min_{f_\theta \in \mathcal{F}_\epsilon} f_\theta(\hat{S}) \ge \alpha \max_{S \subseteq V} \min_{f_\theta \in \mathcal{F}_\epsilon} f_\theta(S)$$

*implies an algorithm that for any $\epsilon > 0$ returns $\hat{S} \subseteq V$ s.t.:*

$$\min_{f_\theta \in \mathcal{F}} f_\theta(\hat{S}) \ge \alpha \max_{S \subseteq V} \min_{f_\theta \in \mathcal{F}} f_\theta(S) - \epsilon.$$

**The HIRO algorithm.** Given the reduction from the continuous hyperparametric problem to the discrete one, we can now describe the Hyperparametric Influence Robust Optimizer (HIRO), Algorithm 1, which gives an optimal bi-criteria approximation for robust influence maximization in the hyperparametric setting.

**Algorithm 1** HIRO: Hyperparametric Influence Robust Optimizer
- **Input:** $G = (V, E)$, $\{x_e\}_{e \in E}$, $\mathcal{H} : \Theta \times \mathcal{X} \to [0, 1]$, $\epsilon$, $\delta$
- $T \leftarrow O\big(\frac{d(\log n + \log\log(1/\delta))}{\epsilon^2}\big)$, $\eta \leftarrow \sqrt{\log(\ell)/(2T)}$
- $f_1, \ldots, f_\ell \leftarrow \mathrm{SAMPLE}(G, \mathcal{H}, \Theta, \{x_e\}_{e \in E})$
- $w_1[1], \ldots, w_1[\ell] \leftarrow 1/\ell, \ldots, 1/\ell$
- **for** each time step $t \in [T]$ **do**
  - $S_t \leftarrow \mathrm{GREEDY}\big(\sum_{i=1}^{\ell} w_t[i] f_i\big)$
  - **for** each $i \in [\ell]$ **do** update $w_{t+1}[i]$ multiplicatively with learning rate $\eta$, placing higher weight on functions with low value $f_i(S_t)$, and renormalize **end for**
- **end for**
- **Output:** select $S$ u.a.r. from $\{S_1, S_2, \ldots, S_T\}$

HIRO takes as input a graph $G$, the edge features $\{x_e\}_{e \in E}$, the hyperparametric model $(\mathcal{H}, \Theta)$ that dictates the edge probabilities, and an error parameter $\epsilon$ that controls the quality of the returned solution. The output is a set of $k$ nodes. It starts by sampling $\ell$ points from $\Theta$, thus constructing $\ell$ different influence functions that serve as a proxy for the continuous problem. It assigns uniform weights to these functions and runs MWU for a number of steps that depends on $\epsilon$, where in each step higher emphasis is placed on the functions with poor historical performance. In every iteration it optimizes a convex combination of the functions, which is possible since all $f_i$ are monotone submodular and hence amenable to optimization with the GREEDY algorithm. It keeps each outcome as a candidate solution. In the end, one of the candidate solutions is returned u.a.r. The intuition is that since each solution is good for some iteration of the algorithm (i.e. for some specific weighting of the $f_i$), in expectation the returned solution is good for all the functions that performed poorly in some iteration of the MWU, and hence robust.

**Theorem 1.** *HIRO with error parameter $\epsilon$ runs in time $\mathrm{poly}(n, 1/\epsilon, \log(1/\delta))$ and returns the uniform distribution $U$ over solutions $\{S_1, \ldots, S_T\}$, where $T \in O\big(\frac{d(\log n + \log\log(1/\delta))}{\epsilon^2}\big)$, such that with probability at least $1 - \delta$:*

$$\min_{\theta \in \Theta} \mathbb{E}_{S \sim U}[f_\theta(S)] \ge \left(1 - \frac{1}{e}\right) \max_{S : |S| \le k} \min_{\theta \in \Theta} f_\theta(S) - 2\epsilon.$$

*Proof.* From Corollary 1 we know that constructing $\ell$ different influence functions $f_1, \ldots, f_\ell$ by sampling $\ell \in O\big(d \left(\frac{LBd}{\epsilon}\right)^d \log \frac{LBd}{\epsilon\delta}\big)$ points u.a.r. from $\Theta$ yields an $\epsilon$-cover of $\Theta$ with probability at least $1 - \delta$. Since $L = nm$ (Lemma 2) and $d$ is constant according to the main assumption of the hyperparametric model, constructing the cover takes polynomial time. (Chen et al., 2017) proved that for influence functions $f_1, \ldots, f_\ell$, the MWU procedure, run for $T$ iterations with learning rate $\eta = \sqrt{\log(\ell)/(2T)}$ and using GREEDY as an approximate best-response oracle, returns the uniform distribution over $T$ solutions s.t.:

$$\min_{i \in [\ell]} \mathbb{E}_{\hat{S}}[f_i(\hat{S})] \ge \left(1 - \frac{1}{e}\right) \max_{S} \min_{i \in [\ell]} f_i(S) - O\Big(\sqrt{\tfrac{\log \ell}{T}}\Big)$$

By applying Lemma 5 we get:

$$\min_{\theta \in \Theta} \mathbb{E}_{\hat{S} \sim U}[f_\theta(\hat{S})] \ge \left(1 - \frac{1}{e}\right) \max_{S} \min_{\theta \in \Theta} f_\theta(S) - \epsilon - O\Big(\sqrt{\tfrac{\log \ell}{T}}\Big)$$

For $\ell \in O\big(d \left(\frac{LBd}{\epsilon}\right)^d \log \frac{LBd}{\epsilon\delta}\big)$ we get the desired bound by setting $T \in O\big(\frac{d(\log n + \log\log(1/\delta))}{\epsilon^2}\big)$ as required, since $L \cdot B$ is polynomial in $n$. ∎

The result of the previous theorem implies the following bi-criteria approximation guarantee, obtained by returning $\hat{S} = \cup_{t=1}^{T} S_t$ instead of the uniform distribution over $\{S_1, \ldots, S_T\}$.

**Corollary 2.** *HIRO with error parameter $\epsilon$ returns a set $\hat{S}$ of size at most $O\big(\frac{d(\log n + \log\log(1/\delta))}{\epsilon^2}\big) \cdot k$ which, with probability at least $1 - \delta$, satisfies:*

$$\min_{\theta \in \Theta} f_\theta(\hat{S}) \ge \left(1 - \frac{1}{e}\right) \max_{S : |S| \le k} \min_{\theta \in \Theta} f_\theta(S) - 2\epsilon.$$
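A minimal sketch of HIRO's main loop is below, reusing `sample_models` and `simulate_ic` from the earlier sketches. The printed Algorithm 1 elides the exact weight update, so the sketch uses the standard MWU rule with payoffs normalized by $n$, which emphasizes functions on which the current solution performs poorly; that rule, like the helper names, is an assumption rather than the authors' exact implementation.

```python
import math
import random

def weighted_greedy(adj, models, w, k):
    """Greedy on F(S) = sum_i w[i] * f_i(S); F is monotone submodular,
    so this is a (1 - 1/e)-approximate best response (up to MC error)."""
    def F(S):
        return sum(wi * simulate_ic(adj, probs, S)
                   for wi, probs in zip(w, models))
    S = set()
    for _ in range(k):
        S.add(max((v for v in adj if v not in S), key=lambda v: F(S | {v})))
    return S

def hiro(adj, features, d, B, n, k, num_funcs, T):
    """Sketch of Algorithm 1: sample the proxy functions, run MWU with
    greedy best responses, return one candidate solution u.a.r."""
    models = sample_models(features, d, B, num_funcs)   # f_1, ..., f_l
    l = len(models)
    w = [1.0 / l] * l
    eta = math.sqrt(math.log(l) / (2 * T)) if l > 1 else 0.0
    solutions = []
    for _ in range(T):
        S_t = weighted_greedy(adj, models, w, k)
        solutions.append(S_t)
        # down-weight functions where S_t already does well, so that
        # poorly served functions dominate the next iteration
        scores = [simulate_ic(adj, probs, S_t) / n for probs in models]
        w = [wi * math.exp(-eta * s) for wi, s in zip(w, scores)]
        total = sum(w)
        w = [wi / total for wi in w]
    return random.choice(solutions)   # the improper (randomized) solution
```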
## 5. Lower Bound

In this section we prove that a structural assumption, such as the hyperparametric restriction of the IC model, is vital for robust influence maximization, since otherwise we might need to sample a number of functions that is exponential in $n$. We do so by providing a strong hardness result: define the family of functions $\mathcal{F}_P = \{f_p \mid p \in P\}$ for some finite $P$. We prove that the problem $\max_{S : |S| \le k} \min_{p \in P} f_p(S)$ is NP-hard to approximate within any constant factor, even with a reasonable bi-criteria approximation. This is in sharp contrast with the main result about hyperparametric robust influence maximization that we proved in Theorem 1 and Corollary 2, since under the hyperparametric assumption $P$ is always of polynomial size and hence we only need to increase our budget by a factor of $O(\log n)$. In Theorem 2 we formalize the impossibility result. We provide a proof sketch; the full proof is in Appendix B.

**Theorem 2.** *Let $G$ be a graph with $n$ nodes and $m$ edges, and $\delta, \epsilon > 0$. There is no algorithm that finds a set $\hat{S}$ of size $|\hat{S}| \le (1 - \delta) \ln|P| \cdot k$ and achieves an approximation factor better than $O\big(\frac{1}{n^{1-\epsilon}}\big)$ for the problem $\max_{S : |S| \le k} \min_{p \in P} f_p(S)$, where $P \subseteq \{\lambda, 1 - \lambda\}^m$, $\lambda = o(1)$, and $|P| \in \Omega(\mathrm{poly}(n))$, unless P = NP.*

This essentially means that if our problem requires a cover containing more than a polynomial number of functions, then the best bi-criteria approximation we can hope for must contain significantly more than $k$ nodes.

*Proof sketch.* Our reduction is based on (He & Kempe, 2016). The authors there prove the hardness of a different version of robust influence maximization, where the objective is to approximate well the individual optima of a set of different influence functions instead of influencing as many nodes as possible in the worst case. We reduce from GAP SET COVER. The use of the gap version of the problem is to show that the problem remains NP-hard even if we augment the budget of nodes by a factor of roughly $\log|P|$. Given an instance of gap set cover, we construct a bipartite graph on $n$ nodes, $m$ edges, and $\mathrm{poly}(n)$ different probability sets corresponding to the different influence functions, such that when there is a set cover of size at most $k$, there exists a seed set $S$ for which all the influence functions have high value, while when there is no set cover of size $\log n \cdot k$, there is at least one influence function that has low value for any seed set, even seed sets of size $\log n \cdot k$. The asymptotic difference between the values of the objective functions, with and without the set cover, enables deciding the gap set cover instance. ∎

## 6. Experiments

To measure the empirical performance of the robust hyperparametric approach we conducted four sets of experiments: (1) First, we examine how many functions are empirically required to sufficiently cover $\mathcal{F} = \{f_\theta \mid \theta \in \Theta\}$ and observe that in practice far fewer are needed than the theoretical worst-case upper bound of Corollary 1. (2) We examine the rate of convergence of HIRO to the robust solution and show substantially faster convergence than Theorem 1 suggests. (3) We benchmark the performance of HIRO against other methods for robust influence maximization and observe that it consistently outperforms all previous methods. (4) We measure how well a seed set of size $k$ performs compared to a seed set with a slightly augmented budget, per the bi-criteria approximation guarantee (Corollary 2).
**Graphs.** We generated four different synthetic networks using standard random graph models to analyze the impact of topological variations among different social networks. All networks were generated with $n = 500$ vertices. We used the Barabási-Albert preferential attachment model, where each new node is connected to 4 (preferentially high-degree) existing ones; the Watts-Strogatz small-world model, where each node is connected to 5 nodes in the ring topology and the probability of rewiring an edge is $3/n$; the Erdős-Rényi random graph model with edge-construction probability $p = 3/n$; and the configuration model with a power-law degree distribution with $\alpha = 2$. For a more detailed description of these models please refer to Appendix C.

**Hyperparametric model.** We used the sigmoid function as the hyperparametric model to determine the diffusion probabilities, i.e. $h(\theta^\top x_e) = \frac{1}{1 + \exp(-\theta^\top x_e)}$, as in (Kalimeris et al., 2018). We generated $d$ random features in $[-1, 1]$ for every edge. We used $d = 5$; however, our results are consistent across a large range of dimensions $d$ and feature-generating techniques, such as normal or uniform distributions over the unit hypercube $[-1, 1]^d$ and its discrete analog $\{-1, 1\}^d$. We sampled $\Theta_\epsilon = \{\theta_1, \ldots, \theta_\ell\}$ from $\Theta = [-1, 1]^d$ and generated the family of influence functions $\mathcal{F}_\epsilon = \{f_i \mid \theta_i \in \Theta_\epsilon\}$ for $\ell = 20$. In addition, we set $T = 10$ HIRO iterations, with the exception of Experiments 1 and 2, where $\ell$ and $T$ respectively are the free variable.

**Benchmarks.** We benchmark the performance of HIRO against the following algorithms:

- **Random Seed:** We select $k$ nodes u.a.r.; to account for variance, we average over the solutions from 100 trials.
- **Top-$k$ Degree nodes:** In this benchmark we choose the $k$ highest-degree nodes in the graph.
- **Random Greedy:** In robust optimization problems, a typical approach is to randomize over the best strategies against every possible influence function. Consequently, this method runs the GREEDY algorithm on every function $f_i$ for $\theta_i \in \Theta_\epsilon$ and chooses one of the outputs uniformly at random.
- **LUGREEDY:** We compare against Algorithm 2 in (Chen et al., 2016a) and refer to it as LUGREEDY. LUGREEDY is an algorithm developed specifically for robust influence maximization. It is oblivious to the structure of the model and only accounts for the confidence interval of each edge. LUGREEDY produces two solutions by running the GREEDY algorithm twice, over the lower and the upper boundaries of the confidence intervals. Then the algorithm chooses the better of the two solutions, assuming the lower boundaries are the true probabilities. LUGREEDY achieves the best approximation for the robust ratio objective mentioned in the introduction (a minimal sketch follows below).
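For reference, here is a minimal sketch of the LUGREEDY benchmark as described above (not the authors' code), reusing `greedy_im` and `simulate_ic` from the earlier sketches; `lower` and `upper` map each edge to the endpoints of its confidence interval.

```python
def lugreedy(adj, lower, upper, k):
    """Run greedy on the lower- and upper-bound probabilities, then keep
    whichever seed set scores higher when the lower bounds are treated
    as the true probabilities."""
    s_low = greedy_im(adj, lower, k)
    s_high = greedy_im(adj, upper, k)
    if simulate_ic(adj, lower, s_low) >= simulate_ic(adj, lower, s_high):
        return s_low
    return s_high
```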
*Figure 4. Comparison to benchmark solutions of the robust problem.*

*Figure 5. Illustration of the gap between a regular seed set size and the bi-criteria augmented budget.*

### 6.1. Experimental results

We perform 50 trials for each experiment and plot the mean and standard deviation in the figures.

**Experiment 1.** (Figure 6, Appendix C) The goal of this experiment is to understand the sample complexity of the hyperparameter, i.e. how many functions we need to sample to approximate the robust solution accurately. We sample $\ell = 50$ values from $\Theta$ as a benchmark (this can be seen as a test/validation set). In that way we create a cover $\mathcal{F}_\epsilon$ that serves as a proxy for $\mathcal{F} = \{f_\theta \mid \theta \in \Theta\}$. We plot $\min_{1 \le i \le \ell} f_i(S_r)$, where $S_r$ is the result of running HIRO over $r \in \{1, 10, 20, 30, 40, 50\}$ functions. We plot three such trials with different seed sizes $k \in \{10, 25, 50\}$. Figure 6, deferred to the appendix, demonstrates that in practice the sample complexity plateaus, even though in theory increasing the number of functions should bring about an increase in the value of the robust solution.

**Experiment 2.** (Figure 7, Appendix C) We show that HIRO converges rapidly in practice; that is, even after a constant number of iterations, HIRO achieves strong results. We run the experiment three times, for different values of $k \in \{10, 25, 50\}$. For each trial, we run HIRO for $T \in \{1, 5, 10, 15\}$ iterations. Figure 7, found in the appendix, illustrates this inquiry. With the exception of the small-world model, which exhibits slow growth, the plots indeed seem to converge quickly for all values of $k$.

**Experiment 3.** (Figure 4) This experiment compares HIRO with the benchmarks and illustrates the consistency of our algorithm. We plot the value of each benchmark as a function of the size of the seed set. We can see that across all generative models and proposed benchmarks, even after few iterations, our algorithm performs at the very top, with the lowest variance. Other benchmarks are competitive when the seed set is small, but as the seed set grows so does the performance gap over the best heuristic.

**Experiment 4.** (Figure 5) We evaluate the gap in the value of the robust solution between a seed set of size $k$ and a seed set of size $\beta \cdot k \log n$. Recall that the HIRO algorithm chooses at random among $T$ solutions, each of size $k$. Instead, we take the union of these solutions and return a subset of the union of size $\beta \cdot k \log n$, where $\beta \in (0, 1]$. Here we choose $k = 10$, and we report the results in Figure 5.

## 7. Conclusion

In this paper we proposed a new formulation of robust influence maximization by utilizing a very broad class of hyperparametric models. We provided an efficient reduction from continuous to discrete robust influence maximization and an optimal and computationally tractable algorithm for the problem in terms of bi-criteria approximation. We empirically assessed its performance and found that it consistently surpasses state-of-the-art methods.

## References

Adiga, A., Kuhlman, C., Mortveit, H., and Kumar, A. Sensitivity of diffusion dynamics to network uncertainty. In *AAAI Conference on Artificial Intelligence*, 2013.

Anari, N., Haghtalab, N., Naor, J. S., Pokutta, S., Singh, M., and Torrico, A. Structured robust submodular maximization: offline and online algorithms. In *The 22nd International Conference on Artificial Intelligence and Statistics (AISTATS)*, 2019.

Aral, S. and Dhillon, P. Social influence maximization under empirical influence models. *Nature Human Behaviour*, pp. 375–382, 2018.

Chen, R., Lucier, B., Singer, Y., and Syrgkanis, V. Robust optimization for non-convex objectives. In *Annual Conference on Neural Information Processing Systems (NIPS)*, 2017.

Chen, W., Lin, T., Tan, Z., Zhao, M., and Zhou, X. Robust influence maximization. In *Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 2016a.

Chen, W., Wang, Y., and Yuan, Y. Combinatorial multi-armed bandit and its extension to probabilistically triggered arms. *Journal of Machine Learning Research*, 17:1–33, 2016b.

Dinur, I. and Steurer, D. Analytical approach to parallel repetition. In *Proceedings of the 45th ACM Symposium on Theory of Computing (STOC)*, pp. 624–633, 2014.

Domingos, P. and Richardson, M. Mining the network value of customers. In *Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, pp. 57–66. ACM, 2001.
Feige, U. A threshold of ln n for approximating set cover. *Journal of the ACM*, 45(4):634–652, 1998.

Goyal, A., Bonchi, F., and Lakshmanan, L. V. S. A data-based approach to social influence maximization. *Proceedings of the VLDB Endowment*, 5(1):73–84, 2011.

He, X. and Kempe, D. Stability of influence maximization. *CoRR*, abs/1501.04579, 2015. URL http://arxiv.org/abs/1501.04579.

He, X. and Kempe, D. Robust influence maximization. In *Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, 2016.

Kalimeris, D., Singer, Y., Subbian, K., and Weinsberg, U. Learning diffusion using hyperparameters. In *International Conference on Machine Learning (ICML)*, 2018.

Karp, R. M. Reducibility among combinatorial problems. In R. E. Miller and J. W. Thatcher (eds.), *Complexity of Computer Computations*, pp. 85–103. New York: Plenum, 1972.

Kempe, D., Kleinberg, J., and Tardos, E. Maximizing the spread of influence through a social network. In *Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, pp. 137–146. ACM, 2003.

Krause, A., McMahan, H. B., Guestrin, C., and Gupta, A. Robust submodular observation selection. *Journal of Machine Learning Research*, 9:2761–2801, 2008.

Nemhauser, G. L., Wolsey, L. A., and Fisher, M. L. An analysis of approximations for maximizing submodular set functions. *Mathematical Programming*, 14(1):265–294, 1978.

Ohsaka, N. and Yoshida, Y. Portfolio optimization for influence spread. In *Proceedings of the 26th International Conference on World Wide Web (WWW 2017)*, Perth, Australia, April 3–7, 2017, pp. 977–985, 2017. doi: 10.1145/3038912.3052628. URL https://doi.org/10.1145/3038912.3052628.

Vaswani, S., Kveton, B., Wen, Z., Ghavamzadeh, M., Lakshmanan, L. V. S., and Schmidt, M. Model-independent online learning for influence maximization. In *International Conference on Machine Learning (ICML)*, 2017.

Wen, Z., Kveton, B., and Ashkan, A. Efficient learning in large-scale combinatorial semi-bandits. In *International Conference on Machine Learning (ICML)*, pp. 1113–1122, 2015.