# Robust Budget Allocation via Continuous Submodular Functions

Matthew Staib¹, Stefanie Jegelka¹

¹Massachusetts Institute of Technology. Correspondence to: Matthew Staib, Stefanie Jegelka.

*Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017. Copyright 2017 by the author(s).*

**Abstract.** The optimal allocation of resources for maximizing influence, spread of information or coverage, has gained attention in the past years, in particular in machine learning and data mining. But in applications, the parameters of the problem are rarely known exactly, and using wrong parameters can lead to undesirable outcomes. We hence revisit a continuous version of the Budget Allocation or Bipartite Influence Maximization problem introduced by Alon et al. (2012) from a robust optimization perspective, where an adversary may choose the least favorable parameters within a confidence set. The resulting problem is a nonconvex-concave saddle point problem (or game). We show that this nonconvex problem can be solved exactly by leveraging connections to continuous submodular functions, and by solving a constrained submodular minimization problem. Although constrained submodular minimization is hard in general, here, we establish conditions under which such a problem can be solved to arbitrary precision.

## 1. Introduction

The optimal allocation of resources for maximizing influence, spread of information or coverage, has gained attention in the past few years, in particular in machine learning and data mining (Domingos & Richardson, 2001; Kempe et al., 2003; Chen et al., 2009; Gomez Rodriguez & Schölkopf, 2012; Borgs et al., 2014). In the Budget Allocation Problem, one is given a bipartite influence graph between channels $S$ and people $T$, and the task is to assign a budget $y(s)$ to each channel $s \in S$ with the goal of maximizing the expected number of influenced people $I(y)$. Each edge $(s, t) \in E$ between channel $s$ and person $t$ is weighted with a probability $p_{st}$ that, e.g., an advertisement on radio station $s$ will influence person $t$ to buy some product. The budget $y(s)$ controls how many independent attempts are made via the channel $s$ to influence the people in $T$. The probability that a customer $t$ is influenced when the advertising budget is $y$ is

$$I_t(y) = 1 - \prod_{s\,:\,(s,t)\in E} \left[1 - p_{st}\right]^{y(s)}, \tag{1}$$

and hence the expected number of influenced people is $I(y) = \sum_{t\in T} I_t(y)$. We write $I(y; p) = I(y)$ to make the dependence on the probabilities $p_{st}$ explicit. The total budget $y$ must remain within some feasible set $\mathcal{Y}$, which may encode e.g. a total budget limit $\sum_{s\in S} y(s) \le C$. We allow the budgets $y$ to be continuous, as in (Bian et al., 2017).

Since its introduction by Alon et al. (2012), several works have extended the formulation of Budget Allocation and provided algorithms (Bian et al., 2017; Hatano et al., 2015; Maehara et al., 2015; Soma et al., 2014; Soma & Yoshida, 2015). Budget Allocation may also be viewed as influence maximization on a bipartite graph, where information spreads as in the Independent Cascade model. For integer $y$, Budget Allocation and Influence Maximization are NP-hard. Yet, constant-factor approximations are possible, and build on the fact that the influence function is submodular in the binary case, and DR-submodular in the integer case (Soma et al., 2014; Hatano et al., 2015). If $y$ is continuous, the problem is a concave maximization problem.
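To make the model concrete, the following minimal NumPy sketch evaluates the influence function $I(y; p)$ on a toy instance; the instance and names such as `expected_influence` are our own illustration, not code from the paper.

```python
import numpy as np

def expected_influence(y, p):
    """I(y; p) = sum_t (1 - prod_{s : (s,t) in E} (1 - p_st)^{y(s)}).

    y: (|S|,) array of nonnegative budgets per channel.
    p: (|S|, |T|) array of transmission probabilities, with p_st = 0
       wherever there is no edge (those factors then equal 1).
    """
    # (1 - p_st)^{y(s)}: probability that all y(s) attempts via s miss t
    fail = (1.0 - p) ** y[:, None]
    # person t is influenced unless every channel fails to reach them
    return float(np.sum(1.0 - np.prod(fail, axis=0)))

# Toy instance: 2 channels, 3 people.
p = np.array([[0.3, 0.0, 0.5],
              [0.2, 0.4, 0.1]])
y = np.array([1.0, 2.0])
print(expected_influence(y, p))  # expected number of influenced people
```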
The formulation of Budget Allocation assumes that the transmission probabilities are known exactly. But this is rarely true in practice. Typically, the probabilities $p_{st}$, and possibly the graph itself, must be inferred from observations (Gomez Rodriguez et al., 2010; Du et al., 2013; Narasimhan et al., 2015; Du et al., 2014; Netrapalli & Sanghavi, 2012). In Section 4 we will see that a misspecification or point estimate of the parameters $p_{st}$ can lead to much reduced outcomes. A more realistic assumption is to know confidence intervals for the $p_{st}$. Realizing this severe deficiency, recent work studied robust versions of Influence Maximization, where a budget $y$ must be chosen that maximizes the worst-case approximation ratio over a set of possible influence functions (He & Kempe, 2016; Chen et al., 2016; Lowalekar et al., 2016). The resulting optimization problem is hard but admits bicriteria approximations.

In this work, we revisit Budget Allocation under uncertainty from the perspective of robust optimization (Bertsimas et al., 2011; Ben-Tal et al., 2009). We maximize the worst-case influence (not the approximation ratio) for $p$ in a confidence set centered around the best guess (e.g., the posterior mean). This avoids pitfalls of the approximation ratio formulation (which can be misled to return poor worst-case budgets, as demonstrated in Appendix A), while also allowing us to formulate the problem as a max-min game:

$$\max_{y\in\mathcal{Y}}\ \min_{p\in\mathcal{P}}\ I(y; p), \tag{2}$$

where an adversary can arbitrarily manipulate $p$ within the confidence set $\mathcal{P}$. With $p$ fixed, $I(y; p)$ is concave in $y$. However, the influence function $I(y; p)$ is not convex, and not even quasiconvex, in the adversary's variables $p_{st}$. The new, key insight we exploit in this work is that $I(y; p)$ has the property of *continuous submodularity* in $p$ (in contrast to the previously exploited submodular maximization in $y$) and can hence be minimized by generalizing techniques from discrete submodular optimization (Bach, 2015). The techniques in (Bach, 2015), however, are restricted to box constraints, and do not directly apply to our confidence sets. In fact, general constrained submodular minimization is hard (Svitkina & Fleischer, 2011; Goel et al., 2009; Iwata & Nagano, 2009). We make the following contributions:

1. We present an algorithm with optimality bounds for Robust Budget Allocation in the nonconvex adversarial scenario (2).
2. We provide the first results for continuous submodular minimization with box constraints and one additional "nice" constraint, and conditions under which the algorithm is guaranteed to return a global optimum.

### 1.1. Background and Related Work

We begin with some background material and, along the way, discuss related work.

#### 1.1.1. Submodularity over the Integer Lattice and Continuous Domains

Submodularity is perhaps best known as a property of set functions. A function $F : 2^V \to \mathbb{R}$ defined on subsets $S \subseteq V$ of a ground set $V$ is submodular if for all sets $S, T \subseteq V$, it holds that $F(S) + F(T) \ge F(S \cap T) + F(S \cup T)$. A similar definition extends to functions defined over a distributive lattice $\mathcal{L}$, e.g. the integer lattice. Such a function $f$ is submodular if for all $x, y \in \mathcal{L}$, it holds that

$$f(x) + f(y) \ge f(x \vee y) + f(x \wedge y). \tag{3}$$

For the integer lattice and vectors $x, y$, $x \vee y$ denotes the coordinate-wise maximum and $x \wedge y$ the coordinate-wise minimum.
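As a quick numerical illustration (our own sanity check, not from the paper), the snippet below verifies inequality (3) at random point pairs for $f(x) = -\prod_i x_i$, which satisfies (3) on the nonnegative orthant:

```python
import numpy as np

def is_submodular_at(f, x, y, tol=1e-12):
    """Check f(x) + f(y) >= f(x v y) + f(x ^ y) at a single pair of points."""
    join = np.maximum(x, y)   # coordinate-wise maximum, x v y
    meet = np.minimum(x, y)   # coordinate-wise minimum, x ^ y
    return f(x) + f(y) >= f(join) + f(meet) - tol

# f(x) = -prod_i x_i is submodular on the nonnegative orthant: its mixed
# second derivatives, -prod_{k != i,j} x_k, are all nonpositive there.
f = lambda x: -np.prod(x)
rng = np.random.default_rng(0)
print(all(is_submodular_at(f, rng.random(4), rng.random(4))
          for _ in range(1000)))  # True
```

This product structure is exactly what matters later: Lemma 3.1 in Section 3 shows that products of monotone nonnegative functions are supermodular, so one minus such a product is submodular.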
Submodularity has also been considered on continuous domains $\mathcal{X} \subseteq \mathbb{R}^d$, where, if $f$ is also twice-differentiable, submodularity means that all off-diagonal entries of the Hessian are nonpositive, i.e., $\frac{\partial^2 f(x)}{\partial x_i \partial x_j} \le 0$ for all $i \ne j$ (Topkis, 1978, Theorem 3.2). These functions may be convex, concave, or neither.

Submodular functions on lattices can be minimized by a reduction to set functions, more precisely, ring families (Birkhoff, 1937). Combinatorial algorithms for submodular optimization on lattices are discussed in (Khachaturov et al., 2012). More recently, Bach (2015) extended results based on the convex Lovász extension, by building on connections to optimal transport. The subclass of $L^\natural$-convex functions admits strongly polynomial time minimization (Murota, 2003; Kolmogorov & Shioura, 2009; Murota & Shioura, 2014), but does not apply in our setting. Similarly, results for submodular maximization extend to integer lattices, e.g. (Gottschalk & Peis, 2015). Stronger results are possible if the submodular function also satisfies diminishing returns: for all $x \le y$ (coordinate-wise) and $i$ such that $y + e_i \in \mathcal{X}$, it holds that $f(x + e_i) - f(x) \ge f(y + e_i) - f(y)$. For such DR-submodular functions, many approximation results for the set function case extend (Bian et al., 2017; Soma & Yoshida, 2015; Soma et al., 2014). In particular, Ene & Nguyen (2016) show a generic reduction to set function optimization that they apply to maximization. In fact, it also applies to minimization:

**Proposition 1.1.** *A DR-submodular function $f$ defined on $\prod_{i=1}^n [k_i]$ can be minimized in strongly polynomial time $O(n^4 \log^4 k\, \log^2(n \log k)\cdot \mathrm{EO} + n^4 \log^4 k\, \log^{O(1)}(n \log k))$, where $k = \max_i k_i$ and $\mathrm{EO}$ is the time complexity of evaluating $f$.*

Here, $[k_i] = \{0, 1, \dots, k_i - 1\}$. In particular, the time complexity is logarithmic in $k$. For general lattice submodular functions, this is not possible without further assumptions.

#### 1.1.2. Related Problems

A sister problem of Budget Allocation is Influence Maximization on general graphs, where a set of seed nodes is selected to start a propagation process. The influence function is still monotone submodular and amenable to the greedy algorithm (Kempe et al., 2003), but it cannot be evaluated explicitly and requires approximation (Chen et al., 2010). Stochastic Coverage (Goemans & Vondrák, 2006) is a version of Set Cover where the covering sets $S_i \subseteq V$ are random. A variant of Budget Allocation can be written as stochastic coverage with multiplicity. Stochastic Coverage has mainly been studied in the online or adaptive setting, where logarithmic approximation factors can be achieved (Golovin & Krause, 2011; Deshpande et al., 2016; Adamczyk et al., 2016).

Our objective function (2) is a signomial in $p$, i.e., a linear combination of monomials of the form $\prod_i x_i^{\alpha_i}$. General signomial optimization is NP-hard (Chiang, 2005), but certain subclasses are tractable: posynomials with all nonnegative coefficients can be minimized via Geometric Programming (Boyd et al., 2007), and signomials with a single negative coefficient admit sum-of-squares-like relaxations (Chandrasekaran & Shah, 2016). Our problem, a constrained posynomial maximization, is not in general a geometric program. Some work addresses this setting via monomial approximation (Pascual & Ben-Israel, 1970; Ecker, 1980), but, to our knowledge, our algorithm is the first that solves this problem to arbitrary accuracy.
#### 1.1.3. Robust Optimization

Two prominent strategies for addressing uncertainty in the parameters of optimization problems are stochastic and robust optimization. If the distribution of the parameters is known (stochastic optimization), formulations such as value-at-risk (VaR) and conditional value-at-risk (CVaR) (Rockafellar & Uryasev, 2000; 2002) apply. In contrast, robust optimization (Ben-Tal et al., 2009; Bertsimas et al., 2011) assumes that the parameters (of the cost function and constraints) can vary arbitrarily within a known confidence set $\mathcal{U}$, and the aim is to optimize for the worst-case setting, i.e.,

$$\max_y\ \inf_{(u,A,b)\in\mathcal{U}}\ \{\, g(y; u)\ \text{ s.t. }\ Ay \le b \,\}. \tag{4}$$

Here, we will only have uncertainty in the cost function. In this paper we are principally concerned with robust maximization of the continuous influence function $I(y)$, but mention some results for the discrete case. While there exist results for robust and CVaR optimization of modular (linear) functions (Nikolova, 2010; Bertsimas & Sim, 2003), submodular objectives do not in general admit such optimization (Maehara, 2015), but variants admit approximations (Zhang et al., 2014). The brittleness of submodular optimization under noise has been studied in (Balkanski et al., 2016; 2017; Hassidim & Singer, 2016). Approximations for robust submodular and influence optimization have been studied in (Krause et al., 2008; He & Kempe, 2016; Chen et al., 2016; Lowalekar et al., 2016), where an adversary can pick among a finite set of objective functions or remove selected elements (Orlin et al., 2016).

## 2. Robust and Stochastic Budget Allocation

The unknown parameters in Budget Allocation are the transmission probabilities $p_{st}$, or edge weights in a graph. If these are estimated from data, we may have posterior distributions or, a weaker assumption, confidence sets for the parameters. For ease of notation, we will work with the failure probabilities $x_{st} = 1 - p_{st}$ instead of the $p_{st}$ directly, and write $I(y; x)$ instead of $I(y; p)$.

### 2.1. Stochastic Optimization

If a (posterior) distribution of the parameters is known, a simple strategy is to use expectations. We place a uniform prior on $x_{st}$, and observe $n_{st}$ independent observations drawn from $\mathrm{Ber}(x_{st})$. If we observe $\alpha_{st}$ failures and $\beta_{st}$ successes, the resulting posterior distribution on the variable $X_{st}$ is $\mathrm{Beta}(1 + \alpha_{st}, 1 + \beta_{st})$. Given such a posterior, we may optimize

$$\max_{y\in\mathcal{Y}}\ I(y; \mathbb{E}[X]), \quad\text{or} \tag{5}$$

$$\max_{y\in\mathcal{Y}}\ \mathbb{E}[I(y; X)]. \tag{6}$$

**Proposition 2.1.** *Problems (5) and (6) are concave maximization problems over the (convex) set $\mathcal{Y}$ and can be solved exactly.*

Concavity of (6) follows since it is an expectation over concave functions, and the problem can be solved by stochastic gradient ascent or by explicitly computing gradients. Merely maximizing expectation does not explicitly account for volatility and hence risk. One option is to include variance (Ben-Tal & Nemirovski, 2000; Bertsimas et al., 2011; Atamtürk & Narayanan, 2008):

$$\max_{y\in\mathcal{Y}}\ \mathbb{E}[I(y; X)] - \varepsilon\sqrt{\mathrm{Var}(I(y; X))}, \tag{7}$$

but in our case this CVaR-style formulation seems difficult:

**Fact 2.1.** *For $y$ in the nonnegative orthant, the term $\sqrt{\mathrm{Var}(I(y; X))}$ need not be convex or concave, and need not be submodular or supermodular.*

This observation does not rule out a solution, but the apparent difficulties further motivate a robust formulation that, as we will see, is amenable to optimization.
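To illustrate the two stochastic objectives on hypothetical data, the sketch below (with made-up counts `alpha`, `beta`; the `influence` helper takes failure probabilities $x_{st} = 1 - p_{st}$ as in the notation above) compares the plug-in objective (5) with a Monte Carlo estimate of the objective in (6):

```python
import numpy as np

rng = np.random.default_rng(0)

def influence(y, x):
    """I(y; x) with failure probabilities x_st = 1 - p_st."""
    return np.sum(1.0 - np.prod(x ** y[:, None], axis=0))

# Made-up counts: alpha_st observed failures and beta_st successes give
# the posterior Beta(1 + alpha_st, 1 + beta_st) under a uniform prior.
alpha = np.array([[3.0, 1.0], [0.0, 5.0]])
beta = np.array([[1.0, 4.0], [2.0, 2.0]])
y = np.array([1.0, 2.0])

# Objective (5): plug in the posterior mean E[X_st] = (1+alpha)/(2+alpha+beta).
x_mean = (1 + alpha) / (2 + alpha + beta)
print("I(y; E[X]) =", influence(y, x_mean))

# Objective (6): Monte Carlo estimate of E[I(y; X)].
samples = rng.beta(1 + alpha, 1 + beta, size=(20000, *alpha.shape))
print("E[I(y; X)] ~", np.mean([influence(y, x) for x in samples]))
```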
### 2.2. Robust Optimization

The focus of this work is the robust version of Budget Allocation, where we allow an adversary to arbitrarily set the parameters $x$ within an uncertainty set $\mathcal{X}$. This uncertainty set may result, for instance, from a known distribution, or simply from assumed bounds. Formally, we solve

$$\max_{y\in\mathcal{Y}}\ \min_{x\in\mathcal{X}}\ I(y; x), \tag{8}$$

where $\mathcal{Y} \subseteq \mathbb{R}^S_+$ is a convex set with an efficient projection oracle, and $\mathcal{X}$ is an uncertainty set containing an estimate $\hat{x}$. In the sequel, we use uncertainty sets $\mathcal{X} = \{x \in \mathrm{Box}(l, u) : R(x) \le B\}$, where $R$ is a distance (or divergence) from the estimate $\hat{x}$, and $\mathrm{Box}(l, u)$ is the box $\prod_{(s,t)\in E} [l_{st}, u_{st}]$. The intervals $[l_{st}, u_{st}]$ can be thought of either as confidence intervals around $\hat{x}$ or, if $[l_{st}, u_{st}] = [0, 1]$, as enforcing that each $x_{st}$ is a valid probability. Common examples of uncertainty sets used in Robust Optimization are Ellipsoidal and D-norm uncertainty sets (Bertsimas et al., 2011). Our algorithm in Section 3.1 applies to both.

**Ellipsoidal uncertainty.** The ellipsoidal or quadratic uncertainty set is defined by

$$\mathcal{X}_Q(\gamma) = \{ x \in \mathrm{Box}(0, 1) : (x - \hat{x})^T \Sigma^{-1} (x - \hat{x}) \le \gamma \},$$

where $\Sigma$ is the covariance of the random vector $X$ of probabilities distributed according to our Beta posteriors. In our case, since the distributions on each $x_{st}$ are independent, $\Sigma^{-1}$ is actually diagonal. Writing $\Sigma = \mathrm{diag}(\sigma^2)$, we have

$$\mathcal{X}_Q(\gamma) = \Big\{ x \in \mathrm{Box}(0, 1) : \sum\nolimits_{(s,t)\in E} R_{st}(x_{st}) \le \gamma \Big\},$$

where $R_{st}(x_{st}) = (x_{st} - \hat{x}_{st})^2 \sigma_{st}^{-2}$.

**D-norm uncertainty.** The D-norm uncertainty set is similar to an $\ell_1$-ball around $\hat{x}$, and is defined as

$$\mathcal{X}_D(\gamma) = \Big\{ x : \exists\, c \in \mathrm{Box}(0, 1) \text{ s.t. } x_{st} = \hat{x}_{st} + (u_{st} - \hat{x}_{st})\, c_{st},\ \sum\nolimits_{(s,t)\in E} c_{st} \le \gamma \Big\}.$$

Essentially, we allow an adversary to increase $\hat{x}_{st}$ up to some upper bound $u_{st}$, subject to a total budget $\gamma$ across all terms $x_{st}$. The set $\mathcal{X}_D(\gamma)$ can be rewritten as

$$\mathcal{X}_D(\gamma) = \Big\{ x \in \mathrm{Box}(\hat{x}, u) : \sum\nolimits_{(s,t)\in E} R_{st}(x_{st}) \le \gamma \Big\},$$

where $R_{st}(x_{st}) = (x_{st} - \hat{x}_{st})/(u_{st} - \hat{x}_{st})$ is the fraction of the interval $[\hat{x}_{st}, u_{st}]$ we have used up in increasing $x_{st}$.

The min-max formulation $\max_{y\in\mathcal{Y}} \min_{x\in\mathcal{X}} I(y; x)$ has several merits: the model is not tied to a specific learning algorithm for the probabilities $x$, as long as we can choose a suitable confidence set. Moreover, this formulation allows us to fully hedge against a worst-case scenario.

## 3. Optimization Algorithm

As noted above, the function $I(y; x)$ is concave as a function of $y$ for fixed $x$. As a pointwise minimum of concave functions, $F(y) := \min_{x\in\mathcal{X}} I(y; x)$ is concave. Hence, if we can compute subgradients of $F(y)$, we can solve our max-min problem via the subgradient method, as outlined in Algorithm 1. A subgradient $g_y \in \partial F(y)$ at $y$ is given by the gradient of $I(y; x^*)$ at the minimizing $x^* \in \arg\min_{x\in\mathcal{X}} I(y; x)$, i.e., $g_y = \nabla_y I(y; x^*)$. Hence, we must be able to compute $x^*$ for any $y$.

**Algorithm 1** (Subgradient Ascent). Input: suboptimality tolerance $\varepsilon > 0$, initial feasible budget $y^{(0)} \in \mathcal{Y}$. Output: $\varepsilon$-optimal budget $y$ for Problem (8). Repeat:

1. $x^{(k)} \leftarrow \operatorname{argmin}_{x\in\mathcal{X}} I(y^{(k)}; x)$
2. $g^{(k)} \leftarrow \nabla_y I(y^{(k)}; x^{(k)})$
3. $L^{(k)} \leftarrow I(y^{(k)}; x^{(k)})$ and $U^{(k)} \leftarrow \max_{y\in\mathcal{Y}} I(y; x^{(k)})$
4. $\gamma^{(k)} \leftarrow (U^{(k)} - L^{(k)}) / \|g^{(k)}\|_2^2$
5. $y^{(k+1)} \leftarrow \mathrm{proj}_{\mathcal{Y}}(y^{(k)} + \gamma^{(k)} g^{(k)})$, then $k \leftarrow k + 1$

until $U^{(k)} - L^{(k)} \le \varepsilon$.

We also obtain a duality gap: for any $x_0, y_0$ we have

$$\min_{x\in\mathcal{X}} I(y_0; x)\ \le\ \max_{y\in\mathcal{Y}}\ \min_{x\in\mathcal{X}}\ I(y; x)\ \le\ \max_{y\in\mathcal{Y}} I(y; x_0). \tag{9}$$

This means we can estimate the optimal value $I^*$ and use it in Polyak's stepsize rule for the subgradient method (Polyak, 1987). But $I(y; x)$ is not convex in $x$, and not even quasiconvex. For example, standard results (Wainwright & Chiang, 2004, Chapter 12) imply that $f(x_1, x_2, x_3) = 1 - x_1 x_2 \sqrt{x_3}$ is not quasiconvex on $\mathbb{R}^3_+$. Moreover, the above-mentioned signomial optimization techniques do not apply for an exact solution either. So, it is not immediately clear that we can solve the inner optimization problem.
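For concreteness, here is a compact Python rendering of Algorithm 1. It is a sketch that assumes the caller supplies the oracles as functions, with names of our choosing: `inner_min` (the inner minimization, which Section 3.1 solves via constrained submodular minimization), `grad_y`, `influence`, `upper_bound` (computing $\max_{y\in\mathcal{Y}} I(y; x)$, e.g. by concave maximization), and `proj_Y`.

```python
import numpy as np

def robust_budget_allocation(inner_min, grad_y, influence, upper_bound,
                             proj_Y, y0, eps=1e-3, max_iter=1000):
    """Subgradient ascent on F(y) = min_{x in X} I(y; x), following Algorithm 1.

    inner_min(y)    -> worst-case x in X for this y (adversary's best response)
    grad_y(y, x)    -> gradient of I(.; x) at y, a subgradient of F at y
    influence(y, x) -> I(y; x); at x = inner_min(y) this is a lower bound L
    upper_bound(x)  -> max_{y in Y} I(y; x), an upper bound U as in Eq. (9)
    proj_Y(y)       -> Euclidean projection onto the feasible budgets Y
    """
    y = np.asarray(y0, dtype=float)
    for _ in range(max_iter):
        x = inner_min(y)
        L, U = influence(y, x), upper_bound(x)
        if U - L <= eps:                  # duality-gap stopping criterion
            return y
        g = grad_y(y, x)
        step = (U - L) / np.dot(g, g)     # Polyak's step size rule
        y = proj_Y(y + step * g)
    return y
```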
The key insight we will be using is that $I(y; x)$ has a different beneficial property: while not convex, $I(y; x)$ as a function of $x$ is continuous submodular.

**Lemma 3.1.** *Suppose we have $n \ge 1$ differentiable functions $f_i : \mathbb{R} \to \mathbb{R}_+$, for $i = 1, \dots, n$, either all nonincreasing or all nondecreasing. Then $f(x) = \prod_{i=1}^n f_i(x_i)$ is a continuous supermodular function from $\mathbb{R}^n$ to $\mathbb{R}_+$.*

*Proof.* For $n = 1$, the resulting function is modular and therefore supermodular. In the case $n \ge 2$, we simply need to compute derivatives. The mixed derivatives are

$$\frac{\partial^2 f(x)}{\partial x_i\, \partial x_j} = f_i'(x_i)\, f_j'(x_j) \prod_{k \ne i,j} f_k(x_k). \tag{10}$$

By monotonicity, $f_i'$ and $f_j'$ have the same sign, so their product is nonnegative, and since each $f_k$ is nonnegative, the entire expression is nonnegative. Hence, $f(x)$ is continuous supermodular by Theorem 3.2 of (Topkis, 1978). ∎

**Corollary 3.1.** *The influence function $I(y; x)$ defined in Section 2 is continuous submodular in $x$ over the nonnegative orthant, for each $y \ge 0$.*

*Proof.* Since submodularity is preserved under summation, it suffices to show that each function $I_t(y)$ is continuous submodular. By Lemma 3.1, since $f_s(z) = z^{y(s)}$ is nonnegative and monotone nondecreasing for $y(s) \ge 0$, the product $\prod_{(s,t)\in E} x_{st}^{y(s)}$ is continuous supermodular in $x$. Flipping the sign and adding a constant term yields $I_t(y)$, which is hence continuous submodular. ∎

**Conjecture 3.1.** *Strong duality holds, i.e.*

$$\max_{y\in\mathcal{Y}}\ \min_{x\in\mathcal{X}}\ I(y; x) = \min_{x\in\mathcal{X}}\ \max_{y\in\mathcal{Y}}\ I(y; x). \tag{11}$$

If strong duality holds, then the duality gap $\max_{y\in\mathcal{Y}} I(y; x^*) - \min_{x\in\mathcal{X}} I(y^*; x)$ in Equation (9) is zero at optimality. If $I(y; x)$ were quasiconvex in $x$, strong duality would hold by Sion's min-max theorem, but this is not the case. In practice, we observe that the duality gap always converges to zero.

Bach (2015) demonstrates how to minimize a continuous submodular function $H(x)$ subject to box constraints $x \in \mathrm{Box}(l, u)$, up to an arbitrary suboptimality gap $\varepsilon > 0$. The constraint set $\mathcal{X}$ in our Robust Budget Allocation problem, however, has box constraints with an additional constraint $R(x) \le B$. This case is not addressed in any previous work. Fortunately, for a large class of functions $R$, there is still an efficient algorithm for continuous submodular minimization, which we present in the next section.

### 3.1. Constrained Continuous Submodular Function Minimization

We next present an algorithm for minimizing a monotone continuous submodular function $H(x)$ subject to box constraints $x \in \mathrm{Box}(l, u)$ and a constraint $R(x) \le B$:

$$\text{minimize}\ \ H(x) \quad \text{s.t.}\ \ R(x) \le B,\ \ x \in \mathrm{Box}(l, u). \tag{12}$$

If $H$ and $R$ were convex, the constrained problem would be equivalent to solving, with the right Lagrange multiplier $\lambda \ge 0$:

$$\text{minimize}\ \ H(x) + \lambda R(x) \quad \text{s.t.}\ \ x \in \mathrm{Box}(l, u). \tag{13}$$

Although $H$ and $R$ are not necessarily convex here, it turns out that a similar approach indeed applies. The main idea of our approach bears similarity with (Nagano et al., 2011) for the set function case, but our setting with continuous functions and various uncertainty sets is more general, and requires more argumentation. We outline our theoretical results here, and defer further implementation details and proofs to the appendix.

Following (Bach, 2015), we discretize the problem; for a sufficiently fine discretization, we achieve arbitrary accuracy. Let $A$ be an interpolation mapping that maps the discrete set $\prod_{i=1}^n [k_i]$ into $\mathrm{Box}(l, u) = \prod_{i=1}^n [l_i, u_i]$ via the componentwise interpolation functions $A_i : [k_i] \to [l_i, u_i]$. We say $A_i$ is δ-fine if $A_i(x_i + 1) - A_i(x_i) \le \delta$ for all $x_i \in \{0, 1, \dots, k_i - 2\}$, and we say the full interpolation mapping $A$ is δ-fine if each $A_i$ is δ-fine.
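One concrete δ-fine choice is a uniform grid; the sketch below (our own illustration, since any δ-fine interpolation is allowed) constructs the grid sizes $k_i$ and the mapping $A$:

```python
import numpy as np

def uniform_interpolation(l, u, delta):
    """A simple delta-fine interpolation A_i : {0, ..., k_i - 1} -> [l_i, u_i].

    Uses uniform grids, so consecutive values satisfy
    A_i(x_i + 1) - A_i(x_i) = (u_i - l_i) / (k_i - 1) <= delta.
    """
    l, u = np.asarray(l, dtype=float), np.asarray(u, dtype=float)
    k = np.ceil((u - l) / delta).astype(int) + 1   # grid points per coordinate

    def A(x):
        # map a lattice point x in prod_i [k_i] into Box(l, u)
        return l + (u - l) * np.asarray(x) / np.maximum(k - 1, 1)

    return k, A

k, A = uniform_interpolation(l=[0.0, 0.0], u=[1.0, 1.0], delta=0.25)
print(k)         # [5 5]: each coordinate gets k_i = 5 levels {0, ..., 4}
print(A(k - 1))  # [1. 1.]: the top lattice point maps to the upper corner u
```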
This mapping yields functions $H_\delta : \prod_{i=1}^n [k_i] \to \mathbb{R}$ and $R_\delta : \prod_{i=1}^n [k_i] \to \mathbb{R}$ via $H_\delta(x) = H(A(x))$ and $R_\delta(x) = R(A(x))$. $H_\delta$ is lattice submodular (on the integer lattice). This construction leads to a reduction of Problem (12) to a submodular minimization problem over the integer lattice:

$$\text{minimize}\ \ H_\delta(x) + \lambda R_\delta(x) \quad \text{s.t.}\ \ x \in \prod\nolimits_{i=1}^n [k_i]. \tag{14}$$

Ideally, there should then exist a $\lambda$ such that the associated minimizer $x(\lambda)$ yields a close-to-optimal solution for the constrained problem. Theorem 3.1 below states that this is indeed the case. Moreover, a second benefit of submodularity is that we can find the entire solution path for Problem (14) by solving a single optimization problem.

**Lemma 3.2.** *Suppose $H$ is continuous submodular, and suppose the regularizer $R$ is strictly increasing and separable: $R(x) = \sum_{i=1}^n R_i(x_i)$. Then we can recover a minimizer $x(\lambda)$ for the induced discrete Problem (14) for any $\lambda \in \mathbb{R}$ by solving a single convex optimization problem.*

The problem in question arises from a relaxation $h_\downarrow$ that extends $H_\delta$ in each coordinate $i$ to a function on distributions over the domain $[k_i]$. These distributions are represented via their inverse cumulative distribution functions $\theta_i$, which take the coordinate $x_i$ as input, and output the probability of exceeding $x_i$. The function $h_\downarrow$ is an analogue of the Lovász extension of set functions for continuous submodular functions (Bach, 2015): it is convex and coincides with $H_\delta$ on lattice points. Formally, the resulting single optimization problem is:

$$\text{minimize}\ \ h_\downarrow(\theta) + \sum_{i=1}^n \sum_{x_i=1}^{k_i-1} a_{i x_i}(\theta_i(x_i)) \quad \text{s.t.}\ \ \theta \in \prod\nolimits_{i=1}^n \mathbb{R}^{k_i-1}_\downarrow. \tag{15}$$

Here, $\mathbb{R}^k_\downarrow$ refers to the set of ordered vectors $z \in \mathbb{R}^k$ that satisfy $z_1 \ge z_2 \ge \dots \ge z_k$, the notation $\theta_i(x_i)$ denotes the $x_i$-th coordinate of the vector $\theta_i$, and the $a_{i x_i}$ are strictly convex functions given by

$$a_{i x_i}(t) = \tfrac{1}{2}\, t^2\, \big[R_i^\delta(x_i) - R_i^\delta(x_i - 1)\big], \tag{16}$$

where $R_i^\delta(x_i) = R_i(A_i(x_i))$. Problem (15) can be solved by Frank-Wolfe methods (Frank & Wolfe, 1956; Dunn & Harshbarger, 1978; Lacoste-Julien, 2016; Jaggi, 2013). This is because the greedy algorithm for computing subgradients of the Lovász extension can be generalized, and yields a linear optimization oracle for the dual of Problem (15). We detail the relationship between Problems (14) and (15), as well as how to implement the Frank-Wolfe methods, in Appendix C.

Let $\theta^*$ be the optimal solution of Problem (15). For any $\lambda$, we obtain a rounded solution $x(\lambda)$ for Problem (14) by thresholding: we set $x(\lambda)_i = \max\{ j \mid 1 \le j \le k_i - 1,\ \theta_i^*(j) \ge \lambda \}$, or zero if $\theta_i^*(j) < \lambda$ for all $j$. Each $x(\lambda_0)$ is the optimal solution for Problem (14) with $\lambda = \lambda_0$. We use the largest parameterized solution $x(\lambda)$ that is still feasible, i.e. the solution $x(\lambda^*)$ where $\lambda^*$ solves

$$\min_{\lambda \ge 0}\ H_\delta(x(\lambda)) \quad \text{s.t.}\ \ R_\delta(x(\lambda)) \le B. \tag{17}$$

This $\lambda^*$ can be found efficiently via binary search or a linear scan.

**Theorem 3.1.** *Let $H$ be continuous submodular and monotone decreasing, with Lipschitz constant $G$ with respect to the $\ell_\infty$ norm, and let $R$ be strictly increasing and separable. Assume all entries $\theta_i^*(j)$ of the optimal solution $\theta^*$ of Problem (15) are distinct. Let $x_0 = A(x(\lambda^*))$ be the thresholding corresponding to the optimal solution $\lambda^*$ of Problem (17), mapped back into the original continuous domain $\mathcal{X}$. Then $x_0$ is feasible for the continuous Problem (12), and is a $2G\delta$-approximate solution:*

$$H(x_0) \le 2G\delta + \min_{x\in\mathrm{Box}(l,u),\ R(x)\le B} H(x).$$

Theorem 3.1 implies an algorithm for solving Problem (12) to ε-optimality: (1) set $\delta = \varepsilon/(2G)$ so that $2G\delta = \varepsilon$, (2) compute $\theta^*$ which solves Problem (15), (3) find the optimal thresholding of $\theta^*$ by determining the smallest $\lambda^*$ for which $R_\delta(x(\lambda^*)) \le B$, and (4) map $x(\lambda^*)$ back into continuous space via the interpolation mapping $A$.
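The thresholding and scan in steps (3) and (4) are simple enough to spell out in code. The sketch below is our own illustration: `theta` holds hypothetical vectors $\theta_i^*$, and `R_disc` plays the role of $R_\delta$ (here a D-norm-like sum, which is increasing in $x$):

```python
import numpy as np

def x_of_lambda(theta, lam):
    """Threshold theta at level lam: x(lam)_i = max{j : theta_i(j) >= lam}.

    theta: list of nonincreasing 1-D arrays; theta[i][j-1] stores theta_i(j),
    so counting the entries >= lam gives the largest such index j (or 0).
    """
    return np.array([int(np.sum(t >= lam)) for t in theta])

def smallest_feasible_lambda(theta, R_disc, B):
    """Find the smallest lam >= 0 with R_disc(x(lam)) <= B.

    Increasing lam shrinks x(lam) coordinate-wise and R_disc is increasing,
    so feasibility is monotone in lam: a linear scan over the distinct
    entries of theta suffices (binary search would also work). The smallest
    feasible lam gives the largest feasible x(lam), which minimizes the
    decreasing function H_delta, matching Problem (17).
    """
    candidates = np.unique(np.concatenate(theta))             # ascending
    candidates = np.append(candidates, candidates[-1] + 1.0)  # lam > all -> x = 0
    for lam in candidates:
        x = x_of_lambda(theta, lam)
        if R_disc(x) <= B:
            return lam, x
    raise ValueError("even x = 0 is infeasible: R_disc(0) > B")

# Tiny made-up example: two coordinates with k_1 = 4 and k_2 = 3 levels.
theta = [np.array([0.9, 0.5, 0.2]), np.array([0.7, 0.3])]
R_disc = lambda x: float(np.sum(x))   # a D-norm-like discretized budget
print(smallest_feasible_lambda(theta, R_disc, B=2))  # (0.7, array([1, 1]))
```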
**Optimality bounds.** Theorem 3.1 is proved by comparing $x_0$ and $x(\lambda^*)$ to the optimal solution on the discretized mesh,

$$x_d^* \in \operatorname*{argmin}_{x \in \prod_{i=1}^n [k_i]\,:\, R_\delta(x) \le B} H_\delta(x).$$

Beyond the theoretical guarantee of Theorem 3.1, for any problem instance and candidate solution $x_0$, we can compute a bound on the gap between $H(x_0)$ and $H_\delta(x_d^*)$. The following two bounds (Remark 3.1) are proved in the appendix:

1. We can generate a discrete point $x(\lambda_+)$ satisfying $H(x_0) \le [H(x_0) - H_\delta(x(\lambda_+))] + H_\delta(x_d^*)$.
2. The Lagrangian yields the bound $H(x_0) \le \lambda^*(B - R(x_0)) + H_\delta(x_d^*)$.

**Improvements.** The requirement in Theorem 3.1 that the elements of $\theta^*$ be distinct may seem somewhat restrictive, but as long as $\theta^*$ has distinct elements in the neighborhood of our particular $\lambda^*$, this bound still holds. We see in Section 4.1.1 that in practice, $\theta^*$ almost always has distinct elements in the regime we care about, and the bounds of Remark 3.1 are very good. If $H$ is DR-submodular and $R$ is affine in each coordinate, then Problem (14) can be represented more compactly via the reduction of Ene & Nguyen (2016), and hence Problem (12) can be solved more efficiently. In particular, the influence function $I(y; x)$ is DR-submodular in $x$ when for each $s$, $y(s) = 0$ or $y(s) \ge 1$.

### 3.2. Application to Robust Budget Allocation

The above algorithm directly applies to Robust Budget Allocation with the uncertainty sets in Section 2.2. The ellipsoidal uncertainty set $\mathcal{X}_Q$ corresponds to the constraint $\sum_{(s,t)\in E} R_{st}(x_{st}) \le \gamma$ with $R_{st}(x_{st}) = (x_{st} - \hat{x}_{st})^2 \sigma_{st}^{-2}$ and $x \in \mathrm{Box}(0, 1)$. By the monotonicity of $I(y; x)$, there is never an incentive to reduce any $x_{st}$ below $\hat{x}_{st}$, so we can replace $\mathrm{Box}(0, 1)$ with $\mathrm{Box}(\hat{x}, 1)$. On this interval, each $R_{st}$ is strictly increasing, and Theorem 3.1 applies. For D-norm sets, we have $R_{st}(x_{st}) = (x_{st} - \hat{x}_{st})/(u_{st} - \hat{x}_{st})$. Since each $R_{st}$ is monotone, Theorem 3.1 applies.

**Runtime and alternatives.** Since the core algorithm is Frank-Wolfe, it is straightforward to show that Problem (15) can be solved to ε-suboptimality in time $O(\varepsilon^{-1} n^2 \delta^{-3} \alpha^{-1} |T|^2 \log(n\delta^{-1}))$, where $\alpha$ is the minimum derivative of the functions $R_i$. If $\theta^*$ has distinct elements separated by $\Delta$, then choosing $\varepsilon = \Delta^2\delta/8$ results in an exact solution to (14) in time $O(\Delta^{-2} n^2 \delta^{-4} \alpha^{-2} |T|^2 \log(n\delta^{-1}))$.

Noting that $H_\delta + \lambda R_\delta$ is submodular for all $\lambda$, one could instead perform binary search over $\lambda$, each time converting the objective into a submodular set function via Birkhoff's theorem and solving submodular minimization, e.g. via one of the recent fast methods (Chakrabarty et al., 2017; Lee et al., 2015). However, we are not aware of a practical implementation of the algorithm in (Lee et al., 2015), and the algorithm in (Chakrabarty et al., 2017) yields a solution only in expectation. This approach also requires care in the precision of the search over $\lambda$, whereas our approach searches directly over the $O(n\delta^{-1})$ elements of $\theta^*$.

## 4. Experiments

We evaluate our Robust Budget Allocation algorithm on both synthetic test data and a real-world bidding dataset from Yahoo! Webscope to demonstrate that our method yields real improvements.

*Figure 1. Visualization of the sorted values of $\theta_i^*(j)$ (blue dots) with comparison to the particular Lagrange multiplier $\lambda^*$ (orange line), for D-norm and Ellipsoidal uncertainty sets with $C = 0.4$ and $C = 4$. In most regimes there are no duplicate values, so that Theorem 3.1 applies. The theorem only needs distinctness at $\lambda^*$.*

For all experiments, we used Algorithm 1 as the outer loop.
For the inner submodular minimization step, we implemented the pairwise Frank-Wolfe algorithm of (Lacoste-Julien & Jaggi, 2015). In all cases, the feasible set of budgets is $\mathcal{Y} = \{y \in \mathbb{R}^S_+ : \sum_{s\in S} y(s) \le C\}$, where the specific budget $C$ depends on the experiment. Our code is available at git.io/vHXkO.

### 4.1. Synthetic

On the synthetic data, we probe two questions: (1) how often does the distinctness condition of Theorem 3.1 hold, so that we are guaranteed an optimal solution; and (2) what is the gain of using a robust versus non-robust solution in an adversarial setting? For both settings, we set $|S| = 6$ and $|T| = 2$ and discretize with $\delta = 0.001$. We generated true probabilities $p_{st}$, created Beta posteriors, and built both Ellipsoidal uncertainty sets $\mathcal{X}_Q(\gamma)$ and D-norm sets $\mathcal{X}_D(\gamma)$.

#### 4.1.1. Optimality

Theorem 3.1 and Remark 3.1 demand that the values $\theta_i^*(j)$ be distinct at our chosen Lagrange multiplier $\lambda^*$ and, under this condition, guarantee optimality. We illustrate this in four examples: for an Ellipsoidal or a D-norm uncertainty set, and a total influence budget $C \in \{0.4, 4\}$. Figure 1 shows all elements of $\theta^*$ in sorted order, as well as a horizontal line indicating our Lagrange multiplier $\lambda^*$, which serves as a threshold. Despite some plateaus, the entries $\theta_i^*(j)$ are distinct in most regimes, in particular around $\lambda^*$, the regime that is needed for our results. Moreover, in practice (on the Yahoo! data) we observe later in Figure 3 that both solution-dependent bounds from Remark 3.1 are very good, and all solutions are optimal within a very small gap.

*Figure 2. Comparison of worst-case expected influences for D-norm uncertainty sets $\mathcal{X}_D(\gamma)$ (left) and ellipsoidal uncertainty sets $\mathcal{X}_Q(\gamma)$ (right), for different total budget bounds $C$. For any particular adversary budget $\gamma$, we compare $\min_{x\in\mathcal{X}(\gamma)} I(y; x)$ for each candidate allocation $y$.*

#### 4.1.2. Robustness and Quality

Next, we probe the effect of a robust versus non-robust solution for different uncertainty sets and budgets $\gamma$ of the adversary. We compare our robust solution with using a point estimate for $x$, i.e., $y_{\mathrm{nom}} \in \operatorname{argmax}_{y\in\mathcal{Y}} I(y; \hat{x})$, treating estimates as ground truth, and the stochastic solution $y_{\mathrm{expect}} \in \operatorname{argmax}_{y\in\mathcal{Y}} \mathbb{E}[I(y; X)]$ as per Section 2.1. These two optimization problems were solved via standard first-order methods using TFOCS (Becker et al., 2011). Figure 2 demonstrates that indeed, the alternative budgets are sensitive to the adversary and the robustly-chosen budget $y_{\mathrm{robust}}$ performs better, even in cases where the other budgets achieve zero influence. When the total budget $C$ is large, $y_{\mathrm{expect}}$ performs nearly as well as $y_{\mathrm{robust}}$, but when resources are scarce ($C$ is small) and the actual choice seems to matter more, $y_{\mathrm{robust}}$ performs far better.

### 4.2. Yahoo! Data

To evaluate our method on real-world data, we formulate a Budget Allocation instance on advertiser bidding data from Yahoo! Webscope. This dataset logs bids on 1000 different phrases by advertising accounts. We map the phrases to channels $S$ and the accounts to customers $T$, with an edge between $s$ and $t$ if a corresponding bid was made. For each pair $(s, t)$, we draw the associated transmission probability $p_{st}$ uniformly from $[0, 0.4]$. We bias these towards zero because we expect people not to be easily influenced by advertising in the real world.
We then generate an estimate $\hat{p}_{st}$ and build up a posterior by generating $n_{st}$ samples from $\mathrm{Ber}(p_{st})$, where $n_{st}$ is the number of bids between $s$ and $t$ in the dataset. This transformation yields a bipartite graph with $|S| = 1000$, $|T| = 10475$, and more than 50,000 edges that we use for Budget Allocation. In our experiments, the typical gap between the naive $y_{\mathrm{nom}}$ and the robust $y_{\mathrm{robust}}$ was 100–500 expected influenced people. We plot convergence of the outer loop in Figure 3, where we observe fast convergence of both the primal influence value and the dual bound.

*Figure 3. Convergence properties of our algorithm on real data. In the first plot, p and d refer to primal and dual values, with the dual gap shown in the second plot. The third plot demonstrates that the problem-dependent suboptimality bounds of Remark 3.1 (x for $x(\lambda_+)$ and L for the Lagrangian bound) are very small (good) for all inner iterations of this run.*

### 4.3. Comparison to First-Order Methods

Given the success of first-order methods on nonconvex problems in practice, it is natural to compare these to our method for finding the worst-case vector $x$. On one of our Yahoo! problem instances with a D-norm uncertainty set, we compared our submodular minimization scheme to Frank-Wolfe with fixed stepsize as in (Lacoste-Julien, 2016), implementing the linear oracle using MOSEK (MOSEK ApS, 2015). Interestingly, from various initializations, Frank-Wolfe finds an optimal solution, as verified by comparing to the guaranteed solution of our algorithm. Note that, due to non-convexity, there are no formal guarantees for Frank-Wolfe to be optimal here, motivating the question of global convergence properties of Frank-Wolfe in the presence of submodularity.

It is important to note that there are many cases where first-order methods are inefficient or do not apply to our setup. These methods require either a projection oracle (PO) onto, or a linear optimization oracle (LO) over, the feasible set $\mathcal{X}$ defined by $l$, $u$ and $R(x)$. The D-norm set admits an LO via linear programming, but we are not aware of any efficient LO for Ellipsoidal uncertainty, nor of a PO for either set, that does not require quadratic programming. Even more, our algorithm applies for nonconvex functions $R(x)$ which induce nonconvex feasible sets $\mathcal{X}$. Such nonconvex sets may not even admit a unique projection, while our algorithm achieves provable solutions.

*Figure 4. Convergence properties of Frank-Wolfe (FW), versus the optimal value attained with our scheme (SFM).*

## 5. Conclusion

We address the issue of uncertain parameters (or, model misspecification) in Budget Allocation or Bipartite Influence Maximization (Alon et al., 2012) from a robust optimization perspective. The resulting Robust Budget Allocation is a nonconvex-concave saddle point problem. Although the inner optimization problem is nonconvex, we show how continuous submodularity can be leveraged to solve the problem to arbitrary accuracy ε, as can be verified with the proposed bounds on the duality gap. In particular, our approach extends continuous submodular minimization methods (Bach, 2015) to more general constraint sets, introducing a mechanism to solve a new class of constrained nonconvex optimization problems. We confirm on synthetic and real data that our method finds high-quality solutions that are robust to parameters varying arbitrarily in an uncertainty set, and scales up to graphs with over 50,000 edges. There are many compelling directions for further study.
The uncertainty sets we use are standard in the robust optimization literature, but have not been applied to e.g. Robust Influence Maximization; it would be interesting to generalize our ideas to general graphs. Finally, despite the inherent nonconvexity of our problem, first-order methods are often able to find a globally optimal solution. Explaining this phenomenon requires further study of the geometry of constrained monotone submodular minimization.

## Acknowledgements

We thank the anonymous reviewers for their helpful suggestions. We also thank MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing computational resources. This research was conducted with Government support under and awarded by DoD, Air Force Office of Scientific Research, National Defense Science and Engineering Graduate (NDSEG) Fellowship, 32 CFR 168a, and also supported by NSF CAREER award 1553284.

## References

Yahoo! Webscope dataset ydata-ysm-advertiser-bids-v1_0. URL http://research.yahoo.com/Academic_Relations.
Adamczyk, Marek, Sviridenko, Maxim, and Ward, Justin. Submodular Stochastic Probing on Matroids. Mathematics of Operations Research, 41(3):1022–1038, 2016.
Alon, Noga, Gamzu, Iftah, and Tennenholtz, Moshe. Optimizing Budget Allocation Among Channels and Influencers. In WWW, 2012.
Atamtürk, Alper and Narayanan, Vishnu. Polymatroids and mean-risk minimization in discrete optimization. Operations Research Letters, 36(5):618–622, 2008.
Bach, Francis. Submodular Functions: From Discrete to Continuous Domains. arXiv:1511.00394, 2015.
Balkanski, Eric, Rubinstein, Aviad, and Singer, Yaron. The power of optimization from samples. In NIPS, 2016.
Balkanski, Eric, Rubinstein, Aviad, and Singer, Yaron. The limitations of optimization from samples. In STOC, 2017.
Becker, Stephen R., Candès, Emmanuel J., and Grant, Michael C. Templates for convex cone problems with applications to sparse signal recovery. Mathematical Programming Computation, 3(3):165–218, 2011.
Ben-Tal, Aharon and Nemirovski, Arkadi. Robust solutions of Linear Programming problems contaminated with uncertain data. Mathematical Programming, 88(3):411–424, 2000.
Ben-Tal, Aharon, El Ghaoui, Laurent, and Nemirovski, Arkadi. Robust Optimization. Princeton University Press, 2009.
Bertsimas, Dimitris and Sim, Melvyn. Robust discrete optimization and network flows. Mathematical Programming, 98(1):49–71, 2003.
Bertsimas, Dimitris, Brown, David B., and Caramanis, Constantine. Theory and Applications of Robust Optimization. SIAM Review, 53(3):464–501, 2011.
Best, Michael J. and Chakravarti, Nilotpal. Active set algorithms for isotonic regression; a unifying framework. Mathematical Programming, 47(1-3):425–439, 1990.
Bian, Andrew An, Mirzasoleiman, Baharan, Buhmann, Joachim M., and Krause, Andreas. Guaranteed Non-convex Optimization: Submodular Maximization over Continuous Domains. In AISTATS, 2017.
Birkhoff, Garrett. Rings of sets. Duke Mathematical Journal, 3(3):443–454, 1937.
Borgs, Christian, Brautbar, Michael, Chayes, Jennifer, and Lucier, Brendan. Maximizing Social Influence in Nearly Optimal Time. In SODA, 2014.
Boyd, Stephen, Kim, Seung-Jean, Vandenberghe, Lieven, and Hassibi, Arash. A tutorial on geometric programming. Optimization and Engineering, 8(1):67–127, 2007.
Chakrabarty, Deeparnab, Lee, Yin Tat, Sidford, Aaron, and Wong, Sam Chiu-wai. Subquadratic submodular function minimization. In STOC, 2017.
Chandrasekaran, Venkat and Shah, Parikshit. Relative Entropy Relaxations for Signomial Optimization. SIAM Journal on Optimization, 26(2):1147–1173, 2016.
Chen, Wei, Wang, Yajun, and Yang, Siyu. Efficient influence maximization in social networks. In KDD, 2009.
Chen, Wei, Wang, Chi, and Wang, Yajun. Scalable Influence Maximization for Prevalent Viral Marketing in Large-scale Social Networks. In KDD, 2010.
Chen, Wei, Lin, Tian, Tan, Zihan, Zhao, Mingfei, and Zhou, Xuren. Robust influence maximization. In KDD, 2016.
Chiang, Mung. Geometric Programming for Communication Systems. Commun. Inf. Theory, 2(1/2):1–154, 2005.
Deshpande, Amol, Hellerstein, Lisa, and Kletenik, Devorah. Approximation Algorithms for Stochastic Submodular Set Cover with Applications to Boolean Function Evaluation and Min-Knapsack. ACM Trans. Algorithms, 12(3):42:1–42:28, 2016.
Domingos, Pedro and Richardson, Matt. Mining the network value of customers. In KDD, 2001.
Du, Nan, Song, Le, Gomez Rodriguez, Manuel, and Zha, Hongyuan. Scalable influence estimation in continuous-time diffusion networks. In NIPS, 2013.
Du, Nan, Liang, Yingyu, Balcan, Maria-Florina, and Song, Le. Influence function learning in information diffusion networks. In ICML, 2014.
Dunn, Joseph C. and Harshbarger, S. Conditional gradient algorithms with open loop step size rules. Journal of Mathematical Analysis and Applications, 62(2):432–444, 1978.
Ecker, Joseph. Geometric Programming: Methods, Computations and Applications. SIAM Review, 22(3):338–362, 1980.
Ene, Alina and Nguyen, Huy L. A Reduction for Optimizing Lattice Submodular Functions with Diminishing Returns. arXiv:1606.08362, 2016.
Frank, Marguerite and Wolfe, Philip. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2):95–110, 1956.
Goel, Gagan, Karande, Chinmay, Tripathi, Pushkar, and Wang, Lei. Approximability of combinatorial problems with multi-agent submodular cost functions. In FOCS, 2009.
Goemans, Michel and Vondrák, Jan. Stochastic Covering and Adaptivity. In LATIN 2006: Theoretical Informatics. Springer Berlin Heidelberg, 2006.
Golovin, Daniel and Krause, Andreas. Adaptive Submodularity: Theory and Applications in Active Learning and Stochastic Optimization. Journal of Artificial Intelligence Research, 42:427–486, 2011.
Gomez Rodriguez, Manuel and Schölkopf, Bernhard. Influence maximization in continuous time diffusion networks. In ICML, 2012.
Gomez Rodriguez, Manuel, Leskovec, Jure, and Krause, Andreas. Inferring networks of diffusion and influence. In KDD, 2010.
Gottschalk, Corinna and Peis, Britta. Submodular function maximization on the bounded integer lattice. In Approximation and Online Algorithms: 13th International Workshop (WAOA), 2015.
Hassidim, Avinatan and Singer, Yaron. Submodular optimization under noise. arXiv:1601.03095, 2016.
Hatano, Daisuke, Fukunaga, Takuro, Maehara, Takanori, and Kawarabayashi, Ken-ichi. Lagrangian Decomposition Algorithm for Allocating Marketing Channels. In AAAI, 2015.
He, Xinran and Kempe, David. Robust influence maximization. In KDD, 2016.
Iwata, Satoru and Nagano, Kiyohito. Submodular function minimization under covering constraints. In FOCS, 2009.
Jaggi, Martin. Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization. In ICML, 2013.
Kempe, David, Kleinberg, Jon, and Tardos, Éva. Maximizing the Spread of Influence Through a Social Network. In KDD, 2003.
Khachaturov, Vladimir R., Khachaturov, Roman V., and Khachaturov, Ruben V. Supermodular programming on finite lattices. Computational Mathematics and Mathematical Physics, 52(6):855–878, 2012.
Kolmogorov, Vladimir and Shioura, Akiyoshi. New algorithms for convex cost tension problem with application to computer vision. Discrete Optimization, 6:378–393, 2009.
Krause, Andreas, McMahan, H. Brendan, Guestrin, Carlos, and Gupta, Anupam. Robust submodular observation selection. Journal of Machine Learning Research, 9(Dec):2761–2801, 2008.
Lacoste-Julien, Simon. Convergence Rate of Frank-Wolfe for Non-Convex Objectives. arXiv:1607.00345, 2016.
Lacoste-Julien, Simon and Jaggi, Martin. On the global linear convergence of Frank-Wolfe optimization variants. In NIPS, 2015.
Lee, Yin Tat, Sidford, Aaron, and Wong, Sam Chiu-wai. A faster cutting plane method and its implications for combinatorial and convex optimization. In FOCS, 2015.
Lowalekar, Meghna, Varakantham, Pradeep, and Kumar, Akshat. Robust Influence Maximization (Extended Abstract). In AAMAS, 2016.
Maehara, Takanori. Risk averse submodular utility maximization. Operations Research Letters, 43(5):526–529, 2015.
Maehara, Takanori, Yabe, Akihiro, and Kawarabayashi, Ken-ichi. Budget Allocation Problem with Multiple Advertisers: A Game Theoretic View. In ICML, 2015.
MOSEK ApS. MOSEK MATLAB Toolbox 8.0.0.57, 2015. URL http://docs.mosek.com/8.0/toolbox/index.html.
Murota, Kazuo. Discrete Convex Analysis. SIAM, 2003.
Murota, Kazuo and Shioura, Akiyoshi. Exact bounds for steepest descent algorithms of L-convex function minimization. Operations Research Letters, 42:361–366, 2014.
Nagano, Kiyohito, Kawahara, Yoshinobu, and Aihara, Kazuyuki. Size-constrained submodular minimization through minimum norm base. In ICML, 2011.
Narasimhan, Harikrishna, Parkes, David C., and Singer, Yaron. Learnability of influence in networks. In NIPS, 2015.
Netrapalli, Praneeth and Sanghavi, Sujay. Learning the graph of epidemic cascades. In SIGMETRICS, 2012.
Nikolova, Evdokia. Approximation algorithms for reliable stochastic combinatorial optimization. In APPROX, 2010.
Orlin, James B., Schulz, Andreas, and Udwani, Rajan. Robust monotone submodular function maximization. In IPCO, 2016.
Pascual, Luis D. and Ben-Israel, Adi. Constrained maximization of posynomials by geometric programming. Journal of Optimization Theory and Applications, 5(2):73–80, 1970.
Polyak, Boris T. Introduction to Optimization. 1987.
Rockafellar, R. Tyrrell and Uryasev, Stanislav. Optimization of conditional value-at-risk. Journal of Risk, 2:21–42, 2000.
Rockafellar, R. Tyrrell and Uryasev, Stanislav. Conditional value-at-risk for general loss distributions. Journal of Banking & Finance, 26(7):1443–1471, 2002.
Soma, Tasuku and Yoshida, Yuichi. A Generalization of Submodular Cover via the Diminishing Return Property on the Integer Lattice. In NIPS, 2015.
Soma, Tasuku, Kakimura, Naonori, Inaba, Kazuhiro, and Kawarabayashi, Ken-ichi. Optimal Budget Allocation: Theoretical Guarantee and Efficient Algorithm. In ICML, 2014.
Svitkina, Zoya and Fleischer, Lisa. Submodular approximation: Sampling-based algorithms and lower bounds. SIAM Journal on Computing, 40(6):1715–1737, 2011.
Topkis, Donald M. Minimizing a submodular function on a lattice. Operations Research, 26(2):305–321, 1978.
Wainwright, Kevin and Chiang, Alpha. Fundamental Methods of Mathematical Economics. McGraw-Hill Education, 2004.
Zhang, Peng, Chen, Wei, Sun, Xiaoming, Wang, Yajun, and Zhang, Jialin. Minimizing seed set selection with probabilistic coverage guarantee in a social network. In KDD, 2014.