# Non-monotone DR-submodular Maximization over General Convex Sets

Christoph Dürr¹, Nguyễn Kim Thắng², Abhinav Srivastav² and Léo Tible²
¹LIP6, Sorbonne University, France
²IBISC, Univ Évry, University Paris-Saclay, France
Christoph.Durr@lip6.fr, kimthang.nguyen@univ-evry.fr, abhinavsriva@gmail.com, leo.tible@ens-cachan.fr

## Abstract

Many real-world problems can be cast as the optimization of DR-submodular functions defined over a convex set. These functions play an important role in many areas of applied mathematics, such as machine learning, computer vision, operations research, communication systems and economics. In addition, they capture a subclass of non-convex optimization that admits both practical and theoretical guarantees. In this paper, we show that for maximizing non-monotone DR-submodular functions over a general convex set (such as up-closed convex sets, conic convex sets, etc.), the Frank-Wolfe algorithm achieves an approximation guarantee that depends on the convex set. To the best of our knowledge, this is the first approximation guarantee for general convex sets. Finally, we benchmark our algorithm on problems arising in the machine learning domain with real-world datasets.

## 1 Introduction

Submodular (set function) optimization has been widely studied for decades. The domain has been under extensive investigation in recent years due to its numerous applications in statistics and machine learning, for example active learning [Golovin and Krause, 2011], viral marketing [Kempe et al., 2003], network monitoring [Gomez Rodriguez et al., 2010], document summarization [Lin and Bilmes, 2011], crowd teaching [Singla et al., 2014], feature selection [Elenberg et al., 2018], deep neural networks [Elenberg et al., 2017], diversity models [Djolonga et al., 2016] and recommender systems [Guillory and Bilmes, 2011]. One of the primary reasons for the popularity of submodular functions is their diminishing-returns property. This property has been observed in many settings and is also known as the economy-of-scale property in economics and operations research.

Continuous diminishing-return submodular (DR-submodular) functions are a natural extension of submodular set functions to the continuous setting, obtained by extending the notion of diminishing returns to continuous domains. These functions have recently gained significant attention in both the machine learning and theoretical computer science communities [Bach, 2016; Bian et al., 2018; 2017a; Chen et al., 2018] when formulating real-world problems such as non-concave quadratic programming, revenue maximization, log-submodularity and mean-field inference for Determinantal Point Processes. More examples can be found in [Ito and Fujimaki, 2016], [Kulesza et al., 2012], and [Djolonga and Krause, 2014]. Moreover, DR-submodular functions capture a subclass of non-convex functions that enables both exact minimization [Iwata et al., 2001] and approximate maximization [Bian et al., 2017a] in polynomial time.

Previous work on the DR-submodular maximization problem has relied on crucial assumptions, either on additional properties of the functions (e.g., monotonicity) or on special structures of the convex sets (e.g., unconstrained or down-closed structures). However, many real-world problems are naturally formulated as the maximization of non-monotone DR-submodular functions over general convex sets.
In this paper, we investigate the problem of maximizing constrained non-monotone DR-submodular functions. We provide an approximation algorithm for maximizing smooth, non-monotone DR-submodular functions over general convex sets. As the name suggests, general convex sets include conic convex sets, upward-closed convex sets, mixtures of covering and packing linear constraints, etc. Prior to this work, no algorithm with a performance guarantee was known for this problem.

We consider non-negative DR-submodular functions $F$, i.e., $F(x) \geq 0$ at every point $x \in [0,1]^n$, such that the value $F(x)$ and the gradient $\nabla F(x)$ can be evaluated in polynomial time. Given a convex set $K \subseteq [0,1]^n$, we aim to design an iterative algorithm that, after $T$ iterations, produces a solution $x^T$ such that
$$F(x^T) \geq \alpha \max_{x \in K} F(x) - \beta,$$
where $\alpha, \beta$ are some parameters. Such an algorithm is called an $\alpha$-approximation with convergence rate $\beta$ (which depends on, among others, the number of iterations $T$).

### 1.1 Our Contributions

The problem of maximizing a non-monotone DR-submodular function over a general convex set is known to be hard. Specifically, any constant-approximation algorithm for this problem requires super-polynomially many value queries to the function [Vondrák, 2013]. Determining the approximation ratio as a function of the problem parameters, and characterizing necessary and sufficient regularity conditions that enable efficient approximation algorithms, therefore constitute an important direction.

Exploiting the underlying properties of DR-submodularity, we show that the celebrated Frank-Wolfe algorithm achieves an approximation ratio of $\frac{1}{3\sqrt{3}}\bigl(1 - \min_{x \in K} \|x\|_{\infty}\bigr)$ with a convergence rate of $O\bigl(n/\ln^2 T\bigr)$, where $T$ is the number of iterations of the algorithm. (The exact convergence rate with parameter dependencies is $O\bigl(\beta D^2/(\ln T)^2\bigr)$, where $\beta$ is the smoothness of the function $F$ and $D$ is the diameter of $K$.) In particular, if the set $K$ contains the origin $\mathbf{0}$, then for any constant $\epsilon > 0$, after a sub-exponential number of iterations $T = e^{O(\sqrt{n/\epsilon})}$ the algorithm outputs a solution $x^T$ such that $F(x^T) \geq \frac{1}{3\sqrt{3}} \max_{x \in K} F(x) - \epsilon$. Note that a set $K$ containing the origin is not necessarily down-closed; an example is the intersection of a cone and the hypercube $[0,1]^n$ (formal definitions are given in Section 2). As mentioned earlier, due to [Vondrák, 2013], when aiming for approximation algorithms for convex sets beyond the down-closed ones, a super-polynomial number of iterations is unavoidable. To the best of our knowledge, this is the first approximation guarantee for maximizing non-monotone DR-submodular functions over non-down-closed convex sets.

### 1.2 Related Work

The problem of submodular (set function) minimization has been studied in [Iwata et al., 2001]. See [Bach and others, 2013] for a survey on connections with applications in machine learning. Submodular maximization is an NP-hard problem. Several approximation algorithms have been given in the offline setting, for example a 1/2-approximation for unconstrained domains [Buchbinder et al., 2015; Buchbinder and Feldman, 2018], a $(1-1/e)$-approximation for monotone smooth submodular functions [Calinescu et al., 2011], and a $(1/e)$-approximation for non-monotone submodular functions over down-closed polytopes [Feldman et al., 2011].
Continuous extensions of submodular functions play a crucial role in submodular optimization, especially in submodular maximization; examples include the multilinear relaxation and the softmax extension, both of which belong to the class of DR-submodular functions. [Bian et al., 2017b] considered the problem of maximizing monotone DR-submodular functions subject to down-closed convex sets and proved that the greedy method proposed by [Calinescu et al., 2011], which is a variant of the Frank-Wolfe algorithm, guarantees a $(1-1/e)$-approximation. It has been observed by [Hassani et al., 2017] that the greedy method is not robust in stochastic settings (where only unbiased estimates of gradients are available). Subsequently, they showed that gradient methods achieve a 1/2-approximation in stochastic settings.

Maximizing non-monotone DR-submodular functions is harder. Very recently, [Bian et al., 2018] and [Niazadeh et al., 2018] independently presented algorithms with the same approximation guarantee of 1/2 for the problem of maximizing non-monotone DR-submodular functions over $K = [0,1]^n$. Both algorithms are inspired by the bi-greedy algorithm of [Buchbinder et al., 2015; Buchbinder and Feldman, 2018]. [Bian et al., 2017a] made a further step by providing a $(1/e)$-approximation algorithm over down-closed convex sets. Remark again that, when aiming for approximation algorithms, the restriction to down-closed polytopes is unavoidable: [Vondrák, 2013] proved that any constant-approximation algorithm for the problem over a general matroid base polytope requires exponentially many value queries to the function.

## 2 Preliminaries and Notations

We introduce some basic notions, concepts and lemmas which will be used throughout the paper. We use boldface letters, e.g., $x, z$, to represent vectors. We denote by $x_i$ the $i$-th entry of $x$ and by $x^t$ the decision vector at time step $t$. For two $n$-dimensional vectors $x, y$, we say that $x \leq y$ iff $x_i \leq y_i$ for all $1 \leq i \leq n$. Moreover, $x \vee y$ is the vector with $(x \vee y)_i = \max\{x_i, y_i\}$ and, similarly, $x \wedge y$ is the vector with $(x \wedge y)_i = \min\{x_i, y_i\}$. We use the Euclidean norm by default (so the dual norm is itself). The infinity norm is defined as $\|x\|_{\infty} = \max_{i=1}^{n} |x_i|$.

Throughout the paper, we consider a bounded convex set $K$ and w.l.o.g. assume that $K \subseteq [0,1]^n$. We say that $K$ is general if $K$ is simply a convex set without any particular property. $K$ is down-closed if for any vectors $x$ and $y$ such that $y \in K$ and $x \leq y$, $x$ is also in $K$. A cone is a set $C$ such that if $x \in C$ then $\alpha x \in C$ for every constant $\alpha > 0$. Besides, the diameter of the convex domain $K$ (denoted by $D$) is defined as $\sup_{x, y \in K} \|x - y\|$.

A function $F : [0,1]^n \to \mathbb{R}_{+}$ is DR-submodular if for all vectors $x, y \in [0,1]^n$ with $x \leq y$, any basis vector $e_i = (0, \ldots, 0, 1, 0, \ldots, 0)$ and any constant $\alpha > 0$ such that $x + \alpha e_i \in [0,1]^n$ and $y + \alpha e_i \in [0,1]^n$, it holds that
$$F(x + \alpha e_i) - F(x) \geq F(y + \alpha e_i) - F(y). \tag{1}$$
This is called the diminishing-return (DR) property. Note that if $F$ is differentiable then the DR property (1) is equivalent to
$$\nabla F(x) \geq \nabla F(y) \quad \forall\, x \leq y \in [0,1]^n. \tag{2}$$
Moreover, if $F$ is twice-differentiable then the DR property is equivalent to all entries of its Hessian being non-positive, i.e., $\frac{\partial^2 F}{\partial x_i \partial x_j} \leq 0$ for all $1 \leq i, j \leq n$.

A differentiable function $F : K \subseteq \mathbb{R}^n \to \mathbb{R}$ is said to be $\beta$-smooth if for any $x, y \in K$ we have
$$F(y) \leq F(x) + \langle \nabla F(x), y - x \rangle + \frac{\beta}{2} \|y - x\|^2, \tag{3}$$
or, equivalently,
$$\|\nabla F(x) - \nabla F(y)\| \leq \beta \|x - y\|. \tag{4}$$
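As an illustration (ours, not from the paper), the following Python snippet builds a non-concave quadratic $F(x) = \frac{1}{2}x^\top H x + h^\top x$ with a symmetric, entrywise non-positive $H$, which is exactly the Hessian condition above, and numerically checks the DR property (1) and its gradient characterization (2).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# A symmetric matrix with non-positive entries: the Hessian condition
# for DR-submodularity of the quadratic F(x) = 0.5 x^T H x + h^T x.
H = -rng.uniform(0.0, 1.0, size=(n, n))
H = 0.5 * (H + H.T)
h = rng.uniform(0.0, 1.0, size=n)

def F(x):
    return 0.5 * x @ H @ x + h @ x

def grad_F(x):
    return H @ x + h

# Characterization (2): x <= y implies grad F(x) >= grad F(y) coordinate-wise.
x = rng.uniform(0.0, 0.5, size=n)
y = x + rng.uniform(0.0, 0.4, size=n)            # y >= x coordinate-wise
assert np.all(grad_F(x) >= grad_F(y) - 1e-12)    # antitone gradients

# DR property (1) along a basis direction e_i with step alpha.
i, alpha = 2, 0.1
e_i = np.eye(n)[i]
assert F(x + alpha * e_i) - F(x) >= F(y + alpha * e_i) - F(y) - 1e-12
print("quadratic with entrywise non-positive Hessian passes the DR checks")
```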
A property of DR-submodular functions that we will use in this paper is the following.

Lemma 1 ([Hassani et al., 2017], proof of Theorem 4.2). For every $x, y \in K$ and any DR-submodular function $F : [0,1]^n \to \mathbb{R}_{+}$, it holds that
$$\langle \nabla F(x), y - x \rangle \geq F(x \vee y) + F(x \wedge y) - 2F(x).$$
Consequently, as $F$ is non-negative,
$$\langle \nabla F(x), y - x \rangle \geq F(x \vee y) - 2F(x).$$

## 3 DR-submodular Maximization over General Convex Sets

In this section, we consider the problem of maximizing a DR-submodular function over a general convex set. Approximation algorithms [Bian et al., 2017b; 2017a] have been presented for this problem, all of which are adapted variants of the Frank-Wolfe method. However, those algorithms require the convex set to be down-closed; this structure is crucial in their analyses in order to relate their solution to the optimal one. Using a property of DR-submodularity (specifically, Lemma 1), we show that, beyond the down-closed structure, the Frank-Wolfe algorithm guarantees an approximate solution for general convex sets. Below, we present the pseudocode of our variant of the Frank-Wolfe algorithm.

Algorithm 1 Frank-Wolfe Algorithm
1: Let $x^1 \in \arg\min_{x \in K} \|x\|_{\infty}$.
2: for $t = 2$ to $T$ do
3:   Compute $v^t \in \arg\max_{v \in K} \langle \nabla F(x^{t-1}), v \rangle$.
4:   Update $x^t \leftarrow (1 - \eta_t) x^{t-1} + \eta_t v^t$ with step size $\eta_t = \frac{\delta}{t \cdot H_T}$, where $H_T$ is the $T$-th harmonic number and $\delta$ is the constant $(\ln 3)/2$.
5: end for
6: return $x^T$

First, we show that the following invariant is maintained during the execution of the algorithm.

Lemma 2. It holds that $1 - x^t_i \geq e^{-\delta - O(1/\ln^2 T)} (1 - x^1_i)$ for every $1 \leq i \leq n$ and every $1 \leq t \leq T$.

Proof. Fix a dimension $i \in [n]$. For every $t \geq 2$, we have
$$
\begin{aligned}
1 - x^t_i &= 1 - (1 - \eta_t) x^{t-1}_i - \eta_t v^t_i && \text{(Update step of Algorithm 1)} \\
&\geq 1 - (1 - \eta_t) x^{t-1}_i - \eta_t = (1 - \eta_t)(1 - x^{t-1}_i) && \text{(since } v^t_i \leq 1\text{)} \\
&\geq e^{-\eta_t - \eta_t^2} (1 - x^{t-1}_i) && \text{(since } 1 - u \geq e^{-u - u^2} \text{ for } 0 \leq u < 1/2\text{)}.
\end{aligned}
$$
Unrolling this recursion, we obtain
$$1 - x^t_i \;\geq\; e^{-\sum_{t'=2}^{t} \eta_{t'} - \sum_{t'=2}^{t} \eta_{t'}^2} \, (1 - x^1_i) \;\geq\; e^{-\delta - O(1/\ln^2 T)} \, (1 - x^1_i),$$
because $\sum_{t'=2}^{t} \eta_{t'}^2 = \sum_{t'=2}^{t} \frac{\delta^2}{t'^2 H_T^2} = O\bigl(1/(\ln T)^2\bigr)$ (since $\delta$ is a constant, $H_T = \Theta(\ln T)$ and $\sum_{t'=2}^{t} \frac{1}{t'^2} = O(1)$), and $\sum_{t'=2}^{t} \eta_{t'} = \frac{\delta}{H_T} \sum_{t'=2}^{t} \frac{1}{t'} \leq \delta \frac{H_t}{H_T} \leq \delta$. □

The following lemma was first observed in [Feldman et al., 2011] and was generalized in [Bian et al., 2017a, Lemma 3].

Lemma 3 ([Feldman et al., 2011; Bian et al., 2017a]). For every $x, y \in K$ and any DR-submodular function $F : [0,1]^n \to \mathbb{R}_{+}$, it holds that
$$F(x \vee y) \geq (1 - \|x\|_{\infty}) F(y).$$
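For concreteness, here is a minimal Python sketch of Algorithm 1 above (ours, not the authors' MATLAB implementation). It assumes that $K$ is a polytope $\{x : Ax \leq b,\ 0 \leq x \leq 1\}$ containing the origin, so that the starting point $x^1 \in \arg\min_{x \in K}\|x\|_\infty$ is simply $\mathbf{0}$ (the setting of Corollary 1), and it uses scipy.optimize.linprog as the linear-maximization oracle; the function names are ours.

```python
import numpy as np
from scipy.optimize import linprog

def frank_wolfe(grad_F, A, b, n, T=100):
    """Sketch of Algorithm 1 for K = {x : A x <= b, 0 <= x <= 1},
    assuming 0 is in K so that x^1 = argmin ||x||_inf is the origin."""
    delta = np.log(3) / 2.0                       # the constant (ln 3)/2
    H_T = np.sum(1.0 / np.arange(1, T + 1))       # T-th harmonic number
    x = np.zeros(n)                               # x^1 = 0
    for t in range(2, T + 1):
        g = grad_F(x)                             # gradient at x^{t-1}
        # Linear maximization oracle: v^t in argmax_{v in K} <grad F(x^{t-1}), v>,
        # solved as an LP (linprog minimizes, hence the sign flip).
        res = linprog(-g, A_ub=A, b_ub=b, bounds=[(0.0, 1.0)] * n, method="highs")
        v = res.x
        eta = delta / (t * H_T)                   # step size eta_t = delta / (t * H_T)
        x = (1.0 - eta) * x + eta * v             # convex update keeps x^t in K
    return x
```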
Theorem 1. Let $K \subseteq [0,1]^n$ be a convex set and let $F : [0,1]^n \to \mathbb{R}_{+}$ be a non-monotone $\beta$-smooth DR-submodular function. Let $D$ be the diameter of $K$. Then Algorithm 1 yields a solution $x^T \in K$ such that
$$F(x^T) \;\geq\; \frac{1}{3\sqrt{3}} \Bigl(1 - \min_{x \in K} \|x\|_{\infty}\Bigr) \max_{x \in K} F(x) \;-\; O\Bigl(\frac{\beta D^2}{(\ln T)^2}\Bigr).$$

Proof. Let $x^* \in K$ be a maximizer of $F$ over $K$. Let $r = e^{-\delta - O(1/\ln^2 T)} (1 - \max_i x^1_i)$. From Lemma 2 it follows that $1 - \|x^t\|_{\infty} \geq r$ for every $t$.

Next we derive a recursive formula in terms of $F(x^t)$ and $F(x^*)$:
$$
\begin{aligned}
2F(x^{t+1}) - r F(x^*) &= 2F\bigl((1-\eta_{t+1}) x^t + \eta_{t+1} v^{t+1}\bigr) - r F(x^*) && \text{(Update step of Algorithm 1)} \\
&\geq 2F(x^t) - r F(x^*) - 2\eta_{t+1} \bigl\langle \nabla F(a^t), x^t - v^{t+1} \bigr\rangle - \beta \eta_{t+1}^2 \|x^t - v^{t+1}\|^2 && \text{($\beta$-smoothness, Inequality (3); here } a^t = (1-\eta_{t+1}) x^t + \eta_{t+1} v^{t+1}\text{)} \\
&= 2F(x^t) - r F(x^*) + 2\eta_{t+1} \bigl\langle \nabla F(x^t), v^{t+1} - x^t \bigr\rangle + 2\eta_{t+1} \bigl\langle \nabla F(a^t) - \nabla F(x^t), v^{t+1} - x^t \bigr\rangle - \beta \eta_{t+1}^2 \|v^{t+1} - x^t\|^2 \\
&\geq 2F(x^t) - r F(x^*) + 2\eta_{t+1} \bigl\langle \nabla F(x^t), v^{t+1} - x^t \bigr\rangle - 2\eta_{t+1} \|\nabla F(a^t) - \nabla F(x^t)\| \cdot \|v^{t+1} - x^t\| - \beta \eta_{t+1}^2 \|v^{t+1} - x^t\|^2 && \text{(Cauchy-Schwarz)} \\
&\geq 2F(x^t) - r F(x^*) + 2\eta_{t+1} \bigl\langle \nabla F(x^t), v^{t+1} - x^t \bigr\rangle - 3\beta \eta_{t+1}^2 \|v^{t+1} - x^t\|^2 && \text{($\beta$-smoothness, Inequality (4))} \\
&\geq 2F(x^t) - r F(x^*) + 2\eta_{t+1} \bigl\langle \nabla F(x^t), x^* - x^t \bigr\rangle - 3\beta \eta_{t+1}^2 \|v^{t+1} - x^t\|^2 && \text{(definition of } v^{t+1}\text{)} \\
&\geq 2F(x^t) - r F(x^*) + 2\eta_{t+1} \bigl( F(x^* \vee x^t) - 2F(x^t) \bigr) - 3\beta \eta_{t+1}^2 \|v^{t+1} - x^t\|^2 && \text{(Lemma 1)} \\
&\geq 2F(x^t) - r F(x^*) + 2\eta_{t+1} \bigl( r F(x^*) - 2F(x^t) \bigr) - 3\beta \eta_{t+1}^2 \|v^{t+1} - x^t\|^2 && \text{(Lemma 3)} \\
&\geq (1 - 2\eta_{t+1}) \bigl( 2F(x^t) - r F(x^*) \bigr) - 3\beta \eta_{t+1}^2 D^2,
\end{aligned}
$$
where $D$ is the diameter of $K$. Let $h_t = 2F(x^t) - r F(x^*)$. By the previous inequality and the choice of $\eta_t$, we have
$$h_{t+1} \;\geq\; (1 - 2\eta_{t+1}) h_t - 3\beta \eta_{t+1}^2 D^2 \;=\; (1 - 2\eta_{t+1}) h_t - O\Bigl(\frac{\beta \delta^2 D^2}{t^2 (\ln T)^2}\Bigr),$$
where we used the fact that $H_T = \Theta(\ln T)$. Note that the constants hidden by the $O$-notation above are small. Therefore,
$$
\begin{aligned}
h_T &\geq \prod_{t=2}^{T} (1 - 2\eta_t) \, h_1 - O\Bigl(\frac{\beta \delta^2 D^2}{(\ln T)^2}\Bigr) \\
&\geq e^{-2\sum_{t=2}^{T} \eta_t - 4\sum_{t=2}^{T} \eta_t^2} \, h_1 - O\Bigl(\frac{\beta \delta^2 D^2}{(\ln T)^2}\Bigr) && \text{(since } 1 - u \geq e^{-u - u^2} \text{ for } 0 \leq u < 1/2\text{)} \\
&\geq e^{-2\delta - O(1/\ln^2 T)} \, h_1 - O\Bigl(\frac{\beta \delta^2 D^2}{(\ln T)^2}\Bigr),
\end{aligned}
$$
where the last inequality holds because $\sum_{t=2}^{T} \eta_t^2 = \sum_{t=2}^{T} \frac{\delta^2}{t^2 H_T^2} = O\bigl(1/(\ln T)^2\bigr)$ (as $\delta$ is a constant) and $\sum_{t=2}^{T} \eta_t = \frac{\delta}{H_T} \sum_{t=2}^{T} \frac{1}{t} \leq \delta$. Hence,
$$2F(x^T) - r F(x^*) \;\geq\; e^{-2\delta - O(1/(\ln T)^2)} \bigl( 2F(x^1) - r F(x^*) \bigr) - O\Bigl(\frac{\beta \delta^2 D^2}{(\ln T)^2}\Bigr),$$
which implies, since $F(x^1) \geq 0$,
$$2F(x^T) \;\geq\; r \Bigl(1 - e^{-2\delta - O(1/(\ln T)^2)}\Bigr) F(x^*) - O\Bigl(\frac{\beta \delta^2 D^2}{(\ln T)^2}\Bigr).$$
Dividing by 2 and using the definition of $r$, the multiplicative factor of $F(x^*)$ in the right-hand side is
$$\frac{1}{2}\, e^{-\delta - O(1/\ln^2 T)} \Bigl(1 - e^{-2\delta - O(1/(\ln T)^2)}\Bigr) (1 - \max_i x^1_i).$$
Note that for $T$ sufficiently large, $O(1/(\ln T)^2) \ll 1$. Optimizing this factor by choosing $\delta = (\ln 3)/2$, we obtain
$$F(x^T) \;\geq\; \frac{1}{3\sqrt{3}} (1 - \max_i x^1_i) F(x^*) - O\Bigl(\frac{\beta D^2}{(\ln T)^2}\Bigr),$$
and the theorem follows since $\max_i x^1_i = \|x^1\|_{\infty} = \min_{x \in K} \|x\|_{\infty}$. □

Remark. The above theorem also indicates that the performance of the algorithm, specifically the approximation ratio, depends on the initial point $x^1$. This matches the observation in non-convex optimization that the quality of the output solution depends on, among others, the initialization.

The following corollary is a direct consequence of the theorem for a particular family of convex sets $K$ beyond the down-closed ones. Note that for any convex set $K \subseteq [0,1]^n$, the diameter satisfies $D \leq \sqrt{n}$.

Corollary 1. If $\mathbf{0} \in K$, then the starting point is $x^1 = \mathbf{0}$ and the guarantee of Theorem 1 can be written as
$$F(x^T) \;\geq\; \frac{1}{3\sqrt{3}} \max_{x \in K} F(x) - O\Bigl(\frac{\beta n}{(\ln T)^2}\Bigr).$$
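As a numerical sanity check of two ingredients of the proof (this snippet is ours, not from the paper), one can verify that $\delta = (\ln 3)/2$ indeed maximizes the factor $e^{-\delta}(1 - e^{-2\delta})/2$ at the value $1/(3\sqrt{3}) \approx 0.192$, and that the step sizes $\eta_t = \delta/(t \cdot H_T)$ satisfy $\sum_t \eta_t \leq \delta$ while $\sum_t \eta_t^2 = O(1/(\ln T)^2)$:

```python
import numpy as np

# Approximation factor from the proof of Theorem 1 (ignoring the O(1/ln^2 T)
# terms and the (1 - max_i x^1_i) factor): g(delta) = e^{-delta}(1 - e^{-2 delta})/2.
g = lambda d: np.exp(-d) * (1.0 - np.exp(-2.0 * d)) / 2.0
deltas = np.linspace(1e-3, 3.0, 100000)
best = deltas[np.argmax(g(deltas))]
print(best, np.log(3) / 2)                 # both ~0.5493: the maximizer is (ln 3)/2
print(g(best), 1 / (3 * np.sqrt(3)))       # both ~0.1925 = 1/(3*sqrt(3))

# Step-size sums used in Lemma 2 and Theorem 1.
T, delta = 10**6, np.log(3) / 2
H_T = np.sum(1.0 / np.arange(1, T + 1))
eta = delta / (np.arange(2, T + 1) * H_T)
print(eta.sum(), delta)                    # sum of eta_t stays below delta
print((eta**2).sum(), 1 / np.log(T)**2)    # sum of eta_t^2 is of order 1/(ln T)^2
```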
## 4 Experiments

In this section, we validate our algorithm for non-monotone DR-submodular optimization on both real-world and synthetic datasets. Our experiments fall into the following two categories:

1. We compare our algorithm against the two previously known algorithms for maximizing non-monotone DR-submodular functions over down-closed polytopes mentioned in [Bian et al., 2017a]. Recall that our algorithm applies to a more general setting where the convex set is not required to be down-closed.

2. Next, we show the performance of our algorithm for maximizing non-monotone DR-submodular functions over general polytopes. Recall that no previous algorithm was known to have performance guarantees for this problem.

All experiments are performed in MATLAB using the CPLEX optimization toolbox on macOS version 10.14. The source code is available at https://sites.google.com/site/abhinavsriva/ijcai-20-code and https://www.ibisc.univ-evry.fr/~thang.

### 4.1 Performance over Down-closed Polytopes

Here, we benchmark our variant of the Frank-Wolfe algorithm from Section 3 against the two previously known algorithms for maximizing continuous DR-submodular functions over down-closed polytopes mentioned in [Bian et al., 2017a]. We considered QUADPROGIP, a global solver for non-convex quadratic programming, as a baseline. We run all algorithms for 100 iterations, and all reported results are averages over 20 repeated experiments. For the sake of completeness, we describe below the problem and the different settings used. We closely follow the experimental settings of [Bian et al., 2017a] and adapted their source code to our algorithm.

Quadratic Programming. As a state-of-the-art global solver, we used QUADPROGIP to find the global optimum, which is used to compute the approximation ratios. Our problem instances are synthetic quadratic objectives with down-closed polytope constraints, i.e.,
$$f(x) = \frac{1}{2} x^\top H x + h^\top x + c \quad \text{and} \quad K = \{x \in \mathbb{R}^n_{+} \mid A x \leq b,\ x \leq u\}, \quad A \in \mathbb{R}^{m \times n}_{+},\ b \in \mathbb{R}^m_{+}.$$
Note that in previous sections we have assumed w.l.o.g. that $K \subseteq [0,1]^n$. By scaling, our results also hold for the general box constraint $x \leq u$, provided the entries of $u$ are upper bounded by a constant. Both the objective and the constraints were randomly generated, in the following two ways:

Uniform Distribution. $H \in \mathbb{R}^{n \times n}$ is a symmetric matrix with entries uniformly distributed in $[-1, 0]$; $A \in \mathbb{R}^{m \times n}$ has entries uniformly distributed in $[\mu, \mu + 1]$, where $\mu = 0.01$ is a small positive constant that makes the entries of $A$ strictly positive, so that the polytope is down-closed.

Exponential Distribution. Here, the entries of $H$ and $A$ are sampled from exponential distributions $\mathrm{Exp}(\lambda)$, whose probability density is $\lambda e^{-\lambda y}$ for a random variable $y \geq 0$ and $0$ for $y < 0$. Specifically, each entry of $H$ is sampled from $-\mathrm{Exp}(1)$ (so that $H$ is entrywise non-positive), and each entry of $A$ is sampled from $\mathrm{Exp}(0.25) + \mu$, where $\mu = 0.01$ is a small positive constant.

We set $b = \mathbf{1}_m$, and $u$ to be the tightest upper bound of $K$, namely $u_j = \min_{i \in [m]} \frac{b_i}{A_{ij}}$ for $j \in [n]$. In order to make $f$ non-monotone, we set $h = -0.2 \cdot H^\top u$. To make sure that $f$ is non-negative, we first solve $\min_{x \in K} \frac{1}{2} x^\top H x + h^\top x$ using QUADPROGIP; letting $\hat{x}$ denote the solution, we set $c = -f(\hat{x}) + 0.1 \cdot |f(\hat{x})|$.

The approximation ratios w.r.t. the dimension $n$ are plotted in Figures 1 and 2 for the two distributions. In each figure, the number of constraints is set to $m = 0.5n$, $m = n$ and $m = 1.5n$, respectively.

[Figure 1: Uniform distribution with down-closed polytope and (a) m = 0.5n, (b) m = n, (c) m = 1.5n.]
[Figure 2: Exponential distribution with down-closed polytope and (a) m = 0.5n, (b) m = n, (c) m = 1.5n.]

We observe that our version of Frank-Wolfe (denoted our-frank-wolfe) has performance comparable to the state-of-the-art algorithms when optimizing DR-submodular functions over down-closed convex sets. The performance of those algorithms is consistent with the proven approximation guarantee of $1/e$ shown in [Bian et al., 2017a], and the performance of our algorithm is consistent with our proven approximation guarantee of $1/(3\sqrt{3})$ for down-closed convex sets.
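For reference, here is a sketch in Python of how an instance of the uniform setting above could be generated (our code, not the MATLAB scripts used in the paper); since we do not assume access to the QUADPROGIP solver here, the non-negativity offset $c$ is replaced by a crude bound:

```python
import numpy as np

def synthetic_qp_instance(n, m, mu=0.01, seed=0):
    """Sketch of the 'uniform distribution' instance generator of Section 4.1.
    The offset c is a crude bound here; the paper instead offsets by the
    global minimum found with the QUADPROGIP solver."""
    rng = np.random.default_rng(seed)
    H = rng.uniform(-1.0, 0.0, size=(n, n))
    H = 0.5 * (H + H.T)                          # symmetric, entries in [-1, 0]
    A = rng.uniform(mu, mu + 1.0, size=(m, n))   # strictly positive constraint matrix
    b = np.ones(m)
    u = (b[:, None] / A).min(axis=0)             # tightest upper bound u_j = min_i b_i / A_ij
    h = -0.2 * (H.T @ u)                         # makes f non-monotone
    c = 0.5 * np.abs(H).sum() * u.max() ** 2     # crude offset keeping f non-negative on [0, u]

    def f(x):
        return 0.5 * x @ H @ x + h @ x + c

    def grad_f(x):
        return H @ x + h

    return f, grad_f, A, b, u
```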
### 4.2 Performance over General Polytopes

We consider the problem of revenue maximization on an undirected social network graph $G(V, E)$, where $V$ is the set of vertices (users) and $E$ is the set of edges (connections between users). Each edge between vertices $i$ and $j$ carries a weight $w_{ij}$. The goal is to offer the product for free to, or advertise it to, some users so that the revenue increases through their "word-of-mouth" effect on others. If one invests $x_i$ units of cost on a user $i \in V$, the user $i$ becomes an advocate of the product (independently of other users) with probability $1 - (1-p)^{x_i}$, where $p \in (0,1)$ is a parameter. Intuitively, this means that for each unit of cost invested in user $i$, we get an extra chance that $i$ becomes an advocate with probability $p$ [Soma and Yoshida, 2017].

Let $S \subseteq V$ be the set of users who advocate for the product; note that $S$ is a random set. The revenue with respect to $S$ is defined as $\sum_{i \in S} \sum_{j \in V \setminus S} w_{ij}$. Let $f : \mathbb{R}^{|V|}_{+} \to \mathbb{R}_{+}$ be the expected revenue obtained in this model, that is,
$$f(x) = \mathbb{E}_S \Bigl[ \sum_{i \in S} \sum_{j \in V \setminus S} w_{ij} \Bigr] = \sum_{i \neq j} w_{ij} \bigl(1 - (1-p)^{x_i}\bigr) (1-p)^{x_j}.$$
It has been shown that $f$ is a non-monotone DR-submodular function [Soma and Yoshida, 2017]. We imposed a minimum and a maximum investment constraint, $0.25 \leq \sum_i x_i \leq 1$. Together with $x_i \geq 0$ for all $i$, this forms a general (non-down-closed) convex set $K := \{x \in \mathbb{R}^{|V|}_{+} : 0.25 \leq \sum_i x_i \leq 1\}$.

In our experiments, we used the Advogato network with 6.5K users (vertices) and 61K connections (edges), and we set $p = 0.0001$. In Figure 3, we show the performance of the Frank-Wolfe algorithm from Section 3 and of the commonly used Gradient Ascent algorithm. Note that no performance guarantee is known for Gradient Ascent when maximizing a non-monotone DR-submodular function over a general constraint polytope. We clearly observe that the Frank-Wolfe algorithm performs at least as well as the commonly used Gradient Ascent algorithm.

[Figure 3: Revenue maximization on the Advogato dataset.]
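The following Python snippet sketches the expected-revenue objective and its gradient (our illustration on a random toy graph, not the Advogato experiment; the closed-form gradient below is our own derivation, valid for a symmetric weight matrix):

```python
import numpy as np

def make_revenue_objective(W, p):
    """Sketch of the expected-revenue objective of Section 4.2:
    f(x) = sum_{i != j} w_ij (1 - (1-p)^{x_i}) (1-p)^{x_j},
    for a symmetric weight matrix W with zero diagonal."""
    q = 1.0 - p

    def f(x):
        qx = q ** x
        return (1.0 - qx) @ W @ qx               # sum_{i,j} w_ij (1 - q^{x_i}) q^{x_j}

    def grad_f(x):
        qx = q ** x
        # d f / d x_k = ln(q) * q^{x_k} * sum_{j != k} w_kj (1 - 2 q^{x_j})  (W symmetric)
        return np.log(q) * qx * (W @ (1.0 - 2.0 * qx))

    return f, grad_f

# Tiny toy instance in place of the Advogato graph.
rng = np.random.default_rng(0)
n = 50
W = rng.uniform(0.0, 1.0, size=(n, n))
W = np.triu(W, 1); W = W + W.T                   # symmetric, zero diagonal
f, grad_f = make_revenue_objective(W, p=0.0001)

# Finite-difference sanity check of the gradient at a point feasible for
# K = {x >= 0 : 0.25 <= sum_i x_i <= 1}.
x = np.full(n, 1.0 / (2 * n)); k, eps = 3, 1e-6
e_k = np.eye(n)[k]
print(grad_f(x)[k], (f(x + eps * e_k) - f(x - eps * e_k)) / (2 * eps))
```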
## 5 Conclusion

In this paper, we have provided performance guarantees for the problem of maximizing non-monotone submodular/DR-submodular functions over convex sets. This result is complemented by experiments in different contexts. Moreover, the results raise the question of designing online algorithms for non-monotone DR-submodular maximization over a general convex set. Characterizing necessary and sufficient regularity conditions/structures that enable efficient algorithms with approximation and regret guarantees is an interesting direction to pursue.

## Acknowledgements

This research work is supported by the ANR OATA no. ANR-15-CE40-0015-01. We would also like to thank Mengxiao Zhang for useful discussions.

## References

[Bach and others, 2013] Francis Bach et al. Learning with submodular functions: A convex optimization perspective. Foundations and Trends in Machine Learning, 6(2-3):145-373, 2013.

[Bach, 2016] Francis Bach. Submodular functions: from discrete to continuous domains. Mathematical Programming, pages 1-41, 2016.

[Bian et al., 2017a] Andrew An Bian, Kfir Levy, Andreas Krause, and Joachim M. Buhmann. Continuous DR-submodular maximization: Structure and algorithms. In NeurIPS, 2017.

[Bian et al., 2017b] Andrew An Bian, Baharan Mirzasoleiman, Joachim Buhmann, and Andreas Krause. Guaranteed non-convex optimization: Submodular maximization over continuous domains. In AISTATS, pages 111-120, 2017.

[Bian et al., 2018] An Bian, Joachim M Buhmann, and Andreas Krause. Optimal DR-submodular maximization and applications to provable mean field inference. In NeurIPS, 2018.

[Buchbinder and Feldman, 2018] Niv Buchbinder and Moran Feldman. Deterministic algorithms for submodular maximization problems. ACM TALG, 14(3):32, 2018.

[Buchbinder et al., 2015] Niv Buchbinder, Moran Feldman, Joseph Seffi, and Roy Schwartz. A tight linear time (1/2)-approximation for unconstrained submodular maximization. SIAM J. Computing, 44(5):1384-1402, 2015.

[Calinescu et al., 2011] Gruia Calinescu, Chandra Chekuri, Martin Pál, and Jan Vondrák. Maximizing a monotone submodular function subject to a matroid constraint. SIAM J. Computing, 40(6):1740-1766, 2011.

[Chen et al., 2018] Lin Chen, Hamed Hassani, and Amin Karbasi. Online continuous submodular maximization. In AISTATS, 2018.

[Djolonga and Krause, 2014] Josip Djolonga and Andreas Krause. From MAP to marginals: Variational inference in Bayesian submodular models. In NeurIPS, pages 244-252, 2014.

[Djolonga et al., 2016] Josip Djolonga, Sebastian Tschiatschek, and Andreas Krause. Variational inference in mixed probabilistic submodular models. In NeurIPS, pages 1759-1767, 2016.

[Elenberg et al., 2017] Ethan Elenberg, Alexandros G Dimakis, Moran Feldman, and Amin Karbasi. Streaming weak submodularity: Interpreting neural networks on the fly. In NeurIPS, pages 4044-4054, 2017.

[Elenberg et al., 2018] Ethan R Elenberg, Rajiv Khanna, Alexandros G Dimakis, Sahand Negahban, et al. Restricted strong convexity implies weak submodularity. The Annals of Statistics, 46(6B):3539-3568, 2018.

[Feldman et al., 2011] Moran Feldman, Joseph Naor, and Roy Schwartz. A unified continuous greedy algorithm for submodular maximization. In FOCS, pages 570-579, 2011.

[Golovin and Krause, 2011] Daniel Golovin and Andreas Krause. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. J. of AI Research, 42:427-486, 2011.

[Gomez Rodriguez et al., 2010] Manuel Gomez Rodriguez, Jure Leskovec, and Andreas Krause. Inferring networks of diffusion and influence. In SIGKDD, pages 1019-1028, 2010.

[Guillory and Bilmes, 2011] Andrew Guillory and Jeff A Bilmes. Simultaneous learning and covering with adversarial noise. In ICML, volume 11, pages 369-376, 2011.

[Hassani et al., 2017] Hamed Hassani, Mahdi Soltanolkotabi, and Amin Karbasi. Gradient methods for submodular maximization. In NeurIPS, pages 5841-5851, 2017.

[Ito and Fujimaki, 2016] Shinji Ito and Ryohei Fujimaki. Large-scale price optimization via network flow. In NeurIPS, pages 3855-3863, 2016.

[Iwata et al., 2001] Satoru Iwata, Lisa Fleischer, and Satoru Fujishige. A combinatorial strongly polynomial algorithm for minimizing submodular functions. J. ACM, 48(4):761-777, 2001.

[Kempe et al., 2003] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In SIGKDD, pages 137-146, 2003.

[Kulesza et al., 2012] Alex Kulesza, Ben Taskar, et al. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 5(2-3):123-286, 2012.

[Lin and Bilmes, 2011] Hui Lin and Jeff Bilmes. A class of submodular functions for document summarization. In ACL, pages 510-520, 2011.

[Niazadeh et al., 2018] Rad Niazadeh, Tim Roughgarden, and Joshua R Wang. Optimal algorithms for continuous non-monotone submodular and DR-submodular maximization. In NeurIPS, 2018.
[Singla et al., 2014] Adish Singla, Ilija Bogunovic, Gábor Bartók, Amin Karbasi, and Andreas Krause. Near-optimally teaching the crowd to classify. In ICML, pages 154-162, 2014.

[Soma and Yoshida, 2017] Tasuku Soma and Yuichi Yoshida. Non-monotone DR-submodular function maximization. In AAAI, pages 898-904, 2017.

[Vondrák, 2013] Jan Vondrák. Symmetry and approximability of submodular maximization problems. SIAM J. Computing, 42(1):265-304, 2013.