# Instance Specific Approximations for Submodular Maximization

Eric Balkanski¹, Sharon Qian², Yaron Singer²

¹Columbia University, New York, NY, USA. ²Harvard University, Cambridge, MA, USA. Correspondence to: Eric Balkanski.

*Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).*

## Abstract

For many optimization problems in machine learning, finding an optimal solution is computationally intractable, and we seek algorithms that perform well in practice. Since computational intractability often results from pathological instances, we look for methods to benchmark an algorithm's performance against optimal solutions on real-world instances. The main challenge is that an optimal solution cannot be efficiently computed for intractable problems, and we therefore often do not know how far a solution is from being optimal. A major question is therefore how to measure the performance of an algorithm in comparison to an optimal solution on instances we encounter in practice.

In this paper, we address this question in the context of submodular optimization problems. For the canonical problem of submodular maximization under a cardinality constraint, it is intractable to compute a solution that is better than a $1 - 1/e \approx 0.63$ fraction of the optimum. Algorithms like the celebrated greedy algorithm are guaranteed to achieve this $1 - 1/e$ bound on any instance and are used in practice.

Our main contribution is not a new algorithm for submodular maximization but an analytical method that measures how close an algorithm for submodular maximization is to optimal on a given problem instance. We use this method to show that on a wide variety of real-world datasets and objectives, the approximation of the solution found by greedy goes well beyond $1 - 1/e$ and is often at least 0.95. We develop this method using a novel technique that lower bounds the objective of a dual minimization problem to obtain an upper bound on the value of an optimal solution to the primal maximization problem.

## 1. Introduction

A central challenge in machine learning is that many of the optimization problems we deal with are computationally intractable. For problems like clustering, sparse recovery, and maximum likelihood estimation, for example, finding an optimal solution is computationally intractable and we seek heuristics that perform well in practice. Computational intractability implies that under worst-case analysis any efficient algorithm is suboptimal; however, worst-case approximation guarantees are often due to pathological instances that are not representative of instances we encounter in practice. Thus, we would like to be assured that the algorithms we use, despite poor performance on pathological instances, perform provably well on real-world instances.

In order to evaluate the performance of an algorithm on real-world instances, we would like to measure its performance in comparison to an optimal solution. The main challenge, however, is that we cannot evaluate the performance of an algorithm against an optimal solution, since finding an optimal solution for a computationally intractable problem is, by definition, intractable. Thus, we often do not know how far an algorithm's solution is from optimal, and whether there is a substantially better algorithm.
Therefore, for computationally intractable problems, our main challenge is not necessarily how to design better algorithms, but rather how to measure the performance of an algorithm in comparison to a theoretically optimal solution on real-world instances. How do we measure the performance of an algorithm on specific instances for problems that are intractable?

In this paper, we develop a method to measure how close to optimal the performance of an algorithm is on specific instances for the broad class of submodular maximization problems. Since many objectives that we aim to optimize, such as coverage, diversity, entropy, and graph cuts, are submodular, submodular maximization algorithms are heavily employed in applications such as speech and document summarization (Lin & Bilmes, 2011), recommender systems (Mirzasoleiman et al., 2016b), feature selection (Das & Kempe, 2011), sensor placement (Guestrin et al., 2005), and network analysis (Kempe et al., 2003). Submodular maximization provides an ideal framework to address this problem since it is intractable to compute a solution that is better than a $1 - 1/e$ approximation for the canonical problem of maximizing a monotone submodular function under a cardinality constraint (Nemhauser & Wolsey, 1978). In addition, multiple algorithms are known to enjoy constant factor approximation guarantees, such as the greedy algorithm that achieves this $1 - 1/e$ approximation on any instance (Nemhauser et al., 1978).

**Our contribution.** We develop a novel and efficient method, called DUAL, to measure how close to optimal the performance of an algorithm is on an instance of maximizing a monotone submodular function under a cardinality constraint. These instance specific approximations are obtained by upper bounding the optimal value of an instance. We use this method to show that greedy, as well as other submodular maximization algorithms, performs significantly better than $1 - 1/e$ in practice. On a wide variety of large real-world datasets and objectives, we find that the approximation of the solution found by greedy almost always exceeds 0.85 and often exceeds 0.95, a 50 percent improvement over $1 - 1/e \approx 0.63$. Using this method, we also find that greedy and another algorithm with the same $1 - 1/e$ theoretical guarantee have significantly different approximations in practice. Additionally, we show that DUAL significantly outperforms multiple benchmarks for measuring instance specific approximations.

**Technical overview.** DUAL derives an approximation obtained by a solution $S$ to an instance of $\max_{|S| \le k} f(S)$, where $f : 2^N \to \mathbb{R}$ is a monotone submodular function, by upper bounding the optimal value OPT. To upper bound OPT, we consider a dual problem $g(v) = \min_{S : f(S) \ge v} |S|$ of finding the solution $S$ of minimum size that has value at least $v$. We then construct a function $\bar{g}(v)$ that lower bounds $g(v)$ and is such that the maximum value $v$ such that $\bar{g}(v) \le k$ can be found in $O(n \log n)$ running time. By exploiting the fact that $\bar{g}$ lower bounds $g$, we show that this maximum value $v$ is an upper bound on OPT.

The construction of $\bar{g}$ is motivated by coverage functions, a special case of submodular functions, where the goal is to maximize the coverage of a universe $U$. We show that the dual objective $g : 2^U \to \mathbb{R}$, corresponding to the minimum size of a set $S \subseteq N$ which covers $T \subseteq U$, can be lower bounded by an additive function $\ell : 2^U \to \mathbb{R}$, which is close to $g$ in practice. Since $\ell$ is additive, it can be minimized efficiently to give a lower bound on the dual problem, which implies an upper bound on OPT.
**Related work.** A line of work has investigated different properties of submodular functions that enable improved approximation guarantees for the greedy algorithm. The curvature $c \in [0, 1]$ of a function $f$ measures how close $f$ is to additive (Conforti & Cornuéjols, 1984; Soma & Yoshida, 2017). Submodular sharpness, an analog of sharpness from continuous optimization (Łojasiewicz, 1963), measures the behavior of a function $f$ around the set of optimal solutions (Pokutta et al., 2020). For stability, a function $f$ is perturbation-stable if the optimal solution for maximizing $f$ does not change under small perturbations (Chatziafratis et al., 2017). Finally, drawing from smoothed analysis, Rubinstein and Zhao (2021) show improved approximations using randomized perturbations of the budget $k$. For curvature, sharpness, and stability, the main issue with using these parameterized properties to measure how close to optimal the greedy algorithm is on specific instances is that their parameters cannot be computed efficiently. Since they require brute-force computation, they have only been computed on small instances with at most $n = 20$ elements (Pokutta et al., 2020) and cannot be used on real-world instances. In addition, on these small instances, these three properties and budget-smoothed analysis all yield approximations that are not as strong as those obtained by DUAL. The main goal of this line of work is to provide an explanation for greedy's strong performance in practice. In contrast, our paper focuses on the problem of measuring the performance of greedy, and other algorithms, in practice.

Sakaue and Ishihata (2018) develop a method to upper bound the optimal value of a submodular maximization instance, but with a different goal of using this upper bound to accelerate submodular maximization algorithms. This method is used as a benchmark in our experiments. Baeza-Yates, Boldi, and Chierichetti (2015) construct a method to obtain instance-specific approximations for the special case of coverage functions by exploiting the linear program formulation of coverage functions and using the dual formulation of its relaxation as an upper bound on the optimal value. This method does not extend to general submodular problems, which cannot be formulated as a polynomial-size LP. We instead propose a different dual problem for instance-specific approximations for general submodular objectives.

### 1.1. Preliminaries

A function $f : 2^N \to \mathbb{R}$ is submodular if $f_S(a) \ge f_T(a)$ for all $S \subseteq T \subseteq N$ and $a \in N \setminus T$, where $f_S(a) = f(S \cup \{a\}) - f(S)$ is the marginal contribution of $a$ to $S$. It is monotone if $f(S) \le f(T)$ for all $S \subseteq T \subseteq N$.

A function $f : 2^P \to \mathbb{R}$ is a coverage function if there exists a bipartite graph $G = (P, D, E)$ over primal and dual elements $P \cup D$ such that $f(S) = |N_G(S)|$, where $N_G(S) = \cup_{a \in S} N_G(a) \subseteq D$ denotes the neighbors of $S \subseteq P$ in $G$. We say that set $S$ covers $T$, or equivalently that $T$ is covered by $S$, if $T \subseteq N_G(S)$. When $G$ is clear from the context, we write $N(S)$ instead of $N_G(S)$ to denote the neighbors of $S$ in graph $G$.

Given an instance of the problem $\max_{|S| \le k} f(S)$ and a solution $S^\star$ to this problem, we aim to compute an approximation for $S^\star$, i.e., a lower bound on $f(S^\star) / \max_{|S| \le k} f(S)$ or, equivalently, an upper bound on $\max_{|S| \le k} f(S)$.
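To make these definitions concrete, here is a minimal sketch, not from the paper, of a coverage function over a small illustrative bipartite graph, together with the marginal contribution $f_S(a)$; the graph, element names, and budget are all invented for illustration.

```python
from itertools import combinations

# Illustrative bipartite graph: primal elements P cover dual elements D.
# neighbors[a] is the set N(a) of dual elements covered by primal element a.
neighbors = {
    "a1": {"b1", "b2", "b3"},
    "a2": {"b3", "b4"},
    "a3": {"b4", "b5", "b6"},
}

def f(S):
    """Coverage function f(S) = |N_G(S)|, the number of dual elements covered."""
    covered = set()
    for a in S:
        covered |= neighbors[a]
    return len(covered)

def marginal(S, a):
    """Marginal contribution f_S(a) = f(S + a) - f(S)."""
    return f(set(S) | {a}) - f(S)

# Submodularity: the marginal of a3 shrinks as the base set grows.
assert marginal(set(), "a3") >= marginal({"a2"}, "a3")

# Brute-force optimum of max_{|S| <= k} f(S) on this toy instance (k = 2).
k = 2
opt = max(f(set(S)) for S in combinations(neighbors, k))
print(opt)  # 6: e.g., {a1, a3} covers all six dual elements
```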
## 2. Instance Specific Approximations for Max-Coverage

In this section, we present a method that measures how close to optimal a solution is for maximum coverage problems. As a warm-up, we focus on this special case to motivate and provide intuition for the method for submodular functions presented in Section 3. We emphasize that our main contribution is for general submodular maximization problems, which, unlike coverage problems, cannot be formulated as a polynomial-size linear program.

In Section 2.1, we introduce the problem of minimum cover under a cardinality constraint, which is a generalization of the classical set cover problem. We show that a lower bound on this minimum cover problem implies an upper bound on the optimal value OPT for the max-coverage problem. In Section 2.2, we present an additive lower bound on the dual problem, which is used to efficiently compute a lower bound on the optimal value of the dual problem. Then, in Section 2.3, we develop a more sophisticated lower bound on the dual problem. Due to space constraints, we defer missing proofs to Appendix A.

### 2.1. The dual problem

We introduce the minimum cover under a cardinality constraint problem. Recall that given a bipartite graph $G$ over nodes $P$ and nodes $D$, the maximum coverage under a cardinality constraint problem is to find the $k$ elements $S \subseteq P$ that maximize the number of elements $N_G(S) \subseteq D$ covered by $S$. In contrast, the minimum cover under a cardinality constraint problem is to find the $v$ elements $T \subseteq D$ that minimize the number of elements $S \subseteq P$ needed to cover $T \subseteq N_G(S)$.

**Definition 1.** The minimum cover under a cardinality constraint problem is defined as $\min_{T \subseteq D : |T| \ge v} g(T)$, where $g(T) = \min_{S \subseteq P : T \subseteq N(S)} |S|$ is the size of the minimum cover of $T$ and where $v \in [|D|]$ is the cardinality constraint.

This problem is a generalization of the classical set cover problem, which finds the minimum number of elements $S$ needed to cover all elements $D$. We obtain the following duality property: a lower bound on minimum cover implies an upper bound on maximum coverage, and vice-versa.

**Lemma 1.** Let $f : 2^P \to \mathbb{N}$ be a coverage function defined over a bipartite graph between elements $P$ and $D$. For any $k \in [|P|]$ and $v \in [|D|]$, $\max_{S \subseteq P : |S| \le k} f(S) < v$ if and only if $\min_{T \subseteq D : |T| \ge v} g(T) > k$, where $g : 2^D \to \mathbb{N}$ is the size of the minimum cover of $T$ as defined in Definition 1.

We refer to $\max_{S \subseteq P : |S| \le k} f(S)$ and $\min_{T \subseteq D : |T| \ge v} g(T)$ as the primal and dual problems. We also refer to $P$ and $D$ as the primal and dual elements.
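As a sanity check on Lemma 1, not part of the paper, the following brute-force snippet verifies the primal-dual equivalence on the toy graph from the previous sketch; it reuses the illustrative `neighbors` and `f` defined there.

```python
from itertools import combinations

P = list(neighbors)
D = sorted(set().union(*neighbors.values()))

def g(T):
    """g(T): size of the smallest S (subset of P) covering T, by brute force."""
    T = set(T)
    for size in range(len(P) + 1):
        for S in combinations(P, size):
            covered = set()
            for a in S:
                covered |= neighbors[a]
            if T <= covered:
                return size
    return float("inf")

def primal(k):
    return max(f(set(S)) for S in combinations(P, k))

def dual(v):
    # g is monotone in T, so the min over |T| >= v is attained at |T| = v.
    return min(g(T) for T in combinations(D, v))

# Lemma 1: primal(k) < v  iff  dual(v) > k, for all k, v on this instance.
for k in range(1, len(P) + 1):
    for v in range(1, len(D) + 1):
        assert (primal(k) < v) == (dual(v) > k)
```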
### 2.2. Warm-up: approximations via an additive lower bound on the dual

This dual problem admits an additive¹ lower bound that is, as we will show empirically in Section 5, close to the dual objective in practice. This is in contrast to the primal maximum coverage problem, which is far from additive on real instances. We define the individual value $v_b$ of each dual element $b$ as $v_b = \min_{a \in N(b)} \frac{1}{|N(a)|}$. The additive function $\ell : 2^D \to \mathbb{R}$ is defined as follows:
$$\ell(T) = \sum_{b \in T} v_b = \sum_{b \in T} \min_{a \in N(b)} \frac{1}{|N(a)|}.$$
Note that if $b \in D$ is covered by primal element $a \in P$, then $a$ covers at most $1/v_b$ dual elements. In other words, $1/v_b$ is an upper bound on the value obtained from an element $a$ that covers $b$.

¹A function $g$ is additive if there exist values $v_1, \ldots, v_m$ such that $g(T) = \sum_{b \in T} v_b$.

We use this additive lower bound $\ell(\cdot)$ on the dual objective to design a method that returns an upper bound on the optimal value for the primal problem. Method 1 first orders the dual elements $b$ by increasing value $v_b$. It then finds the prefix $\{b_1, \ldots, b_{i^\star}\}$ of this ordering, where $i^\star$ is the minimum size $i$ such that $\ell(\{b_1, \ldots, b_i\}) = \sum_{j=1}^{i} v_{b_j} > k$, and returns $i^\star$. In other words, it finds the smallest size $i$ such that $\ell(T) > k$ for all sets $T$ of size $i$.

**Method 1** (Linear bound on dual objective for coverage)
**Input:** bipartite graph $G = (P, D, E)$, constraint $k$.
1. $v_b \leftarrow \min_{a \in N(b)} 1/|N(a)|$, for each $b \in D$.
2. $(b_1, \ldots, b_{|D|}) \leftarrow$ elements of $D$ ordered by increasing $v_{b_i}$.
3. $i^\star \leftarrow \min\{i : \sum_{j=1}^{i} v_{b_j} > k\}$.
4. **return** $i^\star$

**The analysis.** We first show that $\ell(\cdot)$ is a lower bound on the dual objective $g(\cdot)$ (Lemma 2). We then show that $\{b_1, \ldots, b_i\}$ minimizes $\ell$ over all sets of size at least $i$ (Lemma 3). Together, these imply that there is no set of primal elements of size $k$ which covers $i^\star$ dual elements, and we obtain $i^\star > \max_{S \subseteq P : |S| \le k} f(S)$ (Theorem 1).

**Lemma 2.** For any coverage function defined by a bipartite graph $G$ between $P$ and $D$, we have that for any set $T \subseteq D$, $\ell(T) \le g(T) = \min_{S \subseteq P : T \subseteq N(S)} |S|$.

*Proof.* For any $S \subseteq P$ such that $T \subseteq N(S)$, we have
$$|S| = \sum_{a \in S} \sum_{b \in N(a)} \frac{1}{|N(a)|} \ge \sum_{a \in S} \sum_{b \in N(a)} \min_{a' \in N(b)} \frac{1}{|N(a')|} \ge \sum_{b \in T} \min_{a \in N(b)} \frac{1}{|N(a)|} = \ell(T). \qquad \square$$

**Lemma 3.** Consider the ordering of dual elements $D$ by increasing singleton values, i.e., $(b_1, \ldots, b_{|D|})$ where $\ell(b_i) \le \ell(b_j)$ for all $i < j$. Then, for all $v \le |D|$, $\ell(D_v) = \min_{T \subseteq D : |T| \ge v} \ell(T)$, where $D_v = \{b_1, \ldots, b_v\}$.

*Proof.* Since $\ell$ is an additive function, the set $T$ of size at least $v$ with minimum value $\ell(T)$ is the set consisting of the $v$ dual elements of minimum singleton value, which is $D_v$. $\square$

We are now ready to formally prove that Method 1 returns an upper bound on the optimal value of the primal problem.

**Theorem 1.** For any $k$, let $i^\star$ be the value returned by Method 1. Then $i^\star > \max_{S \subseteq P : |S| \le k} f(S)$.

*Proof.* By the definition of $i^\star$, Lemma 3, and Lemma 2,
$$k < \ell(D_{i^\star}) = \min_{T \subseteq D : |T| \ge i^\star} \ell(T) \le \min_{T \subseteq D : |T| \ge i^\star} \; \min_{S \subseteq P : T \subseteq N(S)} |S|.$$
By Lemma 1, we get $i^\star > \max_{S \subseteq P : |S| \le k} f(S)$. $\square$
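A minimal sketch of Method 1, my rendering of the pseudocode above on the same illustrative `neighbors` graph; the fallback to $|D|$ when the threshold is never crossed is an assumption (it is always a trivially valid bound, since $f(S) \le |D|$).

```python
def method1(neighbors, k):
    """Upper bound on max-coverage OPT via the additive dual lower bound (Method 1).

    neighbors: dict mapping each primal element a to its neighbor set N(a).
    Returns i* = min{i : the i smallest v_b sum to more than k}.
    """
    # N(b): the primal elements covering each dual element b.
    covers = {}
    for a, nbrs in neighbors.items():
        for b in nbrs:
            covers.setdefault(b, []).append(a)
    # v_b = min_{a in N(b)} 1/|N(a)|, sorted increasingly.
    values = sorted(min(1.0 / len(neighbors[a]) for a in covers[b]) for b in covers)
    total = 0.0
    for i, v in enumerate(values, start=1):
        total += v
        if total > k:
            return i  # every set T of i dual elements has ell(T) > k
    return len(values)  # ell(D) <= k: fall back to the trivial bound |D|

print(method1(neighbors, k=1))  # 4: so OPT < 4, i.e., the value 3 of {a1} is optimal
```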
### 2.3. Improved method for coverage functions

We improve Method 1 by constructing a lower bound $\bar{g}(\cdot)$ on the dual objective $g(\cdot)$ that is tighter than $\ell(\cdot)$. The function $\bar{g}(T)$ is obtained by partitioning the collection of dual elements $T$ into $j$ parts $\cup_{i=1}^{j} P_i = T$. We define the weight $w(P_i)$ of part $P_i$ to be
$$w(P_i) = |P_i| \cdot \max_{b \in P_i} v_b = |P_i| \cdot \max_{b \in P_i} \min_{a \in N(b)} \frac{1}{|N(a)|}.$$
We note that if $w(P_i) > 1$, then the dual elements $P_i$ cannot be covered by a single primal element, since there must exist $b \in P_i$ such that $\max_{a \in N(b)} |N(a)| < |P_i|$. This motivates the following definition of a valid partition.

**Definition 2.** A partition $P_1, \ldots, P_j$ of $T$ is valid if $w(P_i) \le 1$ for all $i \le j$.

This definition is such that if a partition $P_1, \ldots, P_j$ is not valid, then there must exist a part $P_i$ which cannot be covered by a single primal element. We exploit this property to define the following improved lower bound $\bar{g} : 2^D \to \mathbb{R}$ on the dual objective:
$$\bar{g}(T) = \min\{j : \exists \text{ a valid partition } P_1, \ldots, P_j \text{ of } T\}.$$
This lower bound $\bar{g}(T)$ is always tighter than the additive lower bound $\ell(\cdot)$.

**Proposition 1.** For any coverage function $f : 2^P \to \mathbb{N}$ defined by a bipartite graph $(P, D, E)$, we have $\bar{g}(T) \ge \ell(T)$ for all $T \subseteq D$.

Similarly as with Method 1, we use this lower bound $\bar{g}(\cdot)$ on the dual objective to design a method that returns an upper bound on the optimal value for the primal problem. Method 2, which will be generalized to Method 3 for submodular functions, iteratively constructs a valid partition $\cup_{\kappa=1}^{k} P_\kappa$ of a collection of dual elements $T$ such that $|T|$ is maximized. At iteration $\kappa$, part $P_\kappa$ of the partition is defined to be the collection $\{b_{i_{\kappa-1}+1}, \ldots, b_{i_\kappa}\}$ of dual elements with minimum value $v_b$ that are not in the previous parts $P_1, \ldots, P_{\kappa-1}$, where $i_\kappa$ is the maximum index such that part $P_\kappa$ is valid. The method returns the value $i_k$, which is the total size $|\cup_{\kappa=1}^{k} P_\kappa|$ of the partition.

**Method 2** (Dual bound via partitioning for coverage)
**Input:** bipartite graph $G = (P, D, E)$, constraint $k$.
1. $v_b \leftarrow \min_{a \in N(b)} 1/|N(a)|$, for each $b \in D$.
2. $(b_1, \ldots, b_{|D|}) \leftarrow$ elements of $D$ ordered by increasing $v_{b_i}$.
3. $i_0 \leftarrow 0$.
4. **for** $\kappa = 1$ **to** $k$ **do**
5. &nbsp;&nbsp;&nbsp;&nbsp;$i_\kappa \leftarrow \max\{i : (i - i_{\kappa-1}) \cdot v_{b_i} \le 1\}$
6. &nbsp;&nbsp;&nbsp;&nbsp;$P_\kappa \leftarrow \{b_{i_{\kappa-1}+1}, \ldots, b_{i_\kappa}\}$
7. **return** $i_k$

Since Method 2 is a special case of Method 3, the analysis of Method 3 in the next section also applies to Method 2.
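A sketch of Method 2 under the same assumptions as the Method 1 sketch; the part boundaries follow the validity rule in the pseudocode, and since values are sorted increasingly, the last element of a part carries its maximum $v_b$.

```python
def method2(neighbors, k):
    """Upper bound on max-coverage OPT via valid partitions (Method 2 sketch).

    Greedily packs k parts P_1..P_k of dual elements, ordered by increasing
    v_b, keeping each part valid: |P| * max_{b in P} v_b <= 1.
    Returns i_k = |P_1 u ... u P_k|, an upper bound on OPT.
    """
    covers = {}
    for a, nbrs in neighbors.items():
        for b in nbrs:
            covers.setdefault(b, []).append(a)
    values = sorted(min(1.0 / len(neighbors[a]) for a in covers[b]) for b in covers)
    i = 0  # i_{kappa-1}: number of dual elements already packed
    for _ in range(k):
        start = i
        # Extend the current part while it stays valid; values are sorted,
        # so values[i] is the maximum v_b in the part so far.
        while i < len(values) and (i + 1 - start) * values[i] <= 1:
            i += 1
    return i

print(method2(neighbors, k=1))  # 3: tighter than Method 1's bound, matching OPT
```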
## 3. Instance Specific Approximations for Submodular Maximization

In this section, we present our main method, DUAL, which generalizes Method 2 to submodular functions. For coverage functions, the value achieved by a solution $S$ corresponds to the number of dual elements covered by $S$, and the optimal value can be upper bounded by analyzing dual elements. However, for general submodular functions, there are no dual elements corresponding to the solution value. We first introduce a dual minimization problem similar to that for coverage, but defined over values $v \in \mathbb{R}$ instead of dual elements $T \subseteq D$. We then construct a lower bound on the dual objective and use it to design a method that upper bounds OPT in $O(n \log n)$ running time. Missing proofs are in Appendix B.1.

We introduce the minimum submodular cover under a cardinality constraint problem, where the goal is to find the smallest set $S$ of value at least $v$:
$$g(v) = \min_{S \subseteq N : f(S) \ge v} |S|.$$
This problem is a generalization of the submodular cover problem (Wolsey, 1982), which is to find the smallest set $S$ of value at least $f(N)$. We note that, unlike the dual objective for coverage functions, there are no dual elements.

Next, we define a function $\bar{g}(v)$ which lower bounds this dual objective. Similarly as for coverage functions, we consider partitions of the dual space, and we define a collection of valid partitions that is then used to define $\bar{g}(v)$. We assume that the ground set of elements $N = \{a_1, \ldots, a_n\}$ is indexed by decreasing singleton value, i.e., $f(a_i) \ge f(a_j)$ for all $i < j$. We define $A_i := \{a_1, \ldots, a_i\}$.

**Definition 3.** Values $v_1, \ldots, v_k \in \mathbb{R}$ form a valid partitioning of $v \in \mathbb{R}$ if $\sum_{j \in [k]} v_j = v$ and there exists a witness $W \subseteq N$ such that, for all $j \in [k]$, $f(a_{i_j}) \ge v_j$, where
$$i_j = \min\Big\{i : a_i \in W,\; f(W \cap A_i) \ge \sum_{j'=1}^{j} v_{j'}\Big\}.$$

As we will show in Lemma 4, for any solution set $S$, $v_1 = f_{S_0}(S_1), v_2 = f_{S_1}(S_2), \ldots, v_{|S|} = f_{S_{|S|-1}}(S)$ forms a valid partitioning of $v = f(S)$, where $S_j$ is the set of $j$ elements in $S$ with the largest singleton value. This implies that if a partition $v_1, \ldots, v_k$ is not valid, then there is no solution $S$ of size $k$ such that $f_{S_{j-1}}(S_j) \ge v_j$ for all $j \in [k]$. Thus, if a value $v$ does not have a valid partitioning $v_1, \ldots, v_k$, then there is no solution $S$ of size $k$ such that $f(S) \ge v$. This implies that the following function $\bar{g} : \mathbb{R} \to \mathbb{N}$ lower bounds the dual objective (Lemma 4):
$$\bar{g}(v) = \min\{k : \exists \text{ a valid partitioning } v_1, \ldots, v_k \text{ of } v\}.$$

We use the lower bound $\bar{g}(v)$ on the dual objective to construct a method that returns an upper bound on OPT. Method 3 iteratively constructs a valid partitioning $v_1, \ldots, v_k$ of the dual space such that $\sum_{j \in [k]} v_j$ is maximized. It first orders the elements $a_i \in N$ by decreasing singleton value $f(a_i)$. Then, at each iteration $j$, it defines value $v_j$ to be the maximum value $v$ such that $v_1, \ldots, v_j$ is a valid partitioning with witness $W = A_{i_j}$.

**Method 3** (Method for submodular functions)
**Input:** function $f$, cardinality constraint $k$.
1. $(a_1, \ldots, a_n) \leftarrow$ elements ordered by decreasing $f(a_i)$.
2. **for** $j = 1$ **to** $k$ **do**
3. &nbsp;&nbsp;&nbsp;&nbsp;$v_j \leftarrow \max\{v : f(a_{i_j}) \ge v \text{ where } i_j = \min\{i : f(A_i) \ge \sum_{j'=1}^{j-1} v_{j'} + v\}\}$
4. **return** $\sum_{j=1}^{k} v_j$

Value $v_j$ at iteration $j$ can be found by iterating through elements indexed by $i \in \{i_{j-1}+1, i_{j-1}+2, \ldots\}$ until $i^\star$, where $i^\star$ is the minimum index such that $f(A_{i^\star}) \ge \sum_{j'=1}^{j-1} v_{j'} + f(a_{i^\star})$. Since an element $a_i$ is considered at most once over all iterations, the total running time of the for loop is $O(n)$. Thus, the running time of Method 3 is $O(n \log n)$, due to the sorting of the elements by singleton value. More details on finding value $v_j$ are in Appendix B.2.

**The analysis.** We first show that $\bar{g}(v)$ lower bounds $g(v)$ in Lemma 4. The proof uses the fact that $g(v)$ is monotone.

**Lemma 4.** For any submodular function $f$ and $v \in \mathbb{R}$, $\bar{g}(v) \le g(v) = \min_{S \subseteq N : f(S) \ge v} |S|$.

We now show that the value $\sum_{j=1}^{k} v_j$ returned by the method is the maximum value $v$ such that $\bar{g}(v) \le k$.

**Lemma 5.** Let $\overline{\mathrm{OPT}} = \sum_{j=1}^{k} v_j$ be the solution returned by Method 3. Then $\bar{g}(\overline{\mathrm{OPT}} + \epsilon) > k$ for all $\epsilon > 0$.

Finally, we combine Lemma 4 and Lemma 5 to show that Method 3 returns an upper bound on OPT.

**Theorem 2.** For any $k$, let $\overline{\mathrm{OPT}} = \sum_{j=1}^{k} v_j$ be the solution returned by Method 3. Then $\overline{\mathrm{OPT}} \ge \max_{S \subseteq N : |S| \le k} f(S)$.

*Proof.* We first show by contrapositive that a lower bound $k < g(v) = \min_{S \subseteq N : f(S) \ge v} |S|$ on the dual problem implies an upper bound $v > \max_{S \subseteq N : |S| \le k} f(S)$ on the primal problem. Assume $\max_{S \subseteq N : |S| \le k} f(S) \ge v$. Then there exists $S^\star$ such that $f(S^\star) \ge v$ and $|S^\star| \le k$, and we get $g(v) = \min_{S \subseteq N : f(S) \ge v} |S| \le |S^\star| \le k$. Then, by Lemma 5 and Lemma 4, we have
$$k < \bar{g}(\overline{\mathrm{OPT}} + \epsilon) \le g(\overline{\mathrm{OPT}} + \epsilon) = \min_{S \subseteq N : f(S) \ge \overline{\mathrm{OPT}} + \epsilon} |S|,$$
which implies $\overline{\mathrm{OPT}} + \epsilon > \max_{|S| \le k} f(S)$ for all $\epsilon > 0$. $\square$

We now describe our main method, DUAL, which uses Method 3 as a subroutine. In the case where a small number of elements have very large singleton values, Method 3 can return an arbitrarily bad approximation to OPT (see the example in Appendix B.3). To circumvent this issue, DUAL calls Method 3 on the marginal contribution function $f_S(T) = f(S \cup T) - f(S)$ for each $S$ in a collection of sets $\mathcal{S}$ given as input. If $\emptyset \in \mathcal{S}$, then DUAL is no worse than Method 3. If there is $S \in \mathcal{S}$ such that there are no elements with large singleton value according to $f_S$, then DUAL circumvents the issue previously mentioned. We note that adding more sets to $\mathcal{S}$ can only improve the approximation given by DUAL.

**Method 4** (DUAL)
**Input:** function $f$, constraint $k$, collection of sets $\mathcal{S}$.
1. $\overline{\mathrm{OPT}} \leftarrow f(N)$
2. **for** $S$ in $\mathcal{S}$ **do**
3. &nbsp;&nbsp;&nbsp;&nbsp;$\overline{\mathrm{OPT}}' \leftarrow \text{Method3}(f_S, k)$
4. &nbsp;&nbsp;&nbsp;&nbsp;$\overline{\mathrm{OPT}} \leftarrow \min(f(S) + \overline{\mathrm{OPT}}', \overline{\mathrm{OPT}})$
5. **return** $\overline{\mathrm{OPT}}$

**Theorem 3.** For any collection of sets $\mathcal{S}$, Method 4 returns $\overline{\mathrm{OPT}}$ such that $\overline{\mathrm{OPT}} \ge \mathrm{OPT}$.
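A compact sketch of Method 3 and the DUAL wrapper, assuming a set-function interface `f(S)` over a list of ground-set elements. The rule for choosing $v_j$ below is my best-effort reading of the pseudocode and the stopping rule in the text (the authoritative details are in the paper's Appendix B.2), and the fallback to $f(N)$ when no valid next part exists is an assumption; restricting Method 3 to $N \setminus S$ inside DUAL is also a choice of convenience, since elements of $S$ have zero marginal value.

```python
def method3(f, elements, k):
    """Upper bound on OPT = max_{|S|<=k} f(S): a sketch of Method 3."""
    order = sorted(elements, key=lambda a: f({a}), reverse=True)
    singles = [f({a}) for a in order]
    prefix, A = [], set()          # prefix[i] = f(A_{i+1}) for A_i = {a_1..a_i}
    for a in order:
        A.add(a)
        prefix.append(f(A))
    total, i = 0.0, 0              # running sum of v_j; pointer i_{j-1}
    for _ in range(k):
        # Advance to i* = min{i : f(A_i) >= total + f(a_i)}.
        while i < len(order) and prefix[i] < total + singles[i]:
            i += 1
        if i == len(order):
            return prefix[-1]      # no valid next part: fall back to f(N)
        # v_j = max(f(a_{i*}), f(A_{i*-1}) - total): the largest value still
        # witnessed by the prefix A_{i*} (assumption; see Appendix B.2).
        v = singles[i]
        if i > 0:
            v = max(v, prefix[i - 1] - total)
        total += v
        i += 1
    return total

def dual(f, elements, k, collection):
    """Method 4 (DUAL): best Method 3 bound over the marginal functions f_S."""
    best = f(set(elements))
    for S in collection:
        S = set(S)
        f_S = lambda T, S=S: f(S | set(T)) - f(S)   # marginal contribution f_S
        rest = [a for a in elements if a not in S]
        best = min(best, f(S) + method3(f_S, rest, k))
    return best

# Toy usage with the coverage instance from the earlier sketches:
# method3(f, list(neighbors), 2) -> 6.0, matching OPT for k = 2.
```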
## 4. Guarantees for DUAL

In this section, we show guarantees on the performance of DUAL, i.e., guarantees on how far the upper bound $\overline{\mathrm{OPT}}$ returned by DUAL is from OPT. We note that our analysis holds for the more demanding $f(S_g)$ benchmark, which is the value of the greedy solution $S_g$. The analysis is deferred to Appendix C.

We first show that, if $S_g \in \mathcal{S}$, then the solution $\overline{\mathrm{OPT}}$ returned by DUAL is such that $\overline{\mathrm{OPT}} \le 2\,\mathrm{OPT}$. Of course, this 2-approximation can be improved to $e/(e-1)$ with the upper bound $(e/(e-1)) \cdot f(S_g)$. However, for instance specific approximations, this upper bound only gives a $1 - 1/e$ approximation for greedy on each instance, while DUAL can give a much stronger approximation.

**Proposition 2.** Let $S_g$ be the solution returned by the greedy algorithm for the problem $\max_{|S| \le k} f(S)$. Then, if $f$ is a monotone submodular function and $S_g \in \mathcal{S}$, DUAL returns $\overline{\mathrm{OPT}}$ such that $\overline{\mathrm{OPT}} \le 2 f(S_g) \le 2\,\mathrm{OPT}$.

We also show that, for functions with curvature $c$, the solution $\overline{\mathrm{OPT}}$ returned by DUAL is such that $\overline{\mathrm{OPT}} \le \frac{1}{1-c}\,\mathrm{OPT}$. Since the curvature parameter $c$ cannot be computed efficiently, the known $(1 - e^{-c})/c$ approximation obtained by greedy (Conforti & Cornuéjols, 1984) is an approximation that cannot be computed efficiently (unless $c$ is known). Unlike $(1 - e^{-c})/c$, $\overline{\mathrm{OPT}}$ provides an approximation for functions with curvature that can be efficiently computed. This approximation is guaranteed to improve over $1 - 1/e$ when $c < 1/e$.

**Proposition 3.** Let $f$ be a monotone submodular function with curvature $c$. Then DUAL returns $\overline{\mathrm{OPT}}$ such that $\overline{\mathrm{OPT}} \le \frac{1}{1-c} f(S_g) \le \frac{1}{1-c}\,\mathrm{OPT}$.

## 5. Experiments

We utilize DUAL to obtain bounds on the approximation achieved by submodular maximization algorithms in practice. We show that GREEDY and other algorithms find solutions that approximate the optimal solution significantly better than $1 - 1/e$ on a wide variety of real-world datasets and objectives. We also show that DUAL outperforms multiple benchmarks for deriving approximations for the solution found by GREEDY. For all instances, we use $\mathcal{S} = \{S_1, S_2, \ldots, S_{20}\} \cup \{S_{25}, S_{30}, \ldots, S_{50}\}$ as the input to DUAL, where $S_i$ is the greedy solution of size $i$.
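A short sketch of how such a collection of greedy prefix solutions can be assembled; `greedy_prefixes` is a hypothetical helper, not from the paper, and the toy call reuses the illustrative `f` and `neighbors` from the Section 1.1 sketch (on the real instances, `k_max` would be 50).

```python
def greedy_prefixes(f, elements, k_max):
    """Run greedy once up to size k_max, recording each prefix S_1..S_{k_max}."""
    S, prefixes = set(), []
    for _ in range(k_max):
        best = max((a for a in elements if a not in S),
                   key=lambda a: f(S | {a}) - f(S))
        S = S | {best}
        prefixes.append(set(S))
    return prefixes

# S = {S_1, ..., S_20} u {S_25, S_30, ..., S_50}, as used in the experiments.
prefixes = greedy_prefixes(f, list(neighbors), k_max=3)  # toy ground set is tiny
collection = prefixes[:20] + prefixes[24:50:5]
```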
### 5.1. Approximations for submodular maximization algorithms using DUAL

We begin by evaluating the bounds derived by DUAL on the approximation achieved by four different submodular maximization algorithms. The goal here is not to provide a comprehensive comparison of submodular maximization algorithms, but instead to analyze the approximations computed by DUAL.

**Algorithms for submodular maximization.** The GREEDY algorithm obtains the optimal $1 - 1/e$ approximation (Nemhauser et al., 1978) and is widely considered the standard algorithm for monotone submodular maximization under a cardinality constraint. LOCAL SEARCH obtains a $1/2$ approximation guarantee (Nemhauser et al., 1978) and is another widely used algorithm. LAZIER-THAN-LAZY GREEDY, also called sample greedy, improves the running time of greedy by sampling a small subset of elements at each iteration (Mirzasoleiman et al., 2015; Buchbinder et al., 2015). RANDOM GREEDY handles submodular functions that are not necessarily monotone by introducing randomization into the element selection step, and obtains a $1 - 1/e$ approximation guarantee for monotone submodular functions (Buchbinder et al., 2014). We provide further details on these algorithms in Appendix D. For randomized algorithms, we average the results over 5 runs.

**Settings.** We examine the approximations computed by DUAL for these algorithms on 8 different datasets and objectives. Additional details can be found in Appendix E.1. A sketch of one of these objectives appears at the end of this subsection.

- *Influence maximization:* As in Mirzasoleiman, Badanidiyuru, and Karbasi (2016a), we use Youtube social network data (Yang & Leskovec, 2015) and sample n = 1,045 users from 50 large communities. We select $k$ users by maximizing influence: $f(S) = |N_G(S)|$.
- *Car dispatch:* Our goal is to select the $k$ best locations to deploy drivers to cover the maximum number of pickups. As in Balkanski and Singer (2018), we analyze n = 1,000 locations of Uber pickups (FiveThirtyEight, 2019) and assign a weight $w_i$ to each neighborhood $n_i$ that is proportional to the number of trips in the neighborhood.
- *Influence maximization:* We use a citation network of Physics collaborations (Leskovec et al., 2007) with n = 9,877 authors (nodes) and 25,998 co-authorships (edges), and maximize $f(S) = |N_G(S)|$.
- *Movie recommendation:* As in Mirzasoleiman, Badanidiyuru, and Karbasi (2016a), we use the MovieLens dataset (Harper & Konstan, 2015) of n = 3,706 movies to recommend $k$ movies that have both good overall ratings and are highly rated by the most users.
- *Facility location:* As in Lindgren, Wu, and Dimakis (2016), we use a facility location objective and the movie rating matrix $[r_{ij}]$ from the MovieLens dataset, where $r_{ij}$ is user $j$'s rating of movie $i$, to select $k$ movies to recommend from $N$ using $f(S) = \frac{1}{|N|} \sum_{j \in U} \max_{i \in S} r_{ij}$.
- *Revenue maximization:* We use a revenue maximization objective from Breuer, Balkanski, and Singer (2020) on the CalTech Facebook Network dataset (Traud et al., 2012) of 769 Facebook users $N$.
- *Feature selection:* We use the Adult Income dataset (Blake & Merz, 1998) and select a subset of features to predict income level $Y$. We extract 109 binary features as in Kazemi, Zadimoghaddam, and Karbasi (2018) and use a joint entropy objective to select features.
- *Sensor placement:* As in Krause, Singh, and Guestrin (2008), we use the Berkeley Intel Lab dataset, which comprises 54 sensors that collect temperature information. We select $k$ sensors that maximize entropy.

*Figure 1. Approximations computed by DUAL for the performance of four different submodular maximization algorithms (GREEDY, LOCAL SEARCH, LAZIER-THAN-LAZY GREEDY, RANDOM GREEDY) across the eight settings above.*

**Results.** In Figure 1, we see that DUAL computes bounds on the approximations achieved by all four algorithms that are significantly better than $1 - 1/e$. For GREEDY and LOCAL SEARCH, DUAL derives nearly identical approximations that are almost always over 0.85. In many instances, such as movie recommendation, facility location, feature selection, and sensor placement, approximations are over 0.95. The approximations given by DUAL for LAZIER-THAN-LAZY GREEDY are either identical to GREEDY and LOCAL SEARCH, or 0.02 to 0.05 worse. Even though RANDOM GREEDY and GREEDY have the same theoretical guarantee of $1 - 1/e$ for monotone submodular functions, the gap in their approximations on these instances is significant.

In most cases, the approximations obtained by DUAL follow a Nike-swoosh shape as a function of the constraint $k$. For $k = 1$, the algorithms are either exactly or nearly optimal; the lowest approximations are obtained for small values of $k$, and then approximations rebound and slowly increase as $k$ increases. For experiments with $n > 3000$ and the Facebook revenue maximization setting, the values of $k$ (up to 100) are too small to observe this increase.
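As one concrete example, here is a minimal sketch of the facility location objective above, assuming a dense ratings matrix with zeros for missing ratings; the normalization follows the formula as printed, and the toy data is invented.

```python
import numpy as np

def facility_location(R):
    """Build f(S) = (1/|N|) * sum_j max_{i in S} r_ij from a ratings matrix.

    R: |N| x |U| array, R[i, j] = user j's rating of movie i (0 if unrated).
    Returns a monotone submodular set function over movie indices.
    """
    n = R.shape[0]

    def f(S):
        if not S:
            return 0.0
        # For each user j, the best rating among the selected movies.
        return float(R[list(S), :].max(axis=0).sum()) / n

    return f

# Toy usage: 3 movies, 4 users.
R = np.array([[5, 0, 3, 1],
              [4, 4, 0, 0],
              [0, 5, 4, 2]], dtype=float)
f_fl = facility_location(R)
print(f_fl({0, 2}))  # (5 + 5 + 4 + 2) / 3
```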
### 5.2. DUAL vs benchmarks for GREEDY approximations

For the next set of experiments, we compare DUAL to multiple benchmarks, which also compute approximations for the performance of submodular maximization algorithms. We fix a single algorithm, GREEDY, and compare the approximations found by DUAL and the benchmarks. We consider large instances, as well as small instances with $n \le 20$ elements, where we can compute the curvature and sharpness benchmarks that require brute-force computation.

#### 5.2.1. Experiments on large instances

**Benchmarks.** We consider the following benchmarks that upper bound the optimal value. These benchmarks can be efficiently computed on large instances. A sketch of the first two appears after the results below.

- *Top-k.* For a simple baseline, we upper bound OPT using the set $A$ of $k$ elements with maximum singleton value $f(a)$: $\overline{\mathrm{OPT}}_k = \sum_{a \in A} f(a)$.
- *Marginal.* By using the values of the GREEDY solutions $S_i$ of size $i$ as well as GREEDY analysis techniques, we derive the following more sophisticated bound:
$$\mathrm{OPT}_k \le \frac{f(S_j) - (1 - 1/k)^{j-i} f(S_i)}{1 - (1 - 1/k)^{j-i}} \quad \text{for all } i < j$$
(see Appendix F.1 for the proof). We compute the minimum upper bound over all pairs $i < j \le n$.
- *Sakaue-Ishihata.* Sakaue and Ishihata (2018) propose a different upper bound on OPT using GREEDY proof techniques and derive the following bound:
$$\mathrm{OPT}_k \le \frac{f(S)}{1 - \prod_{i=1}^{k} \beta_i}, \quad \text{where } \beta_i = 1 - \frac{f_{S_{i-1}}(a_i)}{\sum_{a \in A_i} f_{S_{i-1}}(a)}$$
and $A_i$ is the set of $k$ elements with maximal singleton values for $f_{S_{i-1}}$.

*Figure 2. GREEDY approximations computed by DUAL compared to multiple benchmarks (MARGINAL, TOP-K, SAKAUE-ISHIHATA) on various submodular objectives, across the same eight settings as Figure 1.*

**Results.** We compare the approximations of GREEDY found by DUAL to those found by TOP-K, MARGINAL, and SAKAUE-ISHIHATA. Figure 2 shows that DUAL consistently outperforms or matches the baselines. The TOP-K benchmark performs poorly in most cases, which implies that most objectives are not close to additive. MARGINAL and SAKAUE-ISHIHATA are the strongest benchmarks, but are still significantly outperformed by DUAL on most instances. For MovieLens movie recommendation and Facebook revenue maximization, where the objectives are close to additive, TOP-K outperforms MARGINAL and SAKAUE-ISHIHATA.
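A minimal sketch of the TOP-K and MARGINAL bounds, assuming the set-function interface used in the earlier sketches; `greedy_values` is a hypothetical list of greedy prefix values, with `greedy_values[0] = f(empty set) = 0`.

```python
import math

def topk_bound(f, elements, k):
    """TOP-K: the sum of the k largest singleton values upper bounds OPT_k."""
    return sum(sorted((f({a}) for a in elements), reverse=True)[:k])

def marginal_bound(greedy_values, k):
    """MARGINAL: min over i < j of
    (f(S_j) - (1-1/k)^{j-i} f(S_i)) / (1 - (1-1/k)^{j-i})."""
    best = math.inf
    m = len(greedy_values)
    for i in range(m):
        for j in range(i + 1, m):
            r = (1 - 1 / k) ** (j - i)  # r < 1, so the division is safe
            best = min(best, (greedy_values[j] - r * greedy_values[i]) / (1 - r))
    return best

# Instance-specific certificate for greedy's size-k solution S_k:
# approx >= greedy_values[k] / min(topk_bound(f, N, k),
#                                  marginal_bound(greedy_values, k)).
```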
#### 5.2.2. Experiments on small instances

**Benchmarks.** We consider the following benchmarks on small datasets, which are parameterized properties that yield improved approximation guarantees. Unlike the benchmarks that upper bound OPT, these benchmarks identify properties that guarantee a bound on the GREEDY approximation for any function that satisfies the properties. However, computing the parameters of these properties requires brute-force computation and is computationally infeasible on large datasets. A brute-force sketch of the curvature parameter appears at the end of this subsection.

- *Curvature.* The curvature $c = 1 - \min_{S, a} \frac{f_S(a)}{f(a)}$ of a function measures how close $f$ is to additive (Conforti & Cornuéjols, 1984). It yields an improved $(1 - e^{-c})/c$ approximation for GREEDY.
- *Soma-Yoshida.* Soma and Yoshida (2017) propose an improved notion of curvature for monotone submodular functions $f$ that can be decomposed as $g + h$, where $g$ is monotone submodular and $h$ is $M^\natural$-concave. The $h$-curvature $\gamma_h = 1 - \min_{S \subseteq N} \frac{h(S)}{f(S)}$ yields an improved approximation by improving upon the previous curvature parameter.
- *Sharpness.* The property of submodular sharpness was introduced by Pokutta, Singh, and Torrico (2020) as an explanation for the performance of GREEDY in practice. It is the analog of sharpness from continuous optimization (Łojasiewicz, 1963) and assumes that any solution which differs significantly from the optimal solution has a substantially lower value. On small instances, we compute the dynamic submodular sharpness property of the function, which is the sharpness property that yields the best approximation. More details are in Appendix F.2.

Another benchmark is stability, which guarantees that GREEDY finds the optimal solution if the instance is sufficiently stable, i.e., its optimal solution remains optimal under small perturbations of the function (Chatziafratis et al., 2017). However, our settings are not perturbation-stable because there are multiple near-optimal solutions.

**Results.** For small instances, where we can exactly compute the necessary parameters as well as OPT by brute force, we follow the experimental setup of Pokutta et al. (2020). For $k \in [1, 10]$, we randomly choose $n = 2k$ elements to comprise the ground set and analyze the result of DUAL versus the benchmarks on two objectives, facility location and movie recommendation, from Pokutta, Singh, and Torrico (2020) on the MovieLens dataset. More details are in Appendix E.2.

*Figure 3. GREEDY approximations computed by DUAL and multiple benchmarks (Sharpness, Curvature, Soma-Yoshida, and the true approximation) on small instances ($n \le 20$) for MovieLens facility location and movie recommendation.*

In Figure 3, we observe that DUAL yields the best approximations. For facility location, DUAL shows a near-optimal approximation, while the other benchmark approximations are near $1 - 1/e$ for larger $k$. For movie recommendation, the gap between the different benchmarks is smaller. In both settings and for all $k$, GREEDY finds a near-optimal solution.

| $k$ | DUAL | OPT | CURV. | SOMA. | SHARP. |
|---|---|---|---|---|---|
| 6 | 0.0160 | 0.0317 | 0.365 | 0.729 | 0.830 |
| 8 | 0.0298 | 0.457 | 7.99 | 17.13 | 18.47 |
| 10 | 0.0716 | 7.17 | 156.1 | 372.4 | 359.1 |

*Table 1. Average runtimes (in seconds) of benchmark methods on the MovieLens facility location setting, where $n = 2k$.*

We additionally report benchmark runtimes in Table 1 for the facility location objective and find that CURVATURE, SOMA-YOSHIDA, SHARPNESS, and OPT, which all require brute-force computation, become exponentially slower as $k$ increases. At $k = 10$, $n = 20$, the average time to compute the curvature approximation is 156 seconds, while sharpness and the improved curvature benchmark both have a computation time of over 300 seconds. These methods are much slower than even brute-force computing OPT, which takes 7.2 seconds. While these benchmarks are not scalable, DUAL, which is at least 1000 times faster than these two methods for the facility location objective when $k = 10$, is scalable to larger datasets.
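To illustrate why these benchmarks do not scale, here is a brute-force sketch of the curvature parameter and the resulting greedy guarantee, reusing the toy `f` and `neighbors` from the earlier sketches; the enumeration is exponential in $n$, exactly as discussed above.

```python
import math
from itertools import chain, combinations

def curvature(f, elements):
    """Brute-force curvature c = 1 - min_{S, a not in S} f_S(a) / f(a).

    Exponential in |elements|; only feasible for tiny ground sets (n <= 20),
    which is why this benchmark cannot be used on real-world instances.
    """
    elements = list(elements)
    subsets = chain.from_iterable(
        combinations(elements, r) for r in range(len(elements)))
    ratio = 1.0
    for S in subsets:
        base = f(set(S))
        for a in elements:
            if a not in S and f({a}) > 0:
                ratio = min(ratio, (f(set(S) | {a}) - base) / f({a}))
    return 1 - ratio

# Greedy's curvature-based guarantee (1 - e^{-c}) / c, once c is known.
c = curvature(f, list(neighbors))
guarantee = 1.0 if c == 0 else (1 - math.exp(-c)) / c
```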
#### 5.2.3. Comparison of proposed methods

We compare Method 1, Method 3, and DUAL on a coverage objective, and compare Method 3 and DUAL on a non-coverage objective, in Figure 4. We observe that even Method 1, which employs the additive lower bound on the dual objective, finds approximations that are above 0.8. This indicates that, unlike the primal objective, the dual objective is close to additive. By partitioning the dual space (Method 3), a small improvement over Method 1 is achieved. Finally, by considering the upper bound on the optimal solution for the functions $f_S$ with $S \in \mathcal{S}$ (DUAL), the approximations improve further. This improvement is minor for MovieLens, but can be around 0.1 for some $k$ on Youtube.

*Figure 4. GREEDY approximations computed by Method 1, Method 3, and DUAL on Youtube influence maximization and MovieLens facility location.*

## Acknowledgements

This research was supported by an IBM PhD Fellowship, BSF grant 2014389, NSF grant CAREER CCF-1452961, NSF USICCS proposal 1540428, a Google research award, and a Facebook research award.

## References

Baeza-Yates, R., Boldi, P., and Chierichetti, F. Essential web pages are easy to find. In Proceedings of the 24th International Conference on World Wide Web, pp. 97–107, 2015.

Balkanski, E. and Singer, Y. Approximation guarantees for adaptive sampling. In International Conference on Machine Learning, pp. 384–393, 2018.

Blake, C. L. and Merz, C. J. UCI machine learning repository, 1998. URL http://archive.ics.uci.edu/ml.

Breuer, A., Balkanski, E., and Singer, Y. The FAST algorithm for submodular maximization. In Proceedings of the 37th International Conference on Machine Learning, pp. 1134–1143, 2020.

Buchbinder, N., Feldman, M., Naor, J., and Schwartz, R. Submodular maximization with cardinality constraints. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1433–1452. SIAM, 2014.

Buchbinder, N., Feldman, M., and Schwartz, R. Comparing apples and oranges: Query tradeoff in submodular maximization. In SODA, pp. 1149–1168, 2015.

Chatziafratis, V., Roughgarden, T., and Vondrák, J. Stability and recovery for independence systems. arXiv preprint arXiv:1705.00127, 2017.

Conforti, M. and Cornuéjols, G. Submodular set functions, matroids and the greedy algorithm: tight worst-case bounds and some generalizations of the Rado-Edmonds theorem. Discrete Applied Mathematics, 7(3):251–274, 1984.

Das, A. and Kempe, D. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. arXiv preprint arXiv:1102.3975, 2011.

FiveThirtyEight. Uber pickups in New York City. Kaggle, 2019. URL https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city.

Guestrin, C., Krause, A., and Singh, A. P. Near-optimal sensor placements in Gaussian processes. In Proceedings of the 22nd International Conference on Machine Learning, pp. 265–272, 2005.

Harper, F. M. and Konstan, J. A. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4):1–19, 2015.

Kazemi, E., Zadimoghaddam, M., and Karbasi, A. Scalable deletion-robust submodular maximization: Data summarization with privacy and fairness constraints. In International Conference on Machine Learning, pp. 2544–2553, 2018.

Kempe, D., Kleinberg, J., and Tardos, É. Maximizing the spread of influence through a social network. In KDD, 2003.

Krause, A., Singh, A., and Guestrin, C. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. Journal of Machine Learning Research, 9(Feb):235–284, 2008.

Leskovec, J., Kleinberg, J., and Faloutsos, C. Graph evolution: Densification and shrinking diameters. ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):2–es, 2007.

Lin, H. and Bilmes, J. A class of submodular functions for document summarization. In Human Language Technologies, 2011.
Lindgren, E., Wu, S., and Dimakis, A. G. Leveraging sparsity for efficient submodular data summarization. Advances in Neural Information Processing Systems, 29:3414–3422, 2016.

Łojasiewicz, S. Une propriété topologique des sous-ensembles analytiques réels. In Les Équations aux Dérivées Partielles (Paris, 1962). Éditions du Centre National de la Recherche Scientifique, 1963.

Mirzasoleiman, B., Badanidiyuru, A., Karbasi, A., Vondrák, J., and Krause, A. Lazier than lazy greedy. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 29, 2015.

Mirzasoleiman, B., Badanidiyuru, A., and Karbasi, A. Fast constrained submodular maximization: Personalized data summarization. In ICML, pp. 1358–1367, 2016a.

Mirzasoleiman, B., Badanidiyuru, A., and Karbasi, A. Fast constrained submodular maximization: Personalized data summarization. In ICML, pp. 1358–1367, 2016b.

Nemhauser, G. L. and Wolsey, L. A. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research, 3(3):177–188, 1978.

Nemhauser, G. L., Wolsey, L. A., and Fisher, M. L. An analysis of approximations for maximizing submodular set functions I. Mathematical Programming, 14(1):265–294, 1978.

Pokutta, S., Singh, M., and Torrico, A. On the unreasonable effectiveness of the greedy algorithm: Greedy adapts to sharpness. In International Conference on Machine Learning, pp. 7772–7782. PMLR, 2020.

Rubinstein, A. and Zhao, J. Budget-smoothed analysis for submodular maximization. arXiv preprint arXiv:2102.05782, 2021.

Sakaue, S. and Ishihata, M. Accelerated best-first search with upper-bound computation for submodular function maximization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.

Soma, T. and Yoshida, Y. A new approximation guarantee for monotone submodular function maximization via discrete convexity. arXiv preprint arXiv:1709.02910, 2017.

Traud, A. L., Mucha, P. J., and Porter, M. A. Social structure of Facebook networks. Physica A: Statistical Mechanics and its Applications, 391(16):4165–4180, 2012.

Wolsey, L. A. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica, 2(4):385–393, 1982.

Yang, J. and Leskovec, J. Defining and evaluating network communities based on ground-truth. Knowledge and Information Systems, 42(1):181–213, 2015.