# Consistent Submodular Maximization

Paul Dütting*1, Federico Fusco*2, Silvio Lattanzi*1, Ashkan Norouzi-Fard*1, Morteza Zadimoghaddam*1

Maximizing monotone submodular functions under cardinality constraints is a classic optimization task with several applications in data mining and machine learning. In this paper we study this problem in a dynamic environment with consistency constraints: elements arrive in a streaming fashion, and the goal is to maintain a constant approximation to the optimal solution while keeping the solution stable (i.e., the number of changes between two consecutive solutions is bounded). We provide algorithms in this setting with different trade-offs between consistency and approximation quality. We also complement our theoretical results with an experimental analysis showing the effectiveness of our algorithms on real-world instances.

1. Introduction

Submodular optimization is a powerful framework for modeling and solving problems that exhibit the widespread diminishing-returns property. Thanks to its effectiveness, it has been applied across diverse domains, including video analysis (Zheng et al., 2014), data summarization (Lin & Bilmes, 2011; Bairi et al., 2015), sparse reconstruction (Bach, 2010; Das & Kempe, 2011), and active learning (Golovin & Krause, 2011; Amanatidis et al., 2022). In this paper, we focus on submodular maximization under cardinality constraints: given a submodular function f, a universe of elements V, and a cardinality constraint k, the goal is to find a set S of at most k elements that maximizes f(S). Submodular maximization under cardinality constraints is NP-hard; nevertheless, efficient approximation algorithms exist for this task in both the centralized and the streaming setting (Nemhauser et al., 1978; Badanidiyuru et al., 2014; Kazemi et al., 2019).

*Equal contribution. 1Google Research. 2Sapienza University of Rome, Rome, Italy. Correspondence to: Federico Fusco. Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

One aspect of efficient approximation algorithms for submodular maximization that has received little attention so far is the stability of the solution. In fact, for some of the known algorithms, even adding a single element to the universe of elements V may completely change the final output (see Appendix A for some examples). Unfortunately, this is problematic in many real-world applications where consistency is a fundamental system requirement. Indeed, a flurry of recent work has started to explore various optimization problems under stability and consistency constraints, such as clustering (Lattanzi & Vassilvitskii, 2017; Cohen-Addad et al., 2022; Fichtenberger et al., 2021; Guo et al., 2021; Łącki et al., 2024), facility location (Cohen-Addad et al., 2019; Bhattacharya et al., 2022), and online learning (Jaghargh et al., 2019).

Having solutions that evolve smoothly is central in many practical applications of submodular optimization. Consider, for example, the data summarization task in an evolving setting where elements are added to the universe V. In this setting, having a stable summary that changes as little as possible from step to step is very important, both for serving the summary to a user and for using it in a machine learning model.
In fact, in both settings a drastic change of the solution may have a negative impact on system usability: it could harm user attention and adversely affect the performance of the machine learning model. For these reasons, in this paper we initiate the study of submodular maximization under consistency constraints, where we allow the solution to change only slightly after each element insertion.

More formally, consider a stream V of exactly n elements, chosen by an adversary. Denote by $V_t = \{e_1, \dots, e_t\} \subseteq V$ the set of all elements inserted up to the t-th stream operation, and let $\mathrm{OPT}_t$ be an optimum feasible solution for $V_t$. Our goal is to design an algorithm with two key properties. On the one hand, we want the algorithm to maintain, at the end of each operation t, a solution $S_t \subseteq V_t$, with $|S_t| \le k$, of high value $f(S_t)$. In particular, we say that an algorithm is an α-approximation of the best solution if $\alpha f(S_t) \ge f(\mathrm{OPT}_t)$ for all $t = 1, \dots, n$. On the other hand, we want the dynamic solution to not change much between consecutive insertions: we say that an algorithm is C-consistent if $|S_t \setminus S_{t-1}| \le C$ for all $t = 2, \dots, n$. In general, we say that an algorithm is consistent, without specifying C, when C is constant.
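To make the two requirements concrete, the following small Python harness (ours, for illustration; the `insert`/`solution` interface is an assumption, not from the paper's code) replays a stream against a streaming algorithm and records, at every step, the solution value $f(S_t)$ and the number of changes $|S_t \setminus S_{t-1}|$.

```python
# Minimal harness for the two quantities studied in the paper: the value
# f(S_t) of the maintained solution and the per-step consistency |S_t \ S_{t-1}|.
# The algorithm object and function names are illustrative, not from the paper.

def replay(algorithm, stream, f):
    """Feed `stream` to `algorithm` and record value and consistency."""
    values, changes = [], []
    previous = set()
    for element in stream:
        algorithm.insert(element)            # one stream operation
        current = set(algorithm.solution())  # S_t, must satisfy |S_t| <= k
        values.append(f(current))
        changes.append(len(current - previous))  # |S_t \ S_{t-1}|
        previous = current
    return values, changes

# An algorithm is C-consistent if max(changes[1:]) <= C, and an
# alpha-approximation if alpha * values[t] >= f(OPT_t) for every t.
```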
Consistent Submodular Maximization. It is interesting to note that the SWAPPING algorithm by Chakrabarti & Kale (2015) already combines constant approximation with constant consistency. (Following, e.g., Dütting et al. (2022; 2023), we call SWAPPING the instantiation of the general framework by Chakrabarti & Kale (2015) for the special case of matroid constraints; we refer to Appendix B for the pseudocode.) SWAPPING maintains a dynamic feasible solution, and each new arriving element is added to the solution if either it fits into the cardinality constraint or it can be swapped with some low-value element. It is well known that SWAPPING achieves a 4-approximation (and the factor 4 is tight; see Appendix B for an example), and from the previous description it is also clear that it is 1-consistent. Putting consistency aside, it is NP-hard to obtain an approximation guarantee better than $e/(e-1)$ (Feige, 1998), which can be achieved by recomputing a greedy solution (Nemhauser et al., 1978) from scratch after every insertion. Such an approach, however, is not consistent (see Appendix A).

A line of work that is related to our model is that of fully-dynamic submodular maximization (e.g., Lattanzi et al., 2020; Monemizadeh, 2020; Dütting et al., 2023; Banihashem et al., 2024). There, the algorithm is given an arbitrary stream of insertions and deletions, and the goal is to maintain a good dynamic solution with low amortized running time. While the constraint on running time naturally induces algorithms whose solutions do not change often, known algorithms for fully dynamic submodular maximization are not consistent, as they all contemplate the possibility of recomputing the solution from scratch from time to time.

Our Contribution. Given these considerations, it is natural to ask whether it is possible to obtain a better trade-off between quality and consistency. We answer this question positively: We first provide a (3.147 + O(1/k))-approximation algorithm that is 1-consistent, improving on the guarantees of the SWAPPING algorithm. We then provide a (2.619 + ε)-approximation algorithm that is Õ(1/ε)-consistent, where the Õ notation hides poly-logarithmic factors in 1/ε. (As is common in the submodular maximization literature, the parameter ε is a small constant that the algorithm designer can tune to the application at hand: it is possible to attain an approximation arbitrarily close to 2.619, at the cost of a worse consistency.) We complement our positive results with a lower bound showing that, for any constant C, no deterministic algorithm can be C-consistent and return a better than 2 approximation. Since both our algorithms are deterministic, the lower bound shows that they obtain a near-optimal quality-consistency trade-off. We leave the resolution of the remaining gaps, and the study of randomized algorithms, as exciting directions for future work.

We also present extensive experiments with real-world data sets and a synthetic data set (Section 6 and Appendices B and C). The experiments show that our algorithms achieve value comparable to SWAPPING and the non-consistent SIEVE-STREAMING (Badanidiyuru et al., 2014) on real-world data sets, while achieving significant savings in the total number of changes. Furthermore, the synthetic data set, constructed from a hard instance for SWAPPING presented in Appendix B, confirms the improvements in the worst-case approximation guarantees relative to SWAPPING from our theoretical analysis, showing that there too the gains can be significant (in the order of the 21.325% and 34.525% improvements that we show in our analysis).

Our Techniques. Our first algorithm, ENCOMPASSING-SET, maintains a benchmark set $B_t$ that is used to decide whether to add or discard any new element. More precisely, any arriving element $e_t$ is added to $B_{t-1}$ if, upon arrival, the marginal contribution of $e_t$ to $B_{t-1}$, that is $f(e_t \mid B_{t-1})$, is at least $\frac{\beta}{k} f(B_{t-1})$. Here β is a judiciously chosen constant larger than 1, namely β = 1.14. At any given time t, the solution $S_t$ maintained by the algorithm consists of the last k elements added to $B_t$. This algorithm is 1-consistent by construction, while the approximation guarantee descends from the following two properties. First, the (potentially infeasible) benchmark set $B_t$ achieves a (1 + β)-approximation to $f(\mathrm{OPT}_t)$ (where 1 + β = 2.14 by the choice of β). Second, due to the exponential nature of the condition by which elements are added to the benchmark set, the elements in $B_t$ that are not part of $S_t$ only account for a small fraction of the value of $B_t$; namely, $f(B_t) \ge (1 + \beta/k)^k f(B_t \setminus S_t)$. Intuitively, the second property shows that $S_t$ captures a significant fraction of $f(B_t)$, while the first property shows that $f(B_t)$ is a good approximation to $f(\mathrm{OPT}_t)$. A careful analysis shows that the two properties lead to the claimed factor of 3.147 + O(1/k).

Our second algorithm, CHASING-LOCAL-OPT, provides a better approximation guarantee at the cost of possibly performing more than one swap per step (but still at most constantly many). Rather than maintaining a benchmark set, this algorithm only maintains a solution $S_t$, and updates it via local improvements. It applies a swapping condition similar to that of ENCOMPASSING-SET, requiring that the marginal value of an arriving element $e_t$ to $S_{t-1}$ be at least $\frac{\phi}{k} f(S_{t-1})$, where $\phi \approx 1.618$ is the golden ratio.
Unlike ENCOMPASSING-SET, however, rather than swapping out the oldest element that was added, it swaps out an element r whose marginal contribution $f(r \mid S - r)$ to the current solution S is at most a 1/k-fraction of the current solution's value f(S). (We use $S - r$ instead of $S \setminus \{r\}$ and $S + x$ instead of $S \cup \{x\}$.) Moreover, after the arrival of each element $e_t$ and its possible addition to $S_t$, it performs up to $N = \tilde O(1/\varepsilon)$ additional swaps. For this it considers all elements that have arrived so far, which we denote by $V_t$, and it tries to add them one by one, by the same condition and procedure that is used for newly arriving elements. The purpose of these extra swaps is to drive the maintained solution $S_t$ closer to a local optimum: a solution $S \subseteq V_t$ such that there is no element $x \in V_t$ with $f(x \mid S) \ge \frac{\phi}{k} f(S)$. The improved approximation guarantee then stems from the fact that either the algorithm was at a local optimum in the not too distant past, or it has performed many swaps since.

2. Preliminaries

We consider a set function $f: 2^V \to \mathbb{R}_{\ge 0}$ on a ground set V of cardinality n. Given two sets $X, Y \subseteq V$, the marginal gain of X with respect to Y, denoted $f(X \mid Y)$, quantifies the change in value of adding X to Y and is defined as $f(X \mid Y) = f(X \cup Y) - f(Y)$. When X consists of a singleton x, we use the shorthand $f(x \mid Y)$ instead of $f(\{x\} \mid Y)$. The function f is called monotone if $f(e \mid X) \ge 0$ for each set $X \subseteq V$ and element $e \in V$, and submodular if for any two sets $X \subseteq Y \subseteq V$ and any element $e \in V \setminus Y$ we have $f(e \mid X) \ge f(e \mid Y)$. Throughout the paper, we assume that f is monotone and normalized, i.e., $f(\emptyset) = 0$. We model access to the submodular function f via a value oracle that computes f(S) for any given $S \subseteq V$. The problem of maximizing a function f under a cardinality constraint k consists of selecting a set $S \subseteq V$ with $|S| \le k$ that maximizes f(S).
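As a concrete instance of these definitions, the following minimal Python sketch (illustrative, not from the paper) implements a value oracle for a weighted coverage function, the same family used in our lower bound and appendix examples, together with a marginal-gain helper.

```python
# A toy value oracle for a monotone, normalized submodular function:
# weighted coverage, where the elements of the ground set are sets of items.

def make_coverage_oracle(weights):
    """weights: dict mapping item -> non-negative weight."""
    def f(S):
        covered = set().union(*S) if S else set()
        return sum(weights.get(item, 0.0) for item in covered)
    return f

def marginal(f, x, Y):
    """f(x | Y) = f(Y + x) - f(Y), with elements represented as frozensets."""
    return f(Y | {x}) - f(Y)

weights = {"a": 1.0, "b": 2.0, "c": 3.0}
f = make_coverage_oracle(weights)
e1, e2 = frozenset({"a", "b"}), frozenset({"b", "c"})
assert f(set()) == 0.0                   # normalized
assert marginal(f, e2, {e1}) <= f({e2})  # diminishing returns
```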
3. Impossibility Result

Putting computational efficiency aside, one may hope to design a consistent algorithm that maintains the optimal solution, or an arbitrarily good approximation thereof. We prove that this is not the case: no deterministic algorithm with constant consistency enjoys an approximation guarantee better than 2. We remark that this is an information-theoretic bound that concerns the streaming nature of the problem.

Theorem 3.1. Fix any constant C and precision parameter ε ∈ (0, 1). No C-consistent (deterministic) algorithm provides a (2 − ε)-approximation.

Proof. Fix any constant C, precision parameter ε > 0, and a deterministic algorithm A that is C-consistent; we construct a covering instance such that A does not maintain a (2 − ε)-approximation. Let $G = \{g_1, \dots, g_n\}$ be a ground set and V be a family of subsets of G such that V contains all the subsets of G of cardinality 1 and k, with k = n/2. The covering function f is naturally defined on V, and we consider the task of maximizing f under cardinality constraint k. Observe the behaviour of A on the sequence $\{g_1\}, \dots, \{g_n\}$. At the end of this partial sequence, A maintains a certain solution $S = \{\{g_{i_1}\}, \dots, \{g_{i_\ell}\}\}$, with $\ell \le k$. Now suppose the next element to arrive is $\{g_{i_1}, \dots, g_{i_\ell}, g_{i_{\ell+1}}, \dots, g_{i_k}\}$, where $g_{i_{\ell+1}}, \dots, g_{i_k}$ are arbitrary elements not covered by S. The value of the optimal solution after this insertion is 2k − 1 (take the last subset and k − 1 non-overlapping singletons). The value of S is ℓ ≤ k and, even if A adds to S the subset $\{g_{i_1}, \dots, g_{i_k}\}$ and C − 1 other singletons, it cannot reach a solution of value more than k + C. The theorem follows by choosing k large enough, e.g., $k \ge 3C/\varepsilon$.

4. ENCOMPASSING-SET

In this section, we present the ENCOMPASSING-SET algorithm, which achieves an approximation guarantee of 3.147 + O(1/k) and 1-consistency (it changes at most one element per insertion). ENCOMPASSING-SET maintains a benchmark set $B_t$ to which it adds all the elements that, upon arrival, exhibit a marginal contribution to the current benchmark of at least a β/k fraction of its value. At any given stream operation t, the solution $S_t$ is given by the last k elements added to $B_t$. We refer to the pseudocode for further details.

Algorithm 1 ENCOMPASSING-SET
1: Environment: Stream V, function f, cardinality k
2: Threshold parameter β ← 1.14
3: B0 ← ∅, S0 ← ∅, and t ← 1
4: for et new element arriving do
5:   if f(et | Bt−1) ≥ (β/k) · f(Bt−1) then
6:     Bt ← Bt−1 + et
7:     St ← St−1 + et
8:     if |St| = k + 1 then
9:       remove from St the element es with smallest s
10:  t ← t + 1
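For concreteness, here is a direct Python transcription of Algorithm 1 under the value-oracle interface used above; it is a minimal illustration, not the experimental implementation from our repository, and it plugs into the replay harness sketched in the introduction.

```python
# A minimal sketch of Algorithm 1 (ENCOMPASSING-SET). The benchmark B only
# grows; the feasible solution S is the window of the last k additions to B,
# so each insertion changes S by at most one element (1-consistency).

BETA = 1.14  # threshold parameter from the paper

class EncompassingSet:
    def __init__(self, f, k):
        self.f, self.k = f, k
        self.benchmark = []  # B, in order of addition

    def insert(self, e):
        B = set(self.benchmark)
        # add e iff f(e | B) >= (beta / k) * f(B)
        if self.f(B | {e}) - self.f(B) >= (BETA / self.k) * self.f(B):
            self.benchmark.append(e)

    def solution(self):
        return set(self.benchmark[-self.k:])  # last k elements added to B
```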
We prepare the analysis of the properties of ENCOMPASSING-SET with two lemmata. We start by relating the value of the optimal solution with that of the benchmark.

Lemma 4.1. After each insertion $e_t$, the following holds: $f(\mathrm{OPT}_t) \le (1 + \beta) f(B_t)$.

Proof. Consider any element that belongs to $\mathrm{OPT}_t$ but not to the benchmark set $B_t$ after the computation following the insertion of $e_t$, i.e., $e_s \in \mathrm{OPT}_t \setminus B_t$, with $s \le t$. Element $e_s$ was not included in $B_s$ upon its insertion (because it does not belong to $B_t \supseteq B_s$), so the following holds:

$f(e_s \mid B_t) \le f(e_s \mid B_{s-1})$ (by submodularity)
$< \frac{\beta}{k} f(B_{s-1})$ (since $e_s \notin B_s$)
$\le \frac{\beta}{k} f(B_t)$. (by monotonicity)

So, for any element $e_s \in \mathrm{OPT}_t \setminus B_t$, it holds that

$f(e_s \mid B_t) \le \frac{\beta}{k} f(B_t)$. (1)

The above inequality is the crucial ingredient of the proof:

$f(\mathrm{OPT}_t) \le f(\mathrm{OPT}_t \cup B_t)$ (by monotonicity)
$\le f(B_t) + \sum_{e_s \in \mathrm{OPT}_t \setminus B_t} f(e_s \mid B_t)$ (by submodularity)
$\le f(B_t) + |\mathrm{OPT}_t| \cdot \frac{\beta}{k} f(B_t)$ (by Ineq. 1)
$\le (1 + \beta) f(B_t)$. (because $|\mathrm{OPT}_t| \le k$)

As a second preliminary step, we argue that the elements in $B_t$ that are not included in the current solution $S_t$ only account for a small fraction of $f(B_t)$.

Lemma 4.2. After each insertion $e_t$, the following inequality holds: $f(B_t) \ge \left(1 + \frac{\beta}{k}\right)^{k} f(B_t \setminus S_t)$.

Proof. The elements in the current solution are naturally sorted according to the order in which they appear in the stream and are added to the solution: $S_t = \{s_{t_1}, s_{t_2}, \dots, s_{t_\ell}\}$. Element $s_{t_\ell}$ is the last one added and, clearly, $\ell \le k$ and $t_\ell \le t$. Each of these elements $s_{t_i}$ was added to the solution because it passed the value test: $f(s_{t_i} \mid B_{t_i - 1}) \ge \frac{\beta}{k} f(B_{t_i - 1})$. Now, the set $B_{t_i - 1}$ can be rewritten in terms of the current benchmark set $B_t$ and the elements in the solution $S_t$: $B_{t_i - 1} = B_t \setminus \{s_{t_i}, \dots, s_{t_\ell}\}$, so the previous inequality can be rewritten as

$f(s_{t_i} \mid B_t \setminus \{s_{t_i}, \dots, s_{t_\ell}\}) \ge \frac{\beta}{k} f(B_t \setminus \{s_{t_i}, \dots, s_{t_\ell}\})$.

If we add the term $f(B_t \setminus \{s_{t_i}, \dots, s_{t_\ell}\})$ to both sides of the above inequality, we get that

$f(B_t \setminus \{s_{t_{i+1}}, \dots, s_{t_\ell}\}) \ge \left(1 + \frac{\beta}{k}\right) f(B_t \setminus \{s_{t_i}, \dots, s_{t_\ell}\})$.

Iterating the above argument we get the desired bound:

$f(B_t) \ge \left(1 + \frac{\beta}{k}\right) f(B_t \setminus \{s_{t_\ell}\}) \ge \left(1 + \frac{\beta}{k}\right)^2 f(B_t \setminus \{s_{t_{\ell-1}}, s_{t_\ell}\}) \ge \dots \ge \left(1 + \frac{\beta}{k}\right)^{\ell} f(B_t \setminus S_t)$.

The lemma follows by observing that either ℓ = k, or ℓ < k and then $B_t = S_t$, so that $f(B_t \setminus S_t) = 0$ and the inequality holds trivially.

We now have all the ingredients to analyze ENCOMPASSING-SET.

Theorem 4.3. ENCOMPASSING-SET is 1-consistent and maintains a 3.147 + O(1/k) approximation.

Proof. First observe that the algorithm is indeed 1-consistent: every time the solution $S_t$ changes, exactly one element is inserted into it and at most one is removed from it. We now turn to the approximation guarantee. We start by noting that

$f(B_t) + \left(1 + \frac{\beta}{k}\right)^{k} f(S_t) \ge \left(1 + \frac{\beta}{k}\right)^{k} \left[ f(S_t) + f(B_t \setminus S_t) \right]$ (by Lemma 4.2)
$\ge \left(1 + \frac{\beta}{k}\right)^{k} f(B_t)$. (by submodularity)

By rearranging terms and applying Lemma 4.1 we get:

$f(S_t) \ge \frac{(1 + \beta/k)^k - 1}{(1 + \beta/k)^k} f(B_t) \ge \frac{(1 + \beta/k)^k - 1}{(1 + \beta/k)^k (1 + \beta)} f(\mathrm{OPT}_t)$. (2)

We conclude the proof by providing a general lower bound on the multiplier on the right-hand side of the last inequality. We use the standard estimate $e^{\beta} \ge (1 + \beta/k)^k \ge e^{\beta}(1 - \beta^2/k)$. Plugging it into the multiplier in Equation (2), we have

$\frac{(1 + \beta/k)^k - 1}{(1 + \beta/k)^k (1 + \beta)} \ge \frac{e^{\beta} - 1}{e^{\beta}(1 + \beta)} - O\!\left(\frac{\beta^2}{k}\right)$. (β = 1.14)

Taking the inverse yields the desired factor, $\frac{e^{\beta}(1+\beta)}{e^{\beta}-1} + O(1/k) \approx 3.147 + O(1/k)$.

5. CHASING-LOCAL-OPT

In this section we present and analyze the CHASING-LOCAL-OPT algorithm, which exhibits a better approximation factor than both SWAPPING and ENCOMPASSING-SET. We refer to the pseudocode for further details.

Algorithm 2 MIN-SWAP(S, x)
1: Input: Set S and element x
2: Environment: Function f and cardinality k
3: if |S| < k, then return S + x
4: Let r ∈ S be any element s.t. f(r | S − r) ≤ f(S)/k
5: return S − r + x

Algorithm 3 CHASING-LOCAL-OPT
1: Input: Precision parameter ε
2: Environment: Stream V, function f, cardinality k
3: N ← ⌈(1/ε) · log_φ(12/ε)⌉
4: S0 ← ∅ and t ← 1
5: for et new element arriving do
6:   if f(et | St−1) ≥ (φ/k) · f(St−1) then
7:     St ← MIN-SWAP(St−1, et)
8:   for i = 1, . . . , N do
9:     if ∃ x ∈ Vt such that f(x | St) ≥ (φ/k) · f(St) then
10:      St ← MIN-SWAP(St, x)
11:  t ← t + 1

There are two differences with respect to ENCOMPASSING-SET. The first is the way in which elements are swapped out of the solution: it is not the oldest element that is removed, but one with small enough value. This is formalized in the routine MIN-SWAP, which takes as input a set S and an element x and is responsible for inserting x into S; if S already contains k elements, then x is swapped with an element r of S whose marginal value is not larger than the average value of S (so as to keep the cardinality of S bounded by k). Note that such an element r always exists, by submodularity and a simple averaging argument:

$f(S) \ge \sum_{x \in S} f(x \mid S - x) \ge k \cdot \min_{x \in S} f(x \mid S - x)$.

The second difference is that after the arrival of each element, and possibly its addition to the current solution, the algorithm performs up to $N = \tilde O(1/\varepsilon)$ additional swaps from $V_t$ into the solution, using the same rule and subroutine as for newly arriving elements. The additional swaps performed by CHASING-LOCAL-OPT drive the maintained solution closer to a local optimum, defined as follows.

Definition 5.1. We say that a dynamic solution $S_t$ is a local optimum if there exists no element x in $V_t$ such that $f(x \mid S_t) \ge \frac{\phi}{k} f(S_t)$.
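The following Python sketch mirrors Algorithms 2 and 3 (again illustrative rather than the experimental code); the inner loop makes the up-to-N extra swaps explicit as a linear scan over the arrived elements.

```python
# A minimal sketch of Algorithms 2 and 3 (MIN-SWAP and CHASING-LOCAL-OPT).
# The candidate scan in the inner loop is a linear pass over all arrived
# elements, favoring readability over oracle efficiency.

import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, ~1.618

class ChasingLocalOpt:
    def __init__(self, f, k, eps):
        self.f, self.k = f, k
        self.N = math.ceil((1 / eps) * math.log(12 / eps, PHI))
        self.S = set()
        self.arrived = []  # V_t

    def _min_swap(self, x):
        if len(self.S) < self.k:
            self.S.add(x)
            return
        # remove an element with marginal contribution at most f(S)/k;
        # such an element exists by submodularity and averaging
        fS = self.f(self.S)
        r = next(r for r in self.S
                 if fS - self.f(self.S - {r}) <= fS / self.k)
        self.S = (self.S - {r}) | {x}

    def _improving(self):
        fS = self.f(self.S)
        for x in self.arrived:
            if x not in self.S and self.f(self.S | {x}) - fS >= (PHI / self.k) * fS:
                return x
        return None

    def insert(self, e):
        self.arrived.append(e)
        if self.f(self.S | {e}) - self.f(self.S) >= (PHI / self.k) * self.f(self.S):
            self._min_swap(e)
        for _ in range(self.N):  # up to N extra swaps toward a local optimum
            x = self._improving()
            if x is None:
                break
            self._min_swap(x)

    def solution(self):
        return set(self.S)
```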
The improved approximation guarantee stems from the fact that, at any point in time, either the solution maintained by the algorithm was a local optimum not too far in the past, or many swaps were performed since.

Theorem 5.2. CHASING-LOCAL-OPT is Õ(1/ε)-consistent and maintains a (φ + 1 + 9ε)-approximation, where $\phi \approx 1.618$ is the golden ratio.

Before proving the theorem, we introduce a notational convention. During the execution of the algorithm, elements may be added to and removed from the dynamic solution multiple times. Rather than thinking of such an element as one and the same element, it is convenient to think of this as happening to multiple distinct copies of the same element, so that each copy is added and removed at most once. This allows us to work with sets instead of multi-sets in the analysis.

Proof of Theorem 5.2. The bound on the consistency is immediate, as for each insertion there are at most $N + 1 = \lceil \frac{1}{\varepsilon} \log_\phi \frac{12}{\varepsilon} \rceil + 1 = \tilde O(1/\varepsilon)$ changes in the solution. The rest of the proof is devoted to the analysis of the approximation guarantee, which we prove by induction on the number of insertions. For the first element $e_1$ of the stream there is nothing to prove, as $S_1 = V_1 = \{e_1\}$. We now analyze the generic insertion $e_t$, with t > 1, assuming that the desired approximation holds for every previous insertion s < t.

Let $t^*$ be the last insertion index before t at which the solution was a local optimum (see Definition 5.1), and denote by τ the maximum between $t^*$ and $t - \varepsilon k$. We remark that $t^*$ is at least 1, so τ is well defined. Here $\mathrm{OPT}_t$ is the optimum after insertion $e_t$, and $\mathrm{OPT}_\tau$ is the optimum after insertion $e_\tau$; the sets $V_t$ and $V_\tau$ are defined analogously.

Consider how the solution changed between $S_\tau$ and $S_t$: some elements of $S_\tau$ were removed, some were added and remained in $S_t$, while others were added and later removed, possibly multiple times. To ease the analysis, we sort the inserted elements $s_1, s_2, \dots, s_L$ according to the order in which they were added (recall that multiple copies of the same element may appear in this sequence); this induces a natural ordering on the removed elements: we call $r_\ell$ the element that was swapped out to make room for $s_\ell$ (to avoid confusion, if no element was swapped out, we let $r_\ell$ be a dummy element with no value). We now define an auxiliary sequence of sets $A_\ell$ that interpolates between the solution at insertion τ and that at insertion t:

$A_\ell = S_\tau \cup \{s_1, \dots, s_\ell\} \setminus \{r_1, \dots, r_\ell\}$.

It holds that $S_\tau = A_0$, while $S_t = A_L$. Moreover, the definition of the auxiliary sets yields the relation

$A_{\ell-1} + s_\ell = A_\ell + r_\ell$. (3)

By a telescopic argument, the above relation, and the design of CHASING-LOCAL-OPT, we have the following claim.

Claim 5.3. The following inequality holds:

$f(S_t) \ge f(S_\tau) + (\phi - 1) \sum_{\ell=1}^{L} f(r_\ell \mid A_{\ell-1} - r_\ell)$.

Proof of Claim 5.3. The change in value between two consecutive auxiliary sets can be decomposed as follows, exploiting the relation in Equation (3):

$f(A_\ell) - f(A_{\ell-1}) = f(s_\ell \mid A_{\ell-1}) - f(r_\ell \mid A_\ell)$. (4)

Now, the marginal value of $s_\ell$ with respect to $A_{\ell-1}$ is at least $\frac{\phi}{k} f(A_{\ell-1})$, by the swapping conditions in lines 6 and 9 of CHASING-LOCAL-OPT. Furthermore, by the design of MIN-SWAP, we know that the element $r_\ell$ removed to make room for $s_\ell$ has small value. In formulas,

$f(s_\ell \mid A_{\ell-1}) \ge \frac{\phi}{k} f(A_{\ell-1}) \ge \phi \, f(r_\ell \mid A_{\ell-1} - r_\ell)$. (5)

We can now prove the inequality in the statement directly:

$f(S_t) - f(S_\tau) = \sum_{\ell=1}^{L} \left[ f(A_\ell) - f(A_{\ell-1}) \right]$ (telescopic argument)
$= \sum_{\ell=1}^{L} \left[ f(s_\ell \mid A_{\ell-1}) - f(r_\ell \mid A_\ell) \right]$ (by Eqn. 4)
$\ge \sum_{\ell=1}^{L} \left[ f(s_\ell \mid A_{\ell-1}) - f(r_\ell \mid A_{\ell-1} - r_\ell) \right]$
$\ge (\phi - 1) \sum_{\ell=1}^{L} f(r_\ell \mid A_{\ell-1} - r_\ell)$. (by Eqn. 5)

Note that the second-to-last inequality follows by submodularity and the fact that $A_{\ell-1} - r_\ell = A_\ell - s_\ell \subseteq A_\ell$, due to the relation in Equation (3).
Denote now by I the set of elements inserted between $e_\tau$ and $e_t$, i.e., $I = V_t \setminus V_\tau$, and by A the set of all the elements that were, at some point, in the solution between time τ and t: $A = \bigcup_{\ell=0}^{L} A_\ell$. It is possible to relate the value of $S_t$ to that of the elements in I and A.

Claim 5.4. The following inequality holds:

$f(I \cup A) \le (1 + 4\varepsilon) f(S_t) + \sum_{\ell=1}^{L} f(r_\ell \mid A_{\ell-1} - r_\ell)$.

Proof of Claim 5.4. Consider any element g in $I \cup A$. There are three cases: either g belongs to $S_t$, or g was added to the solution but later swapped out, or g failed the swapping condition in line 6 upon insertion. Now, sort these elements according to the order in which they were discarded by the algorithm: $(I \cup A) \setminus S_t = \{g_1, \dots, g_J\}$ (an element of $I \setminus A$ is discarded upon insertion, while an element of $A \setminus (I \cup S_t)$ is discarded when it gets swapped out of the solution). For simplicity, denote by $G_j = \{g_1, \dots, g_{j-1}\}$ the set of the first j − 1 such elements. We have the following two facts: (i) if $g_j \in I \setminus A$, then $g_j = e_{t'}$ for some $t' \in \{\tau, \dots, t\}$, and $S_{t'-1} \subseteq S_t \cup G_j$; (ii) if $g_j \in A \setminus (I \cup S_t)$, then $g_j = r_\ell$ for some $\ell \in \{1, \dots, L\}$, and $A_{\ell-1} - r_\ell \subseteq S_t \cup G_j$. Exploiting these two facts and submodularity, we have the following chain of inequalities:

$f(I \cup A) \le f(S_t) + \sum_{j=1}^{J} f(g_j \mid S_t \cup G_j)$
$\le f(S_t) + \sum_{e_{t'} \in I \setminus (S_t \cup A)} f(e_{t'} \mid S_{t'-1}) + \sum_{\ell=1}^{L} f(r_\ell \mid A_{\ell-1} - r_\ell)$
$\le f(S_t) + \sum_{e_{t'} \in I \setminus (S_t \cup A)} \frac{\phi}{k} f(S_{t'-1}) + \sum_{\ell=1}^{L} f(r_\ell \mid A_{\ell-1} - r_\ell)$
$\le (1 + 4\varepsilon) f(S_t) + \sum_{\ell=1}^{L} f(r_\ell \mid A_{\ell-1} - r_\ell)$.

The second inequality holds by submodularity and the two facts above; the third holds because each such $e_{t'}$ failed the swapping condition in line 6 upon insertion; the last one follows by observing that the sequence of values $f(S_{t'})$ is non-decreasing, that there are at most 2εk elements in $I \setminus (S_t \cup A)$, and that $\phi \in (1, 2)$.

Another useful property of the auxiliary sets $A_\ell$ is that they provide a clean way to formalize the fact that adding new elements to the solution multiplicatively improves its value.

Claim 5.5. The following inequality holds:

$f(S_t) \ge \left(1 + \frac{\phi - 1}{k}\right)^{L} f(S_\tau)$.

Proof of Claim 5.5. Consider the generic consecutive terms ℓ − 1 and ℓ, for ℓ = 1, . . . , L. Rearranging Equation (4), we have the following:

$f(A_\ell) = f(s_\ell \mid A_{\ell-1}) - f(r_\ell \mid A_\ell) + f(A_{\ell-1})$
$\ge f(s_\ell \mid A_{\ell-1}) - f(r_\ell \mid A_{\ell-1} - r_\ell) + f(A_{\ell-1})$
$\ge \left(1 - \frac{1}{\phi}\right) f(s_\ell \mid A_{\ell-1}) + f(A_{\ell-1})$ (by Eqn. 5)
$\ge \left(1 + \frac{\phi - 1}{k}\right) f(A_{\ell-1})$,

where the first inequality follows by submodularity and the relation in Equation (3), while the last one holds because an element is added to the solution only if its marginal contribution is at least a φ/k fraction of $f(A_{\ell-1})$. Applying the above argument iteratively from $S_t = A_L$ down to $S_\tau = A_0$ yields the desired result.

We now have all the ingredients to address the crux of the proof. There are two cases, which we analyze separately: either $S_\tau$ is a local optimum, or it is not.

$S_\tau$ is a local optimum. If $S_\tau$ is a local optimum, then all the elements of $\mathrm{OPT}_t$ that arrived before $e_\tau$, i.e., $\mathrm{OPT}_t \cap V_\tau$, have low marginal contribution with respect to $S_\tau$. Formally, we have the following result.

Claim 5.6. If $S_\tau$ is a local optimum, then

$(1 + 4\varepsilon) f(S_t) + \sum_{\ell=1}^{L} f(r_\ell \mid A_{\ell-1} - r_\ell) \ge f(\mathrm{OPT}_t) - \phi f(S_\tau)$.

Proof of Claim 5.6. To prove this result, it suffices to argue that the right-hand side of the inequality in the statement is at most $f(I \cup A)$, as it is then possible to conclude by combining this with Claim 5.4. We have the following:

$f(\mathrm{OPT}_t) - f(I \cup A) \le f(\mathrm{OPT}_t \mid I \cup A)$ (by monotonicity)
$= f(\mathrm{OPT}_t \cap V_\tau \mid I \cup A)$ (since $I = V_t \setminus V_\tau$)
$\le \sum_{e \in \mathrm{OPT}_t \cap V_\tau} f(e \mid S_\tau)$
$\le \phi f(S_\tau)$. ($S_\tau$ local optimum)

Note that the second inequality holds by submodularity, as $S_\tau$ is contained in A. Reordering the terms of the inequality yields the desired lower bound on $f(I \cup A)$.

Summing the inequality in Claim 5.6 with φ times the inequality in Claim 5.3 yields the desired bound, thus concluding the argument for the first case:

$(1 + \phi + 4\varepsilon) f(S_t) \ge f(\mathrm{OPT}_t) + \left[\phi(\phi - 1) - 1\right] \sum_{\ell=1}^{L} f(r_\ell \mid A_{\ell-1} - r_\ell) = f(\mathrm{OPT}_t)$.

In the previous inequality we crucially used the definition of the golden ratio as the positive solution of $\phi^2 - \phi - 1 = 0$.

$S_\tau$ is not a local optimum. If $S_\tau$ is not a local optimum, then L, the total number of swaps between insertions $e_\tau$ and $e_t$, is at least $\varepsilon k N$, where N is defined in the pseudocode as $\lceil \frac{1}{\varepsilon} \log_\phi \frac{12}{\varepsilon} \rceil$. Combining this with Claim 5.5 we get:

$f(S_t) \ge f(S_\tau) \left(1 + \frac{\phi - 1}{k}\right)^{\varepsilon k N} \ge f(S_\tau) \, \phi^{\log_\phi (12/\varepsilon)} = \frac{12}{\varepsilon} f(S_\tau) \ge \frac{12}{\varepsilon (1 + \phi + 9\varepsilon)} f(\mathrm{OPT}_\tau)$,

where the second inequality uses that $(1 + \frac{\phi - 1}{k})^k \ge \phi$ for every $k \ge 1$, and in the last inequality we crucially used the inductive assumption.
By rearranging, and using that ε ∈ (0, 1) and φ ∈ (1, 2), we get the following simple relation, which shows that the value of the elements arrived up to time τ can be safely ignored:

$f(\mathrm{OPT}_\tau) \le \varepsilon f(S_t)$. (6)

We now have all the ingredients to handle this last case:

$f(\mathrm{OPT}_t) \le f(\mathrm{OPT}_t \cap V_\tau) + f(I)$ (by submodularity)
$\le f(\mathrm{OPT}_\tau) + f(I)$ (by optimality of $\mathrm{OPT}_\tau$)
$\le \varepsilon f(S_t) + f(I)$ (by Equation 6)
$\le \varepsilon f(S_t) + f(I \cup A)$ (by monotonicity)
$\le (1 + 5\varepsilon) f(S_t) + \sum_{\ell=1}^{L} f(r_\ell \mid A_{\ell-1} - r_\ell)$ (by Claim 5.4)
$\le \left(1 + \frac{1}{\phi - 1} + 9\varepsilon\right) f(S_t)$ (by Claim 5.3)
$= (1 + \phi + 9\varepsilon) f(S_t)$,

where in the last equality we used the definition of φ as the golden ratio. This last case concludes the proof.

6. Experiments

In this section we evaluate the performance of our two algorithms on real-world data sets. (The code of the experiments is available at https://github.com/fedefusco/Consistent-Submodular.) We report here three case studies, deferring other (qualitatively analogous) results, as well as further implementation details, to Appendix C. We present additional experimental results that illustrate the gains in worst-case approximation guarantee in Appendix B. As benchmarks we consider the SWAPPING algorithm, which provides a 4-approximation and is 1-consistent, and the SIEVE-STREAMING algorithm, a (2 + ε)-approximation that is not consistent (see Appendix A for further details on the instability of the algorithm).

Influence Maximization. For our first case study we consider the problem of influence maximization on a social network graph (e.g., Norouzi-Fard et al., 2018; Halabi et al., 2020), where the goal is to maintain a subset of the nodes that influences the rest of the graph. In such an application, consistency is crucial, as changing nodes may entail costs related to terminating and issuing new contracts. We use the Facebook dataset from McAuley & Leskovec (2012), which consists of 4039 nodes V and 88234 edges E; as a measure of influence we consider the monotone and submodular dominating function:

$f(S) = |\{v \in V : \exists\, s \in S \text{ with } (s, v) \in E\}|$.

[Figure 1: Experimental Results. The first row reports the objective values, the second row the cumulative consistency, for SIEVE-STREAMING, SWAPPING, CHASING-LOCAL-OPT, and ENCOMPASSING-SET. Panels: (a, d) Influence Maximization on Facebook; (b, e) k-medoid Clustering on Run In Rome; (c, f) Log-Det Maximization on Run In Rome.]

Summarizing Geolocation Data. Our second and third case studies concern the problem of maintaining a stable and representative summary of a stream of geographical coordinates (e.g., Mirzasoleiman et al., 2017; Dütting et al., 2022). We use the Run In Rome dataset (Fusco, 2022), which contains 8425 positions recorded by running activities in Rome, Italy. We consider two different objective functions used in geographical data summarization: the k-medoid and the kernel log-det. Consider the k-medoid function on the metric set (V, d):

$L(S) = \frac{1}{|V|} \sum_{v \in V} \min_{e \in S} d(e, v)$.

By introducing an auxiliary point $e_0 \in V$ we can turn L into a monotone submodular function (Mirzasoleiman et al., 2013): $f(S) = L(e_0) - L(S + e_0)$. In our experiments we set $e_0$ to be the first point of each dataset.
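As an illustration, here is a minimal Python version of this construction (the two-dimensional points and the Euclidean distance are placeholders for the geographic coordinates and metric used in the experiments).

```python
# A minimal sketch of the k-medoid-based objective used in the geolocation
# experiments: f(S) = L(e0) - L(S + e0), which is monotone and submodular
# (Mirzasoleiman et al., 2013). Points and distance are illustrative.

import math

def make_kmedoid_objective(points, e0):
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])  # Euclidean surrogate

    def L(S):
        return sum(min(dist(e, v) for e in S) for v in points) / len(points)

    def f(S):
        return L({e0}) - L(set(S) | {e0})
    return f

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
f = make_kmedoid_objective(points, e0=points[0])
assert f(set()) == 0.0  # normalized: the empty summary has no value
```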
For the second objective, consider a kernel matrix K that depends on the pairwise distances between the points, i.e., $K_{i,j} = \exp\{-d(i,j)^2 / h^2\}$, where d(i, j) denotes the distance between the i-th and the j-th point in the dataset and h is a constant. Following Krause & Golovin (2014), another common monotone submodular objective is

$f(S) = \log \det(I + \alpha K_{S,S})$,

where I is the |S|-dimensional identity matrix, $K_{S,S}$ is the principal sub-matrix of K indexed by the entries of S, and α is a regularization parameter (that we set to 10 in the experiments).
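A corresponding sketch of the log-det objective (using NumPy; the bandwidth h = 1 and the toy points are illustrative assumptions) is the following.

```python
# A minimal sketch of the kernel log-det objective, f(S) = log det(I + a K_SS),
# with an RBF kernel on pairwise distances; alpha = 10 as in the experiments.

import numpy as np

def make_logdet_objective(points, h=1.0, alpha=10.0):
    pts = np.asarray(points)
    sq = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / h**2)  # K[i, j] = exp(-d(i, j)^2 / h^2)

    def f(S):
        idx = sorted(S)  # S is a set of point indices
        if not idx:
            return 0.0
        K_SS = K[np.ix_(idx, idx)]
        sign, logdet = np.linalg.slogdet(np.eye(len(idx)) + alpha * K_SS)
        return logdet  # sign is +1: I + alpha * K_SS is positive definite
    return f

f = make_logdet_objective([(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)])
assert f(set()) == 0.0 and f({0, 1}) >= f({0})  # normalized and monotone
```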
Experimental Results. In Figure 1, we present the performance of our algorithms and of the benchmarks. The first row (Figures 1a to 1c) reports the objective value of the dynamic solution maintained by the algorithms, while the second row (Figures 1d to 1f) reports the cumulative number of changes in the solutions. The experiments show that our algorithms, ENCOMPASSING-SET and CHASING-LOCAL-OPT, achieve value comparable to SWAPPING and SIEVE-STREAMING, while achieving notable savings in the total number of changes. For instance, in the setting of Figure 1d, SWAPPING is significantly less consistent in aggregate than our algorithms (by around a factor of 25), while SIEVE-STREAMING changes the solution about 3-4 times more often. The superior cumulative consistency of our algorithms is also clear in the other experiments: in the settings of Figure 1e and Figure 1f, SIEVE-STREAMING performs orders of magnitude more changes than either of our algorithms (about 500x and 10x, respectively), while SWAPPING performs between 50% and 100% more. The strict insertion rules implemented by our two algorithms seem to guarantee that only the crucial elements of the dataset are added to the solution. Empirically, this phenomenon induces a desirable global stability over the entire stream, which goes beyond the theoretical per-round guarantees, at the cost of possibly discarding moderately good elements.

7. Conclusion

In this paper, we initiate the study of consistency in submodular maximization. Consistency is a natural measure of stability of the online solution maintained by an algorithm, and it has been extensively studied for clustering, facility location, and online learning. We present two consistent algorithms, ENCOMPASSING-SET and CHASING-LOCAL-OPT, that exhibit different approximation-consistency trade-offs (3.147 + O(1/k) with 1-consistency vs. 2.619 + O(ε) with Õ(1/ε)-consistency). They both substantially improve on the state of the art (a consistent 4-approximation), moving the approximability boundary closer to the optimal approximation factor, as evidenced by the information-theoretic lower bound of 2 that we prove for any consistent deterministic algorithm. Besides closing the remaining gap in the approximation factor, our work raises several natural and compelling questions. First, the investigation of randomized algorithms may lead to better results, even beyond the lower bound of 2. Second, while some known algorithms already exhibit consistency, the explicit study of consistent algorithms for possibly non-monotone submodular functions and more general constraints (e.g., matroids and knapsacks) may lead to further results.

Impact Statement

This work is theoretical in nature and does not have any significant ethical implications.

Acknowledgments

Federico Fusco is supported by the PNRR FAIR (Future Artificial Intelligence Research) project PE0000013, Spoke 5; the ERC Advanced Grant 788893 AMDROMA "Algorithmic and Mechanism Design Research in Online Markets"; the PNRR MUR project IR0000013-SoBigData.it; and the MUR PRIN project "Learning in Markets and Society".

References

Amanatidis, G., Fusco, F., Lazos, P., Leonardi, S., Marchetti-Spaccamela, A., and Reiffenhäuser, R. Submodular maximization subject to a knapsack constraint: Combinatorial algorithms with near-optimal adaptive complexity. In ICML, pp. 231-242, 2021.

Amanatidis, G., Fusco, F., Lazos, P., Leonardi, S., and Reiffenhäuser, R. Fast adaptive non-monotone submodular maximization subject to a knapsack constraint. J. Artif. Intell. Res., 74:661-690, 2022.

Bach, F. R. Structured sparsity-inducing norms through submodular functions. In NIPS, pp. 118-126, 2010.

Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., and Krause, A. Streaming submodular maximization: massive data summarization on the fly. In KDD, pp. 671-680, 2014.

Bairi, R., Iyer, R. K., Ramakrishnan, G., and Bilmes, J. A. Summarization of multi-document topic hierarchies using submodular mixtures. In ACL, pp. 553-563, 2015.

Banihashem, K., Biabani, L., Goudarzi, S., Hajiaghayi, M., Jabbarzade, P., and Monemizadeh, M. Dynamic algorithms for matroid submodular maximization. In SODA, pp. 3485-3533. SIAM, 2024.

Bhattacharya, S., Lattanzi, S., and Parotsidis, N. Efficient and stable fully dynamic facility location. In NeurIPS, 2022.

Chakrabarti, A. and Kale, S. Submodular maximization meets streaming: matchings, matroids, and more. Math. Program., 154:225-247, 2015.

Cohen-Addad, V., Hjuler, N., Parotsidis, N., Saulpic, D., and Schwiegelshohn, C. Fully dynamic consistent facility location. In NeurIPS, pp. 3250-3260, 2019.

Cohen-Addad, V., Lattanzi, S., Maggiori, A., and Parotsidis, N. Online and consistent correlation clustering. In ICML, pp. 4157-4179, 2022.

Das, A. and Kempe, D. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In ICML, pp. 1057-1064, 2011.

Dütting, P., Fusco, F., Lattanzi, S., Norouzi-Fard, A., and Zadimoghaddam, M. Deletion robust submodular maximization over matroids. In ICML, pp. 5671-5693, 2022.

Dütting, P., Fusco, F., Lattanzi, S., Norouzi-Fard, A., and Zadimoghaddam, M. Fully dynamic submodular maximization over matroids. In ICML, volume 202 of Proceedings of Machine Learning Research, pp. 8821-8835. PMLR, 2023.

Feige, U. A threshold of ln n for approximating set cover. J. ACM, 45(4):634-652, 1998.

Fichtenberger, H., Lattanzi, S., Norouzi-Fard, A., and Svensson, O. Consistent k-clustering for general metrics. In SODA, pp. 2660-2678, 2021.

Fusco, F. Run In Rome Dataset, 2022. https://github.com/fedefusco/RunInRome-Dataset.

Golovin, D. and Krause, A. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. J. Artif. Intell. Res., 42:427-486, 2011.

Guo, X., Kulkarni, J., Li, S., and Xian, J. Consistent k-median: Simpler, better and robust. In AISTATS, pp. 1135-1143, 2021.

Halabi, M. E., Mitrovic, S., Norouzi-Fard, A., Tardos, J., and Tarnawski, J. Fairness in streaming submodular maximization: Algorithms and hardness. In NeurIPS, 2020.

Halabi, M. E., Fusco, F., Norouzi-Fard, A., Tardos, J., and Tarnawski, J. Fairness in streaming submodular maximization over a matroid constraint. In ICML, volume 202 of Proceedings of Machine Learning Research, pp. 9150-9171. PMLR, 2023.
Harper, F. M. and Konstan, J. A. The MovieLens datasets: History and context. ACM Trans. Interact. Intell. Syst., 5(4):19, 2016.

Jaghargh, M. R. K., Krause, A., Lattanzi, S., and Vassilvitskii, S. Consistent online optimization: Convex and submodular. In AISTATS, pp. 2241-2250, 2019.

Kaggle. Uber pickups in New York City, 2020. https://www.kaggle.com/fivethirtyeight/uber-pickups-in-new-york-city.

Kazemi, E., Mitrovic, M., Zadimoghaddam, M., Lattanzi, S., and Karbasi, A. Submodular streaming in all its glory: Tight approximation, minimum memory and low adaptive complexity. In ICML, pp. 3311-3320, 2019.

Krause, A. and Golovin, D. Submodular function maximization. In Tractability, pp. 71-104. Cambridge University Press, 2014.

Lattanzi, S. and Vassilvitskii, S. Consistent k-clustering. In ICML, pp. 1975-1984, 2017.

Lattanzi, S., Mitrovic, S., Norouzi-Fard, A., Tarnawski, J., and Zadimoghaddam, M. Fully dynamic algorithm for constrained submodular optimization. In NeurIPS, 2020.

Lin, H. and Bilmes, J. A. A class of submodular functions for document summarization. In ACL, pp. 510-520, 2011.

McAuley, J. J. and Leskovec, J. Learning to discover social circles in ego networks. In NIPS, pp. 548-556, 2012.

Mirzasoleiman, B., Karbasi, A., Sarkar, R., and Krause, A. Distributed submodular maximization: Identifying representative elements in massive data. In NIPS, pp. 2049-2057, 2013.

Mirzasoleiman, B., Karbasi, A., and Krause, A. Deletion-robust submodular maximization: Data summarization with "the right to be forgotten". In ICML, volume 70 of Proceedings of Machine Learning Research, pp. 2449-2458. PMLR, 2017.

Mitrovic, S., Bogunovic, I., Norouzi-Fard, A., Tarnawski, J., and Cevher, V. Streaming robust submodular maximization: A partitioned thresholding approach. In NIPS, pp. 4557-4566, 2017.

Monemizadeh, M. Dynamic submodular maximization. In NeurIPS, 2020.

Nemhauser, G. L., Wolsey, L. A., and Fisher, M. L. An analysis of approximations for maximizing submodular set functions I. Math. Program., 14:265-294, 1978.

Norouzi-Fard, A., Tarnawski, J., Mitrovic, S., Zandieh, A., Mousavifar, A., and Svensson, O. Beyond 1/2-approximation for submodular maximization on massive data streams. In ICML, pp. 3826-3835, 2018.

Troyanskaya, O. G., Cantor, M. N., Sherlock, G., Brown, P. O., Hastie, T., Tibshirani, R., Botstein, D., and Altman, R. B. Missing value estimation methods for DNA microarrays. Bioinform., 17(6):520-525, 2001.

Zheng, J., Jiang, Z., Chellappa, R., and Phillips, P. J. Submodular attribute selection for action recognition in video. In NIPS, pp. 1341-1349, 2014.

Łącki, J., Haeupler, B., Grunau, C., Jayaram, R., and Rozhoň, V. Fully dynamic consistent k-center clustering. In SODA, pp. 3463-3484, 2024.

A. Instability of known algorithms

We propose here two instances that highlight the instability of known algorithms. The instance in Example A.1 is such that both the optimal solution and the output of the greedy algorithm (Nemhauser et al., 1978) change entirely after every insertion. We then briefly discuss, in Example A.2, a simple instance that forces the SIEVE-STREAMING algorithm (Badanidiyuru et al., 2014) and its modified version SIEVE-STREAMING++ (Kazemi et al., 2019) to behave in a non-consistent way.
Example A.1. Let δ ∈ (0, 1) be a small parameter used to break ties, and consider the following weighted covering instance, parameterized by an integer i and cardinality constraint k. The base set E is given by the pairs $\{(a, b) : a, b \in \{0, \dots, i\}\}$. We refer to each pair (a, b) as an item. All items have unit weight, except for the following:

$w_{(0,0)} = 0$, $\quad w_{(a,0)} = \delta \cdot (2a + 1)$ for $a \ne 0$, $\quad w_{(0,b)} = \delta \cdot 2b$ for $b \ne 0$.

The weighted covering function is monotone submodular and is defined as follows:

$f(S) = \sum_{(a,b) \,:\, \exists s \in S,\ (a,b) \in s} w_{(a,b)}$.

Note that f is defined over subsets of E, not over items. The subsets of E we consider in our instance are the rows and columns of E: $R_a$ is defined as $\{(a, 0), \dots, (a, i)\}$, while $C_b$ is defined as $\{(0, b), \dots, (i, b)\}$. The stream is constructed as follows: $C_1, R_1, C_2, R_2, \dots, C_\ell, R_\ell, \dots, C_i, R_i$. Consider now what happens after 2k insertions. The optimal solution (which is also the output of running greedy on the elements arrived so far) is as follows: if the last arrived element is a row, then the optimal solution is given by the last k arrived rows; conversely, if the last arrived element is a column, then the optimal solution is given by the last k arrived columns. This means that the k elements in the dynamic solution change after each insertion! Note that the items in the first row and first column are only there for tie-breaking.

Example A.2. The SIEVE-STREAMING algorithm lazily maintains a set of geometrically increasing active thresholds (of the form $\tau = (1 + \varepsilon)^j$, for $j \in \mathbb{Z}$ and input parameter ε) and a candidate solution for each of them; it then outputs the best of these candidates. In particular, when a new element $e_t$ arrives whose value is far larger than all previous ones, a new threshold is activated and the corresponding candidate solution $S_\tau$ is initialized ($S_\tau = \{e_t\}$). It is then clear that any instance characterized by elements with dramatically increasing values forces the algorithm to continuously change its solution. For instance, consider an additive function with $f(e_t) = 2^t$: after each insertion, the solution output by SIEVE-STREAMING would be the singleton $\{e_t\}$. Playing with similar arguments, it is not hard to construct an instance that completely changes solution every k insertions (e.g., $f(e_t) = k^{\lceil t/k \rceil}$).
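To see the phenomenon numerically, the following self-contained simulation runs a simplified SIEVE-style thresholding scheme (a stand-in for SIEVE-STREAMING written by us, not its exact pseudocode) on a stream with $f(e_t) = 2^t$; the reported solution changes at every single insertion.

```python
# A small simulation of the instability described in Example A.2, using a
# simplified SIEVE-style scheme: one candidate solution per active threshold,
# and an element joins a candidate if its value clears the threshold's
# per-slot quota. With additive values f(e_t) = 2^t, the best candidate
# (and hence the output) changes at every insertion.

EPS, K = 0.1, 3

def sieve_outputs(values):
    candidates = {}  # threshold tau -> candidate solution (list of values)
    outputs = []
    for v in values:
        tau = 1.0
        while tau <= 2 * K * v:      # activate thresholds up to the new max
            candidates.setdefault(tau, [])
            tau *= 1 + EPS
        for tau, S in candidates.items():
            if len(S) < K and v >= tau / (2 * K):  # per-slot quota for tau
                S.append(v)
        best = max(candidates.values(), key=sum)
        outputs.append(list(best))   # snapshot of the reported solution
    return outputs

outs = sieve_outputs([2.0 ** t for t in range(1, 12)])
print([round(sum(S), 1) for S in outs])  # best candidate shifts every step
```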
B. The analysis of SWAPPING is tight

The SWAPPING algorithm is known to provide a 4-approximation to the optimum (Chakrabarti & Kale, 2015). In this section we first report the pseudocode for completeness, and then prove that the analysis is tight, meaning that for any ε ∈ (0, 1) there exists an instance of the problem where the solution computed by SWAPPING is at least a (4 − ε) factor away from the optimal one (Example B.1). Finally, in Figure 2 we report the empirical performance of SWAPPING, SIEVE-STREAMING, and our algorithms on such a hard instance.

Algorithm 4 SWAPPING
1: Environment: stream π of elements, function f, cardinality k
2: S ← ∅
3: for each new arriving element e from π do
4:   w(e) ← f(e | S)
5:   if |S| < k then
6:     S ← S + e
7:   else
8:     se ← argmin{w(y) | y ∈ S}
9:     if 2 · w(se) ≤ w(e) then
10:      S ← S − se + e
11: Return S

Example B.1. Fix any ε ∈ (0, 1), and consider the following weighted covering instance, parameterized by an integer i that we set later and the cardinality constraint $k = 2^i$. The set of items is $E = \{e^j_\ell \mid j \in \{0, \dots, i\},\ \ell \in \{1, \dots, k\}\}$. Consider the partition of E into i + 1 bundles of items $E_0, E_1, \dots, E_i$, where each bundle has k items: $E_j = \{e^j_1, \dots, e^j_k\}$. Let δ > 0 be a small positive constant, which we will also set later. The weight of the generic item $e^j_\ell$ of E is $w^j_\ell = 2^j$ if $j \ne i$, and $w^i_\ell = 2^i - \delta$ otherwise. Now that we have the auxiliary set E, we can define the stream π of subsets of E as follows. For $0 \le j < i$, let $\pi_j$ be the subsequence $\{e^j_1\}, \dots, \{e^j_k\}, E_j$. Let $\pi_i$ be the subsequence $\{e^i_1\}, \dots, \{e^i_k\}$ (without bundle $E_i$ at the end). Then π is given by the concatenation of $\pi_0, \pi_1, \dots, \pi_i$.

The behaviour of SWAPPING on π is then clear: it maintains in the solution the last k singletons that arrived up to bundle $E_{i-1}$, and it ignores the elements of $E_i$ (because of the small δ). In particular, at the end of the stream it outputs the solution $S = \{\{e^{i-1}_1\}, \dots, \{e^{i-1}_k\}\}$, of value

$f(S) = \sum_{\ell=1}^{k} w^{i-1}_\ell = k \cdot 2^{i-1}$.

Consider now the solution $S^*$ given by the i bundles $E_0, \dots, E_{i-1}$ and k − i singletons from the last bundle, e.g., $\{e^i_1\}, \dots, \{e^i_{k-i}\}$, of value

$f(S^*) = \sum_{j=0}^{i-1} k \cdot 2^j + (k - i)(2^i - \delta) \ge k (2^{i+1} - 1) - k\delta - i \cdot 2^i$.

Note that $S^*$ is indeed the optimal solution because of our choice of $k = 2^i$: the total weight of the items in $E_0$ is k, while a single item of $E_i$ has weight $2^i - \delta$. We can now focus on the approximation factor:

$\frac{f(S^*)}{f(S)} \ge 4 - 2^{1-i}(\delta + 1 + i)$.

The negative term goes to zero as i goes to infinity (and δ is small enough); thus, for any fixed precision ε, it is possible to set i and δ so that $f(S^*)/f(S) \ge 4 - \varepsilon$.

[Figure 2: Experimental results on the weighted covering instance of Example B.1, for δ = 0.01, i = 7, and ε = 0.1. Panel (a) reports the objective value, panel (b) the cumulative consistency, for SIEVE-STREAMING, SWAPPING, CHASING-LOCAL-OPT, and ENCOMPASSING-SET.]

C. Further Experimental Results

In our experiments, we set ε = 0.1 in SIEVE-STREAMING and CHASING-LOCAL-OPT, while the cardinality constraint k is consistently set to 20. The order of the stream of elements is the one intrinsic to the dataset considered. In Figure 3 we report three extra experimental case studies. Besides studying the k-medoid and log-det objectives on a random sample (10332 points) from the Uber pickups dataset (Kaggle, 2020) (see the last two columns of Figure 3 for the results), we present results for Personalized Movie Recommendation (first column of Figure 3).

[Figure 3: Further Experimental Results. The first row reports the objective values, the second row the cumulative consistency, for SIEVE-STREAMING, SWAPPING, CHASING-LOCAL-OPT, and ENCOMPASSING-SET. Panels: (a, d) MovieLens dataset; (b, e) k-medoid Clustering on Uber dataset; (c, f) Log-Det Maximization on Uber dataset.]

Personalized Movie Recommendation. Movie recommendation systems are a common experimental testbed in the context of submodular maximization (e.g., Amanatidis et al., 2021; Dütting et al., 2022; Halabi et al., 2023). In this experiment, we have a large collection M of movies that arrive online, and we want to design a recommendation system that proposes movies to users.
For example, the summary may be a carousel of recommended movies presented to a downstream user, and we would like the selection to be fairly stable. We use the MovieLens 1M database (Harper & Konstan, 2016), which contains 1000209 ratings of 3900 movies by 6040 users. Based on the ratings, it is possible to associate to each movie m (respectively, each user u) a feature vector $v_m$ (respectively, $v_u$). More specifically, we complete the users-movies rating matrix and then extract the feature vectors using a singular value decomposition, retaining the first 30 singular values (Troyanskaya et al., 2001). Following the literature (e.g., Mitrovic et al., 2017), we measure the quality of a set of movies S with respect to user u (identified by their feature vector $v_u$) using the following monotone submodular objective function:

$f_u(S) = (1 - \alpha) \sum_{s \in S} \langle v_u, v_s \rangle_+ + \alpha \sum_{m \in M} \max_{s \in S} \langle v_m, v_s \rangle_+$,

where $\langle a, b \rangle_+$ denotes the positive part of the scalar product. The first term is linear and sums the predicted scores of user u (chosen as a random point in $[0, 1]^{30}$ in our experiments) for the movies in S, while the second term has a facility-location structure and is a proxy for how well S covers the whole collection of movies. Finally, the parameter α balances the trade-off between the two terms; in our experiments it is set to 0.95.
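For reference, a minimal NumPy sketch of $f_u$ (with random feature vectors standing in for the SVD features, and the illustrative collection size of 100 movies) reads as follows.

```python
# A minimal sketch of the personalized movie-recommendation objective
# f_u(S) = (1-a) * sum_{s in S} <v_u, v_s>_+ + a * sum_{m in M} max_{s in S} <v_m, v_s>_+
# with random feature vectors standing in for the MovieLens SVD features.

import numpy as np

rng = np.random.default_rng(0)
movies = rng.random((100, 30))   # v_m for each movie m (illustrative stand-in)
user = rng.random(30)            # v_u, a random point in [0, 1]^30
ALPHA = 0.95

def f_u(S):
    if not S:
        return 0.0
    V_S = movies[sorted(S)]                       # feature vectors of S
    linear = np.maximum(user @ V_S.T, 0.0).sum()  # sum of <v_u, v_s>_+
    coverage = np.maximum(movies @ V_S.T, 0.0).max(axis=1).sum()
    return (1 - ALPHA) * linear + ALPHA * coverage

assert f_u({1, 2}) >= f_u({1})  # monotone: adding movies never hurts
```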