# deletion_robust_nonmonotone_submodular_maximization_over_matroids__ca71d376.pdf

Journal of Machine Learning Research 26 (2025) 1-28 Submitted 9/23; Published 3/25

Deletion Robust Non-Monotone Submodular Maximization over Matroids

Paul D utting duetting@google.com Google Research, Z urich, Switzerland

Federico Fusco fuscof@diag.uniroma1.it Department of Computer, Control and Management Engineering Antonio Ruberti Sapienza University of Rome Rome, Italy

Silvio Lattanzi silviol@google.com Google Research, Barcelona, Spain

Ashkan Norouzi-Fard ashkannorouzi@google.com Google Research, Z urich, Switzerland

Morteza Zadimoghaddam zadim@google.com Google Research, Z urich, Switzerland

Editor: Pradeep Ravikumar

We study the deletion robust version of submodular maximization under matroid constraints. The goal is to extract a small-size summary of the data set that contains a high-value independent set even after an adversary deletes some elements. We present constant-factor approximation algorithms, whose space complexity depends on the rank k of the matroid, the number d of deleted elements, and the input precision ε. In the centralized setting we present a (4.494 + O(ε))-approximation algorithm with summary size O( k+d

ε ) that improves to a (3.582 + O(ε))-approximation with O(k + d

ε ) summary size when the objective is monotone. In the streaming setting we provide a (9.294 + O(ε))-approximation algorithm with summary size and memory O(k + d

ε ); the approximation factor is then improved to (5.582 + O(ε)) in the monotone case. Keywords: Submodular maximization, streaming algorithms, deletion robust, approximation algorithms, randomized algorithms

1. Introduction

Submodular maximization is a fundamental problem in machine learning that encompasses a broad range of applications, including active learning (Golovin and Krause, 2011) sparse reconstruction (Bach, 2010; Das and Kempe, 2011; Das et al., 2012), video analysis (Zheng et al., 2014), and data summarization (Lin and Bilmes, 2011; Bairi et al., 2015). Given a submodular function f, a universe of elements V , and a family F 2V of feasible subsets of V , the optimization problem consists in ﬁnding a set S F that maximizes f(S). A natural choice for F are capacity constraints (a.k.a. k-uniform matroid constraints) where any subset S of V of size at most k is feasible. Another standard restriction, which

c 2025 Paul D utting, Federico Fusco, Silvio Lattanzi, Ashkan Norouzi-Fard, and Morteza Zadimoghaddam.

License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v26/23-1219.html.

D utting, Fusco, Lattanzi, Norouzi-Fard and Zadimoghaddam

generalizes capacity constraints and naturally comes up in various settings, is matroid constraints. For example, consider a movie recommendation application, where, given a large corpus of movies, we want to produce a set of recommended videos that contains at most one movie from each genre. This can be modeled as a partition matroid.

Exact submodular maximization is NP-hard, but eﬃcient constant-factor approximation algorithms exist in both the centralized and in the streaming setting (e.g., Fisher et al., 1978; C alinescu et al., 2011; Chakrabarti and Kale, 2015; Feldman et al., 2018; Buchbinder and Feldman, 2024).

In this work, we design algorithms for submodular maximization over matroids that are robust to deletions. Primary motivations for considering deletions in the applications are privacy and user preferences, as users may exert their right to be forgotten or may update their preferences and thus exclude some of the data points. For instance, in the earlier movie recommendation example, a user may mark some of the recommended videos as seen or inappropriate , and we may wish to quickly update the list of recommendations accordingly.

1.1 The Deletion Robust Approach

Following Mitrovic et al. (2017), we model robustness to deletion as a two-phase game against an adversary. In the ﬁrst phase, the algorithm receives a robustness parameter d and chooses subset W V as a (possibly randomized) summary of the whole data set V , which can be given either oﬄine or in streaming. Concurrently, an adversary selects a subset D V with |D| d. In the second phase, the adversary reveals D, and the algorithm determines a feasible solution in the portion of the summary that has not been deleted. The goal of the algorithm is to extract a feasible subset of W \ D that is competitive with the optimal solution in V \ D. Natural performance metrics in this model are the algorithm s approximation guarantee and its space complexity as measured by the size of the set W. We study an oblivious adversary, whose choice of the deleted elements D does not depend on the realized W, but possibly on the structure of the algorithm (not on its random bits). This is a natural assumption in the applications. Consider the movie recommendation example: the fact that a movie has already been watched by the user or that it is deemed inappropriate is independent of whether the speciﬁc ﬁlm has been selected or not in the summary.

In this model, to obtain any constant-factor approximation, the summary size has to be Ω(k + d), even when f is additive and the constraint is a k-uniform matroid. To see this, consider the case where exactly k + d the elements have unitary weight, and the remaining elements have weight zero. The adversary selects d of the valuable elements for deletion, but the algorithm does not know which. To be protected against any possible choice of the adversary, the best strategy of the algorithm is to choose W uniformly at random from the elements that carry weight. This leads to an expected weight of the surviving elements of k multiplied by |W|/(k+d), while the optimum is k.

Prior work gave deletion robust algorithms for the special case of k-uniform matroids and monotone objectives. The state-of-the-art is a 2 + O(ε) approximation with O(k + d log k

ε2 ) space in the centralized setting, while in the streaming setting the same approximation is achievable at the cost of an extra multiplicative factor of O(log k

ε ) in space complexity (Kazemi et al., 2018).

Deletion Robust Non-Monotone Submodular Maximization over Matroids

Reference Objective Model Approximation Summary size

Corollary 9 (this work) Non-Monotone Centralized 4.494 + O(ε) O k+d

Corollary 10 (this work) Monotone Centralized 3.582 + O(ε) O k + d ε2 log k

Zhang et al. (2022) Monotone Centralized 3.582 + O(ε) O k + d

Mirzasoleiman et al. (2017) Non-Monotone Streaming 5.205 O(kd) Theorem 12 (this work) Non-Monotone Streaming 9.294 + O(ε) O k + d ε2 log k

Mirzasoleiman et al. (2017) Monotone Streaming 3.147 O(kd) Corollary 15 (this work) Monotone Streaming 5.582 + O(ε) O k + d ε2 log k

Zhang et al. (2022) Monotone Streaming 4 O(k + d)

Table 1: The table summarizes the results for deletion robust submodular maximization with matroid constraints. Note that the O hides terms poly-logarithmic in the rank k. The approximation guarantees of Mirzasoleiman et al. (2017) are obtained via the state-of-the-art streaming algorithms for submodular maximization with matroid constraints (Buchbinder and Feldman (2019) and Feldman et al. (2022)). Zhang et al. (2022) is a follow-up paper.

For general matroids, Mirzasoleiman et al. (2017) give a black-box reduction that adapts (non-robust) submodular maximization algorithms to the deletion robust setting. Their approach yields a 3.147-approximation algorithm for monotone objectives using Feldman et al. (2022) and a 5.205 approximation for non-monotone objectives via Buchbinder and Feldman (2019). The downside of Mirzasoleiman et al. (2017) is that the space complexity is O(kd).1 The multiplicative factor d is inherent in the construction, and the k is needed by any streaming subroutine, so this approach necessarily leads to a space complexity of Ω(dk). The main open problem from their work is to design space-eﬃcient algorithms for the centralized and streaming problems. These questions are particularly important in many practical scenarios where data sets are large and space is a crucial resource.

1.2 Our Results

We present the ﬁrst constant-factor approximation algorithms for deletion robust submodular maximization subject to general matroid constraints with almost optimal space usage, i.e., our algorithms only use O(k + d) space. More formally, in the centralized setting we present a (4.494 + O(ε))-approximation algorithm with summary size O(k+d

ε), that can be improved to a (3.582+O(ε))-approximation with summary size O(k+ d

ε) if the objective is monotone. In the streaming setting we provide a (9.294 + O(ε))-approximation algorithm with summary size and memory O(k+ d

ε), that becomes a (5.582+O(ε))-approximation if the objective is monotone. We remark that, besides the black box results by Mirzasoleiman et al. (2017) that have a far-from-optimal space usage, we are the ﬁrst to oﬀer a positive result for non-monotone submodular functions in the deletion robust setting, with any type of constraint. All our results are summarized and compared to the existing literature in Table 1. For monotone objectives, the approximation factors in the two cases are 2 + β and 4 + β, where β is e/(e 1) 1.582, i.e., the best-possible approximation guarantee for the standard

1. Where the O notation hides logarithmic factors.

D utting, Fusco, Lattanzi, Norouzi-Fard and Zadimoghaddam

centralized problem (C alinescu et al., 2011; Feige, 1998). For the non-monotone case, we still have a 2 + β, where β 2.494 is the state-of-the-art approximation guarantee for non-robust centralized submodular maximization subject to matroid constraint (Buchbinder and Feldman, 2024); our approximation for the streaming setting also depends on the routine used, but in a more intricate way. We mention that the best approximation factor for submodular maximization subject to matroid constraint is still an open problem: the optimal approximation factor lies between 2.494 (provided by the current best algorithm by Buchbinder and Feldman, 2024), and 2.092 (as per the impossibility result in Gharan and Vondr ak, 2011). Thus, better algorithms may exist and would automatically improve our approximation guarantees. We point out that the state-of-the-art approximation guarantees for (non-robust) submodular maximization with matroid constraint in the streaming model are 3.147-approximation in the monotone case and a 5.205-approximation in the general non-negative, i.e. possibly non-monotone, case (Feldman et al., 2022). The price of robustness is thus just an extra small additive term in the approximation factor, depending on the setting. At the same time, up to possibly a logarithmic factor, the memory requirements of all our results are tight.

1.3 Our Techniques

Intuitively, the extra diﬃculty in obtaining space-eﬃcient robust deletion summaries with general matroid constraints as opposed to cardinality constraints is that the algorithm can only use elements respecting the matroid constraint to replace a good deleted element from a candidate solution. This phenomenon gets ampliﬁed when multiple elements need to be replaced. This contrasts sharply with the special case of k-uniform matroids, where all elements can replace any other element.

The Centralized Setting. Our algorithms start by setting a logarithmic number of value thresholds that span the average contribution of relevant optimum elements, and use these to group together elements with similar marginal values. The candidate solution is constructed using only elements from large enough bundles (at least a factor of 1/ε larger than the number of deletions). Random selection from a large bundle protects the value of the selected solution against adversarial deletions. In the centralized algorithm, it is possible to sweep through the thresholds in decreasing order. This monotonic behavior helps us design a charging mechanism for high-value elements dismissed due to the matroid constraint. We use the matroid structural properties to ﬁnd an injective mapping from the optimum elements rejected by the matroid property to the set of selected elements. This allows us to claim that the marginal value of missed opportunities cannot dominate the values of added elements. Finally, since each ﬁxed element of the base set has a very low probability of being added to the solution (due to the deletion robust sampling), we can handle for free the non-monotonicity of the objective in the analysis via a well-known sampling argument for submodular functions (Feige et al., 2007).

The Streaming Setting. In the streaming setting, elements arrive in an arbitrary order in terms of their membership to various bundles. We can still keep adding elements as long as a large enough bundle exists, but face the problem that elements from lower-value bundles

Deletion Robust Non-Monotone Submodular Maximization over Matroids

may prevent us from selecting some high-value elements due to the matroid constraint. So, when considering a new element, we allow for a swap operation with any of the elements in the solution to maintain feasibility with respect to the matroid constraint. We perform the swap if the marginal value of the new element e is substantially (a constant factor) higher than the marginal value that the element e we are kicking out of the solution had when it was added to the solution. The constant factor gap between marginal values helps us account for not only e but also the whole potential chain of elements that e caused directly or indirectly to be removed during the algorithm. As we repeatedly swap elements in and out of the solution, it no longer holds that each ﬁxed element of the base set has a very low probability of being, at some point, added to the solution. To tackle non-monotonicity, we thus embed a uniform sampling step (as in, e.g., Feldman et al., 2018; Amanatidis et al., 2021, 2022) in the procedure outlined above: any new element is discarded with some ﬁxed probability before even being considered for swapping.

The Deletion Parameter. Finally, a consideration on the deletion robust parameter d. Our algorithms (like those from earlier work) do not need exact knowledge of d; an upper bound yields the same approximation at the cost of a larger summary. Also, with some extra work, it is possible to show that our approximation guarantees degrade gracefully when the estimate of d is too conservative and more elements are deleted. Note that some prior knowledge on d is a reasonable assumption in the applications. Without prior information about d, achieving good approximation and memory eﬃciency is impossible.

1.4 Related Work

Robust submodular optimization has been studied for more than a decade. In Krause et al. (2008), the authors study robustness from the perspective of multiple agents each with its own submodular valuation function; their objective is to select a subset that maximizes the minimum among all the agents valuation functions. This objective could be seen as a max-min fair subset selection. Another robustness setting that deals with multiple valuation functions is distributionally robust submodular optimization (Staib et al., 2019) in which we can access samples from a distribution of valuation functions. These settings are fundamentally diﬀerent from the robustness setting we study in our paper.

Robustness against few deletions. Orlin et al. (2018) and Bogunovic et al. (2017) investigate robustness in the presence of a few deletions. In their model, the algorithm needs to ﬁnalize the solution before the adversarial deletions are revealed and the adversary sees the choices of the algorithm. Therefore, having at least k deletions reduces the value of any solution to zero. This is the most prohibitive deletion robust setting we are aware of in the literature, and not surprisingly, the positive results of Orlin et al. (2018) and Bogunovic et al. (2017) are mostly useful when we are dealing with a few deletions.

Dynamic setting with d deletions. Mirzasoleiman et al. (2017) study submodular maximization and provide a general framework to empower insertion-only algorithms to process deletions on the ﬂy as well as insertions. Their result works on general constraints, including matroids. As a result, they provide dynamic algorithms that process a stream of deletions and insertions with an extra multiplicative overhead of d (on both the computation time and memory footprint) compared to insertion-only algorithms. Their elegant idea

D utting, Fusco, Lattanzi, Norouzi-Fard and Zadimoghaddam

consists in running d + 1 concurrent streaming submodular maximization algorithms where d is the maximum number of deletions. Every element is sent to the ﬁrst algorithm. If it is not selected, it is sent to the second algorithm; if it is not selected again, it is sent to the third algorithm, and so on. With this trick, they maintain the invariant that the solution of one of these algorithms is untouched by the adversarial deletions, and therefore this set of O(dk) elements suﬃce to achieve robust algorithms for matroid constraints. This approach has the drawback of having per update computation time linearly dependent on d (in the worst case, a new element can be processes by each one of the d algorithms), which can be prohibitive for a large number of deletions. It also has a total memory of O(dk), which could be suboptimal compared to the lower bound of Ω(d + k) presented in the Introduction. The gap between this lower bound and the O(dk) memory footprint in Mirzasoleiman et al. (2017) motivates the follow-up works on designing even more memory and computationally eﬃcient algorithms.

Fully-dynamic setting. A closely related line of work investigates submodular maximization on stream of arbitrary insertions and deletions, where the goal is to maintain a good approximation to the dynamic optimum with per update computation time that depends poly-logarithmic in the total number of updates (length of the stream of updates). For cardinality constraint, Lattanzi et al. (2020) and Banihashem et al. (2023) provide a fully polynomial 2-approximation algorithm characterized by poly-logarithmic update time in both n and k, while Monemizadeh (2020) independently design an algorithm with similar approximation guarantee and an update time quadratic in the cardinality constraint with a smaller dependence on logarithmic terms. In subsequent work, D utting et al. (2025) and Banihashem et al. (2024) tackle matroid constraints by designing constant factor approximation algorithms with update time that depends at most logarithmically in n. Note that these algorithms much faster computation time comes at the cost of a potentially much larger memory footprint (up to the whole ground set), making them impractical in the deletion robust setting we study.

Two-phase model. Mitrovic et al. (2017) and Kazemi et al. (2018) are the ﬁrst to study the deletion robust setting we consider in our work. They independently design submodular maximization algorithms for k-uniform matroids. Mitrovic et al. (2017) propose a streaming algorithm that achieves constant competitive ratio with memory O((k+d) polylog(k)). Their results hold also against the adaptive adversary, that is aware of the summary set W before selecting the set D of deleted elements. On the other hand, Kazemi et al. (2018) design centralized and distributed algorithms with constant factor approximation and O(k+d log(k)) memory, as well as a streaming algorithm with 2-approximation factor and memory footprint of O(k log(k) + d log2(k)). We borrow some of their ideas, including bundling elements based on their marginal values and ensuring that elements are added to the solution before deletions only if they are selected uniformly at random from a large enough bundle (pool of candidates). A subsequent paper (Avdiukhin et al., 2019) shows how to obtain algorithms for the case of knapsack constraints with similar memory requirements and constant factor approximation. We aim to the generality of the matroid constraints studied in Mirzasoleiman et al. (2017) while maintaining the almost optimal computation time and space eﬃciency of Mitrovic et al. (2017) and Kazemi et al. (2018).

Deletion Robust Non-Monotone Submodular Maximization over Matroids

Sliding window. Another well-studied robustness model is the sliding window model. In this case, the deletions occur as the algorithm sweeps through the stream of elements; in that sense, they occur regularly rather than in an adversarial manner. Every element is deleted exactly W steps after its arrival. So, at every moment, the most recent W elements are present, and the objective is to select a subset of these present elements. Epasto et al. (2017) design a 3-approximation algorithm for the case of cardinality constraints with memory independent of the window size W. Zhao et al. (2019) provide algorithms that extend the sliding window algorithm to settings where elements have non-uniform lifespans and leave after arbitrary times.

Non-robust literature. Prior to the deletion robust models and motivated by large-scale applications, Mirzasoleiman et al. (2013) design the distributed greedy algorithm and shows it achieves provable guarantees for the cardinality constraint problem under some assumptions. In follow-up work, Mirrokni and Zadimoghaddam (2015) provide core-set frameworks that achieve constant factor approximation in the distributed setting. Barbosa et al. (2016) show how to approach the optimal e/(e 1) approximation guarantee by increasing the round complexity of the distributed algorithm. For the case of matroid constraints, Ene et al. (2019) provide distributed algorithms that achieve the e/(e 1) approximation with a poly-logarithmic number of distributed rounds. For the streaming setting, Feldman et al. (2022) provides a 3.147 competitive ratio algorithm with O(k) memory with a matroid constraint of rank k.

Preliminary version and follow-up. A preliminary version of this work considering only monotone objectives has appeared as D utting et al. (2022). In follow-up work, Zhang et al. (2022) design a simple algorithm for the streaming version of the deletion-robust submodular maximization problem with monotone objective subject to a p-matroid constraint. Their algorithm has optimal space usage of O(k + d) and achieves a 4p approximation in the streaming case. They handle robustness via an elegant non-uniform sampling technique that avoids the extra O(log k) due to keeping the diﬀerent buckets. For more details, see Table 1.

2. Problem formulation and preliminary results

We consider a set function f : 2V R 0 on a ground set V of n elements. Given two sets X, Y V , the marginal gain of X with respect to Y quantiﬁes the change in value of adding X to Y and is deﬁned as f (X | Y ) = f(X Y ) f(Y ) .

When X consists of a singleton x, we use the shorthand f(x | Y ) instead of f({x} | Y ). The function f is called monotone if f (e | X) 0 for any set X V and element e V , and submodular if for any two sets X and Y such that X Y V and any element e V \ Y we have f (e | X) f (e | Y ) . Throughout the paper, we assume that f is given in terms of a value oracle that computes f(S) for given S V , and that f is normalized, i.e., f( ) = 0. We do not assume monotonicity of the function f when not speciﬁed otherwise. We slightly abuse the notation, and for a set X and an element e, use X + e to denote X {e} and X e for X \ {e}.

Matroids. A non-empty family of sets M 2V is called a matroid if it satisﬁes the following properties:

D utting, Fusco, Lattanzi, Norouzi-Fard and Zadimoghaddam

Downward-closure: if A B and B M, then A M;

Augmentation: if A, B M with |A| < |B|, then e B such that A + e M.

We call a set A 2V independent, if A M, and dependent otherwise. An independent set that is maximal with respect to inclusion is called a base; all the bases of a matroid share the same cardinality k, which is referred to as the rank of the matroid. Dually to basis, for any A / M, a circuit C(A) is deﬁned as a minimal dependent subset of A, i.e., C(A) / M such that all its proper subsets are independent. The span or closure of a set A is deﬁned as the set cl(A) of all the elements e V such that r M(A + e) = r M(A)2. Given any matroid M and set S, it is possible to deﬁne two auxiliary matroids:

the contraction of M by S, written M/S, which is the matroid on V \ S with rank function r(T) = r M(T S) r M(S),

the restriction of M to S, which is the matroid M on S whose independent sets are the independent sets of M that are contained in S.

For a comprehensive overview of matroids and their properties, we refer to Schrijver (2003).

2.1 The Deletion Robust Model

The deletion robust model consists of two phases. The input of the ﬁrst phase is the ground set V and a robustness parameter d, while the input of the second phase is an adversarial set of d deleted elements D V , along with the outputs of the ﬁrst phase. The goal is to construct a small size summary W V that is robust to deletions in the ﬁrst phase, and a solution S W \ D that is independent with respect to matroid M in the second phase. The problem s diﬃculty lies in the fact that the summary W has to be robust against any possible choice of set D by an adversary oblivious to the randomness of the algorithm. For any set of deleted elements D, the optimum solution denoted by OPT(V \ D) is

OPT(V \ D) = argmax R V \D,R M f(R).

We say that a two-phase algorithm is an α-approximation for this problem if for all D V s.t. |D| d, it holds that

f(OPT(V \ D)) α E [f(SD)] ,

where SD is the solution produced in Phase II when set D is deleted, and the expectation is with respect to the internal randomization of the algorithm. Besides the approximation factor α, an important feature of a two-phase algorithm is its summary size, i.e., the cardinality of the set W returned by the ﬁrst phase. In this paper, we also consider the streaming version of the problem, where the elements in V are presented in some arbitrary order (Phase I). At the end of the online phase, the algorithm outputs a deletion-robust summary W. Finally, the deleted set D is revealed and

2. We denote with r M the rank function associated to matroid M (we omit the dependence on the matroid if clear from the context). The rank of a generic subset S of V is the cardinality of the largest independent set in S.

Deletion Robust Non-Monotone Submodular Maximization over Matroids

Phase II on W \D takes place oﬄine. The quality of an algorithm for the streaming problem is not only assessed by its approximation guarantee and summary size but also in terms of its memory, i.e., the cardinality of the buﬀer of elements maintained in the online phase.

2.2 Preliminary Results

We conclude this section with three preliminary results: the folklore Sampling Lemma by Buchbinder et al. (2014), a Combinatorial Lemma derived from the well-known Hall s Marriage Theorem, and a graph theoretical result from Feldman et al. (2018). While the ﬁrst lemma is our main tool to address non-monotonicity, the other two results are used for the charging arguments in the analysis of our algorithms. We start with the Sampling Lemma, originally proved by Feige et al. (2011), and that we report as in Lemma 2.2 of Buchbinder et al. (2014). It states that the value of a random set cannot decrease much if no single element has a probability too large to be sampled.

Lemma 1 (Sampling Lemma) Let f : 2V R 0 be a (possibly not normalized) submodular set function, let X V and let X(p) be a random subset, where each element of X appears with probability at most p (not necessarily independent). Then E [f(X(p))] (1 p)f( ) .

We move now to our Combinatorial Lemma. At a high level, it is a general tool that maps injectively elements in a dependent set to those of an independent set under some conditions.

Lemma 2 (Combinatorial Lemma) Consider a matroid M 2V , set F V , and set G M. Suppose that for all x G \ F there exist a set Fx F, Fx M such that Fx + x M. Then there exists a mapping h : G \ F F such that

h(x) Fx for all x G \ F

h(x) = h(y) for all x, y G \ F with x = y.

Intuitively, h as a semi-matching that matches all elements in G \ F to an element in x G\F Fx, while some of the elements in x G\F Fx F can remain unmatched. To prove the lemma, we use the combinatorial version of Hall s Marriage Theorem (Hall, 1935), which concerns set systems and the existence of a transversal (a.k.a. system of distinct representatives). Formally, let S be a family of ﬁnite subsets of a base set X (S may contain the same set multiple times). A transversal is an injective function f : S X such that f(S) S for every set S S. In other words, f selects one representative from each set in S so that no two of these representatives are equal.

Theorem 3 (Hall s Theorem) A family of sets S has a transversal if and only if S satisﬁes the marriage condition: for each subfamily W S, it holds that |W| S F W F .

We are ready to present the proof of our Combinatorial Lemma. Proof of Lemma 2 Let W be any subset of G \ F and deﬁne FW = S x W Fx. We want

to show that |FW | = S x W Fx |W|, this would conclude the proof by Theorem 3.

In order to do that we ﬁrst show that cl(W FW ) = cl(FW ), where cl(.) denotes the closure (or span) of a set. We have that W cl(FW ), in fact, for each element x W, there

D utting, Fusco, Lattanzi, Norouzi-Fard and Zadimoghaddam

Algorithm 1 Centralized Algorithm Phase I

1: Input: Precision ε and deletion parameter d

2: Vd elements with the d largest values.

3: V V \ Vd, A .

4: argmax{f(e) | e V }. Value of (d + 1)st largest element

5: T = n (1 + ε)i | i Z, ε (1+ε)k < (1 + ε)i o

6: for τ T in decreasing order do

7: Bτ {e V | A + e M, f(e | A) τ}

8: while |Bτ| k+d

9: e a random element sampled independently and uniformly from Bτ.

10: V V e, A A + e.

11: Bτ {e V | A + e M, f(e | A) τ}

12: Remove Bτ from V .

13: B Vd S τ T Bτ.

14: return A, B.

exists a subset Fx FW such that Fx + x M and therefore x cl(Fx), which implies, by monotonicity of the closure with respect to the inclusion, that x cl(FW ). We also know that FW cl(FW ), hence FW W cl(FW ). If we apply the closure to both sets, we get

cl(FW W) cl(cl(FW )) = cl(FW ) cl(FW W),

where the equality follows by noting that the closure of the closure is the closure itself. This shows that cl(FW ) and cl(FW W) coincide, as claimed. Now let s look at the matroid M restriction to cl(FW W) = cl(FW ). Afterward, contract this matroid by (F G) cl(FW ) and call the resulting matroid with M . We claim that W is independent in this new matroid M . This is due to W G \ F and W (F G) G and G M. If we call r the rank of the matroid M , we have r M(cl(FW )) r M(cl(FW ) \ (F B)) = r |W|. Finally, by the property of the rank function and of the closure, it holds that |FW | r M(FW ) = r M(cl(FW )). Putting these two chains of inequalities together, we obtain |FW | |W| as claimed.

The last preliminary result we present is Lemma 13 of Feldman et al. (2018). It concerns directed acyclic graphs (DAGs) where the nodes are also elements of a matroid. Under some assumptions, it guarantees the existence of an injective mapping between elements in an independent set and the sinks of the DAG. As a convention, we denote with δ+(u) the out-neighborhood of any node u.

Lemma 4 Consider an arbitrary directed acyclic graph G = (V, E) whose vertices are elements of some matroid M. If every non-sink vertex u of G is spanned by δ+(u) in M, then for every set S of vertices of G which is independent in M there must exist an injective function ψ such that, for every vertex u S, ψ(u) is a sink of G which is reachable from u.

Deletion Robust Non-Monotone Submodular Maximization over Matroids

3. Centralized Algorithm

In this section, we present a centralized algorithm for the Deletion Robust Submodular Maximization problem subject to a matroid constraint. We start by introducing some notations: e denotes the (d + 1)-th element with the largest value according to f, and is its value.

Phase I. Our Phase-I algorithm takes in input a generic precision parameter ε, and uses it to deﬁne the set of relevant thresholds T:

T = (1 + ε)i | i Z, ε (1 + ε)k < (1 + ε)i .

The ﬁrst phase of our algorithm constructs iteratively the summary W using two main sets: a candidate solution A and a reservoir B of good elements. The elements in the reservoir are selected according to their marginal value and are grouped into buckets of similar elements. The algorithm goes over the thresholds in T in decreasing order and updates sets A, B, and V (for loop in the pseudocode). Let τ be the threshold considered at some point, then Bτ contains any element e V such that f(e | A) τ and A + e M (lines 7 and 11). These high-contribution elements can be added to A while maintaining its independence. As long as the size of Bτ is larger than (k+d)/ε (line 8), the algorithm chooses uniformly at random an element from Bτ (line 9), adds it to A (line 10) and recomputes Bτ (line 11). We observe that A is robust to deletions, i.e., when d elements are deleted, the probability of one speciﬁc element in A being deleted is intuitively at most ε. Moreover, since each element added to A is drawn from a pool of elements with similar marginals, the value of this set after the deletions decreases at most by a factor (1 ε) in expectation (this is a very slack bound). As soon as the cardinality of Bτ drops below (k+d)/ε, no more elements can be added directly from it while keeping A robust and feasible. Therefore, we remove these elements from V and save them for Phase II (line 12). During the execution of the algorithm, we need to take special care of the top d element with the highest f values. To avoid complications, we remove them from the instance before starting the procedure and add them to set B at the end. This does not aﬀect the general logic and only simpliﬁes the presentation and the proofs. The pseudocode of the centralized algorithm for Phase I is given in Algorithm 1.

Phase II. The summary W computed at the end of Phase I is composed by the union of A and B and is then passed to the second phase of our algorithm, which uses as routine an arbitrary algorithm Alg for centralized submodular maximization subject to matroid constraint. Alg takes in input a set of elements, function f and matroid M and returns a β-approximate solution (as a convention, we denote with Alg(X) the solution output by the algorithm on input X). In this phase, we simply use Alg to compute a solution among all the elements in A and B that survived the deletion and return the best among the computed solution and the value of the surviving elements in A.

Theorem 5 Consider deletion-robust submodular maximization with matroid constraints in the centralized setting. For any constant δ (0, 1), the Centralized Algorithm (Algorithms 1 and 2) with precision parameter ε (0, δ) is in expectation a (2 + β + O(ε))-approximation algorithm with summary size O(k+d

ε), where β is the approximation factor of Alg.

D utting, Fusco, Lattanzi, Norouzi-Fard and Zadimoghaddam

Algorithm 2 Algorithm Phase II

1: Input: A and B from phase I, set D of deleted elements and optimization routine Alg

2: A A \ D, B B \ D

3: S Alg(A B )

4: return S argmax{f(A ), f( S)}

Proof We start by analyzing the summary size, composed of the sets returned by Algorithm 1 at the end of phase I. Set A belongs to M, so its size is no more than the rank of M: |A| k. Set B is the union of Vd (containing exactly d elements) and S τ T Bτ. Each set Bτ has at most k+d

ε element and there are at most 2

ε such sets (by design of T). Therefore

|A| + |B| k + d + 2(k+d)

We move to the approximation factor. We ﬁx any set D with |D| d and bound the ratio between the expected value of f(S) and f(OPT) (we omit the dependence on D since it is clear from the context). As a ﬁrst step, we relate the value of OPT with that of OPT A. This passage is immediately true when f is monotone but needs some work otherwise.

Lemma 6 It holds that (1 ε)f(OPT) E [f(OPT A)] .

Proof of Lemma 6 Every time a new random element is added to the candidate solution A, it is drawn uniformly at random from a large pool of elements of cardinality at least (k+d)/ε. Denote with xi the ith element added to the solution by the algorithm, by the above consideration, it holds that P (x = xi) ε/(k+d) for any ﬁxed x V and i = 1, . . . k. By union bound, we have the following:

i=1 P (x = xi) ε k k + d ε, x V. (1)

Consider the submodular function g : 2V R 0, deﬁned as g(T) = f(OPT T), for all T V . Applying the Sampling Lemma (Theorem 1) on g, and using Equation (1), yields:

E [f(OPT A)] = E [g(A)] (1 ε)g( ) = (1 ε)f(OPT).

This concludes the proof of the Lemma.

We now partition V \ D into subsets whose elements can be accounted for in a similar way. First, recall that we denoted by B the subset of B surviving the deletion: B = B \ D; it holds that B OPT B . Then, deﬁne L as the set of elements of low value:

L = n e V \ D | f(e | A) ε f(OPT)

Intuitively, L contains all the low-value elements: these elements do not increase the value of the submodular function considerably if added to A (so that we do not lose much by ignoring them). Finally, let F be the surviving elements not added to the summary because

Deletion Robust Non-Monotone Submodular Maximization over Matroids

of feasibility issues; formally, F = V \ (A B L D). We can now decompose the contributions of elements in OPT A using the sets we introduced:

(1 ε)f(OPT) E [f(OPT A)] (by Lemma 6)

E [f(A)] + E f(OPT B | A) (by submodularity)

+E [f(OPT F | A)] + E [f(OPT L | A)] (2)

We study separately the terms in Equation (2). Recall that A is the candidate solution once the deleted elements D are revealed; we use its expected value to bound the ﬁrst term of the inequality. Intuitively, for any element in A, the probability of being deleted is O(ε) since it has been sampled from (d+k)/ε similar elements uniformly at random, and only d elements are deleted. We formalize this idea in the following lemma.

Lemma 7 It holds that E [f(A)] 1+ε

1 ε E [f(A )].

Proof of Lemma 7 The proof is similar to that of Lemma 2 in Kazemi et al. (2017), but without relying on the monotonicity of f. For the sake of the analysis, for each threshold τ T, let Aτ be the set of elements in A that were added during an iteration of the for loop corresponding to threshold τ in Algorithm 1. Moreover, let nτ = |Aτ| and sort the elements in Aτ = {aτ 1, aτ 2, . . . , aτ nτ } according to the order in which they are added to Aτ. We can decompose the value of A using the deﬁnition of marginal value and a telescopic argument:

ℓ=1 f(aτ ℓ| t>τ At {aτ 1, aτ 2, . . . , aτ ℓ 1}).

The elements added in a speciﬁc iteration of the for loop share the same marginal up to an (1 + ε) factor (because they all have marginal contribution larger than the threshold τ of that iteration but did not pass the test with the previous threshold τ (1 + ε)):

τ f(aτ ℓ| t>τAt {aτ 1, aτ 2, . . . , aτ ℓ 1}) τ (1 + ε), τ T, ℓ= 1, . . . , nℓ. (3)

Thus, summing up these contributions, we have

τ T |Aτ| τ f(A) (1 + ε) X

τ T |Aτ| τ. (4)

We decompose the value of A = A \ D similarly. The crucial argument is that any aτ ℓ is drawn uniformly at random from a set of cardinality at least (d+k)/ε where at most d elements lie in D, so

P (aτ ℓ/ D | ℓ nτ) 1 ε d k+d 1 ε. (5)

D utting, Fusco, Lattanzi, Norouzi-Fard and Zadimoghaddam

Note that the randomness here is with respect to the random draw of the algorithm in Phase I, as D is ﬁxed but unknown to the algorithm. Let A τ = Aτ \ D = Aτ A , we have

ℓ=1 I {aτ ℓ/ D} f(aτ ℓ| t>τA t ({aτ 1, . . . , aτ ℓ 1} D))

ℓ=1 I {aτ ℓ/ D} f(aτ ℓ| t>τ At {aτ 1, . . . , aτ ℓ 1}) (by submodularity)

ℓ=1 I {aτ ℓ/ D} τ (by Equation (3))

τ T |A τ| τ.

We apply the expected value to the previous inequality, which results in

τ T E |A τ| τ (1 ε) X

τ T E [|Aτ|] τ 1 ε

1 + εE [f(A)] ,

while the last inequality follows from the right-hand side of Equation (4), the argument behind the second one is based on the linearity of expectation and the bound on the probability as in Equation (5):

E |A τ| = E

ℓ=1 I {aτ ℓ D}

ℓ=1 E [I {aτ ℓ D} I {|Aτ| t}]

ℓ=1 P aτ ℓ D |Aτ| ℓ P (|Aτ| t)

ℓ=1 P (|Aτ| ℓ) (by Equation (5))

= (1 ε)E [|Aτ|] . (6)

This concludes the proof of the Lemma.

For the second term of Equation (2), observe that OPT B M and is contained in the set of elements passed to Alg, so its value is dominated by β times that of Alg(A B ), by the approximation guarantees of the routine Alg:

E f(OPT B | A) β E [f(S)] . (7)

Recall that S is the ﬁnal solution output by the algorithm. Bounding the third term is more involved and is handled in the following lemma via a charging argument that uses our Combinatorial Lemma (Lemma 2).

Lemma 8 It holds that E [f(OPT F | A)] (1 + ε)E [f(A)].

Deletion Robust Non-Monotone Submodular Maximization over Matroids

Proof of Lemma 8 Sort the elements in OPT F = x1, x2, . . . according to the order in which they were removed from some Bτ because of feasibility constraint (since they are in V \ (A B D L) it must be the case). Once an element fails the feasibility test and is removed from some Bτ, it never becomes feasible again because only new elements are inserted in A during the execution of the ﬁrst phase. Furthermore, for each such xi, let Fi be the set A when xi fails the feasibility test. We know that Fi is independent since A is maintained independent throughout the execution of Algorithm 1; moreover, by deﬁnition of Fi, Fi + xi M. By Lemma 2, there exists an injective function h : OPT F A such that h(xi) Fi for all i. Let Ai denote the set A when the element h(xi) gets added to it. Since h(xi) has been added to A before xi, it holds that Ai Fi. We have

f(OPT F | A) X

x OPT F f(x | Ai) (by submodularity and Ai A)

x OPT F (1 + ε) f(h(xi) | Ai)

(1 + ε) f(A).

The second inequality uses the fact that if xi was still feasible to add when h(xi) was added, then their marginals are at most a (1 + ε) factor away. The last inequality follows from a telescopic decomposition of A, the fact that an element is added to the candidate solution A only if its marginal is positive (larger than a certain non-negative threshold), and the fact that h is injective. Applying the expectation to both extremes of the chain of inequalities concludes the proof of the lemma.

The fourth term of Equation (2) refers to at most k elements and can be bounded based on the deﬁnition of L: any element e in L is such that f(e|A) ε f(OPT)

k , by submodularity we have then that

E [f(OPT L | A)] E

e OPT L f(e | A)

ε f(OPT). (8)

Finally, we compose the four bounds in Equation (2), and obtain the following:

(1 ε)f(OPT)

E [f(A)] + E f(OPT B | A) + E [f(OPT F | A)] + E [f(OPT L | A)]

(2 + ε) E [f(A)] + βE [f(S)] + εf(OPT) (by Equations (7, 8) and Lemma 8)

(2 + ε)1 + ε

1 εE f(A ) + β E [f(S)] + εf(OPT) (by Theorem 7)

(2 + ε)1 + ε

1 ε + β E [f(S)] + ε f(OPT) (by deﬁnition of S)

Rearranging terms, we get

f(OPT) E [f(OPT A)]

1 2ε (2 + ε) (1 + ε)

(1 2ε)(1 ε) + β 1 2ε

[2 + β + O(ε)] E [f(S)] .

D utting, Fusco, Lattanzi, Norouzi-Fard and Zadimoghaddam

We conclude with a comment on what we mean by using O(ε) here and throughout the paper: for any ﬁxed constant β and ﬁxed δ (0, 1), there exists a constant C such that the O(ε) term can be replaced with C ε, for any ε (0, δ).

Combining the previous theorem with some routines for non-monotone submodular maximization with matroid constraints, we get an approximation factor of 4.494 if we use the state-of-the-art algorithm as in Buchbinder and Feldman (2024), or of 4.73 if we use the less involved measured continuous greedy (Feldman et al., 2011). Formally, we have the following corollary.

Corollary 9 Consider deletion-robust submodular maximization with matroid constraints in the centralized setting. For any constant δ (0, 1), there exists a constant Cδ such that for any ε (0, δ), a (4.494 + ε Cδ)-approximation algorithm with summary size O(k+d

Let s move our attention to monotone objectives. We get an improved approximation factor of 4 if we use as subroutine the greedy algorithm (Fisher et al., 1978) or 2 + e e 1 3.582 if we use continuous greedy as in C alinescu et al. (2011). Furthermore, it is possible to slightly improve the bound on the summary size.

Corollary 10 Consider deletion-robust submodular maximization with matroid constraints in the centralized setting. For any constant δ (0, 1), there exists a constant Cδ such that for any ε (0, δ), a (3.582 + ε Cδ)-approximation algorithm with summary size O(k + d

Proof First, we study the improved bound on the summary size. Since the function is monotone, we do not need to argue as in Lemma 6 to bound f(OPT) with f(OPT A); thus we can modify the condition on the while loop of Algorithm 1 to simply consider |Bτ| d/ε. With this tweak, there are still at most 1

ε buckets, each of cardinality at most d/ε, thus the summary size is O(k + d

ε). From the approximation point of view, Lemma 7 still holds, as the crucial property that P (I(ℓ, τ)) 1 ε is veriﬁed (we are sampling uniformly at random from a bucket of cardinality at least d/ε where there are at most d deleted elements). The rest of the analysis of Theorem 5 goes through without any other diﬀerence, and we get that

f(OPT) E [f(OPT A)] (2 + ε) 1 + ε

1 ε + β E [f(S)] + ε f(OPT)

We use as optimization routine Alg continuous greedy, therefore we can plug in β = e e 1 and, rearranging the terms we obtain

f(OPT) e (e 1)(1 ε) + (2 + ε)(1 + ε)

h e e 1 + 2 + 8e 7 δ(2e 1)

(e 1)(δ 1)2 | {z } Cδ

ε i E [f(S)] ,

where the last inequality can be numerically veriﬁed and holds for any ε (0, δ).

Deletion Robust Non-Monotone Submodular Maximization over Matroids

4. Streaming Setting

In this section, we consider the streaming setting, wherer the elements of V arrive on a stream in the ﬁrst phase, and the goal is to compute a small summary W using limited (online) memory. Our approach consists in carefully combining the swapping algorithm (Chakrabarti and Kale, 2015) with subsampling (Feldman et al., 2018) in a deletion robust fashion. Our algorithm maintains an independent candidate solution A and buckets Bτ that contain small reservoirs of elements with similar marginal contributions with respect to A. Beyond the candidate solution and the buckets, the algorithm keeps an extra buﬀer Vd containing the best d elements.

We start explaining the algorithm by deﬁning the thresholds values. We do not have an a priori estimate of OPT in the streaming setting. Thus, we do not know the thresholds corresponding to high quality elements upfront. This issue is overcome by initially considering all the powers of (1 + ε) and progressively removing the ones too small with respect to , i.e., the (d + 1)-st largest value seen so far. Note that at any point of its execution, the algorithm actually maintains at most a small (ﬁnite) subset of these thresholds.

Every new arriving element e is ﬁrst used to update Vd (line 7 in the pseudocode): if its value is larger than the minimum in Vd, then we add it to Vd and pop out e, the smallest value element from Vd. If the value of the newly arrived element is smaller than the smallest value element in Vd, we keep the set Vd intact. For simplicity, we denote with e to be the newly arrived element e in this case. In other words, element e is either the new element or the element that lost its place in Vd to the new element. In either case, we use element e in the next steps of the algorithm. The value of is updated if f(e) > (line 8) and all the buckets corresponding to thresholds that are too small are deleted, thus guaranteeing that only a logarithmic number of active buckets is kept (line 9). At this point, element e is put into the correct bucket Bτ, if such bucket still exists (line 11).

Now, new elements are repeatedly drawn uniformly at random from the buckets Bτ with size at least d/ε as long as no bucket contains more than d/ε elements (while loop line 12). These elements drawn from the buckets are added to A if and only if the subsampling is successful (line 13) and it is either feasible to add them directly or they can be swapped with a less important element in A.

We formalize this notion of importance . Each element in the solution is associated with a weight, deﬁned as its marginal value to A when it was ﬁrst considered for addition to the solution (line 14). An element in A is swapped for a more promising one only if the new one has a larger weight by at least a (1 + γ) factor and it is sampled independently with some probability p, while maintaining A independent.

Every time A changes, the buckets Bτ are wholly updated to maintain the invariant that Bτ contains only elements whose marginal value with respect to the current solution A is within τ and (1 + ε)τ (line 26). This property is crucial to ensure deletion robustness. Every bucket contains similar elements, i.e., whose marginal density with respect to the current solution is at most a multiplicative (1 + ε) factor away. When the stream terminates, the algorithm passes the deletion robust summary W (composed by the candidate solution A and B, containing Vd and the surviving buckets) to Algorithm 2. Note that both the centralized and the streaming pipelines share the same algorithm for the second phase.

D utting, Fusco, Lattanzi, Norouzi-Fard and Zadimoghaddam

Algorithm 3 Streaming Algorithm Phase I

1: Input: Precision ε, deletion parameter d, sampling probability p, swapping factor γ

2: T {(1 + ε)i | i Z}

3: A , 0, τmin 0, Bτ for all τ T

4: L , Γ , K , R , F Auxiliary sets for the analysis

5: Vd the ﬁrst d arriving elements

6: for every arriving element e do

7: Add e to Vd and then pop element e with smallest value from Vd 8: max{f(e), }, τmin ε 1+ε

k , T {τ T | τ τmin}

9: Remove all Bτ s.t. τ T The deleted elements are added to L

10: if τmin > f(e | A) then Discard e and continue L L + e

11: Find the largest threshold τ T s.t. f(e | A) τ and add e to Bτ 12: while τ T such that |Bτ| d/ε do

13: Remove one element g from Bτ independently and u.a.r. Γ Γ + g

14: w(g) f(g | A)

15: Draw Xg independently from a Bernoulli with parameter p

16: if A + g M then

17: if Xg = 1, then A A + g

18: else Discard g R R + g

20: kg argmin{w(k) | k C(A + g)}

21: if w(g) > (1 + γ) w(kg) then

22: if Xg = 1, then A A + g kg K K + kg 23: else Discard g R R + g

25: Discard g F F + g

26: Update {Bτ} according to A The deleted elements are added to L

27: B Vd τ T Bτ 28: return A, B

The pseudocode of the streaming algorithm is presented in Algorithm 3. Besides the operations already mentioned, part of the pseudocode relates only to the analysis (sets L, Γ, K, R, and F are indeed only functional to the analysis and play no role in the algorithm). Before stating and proving our theorem, we describe some of the properties of the weight function w that governs the swapping.

Lemma 11 Let w be the weight function, A the candidate solution, A = A \ D, and let K be the set of all the elements that, at some point, were in A but were later swapped. Then, the following properties hold:

(i) γ w(K) w(A) f(A)

(ii) w(A ) f(A )

(iii) f(A K) w(A K)

Deletion Robust Non-Monotone Submodular Maximization over Matroids

Proof The proof of the (i) ﬁrst inequality is similar to the one of Lemma 9 in Chakrabarti and Kale (2015). Crucially, the weight function w is linear, and once an element enters A, its weight is ﬁxed forever as the marginal value it contributed when entering A. Moreover, the weight of all the elements added to A is strictly positive. During the run of the algorithm, every time an element kg is removed from A, the weight of A increases by w(g) w(kg) by its replacement with some element g. Moreover, γ w(kg) w(g) w(kg) for every element kg K since (1 + γ)w(kg) w(g). Summing over all the elements in K, it holds that

kg K w(kg) 1

kg K [w(g) w(kg)] 1

We show the part of the ﬁrst point. Let {a1, a2, . . . aℓ} be the elements in A, sorted in the order in which they were added to A. We have that

i=1 f(ai | {a1, . . . , ai 1})

i=1 f(ai | Aai) =

i=1 w(ai) = w(A),

where Aai is the solution set right before ai is added to A. The second inequality (ii) follows from submodularity, because {a1, . . . , ai 1} Aai (Aai may also contains elements that are later removed from the solution). We have the following chain of inequalities

i=1 I {ai / D} f(ai | {a1, . . . , ai 1} D) (by a telescopic argument)

i=1 I {ai / D} f(ai | {a1, . . . , ai 1}) (by submodularity)

i=1 I {ai / D} f(ai | Aai) = w(A ).

(by submodularity and {a1, . . . , ai 1} Aai)

For the last (iii) inequality, let {x1, x2, . . . xt} be the elements in A K, sorted by the order in which they were added to A. We have that

i=1 f(xi | {x1, . . . , xi 1})

i=1 f(xi | Axi) =

i=1 w(xi) = w(A K),

where Axi is the solution set right before xi is added to A. The inequality follows from submodularity: Axi contains all the elements entered in A before xi but not those that have already been removed.

We are ﬁnally ready for the main result of the Section. Instead of stating a general black box result as in Theorem 5, we consider directly the algorithm from Buchbinder and Feldman (2024) as Alg and analyze the relative algorithm for a suitable choice of p and γ. This is just for clarity of exposition, as the dependence on β of the generic approximation is more involved than in the centralized case.

D utting, Fusco, Lattanzi, Norouzi-Fard and Zadimoghaddam

Theorem 12 Consider the problem of deletion robust monotone submodular maximization with matroid constraints in the streaming scenario. For any constant δ (0, 1), there exists a constant Cδ such that for any ε (0, δ), a (9.294 + Cδ ε)-approximation algorithm with summary and memory size O(k + d

Proof We prove that our streaming pipeline (Algorithm 3 and Algorithm 2) veriﬁes the theorem for the right choice of parameters and optimization routine. We start by bounding the memory used by the algorithm. Sets A and Vd always contains at most k, respectively d, elements. Every time a new element of the stream is considered, all the active Bτ contain at most d/ε elements; furthermore, there are always at most O(1

ε) of them. The invariant ensures that the active thresholds are smaller than (by submodularity) and larger than τmin. Overall, the memory of the algorithm and the summary size is thus O(k + d

ε). As in the analysis of Theorem 5, we ﬁx any set D and study the relative expected performance of our algorithm. In the pseudocode, we have introduced some auxiliary sets that are useful in the analysis: the base set V is partitioned into the elements that are considered at some point in line 13 (contained in Γ), those in the buckets or Vd (in set B) and the ones with low marginals (in set L). We have all the ingredients to decompose OPT A K accordingly:

(1 p)f(OPT) E [f(OPT A K)]

=E [f(A K)] + E [f(OPT L | A K)]

+ E [f(OPT B | A K)] + E [f(OPT Γ | A K)] (9)

Note that the ﬁrst inequality follows from the uniform sampling step (line 13) and the Sampling Lemma (Lemma 1): each ﬁxed element belongs to A K with a probability that is at most p. The second and third terms of the decomposition are fairly easy to bound. If an element e is added to L at some point it means that its marginal contribution with respect to the current solution A is smaller than the current estimate τmin of ε/kf(OPT) (of which it is consistently a lower bound), by submodularity this implies that also its contribution with respect to the ﬁnal A K is such that f(e | A K) ε/kf(OPT), all in all:

f(OPT L | A K) X

x OPT L f(x | A K) ε f(OPT). (10)

We know that OPT D = , this implies that OPT B is contained in B and thus in the summary W, which is passed to the second phase algorithm. Plus, note that OPT B is independent because it is contained in OPT. So, if we use as Alg a routine for centralized submodular maximization with matroid constraint that guarantees a β approximation, we have:

E [f(OPT B | A K)] E [f(OPT B)] β E [f(S)] . (11)

At this point, we get back to the decomposition in Equation (9) and bound the ﬁrst term. We show that the value of all the elements that at some point were in the candidate solutions (corresponding to set A K), is comparable with f(A ). This result overcomes two challenges: on the one hand, it shows that the total value of the elements swapped is upper bounded by 1/γ that of those that are not swapped; while on the other hand, that A is indeed robust to deletion.

Deletion Robust Non-Monotone Submodular Maximization over Matroids

Lemma 13 It holds that

E [f(A K)] 1 + ε

E f(A ) . (12)

Proof of Lemma 13 As a ﬁrst step, we show that w(A) is close to f(A ), at least in expectation with respect to the randomness of the algorithm. Let Aτ be the subset of elements that were added into A coming from Bτ and A τ = Aτ \ D. Moreover, let aτ ℓbe the ℓth element added to Aτ (if any), and denote with nτ the random cardinality of |Aτ|. The crucial observation is that when the algorithm decides to add an element to A from Bτ, then the probability that the element belongs to D is at most ε. Let Aa be the elements in A when a is added, so that w(a) = f(a | Aa). We have

ℓ=1 w(aτ ℓ), while w(A ) = X

ℓ=1 I {a / D} w(aτ ℓ)

We know that, for any thresholds τ, the weight of any element aτ ℓcoming from Bτ respects

τ w(aτ ℓ) < (1 + ε) τ (13)

We can then proceed similarly to the proof of Lemma 7:

E w(A ) = X

ℓ=1 I {a / D} w(aτ ℓ)

τ T τ E |A τ| (by Equation (13))

τ T τ E [|Aτ|] (by the same argument as in Equation (6))

(1 + ε)E [w(A)] , (14)

where, the last inequality follows from Equation (13). Now, let s move our attention to w(A) + w(K). Using the properties of the weight function (Theorem 11), we have that

E [f(A K)] E [w(A K)] (Property (iii))

= E [w(A) + w(K)] (Linearity of w)

E [w(A)] (by Property (i))

1 ε E w(A ) (by Equation (14))

1 ε E f(A ) (by Property (ii))

1 ε E [f(S)] . (By deﬁnition of S)

This concludes the proof of the lemma.

D utting, Fusco, Lattanzi, Norouzi-Fard and Zadimoghaddam

We are left with bounding the marginal contribution of elements in OPT Γ with respect to A K. Γ \ (A K) is partitioned into two disjoint sets, F and R. Set F contains all the elements that failed the swapping test (line 21), while R all the elements that were removed because of sampling either in line 18 or 23.

Lemma 14 Choosing p = 1/(γ+2), it holds that

E [f(OPT Γ | A K)] (1 + γ)1 + ε

1 ε E [f(S)] . (15)

Proof of Lemma 14 Let g be any element in OPT Γ, and ﬁx any compatible history Eg of the algorithm up to the point when g is considered in line 13 of Algorithm 3. The history Eg deterministically induces the candidate solution Ag when g is considered. Suppose g is not in F. In that case, it is possible to either add it directly or swap it with some kg Ag such that f(g | Ag) = w(g) (1 + γ)w(kg), then we have two possibilities: either g is successfully sampled and swapped with some element in the current candidate solution Ag, or g is added to R and discarded. Conditioning on Eg, we have:

p E [I {g R} w(g) | Eg] = p (1 p)E [w(g) | Eg] = (1 p) E [I {g A K} w(g) | Eg] .

Summing over all such g and taking the expectation over all the compatible Eg, we get:

p E [w(OPT R)] = (1 p) E [w(OPT (A K))] . (16)

We now move our attention to the elements in OPT F, i.e., the good elements in OPT that have been discarded because of the matroid constraint, and we ﬁx any realization of the randomness of the algorithm (so that both F and R are ﬁxed). As a ﬁrst step, we show the existence of the injection h : (OPT Γ) \ R A with the following properties:

(a) h restricted to A OPT is the identity

(b) w(g) (1 + γ)w(h(g)) for all g OPT F

(c) w(g) w(h(g)) for all g OPT K

We prove the existence of such h by constructing a suitable graph G where we apply Lemma 4. The nodes of G are the elements of Γ, while the edges are created iteratively as the algorithm considers elements in A. Let u be a generic element arriving in Γ \ R, and let Au be the candidate solution at that point. If Au + u M, then no edge is created at that iteration; otherwise, a unique cycle C(Au + u) in Au + u containing u exists. Now, we have two cases: either u is swapped with some element x in C(Au + u) Au, in which case we create directed edges from x to each element in C(Au + u) x, or u is rejected. In the latter case, we create directed edges from u to each element in C(Au + u) u. Note that, in both cases, the out-neighborhood of the node that gets discarded spans the element corresponding to the node itself. It is also clear that the graph G generated at the end of the procedure is acyclic, as an out-edge is only created when an element is discarded towards elements that have not been discarded yet. Thus, the graph respects the assumption of Lemma 4 and there exists an injective function h that associates every vertex/element u (OPT Γ) \ R

Deletion Robust Non-Monotone Submodular Maximization over Matroids

to a sink h(u) that is reachable from u. From a diﬀerent perspective, this h is a charging function that associates each element in (OPT Γ) \ R with an element in the ﬁnal solution A that accounts for it. It is easy to see that h respects the three properties we want to enforce. For Property (a), function h restricted to A OPT is the identity as elements in A are sinks. For the other two properties, consider any element g OPT F. Element g has been discarded without being added to the solution, meaning that the outgoing edges from g verify w(g) (1 + γ)w(u), for all u δ+(g). If we focus on the unique path from g to h(g), we see that after the ﬁrst step, when the weight decreases by at most a (1 + γ) factor, all the following edges do not decrease the weight, this is because an outgoing edge from an element of the solution either points to an element it is swapped with (whose weight is larger by at least a (1 + γ) factor) or to another element in the solution but with larger weight (as we swap the element with smallest weight in the cycle). This settles Property (b). To see why Property (c) holds, note that the same argument implies that, for all the elements already in the solution that are later swapped (OPT K), the weight does not decrease along the path to the sink. We can use the properties of the injective function h to bound w(OPT F).

E[w(OPT F)] (1 + γ)E [w(h(OPT F))] (by Property (b))

(1 + γ)E [w(h(OPT F)) + w(h(OPT K)) w(OPT K))] (by Property (c))

= (1 + γ)E [w(h(OPT (F K)) w(OPT K) + w(OPT A) w(OPT A)]

= (1 + γ)E [w(h((OPT Γ) \ R)) w(OPT (A K)))] (by Property (a))

(1 + γ)E [w(A)] (1 + γ)(w(OPT (A K))) (17)

Observe that in the last inequality we use that w(a) 0 for all a A, as h((OPT Γ) \ R) may be a strict subset of A. We can ﬁnally proceed to prove the lemma.

E[f(OPT Γ | A K)]

x OPT F f(x | A K)

x OPT R f(x | A K)

(by submodularity)

E [w(OPT F)] + E [w(OPT R)] (by submodularity)

(1 + γ)E [w(A)] + 1 p

p 1 γ E [w(OPT (A K))] (by Equations (16) and (17))

=(1 + γ)E [w(A)] (by choice of p)

1 ε E [f(S)] . (as in Lemma 13)

Note that the second inequality follows from submodularity and the fact that the candidate solution when an element is considered (and thus used to compute its weight) is always contained in A K.

Plugging Equations (10) to (12) and (15) into Equation (9) we obtain the following: 1 1 2 + γ

f(OPT) E [f(A K)] + E [f(OPT L | A K)]

+ E [f(OPT B | A K)] + E [f(OPT Γ | A K)]

1 ε + β E [f(S)] + ε f(OPT)

D utting, Fusco, Lattanzi, Norouzi-Fard and Zadimoghaddam

Rearranging terms, one gets

f(OPT) 1 ε 1 2 + γ

1 2 + γ + 1

1 ε + β E [f(S)]

= (2 + γ)[γ2 + (β + 2)γ + 1 + ε(γ2 + (2 β)γ + 1)]

γ(1 ε)[1 + γ ε(γ + 2)] E [f(S)]

(2 + γ)[γ2 + (β + 2)γ + 1]

γ[1 + γ ε(γ + 2)] 1 + ε 1 ε E [f(S)]

The theorem follows by using as optimization routine the state-of-the-art algorithm for centralized (non-monotone) submodular maximization with matroid constraint (β 2.494 as in Buchbinder and Feldman (2024)) and then choosing γ (and accordingly p) as to minimize the multiplicative factor. In particular, if we set γ 1.732, then for any ﬁxed δ there exists a constant Gδ such that we get an approximation of the form (9.294+Cδ ε) for all ε in (0, δ).

As a corollary of the previous theorem, we get an improved approximation for monotone objectives. If the objective function is monotone, there is no need for the sampling procedure (as the value of OPT is trivially upper bounded by that of OPT A K). Moreover, we can use the more eﬀective greedy (Fisher et al., 1978) or continuous greedy (C alinescu et al., 2011) subroutine as Alg.

Corollary 15 Consider the problem of deletion robust monotone submodular maximization with matroid constraints in the streaming setting. For any constant δ (0, 1), there exists a constant Cδ such that for any ε (0, δ), a (5.582 + ε Cδ)-approximation algorithm with summary size and memory O(k + d

Proof The proof is similar to the one of Theorem 12. The only diﬀerence is that we do not need to subsample as f(OPT) is smaller than f(OPT A K) by monotonicity. Thus, we set p = 1 and γ = 1 and then Equation (9) becomes:

f(OPT) E [f(OPT A K)]

=E [f(A K)] + E [f(OPT L | A K)]

+ E [f(OPT B | A K)] + E [f(OPT Γ | A K)]

1 εE [f(S)] + εf(OPT) + β E [f(S)] + E [f(OPT Γ | A K)]

For the last term, we use a more general statement. For all X Γ, X M, then it holds that w(X) 2 w(A). This is a direct application of Theorem 1 of Varadaraja (2011) (using a single matroid and setting r = 2). More speciﬁcally, we can imagine restricting the stream to consider only the elements in Γ, with the order in which they are considered in line 13. Choosing X = OPT Γ, we then get that

E [f(OPT Γ | A K)] E [w(OPT Γ)] 2E [w(A)] .

Deletion Robust Non-Monotone Submodular Maximization over Matroids

We use as optimization routine Alg continuous greedy, therefore we can plug in β = e e 1 and, rearranging the terms we obtain

f(OPT) 2 1 + ε

(1 ε)2 + 3e 2 (1 ε)(e 1)

h e e 1 + 4 + 9e 8 δ(5e 4)

(e 1)(1 δ)2 | {z } Cδ

ε i E [f(S)] .

The last inequality can be numerically veriﬁed and holds for any ε (0, δ).

5. Conclusion

We present the ﬁrst space-eﬃcient constant-factor approximation algorithms for deletionrobust submodular maximization over matroids in both the centralized and the streaming settings. In particular, we are also the ﬁrst to design space-eﬃcient deletion robust algorithms for non-monotone objectives with any type of constraints.

Acknowledgments

Federico s work was supported by the FAIR project (funded by the Next Generation EU program within the PNRR-PE-AI scheme. M4C2, investment 1.3, line on Artiﬁcial Intelligence), the ERC Advanced Grant 788893 AMDROMA and PNRR MUR project IR0000013So Big Data.it. Part of this work was done while Federico was an intern at Google Research, hosted by Paul D utting.

Georgios Amanatidis, Federico Fusco, Philip Lazos, Stefano Leonardi, Alberto Marchetti Spaccamela, and Rebecca Reiﬀenh auser. Submodular maximization subject to a knapsack constraint: Combinatorial algorithms with near-optimal adaptive complexity. In ICML, volume 139 of Proceedings of Machine Learning Research, pages 231 242. PMLR, 2021.

Georgios Amanatidis, Federico Fusco, Philip Lazos, Stefano Leonardi, and Rebecca Reiﬀenh auser. Fast adaptive non-monotone submodular maximization subject to a knapsack constraint. J. Artif. Intell. Res., 74:661 690, 2022.

Dmitrii Avdiukhin, Slobodan Mitrovic, Grigory Yaroslavtsev, and Samson Zhou. Adversarially robust submodular maximization under knapsack constraints. In KDD, pages 148 156, 2019.

Francis R. Bach. Structured sparsity-inducing norms through submodular functions. In NIPS, pages 118 126, 2010.

D utting, Fusco, Lattanzi, Norouzi-Fard and Zadimoghaddam

Ramakrishna Bairi, Rishabh K. Iyer, Ganesh Ramakrishnan, and JeﬀA. Bilmes. Summarization of multi-document topic hierarchies using submodular mixtures. In ACL, pages 553 563, 2015.

Kiarash Banihashem, Leyla Biabani, Samira Goudarzi, Mohammad Taghi Hajiaghayi, Peyman Jabbarzade, and Morteza Monemizadeh. Dynamic constrained submodular optimization with polylogarithmic update time. In ICML, volume 202 of Proceedings of Machine Learning Research, pages 1660 1691. PMLR, 2023.

Kiarash Banihashem, Leyla Biabani, Samira Goudarzi, Mohammad Taghi Hajiaghayi, Peyman Jabbarzade, and Morteza Monemizadeh. Dynamic algorithms for matroid submodular maximization. In SODA, pages 3485 3533. SIAM, 2024.

Rafael da Ponte Barbosa, Alina Ene, Huy L Nguyen, and Justin Ward. A new framework for distributed submodular maximization. In FOCS, pages 645 654, 2016.

Ilija Bogunovic, Slobodan Mitrovic, Jonathan Scarlett, and Volkan Cevher. Robust submodular maximization: A non-uniform partitioning approach. In ICML, pages 508 516, 2017.

Niv Buchbinder and Moran Feldman. Constrained submodular maximization via a nonsymmetric technique. Math. Oper. Res., 44(3):988 1005, 2019.

Niv Buchbinder and Moran Feldman. Constrained submodular maximization via new bounds for dr-submodular functions. In STOC, pages 1820 1831. ACM, 2024.

Niv Buchbinder, Moran Feldman, Joseph Naor, and Roy Schwartz. Submodular maximization with cardinality constraints. In SODA, pages 1433 1452. SIAM, 2014.

Gruia C alinescu, Chandra Chekuri, Martin P al, and Jan Vondr ak. Maximizing a monotone submodular function subject to a matroid constraint. SIAM J. Comput., 40(6):1740 1766, 2011.

Amit Chakrabarti and Sagar Kale. Submodular maximization meets streaming: matchings, matroids, and more. Math. Program., 154(1-2):225 247, 2015.

Abhimanyu Das and David Kempe. Submodular meets spectral: Greedy algorithms for subset selection, sparse approximation and dictionary selection. In ICML, pages 1057 1064, 2011.

Abhimanyu Das, Anirban Dasgupta, and Ravi Kumar. Selecting diverse features via spectral regularization. In NIPS, pages 1592 1600, 2012.

Paul D utting, Federico Fusco, Silvio Lattanzi, Ashkan Norouzi-Fard, and Morteza Zadimoghaddam. Deletion robust submodular maximization over matroids. In ICML, volume 162 of Proceedings of Machine Learning Research, pages 5671 5693. PMLR, 2022.

Paul D utting, Federico Fusco, Silvio Lattanzi, Ashkan Norouzi-Fard, and Morteza Zadimoghaddam. Fully dynamic submodular maximization over matroids. ACM Trans. Algorithms, 21(1):11:1 11:23, 2025.

Deletion Robust Non-Monotone Submodular Maximization over Matroids

Alina Ene, Huy L Nguyen, and Adrian Vladu. Submodular maximization with matroid and packing constraints in parallel. In STOC, pages 90 101, 2019.

Alessandro Epasto, Silvio Lattanzi, Sergei Vassilvitskii, and Morteza Zadimoghaddam. Submodular optimization over sliding windows. In WWW, pages 421 430, 2017.

Uriel Feige. A threshold of ln n for approximating set cover. J. ACM, 45(4):634 652, 1998.

Uriel Feige, Vahab S. Mirrokni, and Jan Vondr ak. Maximizing non-monotone submodular functions. In FOCS, pages 461 471. IEEE Computer Society, 2007.

Uriel Feige, Vahab S. Mirrokni, and Jan Vondr ak. Maximizing non-monotone submodular functions. SIAM J. Comput., 40(4):1133 1153, 2011.

Moran Feldman, Joseph Naor, and Roy Schwartz. A uniﬁed continuous greedy algorithm for submodular maximization. In FOCS, pages 570 579. IEEE Computer Society, 2011.

Moran Feldman, Amin Karbasi, and Ehsan Kazemi. Do less, get more: Streaming submodular maximization with subsampling. In Neur IPS, pages 730 740, 2018.

Moran Feldman, Paul Liu, Ashkan Norouzi-Fard, Ola Svensson, and Rico Zenklusen. Streaming submodular maximization under matroid constraints. In ICALP, volume 229 of LIPIcs, pages 59:1 59:20. Schloss Dagstuhl - Leibniz-Zentrum f ur Informatik, 2022.

Marshall L Fisher, George L Nemhauser, and Laurence A Wolsey. An analysis of approximations for maximizing submodular set functions ii. In Polyhedral combinatorics, pages 73 87. Springer, 1978.

Shayan Oveis Gharan and Jan Vondr ak. Submodular maximization by simulated annealing. In SODA, pages 1098 1116. SIAM, 2011.

Daniel Golovin and Andreas Krause. Adaptive submodularity: Theory and applications in active learning and stochastic optimization. J. Artif. Intell. Res., 42:427 486, 2011.

P. Hall. On representatives of subsets. Journal of the London Mathematical Society, s1-10 (1):26 30, 1935.

Ehsan Kazemi, Morteza Zadimoghaddam, and Amin Karbasi. Deletion-robust submodular maximization at scale. Co RR, abs/1711.07112, 2017.

Ehsan Kazemi, Morteza Zadimoghaddam, and Amin Karbasi. Scalable deletion-robust submodular maximization: Data summarization with privacy and fairness constraints. In ICML, volume 80, pages 2549 2558, 2018.

Andreas Krause, H Brendan Mc Mahan, Carlos Guestrin, and Anupam Gupta. Robust submodular observation selection. J. Mach. Learn. Res., 9(12), 2008.

Silvio Lattanzi, Slobodan Mitrovic, Ashkan Norouzi-Fard, Jakub Tarnawski, and Morteza Zadimoghaddam. Fully dynamic algorithm for constrained submodular optimization. In Neur IPS, 2020.

D utting, Fusco, Lattanzi, Norouzi-Fard and Zadimoghaddam

Hui Lin and JeﬀBilmes. A class of submodular functions for document summarization. In HLT, pages 510 520, 2011.

Vahab Mirrokni and Morteza Zadimoghaddam. Randomized composable core-sets for distributed submodular maximization. In STOC, pages 153 162, 2015.

Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, and Andreas Krause. Distributed submodular maximization: Identifying representative elements in massive data. In NIPS, pages 2049 2057, 2013.

Baharan Mirzasoleiman, Amin Karbasi, and Andreas Krause. Deletion-robust submodular maximization: Data summarization with the Right to be Forgotten . In ICML, pages 2449 2458, 2017.

Slobodan Mitrovic, Ilija Bogunovic, Ashkan Norouzi-Fard, Jakub Tarnawski, and Volkan Cevher. Streaming robust submodular maximization: A partitioned thresholding approach. In NIPS, pages 4557 4566, 2017.

Morteza Monemizadeh. Dynamic submodular maximization. In Neur IPS, 2020.

James B Orlin, Andreas S Schulz, and Rajan Udwani. Robust monotone submodular function maximization. Math. Program., 172(1):505 537, 2018.

Alexander Schrijver. Combinatorial optimization: polyhedra and eﬃciency, volume 24. Springer Science & Business Media, 2003.

Matthew Staib, Bryan Wilder, and Stefanie Jegelka. Distributionally robust submodular maximization. In AISTATS, pages 506 516, 2019.

Ashwinkumar Badanidiyuru Varadaraja. Buyback problem - approximate matroid intersection with cancellation costs. In ICALP, volume 6755, pages 379 390, 2011.

Guangyi Zhang, Nikolaj Tatti, and Aristides Gionis. Coresets remembered and items forgotten: submodular maximization with deletions. In ICDM, pages 676 685. IEEE, 2022.

Junzhou Zhao, Shuo Shang, Pinghui Wang, John C. S. Lui, and Xiangliang Zhang. Submodular optimization over streams with inhomogeneous decays. In AAAI, pages 5861 5868, 2019.

Jingjing Zheng, Zhuolin Jiang, Rama Chellappa, and P. Jonathon Phillips. Submodular attribute selection for action recognition in video. In NIPS, pages 1341 1349, 2014.