# streaming_submodular_maximization_with_differential_privacy__3311eeb4.pdf Streaming Submodular Maximization with Differential Privacy Anamay Chaturvedi * 1 Huy L. Nguyen * 1 Thy Nguyen * 1 In this work, we study the problem of privately maximizing a submodular function in the streaming setting. Extensive work has been done on privately maximizing submodular functions in the general case when the function depends upon the private data of individuals. However, when the size of the data stream drawn from the domain of the objective function is large or arrives very fast, one must privately optimize the objective within the constraints of the streaming setting. We establish fundamental differentially private baselines for this problem and then derive better trade-offs between privacy and utility for the special case of decomposable submodular functions. A submodular function is decomposable when it can be written as a sum of submodular functions; this structure arises naturally when each summand function models the utility of an individual and the goal is to study the total utility of the whole population as in the well-known Combinatorial Public Projects Problem. Finally, we complement our theoretical analysis with experimental corroboration. 1. Introduction Consider the task of a service provider that trains machine learning models for a set of users, e.g., platforms such as Amazon Sagemaker and Microsoft Azure. In many cases, collecting features can be costly and the service provider has to select a limited number of features for their models. For instance, given a data set of health attributes of a number of individuals, different users may want to predict the likelihood of different diseases, and a subset of features may be useful for the illnesses of some patients but extraneous *Equal contribution 1Khoury College of Computer Science, Northeastern University, Boston, USA. Correspondence to: Anamay Chaturvedi , Huy L. Nguyen , Thy Nguyen . Proceedings of the 40 th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s). for others. A common metric used to measure the utility of a set of features is the mutual information between the chosen set of features and the empirical likelihood of the diseases of interest under the Naive Bayes assumption. This function is known to exhibit a diminishing returns property and is submodular (Krause & Guestrin, 2005b). Definition 1.1 (Submodular functions). Let V be a finite set and f a function mapping from 2V to R. For all S V , we say that f(S) is the utility achieved by set S. For every element e V and set S V , the marginal utility of e over S (denoted f(e|S)) is the gain in utility provided by e if we add it to the set S; i.e. f(e|S) = f({e} S) f(S). The function f is called submodular if it has the diminishing returns property, i.e., for every pair of subsets S T V and element e not in T, f(e|T) f(e|S). A submodular function is called monotone if for all S T V , f(S) f(T). We let OPT denote any arbitrary utility-maximizing subset of V , potentially subject to additional constraints. The problem of maximizing a submodular function subject to constraints on the subsets of V one is allowed to pick occurs across domains like computer science, electrical engineering (Narayanan, 1997), and economics (Dobzinski & Schapira, 2006). In theoretical computer science, submodular maximization is a fundamental instance of combinatorial optimization, generalizing problems like Max-Cut and Facility Location (Schrijver et al., 2003). On the other hand, submodular maximization has also been applied concretely to numerous problems in machine learning such as feature selection for classification (Krause & Guestrin, 2005a), influence maximization in social networks (Kempe et al., 2003), document and corpus summarization (Lin & Bilmes, 2011), result diversification in recommender systems (Parambath et al., 2016) and exemplar based clustering (Dueck & Frey, 2007) see Mitrovic et al. (2017) and Chaturvedi et al. (2021) for more references. In many of these applications, one must publicly release a solution to a submodular maximization problem in which the information used to perform the optimization is private. Consider the example above where private data of patients is used to compute the utility of different sets of features. The solution depends on private information, and may reveal too much about the private data set. One way this can occur is when a relatively rare feature is picked, adversaries with Streaming Submodular Maximization with Differential Privacy side-information (such as where the data was collected) can then de-anonymize records in the private data set or deduce who participated in the data collection. Situations like these motivate the problem of differentially private submodular maximization, as first considered in Gupta et al. (2010). Differential privacy (DP) (Dwork et al., 2006b) is a property that when satisfied by an algorithm, allows one to promise strong information-theoretic guarantees limiting how much an adversary might learn about the private data set based on the output of that algorithm run on the data set. Definition 1.2 (Differential privacy, Dwork et al. (2006b)). Let X be the set of all possible data points. We say that a pair of data sets A, B drawn from X are neighbouring (denoted A B) if they differ in at most one data point I, so for instance A = B {I}. We say that an algorithm A mapping from data sets derived from X to some output codomain Y is (ε, δ)-differentially private for some ε, δ > 0 if for all pairs of neighbouring data sets A, B and all Y Y, P(A(A) Y ) eεP(A(B) Y ) + δ. The definition above says that the likelihood of any set of outcomes Y of an (ε, δ)-DP algorithm can vary by at most an eε multiplicative factor and an additive δ term if we were to add or drop any one point from our data set. For a given choice of privacy parameters ε and δ, (ε, δ)-DP is typically achieved by adding appropriately scaled noise to obfuscate sensitive values in the course of the computation, and may result in trading off some of accuracy to achieve the desired level of privacy. Such trade-offs have been shown to be intrinsic to achieving differential privacy for many problems, but a careful accounting for the noise added and the privacy gained can let one significantly improve this trade-off. In practice, one picks the value of ε to be a small constant that depends on the desired trade-off between privacy and utility. It is typically required that δ = o(1/n) to avoid the pathological case of completely revealing one person s data and claiming that privacy is achieved. Although there is an extensive line of work in privately maximizing submodular functions (Gupta et al., 2010; Mitrovic et al., 2017; Chaturvedi et al., 2021; Rafiey & Yoshida, 2020; Sadeghi & Fazel, 2021), as far as we know there is no work on doing so in the streaming setting. In this setting, the elements that may be added to the final solution set arrive in a stream such that either the stream length is significantly greater than the disk space available to the optimization algorithm, or the algorithm must make a decision immediately whether to retain this item for possible inclusion in the solution set (at some cost) or to reject it outright. Submodular maximization under streaming has been studied extensively (Gomes & Krause, 2010; Kumar et al., 2013; Badanidiyuru et al., 2014). Most notably, a (1 θ)/2-approximation algorithm, that retains only O( k log k θ ) many elements, was introduced by Badanidiyuru et al. (2014) for the problem of streaming submodular maximization. This algorithm is near-optimal as it is known that one cannot do better than an approximation factor of 1/2 (Feldman et al., 2020). Problem statement: In this work, we consider the problem at the intersection of these two lines of inquiry, i.e. submodular maximization in the streaming setting under the constraint of differential privacy. For every possible private data set A there is a corresponding monotonic (nondecreasing) submodular function f A, a public stream of elements V of length n, and a cardinality constraint k, and we want to find a subset S of V with cardinality at most k that achieves utility close to f A(OPT) in the streaming setting. Following previous work (Gupta et al., 2010; Mitrovic et al., 2017), we assume a known bound on the sensitivity of f A to A, i.e. for any A B, max S V |f A(S) f B(S)| λ. Without a bound on the sensitivity, f could be determined completely by a single user and either their privacy or the quality guarantee would have to fail. Even when the sensitivity is at most 1, our lower bound below demonstrates that we are bound to incur non-trivial additive error; it follows that in the case of unbounded sensitivity there can be no (ε, δ)-differentially private algorithm with a non-trivial utility guarantee. 1.1. Contributions General monotone submodular functions. The starting point for our work is the non-private algorithm of Badanidiyuru et al. (2014) for submodular maximization in the streaming setting, which we would like to privatize. The algorithm is given the submodular function f to be maximized, a cardinality constraint of k, and a guess O for the optimal utility f(OPT). For every stream element e V , the algorithm adds e to the solution set S if the marginal utility provided by e exceeds the value O/2k. Algorithm 1 Streaming algorithm for monotone submodular maximization subject to a cardinality constraint, Badanidiyuru et al. (2014) input Monotonic submodular function f, cardinality constraint parameter k, element stream V , approximation parameter θ, estimate O of the optimum cost f(OPT) S for e V do if f(e|S) O/(2k) and |S| < k then S S {e} end if end for output S The authors showed that this algorithm satisfies the follow- Streaming Submodular Maximization with Differential Privacy ing utility guarantee: Theorem 1.3 (Badanidiyuru et al. (2014)). The final solution S satisfies f(S) min{O/2, f(OPT) O/2}. We see that if O = f(OPT), then we immediately get a 1/2-approximation algorithm. More generally if f(OPT) [E, m] for some E, m R>0, we can run multiple copies of this algorithm in parallel with a set of geometrically scaling guesses O = {E, (1 + θ)E, . . . , (1 + θ) log1+θ m E E, m}. It then follows that there is some O O which is within a (1 θ) multiplicative factor of f(OPT) leading to a net 1 θ 2 approximation. Since we must maintain and update a solution set SO for each O O, achieving this guarantee requires that we pay a spatial overhead of 2 log m/E One way of privatizing this algorithm is to appeal to the celebrated sparse vector technique. Given a sequence of sensitive yes/no queries (as is the case here where either an element is added or not added), the sparse vector technique provides a way of privatizing this sequence of checks with surprisingly little noise and low error. We see that the value f(e|S) is the only private quantity accessed in the submodular streaming maximization algorithm. The sparse vector technique adds independently sampled noise values drawn from a Laplace distribution with standard deviation O( k/ε) to the value of f(e|S) and O/2k before making the threshold check. This can be justified by showing that the net privacy lost scales only with number of elements added to the solution set (i.e. at most k). Formalizing this outline leads to the following result: Theorem 1.4. Given query access to a monotone submodular function f A with sensitivity λ taking values in [0, m], and an input stream V of length n, there exists an algorithm that when given a cardinality constraint of k, an approximation parameter θ, a failure probability η, and privacy parameters ε < 1 and δ, is (ε, δ)-DP and achieves utility at least (1 θ)f(OPT) λk1.5 log1.5 nk with probability 1 η, where OPT is any arbitrary optimal solution for the non-private submodular maximization problem and E = min{ k log n ε , m/2} and the total number of elements retained in memory is O( k log m/E θ ). Further, this algorithm operates in just one pass. This result serves as a differentially private baseline for general monotone submodular functions - it achieves a similar multiplicative approximation factor as the non-private algorithm, but suffers significant additive loss. Minimizing this additive loss is key to achieving better utility for private streaming submodular maximization. Are there reasonable assumptions under which we can achieve even lower additive loss whilst preserving privacy? Decomposable monotone submodular functions. In the non-streaming setting, Gupta et al. (2010) showed that when the private submodular function to be maximized is λdecomposable, i.e., it can be written as a sum of monotonic submodular functions taking values in [0, λ] each corresponding to the data of one individual, then a much smaller amount of noise needs to be added for DP than in the general case. p A f{p}(S) For each agent p, the function f{p} is monotone submodular and takes values in [0, λ], which automatically implies f A has sensitivity λ. Gupta et al. (2010) achieved their result by picking k elements in sequence that approximately maximize the marginal utility that they return. To preserve privacy, these picks are made through the exponential mechanism, which randomly selects an element with probability proportional to its marginal utility. The authors performed an intricate analysis to show that the net privacy loss is in fact independent of the number of elements picked. In the streaming setting, however, the exponential mechanism cannot be applied as the algorithm only has access to the marginal utility of the current element in the stream at any time whereas executing the exponential mechanism requires knowledge of the marginal utility values of all elements (in particular elements that have not yet been seen by the algorithm). This limited access to the marginal utility values make the previous approach inapplicable, and therefore leads us to investigate the following question: Is possible to achieve a better utility in the streaming setting for monotone submodular functions that are decomposable? Our main result answers this question in the affirmative. To achieve this result we move beyond prior work in streaming with privacy and show that even when picking arbitrarily many elements k from an arbitrarily long stream of length n, remarkably it still suffices to add privatizing noise with just constant magnitude to the threshold check for each element. The core algorithmic change made is that we sample privatizing noise from the Gumbel distribution instead of the Laplace distribution. Although the Gumbel distribution has been used before in DP to simulate the exponential mechanism, its use in this manner to achieve privacy in the streaming setting is as far as we know entirely novel. Theorem 1.5. Given query access to a λ-decomposable monotone submodular function f A with m summands over a stream V of length n, there exists an algorithm that when given a cardinality constraint of k, an approximation parameter θ, a failure probability η, and privacy parameters Streaming Submodular Maximization with Differential Privacy ε < 1 and δ, achieves utility at least (1 θ)f(OPT) λk log2 m Eεδθ log 2nk log m/E with probability 1 η. Here E = min{ k log n ε , m/2} and the total number of elements retained in memory is O( k log m/E θ ). Further, this algorithm operates in just one pass. The privacy analysis of this result turns out to be quite involved and we give more detail below. The multiplicative approximation factor equals the (1 θ)/2-approximation factor of the non-private setting. However, similar to the private non-streaming setting in Chaturvedi et al. (2021) with matroid constraints as well as non-monotone objectives, we see a trade-off between the multiplicative and additive terms which we can tune via the multiplicative approximation parameter θ. Near-optimality. To show that the additive error term has the optimal dependence on k/ε (up to logarithmic terms) we also extend previous lower bounds by Gupta et al. (2010) for private submodular maximization from the (ε, 0) to (ε, δ)- setting. The proof proceeds similarly to that of the lower bound of Nguyen et al. (2021) for k-means clustering in the (ε, δ) setting, and the formal statement is as follows: Theorem 1.6. For all 0 ε, δ 1, k N, n k eε 1 δ , and c 4δ eε 1, if an (ε, δ)-DP algorithm for the submodular maximization problem for decomposable objectives achieves a multiplicative approximation factor of c, it must incur additive error Ω((kc/ε) log(ε/δ)). The gap in the exponent of k between the upper bound on the additive loss for general λ-sensitive submodular functions and the lower bound occurs in the non-streaming setting as well (compare Mitrovic et al. (2017) and our lower bound Theorem 1.6), and the assumption under which this gap is closed is the same as ours - i.e. λ-decomposability. Further, as we have observed before, the multiplicative approximation factor that we achieve is essentially the same as the non-private setting. In this sense our results are tight with the state of the art for both general private submodular maximization and streaming submodular maximization. We also find an improvement for decomposable submodular functions by using Gumbel noise instead of Laplace noise in practice by conducting experiments comparing their performance. We include all complete proofs and omitted technical details in the appendix. 1.2. Related work Papadimitriou et al. (2008) introduced the Combinatorial Public Projects problem (CPPP) - given a set A of m agents and n resources, and a private submodular utility function f{p} for every agent p the solver must coordinate with the agents to maximize f A := P p A f{p}, i.e. the sum of their utilities. This problem captures public welfare maximization and is interesting theoretically because in this setting agents are incentivized to lie to the solver and over-represent the utility they may individually derive from the resources that are picked for the group. The authors showed that unless NP BPP, there is no truthful and computationally efficient algorithm for the solver to achieve an approximation ratio better than n1/2 ϵ for any ϵ > 0. Gupta et al. (2010) were the first to consider the problem of differentially private submodular maximization. They showed that it is possible to privately optimize the objective in CPPP while losing an amount of privacy that is independent of k, the number of items picked, and achieved essentially optimal additive error. Since (ε, δ) privacy implies approximate (2ε + δ)-truthfulness, their result showed that a slight relaxation of the truthfulness condition considered by Papadimitriou et al. (2008) allowed constant factor approximation if the optimal utility was not too low. They also showed that optimizing a submodular function ε-privately under a cardinality constraint of k over a ground set of n elements must suffer additive loss Ω(k log n/k). Mitrovic et al. (2017) considered the more general case of private submodular maximization when the objective function has bounded sensitivity. They were motivated by the problem of feature selection under the constraint of differential privacy. They proposed algorithms for both general monotone and non-monotone objectives with bounded sensitivity, and provided extensions to matroid and p-extendable systems constraints. Although the error guarantees of their results match those of Gupta et al. (2010) in the case of decomposable functions, for the case of general monotone and non-monotone objectives they get higher additive error. Chaturvedi et al. (2021) extended the results of Gupta et al. (2010) from cardinality to matroid constraints, and from monotone to the non-monotone setting. They achieved this by adapting the Continuous Greedy algorithm of C alinescu et al. (2011) and the Measured Continuous Greedy algorithm of Feldman et al. (2011). They also made a small fix to the privacy analysis of Gupta et al. (2010) resulting in a weaker bound (by a constant factor) on the privacy loss. Rafiey & Yoshida (2020) also studied the problem of private submodular and k-submodular maximization subject to matroid constraints, and achieved the same multiplicative approximation factor as Chaturvedi et al. (2021) for monotone submodular maximization. In this work privacy and time-efficiency were optimized at the cost of higher additive error. Sadeghi & Fazel (2021) made further progress on private Streaming Submodular Maximization with Differential Privacy monotone submodular maximization for submodular functions with total curvature at most κ [0, 1] by achieving a (1 κ/e)-approximation algorithm and lower query complexity than the previous works. Salazar & Cummings (2021) considered a variant of this line of work wherein a sequence of private submodular functions defined over a common public ground set are processed, and at every iteration a set of at most k elements from the ground set must be picked before the function is observed. Here the goal is regret minimization, and the authors introduced differentially private algorithms that achieve sub-linear regret with respect to a (1 1/e)-approximation factor in the full information and bandit settings. 2. Preliminaries We first make a couple of simplifying technical observations. Remark 2.1. Following the setup of the DP set cover problem in Gupta et al. (2010), we assume that there is a publicly known upper bound on the number of agents which we denote by m; as the dependence of the error and space on m will be seen to be logarithmic, even a loose upper bound works well. Alternatively, we can allocate a small fraction of our privacy budget to privatize the number of agents m via the Laplace mechanism (Lemma 2.6). Remark 2.2. It will suffice to reason about functions with sensitivity 1, and 1-decomposable functions. For the more general case where the submodular function f has sensitivity λ or is λ-decomposable, we can reason about f/λ which ensures that our assumptions hold. We will use the following basic and advanced composition laws to reason about the privacy loss that occurs when multiple differentially private mechanisms are used in a modular fashion as subroutines. We follow the formulation in Dwork & Roth (2014). Theorem 2.3 (Basic composition, Dwork et al. (2006a); Dwork & Lei (2009)). If Mi is (εi, δi)-differentially private for i = 1, . . . , k, then the release of the outputs of all k mechanisms is (Pk i=1 εi, Pk i=1 δi)-differentially private. Theorem 2.4 (Advanced composition, Dwork et al. (2010)). For all ε, δ, δ > 0, given a set of (ε, δ)-differentially private mechanisms, if an adversary adaptively chooses k mechanisms to run on a private data set, then the tuple of responses is (ε , kδ + δ )-differentially private for ε = p 2k ln(1/δ )ε + kε(eε 1). In particular, for ε < 1, if ε = ε 2k ln(1/δ), then the net k-fold adaptive release is (ε, (k + 1)δ)-differentially private. We will appeal to the following private mechanism used to choose an element from a public set based on a private score function of bounded sensitivity. Lemma 2.5 (Exponential Mechanism, Mc Sherry & Talwar (2007)). Let q : X V R be a score function for a public domain V that depends on the private input data set drawn from X , i.e. q(A, v) is the score of v V for the data set A X . Let q = max A B maxv V |q(A, v) q(B, v)| be the sensitivity of q, i.e. the maximum possible change in the value of the score of an element for neighboring data sets. The exponential mechanism M samples v V with probability exp(εq(A, v)/(2 q)) and outputs v . The exponential mechanism is ε-differentially private. Further, for a finite set V , we have that with probability 1 γ, q A(v ) max v V q A(v) 2 q Given a real-valued function with bounded sensitivity, the Laplace mechanism can be used to privatize the value taken by that function on the private input data set. Lemma 2.6 (Laplace mechanism, Dwork et al. (2006b)). Given a function f : 2X R such that max A B |f(A) f(B)| f, the mapping A 7 f(A) + α where α Lap( f/ε) is ε-differentially private. Here Lap(σ) denotes the standard Laplace distribution with scale parameter σ. We will also draw random values from the Gumbel distribution for improved privacy guarantees. We recall the distribution function for later use. Definition 2.7 (Gumbel distribution). The Gumbel distribution is defined on R. When the mean is µ and the scale parameter is γ, the distribution is defined by its CDF F(x) = exp e (x µ)/γ , or alternatively its density function γ exp (x µ)/γ + e (x µ)/γ . 2.1. The sparse vector technique Following Dwork & Roth (2014), we recall the pseudocode and utility guarantee of the sparse vector technique. This result is attributed to Dwork et al. (2009); we refer the reader to the end of chapter 3 of Dwork & Roth (2014) for a more comprehensive discussion. Theorem 2.8 (Dwork et al. (2009)). For σ = δ )/ε, Dα = Lap(σ) and Dβ = Lap(2σ), Algorithm 2 is (ε, δ)-DP. Further, with probability 1 β, for all queries i, if ˆνi is the privatized query value fi + νi, then |ˆνi νi| (ln n + ln 4c Streaming Submodular Maximization with Differential Privacy Algorithm 2 Sparse, Dwork et al. (2009); Dwork & Roth (2014) input Arbitrary (possibly adaptive) stream of sensitivity 1 queries f1, f2, . . . , a threshold T, a cutoff point k, threshold noise Dα, score noise Dβ. α Dα Let ˆTi = T + α for i {0, . . . , k 1} Let count = 0 for query ei do Let βi Dβ if fi + βi ˆTcount then ai = count = count + 1 else ai = end if if count k then Halt end if end for output Stream of yes/no outputs (a1, a2, . . . ) 3. Private streaming with Laplace noise Our main algorithm for maximizing submodular monotone functions in both the general and decomposable cases is Algorithm 3. At a high level we would like to adapt the nonprivate submodular streaming algorithm of Badanidiyuru et al. (2014) to the private setting by privatizing the check that is made when adding an element e to the solution set S. Doing so essentially gives us an instantiation of Algorithm 2 wherein the sensitivity 1 queries are the values f(e|S), the marginal utilities of the stream elements for the private function over the solution set picked thus far, and the threshold being compared with is O/2k, where O is a guess for the optimal utility OPT. In the case where f is monotone submodular and has sensitivity 1 but is not decomposable, Algorithm 3 sets the privatizing noise distributions Dα and Dβ directly according to the analysis of the sparse vector technique, i.e. Theorem 2.8. As in the non-private case, since the value OPT is not known, we let O vary over a set O = {E, (1+θ)E, . . . , m}; it follows that |O| = log1+θ m E + 1, this is denoted T in the pseudo-code. In this case the ideal choice of E is the lowest possible additive error that we can achieve in the DP setting, and can be set to be equal to the lower bound which we derive in Theorem 1.6. Indeed, if f(OPT) < E, it can be shown that the utility guarantees hold trivially. Further, we observe that our net privacy budget of (ε, δ) must be split across these T-many calls to Algorithm 2 by the composition laws of privacy described in the preliminary (for a stronger theoretical guarantee one appeals to the advanced composition rule); this is how we derive our expression for ε which is the privacy parameter passed in each call to Algorithm 2. The stream V can then be processed in parallel for each guess O O. At the end of the stream we see that we have some T-many solutions SO corresponding to different guesses O O for OPT. To choose a final solution one again has to access the private values f A(SO), and to account for this we appeal to the exponential mechanism to choose a candidate SO that achieves near-maximal utility among the set of guesses {SO : O}. Applying the composition laws for privacy across all calls to private mechanisms and accounting for the additive error introduced by appealing to the sparse vector technique one derives Theorem 1.4. Algorithm 3 Private streaming submodular maximization input Monotone submodular function f A, cardinality constraint parameter k, failure probability β, privacy parameters (ε, δ), element stream V , approximation parameter θ Let E = min n k log n Let T = log1+θ m E + 1 if f A has sensitivity 1 but is not 1-decomposable then Let ε = ε 4 2T ln((T +1)/δ) 32k ln T +1 δ ε Let Dα = Lap(σ) Let Dβ = Lap(2σ) else if f A is 1-decomposable then T log1.5 T/δ (exact expression in Lemma 4.2) Dα = Gumb(γ) Dβ = Gumb(γ) end if Let O = {E, (1 + θ)E, (1 + θ)2E, . . . , (1 + θ) log1+θ m E E, m} for ei V do for all O O do SO Algorithm 2 (Query stream (f A(e|SO))e V , threshold O 2k, cutoff k, Dα, Dβ) where ai = add element ei to SO ai = reject element ei end for end for SO Exponential Mechanism({SO : O O}, f A( ), privacy parameter ε/2) (in the notation of Lemma 2.5, q(A, SO) = f A(SO)) output SO Streaming Submodular Maximization with Differential Privacy 4. Private streaming with Gumbel Noise In the setting where the given objective is decomposable, we show that by setting our privatizing noise distributions Dα and Dβ equal to a Gumbel distribution with an appropriate scale parameter, we can reduce the dependence of the additive error on k/ε to what is asymptotically the best possible. As the utility analysis in this case is identical to that before, we focus on just the privacy analysis in this section. We fix some arbitrary guess O for OPT. For any possible output set S = {ei1, . . . , eik}, we want to bound the ratio Pr[A(f A) = S]/ Pr[A(f B) = S], where A = B I (the other case B = A I turns out to be relatively straightforward). The output S = {vi1, . . . , vik} is equivalent to a stream of outputs a1, a2, . . . a|V |, where ai = denotes that the element ei was picked, and ai = denotes rejection. For every index i, let Si denote the set of elements already accepted when the element ei is being processed. We have the following technical lemma: Lemma 4.1. Conditioned on having picked the set Si of elements with |Si| = u < k, Pr[ai = |Si] = Z 1 exp w A(i|Si)eαu/γ dα|Si|, w A(i|Si) = exp 1 2k f A(ei|Si) , and αu Gumbel(0, γ). We think of the term w A(i|Si) as the weight of element ei; we see that the probability that ei is accepted increases as its weight increases. By appealing to this lemma iteratively and integrating over the noise variables αu, it is possible to show that Pr[A(A) = S] = Qr u=1 Xu where Xu equals w A(iu|Siu) 1 + P j (iu 1,iu) w A(j|Siu) 1 + P j (iu 1,iu] w A(j|Siu) . From some elementary algebra it then follows that Pr[A(f A) = S] Pr[A(f B) = S] r Y j (iu 1,iu) w A(j|Siu) 1 + P j (iu 1,iu) w B(j|Siu) We see that by definition w A(j|Siu) = w B(j|Siu) exp f{I}(ej|Siu)/γ , whence we can write j (iu 1,iu) w A(j|Siu) j (iu 1,iu) w B(j|Siu) = E j Pu[exp f{I}(ej|S)/γ ]. Here Pu is the distribution on (iu 1, iu) that picks element j with probability w B(ej|Siu), and no element at all with probability 1. At a high level, what we show is that the increase in the likelihood of an element eiu being picked can be bounded in terms of the expectation of a function of the marginal gain of ej over Siu for the agent I when ej is drawn according to Pu. We denote this expectation term Yu, whence we can write Pr[A(f A) = S] Pr[A(f B) = S] r Y We use this upper bound to show that for most sets S that are likely to be picked by the algorithm given the function f B (i.e. with probability 1 δ), Qk u=1 Yu and consequently the likelihood of S under f A is not too large. Since the submodular function f I takes values in [0, 1], intuitively we expect the product of the Yu to telescope over u and concentrate strongly. We also see that if Yu is large, then the likelihood of some element with its index in the interval (iu 1, iu) being picked is relatively high (note that although the distribution according to which the next element is picked and Pu are not identical, they are similar and closely related). In particular, it should be unlikely that the algorithm picks sequence of elements (ei1, . . . , eik) for which this product is large. However, to formally derive a privacy guarantee from this outline there are many technical challenges that need to be resolved. The distribution over which the expectation term of interest depends on the set of elements already picked. We formalize this situation by defining a multiround probabilistic process and proceeding by induction. The expectation term derived above does not admit a useful concentration bound directly, so instead we must analyze a O(1/ε)th-moment of the net privacy loss. The formal statement that we derive is as follows: Lemma 4.2. Consider the following k-round probabilistic process. Let vj := w B(iu 1 + j|Siu). In each round u, it is the case that the set of elements Siu = {i1, . . . , iu 1} has been picked, and the element iu = j + iu 1 is picked with probability pj = 1 1 + v1 + + vj 1 vj 1 + v1 + + vj . Then, for each q = 1, . . . , r, for a value of c = γ/4 = 2 ε ln 2 log 2 εδ, the following bound holds: u=q Y c u |Siq ε(1 f{p}(Siq)). We see that the right hand side of the bound above has a telescoping form similar to what we expected - the higher the utility of the current set Siq, the lower is the expected product of the subsequent Yu terms for u > q. Streaming Submodular Maximization with Differential Privacy We recall that we bounded from above the ratio of proba- bilities, Pr[A(f A) = S]/ Pr[A(f B) = S] by Qk u=1 Yu 2 . We can now apply Markov s inequality on the c-th exponent of this random variable so as to exploit the bound on the expectation that we have derived; this completes the core of our privacy analysis. Similar to before we must account for the T many calls to Algorithm 2 as well as the exponential mechanism and this is identical to the case with Laplace noise. 5. Experiments In this section, we empirically evaluate our approach on a practical instance of private submodular maximization, k-medians clustering 1. Given a set of points V , a metric d : V V R, a private set of demand points P V , the objective of the k-medians problem is to select a set of points S V , |S| k to minimize cost(S) = P p P d(p, S), where d(p, S) = mins S d(p, s). An application of his problem is allocating relatively few (k) service centers to be able to reach a large set of clients (P) and ensure that there is at least one service location not too far from most clients; when the clients locations are private but the service locations are public, this problem requires a differentially private solver. We can optimize this objective by casting it into the following submodular maximization problem: max S V,|S| k P p P 1 d(p, S)/G, where G is a normalization constant so that fp(S) = 1 d(p, S)/G [0, 1]. Setting d(p, ) = G, it can be checked that cost(S) is a monotone decomposable submodular function. We compare the performance of Algorithm 3 with Laplace and Gumbel noise on two data sets. First, following Mitrovic et al. (2017); Chaturvedi et al. (2021) we use the Uber data set (Five Thirty Eight, 2019) of Uber cab pick ups in Manhattan for the month of April 2014; the goal is to allocate public waiting locations for Uber cabs so as to serve requests from clients within short distances. Second, we construct a synthetic dataset in R2 by generating clients P from a mixture of 3 Gaussian distributions, each with identity covariance matrix and mean chosen uniformly at random from a bounding box of [20] [20]. We sample 15000 points for one Gaussian distribution, and 2500 points for each of the other two. For both settings, we set d( , ) to be the ℓ1 or Manhattan distance, i.e. d(a, b) = |a1 b1|+|a2 b2|. We set V to be a 50 50 2-D grid of points uniformly spanning the rectangular domain. We compare our two algorithms with an approach that selects k Random points from the stream as a differentially 1The code used to run all experiments may be found at www.github.com/thydnguyen/Priv Submodular Opt. All experiments were performed on a PC with 5.2 GHz i9 chip and 64 GB RAM. private baseline and the Non-private algorithm 1. For both data sets, we set δ = 1/|P|1.5 and θ = 0.2. In Figure 3 and Figure 6 we graph the clustering cost versus the cardinality constraint k on the Taxi and Synthetic data sets respectively. We also tabulate the exact numerical values recorded in the appendix. We measure and report the mean and standard deviation of the clustering cost over 20 random runs with varied k and ε. We set E = min (k log n/ε, |P|/2) for the private algorithms, and E = min(maxei V f(ei), k log n/ε, |P|/2) for the non-private algorithm. This guarantees that the number of thresholds used in the non-private algorithm is at least that used in the private algorithms. Instead of using the exponential mechanism to output the solution in algorithm 3, we use the Report Noisy Max mechanism with equivalent privacy guarantee and similar tail bound (see Dwork & Roth (2014)); this avoids potential overflow issues with the exponential mechanism. When the number of elements left in the stream of a non-private instance is less than k |S|, we add the rest of the points to S. This does not affect the theoretical guarantee, but might benefit the algorithm empirically. Although we apply advanced composition in our theoretical analyses as it asymptotically requires lower noise than basic composition, because of the difference in constant coefficients, basic composition works better for the number of thresholds we need to consider. For this experiment, we apply basic composition and set ε = ε/T, δ = δ/T. In Figures 1 and 4, we report the results for ε = 0.1. We observe that Gumbel noise outperforms Laplace noise in both settings. We observe similar results in Figures 2 and 5 for ε = 1. Gumbel noise continues to outperform Laplace noise in both settings. Increasing the privacy budget from 0.1 to 1 slightly improves the utility of the differentially private approaches. One interesting artefact that we observe for the synthetic data set is that the clustering cost actually increases as we increase the solution size from k = 100 to k = 125 when using Laplace noise. In general this should not happen as increasing the number of centers can only reduce the clustering cost if the centers are picked in the same way across experiments. However, since we have a fixed privacy budget and are forced to split this budget among a greater number of choices when using Laplace noise, we end up making a larger number of poorer quality picks for a net worse solution. This phenomenon has also been seen in other works on private clustering (Nguyen et al., 2021). On the other hand, since the algorithm with Gumbel noise uses a scale parameter which is invariant in the cardinality constraint, there is no such worsening of performance with a more generous cardinality constraint. We are able to ensure Streaming Submodular Maximization with Differential Privacy that apart from better performance overall, increasing our budget can only lead to a lower clustering cost and we do not need to consider private hyperparameter optimization over k, for instance. Cardinality constraint k Clustering cost Laplace Gumbel Non-private Figure 1. ε = 0.1 Cardinality constraint k Clustering cost Laplace Gumbel Non-private Figure 2. ε = 1 Figure 3. Comparison of clustering cost (lower is better) achieved by various streaming algorithms on the Taxi data set. Acknowledgements All the authors were supported by NSF CAREER grant 1750716. Badanidiyuru, A., Mirzasoleiman, B., Karbasi, A., and Krause, A. Streaming submodular maximization: massive data summarization on the fly. In Macskassy, S. A., Perlich, C., Leskovec, J., Wang, W., and Ghani, R. (eds.), The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 14, New York, NY, USA - August 24 - 27, 2014, pp. 671 680. ACM, 2014. doi: 10.1145/2623330.2623637. URL https: //doi.org/10.1145/2623330.2623637. C alinescu, G., Chekuri, C., P al, M., and Vondr ak, J. Maximizing a monotone submodular function subject to a matroid constraint. SIAM J. Comput., 40(6):1740 1766, 2011. doi: 10.1137/080733991. URL https: //doi.org/10.1137/080733991. Cardinality constraint k Clustering cost Laplace Gumbel Non-private Figure 4. ε = 0.1 Cardinality constraint k Clustering cost Laplace Gumbel Non-private Figure 5. ε = 1 Figure 6. Comparison of clustering cost (lower is better) achieved by various streaming algorithms on the synthetic data set. Chaturvedi, A., Le Nguyen, H., and Zakynthinou, L. Differentially private decomposable submodular maximization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35 No. 8., pp. 6984 6992, 2021. Dobzinski, S. and Schapira, M. An improved approximation algorithm for combinatorial auctions with submodular bidders. In Proceedings of the Seventeenth Annual ACMSIAM Symposium on Discrete Algorithms, SODA 2006, Miami, Florida, USA, January 22-26, 2006, pp. 1064 1073. ACM Press, 2006. URL http://dl.acm.org/ citation.cfm?id=1109557.1109675. Dueck, D. and Frey, B. J. Non-metric affinity propagation for unsupervised image categorization. In IEEE 11th International Conference on Computer Vision, ICCV 2007, Rio de Janeiro, Brazil, October 14-20, 2007, pp. 1 8. IEEE Computer Society, 2007. doi: 10.1109/ ICCV.2007.4408853. URL https://doi.org/10. 1109/ICCV.2007.4408853. Dwork, C. and Lei, J. Differential privacy and robust statistics. In Mitzenmacher, M. (ed.), Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pp. 371 380. ACM, 2009. doi: 10. Streaming Submodular Maximization with Differential Privacy 1145/1536414.1536466. URL https://doi.org/ 10.1145/1536414.1536466. Dwork, C. and Roth, A. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211 407, 2014. doi: 10.1561/0400000042. URL https://doi.org/10.1561/0400000042. Dwork, C., Kenthapadi, K., Mc Sherry, F., Mironov, I., and Naor, M. Our data, ourselves: Privacy via distributed noise generation. In Vaudenay, S. (ed.), Advances in Cryptology - EUROCRYPT 2006, 25th Annual International Conference on the Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, May 28 - June 1, 2006, Proceedings, volume 4004 of Lecture Notes in Computer Science, pp. 486 503. Springer, 2006a. doi: 10.1007/11761679\ 29. URL https://doi.org/10.1007/11761679_29. Dwork, C., Mc Sherry, F., Nissim, K., and Smith, A. D. Calibrating noise to sensitivity in private data analysis. In Halevi, S. and Rabin, T. (eds.), Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings, volume 3876 of Lecture Notes in Computer Science, pp. 265 284. Springer, 2006b. doi: 10.1007/11681878\ 14. URL https://doi.org/10.1007/11681878_14. Dwork, C., Naor, M., Reingold, O., Rothblum, G. N., and Vadhan, S. P. On the complexity of differentially private data release: efficient algorithms and hardness results. In Mitzenmacher, M. (ed.), Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pp. 381 390. ACM, 2009. doi: 10. 1145/1536414.1536467. URL https://doi.org/ 10.1145/1536414.1536467. Dwork, C., Rothblum, G. N., and Vadhan, S. P. Boosting and differential privacy. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, Las Vegas, Nevada, USA, pp. 51 60. IEEE Computer Society, 2010. doi: 10.1109/FOCS.2010.12. URL https://doi.org/10.1109/FOCS.2010. 12. Feldman, M., Naor, J., and Schwartz, R. A unified continuous greedy algorithm for submodular maximization. In Ostrovsky, R. (ed.), IEEE 52nd Annual Symposium on Foundations of Computer Science, FOCS 2011, Palm Springs, CA, USA, October 22-25, 2011, pp. 570 579. IEEE Computer Society, 2011. doi: 10. 1109/FOCS.2011.46. URL https://doi.org/10. 1109/FOCS.2011.46. Feldman, M., Norouzi-Fard, A., Svensson, O., and Zenklusen, R. The one-way communication complexity of submodular maximization with applications to streaming and robustness. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, pp. 1363 1374, 2020. Five Thirty Eight. Uber pickups in new york city, Nov 2019. URL https://www.kaggle. com/datasets/fivethirtyeight/ uber-pickups-in-new-york-city. Gomes, R. and Krause, A. Budgeted nonparametric learning from data streams. In F urnkranz, J. and Joachims, T. (eds.), Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21-24, 2010, Haifa, Israel, pp. 391 398. Omnipress, 2010. URL https://icml.cc/Conferences/ 2010/papers/433.pdf. Gupta, A., Ligett, K., Mc Sherry, F., Roth, A., and Talwar, K. Differentially private combinatorial optimization. In Charikar, M. (ed.), Proceedings of the Twenty First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2010, Austin, Texas, USA, January 1719, 2010, pp. 1106 1125. SIAM, 2010. doi: 10.1137/ 1.9781611973075.90. URL https://doi.org/10. 1137/1.9781611973075.90. Kempe, D., Kleinberg, J. M., and Tardos, E. Maximizing the spread of influence through a social network. In Getoor, L., Senator, T. E., Domingos, P. M., and Faloutsos, C. (eds.), Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 24 - 27, 2003, pp. 137 146. ACM, 2003. doi: 10.1145/956750.956769. URL https://doi.org/ 10.1145/956750.956769. Krause, A. and Guestrin, C. Near-optimal nonmyopic value of information in graphical models. In UAI 05, Proceedings of the 21st Conference in Uncertainty in Artificial Intelligence, Edinburgh, Scotland, July 26-29, 2005, pp. 324 331. AUAI Press, 2005a. URL https:// dslpitt.org/uai/display Article Details. jsp?mmnu=1&smnu=2&article_id=1238& proceeding_id=21. Krause, A. and Guestrin, C. Near-optimal nonmyopic value of information in graphical models. In UAI 05, Proceedings of the 21st Conference in Uncertainty in Artificial Intelligence, Edinburgh, Scotland, July 26-29, 2005, pp. 324 331. AUAI Press, 2005b. URL https:// dslpitt.org/uai/display Article Details. jsp?mmnu=1&smnu=2&article_id=1238& proceeding_id=21. Kumar, R., Moseley, B., Vassilvitskii, S., and Vattani, A. Fast greedy algorithms in mapreduce and stream- Streaming Submodular Maximization with Differential Privacy ing. In Blelloch, G. E. and V ocking, B. (eds.), 25th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 13, Montreal, QC, Canada - July 23 - 25, 2013, pp. 1 10. ACM, 2013. doi: 10.1145/ 2486159.2486168. URL https://doi.org/10. 1145/2486159.2486168. Lin, H. and Bilmes, J. A. A class of submodular functions for document summarization. In Lin, D., Matsumoto, Y., and Mihalcea, R. (eds.), The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA, pp. 510 520. The Association for Computer Linguistics, 2011. URL https://aclanthology.org/P11-1052/. Mc Sherry, F. and Talwar, K. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007), October 20-23, 2007, Providence, RI, USA, Proceedings, pp. 94 103. IEEE Computer Society, 2007. doi: 10.1109/FOCS.2007.41. URL https://doi.org/ 10.1109/FOCS.2007.41. Mitrovic, M., Bun, M., Krause, A., and Karbasi, A. Differentially private submodular maximization: Data summarization in disguise. In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, volume 70 of Proceedings of Machine Learning Research, pp. 2478 2487. PMLR, 2017. URL http://proceedings.mlr.press/ v70/mitrovic17a.html. Narayanan, H. Submodular functions and electrical networks, volume 54. Elsevier, 1997. Nemhauser, G. L., Wolsey, L. A., and Fisher, M. L. An analysis of approximations for maximizing submodular set functions i. Mathematical programming, 14(1):265 294, 1978. Nguyen, H. L., Chaturvedi, A., and Xu, E. Z. Differentially private k-means via exponential mechanism and max cover. Proceedings of the AAAI Conference on Artificial Intelligence, 35(10):9101 9108, May 2021. doi: 10. 1609/aaai.v35i10.17099. URL https://ojs.aaai. org/index.php/AAAI/article/view/17099. Papadimitriou, C. H., Schapira, M., and Singer, Y. On the hardness of being truthful. In 49th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2008, October 25-28, 2008, Philadelphia, PA, USA, pp. 250 259. IEEE Computer Society, 2008. doi: 10. 1109/FOCS.2008.54. URL https://doi.org/10. 1109/FOCS.2008.54. Parambath, S. P., Usunier, N., and Grandvalet, Y. A coverage-based approach to recommendation diversity on similarity graph. In Sen, S., Geyer, W., Freyne, J., and Castells, P. (eds.), Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, September 15-19, 2016, pp. 15 22. ACM, 2016. doi: 10.1145/2959100.2959149. URL https: //doi.org/10.1145/2959100.2959149. Rafiey, A. and Yoshida, Y. Fast and private submodular and k-submodular functions maximization with matroid constraints. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pp. 7887 7897. PMLR, 2020. URL http://proceedings.mlr.press/ v119/rafiey20a.html. Sadeghi, O. and Fazel, M. Differentially private monotone submodular maximization under matroid and knapsack constraints. In Banerjee, A. and Fukumizu, K. (eds.), The 24th International Conference on Artificial Intelligence and Statistics, AISTATS 2021, April 13-15, 2021, Virtual Event, volume 130 of Proceedings of Machine Learning Research, pp. 2908 2916. PMLR, 2021. URL http://proceedings.mlr.press/v130/ sadeghi21a.html. Salazar, S. P. and Cummings, R. Differentially private online submodular maximization. In International Conference on Artificial Intelligence and Statistics, pp. 1279 1287. PMLR, 2021. Schrijver, A. et al. Combinatorial optimization: polyhedra and efficiency, volume 24. Springer, 2003. Streaming Submodular Maximization with Differential Privacy A. Proof of Theorem 1.3 In this section we reproduce the proof of utility of Algorithm 1 of Badanidiyuru et al. (2014). Theorem 1.3 (Badanidiyuru et al. (2014)). The final solution S satisfies f(S) min{O/2, f(OPT) O/2}. Proof. We see from the pseudocode of Algorithm 1 that when an element ei V is processed by the stream, if Si is the set of accepted elements at that point then ei is retained and added to the solution if and only if f(ei|Si) O/2k. Let S = {ei1, . . . , ei|S|}. We consider two cases depending on the size of S. If |S| = k, then j=1 f(eij|Sij) In the second case, if |S| < k, then all elements ei OPT \ S must have been rejected i.e. f(ei|Si) < O/(2k). Let OPT\S = {ej1, . . . , ejt} (i.e. {j1, . . . , jt} is some subset of {i1, . . . , i|S|}). Since f is monotonic and Si S for all ei V , we have that f(eji|S) f(eji|Sji) < O/(2k). We can write X eji OP T \S f(eji|S) k O On the other hand, we also have that eji OP T \S f(eji|S) i=1 f(eji|S {ej1, . . . , eji 1}) = f(S {ej1, . . . , ejt}) f(S) = f(OPT) f(S), where in the above we use that f(eji|S) f(eji|S {ej1, . . . , eji 1}) by the submodularity of f, and then let the sum of marginal gains telescope. From the two inequalities above we get that f(S) f(OPT) O/2. The desired lower bound now follows by simply taking the min over the two cases. We see that if O = f(OPT), then by setting the threshold according to this value we immediately get a 1/2-approximation algorithm. In the setting where we do not have this information, but have the promise that the optimum value lies in the range [E, m], we can run multiple copies of this algorithm with a set of geometrically scaling guesses O = {E, (1 + θ)E, (1 + θ)2E, . . . , (1 + θ) log1+θ m E E, m}. Essentially, we try geometrically varying guesses for f(OPT) so that we are assured that there is some O O such that f(OPT) O (1 + θ)f(OPT). From the guarantee of Theorem 1.3, we get that if SO is the output of Algorithm 1 when run with O = O , then f(SO ) min{O /2, f(OPT) O /2} min{f(OPT)/2, (1 θ)f(OPT)/2} (1 θ)f(OPT)/2. We see that since we must maintain and update all the SO for O O, achieving this guarantee requires that we pay a computational and spatial overhead of 2 log m/E θ for a 1 θ 2 -approximation. We also note that this algorithm requires just one pass over the data stream. Streaming Submodular Maximization with Differential Privacy B. Proof of Theorem 1.4 For ease of reference we reproduce the pseudo-code of Algorithm 2 and Algorithm 3. Algorithm 2 Sparse, Dwork et al. (2009); Dwork & Roth (2014) input Arbitrary (possibly adaptive) stream of sensitivity 1 queries f1, f2, . . . , a threshold T, a cutoff point k, threshold noise Dα, score noise Dβ. Output is a stream of answers a1, a2, . . . , ai { , } α Dα Let ˆTi = T + α for i {0, . . . , k 1} Let count = 0 for query ei do Let βi Dβ if fi + βi ˆTcount then Output ai = count = count + 1 else ai = end if if count k then Halt end if end for Lemma B.16. If α (al, au) and for all elements ei V , βi (bl, bu), then Algorithm 2 has the promise that when run with threshold T = O/2k, if we add all elements ei for which ai = to SO, then 2 min{O, f(OPT) O} kbu + kal. Proof. Let SO = {ei1, . . . , eik}. Let SO iu := {ei1, . . . , eiu 1} for u k. Since the selected elements must have succeeded in the threshold check, it must be the case that f(eiu|SO iu) + βiu O f(eiu|SO iu) + bu O f(eiu|SO iu) O Hence we have that u=0 f(eiu|SO iu) 2 kbu + kal On the other hand if |SO| = r < k, as in the proof of Theorem 1.3, let OPT\S = {ej1, . . . , ejt}, and as before, we have that f(SO) f(OPT) X eji OPT\SO f(eji|SO) i=1 (O/2k + αji βji) 2 kbu + kal. Streaming Submodular Maximization with Differential Privacy Algorithm 3 Private streaming submodular maximization input Monotone submodular function f A, cardinality constraint parameter k, failure probability β, privacy parameters (ε, δ), element stream V , approximation parameter θ if f A has sensitivity 1 but is not 1-decomposable then Let ε = ε 4 2T ln((T +1)/δ) 32k ln T +1 δ ε Let Dα = Lap(σ) Let Dβ = Lap(2σ) else if f A is 1-decomposable then T log1.5 T/δ (exact expression in Lemma 4.2) Dα = Gumb(γ) Dβ = Gumb(γ) end if Let E = min n k log n Let T = log1+θ m E + 1 Let O = {E, (1 + θ)E, (1 + θ)2E, . . . , (1 + θ) log1+θ m E E, m} for ei V do for all O O do SO Algorithm 2 (Query stream (f A(e|SO))e V , threshold O 2k, cutoff k, Dα, Dβ) where ai = add element ei to SO ai = reject element ei end for end for SO EM({SO : O O}, ε/2) output SO Remark B.17. Note that although the noise random variables βei for ei O are all drawn independently and have mean 0, we have implicitly conditioned on SO of elements having already been picked, and so we cannot claim and exploit independence so as to derive a better concentration bound. One would expect to see noise values biased high, making it more likely that that element have passed the check. Although we should be able to derive a concentration bound for the threshold noise random variables αO that scales as O( k), as the other noise term dominates in magnitude this does not help asymptotically. Proof of Theorem 1.4. We assume for now that f has sensitivity 1. From Theorem 2.8 and the choice of σ in Algorithm 2 we see that each one of the T = |O| calls to is (ε , δ T +1)-DP. Then, by advanced composition (Theorem 2.4), it follows that since ε as set in the pseudocode is 2T ln(T + 1)/δ , the T calls to Algorithm 2 are collectively ε/2, T δ T +1 + δ T +1 -differentially private. We apply basic composition to account for the ε/2-private call to the exponential mechanism and get (ε, δ)-differential privacy in sum. We now derive the the utility guarantee. For all O O, and all αi, with probability 1 η 2k T |αi| 8 log 2T/η Streaming Submodular Maximization with Differential Privacy Similarly, with probability 1 η 2n T , we have that for an element ej, |βj| 4 log 2n T/η Applying the union bound, it follows that with probability 1 η, for all n elements across all T thresholds as well as for all T guesses the respective noise variables αi, βj are bounded as above. It follows that we can apply Lemma B.16 with au = al = 8 log T/η σ , and bu = bl = 4 log n/η σ . Substituting, we get that for all guesses O with probability 1 η/2, 2 min {O, 2f(OPT) O} 4k log 2n T/η σ 8k log 2k T/η As O varies geometrically between E and m, and we have the promise that f takes values in [E, m], it follows that there is a choice O such that f(OPT) O (1 + θ)f(OPT) min{O , 2f(OPT) O } (1 θ)f(OPT) f(SO ) (1 θ)f(OPT) 2 12k log 2nk T/η From the guarantee of the exponential mechanism we get that with probability 1 η/2, f(SO ) f(SO ) 2 In sum, putting everything together and applying the union bound, we have that with probability 1 η, f(SO ) (1 θ)f(OPT) 2 12k log(2nk T/η) q (1 θ)f(OPT) k T log(T/δ) log(1/δ) log nk T/η (1 θ)f(OPT) k1.5 log1.5 nk log m/E ηθδ log0.5 m/E wherein we use that T = O(log1+θ m/E) = O log m/E We now set E = min n k log n ε , m/2 o , and show that the claimed bound holds. If f(OPT) lies in the prescribed interval [E, m], then we have already shown that the claimed bound holds. On the other hand, if f(OPT) < E, the additive loss in utility is k1.5 log1.5 nk log 2 ηθδ log0.5 2 θ > k log n and so the RHS of the claimed bound is negative, in which case any choice of SO fulfills the claimed bound trivially. We can therefore say that unconditionally, with probability 1 η, f(SO ) (1 θ)f(OPT) k1.5 log1.5 nk In the more general case where f has sensitivity λ, we observe that our analysis holds for the function f/λ, and that the optimum utility maximizing set for f/λ is the same as that of f and its utility is f(OPT) λ , so we have that with probability 1 η, Streaming Submodular Maximization with Differential Privacy λ (1 θ)f(OPT) k1.5 log1.5 nk Multiplying throughout by λ leads to the stated bound for the general case. C. Proof of Theorem 1.5 The key technical lemma in the privacy analysis of Algorithm 3 with Gumbel noise is the following. Lemma C.1. Algorithm 2 is (ε, δ)-differentially private for Dα, Dβ = Gumb(γ), where γ = 8 ε ln 2 log 2 The proof of this result is technically involved, and we defer the proof to the end of this section. In addition to Lemma C.1, we will also need some standard concentration bounds for the Gumbel distribution. Lemma C.2. If αi Gumbel(µ, γ), the following statements hold. 1. With probability 1 β, x µ + γ log 1/β. 2. With probability 1 β, x µ γ log log 1 Proof. 1. We recall that the CDF for Gumbel(µ, γ) is exp( exp( (x µ)/γ)). Then, 1 exp( exp( (x µ)/γ)) β exp( (x µ)/γ) log 1 β γ log 1 log 1 1 β x µ + γ log 1 log 1 1 β . Using the series expansion log 1 x = x x2/2 x, we have that log 1 1 β = log 1 β log 1 log 1 1 β log 1 So in particular, if x µ + γ log 1 β , then P(αi x) β. 2. Similar to the first part, we have that exp( exp( (x µ)/γ)) β exp( (x µ)/γ) log 1 γ log log 1 x µ γ log log 1 With these lemmas we can now derive our main result. Streaming Submodular Maximization with Differential Privacy Proof of Theorem 1.5. We assume for now that f is 1-decomposable. From Lemma C.2, we have that for every guess Oi, with probability 1 η/2k T αi γ log log 2k T η , γ log 2k T Similarly, for every element ei and every threshold O, with probability 1 η/2n T βi γ log log 2n T η , γ log 2n T Applying the union bound, it follows that in the notation of Lemma B.16 we can set al = γ log log 2k T au = γ log 2k T bl = γ log log 2n T bu = γ log 2n T and conclude that with probability 1 η, for all thresholds O, 2 min{O, f(OPT) O} kγ log 2n T η kγ log log 2k T 2 min{O, f(OPT) O} 2kγ log 2nk T As in the proof of Theorem 1.4, as long as f(OPT) [E, m] there exists a choice O for which f(SO ) (1 θ)f(OPT) 2 2kγ log 2nk T From the utility guarantee of the exponential mechanism (Lemma 2.5) we have that f(S ) f(SO ) 2 (1 θ)f(OPT) 2 2kγ log 2nk T (1 θ)f(OPT) T log1.5 T log T/δ εδ log 2nk T (1 θ)f(OPT) E log1.5 log m E log log m/E θδ εδθ log 2nk log m/E wherein we use that γ = O T ε log1.5 T log T/δ εδ , and T = log1+θ m/E = O log m/E θ . Again, setting E = min n k log n ε , m/2 o , we get that if f(OPT) [E, m], then a good choice of O exists and the desired bound follows. On the other hand, if f(OPT) < E < k log n ε , then since the additive error term is at least k log n ε , the RHS of the claimed bound is negative, and any choice of SO fulfills the claimed bound trivially. Simplifying terms a bit we get f(S ) (1 θ)f(OPT) k log2 m Eεδθ log 2nk log m/E Streaming Submodular Maximization with Differential Privacy In the more general case where f is λ-decomposable, we observe that our analysis holds for the 1-decomposable function f/λ, and that the optimum utility maximizing set for f/λ is the same as that of f and its utility is f(OPT) λ . It follows that with probability 1 η, λ (1 θ)f(OPT) k log2 m Eεδθ log 2nk log m/E Multiplying throughout by λ leads to the stated bound. We now derive the privacy guarantee. Since the T 2 log m E θ -many instantiations of Algorithm 2 are run with independent random bits, we can use advanced composition to argue that the net privacy loss suffered by releasing the sets SO is 2T log(1/δ ), (T + 1)δ ), where it suffices to set γ = 8 ε ln 2 log 2 ε δ by Lemma C.1. Replacing ε by ε 4 2T log(1/δ ) in the expression for γ, it follows ε ln 2 log 4T p 2T log(T/δ) θ log1.5 log 1 Algorithm 3 with Gumbel noise is (ε/2, δ)-differentially private. We can then apply the exponential mechanism on this public set of choices for an additional (ε/2, 0)-privacy loss, giving us the claimed expression. To prove Lemma C.1, we first derive some technical lemmas that characterize the probability of stream elements succeeding in their privatized threshold checks. Lemma 4.1 characterizes the probability of an element ei being picked condition on the set S already having been picked. Lemma C.3. Conditioned on having picked the set Si of elements with |Si| = u < k, Pr[ai = |Si] = Z 1 exp w A(i|Si)eαu/γ dα|Si|, w A(i|Si) = exp 1 2k f A(ei|Si) , and αu Gumbel(0, γ). Proof. We recall that for an element ei to be picked by Algorithm 3 (which happens if and only if the output ai of Algorithm 2 on input ei equals ), it must be the case that if Si is the set of elements picked thus far then |Si| = u < k, and that the privatized marginal utility of ei given Si has been picked beats the privatized threshold O/2k + αu. We can write Pr[ai = |Si] = Pr f(ei|Si) + βi O 1 f(ei|Si) + βi O where αu, βi Gumbel(0, γ), and 1[ ] denotes the indicator of the event in its argument. Since αu and βi are drawn independently of each other, we can factorize their joint density function and write Pr[ai = |Si] = Z 2k + αu f(ei|Si) dαu. (1) Streaming Submodular Maximization with Differential Privacy Since βi Gumbel(0, γ), we have that 2k + αu f(ei|Si) = 1 Pr βi < O 2k + αu f(ei|Si) = 1 exp exp 1 2k + αu f(ei|Si) = 1 exp w A(i|Si)eαu/γ . Substituting for the integrand in Equation (1), we get the stated result. Lemma C.20. Let A(A) denote the set of elements indicated by Algorithm 2 to be picked for the decomposable submodular function f A. If S = (ei1, ei2, . . . , eir), then Pr[A(A) = S] = w A(iu|Siu) (1 + P j (iu 1,iu) w A(j|Siu))(1 + P j (iu 1,iu] w A(j|Siu)). where Siu = {ei1, ei2, . . . , eiu 1}, i.e. the set of elements already picked when element eiu is considered. Proof. The stream e1, e2, . . . is given as input to Algorithm 2 and the elements ei1, . . . , eir are picked. For this to be the case, for every u {1, . . . , r}, all the elements after iu 1 (the element picked before iu)), and before iu must fail their privatized checks conditioned on {i1, . . . , iu 1} having already been picked, but iu must itself succeed. In a way similar to Lemma 4.1, we can factor the joint density function of the αu and the βj as they are drawn independently and write Pr[A(A) = S] = u=1 Pr[aiu = |Siu] Y j (iu 1,iu) Pr[aj = |Siu] (1 exp w A(iu|u)eαu/γ ) Y j (iu 1,iu) exp w A(j|Siu)eαu/γ dαu (1 exp( w A(iu|u)ez)) exp j (iu 1,iu) w A(j|Siu) exp z e z dz In the above we use that the PDF of αu is P(x) = 1 γ + e x/γ) , and make the variable substitution z = x/γ. We can simplify the integrand of each factor as follows. (1 exp( w A(iu|Siu)ez)) exp j (iu 1,iu) w A(j|Siu) z e z(1 + X j (iu 1,iu) w A(j|Siu)) z e z(1 + X j (iu 1,iu] w A(j|Siu)) We can integrate the summands separately; as they have the same form, to derive the resulting expression it suffices to compute the first integral, which we denote I. z e z(1 + X j (iu 1,iu) w A(j|Siu)) Let t = e z(1 + P j (iu 1,iu) w A(j|Siu)). Then, dt = e z(1 + P j (iu 1,iu) w A(j|Siu))dz. Substituting this variable, we get I = 1 (1 + P j (iu 1,iu) w A(j|Siu)) j (iu 1,iu) w A(j|Siu). Streaming Submodular Maximization with Differential Privacy It follows that Pr[A(A) = S] = j (iu 1,iu) w A(j|Siu) 1 1 + P j (iu 1,iu] w A(j|Siu) w A(iu|Siu) (1 + P j (iu 1,iu) w A(j|Siu))(1 + P j (iu 1,iu] w A(j|Siu)). Proof of Lemma C.1. Let A, B X, and let A = B {I}. Let S = (ei1, . . . , eir) be any sequence of elements that can be picked by the algorithm (i.e. r k). First we show that Pr[A(A) = S] eε Pr[A(B) = S] + δ Substituting from Lemma C.20 and rearranging terms, we have that Pr[A(A) = E] Pr[A(B) = E] = w A(iu|Siu) w B(iu|Siu) j (iu 1,iu) w B(j|Siu))(1 + P j (iu 1,iu] w B(j|Siu)) (1 + P j (iu 1,iu) w A(j|Siu))(1 + P j (iu 1,iu] w A(j|Siu)) . We bound the two factors of this expression separately. For the first factor, we have w A(iu|Siu) w B(iu|Siu) = 2k f A(eiu|Siu) 2k f B(eiu|Siu) u=1 f B(eiu|Siu) f A(eiu|Siu) u=1 fp(eiu|Siu) The second factor is bounded trivially from above by 1; to see this, we observe that the following sequence of inequalities holds. w A(j|u) = exp 1 2k f A(ei|Siu) 2k f B(ei|Siu) + fp(ei|Siu)) 2k f B(ei|Siu) In sum, it follows that any value of γ 1/ε suffices. We now show that Pr[A(B) = S] eε Pr[A(A) = S] + δ. To this end we consider the reciprocal of the ratio we bounded for the first case, i.e. Pr[A(B) = E] Pr[A(A) = E] = w B(iu|Siu) w A(iu|Siu) j (iu 1,iu) w A(j|Siu))(1 + P j (iu 1,iu] w A(j|Siu)) j (iu 1,iu) w B(j|Siu))(1 + P j (iu 1,iu] w B(j|Siu)). (2) To bound this ratio, we first derive a simple relaxation. Streaming Submodular Maximization with Differential Privacy Claim C.21. The following bound holds: Pr[A(B) = E] Pr[A(A) = E] j (iu 1,iu) w A(j|Siu) j (iu 1,iu) w B(j|Siu) Proof. We observe that w B(iu|Siu) w A(iu|Siu) = exp f{p}(eiu|S) w B(iu|Siu) w A(iu|Siu) 1 + P j (iu 1,iu] w A(j|Siu) j (iu 1,iu] w B(j|Siu) = exp f{p}(eiu|S) j (iu 1,iu) w A(j|Siu) + w A(iu|Siu) j (iu 1,iu) w B(j|Siu) + w B(iu|Siu) = exp f{p}(eiu|S) 1 + P j (iu 1,iu) w A(j|Siu) + exp f{p}(eiu|S)/γ w B(iu|Siu) j (iu 1,iu) w B(j|Siu) + w B(iu|Siu) exp f{p}(eiu|S)/γ + exp f{p}(eiu|S)/γ P j (iu 1,iu) w A(j|Siu) + w B(iu|Siu) j (iu 1,iu) w B(j|Siu) + w B(iu|Siu) j (iu 1,iu) w A(j|Siu) + w B(iu|Siu) j (iu 1,iu) w B(j|Siu) + w B(iu|Siu) j (iu 1,iu) w A(j|Siu) j (iu 1,iu) w B(j|Siu). The claim now follows by applying this upper bound for each factor in Equation (2) as u varies from 1 to r. We will now focus on bounding the expression j (iu 1,iu) w A(j|Siu) j (iu 1,iu) w B(j|Siu). We first observe that this expression can be identified with the expectation of an monotonically increasing function of the marginal utility of the agent {p} = A\B. Definition C.22. Let vj := w B(iu 1 + j|Siu). Let Pu be a distribution over { } {1, . . . , iu iu 1} such that Pu(iu + j) vj and Pu( ) 1. With this definition, we see that j>0 w A(iu + j|Siu) 1 + P j>0 w B(iu + j|Siu) = 1 + P j (iu 1,iu) exp f{p}(ej|Siu)/γ w B(iu + j|Siu) 1 + P j>0 w B(iu + j|Siu) j (iu 1,iu) exp f{p}(ej|Siu) vj 1 + P j (iu 1,iu) vj exp f{p}(ej|S)/γ We let Yu := Ej Pu[exp f{p}(ej|Siu)/γ ]. At a high level the key insight is that since the sum of marginal utilities Pr u=1 f{p}(eiu|Siu) of any agent is at most 1, the net privacy loss, which we have shown to be bounded above by a product of monotonic functions of the sequential marginal utilities may also be bounded more tightly than the O( k/ε) bound that arises from advanced composition. To prove this stronger concentration bound we first derive a moment bound on these functions of the expected marginal utility Yu. Streaming Submodular Maximization with Differential Privacy The complication here is that the elements picked by the algorithm affect the marginal utilities of all subsequent elements considered; we proceed by formalizing a probabilistic process capturing the behaviour of this algorithm in a manner similar to that of Gupta et al. (2010) and Chaturvedi et al. (2021). Lemma C.23. Consider the following k-round probabilistic process. Let vj := w B(iu 1 + j|Siu). In each round u, it is the case that the set of elements Siu = {i1, . . . , iu 1} has been picked, and the element iu = j + iu 1 is picked with probability pj = 1 1 + v1 + + vj 1 vj 1 + v1 + + vj . Then, for each q = 1, . . . , r, for a value of c = γ/4 = 2 ε ln 2 log 2 εδ, the following bound holds: u=q Y c u |Siq ε(1 f{p}(Siq)). Before we prove this lemma, we first prove a minor claim that linearizes the dependence on the moments Y c u on the marginal utility random variable f{p}(eiu+j|Siu). Claim C.23. For c, γ such that 1 c < γ, Y c u 1 + (e 1)c γ E j Pu[f{p}(eiu 1+j|Siu)]. Proof. By definition, we have that j (iu 1,iu) exp f{p}(eiu 1+j|Siu)/γ vj 1 + P j (iu 1,iu) vj Applying Jensen s inequality, we get that Y c u 1 + P j (iu 1,iu) exp cf{p}(eiu 1+j|Siu)/γ vj 1 + P j (iu 1,iu) vj . Since c < γ and f{p}(eiu 1+j|Siu) 1, by applying the inequality ex < 1 + (e 1)x for x 1, it follows that exp cf{p}(eiu 1+j|Siu)/γ 1 + (e 1)cf{p}(eiu 1+j|Siu). Applying this bound and continuing, we get that Y c u 1 + P j (iu 1,iu)(1 + (e 1)cf{p}(eiu 1+j|Siu)/γ)w B(iu 1 + j|Siu) j (iu 1,iu) w B(iu 1 + j|Siu) j (iu 1,iu)(e 1)cf{p}(eiu 1+j|Siu)w B(iu 1 + j|Siu)/γ 1 + P j (iu 1,iu) w B(iu 1 + j|Siu) γ E j Pu[f{p}(eiu+j|Siu)]. Proof of Lemma 4.2. We proceed by reverse induction on q. For the base case, i.e. q = k, we have that E ik[Y c k ] E 1 + (e 1)c γ E j Pk[f{p}(eik 1+j)] γ sup j>0 f{p}(eik 1+j|Sik) γ (1 f{p}(Sik)) ε(1 f{p}(Sik)), Streaming Submodular Maximization with Differential Privacy where in the above we use that c/γ = 1/4 1 (e 1)ε for ε 1, and that f{p}(eik 1+j|Sik) + f{p}(Sik) = f{p}(Sik {eik 1+j}) 1 for any choice of j. For the induction step, we assume that the statement is true for u = q + 1, . . . , k, and derive a bound for the case u = q. E iq,...,ik u=q Y c u |Siq Y c q E iq+1,...,ik u=q+1 Y c u |Sq ε(1 f{p}(Sq)) |Siq ε(1 f{p}(Siq) f{p}(eiq|Siq)) |Siq γ E j Pq[f{p}(ej|Siq)] 1 + 1 ε(1 f{p}(Siq) f{p}(eiq|Siq)) |Siq ε(1 f{p}(Siq)) 1 ε E iq[f{p}(eiq|Siq)|Siq] E j Pq[f{p}(ej|Siq)] 1 + 1 ε(1 f{p}(Siq) f{p}(eiq|Siq)) ε(1 f{p}(Siq)) 1 ε E iq[f{p}(eiq|Siq)|Siq] + (e 1)(1 + 1/ε) 4 E iq[ E j Pq[f{p}(ej|Siq)]], where in the above we use that ε(1 f{p}(Siq) f{p}(eiq|Siq)) 1 + 1 It follows that it would suffice to show that the last two terms sum to at most 0. We have that E iq[ E j Pq[f{p}(ej|Siq)]] = X w 1 pw E j Pq[f{p}(ej|Siq)|iq = iq 1 + w] vx 1 + v1 + + vw 1 f{p}(eiq 1+x|Siq) The outer expectation in the display above corresponds to iq being picked as described by the probabilistic process (and Algorithm 3), and the expectation inside is the expression that we used to bound the privacy loss term for any one round; conditioned on the choice of iq we recall that it is a distribution over (iq 1, iq). We switch the sums in the display above to get E iq[ E j Pq[f{p}(ej|Siq)]] = X x 1 f{p}(eiq 1+x|Siq) X vx 1 + v1 + + vw 1 pw x 1 f{p}(eiq 1+x|Siq) vx 1 + v1 + + vx Further we have X 1 1 + v1 + + vw 1 vw 1 + v1 + + vw 1 1 + v1 + + vw 1 vw 1 + v1 + + vw 1 1 + v1 + + vw 1 vw 1 + v1 + + vw = 1 1 + v1 + + vx 1 . Streaming Submodular Maximization with Differential Privacy Substituting, we get E iq[ E j Pq[f{p}(ej|Siq)]] X x 1 f{p}(eiq 1+x|Siq) vx 1 + v1 + + vx 1 1 + v1 + + vx 1 x 1 pxf{p}(eiq 1+x|Siq) = E iq[f{p}(iq|Siq)]. So in sum, we have that E iq,...,ik u=q Y c u |Siq ε(1 f{p}(Siq)) + E iq[f{p}(iq|Siq)] 1 ε + (e 1)(1 + 1/ε) ε(1 f{p}(Siq)). wherein we use that for ε < 1, (e 1)(1+1/ε) Returning to the proof of Lemma C.1, we see that the probabilistic process defined and analysed in Lemma 4.2 can be identified with a run of Algorithm 2 with Gumbel noise, where the input stream has been appended with infinitely many items of 0 marginal utility - this ensures that k complete rounds are executed, but the output distribution on the non-trivial items is identical. Setting q = 1 in Lemma 4.2, since f{p}( ) = 0, we see that u=1 Y c u |Siq E i1,...,ik j (iu 1,iu) w A(j|Siu) j (iu 1,iu) w B(j|Siu) Pr i1,...,ik j (iu 1,iu) w A(j|Siu) j (iu 1,iu) w B(j|Siu) > (1 + ε)1/2 # (1 + ε)c/2 , wherein in the last step we apply Markov s inequality. Since ε < 1, we have that (1 + 1/ε) (1 + ε)c/2 2/ε (1 + ε)c/2 2/ε exp ε ln 2 c wherein we use that for ε < 1, 1 + ε exp(ε ln 2). Setting c = 2 ε ln 2 log 2 εδ, which we note is 1, we get that (1 + 1/ε) (1 + ε)c/2 2/ε exp(ln 2/εδ) It follows that with probability 1 δ, j (iu 1,iu) w A(j|Siu) 1 + P j (iu 1,iu) w B(j|Siu) (1 + ε)1/2 Pr[A(B) = E] Pr[A(A) = E] 1 + ε. It follows that a run of Algorithm 2 with Gumbel noise with noise parameter γ = 4c = 8 ε ln 2 log 2 εδ is (ε, δ)-DP. Streaming Submodular Maximization with Differential Privacy D. Proof of Theorem 1.6 In this section we describe and prove a lower bound (Theorem 1.6) for private submodular maximization. This is a slightly weaker bound than that of Gupta et al. (2010) but is more general as it applies to the (ε, δ) instead of the (ε, 0) setting. Further, it also happens to have a decomposable objective, showing that Algorithm 3 with Gumbel noise has the optimal dependence on k and ε (up to logarithmic terms). Definition D.1 (Maximum coverage). Given a set system (U, S), i.e. a ground set U and a family S of subsets of U, the maximum coverage problem fixes a private target subset R U and a number k and asks the solver to pick T S such that R T T T and |T | k. We can recast this problem in the form of submodular maximization, and then construct a hard instance of maximum coverage to prove our lower bound for (ε, δ)-DP submodular maximization. Lemma D.2 (Maximum coverage as submodular maximization). Given a set system (U, S), and a set cover problem with a private target subset R U and budget k, it is easy to see that the objective |R ( T T T)| = X is a decomposable submodular function with |R| summands. Theorem 1.6. For all 0 ε, δ 1, k N, n k eε 1 δ , and c 4δ eε 1, if an (ε, δ)-DP algorithm for the submodular maximization problem for decomposable objectives achieves a multiplicative approximation factor of c, it must incur additive error Ω((kc/ε) log(ε/δ)). Proof. We construct a hard instance for maximum coverage. Let (U, S) be a set system where S consists of all the singletons in U. Let A be a set of size k picked uniformly at random from U, and let the data set DA = A [L] for L = ln c eε 1 δ 2ε . Let n := |DA| = |A| |L|. Let T be k subsets of S picked by the solver M . The objective we are trying to maximize is e DA 1{e} T . Let M be any (ε, δ)-DP algorithm for the set cover problem and let ϕ = EM,A[(M(DA) A)/|A|], i.e. ϕ is the average fraction of points of A (and consequently DA) that were recovered successfully by M. ϕ captures the average approximation factor achieved by the algorithm M over this family of hard instances. We see that since A is of size k, and the data set DA is simply points of A repeated with multiplicity, the collection of sets T = {{i} : i A} is a solution for this maximization problem that achieves f(OPT) = n. We observe that e DA 1{e} T = E A,M E i A[1i M(DA)] = E i U E A,M[1i M(DA)|i A]. Fixing any choice of i A, let i be uniformly random in U\A, and let A = (A\{i}) {i }; A is hence uniformly random over U\{i}. We see that there is a chain of sets D0 A, D1 A, . . . , DL A such that D0 A = DA, Dt A = (Dt 1 A \{i}) {i } for t [L], and DL A = DA (we recall that we treat data sets as multisets, allowing us to swap one copy of i for one copy of i at a time. Since M is (ε, δ)-DP, it follows that for all t [L], E M[1i M(Dt A)] exp( 2ε) E M[1i M(Dt 1 A )] 2δ. Streaming Submodular Maximization with Differential Privacy It follows that E M[1i M(D A)] exp( 2ε) E M[1i M(DL 1 A )] 2δ. exp( 4ε) E M[1i M(DL 2 A )] exp( 2ε) 2δ 2δ exp( 2Lε) E M[1i M(D0 A)] 2δ (1 + exp( 2ε) + exp( 4ε) + . . . ) exp( 2Lε) E M[1i M(DA)] 2δ 1 e 2ε exp( 2Lε) E M[1i M(DA)] 2δ e2ε 1. Taking the expectation over i U and the randomness in the choice of A, we get E i U E A,M[1i M(DA)|i A] ϕ exp( 2Lε) 2δ e2ε 1. It follows by the law of total expectation that E i U E A,M[1i M(DA)] ϕ exp( 2Lε) 2δ e2ε 1. The LHS is at most k/n, so rearranging terms we get n + 2δ e2ε 1 exp(ε 2L) ϕ. It follows that for n k e2ε 1 2δ , and L 1 2ε log c e2ε 1 8δ , ϕ is at most c/2. Hence for all c 8δ e2ε 1, either the algorithm fails to achieve the multiplicative approximation factor of c, or it incurs additive error ck L/2 = Ω((ck/ε) log(ε/δ)). E. Experimental data and further results E.1. Data for graphs of Section 5 We tabulate the data recorded in our main experiments for better scrutiny. As pointed out in the Section 5, note in particular the increases in the mean clustering cost when using Laplace noise for the synthetic data set (Tables 3 and 4) as we increase the cardinality constraint k from 100 to 125. There is a small jump in the cost of the Gumbel noise as well in Table 1, but this is attributable to the variance of the experiments, as the privatizing noise parameter used is the same regardless of the value of k. Table 1. Comparison of the mean clustering cost and variance over 20 runs for the Taxi data set with privacy parameter ε = 0.1 (graphed in Figure 1) Cardinality constraint k Random Laplace Gumbel Non-private 25 1.68 (0.73) 0.79 (0.34) 0.64 (0.24) 0.17 (0) 50 1.23 (0.55) 0.68 (0.23) 0.44 (0.12) 0.17 (0) 75 1 (0.44) 0.53 (0.21) 0.39 (0.12) 0.17 (0) 100 0.88 (0.34) 0.48 (0.14) 0.32 (0.09) 0.16 (0) 125 0.76 (0.26) 0.46 (0.2) 0.34 (0.11) 0.16 (0) Streaming Submodular Maximization with Differential Privacy Table 2. Comparison of the mean clustering cost and variance over 20 runs for the Taxi data set with privacy parameter ε = 1 (graphed in Figure 2) Cardinality constraint k Random Laplace Gumbel Non-private 25 1.68 (0.73) 0.7 (0.33) 0.62 (0.28) 0.15 (0) 50 1.23 (0.55) 0.64 (0.23) 0.4 (0.12) 0.14 (0) 75 1 (0.44) 0.52 (0.22) 0.36 (0.09) 0.14 (0) 100 0.88 (0.34) 0.49 (0.14) 0.3 (0.07) 0.14 (0) 125 0.76 (0.26) 0.44 (0.16) 0.28 (0.07) 0.14 (0) Table 3. Comparison of the mean clustering cost and variance over 20 runs for the Synthetic data set with privacy parameter ε = 0.1 (graphed in Figure 4) Cardinality constraint k Random Laplace Gumbel Non-private 25 1.57 (0.48) 0.98 (0.25) 0.83 (0.23) 0.38 (0) 50 1.09 (0.34) 0.7 (0.23) 0.61 (0.12) 0.22 (0) 75 0.92 (0.31) 0.59 (0.15) 0.55 (0.1) 0.22 (0) 100 0.85 (0.21) 0.53 (0.15) 0.49 (0.1) 0.21 (0) 125 0.8 (0.26) 0.56 (0.16) 0.48 (0.09) 0.21 (0) Table 4. Comparison of the mean clustering cost and variance over 20 runs for the Synthetic data set with privacy parameter ε = 1 (graphed in Figure 5) Cardinality constraint k Random Laplace Gumbel Non-private 25 1.57 (0.48) 0.95 (0.27) 0.75 (0.2) 0.38 (0) 50 1.09 (0.34) 0.66 (0.2) 0.58 (0.09) 0.22 (0) 75 0.92 (0.31) 0.59 (0.14) 0.51 (0.08) 0.18 (0) 100 0.85 (0.21) 0.48 (0.11) 0.46 (0.08) 0.16 (0) 125 0.8 (0.26) 0.54 (0.16) 0.41 (0.09) 0.15 (0) E.2. Additional experiments In Figure 9, we compare the performance of Algorithm 3 with Laplace and Gumbel noises with the Private (Mitrovic et al., 2017; Gupta et al., 2010) and the Non-private non-streaming (Non-private NS) (Nemhauser et al., 1978) algorithms. We evaluate the algorithms on the synthetic dataset described in Section 5 for ε = 0.1 and ε = 1 and δ = 1/|P|1.5, as before. We see that the private non-streaming algorithm performs significantly better, as expected. We recall that the multiplicative approximation ratio for the private non streaming algorithm is (1 1/e) which explains much of this gap; in the streaming case even for non-private algorithms the approximation factor is at best 1/2. Streaming Submodular Maximization with Differential Privacy Cardinality constraint k Clustering cost Laplace Gumbel Non-private NS Figure 7. ε = 0.1 Cardinality constraint k Clustering cost Laplace Gumbel Non-private NS Figure 8. ε = 1 Figure 9. Performance on the synthetic dataset.