# bounded_space_differentially_private_quantiles__b1879eec.pdf Published in Transactions on Machine Learning Research (06/2023) Bounded Space Differentially Private Quantiles Daniel Alabi alabid@cs.columbia.edu Columbia University Omri Ben-Eliezer omrib@mit.edu Massachusetts Institute of Technology Anamay Chaturvedi chaturvedi.a@northeastern.edu Northeastern University Reviewed on Open Review: https: // openreview. net/ forum? id= six OD8YVv M Estimating the quantiles of a large dataset is a fundamental problem in both the streaming algorithms literature and the differential privacy literature. However, all existing private mechanisms for distribution-independent quantile computation require space at least linear in the input size n. In this work, we devise a differentially private algorithm for the quantile estimation problem, with strongly sublinear space complexity, in the one-shot and continual observation settings. Our basic mechanism estimates any α-approximate quantile of a length-n stream over a data universe X with probability 1 β using O log(|X|/β) log n space while satisfying ϵ-differential privacy at a single time point. Our approach builds upon deterministic streaming algorithms for non-private quantile estimation instantiating the exponential mechanism using a utility function defined on sketch items, while (privately) sampling from intervals defined by the sketch. We also present another algorithm based on histograms that is especially well-suited to the multiple quantiles case. We implement our algorithms and experimentally evaluate them on synthetic and real-world datasets. 1 Introduction Quantile estimation is a fundamental subroutine in data analysis and statistics. For q [0, 1], the q-quantile of a dataset of size n is the element ranked qn when the elements are sorted from smallest to largest. Computing a small number of quantiles in a massive data set can serve as a quick and effective sketch of the shape of the data. Quantile estimation also serves an essential role in robust statistics, where data is generated from some distribution but is contaminated by a non-negligible fraction of outliers, i.e., out of distribution elements that may sometimes even be adversarial. For example, the median (50th percentile) of a dataset is used as a robust estimator of the mean in such situations where the data may be contaminated. Location parameters can also be (robustly) estimated via truncation or winsorization, an operation that relies on quantile estimation as a subroutine (Tukey, 1960; Huber, 1964). Rank-based nonparametric statistics can be used for hypothesis testing (e.g., the Kruskal-Wallis test statistic (Kruskal & Wallis, 1952)). Thus, designing quantile-based or rank-based estimators, whether distribution-dependent or distribution-agnostic, is important in many scenarios. Maintaining the privacy of individual users or data items, or even of groups, is an essential prerequisite in many modern data analysis and management systems. Differential privacy (DP) is a rigorous and now well-accepted definition of privacy for data analysis and machine learning. In particular, there is already a substantial amount of literature on differentially private quantile estimation (e.g., see (Nissim et al., 2007; Asi & Duchi, 2020; Gillenwater et al., 2021; Tzamos et al., 2020)).1 1Robust estimators are also known to be useful for accurate differentially private estimation; see, e.g., the work of Dwork and Lei (Dwork & Lei, 2009) in the context of quantile estimation for the interquartile range and for medians. Published in Transactions on Machine Learning Research (06/2023) All previous work either makes certain distributional assumptions about the input, or assumes the ability to access all input elements (thus virtually requiring a linear or worse space complexity). Such assumptions may be infeasible in many practical scenarios, where large scale databases have to quickly process streams of millions or billions of data elements without clear a priori distributional characteristics. The field of streaming algorithms aims to provide space-efficient algorithms for data analysis tasks such as these. These algorithms typically maintain good accuracy and fast running time while having space requirements that are substantially smaller than the size of the data. While distribution-agnostic quantile estimation is among the most fundamental problems in the streaming literature (Agarwal et al., 2013; Felber & Ostrovsky, 2017; Greenwald & Khanna, 2001; Hung & Ting, 2010; Karnin et al., 2016; Manku et al., 1999; Munro & Paterson, 1980; Shrivastava et al., 2004; Wang et al., 2013), no differentially private sublinear-space algorithms for the same task are currently known. Thus, the following question, essentially posed by (Smith, 2011) and (Mir et al., 2011), naturally arises: Can we design differentially private quantile estimators that use space sublinear in the stream length, have efficient running time, provide high-enough utility, and do not rely on restrictive distributional assumptions? It is well-known (Munro & Paterson, 1980) that exact computation of quantiles cannot be done with sublinear space, even where there are no privacy considerations. Thus, one must resort to approximation. Specifically, for a dataset of n elements, an α-approximate q-quantile is any element which has rank (q α)n when sorting the elements from smallest to largest, and it is known that the space complexity of α-approximating quantiles is Ω(1/α) (Munro & Paterson, 1980). In our case, the general goal is to efficiently compute α-approximate quantiles in a (pure or approximate) differentially private manner. 1.1 Our Contributions We answer the above question affirmatively by providing theoretically proven algorithms with accompanying experimental validation for quantile estimation with DP guarantees. The algorithms are suitable for private computation of either a single quantile or multiple quantiles. Concretely, the main contributions are: 1. We devise DPExp GK, a differentially private sublinear-space algorithm for quantile estimation based on the exponential mechanism. In order to achieve sublinear space complexity, our algorithm carefully instantiates the exponential mechanism with the basic blocks being intervals from the Greenwald Khanna (Greenwald & Khanna, 2001) data structure for non-private quantile estimation, rather than single elements. We prove general distribution-agnostic utility bounds on our algorithm and show that the space complexity is logarithmic in n. 2. We present DPHist GK, another differentially private mechanism for quantile estimation, which applies the Laplace mechanism to a histogram, again using intervals of the GK-sketch as the basic building block. We theoretically demonstrate that DPHist GK may be useful in cases where one has prior knowledge on the input. 3. We extend our results to the continual release setting, wherein we must maintain and output an estimate of the queried quantile in an online manner as the data is received and processed. 4. We empirically validate our results by evaluating DPExp GK, analyzing and comparing various aspects of performance on real-world and synthetic datasets. 2 Related Work 2.1 Quantile Approximation of Streams and Sketches Approximation of quantiles in large data streams (without privacy guarantees) is among the most wellinvestigated problems in the streaming literature (Wang et al., 2013; Greenwald & Khanna, 2016; Xiang et al., 2020). A classical result of Munro and Paterson from 1980 (Munro & Paterson, 1980) shows that Published in Transactions on Machine Learning Research (06/2023) computing the median exactly with p passes over a stream requires Ω(n1/p) space, thus implying the need for approximation to obtain very efficient (that is, at most polylogarithmic) space complexity. Manku, Rajagopalan and Lindsay (Manku et al., 1999) built on ideas from (Munro & Paterson, 1980) to obtain a randomized algorithm with only O((1/α) log2(nα)) for α-approximating all quantiles; a deterministic variant of their approach with the same space complexity exists as well (Agarwal et al., 2013). The best known deterministic algorithm is that of Greenwald and Khanna (GK) (Greenwald & Khanna, 2001) on which we build on in this paper, with a space complexity of O(α 1 log(αn)) to sketch all quantiles for n elements (up to rank approximation error of αn). A recent deterministic lower bound of Cormode and Veselý (Cormode & Veselý, 2020) (improving on (Hung & Ting, 2010)) shows that the GK algorithm is in fact optimal among deterministic (comparison-based) sketches. Randomization and sampling help for streaming quantiles, and the space complexity becomes independent of n; an optimal space complexity of O((1/α) log log(1/β)) was achieved by Karnin, Lang and Liberty (Karnin et al., 2016) (for failure probability β), concluding a series of work on randomized algorithms (Agarwal et al., 2013; Felber & Ostrovsky, 2017; Luo et al., 2016; Manku et al., 1999). The problem of biased or relative error quantiles, where one is interested in increased approximation accuracy for very small or very large quantiles, has also been investigated (Cormode et al.; 2006); it would be interesting to devise efficient differentially private algorithms for this problem. Recall that our approach is based on the Greenwald-Khanna deterministic all-quantiles sketch (Greenwald & Khanna, 2001). While some of the aforementioned randomized algorithms have a slightly better space complexity, differential privacy mechanisms are inherently randomized by themselves, and the analysis seems somewhat simpler and more intuitive when combined with a deterministic sketch. This, of course, does not rule out improved private algorithms based on modern efficient randomized sketches (see Section 7). 2.2 Differential Privacy Differentially private single quantile estimation: Works by Nissim, Raskhodnikova and Smith (Nissim et al., 2007) and Asi and Duchi (Asi & Duchi, 2020) improve the trade-offbetween accuracy and privacy by scaling the noise added for obfuscation in an instance-specific manner for median estimation in the absence of distributional assumptions. Another work by Dwork and Lei (Dwork & Lei, 2009) uses a propose-testrelease" paradigm to take advantage of local sensitivity; however, as observed in Gillenwater et al. (2021), in practice the error incurred by this method is relatively large as compared to other works like Tzamos et al. (2020). The work of Tzamos et al. (2020) achieves the optimal trade-offbetween privacy and utility in the distributional setting, but again as observed by Gillenwater et al. (2021), with a time complexity of O(n4), this method does not scale well to large data sets. The work of Gillenwater et al. (2021) shows how to optimize the division of the privacy budget to estimate m quantiles in a time-efficient manner. For estimation of m quantiles, their time and space complexity are O(mn log(n) + m2n) and O(m2n), respectively. They do an extensive experimental analysis and find lower error compared to previous work. However, although they provide intuition for why their method should incur relatively low error, they do not achieve formal theoretical accuracy guarantees. Kaplan et al. (2022) improve upon the work of Gillenwater et al. (2021) for the multiple approximate quantiles problem by using a tree-based (recursive) approach to CDF estimation. However, none of these works deal with quantile estimation in the sublinear space setting. Böhler & Kerschbaum (2020) solve the problem of estimating the joint median of two private data sets with time complexity sub-linear in the size of the data-universe and provide privacy guarantees for small data sets as well as limited group privacy guarantees unconditionally against polynomially time-bounded adversaries. Inherent privacy: Another line of work (Blocki et al., 2012; Smith et al., 2020; Choi et al., 2020) demonstrates that sketching algorithms for streaming problems might have inherent privacy guarantees under minimal assumptions on the dataset in some cases. For such algorithms, relatively little noise needs to be added to preserve privacy unconditionally. Published in Transactions on Machine Learning Research (06/2023) 3 Preliminaries and Notation We now give standard differential privacy notation and formally describe the quantile estimation problem. We also present the Greenwald-Khanna sketch guarantees in a suitable form. 3.1 Differential Privacy Definition 3.1 (Differential Privacy (Dwork et al., 2006)). Let Q : X n R be a (randomized) mechanism. For any ϵ 0, δ [0, 1], Q satisfies (ϵ, δ)-differential privacy if for any neighboring databases (that differ in one row) x x X n and any set S R, P[Q(x) S] eϵP[Q(x ) S] + δ. The probability is taken over the coin tosses of Q. We say that Q satisfies pure differential privacy (ϵ-DP) if δ = 0 and approximate differential privacy ((ϵ, δ)-DP) if δ > 0. We can set ϵ to be a small constant (e.g., between 0.01 and 2) but will require that δ n ω(1) be cryptographically small. 3.2 Quantile Approximation 2 3 4 5 Totally ordered domain X Data set S = {1, 2, 2, 3, 5, 2, 6, 5} Figure 1: In the data set S of 8 elements, the true 0.5 quantile is the value 2, and the values 2, 3, 4 and 5 are all acceptable 0.25-approximate answers. Note that although 1 is adjacent to 2 in the data universe X, it is not an acceptable output. In this subsection we make some definitions to formalize our analysis. Definition 3.2. (Quantiles) Let there be a totally ordered data universe X and an input data stream X = ((x1, 1), . . . , (xn, n)) (sometimes implicitly referred to as X = (x1, . . . , xn)). For (xi, i) X, let val((xi, i)) = xi and ix((xi, i)) = P j i |{(xj, j) : xj < xi or xj = xi, j < i}|. We abuse notation to say that for v1, v2 X, v1 v2 if ix(v1) ix(v2). Then, the q-quantile of X is val(v) for v X with ix(v) = qn . We observe that in this setting a value from the data universe can occur multiple times in a set. To account for this we define a range of ranks that any value can hold; this will be useful when reasoning about quantile approximation. Definition 3.3. (Rank and approximate quantiles) For x X, we define rmin(x) = |{v X : val(v) < x}|, rmax(x) = |{v X : val(v) x}| and rank(X, x) to be the interval [rmin(x), rmax(x)]. We say that x X is an α-approximate q-quantile for X if rank(X, x) [ qn αn, qn + αn] = . Consider the following example to see how these definitions play out in practice. Example 3.4. Given the data set {1, 2, 2, 3, 5, 2, 6, 5} (refer to figure 1), the 0.5 quantile is 2, which we distinguish from the median, which would be the average of the elements ranked 4 and 5, i.e, 2.5 for this data set. A straightforward way to obtain quantiles is to sort the dataset and the pick the element at the q n position. This method only works in the offline (non-streaming) setting. For α = 0.25, the α-approximate 0.5 quantiles are 2, 3, 4 and 5. Note that 4, which does not occur in the data set, is still a valid response, but 1, which occurs in the data set and is even adjacent to the true 0.5 quantile 2 in the data universe X, is not a valid response. We can now formalize the two versions of the problems that are dealt with in the literature on streaming quantile approximation. Let D be any distribution with random variable X D and CDF FX : R [0, 1]. For any q [0, 1], Qq D denotes the value x such that FX(x) = q. Published in Transactions on Machine Learning Research (06/2023) Definition 3.5 (Single Quantile). Given sample S = (x1, . . . , xn) in a streaming fashion in arbitrary order, construct a data structure for computing the quantile Qq D such that for any q [0, 1], with probability at least 1 β, |Qq D Qq S| α. Definition 3.6 (All Quantiles). Given sample S = (x1, . . . , xn) in a streaming fashion in arbitrary order, construct a data structure for computing the quantile Qq D such that with probability at least 1 β, for all values of q M where M [0, 1], |Qq D Qq S| α. 3.3 Non-private Quantile Streaming In this subsection, we provide formal guarantees needed for any sketch that our algorithms can build upon: Lemma 3.7. Given a data stream x1, . . . , xn of elements drawn from X, there exists a sketching algorithm that outputs a list S(X) of s = O 1 α log αn many tuples (vi, gi, i) (X N) N N for i = 1, . . . , s such that if X is the data multi-set then: 1. rank(X, val(vi)) [P j i gj, i + P 2. gi + i 2αn. 3. The first tuple is (min{x X}, 1, 0) and the last tuple is (max{x X}, 1, 0). 4. The vi are sorted in ascending order. Without loss of generality, the lower interval bounds P j i gj and upper interval bounds i + P j i gj are also sorted in increasing order. From conditions 1 and 2 above, it follows that the GK sketch can be used to compute α-approximate qquantiles by outputting the value vi for the unique index i such that P j i gj qn i + P j i gj. More generally, any sketch which achieves the properties outlined in Lemma 3.7 can be used in place of GK. 4 Differentially Private Algorithms In this section we present our two DP mechanisms for quantile estimation. Throughout, we assume that α is a user-defined approximation parameter. The goal is to obtain α-approximate q-quantiles. The new algorithms we introduce are: (1) DPExp GK (Algorithm 1): An exponential mechanism based (ϵ, 0)-DP algorithm for computing a single q-quantile. To solve the all-quantiles problem with approximation factor α, one can run this algorithm iteratively with target quantile 0, α, 2α, . . . . Doing so requires scaling the privacy parameter in each call by an additional α factor which increases the space complexity by a factor of 1/α. We also give an optimized implementation (algorithm 8) of this algorithm in the appendix. We extend our results to the continual observation setting (Dwork et al., 2010; Chan et al., 2011). (2) DPHist GK (Algorithm 2): A histogram based (ϵ, 0)-DP algorithm (Algorithm 2) for the α-approximate all-quantiles problem. The privacy guarantee of this algorithm is unconditional, but there is no universal theoretical utility bound as in the previous algorithm. However, in some cases the utility is provably better: for example, we show that if the data set is drawn from a normal distribution (with unknown mean and variance), we can avoid the quadractic 1/α2 factor in the sample complexity that we incur when using DPExp GKGumb for the same all-quantiles task. The rest of this section contains the descriptions of the two algorithms; we relegate all proofs to the appendix. 4.1 DPExp GKGumb: Exponential Mechanism Based Approach We first establish how the exponential mechanism and the GK sketch may be used in conjunction to solve the single quantile problem. Concretely, the high level idea is to call the privacy preserving exponential mechanism with a score function derived from the GK sketch. The exponential mechanism is a fundamental privacy primitive which when given a public set of choices and a private score for each choice outputs a choice Published in Transactions on Machine Learning Research (06/2023) Algorithm 1: DPExp GK: Exponential Mechanism DP Quantiles : High Level Description Data: X = (x1, x2, . . . , xn) Input: ϵ, α (approximation parameter), q [0, 1] (quantile parameters), δu (sensitivity) 1 S(X) = {(vi, gi, i) : i [s]} GK(X, α) 2 Define score function u(S(X), x) = min{|y qn | : y [ˆrmin(x), ˆrmax(x)]}. (1) ˆrmin(x) = max{ X j i gj : val(vi) < x} ˆrmax(x) = min{ i + X j i gj : val(vi) > x} 3 Execute the exponential mechanism with score function u(S(X), ), i.e. choose and output a single e X with probability 2δu u(S(X), e) . that with high probability has a score close to optimal whilst preserving privacy. In the course of constructing our algorithms, we have to resolve two problems; one, how to usefully construct a score function to pass to the exponential mechanism so that the private value derived is a good approximation to the q-quantile, and two, how to execute the exponential mechanism efficiently on the (possibly massive) data universe X. To resolve the first issue we devise the (not necessarily efficient) routine Algorithm 1; and to resolve the second one, we run an essentially equivalent but far more efficient routine, Algorithm 8. Constructing a score function: We recall that the GK sketch returns a short sequence of elements from the data set with a deterministic interval for their ranks and the promise that for any target quantile q [0, 1], there is some sketch element that lies within αn units in rank of qn . One technicality that we run into when trying to construct a score function on the data universe X is that when a single value occurs with very high frequency in the data set, the ranks of the set of occurrences can span a large interval in [0, n], and there is no one rank we can ascribe to it so as to compare it with the target rank qn . This can be resolved by defining the score for any data domain value in terms of the distance of its respective rank interval [rmin(x), rmax(x)] (formalized in Definition B.1) from qn ; elements whose intervals lie closer to the target have a higher score than further away ones. Efficiently executing the exponential mechanism: The exponential mechanism samples one of the public choices (in our case some element from the data universe X) with probability that increases with the quality of the choice according to the score function. In general every element can have a possibly different score and the efficiency of the exponential mechanism can vary widely depending on the context. In our setting, the succinctness of the GK sketch leads to a crucial observation: by defining the score function via the sketch, the data domain is partitioned into a relatively small number of sets such that the score function is constant on each partition. Concretely, for any two successive elements in the GK sketch, the range of values in the data universe that lie between them will have the same score according to our score function. We can hence first sample a partition from which to output a value, and then choose a value from within that interval uniformly at random. To make our implementation even more efficient and easy to use, we also make use of the Gumbel-max trick that allows us to iterate through the set of choices instead of storing them in memory - see appendix B.1 and the expanded pseudocode in algorithm 8 for more detail. On formalizing this outline we get the following formal guarantees for algorithm 1 (please see appendix B.1 for a complete proof). d( , ) denotes the ℓ1 metric on R. Published in Transactions on Machine Learning Research (06/2023) Theorem 4.1. Algorithm 1 is ϵ-differentially private. Let ˆx be the value returned by Algorithm 1 when initialized with target quantile q. The following statements hold: 1. Algorithm 1 can be implemented to run with space complexity O((1/α) log αn), such that with probability 1 β d( qn , [ˆrmin(ˆx), ˆrmax(ˆx)]) 2αn + 2(4αn + 2) log(|X|/β) 2. For n > 24 log |X|/β α min{ϵ,1} , Algorithm 1 can be implemented to run with space complexity O (αϵ) 1 log(|X|/β) log n such that with probability 1 β, ˆx is an α approximate q-quantile. 4.2 DPHist GK: Histogram Based Approach For methods in this section, we assume that we have K 1 disjoint bins each of width w (e.g., w = α/2) partitioning the data universe. These bins are used to construct a histogram. Essentially, Algorithm 2 builds an empirical histogram based on the GK sketch, adds noise so that the bin values satisfy (ϵ, 0)-DP, and converts this empirical histogram to an approximate empirical CDF, from which the quantiles can be approximately calculated. We demonstrate one use case of DPHist GK where the space complexity required improves upon the worst-case bound for DPExp GK, Theorem 4.1. While the histogram based mechanism does not have universal utility bounds in the spirit of the above theorem, the results in this section serve as one simple example where it may yield desirable accuracy while using less space. Algorithm 2: DPHist GK: Computing DP Quantiles in Bounded Space Data: X = (x1, x2, . . . , xn) Input: ϵ, α (approximation parameter), q [0, 1] (quantile parameters), w 1 Build summary sketch S(X) where 2 S(X) = {(vi, gi, i) : i [s]} GK(X, α) /* cell labels ai and counts ci = 0 */ 3 Initialize data-agnostic (empty) histogram Hist = (ai, ci), . . . with cell widths w 4 for (vi, gi, i) S(X) do 5 Insert gi counts of vi into histogram Hist 7 H = [ ]/* initialize empty list */ 8 for (ai, ci) Hist do 9 ci = max(0, ci + Lap(0, 2/ϵ)) 10 Append (ai, c + ci) to H 11 c = c + ci 13 for (b, rank) H do 14 if r < rank then 15 return b /* return last element of H */ 16 return H[|H| 1] Suppose that we are given an i.i.d. sample S = (X1, X2, . . . , Xn) such that for all i [n], Xi N(µ, σ2Id d), µ Rd. The goal is to estimate DP quantiles of the distribution N(µ, σ2Id d) without knowledge of µ or σ2. We will show how to estimate the quantiles assuming that σ2 is known. Note that it is easy to generalize the work to the case where σ2 is unknown as follows: For any sample S drawn from i.i.d. from N(µ, σ2Id d), the 1 β confidence interval is X σ n z1 β/2, Published in Transactions on Machine Learning Research (06/2023) where z1 β/2 is the 1 β/2 quantile of the standard normal distribution and X is the empirical mean. The length of this interval is fixed and equal to 2σz1 β/2 n = Θ σ n q β . In the case where σ2 is unknown, the confidence interval becomes X s n tn 1,1 β/2, where s2 = 1 n 1 Pn i=1(Xi X)2 is the sample variance (sample estimate of σ2) and tn 1,1 β/2 is the 1 β/2 quantile of the t-distribution with n 1 degrees of freedom. The length of the interval can be shown to be 2σ n kn tn 1,1 β/2 = Θ σ n where kn = 1 O(1/n) is an appropriately chosen constant (see (Lehmann & Romano, 2005; Karwa & Vadhan, 2018; Keener, 2010) for more details and discussion). We can hence assume that σ2 is known and proceed to show sample and space complexity bounds. 2 For any q (0, 1), we denote the q-quantile of the sample S = (X1, X2, . . . , Xn) as Qq S and the q-quantile of the distribution as Qq D. With this notation, for some β (0, 1] and α > 0, we wish to obtain a DP q-quantile Qq S such that P[ Qq D Qq S α] β. We shall proceed in a three-step approach: (1) Estimate a DP range of the population in sub-linear space; (2) Use this range of the population to construct a DP histogram using the stream S; (3) Use the DP histogram to estimate one or more quantiles via the sub-linear data structure of Greenwald and Khanna. Our main formal result for algorithm 2 can be summarized as follows (please see appendix B.2 for a complete proof). Theorem 4.2. For the one-dimensional normal distribution N(µ, σ2), let S = (X1, X2, . . . , Xn) be a data stream through which we wish to obtain Qq S, a DP estimate of the q-quantile of the distribution. For any q (0, 1), there exists an (ϵ, δ)-DP algorithm such that, with probability at least 1 β, we obtain |Qq D Qq S| α for any α > 0, β (0, 1], ϵ, δ (0, 1/n) and for stream length n max {min {A, B} , C} , where as long as µ ( R, R) and using space of O(max{ R α log αn}). In the case where d = 1, by Theorem 4.2, there exists an (ϵ, δ)-DP algorithm Qq S such that if µ ( R, R) then using space of O(max{ R α log αn}) (with probability 1) as long as the stream length n is at least Ω max min O R we get the guarantee that, for all β (0, 1], α > 0, P[ Qq D Qq S α] β. Intuitively, this means that: (1) Space: We need less space to estimate any quantile with DP guarantees if the distribution is less concentrated (i.e., σ can be large) or if we do not require a high degree of accuracy for our queries (i.e., α can be large). (2) Stream Length: We need a large stream length to estimate quantiles if we require a high degree of accuracy (i.e., smaller β, α), or do not have a good public estimate of µ (large R), or have small privacy parameters (small ϵ, δ), or have concentrated datasets (small σ). 5 Continual Observation We now describe how our one-shot approach can be used as a black box to obtain a continual observation solution (Dwork et al., 2010; Chan et al., 2011). In the absence of privacy, we recall that an α/2-approximate solution for the q-quantile problem on a stream of length n allows for an additive error of αn/2 in the rank of any candidate q-quantile solution. It follows that if we append any arbitrary αn/2-many elements to the end 2One could also estimate the variance in a DP way and then prove the complexity bounds. Published in Transactions on Machine Learning Research (06/2023) Algorithm 3: Continual Observation DP Quantiles Data: Input stream X = (x1, . . . , xn) for some n > nmin = Ω 1 α2ϵ log n log |X| log n αβ , privacy parameter ϵ, approximation parameter α, target quantile q 1 , failure probability β Let s = 0 2 ϵ O(αϵ/ log n) 5 cp(s) nmin /* stores stream checkpoints */ 6 Instantiate GK GK(α ) 7 vs /* holds α-approximate q-quantile */ 8 for stream element xs X do 10 GK.insert(xs) 11 if s = cp(s 1) then 12 vs DPExp GK(GK, ϵ , α , q, β ) 13 cp(s) cp(s 1)(1 + α/2) 15 vs vs 1 16 cp(s) cp(s 1) of the stream, these new elements can lead to an additional additive error of at most αn/2. This suggests that on releasing a DP quantile estimate at any point in the stream, we can avoid updating our q-quantile estimate for a proportionate number of additional elements whilst maintaining a (1+α)-approximation. This is essentially the idea behind the flip number used in the study of adversarially robust streaming algorithms. At a high level, the flip number is a measure of the number of times the output of an online algorithm changes by more than a (1 α) multiplicative factor (see (Ben-Eliezer et al., 2020) for more detail). Building on this idea, we can show that we only need update our estimate after every additional α/2-fraction as many elements are added as were present at the previous estimate. At the beginning of this process, i.e. when no stream elements have been processed, for the first few elements (nmin-many, as described in the pseudo-code) we will need to update our online estimate every time or omit producing any estimate. However, after this short warm-up prefix of the stream, a relatively small number of estimates serve as (1 + α)-approximate q quantiles for every point in the remainder of the stream. Accounting for the additional requirement of privacy is now easy - we merely need to privatize the estimate at each of the checkpoints, which are the points in the stream where the quantile estimate must be updated. For n elements in all there are O log1+α n nmin many such elements, which for a suitable choice of nmin simplifies essentially to O log n ϵα . In other words we see that the privacy loss scales only with the logarithm of the length of the stream because of the relatively few checkpoints that occur. The formal statement is as below, and we present a complete proof in appendix B.3. Theorem 5.1. Let ϵ, α > 0, n Z. For any β (0, 1], with probability 1 β, Algorithm 3 maintains an α-approximate q-quantile at every point s in the data stream for s = Ω log n log |X|/β α2ϵ . Furthermore, Algorithm 3 satisfies ϵ-DP and has space complexity Ω 1 α2ϵ log2 n log |X| log n We conclude by mentioning that our continual observation solution incurs roughly a O(log n/α) overhead in the space complexity, which is in line with classical works in differential privacy and adversarially robust streaming (Ben-Eliezer et al., 2020; Dwork et al., 2010). Concurrently, Stemmer and Kaplan (Kaplan & Stemmer, 2021) developed a notion of streaming sanitizers which yields a continual observation guarantee for free , without incurring such an overhead over the one-shot case. Published in Transactions on Machine Learning Research (06/2023) DPExp GKGumb (a) Error vs. stream length for Uniform distribution. DPExp GKGumb (b) Data structure sizes vs. stream length for Uniform distribution. DPExp GKGumb (c) Error vs. stream length for Normal distribution. DPExp GKGumb (d) Data structure sizes vs. stream length for Normal distribution. Figure 2: DPExp GKGumb versus DPExp Full for uniform or normally distributed data, α = 10 4, ϵ = 1, q = 0.5 6 Experimental Evaluation In this section, we experimentally evaluate our sublinear-space exponential-based mechanism, DPExp GKGumb. We study how well our algorithm performs in terms of accuracy and space usage. This section validates the theoretical results in the prequel - we find that the space complexity of the algorithm is indeed very small in practice, and that the accuracy is typically closely tied to the approximation parameter α. Our main baselines will be DPExp Full, which applies the exponential mechanism on the full data set without any sketching algorithms (see appendix C.1 for further implementation details), and the true quantile value. The true quantile values are used to compute relative error, the absolute value of the difference between the estimated quantile and the true quantile, divided by the standard deviation of the data set. We graph the mean relative error over 100 trials of the exponential mechanism per experiment, as well as the 80% confidence interval computed by taking the 10th and 90th percentile. We fix the target quantile to be estimated to q = 0.5 (the median) throughout as the relative error and the space usage generally do not seem to vary much with choice of q. 6.1 Synthetic data sets In this subsection, we compare our methods on synthetically generated datasets. We vary the parameters of the Stream Length n (the size of the input stream x1, . . . , xn), the Approximation Parameter α (the approximation factor used by the internal GK sketch), and the Data Distribution. We generate data from uniform and Gaussian distributions; we use a uniform distribution in range [0, 1] (i.e., U(0, 1)) or a normal distribution with mean 0 and variance 1 (i.e., N(0, 1)), clipped to the interval [ 10, 10].3. We show results on space usage and relative error (both plotted mostly on logarithmic scales) from non-DP estimates, as we vary the parameters listed above. In Figure 2, we vary the stream length n for an approximation factor of α = 10 4. The streams are either normally or uniformly distributed. In Figures 2a and 2b, we compare DPExp GKGumb (space strongly sublinear in n) vs. DPExp Full (uses space of O(n)) in terms of space usage and accuracy. In general we find that although our method DPExp GKGumb incurs higher error, in absolute terms it remains quite small and the 95% confidence intervals tend to be adjacent for DPExp GKGumb and DPExp Full. However, there is a clear trend of an exponential gap developing between their respective space usages which is a natural consequence of the space complexity guarantee of the GK sketch. This holds for both distributions studied. 3Clipping is required for the exponential mechanism, as it must operate on some bounded interval of values. In any case, we never expect to see samples from N(0, 1) that lie outside [ 10, 10] for any practical purpose; the probability for any given sample to satisfy this is minuscule, at about 10 21. Published in Transactions on Machine Learning Research (06/2023) 10 6 10 4 10 2 DPExp GKGumb (a) Uniform distribution: Error vs. stream length 101 103 105 DPExp GKGumb (b) Data structure sizes vs. stream length 10 5 10 3 10 1 DPExp GKGumb (c) Normal distribution: Error vs stream length 101 103 105 DPExp GKGumb (d) Data structure sizes vs. stream length Figure 3: DPExp GKGumb versus DPExp Full for uniform or normally distributed data, α = 10 1, ϵ = 1, q = 0.5 2 10 4 4 10 4 (a) Relative error vs. datastream size for α = 10 4, ϵ = 1, q = 0.5 (b) Relative error vs. datastream size for α = 10 1, ϵ = 1, q = 0.5 0.1 0.5 1 5 0 0.0015 Uniform (c) Approx. factor α = 10 4 0.1 0.5 1 5 0 0.1 (d) Approx. factor α = 10 1 Figure 4: Relative error versus approximation factor. In Figure 3, we vary the stream length for a relatively large approximation factor of 0.1. Here we see that compared to the non-approximate method we incur far higher error, although there is also a concomitant increase in the space savings. This is not a typical use-case since the non-private error can itself be large, but we get a complete picture of how space usage and performance vary with this user-defined parameter. In Figures 4c and 4d, we vary the approximation factor. In the small approximation setting we see the inverse tendency of accuracy with privacy which is characteristic of most DP algorithms. However, in the large approximation setting, there is no such clear drop in performance with more privacy. This motivates the question of determining the true interplay between the approximation factor α and the private parameter ϵ, as discussed further in Section 7. Our algorithm performs well in practical settings where one wishes to estimate some quantity across all data items privately and using small space. For example, our results indicate that choosing an approximation factor of α = 10 4 induces an error which is also of order about 10 4 for privately computing parameters chosen according to a uniform or normal distribution, all while saving orders of magnitude in the space complexity. 6.2 Real-World Datasets We repeat our investigation of the utility and space complexity comparison between DPExp GKGumb and DPExp Full with the following real-world data sets (Dua & Graff, 2017): (1) Taxi Service Trajectory: A dataset from the UCI machine learning repository describing trajectories performed by all 442 taxis (at 4The 80% CI for error incurred by DPExp Full was entirely supported on the point 0 and drops offaxis as we use the log scale. Published in Transactions on Machine Learning Research (06/2023) 0.1 0.5 1 5 10 4 10 2 100 1E-2 1E-5 DPExp Full (a) Taxi data set: Error versus privacy parameter ϵ for DPExp GKGumb for approximation α = 10 2 and 10 5, and DPExp Full. exact 1E-2 1E-5 103 (b) Taxi data set: Space used by DPExp GKGumb for α = 10 2 and 10 5, and DPExp Full (abbreviated exact for brevity). 0.1 0.5 1 5 10 4 10 2 1E-2 1E-5 DPExp Full (c) Gas sensor data set: Error versus privacy parameter ϵ for DPExp GKGumb for approximation α = 10 2 and 10 5, and DPExp Full4. exact 1E-2 1E-5 (d) Gas sensor data set: Space used by DPExp GKGumb for α = 10 2 and 10 5, and DPExp Full (abbreviated exact for brevity). Figure 5: Taxi and Gas sensor data sets. the time) in the city of Porto in Portugal (Moreira-Matias et al., 2013). This dataset contains real-valued attributes with about 1.5 million instances. (2) Gas Sensor Dataset: A UCI repository dataset containing recordings of 16 chemical sensors exposed to varying concentrations of two gas mixtures (Fonollosa et al., 2015). The sensor measurements are acquired continuously during a 12-hour time range and contains about 4 millions instances. We pick a real-valued attribute from each dataset (the TIMESTAMP and the first ETHYLENE_CO gas sensor value, respectively) and calculate the median on these datasets. The results are reported in Figures 5a and 5b, and Figures 5c and 5d respectively. We see that the larger the approximation factor, the larger the space savings are with DPExp GK. Comparing DPExp Full to DPExp GK, we see space savings of 2 times up to 1000 times as we vary the approximation factor. These results are consistent with our expectations that the space savings are inversely proportional to the allowed approximation factor. 7 Conclusion & Future Work In this work, we presented sublinear-space and differentially private algorithms for approximately estimating quantiles in a dataset. Our solutions are two-part: one based on the exponential mechanism and efficiently implemented via the use of the Gumbel distribution; the other based on constructing histograms. Our algorithms are supplemented with theoretical utility guarantees. Furthermore, we experimentally validate our methods on both synthetic and real-world datasets. Our work leaves room for further exploration in various directions. Interplay between α and ϵ: The space complexity bounds we obtain are (up to lower order terms) inversely linear in α and in ϵ. While it is either known or easy to show that such linear dependence in each of these parameters in itself is necessary, it is not clear whether the α 1ϵ 1 term in Theorem 4.1 can be replaced with, say, α 1 + ϵ 1. Such an improvement, if possible, seems to require substantially modifying the baseline Greenwald-Khanna sketch or adding randomness. Alternative streaming baselines: We base our mechanisms upon the GK-sketch, which is known to be space-optimal among deterministic streaming algorithms for quantile approximation. The use of a deterministic baseline simplifies the analysis and the overall solution, but better randomized streaming algorithms for the same problem are known to exist. What would be the benefit of working, e.g., with the (optimal among randomized algorithms) KLL-sketch (Karnin et al., 2016)? Dependence in universe size: The dependence of our space complexity bounds in the size of the universe, X, is logarithmic. Recent work of Kaplan et al. (Kaplan et al., 2020) (see also (Bun et al., 2015)) on Published in Transactions on Machine Learning Research (06/2023) the sample (not space) complexity of privately learning thresholds in one dimension, a fundamental problem at the intersection of learning theory and privacy, demonstrate a bound polynomial in log |X| on the sample complexity. As quantile estimation and threshold learning are closely related problems, this raises the question of whether techniques developed in the aforementioned papers can improve the dependence on |X| in our bounds. Random order: The results presented here (except for those about normally distributed data) all assume that the data stream is presented in worst case order, an assumption that may be too strong for some scenarios. Can improved bounds be proved when the data elements are chosen in advance but their order is chosen randomly? This can serve as a middle ground between the most general case (which we address in this paper) and the case where data is assumed to be generated according to a certain distribution. 8 Acknowledgements D.A. was supported by a Junior Fellowship from the Simons Foundation Society of Fellows, Cooperative Agreement CB20ADR0160001 with the U.S. Census Bureau, and a Fellowship from Meta AI. Most of this work was done while he was a Ph.D. student at Harvard University. A.C. was supported in part by NSF CAREER grant 1750716. Jacob Abernethy, Chansoo Lee, and Ambuj Tewari. Perturbation Techniques in Online Learning and Optimization, pp. 233 264. 2017. 23 Pankaj K. Agarwal, Graham Cormode, Zengfeng Huang, JeffM. Phillips, Zhewei Wei, and Ke Yi. Mergeable summaries. ACM Trans. Database Syst., 38(4):26:1 26:28, 2013. 2, 3 Hilal Asi and John C. Duchi. Near instance-optimality in differential privacy. Co RR, abs/2005.10630, 2020. Victor Balcer and Salil P. Vadhan. Differential privacy on finite computers. In 9th Innovations in Theoretical Computer Science Conference, ITCS 2018, January 11-14, 2018, Cambridge, MA, USA, volume 94 of LIPIcs, pp. 43:1 43:21. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2018. 23 Omri Ben-Eliezer, Rajesh Jayaram, David P. Woodruff, and Eylon Yogev. A framework for adversarially robust streaming algorithms. In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2020, Portland, OR, USA, June 14-19, 2020, pp. 63 80, 2020. 9 Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. The Johnson-Lindenstrauss transform itself preserves differential privacy. In 53rd Annual IEEE Symposium on Foundations of Computer Science, FOCS 2012, New Brunswick, NJ, USA, October 20-23, 2012, pp. 410 419. IEEE Computer Society, 2012. 3 Jonas Böhler and Florian Kerschbaum. Secure sublinear time differentially private median computation. In 27th Annual Network and Distributed System Security Symposium, NDSS 2020, San Diego, California, USA, February 23-26, 2020, 2020. 3 Mark Bun, Kobbi Nissim, Uri Stemmer, and Salil P. Vadhan. Differentially private release and learning of threshold functions. In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, pp. 634 649, 2015. 12, 25 T.-H. Hubert Chan, Elaine Shi, and Dawn Song. Private and continual release of statistics. ACM Trans. Inf. Syst. Secur., 14(3):26:1 26:24, 2011. 5, 8 Seung Geol Choi, Dana Dachman-Soled, Mukul Kulkarni, and Arkady Yerukhimovich. Differentially-private multi-party sketching for large-scale statistics. Proc. Priv. Enhancing Technol., 2020(3):153 174, 2020. 3 Published in Transactions on Machine Learning Research (06/2023) Graham Cormode and Pavel Veselý. A tight lower bound for comparison-based quantile summaries. In Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 20, pp. 81 93. Association for Computing Machinery, 2020. 3 Graham Cormode, Zohar S. Karnin, Edo Liberty, Justin Thaler, and Pavel Veselý. Relative error streaming quantiles. In Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 21, pp. 96 108. Association for Computing Machinery. 3 Graham Cormode, Flip Korn, S. Muthukrishnan, and Divesh Srivastava. Spaceand time-efficient deterministic algorithms for biased quantiles over data streams. In Proceedings of the Twenty-Fifth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 26-28, 2006, Chicago, Illinois, USA, pp. 263 272, 2006. 3 Dheeru Dua and Casey Graff. UCI machine learning repository, 2017. 11 A. Dvoretzky, J. Kiefer, and J. Wolfowitz. Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Math. Statist., 27(3):642 669, 09 1956. doi: 10.1214/ aoms/1177728174. 24 Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. In Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pp. 371 380, 2009. 1, 3 Cynthia Dwork, Frank Mc Sherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, New York, NY, USA, March 4-7, 2006, Proceedings, pp. 265 284, 2006. 4, 26 Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. Differential privacy under continual observation. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC 2010, Cambridge, Massachusetts, USA, 5-8 June 2010, pp. 715 724. ACM, 2010. 5, 8, 9 David Felber and Rafail Ostrovsky. A randomized online quantile summary in o((1/ϵ) log(1/ϵ)) words. Theory Comput., 13(1):1 17, 2017. 2, 3 Jordi Fonollosa, Sadique Sheik, Ramón Huerta, and Santiago Marco. Reservoir computing compensates slow response of chemosensor arrays exposed to fast varying gas concentrations in continuous monitoring. Sensors and Actuators B: Chemical, 215:618 629, 2015. ISSN 0925-4005. 12 Jennifer Gillenwater, Matthew Joseph, and Alex Kulesza. Differentially private quantiles. In Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pp. 3713 3722. PMLR, 2021. 1, 3 Michael Greenwald and Sanjeev Khanna. Space-efficient online computation of quantile summaries. In Proceedings of the 2001 ACM SIGMOD international conference on Management of data, Santa Barbara, CA, USA, May 21-24, 2001, pp. 58 66, 2001. 2, 3, 17, 19 Michael Greenwald and Sanjeev Khanna. Power-conserving computation of order-statistics over sensor networks. In Proceedings of the Twenty-third ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, June 14-16, 2004, Paris, France, pp. 275 285, 2004. 25 Michael B. Greenwald and Sanjeev Khanna. Quantiles and equi-depth histograms over streams. In Data Stream Management - Processing High-Speed Data Streams, pp. 45 86. 2016. 2 Peter J. Huber. Robust Estimation of a Location Parameter. The Annals of Mathematical Statistics, 35(1): 73 101, 1964. doi: 10.1214/aoms/1177703732. 1 Regant Y. S. Hung and Hing-Fung Ting. An ω(1/ϵ log(1/ϵ)) space lower bound for finding epsilonapproximate quantiles in a data stream. In Frontiers in Algorithmics, 4th International Workshop, FAW 2010, Wuhan, China, August 11-13, 2010. Proceedings, volume 6213 of Lecture Notes in Computer Science, pp. 89 100. Springer, 2010. 2, 3 Published in Transactions on Machine Learning Research (06/2023) Haim Kaplan and Uri Stemmer. A note on sanitizing streams with differential privacy, 2021. 9, 24 Haim Kaplan, Katrina Ligett, Yishay Mansour, Moni Naor, and Uri Stemmer. Privately learning thresholds: Closing the exponential gap. In Conference on Learning Theory, COLT 2020, 9-12 July 2020, Virtual Event [Graz, Austria], volume 125 of Proceedings of Machine Learning Research, pp. 2263 2285. PMLR, 2020. 12 Haim Kaplan, Shachar Schnapp, and Uri Stemmer. Differentially private approximate quantiles. In Proceedings of the 39th International Conference on Machine Learning, Proceedings of Machine Learning Research, pp. 10751 10761. PMLR, 2022. 3 Zohar S. Karnin, Kevin J. Lang, and Edo Liberty. Optimal quantile approximation in streams. In IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9-11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pp. 71 78, 2016. 2, 3, 12 Vishesh Karwa and Salil Vadhan. Finite Sample Differentially Private Confidence Intervals. In Anna R. Karlin (ed.), 9th Innovations in Theoretical Computer Science Conference (ITCS 2018), volume 94 of Leibniz International Proceedings in Informatics (LIPIcs), pp. 44:1 44:9, 2018. 8, 25 R.W. Keener. Theoretical Statistics: Topics for a Core Course. Springer Texts in Statistics. Springer New York, 2010. 8 William H. Kruskal and W. Allen Wallis. Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260):583 621, 1952. doi: 10.1080/01621459.1952.10483441. 1 Erich L. Lehmann and Joseph P. Romano. Testing Statistical Hypotheses. Springer Texts in Statistics. Springer New York, New York, NY, 3. edition, 2005. ISBN 9780387276052. 8 Ge Luo, Lu Wang, Ke Yi, and Graham Cormode. Quantiles over data streams: experimental comparisons, new analyses, and further improvements. VLDB J., 25(4):449 472, 2016. 3 Gurmeet Singh Manku, Sridhar Rajagopalan, and Bruce G. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets. In SIGMOD 1999, Proceedings ACM SIGMOD International Conference on Management of Data, June 1-3, 1999, Philadelphia, Pennsylvania, USA, pp. 251 262, 1999. 2, 3 Frank Mc Sherry and Kunal Talwar. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007), October 20-23, 2007, Providence, RI, USA, Proceedings, pp. 94 103, 2007. 20, 21 Darakhshan J. Mir, S. Muthukrishnan, Aleksandar Nikolov, and Rebecca N. Wright. Pan-private algorithms via statistics on sketches. In Proceedings of the 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2011, June 12-16, 2011, Athens, Greece, pp. 37 48, 2011. 2 Ilya Mironov. On significance of the least significant bits for differential privacy. In the ACM Conference on Computer and Communications Security, CCS 12, Raleigh, NC, USA, October 16-18, 2012, pp. 650 661, 2012. 23 Luís Moreira-Matias, João Gama, Michel Ferreira, João Mendes-Moreira, and Luís Damas. Predicting taxipassenger demand using streaming data. IEEE Trans. Intell. Transp. Syst., 14(3):1393 1402, 2013. 12 J.I. Munro and M.S. Paterson. Selection and sorting with limited storage. Theoretical Computer Science, 12(3):315 323, 1980. ISSN 0304-3975. doi: https://doi.org/10.1016/0304-3975(80)90061-4. 2, 3 Kobbi Nissim, Sofya Raskhodnikova, and Adam D. Smith. Smooth sensitivity and sampling in private data analysis. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing, San Diego, California, USA, June 11-13, 2007, pp. 75 84, 2007. 1, 3 Published in Transactions on Machine Learning Research (06/2023) Nisheeth Shrivastava, Chiranjeeb Buragohain, Divyakant Agrawal, and Subhash Suri. Medians and beyond: New aggregation techniques for sensor networks. In Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems, Sen Sys 04, pp. 239 249. Association for Computing Machinery, 2004. ISBN 1581138792. 2 Adam D. Smith. Privacy-preserving statistical estimation with optimal convergence rates. In Proceedings of the 43rd ACM Symposium on Theory of Computing, STOC 2011, San Jose, CA, USA, 6-8 June 2011, pp. 813 822, 2011. 2, 21 Adam D. Smith, Shuang Song, and Abhradeep Thakurta. The Flajolet-Martin sketch itself preserves differential privacy: Private counting with minimal space. In Neur IPS, 2020. 3 J. W. Tukey. A survey of sampling from contaminated distributions. Contributions to Probability and Statistics, pp. 448 485, 1960. 1 Christos Tzamos, Emmanouil-Vasileios Vlatakis-Gkaragkounis, and Ilias Zadik. Optimal private median estimation under minimal distributional assumptions. In Neur IPS, 2020. 1, 3 Salil P. Vadhan. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography, pp. 347 450. 2017. 25 Lu Wang, Ge Luo, Ke Yi, and Graham Cormode. Quantiles over data streams: an experimental study. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22-27, 2013, pp. 737 748, 2013. 2 Zhuolun Xiang, Bolin Ding, Xi He, and Jingren Zhou. Linear and range counting under metric-based local differential privacy. In IEEE International Symposium on Information Theory, ISIT 2020, pp. 908 913, 2020. 2 A Greenwald-Khanna Sketch For completeness of our algorithm s description, we specify the operations in the Greenwald-Khanna (GK) non-private sketch. Throughout, we will use n = n(t) to denote the number of elements encountered up to time t Z+. Some of the operations outlined here will be used a subroutines for the DP procedures. A.1 The Sketch Let X = (x1, x2, . . . , xn) be a stream of items and S(X) be the resulting sketch with size sublinear in n. The GK sketch stores S(X) = t0, t1, . . . , ts 1 , i {0, . . . , s 1}, ti = (vi, gi, i), where gi = rmin(vi) rmin(vi 1) and i = rmax(vi) rmin(vi). We reserve v0, vs 1 be denote the smallest and largest elements seem in the stream X, respectively. We use S(X)[i] to refer to the ith tuple in the sketch S(X). i.e., for any i, S(X)[i] = ti = (vi, gi, i). Implicitly, the goal is to (implicitly) maintain bounds rmin(v) and rmax(v) for every v in S(X). rmin(v) and rmax(v) are the lower and upper bounds on the rank of v amongst all items in X, respectively. We can compute these bounds as follows: rmin(vi) = X j i gj, rmin(vi) = X j i gj + i. As a result, gi + i 1 is an upper bound on the number of items between vi 1 and vi. In addition, n = P Published in Transactions on Machine Learning Research (06/2023) The sketch is built in such a way to guarantee (maximum) error of maxs 1 i=0 (gi + i)/2 for approximately computing any quantile using the sketch. We will also impose a tree structure over tuples in S(X) (mostly because of the merge procedure) as follows: the tree T(X) associated with S(X) has a node Vi for each ti. The parent of a node Vi is the node Vj such that j is the smallest index greater than i with band(tj) > band(ti). band(ti) is the band of i at time n and bandτ(n) as all tuples that had band value of τ. All possible values of are denoted as bands and it can take on values between (0, 1 42αn, . . . , 2i 1 2i 2αn, . . . , 2αn 1, 2αn) corresponding to capacities of (2αn, αn, . . . , 8, 4, 2, 1). A.2 Quantile Algorithm 4 computes the α-approximate q-quantile based on the sketch S(X) that has size that is sublinear in n. The algorithm goes through all tuples and checks if the condition max(r rmin(vi), rmax(vi) r) αn is satisfied and return (i, vi) as the representative approximate quantile. This algorithm will be used a subroutine for one or more of our differentially private algorithms. Lemma A.1 (Proposition 1 & Corollary 1 (Greenwald & Khanna, 2001)). If after receiving n items in the stream, the sketch S(X) satisfies the property maxi(gi + i) 2αn, then Algorithm 4 returns an αapproximate q-quantile. Proof. The algorithm computes r = qn . Then the condition max(r rmin(vi), rmax(vi) r) αn clearly is (by definition) an α-approximate q-quantile. We still need to show that such vi always exists. First set e = maxi(gi + i)/2. If r > n e, then rmin(vs 1) = rmax(vs 1) = n so that i = s 1 satisfies the property. When r n e, then the algorithm chooses the smallest index j such that rmax(vj) > r + e so that r e rmin(vj 1). This follows since if r e > rmin(vj 1) then rmax(vj) = rmin(vj 1)+gj + j > rmin(vj 1)+2e which contradicts the definition of e. Algorithm 4: Quantile(S(X), q, n, α): Computing α-Approximate Quantiles Input: S(X), q, n, α (approximation parameter) 1 Compute r = qn 2 for i = 0, . . . , s 1 do 3 (vi, gi, i) = S(X)[i] 4 if max(r rmin(vi), rmax(vi) r) αn then 5 return (i, vi) Algorithm 5 goes through a stream of items and inserts into the sketch. The algorithm calls a compress operator on the data structure every time that i 0 mod 1 2α for any i [m]. Algorithm 6 inserts a particular item xn into the data structure S(X). In the special case where xn is a minimum or maximum, it inserts the tuple (xn, 1, 0) at the beginning or end of S(X). Otherwise, it finds an index i such that vi 1 xn < vi and then inserts the tuple (xn, 1, 2αn ) into S(X) at position i. A.4 Compress The Compress operation is an internal operation used for compressing (contiguous) tuples in S(X). The goal of this operation is to merge a node and its descendants into either its right sibling or parent node. After merge, we have to maintain the property that the tuple is not full. A tuple is full when gi + i 2αn . Published in Transactions on Machine Learning Research (06/2023) Algorithm 5: Inserting a stream of items into Summary Sketch Data: x1, x2, . . . , xm, . . . Input: S(X), α (approximation parameter) 2 for i = 1, . . . , m, . . . do 3 if i 0 mod 1 2α then 4 Compress(S(X), α, n) 5 Insert(S(X), α, xi) 6 n = n + 1 7 return S(X), n Algorithm 6: Insert(S(X), α, xn): Inserting into Summary Sketch Data: xn Input: S(X), α (approximation parameter) 1 (v0, g0, 0) = S(X)[0] 2 (vs 1, gs 1, s 1) = S(X)[s 1] 3 if xn < v0 then 4 Shift all positions in S(X)[0 . . . s 1] to S(X)[i . . . s] 5 S(X)[0] = (xn, 1, 0) 6 else if xn > vs 1 then 7 S(X)[s] = (xn, 1, 0) 9 for i = 0, . . . , s 1 do 10 (vi, gi, i) = S(X)[i] 11 if vi 1 xn < vi then 12 Shift all positions in S(X)[i . . . s 1] to S(X)[i + 1 . . . s] 13 S(X)[i] = (xn, 1, 2αn ) 14 return S(X), s + 1 Algorithm 7: Compressing the Sketch Input: S(X), α (approximation parameter), n 1 if n < 1 2α then 3 for i = s 2, . . . , 0 do 4 ti = (vi, gi, i) = S(X)[i] 5 ti+1 = (vi+1, gi+1, i+1) = S(X)[i + 1] 6 Compute g i , the sum of g-values of tuple ti and its descendants 7 if band(ti) band(ti+1) & (g i + gi+1 + i+1 < 2αn) then 8 Delete all descendants of ti and the tuple ti from sketch S(X) 9 Update ti+1 in S(X) to (vi+1, g i + gi+1, i+1) Published in Transactions on Machine Learning Research (06/2023) By Proposition A.2, a node and its children will form a contiguous segment. Let g i be the sum of g-values of tuple ti and all of its descendants. Then merging ti and its descendants would update ti+1 in S(X) to (vi+1, g i + gi+1, i+1) and delete ti and all of its descendants. Proposition A.2 (Proposition 4 in (Greenwald & Khanna, 2001)). For any node V , the set of all its descendants in the tree forms a contiguous segment in S(X). A.5 Formal guarantee We briefly recall the main guarantees of the GK sketch that we appeal to in the main body of this work and provide a proof for the statements that are not reproduced from previous work. Lemma 3.7. Given a data stream x1, . . . , xn of elements drawn from X, there exists a sketching algorithm that outputs a list S(X) of s = O 1 α log αn many tuples (vi, gi, i) (X N) N N for i = 1, . . . , s such that if X is the data multi-set then: 1. rank(X, val(vi)) [P j i gj, i + P 2. gi + i 2αn. 3. The first tuple is (min{x X}, 1, 0) and the last tuple is (max{x X}, 1, 0). 4. The vi are sorted in ascending order. Without loss of generality, the lower interval bounds P j i gj and upper interval bounds i + P j i gj are also sorted in increasing order. Proof. The first three statements are part of the GK sketch guarantee. For the third statement, i.e., to see that the vi are sorted in ascending order, we see that the GK sketch construction ensures that val(vi) val(vi+1) for all i. Since an insertion operation always inserts a repeated value after all previous occurrences and the tuple order is always preserved, it follows that ix(vi) ix(vi+1) as well, so in sum rank(X, vi) rank(X, vi+1). In other words, the sort order in the GK sketch is stable. The fact that the sequence P j i gj is sorted in increasing order follows from the non-negativity of the gi. To ensure that i + P j i gj are sorted in increasing order note that we always have that rank(X, vi) rank(X, vi+1) so that we can decrement i and ensure that i + P j i gj i+1 + P j i+1 gj without violating the guarantees of the GK sketch. B Omitted proofs We first recall and add some definitions that we will require in the sequel. Definition B.1. Let X = ((x1, 1), . . . , (xn, n)) (sometimes implicitly referred to as X = (x1, . . . , xn)) be a stream of elements drawn from some finite totally ordered data universe X, i.e., xi X for all i [n]. 1. Rank: Given a totally ordered finite data universe X, a data set X and a value x X, let rank X(x) = rank(X, x) = P y X 1[y x]. 2. For (xi, i) X, let val((xi, i)) = xi and ix((xi, i)) = P j i |{(xj, j) : xj < xi or xj = xi, j < i}|. 3. For v1, v2 X, we say that v1 v2 if ix(v1) ix(v2). 4. The q-quantile of X is val(v) for v X with ix(v) = qn . 5. For x X, we define rmin(x) = |{v X : val(v) < x}|, rmax(x) = |{v X : val(v) x}| and rank(X, x) to be the interval [rmin(x), rmax(x)]. 6. We say that x X is an α-approximate q-quantile for X if rank(X, x) [ qn αn, qn + αn] = . With this notation, the data set X is naturally identified as a multi-set of elements drawn from X. Definition B.2. Let ˆrmin(x) = max{P j i gj : val(vi) < x} and ˆrmax(x) = min{ i +P j i gj : val(vi) > x}. Note that for every v X such that val(v) = x, ix(v) [ˆrmin(x), ˆrmax(x)]. Published in Transactions on Machine Learning Research (06/2023) B.1 Proofs of privacy and utility for DPExp GK Outline: From Definition B.2 to Lemma B.5, we formalize how the GK sketch may be used to construct rank interval estimates for any data domain value. We then recall and apply the exponential mechanism with a score function derived from the GK sketch (Definition B.6 to Lemma B.9 and Algorithm 1), and derive the error guarantee Lemma B.11. We conclude this subsection with a detailed description of an efficient implementation of the exponential mechanism (Algorithm 8 and Lemma B.12), and summarize our final accuracy and space complexity guarantees in Theorem 4.1. Remark B.3. We can add two additional tuples ( , 0, 0) and ( , 0, 0) to the sketch, which corresponds to respective rank intervals [0, 0] and [n + 1, n + 1]. The bounds gi + i 2αn are preserved. This will ensure that the sets {i : val(vi) < x} and {i : val(vi) > x} for any x X are always non-empty. We formalize the rank interval estimation in a partition-wise manner as below. Lemma B.4. Given a GK sketch (v1, g1, 1), . . . , (vs, gs, s), for every x X one of the following two cases holds: 1. x = vi for some i [s] and i = min{j : vj = x}, ˆrmin(x) = X ˆrmax(x) = min{ i + X j i gj : i , val(vi ) > val(vi)} 2. x (vi 1, vi), i.e., x > vi 1 and x < vi for some i [s], ˆrmin(x) = X ˆrmax(x) = i+1 + X Proof. Recall, by Remark B.3, that the first tuple and the last tuple are formal elements at and , ensuring that every data universe element either explicitly occurs in the GK sketch or lies between two values that occur in the GK sketch. Both statements now follow directly from Definition B.2 and the fact that the values vi occur in increasing order in the sketch (Lemma 3.7). We bound the quality of the rank interval estimate [ˆrmin(x), ˆrmax(x)] compared to the true rank interval [rmin(x), rmax(x)] as follows. Lemma B.5. |rmin(x) ˆrmin(x)| 2αn and |rmax(x) ˆrmax(x)| 2αn. Proof. Let i = argmaxi:val(vi) 0, the exponential mechanism Eϵ u : SS R outputs r R with probability exp( ϵ u(S(X),r) 2δu ) where δu = max X X ,r |u(S(X), r) u(S(X ), r)|. Published in Transactions on Machine Learning Research (06/2023) The following statement formalizes the trade-offbetween the privacy parameter ϵ and the tightness of the tail bound on the score attained by the exponential mechanism. Theorem B.7 ( (Mc Sherry & Talwar, 2007; Smith, 2011)). The exponential mechanism (Definition B.6) satisfies ϵ-differential privacy. Further, the following tail bound on the utility (the score of the output element) holds: P u(S(X), Eϵ u(S(X))) < max r R u(S(X), r) 2δu(t + ln s) where s is the size of the universe from which we are sampling from. To run the exponential mechanism using our approximate rank interval estimates, we define a score function as follows. Definition B.8. Let d( , ) denote the ℓ1 metric on R. Given a sketch S(X), we define a score function on X: u(S(X), x) = min{|y qn | : y [ˆrmin(x), ˆrmax(x)]} = d( qn , [ˆrmin(x), ˆrmax(x)]) The magnitude of the noise that is added in the course of the exponential mechanism depends on the sensitivity of the score function, which we bound from above as follows. Lemma B.9. For all n > 1/α, the sensitivity of u (i.e., δu) is at most 4αn + 2 units. Proof. Fix any data set X neighbouring X under swap DP and let [r min( ), r max( )] be the rank ranges with respect to X for values in X. Let [ˆr min( ), ˆr max( )] denote the confidence interval derived from the GK sketch S(X ) for values in X. Claim B.10. |rmin(x) r min(x)| 2, |rmax(x) r max(x)| 2. Proof. These bounds follow directly from the Definition of rmin and rmax; under swap DP at most two elements of the stream are changed which implies that the count of the sets defining these terms changes by at most 1 unit each for a total shift of 2 units (in fact, this can be bounded by 1 unit). We now prove the sensitivity bound. u(S(X), x) = d( qn , [ˆrmin(x), ˆrmax(x)]) d( qn , [rmin(x), rmax(x)]) + 2αn d( qn , [r min(x), r max(x)]) + 2αn + 2 d( qn , [ˆr min(x), ˆr max(x)]) + 4αn + 2 u(S(X ), x) + 4αn + 2. Swapping the positions of X and X , we get the reverse bound to complete the sensitivity analysis. We can now derive a high probability bound on the utility that is achieved by Algorithm 1. Lemma B.11. If ˆx is the value returned DPExp GK then with probability 1 β, d( qn , [ˆrmin(ˆx), ˆrmax(ˆx)]) 2αn + 2(4αn + 2) log(|X|/β) Proof. By construction, Algorithm 1 is simply a call to the exponential mechanism with score function u(S(X), ), Since for any target q-quantile, qn lies in [0, n] it follows that there is some i s such that qn [P j i gj, i +1 + gi +1 + P j i gj]. It follows that d( qn , [rmin(val(vi ), rmax(val(vi ))]) 2αn Published in Transactions on Machine Learning Research (06/2023) and that hence max(u(S(X), x)) 2αn. If x is the output of the exponential mechanism, then applying the utility tail bound we get that with probability 1 β, u(S(X), x ) 2αn 2(4αn + 2) log(|X|/β) By definition of u, the desired bound follows. Algorithm 8: DPExp GKGumb: Implementing the Exponential Mechanism on S(X) using the Gumbel Distribution (Optimized implementation of Algorithm 1) Data: X = (x1, x2, . . . , xn) Input: ϵ, α (approximation parameter), q [0, 1] (quantile parameters) 1 Build summary sketch S(X) and let s = |S(X)|. 2 Let (vi, gi, i) = S(X)[i] for all i [s] 3 max Index = 1 4 max Value = /* Iterating over tuple values vi */ 5 Let i = 1 6 while i <= s do 7 ˆrmin = P 8 ˆrmax = min{ i + P j i gj : val(vi ) > val(vi)} 9 ui = min{|y qn | : y [ˆrmin, ˆrmax]}. 11 f = f + Gumb(0, 1) 12 if f > max Value then 13 max Index = (i, tuple) 14 max Value = f 15 i min{j : vj > vi} /* Iterating over intervals between tuples X(vi 1, vi) X */ 16 Let i = 1 17 while i <= s do 18 if X(vi 1, vi) is not empty then 19 ˆrmin = P j i gj 20 ˆrmax = i+1 + P 21 ui 1,i = min{|y qn | : y [ˆrmin, ˆrmax]}. 22 f = log(|X(vi 1, vi)|) + ϵ 23 f = f + Gumb(0, 1) 24 if f > max Value then 25 max Index = (i, interval) 26 max Value = f 28 if max Index = (i, tuple) for some i 1, . . . , s then 29 return vi 30 else if max Index = (i, interval) for some i 1, . . . , s then 31 Pick v X(vi 1, vi) uniformly at random 32 return v As discussed before, in general a naive implementation of the exponential mechanism as in Algorithm 1 would in general not be efficient. To resolve this issue, in Algorithm 8 we take advantage of the partition of the data domain by the score function and the Gumbel-max trick to implement the exponential mechanism without Published in Transactions on Machine Learning Research (06/2023) any higher-order overhead and return an α-approximate q quantile. This trick has become a standard way to implement the exponential mechanism over intervals/tuples. Lemma B.12. Algorithm 8 implements Algorithm 1 on the data universe X with space complexity O(|S(X)|) (where S(X) is the GK sketch) and additional time complexity O(|S(X)| log |S(X)|). Proof. We see that it will suffice to show that Algorithm 8 executes the exponential mechanism with the same score function as Algorithm 1 to prove that it is a valid implementation of the latter. As noted in previous work (Abernethy et al., 2017), if Z1, . . . , ZN are drawn i.i.d. from standard Gumbel distribution, then P fi + Zi = max j [N]{fj + Zj} = exp(fi) P j [N] exp(fj), i [N]. We recall that when running the exponential mechanism on X, we want to sample the element x X with probability exp(ϵu(S(X), x). To implement the exponential mechanism via the identification with Gumbel argmax distribution above, we will simply compute the scores u(S(X), x) and let fi = ϵ u(S(X), x). For x X such that x = vi for some i [s], Algorithm 8 directly computes the scores according to Definition B.8 and Lemma B.4; this is formalized by lines 5 to 15 in the pseudo code. For x X which lie strictly between the tuple values {vi : i [s]}, we proceed as follows. Fixing i, from Lemma B.4 we have that that for X(vi 1, vi) := {x X : x > vi 1, x < vi}, the rank confidence interval estimate is the same, i.e. [P j i 1 gj, i + P j i gj]. It follows from Definition B.8 that for all such domain values the the score function value u(S(X), ) is equal; this is denoted ui 1,i in the pseudo code. By summing the probabilities for sampling individual domain elements, it follows that the likelihood of the exponential mechanism outputting some value from the set X(vi 1, vi) is |X(vi 1, vi)| exp(ϵui 1,i/2) = exp(ϵui 1,i/2 + log(|X(vi 1, vi)|)). This is formalized by lines 16 to 27 in the pseudo code. Finally, if some interval is selected, then by outputting elements chosen uniformly at random, we ensure that the likelihood of x X(vi 1, vi) being output is 1 |X(vi 1,vi)| exp(ϵui 1,i/2 + log(|X(vi 1, vi)|) = exp(ϵui 1,i). Note that we do not need to account for ties in the Gumbel scores as the event fi +Zi = fj +Zj for any j = i has measure 0. 5 To bound the space and time complexity; we note that by the guarantees of the GK sketch, the size of the sketch S(X) is O((1/α) log αn); we compute Gumbel scores by iterating over tuples and intervals of which there are at most O(S(X))-many of each, each computation takes at most O(log |S(X)|) time, and only the max score and index seen at any point is tracked in the course of the algorithm. We can now state and prove our main theorem in this section, proving utility bounds for α-approximating quantiles through DPExp GK with sublinear space. Theorem 4.1. Algorithm 1 is ϵ-differentially private. Let ˆx be the value returned by Algorithm 1 when initialized with target quantile q. The following statements hold: 1. Algorithm 1 can be implemented to run with space complexity O((1/α) log αn), such that with probability 1 β d( qn , [ˆrmin(ˆx), ˆrmax(ˆx)]) 2αn + 2(4αn + 2) log(|X|/β) 2. For n > 24 log |X|/β α min{ϵ,1} , Algorithm 1 can be implemented to run with space complexity O (αϵ) 1 log(|X|/β) log n such that with probability 1 β, ˆx is an α approximate q-quantile. 5As is usual in the privacy literature, we assume that the sampling of the Gumb(0, 1) distribution can be done on finiteprecision computers (Balcer & Vadhan, 2018). While the problem of formally dealing with rounding has not been settled in the privacy literature (Mironov, 2012), for any practical purpose it easily suffices to store the output of the Gumbel distribution using a few computer words. Published in Transactions on Machine Learning Research (06/2023) Our dependence in α, which for practical purposes is usually the most important term, is optimal. Recent subsequent work by Kaplan and Stemmer (Kaplan & Stemmer, 2021) shows how to improve the dependence in other parameters if approximate (rather than pure) differential privacy is allowed, or if the stream length is large enough. Proof. The privacy guarantee of Algorithm 1 follows from the privacy guarantee of the exponential mechanism and Lemma B.12. The accuracy bound in equation 2 is simply a restatement of Lemma B.11, and the space complexity bounds follow from the GK sketch space complexity bound O 1 α log αn . To derive the second statement, we substitute α min{ϵ,1} 24 log(|X|/β) for the approximation parameter α in equation 2 and get d( qn , [ˆrmin(ˆx), ˆrmax(ˆx)]) 2 α min{ϵ, 1}n 24 log(|X|/β) + 2(4( α min{ϵ,1} 24 log(|X|/β))n + 2) log(|X|/β) αn 12 log(|X|/β) + αn 3 + 4 log |X|/β The space complexity bound now follows directly from the space complexity bound derived in Lemma B.12, the space complexity bound O((1/α) log αn) for the GK sketch, and by substituting α min{ϵ,1} 24 log(|X|/β) for α. Sincve the approximation parameter must be greater than 1/n, we have that n 24 log(|X|/β) α min{ϵ,1} . B.2 Proofs of privacy and utility for DPHist GK In this section we prove Theorem 4.2, which we restate here for ease of reference. Theorem 4.2. For the one-dimensional normal distribution N(µ, σ2), let S = (X1, X2, . . . , Xn) be a data stream through which we wish to obtain Qq S, a DP estimate of the q-quantile of the distribution. For any q (0, 1), there exists an (ϵ, δ)-DP algorithm such that, with probability at least 1 β, we obtain |Qq D Qq S| α for any α > 0, β (0, 1], ϵ, δ (0, 1/n) and for stream length n max {min {A, B} , C} , where as long as µ ( R, R) and using space of O(max{ R α log αn}). Proof. For any stream S = (X1, . . . , Xn), we use the triangle inequality so that |Qq D Qq S| |Qq D Qq S| + |Qq S Qq S| (3) α/2 + α/2. (4) |Qq D Qq S| α/2 follows with probability 1 β/2 by Corollary B.14 and |Qq S Qq S| α/2 follows with probability 1 β/2 by Lemma B.15. The space complexity follows with probability 1 via the deterministic nature of the Greenwald-Khanna sketch. Lemma B.13 (Dvoretzky-Kiefer-Wolfowitz inequality (Dvoretzky et al., 1956)). For any n Z+, let X1, . . . , Xn be i.i.d. random variables with cumulative distribution function F so that F(x) is the probability that a single random variable X is less than x for any x R. Let the corresponding empirical distribution function be Fn(x) = 1 n Pn i=1 1[Xi x] for any x R. Then for any γ > 0, P sup x R |Fn(x) F(x)| > γ 2 exp( 2nγ2). Published in Transactions on Machine Learning Research (06/2023) Corollary B.14. For any q (0, 1), let Qq D be the q-quantile estimate for the distribution D and Qq S be the q-quantile estimate for the sample. Then, |Qq D Qq S| α/2 with probability 1 β/2 when n 2 α2 log 4/β. Proof. Follows by the DKW inequality (Lemma B.13) where n 1 2γ2 log 4/β and γ = α/2. Lemma B.15. For any q (0, 1), α > 0, β (0, 1], ϵ, δ (0, 1/n), there exists an (ϵ, δ)-differentially private algorithm Qq S for computing the q-quantile such that |Qq S Qq S| α/2, with probability 1 β for stream length n O min{O R Furthermore, with probability 1, Qq S uses space of O(max{ R α log αn}). Proof. First, by the tail bounds of the Gaussian distribution (Claim B.17), we can obtain that for any i [n], P[|Xi µ| > c] 2e c2/2σ2, so that by the union bound, P[ i, |Xi µ| c] 2ne c2/2σ2, which implies that for any β (0, 1], P[ i, |Xi µ| σ p 2 log 4n/β] 1 β/2, which holds by our sample complexity (stream length) guarantees. Next, let r = R/σ . 6 Divide [ R σ/2, R+σ/2] into 2r+1 bins of length at most σ each. Each bin Bj should equal ((j 0.5)σ, (j + 0.5)σ] for any j { r, . . . , r}. Next run the histogram learner of Lemma B.16 with per-bin accuracy parameter of α/K, high-probability parameter of β/2, privacy parameters ϵ, δ (0, 1/n), and number of bins K = 2 R/σ + 1. We can do this because of our sample complexity (stream length) bounds. Then we obtain noisy estimates p r, . . . , pr with per-bin accuracy of α/K. Then any quantile estimate would have accuracy of α (by summing noisy estimates for at most K bins). Next, we use these bins to construct a sketch (private by DP post-processing) based on the deterministic algorithms of (Greenwald & Khanna, 2004) to, with probability 1, obtain space of O(max{ R α log αn}). Lemma B.16 (Histogram Learner (Bun et al., 2015; Vadhan, 2017; Karwa & Vadhan, 2018)). For every K N { } and every collection of disjoint bins B1, . . . , BK defined on the domain X. For any n N, ϵ, δ (0, 1/n), α > 0, and β (0, 1), there exists an (ϵ, δ)-DP algorithm M : X n RK such that for every distribution D on the domain X, if 1. X1, . . . , Xn D, pk = P[Xi Bk] for any k [K], 2. ( p1, . . . , p K) M(X1, . . . , Xn), 3. n max n min n 8 ϵα log 2K βδ o , 1 2α2 log 4 then (over the randomness of the data X1, . . . , Xn and of M) 1. PX D,M[| pk pk| α] 1 β, 2. P[argmaxk pk = j] npj if K 2/δ, 6Note that this argument is similar to the arguments for Algorithm 1 in (Karwa & Vadhan, 2018). Published in Transactions on Machine Learning Research (06/2023) 3. P[argmaxk pk = j] npj + 2 exp( (ϵn/8) (maxk pk)) if K < 2/δ. Claim B.17 (Gaussian Tail Bound). Let Z be a random variable distributed according to a standard normal distribution (with mean 0 and variance 1). For every t > 0, P[|Z| > t] 2 exp( t2/2). Lemma B.18. Algorithm 2 satisfies (ϵ, 0)-DP. Proof. For any i [n], any item xi can belong in at most one bin. Plus, the global sensitivity of the function that computes the empirical histogram is 2, since changing a single item can change the contents of at most two bins. As a result, adding noise of Lap(0, 2/ϵ) to each bin satisfies ϵ-DP by Theorem B.19. Theorem B.19 (Laplace Mechanism (Dwork et al., 2006)). Fix ϵ > 0 and any function f : Yn RK. The Laplace mechanism outputs f(y) + (L1, . . . , LK), L1, . . . , LK Lap(0, GSf/ϵ) where GSf is the global sensitivity of the function f. Furthermore, the mechanism satisfies (ϵ, 0)-DP. B.3 Extension to the Continual Observation setting In this section we prove Theorem 5.1, which formalizes the privacy and utility guarantees of Algorithm 3 in the continual observation setting. We start by making some definitions that will aid us in our analysis. Definition B.20. We make the following definitions: 1. Stream Prefix: Given an input data stream X = (x1, . . . , xn) we define the prefix up to the index s element of this stream X[1 : s] := (x1, . . . , xs). Note that we can overload notation and treat X[1 : s] as a data set by ignoring the order in which the elements arrive. 2. Checkpoint: A set S = is a set of checkpoints for stream X if j S, value vj that is a (α/2)-approximate q-quantile for X[1 : j]. We first observe that if we have an α/2 approximate quantile for a given data set, then that estimate remains at least an α-approximate quantile for a slightly larger set as well. Lemma B.21. If x X is an α/2-approximate q-quantile for a data set X (for some q [0, 1]), then it is an α-approximate q-quantile for any data set X X such that |X | (1 + α/2)|X|. Proof. Since x is an α/2-approximate q-quantile, we have that (1 α/2)|X| rank X (x) (1 + α/2)|X|. We then have that rank X (x) = X y X 1[y x] + X y X \X 1[y x] y X 1[y x] X y X 1[y x] X y X 1[y x] + X y X \X 1[y x] (1 α/2)|X| X y X 1[y x] (1 + α/2)|X| + (|X | |X|) Published in Transactions on Machine Learning Research (06/2023) (1 α/2)|X| X y X 1[y x] (1 + α/2)|X| + α|X|/2 y X 1[y x] (1 + α)|X|, i.e., x is also an α-approximate q-quantile for X , as required. We now show that the choice of checkpoints made in the pseudocode of algorithm 3 is valid as per our definition. Lemma B.22. In Algorithm 3, the set {cp(s ) : s [n]} forms a valid set of checkpoints for the data stream X. More concretely vs is an α/2-approximate q-quantile for X[1 : s]. Proof. First we bound the size of the set of checkpoints {cp(s ) : s [n]}. Since a new checkpoint value is generated only when s (1+α/2)cp(s 1), it follows that for any new checkpoint where cp(s ) = cp(s 1), we have cp(s ) (1 + α/2)cp(s 1). Using Theorem 4.1, we set the first checkpoint value to 24 log |X|/β α ϵ , where α = α/2, β and ϵ are the accuracy parameter, the failure probability, and the private parameter that are passed in the calls to DPExp GK, respectively. 7 It follows that there are at most k log1+α/2 n = 3 α log n checkpoints using the fact that for all x > 1, x 1+x log(1 + x) x. Since vcp(s) is the output of DPExp GK given a GK sketch with accuracy parameter α/2 and privacy parameter ϵ it follows that with probability 1 β where β = αβ 3 log n, rank X[1:s ] (1 α/2, 1 + α/2)qn. We now apply the union bound over all private approximate quantile computations at checkpoints and are done. We can now prove our main technical result. Theorem 5.1. Let ϵ, α > 0, n Z. For any β (0, 1], with probability 1 β, Algorithm 3 maintains an α-approximate q-quantile at every point s in the data stream for s = Ω log n log |X|/β α2ϵ . Furthermore, Algorithm 3 satisfies ϵ-DP and has space complexity Ω 1 α2ϵ log2 n log |X| log n Proof. To see that Algorithm 3 is ϵ-DP, we observe that the output of this algorithm throughout the data stream can be summarized by its outputs at the checkpoints {cp(s) : s [n]} (the points in the stream at which a new checkpoint is reached and a new value released are known publicly, so this suffices for privacy analysis). It follows that there is a choice of ϵ = αϵ 3 log n that gives us an ϵ-DP mechanism. We now prove the accuracy guarantee. From the second statement of Theorem 4.1 we get that for points in the stream s such that cp(s) 48 log |X|/β α min{ϵ ,1}, i.e., the first checkpoint, the output vs will be an α/2-accurate quantile for X[1 : cp(s)|. Then, since |X[1 : s]| = s (1 + α/2)cp(s) (1 + α/2)|X[1 : cp(s)]|, by Lemma B.21 it follows that rank X[1:s](vs) (1 α, 1 + α)qn, i.e., vs is an α-approximate q-quantile for X[1 : s]. C Additional Experiments In this section, we include some additional experimental details and results. Varying ϵ for DPExp GKGumb: In Table 1, we vary ϵ and compare DPExp Full to DPExp GKGumb in terms of average absolute error and execution time (in seconds). We see that DPExp GKGumb is significantly faster than DPExp Full in terms of average execution time. However, DPExp GKGumb incurs larger error because of the approximation factor of α = 0.0001. 7The last checkpoint might occur at (1 + α/2)n. We may ignore checkpoints past n. Published in Transactions on Machine Learning Research (06/2023) ϵ DPExp Full (Error) DPExp GKGumb (Error) DPExp Full (Time) DPExp GKGumb (Time) 0.1 0.0019796 0.002172 0.1330166 0.028163 0.5 0.0004581 0.0013467 0.13113595 0.02804 1 0.000249894 0.00127795 0.1335331 0.0284041 5 0.00007110 0.00151 0.1341164231 0.028285 Table 1: α = 0.0001, n = 105, U(0, 10). Time in Seconds. Histogram-Based Algorithm: We also implemented the DPHist GK and tested by varying the bin width of the histogram. In general, we observe that DPExp GK has superior performance in terms of minimizing the error for any single quantile query. For example, we obtain averge absolute errors of at least 2.56 for 2 bins and 3.001 for 10 bins for q = 0.3 for DPHist GK. This suggests that we can rely on DPExp GK and its variants for general-purpose implementations with minimal error. However, it is possible that certain parameter settings (e.g., bin width) for DPHist GK might yield better performance. Continual Observation Algorithm: We also implement the continual observation algorithm that builds directly on DPExp GK. We fixed the first stream checkpoint to 10000. For α = 0.001 and stream length of n = 100000 from U(0, 10), we observe average absolute error of 0.00723 for ϵ {0.1, 0.5, 1, 5} over 100 trials. For α = 0.001 and stream length of n = 100000 from N(5, 1), we observe average absolute error of 0.00134. However, it is possible that the first few checkpoints might be an important parameter for the error of the sketch in the continual observation setting. We leave the full exploration of this question to future work. C.1 Full Space Quantile Computation Without the bounded space requirement (i.e., space sublinear in the stream length), we can use the exponential mechanism with a score function that uses the entire stream of values X. In that case, the sensitivity of the score function is at most 1. We use this as one of the baselines for our experimental validation. Lemma C.1. Given any insertion only stream X = (x1, x2, . . . , xn 1, xn), the sensitivity of the score function u (under swap differential privacy) is at most 1. i.e., δu 1. The function u is defined as u(X, e) = | rank(X, e) r| where r is the approximate q n rank of the sketch and rank(X, e) is the rank of e amongst all values in the stream X. Proof. Let |X| = n. The score function becomes | rank(X, e) nq| where nq = q n . Consider two streams with only one element changed: X, X , denoting the element by xd. Then at time d n, in the second stream x d is inserted instead of xd. In both cases, nq changes by at most q (in the case of add-remove DP) and for swap DP, nq remains the same. And for any e, rank(X , e) would differ from rank(X, e) by at most 1 since the rank of any element can change by at most 1 after adding, deleting, or replacing an item in the stream. Furthermore, for any n d, the rank of any e will differ in X, X by at most 1 replacing xd with x d can displace the rank of any element by at most 1. Also, the term nq will remain the same. (Note that in the add-remove privacy definition nq = q n would change to either q (n + 1) or q (n 1) .) The reverse triangle inequality says that for any real numbers x and y, |x y| ||x| |y||. As a result, | rank(X, e) nq| + | rank(X , e) nq| | rank(X , e) rank(X, e)| 1 for any e.