Anonymized Histograms in Intermediate Privacy Models

Badih Ghazi, Pritish Kamath, Ravi Kumar, Pasin Manurangsi
Google Research, Mountain View, CA, US
badihghazi@gmail.com, pritish@alum.mit.edu, ravi.k53@gmail.com, pasin@google.com

36th Conference on Neural Information Processing Systems (NeurIPS 2022).

We study the problem of privately computing the anonymized histogram (a.k.a. unattributed histogram), which is defined as the histogram without item labels. Previous works have provided algorithms with $\ell_1$- and $\ell_2^2$-errors of $O_\varepsilon(\sqrt{n})$ in the central model of differential privacy (DP). In this work, we provide an algorithm with a nearly matching error guarantee of $\tilde{O}_\varepsilon(\sqrt{n})$ in the shuffle DP and pan-private models. Our algorithm is very simple: it just post-processes the discrete Laplace-noised histogram! Using this algorithm as a subroutine, we show applications in privately estimating symmetric properties of distributions such as entropy, support coverage, and support size.

1 Introduction

Computing histograms is among the most well-studied tasks in data analytics and machine learning. Suppose that there is a domain $[D] := \{1, \ldots, D\}$, where the $i$-th user's input is $z_i \in [D]$. The histogram of the users' inputs $\{z_1, \ldots, z_n\}$ is defined as $h := (h_1, \ldots, h_D)$, where $h_j := |\{i \in [n] \mid z_i = j\}|$, i.e., the number of users who contribute item $j \in [D]$. For many tasks, however, the identities of the items themselves are not important, and it instead suffices to consider the anonymized histogram (a.k.a. unattributed histogram) corresponding to $h$, which is defined as $\mathbf{n}_h := (n_{(1)}, \ldots, n_{(D)})$, where $n_{(\ell)}$ denotes the $\ell$-th largest element among the $h_j$'s. Whenever $h$ is clear from context, we drop the subscript and denote the anonymized histogram simply by $\mathbf{n}$.

Anonymized histograms have several applications, including estimating symmetric properties of discrete distributions [11, 2, 16, 31], privately releasing degree distributions in social networks [34, 33, 40], and anonymizing password frequency lists [13, 14]. For more details, we refer the reader to [44] and the references therein.

In this work, we study private anonymized histograms. The notion of privacy we study is differential privacy (DP) [24, 23], which has emerged as a very popular notion of private data analysis, leading to numerous practical deployments [27, 43, 30, 8, 21, 1, 36, 41]. Multiple works have studied the problem of computing private anonymized histograms, with the focus so far being on the central model of DP (Footnote 1). Moreover, two measures of error have been studied: the $\ell_1$-error and the $\ell_2^2$-error (Footnote 2). For the $\ell_2^2$-error, Hay et al. [34] give an $\varepsilon$-DP algorithm with an expected error of $\tilde{O}(\sqrt{n}/\varepsilon^2)$ (Footnote 3). As for the $\ell_1$-error, Blocki et al. [13] observed that the exponential mechanism yields an expected $\ell_1$-error of $O(\sqrt{n}/\varepsilon)$, since there are at most $\exp(O(\sqrt{n}))$ anonymized histograms in total [32]; recently, this bound was improved to $O(\sqrt{n\log(1/\varepsilon)/\varepsilon})$ by Suresh [44].

Footnote 1: In the central model of DP, a trusted curator has access to the raw inputs of the users, and is supposed to output a DP estimate of the desired function, in this case, the anonymized histogram.
Footnote 2: Defined as $\|\mathbf{n} - \hat{\mathbf{n}}\|_1 = \sum_{j \in [D]} |n_{(j)} - \hat{n}_{(j)}|$ and $\|\mathbf{n} - \hat{\mathbf{n}}\|_2^2 = \sum_{j \in [D]} (n_{(j)} - \hat{n}_{(j)})^2$, where $\hat{\mathbf{n}}$ denotes the output anonymized histogram.
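To make these definitions concrete, the following minimal Python sketch (illustrative only; the function names are ours and not from the paper) computes a histogram and its anonymized histogram from raw user inputs.

```python
from collections import Counter

def histogram(inputs, D):
    """Histogram h over the domain [D] = {1, ..., D}: h[j-1] = #users with input j."""
    counts = Counter(inputs)
    return [counts.get(j, 0) for j in range(1, D + 1)]

def anonymized_histogram(h):
    """Anonymized (unattributed) histogram n_h: the counts of h sorted in
    non-increasing order, i.e., (n_(1), ..., n_(D))."""
    return sorted(h, reverse=True)

# Example with D = 5 and n = 6 users.
h = histogram([1, 3, 3, 5, 3, 1], D=5)   # [2, 0, 3, 0, 1]
n = anonymized_histogram(h)              # [3, 2, 1, 0, 0]
```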
On the lower bound front, Aldà and Simon [5] proved a bound of $\Omega(\sqrt{n/\varepsilon})$ on the expected $\ell_1$-error; recently, this was improved to $\Omega(\sqrt{n\log(1/\varepsilon)/\varepsilon})$ by Manurangsi [37], matching the aforementioned upper bound of [44] to within a constant factor. The latter lower bound also applies to $(\varepsilon, \delta)$-DP algorithms for any sufficiently small $\delta$ (depending only on $\varepsilon$).

The anonymized histogram problem generalizes the COUNT-DISTINCT problem, which asks for the number of items $j$ such that $h_j > 0$. COUNT-DISTINCT can be easily solved in the central DP model by applying the discrete Laplace mechanism. In the (non-interactive) local DP setting, Chen et al. [17] proved a lower bound of $\Omega(n)$, which means that one cannot asymptotically beat the trivial algorithm that always outputs zero. The strong lower bounds on the error incurred by protocols in the local setting generally motivate the study of intermediate models of privacy, including the pan-private [25] and shuffle DP [12, 26, 18] models. In these models, it turns out that COUNT-DISTINCT can be solved to within $\tilde{O}_\varepsilon(\sqrt{n})$ error, while lower bounds of $\Omega(\sqrt{n})$ are known [38, 9].

In this work, we show that, surprisingly, the anonymized histogram problem (which seems much harder than COUNT-DISTINCT) can in fact be solved with essentially the same asymptotic error of $\tilde{O}_\varepsilon(\sqrt{n})$ as COUNT-DISTINCT, in both the pan-private and shuffle DP models. On the other hand, the aforementioned lower bound [17] for COUNT-DISTINCT also implies an $\Omega(n)$ lower bound on the expected $\ell_1$-error of anonymized histograms in the more challenging local DP model. In other words, in the typical case where $\varepsilon$ is an absolute constant, it is impossible to achieve any asymptotic advantage over the trivial algorithm that always outputs the all-zeros histogram.

1.1 Our Results

A prominent approach for computing private histograms in the central DP model is to add discrete Laplace noise (Footnote 4) to each histogram entry. We show that there is a post-processing algorithm that takes such a noised histogram and produces an accurate estimate of the anonymized histogram:

Theorem 1 (Informal; Theorems 9 and 21). There is an algorithm that takes in a noisy histogram, where independent discrete Laplace noise of parameter $1/\varepsilon$ is added to each entry, and outputs an approximate anonymized histogram such that the expected $\ell_1$- and $\ell_2^2$-errors are (Footnote 5) $\tilde{O}_\varepsilon(\sqrt{n + D})$.

Note that there is a dependency on $D$ in the error bound in Theorem 1; when the domain size is large, this can dominate the $\sqrt{n}$ term. Fortunately, we show that this can be overcome by first randomly hashing into $B$ buckets before computing the histogram. By picking $B$ to be $\tilde{O}(n)$, we show that one can achieve an error of $\tilde{O}_\varepsilon(\sqrt{n})$ without any dependency on $D$:

Theorem 2 (Informal; Corollary 13 and Theorem 22). There is an algorithm that takes in a noisy hashed histogram, where independent discrete Laplace noise of parameter $1/\varepsilon$ is added to each bucket after hashing, and outputs an approximate anonymized histogram such that the expected $\ell_1$-error is $\tilde{O}_\varepsilon(\sqrt{n})$. A similar algorithm, which uses two noisy hashed histograms, achieves an expected $\ell_2^2$-error of $\tilde{O}_\varepsilon(\sqrt{n})$.

Random hashing and computing discrete Laplace-noised histograms can be implemented in the pan-private (Footnote 6) and shuffle DP settings [29, 10], where in the latter case we have to concede $\delta > 0$ in the privacy parameter. Thus, the theorem above yields:
For any " > 0, δ 2 (0, 1], there is an "-DP algorithm for anonymized histogram in the pan-private model and an (", δ)-DP algorithm in the shuffle DP model, with expected 1and 2 2-errors of e O"(pn). 3In fact, Hay et al. [34] proved a slightly stronger expected error bound of O(d log3 n/"2), where d denotes the number of unique values appearing in n. Since there are at most pn different values in the worst case, this guarantee yields an O(pn log3 n/"2) bound for any instance. 4Our results can be adapted to continuous Laplace noise [24] with only a constant factor overhead in the error. We use discrete noise since it can be applied in the shuffle DP model (which is discrete in nature). 5 e O(f) denotes O(f logc f) for some constant c > 0 and O"(f) denotes O(g(")f) for some function g( ). 6This is done by starting with D i.i.d. discrete Laplace r.v. s and incrementing each entry for the next item. As an immediate application of the above, we get algorithms for estimating symmetric properties of distributions; a distribution property is said to be symmetric if it remains unchanged under relabeling of the domain symbols. For any (non-private) symmetric estimator with low sensitivity, we get a private estimator in the pan-private and shuffle DP models. Theorem 4 (Informal;Theorem 17). For all " > 0, δ 2 (0, 1], and distributions D, for any symmetric distribution property f, and any symmetric estimator ˆf, there exists an "-DP mechanism M in the pan-private model and an (", δ)-DP mechanism M in the shuffle DP model, such that M outputs an -approximation to f(D) with high probability. The sample complexity of the mechanism M is given as C ˆ f(f, ) + D ˆ f( , "), where the first term is the non-private sample complexity of ˆf and the second term depends on the sensitivity of ˆf. In particular, in Section 5, we apply the above result to estimate the Shannon entropy, support coverage, and support size of discrete distributions, which, to the best of our knowledge, is the first such sample complexity bound in the pan-private and shuffle DP models. 1.2 Overview of Techniques We now describe the high-level ideas of our algorithms and proofs. For ease of exposition, we will occasionally be informal; all details are formalized in later sections. To describe our algorithm, we need definitions of prevalence and cumulative prevalence. Definition 5 (Prevalence and Cumulative Prevalence). The prevalence of a histogram h is defined as 'h := ('h 0 , . . . , 'h n), where 'h r := |{j 2 [D] : hj = r}| is the number of entries with value r. The cumulative prevalence of a histogram h is defined as 'h 1, . . . , 'h n), where 'h r := |{j 2 [D] : hj r}| is the number of entries with value at least r. Prevalence and cumulative prevalence can be similarly defined for an anonymized histogram n; note that 'h = 'nh and 'h . An important property of cumulative prevalence is that it preserves the 1-distance. Observation 6. For all anonymized histograms n, ˆn, it holds that k'n k1 = kn ˆnk1. We stress that, while cumulative prevalence has been used before in DP algorithms for computing anonymized histograms [44, 37], these algorithms require access to the true anonymized histogram first and therefore will only work in the central DP model. Algorithm for 1-error. For each j 2 [D], using the discrete Laplace-noised count, we produce an unbiased estimate for whether hj r for each r 2 [n]. Adding these up over all j 2 [D] gives an unbiased estimate for 'h r. 
We then project the estimated $\varphi_{\geq}$ back so that it corresponds to a valid anonymized histogram. It can be seen that this last step can at most double the error.

While the algorithm described above is simple, it is not obvious why it incurs an error of only $\tilde{O}_\varepsilon(\sqrt{n + D})$; the analysis turns out to be quite delicate. The key is that the unbiased estimator we use has variance that decreases exponentially with $|h_j - r|$. (In other words, the uncertainty is high only when $h_j$ is close to $r$.) Roughly speaking, this means that the total error is dominated by the error in the case where $h_j = r$. Suppose for simplicity that we only focus on this case. Since there are $\varphi^h_r$ entries satisfying the condition, the expected $\ell_1$-error for $\varphi_{\geq r}$ will be $\tilde{O}_\varepsilon(\sqrt{\varphi^h_r})$. Thus, in total, the $\ell_1$-error of the estimated cumulative prevalence is dominated by $\tilde{O}_\varepsilon\big(\sum_r \sqrt{\varphi^h_r}\big)$. We can now apply the Cauchy-Schwarz inequality to yield $\sum_{r \in [n]} \sqrt{\varphi^h_r} \leq \sqrt{\sum_{r \in [n]} 1/r}\cdot\sqrt{\sum_{r \in [n]} r\,\varphi^h_r} = O(\sqrt{n\log n})$, where the last equality follows from the fact that $\sum_r r\,\varphi^h_r$ is simply the total count $n$ in the histogram. This concludes our proof sketch. The full proof can be found in Section 3.

Handling Large Domain Sizes. When $D \gg n$, we randomly hash into $B = \tilde{O}(n)$ buckets and compute the noisy reduced histogram on these $B$ buckets. We can use our approach above to compute the anonymized histogram on these $B$ buckets with $\ell_1$-error at most $\tilde{O}_\varepsilon(\sqrt{n})$. While this is a reasonable approach, it does not yet give a good estimate of the original anonymized histogram: the reason is that there could be as many as $\tilde{\Omega}(n)$ collisions due to hashing. To handle this, we define a function that inverts the reduced anonymized histogram to the original anonymized histogram. We then show that (i) this inverse has constant sensitivity and (ii) the reduced histogram is concentrated around its mean, with an $\ell_1$-deviation of $\tilde{O}(\sqrt{n})$. Combining these two allows us to conclude that the inverse of the noisy anonymized histogram has expected $\ell_1$-error of $\tilde{O}_\varepsilon(\sqrt{n})$, as desired. See Section 4 for details.

Privately Computing Symmetric Properties of Distributions. A large body of work starting with [2] has shown that plug-in estimators using the so-called profile maximum likelihood (PML) distribution achieve nearly optimal sample complexity for many symmetric distribution properties. At the core of these works is the important fact that many symmetric distribution properties have estimators (based on the anonymized histogram) with low sensitivity. Our algorithms then apply these estimators to our private anonymized histogram. The sensitivity of the estimator, together with the $\ell_1$-error bound we have shown for our anonymized histogram, immediately yields bounds on the errors of the estimators. See Section 5 for details.

Algorithm for $\ell_2^2$-error. Adapting the algorithm to the $\ell_2^2$-error proceeds as follows: recall (e.g., by Hölder's inequality) that $\|\mathbf{n} - \hat{\mathbf{n}}\|_2^2 \leq \|\mathbf{n} - \hat{\mathbf{n}}\|_1\cdot\|\mathbf{n} - \hat{\mathbf{n}}\|_\infty$. Notice also that the concentration of the noise implies that each discrete Laplace-noised count is within $O(\log D/\varepsilon)$ of the true value. Due to this, we only need to change the last (i.e., projection) step by adding an extra constraint that each entry of the output estimated anonymized histogram is within $O(\log D/\varepsilon)$ of the corresponding entry of the noisy histogram. This way we have ensured that $\|\mathbf{n} - \hat{\mathbf{n}}\|_\infty \leq O(\log D/\varepsilon)$. Combining this with our earlier bound on the expected $\ell_1$-error immediately yields the desired bound on the $\ell_2^2$-error. See Appendix B for details.
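As a compact way to record how these two ingredients combine (a sketch of the combination step only, ignoring the low-probability event that some noise exceeds $O(\log D/\varepsilon)$):
$$\mathbb{E}\big[\|\mathbf{n} - \hat{\mathbf{n}}\|_2^2\big] \;\leq\; \mathbb{E}\big[\|\mathbf{n} - \hat{\mathbf{n}}\|_1\big]\cdot O\!\left(\frac{\log D}{\varepsilon}\right) \;\leq\; \tilde{O}_\varepsilon\big(\sqrt{n + D}\big)\cdot O\!\left(\frac{\log D}{\varepsilon}\right) \;=\; \tilde{O}_\varepsilon\big(\sqrt{n + D}\big).$$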
1.3 Other Related Work

The original paper on the pan-private model [25] also studies the problem of estimating $\varphi^{\mathbf{n}}_r$. However, their focus is on algorithms with small space complexity, and, if one were to sum up their error bounds over all $r$ directly, it would yield a trivial bound of $\Omega(n)$ on the total error of the anonymized histogram.

Several recent works have studied the testing/estimation/learning of symmetric and other properties of distributions under privacy constraints, mostly in the central DP model [3, 4, 6, 15, 20, 35] and some in the local DP model [22]. In particular, Acharya et al. [3] study privately computing symmetric distribution properties in the central model. Indeed, they also exploit the fact that the estimators have low sensitivity. However, in the central DP model, low sensitivity allows one to obtain a private estimate by adding Laplace noise directly to the non-private estimate, whereas we need to compute the estimator from our approximate anonymized histogram.

2 Differential Privacy

In this section, we review the basics of differential privacy (DP) as well as the shuffle DP and pan-private models. Let $[n]$ denote the set $\{1, \ldots, n\}$ and let $\mathbb{1}[\cdot]$ denote the binary indicator function. Two datasets $S = \{z_1, \ldots, z_n\}$ and $S' = \{z'_1, \ldots, z'_n\}$ are said to be neighboring, denoted $S \sim S'$, if there is an index $i \in [n]$ such that $z_j = z'_j$ for all $j \in [n] \setminus \{i\}$. We recall the following definition [24, 23]:

Definition 7 (Differential Privacy (DP)). Let $\varepsilon > 0$ and $\delta \in [0, 1]$. A randomized algorithm $M : \mathcal{Z}^n \to \mathcal{R}$ is $(\varepsilon, \delta)$-differentially private ($(\varepsilon, \delta)$-DP) if, for all $S \sim S'$ and all (measurable) outcomes $E \subseteq \mathcal{R}$, we have $\Pr[M(S) \in E] \leq e^{\varepsilon}\Pr[M(S') \in E] + \delta$. We denote $(\varepsilon, 0)$-DP as $\varepsilon$-DP or pure-DP. The case $\delta > 0$ is referred to as approximate-DP.

In the central DP model, all the inputs are stored and processed by an analyzer, and privacy is enforced only on the output of the analyzer.

Shuffle DP [12, 26, 18]. In the shuffle DP model, there are three algorithms, namely, a local randomizer $\mathcal{R}$, a shuffler $\mathcal{S}$, and an analyzer $\mathcal{A}$. Let $S = \{z_1, \ldots, z_n\}$ be the input dataset. The randomizer $\mathcal{R}$ takes $z \in S$ as input and outputs a multiset of messages. The shuffler $\mathcal{S}$ takes the multiset of messages obtained from applying $\mathcal{R}$ to each $z \in S$ and permutes them randomly. The analyzer $\mathcal{A}$ takes this permuted multiset and computes the final output. Privacy in the shuffle model is enforced on the output of the shuffler $\mathcal{S}$ when a single input is changed.

Pan-privacy [25]. In the pan-private model, there is an algorithm that takes in a data stream of unbounded length consisting of elements of the domain. It is required that the internal state of the algorithm after any number of steps satisfies $\varepsilon$-DP with respect to the data stream prefix up to that step. We consider two data streams to be neighboring iff they differ in a single entry. This is sometimes referred to as the event-level or record-level setting [7], in contrast to the user-level setting that was the focus of the original work on pan-privacy [25]. We remark that the user-level setting is not appropriate for our notions of error, as it allows changing an entry of the histogram arbitrarily, meaning that we cannot achieve any non-trivial error guarantees. Furthermore, our definition above (which allows us to directly implement the discrete Laplace-noised histogram) is different from the original definition in [25], because the latter requires that revealing both the internal state and the final output simultaneously is DP. Nonetheless, our algorithm can be easily adapted to this more restricted model, but with a worse dependency on $\varepsilon$.
We explain this in more detail in Appendix E.

Histograms in Central DP. For a distribution $\mathcal{D}$, let $x \sim \mathcal{D}$ denote that the random variable $x$ is drawn from $\mathcal{D}$. For $p \in (0, 1)$, the discrete Laplace distribution (a.k.a. symmetric geometric distribution), denoted by $\mathrm{DLap}(p)$, is the distribution supported on $\mathbb{Z}$ whose probability mass at $i \in \mathbb{Z}$ is $\frac{1-p}{1+p}\,p^{|i|}$. We use the following well-known fact about central DP and histograms.

Fact 8. The algorithm that adds $\mathrm{DLap}(e^{-\varepsilon/2})$ noise to each entry of a histogram is $\varepsilon$-DP in the central model.

We refer to the output of this algorithm as the discrete Laplace-noised histogram. For a histogram $h$, we use $\mathbf{n}_h$ to denote the anonymized histogram corresponding to $h$, often dropping the subscript whenever $h$ is clear from context.

3 Post-Processing Noised Histogram

In this section we describe our main algorithm, which obtains an anonymized histogram by suitably post-processing a noised histogram. We first define the following function $f : \mathbb{Z} \to \mathbb{R}$, which is used in our post-processing method described in Algorithm 1:
$$f(m) := \begin{cases} 1 & \text{if } m > 0, \\ 1 + \frac{p}{(1-p)^2} & \text{if } m = 0, \\ -\frac{p}{(1-p)^2} & \text{if } m = -1, \\ 0 & \text{if } m < -1. \end{cases} \qquad (1)$$

Algorithm 1: Anonymized Histogram Estimator w.r.t. $\ell_1$ loss.
Input: Discrete Laplace-noised histogram $h'$, i.e., $h'_j \sim h_j + \mathrm{DLap}(p)$.
  for $r \in [n]$ do
    $\hat{\varphi}_{\geq r} \leftarrow \sum_{j \in [D]} f(h'_j - r)$, where $f$ is defined in (1)
  return $\hat{\mathbf{n}}$ that minimizes $\|\varphi^{\hat{\mathbf{n}}}_{\geq} - \hat{\varphi}_{\geq}\|_1$

We give an efficient implementation of the above algorithm in Algorithm 5 (Appendix D.1), which runs in time $O(D + n\log n)$. The main idea is to compute the $\hat{\mathbf{n}}$ minimizing $\|\varphi^{\hat{\mathbf{n}}}_{\geq} - \hat{\varphi}_{\geq}\|_1$ using $\ell_1$-isotonic regression. We now state the main guarantee of the post-processing method.

Theorem 9. For all histograms $h$ with $\|h\|_1 = n$, the estimate $\hat{\mathbf{n}}$ returned by Algorithm 1 satisfies $\mathbb{E}[\|\hat{\mathbf{n}} - \mathbf{n}\|_1] \leq O\big(\sqrt{C_p(n + D)\log n}\big)$, where $C_p := \frac{p^2}{(1-p)^5} + \frac{p}{1-p}$.

From Fact 8, for $\varepsilon$-DP, we may set $p = e^{-\varepsilon/2}$. Theorem 9 then gives a bound of $O\big(\sqrt{(n + D)\log n}/\varepsilon^{2.5}\big)$ on the expected $\ell_1$-error.

The crucial result needed in our analysis is the guarantee on the individual estimator $f$. We show that $f(h'_j - r)$ is an unbiased estimator of $\mathbb{1}[h_j \geq r]$ and, furthermore, that its variance decreases (exponentially) as $|h_j - r|$ increases.

Lemma 10. For all $h, r \in \mathbb{N} \cup \{0\}$, if $h' \sim h + \mathrm{DLap}(p)$, then it holds that
$$\mathbb{E}[f(h' - r)] = \mathbb{1}[h \geq r], \qquad (2)$$
and
$$\mathrm{Var}[f(h' - r)] \leq 4\,p^{|h - r| + 1}\left(\frac{p}{(1-p)^3} + (1-p)\right). \qquad (3)$$

Proof. Let $\Delta := h - r$, $\Delta' := h' - r$, and $x := p/(1-p)^2$. Consider $g : \mathbb{Z} \to \mathbb{R}$ defined by $g(m) := f(m) - 1/2$. Notice that (2) is equivalent to $\mathbb{E}[g(\Delta')] = \mathbb{1}[\Delta \geq 0] - 1/2$. Due to symmetry, it suffices to consider the case where $\Delta \geq 0$. In this case, we have
$$\mathbb{E}[g(\Delta')] = \mathbb{E}_{Z \sim \mathrm{DLap}(p)}[g(\Delta + Z)] = \Pr[Z > -\Delta]\cdot(1/2) + \Pr[Z \leq -\Delta]\cdot\mathbb{E}[g(\Delta + Z) \mid Z \leq -\Delta].$$
The last term can be expanded as follows:
$$\mathbb{E}[g(\Delta + Z) \mid Z \leq -\Delta] = (1/2 + x)\Pr[Z = -\Delta \mid Z \leq -\Delta] - (1/2 + x)\Pr[Z = -\Delta - 1 \mid Z \leq -\Delta] - (1/2)\Pr[Z < -\Delta - 1 \mid Z \leq -\Delta]$$
$$= (1/2 + x)(1 - p) - (1/2 + x)\,p(1 - p) - (1/2)\,p^2 = 1/2.$$
Combining the two equations above, we arrive at $\mathbb{E}[g(\Delta')] = \Pr[Z > -\Delta]\cdot(1/2) + \Pr[Z \leq -\Delta]\cdot(1/2) = 1/2$, thereby proving (2).

To prove (3), notice again that $\mathrm{Var}[f(\Delta')] = \mathrm{Var}[g(\Delta')]$. Again, due to symmetry, we may only consider the case $\Delta \geq 0$. Here, we have
$$\mathrm{Var}[g(\Delta')] = \mathbb{E}_{Z \sim \mathrm{DLap}(p)}\big[(g(\Delta + Z) - 1/2)^2\big] = \Pr[Z \leq -\Delta]\cdot\mathbb{E}\big[(g(\Delta + Z) - 1/2)^2 \mid Z \leq -\Delta\big] \leq p^{\Delta}\,\mathbb{E}\big[(g(\Delta + Z) - 1/2)^2 \mid Z \leq -\Delta\big].$$
Similar to before, we can expand the last term as
$$\mathbb{E}\big[(g(\Delta + Z) - 1/2)^2 \mid Z \leq -\Delta\big] = x^2\Pr[Z = -\Delta \mid Z \leq -\Delta] + (1 + x)^2\Pr[Z = -\Delta - 1 \mid Z \leq -\Delta] + 1\cdot\Pr[Z < -\Delta - 1 \mid Z \leq -\Delta]$$
$$= x^2(1 - p) + (1 + x)^2\,p(1 - p) + p^2 \leq p^2/(1 - p)^3 + 2(1 + x^2)\,p(1 - p) + p^2 \leq 4p^2/(1 - p)^3 + 2p(1 - p).$$
Plugging this into the above, we get $\mathrm{Var}[g(\Delta')] \leq 4p^{\Delta + 1}\big(p/(1 - p)^3 + (1 - p)\big)$, as desired.
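As a concrete illustration of the pieces defined in this section (the DLap noise of Fact 8, the estimator $f$ of (1), and the two steps of Algorithm 1), here is a minimal Python sketch. It is illustrative only: the names are ours, and the projection step uses a simple monotone-clipping heuristic instead of the $\ell_1$-isotonic regression used by the paper's Algorithm 5.

```python
import math
import random
from typing import List

def sample_dlap(p: float) -> int:
    """Sample from DLap(p): Pr[Z = i] = (1-p)/(1+p) * p^|i| (difference of two geometrics)."""
    u1, u2 = 1.0 - random.random(), 1.0 - random.random()
    return math.floor(math.log(u1) / math.log(p)) - math.floor(math.log(u2) / math.log(p))

def f(m: int, p: float) -> float:
    """The unbiased estimator of Eq. (1): E_{Z ~ DLap(p)}[f(h - r + Z)] = 1[h >= r]."""
    x = p / (1.0 - p) ** 2
    if m > 0:
        return 1.0
    if m == 0:
        return 1.0 + x
    if m == -1:
        return -x
    return 0.0

def estimate_anonymized_histogram(noisy_hist: List[int], n: int, p: float) -> List[int]:
    """Two steps of Algorithm 1 applied to a DLap(p)-noised histogram of n users.

    Step 1: unbiased estimates of the cumulative prevalences phi_{>=r}, r = 1..n.
    Step 2: projection to a valid anonymized histogram; a crude monotone-clipping
    heuristic is used here purely to illustrate the overall pipeline.
    """
    D = len(noisy_hist)
    phi_hat = [sum(f(hj - r, p) for hj in noisy_hist) for r in range(1, n + 1)]
    # Clip the estimates into [0, D] and force them to be non-increasing in r.
    proj, prev = [], float(D)
    for v in phi_hat:
        prev = min(prev, max(0.0, v))
        proj.append(prev)
    counts = [round(v) for v in proj]  # estimated phi_{>=r} for r = 1..n
    # Conjugate back to an anonymized histogram: n_(l) = #{r : phi_{>=r} >= l}.
    return [sum(1 for c in counts if c >= l) for l in range(1, D + 1)]

# Example: histogram h = [3, 2, 1, 0, 0] (n = 6 users, D = 5), epsilon = 1 => p = e^{-1/2}.
p = math.exp(-0.5)
noisy = [hj + sample_dlap(p) for hj in [3, 2, 1, 0, 0]]
print(estimate_anonymized_histogram(noisy, n=6, p=p))
```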
The following is an immediate consequence of Lemma 10, obtained by summing over all $j \in [D]$.

Observation 11. For all $r \in [n]$, and $\lambda := 4p\left(\frac{p}{(1-p)^3} + (1-p)\right)$, it holds that $\mathbb{E}[\hat{\varphi}_{\geq r}] = \varphi^{\mathbf{n}}_{\geq r}$ and $\mathrm{Var}[\hat{\varphi}_{\geq r}] \leq \lambda\sum_{\ell=0}^{n} p^{|\ell - r|}\,\varphi^{\mathbf{n}}_\ell$.

Proof of Theorem 9. The $\ell_1$-error is bounded as
$$\|\hat{\mathbf{n}} - \mathbf{n}\|_1 = \|\varphi^{\hat{\mathbf{n}}}_{\geq} - \varphi^{\mathbf{n}}_{\geq}\|_1 \leq \|\varphi^{\hat{\mathbf{n}}}_{\geq} - \hat{\varphi}_{\geq}\|_1 + \|\hat{\varphi}_{\geq} - \varphi^{\mathbf{n}}_{\geq}\|_1 \;\Longrightarrow\; \|\hat{\mathbf{n}} - \mathbf{n}\|_1 \leq 2\sum_{r \in [n]} |\varphi^{\mathbf{n}}_{\geq r} - \hat{\varphi}_{\geq r}|, \qquad (4)$$
where the first equality is Observation 6 and we use that $\|\varphi^{\hat{\mathbf{n}}}_{\geq} - \hat{\varphi}_{\geq}\|_1 \leq \|\hat{\varphi}_{\geq} - \varphi^{\mathbf{n}}_{\geq}\|_1$ by our choice of $\hat{\mathbf{n}}$. From Observation 11, we have $\mathbb{E}[|\varphi^{\mathbf{n}}_{\geq r} - \hat{\varphi}_{\geq r}|] \leq \sqrt{\mathrm{Var}[\hat{\varphi}_{\geq r}]} \leq \sqrt{\lambda\sum_{\ell=0}^{n} p^{|\ell - r|}\,\varphi^{\mathbf{n}}_\ell}$. Combining this with (4), we have
$$\mathbb{E}[\|\hat{\mathbf{n}} - \mathbf{n}\|_1] \leq 2\sqrt{\lambda}\sum_{r \in [n]}\sqrt{\sum_{\ell=0}^{n} p^{|\ell - r|}\,\varphi^{\mathbf{n}}_\ell} \leq 2\sqrt{\lambda}\cdot\sqrt{\sum_{r \in [n]}\frac{1}{r}}\cdot\sqrt{\sum_{r \in [n]} r\sum_{\ell=0}^{n} p^{|\ell - r|}\,\varphi^{\mathbf{n}}_\ell} \quad \text{(Cauchy-Schwarz)}$$
$$\leq \sqrt{\lambda}\cdot O(\sqrt{\log n})\cdot\sqrt{\sum_{\ell=0}^{n}\varphi^{\mathbf{n}}_\ell\cdot 2(\ell + 1)\sum_{t=0}^{\infty}(t + 1)p^{t}} = \sqrt{\lambda}\cdot O(\sqrt{\log n})\cdot\sqrt{\frac{2}{(1-p)^2}\sum_{\ell=0}^{n}(\ell + 1)\,\varphi^{\mathbf{n}}_\ell} \leq O\big(\sqrt{C_p(n + D)\log n}\big),$$
where we used $\sum_{r \in [n]} r\,p^{|\ell - r|} \leq 2(\ell + 1)\sum_{t=0}^{\infty}(t + 1)p^{t} = 2(\ell + 1)/(1-p)^2$, together with $\sum_{\ell}\ell\,\varphi^{\mathbf{n}}_\ell = n$, $\sum_{\ell}\varphi^{\mathbf{n}}_\ell \leq n + D$, and $\lambda/(1-p)^2 \leq 4C_p$.

4 Reducing Domain Size via Hashing

In this section we propose an algorithm to handle the case where $D \gg n$. The approach in this case is to hash the domain into a smaller one. Let $B \in \mathbb{N}$ be the number of hash values. The distribution of the anonymized histogram produced after random hashing into $B$ buckets is equivalent to the following process:
- Let $\mathbf{n} = (n_{(1)}, \ldots, n_{(D)})$ be the starting anonymized histogram.
- Pick a uniformly random hash function $H : [D] \to [B]$.
- Let $h^{\mathrm{red}} := (h^{\mathrm{red}}_1, \ldots, h^{\mathrm{red}}_B)$ denote the reduced histogram given by $h^{\mathrm{red}}_i := \sum_{j \in H^{-1}(i)} n_{(j)}$.
- Let $\mathbf{n}^{\mathrm{red}}$ denote the corresponding anonymized histogram.

Let $\Gamma_B$ be the mapping from $\mathbf{n}$ to $\mathbb{E}[\varphi^{\mathbf{n}^{\mathrm{red}}}_{\geq}]$, where $\mathbf{n}^{\mathrm{red}}$ is generated as above and the expectation is over the choice of the random hash function $H$. With this notation, we present Algorithm 2.

Algorithm 2: Anonymized Histogram Estimator w.r.t. $\ell_1$ loss, for large domains.
Input: Discrete Laplace-noised reduced histogram $h'$, i.e., $h'_i \sim h^{\mathrm{red}}_i + \mathrm{DLap}(p)$.
  Compute an estimate $\hat{\mathbf{n}}^{\mathrm{red}}$ of $\mathbf{n}^{\mathrm{red}}$ using Algorithm 1
  return $\hat{\mathbf{n}}$ that minimizes $\|\Gamma_B(\hat{\mathbf{n}}) - \varphi^{\hat{\mathbf{n}}^{\mathrm{red}}}_{\geq}\|_1$, subject to $\|\hat{\mathbf{n}}\|_1 = n$

We give an efficient implementation of (a variant of) the above algorithm in Algorithm 7 (Appendix D.2), which runs in time $\tilde{O}_\varepsilon(D + n\log n)$. The main result of this section is the following:

Theorem 12. For all $B > 4n$ and histograms $h$ with $\|h\|_1 \leq n$, the estimate $\hat{\mathbf{n}}$ returned by Algorithm 2 satisfies
$$\mathbb{E}[\|\hat{\mathbf{n}} - \mathbf{n}\|_1] \leq O\big(\mathbb{E}[\|\hat{\mathbf{n}}^{\mathrm{red}} - \mathbf{n}^{\mathrm{red}}\|_1] + n\log n/\sqrt{B}\big) \leq O\big(\sqrt{C_p(n + B)\log n} + n\log n/\sqrt{B}\big) \quad \text{(using Theorem 9)}.$$

By setting $B = n\sqrt{\log n}$, we get the following corollary.

Corollary 13. For all $0 < \varepsilon \leq O(1)$, Algorithm 2 with $p = e^{-\varepsilon/2}$ and $B = n\sqrt{\log n}$ is an $\varepsilon$-DP algorithm, and achieves an expected $\ell_1$-error of $\mathcal{E}(n, \varepsilon) = O(\sqrt{n}(\log n)^{3/4}/\varepsilon^{2.5})$ (Footnote 7).

Footnote 7: By choosing $B = n\,\varepsilon^{2.5}\sqrt{\log n}$, the bound in Corollary 13 could be improved to $O\big(\sqrt{n}\big((\log n)^{3/4}/\varepsilon^{1.25} + (\log n)^{1/2}/\varepsilon^{2.5}\big)\big)$ for $\varepsilon \geq \Omega(\log^{-0.4} n)$; we state the simpler bound for brevity.

We describe the main steps in the proof of Theorem 12.

Lipschitzness of the Inverse of $\Gamma_B$. We start by showing that the inverse of $\Gamma_B$ is $O(1)$-Lipschitz:

Lemma 14 (Proof in Appendix A.1). For $B > 4n$ and all anonymized histograms $\mathbf{n}, \mathbf{n}'$ with $\|\mathbf{n}\|_1, \|\mathbf{n}'\|_1 \leq n$, we have $\|\Gamma_B(\mathbf{n}) - \Gamma_B(\mathbf{n}')\|_1 \geq \|\mathbf{n} - \mathbf{n}'\|_1/4$.

Concentration of $\mathbf{n}^{\mathrm{red}}$. We now bound the expected $\ell_1$-distance between $\varphi^{\mathbf{n}^{\mathrm{red}}}_{\geq}$ and its expectation $\Gamma_B(\mathbf{n})$.

Lemma 15. Assume that $B \geq 2n$. Then, $\mathbb{E}[\|\varphi^{\mathbf{n}^{\mathrm{red}}}_{\geq} - \Gamma_B(\mathbf{n})\|_1] \leq O(n\log n/\sqrt{B})$.

Proof. We have
$$\mathbb{E}[\|\varphi^{\mathbf{n}^{\mathrm{red}}}_{\geq} - \Gamma_B(\mathbf{n})\|_1] = \sum_{r \in [n]}\mathbb{E}\big[|\varphi^{\mathbf{n}^{\mathrm{red}}}_{\geq r} - \mathbb{E}[\varphi^{\mathbf{n}^{\mathrm{red}}}_{\geq r}]|\big] \leq \sum_{r \in [n]}\sqrt{\mathrm{Var}[\varphi^{\mathbf{n}^{\mathrm{red}}}_{\geq r}]}. \qquad (5)$$
We next use a bound on this variance, stated as Lemma 16 (proved in Appendix A.2), which upper-bounds $\mathrm{Var}[\varphi^{\mathbf{n}^{\mathrm{red}}}_{\geq r}]$ for all $r \in [n]$ in terms of the prevalences of $\mathbf{n}$ and the number of buckets $B$. Plugging Lemma 16 into (5), applying the Cauchy-Schwarz inequality, and using the identity $\sum_{t} t\,\varphi^{\mathbf{n}}_t = n$ yields the claimed bound of $O(n\log n/\sqrt{B})$.

Putting things together. With all the components ready, we can now prove Theorem 12.

Proof of Theorem 12. By Lemma 14 and the triangle inequality, we have
$$\mathbb{E}[\|\mathbf{n} - \hat{\mathbf{n}}\|_1] \leq 4\,\mathbb{E}[\|\Gamma_B(\mathbf{n}) - \Gamma_B(\hat{\mathbf{n}})\|_1] \leq 4\,\mathbb{E}\big[\|\Gamma_B(\mathbf{n}) - \varphi^{\hat{\mathbf{n}}^{\mathrm{red}}}_{\geq}\|_1 + \|\varphi^{\hat{\mathbf{n}}^{\mathrm{red}}}_{\geq} - \Gamma_B(\hat{\mathbf{n}})\|_1\big] \leq 8\,\mathbb{E}\big[\|\Gamma_B(\mathbf{n}) - \varphi^{\hat{\mathbf{n}}^{\mathrm{red}}}_{\geq}\|_1\big] \quad \text{(by definition of } \hat{\mathbf{n}}\text{)}$$
$$\leq 8\,\mathbb{E}\big[\|\varphi^{\mathbf{n}^{\mathrm{red}}}_{\geq} - \varphi^{\hat{\mathbf{n}}^{\mathrm{red}}}_{\geq}\|_1 + \|\Gamma_B(\mathbf{n}) - \varphi^{\mathbf{n}^{\mathrm{red}}}_{\geq}\|_1\big] \leq O\big(\mathbb{E}[\|\mathbf{n}^{\mathrm{red}} - \hat{\mathbf{n}}^{\mathrm{red}}\|_1] + n\log n/\sqrt{B}\big) \quad \text{(by Observation 6 and Lemma 15)}.$$
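For concreteness, the following minimal Python sketch shows the input preparation for Algorithm 2: hashing the domain into $B$ buckets with a uniformly random hash function and adding discrete Laplace noise to each bucket. The function names are ours; the subsequent inversion via $\Gamma_B$ (Algorithm 7) is not shown.

```python
import math
import random
from typing import Dict, List

def sample_dlap(p: float) -> int:
    """Sample from DLap(p): difference of two Geometric(1-p) random variables."""
    u1, u2 = 1.0 - random.random(), 1.0 - random.random()
    return math.floor(math.log(u1) / math.log(p)) - math.floor(math.log(u2) / math.log(p))

def noisy_reduced_histogram(inputs: List[int], D: int, B: int, p: float) -> List[int]:
    """Reduced histogram over B buckets under a uniformly random hash H : [D] -> [B],
    with independent DLap(p) noise added to every bucket (the input to Algorithm 2)."""
    H: Dict[int, int] = {j: random.randrange(B) for j in range(1, D + 1)}
    h_red = [0] * B
    for z in inputs:
        h_red[H[z]] += 1
    return [c + sample_dlap(p) for c in h_red]

# Example: n = 6 users over a large domain D, hashed into a small number B of buckets.
print(noisy_reduced_histogram([7, 7, 12, 901, 901, 45], D=1000, B=10, p=math.exp(-0.5)))
```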
5 Estimating Symmetric Properties of Discrete Distributions

In this section we show how to use Theorem 12 for the task of estimating symmetric properties of discrete distributions over $[k]$. Here, a distribution property is symmetric if it remains unchanged under relabeling of the domain symbols. For example, a notable such property is the Shannon entropy of a distribution $\mathcal{D}$, defined as $H(\mathcal{D}) := \sum_x \mathcal{D}(x)\log\frac{1}{\mathcal{D}(x)}$, a central object in information theory, machine learning, and statistics. If the support is unbounded, estimating $H(\mathcal{D})$ is impossible with any finite number of samples. Our goal is to estimate the entropy of a distribution $\mathcal{D} \in \Delta_k$ up to an additive error, where $\Delta_k$ denotes the set of all distributions over $[k]$.

One of the key ideas in the literature (e.g., [3]) is to design low-sensitivity estimators $\hat{f} : \mathcal{X}^n \to \mathbb{R}$ for the desired symmetric distribution property $f : \Delta_k \to \mathbb{R}$. The (non-private) sample complexity of a property $f : \Delta_k \to \mathbb{R}$ using estimator $\hat{f}$, denoted by $C_{\hat{f}}(f, \alpha)$, is the smallest number of samples $n$ needed to estimate $f(\mathcal{D})$ up to accuracy $\alpha$ with probability at least 0.9, that is, $\Pr[|\hat{f}(S) - f(\mathcal{D})| > \alpha] < 0.1$ (Footnote 8). The sensitivity of an estimator $\hat{f}$ is $\Delta_{n, \hat{f}}$, the smallest value such that $|\hat{f}(S) - \hat{f}(S')| \leq \Delta_{n, \hat{f}}$ holds for all adjacent datasets $S \sim S'$, each with $n$ elements. Let $D_{\hat{f}}(\alpha, \varepsilon) := \min\{n \mid \Delta_{n, \hat{f}} \leq 0.1\,\alpha/\mathcal{E}(n, \varepsilon)\}$, for $\mathcal{E}(n, \varepsilon)$ as defined in Corollary 13. We only consider symmetric estimators $\hat{f}$, for which we abuse notation and use $\hat{f}(\mathbf{n})$ to denote $\hat{f}(S)$ for any dataset $S$ that corresponds to the anonymized histogram $\mathbf{n}$.

Theorem 17. For all $0 < \varepsilon \leq O(1)$ and $\delta \in (0, 1]$, for any symmetric distribution property $f$ and any symmetric estimator $\hat{f}$, there exists an $\varepsilon$-DP mechanism $M$ in the pan-private model and an $(\varepsilon, \delta)$-DP mechanism $M$ in the shuffle DP model, such that $\Pr_{S \sim \mathcal{D}^n}[|M(S) - f(\mathcal{D})| > \alpha] < 0.2$ with sample complexity $n = O\big(C_{\hat{f}}(f, \alpha/2) + D_{\hat{f}}(\alpha/2, \varepsilon)\big)$.

Proof. Let $\mathbf{n}$ denote the anonymized histogram corresponding to the sampled dataset $S$. The mechanism $M$ simply outputs $\hat{f}(\hat{\mathbf{n}})$ for the $\hat{\mathbf{n}}$ returned by Algorithm 2. Clearly, the mechanism $M$ is DP due to the post-processing property. For a suitable $n = O\big(C_{\hat{f}}(f, \alpha/2) + D_{\hat{f}}(\alpha/2, \varepsilon)\big)$, it holds that $\Pr[|\hat{f}(\mathbf{n}) - f(\mathcal{D})| > \alpha/2] < 0.1$ and $\Pr[|\hat{f}(\hat{\mathbf{n}}) - \hat{f}(\mathbf{n})| > \alpha/2] \leq 0.1$, where the first inequality holds by the definition of $C_{\hat{f}}(f, \alpha/2)$, and the second inequality holds because $|\hat{f}(\hat{\mathbf{n}}) - \hat{f}(\mathbf{n})| \leq \Delta_{n, \hat{f}}\cdot\|\hat{\mathbf{n}} - \mathbf{n}\|_1$ and, by the guarantee of Theorem 12 and Markov's inequality, $\Pr[\|\hat{\mathbf{n}} - \mathbf{n}\|_1 > 10\,\mathcal{E}(n, \varepsilon)] \leq 0.1$. By a union bound, we have $\Pr[|\hat{f}(\hat{\mathbf{n}}) - f(\mathcal{D})| > \alpha] \leq 0.2$.

We get the following sample complexity bounds for private estimation of the Shannon entropy in the pan-private and shuffle DP models, as an immediate application of Theorem 17.

Corollary 18 (Proof in Appendix C). For all $0 < \varepsilon \leq O(1)$ and $\delta \in (0, 1]$, there exists an $\varepsilon$-DP mechanism in the pan-private model and an $(\varepsilon, \delta)$-DP mechanism in the shuffle DP model that can estimate the entropy of $\mathcal{D} \in \Delta_k$ up to an additive error of $\alpha$, with a sample complexity obtained by minimizing, over a parameter $\lambda \in (0, 1)$, an expression in $k$, $\alpha$, $\log k$, and $\log(1/(\alpha\varepsilon))$ (see Appendix C).

These bounds have the same dependence on $k$ as in the work of Acharya et al. [3]. The dependence on $\alpha$ and $\varepsilon$ is slightly worse, due to a higher cost of sensitivity in our setting and the worse dependence on $\varepsilon$ in our guarantees in Corollary 13.

Footnote 8: The choice of 0.1 is arbitrary; using the "median trick", we can boost the success probability to $1 - \beta$ with an additional multiplicative $\log(1/\beta)$ factor in the number of samples.
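To illustrate how the mechanism of Theorem 17 operates as pure post-processing, here is a minimal Python sketch of a symmetric estimator evaluated on a released anonymized histogram. We use the naive plug-in entropy estimate purely for illustration; the paper's bounds rely on the low-sensitivity estimators of prior work (e.g., [2, 3]), not on this plug-in.

```python
import math
from typing import List

def plugin_entropy(nhat: List[int]) -> float:
    """Naive plug-in entropy estimate H(p_hat) computed only from an anonymized
    histogram nhat (a symmetric estimator: it ignores item labels entirely)."""
    total = sum(c for c in nhat if c > 0)
    if total == 0:
        return 0.0
    return sum((c / total) * math.log(total / c) for c in nhat if c > 0)

# Example: the private anonymized histogram released by Algorithm 2 can be fed in directly.
print(plugin_entropy([3, 2, 1, 0, 0]))
```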
We also derive sample complexity bounds for private estimation of support coverage and support size. The support coverage of a distribution $\mathcal{D}$ and an integer $m$ is defined as $S_m(\mathcal{D}) := \sum_{x \in \mathrm{supp}(\mathcal{D})}\big(1 - (1 - \mathcal{D}(x))^m\big)$, i.e., the expected number of distinct elements seen if we draw $m$ i.i.d. samples from $\mathcal{D}$. Here we would like to estimate $S_m(\mathcal{D})$ to within an additive error of $\alpha m$. The support size of a discrete distribution $\mathcal{D}$ is the number of atoms $x$ such that $\mathcal{D}(x) > 0$. In general, this is impossible to estimate with a finite sample, since some atom of $\mathcal{D}$ might have an arbitrarily small probability mass. To avoid this, we follow prior work and consider only probability distributions in $\{\mathcal{D} \mid \forall x \in \mathrm{supp}(\mathcal{D}),\ \mathcal{D}(x) \geq 1/K\}$, i.e., those with non-zero mass of at least $1/K$ at every atom, for some $K$. We defer the details of the exact sample complexity bounds and the proofs to Appendix C.

6 Conclusions and Future Directions

In this paper, we give simple algorithms for privately computing anonymized histograms. Our algorithms can be implemented in the shuffle and pan-private models. There are several immediate open questions, as discussed below.

Our upper bounds have a dependency of $O(1/\varepsilon^{2.5})$ in the $\ell_1$-error case and $O(1/\varepsilon^{3.5})$ in the $\ell_2^2$-error case; it is unclear if these are tight. Similarly, there are lower-order multiplicative terms of $O(\log n)$ and $O(\log^2 n)$ in our $\ell_1$- and $\ell_2^2$-error bounds, respectively. Closing these gaps would be an interesting next step; it would also lead to improvements in the sample complexity bounds for private estimation of symmetric distribution properties, such as the Shannon entropy (Corollary 18).

While our analysis of the large-domain case relies on the hash function being uniformly random, it is conceivable that this is not required for the approach to work. It would be interesting if this approach, perhaps with some modifications, could also work with a weaker family of hash functions, such as a pairwise-independent hash family.

Note also that our shuffle DP algorithm has $\delta > 0$, i.e., it is approximate-DP. This may not be necessary: there are a couple of recent algorithms for computing histograms in shuffle DP with $\delta = 0$ [28, 19]. Our post-processing approach does not immediately apply to these algorithms because the noise added to each count is not discrete Laplace noise (and, in fact, is not even independent additive noise). Adapting our approach to their setting is another interesting research direction.

References

[1] J. M. Abowd. The US Census Bureau adopts differential privacy. In KDD, pages 2867-2867, 2018.
[2] J. Acharya, H. Das, A. Orlitsky, and A. T. Suresh. A unified maximum likelihood approach for estimating symmetric properties of discrete distributions. In ICML, pages 11-21, 2017.
[3] J. Acharya, G. Kamath, Z. Sun, and H. Zhang. INSPECTRE: Privately estimating the unseen. J. Priv. Confidentiality, 10(2), 2020.
[4] J. Acharya, Z. Sun, and H. Zhang. Differentially private testing of identity and closeness of discrete distributions. In NeurIPS, pages 6879-6891, 2018.
[5] F. Aldà and H. U. Simon. A lower bound on the release of differentially private integer partitions. IPL, 129:1-4, 2018.
[6] M. Aliakbarpour, I. Diakonikolas, and R. Rubinfeld. Differentially private identity and equivalence testing of discrete distributions. In ICML, pages 169-178, 2018.
[7] K. Amin, M. Joseph, and J. Mao. Pan-private uniformity testing. In COLT, pages 183-218, 2020.
[8] Apple Differential Privacy Team. Learning with privacy at scale. Apple Machine Learning Journal, 2017.
[9] V. Balcer, A. Cheu, M. Joseph, and J. Mao. Connecting robust shuffle privacy and pan-privacy. In SODA, pages 2384-2403, 2021.
[10] B. Balle, J. Bell, A. Gascón, and K. Nissim. Private summation in the multi-message shuffle model. In CCS, pages 657-676, 2020.
[11] T. Batu, L. Fortnow, R. Rubinfeld, W. D. Smith, and P. White. Testing that distributions are close. In FOCS, pages 259-269, 2000.
[12] A. Bittau, Ú. Erlingsson, P. Maniatis, I. Mironov, A. Raghunathan, D. Lie, M. Rudominer, U. Kode, J. Tinnés, and B. Seefeld. Prochlo: Strong privacy for analytics in the crowd. In SOSP, pages 441-459, 2017.
[13] J. Blocki, A. Datta, and J. Bonneau. Differentially private password frequency lists. In NDSS, 2016.
[14] J. Bonneau. The science of guessing: Analyzing an anonymized corpus of 70 million passwords. In S&P, pages 538-552, 2012.
[15] B. Cai, C. Daskalakis, and G. Kamath. Priv'IT: Private and sample efficient identity testing. In ICML, pages 635-644, 2017.
[16] M. Charikar, K. Shiragur, and A. Sidford. Efficient profile maximum likelihood for universal symmetric property estimation. In STOC, pages 780-791, 2019.
[17] L. Chen, B. Ghazi, R. Kumar, and P. Manurangsi. On distributed differential privacy and counting distinct elements. In ITCS, pages 56:1-56:18, 2021.
[18] A. Cheu, A. D. Smith, J. R. Ullman, D. Zeber, and M. Zhilyaev. Distributed differential privacy via shuffling. In EUROCRYPT, pages 375-403, 2019.
[19] A. Cheu and C. Yan. Pure differential privacy from secure intermediaries. CoRR, abs/2112.10032, 2021.
[20] I. Diakonikolas, M. Hardt, and L. Schmidt. Differentially private learning of structured discrete distributions. In NIPS, pages 2566-2574, 2015.
[21] B. Ding, J. Kulkarni, and S. Yekhanin. Collecting telemetry data privately. In NeurIPS, pages 3571-3580, 2017.
[22] J. C. Duchi, M. I. Jordan, and M. J. Wainwright. Minimax optimal procedures for locally private estimation. JASA, 2017.
[23] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, pages 486-503, 2006.
[24] C. Dwork, F. McSherry, K. Nissim, and A. D. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages 265-284, 2006.
[25] C. Dwork, M. Naor, T. Pitassi, G. N. Rothblum, and S. Yekhanin. Pan-private streaming algorithms. In ICS, pages 66-80, 2010.
[26] Ú. Erlingsson, V. Feldman, I. Mironov, A. Raghunathan, K. Talwar, and A. Thakurta. Amplification by shuffling: From local to central differential privacy via anonymity. In SODA, pages 2468-2479, 2019.
[27] Ú. Erlingsson, V. Pihur, and A. Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In CCS, pages 1054-1067, 2014.
[28] B. Ghazi, N. Golowich, R. Kumar, P. Manurangsi, R. Pagh, and A. Velingker. Pure differentially private summation from anonymous messages. In ITC, pages 15:1-15:23, 2020.
[29] B. Ghazi, R. Kumar, P. Manurangsi, and R. Pagh. Private counting from anonymous messages: Near-optimal accuracy with vanishing communication overhead. In ICML, pages 3505-3514, 2020.
[30] A. Greenberg. Apple's "differential privacy" is about collecting your data, but not your data. Wired, June 13, 2016.
[31] Y. Hao and A. Orlitsky. The broad optimality of profile maximum likelihood. In NeurIPS, pages 10989-11001, 2019.
[32] G. H. Hardy and S. Ramanujan. Asymptotic formulae in combinatory analysis. Proc. London Math. Soc., s2-17(1):75-115, 1918.
[33] M. Hay, C. Li, G. Miklau, and D. D. Jensen. Accurate estimation of the degree distribution of private networks. In ICDM, pages 169-178, 2009.
[34] M. Hay, V. Rastogi, G. Miklau, and D. Suciu. Boosting the accuracy of differentially private histograms through consistency. VLDB, 3(1):1021-1032, 2010.
[35] V. Karwa and S. Vadhan. Finite sample differentially private confidence intervals. In ITCS, 2018.
[36] K. Kenthapadi and T. T. L. Tran. PriPeARL: A framework for privacy-preserving analytics and reporting at LinkedIn. In CIKM, pages 2183-2191, 2018.
[37] P. Manurangsi. Tight bounds for differentially private anonymized histograms. In SOSA, pages 203-213, 2022.
[38] D. J. Mir, S. Muthukrishnan, A. Nikolov, and R. N. Wright. Pan-private algorithms via statistics on sketches. In PODS, pages 37-48, 2011.
[39] A. Orlitsky, A. T. Suresh, and Y. Wu. Optimal prediction of the number of unseen species. PNAS, 113(47):13283-13288, 2016.
[40] S. Raskhodnikova and A. Smith. Efficient Lipschitz extensions for high-dimensional graph statistics and node private degree distributions. FOCS, 2016.
[41] R. Rogers, S. Subramaniam, S. Peng, D. Durfee, S. Lee, S. K. Kancha, S. Sahay, and P. Ahammad. LinkedIn's audience engagements API: A privacy preserving data analytics system at scale. J. Priv. Confidentiality, 11(3), 2021.
[42] G. Rote. Isotonic regression by dynamic programming. In SOSA, pages 1:1-1:18, 2019.
[43] S. Shankland. How Google tricks itself to protect Chrome user privacy. CNET, October 2014.
[44] A. T. Suresh. Differentially private anonymized histograms. In NeurIPS, pages 7969-7979, 2019.

Checklist

1. For all authors...
(a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes]
(b) Did you describe the limitations of your work? [Yes] We discuss limitations and scope for future improvements in Section 6.
(c) Did you discuss any potential negative societal impacts of your work? [N/A] Since this is a purely theoretical paper regarding private algorithms for approximately computing anonymized histograms, we do not foresee any immediate potential negative impacts.
(d) Have you read the ethics review guidelines and ensured that your paper conforms to them? [Yes]
2. If you are including theoretical results...
(a) Did you state the full set of assumptions of all theoretical results? [Yes]
(b) Did you include complete proofs of all theoretical results? [Yes] The proof overviews are contained in the main body, with proofs of key lemmas included in the supplementary material.
3. If you ran experiments...
(a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [N/A]
(b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [N/A]
(c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)? [N/A]
(d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]
4. If you are using existing assets (e.g., code, data, models) or curating/releasing new assets...
(a) If your work uses existing assets, did you cite the creators? [N/A]
(b) Did you mention the license of the assets? [N/A]
(c) Did you include any new assets either in the supplemental material or as a URL? [N/A]
(d) Did you discuss whether and how consent was obtained from people whose data you're using/curating? [N/A]
(e) Did you discuss whether the data you are using/curating contains personally identifiable information or offensive content? [N/A]
5. If you used crowdsourcing or conducted research with human subjects...
(a) Did you include the full text of instructions given to participants and screenshots, if applicable?
[N/A]
(b) Did you describe any potential participant risks, with links to Institutional Review Board (IRB) approvals, if applicable? [N/A]
(c) Did you include the estimated hourly wage paid to participants and the total amount spent on participant compensation? [N/A]