# Improved Utility Analysis of Private Count Sketch

Rasmus Pagh, Basic Algorithms Research Copenhagen, University of Copenhagen, pagh@di.ku.dk
Mikkel Thorup, Basic Algorithms Research Copenhagen, University of Copenhagen, mikkel2thorup@gmail.com

Sketching is an important tool for dealing with high-dimensional vectors that are sparse (or well-approximated by a sparse vector), especially useful in distributed, parallel, and streaming settings. It is known that sketches can be made differentially private by adding noise according to the sensitivity of the sketch, and this has been used in private analytics and federated learning settings. The post-processing property of differential privacy implies that all estimates computed from the sketch can be released within the given privacy budget. In this paper we consider the classical Count Sketch, made differentially private with the Gaussian mechanism, and give an improved analysis of its estimation error. Perhaps surprisingly, the privacy-utility trade-off is essentially the best one could hope for, independent of the number of repetitions in Count Sketch: the error is almost identical to the error from non-private Count Sketch plus the noise needed to make the vector private in the original, high-dimensional domain.

1 Introduction

Count Sketch was introduced by Charikar, Chen, and Farach-Colton (2004) as a method for finding heavy hitters in data streams. In machine learning the same sketch was studied by Weinberger, Dasgupta, Langford, Smola, and Attenberg (2009) under the name feature hashing, or less formally the "hashing trick". The idea is to map a high-dimensional vector $x \in \mathbb{R}^d$ to a lower-dimensional representation $Ax \in \mathbb{R}^D$, using a certain random linear mapping $A$. From $Ax$ it is possible to estimate entries $x_i$, with error that depends on $D$ and the norm of $x$, and more generally to estimate inner products. It was shown in (Weinberger et al., 2009) that for $y \in \mathbb{R}^d$, $\langle Ax, Ay \rangle$ is well-concentrated around $\langle x, y \rangle$.

Recent interest in sketching techniques is motivated by distributed applications such as federated learning (Kairouz et al., 2021), where many users contribute to creating a model on their combined data, without transferring the data itself. In this context, sketching can be used to identify the large-magnitude entries in the sum of many high-dimensional vectors (one held by each user), using communication proportional to the number of such entries.

Perhaps the most basic property of Count Sketch/feature hashing is the error of estimators for a single coordinate $x_i$. The sketch directly provides independent, noisy estimators $X_1, \dots, X_k$, each symmetric with mean $x_i$. Two main ways of combining these into a more reliable estimator for $x_i$ have been studied: taking the median of the estimators (Charikar et al., 2004), or taking the average of the estimators (implicit in Weinberger et al. (2009)). Minton and Price (2014) showed that using the median estimator with $k = \Theta(\log d)$ is not only more robust, but also decreases the variance of the estimator by a factor of $\Omega(\log d)$. Later, Larsen, Pagh, and Tětek (2021) showed that using the median estimator with $k = 3$ makes the error decrease quadratically with the sketch size $D$.

In this paper we consider Count Sketches that have been made differentially private using the Gaussian mechanism, which is perhaps the most widely used mechanism for making high-dimensional vectors differentially private.
This sketch, which we will refer to as Private Count Sketch, works by adding i.i.d. Gaussian noise to each coordinate of the Count Sketch, with variance scaled according to the (squared $\ell_2$) sensitivity $k$ of the sketch. We give additional evidence that the median estimator is preferable to the mean by showing that its estimation noise is independent of the number $k$ of estimators. Previous applications of sketching techniques to differential privacy have had noise that grew (polynomially) with $k$. The failure probability of Count Sketch decreases exponentially in $k$, so it is preferable to have $k$ relatively large; thus these methods faced a trade-off between the noise from the Gaussian mechanism and the error probability of the underlying Count Sketch. We show that this trade-off is not needed: it is possible both to get a highly reliable Count Sketch (by choosing $k$ large enough) and to achieve noise that depends only on the privacy parameters (independent of $k$).

2 Related work

The analysis of the Gaussian mechanism is attributed in Dwork and Roth (2014, Appendix A) to the authors of the seminal paper on differential privacy (Dwork, McSherry, Nissim, and Smith, 2006). It has since been shown to have desirable properties related to keeping track of privacy loss under composition, see e.g. Bun & Steinke (2016); Mironov (2017).

Local differential privacy. A variant of Count Sketch, the Count Mean Sketch, has been used by Apple to privately collect information on heavy hitters, e.g. popular emoji in different countries (Apple Differential Privacy Team, 2017). This can be phrased in terms of summing $n$ user vectors, each one a 1-hot vector with a single 1, and identifying the large entries. Their protocol works in the local model of differential privacy, which means that each report is differentially private. Lower bounds on local differential privacy (Chan, Shi, and Song, 2012) imply that the noise on estimates in this setting grows with $\Omega(\sqrt{n})$, where $n$ is the number of users. Independently, Bassily, Nissim, Stemmer, and Thakurta (2020) improved theoretical results of Bassily and Smith (2015) on heavy hitter estimation in the local model, providing practical methods for matching the lower bound of (Chan et al., 2012). They also base their method on Count Sketch, though it is made private using a sampling technique combined with randomized response, not by adding noise to each coordinate. The noise from this step, rather than from Count Sketch itself, dominates the error. Acharya, Sun, and Zhang (2019) showed that similar error can be achieved without any agreed-upon public randomness (with no need for a common, random sketch matrix).

Huang, Qiu, Yi, and Cormode (2022) used Count Sketch, made private with the geometric (a.k.a. discrete Laplace) mechanism, to design protocols for frequency estimation under local differential privacy and multiparty differential privacy. They observe that the estimator in each repetition of Count Sketch is symmetric (assuming fully random hashing), but unlike the present paper they do not demonstrate that the median estimator has noise that is smaller than the noise added to each estimator. Thus their estimation error bound grows with the number of repetitions of Count Sketch. Zhou, Wang, Chan, Fanti, and Shi (2022) recently used Count Sketch as the basis for a mechanism that releases $t$-sparse vectors with differential privacy. We relate our result to theirs in Section 3.4.
Independent of our work, Zhao, Qiao, Redberg, Agrawal, Abbadi, and Wang (2022) recently presented a comprehensive study of differentially private linear sketches, including Private Count Sketch. They focus on bounding the maximum error, but also have theoretical results for point estimates that are weaker than ours. Their experiments confirm the performance of Private Count Sketch in practice. In addition they show how Count Sketch can be used to build a sketch for quantile approximation, and our results imply a tighter analysis for that application.

Protocols based on cryptography. Motivated by privacy-preserving aggregation of distributed statistics, Melis, Danezis, and Cristofaro (2016) considered Count Sketch (and the related Count-Min sketch) made $\varepsilon$-differentially private using the Laplace mechanism (Dwork et al., 2016). They empirically showed that the set of heavy hitters, i.e., large entries in the vector, could be identified with very little error on skewed distributions, but did not provide general bounds on estimation error. Their protocol can be implemented in a distributed setting in which each user has a 1-hot vector and we want to sketch the sum of these vectors, using cryptographic protocols for secure aggregation. This bypasses lower bounds on local differential privacy and yields much better privacy-utility trade-offs (but now assuming security of the aggregation protocols, so guarantees rely on cryptographic assumptions and are not information-theoretic). Another model of differential privacy, built on the cryptographic primitive of an anonymous channel (e.g. implemented as a mixnet), is the shuffle model (Cheu, Smith, Ullman, Zeber, and Zhilyaev, 2019; Erlingsson, Feldman, Mironov, Raghunathan, Talwar, and Thakurta, 2019). Ghazi, Golowich, Kumar, Pagh, and Velingker (2021) used Count-Min sketches in a protocol for frequency estimation and heavy hitters in this context. Our work suggests that better accuracy can be obtained by using Count Sketch instead of Count-Min, especially for high-probability bounds.

Related results in the central model. Mir, Muthukrishnan, Nikolov, and Wright (2011) studied general mechanisms for making sketching algorithms differentially private, and in particular studied using the Count-Min sketch to identify heavy hitters. They observed that any mechanism that makes a linear sketch private by adding (oblivious) noise to the sketch vector implies a pan-private sketch that can be updated dynamically. Our work implies an improved pan-private sketch based on Count Sketch. Aumüller, Lebeda, and Pagh (2021) studied differentially private representations of sparse vectors, showing that it is possible to achieve space close to the number of non-zero entries while keeping noise comparable to a naïve application of the Laplace mechanism to the raw vectors. Our work implies that, up to a logarithmic factor in space, such a result is possible with a linear sketch, which enjoys many desirable properties.

Analysis of Count Sketch. Several authors have worked on improved analysis of the error of Count Sketch in the non-private setting, including Minton and Price (2014) and Larsen, Pagh, and Tětek (2021). It seems possible to analyze Private Count Sketch using the framework of (Minton & Price, 2014), but we will pursue an elementary, direct analysis that does not depend on the Fourier transform of random variables.
3 Private Count Sketch

In this section we provide the necessary background information on Count Sketch, and present our improved analysis of Private Count Sketch.

3.1 Notation and background

We use $[k]$ to denote the set $\{1, \dots, k\}$. Vectors are indexed by one or more integers, each from designated ranges. For example, a vector of dimension $D = kb$ may be indexed by $(i, j)$ where $i \in [k]$ and $j \in [b]$. The ordering of vector entries is not important. For integer $k$, let $\mathrm{tail}_k(x)$ denote the vector that is identical to $x$ except that the $k$ coordinates of largest magnitude are replaced with zeros. For a predicate $P$, we let $[P]$ denote the indicator that is 1 if $P$ is true and 0 otherwise.

Count Sketch is a random linear mapping of a vector $x \in \mathbb{R}^d$ to $Ax \in \mathbb{R}^D$. For suitable parameters $k, b$ such that $D = kb$, the sketch $\mathrm{CS}(x)$ is defined in terms of two sequences of random, independent hash functions: $h_1, \dots, h_k : [d] \to [b]$, and $s_1, \dots, s_k : [d] \to \{-1, +1\}$. Count Sketch was originally presented with hash functions from a 2-wise independent family (Charikar et al., 2004), but in this paper we assume that all hash functions used are fully random. (Full randomness is used in our analysis of the error, but the privacy of our method does not depend on this assumption.) To simplify our exposition we will further assume that $k$ is odd and that $b$ is even. If we index $\mathrm{CS}(x) \in \mathbb{R}^D$ by $(i, j) \in [k] \times [b]$, then

$$\mathrm{CS}(x)_{i,j} = \sum_{\ell \in [d]} s_i(\ell)\, x_\ell\, [h_i(\ell) = j] \; .$$

That is, each vector entry $x_\ell$ is added, with sign $s_i(\ell)$, to the entries indexed by $(i, h_i(\ell))$, for $i \in [k]$. We can see the sketch as a sequence of $k$ hash tables, each of size $b$; thus we refer to $k$ as the number of repetitions and to $b$ as the table size. It is easy to see that $X_i = s_i(\ell)\,\mathrm{CS}(x)_{i,h_i(\ell)}$ is an unbiased estimator for $x_\ell$, for each $i \in [k]$. Furthermore, since $s$ is fully random, the distribution of $X_i$ conditioned on $s_i(\ell) = 1$ is identical to the distribution of $X_i$ conditioned on $s_i(\ell) = -1$, which means that $X_i$ is symmetric around $x_\ell$. We can combine these estimators into a more robust estimator:

$$\hat{x}_\ell = \mathrm{median}\left( \{ s_i(\ell)\,\mathrm{CS}(x)_{i,h_i(\ell)} \mid i \in [k] \} \right) . \qquad (1)$$

Minton & Price (2014) bounded the error of $\hat{x}_\ell$, with a failure probability that is exponentially decreasing in $k$:

Theorem 3.1 (Minton & Price (2014)). For every $\alpha \in [0, 1]$ and every $\ell \in [d]$, the estimation error of Count Sketch with $k$ repetitions and table size $b$ satisfies
$$\Pr[\,|\hat{x}_\ell - x_\ell| > \alpha\Delta\,] < 2 \exp\left(-\Omega(\alpha^2 k)\right), \quad \text{where } \Delta = \|\mathrm{tail}_b(x)\|_2/\sqrt{b} \; .$$

Note that $\|\mathrm{tail}_b(x)\|_2 \le \|x\|_2$, but it can be much smaller for skewed distributions, and even zero for sparse vectors. Choosing suitable $\alpha^2 k = \Theta(\log d)$ we get that all coordinates $x_\ell$ are estimated within this error bound with high probability.
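To make the construction concrete, here is a minimal NumPy sketch of Count Sketch with the median estimator (1). This is our illustration, not the authors' released code; fully random hashing is simulated by storing explicit tables of hash values, as is also done in the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_hashes(k, b, d):
    """Fully random hash functions, stored as explicit tables:
    h[i, l] = h_i(l) in [b] and s[i, l] = s_i(l) in {-1, +1}."""
    h = rng.integers(0, b, size=(k, d))
    s = rng.choice([-1, 1], size=(k, d))
    return h, s

def count_sketch(x, h, s, b):
    """CS(x)_{i,j} = sum over l of s_i(l) * x_l * [h_i(l) = j]."""
    k, d = h.shape
    table = np.zeros((k, b))
    for i in range(k):
        np.add.at(table[i], h[i], s[i] * x)  # scatter-add into row i
    return table

def estimate(table, h, s, l):
    """Median estimator (1): median over i of s_i(l) * CS(x)_{i, h_i(l)}."""
    k = table.shape[0]
    return np.median([s[i, l] * table[i, h[i, l]] for i in range(k)])

# Example: a sparse vector is recovered almost exactly.
d, k, b = 10_000, 7, 256
x = np.zeros(d)
x[:20] = 100.0
h, s = make_hashes(k, b, d)
table = count_sketch(x, h, s, b)
print(estimate(table, h, s, 0))  # close to 100.0 with high probability
```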
3.2 Private Count Sketch

Several approaches for releasing a Count Sketch with differential privacy have been studied. These all require that the sketches of neighboring vectors $x, x'$, denoted $x \sim x'$, have similar distributions. The neighboring relation that we consider is that $x \sim x'$ if and only if these vectors differ in at most one entry, by at most 1, or equivalently that $x - x'$ has a single nonzero entry with a value in $[-1, +1]$. Since Count Sketch is linear, $\mathrm{CS}(x) - \mathrm{CS}(x') = \mathrm{CS}(x - x')$. From this it is easy to see that for $x \sim x'$, $\|\mathrm{CS}(x) - \mathrm{CS}(x')\|_2 \le \sqrt{k}$. In differential privacy terminology, the sensitivity of the sketch is $\sqrt{k}$.

There are many ways to compute a differentially private version of a Count Sketch $\mathrm{CS}(x)$ (or any other function of $x$), but it is most common to consider oblivious methods that work by sampling a symmetric noise vector $\nu \in \mathbb{R}^D$ that is independent of $\mathrm{CS}(x)$ and releasing the private Count Sketch:

$$\mathrm{PCS}(x) = \mathrm{CS}(x) + \nu \; .$$

Since the noise is symmetric around zero, we have (similar to before) that $X_i = s_i(\ell)\,\mathrm{PCS}(x)_{i,h_i(\ell)}$ is an unbiased estimator for $x_\ell$, and therefore it makes sense to use the median estimator (1) also for Private Count Sketch:

$$\tilde{x}_\ell = \mathrm{median}\left( \{ s_i(\ell)\,\mathrm{PCS}(x)_{i,h_i(\ell)} \mid i \in [k] \} \right) . \qquad (2)$$

Note that except for the addition of $\nu$, which can be considered an alternative initialization step, Private Count Sketch works in exactly the same way as Count Sketch: updates are done in the same way, and the estimators $\tilde{x}_\ell$ are computed in the same way.

Properties of Private Count Sketch. Since $\mathrm{PCS}(x)$ is an affine transformation, it can be maintained under updates to $x$. Also, it is possible to add and subtract Private Count Sketches (based on the same hash functions), e.g.: $\mathrm{PCS}(x) + \mathrm{PCS}(y) = \mathrm{CS}(x + y) + \nu_1 + \nu_2$, where $\nu_1$ and $\nu_2$ are sampled according to the noise distribution. The noise distribution we will consider in this paper is the $D$-dimensional Gaussian distribution $N(0, \sigma^2)^D$ with mean zero and variance $\sigma^2$, and in the rest of this paper we refer to $\mathrm{PCS}(x)$ with noise $\nu \sim N(0, \sigma^2)^D$. The differential privacy properties of $\mathrm{PCS}(x)$ follow from general results on the Gaussian mechanism, see e.g. Dwork & Roth (2014, Appendix A) and Bun & Steinke (2016). We state some of them here for convenience:

Lemma 3.2. $\mathrm{PCS}(x)$ is $(\varepsilon, \delta)$-differentially private for $\varepsilon, \delta$ satisfying $\sigma^2 > 2k \ln(1.25/\delta)/\varepsilon^2$ and $\varepsilon < 1$, and it is $(k/(2\sigma^2))$-zero-concentrated differentially private.

This tells us that to get good privacy parameters, we need to use Gaussian noise with variance $\Omega(k)$, i.e., standard deviation $\sigma$ at least $\Omega(\sqrt{k})$. The median estimator (2) returns, for some $i$,

$$s_i(\ell)\,\mathrm{PCS}(x)_{i,h_i(\ell)} = s_i(\ell)\left( \mathrm{CS}(x)_{i,h_i(\ell)} + \nu_{i,h_i(\ell)} \right),$$

i.e., a value that could be returned by Count Sketch plus a noise term with standard deviation $\Omega(\sqrt{k})$. Thus, we might expect the magnitude of the noise to scale proportionally to $\sqrt{k}$. Our main result is that this does not happen, and in fact the magnitude of the noise can be bounded independently of $k$.

Theorem 3.3. For every $\alpha \in [0, 1]$ and every $\ell \in [d]$, the estimation error of Private Count Sketch with $k$ repetitions, table size $b$, and noise from $N(0, \sigma^2)^{kb}$ satisfies
$$\Pr[\,|\tilde{x}_\ell - x_\ell| > \alpha \max\{\Delta, \sigma\}\,] < 2 \exp\left(-\Omega(\alpha^2 k)\right), \quad \text{where } \Delta = \|\mathrm{tail}_b(x)\|_2/\sqrt{b} \; .$$

Discussion. Theorem 3.3 generalizes the known tail bound on Count Sketch, since if we let $\alpha = 1$ and $\sigma = 0$ we recover Theorem 3.1. For $\varepsilon < 1$, if we choose suitable $\sigma = O(\sqrt{k \log(1/\delta)}/\varepsilon)$ the Private Count Sketch satisfies $(\varepsilon, \delta)$-differential privacy. For noise level $\sigma$ and deviation $\gamma < \sigma$ we then get:
$$\Pr[\,|\tilde{x}_\ell - x_\ell| > \gamma\,] < 2 \exp\left(-\Omega(\gamma^2 \varepsilon^2/\ln(1/\delta))\right) \; .$$
Up to the hidden constant in the order notation, this is identical to the tail bound for the noise of the Gaussian distribution applied directly to $x$ in order to ensure $(\varepsilon, \delta)$-differential privacy.
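Concretely, making the Count Sketch snippet from Section 3.1 private amounts to one extra step. The sketch below is again our illustration (it builds on the hypothetical count_sketch helper above) and picks $\sigma$ according to Lemma 3.2 for given $\varepsilon < 1$ and $\delta$; the median estimator (2) is then just the earlier estimate function applied to the noisy table.

```python
import numpy as np

def private_count_sketch(x, h, s, b, eps, delta, rng):
    """PCS(x) = CS(x) + nu with nu ~ N(0, sigma^2)^{k x b}.

    Neighboring inputs change one entry of x by at most 1, so
    ||CS(x) - CS(x')||_2 <= sqrt(k); by Lemma 3.2 the noise scale
    sigma = sqrt(2 k ln(1.25/delta)) / eps gives (eps, delta)-DP for eps < 1.
    """
    k = h.shape[0]
    sigma = np.sqrt(2 * k * np.log(1.25 / delta)) / eps
    return count_sketch(x, h, s, b) + rng.normal(0.0, sigma, size=(k, b))

# For example, eps = 1, delta = 1e-6 and k = 7 give sigma of about 14.
```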
3.3 Analysis of Private Count Sketch

The message of our paper is that Gaussian noise composes very nicely with Count Sketch as analysed by (Minton & Price, 2014). All we use is that the Count Sketch estimator is the median of symmetric variables. For the median trick on independent symmetric random variables $C_1, \dots, C_k$, Minton & Price (2014, Lemma 3.3) show that for $\gamma > 0$ where $\Pr[|C_i| \le \gamma] \ge p$ it holds that

$$\Pr[\,|\mathrm{median}_{i \in [k]} C_i| > \gamma\,] \le 2 \exp(-p^2 k/2) \; . \qquad (3)$$

(By symmetry, each $C_i$ exceeds $\gamma$ with probability at most $(1-p)/2$, so the median exceeds $\gamma$ only if the number of such events deviates from its mean by at least $pk/2$; a Chernoff-Hoeffding bound then gives (3).)

In our case, for $i = 1, \dots, k$, we have a symmetric random variable $A_i$ (representing a simple estimator's error before noise) to which we add Gaussian noise $B_i \sim N(0, \sigma^2)$. The estimator error $C_i = A_i + B_i$ is clearly also symmetric. Basic properties of the Gaussian distribution imply that

(a) if $\gamma \le \sigma$ and $|A_i| \le \sigma$ then $\Pr[|C_i| \le \gamma] = \Omega(\gamma/\sigma)$;
(b) if $\gamma \ge \sigma$ and $|A_i| \le \gamma$ then $\Pr[|C_i| \le \gamma] = \Omega(1)$.

In our concrete case (details below) we will have a further property of $A_i$, namely

(c) there exists $\Delta > 0$ such that for every $\alpha \in [0, 1]$, $\Pr[|A_i| \le \alpha\Delta] = \Omega(\alpha)$.

Lemma 3.4. Assuming (a), (b), and (c), then for every $\alpha \in [0, 1]$, $\Pr[|C_i| \le \alpha \max\{\Delta, \sigma\}] = \Omega(\alpha)$.

Proof. If $\alpha\Delta \ge \sigma$ then, using (b) and (c),
$$\Pr[|C_i| \le \alpha \max\{\Delta, \sigma\}] \ge \Pr[|C_i| \le \alpha\Delta] \ge \Pr[|C_i| \le \alpha\Delta \mid |A_i| \le \alpha\Delta] \cdot \Pr[|A_i| \le \alpha\Delta] = \Omega(1) \cdot \Omega(\alpha) = \Omega(\alpha) \; .$$
If $\alpha\Delta \le \sigma$ and $\Delta \ge \sigma$, we set $\alpha_1 = \sigma/\Delta$ and $\alpha_2 = \alpha\Delta/\sigma$, such that $\alpha_1 \alpha_2 = \alpha$. Using (a) and (c) we get
$$\Pr[|C_i| \le \alpha \max\{\Delta, \sigma\}] = \Pr[|C_i| \le \alpha\Delta] \ge \Pr[|C_i| \le \alpha_2\sigma \mid |A_i| \le \sigma] \cdot \Pr[|A_i| \le \alpha_1\Delta] = \Omega(\alpha_2) \cdot \Omega(\alpha_1) = \Omega(\alpha) \; .$$
Finally, if $\alpha\Delta < \sigma$ and $\Delta < \sigma$, we use (a) and (c) to get
$$\Pr[|C_i| \le \alpha \max\{\Delta, \sigma\}] = \Pr[|C_i| \le \alpha\sigma] \ge \Pr[|C_i| \le \alpha\sigma \mid |A_i| \le \Delta] \cdot \Pr[|A_i| \le \Delta] = \Omega(\alpha) \cdot \Omega(1) = \Omega(\alpha) \; .$$

We now return to the details of Private Count Sketch. For a given $\ell \in [d]$, Private Count Sketch finds $k$ simple estimates $X_i = s_i(\ell)\left( \mathrm{CS}(x)_{i,h_i(\ell)} + \nu_{i,h_i(\ell)} \right)$ of $x_\ell$ and returns the median, where $\nu_{i,h_i(\ell)} \sim N(0, \sigma^2)$ is Gaussian noise. Since the noise is symmetric, we get exactly the same distribution of $X_i$ if we compute it as $X_i = s_i(\ell)\,\mathrm{CS}(x)_{i,h_i(\ell)} + B_i$ where $B_i \sim N(0, \sigma^2)$. The point is that we can fix the variables in $s_i(\ell)\,\mathrm{CS}(x)_{i,h_i(\ell)}$ first, and with the sign $s_i(\ell)$ fixed, $s_i(\ell)\,\nu_{i,h_i(\ell)} \sim N(0, \sigma^2)$. In the above, $s_i(\ell)\,\mathrm{CS}(x)_{i,h_i(\ell)}$ is the $i$th simple estimator of $x_\ell$ in the Count Sketch. It has error $A_i = s_i(\ell)\,\mathrm{CS}(x)_{i,h_i(\ell)} - x_\ell$, so the error of Private Count Sketch is distributed as $C_i = A_i + B_i$. For the error $A_i$ from Count Sketch, Minton & Price (2014) (proof of Theorem 4.1, using Corollary 3.2) proved that (c) is satisfied with $\Delta = \|\mathrm{tail}_{b/2}(x)\|_2/\sqrt{b}$. Hence, by Lemma 3.4, $\Pr[|C_i| \le \alpha \max\{\Delta, \sigma\}] = \Omega(\alpha)$ for every $\alpha \in [0, 1]$. Now, by (3), for every $\alpha \in [0, 1]$,
$$\Pr[\,|\mathrm{median}_{i \in [k]} C_i| > \alpha \max\{\Delta, \sigma\}\,] \le 2 \exp(-\Omega(\alpha^2 k)) \; .$$
If $m$ is the index of the median $C_i$, then $X_m = x_\ell + C_m$ is the Private Count Sketch estimator of $x_\ell$, so this completes the proof of Theorem 3.3. (A quick numerical illustration of the median trick (3) is sketched below.)

We note that Theorem 3.3 could also be proved using the framework of (Minton & Price, 2014), exploiting that Gaussian variables have a non-negative Fourier transform. However, this requires that the noise-free error $A_i$ also has a non-negative Fourier transform. Indeed this is the case for Count Sketch, as proved in (Minton & Price, 2014), if $b > 1$. However, our proof does not require $A_i$ to have a non-negative Fourier transform, and this could prove useful in other contexts where Gaussian noise is added.
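As a sanity check on (3), the following small simulation (ours, not part of the paper's experiments) uses standard Cauchy variables, which are symmetric with $\Pr[|C_i| \le 1] = 1/2$ exactly, and compares the empirical failure probability of the median to the bound:

```python
import numpy as np

rng = np.random.default_rng(1)
gamma, k, trials = 1.0, 31, 200_000

# Symmetric, heavy-tailed noise: Pr[|C_i| <= 1] = 1/2 for standard Cauchy.
C = rng.standard_cauchy(size=(trials, k))
p = np.mean(np.abs(C) <= gamma)                       # about 0.5
failures = np.mean(np.abs(np.median(C, axis=1)) > gamma)

print(f"empirical Pr[|median| > {gamma}] = {failures:.4f}")
print(f"bound 2*exp(-p^2*k/2)         = {2 * np.exp(-p**2 * k / 2):.4f}")
# The empirical failure probability lands well below the bound: the median
# is well-concentrated even though each C_i has no finite mean.
```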
Limitations. Our error analysis relies on the assumption that the sign hash functions used are fully random. This assumption may be expensive to realize in practice, though it is always possible by storing an explicit, random list of all hash values (as done in our experiments). Alternatively, as observed in (Minton & Price, 2014), we could use the pseudorandom generator of Nisan (1990) to implement our hash functions, which requires space that exceeds the space for the sketch by a logarithmic factor. Furthermore, these hash functions can be shared among several Count Sketches, reducing the space overhead. Another issue to be aware of is that though Count Sketch will be able to significantly compress sufficiently skewed or sparse input vectors, the size of a Count Sketch with good accuracy can in general be larger than the size of the original vector.

3.4 Comparison to Other Private Sketches

It is instructive to compare Private Count Sketch to other private sketches that have been studied in the literature, in particular Private Count-Min Sketch (Melis et al., 2016) and Count Mean Sketch (Apple Differential Privacy Team, 2017).

Private Count-Min Sketch. The Count-Min Sketch (Cormode & Muthukrishnan, 2005) is a well-known sketch that corresponds to Count Sketch without the sign functions (or alternatively with constant $s_i(\ell) = 1$). Its estimator is similar to that of Count Sketch except that it works by taking the minimum of the $k$ independent estimates obtained from the sketch. The minimum estimator has a one-sided error guarantee (it never underestimates $x_\ell$) in terms of $\|\mathrm{tail}_{b/2}(x)\|_1$ for vectors $x$ that have only nonnegative entries. Similar to Count Sketch, its failure probability decreases exponentially with the number $k$ of repetitions. A private version of Count-Min Sketch, studied by Melis et al. (2016), works by adding independent (Laplace) noise to each entry of the sketch. This of course breaks the one-sided error guarantee. Another variant, with a non-negative binomial noise distribution (for integer-valued vectors), was studied by Ghazi et al. (2021). In both cases the minimum estimator is biased and its variance grows with $k$. A third possibility would be Gaussian noise, but again the variance of the minimum estimator can be large (at least $\Omega(k/\log k)$ for the minimum of $k$ Gaussians with variance $k$).

Private Count-Mean Sketch. We use Count-Mean Sketch to refer to Count Sketch where the estimator is the mean rather than the median. This sketch was studied by Apple Differential Privacy Team (2017) in the local model, where it was only used on 1-hot vectors with a single 1. One can ask if the mean estimator is useful in other models of differential privacy, but unfortunately it lacks the robustness properties of Count Sketch. In particular, it gives error bounds in terms of the norm $\|x\|_2$ rather than $\|\mathrm{tail}_{b/2}(x)\|_2$, which means that it is not robust against outliers or adversarial data.

Local differential privacy. Zhou et al. (2022) consider differentially private encodings of $t$-sparse vectors with nonzero values in $[-1, 1]$. They use a Count Sketch with a single repetition to encode a $t$-sparse vector, optionally apply a clipping step, and then add Laplace noise to ensure privacy. The main use case is the local differential privacy (LDP) model, where each noisy Count Sketch is sent to an aggregator that sums the estimates of $n$ users. Kane & Nelson (2014) have shown that the norm of a Count Sketch, $\|Ax\|_2$, with $k = O_\gamma(\log(1/\delta))$ repetitions is within a factor $1 \pm \gamma$ of $\sqrt{k}\,\|x\|_2$ with probability at least $1 - \delta$. In particular, the Count Sketch of a $t$-sparse vector $x$ with nonzero values in $[-1, 1]$ has norm $\|Ax\|_2 < 2\sqrt{kt}$ with probability $1 - \delta$. Applying clipping to ensure norm at most $2\sqrt{kt}$, and scaling the Gaussian noise by $2\sqrt{kt}$, our results imply an alternative LDP encoding of $t$-sparse vectors in $[-1, 1]^d$ that differs from the protocol of Zhou et al. (2022) by offering smaller error (in fact matching their lower bound) at the expense of larger communication complexity; a code sketch of this encoding follows below.
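The clipping-and-noising encoding just described could look as follows. This is our sketch of the approach under the stated assumptions, not the protocol of Zhou et al. (2022); it reuses the hypothetical count_sketch helper from Section 3.1's code, and sigma_scale is a placeholder for the noise multiplier dictated by the desired LDP parameters.

```python
import numpy as np

def ldp_encode(x, h, s, b, sigma_scale, rng):
    """One user's report: Count Sketch of a t-sparse x with entries in [-1, 1],
    clipped to norm 2*sqrt(k*t) (the high-probability bound quoted from
    Kane & Nelson), then perturbed with Gaussian noise proportional to that
    clipping bound, which caps the report's sensitivity."""
    k = h.shape[0]
    t = np.count_nonzero(x)
    bound = 2.0 * np.sqrt(k * t)
    y = count_sketch(x, h, s, b).ravel()
    norm = np.linalg.norm(y)
    if norm > bound:                      # clipping step; rarely triggered
        y *= bound / norm
    return y + rng.normal(0.0, sigma_scale * bound, size=y.shape)

# An aggregator simply sums the n reports; by linearity the sum is a noisy
# Count Sketch of the sum of the users' vectors, decoded with the median estimator.
```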
4 Experiments

We have conducted experiments to empirically investigate the properties of Private Count Sketch. Our main result ignores constant factors in the exponent of the bound on the failure probability, but we see in experiments that these constants are very reasonable. All experiments can be recreated by running a Python script available on GitHub¹ (it runs in 13 minutes on an M1 MacBook Pro). Density plots are smoothed using the Seaborn library's kdeplot function with default parameters.

¹ https://github.com/rasmus-pagh/private-countsketch/releases/tag/v1.0

4.1 Median of normals

As a warm-up we consider the setting in which a function $f(x)$ is released $k$ times independently, $r_1, \dots, r_k \sim N(f(x), k)$, for some integer $k$. The magnitude of the noise is chosen such that the privacy is bounded independently of $k$; e.g., if $f(x)$ has sensitivity 1 then $(r_1, \dots, r_k)$ satisfies $1/2$-zero-concentrated differential privacy (different parameters can be achieved by scaling the magnitude of the noise). Each release has noise of expected magnitude $\Theta(\sqrt{k})$, but the median has noise that can be bounded independently of $k$, as illustrated in Figure 1 (a minimal simulation reproducing this experiment is sketched at the end of Section 4.2). This gives a form of differentially private secret sharing, where $i$ of the $k$ releases can be combined to yield an estimator with noise of magnitude $\Theta(\sqrt{k/i})$. Taking the mean value also gives a low-variance estimator, but mean values are sensitive to outliers and are not robust to adversarially corrupted data.

Figure 1: It is well-known that the median of $k$ Gaussians with variance $k$ is itself subgaussian, with variance independent of $k$ (shown for $k = 3, 7, 15, 31, 63$). This illustrates a best case for Private Count Sketch, in which the estimators $X_i$ (without noise) provide the exact answer.

Figure 2: Distribution of the noise magnitude for Private Count Sketch in the setting where the Count Sketch estimators have zero variance (shown for $k = 3, 7, 15, 31, 63$). The magnitude is only slightly higher than that of a standard Gaussian achieving the same level of privacy.

Figure 3: Error distribution for Private Count Sketch on sparse vectors with entries in $\{0, 10\}$, for various numbers of repetitions ($k = 1, 3, 7, 15, 31, 63$), compared to making the output differentially private using a standard Gaussian giving the same privacy guarantee.

4.2 The zero-variance Count Sketch setting

Next, we consider the idealized setting in which Count Sketch itself does not have any error. This corresponds to letting the table size $b$ go to infinity, so we would hope to come close to the noise achieved by applying the Gaussian mechanism directly to the vector $x$. More generally, this is indicative of the situation in which the noise added to achieve differential privacy dominates the noise coming from randomness in Count Sketch. Figure 2 shows the cumulative distribution function for the absolute value of the error obtained by Private Count Sketch, for different values of $k$. (Of course, in this setting choosing $k = 1$ suffices to achieve a low failure probability, but a large value of $k$ is needed in general to ensure few estimation failures.) As can be seen, the noise distributions are quite close to that of the standard Gaussian mechanism operating directly on $x$. To achieve a desired level of privacy, the noise in both cases has to be scaled appropriately; e.g., to achieve $(\varepsilon, \delta)$-differential privacy for $\varepsilon < 1$ it suffices to multiply the noise by a factor $\sqrt{2 \ln(1/\delta)}/\varepsilon$.
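A minimal version of the warm-up experiment from Section 4.1 (our few-line rendition assuming NumPy, not the released script):

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 100_000

for k in [3, 7, 15, 31, 63]:
    # k private releases of the same value, each with noise N(0, k)
    releases = rng.normal(0.0, np.sqrt(k), size=(trials, k))
    med = np.median(releases, axis=1)
    print(f"k={k:2d}  std of single release = {np.sqrt(k):5.2f}  "
          f"std of median = {med.std():.2f}")

# The standard deviation of each release grows like sqrt(k), while the
# standard deviation of the median stays roughly constant (near
# sqrt(pi/2), about 1.25), matching Figure 1.
```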
4.3 Sparse vectors

Next we investigate the error distribution on $t$-sparse vectors, where at most $t$ entries are non-zero. It is well-known that a Count Sketch with $k = O(\log d)$ repetitions, each using space $b = O(t)$, is able to reconstruct $t$-sparse vectors without any error, with high probability. Thus we expect the error from Private Count Sketch, with a sufficiently large number of repetitions, to be similar to the error obtained by applying the Gaussian mechanism to the raw vectors. Figure 3 shows the cumulative error distribution for a Private Count Sketch (event-level privacy) representing a $t$-sparse vector in which every nonzero entry has a value of 10, using $k$ repetitions with table size $b = t$. As can be seen, once the number of repetitions reaches 15 or more, the error distribution becomes comparable to that of the Gaussian mechanism applied directly to the sparse representation.

4.4 Real-world examples

Population data with group privacy. To illustrate how Private Count Sketch may be used on real-world data, we first consider the noise of a sketch of population counts in about 40,000 cities². We can consider this as a very sparse, high-dimensional data set indexed by the names of cities (among the set of all possible strings). In this way, the sketch does not need to contain any direct information on the set of cities whose population counts are stored, though we will be able to infer such membership from auxiliary information on population (if this number is sufficiently high). We aim for group privacy for large sets of people (see (Dwork & Roth, 2014) for a formal definition) by setting the target noise magnitude on estimates high, from $10^4$ to $10^5$. Intuitively, this hides the contribution to the sketch from any not-too-large group of people. The table size $b$ of the sketches is chosen to roughly balance the noise from the Count Sketch itself and the Gaussian mechanism.

² Retrieved 2022-01-27 from https://simplemaps.com/data/world-cities

Figure 4: Distribution of noise for Private Count Sketch on the world cities dataset with table size $b = 10{,}000$ (left) and $b = 1000$ (right) for various numbers of repetitions $k \in \{1, 3, 5, 9, 19\}$. The scale of the noise added for differential privacy is $\sigma = 10{,}000$ (left) and $\sigma = 100{,}000$ (right), so the sketch achieves strong group privacy. The Gaussian with standard deviation $\sigma$ is included for comparison.

Figure 4 shows the error distribution with noise at scale $10^4$ (left) and at scale $10^5$ (right). As can be seen, the tails of the noise become noticeably thinner as $k$ increases (of course at the cost of a factor $k$ in sketch size). Some of the sketches have size $kb$ larger than the original data set (the sparsity of the vector indexed by city names), so the point of sketching is not compression per se, though one would expect to see compression for larger and more skewed data sets.

Market basket data. Finally, we consider representing two sparse histograms based on market basket data sets obtained from the FIMI data collection, http://fimi.uantwerpen.be/data/: The kosarak dataset (collected by Ferenc Bodon) contains click-stream data of a Hungarian on-line news portal, a set of click IDs per user session.
We limit the size of each set to 100 clicks, resulting in 40148 distinct click IDs and 7264322 clicks in total. The high-dimensional vector considered represents the number of occurrences of each click ID. The retail dataset (collected by Tom Brijs) contains the shopping basket data from a Belgian retail store, a set of item IDs per customer. We limit the size of each set to 30 items, resulting in 16243 distinct item IDs and 888317 items in total. The high-dimensional vector considered represents the number of purchases of each item.

Figure 5: Cumulative distribution of the estimation error of Private Count Sketch ($\varepsilon = 1$, $\delta = 10^{-6}$) and standard Count Sketch on two market basket datasets: kosarak (left; max basket size 100, $k = 5$, $b = 1000$) and retail (right; max basket size 30, $k = 5$, $b = 500$). Privacy is with respect to adding or removing a single basket, where baskets are truncated to a maximum size of 100 and 30, respectively.

For both datasets we constructed Count Sketches of size considerably smaller than the number of distinct IDs: $kb = 5000$ sketch entries for kosarak and $kb = 2500$ entries for retail. Figure 5 shows the empirical cumulative distribution functions of the error of Count Sketch and Private Count Sketch for the two datasets. For privacy parameters $\varepsilon = 1$ and $\delta = 10^{-6}$ (with respect to a single market basket) we see that the error of Private Count Sketch is nearly as well concentrated as the error of Count Sketch.

5 Conclusion

We have seen that Count Sketch can be made differentially private at essentially the smallest conceivable cost: coordinates of the sketched vector are estimated with error that is close to the maximum of the error from non-private Count Sketch and the error necessary for differential privacy without sketching. A problem we leave open is to obtain the same error bound with explicit, space-efficient classes of hash functions.

Since Count Sketch can represent $t$-sparse vectors without error, for $k = O(\log d)$, this implies a new differentially private representation of such vectors that uses space $O(t \log d)$, is dynamic in the sense that the set of nonzero entries can be updated, and keeps the level of noise close to the best possible in the non-sparse setting. This also implies that we can get good estimates of dot products between standard sparse vectors and vectors represented using Private Count Sketch. Less clear is how to best estimate a dot product between two vectors given as Private Count Sketches. This is related to work of Stausholm (2021) on differentially private Euclidean distance estimation, though one could hope for error guarantees in terms of the norms of the tails of the two vectors.

Finally, we note that Cohen et al. (2022) recently used differential privacy techniques in connection with Count Sketch in order to achieve robustness against adaptive adversaries that attempt to find elements that appear to be heavy hitters but are in fact not. It is worth investigating whether Private Count Sketch has similar robustness properties.

Acknowledgments and Disclosure of Funding

We would like to thank the anonymous reviewers for their help with improving the exposition, and in particular with clarifying the relationship between our work and Minton & Price (2014).
The authors are affiliated with Basic Algorithms Research Copenhagen (BARC), supported by the VILLUM Foundation grant 16582. Rasmus Pagh is supported by Providentia, a Data Science Distinguished Investigator grant from Novo Nordisk Fonden.

References

Acharya, J., Sun, Z., and Zhang, H. Hadamard response: Estimating distributions privately, efficiently, and with little communication. In Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 1120–1129. PMLR, 2019.

Apple Differential Privacy Team. Learning with privacy at scale. Apple Mach. Learn. J., 1(8):1–25, 2017.

Aumüller, M., Lebeda, C. J., and Pagh, R. Differentially private sparse vectors with low error, optimal space, and fast access. In Proceedings of Conference on Computer and Communications Security (CCS), pp. 1223–1236. ACM, 2021.

Bassily, R. and Smith, A. Local, private, efficient protocols for succinct histograms. In Proceedings of ACM Symposium on Theory of Computing (STOC), pp. 127–135, 2015.

Bassily, R., Nissim, K., Stemmer, U., and Thakurta, A. Practical locally private heavy hitters. Journal of Machine Learning Research, 21(16):1–42, 2020.

Bun, M. and Steinke, T. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference (TCC), pp. 635–658. Springer, 2016.

Chan, T. H., Shi, E., and Song, D. Optimal lower bound for differentially private multi-party aggregation. In Proceedings of European Symposium on Algorithms (ESA), pp. 277–288. Springer, 2012.

Charikar, M., Chen, K., and Farach-Colton, M. Finding frequent items in data streams. Theoretical Computer Science, 312(1):3–15, 2004.

Cheu, A., Smith, A., Ullman, J., Zeber, D., and Zhilyaev, M. Distributed differential privacy via shuffling. In Proceedings of International Conference on the Theory and Applications of Cryptographic Techniques (Eurocrypt), pp. 375–403. Springer, 2019.

Cohen, E., Lyu, X., Nelson, J., Sarlós, T., Shechner, M., and Stemmer, U. On the robustness of CountSketch to adaptive inputs. In International Conference on Machine Learning (ICML), volume 162 of Proceedings of Machine Learning Research, pp. 4112–4140. PMLR, 2022.

Cormode, G. and Muthukrishnan, S. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 55(1):58–75, 2005.

Dwork, C. and Roth, A. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014. ISSN 1551-305X. doi: 10.1561/0400000042.

Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference (TCC), pp. 265–284. Springer, 2006.

Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. Journal of Privacy and Confidentiality, 7(3):17–51, 2016.

Erlingsson, Ú., Feldman, V., Mironov, I., Raghunathan, A., Talwar, K., and Thakurta, A. Amplification by shuffling: From local to central differential privacy via anonymity. In Proceedings of Symposium on Discrete Algorithms (SODA), pp. 2468–2479. SIAM, 2019.

Ghazi, B., Golowich, N., Kumar, R., Pagh, R., and Velingker, A. On the power of multiple anonymous messages: Frequency estimation and selection in the shuffle model of differential privacy. In Proceedings of International Conference on the Theory and Applications of Cryptographic Techniques (Eurocrypt), pp. 463–488. Springer, 2021.
Huang, Z., Qiu, Y., Yi, K., and Cormode, G. Frequency estimation under multiparty differential privacy: One-shot and streaming. Proc. VLDB Endow., 15(10):2058–2070, 2022.

Kairouz, P., McMahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., Bonawitz, K. A., Charles, Z., Cormode, G., Cummings, R., D'Oliveira, R. G. L., Eichner, H., Rouayheb, S. E., Evans, D., Gardner, J., Garrett, Z., Gascón, A., Ghazi, B., Gibbons, P. B., Gruteser, M., Harchaoui, Z., He, C., He, L., Huo, Z., Hutchinson, B., Hsu, J., Jaggi, M., Javidi, T., Joshi, G., Khodak, M., Konečný, J., Korolova, A., Koushanfar, F., Koyejo, S., Lepoint, T., Liu, Y., Mittal, P., Mohri, M., Nock, R., Özgür, A., Pagh, R., Qi, H., Ramage, D., Raskar, R., Raykova, M., Song, D., Song, W., Stich, S. U., Sun, Z., Suresh, A. T., Tramèr, F., Vepakomma, P., Wang, J., Xiong, L., Xu, Z., Yang, Q., Yu, F. X., Yu, H., and Zhao, S. Advances and open problems in federated learning. Found. Trends Mach. Learn., 14(1–2):1–210, 2021.

Kane, D. M. and Nelson, J. Sparser Johnson-Lindenstrauss transforms. Journal of the ACM, 61(1), January 2014. ISSN 0004-5411.

Larsen, K. G., Pagh, R., and Tětek, J. CountSketches, feature hashing and the median of three. In Proceedings of International Conference on Machine Learning (ICML), volume 139 of Proceedings of Machine Learning Research, pp. 6011–6020. PMLR, 2021.

Melis, L., Danezis, G., and Cristofaro, E. D. Efficient private statistics with succinct sketches. In Annual Network and Distributed System Security Symposium (NDSS). The Internet Society, 2016.

Minton, G. T. and Price, E. Improved concentration bounds for count-sketch. In Proceedings of Symposium on Discrete Algorithms (SODA), pp. 669–686, 2014.

Mir, D., Muthukrishnan, S., Nikolov, A., and Wright, R. N. Pan-private algorithms via statistics on sketches. In Proceedings of Symposium on Principles of Database Systems (PODS), pp. 37–48, 2011.

Mironov, I. Rényi differential privacy. In Proceedings of Computer Security Foundations Symposium (CSF), pp. 263–275. IEEE, 2017.

Nisan, N. Pseudorandom generators for space-bounded computations. In Proceedings of ACM Symposium on Theory of Computing (STOC), pp. 204–212, 1990.

Stausholm, N. M. Improved differentially private euclidean distance approximation. In Proceedings of Symposium on Principles of Database Systems (PODS), pp. 42–56, 2021.

Weinberger, K., Dasgupta, A., Langford, J., Smola, A., and Attenberg, J. Feature hashing for large scale multitask learning. In Proceedings of International Conference on Machine Learning (ICML), pp. 1113–1120, 2009.

Zhao, F., Qiao, D., Redberg, R., Agrawal, D., Abbadi, A. E., and Wang, Y. Differentially private linear sketches: Efficient implementations and applications. In Proceedings of Conference on Neural Information Processing Systems (NeurIPS), 2022.

Zhou, M., Wang, T., Chan, H., Fanti, G., and Shi, E. Locally differentially private sparse vector aggregation. In Symposium on Security and Privacy (SP). IEEE, 2022.