# lossless_compression_of_efficient_private_local_randomizers__c3b66374.pdf Lossless Compression of Efficient Private Local Randomizers Vitaly Feldman 1 Kunal Talwar 1 Locally Differentially Private (LDP) Reports are commonly used for collection of statistics and machine learning in the federated setting. In many cases the best known LDP algorithms require sending prohibitively large messages from the client device to the server (such as when constructing histograms over a large domain or learning a high-dimensional model). Here we demonstrate a general approach that, under standard cryptographic assumptions, compresses every efficient LDP algorithm with negligible loss in privacy and utility guarantees. The practical implication of our result is that in typical applications every message can be compressed to the size of the server s pseudo-random generator seed. From this general approach we derive low-communication algorithms for the problems of frequency estimation and high-dimensional mean estimation. Our algorithms are simpler and more accurate than existing low-communication LDP algorithms for these well-studied problems. 1 Introduction We consider the problem of collecting statistics and machine learning in the setting where data is held on a large number of user devices. The data held on devices in this federated setting is often sensitive and thus needs to be analyzed with privacy preserving techniques. One of the key approaches to private federated data analysis relies on the use of locally differentially private (LDP) algorithms to ensure that the report sent by a user s device reveals little information about that user s data. Specifically, a randomized algorithm R: X Y is an ε-DP local randomizer if for every possible output y Y , and any two possible values of user data x1, x2 X, Pr[R(x1) = y] and Pr[R(x2) = y] are within a factor of eε (where the probability is taken solely with respect to the randomness of the algorithm R). 1Apple. Correspondence to: Vitaly Feldman . Proceedings of the 38 th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s). The concept of a local randomizer dates back to the work of Warner (1965) where it was used to encourage truthfulness in surveys. In the context of modern data analysis it was introduced by Evfimievski et al. (2003) and then related to differential privacy in the seminal work of Dwork et al. (2006). Local randomizers are also used for collection of statistics and machine learning in several industrial applications (Erlingsson et al., 2014; Apple s Differential Privacy Team, 2017; Ding et al., 2017). Practical applications such as building a histogram over a large domain or training a model with millions of parameters (Mc Mahan et al., 2018), require applying the randomizer to high dimensional data. Many of the standard and most accurate ways to randomize such data result in reports whose size scales linearly with the dimension of the problem. Communication from the user devices is often significantly constrained in practical applications. This limits the scope of problems in which we can achieve the best known utility-privacy tradeoff and motivates significant research interest in designing communication-efficient LDP algorithms. 1.1 Our contribution In this work, we explore practical and theoretical aspects of designing low-communication LDP mechanisms. It has long been noted that, by design, the output of an LDP randomizer contains a limited amount of information about the input data. Thus it should be possible to reduce the communication down to the information content about the data using standard tools from information theory. We will refer to such reduction in communication as compression of the LDP randomizer. To avoid confusion, we note that the goal is not to compress the content of the original messages of the randomizer but rather to find a different randomizer that communicates the same information about the data point using shorter messages. Unfortunately, standard information-theoretic approaches to such compression do not necessarily preserve privacy. Bassily & Smith (2015) and Bun et al. (2019) describe general privacy preserving techniques for compressing LDP protocols. Unfortunately, their techniques result either in the loss of accuracy (as only a fraction of the user population ends up contributing a report) or an increase in ε by a constant factor > 2. Increase by such a constant factor is likely to make the algorithm impractical in the ε > 1 Lossless Compression of Efficient Private Local Randomizers regime since in some problems accuracy scales as e ε/2 when ε > 1. The ε > 1 regime has become particularly important since the introduction of the privacy amplification techniques based on anonymization and shuffling of LDP reports (Bittau et al., 2017; Erlingsson et al., 2019; Cheu et al., 2019; Balle et al., 2019; Feldman et al., 2020). Central privacy guarantees resulting from amplification by shuffling scale as e ε/2 making preservation of ε crucial in this setting. We propose a general approach to compressing an arbitrary local randomizer that preserves both the privacy and accuracy (or utility) of the randomizer. At a high level it is based on replacing the true random bits used to generate the output with pseudo-random bits that can be described using a short seed. For a randomizer R: X Y , we do this by first picking a fixed reference distribution ρ that is data-independent and ε-close (in the standard sense of differential privacy) to the output distributions of R for all possible inputs x X. Existence of such reference distribution is exactly the definition of the deletion version of local differential privacy (Erlingsson et al., 2020) and thus our results are easiest to describe in this model. A sample from ρ typically requires many random bits to generate but, by replacing random bits with pseudo-randomly generated ones, we will obtain a distribution over values in Y that can be described using a short seed. In addition, under standard cryptographic assumptions, a random sample from this distribution is computationally indistinguishable from ρ. Given an input x we can now emulate R(x) by performing rejection sampling relative to pseudo-random samples from ρ. A special case of this idea appears in the work of Mishra & Sandler (2006) who apply it the problem of estimating sets of counting queries. A crucial question is whether this scheme satisfies ε differential privacy. We show that the answer is yes if the pseudo-random generator (PRG) used is strong enough to fool a certain test that looks at the ratio of the output density of R(x) to ρ. This ratio is typically efficiently computable whenever the randomizer itself is efficiently computable. Thus under standard cryptographic assumptions, the privacy is preserved (up to a negligible loss). Similarly, when the processing of the reports on the server side is done by an efficient algorithm the utility will be preserved. See Theorem 3.4 for a formal statement. Asymptotically, this result implies that if we assume that there exists an exponentially strong PRG, then the number of bits that needs to be communicated is logarithmic in the running time of the rejection sampler we defined. An immediate practical implication of this result is that in most applications the output of the local randomizer can be compressed to the size of the seed of the system (PRG) without any observable effect on utility or privacy. This size is typically less than 1024 bits. We remark that when implementing a randomizer in practice, true randomness is replaced with pseudo-randomly generated bits with an (implicit) assumption that this does not affect privacy or utility guarantees. Thus the assumptions underlying our analysis are similar to those that are already present in practical implementations of differentially private algorithms. We demonstrate that this approach also extends to the (more common) replacement notion of local differential privacy and also to (ε, δ)-DP randomizers. In the latter case the randomizer needs to be modified to allow subsampling via simple truncation. This step adds δ to both privacy and utility guarantees of the algorithm. For replacement DP this version also requires a more delicate analysis and a stronger set of tests for the PRG. A detailed description of these results is given in Section 3. An important property of our analysis is that we do not need to follow the general recipe for specific randomizers. Firstly, for some randomizers it is possible to directly sample from the desired distribution over seeds instead of using rejection sampling that requires eε trials (in expectation). In addition, it may be possible to ensure that privacy and utility are preserved without appealing to general cryptographically secure PRGs and associated computational assumptions. In particular, one can leverage a variety of sophisticated results from complexity theory, such as k-wise independent PRGs and PRGs for bounded-space computation (Nisan, 1992), to achieve unconditional and more efficient compression. We apply this fine-grained approach to the problem of frequency estimation over a discrete domain. In this problem the domain X = [k] and the goal is to estimate the frequency of each element j [k] in the dataset. This is one of the central and most well-studied problems in private (federated) data analysis. However, for ε > 1, existing approaches either require communication on the order of k bits, or do not achieve the best known accuracy in some important regimes (see Sec. 1.2 for an overview). The best accuracy is achieved for this problem is achieved by the (asymmetric) RAPPOR algorithm (Erlingsson et al., 2014) (which has two versions depending on whether it is used with replacement or deletion privacy) and also by the closely related Subset Selection algorithm (Wang et al., 2016; Ye & Barg, 2018). We observe that a pairwiseindependent PRG suffices to fool both the privacy and utility conditions for this randomizer. Thus we can compress RAPPOR to O(log k + ε) bits losslessly and unconditionally using a standard construction of a pairwise-independent PRG (Luby et al., 2006). The structure of the PRG also allows us to sample the seeds efficiently without rejection sampling. The details of this construction appear in Section 3. As an additional application of our techniques we consider the problem of estimating the mean of d-dimensional vectors Lossless Compression of Efficient Private Local Randomizers in ℓ2-norm. This problem is a key part of various machine learning algorithms, most notably stochastic gradient descent. In the ε > 1 regime, the first low-communication (specifically, ε log2 d bits) and asymptotically optimal algorithm was recently given by Chen et al. (2020). It is however less accurate empirically and more involved than the algorithm of Bhowmick et al. (2019) that communicates a d dimensional vector. Using our general result we can losslessly compress the algorithm from (Bhowmick et al., 2019) to O(log d + ε) bits. One limitation of this approach is the O(eεd) complexity of rejection sampling in this case which can be prohibitive for large ε. However we show a simple reduction of the ε > 1 case to ε < 1 which increases communication but a factor of ε . This general reduction allows us to reduce the running time to O( ε d) and also use a simple and low-communication randomizer that is (asymptotically) optimal only when ε < 1 (Duchi et al., 2018; Erlingsson et al., 2020). The details of these results and empirical comparisons appear in Section 5. 1.2 Related Work As mentioned, the closest in spirit to our work is the use of rejection sampling in the work of Mishra & Sandler (2006). Their analysis can be seen as a special case of ours but they only prove that the resulting algorithm satisfies 2ε-DP. Rejection sampling on a sample from the reference distribution is also used in existing compression schemes (Bassily & Smith, 2015; Bun et al., 2019) as well as earlier work on private compression in the two-party setting (Mc Gregor et al., 2010). These approaches assume that the sample is shared between the client and the server, namely, it requires shared randomness. Shared randomness is incompatible with the setting where the report is anonymized and is not directly linked to the user that generated it. As pointed out in (Bassily & Smith, 2015), a simple way to overcome this problem is to include a seed to a PRG in the output of the randomizer and have the server generate the same sample from the reference distribution as the client. While superficially this approach seems similar to ours, its analysis and properties are different. For example, in our setting only the seed for a single sample that passes rejection sampling is revealed to the server, whereas in (Bassily & Smith, 2015; Bun et al., 2019) all samples from the reference distribution are known to the server and privacy analysis does not depend on the strength of the PRG. More importantly, unlike previous approaches our compression scheme is essentially lossless (although at the cost of requiring assumptions for the privacy analysis). Computational Differential Privacy (CDP) (Mironov et al., 2009) is a notion of privacy that defends against computationally bounded adversaries. Our compression algorithm can be easily shown to satisfy the strongest SIM-CDP definition. At the same time, our privacy bounds also hold for computationally unbounded adversaries as long as the LDP algorithm itself does not lead to a distinguisher. This distinction allows us to remove computational assumptions for specific LDP randomizers. For both deletion and replacement privacy the best results for frequency estimation are achieved by variants of the RAPPOR algorithm (Erlingsson et al., 2014) and also by a closely-related Subset Selection algorithm (Wang et al., 2016; Ye & Barg, 2018). Unfortunately, both RAPPOR and Subset Selection have very high communication cost of k H(1/(eε + 1)), where H is the binary entropy function. This has led to numerous and still ongoing efforts to design low-communication protocols for the problem (Hsu et al., 2012; Erlingsson et al., 2014; Bassily & Smith, 2015; Kairouz et al., 2016; Wang et al., 2016; 2017; Ye & Barg, 2018; Acharya et al., 2019; Acharya & Sun, 2019; Bun et al., 2019; Bassily et al., 2020; Chen et al., 2020). A number of low-communication algorithms that achieve asymptotically optimal bounds in the ε < 1 regime are known (Bassily & Smith, 2015; Wang et al., 2017; Acharya et al., 2019; Acharya & Sun, 2019; Bassily et al., 2020; Chen et al., 2020). The first low-communication algorithm that achieves asymptotically optimal bounds in the ε > 1 regime is given in (Wang et al., 2017). It communicates O(ε) bits and relies on shared randomness. However, it matches the bounds achieved by RAPPOR only when eε is an integer. Acharya & Sun (2019) and Chen et al. (2020) give closely related approaches that are asymptotically optimal and use log2 k bits of communication (without shared randomness). However both the theoretical bounds and empirical results for these algorithms are noticeably worse than those of (asymmetric) RAPPOR and Subset Selection (e.g. plots in (Chen et al., 2020) show that these algorithms are 15-20% worse for ε = 5 than Subset Selection1). The constructions in (Acharya & Sun, 2019; Chen et al., 2020) and their analysis are also substantially more involved than RAPPOR. A closely related problem is finding heavy hitters , namely all elements j [k] with counts higher than some given threshold. In this problem the goal is to avoid linear runtime dependence on k that would result from doing frequency estimation and then checking all the estimates. This problem is typically solved using a frequency oracle which is an algorithm that for a given j [k] returns an estimate of the number of j s held by users (typically without computing the entire histogram) (Bassily & Smith, 2015; Bassily et al., 2020; Bun et al., 2019). Frequency estimation is also closely 1The error of asymmetric RAPPOR (namely 0 and 1 are flipped with different probabilities) is essentially identical to that of the Subset Selection randomizer. Comparisons with RAPPOR often use the symmetric RAPPOR which is substantially worse than the asymmetric version for the replacement notion of differential privacy. See Section 4 for details. Lossless Compression of Efficient Private Local Randomizers related to the discrete distribution estimation problem in which inputs are sampled from some distribution over [k] and the goal is to estimate the distribution (Ye & Barg, 2018; Acharya et al., 2019; Acharya & Sun, 2019). Indeed, bounds for frequency estimation can be translated directly to bounds on distribution estimation by adding the sampling error. Mean estimation has attracted a lot of attention in recent years as it is an important subroutine in differentially private (stochastic) gradient descent algorithms (Bassily et al., 2014; Abadi et al., 2016) used in private federated learning (Kairouz et al., 2019). Indeed, private federated optimization algorithms aggregate updates to the model coming from each client in a batch of clients by getting a private estimate of the average update. When the models are large, the dimensionality of the update d leads to significant communication cost. Thus reducing the communication cost of mean estimation has been studied in many works with (Agarwal et al., 2018; Girgis et al., 2020; Chen et al., 2020; Gandikota et al., 2019) or without privacy (Alistarh et al., 2017; Faghri et al., 2020; Suresh et al., 2017; Gandikota et al., 2019; Mayekar & Tyagi, 2020). In the absence of communication constraints and ε < d, the optimal ε-LDP protocols for this problem achieve an expected squared ℓ2 error of Θ( d n min(ε,ε2)) (Duchi et al., 2018; Duchi & Rogers, 2019). When ε 1, the randomizer of Duchi et al. (2018) also achieves the optimal O( d nε2 ) bound. Recent work of Erlingsson et al. (2020) gives a low-communication version of this algorithm. Building on the approach in (Duchi et al., 2018), Bhowmick et al. (2019) describe the Priv Unit algorithm that achieves the asymptotically optimal accuracy also when ε > 1 but has communication cost of Ω(d). An alternative approach in the ε < 1 regime was given by Feldman et al. (2015) who show that the mean estimation problem can be solved by having each client answer a single counting query. This approach is based on Kashin s representation that maps vectors in the unit d-dimensional ball to vectors in [ 1, 1]O(d) (Lyubarskii & Vershynin, 2010). Their work does not explicitly discuss the communication cost and assumes that the server can pick the randomizer used at each client. However it is easy to see that a single bit suffices to answer a counting query and therefore an equivalent randomizer can be implemented using log2 d + 1 bits of communication (or just 1 bit if shared randomness is used). Chen et al. (2020) give a randomizer based on the same idea that also achieves the asymptotically optimal bound in the d > ε > 1 regime. Their approach uses ε log2 d bits of communication. Computing Kashin s representation is more involved than algorithms in (Duchi et al., 2018; Bhowmick et al., 2019). In addition, as we demonstrate empirically2, the variance of the estimate resulting 2Plots in (Chen et al., 2020) also compare their algorithm with from this approach is nearly a factor of 5 larger for typical parameters of interest. 2 Preliminaries For a positive integer k we denote [k] = {1, 2 . . . , k}. For an arbitrary set S we use x S to mean that x is chosen randomly and uniformly from S. Differential privacy (DP) is a measure of stability of a randomized algorithm. It bounds the change in the distribution on the outputs when one of the inputs is either removed or replaced with an arbitrary other element. The most common way to measure the change in the output distribution is via approximate infinity divergence. More formally, we say that two probability distributions µ and ν over (finite) domain Y are (ε, δ)-close if for all E Y , e ε(µ(E) δ) ν(E) eεµ(E) + δ. This condition is equivalent to P y Y |µ(y) eεν(y)|+ δ and P y Y |ν(y) eεµ(y)|+ δ, where |a|+ := max{a, 0} (Dwork & Roth, 2014). We also say that two random variables P and Q are (ε, δ)-close if their probability distributions are (ε, δ)-close. We abbreviate (ε, 0)-close to ε-close. Algorithms in the local model of differential privacy and federated data analysis rely on the notion of local randomizer. Definition 2.1. An algorithm R: X Y is an (ε, δ)-DP local randomizer if for all pairs x, x D, R(x) and R(x ) are (ε, δ)-close. We will also use the add/delete variant of differential privacy which was defined for local randomizers in (Erlingsson et al., 2020). Definition 2.2. An algorithm R: X Y is a deletion (ε, δ)-DP local randomizer if there exists a reference distribution ρ such that for all data points x X, R(x) and ρ are (ε, δ)-close. It is easy to see that a replacement (ε, δ)-DP algorithm is also a deletion (ε, δ)-DP algorithm, and that a deletion (ε, δ)-DP algorithm is also a replacement (2ε, 2δ)-DP algorithm. Fooling and Pseudorandomness: The notion of pseudorandomness relies on the inability to distinguish between the output of the generator and true randomness using a family of tests, where a test is a boolean function (or algorithm). Priv Unit yet as their code at (Kas) shows and was confirmed by the authors, they implemented the algorithm from (Duchi et al., 2018) instead of Priv Unit which is much worse than Priv Unit for ε = 5. The authors also confirmed that parameters stated in their figures are incorrect so cannot be directly compared to our results. Lossless Compression of Efficient Private Local Randomizers Definition 2.3. Let D be a family of boolean functions over some domain Y . We say that two random variables P and Q over Y are (D, β)-indistinguishable if for all D D, |Pr[D(P) = 1] Pr[D(Q) = 1]| β. We say that P and Q are (T, β)-computationally indistinguishable if P and Q are (D, β)-indistinguishable with D being all tests that can be computed in time T (for some fixed computational model such as boolean circuits). We now give a definition of a pseudo-random number generator. Definition 2.4 (Pseudo-random generator). We say that an algorithm G: {0, 1}n {0, 1}m where m n, βfools a family of tests D if G(s) for s {0, 1}n is (D, β)- indistinguishable from r for r {0, 1}m. We refer to such an algorithm as (D, β)-PRG and also use (T, β)-PRG to refer to G that β-fools all tests running in time T. Standard cryptographic assumptions (namely that one-way functions exist) imply that for any m and T that are polynomial in n there exists an efficiently computable (β, T)-PRG G, for negligible β (namely, β = 1/nω(1)). For a number of standard approaches to cryptographically-secure PRGs, no tests are known that can distinguish the output of the PRG from true randomness with β = 2 o(n) in time T = 2o(n). For example finding such a test for a PRG based on SHA-3 would be a major breakthrough. To make the assumption that such a test does not exist we refer to a (β, T)-PRG for β = 2 Ω(n) and T = 2Ω(n) as an exponentially strong PRG. 3 Local Pseudo-Randomizers In this section we describe a general way to compress LDP randomizers that relies on the complexity of the randomizer and subsequent processing. We will first describe the result for deletion ε-DP and then give the versions for replacement DP and (ε, δ)-DP. For the purpose of this result we first need to quantify how much randomness a local randomizer needs. We will say that a randomizer R: X Y is t-samplable if there exists a deterministic algorithm R : {0, 1}t Y such that for r chosen randomly and uniformly from {0, 1}t, R (r) is distributed according to the reference distribution of R (denoted by ρ). Typically, for efficiently computable randomizers, t is polynomial in the size of the output log(|Y |) and ε. Note that every value y in the support of ρ is equal to R (r) for some r. Thus every element that can be output by R can be represented by some r {0, 1}t. Our goal is to compress the communication by restricting the output from all those that can be represented by r {0, 1}t to all those values in Y that can be represented by a t-bit string generated from a seed of length ℓ t using some PRG G: {0, 1}ℓ {0, 1}t. We could then send the seed to the server and let the server first generate the full t-bit string using G and then run R on it. The challenge is to do this efficiently while preserving the privacy and utility guarantees of R. Our approach is based on the fact that we can easily sample from the pseudo-random version of the reference distribution ρ by outputting R (G(s)) for a random and uniform seed s. This leads to a natural way to define a distribution over seeds on a input x: a seed s is output with probability that is proportional to Pr[R(x)=R (G(s))] Prr {0,1}t[R (r)=R (G(s))]. Specifically we define the desired randomizer as follows. Definition 3.1. For a t-samplable deletion DP local randomizer R: X Y and a function G: {0, 1}ℓ {0, 1}t let R[G] denote the local randomizer that given x X, outputs s {0, 1}t with probability proportional to Pr[R(x)=R (G(s))] Prr {0,1}t[R (r)=R (G(s))]. For some combinations of a randomizer R and PRG G there is an efficient way to implement R[G] directly (as we show in one of our applications). In the general case, when such algorithm may not exist we can sample from R[G](x) by applying rejection sampling to uniformly generated seeds. A special case of this approach is implicit in the work of Mishra & Sandler (2006) (albeit with a weaker analysis). Rejection sampling only requires an efficient algorithm for computing the ratio of densities above to sufficiently high accuracy. We describe the resulting algorithm below. Algorithm 1 R[G, γ]: PRG compression of R Input: x X, ε, γ > 0; seeded PRG G: {0, 1}ℓ {0, 1}t; t-samplable ε-DP randomizer R. 1: J = eε ln(1/γ) 2: for j = 1, . . . , J do 3: Sample a random seed s {0, 1}ℓ. 4: y = R (G(s)) 5: Sample b from Bernoulli Pr[R(x)=y] eε Prr {0,1}t[R (r)=y] 6: if b == 1 then 7: BREAK 8: end if 9: end for 10: Send s Naturally, the output of this randomizer can be decompressed by applying R G to it. It is also clear that the communication cost of the algorithm is ℓbits. Next we describe the general condition on the PRG G that suffices for ensuring that the algorithm that outputs a random seed with correct probability is differentially private. Lemma 3.2. For a t-samplable deletion ε-DP local randomizer R: X Y and G: {0, 1}ℓ {0, 1}t, let D denote Lossless Compression of Efficient Private Local Randomizers the following family of tests which take r {0, 1}t as an input: Pr[R(x) = R (r )] Pr r {0,1}t[R (r) = R (r )] θ x X, θ [0, eε] where ind denotes the {0, 1} indicator function of a condition. If G β-fools D for β < 1/(2eε) then R[G] is a deletion (ε + 2eεβ)-DP local randomizer. Furthermore, for every γ > 0, R[G, γ] is a deletion (ε + 2eεβ)-DP local randomizer. Unlike the preservation of privacy, conditions on the PRG under which we can ensure that the utility of R is preserved depend on the application. Here we describe a general result that relies only on the efficiency of the randomizer to establish computational indistinguishability of the output of our compressed randomizer from the output of the original one. Lemma 3.3. Let R be a deletion ε-DP t-samplable local randomizer, let G: {0, 1}ℓ {0, 1}t be (T, β)-PRG. Let T(R, G, γ) denote the running time of R[G, γ] and assume that T > T(R, G, γ). Then for all x X, R (G(R[G, γ](x))) is (T , β )-computationally indistinguishable from R(x), where β = γ + eε ln(1/γ)β and T = T T(R, G, γ). As a direct corollary of Lemmas 3.2 and 3.3 we obtain a general way to compress efficient LDP randomizers. Theorem 3.4. Let R be a deletion ε-DP t-samplable local randomizer, let G: {0, 1}ℓ {0, 1}t be (T, β)-PRG for β < 1/(2eε). Let T(R, G, γ) be the running time of R[G, γ] and assume that T > T(R, G, γ). Then R[G, γ] is a deletion (ε + 2eεβ)-DP local randomizer and for all x X, R (G(R[G, γ](x))) is (T , β )-computationally indistinguishable from R(x), where β = γ + eε ln(1/γ)β and T = T T(R, G, γ). By plugging an exponentially strong PRG G into Theorem 3.4 we obtain that if an LDP protocol based on R runs in time T then its communication can be compressed to O(log(T + T(R, G, γ)) with negligible effect on privacy and utility. We also remark that even without making any assumptions on G, R[G, γ] satisfies 2ε-DP. In other words, failure of the PRG does not lead to a significant privacy violation, beyond the degradation of the privacy parameter ε by a factor of two. Lemma 3.5. Let R be a deletion ε-DP t-samplable local randomizer, let G: {0, 1}ℓ {0, 1}t be an arbitrary function. Then R[G, γ] is a deletion 2ε-DP local randomizer. Our approach extends in a natural way to (ε, δ)-DP as well as to the replacement model of differential privacy. We defer the proof of the following to SM. Theorem 3.6. Let R be a deletion (replacement) (ε, δ)-DP t-samplable local randomizer, let G: {0, 1}ℓ {0, 1}t be (T, β)-PRG for β < 1/(2eε). Let T(R, G, γ) is the running time of R[G, γ] and assume that T > T(R, G, γ). Then R[G, γ] is a deletion (resp. replacement) (ε+2eεβ, e O(ε)δ)- DP local randomizer. 4 Frequency Estimation In this section we apply our approach to the problem of frequency estimation over a discrete domain. In this problem on domain X = [k], the goal is to estimate the frequency of each element j [k] in the dataset. Namely, for S = (x1, . . . , xn) Xn we let c(S) {0, . . . , n}k be the vector of the counts of each of the elements in S: c(S)j = |{i | xi = j}|. In the frequency estimation problem the goal is to design a local randomizer and a decoding/aggregation algorithm that outputs a vector c that is close to c(S). Commonly studied metrics are (the expected) ℓ , ℓ1 and ℓ2 norms of c c(S). In most regimes of interest, n is large enough and all these errors are essentially determined by the variance of the estimate of each count produced by the randomizer and therefore the choice of the metric does not affect the choice of the algorithm. The randomizer used in the RAPPOR algorithm (Erlingsson et al., 2014) is defined by two parameters α0 and α1. The algorithm first converts the input j to the indicator vector of j (also referred to as one-hot encoding). It then randomizes each bit in this encoding: if the bit is 0 then 1 is output with probability α0 (and 0 with probability 1 α0) and if the bit is 1 then 1 is output with probability α1. For deletion privacy the optimal error is achieved by a symmetric setting α0 = 1/(eε + 1) and α1 = eε/(eε + 1) (Erlingsson et al., 2020). This makes the algorithm equivalent to applying the standard binary randomized response to each bit. A simple analysis shows that this results in the standard deviation of each count being neε/2 eε 1 (Erlingsson et al., 2014; Wang et al., 2017). For replacement privacy the optimal error is achieved by an asymmetric version in which α0 = 1/(eε + 1) but α1 = 1/2. The resulting standard deviation for each count is dominated by 2 neε/2 eε 1 (Wang et al., 2017). (We remark that several works analyze the symmetric RAPPOR algorithm in the replacement privacy. This requires setting α0 = (1 α1) = 1/(eε/2 + 1) resulting in a substantially worse algorithm than the asymmetric version). Note that the resulting encoding has n/(eε + 1) ones. A closely-related Subset Selection algorithm (Wang et al., 2016; Ye & Barg, 2018) maps inputs to bit vectors of length k with exactly n/(eε + 1) ones (that can be thought of as a subset of [k]). An input j is mapped with probability 1/2 to a random subset that contains j and with probability 1/2 to a random subset that does not. This results in Lossless Compression of Efficient Private Local Randomizers essentially the same marginal distributions over individual bits and variance bounds as asymmetric RAPPOR. 4.1 Pairwise-independent RAPPOR While we can use our general result to compress communication in RAPPOR, in this section we exploit the specific structure of the randomizer. Specifically, the tests needed for privacy are fooled if the marginals of the PRG are correct. Moreover the accuracy is preserved as long as the bits are randomized in a pairwise independent way. Thus we can simply use a standard derandomization technique for pairwise independent random variables. Specifically, to obtain a Bernoulli random variable with bias α0 we will use a finite field X = Fp of size p such that α0p is an integer (or, in general, sufficiently close to an integer) and p is a prime larger than k. This allows us to treat inputs in [k] as non-zero elements of Fp. We will associate all elements of the field that are smaller (in the regular order over integers) than α0p with 1 and the rest with 0. We denote this indicator function of the event z < α0p by bool(z). Now for a randomly and uniformly chosen element z Fp, we have that bool(z) is distributed as a Bernoulli random variable with bias α0. As mentioned we will, associate each index j [k] with the element j in Fp. We can describe an affine function φ over Fp using its 2 coefficients: φ0 and φ1 and for z Fp we define φ(z) = φ0+zφ1, where addition and multiplication are in the field Fp. Each such function encodes a vector in Fk p as φ([k]) := φ(1), φ(2), . . . , φ(k). Let Φ := {φ | φ F2 p} be the family of all such functions. For a randomly chosen function from this family the values of the function on two distinct non-zero values are uniformly distributed and pairwiseindependent: for any j1 = j2 [k] and a1, a2 Fp we have that Pr φ Φ[φ(j1) = a1 and φ(j2) = a2] = Pr φ Φ[φ(j1) = a1] Pr φ Φ[φ(j2) = a2] = 1 In particular, if we use the encoding of φ as a boolean vector bool(φ[k]) := bool(φ(1)), bool(φ(2)), . . . , bool(φ(k)) then we have that for φ Φ and any j1 = j2 [k], bool(φ(j1)) and bool(φ(j2)) are independent Bernoulli random variables with bias α0. Finally, for every index j [k] and bit b {0, 1} we denote the set of functions φ whose encoding has bit b in position j by Φj,b: Φj,b := {φ Φ | bool(φ(j)) = b}. (1) We can now describe the randomizer, which we refer to as Pairwise-Independent (PI) RAPPOR for general α1 > α0. Algorithm 2 PI-RAPPOR randomizer Input: An index j [k], 0 < α0 < α1 < 1, prime p k + 1 s.t. α0p N 1: Sample b from Bernoulli(α1) 2: Sample randomly φ from Φj,b defined in eq. (1) 3: Send φ The server side of the frequency estimation with pairwiseindependent RAPPOR consists of a decoding step that converts φ to bool(φ[k]) and then the same debiasing and aggregation as for the standard RAPPOR. We describe it as a frequency oracle to emphasize that each count can be computed individually. Algorithm 3 Server-side frequency for PI-RAPPOR Input: 0 < α0 < α1 < 1, k, index j [k] and prime p > k. Reports φ1, . . . , φn from n users. 1: sum = 0 2: for i [n] do 3: sum+ = bool(φi(j)) 4: end for 5: cj = sum α0n α1 α0 6: Return cj We start by establishing several general properties of PIRAPPOR. First we establish that the privacy guarantees for PI-RAPPOR are identical to those of RAPPOR. Lemma 4.1. PI-RAPPOR randomizer (Alg. 2) is deletion max n α1 α0 , 1 α0 o -DP and replacement α1(1 α0) α0(1 α1)-DP. Second we establish that the utility guarantees of PIRAPPOR are identical to those of RAPPOR. This follows directly from the fact that the utility is determined by the variance of the estimate of each individual count in each user s contribution. The variance of the estimate of c(S)j is a sum of c(S)j variances for randomization of 1 and n c(S)j variances of randomization of 0. These variances are identical for RAPPOR and PI-RAPPOR leading to identical exact bounds. Lemma 4.2. For any dataset S [k]n, the estimate c computed by PI-RAPPOR algorithm (Algs. 2,3) satisfies E[ c] = c(S), and for all j [k], Var[ cj] = c(S)j 1 α0 α1 α1 α0 + nα0(1 α0) For the symmetric case α0 = 1 α1 this simplifies to Var[ cj] = n α0(1 α0) (1 2α0)2 . In addition, the expected ℓ2 squared error is E c c(S) 2 2 = n1 α0 α1 α1 α0 + nk α0(1 α0) Lossless Compression of Efficient Private Local Randomizers Plugging α0 = 1/(eε + 1) and α1 = 1/2 for replacement privacy and α0 = 1 α1 = 1/(eε + 1) for deletion privacy gives the following utility bounds for ε-DP versions of PIRAPPOR. Corollary 4.3. For any ε > 0 and a setting of p that ensures that p/(eε + 1) N we have that PI-RAPPOR for α0 = 1 α1 = 1/(eε + 1) satisfies deletion ε-DP and for every dataset S [k]n, the estimate c computed by PI-RAPPOR satisfies: E[ c] = c(S), for all j [k], Var[ cj] = n eε (eε 1)2 and E c c(S) 2 2 = nk eε (eε 1)2 . Corollary 4.4. For any ε > 0 and a setting of p that ensures that p/(eε + 1) N we have that PI-RAPPOR for α0 = 1/(eε + 1) and α1 = 1/2 is replacement ε-DP and for every dataset S [k]n, the estimate c computed by PIRAPPOR satisfies: E[ c] = c(S), for all j [k], Var[ cj] = c(S)j + n 4eε (eε 1)2 and E c c(S) 2 2 = n + nk 4eε (eε 1)2 . Finally, we analyze the computational and communication cost of PI-RAPPOR. Clearly, the communication cost of PI-RAPPOR is 2 log2 p bits. In addition, it is not hard to see that all computations performed by PI-RAPPOR can be implemented in essentially the same time as single multiplication in Fp. The analysis of the running time of decoding and aggregation is similarly straightforward since decoding every bit of message takes time that is dominated by the time of a single multiplication in Fp. We defer the details to SM. As these complexities depend on log p we also need to discuss the choice of p. It is not hard to show (see SM for details) that p c1 max{k, eε, 1/ε} for a sufficiently large constant c1 ensures that PI-RAPPOR will have essentially the same guarantees as RAPPOR. This means that the communication cost of PI-RAPPOR is 2 log2(max{k, eε, 1/ε}) + O(1). Also we are typically interested in compression when k max{eε, 1/ε} and in such case the communication cost is 2 log2(k) + O(1). 5 Mean Estimation In this section, we consider the problem of mean estimation in ℓ2 norm, for ℓ2-norm bounded vectors. Formally, each client has a vector xi Bd, where Bd := {x Rd | x 2 1}. Our goal is to compute the mean of these vectors privately, and we measure our error in the ℓ2 norm. In the literature this problem is often studied in the statistical setting where xi s are sampled i.i.d. from some distribution supported on Bd and the goal is to estimate the mean of this distribution. In this setting, the expected squared ℓ2 distance between the mean of the distribution and the mean of the samples is at most 1/n and is dominated by the privacy error in the regime that we are interested in (ε < d). In the absence of communication constraints and ε < d, the optimal ε-LDP protocols for this problem achieve an expected squared ℓ2 error of Θ( d n min(ε,ε2)) (Duchi et al., 2018; Duchi & Rogers, 2019). Here and in the rest of the section we focus on the replacement DP both for consistency with existing work and since for this problem the dependence on ε is linear (when 1 < ε < d) and thus the difference between replacement and deletion is less important. If one is willing to relax to (ε, δ) or concentrated differential privacy (Dwork & Rothblum, 2016; Bun & Steinke, 2016; Mironov, 2017) guarantees, then standard Gaussian noise addition achieves the asymptotically optimal bound. When ε 1, the randomizer of Duchi et al. (2018) (which we refer to as Priv HS) also achieves the optimal O( d nε2 ) bound. Recent work of Erlingsson et al. (2020) gives a low-communication version of Priv HS. Specifically, in the context of federated optimization they show that Priv HS is equivalent to sending a single bit and a randomly and uniformly generated unit vector. This vector can be sent using a seed to a PRG. Bhowmick et al. (2019) describe the Priv Unit algorithm that achieves the optimal bound also when ε > 1. Unfortunately, Priv Unit has high communication cost of Ω(d). By applying Theorem 3.4 to Priv Unit or Gaussian noise addition, we can immediately obtain a low communication algorithm with negligible effect on privacy and utility. This gives us an algorithm that communicates a single seed, and has the asymptotically optimal privacy utility trade-off. Implementing Priv Unit requires sampling uniformly from a spherical cap {v | v 2 = 1, x, v α} for α p ε/d. Using standard techniques this can be done with high accuracy using O(d) random bits and O(d) time. Further, for every x the resulting densities can be computed easily given the surface area of the cap. Overall rejection sampling can be computed in O(d) time. Thus this approach to compression requires time O(eεd). This implies that given an exponentially strong PRG G, we can compress Priv Unit to O(log(dn) + ε) bits with negligible effects on utility and privacy. In most settings of interest, the computational cost O(eεd) is not much larger than the typical cost of computing the vector itself, e.g. by back propagation in the case of gradients of neural networks (e.g. ε = 8 requires 3000 trials in expectation). We can further reduce this computational overhead. We show a simple reduction from the general case of ε > 1 to a protocol for ε = ε/m that preserves asymptotic optimality, where m 2ε is an integer. The algorithm simply runs m copies of the ε -DP randomizer and sends all the reports. The estimates produced from these reports are averaged by the server. This reduces the expected number of rejection sampling trials to meε/m. We describe the formal details of this simple reduction in SM. This reduction allows one to achieve different trade-offs between computation, communication, and closeness to the accuracy of the original randomizer. As an additional ben- Lossless Compression of Efficient Private Local Randomizers efit, we no longer need an LDP randomizer that is optimal in the ε > 1 regime. We can simply use m = ε and get an asymptotically optimal algorithm for ε > 1 from any algorithm that is asymptotically optimal for ε [1, 1/2]. In particular, instead of Priv Unit we can use the low communication version of Priv HS from (Erlingsson et al., 2020). This bypasses the need for our compression algorithm and makes the privacy guarantees unconditional. Remark 5.1. We remark that the compression of Priv Unit can be easily made unconditional. The reference distribution ρ of Priv Unit is uniform over a sphere of some radius B(d, ε) = O( p d/ min(ε, ε2)). It is not hard to see that both the privacy and utility guarantees of Priv Unit are preserved by any PRG G which preserves Prv ρ[ x, v θ] for every vector x sufficiently well (up to some 1/poly(d, n, eε/ε) accuracy). Note that these tests are halfspaces and have VC dimension d. Therefore by the standard ϵ-net argument, a random sample of size O(d B(d, ε)/γ2) from the reference distribution will, with high probability, give a set of points S that γ-fools the test (for any γ > 0). By choosing γ = 1/poly(d, n, eε/ε) we can ensure that the effect on privacy and accuracy is negligible (relative to the error introduced due to privacy). Thus one can compress the communication to log2(|S|) = O(log(dn/ε) + ε) bits unconditionally (with negligible effect on accuracy and privacy). While Priv Unit, SQKR and the repeated version of Priv HS are asymptotically optimal, the accuracy they achieve in practice may be different. Therefore we empirically compare these algorithms. In our first comparison we consider four algorithms. The Priv HS algorithm outputs a vector whose norm is fully defined by the parameters d, ε: the output vector has norm B(d, ε) = eε+1 The variance is then easily seen to be (B2+1 2B)/n when averaging over n samples. For large dimensional settings of interest, B 1 so this expression is very well approximated by B2/n and we use this value as a proxy. For SQKR, we use the implementation provided by the authors at (Kas) (specifically, second version of the code that optimizes some of the parameters). We show error bars for the empirical squared error based on 20 trials. The Priv Unit algorithm internally splits its privacy budget ε into two parts ε0, ε1 = 1 ε0. As in the case of Priv HS, the output of Priv Unit (for fixed d, ε0, ε1) has a fixed squared norm which is the proxy we use for variance. We first consider the default split used in the experiments in (Bhowmick et al., 2019) and refer to it as Priv Unit. In addition, we optimize the splitting so as to minimize the variance proxy, by evaluating the expression for the variance proxy as a function of the θ = ε0/ε, for 101 values of θ = 0.00, 0.01, 0.02, . . . , 0.99, 1.0. We call this algorithm Priv Unit Optimized. Note that since we are optimizing Figure 1. Expected ℓ2 2 error of mechanisms Priv HS, Priv Unit, Priv Unit Optimized and SQKR for values of ε between 1 and 8. θ to minimize the norm proxy, this optimization is dataindependent and need only be done once for a fixed ε. For both variants of Priv Unit, we use the norm proxy in our evaluation; as discussed above, in high-dimensional settings of interest, the proxy is nearly exact. Figure 1 compares the expected squared error of these algorithms for d = 1, 000, n = 10, 000 and ε taking integer values from 1 to 8. These plots show both Priv Unit and Priv Unit Optimized are more accurate than Priv HS and SQKR in the whole range of parameters While Priv HS is competitive for small ε, it does not get better with ε for large ε. SQKR consistently has about 5 higher expected squared error than Priv Unit Optimized and about 2.5 higher error compared to Priv Unit. Thus in the large ε regime, the ability to compress Priv Unit Optimized gives a 5 improvement in error compared to previous compressed algorithms. We also observe that Priv Unit Optimized is noticeably better than Priv Unit. Our technique being completely general, it will apply losslessly to any other better local randomizers that may be discovered in the future. As discussed earlier, one way to reduce the computational cost of compressed Priv Unit Optimized is to use the reduction described above. For instance, instead of running Priv Unit Optimized with ε = 8, we may run it twice with ε = 4 and average the results on the server. Asymptotically, this gives the same expected squared error and we empirically evaluate the effect of such splitting on the expected error in SM. Lossless Compression of Efficient Private Local Randomizers An implementation of kashine based mean estimation scheme. https://github.com/Wei Ning Chen/ Kashin-mean-estimation. Accessed: 2021-0217. Abadi, M., Chu, A., Goodfellow, I. J., Mc Mahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS), pp. 308 318, 2016. Acharya, J. and Sun, Z. Communication complexity in locally private distribution estimation and heavy hitters. ar Xiv preprint ar Xiv:1905.11888, 2019. Acharya, J., Sun, Z., and Zhang, H. Hadamard response: Estimating distributions privately, efficiently, and with little communication. volume 89 of Proceedings of Machine Learning Research, pp. 1120 1129. PMLR, 16 18 Apr 2019. Agarwal, N., Suresh, A. T., Yu, F. X. X., Kumar, S., and Mc Mahan, B. cpsgd: Communication-efficient and differentially-private distributed sgd. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 31, pp. 7564 7575. Curran Associates, Inc., 2018. URL https://proceedings. neurips.cc/paper/2018/file/ 21ce689121e39821d07d04faab328370-Paper. pdf. Alistarh, D., Grubic, D., Li, J., Tomioka, R., and Vojnovic, M. Qsgd: Communication-efficient sgd via gradient quantization and encoding. In Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R. (eds.), Advances in Neural Information Processing Systems, volume 30, pp. 1709 1720. Curran Associates, Inc., 2017. URL https://proceedings. neurips.cc/paper/2017/file/ 6c340f25839e6acdc73414517203f5f0-Paper. pdf. Apple s Differential Privacy Team. Learning with privacy at scale. Apple Machine Learning Journal, 1(9), December 2017. Balle, B., Bell, J., Gasc on, A., and Nissim, K. The privacy blanket of the shuffle model. In Boldyreva, A. and Micciancio, D. (eds.), Advances in Cryptology CRYPTO 2019, pp. 638 667, Cham, 2019. Springer International Publishing. ISBN 978-3-030-26951-7. Bassily, R. and Smith, A. Local, private, efficient protocols for succinct histograms. In Proceedings of the fortyseventh annual ACM symposium on Theory of computing, pp. 127 135, 2015. Bassily, R., Smith, A., and Thakurta, A. Private empirical risk minimization, revisited. Co RR, abs/1405.7085, 2014. URL http://arxiv.org/abs/1405.7085. Bassily, R., Nissim, K., Stemmer, U., and Thakurta, A. Practical locally private heavy hitters. Journal of Machine Learning Research, 21(16):1 42, 2020. Bhowmick, A., Duchi, J., Freudiger, J., Kapoor, G., and Rogers, R. Protection against reconstruction and its applications in private federated learning, 2019. Bittau, A., Erlingsson, U., Maniatis, P., Mironov, I., Raghunathan, A., Lie, D., Rudominer, M., Kode, U., Tinnes, J., and Seefeld, B. Prochlo: Strong privacy for analytics in the crowd. In Proceedings of the 26th Symposium on Operating Systems Principles, SOSP 17, pp. 441 459, 2017. Bun, M. and Steinke, T. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Proceedings, Part I, of the 14th International Conference on Theory of Cryptography - Volume 9985, pp. 635 658, Berlin, Heidelberg, 2016. Springer Verlag. ISBN 9783662536407. doi: 10.1007/ 978-3-662-53641-4 24. URL https://doi.org/ 10.1007/978-3-662-53641-4_24. Bun, M., Nelson, J., and Stemmer, U. Heavy hitters and the structure of local privacy. ACM Transactions on Algorithms (TALG), 15(4):1 40, 2019. Chen, W.-N., Kairouz, P., and Ozg ur, A. Breaking the communication-privacy-accuracy trilemma. ar Xiv preprint ar Xiv:2007.11707, 2020. Cheu, A., Smith, A., Ullman, J., Zeber, D., and Zhilyaev, M. Distributed differential privacy via shuffling. In Ishai, Y. and Rijmen, V. (eds.), Advances in Cryptology EUROCRYPT 2019, pp. 375 403, Cham, 2019. Springer International Publishing. ISBN 978-3-030-17653-2. Ding, B., Kulkarni, J., and Yekhanin, S. Collecting telemetry data privately. In 31st Conference on Neural Information Processing Systems (NIPS), pp. 3574 3583, 2017. Duchi, J. and Rogers, R. Lower bounds for locally private estimation via communication complexity. In Beygelzimer, A. and Hsu, D. (eds.), Proceedings of the Thirty-Second Conference on Learning Theory, volume 99 of Proceedings of Machine Learning Research, pp. 1161 1191, Phoenix, USA, 25 28 Jun 2019. PMLR. URL http://proceedings.mlr. press/v99/duchi19a.html. Lossless Compression of Efficient Private Local Randomizers Duchi, J. C., Jordan, M. I., and Wainwright, M. J. Minimax optimal procedures for locally private estimation. Journal of the American Statistical Association, 113(521):182 201, 2018. Dwork, C. and Roth, A. The Algorithmic Foundations of Differential Privacy, volume 9. 2014. URL http: //dx.doi.org/10.1561/0400000042. Dwork, C. and Rothblum, G. N. Concentrated differential privacy. Ar Xiv, abs/1603.01887, 2016. Dwork, C., Mc Sherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In TCC, pp. 265 284, 2006. Erlingsson, U., Pihur, V., and Korolova, A. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pp. 1054 1067, 2014. Erlingsson, U., Feldman, V., Mironov, I., Raghunathan, A., Talwar, K., and Thakurta, A. Amplification by shuffling: From local to central differential privacy via anonymity. In Proceedings of the Thirtieth Annual ACMSIAM Symposium on Discrete Algorithms, SODA 19, pp. 2468 2479, USA, 2019. Society for Industrial and Applied Mathematics. Erlingsson, U., Feldman, V., Mironov, I., Raghunathan, A., Song, S., Talwar, K., and Thakurta, A. Encode, shuffle, analyze privacy revisited: Formalizations and empirical evaluation. 2020. Evfimievski, A. V., Gehrke, J., and Srikant, R. Limiting privacy breaches in privacy preserving data mining. In PODS, pp. 211 222, 2003. Faghri, F., Tabrizian, I., Markov, I., Alistarh, D., Roy, D. M., and Ramezani-Kebrya, A. Adaptive gradient quantization for data-parallel sgd. In Advances in Neural Information Processing Systems, volume 33, 2020. Feldman, V., Guzman, C., and Vempala, S. Statistical query algorithms for mean vector estimation and stochastic convex optimization. Co RR, abs/1512.09170, 2015. URL http://arxiv.org/abs/1512.09170. Extended abstract in SODA 2017. Feldman, V., Mc Millan, A., and Talwar, K. Hiding among the clones: A simple and nearly optimal analysis of privacy amplification by shuffling. Co RR, abs/2012.12803, 2020. URL https://arxiv.org/ abs/2012.12803. Gandikota, V., Kane, D., Maity, R. K., and Mazumdar, A. vqsgd: Vector quantized stochastic gradient descent. ar Xiv preprint ar Xiv:1911.07971, 2019. Girgis, A. M., Data, D., Diggavi, S., Kairouz, P., and Suresh, A. T. Shuffled model of federated learning: Privacy, communication and accuracy trade-offs, 2020. Hsu, J., Khanna, S., and Roth, A. Distributed private heavy hitters. In International Colloquium on Automata, Languages, and Programming, pp. 461 472. Springer, 2012. Kairouz, P., Bonawitz, K., and Ramage, D. Discrete distribution estimation under local privacy. ar Xiv preprint ar Xiv:1602.07387, 2016. Kairouz, P., Mc Mahan, H. B., Avent, B., Bellet, A., Bennis, M., Bhagoji, A. N., Bonawitz, K., Charles, Z., Cormode, G., Cummings, R., D Oliveira, R. G. L., Rouayheb, S. E., Evans, D., Gardner, J., Garrett, Z., Gasc on, A., Ghazi, B., Gibbons, P. B., Gruteser, M., Harchaoui, Z., He, C., He, L., Huo, Z., Hutchinson, B., Hsu, J., Jaggi, M., Javidi, T., Joshi, G., Khodak, M., Koneˇcn y, J., Korolova, A., Koushanfar, F., Koyejo, S., Lepoint, T., Liu, Y., Mittal, P., Mohri, M., Nock, R., Ozg ur, A., Pagh, R., Raykova, M., Qi, H., Ramage, D., Raskar, R., Song, D., Song, W., Stich, S. U., Sun, Z., Suresh, A. T., Tram er, F., Vepakomma, P., Wang, J., Xiong, L., Xu, Z., Yang, Q., Yu, F. X., Yu, H., and Zhao, S. Advances and open problems in federated learning, 2019. Luby, M., Luby, M. G., and Wigderson, A. Pairwise independence and derandomization, volume 4. Now Publishers Inc, 2006. Lyubarskii, Y. and Vershynin, R. Uncertainty principles and vector quantization. Information Theory, IEEE Transactions on, 56(7):3491 3501, 2010. Mayekar, P. and Tyagi, H. Limits on gradient compression for stochastic optimization. In 2020 IEEE International Symposium on Information Theory (ISIT), pp. 2658 2663, 2020. doi: 10.1109/ISIT44484.2020.9174075. Mc Gregor, A., Mironov, I., Pitassi, T., Reingold, O., Talwar, K., and Vadhan, S. The limits of two-party differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pp. 81 90, 2010. Mc Mahan, H. B., Ramage, D., Talwar, K., and Zhang, L. Learning differentially private recurrent language models. In 6th International Conference on Learning Representations, ICLR 2018, 2018. Menezes, A. J., Van Oorschot, P. C., and Vanstone, S. A. Handbook of applied cryptography. CRC press, 2018. Mironov, I. R enyi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pp. 263 275, 2017. doi: 10.1109/CSF.2017.11. Lossless Compression of Efficient Private Local Randomizers Mironov, I., Pandey, O., Reingold, O., and Vadhan, S. Computational differential privacy. In Halevi, S. (ed.), Advances in Cryptology - CRYPTO 2009, pp. 126 142, Berlin, Heidelberg, 2009. Springer Berlin Heidelberg. ISBN 978-3-642-03356-8. Mishra, N. and Sandler, M. Privacy via pseudorandom sketches. In Proceedings of the Twenty-Fifth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 06, pp. 143 152, New York, NY, USA, 2006. Association for Computing Machinery. ISBN 1595933182. doi: 10.1145/ 1142351.1142373. URL https://doi.org/10. 1145/1142351.1142373. Nisan, N. Pseudorandom generators for space-bounded computation. Combinatorica, 12(4):449 461, 1992. Suresh, A. T., Yu, F. X., Kumar, S., and Mc Mahan, H. B. Distributed mean estimation with limited communication. In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 3329 3337, International Convention Centre, Sydney, Australia, 06 11 Aug 2017. PMLR. URL http://proceedings.mlr. press/v70/suresh17a.html. Wang, S., Huang, L., Wang, P., Nie, Y., Xu, H., Yang, W., Li, X.-Y., and Qiao, C. Mutual information optimally local private discrete distribution estimation. ar Xiv preprint ar Xiv:1607.08025, 2016. Wang, T., Blocki, J., Li, N., and Jha, S. Locally differentially private protocols for frequency estimation. In 26th USENIX Security Symposium (USENIX Security 17), pp. 729 745, Vancouver, BC, August 2017. USENIX Association. ISBN 978-1-931971-40-9. URL https://www.usenix.org/conference/ usenixsecurity17/technical-sessions/ presentation/wang-tianhao. Warner, S. L. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63 69, 1965. Ye, M. and Barg, A. Optimal schemes for discrete distribution estimation under locally differential privacy. IEEE Transactions on Information Theory, 64(8):5662 5676, 2018.