# Locally Private Gaussian Estimation

**Matthew Joseph** (University of Pennsylvania, majos@cis.upenn.edu) · **Janardhan Kulkarni** (Microsoft Research Redmond, jakul@microsoft.com) · **Jieming Mao** (Google Research New York, maojm@google.com) · **Zhiwei Steven Wu** (University of Minnesota, zsw@umn.edu)

*A portion of this work was done while at Microsoft Research Redmond. This work was done while at the Warren Center, University of Pennsylvania. A portion of this work was done while at Microsoft Research New York.*

**Abstract.** We study a basic private estimation problem: each of n users draws a single i.i.d. sample from an unknown Gaussian distribution N(µ, σ²), and the goal is to estimate µ while guaranteeing local differential privacy for each user. As minimizing the number of rounds of interaction is important in the local setting, we provide adaptive two-round solutions and nonadaptive one-round solutions to this problem. We match these upper bounds with an information-theoretic lower bound showing that our accuracy guarantees are tight up to logarithmic factors for all sequentially interactive locally private protocols.

## 1 Introduction

Differential privacy is a formal algorithmic guarantee that no single input has a large effect on the output of a computation. Since its introduction [11], a rich line of work has made differential privacy a compelling privacy guarantee (see Dwork et al. [12] and Vadhan [24] for surveys), and deployments of differential privacy now exist at many organizations, including Apple [2], Google [5, 13], Microsoft [8], Mozilla [3], and the US Census Bureau [1, 20]. Much recent attention, including almost all industrial deployments, has focused on a variant called local differential privacy [4, 11, 19]. In the local model, private data is distributed across many users, and each user privatizes their data before it is collected by an analyst. Since any locally differentially private computation runs on already-privatized data, data contributors need not worry about compromised data analysts or insecure communication channels. In contrast, (global) differential privacy assumes that the data analyst has secure, trusted access to the unprivatized data.

However, the stronger privacy guarantees of the local model come at a price: for many problems, a locally private solution requires far more samples than a globally private solution [7, 10, 19, 23]. Here, we study the basic problem of locally private Gaussian estimation: given n users each holding an i.i.d. draw from an unknown Gaussian distribution N(µ, σ²), can an analyst accurately estimate the mean µ while guaranteeing local differential privacy for each user?

On the technical front, locally private Gaussian estimation captures two general challenges in locally private learning. First, since the data is drawn from a Gaussian, there is no a priori (worst-case) bound on the scale of the observations. Naive applications of standard privatization methods like the Laplace and Gaussian mechanisms must add noise proportional to the worst-case scale of the data and are thus infeasible. Second, protocols requiring many rounds of user-analyst interaction are difficult to implement in real-world systems and may incur much longer running times; network latency as well as server and user liveness constraints compound this difficulty [22].
It is therefore desirable to limit the number of rounds of interaction between users and the data analyst. Finally, besides being a fundamental learning problem, Gaussian estimation has several real-world applications (e.g., telemetry data analysis [8]) in which one may assume that user behavior follows a Gaussian distribution.

### 1.1 Our Contributions

We divide our solution to locally private Gaussian estimation into two cases. In the first case, σ is known to the analyst; in the second, σ is unknown but bounded in a known interval [σmin, σmax]. For each case, we provide an (ε, 0)-locally private adaptive two-round protocol and a nonadaptive one-round protocol (as "adaptive" and "nonadaptive" are implicit in "two-round" and "one-round", we often omit these terms). Our privacy guarantees are worst-case; however, when x₁, …, xₙ are drawn i.i.d. from N(µ, σ²), we also obtain the following accuracy guarantees.

**Theorem 1.1** (Informal). When σ is known and n is sufficiently large, there exists a two-round protocol outputting µ̂ such that $|\hat{\mu} - \mu| = O\left(\frac{\sigma\sqrt{\log(1/\beta)}}{\varepsilon\sqrt{n}}\right)$ with probability 1 − β, and there exists a one-round protocol outputting µ̂ such that $|\hat{\mu} - \mu| = O\left(\frac{\sigma\log^{1/4}(n)\sqrt{\log(1/\beta)}}{\varepsilon\sqrt{n}}\right)$ with probability 1 − β.

**Theorem 1.2** (Informal). When σ is unknown but bounded in a known [σmin, σmax], and n is sufficiently large, there exists a two-round protocol outputting µ̂ such that $|\hat{\mu} - \mu| = O\left(\frac{\sigma\sqrt{\log(1/\beta)\log(n)}}{\varepsilon\sqrt{n}}\right)$ with probability 1 − β, and there exists a one-round protocol outputting µ̂ such that $|\hat{\mu} - \mu| = O\left(\frac{\sigma\sqrt{\log\left(\frac{\sigma_{\max}}{\sigma_{\min}}+1\right)\log(1/\beta)\log^{3/2}(n)}}{\varepsilon\sqrt{n}}\right)$ with probability 1 − β.

All of our protocols are sequentially interactive [10]: each user interacts with the protocol at most once. We match these upper bounds with a lower bound showing that our results are tight for all sequentially interactive locally private protocols up to logarithmic factors. We obtain this result by introducing tools from the strong data processing inequality literature [6, 21]. Using subsequent work by Joseph et al. [16], we can also extend this lower bound to fully interactive protocols.

**Theorem 1.3** (Informal). For a given σ, there does not exist an (ε, δ)-locally private protocol A such that, for every µ with $|\mu| = O\left(\frac{\sigma}{(e^\varepsilon - 1)\sqrt{n}}\right)$, given x₁, …, xₙ ~ N(µ, σ²), A outputs an estimate µ̂ satisfying $|\hat{\mu} - \mu| = o\left(\frac{\sigma}{(e^\varepsilon - 1)\sqrt{n}}\right)$ with probability at least 15/16.

### 1.2 Related Work

Several works have studied differentially private versions of statistical tasks, especially in the global setting. Karwa and Vadhan [18] and Kamath et al. [17] consider similar versions of Gaussian estimation under global differential privacy, in the one-dimensional and high-dimensional cases respectively. For both the known- and unknown-variance cases, Karwa and Vadhan [18] offer an $O\left(\sigma\left(\sqrt{\frac{\log(1/\beta)}{n}} + \frac{\mathrm{polylog}(1/\beta)}{\varepsilon n}\right)\right)$ accuracy upper bound for estimating µ. Since an $\Omega\left(\sigma\sqrt{\frac{\log(1/\beta)}{n}}\right)$ accuracy lower bound holds even without privacy, our upper and lower bounds show that local privacy adds a roughly $\sqrt{n}$ accuracy cost over global privacy.
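Suppressing logarithmic factors, this comparison can be summarized as follows; the display below is our paraphrase of the bounds above, not a statement from either paper.

```latex
% Global vs. local accuracy for Gaussian mean estimation (log factors suppressed):
\underbrace{\tilde{O}\!\Big(\frac{\sigma}{\sqrt{n}} + \frac{\sigma}{\varepsilon n}\Big)}_{\text{global privacy [18]}}
\qquad \text{vs.} \qquad
\underbrace{\tilde{O}\!\Big(\frac{\sigma}{\varepsilon \sqrt{n}}\Big)}_{\text{local privacy (this work)}}
% The privacy terms differ by a multiplicative factor of roughly \sqrt{n}.
```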
In concurrent independent work, Gaboardi et al. [14] also study locally private Gaussian estimation. We match or better their accuracy results with much lower round complexity: they provide adaptive protocols for the known- and unknown-σ settings, with the latter protocol having round complexity T as large as Ω(n), linear in the number of users, whereas we provide both adaptive and nonadaptive solutions whose round complexity is always T ≤ 2. A full comparison appears in Figure 1. Gaboardi et al. [14] also prove a tight lower bound for nonadaptive protocols that can be extended to sequentially interactive protocols. We provide a lower bound that is tight for sequentially interactive protocols up to logarithmic factors, and we depart from previous local privacy lower bounds by introducing tools from the strong data processing inequality (SDPI) literature [6, 21]. This approach uses an SDPI to control how much information a sample gives about its generating distribution, then uses existing local privacy results to bound the mutual information between a sample and the privatized output computed from that sample. Subsequent work by Duchi and Rogers [9] generalizes the SDPI framework to prove lower bounds for a broader class of problems in local privacy; they also extend the SDPI framework to prove lower bounds for fully interactive algorithms.

**Figure 1:** A comparison of upper bounds in Gaboardi et al. [14] and here: accuracy α and round complexity T in four settings (known or unknown σ, adaptive or nonadaptive). Our entries are the bounds of Theorems 3.1, 3.2, 4.1, and 4.2, with T ≤ 2 throughout; the unknown-σ adaptive protocol of Gaboardi et al. [14] has T = Ω(log(R/σmin)). In all cases, Gaboardi et al. [14] use (ε, δ)-locally private algorithms and we use (ε, 0). Here, R denotes an upper bound on both µ and σ. In our setting, the upper bound on µ is O(2^{nε²/log(n/β)}), leading the unknown-variance protocol of Gaboardi et al. [14] to round complexity potentially as large as Ω(nε²/log(1/β)).

## 2 Preliminaries

We consider a setting where, for each i ∈ [n], user i's datum is a single draw from an unknown Gaussian distribution, xᵢ ~ N(µ, σ²), and these draws are i.i.d. In our communication protocol, users may exchange messages over public channels with a single (possibly untrusted) central analyst. (The notion of a central coordinating analyst is only a useful simplification: as the analyst has no special powers or privileges, any user, or the protocol itself, can be viewed as playing the same role.) The analyst's task is to accurately estimate µ while guaranteeing local differential privacy for each user.

To minimize interaction with any single user, we restrict our attention to sequentially interactive protocols, in which every user sends at most a single message over the entire protocol. We also study the round complexity of these interactive protocols. Formally, one round of interaction in a protocol consists of the following two steps: 1) the analyst selects a subset of users S ⊆ [n], along with a set of randomizers {Qᵢ | i ∈ S}, and 2) each user i ∈ S publishes a message yᵢ = Qᵢ(xᵢ).

A randomized algorithm is differentially private if arbitrarily changing a single input does not change the output distribution "too much"; this preserves privacy because the output distribution is insensitive to any change of a single user's data. We study a stronger privacy guarantee called local differential privacy. In the local model, each user i computes their message using a local randomizer, a differentially private algorithm taking a single-element database as input.

**Definition 2.1** (Local Randomizer). A randomized function Qᵢ: X → Y is an (ε, δ)-local randomizer if, for every pair of observations xᵢ, xᵢ′ ∈ X and any S ⊆ Y, Pr[Qᵢ(xᵢ) ∈ S] ≤ e^ε · Pr[Qᵢ(xᵢ′) ∈ S] + δ.

A protocol is locally private if every user computes their message using a local randomizer. In a sequentially interactive protocol, the local randomizer for user i may be chosen adaptively based on the previous messages y₁, …, yᵢ₋₁.
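As a concrete illustration of Definition 2.1, the following is a minimal sketch (ours, not from the paper) of the classic binary randomized-response local randomizer. It is an (ε, 0)-local randomizer because, for either input bit, each output has probability in [1/(e^ε + 1), e^ε/(e^ε + 1)], so the two output distributions differ pointwise by a factor of at most e^ε.

```python
import math
import random

def binary_randomized_response(bit: int, epsilon: float) -> int:
    """(epsilon, 0)-local randomizer for a single bit: report the true bit
    with probability e^eps / (e^eps + 1), otherwise report its flip."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if random.random() < p_truth else 1 - bit
```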
The choice of randomizer, however, cannot be based on user i's own data.

**Definition 2.2.** A sequentially interactive protocol A is (ε, δ)-locally private for private user data {x₁, …, xₙ} if, for every user i ∈ [n], the message yᵢ is computed using an (ε, δ)-local randomizer Qᵢ. When δ > 0, we say A is approximately locally private; if δ = 0, A is purely locally private.

## 3 Estimating µ with Known σ

We begin with the case where σ² is known (shorthanded "KV"). In Section 3.1, we provide a protocol KVGAUSSTIMATE that requires two rounds of analyst-user interaction. In Section 3.2, we provide a protocol 1ROUNDKVGAUSSTIMATE achieving a slightly weaker accuracy guarantee in a single round. All omitted pseudocode and proofs appear in the full version of this paper [15].

### 3.1 Two-round Protocol KVGAUSSTIMATE

In KVGAUSSTIMATE the users are split into halves U₁ and U₂. In round one, the analyst queries the users in U₁ to obtain an O(σ)-accurate estimate µ̂₁ of µ. In round two, the analyst passes µ̂₁ to the users in U₂, who respond based on µ̂₁ and their own data. The analyst then aggregates this second set of responses into a better final estimate of µ.

**Theorem 3.1.** The two-round protocol KVGAUSSTIMATE satisfies (ε, 0)-local differential privacy for x₁, …, xₙ and, if x₁, …, xₙ are drawn i.i.d. from N(µ, σ²) where σ is known and $\frac{n}{\log(n)} = \Omega\left(\frac{\log(\mu)\log(1/\beta)}{\varepsilon^2}\right)$, with probability 1 − β outputs µ̂ such that $|\hat{\mu} - \mu| = O\left(\frac{\sigma\sqrt{\log(1/\beta)}}{\varepsilon\sqrt{n}}\right)$.

```
Algorithm 1: KVGAUSSTIMATE
Input: ε, k, L, n, σ, U₁, U₂
 1: for j ∈ L do
 2:     for user i ∈ U₁ʲ do
 3:         User i outputs yᵢ ← RR1(ε, i, j)
 4:     end for
 5: end for                                          ▷ End of round 1
 6: Analyst computes Ĥ₁ ← KVAGG1(ε, k, L, U₁)
 7: Analyst computes µ̂₁ ← ESTMEAN(β, ε, Ĥ₁, k, L)
 8: for user i ∈ U₂ do
 9:     User i outputs yᵢ ← KVRR2(ε, i, µ̂₁, σ)
10: end for                                          ▷ End of round 2
11: Analyst computes Ĥ₂ ← KVAGG2(ε, n/2, U₂)
12: Analyst computes T̂ ← √2 · erf⁻¹(2(Ĥ₂(1) − Ĥ₂(−1))/n)
13: Analyst outputs µ̂₂ ← σT̂ + µ̂₁
Output: Analyst estimate µ̂₂ of µ
```

#### 3.1.1 First round of KVGAUSSTIMATE

For neatness, let L = ⌊n/(2k)⌋, Lmin = ⌈log(σ)⌉, Lmax = Lmin − 1 + L, and L = {Lmin, Lmin + 1, …, Lmax}. U₁ is split into L subgroups indexed by L, each of size k = Ω(log(n/β)/ε²). KVGAUSSTIMATE begins by iterating through each subgroup j ∈ L. Each user i ∈ U₁ʲ releases a privatized version of ⌊xᵢ/2ʲ⌋ mod 4 via randomized response (RR1): with probability e^ε/(e^ε + 3), user i outputs ⌊xᵢ/2ʲ⌋ mod 4, and otherwise outputs one of the remaining elements of {0, 1, 2, 3} uniformly at random. Responses from subgroup U₁ʲ will be used to estimate the jth least significant bit of µ (rounded to an integer). The analyst then uses KVAGG1 ("Known Variance Aggregation") to aggregate the responses and debias them to account for this randomness.

```
Algorithm 2: KVAGG1
Input: ε, k, L, U
1: for j ∈ L do
2:     for a ∈ {0, 1, 2, 3} do
3:         Cⱼ(a) ← |{yᵢ | i ∈ Uʲ, yᵢ = a}|
4:         Ĥⱼ(a) ← ((e^ε + 3)/(e^ε − 1)) · (Cⱼ(a) − k/(e^ε + 3))
5:     end for
6: end for
7: Output Ĥ
Output: Aggregated histogram Ĥ of private user responses
```
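The following minimal sketch shows the RR1 response and the KVAGG1 debiasing for a single subgroup in plain Python; the function names and this rendering are ours, and the protocol's grouping and search logic is omitted. The debiasing makes each histogram bin an unbiased estimate of the true count.

```python
import math
import random

def rr1(x: float, j: int, epsilon: float) -> int:
    """Sketch of RR1: randomized response on floor(x / 2^j) mod 4.
    The true value is reported with probability e^eps / (e^eps + 3);
    otherwise one of the three other values is reported uniformly."""
    true_val = int(math.floor(x / 2 ** j)) % 4
    if random.random() < math.exp(epsilon) / (math.exp(epsilon) + 3.0):
        return true_val
    return random.choice([a for a in range(4) if a != true_val])

def kvagg1_debias(responses: list[int], epsilon: float) -> dict[int, float]:
    """Sketch of the KVAGG1 debiasing step for one subgroup: for each bin a,
    the output H[a] has expectation equal to the number of users whose
    true value is a."""
    k = len(responses)
    e_eps = math.exp(epsilon)
    counts = {a: sum(1 for y in responses if y == a) for a in range(4)}
    return {a: (e_eps + 3.0) / (e_eps - 1.0) * (counts[a] - k / (e_eps + 3.0))
            for a in range(4)}
```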
The result of this aggregation is a collection of histograms Ĥ₁. The analyst uses Ĥ₁ in ESTMEAN to binary search for µ. Intuitively, for each subgroup U₁ʲ, if all multiples of 2ʲ are far from µ, then Gaussian concentration implies that almost all users i ∈ U₁ʲ compute the same value of ⌊xᵢ/2ʲ⌋ mod 4. This produces a histogram Ĥ₁ʲ in which most of the mass concentrates in a single bin, and the analyst in turn narrows their search range for µ. For example, if Ĥ₁^{Lmax} concentrates in 0, then the range narrows to µ ∈ [0, 2^{Lmax}); if Ĥ₁^{Lmax−1} then concentrates in 1, the range narrows to µ ∈ [2^{Lmax−1}, 2^{Lmax}); and so on. If instead some multiple of 2ʲ is near µ, the mass of Ĥ₁ʲ will spread over multiple (adjacent) bins. This is also useful: a point from the middle of this block of bins is O(σ)-close to µ, so the analyst takes such a point as µ̂₁ and ends their search. Our analysis also relies on having a noticeably low-count bin that is non-adjacent to the bin containing µ, which motivates using 4 as the modulus.

In this way, the analyst examines Ĥ₁^{Lmax}, Ĥ₁^{Lmax−1}, … in sequence, estimating µ from most to least significant bit. Crucially, the modulus structure of the user responses enables the analyst to carry out this binary search with one round of interaction. Thus at the end of the first round the analyst obtains an O(σ)-accurate estimate µ̂₁ of µ.

```
Algorithm 3: ESTMEAN
Input: β, ε, Ĥ₁, k, L
 1: Analyst sets concentration threshold ψ          ▷ set as in the full version [15]
 2: j ← Lmax
 3: Iⱼ ← [0, 2^{Lmax}]
 4: while j ≥ Lmin and max_{a ∈ {0,1,2,3}} Ĥ₁ʲ(a) ≥ 0.52k + ψ do
 5:     M₁(j) ← argmax_{a ∈ {0,1,2,3}} Ĥ₁ʲ(a)
 6:     Analyst computes integer c such that c·2ʲ ∈ Iⱼ and c ≡ M₁(j) mod 4
 7:     Iⱼ₋₁ ← [c·2ʲ, (c + 1)·2ʲ]
 8:     j ← j − 1
 9: end while
10: j ← max(j, Lmin)
11: Analyst computes M₁(j) ← argmax_{a ∈ {0,1,2,3}} Ĥ₁ʲ(a)
12: Analyst computes M₂(j) ← argmax_{a ∈ {0,1,2,3} \ {M₁(j)}} Ĥ₁ʲ(a)
13: Analyst computes the maximum integer c′ such that c′·2ʲ ∈ Iⱼ and c′ ≡ M₁(j) or M₂(j) mod 4
14: Analyst outputs µ̂₁ ← c′·2ʲ
Output: Initial estimate µ̂₁ of µ
```

#### 3.1.2 Second round of KVGAUSSTIMATE

In the second round, the analyst passes µ̂₁ to the users in U₂. Users respond through KVRR2 ("Known Variance Randomized Response"), a privatized version of an algorithm from the distributed statistical estimation literature [6]. In KVRR2, each user centers their point using µ̂₁, standardizes it using σ, and applies randomized response to sgn((xᵢ − µ̂₁)/σ). This step crucially relies on the first-round estimate µ̂₁, as proper centering requires an initial O(σ)-accurate estimate of µ. The analyst then aggregates these responses through a debiasing process KVAGG2 akin to KVAGG1.

```
Algorithm 4: KVRR2
Input: ε, i, µ̂₁, σ
1: User i computes x′ᵢ ← (xᵢ − µ̂₁)/σ
2: User i computes yᵢ ← sgn(x′ᵢ)
3: User i draws c ~ U[0, 1]
4: if c ≤ e^ε/(e^ε + 1) then
5:     User i publishes ŷᵢ ← yᵢ
6: else
7:     User i publishes ŷᵢ ← −yᵢ
8: end if
Output: Private centered user response ŷᵢ
```

```
Algorithm 5: KVAGG2
Input: ε, k, U
1: for a ∈ {−1, 1} do
2:     C(a) ← |{ŷᵢ | i ∈ U, ŷᵢ = a}|
3:     Ĥ(a) ← ((e^ε + 1)/(e^ε − 1)) · (C(a) − k/(e^ε + 1))
4: end for
5: Analyst outputs Ĥ
Output: Aggregated histogram Ĥ of private user responses
```

From this aggregation Ĥ₂, the analyst obtains a good estimate of the bias of the initial estimate µ̂₁: if µ̂₁ < µ, responses skew toward 1, and if µ̂₁ > µ, responses skew toward −1. By comparing this skew to the standard Gaussian CDF through the error function erf, the analyst recovers a better final estimate µ̂₂ of µ (Lines 12-13 of KVGAUSSTIMATE). Privacy of KVGAUSSTIMATE follows from the privacy of the randomized response mechanisms RR1 and KVRR2.
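The following sketch (ours) traces Lines 11-13 of KVGAUSSTIMATE: debias the ±1 counts as in KVAGG2, read off an estimate of p = Pr[xᵢ ≥ µ̂₁] = Φ((µ − µ̂₁)/σ), and invert the standard Gaussian CDF, using the identity Φ⁻¹(p) = √2 · erf⁻¹(2p − 1).

```python
import math
import random
from statistics import NormalDist

def kvrr2(x: float, mu1: float, sigma: float, epsilon: float) -> int:
    """Sketch of KVRR2: randomized response on the sign of the
    centered, standardized sample (x - mu1) / sigma."""
    y = 1 if (x - mu1) / sigma >= 0 else -1
    keep = random.random() < math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return y if keep else -y

def recover_mean(responses: list[int], mu1: float, sigma: float,
                 epsilon: float) -> float:
    """Debias the +1 count as in KVAGG2, estimate p = Pr[x >= mu1],
    and use p = Phi((mu - mu1)/sigma) to solve for mu."""
    k = len(responses)
    e_eps = math.exp(epsilon)
    c_plus = sum(1 for y in responses if y == 1)
    h_plus = (e_eps + 1.0) / (e_eps - 1.0) * (c_plus - k / (e_eps + 1.0))
    p_hat = min(max(h_plus / k, 1e-6), 1.0 - 1e-6)  # clamp away from {0, 1}
    return mu1 + sigma * NormalDist().inv_cdf(p_hat)
```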
### 3.2 One-round Protocol 1ROUNDKVGAUSSTIMATE

Recall that in KVGAUSSTIMATE the analyst 1) employs user pool U₁ to compute a rough estimate µ̂₁ and 2) adaptively refines this estimate using responses from the second user pool U₂. 1ROUNDKVGAUSSTIMATE executes these two rounds of KVGAUSSTIMATE simultaneously by parallelization.

**Theorem 3.2.** The one-round protocol 1ROUNDKVGAUSSTIMATE satisfies (ε, 0)-local differential privacy for x₁, …, xₙ and, if x₁, …, xₙ are drawn i.i.d. from N(µ, σ²) where σ is known and $\frac{n}{\log(n)} = \Omega\left(\frac{\log(\mu)\log(1/\beta)}{\varepsilon^2}\right)$, with probability 1 − β outputs µ̂ such that $|\hat{\mu} - \mu| = O\left(\frac{\sigma\log^{1/4}(n)\sqrt{\log(1/\beta)}}{\varepsilon\sqrt{n}}\right)$.

1ROUNDKVGAUSSTIMATE splits U₂ into Θ(√log(n)) subgroups that run the second-round protocol from KVGAUSSTIMATE with different values of µ̂₁. Intuitively, it suffices that at least one subgroup centers using a µ̂₁ near µ: the analyst can then use the data from that subgroup and discard the rest. By Gaussian concentration, most user samples cluster within O(σ√log(n)) of µ, so each subgroup U₂ʲ receives a set of candidate centering points S(j) spaced Θ(σ√log(n)) apart on the real line, and each user i ∈ U₂ʲ centers using the point in S(j) closest to xᵢ. This leads us to use Θ(√log(n)) subgroups, with each point in S(j + 1) shifted by Θ(σ) from the corresponding point in S(j). By doing so, we ensure that some subgroup has most of its users center using a point within O(σ) of µ.

In summary, 1ROUNDKVGAUSSTIMATE works as follows: after collecting the single round of responses from U₁ and U₂, the analyst computes µ̂₁ using the responses from U₁. By comparing µ̂₁ and S(j) for each j, the analyst then selects the subgroup U₂^{j*} whose users centered using the value in S(j*) closest to µ̂₁. This mimics the effect of adaptively passing µ̂₁ to the users in U₂^{j*}, so the analyst simply processes the responses from U₂^{j*} as it processed the responses from U₂ in KVGAUSSTIMATE. Because U₂^{j*} contains only Θ(n/√log(n)) users, the cost is a log^{1/4}(n) factor in accuracy.
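A minimal sketch of this grid construction and selection step appears below; the function names and the specific constants are illustrative choices of ours, not the paper's.

```python
import math

def centering_grids(sigma: float, n: int, lo: float, hi: float) -> list[list[float]]:
    """Sketch of the one-round parallelization: ~sqrt(log n) grids of
    candidate centers over [lo, hi], each spaced ~sigma*sqrt(log n) apart,
    with grid j+1 shifted ~sigma from grid j. Subgroup j of U_2 centers
    on its nearest point in grids[j]."""
    log_n = math.log(max(n, 2))
    num_grids = max(1, round(math.sqrt(log_n)))
    spacing = sigma * math.sqrt(log_n)
    grids = []
    for j in range(num_grids):
        shift = j * spacing / num_grids  # consecutive grids offset by ~sigma
        pts = [lo + shift + m * spacing
               for m in range(int((hi - lo) / spacing) + 1)]
        grids.append(pts)
    return grids

def pick_subgroup(grids: list[list[float]], mu1: float) -> int:
    """Select the subgroup whose grid contains the point closest to the
    round-one estimate mu1; only its responses are then aggregated."""
    return min(range(len(grids)),
               key=lambda j: min(abs(p - mu1) for p in grids[j]))
```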
## 4 Unknown Variance

In this section, we consider the more general problem in which the variance σ² is unknown (shorthanded "UV") but lies in a known interval [σmin, σmax]. We again provide a two-round protocol, UVGAUSSTIMATE, and a slightly less accurate one-round protocol, 1ROUNDUVGAUSSTIMATE.

### 4.1 Two-round Protocol UVGAUSSTIMATE

UVGAUSSTIMATE is structurally similar to KVGAUSSTIMATE: in round one, the analyst uses the responses of half of the users to roughly estimate µ, and in round two the analyst passes this estimate to the second half of the users for refinement. However, two key differences now arise. First, since σ is unknown, the analyst must also estimate σ in round one. Second, since the analyst does not have a very accurate estimate of σ, the refinement process of the second round employs Laplace noise rather than the CDF comparison used in KVGAUSSTIMATE.

**Theorem 4.1.** The two-round protocol UVGAUSSTIMATE satisfies (ε, 0)-local differential privacy for x₁, …, xₙ and, if x₁, …, xₙ are drawn i.i.d. from N(µ, σ²) where σ is unknown but bounded in a known [σmin, σmax] and $\frac{n}{\log(n)} = \Omega\left(\frac{\left[\log\left(\frac{\sigma_{\max}}{\sigma_{\min}}+1\right)+\log(\mu)\right]\log(1/\beta)}{\varepsilon^2}\right)$, with probability at least 1 − β outputs µ̂ such that $|\hat{\mu} - \mu| = O\left(\frac{\sigma\sqrt{\log(1/\beta)\log(n)}}{\varepsilon\sqrt{n}}\right)$.

```
Algorithm 6: UVGAUSSTIMATE
Input: ε, k₁, L₁, n, σmin, σmax, U₁, U₂
 1: for j ∈ L₁ do
 2:     for user i ∈ U₁ʲ do
 3:         User i outputs yᵢ ← RR1(ε, i, j)
 4:     end for
 5: end for                                          ▷ End of round 1
 6: Analyst computes Ĥ₁ ← AGG1(ε, L₁, U₁)
 7: Analyst computes σ̂ ← ESTVAR(β, ε, Ĥ₁, k₁, L₁)
 8: Analyst computes Ĥ₂ ← KVAGG1(ε, k₁, L₁, U₁)
 9: Analyst computes µ̂₁ ← ESTMEAN(β, ε, Ĥ₂, k₁, L₁)
10: Analyst computes I ← [µ̂₁ ± σ̂(2 + √ln(4n))]
11: for user i ∈ U₂ do
12:     User i outputs yᵢ ← UVRR2(ε, i, I)
13: end for                                          ▷ End of round 2
14: Analyst outputs µ̂₂ ← (2/n) Σ_{i ∈ U₂} yᵢ
Output: Analyst estimate µ̂₂ of µ
```

#### 4.1.1 First round of UVGAUSSTIMATE

Similarly to KVGAUSSTIMATE, we split U₁ into L₁ = ⌊n/(2k₁)⌋ subgroups of size k₁ = Ω(log(n/β)/ε²) and define Lmin = ⌈log(σmin)⌉, Lmax = L₁ + Lmin − 1 ≥ ⌈log(σmax)⌉, and L₁ = {Lmin, Lmin + 1, …, Lmax}, indexing U₁ by L₁. Also as in KVGAUSSTIMATE, each user i in each subgroup U₁ʲ publishes a privatized version of ⌊xᵢ/2ʲ⌋ mod 4. The analyst aggregates these responses (KVAGG1) into Ĥ₂ and roughly estimates µ (ESTMEAN) as in KVGAUSSTIMATE. However, the analyst also employs a similar aggregation (AGG1) into Ĥ₁ for estimating σ (ESTVAR).

At a high level, because samples from N(µ, σ²) fall within 3σ of µ with high probability, when 2ʲ ≫ σ there exist adjacent values a, a + 1 mod 4 ∈ {0, 1, 2, 3} such that almost all users i have ⌊xᵢ/2ʲ⌋ mod 4 ∈ {a, a + 1}. The analyst's debiased aggregated histogram Ĥ₁ʲ thus concentrates in at most two adjacent bins when 2ʲ ≫ σ and spreads over more bins when 2ʲ ≪ σ. By a process similar to ESTMEAN, examining this transition from concentrated to unconcentrated across Ĥ₁^{Lmax}, Ĥ₁^{Lmax−1}, … yields a rough estimate of the scale j at which 2ʲ crosses from above σ to below σ. As a result, at the end of round one the analyst obtains O(σ)-accurate estimates σ̂ of σ and µ̂₁ of µ.

#### 4.1.2 Second round of UVGAUSSTIMATE

The analyst now refines their initial estimate of µ. First, the analyst constructs an interval I of size O(σ̂√log(n)) around µ̂₁. Users in U₂ then truncate their values to I, add Laplace noise scaled to |I| (the sensitivity of releasing a truncated point), and send the result to the analyst using UVRR2. The analyst then simply takes the mean of these responses as the final estimate of µ. The accuracy guarantee follows from the concentration of user samples around µ and of the Laplace noise around 0; privacy follows from our use of randomized response and Laplace noise.

We briefly explain our use of Laplace noise rather than CDF comparison. Roughly, when an estimate σ̂ is used in the centering process, error in σ̂ propagates to error in the final estimate µ̂₂. This leads us to Laplace noise, which better handles the error that estimating σ introduces; the cost is the √log(n) factor that arises from adding Laplace noise scaled to |I|. Our choice of I, constructed to contain not only µ but also the points of Ω(n) users, thus strikes a deliberate balance: I is both large enough to cover most users (who would otherwise truncate too much and skew the responses) and small enough to avoid introducing much noise from privacy (as the noise is drawn from Lap(|I|/ε)).
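A minimal sketch of this truncate-then-Laplace response and its aggregation follows; the function names and the inverse-CDF Laplace sampler are ours.

```python
import math
import random

def sample_laplace(scale: float) -> float:
    """Draw Lap(scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def uvrr2(x: float, interval: tuple[float, float], epsilon: float) -> float:
    """Sketch of UVRR2: truncate the sample to I = [lo, hi], then add
    Laplace noise scaled to |I| / epsilon, the sensitivity of releasing
    a truncated point."""
    lo, hi = interval
    truncated = min(max(x, lo), hi)
    return truncated + sample_laplace((hi - lo) / epsilon)

def uv_estimate(responses: list[float]) -> float:
    """Analyst's final estimate: the mean of the noisy truncated responses."""
    return sum(responses) / len(responses)
```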
### 4.2 One-round Protocol 1ROUNDUVGAUSSTIMATE

We now provide a one-round version of UVGAUSSTIMATE, 1ROUNDUVGAUSSTIMATE.

**Theorem 4.2.** The one-round protocol 1ROUNDUVGAUSSTIMATE satisfies (ε, 0)-local differential privacy for x₁, …, xₙ and, if x₁, …, xₙ are drawn i.i.d. from N(µ, σ²) where σ is unknown but bounded in a known [σmin, σmax] and $\frac{n}{\log(n)} = \Omega\left(\frac{\left[\log\left(\frac{\sigma_{\max}}{\sigma_{\min}}+1\right)+\log(\mu)\right]\log(1/\beta)}{\varepsilon^2}\right)$, with probability at least 1 − β outputs µ̂ such that $|\hat{\mu} - \mu| = O\left(\frac{\sigma\sqrt{\log\left(\frac{\sigma_{\max}}{\sigma_{\min}}+1\right)\log(1/\beta)\log^{3/2}(n)}}{\varepsilon\sqrt{n}}\right)$.

Like 1ROUNDKVGAUSSTIMATE, 1ROUNDUVGAUSSTIMATE simulates the second round of UVGAUSSTIMATE simultaneously with its first round. 1ROUNDUVGAUSSTIMATE splits U₂ into subgroups, where each subgroup responds using a different interval Iⱼ. At the end of the single round, the analyst obtains estimates µ̂₁ and σ̂ from the users in U₁, constructs an interval I* from these estimates, and finds a subgroup of U₂ in which most users employed an interval Iⱼ similar to I*. This similarity guarantees that the subgroup's responses yield the same accuracy as in the two-round case, up to a factor growing with the number of subgroups. As in 1ROUNDKVGAUSSTIMATE, we rely on Gaussian concentration and the modulus trick to minimize the number of subgroups; however, this time we parallelize not only over possible values of µ̂₁ but over possible values of σ̂ as well. As this parallelization is somewhat involved, we defer its presentation to the full version of this paper [15].

In summary, at the end of the round the analyst computes µ̂₁ and σ̂, computes the resulting interval I*, and identifies a subgroup of U₂ that responded using an interval Iⱼ similar to I*. This mimics the effect of passing an interval of size O(σ√log(n)) around µ̂₁ to this subgroup and using the truncate-then-Laplace method of UVGAUSSTIMATE. The cost, due to the g = O([log(σmax/σmin) + 1] · √log(n)) subgroups required, is the √g reduction in accuracy shown in Theorem 4.2.

## 5 Lower Bound

We now show that all of our upper bounds are tight up to logarithmic factors. Our argument has three steps: we first reduce our estimation problem to a testing problem, then reduce this testing problem to a purely locally private testing problem, and finally prove a lower bound for this purely locally private testing problem. Taken together, these results show that estimation is hard for sequentially interactive (ε, δ)-locally private protocols. An extension to fully interactive protocols using recent subsequent work by Joseph et al. [16] appears in the full version of this paper [15].

**Theorem 5.1.** Let $\delta < \min\left(\frac{\varepsilon\beta}{60n\ln(5n/2\beta)}, \frac{\beta}{16n\ln(n/\beta)e^{7\varepsilon}}\right)$ and ε > 0. There exists an absolute constant c such that if A is an (ε, δ)-locally private (α, β)-estimator for Estimate(n, M, σ) where $M = \frac{\sigma}{2c(e^\varepsilon - 1)\sqrt{n}}$ and β < 1/16, then $\alpha \geq M/2 = \Omega\left(\frac{\sigma}{(e^\varepsilon - 1)\sqrt{n}}\right)$.
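The first step of this argument, the reduction from estimation to testing, is a standard two-point reduction and can be sketched as follows; this is our rendering, and the formal version appears in the full paper [15].

```latex
% If A is an (\alpha,\beta)-estimator with \alpha < M/2, then the tester that
% outputs ``\mu = M'' exactly when \hat\mu > M/2 distinguishes \mu = 0 from
% \mu = M, since under either mean
\Pr[\text{tester errs}] \;\le\; \Pr\big[|\hat\mu - \mu| \ge M/2\big]
  \;\le\; \Pr\big[|\hat\mu - \mu| > \alpha\big] \;\le\; \beta < 1/16.
% A lower bound showing that no locally private tester succeeds with
% probability 15/16 at separation M therefore forces \alpha \ge M/2.
```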
## References

[1] John M. Abowd. The challenge of scientific reproducibility and privacy protection for statistical agencies. Technical report, Census Scientific Advisory Committee, 2016.

[2] Apple Differential Privacy Team. Learning with privacy at scale. Technical report, Apple, 2017.

[3] Brendan Avent, Aleksandra Korolova, David Zeber, Torgeir Hovden, and Benjamin Livshits. BLENDER: Enabling local search with a hybrid differential privacy model. In USENIX Security Symposium, 2017.

[4] Amos Beimel, Kobbi Nissim, and Eran Omri. Distributed private data analysis: Simultaneously solving how and what. In International Cryptology Conference (CRYPTO), 2008.

[5] Andrea Bittau, Úlfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard Seefeld. Prochlo: Strong privacy for analytics in the crowd. In Symposium on Operating Systems Principles (SOSP), 2017.

[6] Mark Braverman, Ankit Garg, Tengyu Ma, Huy L. Nguyen, and David P. Woodruff. Communication lower bounds for statistical estimation problems via a distributed data processing inequality. In Symposium on the Theory of Computing (STOC), 2016.

[7] Amit Daniely and Vitaly Feldman. Learning without interaction requires separation. In Neural Information Processing Systems (NeurIPS), 2019.

[8] Bolin Ding, Janardhan Kulkarni, and Sergey Yekhanin. Collecting telemetry data privately. In Neural Information Processing Systems (NIPS), 2017.

[9] John Duchi and Ryan Rogers. Lower bounds for locally private estimation via communication complexity. In Conference on Learning Theory (COLT), 2019.

[10] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. Local privacy and statistical minimax rates. In Foundations of Computer Science (FOCS), 2013.

[11] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference (TCC), 2006.

[12] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 2014.

[13] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Conference on Computer and Communications Security (CCS), 2014.

[14] Marco Gaboardi, Ryan Rogers, and Or Sheffet. Locally private mean estimation: Z-test and tight confidence intervals. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2019.

[15] Matthew Joseph, Janardhan Kulkarni, Jieming Mao, and Zhiwei Steven Wu. Locally private Gaussian estimation. arXiv preprint arXiv:1811.08382, 2019.

[16] Matthew Joseph, Jieming Mao, Seth Neel, and Aaron Roth. The role of interactivity in local differential privacy. In Foundations of Computer Science (FOCS), 2019.

[17] Gautam Kamath, Jerry Li, Vikrant Singhal, and Jonathan Ullman. Privately learning high-dimensional distributions. In Conference on Learning Theory (COLT), 2019.

[18] Vishesh Karwa and Salil Vadhan. Finite sample differentially private confidence intervals. In Innovations in Theoretical Computer Science Conference (ITCS), 2018.

[19] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. What can we learn privately? SIAM Journal on Computing, 2011.

[20] Yu-Hsuan Kuo, Cho-Chun Chiu, Daniel Kifer, Michael Hay, and Ashwin Machanavajjhala. Differentially private hierarchical count-of-counts histograms. In International Conference on Very Large Databases (VLDB), 2018.

[21] Maxim Raginsky. Strong data processing inequalities and φ-Sobolev inequalities for discrete channels. IEEE Transactions on Information Theory, 62(6):3355-3389, 2016.

[22] Adam Smith, Abhradeep Thakurta, and Jalaj Upadhyay. Is interaction necessary for distributed private learning? In Symposium on Security and Privacy (SP), 2017.

[23] Jonathan Ullman. Tight lower bounds for locally differentially private selection. arXiv preprint arXiv:1802.02638, 2018.

[24] Salil Vadhan. The complexity of differential privacy. In Tutorials on the Foundations of Cryptography, pages 347-450. Springer, 2017.