Context-Aware Local Differential Privacy

Jayadev Acharya 1   K. A. Bonawitz 2   Peter Kairouz 2   Daniel Ramage 2   Ziteng Sun 1 3

1 ECE, Cornell University, Ithaca, New York, USA. 2 Google, Seattle, USA. 3 JA and ZS are supported by NSF-CCF-1846300 (CAREER), NSF-CRII-1657471, and a Google Faculty Research Award. Work partially done while ZS was visiting Google. Correspondence to: Jayadev Acharya, K. A. Bonawitz, Peter Kairouz, Daniel Ramage, Ziteng Sun.

Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, PMLR 119, 2020. Copyright 2020 by the author(s).

Abstract

Local differential privacy (LDP) is a strong notion of privacy that often leads to a significant drop in utility. The original definition of LDP assumes that all the elements in the data domain are equally sensitive. However, in many real-life applications, some elements are more sensitive than others. We propose a context-aware framework for LDP that allows the privacy level to vary across the data domain, enabling system designers to place privacy constraints where they matter without paying the cost where they do not. For binary data domains, we provide a universally optimal privatization scheme and highlight its connections to Warner's randomized response and Mangat's improved response. Motivated by geo-location and web search applications, for k-ary data domains, we consider two special cases of context-aware LDP: block-structured LDP and high-low LDP. We study minimax discrete distribution estimation under both cases and provide communication-efficient, sample-optimal schemes and information-theoretic lower bounds. We show, using worst-case analyses and experiments on Gowalla's 3.6 million check-ins to 43,750 locations, that context-aware LDP achieves far better accuracy under the same number of samples.

1. Introduction

Differential privacy (DP) is a rigorous notion of privacy that enforces a worst-case bound on the privacy loss due to the release of query results (Dwork et al., 2006). Its local version, local differential privacy (LDP) (Definition 1), provides context-free privacy guarantees even in the absence of a trusted data collector (Warner, 1965; Beimel et al., 2008; Kasiviswanathan et al., 2011). Under LDP, all pairs of elements in a data domain are assumed to be equally sensitive, leading to harsh privacy-utility trade-offs in many learning applications. In many real-life settings, however, some elements are more sensitive than others. For example, in URL applications, users may want to differentiate sensitive URLs from non-sensitive ones, and in geo-location applications, users may want to hide their precise location within a city, but not the city itself. This work introduces a unifying context-aware notion of LDP where different pairs of domain elements can have arbitrary sensitivity levels. For binary domains, we provide a universal optimality result and highlight interesting connections to Warner's randomized response and Mangat's improved response. For k-ary domains, we look at two canonical examples of context-aware LDP: block-structured LDP and high-low LDP. For block-structured LDP, the domain is partitioned into m blocks and the goal is to hide the identity of elements within the same block but not the block identity.
This is motivated by geo-location applications where users can be in different cities, and it is not essential to hide the city of a person but rather where exactly within that city a person is (i.e., which bars, restaurants, or businesses they visit). In other words, users would like to hide which element within a given block of the domain their data belongs to, and not necessarily which block their data is in, which may be known from side information or application context. For high-low LDP, we assume there is a set of sensitive elements, and we only require the information about possessing sensitive elements to be protected. This can be applied in web browsing services, where there are a large number of URLs and not all of them contain sensitive information.

1.1. Related work

Recent works consider relaxations of DP (Chatzikokolakis et al., 2013; Doudalis et al., 2017; Borgs et al., 2018; Asi et al., 2019; Feyisetan et al., 2019; Dong et al., 2019) and LDP (Chatzikokolakis et al., 2017; Alvim et al., 2018; Murakami & Kawamoto, 2018; Kawamoto & Murakami, 2019; Geumlek & Chaudhuri, 2019) that incorporate the context and structure of the underlying data domain. Indeed, (Doudalis et al., 2017; Murakami & Kawamoto, 2018) investigate settings where a small fraction of the domain elements are secrets and require that the adversary's knowledge about whether or not a user holds a secret cannot increase much upon the release of data. (Chatzikokolakis et al., 2013; Borgs et al., 2018; Chatzikokolakis et al., 2017; Alvim et al., 2018; Xiang et al., 2019) model the data domain as a metric space and scale the privacy parameter between pairs of elements by their distance. (Schein et al., 2019) considers categorical high-dimensional count models and defines privacy measures that help privatize counts that are close in ℓ1 distance.

To investigate the utility gains under this model of privacy, we consider the canonical task of distribution estimation: estimate an unknown discrete distribution given (privatized) samples from it. The trade-off between utility and privacy in distribution estimation has received recent attention, and optimal rates have been established (Duchi et al., 2013; Diakonikolas et al., 2015; Kairouz et al., 2016; Wang et al., 2016; Ye & Barg, 2018; Acharya et al., 2019c). These works show that the sample complexity of k-ary distribution estimation up to accuracy α in total variation distance increases from Θ(k/α²) (without privacy constraints) to Θ(k²/(α²ε²)) (for ε = O(1)) when we impose ε-LDP constraints. For both the block-structured and high-low models of LDP, we will characterize the precise sample complexity by providing privatization and estimation schemes that are both computation- and communication-efficient.

Another line of work considers the slightly different problem of heavy hitter estimation, where there is no distributional assumption on the users' data and the main focus is on reducing the computational complexity and communication requirements (Bassily & Smith, 2015; Bassily et al., 2017; Hsu et al., 2012; Wang & Blocki, 2017; Bun et al., 2018; Zhu et al., 2019; Acharya & Sun, 2019).
1.2. Our contributions

Motivated by the limitations of LDP, and by practical applications where not all domain elements are equally sensitive, we propose a general notion of context-aware local differential privacy (see Definition 2) and prove that it satisfies several important properties, such as preservation under post-processing, adaptive composition, and robustness to auxiliary information. When the underlying data domain is binary, we provide a complete characterization of a universally optimal scheme, which interpolates between Warner's randomized response and Mangat's improved response, given in Theorem 2.

We then consider general data domains and investigate two practically relevant models of context-aware LDP. The first is the block-structured model, motivated by applications in geo-location and census data collection. Under this privacy model, we assume that the underlying data domain is divided into a number of blocks, and the elements within a block are sensitive. For example, when the underlying domain is a set of geographical locations, each block can correspond to the locations within a city. In this case, we would like to privatize the precise value within a particular block, but the privacy of the block ID is not of great concern (Definition 4). In Theorem 4 we characterize the sample complexity of estimating k-ary discrete distributions under this privacy model. We also propose a privatization scheme based on the recently proposed Hadamard Response (HR), which is both computation- and communication-efficient. We then establish the optimality of these bounds via a matching information-theoretic lower bound. This is achieved by casting the problem as an information-constrained inference setting and invoking the lower bounds from (Acharya et al., 2019b;a). We show that when all the blocks have roughly the same number of symbols, the sample complexity is reduced by a factor of the number of blocks compared to that under classic LDP. See Theorem 4 for a formal statement of this result.

The second model we consider is the high-low model, where a few domain elements are sensitive while the others are not. A form of this high-low notion of privacy was proposed in (Murakami & Kawamoto, 2018), which also considered distribution estimation and proposed an algorithm based on RAPPOR (Erlingsson et al., 2014). We propose a new privatization scheme based on HR that has the same sample complexity as (Murakami & Kawamoto, 2018) with a much smaller communication budget. We also prove a matching information-theoretic lower bound showing the optimality of these schemes. In this case, the sample complexity depends only quadratically on the number of sensitive elements and linearly on the domain size, compared to the quadratic dependence on the domain size under classic LDP. See Theorem 3 for a formal statement of this result.

As a consequence of these results, we observe that the sample complexity of distribution estimation can be significantly smaller than under classical LDP. Thus, contextual privacy should be viewed as one possible method to improve the trade-off between utility and privacy in LDP when different domain elements have different levels of privacy.

To validate our worst-case (minimax) analyses, we conduct experiments on synthetic data sampled from uniform, geometric, and power-law distributions, in addition to experiments on over 3.6 million check-ins to 43,750 locations from the Gowalla dataset.
Our experiments confirm that context-aware LDP achieves far better accuracy under the same number of samples.

1.3. Organization

In Section 2 we define LDP and the problem of distribution estimation. In Section 3 we define context-aware LDP and provide an operational definition along with some of its specializations. In Section 4 we provide the optimal privatization scheme for binary domains. In Section 5 and Section 6 we derive the optimal sample complexity in the high-low model and the block-structured model, respectively. Experimental results are presented in Section 7. We conclude with a few interesting extensions in Section 8.

2. Preliminaries

Let X be the k-ary underlying data domain; without loss of generality let X = [k] := {1, . . . , k}. There are n users, and user i has a (potentially) sensitive data point X_i ∈ X. A privatization scheme is a mechanism that adds noise to mask the true X_i's, and can be represented by a conditional distribution Q : X → Y, where Y denotes the output domain of the privatization scheme.¹ For y ∈ Y and x ∈ X, Q(y | x) is the probability that the privatization scheme outputs y upon observing x. If we let X and Y denote the input and output of a privatization scheme, then Q(y | x) = Pr(Y = y | X = x). For any set S ⊆ Y, we write Q(S | x) := Pr(Y ∈ S | X = x).

¹In the remainder of this paper, we use the terms privatization scheme, conditional distribution, and channel interchangeably to refer to this Q matrix.

Definition 1 (Local Differential Privacy (Warner, 1965; Kasiviswanathan et al., 2011)). Let ε > 0. A conditional distribution Q is ε-locally differentially private (ε-LDP) if for all x, x′ ∈ X and all S ⊆ Y,

  Q(S | x) ≤ e^ε Q(S | x′).  (1)

Let Q_ε be the set of all ε-LDP conditional distributions with input [k].

Distribution estimation. Let Δ_k := {(p_1, . . . , p_k) : ∀x ∈ [k], p_x ≥ 0, Σ_x p_x = 1} be the set of all discrete distributions over [k]. We assume that the users' data X^n := X_1, . . . , X_n are independent draws from an unknown p ∈ Δ_k. User i passes X_i through the privatization channel Q and sends the output Y_i to a central server. Upon observing the messages Y^n := Y_1, . . . , Y_n, the server outputs p̂ as an estimate of p. Our goal is to select Q from a set of allowed channels Q and to design an estimation scheme p̂ : Y^n → Δ_k that achieves the minimax risk

  r(k, n, d, Q) = min_{Q ∈ Q} min_{p̂} max_{p ∈ Δ_k} E[d(p, p̂)],  (2)

where d(·, ·) is a measure of distance between distributions. In this paper we consider the total variation distance, d_TV(p, q) := (1/2) Σ_{i=1}^k |p_i − q_i|. For a parameter α > 0, the sample complexity of distribution estimation to accuracy α is the smallest number of users for the minimax risk to be smaller than α,

  n(k, α, Q) := arg min_n {r(k, n, d_TV, Q) < α}.

When Q is Q_ε, the set of channels satisfying ε-LDP with input domain [k], it is now well established (Duchi et al., 2013; Kairouz et al., 2016; Ye & Barg, 2018; Wang et al., 2016; Acharya et al., 2019b;c) that for ε = O(1),

  n(k, α, Q_ε) = Θ( k² / (α²ε²) ).  (3)

Hadamard matrix (Sylvester's construction). Let H_1 = [1]. Sylvester's construction of Hadamard matrices is a sequence of square matrices of size 2^i × 2^i recursively defined as

  H_{2m} = [ H_m  H_m ; H_m  −H_m ].

Letting S_i = {y | y ∈ [m], H_m(i + 1, y) = +1} be the column indices of +1's in the (i+1)th row of H_m, we have

  ∀i ∈ [m − 1], |S_i| = m/2;  ∀i ≠ j ∈ [m − 1], |S_i ∩ S_j| = m/4.
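These two set properties are exactly what the Hadamard-response constructions of Sections 5 and 6 rely on. As a quick numerical illustration (our own sketch, not part of the paper), the following Python snippet builds H_m by Sylvester's recursion and checks both properties:

```python
import numpy as np

def sylvester_hadamard(m: int) -> np.ndarray:
    """Sylvester's construction; m must be a power of 2."""
    assert m >= 1 and (m & (m - 1)) == 0
    H = np.array([[1]])
    while H.shape[0] < m:
        H = np.block([[H, H], [H, -H]])
    return H

m = 8
H = sylvester_hadamard(m)
# S_i = column indices of +1's in the (i+1)-th row (1-indexed), i = 1..m-1;
# in 0-based indexing that is row i.
S = {i: set(np.flatnonzero(H[i] == 1)) for i in range(1, m)}
assert all(len(S[i]) == m // 2 for i in S)
assert all(len(S[i] & S[j]) == m // 4 for i in S for j in S if i != j)
```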
3. Context-Aware LDP

In local differential privacy, all elements in the data domain are assumed to be equally sensitive, and the same privacy constraint is enforced on all pairs of them (see (1)). However, in many settings some domain elements might be more sensitive than others. To capture this, we present a context-aware notion of privacy. Let E ∈ R_{≥0}^{k×k} be a matrix of non-negative entries, where for x, x′ ∈ [k], ε_{x,x′} is the (x, x′)th entry of E.

Definition 2 (Context-Aware LDP). A conditional distribution Q is E-LDP if for all x, x′ ∈ X and all S ⊆ Y,

  Q(S | x) ≤ e^{ε_{x,x′}} Q(S | x′).  (4)

This definition allows a different sensitivity level between each pair of elements, incorporating context information. For a privacy matrix E, the set of all E-LDP mechanisms is denoted by Q_E. When all the entries of E are ε, we recover classical LDP.

Symmetric E matrices. When the E matrix is symmetric, context-aware LDP induces a structure similar to metric-based LDP (Alvim et al., 2018), which considers X to be a metric space endowed with a distance d_X. In this case, consider ε_{x,x′} = ε_{x′,x} = d_X(x, x′): it is then harder to distinguish close-by symbols than symbols that are relatively far from each other. As a special case, we introduce (later in this section) the notion of block-structured LDP, which only requires elements to be indistinguishable if they are in the same block.

Asymmetric E matrices. An advantage of our framework is that it allows the matrix E to be asymmetric. Consider a binary setting where we ask a user whether or not they have used drugs before. Here, people whose answer is no can occasionally say yes to protect those whose answer is yes. Thus, when the data collector sees a yes, they cannot tell whether it came from a true yes answer or a no that was flipped to a yes. Further, there is no need to randomize yes answers, because it is okay for the data collector to know who did not use drugs. This is captured in our framework by allowing Q(no | no)/Q(no | yes) to be arbitrarily large and placing an upper bound on Q(yes | yes)/Q(yes | no). A well-known scheme that satisfies this requirement is Mangat's improved randomized response (Mangat, 1994); see Section 4 for more details. This intuition is generalized to the k-ary alphabet in the high-low LDP model introduced later in this section.
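Because X and Y are finite, Definition 2 can be checked mechanically: verifying the inequality on singletons {y} suffices, since summing the per-outcome inequalities over y ∈ S recovers the bound for every S ⊆ Y. Below is a minimal checker (our own illustrative sketch; the function name and the convention that an ∞ entry of E means "unconstrained" are ours):

```python
import numpy as np

def satisfies_eldp(Q: np.ndarray, E: np.ndarray, tol: float = 1e-12) -> bool:
    """Check Definition 2 for a finite channel.

    Q[x, y] = Q(y | x); E[x, xp] = eps_{x,x'} (np.inf = unconstrained).
    Checking singletons {y} suffices: summing the per-y inequality over
    any S recovers Q(S|x) <= exp(eps) * Q(S|x')."""
    k = Q.shape[0]
    for x in range(k):
        for xp in range(k):
            if x == xp or np.isinf(E[x, xp]):
                continue
            if np.any(Q[x] > np.exp(E[x, xp]) * Q[xp] + tol):
                return False
    return True

# Warner's randomized response with eps = 1 satisfies the symmetric
# binary constraint E = [[0, 1], [1, 0]].
eps = 1.0
p = np.exp(eps) / (np.exp(eps) + 1)
Q_warner = np.array([[p, 1 - p], [1 - p, p]])
E = np.array([[0.0, eps], [eps, 0.0]])
assert satisfies_eldp(Q_warner, E)
```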
We provide an operational (hypothesis testing) interpretation of context-aware LDP in Section 3.1. We show that context-aware LDP is robust to post-processing, robust to auxiliary information, and adaptively composable in Section 3.2. For binary domains, namely when k = 2, we give a single optimal privatization scheme for all E matrices in Section 4. For general k, it is unclear whether there is a simple characterization of the optimal schemes for all E matrices. First note that if ε* := min_{i,j} ε_{i,j} is the smallest entry of E, then any ε*-LDP algorithm is also E-LDP. However, this is not helpful, since it does not capture the context of the application and consequently does not remove the stringent requirements of standard LDP. Motivated by applications in geo-location and web search, we consider structured E matrices that are practically motivated and are parameterized by a few parameters (or a single parameter), so that they are easier to analyze and implement.

High-Low LDP (HLLDP). The high-low model captures applications where certain domain elements are private and the remaining elements are non-private. We only want to protect the privacy of the private elements. This is formalized below.

Definition 3. Let A = {x_1, . . . , x_s} ⊆ X denote the set of sensitive domain elements; all symbols in B := X \ A are non-sensitive. A privatization scheme is said to be (A, ε)-HLLDP if for all S ⊆ Y, x ∈ A, and x′ ∈ X,

  Q(S | x) ≤ e^ε Q(S | x′),  (5)

which corresponds to the following E matrix:

  ε_{x,x′} = ε if x ∈ A,  and  ε_{x,x′} = ∞ if x ∈ B.

This implies that when the input symbol is in A, the output distribution cannot be multiplicatively larger than the output distribution for any other symbol, but there is no such restriction for symbols in B. HLLDP was also defined in (Murakami & Kawamoto, 2018). We solve the problem of minimax distribution estimation under this privacy model in Section 5.

Block-Structured LDP (BSLDP). In applications such as geo-location, it is important to preserve the privacy of symbols that are close to each other. We consider a model where the data domain is divided into blocks (e.g., the cities), and we would like the symbols within a block (e.g., the various locations within a city) to be hard to distinguish. This is formalized below.

Definition 4. Let Π = {X_1, X_2, . . . , X_m} be a partition of X. With a slight abuse of notation, we define π(x) := i for all x ∈ X_i. Then for any ε > 0, a privatization scheme is said to be (Π, ε)-BSLDP if for all x, x′ with π(x) = π(x′) and any S ⊆ Y,

  Q(S | x) ≤ e^ε Q(S | x′),  (6)

which corresponds to the following E matrix:

  ε_{x,x′} = ε if π(x) = π(x′),  and  ε_{x,x′} = ∞ if π(x) ≠ π(x′).

This definition relaxes the local version of differential privacy in the following way: given a partition Π = {X_1, X_2, . . . , X_m} of the input set, it requires different levels of indistinguishability for element pairs in the same block and for those in different blocks. We solve the problem of minimax distribution estimation under this model in Section 6.

3.1. Operational interpretation of context-aware LDP

Recall that the entries of E denote the amount of privacy between the corresponding row and column symbols. We provide an operational interpretation of E-LDP by considering a natural hypothesis testing problem. Suppose we are promised that the input is in {x, x′} for some symbols x, x′ ∈ [k], and an E-LDP scheme outputs a symbol Y ∈ Y. Given Y, the goal is to test whether the input is x or x′.

Theorem 1 (Operational interpretation of context-aware LDP). A conditional distribution Q is E-locally differentially private if and only if for all x, x′ ∈ X and all decision rules X̂ : Y → {x, x′},

  P_FA(x, x′) + e^{ε_{x′,x}} P_MD(x, x′) ≥ 1,  (7)
  e^{ε_{x,x′}} P_FA(x, x′) + P_MD(x, x′) ≥ 1,  (8)

where P_FA(x, x′) = Pr(X̂ = x | X = x′) is the false alarm rate and P_MD(x, x′) = Pr(X̂ = x′ | X = x) is the missed detection rate. The proof is provided in Section A.4.

[Figure 1. Error region for hypothesis testing between x and x′ under DP constraints when ε_{x,x′} < ε_{x′,x}. (Blue: context-aware LDP; red: standard LDP.)]

Consider a test for distinguishing x and x′ as in Theorem 1. Figure 1 shows the effective error region of any estimator X̂ under the privacy constraints ε_{x,x′} and ε_{x′,x}. Unlike the symmetric region under LDP, the missed detection rate is pushed higher when ε_{x,x′} < ε_{x′,x}. This shows that symbol x is more private than symbol x′: we want to protect the identity of the symbol being x more than we want to protect its identity being x′.
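Theorem 1 can be verified numerically for small channels: inequalities (7) and (8) are linear in (P_FA, P_MD), so it suffices to enumerate deterministic decision rules, of which randomized rules are convex combinations. A sketch under our own conventions (we treat an infinite entry of E as a vacuous constraint), using the asymmetric drug-survey channel from earlier in this section:

```python
import itertools
import numpy as np

def ineq_holds(eps, p_a, p_b, tol=1e-12):
    # e^{eps} * p_a + p_b >= 1; an infinite eps is a vacuous constraint.
    return (not np.isfinite(eps)) or np.exp(eps) * p_a + p_b >= 1 - tol

def error_region_holds(Q, x, xp, eps_x_xp, eps_xp_x):
    """Check (7) and (8) for every deterministic rule X_hat: Y -> {x, x'}.
    Randomized rules are convex combinations of deterministic ones and
    the inequalities are linear in (P_FA, P_MD), so this suffices."""
    for rule in itertools.product([x, xp], repeat=Q.shape[1]):
        decide_x = np.array([r == x for r in rule])
        p_fa = Q[xp, decide_x].sum()     # Pr(X_hat = x  | X = x')
        p_md = Q[x, ~decide_x].sum()     # Pr(X_hat = x' | X = x)
        if not ineq_holds(eps_xp_x, p_md, p_fa):   # (7)
            return False
        if not ineq_holds(eps_x_xp, p_fa, p_md):   # (8)
            return False
    return True

# Element 1 ("no") is not sensitive (eps_{1,2} = inf); element 2 ("yes")
# is protected at level eps_{2,1} = 1.
eps = 1.0
p = np.exp(-eps)
Q = np.array([[1 - p, p],     # "no" flips to "yes" with probability p
              [0.0,  1.0]])   # "yes" always reports "yes"
assert error_region_holds(Q, x=0, xp=1, eps_x_xp=np.inf, eps_xp_x=eps)
```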
3.2. Properties of context-aware LDP

Context-aware LDP satisfies several important properties of classical LDP, including preservation under post-processing, adaptive composition, and robustness to auxiliary information. We provide the proofs in Section A.

Proposition 1 (Preservation under post-processing). Let A_1 : X → Y_1 be an E-LDP scheme and let A_2 : Y_1 → Y_2 be any algorithm that processes the output of A_1. Then the scheme A = A_2 ∘ A_1 is also E-LDP.

Proposition 2 (Adaptive composition). Let A_1 : X → Y_1 be an E_1-LDP scheme and A_2 : X × Y_1 → Y_2 be an E_2-LDP scheme. Then the scheme A defined as (A_1, A_2) is (E_1 + E_2)-LDP.

Proposition 3 (Robustness to auxiliary information). Let p be a prior over X and A : X → Y be an E-LDP scheme. Then for all x_1, x_2 ∈ X and y ∈ Y,

  Pr(X = x_1 | Y = y) / Pr(X = x_2 | Y = y) ≤ e^{ε_{x_1,x_2}} · p(x_1)/p(x_2).

4. Binary Domains

Consider a binary domain, namely k = 2 with domain elements {1, 2}. In this case, we have

  E = [ 0  ε_{1,2} ; ε_{2,1}  0 ].

Perhaps the oldest and simplest privatization mechanism is Warner's randomized response for confidential survey interviews (Warner, 1965). In this section, we give the optimal scheme for all utility functions that obey the data processing inequality, under all possible binary constraints. We prove that when ε_{1,2} = ε_{2,1}, the optimal scheme is Warner's randomized response.

Consider the composition Q ∘ W of two privatization mechanisms, where the output of the first mechanism Q is passed through another mechanism W. We say that a utility function U(·) obeys the data processing inequality if for all Q and W,

  U(Q ∘ W) ≤ U(Q).

In other words, further processing of the data can only reduce the utility. Such utility functions are ubiquitous. For example, in the minimax distribution learning context of this paper, U(Q) may be chosen as −min_{p̂} max_p E[d(p, p̂)] (i.e., the negative of the minimax risk under a fixed mechanism Q), with d being any ℓ_p distance or f-divergence.

Theorem 2 (Optimal mechanism for binary domains). Let X be a binary input alphabet and U(Q) be any utility function that obeys the data processing inequality. Then for any ε_{1,2}, ε_{2,1} ≥ 0, the privatization mechanism

  Q* = 1/(e^{ε_{2,1}} − e^{−ε_{1,2}}) · [ e^{ε_{2,1}} − 1    1 − e^{−ε_{1,2}} ; e^{−ε_{1,2}}(e^{ε_{2,1}} − 1)    e^{ε_{2,1}}(1 − e^{−ε_{1,2}}) ]  (9)

maximizes U(Q) subject to Q ∈ Q_E. Here, for x, y ∈ {1, 2}, Q(x, y) := Q(y | x).

As a special case of the above theorem, if we consider the original local differential privacy setup where ε_{1,2} = ε_{2,1} = ε, then the optimal mechanism for binary alphabets is

  Q* = 1/(e^ε + 1) · [ e^ε  1 ; 1  e^ε ].  (10)

This is Warner's randomized response model for confidential structured survey interviews with p = e^ε/(e^ε + 1) (Warner, 1965). Warner's randomized response was shown to be optimal for binary alphabets in (Kairouz et al., 2014). Another interesting special case of the above theorem is Mangat's improved randomized response strategy (Mangat, 1994). To see this, let ε_{1,2} = ∞ and p = e^{−ε_{2,1}}. Then

  Q* = [ 1 − p  p ; 0  1 ].  (11)

This is exactly Mangat's improved randomized response strategy. Thus, Mangat's randomized response with p = e^{−ε_{2,1}} is optimal for all utility functions obeying the data processing inequality under this generalized differential privacy framework with ε_{1,2} = ∞, which corresponds to the case where element 1 is not sensitive.
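As a sanity check on Theorem 2 (our own snippet, not from the paper), the closed form (9) can be instantiated and compared against both special cases; since e^{−ε_{1,2}} → 0 as ε_{1,2} → ∞, the Mangat case falls out of the same formula:

```python
import numpy as np

def optimal_binary_mechanism(eps12: float, eps21: float) -> np.ndarray:
    """Mechanism (9); rows are inputs {1, 2}, columns outputs {1, 2}.
    np.exp(-np.inf) = 0 handles the Mangat case eps12 = inf directly."""
    a, b = np.exp(eps21), np.exp(-eps12)   # b = 0 when eps12 = inf
    Q = np.array([[a - 1,       1 - b],
                  [b * (a - 1), a * (1 - b)]]) / (a - b)
    return Q

eps = 1.0
# Symmetric case: Warner's randomized response (10).
Qw = optimal_binary_mechanism(eps, eps)
assert np.allclose(Qw, np.array([[np.e, 1], [1, np.e]]) / (np.e + 1))
# eps12 = inf: Mangat's improved randomized response (11), p = e^{-eps21}.
Qm = optimal_binary_mechanism(np.inf, eps)
p = np.exp(-eps)
assert np.allclose(Qm, [[1 - p, p], [0.0, 1.0]])
# Rows are probability distributions, and the eps_{2,1} constraint
# holds with equality where it binds.
assert np.allclose(Qm.sum(axis=1), 1)
assert Qm[1, 1] <= np.exp(eps) * Qm[0, 1] + 1e-12
```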
5. Distribution Estimation under HLLDP

We characterize the optimal sample complexity of distribution estimation under the high-low model (see Definition 3).

Theorem 3. Let A ⊆ X with |A| = s < k/2, and let ε = O(1). Let Q_{A,ε} be the set of all channels satisfying (A, ε)-HLLDP. Then

  n(k, α, Q_{A,ε}) = Θ( s²/(α²ε²) + k/(α²ε²) ).  (12)

When the size of the sensitive set is relatively large, e.g., s > √k, the sample complexity is Θ(s²/(α²ε²)), which corresponds to classic LDP with alphabet size s. This question was considered in (Murakami & Kawamoto, 2018), which gave an algorithm based on RAPPOR (Erlingsson et al., 2014) that has the optimal sample complexity but requires Ω(k) bits of communication from each user, which is prohibitive in settings where the users' uplink capacity is limited. We design a scheme based on Hadamard Response (HR) that is also sample-optimal but requires only about log k bits of communication from each user. (Murakami & Kawamoto, 2018) noted that a lower bound of Ω(s²/(α²ε²)) (the first term in (12)) is immediately implied by previously known results on distribution estimation under standard LDP ((3) with k = s). However, obtaining a lower bound equalling the second term was still open. Using the recently proposed technique in (Acharya et al., 2019b), we prove this lower bound in Section 5.2.

5.1. Achievability using a variant of HR

We propose an algorithm based on Hadamard response (Acharya et al., 2019c), which gives a tight upper bound for k-ary distribution estimation under (A, ε)-HLLDP. Let s = |A|, and let S be the smallest power of 2 larger than s, i.e., S := 2^{⌈log(s+1)⌉}. Without loss of generality, let A = [s]. Let t := k − s be the number of non-sensitive elements. Then S + t ≤ 2(s + t) = 2k. Let H_S be the S × S Hadamard matrix from Sylvester's construction, and define the output alphabet to be [S + t] = {1, . . . , S + t}. The channel is defined as follows. When x ∈ A = [s],

  Q(y | x) = 2e^ε/(S(e^ε + 1)) if y ∈ [S] and H_S(x + 1, y) = +1,
  Q(y | x) = 2/(S(e^ε + 1)) if y ∈ [S] and H_S(x + 1, y) = −1,
  Q(y | x) = 0 if y ∉ [S].  (13)

Else, if x ∉ A,

  Q(y | x) = 2/(S(e^ε + 1)) if y ∈ [S],
  Q(y | x) = (e^ε − 1)/(e^ε + 1) if y = x + S − s,
  Q(y | x) = 0 otherwise.  (14)

It is easy to verify that this scheme satisfies (A, ε)-HLLDP. Next, we construct estimators based on the observations Y_1, Y_2, . . . , Y_n and state their performance in Proposition 4. For all i ∈ [s],

  p̂_i = (2(e^ε + 1)/(e^ε − 1)) · ( p̂(S_i) − 1/(e^ε + 1) ) − p̂(A),  (15)

where

  p̂(A) = ((e^ε + 1)/(e^ε − 1)) · ( (1/n) Σ_{m=1}^n 1{Y_m ∈ [S]} − 2/(e^ε + 1) ),
  p̂(S_i) = (1/n) Σ_{m=1}^n 1{Y_m ∈ S_i}.

For all i ∉ [s], we simply use the (scaled) empirical estimates

  p̂_i = ((e^ε + 1)/(e^ε − 1)) · (1/n) Σ_{m=1}^n 1{Y_m = i + S − s}.  (16)

Proposition 4. The estimators defined in (15) and (16) satisfy

  E[d_TV(p̂, p)] = O( ((e^ε + 1)/(e^ε − 1)) · (s + √k)/√n ).  (17)

When ε = O(1), setting the right-hand side to be smaller than α gives the upper bound part of Theorem 3. For the proof of (17), see Section C.2. In this scheme, each user only needs to communicate at most log k + 1 bits, since S + t ≤ 2k, while the scheme proposed in (Murakami & Kawamoto, 2018) needs Ω(k) bits.
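To make the construction concrete, here is a small end-to-end simulation of the channel (13)-(14) and the estimators (15)-(16) (our own illustrative sketch: the Dirichlet test distribution and parameter choices are ours, and the per-user sampling loop favors clarity over speed):

```python
import numpy as np

rng = np.random.default_rng(0)

def hadamard(m):
    H = np.array([[1]])
    while H.shape[0] < m:
        H = np.block([[H, H], [H, -H]])
    return H

def hlldp_hr_channel(k, s, eps):
    """Channel of (13)-(14): row x = input, column y = output in [S + t].
    The first s inputs are the sensitive set A = [s]."""
    S = 1 << int(np.ceil(np.log2(s + 1)))    # smallest power of 2 above s
    H = hadamard(S)
    ee = np.exp(eps)
    hi, lo = 2 * ee / (S * (ee + 1)), 2 / (S * (ee + 1))
    Q = np.zeros((k, S + k - s))
    for x in range(s):                       # sensitive: HR over [S]
        Q[x, :S] = np.where(H[x + 1] == 1, hi, lo)
    for x in range(s, k):                    # non-sensitive: mostly truthful
        Q[x, :S] = lo
        Q[x, S + x - s] = (ee - 1) / (ee + 1)
    return Q, S

def estimate(Y, k, s, S, eps):
    H, ee = hadamard(S), np.exp(eps)
    p_hat = np.zeros(k)
    pA = (ee + 1) / (ee - 1) * (np.mean(Y < S) - 2 / (ee + 1))
    for i in range(s):                       # estimator (15)
        pSi = np.mean(np.isin(Y, np.flatnonzero(H[i + 1] == 1)))
        p_hat[i] = 2 * (ee + 1) / (ee - 1) * (pSi - 1 / (ee + 1)) - pA
    for i in range(s, k):                    # estimator (16)
        p_hat[i] = (ee + 1) / (ee - 1) * np.mean(Y == S + i - s)
    return p_hat

k, s, eps, n = 16, 3, 1.0, 50_000
p = rng.dirichlet(np.ones(k))
Q, S = hlldp_hr_channel(k, s, eps)
X = rng.choice(k, size=n, p=p)
Y = np.array([rng.choice(S + k - s, p=Q[x]) for x in X])  # slow but clear
print("TV error:", 0.5 * np.abs(estimate(Y, k, s, S, eps) - p).sum())
```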
5.2. Lower bound

We now prove the lower bound part of Theorem 3. A lower bound of Ω(s²/(α²ε²)) follows from (3), which follows directly from the lower bounds on the sample complexity of distribution estimation under standard ε-LDP (e.g., Theorem IV.1 in (Ye & Barg, 2018)). To prove a lower bound equalling the second term, we use the framework developed in (Acharya et al., 2019b) for proving lower bounds for distributed inference under various information constraints (e.g., privacy and communication constraints). The key idea is to design a packing of distributions around the uniform distribution and show that the amount of information that can be gleaned from the outputs of the privatization schemes is insufficient for distribution estimation. In particular, we will use the following result.

Lemma 1 (Lemma 13 in (Acharya et al., 2019b)). Let u be the uniform distribution over [k] and let P be a family of distributions satisfying the following two conditions:

1. ∀p ∈ P, d_TV(p, u) ≥ α.
2. ∀p_1 ∈ P, |{p_2 ∈ P | d_TV(p_1, p_2) ≤ 2α/3}| ≤ C.

Let Q be the set of all channels we can use to get information about X. Then the sample complexity of k-ary distribution learning up to TV distance α/3 under channel constraints Q is at least

  (log |P| − log C) / (max_{Q ∈ Q} χ²(Q | P)),

where p^Q (resp. u^Q) denotes the distribution of Y when X ~ p (resp. X ~ u), and

  χ²(Q | P) := (1/|P|) Σ_{p ∈ P} d_{χ²}(p^Q, u^Q),  d_{χ²}(p, q) := Σ_y (p(y) − q(y))² / q(y).

Let k₀ = k − s ≥ k/2 be the number of non-sensitive elements, and let Z = {+1, −1}^{k₀} be the set of k₀-bit sign vectors. For each z ∈ Z, define p_z by

  p_z(i) = 1/k − (α/k₀) Σ_{j=1}^{k₀} z_j,  for i = 1,
  p_z(i) = 1/k,  for i = 2, 3, . . . , s,
  p_z(i) = 1/k + α z_{i−s}/k₀,  for i = s + 1, s + 2, . . . , k.

Let P_Z = {p_z | z ∈ Z} be the set of all distributions defined by z ∈ Z, and let U_Z be the uniform distribution over Z. Then

  E_{z ~ U_Z}[d_{χ²}(p_z^Q, u^Q)] = E_z[ Σ_y (p_z^Q(y) − u^Q(y))² / u^Q(y) ]
   = E_z[ Σ_y ( (α/k₀) Σ_{i=1}^{k₀} (Q(y|s+i) − Q(y|1)) z_i )² / ( (1/k) Σ_{j∈[k]} Q(y|j) ) ]
   = (α²k/k₀²) Σ_{i=1}^{k₀} Σ_y (Q(y|1) − Q(y|s+i))² / Σ_{j∈[k]} Q(y|j).

To bound this quantity, we have the following claim, which we prove in Section D.1.

Claim 1. If Q satisfies Q(y|1) ≤ e^ε Q(y|i) for all i ∈ {s+1, s+2, . . . , s+k₀} and y ∈ Y, then

  Σ_{i=1}^{k₀} Σ_y (Q(y|s+i) − Q(y|1))² / Σ_{j∈[k]} Q(y|j) = O(ε²k).  (18)

Moreover, |P_Z| = 2^{k₀} and C ≤ 2^{h(1/3)k₀}, where h(x) = x log(1/x) + (1−x) log(1/(1−x)) is the binary entropy function, so that log|P_Z| − log C ≥ (1 − h(1/3))k₀. Combining these bounds with k₀ ≥ k/2 and using Lemma 1, the sample complexity is at least

  (log |P_Z| − log C) / (max_{Q ∈ Q_{A,ε}} χ²(Q | P_Z)) = Ω( k₀/(α²ε²) ) = Ω( k/(α²ε²) ),

which matches the second term of (12).

6. Distribution Estimation under BSLDP

For distribution estimation under block-structured LDP constraints we prove the following theorem.

Theorem 4. Let ε = O(1), let Π = {X_1, X_2, . . . , X_m} be a partition of X with |X_i| = k_i and |X| = k = Σ_{i=1}^m k_i, and let Q_{Π,ε} be the set of all channels that satisfy (Π, ε)-BSLDP. Then

  n(k, α, Q_{Π,ε}) = Θ( (Σ_{i=1}^m k_i²) / (α²ε²) ).

In the special case when all blocks have the same size k/m, the sample complexity is Θ(k²/(m α²ε²)), which saves a factor of m compared to classic LDP. In Section 6.1, we describe an algorithm based on HR that achieves this error; moreover, it uses only O(log k) bits of communication from each user. A matching lower bound is proved in Section 6.2, which shows that our algorithm is information-theoretically optimal.

6.1. Achievability using a variant of HR

The idea is to perform the Hadamard Response of (Acharya et al., 2019c) within each block. Without loss of generality, assume block X_j = {(j, i) | i ∈ [k_j]}. For each block X_j, j ∈ [m], we associate a Hadamard matrix H_{K_j} with K_j = 2^{⌈log(k_j+1)⌉}, and let Y_j = {(j, i) | i ∈ [K_j]}. For each x = (j, i) ∈ X_j, we assign the (i+1)th row of H_{K_j} to x, and define S_x to be the set of locations of +1's in that row. The output domain is Y := ∪_{j=1}^m Y_j, and the privatization scheme is

  Q(Y = (j, i) | X) = 2e^ε/(K_j(1 + e^ε)) if X ∈ X_j and i ∈ S_X,
  Q(Y = (j, i) | X) = 2/(K_j(1 + e^ε)) if X ∈ X_j and i ∉ S_X,
  Q(Y = (j, i) | X) = 0 otherwise.

It is easy to verify that this scheme satisfies the privacy constraints. Next, we construct estimators based on Y_1, Y_2, . . . , Y_n and state their performance in Proposition 5. Let Y(1), Y(2) be the two coordinates of each output Y. For each j ∈ [m] and x ∈ X_j,

  p̂_x = (2(e^ε + 1)/(e^ε − 1)) · ( p̂(S_x) − p̂(X_j)/2 ),  where
  p̂(X_j) = (1/n) Σ_{t=1}^n 1{Y_t(1) = j},  p̂(S_x) = (1/n) Σ_{t=1}^n 1{Y_t(1) = j, Y_t(2) ∈ S_x}.  (19)

Proposition 5. Under the unbiased estimator p̂ described in (19), we have

  E[d_TV(p̂, p)] ≤ (2(e^ε + 1)/(e^ε − 1)) · √( Σ_{j=1}^m k_j² / n ).  (20)

When ε = O(1), this gives the desired upper bound in Theorem 4. For the proof of (20), see Section C.4.
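A compact simulation of the block-wise scheme and the estimator (19), in the same spirit as the high-low sketch of Section 5.1 (again our own illustration; block sizes and the test distribution are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def hadamard(m):
    H = np.array([[1]])
    while H.shape[0] < m:
        H = np.block([[H, H], [H, -H]])
    return H

def privatize(j, i, Hs, eps):
    """One user with input (j, i): Hadamard response inside block j."""
    H = Hs[j]
    K = H.shape[0]
    probs = np.where(H[i + 1] == 1, 2 * np.exp(eps), 2.0) / (K * (np.exp(eps) + 1))
    return j, rng.choice(K, p=probs)

def estimate(Ys, Hs, block_sizes, eps, n):
    ee = np.exp(eps)
    p_hat = []
    for j, kj in enumerate(block_sizes):
        H = Hs[j]
        in_j = np.array([y for (b, y) in Ys if b == j])
        pXj = len(in_j) / n                         # hat p(X_j)
        for i in range(kj):
            Sx = np.flatnonzero(H[i + 1] == 1)
            pSx = np.isin(in_j, Sx).sum() / n       # hat p(S_x)
            p_hat.append(2 * (ee + 1) / (ee - 1) * (pSx - pXj / 2))
    return np.array(p_hat)

block_sizes = [4, 6, 2]                             # k = 12, m = 3
k, n, eps = sum(block_sizes), 50_000, 1.0
Hs = [hadamard(1 << int(np.ceil(np.log2(kj + 1)))) for kj in block_sizes]
symbols = [(j, i) for j, kj in enumerate(block_sizes) for i in range(kj)]
p = rng.dirichlet(np.ones(k))
X = rng.choice(k, size=n, p=p)
Ys = [privatize(*symbols[x], Hs, eps) for x in X]
print("TV error:", 0.5 * np.abs(estimate(Ys, Hs, block_sizes, eps, n) - p).sum())
```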
We note that our algorithm also gives the optimal bound in terms of the squared ℓ2 distance; a matching lower bound can be shown using well-established results on LDP (Duchi et al., 2013), by considering the maximum of the expected loss when all the mass is placed on a single block.

6.2. Lower bound

We now prove the lower bound part of Theorem 4. The general idea is similar to the proof in Section 5.2 and is based on Lemma 1. Without loss of generality, we assume all the k_i's are even.² We construct a family of distributions as follows. Let Z = Z_1 × Z_2 × · · · × Z_m with Z_j = {+1, −1}^{k_j/2} for each j ∈ [m]. For z ∈ Z, we denote the jth entry of z by z_j ∈ Z_j, and let z_{j,i} be the ith bit of z_j. For all j ∈ [m] and i ∈ [k_j/2], define

  p_z((j, 2i − 1)) = 1/k + 2α z_{j,i} k_j / (Σ_{j′=1}^m k_{j′}²),
  p_z((j, 2i)) = 1/k − 2α z_{j,i} k_j / (Σ_{j′=1}^m k_{j′}²).

Note that d_TV(p_z, u) = α for all z ∈ Z. Moreover, |P_Z| = 2^{k/2} and C ≤ 2^{(k/2)h(1/3)}, where h is the binary entropy function. By Lemma 1, letting Q_{Π,ε} be the set of channels that satisfy (Π, ε)-BSLDP and P_Z = {p_z | z ∈ Z}, it suffices to show that

  max_{Q ∈ Q_{Π,ε}} χ²(Q | P_Z) = O( α²ε²k / Σ_{j=1}^m k_j² ).  (21)

The proof of (21) is technical and is presented in Section D.2.

²If one of the k_i's is odd, we can remove one element from X_i; this only makes the problem simpler, and the sample complexity remains unchanged up to constant factors.

7. Experiments

We perform experiments on both synthetic and real data to empirically validate how the new notion of context-aware LDP and the associated algorithms affect the accuracy of k-ary distribution estimation. Specifically, we choose the special case of block-structured LDP.

For synthetic data, we set k = 1000 and ε = 1. We assume all blocks have the same size, with m ∈ {10, 20, 50, 100}. For traditional LDP, we use Hadamard Response (Acharya et al., 2019c), one of the state-of-the-art sample-optimal algorithms. We take n = 1000 · 2^i for i ∈ {0, 1, . . . , 9} and generate samples from the following three distributions: the geometric distribution Geo(λ), where p(i) ∝ (1 − λ)^i λ; the Zipf distribution Zipf(λ), where p(i) ∝ (i + 1)^{−λ}; and the uniform distribution. We assume the blocks are partitioned based on indices (the first block is {0, 1, . . . , k/m − 1}, the second is {k/m, k/m + 1, . . . , 2k/m − 1}, and so on). For the geometric distribution, we also consider a case where we permute the mass function over the symbols to spread the heavy elements across multiple blocks, denoted Geo(λ)*. The results are shown in Figure 2; each point is the average of 10 repetitions of the same experiment. We obtain consistently better accuracy under block-structured LDP than under the classical notion. Moreover, the larger m is, the better the accuracy, which is consistent with our analysis.

[Figure 2. d_TV-error comparison under different distributions: (a) Uniform, (b) Geo(0.95), (c) Zipf(1), (d) Geo(0.95)*.]

To validate our algorithm on real data, we use the Gowalla user check-in dataset (Cho et al., 2011), which consists of user check-in records with location information (latitudes and longitudes) from the Gowalla website. We take the 3,671,812 check-in records with latitude between 25°N and 50°N and longitude between 130°W and 60°W (mostly within the continental US). We round the latitude and longitude of each record to an accuracy of 0.2 and regard records with the same rounded latitude and longitude as the same location. By doing so, we get a dataset with 43,750 possible values.
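The preprocessing just described is a plain rounding-and-binning step. A sketch of it (ours; the SNAP file name, column layout, and pandas pipeline are assumptions, as the paper does not specify its exact tooling):

```python
import pandas as pd

# Gowalla check-ins (assumed SNAP layout): user, time, lat, lon, location id.
cols = ["user", "time", "lat", "lon", "loc_id"]
df = pd.read_csv("loc-gowalla_totalCheckins.txt", sep="\t", names=cols)

# Keep records in the box [25N, 50N] x [130W, 60W].
df = df[df.lat.between(25, 50) & df.lon.between(-130, -60)]

# Round coordinates to 0.2-degree cells; identical cells = one location.
df["cell"] = list(zip((df.lat / 0.2).round().astype(int),
                      (df.lon / 0.2).round().astype(int)))
counts = df["cell"].value_counts()
p_true = (counts / counts.sum()).to_numpy()   # empirical "ground truth"

# Uniform (m1, m2) grid over the box, used as the BSLDP partition.
m1, m2 = 25, 35
lat_block = ((df.lat - 25) / (25 / m1)).astype(int).clip(0, m1 - 1)
lon_block = ((df.lon + 130) / (70 / m2)).astype(int).clip(0, m2 - 1)
df["block"] = lat_block * m2 + lon_block
```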
[Figure 3. Heatmap (distribution) of the check-in records.]

Figure 3 shows the empirical distribution of the check-in records in the dataset. We take the empirical distribution of the records as our ground truth and try to estimate it while preserving the privacy of each record. We partition latitudes into m₁ equal parts and longitudes into m₂ equal parts; the resulting grid is used as the blocks (m₁m₂ blocks in total). Table 1 shows the average d_TV error over 100 runs of the experiment for LDP and BSLDP with different (m₁, m₂) pairs.

Table 1. d_TV estimation error under different (m₁, m₂) pairs.

  (m₁, m₂)   | LDP   | (5, 7) | (25, 35) | (25, 70)
  d_TV error | 0.591 | 0.298  | 0.108    | 0.082

The table shows that by switching to block-structured LDP, we obtain more meaningful estimation accuracy compared to classical LDP.

8. Conclusion

We presented a unifying context-aware LDP framework and investigated communication, privacy, and utility trade-offs under both binary and k-ary data domains for minimax distribution estimation. Our theory and experiments on synthetic and real-world data suggest that context-aware LDP leads to substantial utility gains compared to vanilla LDP. In order to examine the effect of the number of data partitions in block-structured LDP, our experiments focused on uniform partitioning of geo-location data and examined the utility gains under various partition sizes. In practice, however, non-uniform partitioning can better reflect the different topologies of cities. Thus, more careful experiments with non-uniform partitions need to be performed to better quantify the utility gains. More broadly, more experiments should be conducted to verify that the gains we see on the Gowalla dataset carry over to other data domains.

References

Acharya, J. and Sun, Z. Communication complexity in locally private distribution estimation and heavy hitters. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 51–60. PMLR, 2019.

Acharya, J., Canonne, C., Freitag, C., and Tyagi, H. Test without trust: Optimal locally private distribution testing. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 2067–2076, 2019a.

Acharya, J., Canonne, C. L., and Tyagi, H. Inference under information constraints: Lower bounds from chi-square contraction. In Proceedings of the Thirty-Second Conference on Learning Theory, volume 99 of Proceedings of Machine Learning Research, pp. 3–17. PMLR, 2019b.

Acharya, J., Sun, Z., and Zhang, H. Hadamard response: Estimating distributions privately, efficiently, and with little communication. In The 22nd International Conference on Artificial Intelligence and Statistics, volume 89 of Proceedings of Machine Learning Research, pp. 1120–1129. PMLR, 2019c.

Alvim, M. S., Chatzikokolakis, K., Palamidessi, C., and Pazii, A. Local differential privacy on metric spaces: Optimizing the trade-off with utility. In 2018 IEEE 31st Computer Security Foundations Symposium (CSF), pp. 262–267, 2018.

Asi, H., Duchi, J., and Javidbakht, O. Element level differential privacy: The right granularity of privacy, 2019.

Bassily, R. and Smith, A. Local, private, efficient protocols for succinct histograms. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pp. 127–135. ACM, 2015.

Bassily, R., Nissim, K., Stemmer, U., and Thakurta, A. G. Practical locally private heavy hitters. In Advances in Neural Information Processing Systems, pp. 2285–2293, 2017.
Beimel, A., Nissim, K., and Omri, E. Distributed private data analysis: Simultaneously solving how and what. In Proceedings of the 28th Annual International Cryptology Conference, CRYPTO '08, pp. 451–468. Springer, 2008.

Borgs, C., Chayes, J., Smith, A., and Zadik, I. Revealing network structure, confidentially: Improved rates for node-private graphon estimation. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pp. 533–543. IEEE, 2018.

Bun, M., Nelson, J., and Stemmer, U. Heavy hitters and the structure of local privacy. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 435–447. ACM, 2018.

Chatzikokolakis, K., Andrés, M. E., Bordenabe, N. E., and Palamidessi, C. Broadening the scope of differential privacy using metrics. In International Symposium on Privacy Enhancing Technologies, pp. 82–102. Springer, 2013.

Chatzikokolakis, K., Elsalamouny, E., and Palamidessi, C. Efficient utility improvement for location privacy. Proceedings on Privacy Enhancing Technologies, 2017(4):308–328, 2017.

Cho, E., Myers, S. A., and Leskovec, J. Friendship and mobility: User movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1082–1090, 2011.

Diakonikolas, I., Hardt, M., and Schmidt, L. Differentially private learning of structured discrete distributions. In Advances in Neural Information Processing Systems 28, pp. 2566–2574, 2015.

Dong, J., Roth, A., and Su, W. J. Gaussian differential privacy. arXiv preprint arXiv:1905.02383, 2019.

Doudalis, S., Kotsogiannis, I., Haney, S., Machanavajjhala, A., and Mehrotra, S. One-sided differential privacy. Proceedings of the VLDB Endowment, 10(11), 2017.

Duchi, J. C., Jordan, M. I., and Wainwright, M. J. Local privacy and statistical minimax rates. In FOCS, pp. 429–438. IEEE, 2013.

Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the 3rd Theory of Cryptography Conference, 2006.

Erlingsson, Ú., Pihur, V., and Korolova, A. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pp. 1054–1067. ACM, 2014.

Feyisetan, O., Diethe, T., and Drake, T. Leveraging hierarchical representations for preserving privacy and utility in text, 2019.

Geumlek, J. and Chaudhuri, K. Profile-based privacy for locally private computations. In 2019 IEEE International Symposium on Information Theory (ISIT), pp. 537–541. IEEE, 2019.

Hsu, J., Khanna, S., and Roth, A. Distributed private heavy hitters. In International Colloquium on Automata, Languages, and Programming, pp. 461–472. Springer, 2012.

Kairouz, P., Oh, S., and Viswanath, P. Extremal mechanisms for local differential privacy. In Advances in Neural Information Processing Systems, pp. 2879–2887, 2014.

Kairouz, P., Bonawitz, K., and Ramage, D. Discrete distribution estimation under local privacy. In Proceedings of the 33rd International Conference on Machine Learning, ICML '16, pp. 2436–2444, 2016.

Kasiviswanathan, S. P., Lee, H. K., Nissim, K., Raskhodnikova, S., and Smith, A. What can we learn privately? SIAM Journal on Computing, 40(3):793–826, 2011.

Kawamoto, Y. and Murakami, T. Local obfuscation mechanisms for hiding probability distributions.
In European Symposium on Research in Computer Security, pp. 128–148. Springer, 2019.

Mangat, N. S. An improved randomized response strategy. Journal of the Royal Statistical Society: Series B (Methodological), 56(1):93–95, 1994.

Murakami, T. and Kawamoto, Y. Restricted local differential privacy for distribution estimation with high data utility. arXiv preprint arXiv:1808.01394, 2018.

Schein, A., Wu, Z. S., Schofield, A., Zhou, M., and Wallach, H. Locally private Bayesian inference for count models. In International Conference on Machine Learning, pp. 5638–5648, 2019.

Wang, S., Huang, L., Wang, P., Nie, Y., Xu, H., Yang, W., Li, X.-Y., and Qiao, C. Mutual information optimally local private discrete distribution estimation. arXiv preprint arXiv:1607.08025, 2016.

Wang, T. and Blocki, J. Locally differentially private protocols for frequency estimation. In Proceedings of the 26th USENIX Security Symposium, 2017.

Warner, S. L. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309):63–69, 1965.

Xiang, Z., Ding, B., He, X., and Zhou, J. Linear and range counting under metric-based local differential privacy, 2019.

Ye, M. and Barg, A. Optimal schemes for discrete distribution estimation under locally differential privacy. IEEE Transactions on Information Theory, 64:5662–5676, 2018.

Zhu, W., Kairouz, P., Sun, H., McMahan, B., and Li, W. Federated heavy hitters discovery with differential privacy. arXiv preprint arXiv:1902.08534, 2019.