# Profile Reconstruction from Private Sketches

Hao Wu¹  Rasmus Pagh¹

¹Department of Computer Science, University of Copenhagen, Denmark. Correspondence to: Hao Wu, Rasmus Pagh.

Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

Abstract

Given a multiset of n items from D, the profile reconstruction problem is to estimate, for t = 0, 1, ..., n, the fraction f[t] of items in D that appear exactly t times. We consider differentially private profile estimation in a distributed, space-constrained setting where we wish to maintain an updatable, private sketch of the multiset that allows us to compute an approximation of f = (f[0], ..., f[n]). Using a histogram privatized using discrete Laplace noise, we show how to reverse the noise, using an approach of Dwork et al. (ITCS '10). We show how to speed up their LP-based technique from polynomial time to O(d + n log n), where d = |D|, and analyze the achievable error in the ℓ1, ℓ2 and ℓ∞ norms. In all cases the dependency of the error on d is O(1/√d), and we give an information-theoretic lower bound showing that this dependence on d is asymptotically optimal among all private, updatable sketches for the profile reconstruction problem with a high-probability error guarantee.

1. Introduction

The profile is a fundamental statistic of a data set. Given a multiset of n items from a finite domain D, the profile is a vector f whose t-th entry equals the fraction f[t] of items in D that appear exactly t times. The concept has arisen in many contexts under various names, including rarity, incidence, fingerprint, prevalence, collision statistics, anonymous histogram, unattributed histogram, and histogram of histograms. Besides succinctly summarizing frequency information in a dataset, for example the degree distribution in a graph, a profile is sufficient to compute any symmetric function of the dataset histogram (Chen et al., 2024). In settings where space, communication or privacy constraints prevent us from computing the exact profile, it is of interest to approximate the profile f by a vector r such that r − f has small ℓp norm for some parameter p.

Streaming algorithms. Datar & Muthukrishnan (2002) considered estimating the profile in a streaming setting, showing that a small sample of the domain D suffices to ensure small error (they focus on ℓ1 error, but the technique applies to general ℓp norms). The sample can be described by a compact hash function, making it suitable for use in distributed settings. More recently, Chen, Indyk, and Woodruff (2024) improved the error bound, and showed matching space lower bounds for streaming algorithms. General estimators for symmetric functions based on samples were presented and analyzed by Acharya, Das, Orlitsky, and Suresh (2017).

Differentially private algorithms. Differential privacy (Dwork et al., 2006) is a leading approach to ensuring that the output of an algorithm, from simple statistics to machine learning models, does not reveal too much about the inputs. It is one of the central techniques for enabling collection of statistics, training of machine learning models, and other computations on data that may hold sensitive information.
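For concreteness, the following short Python sketch (our own illustration; the function name `profile` and the toy data are not from the paper) computes the profile of a multiset over a domain D = {1, ..., d} according to the definition above.

```python
# Illustrative helper (ours): compute the profile f of a multiset over D = {1, ..., d}.
from collections import Counter

def profile(items, d, n):
    """Return f[0..n], where f[t] is the fraction of domain items that
    appear exactly t times in `items`."""
    counts = Counter(items)                  # histogram restricted to items that occur
    f = [0.0] * (n + 1)
    for ell in range(1, d + 1):
        f[counts.get(ell, 0)] += 1.0 / d     # item ell contributes to f[h[ell]]
    return f

# Example with d = 5: the multiset (1, 1, 2, 3) has profile
# f[0] = 2/5 (items 4 and 5), f[1] = 2/5 (items 2 and 3), f[2] = 1/5 (item 1).
print(profile([1, 1, 2, 3], d=5, n=4))       # [0.4, 0.4, 0.2, 0.0, 0.0]
```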
We will focus on the privacy of individual elements in the multiset, that is, our differential privacy guarantee is defined in terms of pairs of neighboring datasets where the number of occurrences of a particular element differs by 1, and that are otherwise identical. In the central model of differential privacy (where a trusted curator holds the input data), privately releasing a profile with small ℓ2 error was considered by Hay, Li, Miklau, and Jensen (2009) and Hay, Rastogi, Miklau, and Suciu (2010), the latter focusing on the application to graph degree distributions. Private synthetic graphs with a specific degree distribution were also studied by Karwa and Slavkovic (2012). Motivated by estimating repetitions in password datasets, Blocki, Datta, and Bonneau (2016) studied differentially private protocols for profile estimation, focusing on ℓ2 error. Suresh (2019) considered an alternative (incomparable) error metric, where the profile is represented in the verbatim form of a vector encoding all frequencies in sorted order. They presented an algorithm to release verbatim profiles with a small ℓ1 error. Recently, Manurangsi (2022) showed improved tight upper and lower bounds on the privacy/ℓ1-error trade-off for verbatim profiles.

Differentially private distributed/streaming algorithms. Dwork, Naor, Pitassi, Rothblum, and Yekhanin (2010) studied profile estimation (referred to as t-incidence) in the pan-private streaming setting, where the internal state of the streaming algorithm must satisfy a differential privacy guarantee. Pan-private algorithms protect against intrusions, i.e., unintended release of the internal state of the algorithm. Unlike the other works on private profile estimation mentioned above, the neighboring relation considered by Dwork et al. (2010) is defined on the profile itself, i.e., datasets are neighboring if they differ by any amount on the count of one item (so the profiles differ in two entries). In a nutshell¹, their algorithm works by maintaining a histogram that stores the number of occurrences of each item perturbed by additive noise. Since the noisy histogram is a linear function of the histogram, it is updatable when the count of an item changes. Dwork et al. (2010) finally show how to derive an estimate of the profile from the noisy histogram by solving a linear programming problem. The size of the sketch required for achieving ℓ∞ error α on the profile estimate with high probability is stated as poly(log n, 1/ε, 1/α) with unspecified exponents. Unfortunately, the paper does not analyze for what values of α it is possible to solve the linear program (with high probability), meaning that the privacy-utility trade-off obtained remains unclear.

1.1. Our contributions

In this paper we study how to approximate the profile from a noisy histogram of the data. Specifically, we consider the setting where discrete Laplace noise has been added to the counts in the histogram. (This is different from the noise distribution used by Dwork et al. (2010), which is nearly uniform modulo a small prime p_i.) Computing the empirical profile directly from the noisy histogram can lead to large errors. For example, consider the situation where D has n items that each appear once in the dataset and let ε > 0 be a small constant. After adding noise to the histogram to make it ε-differentially private (and possibly rounding negative values up to 0), the fraction of items with a count of 1 will be roughly (1 − e^{-ε})/(1 + e^{-ε}) = Θ(ε), far from the true fraction of 1.
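The failure mode in this example is easy to reproduce numerically. The sketch below (ours; it samples discrete Laplace noise as the difference of two i.i.d. geometric variables, and all parameter choices are illustrative) builds the noisy histogram for a dataset in which every item appears exactly once and shows that the naive empirical estimate of f[1] is far from 1.

```python
# Illustrative simulation (ours): the empirical profile of a discrete-Laplace-noised
# histogram is a poor estimate of the true profile when every item appears once.
import numpy as np

rng = np.random.default_rng(0)
d, eps = 100_000, 0.1
h = np.ones(d, dtype=int)                  # true histogram: every count equals 1, so f[1] = 1

# DLap(e^{-eps}) can be sampled as the difference of two i.i.d. geometric
# variables with success probability 1 - e^{-eps}.
p = 1.0 - np.exp(-eps)
noise = rng.geometric(p, d) - rng.geometric(p, d)
h_noisy = np.clip(h + noise, 0, None)      # round negative counts up to 0

naive_f1 = np.mean(h_noisy == 1)           # naive estimate of f[1]
print(naive_f1)                            # about (1 - e^{-eps})/(1 + e^{-eps}) ~ 0.05, not 1
```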
Instead we use an approach similar to that of Dwork et al. (2010) to approximately invert the effect of the noise. Since we use a known noise distribution, the expected value of the profile computed from the noisy histogram is a linear function of the true profile. Due to concentration we can argue that the observed values in the profile based on the noisy histogram are in fact close to the expected ones, and thus it makes sense to find a histogram whose expected profile after noise would be close to the observed noisy profile. Unlike Dwork et al. (2010), who formulate this as a feasibility question of a linear program, we show that it can be reduced to solving a linear ℓp-regression problem with a circulant matrix, paving the way for a near-linear time algorithm. We also give a careful analysis of the error Err_p in ℓp norm, for p ∈ {1, 2, ∞}, showing the following upper bound:

¹Since there is no good bound on the difference between histograms with their notion of neighboring datasets, Dwork et al. (2010) actually work with histograms of counts modulo several small random primes p_i, and use these to estimate profiles on the counts modulo p_i, which are in turn used to estimate the original profile. In our context this complication goes away.

Theorem 1.1. Let η ∈ (0, 1), ε > 0. Denote B := (1/ε) · ln max{ 2d/(η(e^ε + 1)), 8e^ε/(e^{2ε} − 1) }, and assume that n ≥ B. There is an algorithm that, given a private version h̃ of a d-dimensional histogram h published by the ε-DP discrete Laplace mechanism and a parameter p = 1, 2 or ∞, returns an approximation r of the profile f of h in O(d + n log n) time. Moreover, with probability at least 1 − η, the estimation error Err_p := ∥r − f∥_p satisfies Inequalities (1)–(3), where f[t] := (1/d) · |{ℓ ∈ D : h[ℓ] = t}|, t = −B, ..., n + B.

Notice that the error bounds in all three norms are proportional to O(1/√d), where d is the domain size. As we will discuss below, this dependence is optimal under ε-differential privacy for updatable sketches. On the other hand, when ε ∈ O(1), the O(1/ε^{1.5}) dependence for Err_∞ on the privacy parameter does not match the Ω(1/ε) lower bound on the error of releasing a single counter (Hardt & Talwar, 2010). We note that the bound on Err_1 is a refinement of the bound implied by the bound on Err_2 and the fact that ℓ1 distances in n dimensions are at most √n times larger than ℓ2 distances.

Since errors are normalized by a 1/d factor, one might hope to achieve errors that decrease more rapidly with d than those of Theorem 1.1. However, we show the following lower bound:

Theorem 1.2. For every ε-DP and updatable algorithm A : [0..n]^d → Y, and every profile estimation algorithm R : Y → [0, 1]^{n+1}, if the sensitivity Δ_{R∘A} ∈ o(1/(√d · ε · e^{c·ε})) (for some universal constant c > 0), then there exists an input h ∈ [0..n]^d such that E[ ∥f − R(A(h))∥ ] ∈ Ω(1/(√d · e^{c·ε})). (4)

This lower bound is not limited to algorithms reconstructing the profile from a perturbed histogram published by the discrete Laplace mechanism. Instead, it applies to a broader class of algorithms that attempt to reconstruct the profile from outputs of updatable ε-differentially private algorithms. The formal definition of an updatable algorithm and the sensitivity Δ_{R∘A} will be given in Section 5. Informally, the former denotes the property that when the input histogram h is updated, we can directly modify A(h) to reflect the change, and the latter represents the maximum distance between the profiles reconstructed from the outputs of A for neighboring histograms.
The lower bound shows that an error of Ω(1/√d) is unavoidable for such sketches.

1.2. Technical overview

As noted by Datar & Muthukrishnan (2002) and Dwork et al. (2010), it suffices to estimate the profile of a sample of elements from the domain, since the profile of the sample will closely resemble the true profile. Thus we will assume that this sampling step has already been done, and consider the full histogram of (sampled) items. The final error bounds add the sampling error and the error from privacy using the triangle inequality, but here we focus on the latter.

Given a noisy histogram h̃, we can compute an empirical profile f̃. As the noisy histogram is obtained by adding independent noise to each coordinate of the full histogram h, the empirical profile can be viewed as a function of a set of independent random variables and should concentrate around its expectation E[f̃]. Hence, we regard the empirical profile as a reliable representative of its expectation. Although the expectation E[f̃] itself may not be a good approximation of the original profile f of h, there is a subtle connection between them: E[f̃] can be computed via a linear transform of the original profile f. Due to the symmetry of discrete Laplace noise, we can write the linear transform as A·f for a circulant matrix A with non-zero eigenvalues. Since the inversion of such linear transforms can be computed in O(n log n) time via the Fast Fourier Transform (FFT), this motivates us to compute A^{-1}·f̃ as an approximation of the true profile. However, such an approximation may not be a valid profile: it may contain coordinates outside [0, 1], and its coordinates may not sum up to 1. We design a systematic procedure for fixing these violations, which also relies on FFT and runs in O(n log n) time. The approximation error comes from three sources: the deviation of f̃ from its expectation, the inverse linear transform, and the violation-fixing procedure. Each part will be analyzed separately. The second part, in particular, might be of independent interest, as it involves examining various matrix norms of A^{-1}, which naturally arises from the discrete Laplace mechanism.

To establish a lower bound, we employ a reduction from the d-dimensional inner product estimation problem in the framework of two-party differential privacy, where an error of Ω(√d) is inevitable (McGregor et al., 2010; Haitner et al., 2022).

2. Problem Description

Let D := {1, ..., d} be a set of d items. Given a multiset of n items from D, for each item ℓ ∈ D, let h[ℓ] be the number of times it appears in the multiset. The histogram is a vector h := (h[1], ..., h[d]) ∈ [0..n]^d. The profile of the histogram h is a frequency vector f := (f[0], ..., f[n]) ∈ [0, 1]^{n+1}, where f[t] = (1/d) · |{ℓ ∈ D : h[ℓ] = t}| is the fraction of elements in D which appear t times in the multiset, for each t ∈ [0..n]. Our goal is to design an efficient and accurate algorithm for estimating the profile f from a private histogram h̃ output by the discrete Laplace mechanism (or geometric mechanism) due to Ghosh, Roughgarden, and Sundararajan (2009).

Discrete Laplace Mechanism. Given an input histogram h, the mechanism aims at outputting a perturbed version h̃ of h that is insensitive to the addition/removal of an arbitrary item in the multiset.
Specifically, the mechanism creates h̃ by adding independent noise to each coordinate of h, such that

h̃[ℓ] = h[ℓ] + DLap(e^{-ε}), ∀ℓ ∈ D,

where DLap(e^{-ε}) denotes a random variable following the discrete Laplace distribution:

Pr[DLap(e^{-ε}) = t] = ((1 − e^{-ε})/(1 + e^{-ε})) · e^{-ε·|t|}, ∀t ∈ Z.

In practice, a clipping step can be applied after coordinate perturbation:

h̃[ℓ] = clip[ h[ℓ] + DLap(e^{-ε}), 0, n ], ∀ℓ ∈ D,

where clip[x, 0, n] := max{0, min{x, n}} ensures the returned value is in [0..n].

Privacy Guarantee. The privacy guarantee provided by the mechanism is measured in the framework of differential privacy. Formally, we call two histograms h and h′ neighboring, written as h ∼ h′, if ∃ℓ ∈ D s.t. |h[ℓ] − h′[ℓ]| ≤ 1, and ∀ℓ′ ∈ D \ {ℓ}, it holds that h[ℓ′] = h′[ℓ′]. The discrete Laplace mechanism ensures that the output histograms have similar distributions for neighboring inputs.

Definition 2.1 (ε-Differential Privacy (Dwork & Roth, 2014)). Given ε > 0, a randomized algorithm A : N^d → 𝒵 is called ε-differentially private (DP) if, for every h, h′ ∈ N^d such that h ∼ h′, and all (measurable) Z ⊆ 𝒵,

Pr[A(h) ∈ Z] ≤ e^ε · Pr[A(h′) ∈ Z]. (5)

Ghosh et al. (2009) showed that the discrete Laplace mechanism is ε-DP (with or without the clipping step).

3. Preliminaries

Definition 3.1 (Circulant Matrix). Let p ∈ N+ and c = (c_0, c_1, ..., c_{p−1}) ∈ C^p be a row vector. Define the (right) circular shift operator σ : C^p → C^p by σ(c) = (c_{p−1}, c_0, ..., c_{p−2}). The circulant matrix circ(c) is the matrix whose rows c, σ(c), ..., σ^{p−1}(c) are generated by iteratively applying the operator σ to c.

Fact 3.2 (Eigenvectors and Eigenvalues (Gray, 2005)). Let w_t := e^{2π·t·i/p}, for each t ∈ [0..p−1]. Then the eigenvalues φ_0, ..., φ_{p−1} and the corresponding eigenvectors v_0, ..., v_{p−1} of the matrix circ(c) are given by

φ_t := c_0 + c_1·(w_t)^1 + ... + c_{p−1}·(w_t)^{p−1}, (6)

v_t := (1/√p) · (1, (w_t)^1, ..., (w_t)^{p−1}). (7)

Fact 3.3 (Fast Fourier Transform (FFT) (Cormen et al., 2009)). Let V := [v_0 ... v_{p−1}] ∈ C^{p×p} be the matrix composed of the column vectors v_t (defined in Equation (7)) for t = 0, 1, ..., p − 1. Then for each x ∈ C^p, the products V·x, V^{-1}·x, x^⊤·V and x^⊤·V^{-1} can be computed in O(p log p) time.

4. Near-linear Time Algorithm

In this section, we introduce an algorithm that achieves the properties stated in Theorem 1.1. Our presentation is structured as follows. In Section 4.1, we demonstrate that we can transform the histograms published by the discrete Laplace mechanism with the clipping step into ones without the clipping step, ensuring that they follow the same distribution. This allows us to handle perturbed histograms with and without clipping in a unified way. Subsequently, in Section 4.2, we formulate the task of searching for a good estimate of the profile as an optimization problem. Section 4.3 presents a near-linear-time algorithm for systematically finding an approximate solution to the optimization problem. Sections 4.4 and 4.5 analyze the time complexity and the approximation error of the proposed algorithms, respectively. Finally, Section 4.6 provides a more detailed comparison of our algorithm with the previous approach.

4.1. Unfolding Clipped Histograms

Assume that h̃ has been clipped, so that each entry has a value between 0 and n. Based on the clipped histogram, we can construct a new histogram that has the same distribution as an unclipped one.
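Before turning to the unfolding step, the following sketch (ours; function names and parameter values are illustrative) simulates the clipped discrete Laplace mechanism defined above, producing exactly the kind of histogram that the recovery procedure below operates on.

```python
# Minimal sketch (ours) of the clipped discrete Laplace mechanism of Section 2:
# each count receives independent DLap(e^{-eps}) noise and is then clipped to [0, n].
import numpy as np

def discrete_laplace(rng, eps, size):
    # DLap(e^{-eps}) sampled as the difference of two i.i.d. geometric variables
    # with success probability 1 - e^{-eps}.
    p = 1.0 - np.exp(-eps)
    return rng.geometric(p, size) - rng.geometric(p, size)

def clipped_dlap_mechanism(h, n, eps, rng):
    """Return the eps-DP noisy histogram clip(h + DLap(e^{-eps}), 0, n)."""
    noise = discrete_laplace(rng, eps, h.shape[0])
    return np.clip(h + noise, 0, n)

rng = np.random.default_rng(1)
h = np.array([0, 1, 1, 3, 7])              # toy histogram over d = 5 items with n = 12
print(clipped_dlap_mechanism(h, n=12, eps=1.0, rng=rng))
```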
Consider an entry h[ℓ] = clip h h[ℓ] + DLap (e ε), 0, n i for some ℓ D, and the recovery procedure: h[ℓ], if 0 < h[ℓ] < n, h[ℓ] DExp (e ε) , if h[ℓ] = 0, h[ℓ] + DExp (e ε) , if n = h[ℓ], (8) where DExp (e ε) denotes a random variable following the geometric distribution: Pr [DExp (e ε) = t] = (1 e ε) e ε t, t N. Lemma 4.1. h [ℓ] has the same distribution as h[ℓ] + Z, where Z is an independent copy of DLap (e ε). Appendix B contains the detailed proof of the theorem. Here, we present a high-level overview of the proof s intuition: when restricting the two-directional discrete Laplace random variable DLap (e ε) to a single direction, it reduces to a random variable that follows the geometric distribution. It is a well-known fact that for a geometric random variable, truncating it at a certain value and adding an independent copy of the random variable at the truncated value preserves its geometric distribution. This property is commonly referred to as the memorylessness property of the geometric distribution. 4.2. Searching for Profiles A natural attempt would be estimating the profile f with f[0 . . n] .= ( f[0], . . . , f[n]), where f[t] = (1/d) |{ℓ D : h[ℓ] = t}|, t Z. There are several issues with such an estimate. First, f[0 . . n] is not an empirical profile, as its entries do not sum up to 1. An even more significant concern is that such an estimate can incur an error of Ω(1). Example. Consider the example where h[ℓ] = 1, ℓ D. Hence, f[1] = 1 and f[t] = 0 for all t = 1. But, it can be shown that E[ f[1]] = 1 e ε 1+e ε and Var[ f[1]] 1 1+e ε . By Chebyshev s inequality, when ε Θ(1), with constant probability, f[0 . . n] f p | f[1] f[1]| Ω(1), for p = 1, 2, . To deepen our understanding of the relationship between f and f, let us consider a fixed t [0 . . n]. There are f[t] d items in D such that h[ℓ] = t. For each of these items, the random variable h[ℓ] = h[ℓ] + DLap (e ε) is independent and identically distributed. Our analysis considers an algorithm, outlined in Algorithm 1, for generating f directly from f. We emphasize that our final algorithm is the combination of Algorithms 2 and 3, and does not make use of Algorithm 1. Algorithm 1 Private Profile Generator A( r) 1: Initialize an all zero profile v 2: for t [0 . . n] do 3: for j 1 to r[t] d do 4: I t + DLap (e ε) 5: v[I] v[I] + 1/d 6: return v the perturbed profile To search for a good approximation r of f, our strategy shifts towards searching for a r such that A( r) (the vector returned by Algorithm 1 given input r) is close to f. The search is framed as an optimization problem. Our goal is Profile Reconstruction from Private Sketches then to design efficient algorithm for solving the optimization problem, and to prove that the proximity between A( r) and f indeed implies the proximity between r and f. Optimization Problem. Let B R+ be the parameter (as defined in Theorem 1.1), and EB be the event that the all noises introduced by Algorithm 1 are absolutely bounded by B, and LB .= B, UB .= n + B, f .= ( f[LB], . . . , f[UB]). Given p = 1, 2 of , the optimization problem we rely on is given by E A [A( r) | EB] f p t=0 r[t] = 1, 0 r[t] 1, t [0 . . n]. To simplify the notation, when it is clear from the context, we write EA [A( r) | EB] as E [A( r) | EB]. In what follows, we provide motivation for this formulation. Conditioning on EB. Without the event EB, A( r) can have unbounded support, as the noises introduced by Algorithm 1 are unbounded. Conditioned on event EB, only A( r)[LB], . . . , A( r)[UB] process nonzero values. 
Therefore, E [A( r) | EB] can be viewed as a vector indexed from LB to UB. Further, by proper choice of B, the event EB only occurs with sufficiently small probability. Therefore, the event EB encompasses the majority of cases. Taking the Expectation. Without the expectation, the error A( r) f p is itself a random variable (even conditioned on EB). The objective in (P1) is a relaxation of the expected error, as it provides a lower bound for the latter. To see this, note that p is convex for p 1, and according to Jensen s inequality, we have EA [A( r) | EB] f p EA A( r) f p | EB Matrix Form. We now reformulate the above in matrix form. The primary advantage of this formulation lies in its potential for expressing the objective in special matrix form, facilitating the development of efficient optimization algorithms. Let c Rn+2B+1 be a vector defined as follows: The first B entries are set to 1, e-ε, . . . , e-ε B. The last B entries are set to e-ε B, e-ε (B 1), . . . , e-ε. All other entries are set to 0. Define Pnorm .= 1+e ε 2 (e ε)B+1 1 e ε , and A .= 1 Pnorm circ circ circ( c). (9) To maintain consistency with f, we assume that both rows and columns of A are indexed by [LB . . UB]. The following lemma states that, we can express the objective in (P1) in matrix form. Lemma 4.2. Let A( ) be the Algorithm 1, A be matrix defined as Equation (9), and r = ( r[0], . . . , r[n]) Rn+1 0 be a vector such that P t [0 . . n] r[t] = 1. Assume that we expand the vector r to one in Rn+2B+1 by padding r[ B] = = r[ 1] = 0 to the front and r[n + 1] = = r[n + B] = 0 to the back of r. The following claim holds: E [A( r) | EB] = A r. Based on the above lemma, the optimization problem (P1) is equivalent to the following one: min r Rn+2B+1 t=0 r[t] = 1, 0 r[t] 1, t [0 . . n], r[t] = 0, t [LB . . UB] \ [0 . . n]. An important advantage lies in the potential efficiency gained from the circulant matrix A, as detailed in Section 4.4, enabling A 1 x to be computed in O(n log n) time for each vector x. This motivates the choice of computing A 1 f as the approximate profile r, resulting in zero error in the objective function: A A 1 f f p = 0. However, it s crucial to note that such a solution may potentially violate other constraints outlined in (P2). This observation motivates the development of an algorithm for systematically solving (P2), which will be discussed in the next section. 4.3. Optimization Algorithms In this section, we present effective algorithms to address the optimization problem defined by (P2). Our approach consists of two steps: initially, we concentrate on a relaxed version of (P2) and introduce an algorithm that optimally solves it. Subsequently, building upon the solution obtained in the first step, we introduce a rounding procedure designed to yield an approximate solution for (P2), satisfying all the specified constraints. The proofs of all lemmas in this section are included in the Appendix D. Relaxation. There are two categories of constraints on r in optimization problem (P2): its coordinates should sum up to 1; and each of its coordinates should be within a given range. Optimizing (P2) simultaneously with two constraints is non-trivial, so we initially focus on the first constraint, leading to optimization problem (P3): min A r f p s.t. Pn t=0 r[t] = 1. (P3) The optimization problem (P3) has a simple geometric interpretation. 
Since A is invertible (as will be shown in Profile Reconstruction from Private Sketches Algorithm 2 Fast Inversion Afst-inv Input: h and p {1, 2, } 1: Compute f based on h 2: u A 1 f 3: Let 10:n be a vector, s.t., 10:n[t] = ( 1, t [0 . . n] 0, otherwise 4: Let a arg max a p=1 10:n A 1 a 5: r u 10:n, u 1 10:n,A 1 a A 1 a 6: return r the estimated profile Lemma 4.6), the function A,p : R[0 . . n] R, defined by v A,p .= A v p, v R[0 . . n]. constitutes a norm. The objective of (P3) is equivalent to A ( r A 1 f) p Hence, (P3) seeks to find the point nearest to A 1 f (measured by the distance induced by the norm A,p) within the hyperplane specified by Pn t=0 r[t] = 1. This inspires Algorithm 2 for solving (P3). First, it computes a vector u = A 1 f. Let 10:n be a binary vector such that 10:n[t] = 1 only when t [0 . . n]. If 10:n, u = 1, the algorithm rectifies this by adjusting u along the direction A 1 a (by an appropriate amount), where a is the unit vector (in terms of the norm p) which maximizes 10:n,A 1 a . In essence, adjusting u along A 1 a is the most cost-effective way to correct the violation 10:n, u = 1. Formally, Lemma 4.3 holds. Lemma 4.3. Algorithm 2 returns an optimal solution for the optimization problem (P3). Rounding. Given a solution r returned by Algorithm 2, Algorithm 3 post-processes it to ensure satisfaction of all constraints in (P2). Algorithm 3 unfolds in three phases: Phase 1 (Line 1), wherein the algorithm sets r[t] to 0 for each t [LB . . UB]\ [0 . . n]; Phase 2 (Line 2 to Line 7), where the algorithm clips r[t] to the range of [0, 1]; and Phase 3 (Line 8 to Line 12), during which the algorithm adjusts r[t] to ensure their sum equals 1. It is essential to observe that Phase 3 will not generate any coordinate outside the range [0, 1]. Moreover, in Phase 3, it is adequate to solely consider decreasing the coordinate values of r, which will be formally explained in the proof of Lemma 4.4. Lemma 4.4. The vector returned by Algorithm 3 satisfies all constraints in the optimization problem (P2). Algorithm 3 Rounding Arnd 1: r[t] 0, t [LB . . UB] \ [0 . . n] 2: T>1 {t [0 . . n] : r[t] > 1} 3: T<0 {t [0 . . n] : r[t] < 0} 4: s>1 P t T>1 (1 r[t]) 5: s<0 P t T<0 (0 r[t]) 6: r[t] 1, t T>1 7: r[t] 0, t T<0 8: s s>1 + s<0. we must have s 0 9: if s > 0 then 10: Let τ 0 be the solution for Pn t=0 min {τ, r[t]} = s 11: r[t] r[t] min {τ, r[t]}, t [0 . . n] 12: end if 13: return r the rounded profile 4.4. Time Analysis In this section, we study the running time of both Algorithm 2 (Theorem 4.5) and Algorithm 3 (Theorem 4.7). Theorem 4.5. Algorithm 2 runs in O(d + n log n) time. The proof of Theorem 4.5 relies heavily on the properties of A as a circulant matrix. Lemma 4.6. The eigenvalues φt, t = B, . . . , n + B of A can be computed in O(n) time. Further, A is invertible and A 1 = V diag φ 1 B, φ 1 B+1, . . . , φ 1 n+B V 1, where diag φ 1 B, φ 1 B+1, . . . , φ 1 n+B is a diagonal matrix, and V .= [ v B vn+B] is the matrix comprising column vectors vt = 1 n+2B+1 1, (wt)1, , (wt)n+2B , wt .= e 2π (t+B) i n+2B+1 , for t = B, . . . , n + B. The proof of Lemma 4.6 is included in the Appendix E. Now we are ready to prove Theorem 4.5. Proof of Theorem 4.5. First, computing the empirical profile f from h takes O(d) time. Next, by fast Fourier transform (Fact 3.3), multiplying a vector with both V and V 1 can be performed in O((n + 2B + 1) log(n + 2B + 1)) = O(n log n) time. Based on the decomposition of A 1 (Lemma 4.6), we see that multiplying A 1 with a vector takes O(n log n) time. 
Therefore, it suffices to show that a can also be found in O(n log n) time. For consistency, assume that the coordinates of a are indexed from B to n + B and write a = [a B, . . . , an+B]. Then 10:n,A 1 a = Pn+B t= B 10:n,A 1[ , t] at, where A 1[ , t] is the t(th) column of A 1. For convenience, let ct be a short hand for 10:n,A 1[ , t] . Observe that c .= [c B, . . . , cn+B] = 10:n A 1 can be com- puted in O(n log n) time. Once the ct are known, finding a Profile Reconstruction from Private Sketches is equivalent to solving the following optimization problem: max a p=1 c, a . It is straightforward to see that: i) a = sign(ct) et where et is a standard basis vector and t = arg maxt |ct| when p = 1; ii) a = c c 2 when p = 2; iii) a = [sign(c B), . . . , sign(cn+B)] when p = . Theorem 4.7. Algorithm 3 has an implementation which runs in O(n log n) time. The detailed proof of the theorem is included in the Appendix E. At a high level, we can initially sort the r[t] in non-decreasing order in O(n log n) time. Subsequently, we identify the smallest integer t for which r[t ] τ. It can be demonstrated that τ = s Pt 1 t=0 r[t] n+1 t . 4.5. Error Analysis In this section, we analyze the estimation error when we combined algorithms 2 and 3, and show that this results in the estimation error stated in Theorem 1.1. The approximation errors arise from two sources: the difference between the optimal solution of (P3) and the actual profile; and the error introduced during the rounding procedure. We will demonstrate that the errors resulting from the former are of the same order of magnitudes as those indicated in Theorem 1.1, while the latter introduces only additional constants to the error of the former. Formally, the following two lemmas hold. Lemma 4.8 (Error from Optimization (P3)). Given the profile f of a private version h of a histogram h published by the ε-DP discrete Laplace mechanism and a parameter p = 1, 2 or , with probability at least 1 η, Algorithm 2 returns a vector r with the estimation error Errp .= r f p satisfying Inequalities (1,2,3). Lemma 4.9 (Error from Rounding). Given a vector r returned by Algorithm 2, let Errp denote the error before running Algorithm 3, and Err p denote the error afterwards. Then Err p Errp for p = 1, 2, and Err 2 Err . Combing Lemma 4.8 and 4.9 immediately proves the error bounds in Theorem 1.1. In the rest of the section, we present the proofs for Lemma 4.8 and Lemma 4.9 separately. Error from Optimization (P3) Proof of Lemma 4.8. We need to prove that, the closeness between A r and f implies the closeness between r and f. To bound r f p, we observe that r f p = A 1 A r f p A 1 p A r f p A 1 p A r f p + f A f p A 1 p 2 f A f p where the first inequality comes from the definition of matrix norm A 1 p .= max v: v p=1 A 1 v p; the last one from that r is the solution for optimization problem (P3). It remains to bound A 1 p and f A f p separately. Lemma 4.10. Let φt, t = B, . . . , n + B be the eigenvalues of A. Then A 1 1 = A 1 2 + e ε + eε eε e ε 4 e ε B Pnorm, A 1 2 max t 1 |φt| Pnorm (1 + e ε) 1 e ε 2 e ε (B+1) . Lemma 4.11. With probability at least 1 η, it holds that E h f[t] i + 2 1 Pnorm ln n The proof of Lemma 4.10 and Lemma 4.11 are included in the Appendix F. Finally, combining these lemmas, the assumption of B in Theorem 1.1, and that Pnorm = 1+e ε 2 (e ε)B+1 1 e ε finishes the proof. Error from Rounding. The full proof for Lemma 4.9 is provided in the Appendix F. Here is a brief outline of the proof for the case of p = 1 to offer some insight. 
The proof for the cases of p = 2, are similar but much more technical. The algorithm is divided into three phases: Phrase 1 (Line 1): In this phase, the algorithm rounds r[t] to 0 for each t [ B . . n + B] \ [0 . . n]. Phase 2 (Line 2 to Line 7): In this phase, the algorithm clips the r[t] to the range of [0, 1]. Phase 3 (Line 8 to Line 12): In this phase, the algorithm adjusts the r[t] so that they sum up to 1. It is evident that the quantity r[t] f[t] only decreases in Phases 1 and 2, as f[t] [0, 1]. Further it can be shown that in Phase 2, r f 1 decreases by an amount of s>1+s<0, and in Phase 3, r f 1 can increase by at most s>1 +s<0. Hence overall, r f 1 decreases. 4.6. Comparison with Previous Approach Dwork, Naor, Pitassi, Rothblum, and Yekhanin (2010) also studied the profile estimation (referred to as the t-incidence) problem, with a key difference from our work: their approach aimed for pan-privacy, which resulted in significantly more noise added to the histogram h [0..n]d. To simplify the comparison, we assume a sampling step involved in their method has already been executed, and focus on the Profile Reconstruction from Private Sketches full histogram of sampled items. Let h denote the noisy histogram. For each ℓ [d], their requirement was that max t [0..n] Pr[ h[ℓ] = t] min t [0..n] Pr[ h[ℓ] = t] eε. Specifically, they used noise scaling with n: h[ℓ] h[ℓ]+Z (mod n + 1), where Z is a noise variable such that Pr[Z = t] e ε t/(n+1). In contrast, our paper considers adding symmetric discrete Laplace noise DLap(e ε) to each entry, which is fundamental and widely used for publishing privatized histograms. Our technique is applicable even in the pan-private setting, with appropriately scaled noise. Another minor difference lies in the optimization problem: their linear program (LP) aims to achieve solutions with ℓ error, whereas our algorithm supports ℓ1/ℓ2/ℓ errors. Moreover, they did not provide an almost linear-time algorithm for solving their LP. However, the almost linear-time algorithm developed in our paper can solve their LP as well, as it also involves a circulant matrix. 5. Lower Bound In this section, we discuss the lower bound in Theorem 1.2. We will begin by providing the formal definition of updatable histogram publishing algorithms and the sensitivity of the profile reconstruction algorithms when composed with the private histogram publishing algorithm. To establish the lower bound, we will then present a reduction from the inner product estimation problem in the two-party setting. Our lower bound applies to an algorithm A that is updatable, meaning that there exists an update algorithm U, s.t., given the output A( h) and some other histogram h , U can adjust the output directly (by computing U(A( h), h)), as if the input to A is h + h . Formally, Definition 5.1 (Updatable Algorithm). A (randomized) algorithm A : [0 . . n]d Y is called updatable, if there exists another (randomized) algorithm U : Y [0 . . n]d Y, the update algorithm, such that for every h, h [0 . . n]d, U(A( h), h ) has the same distribution as A( h + h ). Example 5.2. If A is the discrete Laplace mechanism, then it is updatable. Let U be the vector addition operation: U(A( h), h ) = h + DLap (e ε)d + h = A( h + h ), where DLap (e ε)d Zd is a vector composed of independent discrete Laplace random variables. 
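A minimal sketch of Example 5.2 (ours; names are illustrative): the update algorithm U is plain vector addition, so updating a previously published noisy histogram has the same distribution as rerunning the mechanism on the updated input.

```python
# Sketch (ours) of Example 5.2: the discrete Laplace mechanism is updatable.
import numpy as np

def A(h, eps, rng):                        # discrete Laplace mechanism (no clipping)
    p = 1.0 - np.exp(-eps)
    return h + (rng.geometric(p, h.shape[0]) - rng.geometric(p, h.shape[0]))

def U(published, h_update):                # update algorithm: vector addition
    return published + h_update

rng = np.random.default_rng(2)
h  = np.array([2, 0, 5, 1])
dh = np.array([0, 1, 0, 0])                # one more occurrence of item 2
print(U(A(h, 1.0, rng), dh))               # distributed exactly as A(h + dh)
```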
When A represents a randomized algorithm, it can be conceptualized as a function, denoted as FA, acting on both the input of A and the randomness inherent in its execution (commonly referred to as coin tosses in the literature). This randomness is formally represented by the random variable CA. In what follows, denote R : Y [0, 1]n+1 as an algorithm trying to reconstruct the profile of h when given A( h). Definition 5.3. The sensitivity of R A is defined as the maximum deviation of R FA, over all possible realization c A of CA and all possible pair of neighboring inputs of A : R A .= max c A max h h R FA h, c A R FA h , c A , where the second maximum is over all h, h [0 . . n]d such that h h 1 1 In Appendix G, we show that when A is the discrete Laplace mechanism, R is the profile reconstruction algorithm stated in Theorem 1.1, and 1 eε+1 2 ε ec ε , R A satisfies R A o 1 d ε ec ε . Hence, the lower bound in Theorem 1.2 applies to our algorithms. Two Party Differential Privacy. We will prove the lower bound by establishing a connection between the profile estimation problem and the inner product estimation problem in the two party differential privacy setting (Mc Gregor et al., 2010). Two players, Alice and Bob, equipped with unrestricted computational capabilities, collaborate to compute the inner product x, y , where x { 1, 1}d is the binary string held by Alice, and y { 1, 1}d is one held by Bob. The collaboration follows a pre-agreed protocol, denoted as Π. In this protocol, they alternate turns to communicate with each other. During each player s turn, the communicated information is determined by their own vector, the cumulative communication up to that point, and possibly their randomness. The sequences of messages, which collectively form the protocol s transcript, are denoted as Π. To protect the data of both parties, it is required that each party s view of Π, as a function of the other party s data, should be differentially private. Formally, the view of Alice, denoted as Π x : { 1, 1}d Z (where Z is the set of all possible sequences of messages), is regarded as a (randomized) algorithm with input string y when x is fixed. The view Π x is ε-DP, if for each measurable Z Z and every pair of neighboring vector y and y s.t. y y = 1, it holds that Pr [Π x( y) Z] eε Pr [Π x( y ) Z] . We can define the view of Bob Π y : { 1, 1}d Z and the ε-DP property of Π y in a similar manner. The protocol Π is called ε-DP, if Π x is ε-DP for all possible x, and Π y is εDP for all possible y. Mc Gregor et al. (2010) established a lower bound on inner product estimation, which was further refined by Haitner et al. (2022). Profile Reconstruction from Private Sketches Algorithm 4 Protocol P Alice: send m A A( x + 1) to Bob. Bob: on receiving m A, compute r R(U(m A, y + 1))) publish d( r[4] + r[0] r[2] + 3 R A Lap (1/ε)) Fact 5.4 (The Two Party Lower Bound (Haitner et al., 2022)). An ε-DP two-party protocol that, for some ℓ log d, estimate the inner product over {-1, 1}d {-1, 1}d up to an additive error ℓwith probability c ec ϵ ℓ/ d (for some universal constant c > 0), can be used to construct a keyagreement protocol. Since an information theoretically secure key agreement does not exist (Haitner et al., 2022), Fact 5.4 implies that such protocols do not exist either. Sketch Proof of Theorem 1.2. Based on Fact 5.4, we present a sketch proof for the theorem. The complete proof is included in the Appendix G. We will prove the theorem by contraction. 
Assume there exists an ε-DP and updatable algorithm A : [0 . . n]d Y, profile estimation algorithm R : Y [0, 1]n+1, such that, for every input h [0 . . n]d, E[ f R(A( h)) ] o 1/( We will construct an ε-DP protocol P such that for every x and y, Bob s output (denoted by m B) satisfies Pr [|m B x, y | Err] o(1), (10) where Err = d/(2 c ec ε), contradicting Fact 5.4. The protocol is outlined by Algorithm 4, where Lap (1/ε) denotes a random variable following the Laplace distribution with probability density p(z) = ε 2 exp ( ε |z|), z R. Since x, y {-1, 1}d and x + 1, y + 1 {0, 2}d, it can be verified that x, y = |{ℓ [d] : x[ℓ] = y[ℓ]}| |{ℓ [d] : x[ℓ] = y[ℓ]}| = n ℓ [d] : ( x + 1)[ℓ] = ( y + 1)[ℓ] o n ℓ [d] : ( x + 1)[ℓ] = ( y + 1)[ℓ] o = d ( f[4] + f[0] f[2]). Via the accuracy assumption of R and the sensitivity assumption of R A (in Theorem 1.2), it holds that m B .= d ( r[4] + r[0] r[2] + 3 R A Lap (1/ε)) is an estimate of d ( f[4] + f[0] f[2]) with expected error o( E h m B d ( f[4] + f[0] f[2]) i o( Combing d ( f[4] + f[0] f[2]) = x, y with Markov inequality proves Equation (10). Finally, we verify that P is ε-DP. Clearly A( x + 1) is ε-DP. The output of Bob, d ( r[4] + r[0] r[2] + 3 R A Lap (1/ε)) is also εDP. This is because, by the definition of R A, it serves as an upper bound on the sensitivity of r[4], r[0] and r[2]. We provide some further comments on the lower bound. First, it is also possible to prove a lower bound of E h f R(A( h)) for Theorem 1.2, via a similar reduction to the two-party inner product lower bound accuracy lower bound by Mc Gregor et al. (2010). In particular, they showed that with probability 1 δ, inner product estimation in a two-party setting incurs an additive error of Ω eε . Compared to Equation (4), Equation (11) avoids a constant c blow up in the exponent, at the cost of an addition factor of log d in the denominator. Second, Theorem 1.2 assumes R A o 1 d ε ec ε . We considered removing this assumption because it appears that if R A Ω 1 d ε ec ε , it would imply that E[ f R(A( h)) ] Ω 1 d ec ε for some h. However, this is not the case. Since A is a randomized algorithm, R A is defined over all randomness c A of A as per Definition 5.3. It is possible that R A Ω 1 d ε ec ε for the c A s with very small probabilities. Therefore, we cannot derive a lower bound for E[ f R(A( h)) ]. Finally, our current lower bound applies to an algorithm A that protects the number of occurrences of a single element by 1. It extends naturally to an algorithm A that protects the number of occurrences of a single element by an arbitrary amount, as long as it satisfies the other assumptions in Theorem 1.2. Impact Statement The goal of this work is to advance privacy-preserving data analysis and machine learning in distributed and large-scale settings. Generally speaking this work adds to the toolbox of ways in which we can incorporate privacy into system designs. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here. Acknowledgements We thank the anonymous reviewers for their feedback which helped improve the paper. The authors are supported by Providentia, a Data Science Distinguished Investigator grant from Novo Nordisk Fonden and affiliated with Basic Algorithms Research Copenhagen (BARC), supported by the VILLUM Foundation grant 16582. Profile Reconstruction from Private Sketches Acharya, J., Das, H., Orlitsky, A., and Suresh, A. T. 
A unified maximum likelihood approach for estimating symmetric properties of discrete distributions. In International Conference on Machine Learning (ICML), volume 70 of Proceedings of Machine Learning Research, pp. 11 21. PMLR, 2017. Audibert, J., Munos, R., and Szepesv ari, C. Explorationexploitation tradeoff using variance estimates in multiarmed bandits. Theor. Comput. Sci., 410(19):1876 1902, 2009. Blocki, J., Datta, A., and Bonneau, J. Differentially private password frequency lists. In Network and Distributed System Security Symposium (NDSS). The Internet Society, 2016. Chen, J. Y., Indyk, P., and Woodruff, D. P. Space-Optimal Profile Estimation in Data Streams with Applications to Symmetric Functions. In Guruswami, V. (ed.), 15th Innovations in Theoretical Computer Science Conference (ITCS 2024), volume 287 of Leibniz International Proceedings in Informatics (LIPIcs), pp. 32:1 32:22, 2024. Chung, F. and Lu, L. Concentration inequalities and martingale inequalities: a survey. Internet Math., 3(1):79 127, 2006. Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. Introduction to Algorithms, 3rd Edition. MIT Press, 2009. ISBN 978-0-262-03384-8. Datar, M. and Muthukrishnan, S. Estimating rarity and similarity over data stream windows. In Proceedings of European Symposium on Algorithms (ESA), pp. 323 335. Springer, 2002. Dwork, C. and Roth, A. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3-4):211 407, 2014. Dwork, C., Mc Sherry, F., Nissim, K., and Smith, A. D. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography, Third Theory of Cryptography Conference, TCC 2006, Proceedings, volume 3876 of Lecture Notes in Computer Science, pp. 265 284. Springer, 2006. Dwork, C., Naor, M., Pitassi, T., Rothblum, G. N., and Yekhanin, S. Pan-private streaming algorithms. In Proceedings of Innovations in Computer Science (ICS), pp. 66 80, 2010. Now known as ITCS. Ghosh, A., Roughgarden, T., and Sundararajan, M. Universally utility-maximizing privacy mechanisms. In Mitzenmacher, M. (ed.), Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pp. 351 360. ACM, 2009. Gray, R. M. Toeplitz and circulant matrices: A review. Found. Trends Commun. Inf. Theory, 2(3), 2005. Haitner, I., Mazor, N., Silbak, J., and Tsfadia, E. On the complexity of two-party differential privacy. In Leonardi, S. and Gupta, A. (eds.), STOC 22: 54th Annual ACM SIGACT Symposium on Theory of Computing, Rome, Italy, June 20 - 24, 2022, pp. 1392 1405. ACM, 2022. Hardt, M. and Talwar, K. On the geometry of differential privacy. In Schulman, L. J. (ed.), Proceedings of Symposium on Theory of Computing (STOC), pp. 705 714. ACM, 2010. Hay, M., Li, C., Miklau, G., and Jensen, D. D. Accurate estimation of the degree distribution of private networks. In International Conference on Data Mining (ICDM), pp. 169 178. IEEE Computer Society, 2009. Hay, M., Rastogi, V., Miklau, G., and Suciu, D. Boosting the accuracy of differentially private histograms through consistency. Proc. VLDB Endow., 3(1):1021 1032, 2010. Karwa, V. and Slavkovic, A. B. Differentially private graphical degree sequences and synthetic graphs. In Privacy in Statistical Databases (PSD), volume 7556 of Lecture Notes in Computer Science, pp. 273 285. Springer, 2012. Manurangsi, P. Tight bounds for differentially private anonymized histograms. In 5th Symposium on Simplicity in Algorithms, pp. 203 213. SIAM, 2022. 
Mc Gregor, A., Mironov, I., Pitassi, T., Reingold, O., Talwar, K., and Vadhan, S. P. The limits of two-party differential privacy. In 51th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2010, October 23-26, 2010, USA, pp. 81 90. IEEE Computer Society, 2010. Mitzenmacher, M. and Upfal, E. Probability and computing: Randomization and probabilistic techniques in algorithms and data analysis. Cambridge university press, 2017. Suresh, A. T. Differentially private anonymized histograms. In Advances in Neural Information Processing Systems 32, pp. 7969 7979, 2019. Tolstikhin, I. O. Concentration inequalities for samples without replacement. Theory of Probability & Its Applications, 61(3):462 481, 2017. Profile Reconstruction from Private Sketches A. Probability Inequalities Definition A.1 (Lipschitz Condition). Consider an arbitrary set Z. A function : Zd R satisfies the Lipschitz condition with bound c R if, for every i [d] and for every sequence of values x1, . . . , xd Z and yi R, | (x1, x2, . . . , xi 1, xi, xi+1, . . . , xd) (x1, x2, . . . , xi 1, yi, xi+1, . . . , xd)| c. Fact A.2 (Mc Diarmid s Inequality (Mitzenmacher & Upfal, 2017; Tolstikhin, 2017)). Let : Zd R be a function that satisfies the Lipschitz condition with bound c R. Let X1, . . . , Xd be independent random variables from the set Z Then for all λ 0, Pr [ (X1, . . . , Xd) E [ (X1, . . . , Xd)] λ] exp 2λ2 Fact A.3 (Bernstein s Inequality (Chung & Lu, 2006; Audibert et al., 2009)). Let X1, . . . , Xd be independent real-valued random variables such that |Xi| c with probability one. Let Sd = P i [d] Xi and Var [Sd] = P i [d] Var [Xi]. Then for all η (0, 1), |Sd E [Sd]| r 2 Var [Sd] ln 2 η + 2c ln 2 with probability at least 1 η. B. Proofs for Section 4.1 Proof of Lemma 4.1. Our goal is to prove for all t Z, Pr h h [ℓ] = t i = Pr h h[ℓ] + Z = t i . (12) The equation holds trivially for each integer t (0, n). By symmetry, we prove that the equation holds for each integer t n. In this case, Pr h h [ℓ] = t i = Pr DExp e ε = t n Pr DLap e ε n = 1 e ε e ε (t n) 1 + e ε e ε j = 1 e ε e ε (t n) 1 e ε 1 + e ε e ε (n h[ℓ]) 1 + e ε e ε (t h[ℓ]) = Pr h h[ℓ] + Z = t i . C. Proofs for Section 4.2 Proof of Lemma 4.2. Let TDLap (e ε, B) denotes a random variable following truncated Laplace distribution: Pr TDLap e ε, B = t = 1 Pnorm e ε |t|, t [ B . . B], Pnorm = 1 + 2 j=1 e ε j = 1 + 2 e ε (e ε)B+1 1 e ε = 1 + e ε 2 (e ε)B+1 1 e ε . (13) Profile Reconstruction from Private Sketches Conditioned on EB, all random noises introduced by Algorithm 1 follows truncated Laplace distribution. Therefore, A( r)[t] = 1 j=1 1[k+TDLap(e ε,B)=t], t [ B . . 1]. A( r)[t] = 1 j=1 1[k+TDLap(e ε,B)=t], t [0 . . n]. A( r)[t] = 1 j=1 1[k+TDLap(e ε,B)=t], t [n + 1 . . n + B], where 1[ ] is the indicator for some event. Since by assumption, we have r[t] = 0 for all t [ B . . n + B] \ [0 . . n], the above equations can be written in a unified manner: A( r)[t] = 1 j=1 1[k+TDLap(e ε,B) t], (14) where for the ease of notations, we omit the modular operations on the indexes and the events of the indicators. In particular, we assume r[k] .= r[mi(k)], and k + TDLap (e ε, B) t represents the event that mi(k + TDLap (e ε, B)) = t, where ( x, if x [ B . . n + B], ((x + B) mod (n + 2B + 1)) B otherwise . Combining the definition of A, we can verify that E [A( r)[t] | EB] = 1 k= B e ε |k| ( r[t + k] d) 1 Pnorm e ε |k| r[t + k] = A[t, ] r. D. Proofs for Section 4.3 Proof of Lemma 4.3. The proof consists of two parts. 
First, we show that the returned solution r satisfies the constraint D 10:n, r E = 1: D 10:n, r E = D 10:n, u E 1 D 10:n,A 1 a E A 1 a + = D 10:n, u E D 10:n, u E 1 D 10:n,A 1 a E D 10:n,A 1 a E Next, we need to prove that r is an optimal solution. Consider an arbitrary vector v, s.t., D 10:n, v E = 1. We want to prove that A v f p A r f p . The claim is trivial if A r f p = 0. Otherwise, based on Algorithm 2, observe Profile Reconstruction from Private Sketches A r f = A ( r u) = A D 10:n, u E 1 D 10:n,A 1 a E A 1 a. D 10:n, u E 1 D 10:n,A 1 a E a. Since a p = 1, it holds that A ( r u) A ( r u) p = a. Therefore, 1 D 10:n, u E = D 10:n, v u E = A ( v u) p * 10:n,A 1 A ( v u) A ( v u) p 1 D 10:n, u E = D 10:n, r u E = A ( r u) p * 10:n,A 1 A ( r u) A ( r u) p = A ( r u) p D 10:n,A 1 a E . By the definition of a, we have D 10:n,A 1 a E D 10:n,A 1 A ( v u) A ( v u) p E , which proves that A ( r u) p = A r f p A v f p = A ( v u) p . Proof of Lemma 4.4. When r is inputted to Algorithm 3, it satisfies P t [0 . . n] r[t] = 1. After line 1, it satisfies r[t] = 0, for all t [ B . . n + B] \ [0 . . n]. After line 7, it satisfies r[t] [0, 1], for all t [0 . . n]. However, it may violate is that P t [0 . . n] r[t] should sum up to 1. Since the cumulative change to the sum P t [0 . . n] r[t] is s>1 + s<0, after line 7, we have P t [0 . . n] r[t] 1 = s>1 + s<0. We claim that s>1 + s<0 0. Since s<0 0, the claim holds trivially if s>1 = 0. On the other hand, if s>1 < 0, then there exists some t [0 . . n], such that r[t] = 1 after Line 6. However, after Line 6 and 7, we must have that r[t] [0, 1], t [0 . . n]. Therefore, P t [0 . . n] r[t] 1, and s>1 + s<0 = P t [0 . . n] r[t] 1 0. In case that s>1 + s<0 > 0, the algorithm needs to decreases the r[t] to fix the constraint that P t [0 . . n] r[t] should sum up to 1. Define the function g(τ) .= P t [0 . . n] min {τ, r[t]}. The algorithm searches for the solution τ for which g(τ) = s>1 + s<0. By intermediate value theorem, the solution always exists, as g is a continuous function with respect to τ; 0 = g(0) < s>1 + s<0; 1 + s>1 + s<0 = P t [0 . . n] r[t] = g(1) > s>1 + s<0. Finally the algorithm decreases each r[t] by min {τ, r[t]} to fix the constraint so that they sum up to 1. E. Proofs for Section 4.4 Proof of Lemma 4.6. The proof relies on Fact 3.2. It s worth emphasizing that A is a matrix with row and column indices starting from B (up to n + B), in contrast to the matrix described in Fact 3.2, where indices commence from 0. With appropriate shifting, we observe that the eigenvalues of A are given by Profile Reconstruction from Private Sketches φt .= D A[ B, :], n + 2B + 1 vt E = 1 Pnorm j= B e ε |j+B| wj+B t + j=1 0 wj+B t + j=n+1 e ε |j (n+B+1)| wj+B t = w B t Pnorm j=1 e ε j wj t + j=1 e ε j w j t wn+2B+1 t = w B t Pnorm 1 + (e ε wt)1 (e ε wt)B+1 1 e ε wt + (e ε w 1 t )1 (e ε w 1 t )B+1 1 e ε w 1 t Observe that 1 + (e ε wt)1 (e ε wt)(B+1) 1 e ε wt + (e ε w 1 t )1 (e ε w 1 t )(B+1) 1 e ε w 1 t = 1 + e ε wt e ε (B+1) w B t w1 t 1 e ε wt + e ε w 1 t e ε (B+1) w B t w 1 t 1 e ε w 1 t = 1 + e ε wt e 2ε e ε (B+1) w B t w1 t e ε + e ε w 1 t e 2ε e ε (B+1) w B t w 1 t e ε 1 e ε wt e ε w 1 t + e 2ε = 1 e 2ε e ε (B+1) w B+1 t + w (B+1) t e ε w B t e ε w B t 1 e ε wt e ε w 1 t + e 2ε . It is easy to see that each φt can be computed in O(1) time, therefore all the eigenvalues can be computed in O(n) time. This proves the first part of the lemma. It follows that A V = V diag (φ B, φ B+1, . . . , φn+B), A = V diag (φ B, φ B+1, . . . , φn+B) (V ) 1. 
Since φt = 0 for all t [ B . . n + B], the inverse of diag (φ B, φ B+1, . . . , φn+B) is given by diag (φ B, φ B+1, . . . , φn+B) 1 = diag φ 1 B, φ 1 B+1, . . . , φ 1 n+B . Therefore, it can be verified that A 1 = V diag φ 1 B, φ 1 B+1, . . . , φ 1 n+B V 1. Proof of Theorem 4.7. All steps in Algorithm 3 clearly run in linear time, with the exception of the search for the τ such that P t [0 . . n] min {τ, r[t]} = s. We show that it can be found in O(n log n) time. First, sort the r[t] in non-decreasing order in O(n log n) time. Now, assume that r[0] r[n]. After this step, our goal is to find the smallest t [0 . . n] such that r[t ] τ. Since t=0 min {τ, r[t]} = t=0 r[t] + (n + 1 t )τ = s, t is the smallest number in [0 . . n] satisfying Pt 1 t=0 r[t] + (n + 1 t ) r[t ] s and therefore can be identified in O(n) time. Once t is known, τ is determined by τ = s Pt 1 t=0 r[t] /(n + 1 t ). Profile Reconstruction from Private Sketches F. Proofs for Section 4.5 Proof of Lemma 4.9. Since f is a vector in Rn+1, we expand it into one in Rn+2B+1 by padding f[ B] = f[ B + 1] = = f[ 1] = 0 to the front and f[n + 1] = f[n + 2] = = f[n + B] = 0 to the back of the vector. After this, f has the same dimension as r, therefore r f p is well-defined. We study how the error r f p changes in Algorithm 3, for p = 1, 2, . To begin with, we divide Algorithm 3 into 3 Phrase 1 (Line 1): in which the algorithm rounds r[t] to 0 for each t [ B . . n + B] \ [0 . . n]. Phase 2 (Line 2 to Line 7), in which the algorithm clips the r[t] to the range of [0, 1]. Phase 3 (Line 8 to Line 12), in which the algorithm adjusts the r[t] so that they sum up to 1. The error only decreases in Phase 2 and 3: Since f[t] = 0 for each t [ B . . n + B] \ [0 . . n], rounding r[t] to 0 for each t [ B . . n + B] \ [0 . . n] only decrease r f p. Since f[t] [0, 1], for each t [0 . . n], clipping the r[t], t [0 . . n] to the range [0, 1] also decreases r f p. It remains to study the change of r f p in Phase 3. We discuss this for p = 1, 2, separately. To distinguish the value of r at different phases of the algorithm, we denote it as r0 before Phase 1, r1 right after Phase 1, r2 right after Phase 2, and r3 right after Phase 3. r3 f 1 r1 f 1 : observe that in Phase 2 r f 1 decreases by an amount of s>1 + s<0 : t [0 . . n] r1[t] f[t] = X r1[t] f[t] + X r1[1] f[t] + X t [0 . . n]\(T>1 T<0) 1 f[t] + r1[t] 1 + X 0 f[t] + 0 r1[t] + X t [0 . . n]\(T>1 T<0) t [0 . . n] r2[t] f[t] s>1 + s<0. In Phase 3, r f 1 can increase by at most s>1 + s<0. To see that, observe that P t [0 . . n] r2[t] 1 = P t [0 . . n] r2[t] P t [0 . . n] r1[t] = s>1 +s<0 = s. After the adjustment, we have r3[t] r2[t] for each t [0 . . n], and P t [0 . . n] r3[t] = 1. Therefore, t [0 . . n] r3[t] f[t] X t [0 . . n] r2[t] f[t] X t [0 . . n] | r2[t] r3[t]| t [0 . . n] ( r2[t] r3[t]) = X t [0 . . n] r2[t] 1 = s>1 + s<0. It follows that X t [0 . . n] r3[t] f[t] X t [0 . . n] r1[t] f[t] s>1 + s<0 ( s>1 + s<0) = 2 s>1 0. Profile Reconstruction from Private Sketches r3 f 2 r2 f : Let bt .= r2[t] f[t] for each t [0 . . n], and bmax .= maxt [0 . . n] bt. It is easy to see that X t [0 . . n] ( r2[t] min {bmax, r2[t]}) X t [0 . . n] On the other hand, we have X t [0 . . n] ( r2[t] min {τ, r2[t]}) = 1. It follows that P t [0 . . n] min {τ, r2[t]} P t [0 . . n] min {bmax, r2[t]} , and hence τ bmax. r3 f = max t [0 . . n] r2[t] min {τ, r2[t]} f[t] max t [0 . . n] r2[t] f[t] + min {τ, r2[t]} r3 f 2 r2 f 2 : without lose of generality, assume that r2[0] r2[1] . . . r2[n]. 
The adjustment step (line 11) can be replaced by Algorithm 5. Algorithm 5 Iterated Adjustment 1: t 0 2: while t n and s > 0 do 3: τ min r[t], s n t+1 4: r[j] r[j] τ , j [t . . n] 5: s s τ (n t + 1) 6: t t + 1 7: end while We claim that, r f 2 does not increase at each iteration. Fix some iteration t. Note that r[j] does not change for j [0 . . t 1], whereas r[j] decreases by τ for j [t . . n]. Therefore, the total change of r f 2 2 is given by j [t . . n] r[t] τ f[t] 2 X j [t . . n] r[t] f[t] 2 j [t . . n] τ 2 r[t] 2 f[t] τ j [t . . n] r[t] 2 X j [t . . n] j [t . . n] τ j [t . . n] r[t] 1 (n t + 1) τ where the last inequality follows since 0 τ r[j] for each j [t . . n]. We present the proof for Lemma 4.11 first, then the proof for Lemma 4.10. Profile Reconstruction from Private Sketches Proof of Lemma 4.11. Since f is a vector in Rn+1, we expand it into one in Rn+2B+1 by padding f[ B] = f[ B + 1] = = f[ 1] = 0 to the front and f[n + 1] = f[n + 2] = = f[n + B] = 0 to the back of the vector, so that A f is well-defined. Recall that h[ℓ] = h[ℓ] + DLap (e ε) , ℓ D. Since B 1 ε ln 2 d η (eε+1). Then Pr DLap e ε > B = 2 X 1 + e ε e ε |t| = 2 1 e ε 1 + e ε e ε B 1 e ε = 2 e ε B By union bound, h[ℓ] h[ℓ] > B η. Assuming that maxℓ D h[ℓ] h[ℓ] B, f = A( f) can be equivalent viewed as the outcome of running Algorithm 1 with input f, (conditioned on the event EB that all noises added by the algorithm are absolutely bounded by B). It follows by Lemma 4.2 that E h f i = E h A f | EB i = A f. Bounding f A f 1. Under these assumptions, E h f A f 1 i = E h f E h f i 1 f[t] E h f[t] i t= B E h f[t] E h f[t] i i = "r f[t] E h f[t] i 2 # E f[t] E h f[t] i 2 . Consider a t [ B . . n + B]. Conditioned on the event that maxℓ D h[ℓ] h[ℓ] B, the noises added to the h[ℓ] follows independent truncated Laplace distribution. For each ℓ D, define the indicator 1[ h[ℓ]=t] for the event that h[ℓ] = t. We have Pr h 1[ h[ℓ]=t] = 1 i = e ε |t h[ℓ]| where Pnorm = 1+e ε 2 (e ε)B+1 1 e ε . Since f[t] = 1 ℓ D 1[ h[ℓ]=t], we have E f[t] E h f[t] i 2 = Var ℓ D 1[ h[ℓ]=t] ℓ D Var h 1[ h[ℓ]=t] e ε |t h[ℓ]| 1 e ε |t h[ℓ]| d E h f[t] i . Profile Reconstruction from Private Sketches Note that 1 e ε |t h[ℓ]| Pnorm 1 1 Pnorm = 1 1 e ε 1+e ε 2 (e ε)B+1 = 2e ε 2 (e ε)B+1 1+e ε 2 (e ε)B+1 . So the inequality is tight up to a factor of 2e ε 2 (e ε)B+1 1+e ε 2 (e ε)B+1 . It concludes that E h f A f 1 E f[t] E h f[t] i 2 = 1 d E h f[t] i . Further, f A f 1 is a function of the random variables h[ℓ] s : f A f 1 = X t [ B . . n+B] ℓ D 1[ h[ℓ]=t] E h f[t] i . Changing the value of one h[ℓ] affects at most two terms on the right hand side of the equation, each by 1/d. By Definition A.1, f A f 1 satisfies the Lipschitz condition with bound 2/d. Therefore, via Fact A.2, f A f 1 E h f A f 1 d (2/d)2 2 ln 1 Bounding f A f 2. Similarly, E h f A f 2 i = E h f E h f i 2 f[t] E h f[t] i 2 t= B E f[t] E h f[t] i 2 1 d E h f[t] i = Further, f A f 2 is a function of the random variables h[ℓ] : t [ B . . n+B] ℓ D 1[ h[ℓ]=t] E h f[t] i We can view f = (1/d) P ℓ D 1[ h[ℓ]= B], . . . , (1/d) P ℓ D 1[ h[ℓ]=n+B] . Assume that we change the value of one h[ℓ], to obtain a vector f , which differs from f by two coordinates, each by 1/d. Therefore, f f 2 = f A f 2 f A f 2 By Definition A.1, f A f 2 satisfies the Lipschitz condition with bound 2/d. Therefore, via Fact A.2, f A f 2 E h f A f 2 Bounding f A f . 
Bounding $\|\tilde f - A f\|_\infty$. To prove the last part, by Bernstein's inequality (Fact A.3), with probability at least $1 - \eta/n$ we have
\[
\big|\tilde f[t] - \mathbb{E}[\tilde f[t]]\big|
\le \sqrt{2\,\mathrm{Var}\big[\tilde f[t]\big]\,\ln\tfrac{n}{\eta}} + O\!\Big(\tfrac{1}{d}\ln\tfrac{n}{\eta}\Big)
\le \sqrt{\tfrac{2}{d}\,\mathbb{E}\big[\tilde f[t]\big]\,\ln\tfrac{n}{\eta}} + O\!\Big(\tfrac{1}{d}\ln\tfrac{n}{\eta}\Big),
\]
using $\mathrm{Var}[\tilde f[t]] \le \tfrac{1}{d}\,\mathbb{E}[\tilde f[t]]$. It is not easy to give a non-trivial bound for $\mathbb{E}[\tilde f[t]]$, which depends on the histogram $h$. On the other hand, a trivial bound is given by
\[
\mathbb{E}\big[\tilde f[t]\big] \le \frac{1}{d\,P_{\mathrm{norm}}}\sum_{\ell \in D} e^{-\varepsilon \cdot 0} = \frac{1}{P_{\mathrm{norm}}}.
\]

Proof of Lemma 4.10. We first bound $\|A^{-1}\|_2$, since it is the easier part of the proof. According to Lemma 4.6,
\[
\|A^{-1}\|_2 = \big\|V \operatorname{diag}\big(\varphi_{-B}^{-1}, \varphi_{-B+1}^{-1}, \ldots, \varphi_{n+B}^{-1}\big) V^{-1}\big\|_2 \tag{15}
\]
\[
\le \|V\|_2\,\big\|\operatorname{diag}\big(\varphi_{-B}^{-1}, \ldots, \varphi_{n+B}^{-1}\big)\big\|_2\,\|V^{-1}\|_2. \tag{16}
\]
Since $V$ is the discrete Fourier transform matrix with orthonormal column and row vectors, it holds that $\|V\|_2 = \|V^{-1}\|_2 = 1$. Moreover, $\big\|\operatorname{diag}\big(\varphi_{-B}^{-1}, \ldots, \varphi_{n+B}^{-1}\big)\big\|_2 = \max_{t \in [-B..n+B]} 1/|\varphi_t|$. We proceed to bound $\max_{t \in [-B..n+B]} 1/|\varphi_t| = 1/\min_{t \in [-B..n+B]} |\varphi_t|$. For each $t \in [-B..n+B]$, writing $w_t$ for the corresponding $(n+2B+1)$-th root of unity as in Lemma 4.6,
\[
\varphi_t = \frac{1}{P_{\mathrm{norm}}} \cdot
\frac{\big(1 - e^{-2\varepsilon}\big) - e^{-\varepsilon(B+1)}\big(w_t^{B+1} + w_t^{-(B+1)} - e^{-\varepsilon} w_t^{B} - e^{-\varepsilon} w_t^{-B}\big)}
{1 - e^{-\varepsilon} w_t - e^{-\varepsilon} w_t^{-1} + e^{-2\varepsilon}}.
\]
We have $|w_t| = 1$. Hence
\[
\big|1 - e^{-\varepsilon} w_t - e^{-\varepsilon} w_t^{-1} + e^{-2\varepsilon}\big| \le 1 + 2e^{-\varepsilon} + e^{-2\varepsilon} = \big(1 + e^{-\varepsilon}\big)^2,
\]
and
\[
\big|\big(1 - e^{-2\varepsilon}\big) - e^{-\varepsilon(B+1)}\big(w_t^{B+1} + w_t^{-(B+1)} - e^{-\varepsilon} w_t^{B} - e^{-\varepsilon} w_t^{-B}\big)\big|
\ge \big(1 - e^{-2\varepsilon}\big) - 2\,e^{-\varepsilon(B+1)}\big(1 + e^{-\varepsilon}\big)
= \big(1 - e^{-\varepsilon} - 2e^{-\varepsilon(B+1)}\big)\big(1 + e^{-\varepsilon}\big).
\]
Therefore
\[
|\varphi_t| \ge \frac{1}{P_{\mathrm{norm}}} \cdot \frac{1 - e^{-\varepsilon} - 2e^{-\varepsilon(B+1)}}{1 + e^{-\varepsilon}},
\qquad \text{so that} \qquad
\|A^{-1}\|_2 \le P_{\mathrm{norm}} \cdot \frac{1 + e^{-\varepsilon}}{1 - e^{-\varepsilon} - 2e^{-\varepsilon(B+1)}}.
\]

To bound $\|A^{-1}\|_1$ and $\|A^{-1}\|_\infty$, we will first show that these two values are equal; then we bound just $\|A^{-1}\|_\infty$.

$\|A^{-1}\|_1 = \|A^{-1}\|_\infty$: we will show that $A^{-1}$ is also a circulant matrix. Since each column of a circulant matrix has the same $\ell_1$-norm as each row of the matrix, and since
\[
\|A^{-1}\|_1 = \max_{t \in [-B..n+B]} \big\|A^{-1}[\cdot, t]\big\|_1,
\qquad
\|A^{-1}\|_\infty = \max_{t \in [-B..n+B]} \big\|A^{-1}[t, \cdot]\big\|_1,
\]
it follows that $\|A^{-1}\|_1 = \|A^{-1}\|_\infty$. It remains to show that $A^{-1}$ is indeed circulant. According to Lemma 4.6, $V$ is the discrete Fourier transform matrix whose rows and columns are indexed from $-B$ to $n+B$. Therefore, for all $k, \ell \in [-B..n+B]$ it holds that
\[
V[k, \ell] = \frac{1}{\sqrt{n+2B+1}}\, e^{-\frac{2\pi (k+B)(\ell+B)\, i}{n+2B+1}},
\qquad
V^{-1}[k, \ell] = \frac{1}{\sqrt{n+2B+1}}\, e^{\frac{2\pi (k+B)(\ell+B)\, i}{n+2B+1}}.
\]
Since $A^{-1} = V \operatorname{diag}\big(\varphi_{-B}^{-1}, \ldots, \varphi_{n+B}^{-1}\big) V^{-1}$, we can verify that
\[
A^{-1}[k, \ell] = \sum_{t \in [-B..n+B]} V[k, t]\, \varphi_t^{-1}\, V^{-1}[t, \ell]
= \frac{1}{n+2B+1} \sum_{t \in [-B..n+B]} e^{-\frac{2\pi (k-\ell)(t+B)\, i}{n+2B+1}}\, \varphi_t^{-1}.
\]
Now it is easy to see that $A^{-1}$ is a circulant matrix, since for each $-B \le k < n+B$ we have $A^{-1}[k+1, \ell+1] = A^{-1}[k, \ell]$ for $\ell \in [-B..n+B-1]$, and $A^{-1}[k+1, -B] = A^{-1}[k, n+B]$.

Bounding $\|A^{-1}\|_\infty$. We will show that for each $v \in \mathbb{R}^{n+2B+1}$ it holds that
\[
\|A^{-1} v\|_1 \le \frac{2 + e^{-\varepsilon} + e^{\varepsilon}}{e^{\varepsilon} - e^{-\varepsilon} - 4 e^{-\varepsilon B}}\, P_{\mathrm{norm}}\, \|v\|_1.
\]
Since $A$ is invertible, let $x := A^{-1} v$. It reduces to proving that
\[
\|x\|_1 \le \frac{2 + e^{-\varepsilon} + e^{\varepsilon}}{e^{\varepsilon} - e^{-\varepsilon} - 4 e^{-\varepsilon B}}\, P_{\mathrm{norm}}\, \|A x\|_1.
\]
To finish the proof, we need the following lemma.

Lemma F.1. Let $x \in \mathbb{R}^{n+2B+1}$ and assume that for each $t \in [-B..n+B]$ there exists $b_t \in \mathbb{R}_{\ge 0}$ such that $|A[t, \cdot]\, x| \le b_t$. Then for each $t \in [-B..n+B]$ we have
\[
|x[t]| \le \frac{e^{-\varepsilon} + e^{\varepsilon}}{e^{\varepsilon} - e^{-\varepsilon}}\, P_{\mathrm{norm}} \Big(b_t + \frac{b_{t-1} + b_{t+1}}{e^{-\varepsilon} + e^{\varepsilon}}\Big)
+ \frac{e^{-\varepsilon B}}{e^{\varepsilon} - e^{-\varepsilon}} \Big(|x[t-B-1]| + |x[t+B+1]| + e^{-\varepsilon} |x[t-B]| + e^{-\varepsilon} |x[t+B]|\Big),
\]
where for simplicity we omit the modular operations on the subscripts and indices. In particular, we define $b_{-B-1} := b_{n+B}$, $b_{n+B+1} := b_{-B}$, and for each $t \notin [-B..n+B]$ we let $x[t] := x[((t+B) \bmod (n+2B+1)) - B]$.

Based on Lemma F.1, we have
\[
\sum_{t \in [-B..n+B]} |x[t]|
\le \sum_{t \in [-B..n+B]} \frac{e^{-\varepsilon} + e^{\varepsilon}}{e^{\varepsilon} - e^{-\varepsilon}}\, P_{\mathrm{norm}} \Big(b_t + \frac{b_{t-1} + b_{t+1}}{e^{-\varepsilon} + e^{\varepsilon}}\Big)
+ \sum_{t \in [-B..n+B]} \frac{e^{-\varepsilon B}}{e^{\varepsilon} - e^{-\varepsilon}} \Big(|x[t-B-1]| + |x[t+B+1]| + e^{-\varepsilon} |x[t-B]| + e^{-\varepsilon} |x[t+B]|\Big)
\]
\[
= \sum_{t \in [-B..n+B]} \frac{e^{-\varepsilon} + e^{\varepsilon}}{e^{\varepsilon} - e^{-\varepsilon}} \Big(1 + \frac{2}{e^{-\varepsilon} + e^{\varepsilon}}\Big) P_{\mathrm{norm}}\, b_t
+ \sum_{t \in [-B..n+B]} \frac{(2 + 2 e^{-\varepsilon})\, e^{-\varepsilon B}}{e^{\varepsilon} - e^{-\varepsilon}}\, |x[t]|
\le \sum_{t \in [-B..n+B]} \frac{2 + e^{-\varepsilon} + e^{\varepsilon}}{e^{\varepsilon} - e^{-\varepsilon}}\, P_{\mathrm{norm}}\, b_t
+ \sum_{t \in [-B..n+B]} \frac{4\, e^{-\varepsilon B}}{e^{\varepsilon} - e^{-\varepsilon}}\, |x[t]|.
\]
This implies that
\[
\frac{e^{\varepsilon} - e^{-\varepsilon} - 4 e^{-\varepsilon B}}{e^{\varepsilon} - e^{-\varepsilon}} \sum_{t \in [-B..n+B]} |x[t]|
\le \sum_{t \in [-B..n+B]} \frac{2 + e^{-\varepsilon} + e^{\varepsilon}}{e^{\varepsilon} - e^{-\varepsilon}}\, P_{\mathrm{norm}}\, b_t,
\]
and hence
\[
\sum_{t \in [-B..n+B]} |x[t]| \le \frac{2 + e^{-\varepsilon} + e^{\varepsilon}}{e^{\varepsilon} - e^{-\varepsilon} - 4 e^{-\varepsilon B}}\, P_{\mathrm{norm}} \sum_{t \in [-B..n+B]} b_t,
\]
which proves the claimed bound (take $b_t = |A[t, \cdot]\, x| = |v[t]|$).
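Before proving Lemma F.1, the $\ell_2$ part of the argument above can be checked numerically: $A$ is circulant with kernel $e^{-\varepsilon|j|}/P_{\mathrm{norm}}$ for $|j| \le B$, its eigenvalues are the DFT of that kernel, and $\|A^{-1}\|_2 = 1/\min_t |\varphi_t|$. The following sketch (parameter values and the helper construction are illustrative assumptions) also verifies the lower bound on $\min_t |\varphi_t|$ derived in the proof.

```python
import numpy as np

eps, n, B = 0.5, 40, 20
N = n + 2 * B + 1
P_norm = (1 + np.exp(-eps) - 2 * np.exp(-eps * (B + 1))) / (1 - np.exp(-eps))

# First column of A: kernel exp(-eps * |j|) / P_norm for |j| <= B, wrapped around the cycle.
col = np.zeros(N)
for j in range(-B, B + 1):
    col[j % N] = np.exp(-eps * abs(j)) / P_norm
A = col[(np.arange(N)[:, None] - np.arange(N)[None, :]) % N]   # circulant: A[i, j] = col[(i - j) mod N]

phi = np.fft.fft(col).real                    # eigenvalues of the (symmetric) circulant matrix
assert np.isclose(np.linalg.norm(np.linalg.inv(A), 2), 1.0 / np.abs(phi).min(), rtol=1e-8)

# Lower bound on min_t |phi_t| from the proof of Lemma 4.10:
lower = (1 - np.exp(-eps) - 2 * np.exp(-eps * (B + 1))) / (P_norm * (1 + np.exp(-eps)))
assert np.abs(phi).min() >= lower - 1e-12
```

The same DFT diagonalization (Lemma 4.6) is what makes applying $A^{-1}$ fast in practice.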
Proof of Lemma F.1. Fix a $t \in [-B..n+B]$. By assumption, we have
\[
P_{\mathrm{norm}}\, b_t \ge P_{\mathrm{norm}}\, \big|A[t, \cdot]\, x\big|
= \Big|\sum_{j \in [B]} e^{-\varepsilon j}\, x[t-j] + e^{-\varepsilon \cdot 0}\, x[t] + \sum_{j \in [B]} e^{-\varepsilon j}\, x[t+j]\Big|.
\]
On the other hand, since $A$ is a circulant matrix,
\[
P_{\mathrm{norm}}\, b_{t-1} \ge P_{\mathrm{norm}}\, \big|A[t-1, \cdot]\, x\big|
= \Big|\sum_{j \in [B+1]} e^{-\varepsilon (j-1)}\, x[t-j] + e^{-\varepsilon}\, x[t] + \sum_{j \in [B-1]} e^{-\varepsilon (j+1)}\, x[t+j]\Big|,
\]
\[
P_{\mathrm{norm}}\, b_{t+1} \ge P_{\mathrm{norm}}\, \big|A[t+1, \cdot]\, x\big|
= \Big|\sum_{j \in [B-1]} e^{-\varepsilon (j+1)}\, x[t-j] + e^{-\varepsilon}\, x[t] + \sum_{j \in [B+1]} e^{-\varepsilon (j-1)}\, x[t+j]\Big|.
\]
Therefore,
\[
P_{\mathrm{norm}} \Big(b_t + \frac{b_{t-1} + b_{t+1}}{e^{-\varepsilon} + e^{\varepsilon}}\Big)
\ge P_{\mathrm{norm}} \Big|A[t, \cdot]\, x - \frac{1}{e^{-\varepsilon} + e^{\varepsilon}} \big(A[t-1, \cdot]\, x + A[t+1, \cdot]\, x\big)\Big|
\]
\[
= \Big|
- \frac{e^{-\varepsilon B}}{e^{-\varepsilon} + e^{\varepsilon}}\, x[t-B-1]
+ \Big(1 - \frac{e^{\varepsilon}}{e^{-\varepsilon} + e^{\varepsilon}}\Big) e^{-\varepsilon B}\, x[t-B]
+ \Big(1 - \frac{2 e^{-\varepsilon}}{e^{-\varepsilon} + e^{\varepsilon}}\Big) x[t]
+ \Big(1 - \frac{e^{\varepsilon}}{e^{-\varepsilon} + e^{\varepsilon}}\Big) e^{-\varepsilon B}\, x[t+B]
- \frac{e^{-\varepsilon B}}{e^{-\varepsilon} + e^{\varepsilon}}\, x[t+B+1]
\Big|.
\]
It follows that
\[
\frac{e^{\varepsilon} - e^{-\varepsilon}}{e^{-\varepsilon} + e^{\varepsilon}}\, |x[t]|
\le P_{\mathrm{norm}} \Big(b_t + \frac{b_{t-1} + b_{t+1}}{e^{-\varepsilon} + e^{\varepsilon}}\Big)
+ \frac{e^{-\varepsilon B}}{e^{-\varepsilon} + e^{\varepsilon}} \Big(|x[t-B-1]| + |x[t+B+1]| + e^{-\varepsilon} |x[t-B]| + e^{-\varepsilon} |x[t+B]|\Big).
\]
It concludes that
\[
|x[t]| \le \frac{e^{-\varepsilon} + e^{\varepsilon}}{e^{\varepsilon} - e^{-\varepsilon}}\, P_{\mathrm{norm}} \Big(b_t + \frac{b_{t-1} + b_{t+1}}{e^{-\varepsilon} + e^{\varepsilon}}\Big)
+ \frac{e^{-\varepsilon B}}{e^{\varepsilon} - e^{-\varepsilon}} \Big(|x[t-B-1]| + |x[t+B+1]| + e^{-\varepsilon} |x[t-B]| + e^{-\varepsilon} |x[t+B]|\Big).
\]

G. Proof for Section 5

We show that the lower bound in Theorem 1.2 applies to our algorithms, provided that $\frac{1}{\sqrt{d}\,\varepsilon} \le \frac{c}{e^{c\,\varepsilon}}$. We need the following lemma.

Lemma G.1. Let $\mathcal{A}$ be the discrete Laplace mechanism, $C_{\mathcal{A}} = (\mathrm{DLap}(e^{-\varepsilon}))^d$, $F_{\mathcal{A}}(h, C_{\mathcal{A}}) = h + C_{\mathcal{A}}$, and let $R$ be the profile reconstruction algorithm stated in Theorem 1.1. Then $R_{\mathcal{A}} \le O\big(\tfrac{1}{d}\,\|A^{-1}\|_\infty\big)$.

Combined with Lemma 4.10 on the matrix norm and the assumption on $B$ in Theorem 1.1, we have $\tfrac{1}{d}\,\|A^{-1}\|_\infty \le O\big(\tfrac{1}{d\,(e^{\varepsilon}-1)^2}\big)$, which proves our claim.

Proof of Lemma G.1. Let $h, h' \in [0..n]^d$ be two neighboring histograms that differ in just one coordinate by at most 1, i.e., $\|h - h'\|_1 \le 1$. For every possible realization $c_{\mathcal{A}}$ of $C_{\mathcal{A}}$, the histograms $\tilde h = F_{\mathcal{A}}(h, c_{\mathcal{A}})$ and $\tilde h' = F_{\mathcal{A}}(h', c_{\mathcal{A}})$ published by the discrete Laplace mechanism also differ in just one coordinate by at most 1, i.e., $\|F_{\mathcal{A}}(h, c_{\mathcal{A}}) - F_{\mathcal{A}}(h', c_{\mathcal{A}})\|_1 = \|h - h'\|_1 \le 1$. Let $\tilde f$ and $\tilde f'$ be the profiles that correspond to $\tilde h$ and $\tilde h'$, respectively. Then $\tilde f$ and $\tilde f'$ differ in at most two coordinates, each by at most $1/d$.

Applying $A_{\mathrm{fst\text{-}inv}}$ (Algorithm 2) to $\tilde f$ and $\tilde f'$ amplifies their distance by at most
\[
\big\|A_{\mathrm{fst\text{-}inv}}(\tilde f) - A_{\mathrm{fst\text{-}inv}}(\tilde f')\big\|_\infty
\le \big\|A^{-1}\big(\tilde f - \tilde f'\big)\big\|_\infty
+ \frac{\big|\big\langle \mathbb{1}_{0:n},\, A^{-1}(\tilde f - \tilde f')\big\rangle\big|}{\big|\big\langle \mathbb{1}_{0:n},\, A^{-1} a\big\rangle\big|}\, \big\|A^{-1} a\big\|_\infty,
\]
where the last factor is bounded using $\|a\|_p = 1$ for $p = 1, 2, \infty$. We claim that
\[
\frac{\big|\big\langle \mathbb{1}_{0:n},\, A^{-1}(\tilde f - \tilde f')\big\rangle\big|}{\big|\big\langle \mathbb{1}_{0:n},\, A^{-1} a\big\rangle\big|} \le \frac{2}{d}.
\]
Since $\tilde f - \tilde f'$ is a vector with at most two non-zero coordinates, each bounded by $1/d$,
\[
\big|\big\langle \mathbb{1}_{0:n},\, A^{-1}(\tilde f - \tilde f')\big\rangle\big|
\le \frac{1}{d}\Big(\big|\big\langle \mathbb{1}_{0:n},\, A^{-1}[\cdot, t]\big\rangle\big| + \big|\big\langle \mathbb{1}_{0:n},\, A^{-1}[\cdot, t']\big\rangle\big|\Big)
\le \frac{2}{d}\, \max_{t}\big|\big\langle \mathbb{1}_{0:n},\, A^{-1}[\cdot, t]\big\rangle\big|.
\]
On the other hand, since it holds for the standard basis vectors that $\|e_t\|_p = 1$,
\[
\big|\big\langle \mathbb{1}_{0:n},\, A^{-1} a\big\rangle\big| \ge \max_{t}\big|\big\langle \mathbb{1}_{0:n},\, A^{-1} e_t\big\rangle\big| = \max_{t}\big|\big\langle \mathbb{1}_{0:n},\, A^{-1}[\cdot, t]\big\rangle\big|.
\]
It remains to verify that $\max_t \big|\big\langle \mathbb{1}_{0:n},\, A^{-1}[\cdot, t]\big\rangle\big| \neq 0$. This is true, as
\[
\sum_{t} \big\langle \mathbb{1}_{0:n},\, A^{-1}[\cdot, t]\big\rangle = \big\langle \mathbb{1}_{0:n},\, A^{-1}\mathbb{1}\big\rangle = \big\langle \mathbb{1}_{0:n},\, \mathbb{1}\big\rangle = n+1,
\]
where the equation $A^{-1}\mathbb{1} = \mathbb{1}$ holds since $A\,\mathbb{1} = \mathbb{1}$. Combined, we have
\[
\big\|A_{\mathrm{fst\text{-}inv}}(\tilde f) - A_{\mathrm{fst\text{-}inv}}(\tilde f')\big\|_\infty \le O\Big(\frac{1}{d}\,\|A^{-1}\|_\infty\Big).
\]
To ease the notation, from now on denote $\tilde r := A_{\mathrm{fst\text{-}inv}}(\tilde f)$ and $\tilde r' := A_{\mathrm{fst\text{-}inv}}(\tilde f')$. We will finally show that applying $A_{\mathrm{rnd}}$ (Algorithm 3) to $\tilde r$ and $\tilde r'$ amplifies their $\ell_\infty$ distance only by a constant factor:
\[
\big\|A_{\mathrm{rnd}}(\tilde r) - A_{\mathrm{rnd}}(\tilde r')\big\|_\infty \le O\big(\|\tilde r - \tilde r'\|_\infty\big).
\]
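This stability claim can also be illustrated numerically. The sketch below (the helper name and the bisection search for the threshold are illustrative assumptions, not Algorithm 3 verbatim) applies the clip-and-adjust rounding to two $\ell_\infty$-close vectors and checks that their distance grows by at most a factor of 2, which is what the step-by-step argument below establishes.

```python
import numpy as np

def round_to_profile(r):
    """Clip to [0, 1], then lower each entry by min(tau, r[t]) so the result sums to 1;
    tau is located here by bisection purely for brevity."""
    r = np.clip(r, 0.0, 1.0)
    s = r.sum() - 1.0
    if s <= 0:
        return r
    lo, hi = 0.0, r.max()
    for _ in range(200):                                 # bisection on tau -> sum_t min(tau, r[t])
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if np.minimum(mid, r).sum() < s else (lo, mid)
    tau = 0.5 * (lo + hi)
    return r - np.minimum(tau, r)

rng = np.random.default_rng(2)
r1 = rng.normal(0.3, 0.4, size=30)
r2 = r1 + rng.uniform(-0.01, 0.01, size=30)              # an l_infinity-close perturbation
delta_in = np.abs(r1 - r2).max()
delta_out = np.abs(round_to_profile(r1) - round_to_profile(r2)).max()
assert delta_out <= 2 * delta_in + 1e-9                  # the distance grows by at most a factor of 2
```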
We investigate the steps of Algorithm 3. First, rounding the coordinates of $\tilde r$ and $\tilde r'$ to the interval $[0, 1]$ can only decrease their $\ell_\infty$ distance. Next, the algorithm searches for thresholds $\tau$ and $\tau'$ such that
\[
\sum_{t \in [0..n]} \big(\tilde r[t] - \min\{\tau, \tilde r[t]\}\big) = 1,
\qquad
\sum_{t \in [0..n]} \big(\tilde r'[t] - \min\{\tau', \tilde r'[t]\}\big) = 1.
\]
We claim that $\tau' \le \tau + \|\tilde r - \tilde r'\|_\infty$, as
\[
\sum_{t \in [0..n]} \big(\tilde r'[t] - \min\{\tau + \|\tilde r - \tilde r'\|_\infty,\ \tilde r'[t]\}\big)
\le \sum_{t \in [0..n]} \big(\tilde r[t] - \min\{\tau, \tilde r[t]\}\big) = 1.
\]
By symmetry, we also have $\tau \le \tau' + \|\tilde r - \tilde r'\|_\infty$. As Algorithm 3 eventually updates each $\tilde r[t]$ and $\tilde r'[t]$ by
\[
\tilde r[t] \leftarrow \tilde r[t] - \min\{\tau, \tilde r[t]\},
\qquad
\tilde r'[t] \leftarrow \tilde r'[t] - \min\{\tau', \tilde r'[t]\},
\]
after the update it still holds that $|\tilde r[t] - \tilde r'[t]| \le O\big(\|\tilde r - \tilde r'\|_\infty\big)$, which finishes the proof.

Proof of Theorem 1.2. We prove the theorem by contradiction. Assume there exist an $\varepsilon$-DP and updatable algorithm $\mathcal{A}: [0..n]^d \to \mathcal{Y}$ and a profile estimation algorithm $R: \mathcal{Y} \to [0,1]^{n+1}$ such that, for every input $h \in [0..n]^d$,
\[
\mathbb{E}\big[\big\|f - R(\mathcal{A}(h))\big\|_\infty\big] \in o\Big(\frac{1}{\sqrt{d}\, e^{c\,\varepsilon}}\Big).
\]
We will construct an $\varepsilon$-DP protocol $\mathcal{P}$ such that, for every $x$ and $y$, Bob's output (denoted by $m_B$) satisfies
\[
\Pr\big[\,|m_B - \langle x, y\rangle| \ge \mathrm{Err}\,\big] = o(1), \tag{17}
\]
where $\mathrm{Err} = \sqrt{d}/(2\, c\, e^{c\,\varepsilon})$, contradicting Fact 5.4.

The protocol is given by Algorithm 4. Alice initializes the computation by sending $\mathcal{A}(x + \mathbb{1})$ to Bob. On receiving $\mathcal{A}(x + \mathbb{1})$, Bob updates it with their own data $y + \mathbb{1}$ by running algorithm $U$, and the result is then passed to the profile estimation algorithm $R$. Observe that, by the property of updatable algorithms, $U(\mathcal{A}(x + \mathbb{1}),\ y + \mathbb{1}) = \mathcal{A}(x + \mathbb{1} + y + \mathbb{1})$. Hence, $R\big(U(\mathcal{A}(x + \mathbb{1}),\ y + \mathbb{1})\big)$ is essentially an estimate of the profile $f$ of $h := x + \mathbb{1} + y + \mathbb{1}$. Finally, Bob publishes
\[
d\,\big(\tilde r[4] + \tilde r[0] - \tilde r[2] + 3\, R_{\mathcal{A}} \cdot \mathrm{Lap}(1/\varepsilon)\big),
\]
where $\mathrm{Lap}(1/\varepsilon) \in \mathbb{R}$ is a random variable following the Laplace distribution whose probability density function is given by $p(z) = \frac{\varepsilon}{2}\exp(-\varepsilon |z|)$ for all $z \in \mathbb{R}$. It remains to analyze the privacy and the utility guarantees of $\mathcal{P}$, respectively.

Privacy Guarantee. The view of Bob is $\mathcal{A}(x + \mathbb{1})$, which is $\varepsilon$-DP by the assumptions on $\mathcal{A}$. The view of Alice is $d\,(\tilde r[4] + \tilde r[0] - \tilde r[2] + 3 R_{\mathcal{A}}\, \mathrm{Lap}(1/\varepsilon))$. To prove that this is $\varepsilon$-DP, it suffices to show that $\tilde r[4] + \tilde r[0] - \tilde r[2]$ has sensitivity at most $3\, R_{\mathcal{A}}$ with respect to a change of $y$. Assume the randomness $C_{\mathcal{A}}$ of $\mathcal{A}$ has realization $c_{\mathcal{A}}$. The message $\mathcal{A}(x + \mathbb{1})$ received by Bob can therefore be written as $F_{\mathcal{A}}(x + \mathbb{1}, c_{\mathcal{A}})$. After performing the update $U(\mathcal{A}(x + \mathbb{1}), y + \mathbb{1}) = \mathcal{A}(x + \mathbb{1} + y + \mathbb{1})$, the message becomes $F_{\mathcal{A}}(x + \mathbb{1} + y + \mathbb{1}, c_{\mathcal{A}})$. Suppose that $y$ is replaced by a neighboring binary vector $y'$ such that $\|y - y'\|_1 \le 1$. Then $\|(x + \mathbb{1} + y + \mathbb{1}) - (x + \mathbb{1} + y' + \mathbb{1})\|_1 \le 1$, and via the definition of $R_{\mathcal{A}}$, we have
\[
\big\|R\big(F_{\mathcal{A}}(x + \mathbb{1} + y + \mathbb{1}, c_{\mathcal{A}})\big) - R\big(F_{\mathcal{A}}(x + \mathbb{1} + y' + \mathbb{1}, c_{\mathcal{A}})\big)\big\|_\infty \le R_{\mathcal{A}}.
\]

Accuracy Guarantee. Since $x$ and $y$ are binary vectors, $\langle x, y\rangle = |\{\ell \in [d] : (x + y)[\ell] = 2\}| = d \cdot f[4]$. Therefore,
\[
\big|\, d\,\big(\tilde r[4] + \tilde r[0] - \tilde r[2] + 3 R_{\mathcal{A}}\, \mathrm{Lap}(1/\varepsilon)\big) - d\,\big(f[4] + f[0] - f[2]\big)\,\big| \tag{18}
\]
\[
\le d\,\Big(\big|\tilde r[4] - f[4]\big| + \big|\tilde r[2] - f[2]\big| + \big|\tilde r[0] - f[0]\big| + 3\, R_{\mathcal{A}}\, \big|\mathrm{Lap}(1/\varepsilon)\big|\Big). \tag{19}
\]
Since $\mathbb{E}[|\mathrm{Lap}(1/\varepsilon)|] \le O(1/\varepsilon)$ and, by assumption, $\mathbb{E}[\|\tilde r - f\|_\infty] \in o\big(\tfrac{1}{\sqrt{d}\, e^{c\varepsilon}}\big)$ and $R_{\mathcal{A}} \in o\big(\tfrac{\varepsilon}{\sqrt{d}\, e^{c\varepsilon}}\big)$, it follows that
\[
\mathbb{E}\Big[\big|\, d\,\big(\tilde r[4] + \tilde r[0] - \tilde r[2] + 3 R_{\mathcal{A}}\, \mathrm{Lap}(1/\varepsilon)\big) - d\,\big(f[4] + f[0] - f[2]\big)\,\big|\Big] \in o\Big(\frac{\sqrt{d}}{e^{c\,\varepsilon}}\Big).
\]
Combining $d\,(f[4] + f[0] - f[2]) = \langle x, y\rangle$ with Markov's inequality, we see that
\[
\Pr\Big[\big|\, d\,\big(\tilde r[4] + \tilde r[0] - \tilde r[2] + 3 R_{\mathcal{A}}\, \mathrm{Lap}(1/\varepsilon)\big) - \langle x, y\rangle\,\big| \ge \mathrm{Err}\Big]
\le \frac{\mathbb{E}\big[\big|\, d\,\big(\tilde r[4] + \tilde r[0] - \tilde r[2] + 3 R_{\mathcal{A}}\, \mathrm{Lap}(1/\varepsilon)\big) - \langle x, y\rangle\,\big|\big]}{\mathrm{Err}} = o(1). \tag{23}
\]
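As a closing illustration, the counting identity behind the accuracy analysis — that for binary $x, y$ and $h = x + \mathbb{1} + y + \mathbb{1}$ the inner product $\langle x, y\rangle$ equals $d \cdot f[4]$ — can be checked directly; the variable names below are assumed for the sketch only.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 1_000
x = rng.integers(0, 2, size=d)        # Alice's binary vector
y = rng.integers(0, 2, size=d)        # Bob's binary vector
h = x + 1 + y + 1                     # the histogram x + 1 + y + 1 from Algorithm 4; entries lie in {2, 3, 4}

# Profile of h: f[t] is the fraction of coordinates l with h[l] = t.
f = np.array([(h == t).mean() for t in range(5)])

# h[l] = 4 exactly when x[l] = y[l] = 1, so d * f[4] recovers the inner product <x, y>.
assert int(round(d * f[4])) == int(x @ y)
```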