# Oneshot Differentially Private Top-k Selection

Gang Qiao¹, Weijie J. Su², Li Zhang³

¹Department of Statistics, University of Michigan, Ann Arbor, MI, USA. ²The Wharton School, University of Pennsylvania, Philadelphia, PA, USA. ³Google Research, Mountain View, CA, USA.

*Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).*

**Abstract.** Being able to efficiently and accurately select the top-$k$ elements with differential privacy is an integral component of various private data analysis tasks. In this paper, we present the oneshot Laplace mechanism, which generalizes the well-known Report Noisy Max mechanism (Dwork & Roth, 2014) to reporting noisy top-$k$ elements. We show that the oneshot Laplace mechanism with a noise level of $\tilde{O}(\sqrt{k}/\varepsilon)$ is approximately differentially private. Compared to the previous peeling approach of running Report Noisy Max $k$ times, the oneshot Laplace mechanism adds noise and computes the top $k$ elements only once, and is hence much more efficient for large $k$. In addition, our proof of privacy relies on a novel coupling technique that bypasses the use of composition theorems. Finally, we present a novel application of efficient top-$k$ selection to the classical problem of ranking from pairwise comparisons.

## 1. Introduction

Modern statistical analyses increasingly rely on sensitive data from individuals and, accordingly, there is a growing recognition that privacy constraints should be incorporated into data analysis. In response, a mathematically rigorous framework called differential privacy (Dwork et al., 2006a;b) was introduced for privacy-preserving data analysis. Roughly speaking, a differentially private procedure ensures that the released information is not influenced significantly by any individual record in the dataset. As a consequence, the privacy of individuals cannot be compromised through the released information.

This paper is concerned with the top-$k$ problem, one of the most important primitives in differential privacy: reporting $k$ items with (approximately) the maximum values among $m$ given values. The problem of privately reporting the $k$ largest elements is an essential building block in many machine learning tasks and has gained continued popularity in the literature (McSherry & Mironov, 2009; Friedman & Schuster, 2010; Banerjee et al., 2012; McSherry & Mironov, 2013; Shen & Jin, 2014; Qin et al., 2016; Bafna & Ullman, 2017; Steinke & Ullman, 2017; Dwork et al., 2018; Durfee & Rogers, 2019). The common peeling solution (Hardt & Roth, 2013; Dwork et al., 2018) iteratively applies the Report Noisy Max algorithm and then resorts to the composition theorem for computing the privacy loss. In general, this results in a noise level of $O(k/\varepsilon)$ for $\varepsilon$-pure privacy and $\tilde{O}(\sqrt{k}/\varepsilon)$¹ for $(\varepsilon, \delta)$ privacy loss. While the peeling algorithm has a good privacy guarantee, it requires running Report Noisy Max $k$ times, hence incurring a high computational cost for large $k$.

In this paper, we show that by adapting Report Noisy Max to reporting noisy top-$k$ items, we can still achieve comparable utility but with a much more efficient procedure, as we only need to run the selection once. We call the resulting algorithm the oneshot Laplace mechanism. More precisely, in the oneshot Laplace mechanism, we add Laplace noise to each count and then report the set of items with the top-$k$ noisy counts. We show that the oneshot Laplace mechanism can achieve utility comparable to that obtained from the peeling procedure (see Theorems 2.1 and 2.2 for the precise statements).
It is relatively straightforward to show that by adding Laplace noise of level $O(k/\varepsilon)$, the mechanism is $\varepsilon$-purely differentially private (Theorem 2.1).² However, it turns out to be much more challenging to show that with $\tilde{O}(\sqrt{k}/\varepsilon)$ noise, it is $(\varepsilon, \delta)$-differentially private. Indeed, the proof of this fact is the main contribution of our paper (Theorem 2.2).

Our proof directly bounds the privacy loss without the help of the composition theorems. The difficulty of this approach is in untangling the complex distributional dependence of the $k$ selected items, as opposed to the conditional dependence in other existing mechanisms, which allows the use of the advanced composition theorem (Dwork et al. (2010); see also Kairouz et al. (2017)). To deal with this difficulty, we introduce a novel theoretical technique that, in effect, reduces the oneshot problem to a multinomial distribution problem, thereby bypassing the use of composition theorems. To shed light on our proof, consider the case of many similar or equal values. This is the case where the true top-$k$ set can be extremely sensitive to a change of input values. In order to privately report the top-$k$ set in this case, we add independent Laplace noises centered at zero to these values, which yields an approximately equal chance that the noisy values of two adjacent inputs will go up or go down, leading to the cancellation of certain first-order terms in the (logarithms of the) probabilities of events and hence a tight control of their ratio.

Circumventing composition theorems, however, may have its advantages: since our argument relies on a direct analysis, it may avoid the slackness introduced by generic composition theorems.

¹Throughout the paper, we use $\tilde{O}$ to hide dependence on logarithmic factors.

²This is probably folklore, but since we could not find a reference, we include its proof for completeness.
Indeed, there has been recent work on exploiting special properties of privacy mechanisms to improve upon the generic composition theorem (Abadi et al., 2016; Bun & Steinke, 2016; Dong et al., 2021).

One closely related previous work is the oneshot Gumbel mechanism proposed in Durfee & Rogers (2019). In that paper, the authors show that adding Gumbel noise and reporting the top-$k$ items is equivalent to the peeling procedure of the exponential mechanism for reporting the maximum item (Dwork & Roth, 2014). This elegant connection immediately provides privacy guarantees through the well-understood composition of exponential mechanisms and can benefit from any improvement of the composition property (Dong et al., 2019). In addition, their mechanism reports the noisy ranks of the top-$k$ selections. However, it is important to note that their analysis goes through the composition theorem, whereas our paper takes an entirely different angle by employing a composition-free analysis. The comparison between these two approaches, both in theory and in practice, remains interesting future work.

### 1.1. Preliminaries

Before continuing, we pause to revisit some basic concepts in differential privacy.

**Definition 1.1.** Datasets $D, D'$ are said to be neighbors, or adjacent, if one is obtained from the other by removing or adding a single data item.

Differential privacy, sometimes called pure differential privacy now, was first defined and constructed in Dwork et al. (2006b). The relaxation of pure differential privacy defined next is sometimes referred to as approximate differential privacy or $(\varepsilon, \delta)$-differential privacy.

**Definition 1.2** (Differential privacy (Dwork et al., 2006b)). A randomized mechanism $M$ is $(\varepsilon, \delta)$-differentially private if for all adjacent $D, D'$, and for any subset of possible outputs $S$,
$$\mathbb{P}(M(D) \in S) \le e^{\varepsilon}\, \mathbb{P}(M(D') \in S) + \delta.$$
Pure differential privacy is the special case of approximate differential privacy in which $\delta = 0$.
In differential privacy problems, the data to be protected are individually identifiable, and the parameters $\varepsilon$ and $\delta$ in the definition above measure the degree of privacy protection. Let $f = f(D)$ be a statistic on a database $D$. In a typical randomized $(\varepsilon, \delta)$-differentially private mechanism $M$, the perturbed response $f(D) + Z$ is reported instead of the true answer $f(D)$, where $Z$ is random noise that guarantees indistinguishability between two adjacent datasets. The sensitivity of the statistic, or query function, $f$ is the largest change in its output when a single data item is changed, and is defined below.

**Definition 1.3** (Sensitivity). Let $f = (f_1, \ldots, f_m)$ be $m$ real-valued functions that take a database as input. The sensitivity of $f$, denoted $s_f$, is defined as
$$s_f = \max_{D, D'} \max_{1 \le i \le m} |f_i(D) - f_i(D')|,$$
where the maximum is taken over all adjacent databases $D$ and $D'$.

In the Laplace mechanism, the output $f(D)$ is perturbed with noise generated from the Laplace distribution $\mathrm{Lap}(\lambda)$ with probability density function
$$f_{\mathrm{Lap}(\lambda)}(z) = \frac{1}{2\lambda} e^{-|z|/\lambda},$$
where the scale $\lambda$ should be calibrated to the sensitivity of the statistic $f$.

## 2. The Oneshot Laplace Mechanism

In this section, we introduce the oneshot Laplace mechanism in full detail, along with its privacy guarantees. Consider the problem of privately reporting the minimum $k$ locations of $m$ values $x_1, \ldots, x_m$ and their estimated values. Here two inputs $(x_1, \ldots, x_m)$ and $(x'_1, \ldots, x'_m)$ are called adjacent if $\|x - x'\|_\infty \le 1$, i.e., $|x_i - x'_i| \le 1$ for all $1 \le i \le m$. In this definition, $x$ can be considered as the counts of each of $m$ attributes of the population in some database $D$ and, therefore, changing any individual in $D$ may in the worst case change each count $x_i$ by 1. In the special case $k = 1$, the solution relies on the Report Noisy Min algorithm (Dwork & Roth, 2014; Dwork et al., 2018), which takes as input a function $f$, a database $D$, and a privacy parameter $\varepsilon$, and outputs the index of the minimum element and its estimated value.
The Report Noisy Min algorithm adds independently sampled $\mathrm{Lap}(2s_f/\varepsilon)$ noise to each element of $f(D)$ and reports the index $i^*$ of the minimum noisy count. The algorithm further reports the estimated value by adding noise freshly sampled from $\mathrm{Lap}(2s_f/\varepsilon)$ to $f_{i^*}(D)$. In Dwork & Roth (2014); Dwork et al. (2018), the Report Noisy Min algorithm is proved to be $(\varepsilon, 0)$-differentially private. Notably, in order to avoid violating differential privacy, we must not report the minimum noisy element itself as its estimated value; hence the fresh random noise added to $f_{i^*}(D)$ in the last step.

To efficiently solve the top-$k$ problem where $k$ can be larger than 1, we introduce the oneshot Laplace mechanism $M_{\mathrm{os}}$, which is one of the main contributions of this paper. In $M_{\mathrm{os}}$, we add $\mathrm{Lap}(\lambda)$ noise once to each value and report the indices and approximations of the minimum $k$ noisy values (Algorithm 1).

**Algorithm 1** The oneshot Laplace mechanism $M_{\mathrm{os}}$ for privately reporting the minimum $k$ elements

**Input:** database $D$, functions $f = (f_1, \ldots, f_m)$ with sensitivity $s_f$, parameter $k$, and noise scale $\lambda$
**Output:** indices $i_1, \ldots, i_k$ and approximations to $f_{i_1}(D), \ldots, f_{i_k}(D)$
1. **for** $i = 1$ to $m$ **do**
2. &nbsp;&nbsp;&nbsp;&nbsp;set $y_i = f_i(D) + g_i$, where $g_i$ is sampled i.i.d. from $\mathrm{Lap}(\lambda)$
3. **end for**
4. sort $y_1, \ldots, y_m$ from low to high: $y_{i_1} \le y_{i_2} \le \cdots \le y_{i_m}$
5. **return** the set $\{i_1, \ldots, i_k\}$ and $f_{i_j}(D) + g'_{i_j}$, where $1 \le j \le k$ and the $g'_{i_j}$ are fresh independent noise samples from $\mathrm{Lap}(\lambda)$

We provide the following theorem for pure differential privacy of the oneshot Laplace mechanism. Its proof uses a standard coupling argument and is given in the supplementary material.

**Theorem 2.1.** The oneshot Laplace mechanism is $(\varepsilon, 0)$-differentially private if we set $\lambda = 2ks_f/\varepsilon$ or larger.

However, it is surprisingly challenging to prove the privacy guarantees for the oneshot Laplace mechanism in the approximate differential privacy framework.
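In code, Algorithm 1 amounts to the following NumPy sketch (the function name and interface are our own, not from the paper):

```python
import numpy as np

def oneshot_laplace(counts, k, scale, rng=None):
    """Sketch of Algorithm 1: report the indices of the k smallest noisy
    counts, together with freshly-noised estimates of their values."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    # Steps 1-3: add a single Lap(lambda) sample to every count.
    noisy = counts + rng.laplace(scale=scale, size=counts.shape)
    # Step 4: take the k smallest noisy counts.
    idx = np.argsort(noisy)[:k]
    # Step 5: release the indices plus *fresh* noise on the true counts;
    # reusing the selection noise here would violate differential privacy.
    estimates = counts[idx] + rng.laplace(scale=scale, size=k)
    return idx, estimates
```

The scale would be set to $2ks_f/\varepsilon$ under Theorem 2.1 or to the $\sqrt{k}$-type scale of Theorem 2.2 for approximate differential privacy.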
Here we state the theorem and leave the technical sketches and intuition to Section 4 and the complete proof to the supplementary material.

**Theorem 2.2** (Privacy guarantees). Given $\varepsilon \le 0.2$, $\delta \le 0.05$ and $m \ge 2$, the oneshot Laplace mechanism is $(\varepsilon, \delta)$-differentially private if we set
$$\lambda_{\mathrm{oneshot}} = \frac{8 s_f \sqrt{k \log(m/\delta)}}{\varepsilon}$$
or larger.

We remark that $\varepsilon$ can be set up to $O(\log(m/\delta))$ (Dwork et al., 2015), and the constant in $\lambda_{\mathrm{oneshot}}$ is chosen for ease of analysis. We also point out that our result carries a higher multiplicative factor of $O(\sqrt{\log(m/\delta)})$ compared to the $O(\sqrt{\log(1/\delta)})$ in other results, which only incurs a small constant factor since $\delta$ is typically required to be $o(1/m)$.

The next result is concerned with the utility of the oneshot Laplace mechanism.

**Theorem 2.3** (Utility). Let $f_{(1)}(D) \le f_{(2)}(D) \le \cdots \le f_{(m)}(D)$ denote the order statistics of the counts. Write $\Delta := \min_{1 \le i \le m-1} \big( f_{(i+1)}(D) - f_{(i)}(D) \big)$. Then with probability at least
$$p(\Delta) = \max\Big\{0,\; 1 - (m-1)\,\frac{2\lambda + \Delta}{4\lambda}\, e^{-\Delta/\lambda}\Big\},$$
the oneshot Laplace mechanism returns the index set of the true top-$k$ elements.

The proof of Theorem 2.3 is mainly based on an application of the Bonferroni bound and is left to the supplementary material. The form of $p(\Delta)$ guarantees that when the gaps between the $f_i(D)$'s are significantly large, the oneshot Laplace mechanism returns the true index set of the top-$k$ elements almost surely. Specifically, when $\Delta \ge 20\lambda$ and $m \le 8 \times 10^6$, the oneshot Laplace mechanism correctly returns the index set of the true top-$k$ elements with probability at least $p(\Delta) > 0.99$.

## 3. Application to Differentially Private Pairwise Comparison

Our work was first motivated by, and used for, the private false discovery rate control mechanism (Dwork et al., 2018). Here we present another application: ranking objects from partial binary comparisons, a problem with many important applications in statistics and computer science. Given a large collection of $m$ items, only part of the comparisons $\{X_{ij}\}_{1 \le i \ne j \le m}$ between pairs of the $m$ items are revealed.
Our goal is to privately recover the set of $k$ items with the highest ranks from the information released by pairwise comparisons. One of the most widely used parametric models for pairwise comparison is the Bradley-Terry-Luce (BTL) model (Bradley & Terry, 1952; Luce, 2012). The BTL model was introduced to derive a full ranking when one only has access to pairwise comparison information. The basic idea of the model is to assume that there exists a latent preference score $\omega^*_i$ ($i = 1, \ldots, m$) assigned to each of the $m$ items of interest; given a pair of items $(i, j)$ from the population, the winning probability of item $j$ over item $i$ in a pairwise comparison is
$$P_{ji} = \mathbb{P}\{\text{item } j \text{ is preferred over item } i\} = \frac{\omega^*_j}{\omega^*_i + \omega^*_j}.$$

To define a comparison graph $G = (V, E)$ for the Bradley-Terry-Luce model, we let the vertex set $V = [m]$ of the graph $G$ represent the $m$ items we aim to compare, and each edge $(i, j)$ in the edge set $E$ indicates that items $i$ and $j$ are compared, with the comparison information included in $y$. We further assume that the comparison graph $G$ is drawn from the Erdős-Rényi random graph model (Erdős & Rényi, 1960), such that each edge between any two vertices is present independently with some probability that captures the fraction of paired items being compared. For each edge $(i, j) \in E$, we obtain $L$ independent paired comparisons of items $i$ and $j$; the $l$-th comparison $y^{(l)}_{i,j}$, where $1 \le l \le L$, is assigned as
$$y^{(l)}_{i,j} = \begin{cases} 1, & \text{with probability } \dfrac{\omega^*_j}{\omega^*_i + \omega^*_j}, \\ 0, & \text{otherwise}, \end{cases} \tag{3.1}$$
and $y^{(l)}_{j,i} = 1 - y^{(l)}_{i,j}$ for all $(i, j) \in E$. The sufficient statistics of this model are given by $y := \{y_{i,j} \mid (i, j) \in E\}$, where $y_{i,j} := \frac{1}{L} \sum_{1 \le l \le L} y^{(l)}_{i,j}$.
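The sampling model (3.1) over an Erdős-Rényi comparison graph can be sketched as follows (the function name and the `p_edge` parameter are our own assumptions, not from the paper):

```python
import numpy as np

def sample_btl(omega, p_edge, L, rng=None):
    """Draw an Erdos-Renyi comparison graph and, for each edge (i, j),
    L independent BTL comparisons; return the edge list and the
    sufficient statistics y[(i, j)] = (1/L) * sum_l y^(l)_{i,j}."""
    rng = np.random.default_rng() if rng is None else rng
    m = len(omega)
    edges, y = [], {}
    for i in range(m):
        for j in range(i + 1, m):
            if rng.random() < p_edge:                     # edge present w.p. p_edge
                p_ji = omega[j] / (omega[i] + omega[j])   # P{ j preferred over i }
                y_ij = rng.binomial(L, p_ji) / L          # mean of L Bernoulli draws
                edges.append((i, j))
                y[(i, j)] = y_ij
                y[(j, i)] = 1.0 - y_ij                    # y^(l)_{j,i} = 1 - y^(l)_{i,j}
    return edges, y
```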
We remark that in the regime of differential privacy, the comparison graphs of two adjacent datasets differ in only one data item, i.e., only one sample of a specific edge differs between the adjacent datasets. Two algorithms tailored to the Bradley-Terry-Luce model that attract the most attention are the spectral method (rank centrality) and the maximum likelihood estimator (Chen et al., 2017); we focus on the former due to the applicability of the oneshot Laplace mechanism. To adopt the spectral method, we use the pairwise comparison information $y$ to define a random walk over the graph $G$ via its time-independent transition matrix $P_{m \times m} = [P_{ij}]$, where $P_{ij} = \mathbb{P}(X_{t+1} = j \mid X_t = i)$ is given by
$$P_{ij} = \begin{cases} \dfrac{1}{d}\, y_{i,j}, & \text{if } (i, j) \in E, \\[4pt] 1 - \dfrac{1}{d} \displaystyle\sum_{k : (i,k) \in E} y_{i,k}, & \text{if } i = j, \\[4pt] 0, & \text{otherwise}. \end{cases} \tag{3.2}$$
Here $d > 0$ is a given normalization factor on the same order as the maximum vertex degree of $G$; in general, we can assume that $d$ grows as the number of vertices in $G$ grows. The spectral method is summarized in Algorithm 2. The intuition behind the spectral method is that, assuming the sample size is sufficiently large, the stationary distribution $\pi$ of the transition matrix $P$ defined in (3.2) is a reliable estimate of the preference scores $[\omega^*_1, \omega^*_2, \ldots, \omega^*_m]$ up to scaling (Chen et al., 2017).

**Algorithm 2** The spectral method for pairwise comparison

**Input:** comparison graph $G = ([m], E)$, sufficient statistics $y$, and normalization factor $d > 0$
**Output:** the ranks of $\{\pi(i)\}_{i \in [m]}$
1. define the transition matrix $P$ as in (3.2)
2. compute the stationary distribution $\pi$ of $P$
3. sort $\pi_1, \ldots, \pi_m$ from low to high: $\pi_{(1)} \le \pi_{(2)} \le \cdots \le \pi_{(m)}$

We notice that the result derived from the spectral method for pairwise comparison only takes advantage of the stationary distribution; therefore, the oneshot Laplace mechanism can be applied to privately report the top-$k$ elements from pairwise comparison information.
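The construction (3.2) and the stationary-distribution step of Algorithm 2 can be sketched as below; power iteration is our own choice here (any routine for the leading left eigenvector would do), and the function names are ours:

```python
import numpy as np

def transition_matrix(m, edges, y, d):
    """Build the random-walk matrix P of (3.2): off-diagonal entries
    y[(i, j)] / d on edges, diagonal chosen so each row sums to 1."""
    P = np.zeros((m, m))
    for (i, j) in edges:
        P[i, j] = y[(i, j)] / d
        P[j, i] = y[(j, i)] / d
    for i in range(m):
        P[i, i] = 1.0 - P[i].sum()   # needs d >= max degree so P[i, i] >= 0
    return P

def stationary_distribution(P, iters=100_000, tol=1e-13):
    """Left eigenvector of P for eigenvalue 1, via power iteration."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        new = pi @ P
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    return pi
```

On a toy three-item graph, the item that wins its comparisons most often receives the largest stationary mass, consistent with the intuition stated above.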
We pause to introduce some definitions and a lemma to bound the sensitivity of the statistic that maps the pairwise information to the stationary distribution. Throughout this paper, the $\infty$-norm $\|P\|_\infty$ of a matrix $P$ is its maximum absolute row sum.

**Definition 3.1** (Ergodicity coefficient of a stochastic matrix (Ipsen & Selee, 2011)). For an $m \times m$ stochastic matrix $A$, the ergodicity coefficient $\tau_1(A)$ of $A$ is defined as
$$\tau_1(A) := \sup_{\substack{\|v\|_1 = 1 \\ v^\top e = 0}} \|v^\top A\|_1,$$
where $e$ is the vector of all ones.

**Definition 3.2** (Condition number of a Markov chain (Cho & Meyer, 2001)). Let $P$ denote the transition probability matrix of an $m$-state Markov chain $C$, with stationary distribution vector $\pi$. The perturbed matrix $P'$ is the transition probability matrix of another $m$-state Markov chain $C'$ with stationary distribution vector $\pi'$. The condition number $\kappa$ of the Markov chain $C$ is defined by the perturbation bound
$$\|\pi - \pi'\|_\infty \le \kappa \|P - P'\|_\infty.$$

In Ipsen & Selee (2011), the authors show that for every transition matrix $P$, the ergodicity coefficient of $P$ always falls between 0 and 1, and $\tau_1(P) = 0$ if and only if the rank of $P$ equals 1. There is also a vast literature exploring the form of the condition number $\kappa$ (Cho & Meyer, 2001). With these preparations, we build our private spectral method on the following conclusion from Seneta (1988) and Cho & Meyer (2001).

**Lemma 3.3** (Sensitivity of the stationary distribution (Seneta, 1988; Cho & Meyer, 2001)). Suppose $P$ and $P'$ are $m \times m$ transition matrices with unique stationary distributions $\pi$ and $\pi'$. If the ergodicity coefficient of $P$ satisfies $\tau_1(P) < 1$, then
$$\|\pi - \pi'\|_\infty \le \frac{1}{1 - \tau_1(P)} \|P - P'\|_\infty.$$

Motivated by Lemma 3.3, the mapping $f$ has bounded sensitivity when the ergodicity coefficient of the transition matrix $P$ is upper bounded by a constant $\rho < 1$. In light of this observation, we can build our oneshot algorithm on the following definition.
**Definition 3.4** ($\rho$-constrained comparison graph). A comparison graph $G = (V, E)$ is said to be $\rho$-constrained if: (1) the transition matrix $P$ of the Markov chain defined in (3.2) has a unique stationary distribution $\pi$; (2) there exists a constant $\rho < 1$ such that $\tau_1(P) \le \rho$.

Definition 3.4 implies that if the comparison graph $G$ defined by the database $D$ is $\rho$-constrained, then the mapping $f : P \mapsto \pi$ has sensitivity bounded in terms of $(1 - \rho)^{-1}$. Making use of this fact, the oneshot Laplace mechanism for privately reporting the maximum $k$ elements from pairwise comparison information is stated in Algorithm 3.

**Algorithm 3** The oneshot differentially private spectral method

**Input:** $\rho$-constrained comparison graph $G = ([m], E)$, sufficient statistics $y$, parameter $L$, normalization factor $d > 0$, $k \ge 1$, and privacy parameters $\varepsilon, \delta$
**Output:** $i_1, \ldots, i_k$
1. define the transition matrix $P$ as in (3.2)
2. compute the stationary distribution $\pi = (\pi_1(G), \ldots, \pi_m(G))$ of $P$
3. apply the oneshot Laplace mechanism $M_{\mathrm{os}}$ to $\pi_1(G), \ldots, \pi_m(G)$ with noise scale $\lambda$ to obtain $(i_1, y_1), (i_2, y_2), \ldots, (i_k, y_k)$
4. **return** the set $\{i_1, \ldots, i_k\}$

We now establish the differential privacy of Algorithm 3 in Theorem 3.5 below; the proof is left to the supplementary material.

**Theorem 3.5.** Given $\varepsilon \le 0.2$ and $\delta \le 0.05$, assume that the comparison graph $G = ([m], E)$ is $\rho$-constrained. Then Algorithm 3 is $(\varepsilon, \delta)$-differentially private if
$$\lambda = \frac{8 s \sqrt{k \log(m/\delta)}}{\varepsilon}$$
or larger, where the sensitivity $s = \dfrac{2}{dL(1 - \rho)}$.

## 4. Proofs and Intuition

In this section, we introduce a novel technique to prove the privacy of the oneshot Laplace mechanism. At a high level, the proof proceeds by considering the bad events, which have a large probability bias between two neighboring inputs. We show that those bad events happen when the sum of certain dependent random variables deviates from its mean. We first partition the event space to remove the dependence between the random variables, so that a concentration bound can be applied directly.
Furthermore, we apply a coupling technique to pair up the partitions for the two neighboring inputs. For each pair, we apply a concentration inequality to bound the probability of bad events. The technical tools developed in this section for proving the privacy of the oneshot top-$k$ algorithm can be applied in many other settings. We provide a proof sketch of our main result and leave most of the technical details to the supplementary material.

Our goal is to reduce the dependence on $k$ to $\sqrt{k}$ for $(\varepsilon, \delta)$-differential privacy in the oneshot Laplace mechanism. We note that the oneshot Laplace mechanism $M_{\mathrm{os}}$ returns only a subset of $k$ elements, not their ordering. The privacy proof of the oneshot Laplace mechanism crucially depends on this fact. We remark that one can further obtain the relative ranks and scores by running a second ranking phase on the reported $k$ elements. For example, by utilizing the Gaussian mechanism, one can publish more accurate scores, and hence their relative ranks, with maximum noise $O(\sqrt{k \log k})$ at a slightly higher privacy cost.

We start by providing the following lemma, which directly establishes the privacy part of Theorem 2.2 for the oneshot Laplace mechanism.

**Lemma 4.1.** For any $k$-subset $S$ of $\{1, \ldots, m\}$ and any adjacent $x, x'$, we have
$$\mathbb{P}(M_{\mathrm{os}}(x) \in S) \le e^{\varepsilon}\, \mathbb{P}(M_{\mathrm{os}}(x') \in S) + \delta.$$

The key idea of the proof is to partition the event space by fixing the $k$-th smallest noisy element $j$ together with its noise value $g_j$. Within each partition, whether an element $i \ne j$ is selected by $M_{\mathrm{os}}$ depends only on whether $x_i + g_i \le x_j + g_j$, which happens with probability $q_i = G((x_j + g_j - x_i)/\lambda)$, where $G$ denotes the cumulative distribution function of the standard Laplace distribution. As a result, we consider the following mechanism $M$ instead: given $(q_1, \ldots, q_m)$ with $0 < q_i < 1$, output a subset of indices in which each index $i$ is included independently with probability $q_i$.
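The reduced mechanism $M$ is easy to state in code: sample each index independently. For small $m$ one can also compute $P_q(z)$, the probability of each output subset, exactly; this sanity check is our own addition:

```python
import itertools
import numpy as np

def mechanism_M(q, rng=None):
    """The reduced mechanism M: include index i independently w.p. q[i]."""
    rng = np.random.default_rng() if rng is None else rng
    return {i for i, qi in enumerate(q) if rng.random() < qi}

def P_q(z, q):
    """Exact probability that M(q) outputs the subset encoded by 0/1 vector z."""
    p = 1.0
    for zi, qi in zip(z, q):
        p *= qi if zi == 1 else (1.0 - qi)
    return p
```

Summing $P_q(z)$ over all $2^m$ binary vectors recovers total probability 1, confirming that $M$ is a well-defined distribution over subsets.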
In the proof, we first bound the sensitivity of $q_i$ with respect to a change in $x_i$, and then show that $M$ is private with respect to the corresponding sensitivity on $q$. In order to prove Lemma 4.1, we present the definition of $\tau$-closeness for vectors and two further lemmata.

**Definition 4.2** ($\tau$-closeness for vectors). For two vectors $q = (q_1, \ldots, q_m)$ and $q' = (q'_1, \ldots, q'_m)$, we say $q$ is $\tau$-close to $q'$ if for each $1 \le i \le m$,
$$|q_i - q'_i| \le \tau\, q_i (1 - q_i).$$

**Lemma 4.3.** For any $z, z'$, we have
$$|G(z') - G(z)| \le 2 e^{|z' - z|}\, |z' - z|\, G(z)(1 - G(z)),$$
where $G$ denotes the cumulative distribution function of the standard Laplace distribution.

The following lemma is the key step constituting the privacy guarantee of the mechanism $M$ with respect to the sensitivity on $q$; the proof of Lemma 4.1 is largely based on this result combined with Lemma 4.3.

**Lemma 4.4.** Assume $C_0 = 3.9^2$ and $C_1 = 1.95$. Under the conditions $\varepsilon \le 0.2$, $\delta \le 0.05$, $m \ge 2$ and $k \ge C_0 \log(m/\delta)$, and if $q$ is $\tau$-close to $q'$ with
$$\tau \le \frac{\varepsilon}{C_1 \sqrt{k \log(m/\delta)}},$$
then for any set $S$ of $k$-subsets of $\{1, \ldots, m\}$, we have
$$\mathbb{P}(M(q) \in S) \le e^{\varepsilon}\, \mathbb{P}(M(q') \in S) + \delta/m.$$

The proof of Lemma 4.3 is relatively straightforward and is relegated to the supplementary material. To prove Lemma 4.4, notice that if $k \le C_0 \log(m/\delta)$, then according to Theorem 2.1 the mechanism is $(\varepsilon, 0)$-private; thus we assume $k \ge C_0 \log(m/\delta)$. Since $S$ consists of $k$-sets, we first show that if $\sum_i q_i \gg k$, then $\mathbb{P}(M(q) \in S)$ is small. This can be done by applying the standard concentration bound in the following lemma.

**Lemma 4.5.** Let $Z_1, \ldots, Z_m$ be $m$ independent Bernoulli random variables with $\mathbb{P}(Z_i = 1) = q_i$. Suppose $\sum_{i=1}^m q_i \ge (1 + t)k$ for some $t > 0$. Then
$$\mathbb{P}\Big(\sum_{i=1}^m Z_i \le k\Big) \le \exp\Big( -(1 + t)k\, h\Big(\frac{t}{t+1}\Big) \Big),$$
where $h(u) = (1 + u)\log(1 + u) - u$. Specifically, setting $K = \big(1 + c\sqrt{\log(m/\delta)/k}\big)k$ with $c = 1.9$, if $\sum_{i=1}^m q_i \ge K$, then $\mathbb{P}\big(\sum_{i=1}^m Z_i \le k\big) \le \delta/m$.

The proof of Lemma 4.5 is based on the classical Bennett's inequality and is left to the supplementary material.
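Lemma 4.3 is easy to probe numerically; here $G$ is the standard Laplace CDF, $G(z) = \tfrac12 e^{z}$ for $z \le 0$ and $1 - \tfrac12 e^{-z}$ otherwise. This is a spot check of our own, not a proof:

```python
import math

def G(z):
    """CDF of the standard Laplace distribution."""
    return 0.5 * math.exp(z) if z <= 0 else 1.0 - 0.5 * math.exp(-z)

def lemma_43_holds(z, zp):
    """Check |G(z') - G(z)| <= 2 e^{|z'-z|} |z'-z| G(z)(1 - G(z))."""
    lhs = abs(G(zp) - G(z))
    rhs = 2.0 * math.exp(abs(zp - z)) * abs(zp - z) * G(z) * (1.0 - G(z))
    return lhs <= rhs + 1e-15   # tiny slack for floating point
```

Scanning a grid of $(z, z')$ pairs over $[-4, 4]^2$ finds no violation, consistent with the lemma.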
By Lemma 4.5, we only need to consider the case $\sum_i q_i \le K$, which is more difficult. We first represent a set $S \subseteq \{1, \ldots, m\}$ by a binary vector $z \in \{0, 1\}^m$ and write
$$P_q(z) = \prod_{i : z_i = 1} q_i \prod_{i : z_i = 0} (1 - q_i).$$
Our goal is to show that for any $S$ consisting of weight-$k$ vectors in $\{0, 1\}^m$,
$$\sum_{z \in S} P_q(z) \le e^{\varepsilon} \sum_{z \in S} P_{q'}(z) + \delta/m.$$
Defining the set $S^* = \{z : P_q(z) \ge e^{\varepsilon} P_{q'}(z)\}$, to prove Lemma 4.4 it suffices to show that $\sum_{z \in S^*} P_q(z) \le \delta/m$. By the form of $P_q(z)$, $z \in S^*$ holds if and only if
$$\prod_{i : z_i = 1} q_i \prod_{i : z_i = 0} (1 - q_i) \ge e^{\varepsilon} \prod_{i : z_i = 1} q'_i \prod_{i : z_i = 0} (1 - q'_i). \tag{4.1}$$
Let $\Delta_i = q'_i - q_i$. The $\tau$-closeness of $q$ to $q'$ implies $|\Delta_i| \le \tau q_i(1 - q_i)$. Taking the logarithm of both sides of (4.1) and rearranging gives
$$\sum_{i : z_i = 1} \log(1 + \Delta_i/q_i) + \sum_{i : z_i = 0} \log\big(1 - \Delta_i/(1 - q_i)\big) \le -\varepsilon.$$
To bound $\sum_{z \in S^*} P_q(z)$, we consider independent Bernoulli random variables $Z_1, \ldots, Z_m$, where for each $i$, $Z_i = 1$ with probability $q_i$ and $Z_i = 0$ with probability $1 - q_i$. We set
$$\zeta_i = Z_i \log(1 + \Delta_i/q_i) + (1 - Z_i) \log\big(1 - \Delta_i/(1 - q_i)\big).$$
Note that
$$\sum_{z \in S^*} P_q(z) = \sum_z \mathbf{1}\Big\{\sum_i \zeta_i \le -\varepsilon\Big\} P_q(z) = \mathbb{P}\Big(\sum_i \zeta_i \le -\varepsilon\Big),$$
where $\mathbf{1}\{\cdot\}$ denotes the indicator function and the last probability is over the distribution of $Z_1, \ldots, Z_m$. Combining this with the previous arguments, we need to prove that $\mathbb{P}(\sum_i \zeta_i \le -\varepsilon) \le \delta/m$. It is easy to check that $\zeta_1 + \cdots + \zeta_m$ has mean
$$\mu = \sum_{i=1}^m \Big( q_i \log(1 + \Delta_i/q_i) + (1 - q_i) \log\big(1 - \Delta_i/(1 - q_i)\big) \Big)$$
and variance
$$\sigma^2 = \sum_{i=1}^m q_i (1 - q_i) \log^2 \frac{1 + \Delta_i/q_i}{1 - \Delta_i/(1 - q_i)}.$$
To apply Bennett's inequality, we also need to check that the centered random variables
$$\bar{\zeta}_i := \zeta_i - q_i \log(1 + \Delta_i/q_i) - (1 - q_i) \log\big(1 - \Delta_i/(1 - q_i)\big)$$
are bounded in absolute value by $\max_{1 \le i \le m} \big| \log \frac{1 + \Delta_i/q_i}{1 - \Delta_i/(1 - q_i)} \big|$. To see why this is true, observe that
$$\bar{\zeta}_i = (Z_i - q_i) \log \frac{1 + \Delta_i/q_i}{1 - \Delta_i/(1 - q_i)}, \qquad |\bar{\zeta}_i| \le \Big| \log \frac{1 + \Delta_i/q_i}{1 - \Delta_i/(1 - q_i)} \Big|.$$
Therefore, according to Bennett's inequality, for any $t \ge 0$,
$$\sum_{i=1}^m \zeta_i \ge \mu - t \quad \text{with probability at least} \quad 1 - \exp\big( -\sigma^2 h(At/\sigma^2)/A^2 \big),$$
where
$$A = \max_{1 \le i \le m} \Big| \log \frac{1 + \Delta_i/q_i}{1 - \Delta_i/(1 - q_i)} \Big|.$$
Furthermore, taking $t = \varepsilon + \mu$, Bennett's inequality implies that
$$\mathbb{P}\Big(\sum_i \zeta_i \le -\varepsilon\Big) \le \exp\big( -\sigma^2 h(At/\sigma^2)/A^2 \big).$$
Hence, the case $\sum_i q_i \le K$ can be established by proving
$$\exp\big( -\sigma^2 h(At/\sigma^2)/A^2 \big) \le \delta/m. \tag{4.2}$$
We now bound $t$, $A$ and $\sigma$ using the fact that $|\Delta_i| \le \tau q_i(1 - q_i)$. We start by exploring the relationship between $t$ and $\varepsilon$, applying the standard facts that $\log(1 + u) \le u$ and that, when $|u| \le \tfrac12$, $\log(1 + u) \ge u - u^2$. Notice that
$$\tau \le \frac{\varepsilon}{C_1 \sqrt{k \log(m/\delta)}} \le \frac{\varepsilon}{C_1 \sqrt{C_0}\, \log(m/\delta)} \le \frac{0.2}{1.95 \times 3.9 \times \log(2/0.05)} < 0.0072 < \frac{1}{2},$$
so both $|\Delta_i/q_i| \le \tau(1 - q_i)$ and $|\Delta_i/(1 - q_i)| \le \tau q_i$ are at most $\tfrac12$. For either sign of $\Delta_i$, the two terms $q_i \log(1 + \Delta_i/q_i)$ and $(1 - q_i)\log(1 - \Delta_i/(1 - q_i))$ have opposite signs, and applying $u - u^2 \le \log(1 + u) \le u$ to both terms yields
$$\big| q_i \log(1 + \Delta_i/q_i) + (1 - q_i)\log\big(1 - \Delta_i/(1 - q_i)\big) \big| \le q_i (\Delta_i/q_i)^2 + (1 - q_i)\big(\Delta_i/(1 - q_i)\big)^2 = \frac{\Delta_i^2}{q_i(1 - q_i)} \le \tau^2 q_i(1 - q_i).$$
To proceed, note that $\sum_i q_i \le K \le (1 + c/\sqrt{C_0})k$ since $k \ge C_0 \log(m/\delta)$, and thus
$$|\mu| \le \sum_{i=1}^m \big| q_i \log(1 + \Delta_i/q_i) + (1 - q_i)\log\big(1 - \Delta_i/(1 - q_i)\big) \big| \le \tau^2 \sum_{i=1}^m q_i \le \tau^2 \Big(1 + \frac{c}{\sqrt{C_0}}\Big) k.$$
Making use of the assumption $\tau \le \varepsilon/(C_1 \sqrt{k \log(m/\delta)})$,
$$|\mu| \le \Big(1 + \frac{1.9}{3.9}\Big) \frac{\varepsilon^2}{C_1^2 \log(m/\delta)} \le \Big(1 + \frac{1.9}{3.9}\Big) \frac{0.2\,\varepsilon}{1.95^2 \log(2/0.05)} \le 0.0213\,\varepsilon.$$
Rearranging the inequality above gives $t = \varepsilon + \mu \ge 0.9787\,\varepsilon$ and, similarly, $t \le 1.0213\,\varepsilon$. Furthermore, since $\tau < 0.0072$,
$$\Big| \log \frac{1 + \Delta_i/q_i}{1 - \Delta_i/(1 - q_i)} \Big| \le \log \frac{1 + \tau(1 - q_i)}{1 - \tau q_i} \le \frac{\tau}{1 - 0.0072} < 1.0073\,\tau,$$
which implies
$$A \le 1.0073\,\tau. \tag{4.3}$$
Combining the relations above gives
$$\sigma^2 = \sum_{i=1}^m q_i(1 - q_i) \log^2 \frac{1 + \Delta_i/q_i}{1 - \Delta_i/(1 - q_i)} \le 1.0073^2\, \tau^2 \sum_{i=1}^m q_i \le 1.0073^2 \Big(1 + \frac{1.9}{3.9}\Big) \tau^2 k \le 1.509\,\tau^2 k. \tag{4.4}$$
Since $u\,h(a/u)$ is a decreasing function of $u$, it follows from (4.4) that
$$\sigma^2 h(At/\sigma^2) \ge 1.509\,\tau^2 k\, h\big( At/(1.509\,\tau^2 k) \big),$$
where $\tau = \varepsilon/(C_1 \sqrt{k \log(m/\delta)})$. Recalling that $k \ge C_0 \log(m/\delta)$ and $t \le 1.0213\,\varepsilon$, it is clear that
$$\frac{At}{1.509\,\tau^2 k} \le \frac{1.0073\, t}{1.509\,\tau k} = \frac{1.0073 \times 1.95}{1.509} \cdot \frac{t}{\varepsilon} \cdot \sqrt{\frac{\log(m/\delta)}{k}} \le \frac{1.0073 \times 1.95}{1.509} \times 1.0213 \times \frac{1}{3.9} \le 0.3409.$$
Finally, taking advantage of the fact that $h(u)/u^2$ is a decreasing function of $u$, we see that
$$\frac{\sigma^2 h(At/\sigma^2)}{A^2} \ge \frac{h(0.3409)}{0.3409^2} \cdot \frac{1}{A^2} \cdot \frac{(At)^2}{1.509\,\tau^2 k} = \frac{h(0.3409)}{0.3409^2} \cdot \frac{t^2}{1.509\,\tau^2 k} \ge \frac{h(0.3409)}{0.3409^2} \cdot \frac{(0.9787)^2\, C_1^2}{1.509} \log(m/\delta) \ge \log(m/\delta).$$
This completes the proof of inequality (4.2) and, therefore, of Lemma 4.4.

We now prove Lemma 4.1; the proof follows from Lemma 4.3 combined with Lemma 4.4.

*Proof of Lemma 4.1.* Without loss of generality, assume $s_f = 1$. First, notice that if $k < C_0 \log(m/\delta)$, then $\lambda = 8\sqrt{k \log(m/\delta)}/\varepsilon > 8k/(\sqrt{C_0}\,\varepsilon) > 2k/\varepsilon$, so Theorem 2.1 immediately implies that the mechanism is $(\varepsilon, 0)$-private. We thus assume $k \ge C_0 \log(m/\delta)$. Throughout, we use $J_k$ to denote the random index of the $k$-th smallest element of the noisy counts on input $x$, and $g_j$ to denote the noise added to the $j$-th element of $x$; $J'_k$ and $g'_j$ are defined analogously for $x'$. For given $J_k = j$ and noise $g_j = g$, we have, for each $i \ne j$,
$$\mathbb{P}\big( i \in M_{\mathrm{os}}(x) \mid J_k = j,\, g_j = g \big) = G\big( (x_j + g - x_i)/\lambda \big) := q_i.$$
Setting $q = q(g) = (q_i)$ for $1 \le i \le m$, $i \ne j$, and $S_j = \{ s \setminus \{j\} : s \in S \text{ and } j \in s \}$, we then have
$$\mathbb{P}(M_{\mathrm{os}}(x) \in S,\, J_k = j \mid g_j = g) = \mathbb{P}(M(q) \in S_j).$$
Making use of the fact that $\|x - x'\|_\infty \le 1$, we conclude that for any $i$,
$$\Big| \frac{x_j + g - x_i}{\lambda} - \frac{x'_j + g - x'_i}{\lambda} \Big| \le \frac{2}{\lambda}.$$
By Lemma 4.3, this implies
$$|q_i - q'_i| \le 2\, q_i(1 - q_i)\, e^{2/\lambda} \cdot \frac{2}{\lambda} \le \frac{4.014}{\lambda}\, q_i(1 - q_i),$$
so $q$ is $(4.014/\lambda)$-close to $q'$. We also notice that, with $\lambda = 8\sqrt{k \log(m/\delta)}/\varepsilon$,
$$\frac{4.014}{\lambda} = \frac{4.014\,\varepsilon}{8\sqrt{k \log(m/\delta)}} \le \frac{0.9785\,\varepsilon}{C_1 \sqrt{k \log(m/\delta)}},$$
which implies that $q$ is $\big( 0.9785\,\varepsilon / (C_1 \sqrt{k \log(m/\delta)}) \big)$-close to $q'$. We write $\mathbb{P}(M_{\mathrm{os}}(x) \in S,\, J_k = j \mid g_j = g)$ and $\mathbb{P}(M_{\mathrm{os}}(x') \in S,\, J'_k = j \mid g'_j = g)$ as $P_x$ and $P_{x'}$, respectively. Applying Lemma 4.4 to $q$ and $q'$ with parameters $0.9785\,\varepsilon$ and $\delta$, we have
$$P_x \le e^{0.9785\,\varepsilon}\, P_{x'} + \delta/m.$$
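Before completing the proof, the numerical constants used in the bounds above (as we reconstruct them) can be spot-checked directly; the final factor must be at least 1 so that the exponent reaches $\log(m/\delta)$:

```python
import math

def h(u):
    """Bennett's function h(u) = (1 + u) log(1 + u) - u."""
    return (1.0 + u) * math.log(1.0 + u) - u

C0, C1 = 3.9 ** 2, 1.95
# bound on tau under eps <= 0.2, delta <= 0.05, m >= 2, k >= C0 log(m/delta)
tau_max = 0.2 / (C1 * math.sqrt(C0) * math.log(2 / 0.05))
# variance slack: 1.0073^2 * (1 + 1.9 / sqrt(C0)) should not exceed 1.509
var_slack = 1.0073 ** 2 * (1.0 + 1.9 / math.sqrt(C0))
# final factor multiplying log(m / delta) in the exponent of (4.2)
factor = (h(0.3409) / 0.3409 ** 2) * (0.9787 ** 2) * C1 ** 2 / 1.509
```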
Let $l_{g_j}(g)$ denote the density of the $j$-th noisy count at value $g$ under input $x$, and $l_{g'_j}(g)$ the corresponding density under $x'$; their ratio satisfies $l_{g_j}(g)/l_{g'_j}(g) = e^{(|g - x'_j| - |g - x_j|)/\lambda} \le e^{1/\lambda}$. Noting that
$$\lambda \ge \frac{8\sqrt{C_0}\, \log(m/\delta)}{\varepsilon} \ge \frac{8 \times 3.9 \times \log(2/0.05)}{\varepsilon} \ge \frac{115}{\varepsilon},$$
the conclusion is now one step away. To show that the algorithm is $(\varepsilon, \delta)$-differentially private, note that
$$\begin{aligned}
\mathbb{P}(M_{\mathrm{os}}(x) \in S) &= \int \sum_{j=1}^m l_{g_j}(g)\, P_x \, dg \\
&\le \int \sum_{j=1}^m l_{g_j}(g) \big( e^{0.9785\varepsilon} P_{x'} + \delta/m \big) \, dg \\
&\le e^{0.9785\varepsilon} \int \sum_{j=1}^m \frac{l_{g_j}(g)}{l_{g'_j}(g)}\, l_{g'_j}(g)\, P_{x'} \, dg + \delta \\
&= e^{0.9785\varepsilon} \int \sum_{j=1}^m e^{(|g - x'_j| - |g - x_j|)/\lambda}\, l_{g'_j}(g)\, P_{x'} \, dg + \delta \\
&\le e^{0.9785\varepsilon + 1/\lambda} \int \sum_{j=1}^m l_{g'_j}(g)\, P_{x'} \, dg + \delta \\
&\le e^{0.9785\varepsilon + \varepsilon/115} \int \sum_{j=1}^m l_{g'_j}(g)\, P_{x'} \, dg + \delta \\
&< e^{0.99\varepsilon}\, \mathbb{P}(M_{\mathrm{os}}(x') \in S) + \delta.
\end{aligned}$$
This completes the proof of Lemma 4.1, and Theorem 2.2 follows.

## 5. Discussion

In this paper, we provide a theoretical study of the classical top-$k$ problem in the regime of differential privacy. We propose a fast, low-distortion, statistically accurate, and differentially private algorithm to tackle this problem, which we refer to as the oneshot Laplace mechanism. We prove its privacy via a novel coupling technique, without taking advantage of the advanced composition theorems, thereby circumventing the linear dependence on $k$ in the privacy loss suffered by existing results in the literature. We further provide applications of the oneshot Laplace mechanism to multiple hypothesis testing and pairwise comparison. Our contributions have the potential to impact many essential areas of machine learning in both theory and practice.

This study leaves a number of open questions that we hope will inspire further work. Nothing in our proof of differential privacy for the oneshot Laplace mechanism prevents better bounds for $\lambda$ from being achieved.
From a different angle, we wonder if the coupling technique would lead to tighter privacy analysis using other notions of privacy, such as concentrated differential privacy, Rényi differential privacy, and Gaussian differential privacy (Dwork & Rothblum, 2016; Bun & Steinke, 2016; Mironov, 2017; Dong et al., 2021; Bu et al., 2020). Finally, an important direction for further research is to obtain sharp asymptotic properties of the oneshot mechanism and use the results to give a more comprehensive comparison between the oneshot Laplace mechanism and existing approaches in the literature.

Acknowledgments

We are very grateful to Cynthia Dwork for encouragement and discussions on an early version of the manuscript. This work was supported in part by NSF through CAREER DMS-1847415 and CCF-1763314, a Facebook Faculty Research Award, and an Alfred Sloan Research Fellowship.

References

Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. ACM, 2016.

Bafna, M. and Ullman, J. The price of selection in differential privacy. arXiv preprint arXiv:1702.02970, 2017.

Banerjee, S., Hegde, N., and Massoulié, L. The price of privacy in untrusted recommendation engines. In 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 920–927. IEEE, 2012.

Bradley, R. A. and Terry, M. E. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324–345, 1952.

Bu, Z., Dong, J., Long, Q., and Su, W. J. Deep learning with Gaussian differential privacy. Harvard Data Science Review, 2020(23), 2020.

Bun, M. and Steinke, T. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference, pp. 635–658. Springer, 2016.

Chen, Y., Fan, J., Ma, C., and Wang, K.
Spectral method and regularized MLE are both optimal for top-k ranking. arXiv preprint arXiv:1707.09971, 2017.

Cho, G. E. and Meyer, C. D. Comparison of perturbation bounds for the stationary distribution of a Markov chain. Linear Algebra and its Applications, 335(1-3):137–150, 2001.

Dong, J., Durfee, D., and Rogers, R. Optimal differential privacy composition for exponential mechanisms and the cost of adaptivity. arXiv preprint arXiv:1909.13830, 2019.

Dong, J., Roth, A., and Su, W. J. Gaussian differential privacy. Journal of the Royal Statistical Society, Series B, 2021. To appear.

Durfee, D. and Rogers, R. M. Practical differentially private top-k selection with pay-what-you-get composition. In Advances in Neural Information Processing Systems, pp. 3527–3537, 2019.

Dwork, C. and Roth, A. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science. Now Publishers, 2014.

Dwork, C. and Rothblum, G. N. Concentrated differential privacy. arXiv preprint arXiv:1603.01887, 2016.

Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., and Naor, M. Our data, ourselves: Privacy via distributed noise generation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 486–503. Springer, 2006a.

Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pp. 265–284. Springer, 2006b.

Dwork, C., Rothblum, G. N., and Vadhan, S. Boosting and differential privacy. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pp. 51–60. IEEE, 2010.

Dwork, C., Su, W., and Zhang, L. Private false discovery rate control. arXiv preprint arXiv:1511.03803, 2015.

Dwork, C., Su, W. J., and Zhang, L. Differentially private false discovery rate control. arXiv preprint arXiv:1807.04209, 2018.

Erdős, P. and Rényi, A.
On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci., 5(1):17–60, 1960.

Friedman, A. and Schuster, A. Data mining with differential privacy. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 493–502. ACM, 2010.

Hardt, M. and Roth, A. Beyond worst-case analysis in private singular vector computation. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, pp. 331–340. ACM, 2013.

Ipsen, I. C. and Selee, T. M. Ergodicity coefficients defined by vector norms. SIAM Journal on Matrix Analysis and Applications, 32(1):153–200, 2011.

Kairouz, P., Oh, S., and Viswanath, P. The composition theorem for differential privacy. IEEE Transactions on Information Theory, 63(6):4037–4049, 2017.

Luce, R. D. Individual Choice Behavior: A Theoretical Analysis. Courier Corporation, 2012.

McSherry, F. and Mironov, I. Differentially private recommender systems: Building privacy into the Netflix Prize contenders. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 627–636. ACM, 2009.

McSherry, F. D. and Mironov, I. Differential privacy preserving recommendation, December 31, 2013. US Patent 8,619,984.

Mironov, I. Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pp. 263–275. IEEE, 2017.

Qin, Z., Yang, Y., Yu, T., Khalil, I., Xiao, X., and Ren, K. Heavy hitter estimation over set-valued data with local differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 192–203. ACM, 2016.

Seneta, E. Perturbation of the stationary distribution measured by ergodicity coefficients. Advances in Applied Probability, 20(1):228–230, 1988.

Shen, Y. and Jin, H. Privacy-preserving personalized recommendation: An instance-based approach via differential privacy. In 2014 IEEE International Conference on Data Mining, pp. 540–549. IEEE, 2014.

Steinke, T.
and Ullman, J. Tight lower bounds for differentially private selection. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pp. 552–563. IEEE, 2017.