KNG: The K-Norm Gradient Mechanism

Matthew Reimherr
Department of Statistics, Pennsylvania State University, State College, PA 16802
mreimherr@psu.edu

Jordan Awan
Department of Statistics, Pennsylvania State University, State College, PA 16802
awan@psu.edu

This paper presents a new mechanism for producing sanitized statistical summaries that achieve differential privacy, called the K-Norm Gradient Mechanism, or KNG. This new approach maintains the strong flexibility of the exponential mechanism, while achieving the powerful utility performance of objective perturbation. KNG starts with an inherent objective function (often an empirical risk), and promotes summaries that are close to minimizing the objective by weighting according to how far the gradient of the objective function is from zero. Working with the gradient instead of the original objective function allows for additional flexibility as one can penalize using different norms. We show that, unlike the exponential mechanism, the noise added by KNG is asymptotically negligible compared to the statistical error for many problems. In addition to theoretical guarantees on privacy and utility, we confirm the utility of KNG empirically in the settings of linear and quantile regression through simulations.

1 Introduction

The last decade has seen a tremendous increase in research activity related to data privacy [Aggarwal and Philip, 2008, Lane et al., 2014, Machanavajjhala and Kifer, 2015, Dwork et al., 2017]. This drive has been fueled by an increasing societal concern over the large amounts of data being collected by companies, governments, and scientists. These data often contain vast amounts of personal information, for example DNA sequences, images, voice recordings, electronic health records, and internet usage patterns. Such data allows for great scientific progress by researchers and governments, as well as increasingly curated business strategies by companies.
However, such data also comes with increased risk of privacy breaches, placing greater pressure on institutions to prevent disclosures. Currently, Differential Privacy (DP) [Dwork et al., 2006] is the leading framework for formally quantifying privacy risk. One of the most popular methods for achieving DP is the exponential mechanism, introduced by McSherry and Talwar [2007], and used in [Friedman and Schuster, 2010, Wasserman and Zhou, 2010, Blum et al., 2013, Dwork and Roth, 2014]. A major attribute of the exponential mechanism that contributes to its popularity is its flexibility; it can be readily adapted and incorporated into most statistical analyses. In particular, its structure makes it amenable to a wide array of statistical and machine learning problems that are based on minimizing an objective function, so-called M-estimators [van der Vaart, 2000, Chapter 5]. Some examples where the exponential mechanism has been used include PCA [Chaudhuri et al., 2013, Awan et al., 2019], hypothesis testing [Canonne et al., 2019], maximum likelihood estimation (related to posterior sampling) [Wang et al., 2015, Minami et al., 2016], and density estimation [Wasserman and Zhou, 2010]. However, examples have arisen [Wang et al., 2015, Awan et al., 2019] where the magnitude of the noise added by the exponential mechanism is substantially higher than that of other mechanisms. Recently, Awan et al. [2019] demonstrated that, in a very broad sense, the exponential mechanism adds noise that is not asymptotically negligible relative to the statistical estimation error, which other mechanisms are able to achieve in different problems [e.g. Smith, 2011].

Research supported in part by NSF DMS 1712826, NSF SES 1853209, and the Simons Institute for the Theory of Computing at UC Berkeley.

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.
In this paper we provide a new mechanism called the K-Norm Gradient Mechanism, or KNG, that retains the flexibility of the exponential mechanism, but with substantially improved utility guarantees. KNG provides a principled approach to developing efficient mechanisms that also perform well in practice. Indeed, the Laplace, K-norm, and Private Quantile mechanisms can all be viewed as instantiations of KNG. Here we also use KNG to provide the first mechanism for private quantile regression that we are aware of, which we empirically show is efficient.

At a high level, KNG uses a similar perspective to that of the exponential mechanism. In particular, suppose that $\ell_n(\theta; D)$ is an objective whose minimizer, $\hat\theta \in \mathbb{R}^d$, is the summary we aim to sanitize. Here $D$ represents the particular database and $n$ the sample size of $D$. The exponential mechanism aims to release $\tilde\theta_E$ based on the density $f_E(\theta) \propto \exp\{-c_0 \ell_n(\theta; D)\}$, where $c_0$ is a generic constant determined by the sensitivity of $\ell_n$ and the desired level of privacy. Conceptually, the idea is to promote sanitized estimates whose utility, as measured by $\ell_n$, is close to that of $\hat\theta$. Unfortunately, Awan et al. [2019] showed that the magnitude of the noise added by the exponential mechanism is often of the same order as the statistical error (as a function of $n$), resulting in inefficient private estimators. KNG uses a similar perspective, but takes the gradient of $\ell_n$ and promotes $\theta$ that are close to satisfying $\nabla\ell_n(\hat\theta) = 0$. Since we work with the gradient, we also have the flexibility of choosing a desirable norm, which Awan and Slavković [2018] showed can be tailored to the problem at hand to achieve better utility. The resulting mechanism produces a sanitized $\tilde\theta$ according to the density $f_n(\theta) \propto \exp\{-c_0 \|\nabla\ell_n(\theta; D)\|_K\}$, where $\|\cdot\|_K$ is a general norm on $\mathbb{R}^d$ that can be chosen to accommodate the context of the problem. Here we see a connection between KNG and the K-norm mechanism, introduced by Hardt and Talwar [2010].
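As a concrete illustration of the two densities above, the following sketch (our own toy example, not from the paper) compares the unnormalized log-densities for one-dimensional mean estimation, where the K-norm is simply the absolute value; `c0` stands in for the privacy/sensitivity constant.

```python
import numpy as np

# Toy 1-D mean estimation: l_n(theta) = sum_i (x_i - theta)^2.
# Unnormalized log-densities (up to additive constants) for both mechanisms.
def log_density_exp_mech(theta, x, c0):
    """Exponential mechanism: weight by the objective itself (quadratic exponent)."""
    return -c0 * np.sum((x - theta) ** 2)

def log_density_kng(theta, x, c0):
    """KNG: weight by the norm of the gradient (linear-in-absolute-value exponent)."""
    grad = -2.0 * np.sum(x - theta)   # d/dtheta of sum_i (x_i - theta)^2
    return -c0 * np.abs(grad)         # |.| plays the role of the K-norm in 1-D

x = np.array([0.1, -0.3, 0.2, 0.4])
# Both log-densities are maximized at the non-private optimum theta = x.mean(),
# but KNG decays linearly away from it rather than quadratically.
```

Both mechanisms concentrate around the minimizer; the difference that matters asymptotically is the shape of the exponent near that point.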
The terminology is based on the idea of considering a set $K$ which is the convex hull of the sensitivity polytope [Kattis and Nikolov, 2017], and defining $\|\cdot\|_K$ to be the norm whose ball of radius one is $K$, i.e. $\{v \in \mathbb{R}^d \mid \|v\|_K \le 1\} = K$. In fact, every norm can be generated in this manner, so there is no loss of generality from using this approach [Awan and Slavković, 2018].

KNG can similarly be viewed as a modification of objective perturbation [Chaudhuri et al., 2011, Kifer et al., 2012]. There, one releases a sanitized estimate, $\tilde\theta_O$, by minimizing² $\tilde\theta_O = \arg\min_{\theta \in \Theta} \ell_n(\theta; D) + \omega\,\theta^\top b$, where $b \in \mathbb{R}^d$ is a random vector drawn from the K-norm density $f_b(b) \propto \exp\{-\|b\|_K\}$, and $\omega \in \mathbb{R}$ is a fixed constant based on the sensitivity of $\ell_n$ and the desired level of privacy³. Equivalently, one has that $\nabla\ell_n(\tilde\theta_O; D) + \omega b = 0$, which implies that $\tilde\theta_O = \nabla\ell_n^{-1}(-\omega b)$, assuming $\nabla\ell_n$ is invertible. Using the change of variables formula, this implies that $\tilde\theta_O$ has density $f_O(\theta) \propto \exp\{-\omega^{-1}\|\nabla\ell_n(\theta)\|_K\}\,|\det(\nabla^2\ell_n(\theta))|$. With KNG, the second derivative term $\nabla^2\ell_n$ is not included. Furthermore, there are several technical requirements when working with objective perturbation that KNG sidesteps. In particular, the proof that objective perturbation satisfies DP requires the objective function to be strongly convex and twice differentiable almost everywhere [Chaudhuri et al., 2011, Kifer et al., 2012, Awan and Slavković, 2018]. While we assume strong convexity and a second derivative to prove a utility result in Theorem 3.2, KNG does not require either of these conditions to satisfy DP. This allows the KNG mechanism to be applied in more general situations (such as median estimation and quantile regression, explored in Section 4), and requires fewer calculations to implement.

The remainder of this paper is organized as follows. In Section 2 we recall the necessary background on differential privacy and the exponential mechanism.
In Section 3 we formally define KNG and show that it achieves ϵ-DP with nearly the same flexibility as the exponential mechanism. We also provide a general utility result showing that the noise introduced by KNG is of order $O_p(n^{-1})$, which is negligible compared to the statistical estimation error, typically $O_p(n^{-1/2})$. We also show that the noise introduced by KNG is asymptotically from a K-norm mechanism. In Section 4 we provide several examples of KNG applied to statistical problems, including mean estimation, linear regression, median/quantile estimation, and quantile regression. We also illustrate the empirical advantages of KNG in the settings of linear and quantile regression through simulations. We conclude in Section 5 by discussing challenges and potential extensions of KNG.

²In fact, objective perturbation minimizes $\ell_n(\theta; D) + c\,\theta^\top\theta + \omega\,\theta^\top b$, where $c$ is a constant. We ignore this regularization term in this discussion for simplicity of the illustration.
³In Chaudhuri et al. [2011] and Kifer et al. [2012], the $\ell_2$ norm is used. Awan and Slavković [2018] extend objective perturbation to allow for arbitrary norms.

2 Differential Privacy Background

Differential privacy (DP), introduced by Dwork et al. [2006], has taken hold as the primary framework for formally quantifying privacy risk. Several versions of DP have been proposed, such as approximate DP [Dwork and Roth, 2014], concentrated DP [Dwork and Rothblum, 2016, Bun and Steinke, 2016], and local DP [Duchi et al., 2013], all of which fit into the axiomatic treatment of formal privacy given by Kifer and Lin [2012]. In this paper, we work with pure ϵ-DP, stated in Definition 2.1.

Let $\mathcal{D}_n$ denote the collection of all possible databases with $n$ units. The bivariate function $\delta : \mathcal{D}_n \times \mathcal{D}_n \to \mathbb{R}$, defined by $\delta(D, D') := \#\{i \mid D_i \ne D'_i\}$, is called the Hamming distance on $\mathcal{D}_n$. It is easy to verify that $\delta$ is a metric on $\mathcal{D}_n$. If $\delta(D, D') = 1$, then $D$ and $D'$ are said to be adjacent.
Let $f : \mathcal{D}_n \to \Theta$ represent a summary of $\mathcal{D}_n$, and $\mathcal{F}$ a σ-algebra on $\Theta$, such that $(\Theta, \mathcal{F})$ is a measurable space. A privacy mechanism is a family of probability measures $\{\mu_D : D \in \mathcal{D}_n\}$ over $\Theta$.

Definition 2.1 (Differential Privacy: Dwork et al., 2006). A privacy mechanism $\{\mu_D : D \in \mathcal{D}_n\}$ satisfies ϵ-Differential Privacy (ϵ-DP) if for all $B \in \mathcal{F}$ and adjacent $D, D' \in \mathcal{D}_n$, $\mu_D(B) \le \mu_{D'}(B)\exp(\epsilon)$.

The exponential mechanism, introduced by McSherry and Talwar [2007], is a central tool in the design of DP mechanisms [Dwork and Roth, 2014]. In fact, every mechanism can be viewed as an instance of the exponential mechanism, by setting the objective function as the log-density of the mechanism. In practice, it is most common to set the objective as a natural loss function, such as an empirical risk.

Proposition 2.2 (Exponential Mechanism: McSherry and Talwar, 2007). Let $(\Theta, \mathcal{F}, \nu)$ be a measure space, and let $\{\ell_n(\theta; D) : \Theta \to \mathbb{R} \mid D \in \mathcal{D}_n\}$ be a collection of measurable functions indexed by the database $D$. We say that this collection has finite sensitivity $\Delta$ if $|\ell_n(\theta; D) - \ell_n(\theta; D')| \le \Delta < \infty$ for all adjacent $D, D'$ and ν-almost all $\theta \in \Theta$. If $\int_\Theta \exp(-\ell_n(\theta; D))\,d\nu(\theta) < \infty$ for all $D \in \mathcal{D}_n$, then the collection of probability measures $\{\mu_D \mid D \in \mathcal{D}_n\}$ with densities (with respect to ν) $f_D(\theta) \propto \exp\left(-\frac{\epsilon}{2\Delta}\,\ell_n(\theta; D)\right)$ satisfies ϵ-DP.

Intuitively, $\ell_n(\theta; D)$ provides a score quantifying the utility of an output $\theta$ for the database $D$. We use the convention that smaller values of $\ell_n(\theta; D)$ provide more utility. So, the exponential mechanism places more mass near the minimizers of $\ell_n$, and less mass the higher the value of $\ell_n(\theta; D)$.

3 The K-Norm Gradient Mechanism

In Section 2 we considered an arbitrary measure space, $(\Theta, \mathcal{F}, \nu)$, when defining DP and the exponential mechanism. However, here we focus on $\mathbb{R}^d$. The KNG mechanism cannot be defined in quite the same generality as the exponential mechanism, since we require enough structure on the parameter space to define a gradient.
Most applications focus on Euclidean spaces, so this is not a major practical concern, but there could be implications for more complicated nonlinear, discrete, or infinite-dimensional settings.

Theorem 3.1 (K-Norm Gradient Mechanism (KNG)). Let $\Theta \subseteq \mathbb{R}^d$ be a convex set, $\|\cdot\|_K$ be a norm on $\mathbb{R}^d$, and $\nu$ be a σ-finite measure on $\Theta$. Let $\{\ell_n(\theta; D) : \Theta \to \mathbb{R} \mid D \in \mathcal{D}_n\}$ be a collection of measurable functions which are differentiable ν-almost everywhere. We say that this collection has sensitivity $\Delta : \Theta \to \mathbb{R}_+$ if $\|\nabla\ell_n(\theta; D) - \nabla\ell_n(\theta; D')\|_K \le \Delta(\theta) < \infty$ for all adjacent $D, D'$ and ν-almost all $\theta$. If $\int_\Theta \exp\left(-\Delta(\theta)^{-1}\|\nabla\ell_n(\theta; D)\|_K\right)d\nu(\theta) < \infty$ for all $D \in \mathcal{D}_n$, then the collection of probability measures $\{\mu_D \mid D \in \mathcal{D}_n\}$ with densities (with respect to ν) $f_D(\theta) \propto \exp\left(-\frac{\epsilon}{2\Delta(\theta)}\|\nabla\ell_n(\theta; D)\|_K\right)$ satisfies ϵ-DP.

Proof. Set $\tilde\ell_n(\theta; D) = \Delta(\theta)^{-1}\|\nabla\ell_n(\theta; D)\|_K$. Then $\tilde\ell_n$ has sensitivity 1. By Proposition 2.2, the described mechanism satisfies ϵ-DP.

One advantage of this approach over the traditional exponential mechanism is that the sensitivity calculation is often simpler (e.g. quantile regression, subsection 4.5). However, it also has the same intuition as the exponential mechanism. In particular, the optimum, $\hat\theta$, occurs when $\nabla\ell_n(\hat\theta) = 0$; thus we want to promote solutions that make the gradient close to 0, and discourage ones that make the gradient far from 0. These concepts are closely related to M-estimators, Z-estimators, and estimating equations [van der Vaart, 2000, Chapter 5]. Since KNG utilizes the gradient, it links in nicely to optimization methods such as gradient descent. However, it could also suffer from some of the same challenges as gradient descent. Namely, if the objective function has multiple local minima, then KNG will promote output near each of these points. For this reason, a great deal of care should be taken when applying KNG to non-convex objective functions, such as those arising when fitting neural networks [Gori and Tesi, 1992].
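To make Theorem 3.1 concrete, here is a minimal sketch (our own illustration; all function names are ours) of the KNG density and a plain random-walk Metropolis sampler targeting it, for a user-supplied gradient `grad_fn`, sensitivity function `sens_fn`, and norm `k_norm`:

```python
import numpy as np

def kng_log_density(theta, grad_fn, sens_fn, eps, k_norm):
    """Unnormalized log-density of KNG: -eps * ||grad l_n(theta)||_K / (2 * Delta(theta))."""
    return -eps * k_norm(grad_fn(theta)) / (2.0 * sens_fn(theta))

def kng_sample(grad_fn, sens_fn, eps, k_norm, theta0, steps=5000, prop_sd=0.1, rng=None):
    """Random-walk Metropolis targeting the KNG density (a sketch, not a tuned sampler)."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    logp = kng_log_density(theta, grad_fn, sens_fn, eps, k_norm)
    for _ in range(steps):
        prop = theta + prop_sd * rng.standard_normal(theta.shape)
        logp_prop = kng_log_density(prop, grad_fn, sens_fn, eps, k_norm)
        if np.log(rng.uniform()) < logp_prop - logp:  # Metropolis accept/reject
            theta, logp = prop, logp_prop
    return theta
```

For example, for mean estimation with $\|x_i\| \le r$, one can pass `grad_fn(theta) = -2 * sum(x_i - theta)` and the constant sensitivity $\Delta = 4r$. In practice a burned-in, tuned chain (such as the one-at-a-time MCMC used in the simulations) would be needed.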
3.1 Asymptotic Properties

While flexibility of a mechanism is an important concern, ultimately the utility of the output is of primary importance. Awan et al. [2019] showed that for a large class of objective functions, the exponential mechanism introduces noise of magnitude $O_p(n^{-1/2})$, where $n$ is the sample size. For many statistical problems the non-private error rate is also $O_p(n^{-1/2})$ [van der Vaart, 2000, Chapter 5], meaning that the exponential mechanism introduces noise that is not asymptotically negligible. Under similar assumptions, we show in Theorem 3.2 that KNG has asymptotic error $O_p(n^{-1})$, which is asymptotically negligible compared to the statistical error. In fact, Theorem 3.2 shows that the noise introduced is asymptotically from a K-norm mechanism [Hardt and Talwar, 2010, Awan and Slavković, 2018], which generalizes the Laplace mechanism.

The assumptions in Theorem 3.2 are chosen to capture a large class of common loss functions, which include many convex empirical risk functions and log-likelihood functions. Mathematically, the assumption that $\ell_n$ is twice differentiable and strongly convex allows us to use a one-term Taylor expansion of $\nabla\ell_n$ about $\hat\theta$, and guarantees that the integrating constants converge. The proof of Theorem 3.2 is found in the Supplementary Materials.

Theorem 3.2 (Utility of KNG). Let $\Theta \subseteq \mathbb{R}^d$ be a convex set, $\|\cdot\|_K$ a norm on $\mathbb{R}^d$, $\nu$ a σ-finite measure on $\Theta$, and $\ell_n(\theta) := \ell_n(\theta; D)$ be a sequence of objective functions which satisfy the assumptions of Theorem 3.1, with sensitivity $\Delta(\theta)$. We further assume that

1. $n^{-1}\ell_n(\theta)$ are twice differentiable (almost everywhere) convex functions, and there exists a finite $\alpha > 0$ such that the Hessian $n^{-1}H_n(\theta)$ has eigenvalues greater than $\alpha$ for all $n$ and $\theta \in \Theta$;
2. the minimizers satisfy $\hat\theta \to \theta^* \in \mathbb{R}^d$ and $n^{-1}H_n(\hat\theta) \to \Sigma^{-1}$, where $\Sigma$ is a $d \times d$ positive definite matrix;
3. $\Delta(\theta)$ is continuous in $\theta$ and constant in $n$, and there exists $\bar\Delta > 0$ such that $\Delta(\theta) \ge \bar\Delta$.
Assume the base measure, $\nu$, has a bounded, differentiable density $g(\theta)$ (with respect to Lebesgue measure) which is strictly positive in a neighborhood of $\theta^*$. Then the sanitized value $\tilde\theta$ drawn from the KNG mechanism with privacy parameter ϵ is asymptotically K-norm. That is, the density of $Z = n(\tilde\theta - \hat\theta)$ converges to a K-norm distribution, with density (with respect to ν) proportional to $f(z) \propto \exp\left(-\frac{\epsilon}{2\Delta(\theta^*)}\|\Sigma^{-1}z\|_K\right)$.

The proof of the CLT for the exponential mechanism in Awan et al. [2019], as well as the proof of Theorem 3.2, both rely on a Taylor expansion of the objective function. In both cases, it is assumed that the Hessian converges, when scaled by $n^{-1}$, to a positive definite matrix. However, using the original objective function requires two derivatives before the Hessian appears in the Taylor expansion, whereas the use of the gradient only requires one derivative. The consequence of this is that the traditional exponential mechanism results in a quadratic numerator inside the exponent, whereas KNG has a (normed) linear numerator. Asymptotically, this gives $O_p(n^{-1/2})$ Gaussian noise for the exponential mechanism and $O_p(n^{-1})$ K-norm noise for KNG. Geometrically, it seems that the use of an objective function which behaves linearly (in absolute value) near the optimum, rather than quadratically, results in better asymptotic utility. By using the normed gradient, we construct an objective function with this property.

The assumptions in Theorem 3.2 are very similar to the assumptions for the CLT in Awan et al. [2019]. So, whenever these properties hold, we know that KNG results in $O_p(n^{-1})$ privacy noise whereas the exponential mechanism's is $O_p(n^{-1/2})$. To further emphasize the importance of this result, we note that the magnitude of the noise introduced for privacy can have a substantial impact on the sample complexity.
Asymptotically, KNG requires exactly the same sample size as the non-private estimator, whereas the exponential mechanism requires a constant (> 1) multiple of the non-private sample size to achieve the same accuracy. As we see in Section 4, in the problem of quantile regression the assumptions of Theorem 3.2 do not hold, meaning that while we guarantee privacy in that setting, we cannot guarantee the utility of the estimator. However, we see in Figure 2 that KNG still introduces asymptotically negligible noise, suggesting that the assumptions of Theorem 3.2 can likely be weakened to accommodate a larger class of objective functions.

Remark 3.3. Based on the discussion in Section 1, a result similar to Theorem 3.2 may hold for objective perturbation as well. The main issue is dealing with the change of variables factor $|\det H_n(\theta)|$, which may or may not contribute to the asymptotic form. We suspect that when both KNG and objective perturbation are applicable (e.g. linear regression, see subsection 4.3), they will have similar performance. However, as KNG does not require a second derivative (or convexity), it is applicable in more settings than objective perturbation (e.g. quantile regression, see subsection 4.5).

4.1 Mean Estimation

Mean estimation is one of the simplest statistical tasks, and one of the first to be solved in DP. Assuming bounds on the data, the mean can be estimated by adding Laplace noise [Dwork et al., 2006]. Recently there has been some work developing statistical tools for the mean under differential privacy, such as confidence intervals in the normal model [Karwa and Vadhan, 2017] and hypothesis tests for Bernoulli data [Awan and Slavković, 2018]. We show that KNG recovers the K-norm mechanism, a generalization of the Laplace mechanism, when estimating the mean. Let $x_1, \ldots, x_n \in \mathbb{R}^d$, which we assume are drawn from some population with mean $\theta^*$.
To estimate $\theta^*$, we use the sum of squares as our objective function: $\ell_n(\theta; D) = \sum_{i=1}^n \|x_i - \theta\|_2^2$, with gradient $\nabla\ell_n(\theta; D) = -2\sum_{i=1}^n (x_i - \theta) = -2n(\bar{x} - \theta)$. Turning to the sensitivity, if we assume that there exists a constant $r$ such that $\|x_i\|_K \le r < \infty$ for some norm $\|\cdot\|_K$, then the sensitivity of the gradient is $\|\nabla\ell_n(\theta; D) - \nabla\ell_n(\theta; D')\|_K = 2\|x_1 - x'_1\|_K \le 4r$. Thus the mechanism becomes $f_n(\theta) \propto \exp\{-(n\epsilon/(4r))\|\bar{x} - \theta\|_K\}$, which is exactly a K-norm mechanism [Hardt and Talwar, 2010]. So $\tilde\theta - \bar{x}$ has mean 0 and standard deviation $O_p(n^{-1})$. Thus, the noise added for privacy is asymptotically negligible compared to the statistical error, which is $O_p(n^{-1/2})$.

Remark 4.1. Because KNG results in a location family in this case, the integrating constant does not depend on the data. So, we do not need to divide ϵ by 2 in the density, and may instead draw from $f_n(\theta) \propto \exp\{-(n\epsilon/(2r))\|\bar{x} - \theta\|_K\}$, which is how the K-norm mechanism is normally stated.

4.2 Linear Regression

There has been a great deal of work developing DP methods for linear regression [Zhang et al., 2012, Song et al., 2013, Dwork and Lei, 2009, Chaudhuri et al., 2011, Kifer et al., 2012, Sheffet, 2017]. In this section, we detail how KNG can be used to estimate the coefficients in a linear regression model. We observe pairs of data $(x_i, y_i)$, where $y_i \in \mathbb{R}$ and $x_i \in \mathbb{R}^d$, which we assume are modeled as $y_i = x_i^\top\theta^* + e_i$, where the errors are iid with mean zero and are uncorrelated with $x$. Our goal is to estimate $\theta^*$. To implement KNG, we assume that the data has been pre-processed such that $-1 \le x_i \le 1$ (coordinatewise) and $-1 \le y_i \le 1$ for all $i = 1, \ldots, n$. We also assume that $\|\theta^*\|_1 \le B$. The usual non-private estimator for $\theta^*$ is least squares, which minimizes the objective function $\ell_n(\theta; D) = \sum_{i=1}^n (y_i - x_i^\top\theta)^2$. KNG requires a bound on the sensitivity of $\nabla\ell_n$:

$\|\nabla\ell_n(\theta; D) - \nabla\ell_n(\theta; D')\| \le \sup_{y_1, x_1, \theta} \|4(y_1 - x_1^\top\theta)x_1\| \le \sup_{x_1} 4(1 + B)\|x_1\|.$

By using the $\ell_\infty$ norm, we get the tightest bound, since $\|x_1\|_\infty \le 1$. KNG samples from the density

$f_n(\theta) \propto \exp\left(-\frac{\epsilon}{4(1+B)}\left\|\sum_{i=1}^n (y_i - x_i^\top\theta)\,x_i\right\|_\infty\right), \quad (1)$

with respect to the uniform measure on $\Theta = \{\theta \mid \|\theta\|_1 \le B\}$.

Remark 4.2.
Alternative sensitivity bounds can be obtained by choosing other bounds on $x$ and $y$. The bound on $\theta$ can be removed entirely, allowing $\Delta$ to depend on $\theta$. In that case, a nontrivial base measure will be required, as the resulting density is not integrable with respect to Lebesgue measure. We prefer to use the given sensitivity bound as it allows a fairer comparison against the exponential mechanism and objective perturbation in subsection 4.3.

Figure 1: Simulation comparing the non-private MLE, exponential mechanism, objective perturbation, and KNG for linear regression ($\log_{10}$ of the average $L_2$ distance to the truth versus $n = 10^2, \ldots, 10^7$).

Figure 2: Simulation comparing the non-private estimator, exponential mechanism, and KNG for quantile regression ($\log_{10}$ of the average $L_2$ distance to the truth versus $n = 10, \ldots, 10^5$).

4.3 Linear Regression Simulation

In this section, we examine the finite-sample performance of the KNG mechanism on linear regression compared to the exponential mechanism and the objective perturbation mechanism. KNG samples from the density (1), the exponential mechanism samples from $f_n(\theta) \propto \exp\left(-\frac{\epsilon}{2(1+B)^2}\sum_{i=1}^n (y_i - x_i^\top\theta)^2\right)$, and objective perturbation draws a random vector $b$ from the density $f(b) \propto \exp\left(-\frac{\epsilon}{8(1+B)}\|b\|\right)$ and then finds the optimum of the modified objective: $\arg\min_{\|\theta\|_1 \le 1} \ell_n(\theta; D) + \frac{\gamma}{2}\theta^\top\theta + \theta^\top b$, where $\gamma = (\exp(\epsilon/2) - 1)^{-1}(2d)$ and $d$ is the dimension of the $x_i$'s. For all three mechanisms we assume the bound on $\|\theta^*\|_1$ is $B = 1$. Details on these mechanisms for linear regression can be found in the Supplementary Materials.

For the simulations, the true regression vector $\theta^* \in \mathbb{R}^{12}$ is $\theta^* = (0, -1, -1 + 2/11, -1 + 4/11, \ldots, 1 - 2/11)$, and so $d = 12$. For each $n$ in $10^2, 10^3, 10^4, \ldots, 10^7$ we run 100 replicates of Algorithm 1 at $\epsilon = 1$. For KNG and the exponential mechanism, we draw samples using a one-at-a-time MCMC procedure with 10000 steps.
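The one-at-a-time MCMC procedure can be sketched as follows (our own illustration; `kng_linreg_log_density` follows the density labeled (1) as we read it, restricted to the $\ell_1$ ball, and all names are ours):

```python
import numpy as np

def kng_linreg_log_density(theta, X, y, eps, B=1.0):
    """Unnormalized log of density (1): -(eps/(4(1+B))) * ||sum_i (y_i - x_i'theta) x_i||_inf,
    with respect to the uniform measure on the l1 ball ||theta||_1 <= B (our reading)."""
    if np.abs(theta).sum() > B:
        return -np.inf                          # outside the support
    g = X.T @ (y - X @ theta)                   # sum_i (y_i - x_i'theta) x_i
    return -eps / (4.0 * (1.0 + B)) * np.max(np.abs(g))

def one_at_a_time_mcmc(log_density, theta0, steps=10000, prop_sd=0.05, rng=None):
    """One-at-a-time (coordinate-wise) random-walk Metropolis: each step perturbs
    a single coordinate and accepts or rejects the move."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.array(theta0, dtype=float)
    logp = log_density(theta)
    d = theta.size
    for t in range(steps):
        j = t % d                               # cycle through coordinates
        prop = theta.copy()
        prop[j] += prop_sd * rng.standard_normal()
        logp_prop = log_density(prop)
        if np.log(rng.uniform()) < logp_prop - logp:
            theta, logp = prop, logp_prop
    return theta
```

This is a sketch of the sampling strategy, not a tuned implementation; step sizes and chain length would need adjustment per problem.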
At the end, we compute the average distance over the 100 replicates for each mechanism and for each sample size $n$. The results are plotted in Figure 1, taking the base-10 log of both axes. At each $n$ value and for each mechanism, the Monte Carlo standard errors are between 0.01380 and 0.02729, in terms of the log scale used in the plot. The benefit of plotting in this fashion is that it makes it easier to understand the asymptotic behavior of each estimator. Since we know that the estimation error of the non-private MLE is $\mathrm{error} = Cn^{-1/2}$, taking the log of both sides shows that the convergence should appear as a straight line with slope $-1/2$: $\log(\mathrm{error}) = -\frac{1}{2}\log(n) + \log(C)$, which is the black line in Figure 1. As Awan et al. [2019] showed, the asymptotic estimation error of the exponential mechanism is $\mathrm{error} = Kn^{-1/2}$, where $K$ is a constant greater than $C$. Taking the log of both sides gives another line with slope $-1/2$, but with a higher intercept: $\log(\mathrm{error}) = -\frac{1}{2}\log(n) + \log(K)$, which we see in red in Figure 1. On the other hand, for KNG and objective perturbation (based on Remark 3.3), the asymptotic estimation error is $\mathrm{error} = Cn^{-1/2} + Kn^{-1}$, which when logged shows that for larger $n$ the curve approaches the line of the non-private estimation error from above: $\log(\mathrm{error}) = -\frac{1}{2}\log(n) + \log(C + Kn^{-1/2})$, which is also confirmed in Figure 1.

Algorithm 1 Regression Simulation
INPUT: $n$, $\epsilon$, $d$, $\theta^*$.
1: Generate $X \in \mathbb{R}^{n \times d}$ such that $X_{i,1} = 1$ and $X_{ij} \overset{iid}{\sim} U(-1, 1)$ for $i = 1, \ldots, n$ and $j = 2, \ldots, d$.
2: Generate independent errors $e_i \sim N(0, 1)$ for $i = 1, \ldots, n$.
3: Compute the responses $Y_i = X_i\theta^* + e_i$.
4: Set $R = \max_i |Y_i|$.
5: Set $Y'_i = Y_i/R$.
6: Use $X$ and $Y'$ to estimate the regression coefficient via the non-private estimator and each DP mechanism.
7: Multiply the estimates by $R$ to estimate $\theta^*$.
8: Compute the Euclidean distance between the estimate and the true $\theta^*$ for each estimator.
OUTPUT: Average distances of the estimates to the true $\theta^*$.
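The data-generation and scoring steps of Algorithm 1 can be sketched as follows (our own rendering; only the non-private least-squares estimator is shown, with the DP mechanisms left as drop-in replacements):

```python
import numpy as np

def make_regression_data(n, d, theta_star, rng=None):
    """Steps 1-5 of Algorithm 1: design with an intercept column, U(-1,1)
    covariates, N(0,1) errors, and responses rescaled into [-1, 1]."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.hstack([np.ones((n, 1)), rng.uniform(-1, 1, size=(n, d - 1))])
    Y = X @ theta_star + rng.standard_normal(n)
    R = np.max(np.abs(Y))
    return X, Y / R, R        # estimates on (X, Y/R) are multiplied back by R (step 7)

def estimation_error(X, Ys, R, theta_star):
    """Steps 6-8 with the non-private least-squares estimator as the example."""
    theta_hat, *_ = np.linalg.lstsq(X, Ys, rcond=None)
    return np.linalg.norm(R * theta_hat - theta_star)
```

Note that rescaling by $R$ enforces the bound $|y_i| \le 1$ assumed by the mechanisms, and undoing the rescaling afterward recovers an estimate on the original scale.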
4.4 Median Estimation

Just as in the mean estimation problem, we observe $D = (x_1, \ldots, x_n)$, where $x_i \in \mathbb{R}^d$, and our goal is to estimate the population median. In the case when $d = 1$, the median can be estimated using the empirical risk function $\ell_n(\theta; D) = \sum_{i=1}^n |x_i - \theta|$. In general, for $d \ge 1$, we are estimating the geometric median [Minsker et al., 2015], which can be expressed as $\arg\min_m E\|X - m\|$, where typically the Euclidean norm is used. Now our objective becomes $\ell_n(\theta; D) = \sum_{i=1}^n \|x_i - \theta\|$. It may be concerning that this objective is not differentiable everywhere; however, KNG only requires that the gradient exist ν-almost everywhere. The gradient of $\|x_i - \theta\|$ in our norm's topology is given by $d(\theta, x_i) := -\|x_i - \theta\|^{-1}(x_i - \theta)$, provided that $\theta \ne x_i$. Notice that this gives a direction in $\mathbb{R}^d$, since $\|d(\theta, x_i)\| = 1$. Using the triangle inequality, we see that the sensitivity of the gradient is bounded by 2. So the KNG mechanism for the median can be expressed as $f_n(\theta) \propto \exp\left(-\frac{\epsilon}{4}\left\|\sum_{i=1}^n d(\theta, x_i)\right\|\right)$. Again, the error introduced is $O_p(n^{-1})$, which is negligible compared to the statistical error.

4.5 Quantile Regression

For quantile regression, as for linear regression, we observe pairs of data $(x_i, y_i)$, where $y_i \in \mathbb{R}$ and $x_i \in \mathbb{R}^d$. We assume that $Q_{Y_i|X_i}(\tau) = X_i^\top\theta^*_\tau$ for all $i = 1, \ldots, n$, where $Q_{Y|X}(\tau)$ is the conditional quantile function of $Y$ given $X$ for $0 < \tau < 1$, and $\theta^*_\tau \in \mathbb{R}^d$ [Hao et al., 2007]. For a given $\tau$, $\theta^*_\tau$ can be estimated as $\hat\theta_\tau = \arg\min_\theta \sum_{i=1}^n \rho_\tau(y_i - x_i^\top\theta)$, where $\rho_\tau(z) = (\tau - 1)z\,I(z \le 0) + \tau z\,I(z > 0)$ is called the tilted absolute value function [Koenker and Hallock, 2001]. So our objective function is

$\ell_n(\theta; D) = (\tau - 1)\sum_{y_i \le x_i^\top\theta}(y_i - x_i^\top\theta) + \tau\sum_{y_i > x_i^\top\theta}(y_i - x_i^\top\theta),$

with gradient (almost everywhere)

$\nabla\ell_n(\theta; D) = (\tau - 1)\sum_{y_i \le x_i^\top\theta}(-x_i) + \tau\sum_{y_i > x_i^\top\theta}(-x_i) = -\sum_i \left(\tau - I(y_i \le x_i^\top\theta)\right)x_i.$

We bound the sensitivity as $\Delta = 2(1 - \tau)C_X$, where $\sup_{x_1}\|x_1\| \le C_X$. Then KNG samples from

$f_n(\theta) \propto \exp\left(-\frac{\epsilon}{4(1-\tau)C_X}\left\|\sum_i \left(\tau - I(y_i \le x_i^\top\theta)\right)x_i\right\|\right). \quad (2)$

We see a few nice benefits of the KNG method in this example.
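The check function, the almost-everywhere gradient, and the resulting KNG log-density for quantile regression can be sketched as follows (our own rendering of the density labeled (2), using the $\ell_\infty$ norm and the sensitivity bound $\Delta = 2(1-\tau)C_X$; names are ours):

```python
import numpy as np

def rho(z, tau):
    """Tilted absolute value: (tau - 1) * z for z <= 0, tau * z for z > 0."""
    return np.where(z > 0, tau * z, (tau - 1.0) * z)

def qr_gradient(theta, X, y, tau):
    """Almost-everywhere gradient of the quantile-regression objective:
    -sum_i (tau - 1[y_i <= x_i' theta]) x_i."""
    ind = (y <= X @ theta).astype(float)
    return -X.T @ (tau - ind)

def kng_qr_log_density(theta, X, y, tau, eps, C_X=1.0):
    """Unnormalized KNG log-density for quantile regression with the l_inf norm
    and Delta = 2(1 - tau) * C_X; assumes ||x_i||_inf <= C_X."""
    g = qr_gradient(theta, X, y, tau)
    return -eps * np.max(np.abs(g)) / (4.0 * (1.0 - tau) * C_X)
```

Note that the log-density is maximized wherever the (sub)gradient balances, e.g. at a sample median when $\tau = 1/2$ and $X_i \equiv 1$.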
If we were to use $\ell_n$ directly in the exponential mechanism, then not only would we expect worse asymptotic performance (as demonstrated in subsection 4.5.1), but we also see that the sensitivity calculation for the gradient only requires a bound on $X$, whereas the sensitivity of $\ell_n$ requires bounds on $Y$, $X$, and $\theta^*$. Furthermore, the objective perturbation mechanism cannot be used in this setting, because $\ell_n$ is not strongly convex, whereas the proofs for objective perturbation [Chaudhuri and Monteleoni, 2009, Chaudhuri et al., 2011, Kifer et al., 2012, Awan and Slavković, 2018] all require strong convexity. In fact, the Hessian of $\ell_n$ is zero almost everywhere, making objective perturbation inapplicable.

Finally, note that if we are only interested in estimating the $\tau$th quantile of a set of real numbers $Y_1, \ldots, Y_n$, we could set $X_i = 1$ for all $i = 1, \ldots, n$, in which case KNG samples from

$f_n(\theta) \propto \exp\left(-\frac{\epsilon n}{4(1-\tau)}\left|\tau - \hat{F}(\theta; Y)\right|\right), \quad (3)$

where $\hat{F}(\cdot\,; Y)$ is the empirical distribution function. In fact, this is the Private Quantile algorithm proposed by Smith [2011], who also establishes strong utility guarantees for the algorithm; this exercise demonstrates that KNG could provide, or at least contribute to, a more unified framework for developing efficient privacy mechanisms.

4.5.1 Quantile Regression Simulation

In this section, we examine the empirical performance of the KNG mechanism on quantile regression compared to the exponential mechanism. KNG samples from the density (2) using the $\ell_\infty$ norm and setting $C_X = 1$, and the exponential mechanism samples from $f_n(\theta) \propto \exp\left(-\frac{\epsilon}{4\max\{\tau, 1-\tau\}(1+B)}\,\ell_n(\theta; D)\right)$. We assume, as in subsection 4.3, that $B = 1$. Details on the exponential mechanism can be found in the Supplementary Materials. Note that objective perturbation cannot be used in this setting, as discussed in subsection 4.5. For the simulations, we use $\tau = 1/2$, and the true regression vector $\theta^*_{1/2} \in \mathbb{R}^2$ is $\theta^*_{1/2} = (0, 1)$. For each $n$ in $10^1, 10^2, \ldots, 10^5$ we run 100 replicates of Algorithm 1 at $\epsilon = 1$.
Samples from KNG and the exponential mechanism are obtained using 1000 steps of a one-at-a-time MCMC algorithm. At the end, we compute the average distance over the 100 replicates for each estimator and for each sample size $n$. The results are plotted in Figure 2, taking the base-10 log of both axes. At each $n$ value and for each mechanism, the Monte Carlo standard errors are between 0.04403 and 0.06028, in terms of the log scale. We see in Figure 2 that the non-private estimate appears as a straight line with slope $-1/2$, reflecting the fact that its estimation error is $O_p(n^{-1/2})$. We also see that the exponential mechanism approaches a line with slope $-1/2$, but with a higher intercept, reflecting that it has increased asymptotic variance. Last, we see that the error of KNG approaches the error line of the non-private estimator, suggesting that KNG has the same asymptotic rate as the non-private estimator. While the utility guarantees of Theorem 3.2 do not apply in this setting, as the objective function is not strongly convex, the sanitized estimates still achieve ϵ-DP, and we see from Figure 2 that, empirically, KNG introduces $o_p(n^{-1/2})$ error in this setting as well. This suggests that the assumptions in Theorem 3.2 can likely be weakened, and that KNG in fact produces efficient mechanisms for an even broader set of problems than Theorem 3.2 prescribes.

5 Conclusions

In this paper we presented a new privacy mechanism, KNG, that maintains much of the flexibility of the exponential mechanism while having substantially better utility guarantees. These guarantees are similar to those provided by objective perturbation, but privacy can be achieved with far fewer structural assumptions. A major drawback of the mechanism is the same as for gradient descent, which can have trouble with local minima or saddle points.
Two interesting open questions concern the finite-sample efficiency of KNG versus objective perturbation, and whether KNG can be adapted or combined with other methods to better handle multiple minima. We also believe that KNG has a great deal of potential for handling infinite-dimensional and nonlinear problems. For example, parameter spaces consisting of Hilbert spaces or Riemannian manifolds have structures that allow for the computation of gradients, and might be amenable to KNG. With Riemannian manifolds, the gradient is often viewed as a linear mapping over tangent spaces, while in Hilbert spaces, the gradient is often treated as a linear functional. A major advantage of KNG over other mechanisms is the direct incorporation of a general K-norm. Awan et al. [2019] showed that the exponential mechanism has major problems over function spaces, which are of interest in nonparametric statistics. These issues could potentially be alleviated by KNG with a careful choice of norm. Many interesting challenges remain in data privacy, especially when there is additional complicated structure in the parameters or data.

KNG has strong connections with prior DP mechanisms, especially the exponential mechanism and objective perturbation. Indeed, like nearly every privacy mechanism, KNG can be phrased as a very particular type of exponential mechanism; however, this doesn't provide insight into why KNG achieves better statistical properties. In particular, a key point is to consider the objective function that motivated the original statistical summary, which, when used with KNG, produces sanitized estimators with better statistical performance than the classic implementation of the exponential mechanism. One downside of KNG is the issue of sampling, which is similar to the exponential mechanism in that sampling from these distributions is, in general, non-trivial. We show that for mean and quantile estimation, KNG results in distributions that are efficiently sampled.
However, for linear and quantile regression, we used a one-at-a-time MCMC procedure (also used for the exponential mechanism). Just as with sampling from a posterior distribution, developing a convenient sampling scheme is case-by-case, but often a simple MCMC procedure works well in practice.

Acknowledgements

This research was supported in part by NSF DMS 1712826, NSF SES 1853209, and NSF SES-153443 to The Pennsylvania State University. The first author is also grateful for the hospitality of the Simons Institute for the Theory of Computing at UC Berkeley.

References

Charu C Aggarwal and S Yu Philip. A general survey of privacy-preserving data mining models and algorithms. In Privacy-Preserving Data Mining, pages 11–52. Springer, 2008.

Jordan Awan and Aleksandra Slavković. Differentially private uniformly most powerful tests for binomial data. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems 31, pages 4208–4218. Curran Associates, Inc., 2018.

Jordan Awan and Aleksandra Slavković. Structure and sensitivity in differential privacy: Comparing k-norm mechanisms. arXiv e-prints, January 2018. Under review.

Jordan Awan, Ana Kenney, Matthew Reimherr, and Aleksandra Slavković. Benefits and pitfalls of the exponential mechanism with applications to Hilbert spaces and functional PCA. In Proceedings of the 36th International Conference on Machine Learning, ICML '19, pages 374–384. JMLR.org, 2019.

Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to noninteractive database privacy. Journal of the ACM (JACM), 60(2):12, 2013.

Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In TCC, 2016.

Clément L Canonne, Gautam Kamath, Audra McMillan, Adam Smith, and Jonathan Ullman. The structure of optimal private tests for simple hypotheses.
In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 310–321. ACM, 2019.

Kamalika Chaudhuri and Claire Monteleoni. Privacy-preserving logistic regression. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 289–296. Curran Associates, Inc., 2009.

Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12:1069–1109, 2011.

Kamalika Chaudhuri, Anand D. Sarwate, and Kaushik Sinha. A near-optimal algorithm for differentially-private principal components. Journal of Machine Learning Research, 14(1):2905–2943, January 2013. ISSN 1532-4435.

John C Duchi, Michael I Jordan, and Martin J Wainwright. Local privacy and statistical minimax rates. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 429–438. IEEE, 2013.

Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. In Proceedings of the Forty-first Annual ACM Symposium on Theory of Computing, STOC '09, pages 371–380, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-506-2. doi: 10.1145/1536414.1536466.

Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci., 9(3–4):211–407, August 2014. ISSN 1551-305X. doi: 10.1561/0400000042.

Cynthia Dwork and Guy N. Rothblum. Concentrated differential privacy. CoRR, abs/1603.01887, 2016.

Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating Noise to Sensitivity in Private Data Analysis, pages 265–284. Springer Berlin Heidelberg, Berlin, Heidelberg, 2006. ISBN 978-3-540-32732-5. doi: 10.1007/11681878_14.

Cynthia Dwork, Adam Smith, Thomas Steinke, and Jonathan Ullman. Exposed! A survey of attacks on private data. Annual Review of Statistics and Its Application, 4(1):61–84, 2017. doi: 10.1146/annurev-statistics-060116-054123.

Arik Friedman and Assaf Schuster.
Data mining with differential privacy. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 493–502. ACM, 2010.

Marco Gori and Alberto Tesi. On the problem of local minima in backpropagation. IEEE Transactions on Pattern Analysis & Machine Intelligence, (1):76–86, 1992.

Lingxin Hao and Daniel Q Naiman. Quantile regression. Number 149. Sage, 2007.

Moritz Hardt and Kunal Talwar. On the geometry of differential privacy. In Proceedings of the Forty-second ACM Symposium on Theory of Computing, STOC '10, pages 705–714, New York, NY, USA, 2010. ACM. ISBN 978-1-4503-0050-6. doi: 10.1145/1806689.1806786.

Vishesh Karwa and Salil P. Vadhan. Finite sample differentially private confidence intervals. CoRR, abs/1711.03908, 2017.

Assimakis Kattis and Aleksandar Nikolov. Lower bounds for differential privacy from Gaussian width. In Boris Aronov and Matthew J. Katz, editors, 33rd International Symposium on Computational Geometry (SoCG 2017), volume 77 of Leibniz International Proceedings in Informatics (LIPIcs), pages 45:1–45:16, Dagstuhl, Germany, 2017. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik. ISBN 978-3-95977-038-5. doi: 10.4230/LIPIcs.SoCG.2017.45.

D Kifer, A Smith, and A Thakurta. Private convex empirical risk minimization and high-dimensional regression. Journal of Machine Learning Research, 1:1–41, January 2012.

Daniel Kifer and Bing-Rong Lin. An axiomatic view of statistical privacy and utility. Journal of Privacy and Confidentiality, 4(1):5–49, 2012.

Roger Koenker and Kevin F Hallock. Quantile regression. Journal of Economic Perspectives, 15(4):143–156, 2001.

Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum. Privacy, Big Data, and the Public Good: Frameworks for Engagement. Cambridge University Press, 2014.

Ashwin Machanavajjhala and Daniel Kifer. Designing statistical privacy for your data. Commun. ACM, 58:58–67, 2015.

Frank McSherry and Kunal Talwar.
Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science, FOCS '07, pages 94–103, Washington, DC, USA, 2007. IEEE Computer Society. ISBN 0-7695-3010-9. doi: 10.1109/FOCS.2007.41.

Kentaro Minami, Hiromi Arai, Issei Sato, and Hiroshi Nakagawa. Differential privacy without sensitivity. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors, Advances in Neural Information Processing Systems 29, pages 956–964. Curran Associates, Inc., 2016.

Stanislav Minsker et al. Geometric median and robust estimation in Banach spaces. Bernoulli, 21(4):2308–2335, 2015.

Or Sheffet. Differentially private ordinary least squares. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 3105–3114, International Convention Centre, Sydney, Australia, 06–11 Aug 2017. PMLR.

Adam Smith. Privacy-preserving statistical estimation with optimal convergence rates. In Proceedings of the Forty-third Annual ACM Symposium on Theory of Computing, STOC '11, pages 813–822, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0691-1. doi: 10.1145/1993636.1993743.

Shuang Song, Kamalika Chaudhuri, and Anand D. Sarwate. Stochastic gradient descent with differentially private updates. In Proceedings of the Global Conference on Signal and Information Processing, pages 245–248. IEEE, 2013.

A.W. van der Vaart. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2000. ISBN 9781107268449.

Yu-Xiang Wang, Stephen E. Fienberg, and Alexander J. Smola. Privacy for free: Posterior sampling and stochastic gradient Monte Carlo. In Proceedings of the 32nd International Conference on Machine Learning, ICML '15, pages 2493–2502. JMLR.org, 2015.

Larry Wasserman and Shuheng Zhou. A statistical framework for differential privacy.
Journal of the American Statistical Association, 105(489):375–389, 2010.

Jun Zhang, Zhenjie Zhang, Xiaokui Xiao, Yin Yang, and Marianne Winslett. Functional mechanism: Regression analysis under differential privacy. Proc. VLDB Endow., 5(11):1364–1375, July 2012. ISSN 2150-8097. doi: 10.14778/2350229.2350253.