# from_robustness_to_privacy_and_back__e22e5d0d.pdf From Robustness to Privacy and Back Hilal Asi 1 Jonathan Ullman 2 Lydia Zakynthinou 2 We study the relationship between two desiderata of algorithms in statistical inference and machine learning differential privacy and robustness to adversarial data corruptions. Their conceptual similarity was first observed by Dwork and Lei (STOC 2009), who observed that private algorithms satisfy robustness, and gave a general method for converting robust algorithms to private ones. However, all general methods for transforming robust algorithms into private ones lead to suboptimal error rates. Our work gives the first black-box transformation that converts any adversarially robust algorithm into one that satisfies pure differential privacy. Moreover, we show that for any low-dimensional estimation task, applying our transformation to an optimal robust estimator results in an optimal private estimator. Thus, we conclude that for any low-dimensional task, the optimal error rate for ε-differentially private estimators is essentially the same as the optimal error rate for estimators that are robust to adversarially corrupting 1/ε training samples. We apply our transformation to obtain new optimal private estimators for several high-dimensional tasks, including Gaussian (sparse) linear regression and PCA. Finally, we present an extension of our transformation that leads to approximate differentially private algorithms whose error does not depend on the range of the output space, which is impossible under pure differential privacy. Authors ordered alphabetically. 1Apple 2Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts, USA. Correspondence to: Hilal Asi , Lydia Zakynthinou . Proceedings of the 40 th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s). 1. Introduction Both differential privacy and robustness are desirable properties for machine learning or statistical algorithms, and there are extensive, mostly separate bodies of research on each of these properties. Differential privacy (DP) was proposed by Dwork, Mc Sherry, Nissim, and Smith (Dwork et al., 2006) as a rigorous formalization of what it means for an algorithm to guarantee individual privacy, and has been widely deployed in both industry (Erlingsson et al., 2014; Bittau et al., 2017; Apple Differential Privacy Team, 2017; Tastuggine and Mironov, 2020; Wilson et al., 2020; Rogers et al., 2020) and government (Haney et al., 2017; Abowd, 2018; Abowd et al., 2022) applications. Informally, a DP algorithm ensures that no adversary who observes the algorithm s output can learn much more about an individual in the dataset than they could if that individual s data had been excluded. Formally, a randomized algorithm A satisfies ε-DP if for every dataset S, and every dataset S that differs on one, or a small number of entries, the distributions A(S) and A(S ) are ε-close in a precise sense (see Definition 2.1), where the privacy guarantee becomes stronger as ε becomes smaller. Robustness, which was first systematically studied by Tukey and Huber in the 1960s (Tukey, 1960; Huber, 1965), formalizes algorithms that perform well under data corruptions or model misspecifications. Specifically, we consider a dataset S that is drawn iid from some well behaved distribution P, and allow an adversary to produce a dataset S that differs from S in a τ fraction of its entries. An algorithm A is τrobust if the distance A(S) A(S ) is typically small in some particular error norm, where the robustness guarantee becomes stronger as τ becomes larger. Although these two conditions have entirely different motivations, they are both notions of what it means for an algorithm to be insensitive to small changes in its input, which was first observed in the influential work of Dwork and Lei (2009). But even once we recognize their similarity, there are substantial technical differences. While differentially private algorithms are robust, DP is a more stringent requirement in a few ways: First, DP is worst case, meaning the algorithm A must be insensitive in a neighborhood around every dataset S, whereas a robust algorithm only needs to be insensitive in the average case around datasets From Robustness to Privacy and Back S drawn from well behaved distributions P. Second, DP requires that A(S) an A(S ) be close as probability distributions in a strong sense, whereas robustness only requires the distance between outputs A(S) and A(S ) to be small. On the other hand, since DP is harder to satisfy, DP algorithms are typically insensitive to changes in a small number of inputs, whereas robust algorithms can often be insensitive to changes in a small constant fraction of inputs. Although, differential privacy is stronger than robustness, Dwork and Lei (Dwork and Lei, 2009) designed a method, called propose-test-release (PTR), which can be used to turn any accurate robust algorithm for a statistical estimation task into an accurate differentially private algorithm for the same task. However, this generic transformation typically does not lead to optimal algorithms for most specific tasks of interest. Nonetheless, there has been a recent flurry of works in differentially private statistics that use robust estimators as inspiration for differentially private estimators (see Related Work for a summary of this line of work). These works use a variety of methods for upgrading robust estimators to differentially private ones, and each of these methods is tailored to a specific task or set of tasks. In this paper we demonstrate that there is, in fact, a blackbox way to transform robust estimators into private estimators that provably gives optimal error rates for lowdimensional tasks, and often leads to optimal error rates for many high-dimensional tasks. 1.1. Our Contributions In this section we give a high-level overview of our main contributions black-box transformation from robust to private estimators, optimality results for our transformations for low-dimensional estimation tasks, and applications of our transformations to high-dimensional estimation tasks. From robustness to privacy via inverse-sensitivity. Our first main contribution is a black-box transformation that takes a robust algorithm for any statistical task and converts it into an ε-differentially private algorithm for the same task with comparable accuracy. Intuitively, since robust estimators are insensitive to changing nτ inputs on a dataset of size n, and private estimators are insensitive to changing a 1/ε total inputs, the accuracy of the private estimator will be related to the accuracy of the robust estimator when a τ 1/nε fraction of inputs can be corrupted. Our transformation applies in the standard setting of statistical estimation: assume that there exists a distribution P over domain Z Rd and S = (S1, . . . , Sn) is a dataset consisting of n examples drawn independently from P, that is, i [n], Si iid P. We wish to estimate a parameter µ(P) (e.g. the mean µ = Es P [s]) of distribution P. The error of an algorithm A for this task is measured with respect to a norm , i.e., A(S) µ . We use the following short-hand to denote the accuracy guarantees of τ-robust and ε-private estimators of a statistic µ for a distribution P. Definition 1.1 ((τ, β, α)-robust estimator). Let A be a (possibly randomized) algorithm for the estimation of statistic µ. We say that A is a (τ, β, α)-robust estimator for distribution P, if with probability 1 β over dataset S iid P n (and possibly the randomness of the algorithm), for all S that differ in at most nτ points from S, we have that A(S ) µ(P) α. Definition 1.2 ((ε, β, α)-private estimator). Let A be a (possibly randomized) algorithm for the estimation of statistic µ. We say that A is an (ε, β, α)-private estimator for distribution P, if A is ε-DP (Definition 2.1) and with probability 1 β over S iid P n (and possibly the randomness of the algorithm), we have that A(S) µ(P) α. We may refer to such algorithms as (τ, β, α)-robust and (ε, β, α)-private for brevity. Our main theorem shows that any robust algorithm can be transformed into an ε-DP algorithm with the same accuracy guarantees up to constants, as long as the fraction of corruptions τ d log(R) nε , where R is the diameter of the range of the robust algorithm. Theorem 1.3 (Informal, Theorem 3.1). Let n 1, ε (0, 1). Let P be a distribution over Z Rd. Let Arob : Zn {t Rd : t R} be any (τ, β, α)-robust algorithm for the statistic µ(P). Let α0 α. If τ d log(R/α0) + log(1/β) then there exists an (ε, O(β), O(α))-private algorithm Apriv for µ(P). The notation above hides constants. The main idea behind our transformation is to use the inverse-sensitivity mechanism (Asi and Duchi, 2020a). At a high level, for a deterministic algorithm Arob, the inversesensitivity mechanism outputs t with probability (or density) proportional to Pr[MInv(S; Arob) = t] e len Arob(S;t) ε/2 where len Arob(S; t) is the minimum number of entries of S that would have to be corrupted to obtain a dataset S with Arob(S ) = t. This mechanism is an instance of the exponential mechanism (Mc Sherry and Talwar, 2007), and to the best of our knowledge the idea to use len as a score function first appeared in (Johnson and Shmatikov, 2013) for From Robustness to Privacy and Back applications in genomic data, and its general accuracy guarantees were first studied systematically in (Asi and Duchi, 2020a;b). A standard analysis shows that this estimator will output t for which len Arob(S; t) is small, and we can relate the accuracy of such a t to the robustness of the estimator Arob on corruptions of the sample S. We note that our transformation only preserves the accuracy of the robust mechanism, but not its computational efficiency, and an interesting open question is whether one can get a fully black-box, efficiency-preserving transformation from robustness to privacy (see the concurrent and independent work of (Hopkins et al., 2022b) for some progress on this question). We also define and analyze an extension of this transformation, which is based on the restricted exponential mechanism of Brown et al. (Brown et al., 2021), that avoids the dependence on R that appears in Theorem 1.3 above, by relaxing the privacy definition to approximate DP. An equivalence between private and robust estimation. We prove that, up to the factor of d in Theorem 1.3, our transformation is optimal, and in particular is optimal for low-dimensional tasks when d is a constant. That is, for any low-dimensional task, if Arob is an optimal robust estimator, then our transformation of Arob is an optimal private estimator. This is the first result to show a general conversion from robust estimators to optimal private estimators. A consequence of this result is an equivalence between robust and private estimation in low dimensions, which shows that the optimal minimax error rates for ε-DP estimation and for τ-robust estimation for τ 1/nε are essentially the same. Specifically, for a given statistic µ and a family of distributions P over domain Z, we define the minimax error of estimating µ under P for private and robust algorithms as follows. Having fixed β, τ, and ε, the (τ, β)-robust minimax error under P is αrob(τ, β) = inf α {α : (τ, β, α)-robust algorithm P P}, and the (ε, β)-private minimax error under P is αpriv(ε, β) = inf α {α : (ε, β, α)-private algorithm P P}. Theorem 1.4 (Informal, Corollary 4.1). Let P be a family of distributions over R and let µ be a 1-dimensional statistic where |µ(P)| 1 for all P P. Suppose αrob(τ, β) is a continuous function of β for all τ 1/2. Let n > 1, ε = ω(log(n)/n) and τ = Θ(log(n)/nε). Suppose there exists constant c such that the non-private error αrob(0, β) 1 nc for any β 1/4. Then there are constants c1 c2 > 0 such that for βp = 1/nc1 and βr = 1/nc2, αpriv(ε, βp) = Θ (αrob(τ, βr)) . The above theorem extends to any d-dimensional problem with a weaker conclusion roughly αpriv(ε, βp) = Θ (d αrob(τ, βr)), so in particular we obtain the same equivalence for any problem in constant dimension. The theorem follows from the folklore observation that differentially private estimators are also robust (with τ 1/εn). Thus, if we have an optimal differentially private algorithm Apriv we can use Apriv itself as the robust estimator. Thus, we can instantiate the inverse-sensitivity mechanism using Apriv as the robust estimator and apply the inverse-sensitivity mechanism to Apriv to obtain a new private mechanism with similar accuracy. Although this transformation would be a circular way to produce a private estimator, the argument shows that one can always obtain an optimal private estimator by instantiating our transformation with an optimal robust estimator. Applications to high-dimensional private estimation. Although our general optimality result only applies to lowdimensional problems, we show that our transformation often yields optimal error bounds for several high-dimensional tasks as well. By instantiating Theorem 1.3 with existing algorithms for robust estimation, we give ε-DP algorithms for the same tasks. At high level, Theorem 1.3 says that if there is a robust algorithm with accuracy α(τ) as a function of the fraction of corruptions, then there is an ε-DP algorithm with accuracy α(D/nε), where D is the dimension of the parameter we aim to estimate. Since usually α(τ) = O(τ), this implies that the error due to privacy is O(D/nε). In particular, we apply our theorem to give pure differentially private algorithms for Gaussian (sparse and non-sparse) linear regression and subgaussian PCA, which, to the best of our knowledge, are the first optimal algorithms for these tasks satisfying pure (rather than approximate) differential privacy. 1.2. Related Work General transformations between robust and private algorithms. Dwork and Lei (2009) were the first to observe the intuitive connection between differential privacy and robust statistics. That work also introduced a generic framework for differentially private algorithms called proposetest-release (PTR) that can be used to transform any robust estimator into an approximately DP estimator. However, compared to our optimal transformation, the error of the resulting private algorithm will be larger by a factor of 1/ε. An even earlier work of Nissim, Raskhodnikova, and Smith (Nissim et al., 2007) presented a framework called smooth sensitivity that can be used to obtain a similar transformation from robust to pure DP estimators, again losing a factor of 1/ε compared to our transformation. In the other direction, Dwork and Lei (2009) also observed From Robustness to Privacy and Back that differentially private algorithms are robust with certain parameters. However, private estimators are mostly studied in a regime where 1/ε = o(n), so they do not give robustness to corrupting a constant fraction of inputs, which is the most commonly studied regime for robust estimation. More recently, Georgiev and Hopkins (2022) observed that private algorithms with sufficiently small failure probability and privacy parameter are robust to corrupting a constant fraction of inputs, and use this fact to give evidence of computational hardness of certain private estimation tasks. Private estimators inspired by robust estimators. Although prior black-box transformations from robust to private estimators give suboptimal error rates, many optimal private algorithms are nonetheless inspired by methods from robust statistics, albeit with task-specific analyses (Kamath et al., 2019; Bun and Steinke, 2019; Avella-Medina and Brunel, 2019; 2020; Kamath et al., 2020; Liu et al., 2021; Brown et al., 2021; Ghazi et al., 2021; Liu et al., 2022; Tsfadia et al., 2022; Hopkins et al., 2022a; Ashtiani and Liaw, 2022; Kothari et al., 2022; Alabi et al., 2022; Georgiev and Hopkins, 2022). Their algorithms often leverage the structure of specific robust estimators such as medians, highdimensional generalizations of the median, trimmed-means, or sum-of-squared-based certificates of robustness. An elegant work of Liu et al. (2022) proposed a generalization of PTR that can be used to give near-optimal approximate differentially private estimators for many tasks. Although the framework is fairly general, their analysis relies on specific properties of the estimation tasks rather than merely the existence of a robust estimator. Another line of work is that of algorithms that specifically aim to satisfy optimal robustness and privacy guarantees simultaneously for high-dimensional problems (Liu et al., 2021; Ghazi et al., 2021; Alabi et al., 2022). Concurrent work by Hopkins et al. (2022b). In concurrent and independent work, Hopkins et al. (2022b) proposed the same black-box transformation from robust to DP algorithms based on the smooth-inverse-sensitivity mechanism. In contrast to our work, they also show that in some cases their method can be implemented in a computationally efficient way by instantiating the smooth-inverse-sensitivity mechanism with robust estimators based on the sum-ofsquares paradigm. In particular they construct computationally efficient pure DP algorithms for estimating a Gaussian distribution with optimal error. In contrast to their work, we demonstrate that our transformation gives optimal error for low-dimensional problems and establishes a tight connection between privacy and robustness for these problems. 1.3. Organization In Section 2, we provide background on DP and the inversesensitivity mechanism. In Section 3, we present and prove the guarantees of our transformation from robust to pure DP algorithms (and vice versa, for completeness). In Section 4, we show the optimality of our transformation for low-dimensional tasks and the equivalence of robustness and privacy for those tasks. In Section 5, we extend our transformation to convert robust to approximate DP algorithms. In Section 6, we show that our transformation gives us pure DP algorithms with near-optimal error for PCA for subgaussian data, deferring more applications to Appendix C. 2. Preliminaries and Background Additional Notation For a finite set T , we denote its cardinality by card(T ). For any (continuous or discrete) set T , we denote its diameter by diam(T ) = sups,t T s t , where the choice of norm will be clear from context. We denote by d H the Hamming distance between two vectors or datasets. We denote by Bd(v, R) the d-dimensional ball with radius R and center v (with respect to some norm ). We also let Bd(R) = Bd(0d, R) and omit d when it is clear from context. 2.1. Differential Privacy Let S, S Zn be two datasets of size n. We say that S, S are neighboring datasets if d H(S, S ) 1. Differentially private algorithms have indistinguishable output distributions on neighboring datasets. Definition 2.1 (Differential Privacy (Dwork et al., 2006)). A (possibly randomized) algorithm A: Zn T is (ε, δ)- differentially private (DP) if for all neighboring datasets S, S and any measurable output space T T we have Pr[A(S) T] eε Pr[A(S ) T] + δ. We say algorithm A satisfies pure DP if it satisfies the definition for δ = 0, which we denote by ε-DP. Otherwise, we say it satisfies approximate DP. The exponential mechanism (Mc Sherry and Talwar, 2007) is a ubiquitous building block for constructing DP algorithms. The inverse-sensitivity mechanism, on which our transformation is based, is an instantiation of this mechanism. Definition 2.2 (Exponential Mechanism, (Mc Sherry and Talwar, 2007)). Let input data set S Zn, range T , and score function q : Zn T R with sensitivity q = max t T max S :d H(S ,S) 1 |q(S; t) q(S ; t)|. The exponen- tial mechanism selects and outputs an element t T with probability πS(t) e(ε q(S;t)/2 q). The exponential mechanism is ε-DP. From Robustness to Privacy and Back 2.2. Inverse-sensitivity Mechanism Let f : Zn T be a (deterministic) algorithm that we aim to compute on dataset S. Define the path-length function lenf(S; t) := inf S {d H(S, S ) | f(S ) = t} , which is the minimum number of points in S that need to be replaced so that the value of function f on the modified input is t. Given black-box access to function f, the inverse-sensitivity mechanism with input dataset S, denoted by MInv(S; f), is then defined as follows: the probability that MInv(S; f) returns t T is Pr[MInv(S; f) = t] := e lenf (S;t)ε/2 P s T e lenf (S;s)ε/2 . The error of this mechanism in some norm depends on the local modulus of continuity of a function f : Zn T at S Zn with respect to , defined by ωf(S; K) = sup S Zn { f(S) f(S ) : d H(S, S ) K} . For a finite set T , the inverse-sensitivity mechanism has the following guarantees. Theorem 2.3 (Discrete functions, Th.3 (Asi and Duchi, 2020a)). Let f : Zn T and D = diam(T ). Then for any S Zn and β > 0, with probability at least 1 β, the inverse-sensitivity mechanism has error MInv(S) f(S) ωf ε log 2D card(T ) For continuous functions f : Zn Rd, which is the main setting in this paper, we use a smooth version of the inversesensitivity mechanism. To this end, we define the ρ-smooth inverse-sensitivity of t with respect to norm : lenρ(S; t) = inf s Rd: s t ρ len(S; s). (1) The ρ-smooth inverse-sensitivity mechanism M ρ Inv( ; f) then has the following density given input dataset S: πS(t) = e lenρ(S;t)ε/2 R s Rd e lenρ(S;s)ε/2ds (2) For our setting of interest where T = {v Rd : v R}, we have the following upper bound on the error of M ρ Inv. Its proof follows similar ideas as in (Asi and Duchi, 2020a;b) and is in Appendix A . Theorem 2.4 (Continuous functions). Let f : Zn T where T = {v Rd : v R}. Then for any S Zn, and β > 0, with probability at least 1 β, the ρ-smooth inverse-sensitivity mechanism with norm has error M ρ Inv(S; f) f(S) S; 2 d log R ρ + 1 + log 1 3. Transformations between Robust and Private Algorithms In this section, we provide transformations between differentially private and robust algorithms. We begin in Section 3.1 with our main result: a general transformation from robust algorithms to private algorithms with roughly the same error for a specified number of corruptions. In Section 3.2, we consider the other direction, and show for completeness that any private algorithm is inherently robust as well, which was already observed since (Dwork and Lei, 2009). 3.1. Robust to Private Our first result shows how to transform a determinitstic robust algorithm into a private algorithm with roughly the same error. The main idea is to apply the ρ-smooth inversesensitivity mechanism (Asi and Duchi, 2020a) with the input function f being the robust algorithm itself. Theorem 3.1 (Robust-to-private). Let S = (S1, . . . , Sn) where Si iid P such that µ(P) Rd. Let ε, β (0, 1). Let Arob : (Rd)n {t Rd : t R} be a deterministic (τ, β, α)-robust algorithm. Let α0 α and τ = 2 d log R α0 + 1 + log 1 If τ τ , then M ρ Inv(S; Arob) with ρ = α0 is ε-DP and, with probability at least 1 2β, has error M ρ Inv(S; Arob) µ 4α. In particular, this im- plies that for τ 2 d log R α0 +1 +log 2 nε , αpriv(ε, β) 4αrob(τ, β/2). Proof. First note that the privacy guarantee is immediate from the guarantees of the exponential mechanism (Definition 2.2) and the fact that the sensitivity of the ρ-smooth path-length function in Equation (1) is 1. Now we prove utility. Let K = nτ . The error of M ρ Inv(S, Arob) is then bounded as follows: M ρ Inv(S; Arob) µ Arob(S) µ + M ρ Inv(S; Arob) Arob(S) Arob(S) µ + ωArob (S; K) + α0 (w.p. 1 β by Theorem 2.4) 2 Arob(S) µ + Arob(S ) µ + α0, From Robustness to Privacy and Back for S = argmax S :d H(S ,S) K Arob(S) Arob(S ) . Recall that, by assumption, Arob is (τ, β, α)-robust for τ K/n. Thus, with probability 1 β, Arob(S ) µ α for any τ-corrupted dataset S . By union bound, we have that with probability 1 2β, M ρ Inv(S; Arob) µ 3α+α0 4α. This completes the proof of the theorem. The parameter α0 determines the smallest fraction of corruptions τ , which in turn determines the smallest α so that Arob is (τ , β, α)-robust. A simple choice for α0 is the minimax error for estimating the statistic µ(P), without adversarial corruptions or privacy constraints. Remark 3.2. We can extend this transformation to hold for randomized robust algorithms, by first converting Arob into a deterministic algorithm, albeit doubling the error and failure probability, as shown in Theorem A.1, Appendix A.1. 3.2. Private to Robust In this section, we state the folklore fact that any εdifferentially private algorithm is also τ 1 nε-robust with the same accuracy. This follows directly from the definition of differential privacy which states that changing 1/ε users does not change the output distribution by much (by group privacy) and was observed in (Dwork and Lei, 2009). Theorem 3.3 (Private-to-robust). Let S = (S1, . . . , Sn) where Si iid P. Let ε, β (0, 1). Let Apriv be an (ε, β, α)-private algorithm for estimating the statistic µ. Let γ (0, 1) and τ = log(1/γ) nε . Then Apriv is (τ, β/γ, α)- robust. In particular, αrob(τ, β/γ) αpriv(ε, β), which is equivalent to αrob(τ, β) αpriv(log(1/γ)/nτ, γβ). Proof. Let W = {t Rd : t µ > α} be the set of bad outputs for the distribution P. The accuracy guarantee of Apriv implies that Pr[Apriv(S) W] β. Now assume S is a τn-corrupted version of S where τ = log(1/γ)/(nε), that is, d H(S, S ) log(1/γ)/ε. The definition of differential privacy now implies that Pr[ Apriv(S ) µ > α] = Pr[Apriv(S ) W] ed H(S,S )ε Pr[Apriv(S) W] for τ log(1/γ) nε . Thus Apriv is τ-robust for τ = log(1/γ) nε with accuracy α and failure probability β/γ. We note that for γ = 1/e, αrob(τ, β) αpriv (1/nτ, β/e), that is, the minimax error of any τ-robust algorithm with failure probability β is bounded by the minimax error of a ε-DP algorithm, for ε = 1/nτ, with the same failure probability up to constant factors. As private algorithms are often randomized, this transformation would result in a randomized robust algorithm. As in the previous section, we can convert it into a deterministic one via Theorem A.1 in Appendix A.1. 4. Implications of our transformations 4.1. Equivalence between Private and Robust Estimation In this section, we show that the (high-probability) minimax rates for ε-DP and τ-robustness are on the same order when the problem is low-dimensional and τ = log(n)/nε. The following corollary states the result. For simplicity, we assume that the dimension d = 1 and the range R = 1. Corollary 4.1 (Equivalence). Let P be a family of distributions and P P. Let µ be a 1-dimensional statistic where |µ(P)| 1 such that αrob(τ, β) is a continuous function of β for all τ 1/2. Let n > 1, ε = ω(log(n)/n), and τ = Θ(log(n)/nε). Suppose there exists a constant c such that the error αrob(0, β) 1 nc for any β 1/4. Then there are constants c1 c2 > 0 such that for βp = 1/nc1 and βr = 1/nc2, αpriv(ε, βp) = Θ (αrob(τ, βr)) . Proof. First, we observe that α0 = αrob(0, βp) αrob(τ, βp) by the monotonicity of αrob and α0 1 nc since βp = 1 nc1 1/2. By Theorem 3.11, we have that, for τ1 = 2(c+c1) log(2n) nε 2 log(1/α0+1)+log(2/βp) αpriv(ε, βp) 4αrob(τ1, βp/2). (4) Setting γ = 1/(2n)2(c+c1), we have that τ1 = log(1/γ) nε . By Theorem 3.3, αrob(τ1, (2n)2(c+c1)β) αpriv(ε, β). Note that if αpriv(ε, βp) αrob(τ1, βp/2) then the claim follows from Equation (4), using βr = βp/2. Otherwise we have that αrob(τ1, (2n)2(c+c1)βp) αpriv(ε, βp) αrob(τ1, βp/2) As αrob(τ1, β) is a continuous function of β, there is βr [βp/2, (2n)2(c+c1)βp] [ 1 n2c1 , 1 n2c+3c1 ] such that αrob(τ, βr) = αpriv(ε, βp). Note that in most settings, αrob(τ, βr) has the same order when βr [βp, poly(n) βp] as it depends on log(1/β) (see for example Section 6). 1If the robust algorithm achieving the minimax error αrob is randomized, we can transform it into a deterministic one as Theorem 3.1 requires, via Theorem A.1, by losing only constant factors. From Robustness to Privacy and Back Algorithm 1 Robust-to-Private ((ε, δ)-DP) Require: S = (S1, . . . , Sn), (τ, β, α)-robust algorithm Arob, local modulus bound B 1: Let K = nτ/2 1 2: Let Sbad = {S Zn : ωf(S; K + 1) > B} 3: Calculate d = min S Sbad d H(S, S ) 4: Set ˆd = d + ζ where ζ Laplace(2/ε) 5: if ˆd > 2 log(1/ min(δ, β))/ε then 6: Sample t from the truncated inverse-sensitivity mechanism with threshold K, privacy parameter ε/2, smoothness parameter ρ = 2B, and return t. 7: else 8: Return 9: end if 4.2. Optimality of Black-Box Transformation An immediate corollary of the previous transformations is that for some choice of robust algorithm Arob our robustto-private transformation achieves the minimax optimal rate among the family of private algorithms, for low-dimensional statistics. See Appendix A for its proof. Corollary 4.2 (Optimality). Let P be a family of distributions and P P. Let µ be a 1-dimensional statistic where |µ(P)| 1. Let αpriv(ε, β) be the minimax error of any ε-DP algorithm with failure probability β 1/4 that estimates statistic µ(P). Let n > 1. Suppose that there exists a constant c such that the non-private error αpriv( , β) 1 nc for any β 1/2. Then there are constants c1 c2 > 0 such that βp = 1/nc1 and β p = 1/nc2, robust algorithm Arob, and a choice of ρ, such that Algorithm M ρ Inv( ; Arob) with privacy parameter ε achieves the optimal error O(αpriv(ε, βp)) with probability 1 β p. 5. A Transformation for Approximate DP In this section, we propose a different transformation for (ε, δ)-DP that avoids the necessary dependence on diameter for pure ε-DP , by using the following truncated version of the inverse-sensitivity: lentrunc f (S; t) := ( lenρ f(S; t) if lenρ f(S; t) K otherwise. Algorithm 1 uses a private test to check whether S is far from the set Sbad = {S Zn : ωf(S; K + 1) > B}. If it is not, then it fails. Crucially, if S iid P n then the robust algorithm guarantees that ωf(S ; K + 1) B for S in a neighborhood of S, allowing the test to succeed in this case (Theorem 5.1, proven in Appendix B). Theorem 5.1 (Robust-to-private, approximate DP). Let S = (S1, . . . , Sn) where Si iid P such that µ(P) Rd. Let ε, δ, β (0, 1). Let Arob : (Rd)n Rd be a deterministic (τ, β, α)-robust algorithm for the statistic µ. If τ 8(d+log(1/ min{δ,β})) nε then Algorithm 1 with B = 2α and ρ = 2B is (ε, δ)-DP and, with probability at least 1 2β returns ˆµ such that ˆµ µ 7α. 6. Applications for Pure DP 6.1. Principal Component Analysis In this section, we apply our transformation to obtain a pure DP algorithm for PCA under Gaussian data. We note that the result holds as-is for subgaussian distributions more generally, because Theorem 6.3 (Jambulapati et al., 2020) does. We assume w.l.o.g. that the distribution is zero-mean. To the best of our knowledge, Corollary 6.1 gives the first (computationally inefficient) algorithm for pure DP with error O( d nε). PCA with a spectral gap has been studied under pure DP in (Chaudhuri et al., 2013), where the result can be translated to yield a suboptimal error of O( d2 nε). A long line of work studies PCA under approximate DP (Blum et al., 2005; Hardt and Roth, 2012; 2013; Chaudhuri et al., 2013; Kapralov and Talwar, 2013; Dwork et al., 2014) with the recent result by Liu et al. (2022) achieving the optimal error of O( d nε) for subgaussian distributions. Corollary 6.1 (Gaussian PCA). Let S = (S1, . . . , Sn) where Si iid N(0, Σ). Let ε, β (0, 1). Suppose n is such that α 1 in Equation (5). There exists a constant C > 0 and an ε-DP algorithm M such that, with probability at least 1 β, returns unit vector M(S) = v such that 1 v Σv n + d log2( n d ) + log( 1 We will need a more general transformation, proven in Appendix A.2, and stated in Theorem 6.2below. Theorem 6.2 (Robust-to-private, general loss). Let S = (S1, . . . , Sn) where Si iid P such that µ(P) Rd. Let ε, β (0, 1). Let L : (Rd)2 R be a loss function which satisfies the triangle inequality. Let Arob : (Rd)n {t Rd : t R} be a (deterministic) (τ, β, α)-robust algorithm with respect to L. Let α0 α. Suppose n is such that the smallest value τ satisfying Equation (6) is at most 1. Suppose for all u, v B(R + α0), L(u, v) c L u v for some constant c L. If τ 2 d log R α0 + 1 + log 1 then Algorithm M ρ Inv(S; Arob) with ρ = α0 in norm is From Robustness to Privacy and Back ε-DP and, with probability at least 1 2β, has error L (M ρ Inv(S; Arob), µ) (3 + c L)α = O(α). We will instantiate our transformation with the robust PCA algorithm from (Jambulapati et al., 2020). An alternative with the same guarantees is returning the unit vector that minimizes the surrogate cost function proposed by (Liu et al., 2022). Theorem 6.3 (Theorem 1, (Jambulapati et al., 2020)). Let S = (S1, . . . , Sn) where Si iid N(0, Σ). Let β (0, 1), τ (0, 1/2). Let n + τ log 1 for a known constant C > 0. There exists algorithm Arob which is (τ, β, α )-robust. Proof of Corollary 6.1. We define the following loss function L(u, v) = u Σu Σ 2 . If v1 is the top eigenvector of Σ, then our goal is to return vector v with small error L(v1, v). Let α0 = C q n and assume it is less than 1, to be confirmed later. Then L satisfies the triangle inequality and for all u, v B(1 + α0) B(2), Σ1/2u 2 2 Σ 2 Σ1/2v 2 2 Σ 2 Σ1/2u 2 Σ1/2v 2 Σ1/2u 2 + Σ1/2v 2 Σ 2 u v 2 ( u 2 + v 2) 4 u v 2 . Let Arob : (Rd)n Sd 1 be the algorithm established by Theorem 6.3, where Sd 1 denotes the unit sphere. Then L satisfies the requirements of Theorem 6.2. For τ = 2d log(n/d)+2 log(1/β) nε , Arob is (τ, β, α )-robust with d + log(1/β) n + d log2( n d ) + log( 1 We then have that M ρ Inv( , Arob) with ρ = α0 is ε-DP and with probability 1 2β, returns v B(1 + α0), such that L(v1, v) = 1 v Σv Let ˆv = v v 2 be the unit vector in the direction of v. We have that L(v1, ˆv) = 1 v Σv Σ 2 v 2 2 . If v 2 1 then L(v1, ˆv) L(v1, v). Suppose v 2 > 1. L(v1, ˆv) = 1 ( v 2 2 1) + L(v1, v) v 2 2 + L(v1, v) (since v 2 > 1) (α0 + 1)2 + L(v1, v) (since (x 1)/x ) 2α0 + L(v1, v) 9α , since α0 α . Therefore, we return a unit vector ˆv with 1 ˆv Σˆv Σ 2 α for α = 9α . By assumption n is sufficiently large so that α 1, and as such α < 1. The proof is complete by rescaling β β/2 and adjusting the constants. 6.2. More Applications We apply our transformation to Gaussian mean and covariance estimation, instantiated by the Tukey median and the robust algorithm by (Diakonikolas et al., 2017) respectively, retrieving the known near-optimal error. We also apply it to Gaussian linear regression (Corollary 6.4 below), instantiated by the robust algorithm by Gao (2020) to give the first algorithm with optimal error under pure DP. Corollary 6.4 (Gaussian Linear Regression). Let S = (S1, . . . , Sn) where for all i [n], Si = (Xi, yi) Rd R is generated by a linear model yi = X i θ + ηi for some unknown θ Bd(R), where Xi iid N(0, Σ), I Σ κI, and ηi iid N(0, σ2), independent from Xi. Let ε, β (0, 1). There exists an ε-DP algorithm M such that, with probability 1 β, Σ 1/2(M(S) θ) 2 ασ for n + d log (R/σ+ κ)n We extend our main transformation to handle sparse estimation in Appendix A.3, which allows us to prove the equivalent result for the case of sparse linear regression where θ 0 k. We show that in this case, the error is in the order of p k/n + k/(nε), as expected. All remaining statements and proofs are in Appendix C. 7. Conclusions and Future Work We gave the first black-box transformation that converts an arbitrary robust algorithm into a differentially private one. We proved that this transformation gives an optimal From Robustness to Privacy and Back strategy for designing a differentially private algorithm for low-dimensional tasks, and that the minimax errors for robustness and privacy are equivalent for these tasks. We also showed that this transformation often gives nearoptimal error rates for several canonical high-dimensional tasks under (sub)Gaussian distributions (including under sparsity assumptions) and expect that it achieves similar results under other families of distributions, such as heavytailed. In particular, we note that the dependence on dimension d cannot be improved in the general setting. This follows from the optimal rates we obtain in our applications using this transformation, as any further improvements in the dependence on d in the general setting will result in a contradiction to existing lower bounds for the applications we consider. However, it would still be interesting to explore specialized settings where this dependence can be improved (for example, as we show for the case of sparse linear regression), as well as to determine the conditions under which this or another black-box transformation yields private algorithms with optimal error in high dimensions. A drawback of our transformation is that it produces a computationally inefficient algorithm, even if the robust algorithm we instantiate it with is computationally efficient. However, there are some approaches which allow for efficient implementation of this transformation. One approach is to use accurate approximations for the inverse sensitivity mechanism that can be implemented efficiently in certain settings such as PCA and linear regression, as in (Asi and Duchi, 2020b). Moreover, as the inverse sensitivity is an application of the exponential mechanism with a specific score function, it is possible to use existing results (Hopkins et al., 2022a) which can be applied as long as the score function satisfies certain properties. Specifically, Hopkins et al. (2022b) use the sum-of-squares paradigm, to make this transformation computationally efficient specifically for the task of Gaussian estimation in TV distance, an approach that has been recently successful when applied to several problems (Hopkins et al., 2022a; Ashtiani and Liaw, 2022; Kothari et al., 2022; Alabi et al., 2022). Acknowledgements We thank Chao Gao for helpful discussions and clarifications regarding results in the robustness literature and the anonymous ICML reviewers for helpful comments that improved the presentation of our paper. JU and LZ were supported by NSF awards CCF-1750640, CNS-1816028, and CNS-2120603. LZ was additionally supported by a Facebook (Meta) Fellowship. John M Abowd. The US Census Bureau adopts differential privacy. In ACM International Conference on Knowledge Discovery & Data Mining, KDD 18, pages 2867 2867, 2018. John M. Abowd, Robert Ashmead, Ryan Cumings-Menon, Daniel Kifer, Philip Leclerc, Jeffrey Ocker, Michael Ratcliffe, and Pavel Zhuravlev. Geographic spines in the 2020 census disclosure avoidance system topdown algorithm, 2022. URL https://arxiv.org/abs/22 03.16654. Ishaq Aden-Ali, Hassan Ashtiani, and Gautam Kamath. On the sample complexity of privately learning unbounded high-dimensional gaussians. In Proceedings of the 32nd International Conference on Algorithmic Learning Theory (ALT 2021), ALT 21. JMLR, Inc., March 2021. URL https://arxiv.org/abs/2010.09929. Daniel Alabi, Pravesh K. Kothari, Pranay Tankala, Prayaag Venkat, and Fred Zhang. Privately estimating a gaussian: Efficient, robust and optimal, 2022. URL https://ar xiv.org/abs/2212.08018. Apple Differential Privacy Team. Learning with privacy at scale. Apple Machine Learning Journal, 1(8), 2017. https://machinelearning.apple.com/do cs/learning-with-privacy-at-scale/ap pledifferentialprivacysystem.pdf. Hassan Ashtiani and Christopher Liaw. Private and polynomial time algorithms for learning gaussians and beyond. In Po-Ling Loh and Maxim Raginsky, editors, Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pages 1075 1076. PMLR, 02 05 Jul 2022. URL https://proceedings.mlr.press/v178/a shtiani22a.html. Hilal Asi and John C. Duchi. Near instance-optimality in differential privacy, May 2020a. URL https://arxi v.org/abs/2005.10630. Hilal Asi and John C. Duchi. Instance-optimality in differential privacy via approximate inverse sensitivity mechanisms. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 14106 14117. Curran Associates, Inc., Dec 2020b. URL https://proceedings.neurips.cc/paper /2020/file/a267f936e54d7c10a2bb70dbe 6ad7a89-Paper.pdf. Marco Avella-Medina and Victor-Emmanuel Brunel. Differentially private sub-gaussian location estimators. ar Xiv preprint ar Xiv:1906.11923, 2019. From Robustness to Privacy and Back Marco Avella-Medina and Victor-Emmanuel and Brunel. Propose, test, release: Differentially private estimation with high probability. ar Xiv preprint ar Xiv:2002.08774, 2020. Amos Beimel, Hai Brenner, Shiva Prasad Kasiviswanathan, and Kobbi Nissim. Bounds on the sample complexity for private learning and private data release. Mach. Learn., 94(3):401 437, mar 2014. ISSN 0885-6125. doi: 10.100 7/s10994-013-5404-1. URL https://doi.org/10 .1007/s10994-013-5404-1. Andrea Bittau, Ulfar Erlingsson, Petros Maniatis, Ilya Mironov, Ananth Raghunathan, David Lie, Mitch Rudominer, Ushasree Kode, Julien Tinnes, and Bernhard Seefeld. PROCHLO: Strong privacy for analytics in the crowd. In ACM Symposium on Operating Systems Principles, SOSP 17, pages 441 459, Shanghai, China, 2017. https://arxiv.org/abs/1710.00901. Avrim Blum, Cynthia Dwork, Frank Mc Sherry, and Kobbi Nissim. Practical privacy: the Su LQ framework. In Proceedings of the 24th Annual ACM Symposium on Principles of Database Systems, PODS 05, pages 128 138, Baltimore, MD, USA, 2005. ACM. Gavin Brown, Marco Gaboardi, Adam Smith, Jonathan Ullman, and Lydia Zakynthinou. Covariance-aware private mean estimation without private covariance estimation. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 7950 7964. Curran Associates, Inc., 2021. URL https://proceedings.neurips.cc/paper /2021/file/42778ef0b5805a96f9511e20b 5611fce-Paper.pdf. Mark Bun and Thomas Steinke. Average-case averages: Private algorithms for smooth sensitivity and mean estimation. Advances in Neural Information Processing Systems, 32, 2019. Mark Bun, Gautam Kamath, Thomas Steinke, and Zhiwei Steven Wu. Private hypothesis selection. In Advances in Neural Information Processing Systems, Neur IPS 19, pages 156 167, Vancouver, Canada, 2019. https: //arxiv.org/abs/1905.13229. Michael A Burr and Robert J Fabrizio. Uniform convergence rates for halfspace depth. Statistics & Probability Letters, 124:33 40, 2017. Cl ement Canonne, Gautam Kamath, Audra Mc Millan, Jonathan Ullman, and Lydia Zakynthinou. Private identity testing for high dimensional distributions. In Advances in Neural Information Processing Systems, Neur IPS 20, 2020. https://arxiv.org/abs/1905.11947. Kamalika Chaudhuri, Anand D. Sarwate, and Kaushik Sinha. A near-optimal algorithm for differentially-private principal components. J. Mach. Learn. Res., 14(1):2905 2943, jan 2013. ISSN 1532-4435. Yu Cheng, Ilias Diakonikolas, Rong Ge, and David P. Woodruff. Faster algorithms for high-dimensional robust covariance estimation. In Alina Beygelzimer and Daniel Hsu, editors, Proceedings of the Thirty-Second Conference on Learning Theory, volume 99 of Proceedings of Machine Learning Research, pages 727 757. PMLR, 25 28 Jun 2019. URL https://proceedings.mlr. press/v99/cheng19a.html. Ilias Diakonikolas, Gautam Kamath, Daniel M Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. Robust estimators in high dimensions without the computational intractability. In IEEE Annual Symposium on Foundations of Computer Science, FOCS 16, pages 655 664. IEEE, 2016. https://arxiv.org/abs/1604.06443. Ilias Diakonikolas, Gautam Kamath, Daniel M Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. Being robust (in high dimensions) can be practical. In International Conference on Machine Learning, ICML 17, pages 999 1008, 2017. https://arxiv.org/abs/1703.0 0893. David L Donoho and Miriam Gasko. Breakdown properties of location estimates based on halfspace depth and projected outlyingness. The Annals of Statistics, pages 1803 1827, 1992. Cynthia Dwork and Jing Lei. Differential privacy and robust statistics. In Proceedings of the 41st ACM Symposium on Theory of Computing, STOC 09, pages 371 380. ACM, 2009. Cynthia Dwork, Frank Mc Sherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Conference on Theory of Cryptography, TCC 06, pages 265 284, New York, NY, USA, 2006. Cynthia Dwork, Kunal Talwar, Abhradeep Thakurta, and Li Zhang. Analyze gauss: optimal bounds for privacypreserving principal component analysis. In Proceedings of the 46th Annual ACM Symposium on the Theory of Computing, STOC 14, pages 11 20, New York, NY, 2014. ACM. Ulfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In ACM Conference on Computer and Communications Security, CCS 14, 2014. Chao Gao. Robust regression via mutivariate regression depth. Bernoulli, 26(2):1139 1170, 2020. doi: 10.3150/ From Robustness to Privacy and Back 19-BEJ1144. URL https://doi.org/10.3150/ 19-BEJ1144. Kristian Georgiev and Samuel B. Hopkins. Privacy induces robustness: Information-computation gaps and sparse mean estimation. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=g Oke NXPy-X. Badih Ghazi, Ravi Kumar, Pasin Manurangsi, and Thao Nguyen. Robust and private learning of halfspaces. In Arindam Banerjee and Kenji Fukumizu, editors, Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, volume 130 of Proceedings of Machine Learning Research, pages 1603 1611. PMLR, 13 15 Apr 2021. URL https://proceedings.ml r.press/v130/ghazi21a.html. Samuel Haney, Ashwin Machanavajjhala, John M Abowd, Matthew Graham, Mark Kutzbach, and Lars Vilhuber. Utility cost of formal privacy for releasing national employer-employee statistics. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD 17, pages 1339 1354, Chicago, IL, 2017. ACM. Moritz Hardt and Aaron Roth. Beating randomized response on incoherent matrices. STOC 12, page 1255 1268, New York, NY, USA, 2012. Association for Computing Machinery. ISBN 9781450312455. doi: 10.1145/2213 977.2214088. URL https://doi.org/10.1145/ 2213977.2214088. Moritz Hardt and Aaron Roth. Beyond worst-case analysis in private singular vector computation. In Proceedings of the Forty-Fifth Annual ACM Symposium on Theory of Computing, STOC 13, page 331 340, New York, NY, USA, 2013. Association for Computing Machinery. ISBN 9781450320290. doi: 10.1145/2488608.2488650. URL https://doi.org/10.1145/2488608.2488 650. Moritz Hardt and Kunal Talwar. On the geometry of differential privacy. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC, 2010. Samuel B. Hopkins, Gautam Kamath, and Mahbod Majid. Efficient mean estimation with pure differential privacy via a sum-of-squares exponential mechanism. STOC 2022, page 1406 1417, New York, NY, USA, 2022a. Association for Computing Machinery. ISBN 9781450392648. doi: 10.1145/3519935.3519947. URL https://doi.org/10.1145/3519935.3519 947. Samuel B. Hopkins, Gautam Kamath, Mahbod Majid, and Shyam Narayanan. Robustness implies privacy in statistical estimation, 2022b. URL https://arxiv.org/ abs/2212.05015. Peter J. Huber. A Robust Version of the Probability Ratio Test. The Annals of Mathematical Statistics, 36(6):1753 1758, 1965. doi: 10.1214/aoms/1177699803. URL ht tps://doi.org/10.1214/aoms/1177699803. Arun Jambulapati, Jerry Li, and Kevin Tian. Robust sub-gaussian principal component analysis and widthindependent schatten packing. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 15689 15701. Curran Associates, Inc., 2020. URL https://proceedings.neurips. cc/paper/2020/file/b58144d7e90b5a43e dcce1ca9e642882-Paper.pdf. Aaron Johnson and Vitaly Shmatikov. Privacy-preserving data exploration in genome-wide association studies. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 13, page 1079 1087, New York, NY, USA, 2013. Association for Computing Machinery. ISBN 9781450321747. doi: 10.1145/2487575.2487687. URL https: //doi.org/10.1145/2487575.2487687. Gautam Kamath, Jerry Li, Vikrant Singhal, and Jonathan Ullman. Privately learning high-dimensional distributions. In Conference on Learning Theory, pages 1853 1902. PMLR, 2019. Gautam Kamath, Vikrant Singhal, and Jonathan Ullman. Private mean estimation of heavy-tailed distributions. https://arxiv.org/abs/2002.09464, 2020. Michael Kapralov and Kunal Talwar. On differentially private low rank approximation, pages 1395 1414. 2013. doi: 10.1137/1.9781611973105.101. URL https: //epubs.siam.org/doi/abs/10.1137/1.9 781611973105.101. Pravesh Kothari, Pasin Manurangsi, and Ameya Velingker. Private robust estimation by stabilizing convex relaxations. In Po-Ling Loh and Maxim Raginsky, editors, Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pages 723 777. PMLR, 02 05 Jul 2022. URL https://proceedings.mlr.press/v178/k othari22a.html. Jerry Li. Lecture notes in robustness in machine learning (cse 599-m), 2019. URL https://jerryzli.git hub.io/robust-ml-fall19.html. From Robustness to Privacy and Back Jerry Li and Guanghao Ye. Robust gaussian covariance estimation in nearly-matrix multiplication time. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 12649 12659. Curran Associates, Inc., 2020. URL https://proceeding s.neurips.cc/paper/2020/file/9529fbb a677729d3206b3b9073d1e9ca-Paper.pdf. Xiyang Liu, Weihao Kong, Sham Kakade, and Sewoong Oh. Robust and differentially private mean estimation. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 3887 3901. Curran Associates, Inc., 2021. URL https:// proceedings.neurips.cc/paper/2021/fi le/1fc5309ccc651bf6b5d22470f67561ea Paper.pdf. Xiyang Liu, Weihao Kong, and Sewoong Oh. Differential privacy and robust statistics in high dimensions. In Po Ling Loh and Maxim Raginsky, editors, Proceedings of Thirty Fifth Conference on Learning Theory, volume 178 of Proceedings of Machine Learning Research, pages 1167 1246. PMLR, 02 05 Jul 2022. URL https://pr oceedings.mlr.press/v178/liu22b.html. Frank Mc Sherry and Kunal Talwar. Mechanism design via differential privacy. In IEEE Symposium on Foundations of Computer Science, FOCS 07, pages 94 103, Las Vegas, NV, USA, 2007. Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. In ACM Symposium on Theory of computing, pages 75 84, 2007. Ryan Rogers, Subbu Subramaniam, Sean Peng, David Durfee, Seunghyun Lee, Santosh Kumar Kancha, Shraddha Sahay, and Parvez Ahammad. Linkedin s audience engagements api: A privacy preserving data analytics system at scale. ar Xiv preprint ar Xiv:2002.05839, 2020. David Tastuggine and Ilya Mironov. Introducing Opacus: A high-speed library for training Py Torch models with differential privacy. Facebook AI Blog, 2020. https: //ai.facebook.com/blog/introducing-o pacus-a-high-speed-library-for-train ing-pytorch-modelswith-differential-privacy/. Eliad Tsfadia, Edith Cohen, Haim Kaplan, Yishay Mansour, and Uri Stemmer. Friendly Core: Practical differentially private aggregation. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors, Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 21828 21863. PMLR, 17 23 Jul 2022. URL https: //proceedings.mlr.press/v162/tsfadia 22a.html. John D. Tukey. A survey of sampling from contaminated distributions. Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling, pages 448 485, 1960. Vladimir Naumovich Vapnik and Aleksei Yakovlevich Chervonenkis. On uniform convergence of the frequencies of events to their probabilities. Teoriya Veroyatnostei i ee Primeneniya, 16(2):264 279, 1971. Royce J Wilson, Celia Yuxin Zhang, William Lam, Damien Desfontaines, Daniel Simmons-Marengo, and Bryant Gipson. Differentially private sql with bounded user contribution. Proceedings on Privacy Enhancing Technologies, 2020(2):230 250, 2020. https://arxiv.org/ab s/1909.01917. From Robustness to Privacy and Back A. Additional Proofs for Transformations A.1. Randomized to Deterministic Robust Algorithm Here, we present a transformation from a randomized algorithm A to a deterministic robust algorithm whose error and failure probability are larger by a factor of 2. Intuitively, Algorithm 2 finds a small ball where the randomized algorithm has the largest density, and returns its center. This transformation is computationally inefficient because it requires running the randomized algorithm with all possible choices of random coins. Algorithm 2 Randomized-to-Deterministic Robust Require: S = (S1, . . . , Sn), Algorithm A, accuracy α. 1: Let PS denote the probability distribution of A(S) over the randomness of the algorithm 2: Find the center v of a ball B(v) = {u Rd : u v α} of radius α that maximizes PS(B(v)) 3: Return v We have the following guarantees for the transformation in Algorithm 2. Theorem A.1 (Randomized-to-deterministic-robust). Let S = (S1, . . . , Sn) where Si iid P. Let τ, β (0, 1). Let A be a (τ, β, α)-robust algorithm for estimating the statistic µ. Then Algorithm 2 is (τ, 2β, 2α)-robust. Proof. Let S = (S1, . . . , Sn) where Si iid P and let S be a τ-corrupted version of S, that is, d H(S, S ) nτ. We show that running Algorithm 2 over S returns an accurate estimate with high probability over S. Let B(µ) = {u Rd : u µ α} be the ball of radius α around µ, and let B(v ) be the ball that maximizes PS (B(v)). Let E = {S : S d H(S , S) nτ, PS (B(µ)c) 1/2} denote the set of good input datasets S such that for all τ-corrupted S , the robust algorithm A returns bad answers with probability less than 1/2. We show that if S E then Algorithm 2 returns an accurate answer for any τ-corrupted S . Indeed, if S E, then for any τ-corrupted S , PS (B(µ)) > 1/2. Moreover, the definition of B(v ) implies PS (B(v )) PS (B(µ)) > 1/2. As a result, we have that B(v ) B(µ) = . Let u B(v ) B(µ). We have that v µ v u + u µ 2α. Thus, if S E then Algorithm 2 is 2α accurate. It remains to show that Pr[S / E] 2β. This follows from the fact that A has failure probability β: β Pr S,A[ S : d H(S , S) and A(S ) B(µ)c] Pr S [S / E] Pr S,A [ S : d H(S , S) nτ and A(S ) B(µ)c | S / E] = Pr S [S / E] ES,A max S :d H(S ,S) nτ 1{A(S ) B(µ)c} | S / E Pr S [S / E] ES max S :d H(S ,S) nτ EA[1{A(S ) B(µ)c}] | S / E (by Jensen s inequality) = Pr S [S / E] ES max S :d H(S ,S) nτ PS (B(µ)c) | S / E > Pr S [S / E] 1 The claim follows. A.2. Robust-to-Private Transformation for General Loss In the statement of Theorem 3.1, the error is measured in some norm , the range of the robust algorithm Arob is bounded in the same norm, and the smoothness ρ allowed in the inverse sensitivity score function (Equation (1)) is again bounded in the same norm. We can prove a more general theorem, where the last two norms are the same, but the error is instead measured with respect to a general loss function which satisfies the triangle inequality. From Robustness to Privacy and Back Theorem A.2 (Robust-to-private, general loss, restatement of Theorem 6.2). Let S = (S1, . . . , Sn) where Si iid P such that µ(P) Rd. Let ε, β (0, 1). Let L : (Rd)2 R be a loss function which satisfies the triangle inequality. Let Arob : (Rd)n {t Rd : t R} be a (deterministic) (τ, β, α)-robust algorithm with respect to L. Let α0 α. Suppose n is such that the smallest value τ satisfying Equation (7) is at most 1. Suppose u, v B(R + α0) L(u, v) c L u v for some constant c L. If τ 2 d log R α0 + 1 + log 1 then Algorithm M ρ Inv(S; Arob) with ρ = α0 in norm is ε-DP and, with probability at least 1 2β, has error L (M ρ Inv(S; Arob), µ) (3 + c L)α = O(α). Before proving Theorem A.2, we need the equivalent of Theorem 2.4 for general loss functions. Theorem A.3 (Continuous functions, general loss). Let ε, β (0, 1), ρ > 0. Let f : Zn T where T = {t Rd : t R}. Let L : (Rd)2 R be a loss function which satisfies the triangle inequality and u, v B(R+ρ) L(u, v) c L u v for some constant c L. Suppose n is larger than the smallest K satisfying Equation (8) below. Then for any S Zn, with probability 1 β, the ρ-smooth-inverse-sensitivity mechanism with norm has error L (M ρ Inv(S; f), f(S)) ωL f (S; K) + c Lρ, K 2d log(R/ρ + 1) + 2 log(1/β) and ωL f (S; K) = sup S :d H(S,S ) K L (f(S), f(S )) denotes the local modulus of continuity of f at S with respect to loss function L. Proof. We define the good set of outputs A = {t Rd : lenρ(S; t) K}, where lenρ(S; t) is defined with respect to norm and K N. By the definition of ρ-smooth-inverse sensitivity, for any t A, there exists s T with len(S; s) = K and s t ρ. We will show that Pr[M ρ Inv(S) / A] β for sufficiently large K. This implies the desired upper bound as we have that for t A L(t, f(S)) L(t, s) + L(s, f(S)) (by triangle inequality) c L t s + ωL f (S; K) (since s B(R), t s ρ, so s, t B(R + ρ)) c Lρ + ωL f (S; K). Now we upper bound Pr[M ρ Inv(S; f) / A]. First, note that lenρ(S; u) = 0 for u such that u f(S) ρ. This implies that for any t such that lenρ(S; t) K, the density is upper bounded by πS(t) e Kε/2 R u: u f(S) ρ du Overall, this implies that Pr[M ρ Inv(S; f) / A] e Kε/2 R u: u R+ρ du R u: u f(S) ρ du e Kε/2(R/ρ + 1)d. Setting K 2d log(R/ρ+1)+2 log(1/β) ε , we get that Pr(M ρ Inv(S; f) / A) β. We are now ready to prove Theorem A.2. From Robustness to Privacy and Back Proof of Theorem A.2. First note that the claim about privacy is immediate from the guarantees of the smooth-inverse- sensitivity mechanism. Now we prove utility. Let K = 2 d log R α0 +1 +log 1 ε . The error of M ρ Inv(S, Arob) is then bounded as follows: L (M ρ Inv(S; Arob), µ) L (Arob(S), µ) + L (M ρ Inv(S; Arob), Arob(S)) (by triangle inequality) L (Arob(S), µ) + sup S :d H(S ,S) K L (Arob(S ), Arob(S)) + c Lα0 (w.p. 1 β by Theorem A.3) 2L (Arob(S), µ) + sup S :d H(S ,S) K L (µ, Arob(S )) + c Lα0 (by triangle inequality) Recall that, by assumption, Arob is (τ, β, α)-robust for τ K/n, and α0 α. Thus, with probability 1 β, L (Arob(S ), µ) α for any τ-corrupted dataset S . By union bound, we have that with probability 1 2β, L (M ρ Inv(S; Arob), µ) 3α + c Lα0 (3 + c L)α = O(α). A.3. Robust-to-Private Transformation for Sparse Estimators In this section, we extend our transformation to work for k-sparse statistical estimation problems with improved dependence on the dimension. To this end, we define a variant of the smooth inverse sensitivity which is non-zero only for k-sparse outputs, lensp(S; t) = ( infs Rd: s t ρ len(S; s) if t 0 k if t 0 > k Then, our sparse-variant of the inverse sensitivity mechanism M ρ sp applies the exponential mechanism with lensp as the score function, πS(t) = e lensp(S;t)ε/2 R s Rd e lensp(S;s)ε/2ds (9) We have the following upper bound for this mechanism. Theorem A.4. Let f : Zn T where T = {v Rd : v R} such that f(S) 0 k for all S Zn . Then for any S Zn, and β > 0, with probability at least 1 β, the (sparse) inverse-sensitivity mechanism (9) with norm has error L (M ρ Inv(S; f), f(S)) ωL f (S; K) + c Lρ, K 2 k (log(ed/k) + log(R/ρ + 1)) + log 1 and ωL f (S; K) = sup S :d H(S,S ) K L (f(S), f(S )) denotes the local modulus of continuity of f at S with respect to loss function L. Proof. The proof follows similar steps to the proof of Theorem A.3. We define the good set of outputs A = {t Rd : lensp(S; t) K}, where lensp(S; t) is defined with respect to norm and K N. By the definition of sparse inverse sensitivity, for any t A, t is k-sparse and there exists s T with len(S; s) = K and s t ρ. We will show that Pr[M ρ Inv(S) / A] β for sufficiently large K. This implies the desired upper bound as we have that for t A L(t, f(S)) L(t, s) + L(s, f(S)) (by triangle inequality) c L t s + ωL f (S; K) (since s B(R), t s ρ, so s, t B(R + ρ)) c Lρ + ωL f (S; K). Now we upper bound Pr[M ρ Inv(S; f) / A]. First, note that lenρ(S; u) = 0 for u such that u f(S) ρ. This implies that for any t such that lenρ(S; t) K, the density is upper bounded by 0 if t 0 > k, and otherwise, πS(t) e Kε/2 R u: u 0 k, u f(S) ρ du From Robustness to Privacy and Back Overall, this implies that Pr[M ρ Inv(S; f) / A] e Kε/2 R u Rd: u 0 k, u R+ρ du R u Rd: u 0 k, u f(S) ρ du u Rk: u R+ρ du u Rk: u ρ du e Kε/2(ed/k)k(R/ρ + 1)k. Setting K 2k(log(ed/k)+log(R/ρ+1))+2 log(1/β) ε , we get that Pr(M ρ Inv(S; f) / A) β. Using this mechanism, we now have the following transformation from robust-to-private for sparse estimators. Theorem A.5 (Robust-to-private for sparse estimators). Let S = (S1, . . . , Sn) where Si iid P such that µ(P) Rd. Let ε, β (0, 1). Let L : (Rd)2 R be a loss function which satisfies the triangle inequality. Let Arob : (Rd)n {t Rd : t R} be a (deterministic) (τ, β, α)-robust algorithm with respect to L such that Arob(S) 0 k for all S. Let α0 α. Suppose n is such that the smallest value τ satisfying Equation (11) is at most 1. Suppose u, v B(R + α0) L(u, v) c L u v for some constant c L. If τ 2 k (log(ed/k) + log(R/α0 + 1)) + log 1 then Algorithm M ρ sp(S; Arob) with ρ = α0 in norm is ε-DP and, with probability at least 1 2β, has error L M ρ sp(S; Arob), µ (3 + c L)α = O(α). We leave the proof as an exercise for the reader as it is identical to the proof of Theorem A.2 using the upper bounds for the sparse variant of the inverse sensitivity mechanism (Theorem A.4). A.4. Omitted proofs of Section 2 and Section 4.2 Theorem A.6 (Continuous functions, restatement of Theorem 2.4). Let f : Zn T where T = {v Rd : v R}. Then for any S Zn, and β > 0, with probability at least 1 β, the ρ-smooth inverse-sensitivity mechanism with norm has error M ρ Inv(S; f) f(S) ωf S; 2 d log R ρ + 1 + log 1 Proof. We define the good set of outputs A = {t Rd : lenρ(S; t) K}. By the definition of ρ-smooth inverse-sensitivity, for any t A, there exists s Rd with len(S; s) K and s t ρ. We will show that Pr[M ρ Inv(S) / A] β for sufficiently large K. This implies the desired upper bound as we have that for t A t f(S) = t s + s f(S) t s + s f(S) ρ + ωf(S; K). Now we upper bound Pr[M ρ Inv(S; f) / A]. First, note that lenρ(S; s) = 0 for s such that s f(S) ρ. This implies that for any t such that lenρ(S; t) K, the density is upper bounded by πS(t) e Kε/2 R s: s f(S) ρ ds. From Robustness to Privacy and Back Overall, this implies that Pr[M ρ Inv(S; f) / A] e Kε/2 R s: s R+ρ ds R s: s f(S) ρ ds e Kε/2(R/ρ + 1)d. Setting K 2d log(R/ρ+1)+2 log(1/β) ε , we get that Pr(M ρ Inv(S; f) / A) β. Corollary A.7 (Optimality, restatement of Corollary 4.2). Let P be a family of distributions and P P. Let µ be a 1-dimensional statistic where |µ(P)| 1. Let αpriv(ε, β) be the minimax error of any ε-DP algorithm with failure probability β 1/4 that estimates statistic µ(P). Let n > 1. Suppose that there exists a constant c such that the non-private error αpriv( , β) 1 nc for any β 1/2. Then there are constants c1 c2 > 0 such that βp = 1/nc1 and β p = 1/nc2, robust algorithm Arob, and a choice of ρ, such that the ρ-smooth inverse-sensitivity mechanism M ρ Inv( ; Arob) with privacy parameter ε achieves the minimax optimal error O(αpriv(ε, βp)) with probability 1 β p. Proof. Let αpriv(ε, β) be the minimax error for estimating µ under family P for any β 1/4. By Theorem 3.3 and Theorem A.1, there exists a deterministic τ-robust algorithm with τ = log(1/γ)/nε, with accuracy α1 = 2αpriv(ε, β) and failure probability β1 = 2β/γ. Via the transformation of Theorem 3.1, choosing ρ = 1 nc αpriv(ε, β), we can construct an ε-DP algorithm with failure probability 2β1 = 4β/γ and accuracy 4α1 = 8αpriv(ε, β), if τ = log(1/γ) nε 2 log(nc+1)+2 log(γ/(4β)) nε . Setting γ = (4β/(2n)c)2/3 satisfies the requirement. Thus, the ε-DP algorithm constructed via the transformation in Theorem 3.1 has error at most 8αpriv(ε, β) with failure probability at most β p = 4β/γ = (4β)1/3 (2n)2c/3. There exist constants c1 c2 > 0 such that β = βp = 1 nc1 and β p = 1 nc2 . B. Improved Transformation for Approximate DP In this section, we propose a different transformation for (ε, δ)-DP that avoids the necessary dependence on diameter for pure ε-DP. Our transformation uses a truncated version of the inverse-sensitivity mechanism which only outputs values with bounded inverse sensitivity. This mechanism is not differentially private for all inputs, therefore, in order to guarantee privacy, we use a private test to verify that the input is well-behaved before running the truncated inverse-sensitivity mechanism. This approach can be viewed as a special case of the restricted exponential mechanism of Brown et al. (2021) (or the even more general HPTR framework (Liu et al., 2022)), which in turn has been inspired by the propose-test-release (PTR) framework (Dwork and Lei, 2009). However, we choose a simplified algorithm and presentation, which is tailored to our case, where we have the smooth inverse sensitivity as our cost function. B.1. Truncated Inverse-Sensitivity Mechanism We develop a truncated version of the inverse-sensitivity mechanism which is (ε, δ)-DP. This mechanism uses a truncated version of the inverse sensitivity as follows: given a function f : Zn Rd and threshold K, lentrunc f (S; t) := ( lenρ f(S; t) if lenρ f(S; t) K otherwise. The truncated inverse-sensitivity mechanism M ρ trunc( ; f) then applies the exponential mechanism using this score function, resulting in the following density given an input dataset S: πS(t) = e lentrunc f (S;t)ε/2 R s Rd e lentrunc f (S;s)ε/2ds (12) Before proving the guarantees of the truncated inverse-sensitivity mechanism, we need to define (ε, δ)-indistinguishable distributions: Definition B.1 ((ε, δ)-indistinguishability). Two distributions P, Q over domain T are (ε, δ)-indistinguishable, denoted by P ε,δ Q, if for any measurable subset T T , Pr t P[t T] eε Pr t Q[t T] + δ and Pr t Q[t T] eε Pr t P[t T] + δ. From Robustness to Privacy and Back Note that if A(S) ε,δ A(S ) for any neighboring datasets S, S , then A is (ε, δ)-differentially private. We have the following guarantees for the truncated inverse-sensitivity mechanism. Proposition B.2. Let n 1, ε, δ (0, 1), B > 0, and f : Zn Rd. Let K d+log(1/δ) ε and Sbad = {S Zn : ωf(S; K + 1) > B}. For any S / Sbad, the truncated inverse-sensitivity mechanism (12) with ρ = 2B has error M ρ trunc(S; f) f(S) 3B. Moreover, for any S / Sbad and neighboring dataset S , M ρ trunc(S; f) ε,δ M ρ trunc(S ; f). Proof. The claim about utility follows directly from the definition of the truncated inverse-sensitivity as the probability of returning t such that lentrunc f (S; t) K is zero. Now we proceed to prove the privacy claim. Let S Sc bad and S Zn be two neighboring datasets and T T . Since ωf(S; K + 1) B, we have that ωf(S ; K) 2B. Thus, it suffices to show that for any two neighboring datasets S and S such that ωf(S ; K) 2B and ωf(S; K) 2B, we have Pr[M ρ trunc(S; f) T] eε Pr[M ρ trunc(S ; f) T] + δ. Let T0 = {t T : lenρ f(S; t) = K}. Now we have Pr[M ρ trunc(S; f) T] = Pr[M ρ trunc(S; f) T \ T0] + Pr[M ρ trunc(S; f) T T0] eε Pr[M ρ trunc(S ; f) T \ T0] + e Kε Vol(Bd(ρ))Vol(T T0) eε Pr[M ρ trunc(S ; f) T] + e Kε Vol(Bd(ρ))Vol(T T0), where the first inequality follows since for t / T0 we have that either |lentrunc f (S; t) lentrunc f (S ; t)| 1 or lentrunc f (S; t) = . Since ωf(S; K) 2B, we get Vol(T T0) Vol(T0) Vol(Bd(2B + ρ)). For ρ = 2B, this implies that Vol(T T0)/Vol(Bd(ρ)) Vol(Bd(4B))/Vol(Bd(2B)) = 2d. Therefore, for K d+log(1/δ) ε , we have that Pr[M ρ trunc(S; f) T] eε Pr[M ρ trunc(S ; f) T] + δ. By symmetry, we can use the same argument, to show that Pr[M ρ trunc(S ; f) T] eε Pr[M ρ trunc(S; f) T] + δ. Thus, overall we show that M ρ trunc(S; f) ε,δ M ρ trunc(S ; f) for all S / Sbad and neighboring dataset S : d H(S, S ) 1. While it may seem that the truncated inverse sensitivity provides the desired transformation for (ε, δ)-DP, note that it requires the condition ωf(S; K + 1) B to hold for all inputs S Zn. However, robust algorithms only guarantee boundedness of ωf(S; K + 1) for S iid P n. To this end, in the next section we show how to use propose-test-release (PTR) in order to overcome this barrier. B.2. PTR-based Transformation Building on the truncated inverse-sensitivity mechanism, in this section we use propose-test-release (PTR) to design a transformation from robust algorithms into approximate (ε, δ)-DP algorithms where the error does not depend on the diameter. An equivalent approach would be to use the restricted exponential mechanism from (Brown et al., 2021) with the smooth inverse-sensitivity lenρ f(S; t) as its cost function. The main idea of this approach is to perform a private test to check if the input S is far from unsafe , before running the exponential mechanism restricted to points t with lenρ f(S; t) K. The set unsafe consists of datasets on which running the restricted exponential mechanism would not produce (ε, δ)- indistinguishable outputs. However, our specific score function allows us to simplify the unsafe set, and this is the algorithm we present in this section. The following theorem states its guarantees. We present our transformation in Algorithm 3. Theorem B.3 (Robust-to-private, approximate DP, restatement of Theorem 5.1). Let S = (S1, . . . , Sn) where Si iid P such that µ(P) Rd. Let ε, δ, β (0, 1). Let Arob : (Rd)n Rd be a deterministic (τ, β, α)-robust algorithm for the statistic µ. If τ 8(d+log(1/ min{δ,β})) nε then Algorithm 1 with B = 2α and ρ = 2B is (ε, δ)-DP and, with probability at least 1 2β returns ˆµ such that ˆµ µ 7α. Proof. We start by proving the privacy guarantees of Algorithm 1. Note that the Laplace mechanism (Dwork et al., 2006) implies that ˆd is ε/2-DP as the function d has sensitivity 1. By assumption on τ, K = nτ/2 1 2(d+log(2/δ)) ε . Thus, by Proposition B.2, for input dataset S, if ωf(S; K + 1) B then the truncated inverse-sensitivity is (ε/2, δ/2)-DP. On the From Robustness to Privacy and Back Algorithm 3 Robust-to-Private ((ε, δ)-DP), restatement of Algorithm 1 Require: S = (S1, . . . , Sn), (τ, β, α)-robust algorithm Arob, local modulus bound B 1: Let K = nτ/2 1 2: Let Sbad = {S Zn : ωf(S; K + 1) > B} 3: Calculate d = dist(S, Sbad) 4: Set ˆd = d + ζ where ζ Laplace(2/ε) 5: if ˆd > 2 log(1/ min(δ, β))/ε then 6: Sample t from the truncated inverse-sensitivity mechanism (12) with threshold K, privacy parameter ε/2, smoothness parameter ρ = 2B, and return t. 7: else 8: Return 9: end if other hand, if ωf(S; K + 1) > B then d = 0 and therefore ˆd 2 log(1/δ)/ε with probability 1 δ/2 and the algorithm returns . Overall, by composition, Algorithm 1 is (ε, δ)-DP. We now prove the accuracy guarantee. Let S iid P n. By the guarantee of the robust algorithm, with probability 1 β, for all S such that d H(S, S ) τn, we get that Arob(S ) µ(P) α. Therefore, for any S 1 such that d H(S, S 1) τn/2, we have ωf(S 1; nτ/2) = sup S 2:d H(S 1,S 2) nτ/2 Arob(S 1) Arob(S 2) sup S 2:d H(S 1,S 2) nτ/2 ( Arob(S 1) µ(P) + µ(P) Arob(S 2) ) α + sup S 2:d H(S,S 2) nτ µ(P) Arob(S 2) Since B = 2α, we have that with probability 1 β, d => nτ/2 and in particular S / Sbad. By the concentration guarantees of the Laplace distribution, we have that ˆd > nτ/2 2 log(1/β)/ε with probability at least 1 β, and thus ˆd > 2 log(1/ min{β,δ}) ε , which implies that the algorithm will run the truncated inverse-sensitivity mechanism. Proposition B.2 now implies that the latter will return ˆµ such that ˆµ Arob(S) 3B. Moreover, Arob(S) µ α. Overall we get that with probability 1 2β, ˆµ µ ˆµ Arob(S) + Arob(S) µ 3B + α = 7α. This completes the proof of the theorem. C. More Applications for Pure DP In this section we apply our main transformation in Theorem 3.1 to fundamental tasks in private statistics to demonstrate that for all these tasks near-optimal error can be achieved by instantiating our black-box reduction with a robust estimator for the same task. In Appendix C.1 and Appendix C.2 we show that we can retrieve known optimal results for mean and covariance estimation of Gaussian distributions up to logarithmic factors. In Appendix C.3 and Section 6.1, we show that our transformation gives the first algorithms with optimal error for linear regression (including the sparse case) and PCA for Gaussian distributions. Our results for PCA hold for subgaussian distributions more generally. For the majority of this section we will use the more general transformation, proven in Appendix A.2Theorem A.2. The general strategy we follow in our applications is simple. We choose a known robust algorithm A for the statistic µ B(R) we want to estimate. Informally, let us denote its accuracy by α(τ), as it will be a function of the fraction of corruptions in the dataset τ (among other parameters). Applying our robust-to-private transformation from Theorem 3.1, we retrieve an ε-DP algorithm with accuracy roughly α(τ ) for τ d log(R /α0)+log(1/β) nε . More precisely, we let Arob be the algorithm that runs A and then projects its output on B(R ), where R is such that, with high probability, the projection From Robustness to Privacy and Back will have no effect and will maintain the accuracy guarantees of A. Let α0 be the error rate for learning statistic µ without privacy constraints or corruptions, which is always smaller than α(τ). We run the ρ-smooth-inverse-sensitivity mechanism instantiating it with the projected robust algorithm Arob and with smoothness parameter ρ = α0. In most applications, α(τ) = O(τ) so the error we incur on top of the non-private error α0 is O(d/nε). Theorem A.2extends Theorem 3.1 allowing us to measure the error of the algorithm with respect to a loss function L that may depend on unknown parameters and thus can not be computed directly. As long as this loss satisfies the triangle inequality, and any error we incur due to the smoothness ρ in norm upper-bounds the error in L up to constants, the statement of our main theorem still holds. For useful linear algebra facts and definitions, see Appendix D. C.1. Mean Estimation C.1.1. KNOWN COVARIANCE We start with the task of estimating the mean µ of a d-dimensional Gaussian distribution with known covariance Σ. By applying Σ 1/2 to all the points, this case can be reduced to spherical Gaussian mean estimation, where we can assume Σ = I. We also assume that we know a priori a bound R such that µ 2 R.2 Corollary C.1 states that via our transformation, we can retrieve the optimal error for Gaussian mean estimation with known covariance under pure DP, matching optimal bounds (Bun et al., 2019; Liu et al., 2021). Corollary C.1 (Spherical Gaussian mean). Let S = (S1, . . . , Sn) where Si iid N(µ, I) such that µ B(R). Let ε, β (0, 1) and C 1 a known constant. Suppose n is such that α 1 in Equation (13). There exists an ε-DP algorithm M such that, with probability at least 1 β, has error M(S) µ 2 α for n + d log Rn Since we are not concerned with computational efficiency, we will use the Tukey median as the robust Gaussian mean estimation algorithm for our transformation. The Tukey depth (Tukey, 1960) of a point t with respect to a distribution P is defined by TP (t) := inf v Rd Pr S P[ S, v t, v ]. We denote by TS(t) := 1 i [n][ Si, v t, v ] the (normalized) Tukey depth of t with respect to dataset S. The Tukey median with respect to any dataset S is then tm(S) = argmaxt Rd TS(t). Let ΠC(t) = argminv C v t 2 be the euclidean projection of a point t to convex set C. The next proposition states the robustness guarantees of (projected) Tukey median, which have been long-established (for a complete proof see e.g. (Li, 2019) or the more general Proposition D.4 in Appendix D). Proposition C.2. Let S = (S1, . . . , Sn) where Si iid N(µ, I) such that µ B(R). Let β (0, 1), τ 0.05, and α0 = C0 p (d + log(1/β))/n for a known constant C0 1. Suppose n is such that α0 0.05. Let α = 7(α0 + τ) 1. The projected Tukey median algorithm Arob(S) = ΠB(R+1)(tm(S)) is (τ, β, α)-robust. That is, with probability 1 β, for any τ-corrupted S , such that d H(S, S ) nτ, it holds that, Arob(S ) µ 2 α. Using the above proposition, the proof of Corollary C.1 is a straightforward application of Theorem 3.1. Proof of Corollary C.1. Consider the ρ-smooth-inverse-sensitivity mechanism M ρ Inv( ; Arob) with norm = 2, Arob(S) = ΠB(R+1)(tm(S)) and ρ = α0 = C0 p (d + log(1/β))/n, as in Proposition C.2 above. We apply Theorem 3.1 to obtain a bound on the error of M ρ Inv(S; Arob). Let τ = 2d log R+1 α0 + 1 + 2 log( 1 2Knowledge of R is necessary for mean estimation under pure DP (Hardt and Talwar, 2010; Beimel et al., 2014; Bun et al., 2019). From Robustness to Privacy and Back Assume τ 0.05 and α0 0.05, which we will confirm later. By Proposition C.2, Arob is (τ, β, α)-robust for α0 + 1 + 2 log( 1 n + d log Rn for constant C = 42C0. Notice that α0 α. Therefore, by Theorem 3.1, it holds that, with probability at least 1 2β, M ρ Inv(S; Arob) µ 2 4α C q β ) n + d log( Rn , for C = 168C0. By assumption, n is sufficiently large so that the latter is less than 1 and as such, it also ensures that α0 0.05 and τ 0.05. The proof is complete by rescaling β β/2 and adjusting the constants. C.1.2. UNKNOWN COVARIANCE We now move to the more general task of Gaussian mean estimation with unknown mean µ and covariance Σ, but with known a priori bounds R, κ such that µ Bd(R) and I Σ κI. The error metric is the affine-invariant Mahalanobis distance with respect to Σ, defined by ˆµ µ Σ := p (ˆµ µ) Σ 1(ˆµ µ). In Corollary C.3, we show that via our transformation, we retrieve known error bounds for Gaussian mean estimation with known parameters R, κ under pure DP (Bun et al., 2019; Liu et al., 2021).3 Corollary C.3 (Gaussian mean). Let S = (S1, . . . , Sn) where Si iid N(µ, Σ) such that µ B(R) and I Σ κI. Let ε, β (0, 1) and C 1 a known constant. Suppose n is such that α 1 in Equation (14). There exists an ε-DP algorithm M such that, with probability at least 1 β, has error M(S) µ Σ α for n + d log (R+ κ)n Again, we choose the projected Tukey median as our robust mechanism for this task. We state its guarantees for the Mahalanobis loss (proven in Appendix D). Proposition C.4. Let S = (S1, . . . , Sn) where Si iid N(µ, Σ) such that µ B(R) and I Σ κI. Let β (0, 1), τ 0.05, and α0 = C0 p (d + log(1/β))/n for known constant C0. Suppose n is such that α0 0.05. Let α = 7(α0+τ) 1. The projected Tukey median algorithm Arob(S) = ΠB(R+ κ)(tm(S)) is (τ, β, α)-robust with respect to the Mahalanobis loss. That is, with probability 1 β, for any τ-corrupted S , such that d H(S, S ) nτ, it holds that Arob(S ) µ Σ α. Using the above proposition, the proof of Corollary C.3 is a straightforward application of Theorem A.2. Proof of Corollary C.3. We let L(u, v) = u v Σ be the loss function. As a norm, L satisfies the triangle inequality. Moreover, s, t Rd L(s, t) c L s t 2 for c L = 1 since I Σ (by Proposition D.1 in Appendix D). Consider the ρ-smooth-inverse-sensitivity mechanism M ρ Inv( ; Arob) with norm = 2, Arob(S) = ΠB(R+ κ)(tm(S)) and ρ = α0 = C0 p (d + log(1/β))/n. We apply Theorem A.2 to obtain a bound on the mechanism s error with respect to L. Let τ = 2d log R+ κ α0 + 1 + 2 log( 1 3These results are stated for Σ = I, but can be extended to the case of unknown Σ such that I Σ κI, achieving the same error as in Corollary C.3 up to logarithmic factors. From Robustness to Privacy and Back Assume τ 0.05 and α0 0.05, which we will confirm later. By Proposition C.2, Arob is (τ, β, α)-robust for n + 7 2d log R+ κ α0 + 1 + 2 log( 1 n + d log (R+ κ)n for constant C = 28C0. Notice that α0 α. Therefore, by Theorem A.2, it holds that, with probability at least 1 2β, L(M ρ Inv(S; Arob), µ) = M ρ Inv(S; Arob) µ Σ 4α C n + d log (R+ κ)n for C = 112C0. By assumption, n is sufficiently large so that the latter is less than 1, and as such, it also ensures that α0 0.05 and τ 0.05. The proof is complete by rescaling β β/2 and adjusting the constants. C.2. Covariance Estimation Given dataset S iid N(0, Σ)n, where I Σ κI, our goal is to return an estimate ˆΣ Rd d, with small error, measured by the relative Frobenius norm: Σ 1/2 ˆΣΣ 1/2 I F. The task of covariance estimation for Gaussian distributions has been extensively studied both under robustness and differential privacy, and is particularly useful as a first step for learning a Gaussian distribution in total variation distance (see e.g. Corollary 2.14 in (Diakonikolas et al., 2016)). Note that the fact that the distribution is assumed to be zero-mean is w.l.o.g., as the general case can be reduced to the zero-mean case up to constant factors in the error, by letting the difference between a pair of nonzero-mean samples be a single zero-mean sample. In Corollary C.5, we show that via our transformation, we retrieve the optimal known error bounds for Gaussian covariance estimation with known parameter κ under pure DP (Bun et al., 2019; Aden-Ali et al., 2021).4 Corollary C.5 (Gaussian covariance). Let S = (S1, . . . , Sn) where Si iid N(0, Σ) such that I Σ κI. Let ε, β (0, 1). Suppose n is such that α 1 in Equation (15). There exists an ε-DP algorithm M such that, with probability at least 1 β, has error Σ 1/2M(S)Σ 1/2 I F α for polylog(nκ/β) There are several algorithms in the robust statistics literature that achieve near-optimal bounds for robust covariance estimation of Gaussian distributions, which can serve as a good instantiation of our transformation. The next theorem states the robust accuracy guarantees of the algorithm proposed in (Diakonikolas et al., 2017).5 Theorem C.6 ((Diakonikolas et al., 2017)). Let S = (S1, . . . , Sn) where Si iid N(0, Σ). Let β (0, 1), τ (0, 1). Suppose n Ω d2 log5(d/τβ) τ 2 . Let α = O (τ log (1/τ)) . There exists algorithm Arob which is (τ, β, α )-robust. That is, with probability 1 β, for any τ-corrupted S , such that d H(S, S ) nτ, it returns matrix Arob(S ) = ˆΣ such that Σ 1/2 ˆΣΣ 1/2 I F α . Proof of Corollary C.5. We will run the ρ-smooth-inverse-sensitivity mechanism over vectors RD, D = d2, with = 2. We denote by vec(V ) Rd2 the flattening of a matrix V Rd d, so that if vec(V ) = v, then Vi,j = vd(i 1)+j. Then vec(U) vec(V ) 2 = U V F. Let A be the robust algorithm established in Theorem C.6. We will instantiate our 4Knowledge of parameter κ is necessary for this task under pure DP (Bun et al., 2019; Alabi et al., 2022). 5This algorithm as well as other alternatives (Cheng et al., 2019; Li and Ye, 2020) are computationally efficient. It is possible that by using a computationally inefficient algorithm we would achieve smaller error up to logarithmic factors, but since we are not aiming to optimize for those factors, we chose the clearer statement from (Diakonikolas et al., 2017). From Robustness to Privacy and Back transformation with Arob(S) = ΠBD(R )(vec(A(S))) for R = 2 dκ, that is, after flattening the output of A, we take its euclidean projection on the D = d2-dimensional ball of radius R in ℓ2 norm. Let ˆΣ = A(S). We have that vec(ˆΣ) 2 = ˆΣ F Σ F + ˆΣ Σ F (triangle inequality) = Σ F + ΣΣ 1(ˆΣ Σ) F Σ F + Σ F Σ 1/2(ˆΣ Σ)Σ 1/2 F ( F sub-multiplicative, rotation-invariant) dκ 1 + Σ 1/2(ˆΣ Σ)Σ 1/2 F By Theorem C.6, the latter is at most dκ(1 + α ) with probability 1 β. Suppose α 1, which we will confirm last. Thus, with probability 1 β, the projection on the euclidean ball with radius R = 2 dκ will not affect the output of the algorithm and Arob will have the same accuracy guarantees as stated in Theorem C.6. We will let the loss function L : (Rd d)2 R be L(U, V ) = Σ 1/2(U V )Σ 1/2 F over matrices U, V . Our goal is then to return a matrix U with small error L(U, Σ).6 Note that L satisfies the triangle inequality since the Frobenius norm does. For all u, v Rd2, let V, U Rd d denote their corresponding matrices. We have that L(U, V ) = Σ 1(U V ) F (U V ) F = u v 2, since Σ 1 I and F is monotone. It follows that L satisfies all the requirements of Theorem A.2. Let α0 = O( p (d2 + log(1/β))/n) < 1, by assumption on n. We take τ which satisfies both τ = Ω 2D log(R /α0+1)+2 log(1/β) nε = Ω d2 log(κn/d)+log(1/β) nε (required by Theorem A.2) and τ = Ω q d2 log5(d/τβ) (required by Theorem C.6). Then Arob is (τ, β, α )-robust with polylog(nκ/β) We then have that M ρ Inv( , Arob) with ρ = α0 is ε-DP and with probability at least 1 2β, returns matrix ˆV , which has error L( ˆV , Σ) = Σ 1/2 ˆV Σ1/2 I F 4α = α. By assumption, n is large enough so that α 1 and as such α < 1 as well. The statement follows by rescaling β β/2 and adjusting the constants. C.3. Linear Regression In this section, we apply our transformation to obtain an algorithm for linear regression for Gaussian data under pure DP. To the best of our knowledge, Corollary C.7 gives the first (computationally inefficient) algorithm for pure DP which achieves the optimal error rate up to logarithmic factors for Gaussian distributions. Liu et al. (2022) gave the analogous result under approximate DP. Corollary C.7 (Gaussian Linear Regression). Let S = (S1, . . . , Sn) where for all i [n], Si = (Xi, yi) Rd R is generated by a linear model yi = X i θ + ηi for some unknown θ Bd(R), where Xi iid N(0, Σ), I Σ κI, and ηi iid N(0, σ2), independent from Xi. Let ε, β (0, 1). Let n + d log (R/σ+κ)n for a known constant C > 0. Suppose n is such that α/σ c for a known constant c (0, 1). Then there exists an ε-DP algorithm M such that, with probability at least 1 β, returns M(S) = ˆθ such that Σ 1/2(ˆθ θ) 2 α. 6We can straightforwardly convert any vector v Rd2 to a unique matrix V Rd d such that v = vec(V ). From Robustness to Privacy and Back Since the running time of the robust algorithm is not the bottleneck for the computational complexity of our proposed approach, we will instantiate our transformation with the (computationally inefficient) robust linear regression algorithm from (Gao, 2020). This algorithm achieves the information-theoretic optimal error for Gaussian distributions and is based on the notion of multivariate regression depth, similar to the Tukey depth we used for Gaussian mean estimation in Appendix C.1.7 Theorem C.8 (Theorem 3.2, (Gao, 2020)). Consider the setting of Corollary C.7. Let β (0, 1), τ (0, 1). Suppose n and τ are such that τ + p d/n < c for a known constant c (0, 1). Then there exists constant C > 0 and algorithm Arob which is (τ, β, α )-robust, for That is, with probability 1 β, for any τ-corrupted S , such that d H(S, S ) nτ, it returns Arob(S ) = ˆθ Rd such that Σ 1/2(ˆθ θ) 2 α . Proof of Corollary C.7. We will run the ρ-smooth-inverse-sensitivity mechanism in Rd with = 2. Let A be the robust algorithm established in Theorem C.8. We will instantiate our transformation with Arob(S) = ΠB(R )(A(S)) for R = R + σ κ, that is, we take the euclidean projection of A(S) on the ball of radius R in ℓ2 norm. Let ˆθ = A(S). We have that ˆθ 2 θ 2 + ˆθ θ 2 (triangle inequality) Σ 1/2(ˆθ θ) 2 R + κ Σ 1/2(ˆθ θ) 2 . Let α0 = C σ p (d + log(1/β))/n for C > 0 as in Theorem C.8 and τ = 2d log(R /α0 + 1) + 2 log(1/β) Assume that n is such that C q β ) n + τ < c for c (0, 1) and for C > 0 as in Theorem C.8, which we will confirm last. Then, the conditions of Theorem C.8 are satisfied, and with probability 1 β, R + κ Σ 1/2(ˆθ θ) 2 R + κα R + σ κ = R and the projection will not affect the output of the algorithm Arob. We will let the loss function L : (Rd)2 R be L(u, v) = Σ 1/2(u v) 2. Our goal is then to return a vector u with small error L(u, θ). Note that L satisfies the triangle inequality. For all u, v Rd, we have that L(u, v) = Σ 1/2(u v) 2 u v 2, since Σ 1/2 I. It follows that L satisfies all the requirements of Theorem A.2. Thus, M ρ Inv( , Arob) with ρ = α0 < α is ε-DP and with probability at least 1 2β, returns ˆu, which has error L(ˆu, θ) = Σ 1/2(ˆu θ) 2 4α . That is, there exists C > C , such that Σ 1/2(ˆu θ) 2 α, for n + d log((R + σ κ)n/(σd)) + log(1/β) By assumption n is sufficiently large so that the latter is smaller than σc, and as such, it ensures that C q β ) n + τ < c as well. The proof is complete by rescaling β β/2 and adjusting the constants. 7The result is stated for the weaker Huber s contamination model, but it holds under the strong contamination model as well. From Robustness to Privacy and Back C.3.1. SPARSE LINEAR REGRESSION We now apply our transformation to obtain an algorithm for sparse linear regression for Gaussian data under pure DP, which, to the best of our knowledge, is the first (computationally inefficient) algorithm this case that achieves near-optimal error rate. When the solution is known to be k-sparse, our transformation allows us to improve the dependence on dimension from d/nε to k log d/(nε) as we show in the next corollary. Corollary C.9 (Sparse Linear Regression). Let S = (S1, . . . , Sn) where for all i [n], Si = (Xi, yi) Rd R is generated by a linear model yi = X i θ + ηi for some unknown θ Bd(R), θ 0 k, where Xi iid N(0, Σ), I Σ κI, and ηi iid N(0, σ2), independent from Xi. Let ε, β (0, 1). Let k ) + log( 1 n + k log( ed k ) + k log (R/σ+κ)n for a known constant C > 0. Suppose n is such that α/σ c for a known constant c (0, 1). Then there exists an ε-DP algorithm M such that, with probability at least 1 β, returns M(S) = ˆθ such that Σ 1/2(ˆθ θ) 2 α. We use the robust algorithm for sparse linear regression by (Gao, 2020). Theorem C.10 (Theorem 3.2, (Gao, 2020)). Consider the setting of Corollary C.9. Let β (0, 1), τ (0, 1). Suppose n and τ are such that τ + p k log(ed/k)/n < c for a known constant c (0, 1). Then there exists constant C > 0 and algorithm Arob which is (τ, β, α )-robust, for k ) + log( 1 That is, with probability 1 β, for any τ-corrupted S , such that d H(S, S ) nτ, it returns Arob(S ) = ˆθ Rd such that Σ 1/2(ˆθ θ) 2 α . The proof of Corollary C.9 follows exactly the same steps as Corollary C.7, but uses the slightly modified inverse-sensitivity mechanism for sparse estimation and its guarantees in Theorem A.5 instead of Theorem 6.2. D. Useful Facts and Proofs for Applications D.1. Linear Algebra Facts and Definitions We denote by v M = M 1/2v = v M 1v the Mahalanobis norm of vector v with respect to M for any positive definite matrix M. Observe that v I = v 2. Proposition D.1. For positive definite matrices Σ1, Σ2, if Σ1 Σ2, then for any vector v, v Σ2 v Σ1. Let A Rd d. We denote the spectral norm of A by A 2 = sup{ Ax 2 : x Rd s.t. x 2 = 1} and its Frobenius norm by A F = q Pd j=1 Pd i=1 |Ai,j|2. It holds that A 2 A F D.2. Robustness Guarantee of Tukey Median We first state known properties of the Tukey depth for Gaussian datasets. The next proposition relates the Tukey depth of a point to its Mahalanobis distance from the mean (see e.g. Proposition D.2 in (Brown et al., 2021) for a proof). Here, Φ is the CDF of the univariate standard Gaussian. Proposition D.2. For any µ, y Rd and positive definite Σ, TN(µ,Σ)(y) = Φ( y µ Σ). The next proposition states the uniform convergence property of Tukey depth. It follows from standard uniform convergence of halfspaces (Vapnik and Chervonenkis, 1971), extended to the definition of Tukey depth (Donoho and Gasko, 1992; Burr and Fabrizio, 2017) (see e.g. (Liu et al., 2021) for a complete proof). From Robustness to Privacy and Back Proposition D.3 (Convergence of Tukey Depth). Let S = (S1, . . . , Sn) where Si iid N(µ, Σ). There exists constant C0 such that, with probability 1 β, for any v Rd, |TN(µ,Σ)(v) TS(v)| C0 q Proposition D.4 (Robust Accuracy of Tukey Median, Restatement of Proposition C.4). Let S = (S1, . . . , Sn) where Si iid N(µ, Σ) such that µ B(R) and I Σ κI. Let β (0, 1), τ 0.05, and α0 = C0 p (d + log(1/β))/n as in Proposition D.3. Suppose n is such that α0 0.05. Let α = 7(α0 + τ) 1. The projected Tukey median algorithm Arob(S) = ΠB(R+ κ)(tm(S)) is (τ, β, α)-robust with respect to the Mahalanobis loss. That is, with probability 1 β, for any τ-corrupted S , such that d H(S, S ) nτ, it holds that Arob(S ) µ Σ α. Proof. Let S be any τ-corruption of S, that is, d H(S, S ) nτ. Observe that |TS(v) TS (v)| τ for any v Rd by the definition of Tukey depth. Let t m = argmaxv Rd TS (v) be the Tukey median of the corrupted dataset. We condition on the event that the bound of Proposition D.3 holds, which occurs with probability 1 β. We have that TN(µ,Σ)(µ) = 1 2 (by Proposition D.2 since Φ(0) = 1 2 α0 (by Proposition D.3) 2 α0 τ (by definition of t m) TN(µ,Σ)(t m) 1 2 2α0 2τ (by Proposition D.3) Φ( t m µ Σ) 1 2 2α0 2τ (by Proposition D.2) 2Erf t m µ Σ 2(α0 + τ) (since Φ( z) = 1 It is easy to see that the following bound holds for the error function 0.84z Erf(z) for z [0, 1] (see e.g. Lemma 3.2 in (Canonne et al., 2020)). It follows that, t m µ Σ 4 2 0.84(α0 + τ) 7(α0 + τ) for α0 + τ 1/7, which holds by assumption. Thus, with probability 1 β, t m µ Σ α, for α = 7(α0 +τ) 1. Since we have assumed that µ 2 R, it follows that t m 2 µ 2 + t m µ 2 R + κ t m µ Σ R + κα R + κ, where the second inequality holds due to Proposition D.1. Then t m B(R + κ) and the projection will not affect the output.