# robust_and_differentially_private_mean_estimation__2c35af25.pdf Robust and differentially private mean estimation Xiyang Liu, Weihao Kong, Sham Kakade, Sewoong Oh Paul G. Allen School of Computer Science and Engineering, University of Washington {xiyangl,whkong,sham,sewoong}@cs.washington.edu In statistical learning and analysis from shared data, which is increasingly widely adopted in platforms such as federated learning and meta-learning, there are two major concerns: privacy and robustness. Each participating individual should be able to contribute without the fear of leaking one s sensitive information. At the same time, the system should be robust in the presence of malicious participants inserting corrupted data. Recent algorithmic advances in learning from shared data focus on either one of these threats, leaving the system vulnerable to the other. We bridge this gap for the canonical problem of estimating the mean from i.i.d. samples. We introduce PRIME, which is the first efficient algorithm that achieves both privacy and robustness for a wide range of distributions. We further complement this result with a novel exponential time algorithm that improves the sample complexity of PRIME, achieving a near-optimal guarantee and matching a known lower bound for (non-robust) private mean estimation. This proves that there is no extra statistical cost to simultaneously guaranteeing privacy and robustness. 1 Introduction When releasing database statistics on a collection of entries from individuals, we would ideally like to make it impossible to reverse-engineer each individual s potentially sensitive information. Privacy-preserving techniques add just enough randomness tailored to the statistical task to guarantee protection. At the same time, it is becoming increasingly common to apply such techniques to databases collected from multiple sources, not all of which can be trusted. Emerging data access frameworks, such as federated analyses across users devices or data silos [50], make it easier to temper with such collected datasets, leaving private statistical analyses vulnerable to a malicious corruption of a fraction of the data. Differential privacy has emerged as a widely accepted de facto measure of privacy, which is now a standard in releasing the statistics of the U.S. Census data [2] statistics and also deployed in real-world commercial systems [74, 40, 41]. A statistical analysis is said to be differentially private (DP) if the likelihood of the (randomized) outcome does not change significantly when a single arbitrary entry is added/removed (formally defined in 1.2). This provides a strong privacy guarantee: even a powerful adversary who knows all the other entries in the database cannot confidently identify whether a particular individual is participating in the database based on the outcome of the analysis. This ensures plausible deniability, central to protecting an individual s privacy. In this paper, we focus on one of the most canonical problems in statistics: estimating the mean of a distribution from i.i.d. samples. For distributions with unbounded support, such as sub-Gaussian and heavy-tailed distributions, fundamental trade-offs between accuracy, sample size, and privacy have only recently been identified [58, 52, 54, 3] and efficient private estimators proposed. However, these approaches are brittle when a fraction of the data is corrupted, posing a real threat, referred to as data poisoning attacks [19, 79]. In defense of such attacks, robust (but not necessarily private) statistics has emerged as a popular setting of recent algorithmic and mathematical breakthroughs [73, 30]. 35th Conference on Neural Information Processing Systems (Neur IPS 2021). One might be misled into thinking that privacy ensures robustness since DP guarantees that a single outlier cannot change the estimation too much. This intuition is true only in a low dimension; each sample has to be an obvious outlier to significantly change the mean. However, in a high dimension, each corrupted data point can look perfectly uncorrupted but still shift the mean significant when colluding together (e.g., see Fig. 1). Focusing on the canonical problem of mean estimation, we introduce novel algorithms that achieve robustness and privacy simultaneously even when a fraction of data is corrupted arbitrarily. For such algorithms, there is a fundamental question of interest: do we need more samples to make private mean estimation also robust against adversarial corruption? Sub-Gaussian distributions. If we can afford exponential run-time in the dimension, robustness can be achieved without extra cost in sample complexity. We introduce a novel estimator that (i) satisfies (ε, δ)-DP, (ii) achieves near-optimal robustness under α-fraction of corrupted data, achieving accuracy of O(α p log(1/α)) nearly matching the fundamental lower bound of Ω(α) that holds even for a (non-private) robust mean estimation with infinite samples, and (iii) achieves near-optimal sample complexity matching that of a fundamental lower bound for a (non-robust) private mean estimation as shown in Table 1. Theorem 1 (Informal Theorem 7, exponential time). Algorithm 2 is (ε, δ)-DP. When α fraction of the data is arbitrarily corrupted from n samples from a d-dimensional sub-Gaussian distribution with mean µ and an identity sub-Gaussian parameter, if n = eΩ(d/α2 + (d + d1/2 log(1/δ))/(αε)) then Algorithm 2 achieves ˆµ µ 2 = O(α p log(1/α)) w.h.p. We introduce PRIME (PRIvate and robust Mean Estimation) in 2.3 with details in Algorithm 9 in Appendix E.1, to achieve computational efficiency. It requires a run-time of only e O(d3 + nd2), but at the cost of requiring extra d1/2 factor larger number of samples. This cannot be improved upon with current techniques since efficient robust estimators rely on the top PCA directions of the covariance matrix to detect outliers. [78] showed that eΩ(d3/2) samples are necessary to compute PCA directions while preserving (ε, δ)-DP when xi 2 = O( d). It remains an open question if this eΩ(d3/2/(αε)) bottleneck is fundamental; no matching lower bound is currently known. Theorem 2 (Informal Theorem 6, polynomial time). PRIME is (ε, δ)-DP and under the assumption of Thm.1, if n = eΩ(d/α2 + (d3/2 log(1/δ))/(αε)), achieves ˆµ µ 2 = O(α p log(1/α)) w.h.p. Upper bound (poly-time) Upper bound (exp-time) Lower bound (ε, δ)-DP [52] e O( d α2 + d log1/2(1/δ) αε ) e O( d α2 + d αε) eΩ( d α-corruption [36] e O( d α2 ) e O( d α-corruption and e O d α2 + d3/2 log(1/δ) α2 + d+d1/2 log(1/δ) α2 + d αε) (ε, δ)-DP (this paper) [Theorem 6] [Theorem 7] [52] Table 1: For estimating the mean µ Rd of a sub-Gaussian distribution with a known covariance, we list the sufficient or necessary conditions on the sample sizes to achieve an error ˆµ µ 2 = e O(α) under (ε, δ)-DP, corruption of an α-fraction of samples, and both. requires the distribution to be a Gaussian [14] and requires δ Heavy-tailed distributions. When samples are drawn from a distribution with a bounded covariance, parameters of Algorithm 2 can be modified to nearly match the optimal sample complexity of (nonrobust) private mean estimation in Table 2. This algorithm also matches the fundamental limit on the accuracy of (non-private) robust estimation, which in this case is Ω(α1/2). Theorem 3 (Informal Theorem 8, exponential time). From a distribution with mean µ Rd and covariance Σ I, n samples are drawn and α-fraction is corrupted. Algorithm 2 is (ε, δ)-DP and if n = eΩ((d + d1/2 log(1/δ))/(αε) + d1/2 log3/2(1/δ)/ε) achieves ˆµ µ 2 = O(α1/2) w.h.p. The proposed PRIME-HT for covariance bounded distributions achieve computational efficiency at the cost of an extra factor of d1/2 in sample size. This bottleneck is also due to DP PCA, and it remains open whether this gap can be closed by an efficient estimator. Theorem 4 (Informal Theorem 9, polynomial time). PRIME-HT is (ε, δ)-DP and if n = eΩ((d3/2 log(1/δ))/(αε)) achieves ˆµ µ 2 = O(α1/2) w.h.p. under the assumptions of Thm. 3. Upper bound (poly-time) Upper bound (exp-time) Lower bound (ε, δ)-DP [54] e O( d log1/2(1/δ) αε ) e O( d log1/2(1/δ) αε) α-corruption [36] e O( d α-corruption and e O d3/2 log(1/δ) αε e O( d+d1/2 log3/2(1/δ) αε) (ε, δ)-DP (this paper) [Theorem 9] [Theorem 8] ([54]) Table 2: For estimating the mean µ Rd of a covariance bounded distribution, we list the sufficient or necessary conditions on the sample size to achieve an error ˆµ µ 2 = O(α1/2) under (ε, δ)-DP, corruption of an α-fraction of samples, and both. 1.1 Technical contributions We introduce PRIME which simultaneously achieves (ε, δ)-DP and robustness against α-fraction of corruption. A major challenge in making a standard filter-based robust estimation algorithm (e.g., [30]) private is the high sensitivity of the filtered set that we pass from one iteration to the next. We propose a new framework which makes private only the statistics of the set, hence significantly reducing the sensitivity. Our major innovation is a tight analysis of the end-to-end sensitivity of this multiple interactive accesses to the database. This is critical in achieving robustness while preserving privacy and is also of independent interest in making general iterative filtering algorithms private. The classical filter approach (see, e.g. [30]) needs to access the database O(d) times, which brings an extra O( d) factor in the sample complexity due to DP composition. In order to reduce the iteration complexity, following the approach in [36], we propose filtering multiple directions simultaneously using a new score based on the matrix multiplicative weights (MMW). In order to privatize the MMW filter, our major innovation is a novel adaptive filtering algorithm DPTHRESHOLD( ) that outputs a single private threshold which guarantees sufficient progress at every iteration. This brings the number of database accesses from O(d) to O((log d)2). One downside of PRIME is that it requires an extra d1/2 factor in the sample complexity, compared to known lower bounds for (non-robust) DP mean estimation. To investigate whether this is also necessary, we propose a sample optimal exponential time robust mean estimation algorithm in 4 and prove that there is no extra statistical cost to jointly requiring privacy and robustness. Our major technical innovations is in using resilience property of the dataset to not only find robust mean (which is the typical use case of resilience) but also bound sensitivity of that robust mean. 1.2 Preliminary on differential privacy (DP) DP is a formal metric for measuring privacy leakage when a dataset is accessed with a query [37]. Definition 1.1. Given two datasets S = {xi}n i=1 and S = {x i}n i=1, we say S and S are neighboring if d (S, S ) 1 where d (S, S ) max{|S \ S |, |S \ S|}, which is denoted by S S . For an output of a stochastic query q on a database, we say q satisfies (ε, δ)-differential privacy for some ε > 0 and δ (0, 1) if P(q(S) A) eεP(q(S ) A) + δ for all S S and all subset A. Let z Lap(b) be a random vector with entries i.i.d. sampled from Laplace distribution with pdf (1/2b)e |z|/b. Let z N(µ, Σ) denote a Gaussian random vector with mean µ and covariance Σ. Definition 1.2. The sensitivity of a query f(S) Rk is defined as p = sup S S f(S) f(S ) p for a norm x p = (P i [k] |xi|p)1/p. For p = 1, the Laplace mechanism outputs f(S) + Lap( 1/ε) and achieves (ε, 0)-DP [37]. For p = 2, the Gaussian mechanism outputs f(S) + N(0, ( 2( p 2 log(1.25/δ))/ε)2I) and achieves (ε, δ)-DP [38]. We use these output perturbation mechanisms along with the exponential mechanism [69] as building blocks. Appendix A provides detailed survey of privacy and robust estimation. 1.3 Problem formulation We are given n samples from a sub-Gaussian distribution with a known covariance but unknown mean, and α fraction of the samples are corrupted by an adversary. Our goal is to estimate the unknown mean. We follow the standard definition of adversary in [30], which can adaptively choose which samples to corrupt and arbitrarily replace them with any points. Assumption 1. An uncorrupted dataset Sgood consists of n i.i.d. samples from a d-dimensional sub-Gaussian distribution with mean µ Rd and covariance E[xx ] = Id, which is 1-sub-Gaussian, i.e., E[exp(v x)] exp( v 2 2/2) for all v Rd. For some α (0, 1/2), we are given a corrupted dataset S = {xi Rd}n i=1 where an adversary adaptively inspects all the samples in Sgood, removes αn of them, and replaces them with Sbad which are αn arbitrary points in Rd. Similarly, we consider the same problem for heavy-tailed distributions with a bounded covariance. We present the assumption and main results for covariance bounded distributions in Appendix B. Notations. Let [n] = {1, 2, . . . , n}. For x Rd, we use x 2 = (P i [d](xi)2)1/2 to denote the Euclidean norm. For X Rd d, we use X 2 = max v 2=1 Xv 2 to denote the spectral norm. The d d identity matrix is Id d. Whenever it is clear from context, we use S to denote both a set of data points and also the set of indices of those data points. e O and eΩhide poly-logarithmic factors in d, n, 1/α, and the failure probability. Outline. We present PRIME for sub-Gaussian distribution in 2, and present theoretical analysis in 3. We then introduce an exponential time algorithm with near optimal guarantee in 4. Due to space constraints, analogous results for heavy-tailed distributions are presented in Appendix B. 2 PRIME: efficient algorithm for robust and DP mean estimation In order to describe the proposed algorithm PRIME, we need to first describe a standard (non-private) iterative filtering algorithm for robust mean estimation. 2.1 Background on (non-private) iterative filtering for robust mean estimation Non-private robust mean estimation approaches recursively apply the following filter, whose framework is first proposed in [28]. Given a dataset S = {xi}n i=1, the current set S0 [n] of data points is updated starting with S1 = [n]. At each step, the following filter (Algorithm 1 in [63]) attempts to detect the corrupted data points and remove them. 1. Compute the top eigenvector vt arg maxv: v 2=1 v Cov(St 1)v of the covariance of the current data set {xi}i St 1 ; 2. Compute scores for all data points j St 1: τj v t (xj Mean(St 1)) 2 ; 3. Draw a random threshold: Zt Unif([0, 1]) ; 4. Remove outliers from St 1 defined as {i St 1 : τi is in the largest 2α-tail of {τj}j St 1 and τi Zt τmax}, where τmax = maxj St 1 τj This is repeated until the empirical covariance is sufficiently small and the empirical mean ˆµ is output. At a high level, the correctness of this algorithm relies on the key observation that the α-fraction of adversarial corruption can not significantly change the mean of the dataset without introducing large eigenvalues in the empirical covariance. Therefore, the algorithm finds top eigenvector of the empirical covariance in step 1, and tries to correct the empirical covariance by removing corrupted data points. Each data point is assigned a score in step 2 which indicates the badness of the data points, and a threshold Zt in step 3 is carefully designed such that step 4 guarantees to remove more corrupted data points than good data points (in expectation). This guarantees the following bound achieving the near-optimal sample complexity shown in the second row of Table 1. A formal description of this algorithm is in Algorithm 4 in Appendix C. Proposition 2.1 (Corollary of [63, Theorem 2.1]). Under assumption 1, the above filtering algorithm achieves accuracy ˆµ µ 2 O(α p log(1/α)) w.p. 0.9 if n eΩ(d/α2) . Challenges in making robust mean estimation private. To get a DP and robust mean, a naive attempt is to apply a standard output perturbation mechanism to ˆµ. However, this is obviously challenging since the end-to-end sensitivity is intractable. The standard recipe to circumvent this is to make the current state St private at every iteration. Once St 1 is private (hence, public knowledge), making the next state St private is simpler. We only need to analyze the sensitivity of a single step and apply some output perturbation mechanism with (εt, δt). End-to-end privacy is guaranteed by accounting for all these (εt, δt) s using the advanced composition [51]. This recipe has been quite successful, for example, in training neural networks with (stochastic) gradient descent [1], where the current state can be the optimization variable xt. However, for the above (non-private) filtering algorithm, this standard recipe fails, since the state St is a set and has large sensitivity. Changing a single data point in St can significantly alter which (and how many) samples are filtered out. 2.2 A new framework for private iterative filtering Instead of making the (highly sensitive) St itself private, we propose a new framework which makes private only the statistics of St: the mean µt and the top principal direction vt. There are two versions of this algorithm, which output the exactly same ˆµ with the exactly same privacy guarantees, but are written from two different perspectives. We present here the interactive version from the perspective of an analyst accessing the dataset via DP queries (qrange, qsize, qmean, qnorm and q PCA), because this version makes clear the inner operations of each private mechanisms, hence making (i) the sensitivity analysis transparent, (ii) checking the correctness of privacy guarantees easy, and (iii) tracking privacy accountant simple. In practice, one should implement the centralized version (Algorithm 7 in Appendix D), which is significantly more efficient. Algorithm 1: Private iterative filtering (interactive version) Input: S = {xi}i [n], α (0, 1/2), probability ζ (0, 1), # of iterations T = Θ(d), (ε, δ) 1 ( x, B) qrange(S, 0.01ε, 0.01δ) 2 ε1 min{0.99ε, 0.9}/(4 p 2T log(2/δ)), δ1 0.99δ/(8T) 3 if n < (4/ε1) log(1/(2δ1)) then Output: 4 for t = 1, . . . , T do 5 nt qsize({(µℓ, vℓ, Zℓ)}ℓ [t 1], ε1, x, B), if nt < 3n/4 then Output: 6 µt qmean({(µℓ, vℓ, Zℓ)}ℓ [t 1], ε1, x, B) 7 λt qnorm({(µℓ, vℓ, Zℓ)}ℓ [t 1], µt, ε1, x, B) 8 if λt (C 0.01)α log 1/α then Output: µt 9 vt q PCA({(µℓ, vℓ, Zℓ)}ℓ [t 1], µt, ε1, δ1, x, B)) 10 Zt Unif([0, 1]) Output: µt We give a high-level explanation of each step of Algorithm 1 here and give the formal definitions of all the queries in Appendix D. First, qrange returns (the parameters of) a hypercube x+[ B/2, B/2]d that is guaranteed to include all uncorrupted samples while preserving privacy. This is achieved by running d coordinate-wise private histograms and selecting xj as the center of the largest bin for the j-th coordinate. Since covariance is I, qrange returns a fixed B = 8σ p log(dn/ζ). Such an adaptive estimate of the support is critical in tightly bounding the sensitivity of all subsequent queries, which operate on the clipped dataset; all data points are projected as P x+[ B/2,B/2]d(x) = arg miny x+[ B/2,B/2]d y x 2 in all the queries that follow. With clipping, a single data point can now change at most by B The subsequent steps perform the non-private filtering algorithm of 2.1, but with private statistics µt and vt. As the set St changes over time, we lower bound its size (which we choose to be |St| > n/2) to upper bound the sensitivity of other queries qmean, qnorm and q PCA. At the t-th iterations, every time a query is called the data curator (i) uses ( x, B) to clip the data, (ii) computes St by running t 1 steps of the non-private filtering algorithm of 2.1 but with a given fixed set of parameters {(µℓ, vℓ)}ℓ [t 1] (and the given randomness {Zℓ}ℓ [t 1]), and (iii) computes the queried private statistics of St. If the private spectral norm of the covariance of St (i.e., λt) is sufficiently small, we output the private and robust mean ˆµ = µt (line 8). Otherwise, we compute the private top PCA direction vt and draw an randomness Zt to be used in the next step of filtering, as in the non-private filtering algorithm. We emphasize that {Sℓ} are not private, and hence never returned to the analyst. We also note that this interactive version is redundant as every query is re-computing St. In our setting, the analyst has the dataset and there is no need to separate them. This leads to a centralized version we provide in Algorithm 7 in the appendix, which avoids redundant computations and hence is significantly more efficient. The main challenge in this framework is the privacy analysis. Because {Sℓ}ℓ [t 1] is not private, each query runs t 1 steps of filtering whose end-to-end sensitivity could blow-up. Algorithmically, (i) we start with a specific choice of a non-private iterative filtering algorithm (among several variations that are equivalent in non-private setting but widely differ in its sensitivity), and (ii) make appropriate changes in the private queries (Algorithm 1) to keep the sensitivity small. Analytically, the following key technical lemma allows a sharp analysis of the end-to-end sensitivity of iterative filtering. Lemma 2.2. Let St(S) denote the resulting subset of samples after t iterations of the filtering in the queries (qsize, qmean, qnorm, and q PCA) are applied to a dataset S using fixed parameters {(µℓ, vℓ, Zℓ)}t ℓ=1. Then, we have d (St(S), St(S )) d (S, S ), where d (S, S ) max{|S \ S |, |S \ S|}. Recall that two datasets are neighboring, i.e., S S , iff d (S, S ) 1. This lemma implies that if two datasets are neighboring, then they are still neighboring after filtering with the same parameters, no matter how many times we filter them. Hence, this lemma allows us to use the standard outputperturbation mechanisms with (ε1, δ1)-DP. Advanced composition ensures that end-to-end guarantee of 4T such queries is (0.99ε, 0.99δ)-DP. Together with (0.01ε, 0.01δ)-DP budget used in qrange, this satisfied the target privacy. Analyzing the utility of this algorithm, we get the following guarantee. Theorem 5. Algorithm 1 is (ε, δ)-DP. Under Assumption 1, there exists a universal constant c (0, 0.1) such that if α c and n = eΩ (d/α2) + d2(log(1/δ))3/2/(εα) then Algorithm 1 achieves ˆµ µ 2 O(α p log(1/α)) with probability 0.9. The first term O(d/α2) in the sample complexity is optimal (cf. Table 1), but there is a factor of d gap in the second term. This is due to the fact that we need to run O(d) iterations in the worst-case. Such numerous accesses to the database result in large noise to be added at each iteration, requiring large sample size to combat that extra noise. We introduce PRIME to reduce the number of iterations to O((log d)2) and significantly reduce the sample complexity. 2.3 PRIME: novel robust and private mean estimator Algorithm 1 (specifically Filter( ) in Algorithm 1) accesses the database O(d) times. This is necessary for two reasons. First, the filter checks only one direction vt at each iteration. In the worst case, the corrupted samples can be scattered in Ω(d) orthogonal directions such that the filter needs to be repeated O(d) times. Secondly, even if the corrupted samples are clustered together in one direction, the filter still needs to be repeated O(d) times. This is because we had to use a large (random) threshold of d B2Zt = O(d) to make the threshold data-independent so that we can keep the sensitivity of Filter( ) low, which results in slow progress. We propose filtering multiple directions simultaneously using a new score {τi} based on the matrix multiplicative weights. Central to this approach is a novel adaptive filtering algorithm DPTHRESHOLD( ) that guarantees sufficient decrease in the total score at every iteration. 2.3.1 Matrix Multiplicative Weight (MMW) scoring The MMW-based approach, pioneered in [36] for non-private robust mean estimation, filters out multiple directions simultaneously. It runs over O(log d) epochs and every epoch consists of O(log d) iterations. At every epoch s and iteration t, step 2 of the iterative filtering in 2.1 is replaced by a new score τi = (xi Mean(S(s) t ))T U (s) t (xi Mean(S(s) t )) where U (s) t now accounts for all directions in Rd but appropriately weighted. Precisely, it is defined via the matrix multiplicative update: U (s) t = exp α(s) P r [t](Cov(S(s) r ) I) Tr exp(α(s) P r [t](Cov(S(s) r ) I)) , for some choice of α(s) > 0. If we set the number of iterations to one, a choice of α(s) = recovers the previous score that relied on the top singular vector from 2.1 and a choice of α(s) = 0 gives a simple norm based score τi = xi 2 2. An appropriate choice of α(s) smoothly interpolates between these two extremes, which ensures that O(log d) iterations are sufficient for the spectral norm of the covariance to decrease strictly by a constant factor. This guarantees that after O(log d) epochs, we sufficiently decrease the covariance to ensure that the empirical mean is accurate enough. Critical in achieving this gain is our carefully designed filtering algorithm DPTHRESHOLD that uses the privately computed MMW-based scores using Gaussian mechanism on the covariance matrices as shown in Algorithm 11 in Appendix E. 2.3.2 Adaptive filtering with DPTHRESHOLD Novelty. The corresponding non-private filtering of [36, Algorithm 9] for robust mean estimation takes advantage of an adaptive threshold, but filters out each sample independently resulting in a prohibitively large sensitivity; the coupling between each sample and the randomness used to filter it can change widely between two neighboring datasets. On the other hand, Algorithm 1 (i.e., Filter( ) in Algorithm 6) takes advantage of jointly filtering all points above a single threshold B2d Zt with a single randomness Zt Unif[0, 1], but the non-adaptive (and hence large) choice of the range B2d results in a large number of iterations because each filtering only decrease the score by little. To sufficiently reduce the total score while maintaining a small sensitivity, we introduce a filter with a single and adaptive threshold. Algorithm. Our goal here is to privately find a single scalar ρ such that when a randomized filter is applied on the scores {τi} with a (random) threshold ρZ (with Z drawn uniform in [0, 1]), we filter out enough samples to make progress in each iteration while ensuring that we do not remove too many uncorrupted samples. This is a slight generalization of the non-private algorithm in Section 2.1, which simply set ρ = maxj St τj. While this guarantees the filter removes more corrupted samples than good samples, it does not make sufficient progress in reducing the total score of the samples. Ideally, we want the thresholding to decrease the total score by a constant multiplicative factor, which will in the end allow the algorithm to terminate within logarithmic iterations. To this end, we propose a new scheme of using the largest ρ such that the following inequality holds: X τi>ρ (τi ρ) 0.31 X τi St (τi 1) . (1) We use a private histogram of the scores to approximate this threshold. Similar to [55, 58], we use geometrically increasing bin sizes such that we use only O(log B2d) bins while achieving a preferred multiplicative error in our quantization. At each epoch s and iteration t, we run DPTHRESHOLD sketched in the following to approximate ρ followed by a random filter. Step 3 replaces the non-private condition in Eq. (1). A complete description is provided in Algorithm 11. 1. Privately compute scores for all data points i S(s) t : τi (xi µt) U (s) t (xi µt) ; 2. Compute a private histogram { hj}2+log(B2d) j=1 of the scores over geometrically sized bins I1 = [1/4, 1/2), I2 = [1/2, 1), . . . , I2+log(B2d) = [2log(B2d) 1, 2log(B2d)] ; 3. Privately find the largest ℓsatisfying P j ℓ(2j 2ℓ) hj 0.31 P i S(s) t (τi 1) ; 4. Output ρ = 2ℓ. 3 Analyses of PRIME Building on the framework of Algorithm 1, PRIME (Algorithm 9) replaces the score with the MMWbased score presented in 2.3.1 and the filter with the adaptive DPTHRESHOLD. This reduces the number of iterations to T = O((log d)2) achieving the following bound. Theorem 6. PRIME is (ε, δ)-differentially private. Under Assumption 1 there exists a universal constant c (0, 0.1) such that if α c and n = eΩ((d/α2) + (d3/2/(εα)) log(1/δ)), then PRIME achieves ˆµ µ 2 = O(α p log(1/α)) with probability 0.9. A proof is provided in Appendix F. The notation eΩ( ) hides logarithmic terms in d, R, and 1/α. To achieve an error of O(α p log(1/α)), the first term eΩ(d/α2 log(1/α)) is necessary even if there is no corruption. The accuracy of O(α p log(1/α)) matches the lower bound shown in [33] for any polynomial time statistical query algorithm, and it nearly matches the information theoretical lower bound on robust estimation of Ω(α). On the other hand, the second term of eΩ(d3/2/(εα log(1/α))) has an extra factor of d1/2 compared to the optimal one achieved by exponential time Algorithm 2. It is an open question if this gap can be closed by a polynomial time algorithm. The bottleneck is the private matrix multiplicative weights. Such spectral analyses are crucial in filter-based robust estimators. Even for a special case of privately computing the top principal component, the best polynomial time algorithm requires O(d3/2) samples [39, 18, 78], and this sample complexity is also necessary as shown in [39, Corollary 25]. To boost the success probability to 1 ζ for some small ζ > 0, we need an extra log(1/ζ) factor in the sample complexity to make sure the dataset satisfies the regularity condition with probability ζ/2. Then we can run PRIME log(1/ζ) times and choose the output of a run that satisfies n(s) > n(1 10α) and λ(s) Cα log(1/α) at termination. 0 20 40 60 80 100 Dimension d 2 error || ||2 DP Mean PRIME 0 20 40 60 80 100 Dimension d 2 error || ||2 DP Mean PRIME 10 2 10 1 100 101 102 0.0 2 error || ||2 DP Mean PRIME Figure 1: Private mean estimators (e.g., DP mean [52]) are vulnerable to adversarial corruption especially in high dimensions, while the proposed PRIME achieves robustness (and privacy) regardless of the dimension of the samples. Numerical experiments support our theoretical claims. The left figure with (α, ε, δ, n) = (0.05, 20, 0.01, 106) is in the large α regime where the DP Mean error is dominates by α d and PRIME error by α p log(1/α). Hence, PRIME error is constant whereas DP Mean error increases with the dimension d. The second figure with (α, ε, δ, n) = (0.001, 20, 0.01, 106) is in the small α regime when DP Mean error consists of α d/n and PRIME is dominated by p d/n. Both increase with the dimension d, and the gap can be made large by increasing α. The right figure with (α, δ, d, n) = (0.1, 0.01, 10, 106) is when DP Mean error is dominated by α d and PRIME by α p log(1/α) when ε > cd1.5/(αn). Below this threshold, which happens in this example around ε = 0.05, the added noise in the private mechanism starts to dominate with decreasing ε. Both algorithms have respective thresholds below which the error increases with decreasing ε. This threshold is larger for PRIME because it uses the privacy budget to perform multiple operations and hence the noise added to the final output is larger compared to DP Mean. Below this threshold, which can be easily determined based on the known parameters (ε, δ, n, α), we should either collect more data (which will decrease the threshold) or give up filtering and spend all privacy budget on qrange and the empirical mean (which will reduce the error). Details of the experiments are in Appendix L. 4 Exponential time algorithm with near-optimal sample complexity Novelty. An existing exponential time algorithm for robust and private mean estimation in [14] strictly requires the uncorrupted samples to be drawn from a Gaussian distribution. We also provide a similar algorithm based on private Tukey median in Appendix I and its analysis in Appendix J. In this section, we introduce a novel estimator that achieves near-optimal guarantees for more general sub-Gaussian distributions (and also covariance bounded distributions) but takes an exponential run-time. Its innovation is in leveraging on the resilience property of well-behaved distributions not only to estimate the mean robustly (which is the standard use of the property) but also to adaptively bound the sensitivity of the estimator, thus achieving optimal privacy-accuracy tradeoff. Definition 4.1 (Resilience from Definition 1 in [73]). A set of points {xi}i S lying in Rd is (σ, α)- resilient around a point µ if (1/|T|) P i T (xi µ) 2 σ for all subsets T S of size (1 α)|S|. Algorithm. As data is corrupted, we define R(S) as a surrogate for resilience of the uncorrupted part of the set. If S indeed consists of a 1 α fraction of independent samples from the promised class of distributions, the goodness score R(S) will be close to the resilience property of the good data. Definition 4.2 (Goodness of a set). For µ(S) = (1/|S|) P i S xi, let us define R(S) min S S,|S |=(1 2α)|S|. max T S ,|T |=(1 α)|S |. µ(T) µ(S ) 2 . Algorithm 2 first checks if the resilience matches that of the promised distribution. The data is pre-processed with qrange to ensure we can check R(S) privately. Once resilience is cleared, we can safely use the exponential mechanism based on the score function d(ˆµ, S) in Definition 4.3 to select an approximate robust mean ˆµ privately. The choice of the sensitivity critically relies on the fact that resilient datasets have small sensitivity of O((1/n) p log(1/α)). Without the resilience check, the sensitivity is O(d1/2/n) resulting in an extra factor of d in the sample complexity. Algorithm 2: Exponential-time private and robust mean estimation Input: S = {xi}i [n], α (0, 1/2), (ε, δ) 1 if n < cd1/2 log(1/δ)/ (εα p log(1/α)) then Output: [ cd1/2 log(1/δ)/ (εα) for hevay-tail] 2 ( x, B) qrange(S, (1/3)ε, (1/3)δ) [ qrange ht( ) for hevay-tail] 3 Project the data points onto the ball: xi PB d B/2( x)(xi), for all i [n] 4 b R(S) R(S) + Lap(3Bd1/2/(nε)) 5 if b R(S) > 2 α p log(1/α) then Output: [ b R(S) > 2cζ α for hevay-tail] 6 else Output: a randomly drawn point ˆµ B d B/2( x) sampled from a density 7 r(ˆµ) e (1/(24 log(1/α)))ε n d(ˆµ,S) [e (εn α/(24cζ))d(ˆµ,S) for heavy-tail] We propose the score function d(ˆµ, S) in the following definition, which is a robust estimator of the distance between the mean and the candidate ˆµ. Definition 4.3. For a set of data {xi}i S lying in Rd, for any v Sd 1, define T v to be the 3α|S| points with the largest v xi value, Bv to be the 3α|S| points with the smallest v xi value, and Mv = S \ (T v Bv). Define d(ˆµ, S) maxv Sd 1 v (µ(Mv) ˆµ) . Analysis. For any direction v, the truncated mean estimator µ(Mv) provides a robust estimation of the true mean along the direction v, thus the distance can be simply defined by taking the maximum over all directions v. We show the sensitivity of this simple estimator is bounded by the resilience property σ divided by n, which is O((1/n) p log(1/α)) once the resilience check is passed. This leads to the following near-optimal sample complexity. We provide a proof in Appendix H.2. Theorem 7 (Exponential time algorithm for sub-Gaussian distributions). Algorithm 2 is (ε, δ)-DP. Under Assumption 1, this algorithm achieves ˆµ µ 2 = O(α p log(1/α)) with probability 1 ζ if n = eΩ d + log 1 α + d log d p log(dn/ζ)/α + d1/2 log 1 Run-time. Computing R(S) exactly can take O(deΘ(n)) operations. The exponential mechanism implemented with α-covering for ˆµ and a constant covering for v can take O(nd( p log(dn/ζ)/α)d) operations. 5 Conclusion Differentially private mean estimation is brittle against a small fraction of the samples being corrupted by an adversary. We show that robustness can be achieved without any increase in the sample complexity by introducing a novel DP mean estimator, which requires run-time exponential in the dimension of the samples. The technical contribution is in leveraging the resilience property of well-behaved distributions in an innovative way to not only find robust mean (which is the typical use case of resilience) but also bound sensitivity for optimal privacy guarantee. To cope with the computational challenge, we propose an efficient algorithm, which we call PRIME, that achieves the optimal target accuracy at the cost of an increased sample complexity. The technical contributions are (i) a novel framework for private iterative filtering and its tight analysis of the end-to-end sensitivity and (ii) novel filtering algorithm of DPTHRESHOLD which is critical in privately running matrix multiplicative weights and hence significantly reducing the number of accesses to the database. With appropriately chosen parameters, we show that our exponential time approach achieves near-optimal guarantees for both sub-Gaussian and covariance bounded distributions and PRIME achieves the same accuracy efficiently but at the cost of an increased sample complexity by a d1/2 factor. There are several directions for improving our results further and applying the framework to solve other problems. PRIME provides a new design principle for private and robust estimation. This can be more broadly applied to fundamental statistical analyses such as robust covariance estimation [28, 30, 64] robust PCA [60, 48], and robust linear regression [59, 35]. PRIME could be improved in a few directions. First, the sample complexity of eΩ((d/(α2 log(1/α)))+ (d3/2/(εα log(1/α))) log(1/δ)) in Theorem 6 is suboptimal in the second term. Improving the d3/2 factor requires bypassing differentially private singular value decomposition, which seems to be a challenging task. However, it might be possible to separate the log(1/δ) factor from the rest of the terms and get an additive error of the form eΩ((d/(α2 log(1/α))) + (d3/2/(εα log(1/α))) + (1/ε) log(1/δ)). This requires using Laplace mechanism in private MMW (line 16 Algortihm 10). Secondly, the time complexity of PRIME is dominated by computation time of the matrix exponential in (line 16 Algortihm 10). Total number of operations scale as e O(d3 + nd2). One might hope to achieve e O(nd) time complexity using approximate computations of τj s using techniques from [36]. This does not improve the sample complexity, as the number of times the dataset is accessed remains the same. Finally, for (non-robust) private mean estimation, COINPRESS provides a practical improvement in the small sample regime by progressively refining the search space [12]. The same principle could be applied to PRIME to design a robust version of COINPRESS. One important question remains open; how are differential privacy and robust statistics fundamentally related? We believe our exponential time algorithm hints on a fundamental connection between robust statistics of a data projected onto one-dimensional subspace and sensitivity of resulting score function for the exponential mechanism. It is an interesting direction to pursue this connection further to design novel algorithms that bridge privacy and robustness. Acknowledgement Sham Kakade acknowledges funding from the National Science Foundation under award CCF1703574. Sewoong Oh acknowledges funding from Google faculty research award, NSF grants IIS-1929955, CCF-1705007, CNS-2002664, CCF 2019844 as a part of Institute for Foundation of Machine Learning, and CNS-2112471 as a part of Institute for Future Edge Networks and Distributed Intelligence. [1] Martin Abadi, Andy Chu, Ian Goodfellow, H Brendan Mc Mahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308 318, 2016. [2] John M Abowd. The us census bureau adopts differential privacy. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2867 2867, 2018. [3] Ishaq Aden-Ali, Hassan Ashtiani, and Gautam Kamath. On the sample complexity of privately learning unbounded high-dimensional gaussians. ar Xiv preprint ar Xiv:2010.09929, 2020. [4] Zeyuan Allen-Zhu, Zhenyu Liao, and Lorenzo Orecchia. Spectral sparsification and regret minimization beyond matrix multiplicative updates. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 237 245, 2015. [5] Edoardo Amaldi and Viggo Kann. The complexity and approximability of finding maximum feasible subsystems of linear relations. Theoretical computer science, 147(1-2):181 210, 1995. [6] Frank J Anscombe. Rejection of outliers. Technometrics, 2(2):123 146, 1960. [7] Ainesh Bakshi and Pravesh Kothari. List-decodable subspace recovery via sum-of-squares. ar Xiv preprint ar Xiv:2002.05139, 2020. [8] S. Balakrishnan, S. S. Du, J. Li, and A. Singh. Computationally efficient robust sparse estimation in high dimensions. In Proceedings of the 30th Conference on Learning Theory, COLT 2017, pages 169 212, 2017. [9] Amos Beimel, Shay Moran, Kobbi Nissim, and Uri Stemmer. Private center points and learning of halfspaces. ar Xiv preprint ar Xiv:1902.10731, 2019. [10] K. Bhatia, P. Jain, P. Kamalaruban, and P. Kar. Consistent robust regression. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, pages 2107 2116, 2017. [11] Kush Bhatia, Prateek Jain, and Purushottam Kar. Robust regression via hard thresholding. In Advances in Neural Information Processing Systems, pages 721 729, 2015. [12] Sourav Biswas, Yihe Dong, Gautam Kamath, and Jonathan Ullman. Coinpress: Practical private mean and covariance estimation. ar Xiv preprint ar Xiv:2006.06618, 2020. [13] Avrim Blum, Cynthia Dwork, Frank Mc Sherry, and Kobbi Nissim. Practical privacy: the sulq framework. In Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 128 138, 2005. [14] Mark Bun, Gautam Kamath, Thomas Steinke, and Steven Z Wu. Private hypothesis selection. In Advances in Neural Information Processing Systems, pages 156 167, 2019. [15] T Tony Cai, Yichen Wang, and Linjun Zhang. The cost of privacy: Optimal rates of convergence for parameter estimation with differential privacy. ar Xiv preprint ar Xiv:1902.04495, 2019. [16] Clément L Canonne, Gautam Kamath, Audra Mc Millan, Jonathan Ullman, and Lydia Zakynthinou. Private identity testing for high-dimensional distributions. ar Xiv preprint ar Xiv:1905.11947, 2019. [17] Moses Charikar, Jacob Steinhardt, and Gregory Valiant. Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 47 60, 2017. [18] Kamalika Chaudhuri, Anand D Sarwate, and Kaushik Sinha. A near-optimal algorithm for differentially-private principal components. The Journal of Machine Learning Research, 14(1):2905 2943, 2013. [19] Xinyun Chen, Chang Liu, Bo Li, Kimberly Lu, and Dawn Song. Targeted backdoor attacks on deep learning systems using data poisoning. ar Xiv preprint ar Xiv:1712.05526, 2017. [20] Yu Cheng, Ilias Diakonikolas, and Rong Ge. High-dimensional robust mean estimation in nearly-linear time. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2755 2771. SIAM, 2019. [21] Yu Cheng, Ilias Diakonikolas, Rong Ge, and David P Woodruff. Faster algorithms for highdimensional robust covariance estimation. In Conference on Learning Theory, pages 727 757. PMLR, 2019. [22] Yeshwanth Cherapanamjeri, Sidhanth Mohanty, and Morris Yau. List decodable mean estimation in nearly linear time. ar Xiv preprint ar Xiv:2005.09796, 2020. [23] Arnak Dalalyan and Philip Thompson. Outlier-robust estimation of a sparse linear model using ℓ1-penalized huber s m-estimator. In Advances in Neural Information Processing Systems, pages 13188 13198, 2019. [24] Jules Depersin and Guillaume Lecué. Robust subgaussian estimation of a mean vector in nearly linear time. ar Xiv preprint ar Xiv:1906.03058, 2019. [25] Luc Devroye and Gábor Lugosi. Combinatorial methods in density estimation. Springer Science & Business Media, 2012. [26] Aditya Dhar and Jason Huang. Designing differentially private estimators in high dimensions. ar Xiv preprint ar Xiv:2006.01944, 2020. [27] Ilias Diakonikolas, Samuel B Hopkins, Daniel Kane, and Sushrut Karmalkar. Robustly learning any clusterable mixture of gaussians. ar Xiv preprint ar Xiv:2005.06417, 2020. [28] Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. Robust estimators in high-dimensions without the computational intractability. SIAM Journal on Computing, 48(2):742 864, 2019. [29] Ilias Diakonikolas, Gautam Kamath, Daniel Kane, Jerry Li, Jacob Steinhardt, and Alistair Stewart. Sever: A robust meta-algorithm for stochastic optimization. In International Conference on Machine Learning, pages 1596 1606, 2019. [30] Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. Being Robust (in High Dimensions) Can Be Practical. ar Xiv e-prints, page ar Xiv:1703.00893, March 2017. [31] Ilias Diakonikolas, Gautam Kamath, Daniel M Kane, Jerry Li, Ankur Moitra, and Alistair Stewart. Robustly learning a gaussian: Getting optimal error, efficiently. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2683 2702. SIAM, 2018. [32] Ilias Diakonikolas and Daniel M Kane. Recent advances in algorithmic high-dimensional robust statistics. ar Xiv preprint ar Xiv:1911.05911, 2019. [33] Ilias Diakonikolas, Daniel M Kane, and Alistair Stewart. Statistical query lower bounds for robust estimation of high-dimensional gaussians and gaussian mixtures. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 73 84. IEEE, 2017. [34] Ilias Diakonikolas, Daniel M Kane, and Alistair Stewart. List-decodable robust mean estimation and learning mixtures of spherical gaussians. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1047 1060, 2018. [35] Ilias Diakonikolas, Weihao Kong, and Alistair Stewart. Efficient algorithms and lower bounds for robust linear regression. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2745 2754. SIAM, 2019. [36] Yihe Dong, Samuel Hopkins, and Jerry Li. Quantum entropy scoring for fast robust mean estimation and improved outlier detection. In Advances in Neural Information Processing Systems, pages 6067 6077, 2019. [37] Cynthia Dwork, Frank Mc Sherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference, pages 265 284. Springer, 2006. [38] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211 407, 2014. [39] Cynthia Dwork, Kunal Talwar, Abhradeep Thakurta, and Li Zhang. Analyze gauss: optimal bounds for privacy-preserving principal component analysis. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 11 20, 2014. [40] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. Rappor: Randomized aggregatable privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pages 1054 1067, 2014. [41] Giulia Fanti, Vasyl Pihur, and Úlfar Erlingsson. Building a rappor with the unknown: Privacypreserving learning of associations and data dictionaries. Proceedings on Privacy Enhancing Technologies, 2016(3):41 61, 2016. [42] Chao Gao et al. Robust regression via mutivariate regression depth. Bernoulli, 26(2):1139 1170, 2020. [43] Sam Hopkins, Jerry Li, and Fred Zhang. Robust and heavy-tailed mean estimation made simple, via regret minimization. Advances in Neural Information Processing Systems, 33, 2020. [44] Samuel B Hopkins. Mean estimation with sub-gaussian rates in polynomial time. Annals of Statistics, 48(2):1193 1213, 2020. [45] Samuel B Hopkins and Jerry Li. Mixture models, robustness, and sum of squares proofs. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1021 1034, 2018. [46] Samuel B Hopkins and Jerry Li. How hard is robust mean estimation? In Conference on Learning Theory, pages 1649 1682. PMLR, 2019. [47] Peter J. Huber. Robust Estimation of a Location Parameter. The Annals of Mathematical Statistics, 35(1):73 101, 1964. [48] Arun Jambulapati, Jerry Li, and Kevin Tian. Robust sub-gaussian principal component analysis and width-independent schatten packing. Advances in Neural Information Processing Systems, 33, 2020. [49] He Jia and Santosh Vempala. Robustly clustering a mixture of gaussians. ar Xiv preprint ar Xiv:1911.11838, 2019. [50] Peter Kairouz, H Brendan Mc Mahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Keith Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. Advances and open problems in federated learning. ar Xiv preprint ar Xiv:1912.04977, 2019. [51] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. In International conference on machine learning, pages 1376 1385, 2015. [52] Gautam Kamath, Jerry Li, Vikrant Singhal, and Jonathan Ullman. Privately learning highdimensional distributions. In Conference on Learning Theory, pages 1853 1902, 2019. [53] Gautam Kamath, Or Sheffet, Vikrant Singhal, and Jonathan Ullman. Differentially private algorithms for learning mixtures of separated gaussians. In 2020 Information Theory and Applications Workshop (ITA), pages 1 62. IEEE, 2020. [54] Gautam Kamath, Vikrant Singhal, and Jonathan Ullman. Private mean estimation of heavy-tailed distributions. ar Xiv preprint ar Xiv:2002.09464, 2020. [55] Haim Kaplan, Katrina Ligett, Yishay Mansour, Moni Naor, and Uri Stemmer. Privately learning thresholds: Closing the exponential gap. In Conference on Learning Theory, pages 2263 2285. PMLR, 2020. [56] Sushrut Karmalkar, Adam Klivans, and Pravesh Kothari. List-decodable linear regression. In Advances in Neural Information Processing Systems, pages 7423 7432, 2019. [57] Sushrut Karmalkar and Eric Price. Compressed sensing with adversarial sparse noise via l1 regression. In 2nd Symposium on Simplicity in Algorithms, 2019. [58] Vishesh Karwa and Salil Vadhan. Finite sample differentially private confidence intervals. ar Xiv preprint ar Xiv:1711.03908, 2017. [59] Adam Klivans, Pravesh K Kothari, and Raghu Meka. Efficient algorithms for outlier-robust regression. In Conference On Learning Theory, pages 1420 1430, 2018. [60] Weihao Kong, Raghav Somani, Sham Kakade, and Sewoong Oh. Robust meta-learning for mixed linear regression with small batches. Advances in Neural Information Processing Systems, 33, 2020. [61] Pravesh K Kothari, Jacob Steinhardt, and David Steurer. Robust moment estimation and improved clustering via sum of squares. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1035 1046, 2018. [62] Kevin A Lai, Anup B Rao, and Santosh Vempala. Agnostic estimation of mean and covariance. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 665 674. IEEE, 2016. [63] Jerry Li. CSE 599-M, Lecture Notes: Robustness in Machine Learning , 2019. URL: https: //jerryzli.github.io/robust-ml-fall19/lec7.pdf. [64] Jerry Li and Guanghao Ye. Robust gaussian covariance estimation in nearly-matrix multiplication time. Advances in Neural Information Processing Systems, 33, 2020. [65] Liu Liu, Yanyao Shen, Tianyang Li, and Constantine Caramanis. High dimensional robust sparse regression. ar Xiv preprint ar Xiv:1805.11643, 2018. [66] Xiaohui Liu. Fast implementation of the tukey depth. Computational Statistics, 32(4):1395 1410, 2017. [67] Xiaohui Liu, Karl Mosler, and Pavlo Mozharovskyi. Fast computation of tukey trimmed regions and median in dimension p> 2. Journal of Computational and Graphical Statistics, 28(3):682 697, 2019. [68] Gábor Lugosi, Shahar Mendelson, et al. Sub-gaussian estimators of the mean of a random vector. Annals of Statistics, 47(2):783 794, 2019. [69] Frank Mc Sherry and Kunal Talwar. Mechanism design via differential privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 07), pages 94 103. IEEE, 2007. [70] Bhaskar Mukhoty, Govind Gopakumar, Prateek Jain, and Purushottam Kar. Globally-convergent iteratively reweighted least squares for robust regression problems. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 313 322, 2019. [71] A. Prasad, A. S. Suggala, S. Balakrishnan, and P. Ravikumar. Robust estimation via robust gradient estimation. ar Xiv preprint ar Xiv:1802.06485, 2018. [72] Prasad Raghavendra and Morris Yau. List decodable learning via sum of squares. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 161 180. SIAM, 2020. [73] Jacob Steinhardt, Moses Charikar, and Gregory Valiant. Resilience: A criterion for learning in the presence of arbitrary outliers. In 9th Innovations in Theoretical Computer Science Conference (ITCS 2018). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018. [74] Jun Tang, Aleksandra Korolova, Xiaolong Bai, Xueqiang Wang, and Xiaofeng Wang. Privacy loss in apple s implementation of differential privacy on macos 10.12. ar Xiv preprint ar Xiv:1709.02753, 2017. [75] Terence Tao. Topics in random matrix theory, volume 132. American Mathematical Soc., 2012. [76] John W Tukey. A survey of sampling from contaminated distributions. Contributions to probability and statistics, pages 448 485, 1960. [77] Martin J Wainwright. High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge University Press, 2019. [78] Lu Wei, Anand D Sarwate, Jukka Corander, Alfred Hero, and Vahid Tarokh. Analysis of a privacy-preserving pca algorithm using random matrix theory. In 2016 IEEE Global Conference on Signal and Information Processing (Global SIP), pages 1335 1339. IEEE, 2016. [79] Huang Xiao, Battista Biggio, Gavin Brown, Giorgio Fumera, Claudia Eckert, and Fabio Roli. Is feature selection secure against training data poisoning? In International Conference on Machine Learning, pages 1689 1698. PMLR, 2015. [80] Huanyu Zhang, Gautam Kamath, Janardhan Kulkarni, and Zhiwei Steven Wu. Privately learning markov random fields. ar Xiv preprint ar Xiv:2002.09463, 2020. [81] Banghua Zhu, Jiantao Jiao, and Jacob Steinhardt. Generalized resilience and robust statistics. ar Xiv preprint ar Xiv:1909.08755, 2019. [82] Banghua Zhu, Jiantao Jiao, and Jacob Steinhardt. When does the tukey median work? ar Xiv preprint ar Xiv:2001.07805, 2020.