# Differentially Private Covariance Estimation

Kareem Amin (kamin@google.com), Google Research NY
Travis Dick (tdick@cs.cmu.edu), Carnegie Mellon University
Alex Kulesza (kulesza@google.com), Google Research NY
Andrés Muñoz Medina (ammedina@google.com), Google Research NY
Sergei Vassilvitskii (sergeiv@google.com), Google Research NY

Abstract

The task of privately estimating a covariance matrix is a popular one due to its applications to regression and PCA. While there are known methods for releasing private covariance matrices, these algorithms either achieve only $(\varepsilon, \delta)$-differential privacy or require very complicated sampling schemes, ultimately performing poorly on real data. In this work we propose a new $\varepsilon$-differentially private algorithm for computing the covariance matrix of a dataset that addresses both of these limitations. We show that it has lower error than existing state-of-the-art approaches, both analytically and empirically. In addition, the algorithm is significantly less complicated than other methods and can be efficiently implemented with rejection sampling.

1 Introduction

Differential privacy has emerged as a standard framework for thinking about user privacy in the context of large scale data analysis [Dwork et al., 2014a]. While differential privacy does not protect against all attack vectors, it does provide formal guarantees about possible information leakage. A key feature of differential privacy is its robustness to post-processing: once a mechanism is certified as differentially private, arbitrary post-processing can be performed on its outputs without additional privacy impact.

The past decade has seen the emergence of a wide range of techniques for modifying classical learning algorithms to be differentially private [McSherry and Mironov, 2009, Chaudhuri et al., 2011, Jain et al., 2012, Abadi et al., 2016]. These algorithms typically train directly on the raw data, but inject carefully designed noise in order to produce differentially private outputs.

A more general (and challenging) alternative approach is to first preprocess the dataset using a differentially private mechanism and then freely choose among standard off-the-shelf algorithms for learning. This not only provides more flexibility in the design of the learning system, but also removes the need for access to sensitive raw data (except for the initial preprocessing step). This approach thus falls under the umbrella of data release: since the preprocessed dataset is differentially private, it can, in principle, be released without leaking any individual's data.

In this work we consider the problem of computing, in a differentially private manner, a specific preprocessed representation of a dataset: its covariance matrix. Formally, given a data matrix $X \in \mathbb{R}^{d \times n}$, where each column corresponds to a data point, we aim to compute a private estimate of $C = XX^\top$ that can be used in place of the raw data, for example, as the basis for standard linear regression algorithms. Our methods provide privacy guarantees for the columns of $X$.

There are many existing techniques that can be applied to this problem. We distinguish $\varepsilon$-differentially private algorithms, which promise what is sometimes referred to as pure differential privacy, from $(\varepsilon, \delta)$-differentially private algorithms, which may fail to preserve privacy with some probability $\delta$.
While algorithms in the pure differential privacy setting give stronger privacy guarantees, they tend to be significantly more difficult to implement, and often underperform empirically when compared to the straightforward algorithms in the $(\varepsilon, \delta)$ setting. In this work, we give a new practical $\varepsilon$-differentially private algorithm for covariance matrix estimation.

At a high level, the algorithm is natural. It approximates the eigendecomposition of the covariance matrix $C$ by estimating the collections of eigenvalues and eigenvectors separately. Since the eigenvalues are insensitive to changes in a single column of $X$, we can accurately estimate them using the Laplace mechanism. To estimate the eigenvectors, the algorithm uses the exponential mechanism to sample a direction $\theta$ from the unit sphere that approximately maximizes $\theta^\top C \theta$, subject to the constraint of being orthogonal to the approximate eigenvectors sampled so far. The overall privacy guarantee for the combined method then follows from basic composition.

Our empirical results demonstrate lower reconstruction error for our algorithm when compared to other methods on both simulated and real-world datasets. This is especially striking in the high-privacy/low-$\varepsilon$ regime, where we outperform all existing methods. We note that there is a different regime where our bounds no longer compete with those of the Gaussian mechanism, namely when $\varepsilon$, $\delta$, and the number of data points are all sufficiently large (i.e., when privacy is "easy"). This suggests a two-pronged approach for the practitioner: utilize simple perturbation techniques when the data is insensitive to any one user and the privacy parameters are lax, and more careful reconstruction when the privacy parameters are tight or the data is scarce, as is often the case in the social sciences and medical research.

Our main results can be summarized as follows:

- We prove that our algorithm improves the privacy/utility trade-off by achieving lower error at a given privacy parameter compared with previous pure differentially private approaches (Theorem 2).
- We derive a non-uniform allocation of the privacy budget for estimating the eigenvectors of the covariance matrix, giving the strongest utility guarantee from our analysis (Corollary 1).
- We show that our algorithm is practical: a simple rejection sampling scheme can be used for the core of the implementation (Algorithm 2).
- Finally, we perform an empirical evaluation of our algorithm, comparing it to existing methods on both synthetic and real-world datasets (Section 4). To the best of our knowledge, this is the first comparative empirical evaluation of different private covariance estimation methods, and we show that our algorithm outperforms all of the baselines, especially in the high privacy regime.

1.1 Database Sanitization for Ridge Regression

Our motivation for private covariance estimation is training regression models. In practice, regression models are trained using different subsets of features, multiple regularization parameters, and even varying target variables. If we were to directly apply differentially private learning algorithms to each of these learning tasks, our privacy costs would accumulate with every model we trained. Our goal is instead to pay the privacy cost only once, computing a single data structure that can be used multiple times to tune regression models.
In this section, we show that a private estimate of the covariance matrix $C = XX^\top$ summarizes the data sufficiently well for all of these ridge regression learning tasks with only a one-time privacy cost. Therefore, we can view differentially private covariance estimation as a database sanitization scheme for ridge regression.

Formally, given a data matrix $X \in \mathbb{R}^{d \times n}$ with columns $x_1, \ldots, x_n$, we denote the $i$th entry of $x_j$ by $x_j(i)$. Consider using ridge regression to learn a linear model for estimating some target feature $x(t)$ as a function of $x(-t)$, where $x(-t)$ denotes the vector obtained by removing the $t$th feature of $x \in \mathbb{R}^d$. That is, we want to solve the following regularized optimization problem:

$$w_\lambda = \arg\min_{w \in \mathbb{R}^{d-1}} \frac{1}{2n} \sum_{j=1}^n \big(w^\top x_j(-t) - x_j(t)\big)^2 + \lambda \|w\|_2^2.$$

We can write the solution to the ridge regression problem in closed form as follows. Let $A \in \mathbb{R}^{(d-1) \times n}$ be the matrix consisting of all but the $t$th row of $X$, and let $y = (x_1(t), \ldots, x_n(t))$ be the $t$th row of $X$ (as a column vector). Then the solution to the ridge regression problem with regularization parameter $\lambda$ is given by $w_\lambda = (AA^\top + 2\lambda n I)^{-1} A y$.

Given access to just the covariance matrix $C = XX^\top$, we can compute the above closed-form ridge regression model. Suppose first that the target feature is $t = d$. Then, writing $X$ in block form, we have

$$C = XX^\top = \begin{pmatrix} AA^\top & Ay \\ y^\top A^\top & y^\top y \end{pmatrix}.$$

Now it is not hard to see that we can recover $w_\lambda$ by using the block entries of the full covariance matrix. The following lemma quantifies how much the error of estimating $C$ privately affects the regression solution $w_\lambda$. The proof can be found in Appendix A.

Lemma 1. Let $X \in \mathbb{R}^{d \times n}$ be a data matrix, $C = XX^\top \in \mathbb{R}^{d \times d}$, and let $\hat{C} \in \mathbb{R}^{d \times d}$ be a symmetric approximation to $C$. Fix any target feature $t$ and regularization parameter $\lambda$. Let $w_\lambda$ and $\hat{w}_\lambda$ be the ridge regression models learned for predicting feature $t$ from $C$ and $\hat{C}$, respectively. Then

$$\|w_\lambda - \hat{w}_\lambda\|_2 \le \frac{\|C - \hat{C}\|_{2,1} + \|C - \hat{C}\|_2 \, \|\hat{w}_\lambda\|_2}{\lambda_{\min}(C) + 2\lambda n},$$

where $\|M\|_{2,1}$ denotes the $L_{2,1}$-norm of $M$ (the maximum 2-norm of its columns).

Both $\|C - \hat{C}\|_{2,1}$ and $\|C - \hat{C}\|_2$ are upper bounded by the Frobenius error $\|C - \hat{C}\|_F$. Therefore, in our analysis of our differentially private covariance estimation mechanism, we will focus on bounding the Frobenius error. The bound in Lemma 1 also holds with $\|\hat{w}_\lambda\|_2$ replaced by $\|w_\lambda\|_2$ on the right-hand side; however, we prefer the stated version since it can be computed by the practitioner.
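To make this construction concrete, the following is a minimal sketch (in Python with NumPy) of how a single estimate of the covariance matrix, private or not, can be reused to fit ridge models for any target feature and regularization parameter. The function name is ours, and the regularizer scaling `2 * lam * n` follows our reconstruction of the closed form stated above.

```python
import numpy as np

def ridge_from_covariance(C_hat, t, lam, n):
    """Recover ridge-regression weights for predicting feature t from all
    other features, using only a (possibly private) estimate of C = X X^T.

    Uses the block identities A A^T = C[-t, -t] and A y = C[-t, t], where A
    holds all rows of X except row t and y is row t of X.
    """
    d = C_hat.shape[0]
    rest = [i for i in range(d) if i != t]
    AAt = C_hat[np.ix_(rest, rest)]     # (d-1) x (d-1) block A A^T
    Ay = C_hat[rest, t]                 # (d-1) vector A y
    # Closed form (as reconstructed above): w = (A A^T + 2*lam*n*I)^{-1} A y
    return np.linalg.solve(AAt + 2 * lam * n * np.eye(d - 1), Ay)

# Example: the covariance-based solution matches the one computed from raw data.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 200))
X /= np.maximum(1.0, np.linalg.norm(X, axis=0))   # enforce column norm <= 1
C = X @ X.T
w = ridge_from_covariance(C, t=4, lam=0.1, n=X.shape[1])

A = np.delete(X, 4, axis=0)
w_direct = np.linalg.solve(A @ A.T + 2 * 0.1 * X.shape[1] * np.eye(4), A @ X[4])
assert np.allclose(w, w_direct)
```

Once $\hat{C}$ is released, calls like this for different targets $t$ and regularization parameters $\lambda$ are post-processing and incur no additional privacy cost.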
1.2 Related Work

A variety of techniques exist for computing differentially private estimates of covariance matrices, including both general mechanisms that can be applied in this setting as well as specialized methods that take advantage of problem-specific structure. A naïve approach using a standard differential privacy mechanism would be to simply add an appropriate amount of Laplace noise independently to every element of the true covariance matrix $C$. However, the amount of noise required makes such a mechanism impractical, as the sensitivity, and hence the amount of noise added, grows linearly in the dimension. A better approach is to add Gaussian noise [Dwork et al., 2014b]; however, this results in $(\varepsilon, \delta)$-differential privacy, where, with some probability $\delta$, the outcome is not private. Similarly, Upadhyay [2018] proposes a private way of generating low-dimensional representations of $X$. This is a slightly different task than covariance estimation. Moreover, their algorithm is only $(\varepsilon, \delta)$-differentially private for $\delta > n^{-\log n}$, which makes the privacy regime incomparable to the one proposed in this paper.

Another approach, proposed in Chaudhuri et al. [2012], is to compute a private version of PCA. This approach has two limitations. First, it only works for computing the top eigenvectors, and can fail to give non-trivial results when computing the full covariance matrix. Second, the sampling itself is quite involved and requires the use of a Gibbs sampler. Since it is generally impossible to know when the sampler has converged, adding noise in this manner can violate privacy guarantees.

The algorithm we propose bears the most resemblance to the differentially private low-rank matrix approximation proposed by Kapralov and Talwar [2013], which approximates the SVD. Their algorithm computes a differentially private rank-1 approximation of a matrix $C$, subtracts this matrix from $C$, and then iterates the process on the residual. Similarly, our approach iteratively generates estimates of the eigenvectors of the matrix, but repeatedly projects the matrix onto the subspace orthogonal to the previously estimated eigenvectors. We demonstrate the benefit of this projective update both in our analytical bounds and in our empirical results. It ultimately allows us to rely on a simple rejection sampling technique proposed by Kent et al. [2018] to select our eigenvectors.

Other perturbation approaches include recent work on estimating sparse covariance matrices by Wang and Xu [2019]. Their setup differs from ours in that they assume all columns of the covariance matrix are $s$-sparse. There was also an attempt by Jiang et al. [2016] to use Wishart-distributed noise to privately estimate a covariance matrix. However, Imtiaz and Sarwate [2016] proposed the same algorithm and later discovered that the algorithm was in fact not differentially private. Wang [2018] also studies the effectiveness of differentially private covariance estimation for private linear regression (and compares against several other private regression approaches). However, they only consider the Laplace and Gaussian mechanisms for private covariance estimation and do not study the quality of the estimated covariance matrices, only their performance on regression tasks.

2 Preliminaries

Let $X \in \mathbb{R}^{d \times n}$ be a data matrix where each column corresponds to a $d$-dimensional data point. Throughout the paper, we assume that the columns of the data matrix have 2-norm at most one.¹ Our goal is to privately release an estimate of the unnormalized and uncentered covariance matrix $C = XX^\top \in \mathbb{R}^{d \times d}$. We say that two data matrices $X$ and $\tilde{X}$ are neighbors, denoted $X \sim \tilde{X}$, if they differ in at most one column. We want algorithms that are $\varepsilon$-differentially private with respect to neighboring data matrices. Formally, an algorithm $A$ is $\varepsilon$-differentially private if for every pair of neighboring data matrices $X$ and $\tilde{X}$ and every set $O$ of possible outcomes, we have

$$\Pr(A(X) \in O) \le e^{\varepsilon} \Pr(A(\tilde{X}) \in O). \qquad (1)$$

¹ If the columns of $X$ have 2-norm bounded by a known value $B$, we can rescale the columns by $1/B$ to obtain a matrix $X'$ with column norms at most 1. Since $XX^\top = B^2 X'X'^\top$, estimating the covariance matrix of $X'$ gives an estimate of the covariance matrix of $X$ with Frobenius error inflated by a factor of $B^2$.

A useful consequence of this definition is composability.

Lemma 2. Suppose an algorithm $A_1 : \mathbb{R}^{d \times n} \to Y_1$ is $\varepsilon_1$-differentially private and a second algorithm $A_2 : \mathbb{R}^{d \times n} \times Y_1 \to Y_2$ is $\varepsilon_2$-differentially private. Then the composition $A(X) = A_2(X, A_1(X))$ is $(\varepsilon_1 + \varepsilon_2)$-differentially private.

Our main algorithm uses this property together with multiple applications of the following mechanisms.

Laplace Mechanism. Let $\mathrm{Lap}(b)$ denote the Laplace distribution with parameter $b$. Given a query $f : \mathbb{R}^{d \times n} \to \mathbb{R}^k$ mapping data matrices to vectors, the $\ell_1$-sensitivity of the query is given by $\Delta f = \max_{X \sim \tilde{X}} \|f(X) - f(\tilde{X})\|_1$. For a given privacy parameter $\varepsilon$, the Laplace mechanism approximately answers queries by outputting $f(X) + (Y_1, \ldots, Y_k)$, where each $Y_i$ is independently sampled from the $\mathrm{Lap}(\Delta f / \varepsilon)$ distribution.
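As a small illustration of the mechanism just described, here is a sketch (Python/NumPy; the helper name is ours) that releases a vector-valued query with Laplace noise calibrated to its $\ell_1$-sensitivity. The example applies it to the eigenvalue vector of $C = XX^\top$, whose $\ell_1$-sensitivity the paper bounds by 2 (Section 3) when the columns of $X$ have norm at most one.

```python
import numpy as np

def laplace_mechanism(query_value, sensitivity, epsilon, rng=None):
    """Release query_value with Laplace noise of scale sensitivity/epsilon,
    as in the Laplace mechanism described above."""
    rng = np.random.default_rng() if rng is None else rng
    scale = sensitivity / epsilon
    return query_value + rng.laplace(loc=0.0, scale=scale,
                                     size=np.shape(query_value))

# Example: privately release the eigenvalues of C = X X^T.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 500))
X /= np.maximum(1.0, np.linalg.norm(X, axis=0))   # column norms at most 1
eigvals = np.linalg.eigvalsh(X @ X.T)
private_eigvals = laplace_mechanism(eigvals, sensitivity=2.0, epsilon=0.5, rng=rng)
```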
The privacy and utility guarantees of the Laplace mechanism are summarized in the following lemma.

Lemma 3. The Laplace mechanism preserves $\varepsilon$-differential privacy and, for any $\beta > 0$, we have $\Pr\big(\max_i |Y_i| \ge \frac{\Delta f}{\varepsilon} \log \frac{k}{\beta}\big) \le \beta$.

Exponential Mechanism. The exponential mechanism can be used to privately select an approximately optimal outcome from an arbitrary domain. Formally, let $(Y, \mu)$ be a measure space and let $g : (X, y) \mapsto g(X, y)$ be the utility of outcome $y$ for data matrix $X$. The sensitivity of $g$ is given by $\Delta g = \max_{X \sim \tilde{X},\, y} |g(X, y) - g(\tilde{X}, y)|$. For $\varepsilon > 0$, the exponential mechanism samples $y$ from the density proportional to $f_{\exp}(y) = \exp\big(\frac{\varepsilon}{2 \Delta g}\, g(X, y)\big)$, defined with respect to the base measure $\mu$.

Lemma 4 (McSherry and Talwar [2007]). The exponential mechanism preserves $\varepsilon$-differential privacy. Let $\mathrm{OPT} = \max_y g(X, y)$ and $G_t = \{y \in Y : g(X, y) \ge \mathrm{OPT} - t\}$. If $\hat{y}$ is the output of the exponential mechanism, we have

$$\Pr(\hat{y} \notin G_{2t}) \le \frac{\exp\big(-\frac{\varepsilon t}{2 \Delta g}\big)}{\mu(G_t)}.$$

In our algorithm, we will apply the exponential mechanism in order to choose unit-length approximate eigenvectors. Therefore, the space of outcomes $Y$ will be the unit sphere $S^{d-1} = \{\theta \in \mathbb{R}^d : \|\theta\|_2 = 1\}$. For convenience, we will use the uniform distribution on the sphere, denoted by $\mu_*$, as our base measure (this is proportional to the surface area). For example, the density $p(\theta) = 1$ is the uniform distribution on $S^{d-1}$, and $\mu_*(S^{d-1}) = 1$.

3 Iterative Eigenvalue Sampling for Covariance Estimation

In this section we describe our $\varepsilon$-differentially private covariance estimation mechanism. In fact, our method produces a differentially private approximation to the eigendecomposition of $C = XX^\top$. We first estimate the vector of eigenvalues, a query that has $\ell_1$-sensitivity at most 2. Next, we show how to use the exponential mechanism to approximate the top eigenvector of the covariance matrix $C$. Inductively, after estimating the top $k$ eigenvectors $\hat{\theta}_1, \ldots, \hat{\theta}_k$ of $C$, we project the data onto the $(d-k)$-dimensional orthogonal subspace and apply the exponential mechanism to approximate the top eigenvector of the remaining projected covariance matrix. Once all eigenvalues and eigenvectors have been estimated, the algorithm returns the reconstructed covariance matrix. Pseudocode for our method is given in Algorithm 1. In Section 3.2 we discuss a rejection-sampling algorithm of Kent et al. [2018] that can be used to sample from the distribution defined in step (a) of Algorithm 1. It is worth mentioning that if we only sample $k$ eigenvectors, Algorithm 1 returns a rank-$k$ approximation of the matrix $C$.

Algorithm 1 Iterative Eigenvector Sampling
Input: $C = XX^\top \in \mathbb{R}^{d \times d}$, privacy parameters $\varepsilon_0, \ldots, \varepsilon_d$.
1. Initialize $C_1 = C$, $P_1 = I \in \mathbb{R}^{d \times d}$, and $\hat{\lambda}_i = \lambda_i(C) + \mathrm{Lap}(2/\varepsilon_0)$ for $i = 1, \ldots, d$.
2. For $i = 1, \ldots, d$:
   (a) Sample $\hat{u}_i \in S^{d-i}$ proportional to $f_{C_i}(u) = \exp\big(\frac{\varepsilon_i}{4} u^\top C_i u\big)$ and let $\hat{\theta}_i = P_i^\top \hat{u}_i$.
   (b) Find an orthonormal basis $P_{i+1} \in \mathbb{R}^{(d-i) \times d}$ orthogonal to $\hat{\theta}_1, \ldots, \hat{\theta}_i$.
   (c) Let $C_{i+1} = P_{i+1} C P_{i+1}^\top \in \mathbb{R}^{(d-i) \times (d-i)}$.
3. Output $\hat{C} = \sum_{i=1}^d \hat{\lambda}_i \hat{\theta}_i \hat{\theta}_i^\top$.
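The following Python sketch mirrors the steps of Algorithm 1; it is an illustration, not the authors' reference implementation. It relies on NumPy and SciPy, the function names are ours, and the Bingham sampler needed in step (a) is passed in as an argument; a concrete rejection sampler in the style of Section 3.2 is sketched later and can be plugged in here.

```python
import numpy as np
from scipy.linalg import null_space

def private_covariance(C, eps_eigvals, eps_vecs, sample_direction, rng=None):
    """Sketch of Algorithm 1 (iterative eigenvector sampling).

    C            : d x d matrix X X^T (columns of X assumed to have norm <= 1).
    eps_eigvals  : budget eps_0 for the eigenvalue query.
    eps_vecs     : length-d array of budgets eps_1..eps_d for the eigenvectors.
    sample_direction(M, eps, rng) must return a unit vector drawn with density
    proportional to exp(eps/4 * u^T M u) on the sphere.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = C.shape[0]
    # Step 1: noisy eigenvalues via the Laplace mechanism (l1-sensitivity 2).
    lam_hat = np.sort(np.linalg.eigvalsh(C))[::-1] \
        + rng.laplace(scale=2.0 / eps_eigvals, size=d)
    thetas = np.zeros((d, d))
    P = np.eye(d)                       # rows span the remaining subspace
    for i in range(d):
        C_i = P @ C @ P.T               # project C onto the remaining subspace
        u_hat = sample_direction(C_i, eps_vecs[i], rng)
        thetas[:, i] = P.T @ u_hat      # lift the sampled direction back to R^d
        if i < d - 1:
            # new basis: orthogonal complement of the directions chosen so far
            P = null_space(thetas[:, : i + 1].T).T
    # Step 3: reconstruct the covariance estimate.
    return thetas @ np.diag(lam_hat) @ thetas.T
```

By Lemma 2, the total privacy cost of a run is the sum of the budgets passed in, matching Theorem 1 below.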
Our approach is similar to the algorithm of Kapralov and Talwar [2013], with one significant difference: in their algorithm, rather than projecting onto the orthogonal subspace of the first $k$ estimated eigenvectors, they subtract the rank-one matrix $\hat{\lambda}_i \hat{\theta}_i \hat{\theta}_i^\top$ from $C$, where $\hat{\lambda}_i$ is the estimate of the $i$th eigenvalue. There are several advantages to using projections. First, the projection step exactly eliminates the variance along the direction $\hat{\theta}_i$, while the rank-one subtraction will fail to do so if the estimated eigenvalues are incorrect (effectively causing us to pay for the eigenvalue approximation twice: once in the reconstruction of the covariance matrix and once because it prevents us from removing the variance along the direction $\hat{\theta}_i$ before estimating the remaining eigenvectors). Second, the analysis of the algorithm is substantially simplified because we are guaranteed that the estimated eigenvectors $\hat{\theta}_1, \ldots, \hat{\theta}_d$ are orthogonal, and we do not require bounds on how rank-one updates affect the spectrum of a matrix.

We now show that Algorithm 1 is differentially private. The algorithm applies the Laplace mechanism once and the exponential mechanism $d$ times, so the result follows from bounding the sensitivity of the relevant queries and applying basic composition.

Theorem 1. Algorithm 1 preserves $\big(\sum_{i=0}^d \varepsilon_i\big)$-differential privacy.

We now focus on the main contribution of this paper: a utility guarantee for Algorithm 1 in terms of the Frobenius distance between $\hat{C}$ and the true covariance matrix $C$, as a function of the privacy parameters used for each step. An important consequence of this analysis is that we can optimize the allocation of our total privacy budget among the $d + 1$ queries in order to get the best bound.

First we provide a utility guarantee for the exponential mechanism applied to approximating the top eigenvector of a matrix $C$. This result is similar to the rank-one approximation guarantee given by Kapralov and Talwar [2013], but we include a proof in the appendix for completeness.

Lemma 5. Let $X \in \mathbb{R}^{d \times n}$ be a data matrix and $C = XX^\top$. For any $\beta > 0$, with probability at least $1 - \beta$ over $\hat{u}$ sampled from the density proportional to $f_C(u) = \exp(\frac{\varepsilon}{4} u^\top C u)$ on $S^{d-1}$, we have

$$\hat{u}^\top C \hat{u} \ge \lambda_1(C) - O\left(\frac{d \log \lambda_1(C) + \log \frac{1}{\beta}}{\varepsilon}\right).$$

The following result characterizes the dependence of the Frobenius error on the errors in the estimated eigenvalues and eigenvectors. In particular, given that the eigenvalue estimates all have bounded error, the dependence on the $i$th eigenvector estimate $\hat{\theta}_i$ is only through the quantity $\lambda_i(C) - \hat{\theta}_i^\top C \hat{\theta}_i$, which measures how much less of the variance of $C$ is captured by $\hat{\theta}_i$ as compared to the true $i$th eigenvector. Moreover, the contribution of $\hat{\theta}_i$ is roughly weighted by $\hat{\lambda}_i$. This observation allows us to tune the privacy budgeting across the $d$ eigenvector queries, allocating more budget (at runtime) to the eigenvectors with large estimated eigenvalues. Empirically, we find that this budget allocation step improves performance in some settings.

Lemma 6. Let $C \in \mathbb{R}^{d \times d}$ be any positive semidefinite matrix. Let $\hat{\theta}_1, \ldots, \hat{\theta}_d$ be any orthonormal vectors and let $\hat{\lambda}_1, \ldots, \hat{\lambda}_d$ be estimates of the eigenvalues of $C$ satisfying $|\hat{\lambda}_i - \lambda_i(C)| \le \eta$ for all $i \in [d]$. Then

$$\|C - \hat{\Theta} \hat{\Lambda} \hat{\Theta}^\top\|_F \le \sqrt{d}\, \eta + \sqrt{2 \sum_{i=1}^d \lambda_i(C) \big(\lambda_i(C) - \hat{\theta}_i^\top C \hat{\theta}_i\big)},$$

where $\hat{\Theta}$ is the matrix with columns $\hat{\theta}_i$ and $\hat{\Lambda}$ is the diagonal matrix with entries $\hat{\lambda}_i$.

Proof. Let $\Lambda \in \mathbb{R}^{d \times d}$ be the diagonal matrix of true eigenvalues of $C$. By the triangle inequality,

$$\|C - \hat{\Theta} \hat{\Lambda} \hat{\Theta}^\top\|_F \le \|C - \hat{\Theta} \Lambda \hat{\Theta}^\top\|_F + \|\hat{\Theta} (\Lambda - \hat{\Lambda}) \hat{\Theta}^\top\|_F.$$

The second term is bounded by $\sqrt{d}\, \eta$, so it remains to bound the first term. We have

$$\|C - \hat{\Theta} \Lambda \hat{\Theta}^\top\|_F^2 = \|C\|_F^2 + \|\hat{\Theta} \Lambda \hat{\Theta}^\top\|_F^2 - 2 \operatorname{tr}(C \hat{\Theta} \Lambda \hat{\Theta}^\top) = 2 \sum_{i=1}^d \lambda_i(C) \big(\lambda_i(C) - \hat{\theta}_i^\top C \hat{\theta}_i\big),$$

where the second equality follows from the fact that the first two terms are both equal to $\sum_i \lambda_i(C)^2$ and from the cyclic property of the trace. The final bound follows by taking the square root.
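As a quick numerical sanity check of Lemma 6 (not part of the paper), the following snippet verifies the bound, as reconstructed above, on a random positive semidefinite matrix with arbitrary orthonormal directions and eigenvalue estimates perturbed by at most $\eta$.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
# A random PSD matrix C and its true eigenvalues in descending order.
B = rng.normal(size=(d, d))
C = B @ B.T
true_vals = np.sort(np.linalg.eigvalsh(C))[::-1]

# Arbitrary orthonormal directions and eigenvalue estimates within eta.
eta = 0.5
Theta_hat, _ = np.linalg.qr(rng.normal(size=(d, d)))
lam_hat = true_vals + rng.uniform(-eta, eta, size=d)

lhs = np.linalg.norm(C - Theta_hat @ np.diag(lam_hat) @ Theta_hat.T, "fro")
captured = np.einsum("ij,jk,ki->i", Theta_hat.T, C, Theta_hat)  # theta_i^T C theta_i
rhs = np.sqrt(d) * eta + np.sqrt(2 * np.sum(true_vals * (true_vals - captured)))
assert lhs <= rhs + 1e-9
```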
We are now ready to prove our main utility guarantee for Algorithm 1. The remaining analysis focuses on the effect of working with the projected covariance matrices $C_i$. One interesting observation is that our algorithm does not accumulate error across its iterations, thanks to the projection step. Following Lemma 6, we only need to show that $\hat{\theta}_i$ captures nearly as much of the variance of $C$ as the $i$th eigenvector. Fortunately, if our estimates $\hat{\theta}_1, \ldots, \hat{\theta}_{i-1}$ have errors, then the orthogonal subspace only contains more variance, and thus the sampling step in round $i$ actually becomes easier. In this sense Algorithm 1 is "self-correcting".

Theorem 2. Let $\hat{C}$ be the output of Algorithm 1 run with input $C$ and privacy parameters $\varepsilon_0, \ldots, \varepsilon_d$. For any $\beta > 0$, with probability at least $1 - \beta$ we have

$$\|C - \hat{C}\|_F \le \tilde{O}\left(\frac{\sqrt{d}}{\varepsilon_0} + \sqrt{\sum_{i=1}^d \frac{d\, \lambda_i(C)}{\varepsilon_i}}\right),$$

where the $\tilde{O}$ notation suppresses logarithmic terms in $d$, $\lambda_1(C)$, and $\beta$. If Algorithm 1 is used to obtain a rank-$k$ approximation, the theorem can be modified to show that the distance from the best rank-$k$ approximation satisfies the analogous bound with the sums restricted to the first $k$ terms.

Since Theorem 2 bounds the error in terms of the privacy parameters $\varepsilon_0, \ldots, \varepsilon_d$, we can tune our allocation of the total privacy budget $\varepsilon$ across the $d + 1$ private operations in order to obtain the tightest possible bound. In order to preserve privacy, we tune based on the estimated eigenvalues $\hat{\lambda}_1, \ldots, \hat{\lambda}_d$ obtained in step (1) of Algorithm 1 rather than using the true eigenvalues. The following result makes precise the natural intuition that more effort should be spent estimating the eigenvectors with larger (estimated) eigenvalues; its proof can be found in Appendix B.

Corollary 1. Fix any privacy parameter $\varepsilon$ and any failure probability $\beta > 0$, let $\varepsilon_0 = \varepsilon/2$, and for $i = 1, \ldots, d$ let

$$\varepsilon_i = \frac{\varepsilon}{2} \cdot \frac{\sqrt{\hat{\lambda}_i + \Delta}}{\sum_{j=1}^d \sqrt{\hat{\lambda}_j + \Delta}}, \quad \text{where } \Delta = \frac{2}{\varepsilon_0} \log \frac{2d}{\beta}.$$

Then Algorithm 1 run with $\varepsilon_0, \ldots, \varepsilon_d$ preserves $\varepsilon$-differential privacy and, with probability at least $1 - \beta$, the output $\hat{C}$ satisfies

$$\|C - \hat{C}\|_F \le \tilde{O}\left(\frac{\sqrt{d}}{\varepsilon_0} + \sqrt{\frac{d}{\varepsilon}} \sum_{i=1}^d \sqrt{\lambda_i(C) + \Delta}\right).$$
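Below is a small helper (Python/NumPy; the function name, default arguments, and the exact proportionality constant follow our reading of Corollary 1 as reconstructed above) that splits a total budget $\varepsilon$ across the $d + 1$ queries of Algorithm 1: half for the eigenvalue query, and the rest divided among the eigenvector queries in proportion to $\sqrt{\hat{\lambda}_i + \Delta}$. Because the split depends only on the privately released eigenvalue estimates, it is post-processing and costs no additional privacy.

```python
import numpy as np

def allocate_budget(lam_hat, eps_total, eps0_frac=0.5, beta=0.05):
    """Adaptive privacy-budget split in the spirit of Corollary 1: eps_0 gets
    a fixed share of the budget, and the remaining budget is divided among the
    d eigenvector queries in proportion to sqrt(lam_hat_i + Delta)."""
    d = len(lam_hat)
    eps0 = eps0_frac * eps_total
    delta = (2.0 / eps0) * np.log(2 * d / beta)   # high-probability eigenvalue-noise bound
    # Clip negative noisy estimates before taking square roots (practical guard).
    weights = np.sqrt(np.maximum(lam_hat, 0.0) + delta)
    eps_vecs = (eps_total - eps0) * weights / weights.sum()
    return eps0, eps_vecs

eps0, eps_vecs = allocate_budget(np.array([50.0, 10.0, 3.0, 1.0]), eps_total=1.0)
assert np.isclose(eps0 + eps_vecs.sum(), 1.0)     # total budget is preserved
```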
3.1 Comparison of Bounds

In this section we compare the bound provided by Theorem 2 to previous state-of-the-art results.

Comparison to Kapralov and Talwar [2013]. The bounds given by Kapralov and Talwar [2013], when applied to the case of recovering the full-rank covariance matrix, bound the spectral error $\|C - \hat{C}\|_2$ by $O(\sqrt{\varepsilon}\, \lambda_1(C))$ under the condition that $\lambda_1(C)$ is sufficiently large. In particular, Theorem 18 of their paper shows that there exists an $\varepsilon$-differentially private algorithm with the above guarantee whenever $\lambda_1(C) \ge C_1 d^4/\varepsilon^6$ for some constant $C_1$. Since $\|C - \hat{C}\|_2 \le \|C - \hat{C}\|_F$, we can directly compare the two algorithms after slightly rewriting our bounds. The following result shows that we improve the necessary lower bound on $\lambda_1(C)$ by a factor of $d/\varepsilon^4$ (ignoring log terms).

Corollary 2. For any $\varepsilon > 0$ and any positive semidefinite matrix $C$, with probability at least 0.99 (or any fixed success probability), running Algorithm 1 with $\varepsilon_0 = \varepsilon/2$ and $\varepsilon_i = \varepsilon/(2d)$ for $i = 1, \ldots, d$ preserves $\varepsilon$-differential privacy and outputs $\hat{C}$ such that $\|C - \hat{C}\|_F \le O(\sqrt{\varepsilon}\, \lambda_1(C))$ whenever $\lambda_1(C) \ge 2 d^3 \log(\lambda_1(C))/\varepsilon^2$.

Comparison to the Gaussian Mechanism. We can also directly compare to the error bounds for the Gaussian mechanism given by Dwork et al. [2014b]. Theorem 9 of their paper gives $\|C - \hat{C}\|_F \le O(d^{3/2} \sqrt{\log(1/\delta)}/\varepsilon)$, where $\varepsilon$ and $\delta$ are the (approximate) differential privacy parameters. Using privacy parameters $\varepsilon_0 = \varepsilon/2$ and $\varepsilon_i = \varepsilon/(2d)$ for $i = 1, \ldots, d$, Theorem 2 implies that with high probability we have $\|C - \hat{C}\|_F \le \tilde{O}\big(d^{3/2} \sqrt{\lambda_1(C) \log(\lambda_1(C))/\varepsilon} + \sqrt{d}/\varepsilon\big)$. For all values of $\delta > 0$, our algorithm provides a stronger privacy guarantee than the Gaussian mechanism. On the other hand, whenever $\lambda_1(C) \log(\lambda_1(C)) \le \log(1/\delta)/\varepsilon$, our utility guarantee is tighter. Given that $\lambda_1(C) = O(n)$, where $n$ is the number of data points, we see that our algorithm admits better utility guarantees in both the low-data regime and the high-privacy regime.

3.2 Sampling on the Sphere

To implement Algorithm 1, we need a subprocedure for drawing samples from the densities proportional to $\exp(\frac{\varepsilon}{4} u^\top C u)$ defined on the sphere $S^{d-1}$, where $C$ is a covariance matrix and $\varepsilon$ is the desired privacy parameter. This density belongs to a family called Bingham distributions. Kapralov and Talwar [2013] also discuss this sampling problem and, while their algorithm could also be used in our setting, we instead rely on a simpler rejection sampling scheme proposed by Kent et al. [2018]. This sampling technique is exact, and we find empirically that it is very efficient. Pseudocode for their method is given in Algorithm 2 in the appendix.

Recall that rejection sampling allows us to generate samples from the distribution with density proportional to $f$, provided we can sample from the distribution with density proportional to a similar function $g$, called the envelope. Kent et al. [2018] propose to use the angular central Gaussian distribution as an envelope. This distribution has a matrix parameter $\Omega$ and unnormalized density (defined on the sphere $S^{d-1}$) given by $g(u) = (u^\top \Omega u)^{-d/2}$. To sample from this distribution, we can simply sample $z$ from the mean-zero Gaussian distribution with covariance $\Omega^{-1}$ and output $u = z/\|z\|_2$. Kent et al. [2018] provide a choice of the parameter $\Omega$ that minimizes the number of rejected samples. They show that under some reasonable assumptions the expected number of rejections grows like $O(\sqrt{d})$ (see [Kent et al., 2018] for more details). In our experiments we observed that the median number of samples was less than $d$ and the mean was around $2d$. We believe that our empirical rejection counts are larger than the asymptotic bounds of Kent et al. [2018] because the dimensionality of our datasets is not large enough.
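For completeness, here is a sketch of a rejection sampler in the spirit of the method just described. The paper's Algorithm 2 lives in its appendix and is not reproduced here, so the details below, including the function name and the fixed-point choice of the envelope parameter $b$, are our own reading of Kent et al. [2018]; any $b \in (0, d]$ keeps the sampler exact, and only the acceptance rate changes. The function has the signature expected by the Algorithm 1 sketch above and can be passed as its `sample_direction` argument.

```python
import numpy as np
from scipy.optimize import brentq

def sample_bingham_acg(C, eps, rng=None, max_trials=100000):
    """Draw one unit vector with density proportional to exp(eps/4 * u^T C u)
    on the sphere, by rejection sampling with an angular central Gaussian (ACG)
    envelope in the spirit of Kent et al. [2018].

    We rewrite the target as exp(-u^T A u) with A = (eps/4)(lam_max(C) I - C),
    which is positive semidefinite and leaves the distribution unchanged
    because u^T u = 1 on the sphere.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = C.shape[0]
    lam_max = np.linalg.eigvalsh(C)[-1]
    A = (eps / 4.0) * (lam_max * np.eye(d) - C)
    lam_A = np.clip(np.linalg.eigvalsh(A), 0.0, None)

    # Envelope parameter Omega = I + (2/b) A, with b chosen by the fixed-point
    # condition sum_i 1/(b + 2*lam_i(A)) = 1 (our reading of Kent et al.).
    def excess(b):
        return np.sum(1.0 / (b + 2.0 * lam_A)) - 1.0
    b = d if excess(d) >= 0 else brentq(excess, 1e-10, d)
    Omega = np.eye(d) + (2.0 / b) * A
    cov = np.linalg.inv(Omega)
    # Upper bound on target/envelope ratio, maximized over s = u^T A u >= 0.
    log_M = 0.5 * (b - d) + 0.5 * d * np.log(d / b) if b < d else 0.0

    for _ in range(max_trials):
        z = rng.multivariate_normal(np.zeros(d), cov)
        u = z / np.linalg.norm(z)
        s = u @ A @ u
        log_accept = -s + 0.5 * d * np.log(1.0 + 2.0 * s / b) - log_M
        if rng.uniform() <= np.exp(log_accept):
            return u
    raise RuntimeError("rejection sampler did not accept within max_trials")
```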
4 Experiments

We now present the results of an extensive empirical evaluation of the performance of our algorithm. Given a data matrix $X$, we study the performance of the algorithm on two tasks: (i) privately estimating the covariance matrix $C = XX^\top$, and (ii) privately regressing to predict one of the columns of $X$ from the others. Due to space constraints we present only the results for (i) and defer the results for (ii) to Appendix C.

Figure 1: Results comparing our algorithm across the wine ($d = 13$, $n = 200$), airfoil ($d = 6$, $n \approx 1.5$K), and adult ($d = 108$, $n \approx 49$K) datasets. (a) Comparison to KT and L; error is the normalized Frobenius distance. (b) Comparison to the Gaussian mechanism; the legend G-x corresponds to a value of $\delta = 10^{-x}$.

We compare the performance of our algorithm to a number of different baselines. We begin with two general-purpose output perturbation methods, the Laplace mechanism and the Gaussian mechanism:

- The Laplace mechanism [Dwork et al., 2006] (L). The output is given by $\hat{C} = C + M$, where $M$ is a matrix with entries distributed as $\mathrm{Lap}(2d/\varepsilon)$.
- The Gaussian mechanism [Dwork et al., 2014b] (G). Notably, the Gaussian mechanism achieves $(\varepsilon, \delta)$-differential privacy, hence its privacy guarantees are weaker for the same value of $\varepsilon$. Our goal is to measure whether we can achieve similar utility under stricter privacy constraints. We experiment with different values of $\delta$.
- The algorithm proposed by Kapralov and Talwar [2013] (KT). This algorithm is $\varepsilon$-differentially private. We use Algorithm 2 for the vector sampling subroutine.
- Algorithm 1 with adaptive privacy splitting (AD). We allocate the privacy budget in the manner suggested by Corollary 1.
- Algorithm 1 with uniform privacy splitting (IT-U). Same as above, except the privacy budget used to sample eigenvectors is split uniformly.

One final modification we apply to all algorithms that release a covariance matrix is to round the eigenvalues of the private matrix to fall in the interval $[0, n]$, since this bound is data-independent and easy to derive analytically.

We measure the performance of our algorithm on three datasets from the UCI repository²: Wine, Adult, and Airfoil. These datasets have dimensions ranging from 13 to 108 and numbers of points from 200 to 49,000. The approximation error of each algorithm is measured using the normalized Frobenius distance $\|\hat{C} - C\|_F / n$. To investigate the privacy/utility trade-off, we run each algorithm with privacy parameter $\varepsilon \in \{0.01, 0.1, 0.2, 0.5, 1.0, 2.0, 4.0\}$. For the Gaussian mechanism, we also varied the parameter $\delta \in \{10^{-16}, 10^{-10}, 10^{-3}\}$. We ran each experiment 50 times, showing the average error in Figure 1.

² https://archive.ics.uci.edu/ml/datasets/

The first thing to notice is that our algorithm consistently outperforms all others except for the single case of the wine dataset with $\varepsilon = 0.01$. Recall that the Gaussian mechanism has an additional failure probability $\delta$, so the privacy guarantees we obtain are strictly better for the same value of $\varepsilon$. It is therefore particularly striking that we consistently beat the Gaussian mechanism even for the very relaxed value of $\delta = 10^{-3}$. Another important observation from this experiment is that the adaptive and non-adaptive privacy budget splittings do not seem to have a large effect on the performance of the algorithm. Finally, we see that the performance gap between AD and KT is largest on the dataset with the highest dimension. This phenomenon is in line with the analysis of Section 3.1. We explore this effect in more detail in Appendix C.

Finally, as we detail in Appendix C, our approach outperforms the output perturbation method of Chaudhuri et al. [2011] on the regression task, even though the latter achieves only $(\varepsilon, \delta)$-differential privacy. As we mentioned previously, the private covariance matrix output by our algorithm can also be used to tune regularization parameters without affecting the privacy budget, thus giving additional freedom to practitioners in tuning their algorithms.

5 Conclusion

We presented a new algorithm for differentially private covariance estimation, studied it analytically, and demonstrated its performance on a number of synthetic and real-world datasets. To the best of our knowledge, this is the first $\varepsilon$-differentially private algorithm to admit a utility guarantee that grows as $O(d^{3/2})$ with the dimension of the dataset. Previously, such bounds could only be achieved at the cost of $(\varepsilon, \delta)$-differential privacy.
We also showed that the average Frobenius approximation error of our algorithm decreases as $O(1/\sqrt{n})$, which is slower than the $O(1/n)$ rate of the Gaussian and Laplace mechanisms. This poses an open question of whether the suboptimal dependency on $n$ is necessary in order to achieve pure differential privacy, or to achieve an $O(d^{3/2})$ dependency on the dimension.

Looking more broadly, practical machine learning and data analysis typically requires a significant amount of tuning: feature selection, hyperparameter selection, experimenting with regularization, and so on. If this tuning is performed using the underlying private dataset, then in principle all of these steps count against the privacy budget of the algorithm designer (who must also, of course, have access to that private dataset). By producing a differentially private summary of the dataset from which multiple models can be trained with no additional privacy cost, our approach allows a practitioner to operate freely, without worrying about privacy budgets or the secure handling of private data. We believe that finding techniques for computing private representations in other settings is an exciting direction for future research.

References

Martin Abadi, Andy Chu, Ian Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318. ACM, 2016.

Keith Ball. An elementary introduction to modern convex geometry. Flavors of Geometry, 31:1–58, 1997.

K. Chaudhuri, A. Sarwate, and K. Sinha. Near-optimal algorithms for differentially-private principal component analysis. In NIPS, 2012.

Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. Differentially private empirical risk minimization. Journal of Machine Learning Research, 12(Mar):1069–1109, 2011.

Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam D. Smith. Calibrating noise to sensitivity in private data analysis. In Third Theory of Cryptography Conference, pages 265–284, 2006.

Cynthia Dwork, Aaron Roth, et al. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014a.

Cynthia Dwork, Kunal Talwar, Abhradeep Thakurta, and Li Zhang. Analyze Gauss: optimal bounds for privacy-preserving principal component analysis. In Proceedings of the Forty-Sixth Annual ACM Symposium on Theory of Computing, pages 11–20. ACM, 2014b.

Hafiz Imtiaz and Anand D. Sarwate. Symmetric matrix perturbation for differentially-private principal component analysis. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), Shanghai, China, March 20–25, 2016, pages 2339–2343, 2016.

Prateek Jain, Pravesh Kothari, and Abhradeep Thakurta. Differentially private online learning. In Proceedings of COLT, 2012.

Wuxuan Jiang, Cong Xie, and Zhihua Zhang. Wishart mechanism for differentially private principal components analysis. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, Arizona, USA, February 12–17, 2016, pages 1730–1736, 2016.

Michael Kapralov and Kunal Talwar. On differentially private low rank approximation. In Proceedings of SODA, pages 1395–1414, 2013.

John T. Kent, Asaad M. Ganeiber, and Kanti V. Mardia. A new unified approach for the simulation of a wide class of directional distributions. Journal of Computational and Graphical Statistics, 27(2):291–301, 2018.

Daniel Kifer, Adam D. Smith, and Abhradeep Thakurta.
Private convex optimization for empirical risk minimization with applications to high-dimensional regression. In Proceedings of COLT, pages 25.1–25.40, 2012.

F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, 2007.

Frank McSherry and Ilya Mironov. Differentially private recommender systems: Building privacy into the Netflix prize contenders. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 627–636. ACM, 2009.

Jalaj Upadhyay. The price of privacy for low-rank factorization. In Proceedings of NeurIPS, pages 4180–4191, 2018.

Di Wang and Jinhui Xu. Differentially private high dimensional sparse covariance matrix estimation. CoRR, abs/1901.06413, 2019. URL http://arxiv.org/abs/1901.06413.

Yu-Xiang Wang. Revisiting differentially private linear regression: optimal and adaptive prediction & estimation in unbounded domain. In Proceedings of UAI, 2018.