# differentially_private_robust_lowrank_approximation__77b5d565.pdf

Differentially Private Robust Low-Rank

Approximation

Raman Arora Johns Hopkins University

Baltimore, MD-21201

arora@cs.jhu.edu

Vladimir Braverman Johns Hopkins University

Baltimore, MD-21201

vova@cs.jhu.edu

Jalaj Upadhyay Johns Hopkins University

Baltimore, MD-21201

jalaj@jhu.edu

In this paper, we study the following robust low-rank matrix approximation problem: given a matrix A 2 Rn d, ﬁnd a rank-k matrix M, while satisfying differential privacy, such that k A Mkp OPTk(A) + , where k Bkp is the entry-wise p-norm of B and OPTk(A) := minrank(X) k k A Xkp. It is well known that low-rank approximation w.r.t. entrywise p-norm, for p 2 [1, 2), yields robustness to gross outliers in the data. We propose an algorithm that guarantees = e O(k2), = e O(k2(n + kd)/"), runs in e O((n + d)poly k) time and uses O(k(n + d) log k) space. We study extensions to the streaming setting where entries of the matrix arrive in an arbitrary order and output is produced at the very end or continually. We also study the related problem of differentially private robust principal component analysis (PCA), wherein we return a rank-k projection matrix such that k A A kp OPTk(A) + .

1 Introduction

Low rank matrix approximation is a well studied problem, where given a data matrix A, the goal is to ﬁnd a low-rank matrix B that approximates A in the sense that µ(A B) is small under some function µ( ). It ﬁnds application in numerous machine learning tasks, such as recommendation systems [10], clustering [9, 25], and learning distributions [2].

Often, the real-world data used in these applications is plagued with gross outliers, and it is desirable to impart robustness to low-rank approximation algorithms against such corruptions. Furthermore, these applications increasingly rely on sensitive data which raises the need for preserving privacy of the underlying data. The focus of this paper, therefore, is to compute a low-rank approximation of a given matrix under a strong privacy guarantee while being robust to outliers in data.

For robustness to outliers, we choose the measure µ( ) to be the entrywise p-norm for p 2 [1, 2), deﬁned as k Akp = (P

i,j |Ai,j|p)1/p. It is well known that low-rank approximation w.r.t. entrywise p-norm, for p 2 [1, 2), yields robustness to gross outliers in the data [5, 7, 22, 23, 24, 29]. To address the need for privacy, we rely on the notion of differential privacy [11] that has become the de facto standard for private data analysis in recent years. Formally, we deﬁne differential privacy as follows. Deﬁnition 1. A randomized algorithm M is said to be (", δ)-differentially private if for all neighboring datasets, A and A0, and all subsets S range(M) in the range of M, we have that Pr[M(A) 2 S] e"Pr[M(A0) 2 S] + δ.

The notion of what makes two datasets neighboring determines the granularity of differential privacy [13]. At the ﬁnest scale, we consider two matrices as neighboring if they differ in at most one entry by a unit value [17, 19, 20]; this corresponds to feature-level privacy. At the coarsest granularity, two matrices are deemed neighboring if they differ in one row by a unit norm [18, 14]; this

32nd Conference on Neural Information Processing Systems (Neur IPS 2018), Montréal, Canada.

corresponds to the user-level privacy. Note that since we do not make any boundedness assumption on the entries of the data-matrix, we need to establish a normalized scale to limit the inﬂuence of a single entry or a single row of a given matrix. In this paper, we say that two matrices A and A0 are neighboring if the matrices are within a unit (entrywise) 1 ball of each other, i.e., k A A0k1 1. This notion of neighboring datasets provides stronger guarantees than the feature-level privacy.

We are interested in private robust data analysis, speciﬁcally, robust low-rank approximation of a matrix with respect to entrywise p-norm for p 2 [1, 2), under the constraints of differential privacy. Even without privacy, low-rank matrix approximation with respect to entrywise p-norm for p 6= 2 is a non-trivial problem: it does not have a closed form solution and computing the optimal low-rank approximation with respect to 1-norm is known to be NP-hard [16]. A natural question then is whether we can compute a good enough approximation to the best rank-k approximation. This question has formed the basis for many recent results [5, 7, 22, 23, 24, 29]. However, prior to this work, differentially private low-rank approximation with respect to entrywise p-norm has been an open problem. We give the ﬁrst timeand space-efﬁcient differentially private algorithm for low-rank matrix approximation with respect to entrywise p-norm.

1.1 Formal Problem Statement and Contributions

In this section, we formally deﬁne the problem of differentially private robust low-rank matrix approximation, and state our main results. For the ease of presentation, we assume that δ = (n log n). We use the notation e O( ) to hide poly log factors. Deﬁnition 2 (Robust low-rank approximation). Given a matrix A 2 Rn d, and p 2 [1, 2), output a rank-k matrix M such that with probability at least 1 β,

k A Mkp OPTk(A) + , where OPTk(A) := min rank(X) k k A Xkp . (1)

Our ﬁrst contribution is Algorithm 1, ROBUST-LRA, which given an input matrix A 2 Rn d returns a differentially private rank-k approximation to A with a multiplicative approximation factor of = O((k log k)2(2 p)/p log d log n) and an additive approximation error of = e O

" 1k2(n + kd)

. In particular, for p = 1, we have = O(k2 log2 k log d log n) and = e O

" 1k2(n + kd)

. We note that the best known algorithm in a non-private setting [29] achieves the same multiplicative factor, albeit with no additive error. Therefore, the price we pay for privacy is in terms of an additional additive error.

In many machine learning problems, e.g. feature selection and representation learning, all we are interested in is recovering the low-dimensional subspace spanned by the data. One such example is principal component analysis using data with gross outliers or corruptions (e.g. face recognition in the presence of occlusions). Of course, the proposed Algorithm 1 can also output the projection matrix associated with the right singular vectors of matrix M with the same accuracy guarantee as for robust low-rank approximation (see Remark 1 for more details). However, the additive error we incur still scales with n whereas intuitively making the basis for a k-dimensional subspace in Rd should require only adding noise proportional to k d. This motivates a slightly different treatment for the robust principal component analysis problem, which can be formulated as follows. Deﬁnition 3 (Robust principal component analysis). Given a matrix A 2 Rn d, output a rank-k orthonormal projection matrix such that with probability at least 1 β,

k A A kp OPTk(A) + , where OPTk(A) := min rank(X) k k A Xkp . (2)

The second main contribution of this paper is an algorithm that returns a differentially private rank-k orthonormal projection matrix with = O((kd log k)(2 p)/p log3 d log n) = e O

Many variants of differentially private low-rank approximation have been studied in the literature [14, 18, 19, 17, 20, 21, 31, 32] for both the Frobenius norm and spectral norm. We give the ﬁrst (", δ)-

differentially private algorithm for robust PCA. Unlike PCA under Frobenius and spectral norm, computing an exact robust PCA is a computationally hard problem (NP-hard when p = 1).

Besides the objective function, our work differs from existing work also in terms of the privacy granularity and efﬁciency. A detailed comparison and review of previous works is presented in Table 1.

Table 1: Comparison of Models for Differentially Private k-Rank Approximation (u and v are unit vectors, es is the s-th standard basis, is an arbitrary constant, !k := σk(A) σk+1(A) is the singular value separation, µ is coherence of the matrix A, and p 2 [1, 2)).

Assumptions Accuracy ( , ) Metric

Theorem 10 k A A0k1 = 1

e O(k2p(2 p)/2 log k log d), e O

Hardt-Roth [18] A A0 = esv>

Frobenius µ-coherence Upadhyay [32] A A0 = uv>

(1 + ), e O

Kapralov-Talwar [21] k Akop k A0kop = 1

σ-value separation

Spectral Hardt-Price [17] A A0 = ese>

kµ log(log dσk/(!k))

"!k )) µ-coherence Dwork et al. [14] A A0 = esv>

Jiang et al. [20] A A0 = ese>

2 Basic Preliminaries

One of the key features of differential privacy is that it is preserved under arbitrary post-processing, i.e., an analyst, without additional information about the private database, cannot compute a function that makes an output less differentially private. This is formalized in the form of following lemma:

Lemma 4 (Dwork et al. [11])). Let M(D) be an (", δ)-differential private algorithm for a data matrix D , and let h be any function, then any mechanism M0 := h(M(D)) is also (", δ)-differentially private.

A key ingredient in our algorithms is a p-stable distribution which can be deﬁned in terms of a limit of normalized sums of i.i.d. random variables [33].

Deﬁnition 5 (p-stable distirbution). A distribution Dp over R is called p-stable, if there exists p 0, such that for any (v1, , vn) 2 Rn, and n i.i.d. random variables X1, , Xn with distribution Dp, the random variable P

i vi Xi has the same distribution as the variable kvkp X, where X Dp.

We use the notation D(r,c)

p to denote a distribution over r c random matrices, where every entry of the matrix is sampled from the distribution Dp. It is known that p-stable distributions exist for all p 2 (0, 2] [33], and that Gaussian distribution is 2-stable and the Cauchy distribution is 1-stable. Moreover, one can use the method of Chambers et al. [8] to sample from Dp (1 < p < 2).

Our analysis uses the fact that S D(r,c)

p satisﬁes the no-dilation and no-contraction property [28].

Deﬁnition 6 (No-dilation [28]). Given a matrix A 2 Rn d, if a matrix S 2 Rm n satisﬁes k SAkp c1 k Akp , then S has at most c1 dilation on A with respect to entrywise p-norm.

Deﬁnition 7 (No-contraction [28]). Given a matrix A 2 Rn d, a matrix S 2 Rm n has c2contraction on A with respect to the entrywise p-norm if 8x 2 Rd, k SAxkp c2 1 k Axkp .

Our analysis uses recent results from matrix sketching. In particular, we use the fact that we can approximately solve p-regression problem using random matrix sketches [29].

Lemma 8 (Song et al. [29]). Let Φ 2 Rφ n be a projection matrix that preserves pnorm of a vector for p 2 [1, 2) and let B 2 Rn d, C 2 Rn c be any matrix. Let e X := argmin X2Rd c kΦ(BX C)kp , b X := argmin X2Rd c k BX Ckp , then k B e X Ckp

Cφk B b X Ckp for some constant Cφ that depends only on log d.

Lemma 9 (Song et al. [29]). Given matrices L, N, A of appropriate dimension, let X := argmin X k LXN Akp. Suppose S and T satisﬁes c1-dilation on LX N A and c2-contraction

property on L. Further if b X be such that k S(L b XN A)Tkp c minrank(X) k k S(LXN A)Tkp ,

then, we have that k L b XN Akp O(c1c2c) minrank(X) k k LXN Akp .

Algorithm 1 ROBUST-LRA Input: Input data matrix A 2 Rn d, target rank k Output: Rank-k matrix M 2 Rn d

1: Initialization: Set the variables φ, , s, t, Cφ, C , Cs, Ct as in Table 2. 2: Initialization: Sample Φ 2 Rφ n, 2 Rd , S 2 Rs n, and T 2 Rd t from distributions

p , and D(d,t)

p , respectively. All these matrices are made public. 3: Sample: N1 2 Rφ d, N2 2 Rn , N3 2 Rs t such that N1 Lap(0, C /")n , N2

Lap(0, Cφ/")φ d, and N3 Lap(0, Cs Ct/")s t. Keep N1, N2, N3 private. 4: Sketch: Compute Yr = ΦA + N1, Yc = A + N2. 5: Sketch: Compute Zr =Yr T, Zc =SYc, Z =SAT +N3. 6: SVD: Compute [Uc, c, Vc] = SVD(Zc), [Ur, r, Vr] = SVD(Zr). 7: 2-LRA: Compute b X = Vc

r , where [B]k = argminr(X) k k B Xk F .

8: Output: M = Yc b XYr.

Table 2: Values of different variables.

Cφ, Cs C , Ct φ, , s, t O(log d) O(log n) O(k log k log(1/δ))

3 Differentially private robust LRA

In this section, we give an (", δ)-differentially private polynomial-time algorithm for robust low-rank approximation. We ﬁrst discuss algorithmic challenges in extending known techniques and analyses to our problem. We present the proposed algorithm and main results in Section 3.1, and discuss extensions to the general turnstile model and the continual release model in Section 3.2. Proofs of all results are deferred to the supplementary material of this paper.

Two common approaches to preserve privacy are output perturbation [11] and input perturbation [3, 30] of the objective function. In output perturbation, we ﬁrst compute the output (e.g. rank-k approximation of a given matrix) non-privately and then add appropriately scaled noise to preserve privacy. In input perturbation, we add noise to the private matrix and then compute the output on the noisy matrix. Both these approaches require adding noise to every entry of the given input matrix or to every entry of the non-private output matrix. Consequently, both of these methods would incur an additive error of O(nd). On the other hand, most existing non-private algorithms for robust low-rank approximation either use heuristics and do not have provable guarantees, or they make additional assumptions on the input matrix; the only exception is the work of Song et al. [29]. Again, a naive mechanism to make the algorithm of Song et al. [29] private would incur an additive error of O(nd).

3.1 Proposed Algorithm

It is somewhat tantalizing, from a computational perspective, to attempt approximating a solution to the robust LRA problem using a low-rank approximation with respect to 2-norm; however, it is well understood that the latter is quite sensitive to even a single outlier. A key idea behind the proposed solution then is based on the following key observation. We can approximate the output of robust low rank approximation using low rank approximation with respect to 2-norm after sketching the matrix using S D(r,n)

p and T D(c,d)

p for some choice of r and s. In particular, p-stable distribution imparts robustness, and the effect of outliers is reduced in the lower dimensional space.

In summary, the proposed algorithms are based on the following three algorithmic primitives: (a) sketching the row-space and column-space of the input matrix, (b) formulating the low-rank matrix approximation problem as a regression problem, and (c) approximating the solution to p regression problem by corresponding 2 regression problem. The analysis, then, carefully bounds the error in approximation for each of the steps above as well as error resulting from the privacy mechanism.

The pseudo-code of the proposed algorithm (ROBUST-LRA) is presented as Algorithm 1. We present values of various variables used in the algorithm in Table 2. Our main result is as follows.

Theorem 10. Algorithm ROBUST-LRA (see Algorithm 1) is (", δ)-differentially private. Furthermore, given a matrix A 2 Rn d, it runs in poly(k, n, d) time, e O(k(n + d)) space, and outputs a rank k matrix M such that, with probability 9/10 over the randomness of the algorithm,

k A Mkp O((k log k log(1/δ))2(2 p)/plog d log n)OPTk(A) + e O(k2(n+kd) log2(1/δ)/"),

where OPTk(A) := minrank(X) k k A Xkp.

In particular, for p = 1, we get

k A Mkp O(k2 log2 k log2(1/δ)log d log n)OPTk(A) + e O(k2(n+kd) log2(1/δ)/").

Remark 1. Algorithm ROBUST-LRA (Figure 1) outputs a low-rank matrix. However, it is possible to output a low-rank factorization without any loss in efﬁciency. It can be done by computing the SVD [U b X, b X, V b X] of b X, the QR decomposition of Yc and Yr to get orthonormal bases U of column space of Yc and V of the row space of Yr. The algorithm then outputs [UU b X, b X, V V b X] as a low-rank factorization. The extra running time of this algorithm is O(φ2d + 2n + φ 2) = e O(k2(n + d)). This is smaller than O(nd2) time if one naively factorizes M. Remark 2 (Additive Error). The additive error in Theorem 10 has a quadratic dependence on k. There is an implicit tradeoff between the additive and multiplicative error as k increases. When k is small, then error due to OPTk(A) is higher, and when k is larger, then the additive error is high. For instance when k equals to the rank of the matrix, then we have zero multiplicative error, but additive error is of order O(k2n). Note that O(kn) error is unavoidable because we are trying to hide every single entry of the matrix A. Without making additional strong assumptions such as (a) stochastic data, and/or (b) incoherence, and/or (c) bounded norms, O(kn) additive error is perhaps the best we can hope for. Intuitively, we have to privatize a k-dimensional latent representation of our data and therefore at least add noise proportional to kn.

3.2 Extension to Other Models of Differential Privacy

ROBUST LRA can be easily extended to the streaming model of computation [32] and the continual release model [12]. We ﬁrst deﬁne the basic streaming model of computation that we study. Deﬁnition 11 (General turnstile update model). In the general turnstile update model, a matrix A 2 Rn d is streamed in the form of tuple ( t, it, jt, ), where 1 it n, 1 jt d and t 2 R. An update is of the form Ait,jt Ait 1,jt 1 + t. The curator is required to output a robust PCA or robust subspace for the matrix at the end of the stream.

Matrix as a stream

(1,5,0) (1,5,0) (3,4,1) (6,1,1) (5,4,0) (1,5,1) (3,4,1) . . . . . .

Analyst s u

Small sketch

For example, in the ﬁgure, the server receives an update of 6 to A1,1 and it updates the small sketch using an update function, U.

At the end of the stream, the server uses the small sketch and runs an algorithm S to compute the function (low-rank approximation in our context).

We call two streams neighboring if they are formed by neighboring matrices. Note that the private matrix is stored only in the form of linear sketches, therefore, to get an algorithm in the general turnstile streaming model, we initialize Yr = N1, Yc = N2, and Z = N3. Then when we receive ( t, it, jt) 2 R [n] [d], we construct a matrix A(t) with all entries zero except for A(t)

it,jt = t. We then update the sketches as follows: Yc = Yc + ΦA(t), Yr = Yc + A(t) , and Z = Z + SA(t)T. Once all the updates are made, we simply run the last three steps of ROBUST-LRA. As a result, we get the following corollary. Corollary 12 (Informal). Algorithm ROBUST-LRA is an (", δ)-differentially private that on input a private matrix A in a general turnstile update model, outputs a rank k matrix M with the same accuracy guarantee as in Theorem 10.

ROBUST-LRA can also be extended to the following continual release setting [12]. Deﬁnition 13 (Continual release model). In a continual release model, a matrix A 2 Rn d is streamed in the form of tuple ( t, it, jt), where 1 it n, 1 j d and t 2 R. An update is of the form Ait,jt Ait 1,jt 1 + t. The curator is required to output a robust PCA or robust subspace for the matrix streamed up until any time t T.

Algorithm 2 ROBUST-PCA Input: Input data matrix A 2 Rd n, target rank k Output: Rank-k projection matrix 2 Rd d

1: Initialization: Set the variables φ, , t, Cφ, C , Ct as in Table 2. 2: Initialization: Sample Φ 2 Rφ d, 2 Rn , S 2 Rs d, and T 2 Rn t from distributions

p , and D(n,t)

p , respectively. All these matrices are made public. 3: Sample: N1 2 Rφ t, N2 2 Rd such that N1 Lap(0, CφCt/")φ t, N2 Lap(0, C /")d . Keep N1, N2 private. 4: Sketch: Compute Yr = ΦAT T + N1 and Yc = AT + N2. Zc = ΦYc and Z = Yr 5: SVD: Compute [Uc, c, Vc] = SVD(Zc), 6: [Ur, r, Vr] = SVD(Yr). 7: 2-LRA: Compute b X = Vc

r , where [B]k = argminr(X) k k B Xk F .

8: Pick: a permutation matrix Q 2 Rφ φ. 9: Compute: the full SVD of Yc b X, [U 0, 0, V 0]. Set U = U 0Q, = 0Q, and P = Φ (U ) . 10: Output: = PU (ΦPU ) Φ.

For outputting a low-rank approximation in the continual release model, we can use the generic transformation to store a binary tree that is constructed over the privatized sketches of the updates as its leaves [12]. When a new query for a range of updates is made, we accumulate the sketches of the dyadic partition of the range to compute the sketches for that range. We then compute the last three steps of ROBUST-LRA. We have the following result. Corollary 14. Algorithm ROBUST-LRA is an (", δ)-differentially private algorithm that on input matrix A in a streaming manner, runs in time poly(k, n, d, log T) and outputs a rank k matrix M (t) in the continual release model over T time epochs, such that, with probability at least 9/10, k A(t) M (t)kp O((k log k log(1/δ))2(2 p)/plog d log n)OPTk(A(t)) + e O(k2(n + kd) log T), where OPTk(A) is as in Theorem 10, and A(t) is the matrix up to t time epochs.

4 Differentially Private Robust Principal Component Analysis

In this section, we focus on the problem of robust PCA under the constraints of differential privacy. We ﬁrst present the proposed algorithm and then discuss extensions to the general turnstile model continual release model. Proofs of all results are deferred to the supplementary material of this paper.

The key ideas underlying the proposed algorithm, ROBUST-PCA (see Algorithm 2 for the pseudocode), and its analysis, essentially follow the techniques developed in the previous section for ROBUST-LRA, but with a couple of small modiﬁcations to get a better additive error. First, we only generate two sketches, Yr = ΦAT T + N1 and Yc = AT + N2, where , Φ, T are random sketching matrices and N1, N2 are noise matrices as deﬁned in Algorithm 2. Second, we solve a slightly different optimization problem:

min rank(Y ) k

&&AT (PU )Y (ΦAT )

where P, U, are as formed in Algorithm 2. We show that (ΦU P) is an approximate solution to min X

&&Φ(AT PU XΦAT )T

p. The rest of the proof then follows the same steps as in the proof of Theorem 10. In addition, we also show that is an orthonormal rank-k projection matrix. The above exposition focuses on the non-private setting for the sake of simplicity. The proof is more involved due to noise matrices added for privacy.

We show the following guarantee for the proposed algorithm. Theorem 15. Algorithm ROBUST-PCA, (see Algorithm 2), is (", δ)-differentially private. Further, given a matrix A 2 Rn d with OPTk(A) := minrank(X) k k A Xkp, it runs in time poly(k, n, d),

space e O(k(n+d)), and outputs a rank k orthonormal projection matrix such that, with probability 9/10 over the random coin tosses of the algorithm,

k A A kp O((k log k log(1/δ))2(2 p)/p log n log3 d)OPTk(A) + e O(k2d log n/").

In particular, when p = 1, we have the following guarantee:

k A A kp O(k2 log n log3 d log2 k log2(1/δ))OPTk(A) + e O(k2d log n/").

We note that ROBUST-PCA yields a smaller additive error than ROBUST-LRA by a factor of n/d, but at the expense of an additional multiplicative factor of log2(d). Therefore, in settings where OPTk(A) is small (e.g. when A is nearly low rank), ROBUST-PCA enjoys a much better accuracy guarantee.

Extension to Other Models of Differential Privacy. We can extend ROBUST-PCA to the streaming model of computation [32] and the continual release model [12] as in Section 3.2. We can also extend ROBUST-PCA to the local model of differential privacy. Local differential privacy has gained a lot of attention recently [1, 15]. In the local privacy model, there is no central database of private data. Instead, each individual has its own data element (a database of size one), and sends a report based on its own datum in a differentially private manner.

Formally, we consider the database X = [x1, , xn]> as a collection of n elements (rows) from some domain X Rd, with each row held by a different individual. The ith individual has access to "i-local randomizer, Ri : X ! W, which is an "i-differentially private algorithm that takes as input a database of size n = 1. We assume that the algorithms may interact with the database only through local randomizers. We can then deﬁne local differential privacy as follows [13]. An algorithm is "-locally differentially private if it accesses the database X via the local randomizers, R1(x1), . . . , Rn(xn), where max {"1, . . . , "n} ".

We note that what we have deﬁned above is a non-interactive local differential privacy algorithm where an individual only sends a single message to the server. It was argued in Smith et al. [27] that it is more desirable to have as few rounds of interactions as possible from an implementation point of view. In fact, existing large-scale deployments are limited to one that are non-interactive. Therefore, we limit our study to what is possible in the non-interactive variant of local differential privacy.

We extend Algorithm 2 to an "-locally-differentially private protocol, LOCAL-ROBUST-PCA, where every user 1 i n has a row Ai: of the data matrix and sends only one message to the server. We show that the output produced by the server after a run of LOCAL-ROBUST-PCA is a rank-k orthonormal projection matrix 2 Rd d such that

k A A kp O(log n log3 d (k log k log(1/δ))2(2 p)/p)OPTk(A) + e O(k2nd/").

The above guarantee is non-trivial when k Akp nd. Such an assumption is often valid in practical settings with large corruption to data matrices.

5 Discussion

In this paper, we present differentially private algorithms for robust low-rank approximation and for robust principal component analysis. In addition, we study extensions of our algorithms to a continual release model, the streaming model of computation, and the local model of differential privacy.

The bounds we provide involve a multiplicative factor that depends on the target rank k. Such a dependence was deemed necessary in non-private settings. In particular, Song et al. [29] show that if the exponential time hypothesis is true, then any linear-sketch based polynomial time algorithm for robust rank-k factorization incurs an (k1/2 γ) multiplicative approximation for some γ 2 (0, 0.5) that can be arbitrarily small. It is not clear immediately if such a result still holds when we allow an additive error in the approximation, as is the case here.

Acknowledgements

This research was supported in part by NSF BIGDATA grant IIS-1546482, NSF BIGDATA grant IIS-1838139, NSF Career CCF-1652257, and ONR Award N00014-18-1-2364.

[1] Apple tries to peek at user habits without violating privacy. The Wall Street Journal, 2016.

[2] Dimitris Achlioptas and Frank Mc Sherry. On spectral learning of mixtures of distributions. In

Learning Theory, pages 458 469. Springer, 2005.

[3] Jeremiah Blocki, Avrim Blum, Anupam Datta, and Or Sheffet. The Johnson-Lindenstrauss

Transform Itself Preserves Differential Privacy. In FOCS, pages 410 419, 2012.

[4] Avrim Blum, Katrina Ligett, and Aaron Roth. A learning theory approach to noninteractive

database privacy. Journal of the ACM (JACM), 60(2):12, 2013.

[5] J Paul Brooks, José H Dulá, and Edward L Boone. A pure l1-norm principal component analysis.

Computational statistics &amp; data analysis, 61:83 98, 2013.

[6] Mark Bun, Jelani Nelson, and Uri Stemmer. Heavy hitters and the structure of local privacy.

ar Xiv preprint ar Xiv:1711.04740, 2017.

[7] Emmanuel J Candès, Xiaodong Li, Yi Ma, and John Wright. Robust principal component

analysis? Journal of the ACM (JACM), 58(3):11, 2011.

[8] John M Chambers, Colin L Mallows, and BW Stuck. A method for simulating stable random

variables. Journal of the american statistical association, 71(354):340 344, 1976.

[9] Michael B Cohen, Sam Elder, Cameron Musco, Christopher Musco, and Madalina Persu.

Dimensionality reduction for k-means clustering and low rank approximation. In STOC, pages 163 172. ACM, 2015.

[10] Petros Drineas, Iordanis Kerenidis, and Prabhakar Raghavan. Competitive recommendation

systems. In STOC, pages 82 90. ACM, 2002.

[11] Cynthia Dwork, Frank Mc Sherry, Kobbi Nissim, and Adam Smith. Calibrating noise to

sensitivity in private data analysis. In TCC, pages 265 284. Springer, 2006.

[12] Cynthia Dwork, Moni Naor, Toniann Pitassi, and Guy N. Rothblum. Differential privacy under

continual observation. In Leonard J. Schulman, editor, STOC, pages 715 724. ACM, 2010.

[13] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Founda-

tions and Trends in Theoretical Computer Science, 9(3-4):211 407, 2014.

[14] Cynthia Dwork, Kunal Talwar, Abhradeep Thakurta, and Li Zhang. Analyze Gauss: Optimal

Bounds for Privacy-Preserving Principal Component Analysis. In STOC, pages 11 20, 2014.

[15] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. Rappor: Randomized aggregatable

privacy-preserving ordinal response. In Proceedings of the 2014 ACM SIGSAC conference on computer and communications security, pages 1054 1067. ACM, 2014.

[16] Nicolas Gillis and Stephen A Vavasis. On the complexity of robust pca and 1-norm low-rank

matrix approximation. ar Xiv preprint ar Xiv:1509.09236, 2015.

[17] Moritz Hardt and Eric Price. The noisy power method: A meta algorithm with applications. In

Z. Ghahramani, M. Welling, C. Cortes, N.d. Lawrence, and K.q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 2861 2869. Curran Associates, Inc., 2014.

[18] Moritz Hardt and Aaron Roth. Beating randomized response on incoherent matrices. In

Howard J. Karloff and Toniann Pitassi, editors, STOC, pages 1255 1268. ACM, 2012.

[19] Moritz Hardt and Aaron Roth. Beyond worst-case analysis in private singular vector computa-

tion. In Dan Boneh, Tim Roughgarden, and Joan Feigenbaum, editors, STOC, pages 331 340. ACM, 2013.

[20] Wuxuan Jiang, Cong Xie, and Zhihua Zhang. Wishart mechanism for differentially private

principal components analysis. ar Xiv preprint ar Xiv:1511.05680, 2015.

[21] Michael Kapralov and Kunal Talwar. On differentially private low rank approximation. In

SODA, volume 5, page 1. SIAM, 2013.

[22] Qifa Ke and Takeo Kanade. Robust l1 norm factorization in the presence of outliers and missing

data by alternative convex programming. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 739 746. IEEE, 2005.

[23] Eunwoo Kim, Minsik Lee, Chong-Ho Choi, Nojun Kwak, and Songhwai Oh. Efﬁcient l_{1}-

norm-based low-rank matrix approximations for large-scale problems using alternating rectiﬁed gradient method. IEEE transactions on neural networks and learning systems, 26(2):237 251, 2015.

[24] Panos P Markopoulos, Sandipan Kundu, Shubham Chamadia, and Dimitrios Pados. Efﬁcient

l1-norm principal-component analysis via bit ﬂipping. IEEE Transactions on Signal Processing, 2017.

[25] Frank Mc Sherry. Spectral partitioning of random graphs. In FOCS, pages 529 537. IEEE,

[26] Frank Mc Sherry and Kunal Talwar. Mechanism design via differential privacy. In FOCS, pages

94 103. IEEE, 2007.

[27] A. Smith, A. Thakurata, and J. Upadhyay. Is Interaction Necessary for Distributed Private

Learning? To Appear in IEEE Symposium for Security & Privacy, 2017.

[28] Christian Sohler and David P Woodruff. Subspace embeddings for the l 1-norm with applications.

In Proceedings of the forty-third annual ACM symposium on Theory of computing, pages 755 764. ACM, 2011.

[29] Zhao Song, David P. Woodruff, and Peilin Zhong. Low rank approximation with entrywise

1-norm error. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, Montreal, QC, Canada, June 19-23, 2017, pages 688 701, 2017.

[30] Jalaj Upadhyay. Random Projections, Graph Sparsiﬁcation, and Differential Privacy. In

ASIACRYPT (1), pages 276 295, 2013.

[31] Jalaj Upadhyay. Differentially private linear algebra in the streaming model. ar Xiv preprint

ar Xiv:1409.5414, 2014.

[32] Jalaj Upadhyay. The price of privacy for low-rank factorization. In Advances in Neural

Information Processing Systems, pages 4180 4191, 2018.

[33] Vladimir M Zolotarev. One-dimensional stable distributions, volume 65. American Mathemati-

cal Soc., 1986.