# Differentially Private Subspace Clustering

Yining Wang, Yu-Xiang Wang and Aarti Singh
Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA
{yiningwa,yuxiangw,aarti}@cs.cmu.edu

## Abstract

Subspace clustering is an unsupervised learning problem that aims at grouping data points into multiple clusters so that data points in a single cluster lie approximately on a low-dimensional linear subspace. It was originally motivated by 3D motion segmentation in computer vision, but has recently been applied generically to a wide range of statistical machine learning problems, which often involve sensitive datasets about human subjects. This raises a dire concern for data privacy. In this work, we build on the framework of differential privacy and present two provably private subspace clustering algorithms. We demonstrate via both theory and experiments that one of the presented methods enjoys formal privacy and utility guarantees; the other one asymptotically preserves differential privacy while having good performance in practice. Along the course of the proof, we also obtain two new provable guarantees for agnostic subspace clustering and the graph connectivity problem, which might be of independent interest.

## 1 Introduction

Subspace clustering was originally proposed to solve very specific computer vision problems having a union-of-subspace structure in the data, e.g., motion segmentation under an affine camera model [11] or face clustering under Lambertian illumination models [15]. As it gains increasing attention in the statistics and machine learning community, people have started to use it as an agnostic learning tool in social network analysis [5], movie recommendation [33] and biological datasets [19]. The growing applicability of subspace clustering in these new domains inevitably raises the concern of data privacy, as many such applications involve dealing with sensitive information. For example, [19] applies subspace clustering to identify diseases from personalized medical data, and [33] in fact uses subspace clustering as an effective tool to conduct linkage attacks on individuals in movie rating datasets. Nevertheless, privacy issues in subspace clustering have been largely unexplored in the literature, with the only exception of a brief analysis and discussion in [29]. However, the algorithms and analysis presented in [29] have several notable deficiencies. For example, data points are assumed to be incoherent, and only the differential privacy of individual features of a user, rather than the entire user profile in the database, is protected. The latter means it is possible for an attacker to infer with high confidence whether a particular user is in the database, given sufficient side information.

It is perhaps understandable that little work has focused on private subspace clustering, which is by all means a challenging task. For example, a negative result in [29] shows that if utility is measured in terms of exact clustering, then no private subspace clustering algorithm exists when neighboring databases are allowed to differ on an entire user profile. In addition, state-of-the-art subspace clustering methods like Sparse Subspace Clustering (SSC, [11]) lack a complete analysis of their clustering output, thanks to the notorious graph connectivity problem [21]. Finally, clustering can have high global sensitivity even if only cluster centers are released, as depicted in Figure 1. As a result, general private data releasing schemes like output perturbation [7, 8, 2] do not apply.
In this work, we present a systematic and principled treatment of differentially private subspace clustering. To circumvent the negative result in [29], we use the perturbation of the recovered low-dimensional subspaces from the ground truth as the utility measure. Our contributions are two-fold. First, we analyze two efficient algorithms based on the sample-aggregate framework [22] and establish formal privacy and utility guarantees when data are generated from some stochastic model or satisfy certain deterministic separation conditions. New results on (non-private) subspace clustering are obtained along the way, including a fully agnostic subspace clustering guarantee on well-separated datasets using stability arguments and an exact clustering guarantee for thresholding-based subspace clustering (TSC, [14]) in the noisy setting. Second, we employ the exponential mechanism [18] and propose a novel Gibbs sampler for sampling from the resulting distribution, which involves a novel tweak in sampling from a matrix Bingham distribution. The method works well in practice, and we show it is closely related to the well-known mixtures of probabilistic PCA model [27].

Related work. Subspace clustering can be thought of as a generalization of PCA and k-means clustering: the former aims at finding a single low-dimensional subspace, and the latter uses zero-dimensional subspaces as cluster centers. There has been extensive research on private PCA [2, 4, 10] and private k-means [2, 22, 26]. Perhaps the most similar works to ours are [22, 4]: [22] applies the sample-aggregate framework to k-means clustering and [4] employs the exponential mechanism to recover private principal vectors. In this paper we give non-trivial generalizations of both works to the private subspace clustering setting.

## 2 Preliminaries

### 2.1 Notations

For a vector $x\in\mathbb{R}^d$, its $p$-norm is defined as $\|x\|_p = (\sum_i x_i^p)^{1/p}$. If $p$ is not explicitly specified then the 2-norm is used. For a matrix $A\in\mathbb{R}^{n\times m}$, we use $\sigma_1(A)\ge\cdots\ge\sigma_n(A)\ge 0$ to denote its singular values (assuming without loss of generality that $n\le m$). We use $\|\cdot\|_\xi$ to denote matrix norms, with $\xi=2$ the matrix spectral norm and $\xi=F$ the Frobenius norm; that is, $\|A\|_2=\sigma_1(A)$ and $\|A\|_F=\sqrt{\sum_{i=1}^n \sigma_i(A)^2}$. For a $q$-dimensional subspace $S\subseteq\mathbb{R}^d$, we associate with it a basis $U\in\mathbb{R}^{d\times q}$, where the $q$ columns of $U$ are orthonormal and $S=\mathrm{range}(U)$. We use $\mathcal{S}_d^q$ to denote the set of all $q$-dimensional subspaces of $\mathbb{R}^d$. Given $x\in\mathbb{R}^d$ and a set $S\subseteq\mathbb{R}^d$, the distance $d(x,S)$ is defined as $d(x,S)=\inf_{y\in S}\|x-y\|_2$. If $S$ is a subspace associated with a basis $U$, then we have $d(x,S)=\|x-\mathcal{P}_S(x)\|_2=\|x-UU^\top x\|_2$, where $\mathcal{P}_S(\cdot)$ denotes the projection operator onto the subspace $S$. For two subspaces $S, S'$ of dimension $q$, the distance $d(S,S')$ is defined as the Frobenius norm of the sine of the matrix of principal angles, i.e.,

$$ d(S,S') = \|\sin\Theta(S,S')\|_F = \|UU^\top - U'U'^\top\|_F, \tag{1} $$

where $U, U'$ are orthonormal bases associated with $S$ and $S'$, respectively.

### 2.2 Subspace clustering

Given $n$ data points $x_1,\dots,x_n\in\mathbb{R}^d$, the task of subspace clustering is to group the data points into $k$ clusters so that data points within a single cluster lie approximately on a low-dimensional subspace. Without loss of generality, we assume $\|x_i\|_2\le 1$ for all $i=1,\dots,n$. We also use $\mathcal{X}=\{x_1,\dots,x_n\}$ to denote the dataset and $X\in\mathbb{R}^{d\times n}$ to denote the data matrix obtained by stacking all data points column-wise.
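Both distances are straightforward to evaluate once orthonormal bases are available. The following minimal NumPy sketch (ours, not from the paper) computes $d(x,S)$ and the principal-angle distance of Eq. (1):

```python
import numpy as np

def dist_point_to_subspace(x, U):
    """d(x, S) = ||x - U U^T x||_2 for S = range(U), where U (d x q) has orthonormal columns."""
    return np.linalg.norm(x - U @ (U.T @ x))

def dist_between_subspaces(U, V):
    """d(S, S') = ||U U^T - V V^T||_F, as defined in Eq. (1)."""
    return np.linalg.norm(U @ U.T - V @ V.T, ord='fro')

# Tiny usage example: two random q-dimensional subspaces of R^d and a random unit vector.
rng = np.random.default_rng(0)
d, q = 10, 3
U, _ = np.linalg.qr(rng.standard_normal((d, q)))   # orthonormal basis of S
V, _ = np.linalg.qr(rng.standard_normal((d, q)))   # orthonormal basis of S'
x = rng.standard_normal(d)
x /= np.linalg.norm(x)
print(dist_point_to_subspace(x, U), dist_between_subspaces(U, V))
```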
Subspace clustering seeks to find $k$ $q$-dimensional subspaces $\hat{\mathcal{C}}=\{\hat{S}_1,\dots,\hat{S}_k\}$ so as to minimize the (squared) Wasserstein-type distance

$$ d_W^2(\hat{\mathcal{C}},\mathcal{C}^*) = \min_{\pi:[k]\to[k]} \sum_{i=1}^k d^2(\hat{S}_i, S^*_{\pi(i)}), \tag{2} $$

where $\pi$ is taken over all permutations on $[k]$ and $\mathcal{C}^*=\{S^*_i\}_{i=1}^k$ are the optimal/ground-truth subspaces. In a model-based approach, $\mathcal{C}^*$ is fixed and the data points $\{x_i\}_{i=1}^n$ are generated either deterministically or stochastically from one of the ground-truth subspaces in $\mathcal{C}^*$ with noise corruption; in a completely agnostic setting, $\mathcal{C}^*$ is defined as the minimizer of the k-means subspace clustering objective:

$$ \mathcal{C}^* := \operatorname*{argmin}_{\mathcal{C}=\{S_1,\dots,S_k\}\subseteq\mathcal{S}_d^q} \mathrm{cost}(\mathcal{C};\mathcal{X}) = \operatorname*{argmin}_{\mathcal{C}=\{S_1,\dots,S_k\}\subseteq\mathcal{S}_d^q} \frac{1}{n}\sum_{i=1}^n \min_j d^2(x_i, S_j). \tag{3} $$

To simplify notation, we use $\Delta_k^2(\mathcal{X}) = \mathrm{cost}(\mathcal{C}^*;\mathcal{X})$ to denote the cost of the optimal solution.

Algorithm 1 The sample-aggregate framework [22]
1: Input: $\mathcal{X}=\{x_i\}_{i=1}^n\subseteq\mathbb{R}^d$, number of subsets $m$, privacy parameters $\varepsilon,\delta$; a solver $f$ and a semimetric $d_M$.
2: Initialize: $s=\sqrt{m}$, $\alpha=\varepsilon/(5\sqrt{2\ln(2/\delta)})$, $\beta=\varepsilon/(4(D+\ln(2/\delta)))$.
3: Subsampling: Select $m$ random subsets of size $n/m$ of $\mathcal{X}$ independently and uniformly at random without replacement. Repeat this step until no single data point appears in more than $\sqrt{m}$ of the sets. Mark the subsampled subsets $\mathcal{X}_{S_1},\dots,\mathcal{X}_{S_m}$.
4: Separate queries: Compute $\mathcal{B}=\{s_i\}_{i=1}^m\subseteq\mathbb{R}^D$, where $s_i=f(\mathcal{X}_{S_i})$.
5: Aggregation: Compute $g(\mathcal{B})=s_{i^*}$ where $i^*=\operatorname*{argmin}_{i=1,\dots,m} r_i(t_0)$ with $t_0=\frac{m+s}{2}+1$. Here $r_i(t_0)$ denotes the distance $d_M(\cdot,\cdot)$ between $s_i$ and the $t_0$-th nearest neighbor to $s_i$ in $\mathcal{B}$.
6: Noise calibration: Compute $S(\mathcal{B})=2\max_k\big(\rho(t_0+(k+1)s)\cdot e^{-\beta k}\big)$, where $\rho(t)$ is the mean of the top $s/\beta$ values in $\{r_1(t),\dots,r_m(t)\}$.
7: Output: $\mathcal{A}(\mathcal{X})=g(\mathcal{B})+\frac{S(\mathcal{B})}{\alpha}u$, where $u$ is a standard Gaussian random vector.

### 2.3 Differential privacy

Definition 2.1 (Differential privacy, [7, 8]). A randomized algorithm $\mathcal{A}$ is $(\varepsilon,\delta)$-differentially private if for all $\mathcal{X}, \mathcal{Y}$ satisfying $d(\mathcal{X},\mathcal{Y})=1$ and all sets $\mathcal{S}$ of possible outputs the following holds:

$$ \Pr[\mathcal{A}(\mathcal{X})\in\mathcal{S}] \le e^{\varepsilon}\Pr[\mathcal{A}(\mathcal{Y})\in\mathcal{S}] + \delta. \tag{4} $$

If, in addition, $\delta=0$, then the algorithm $\mathcal{A}$ is $\varepsilon$-differentially private.

In our setting, the distance $d(\cdot,\cdot)$ between two datasets $\mathcal{X}$ and $\mathcal{Y}$ is defined as the number of columns on which $\mathcal{X}$ and $\mathcal{Y}$ differ. Differential privacy ensures that the output distribution is obfuscated to the point that every user has plausible deniability about being in the dataset; in addition, any inference about an individual user will have nearly the same confidence before and after the private release.

## 3 Sample-aggregation based private subspace clustering

In this section we first summarize the sample-aggregate framework introduced in [22] and argue why it should be preferred to conventional output perturbation mechanisms [7, 8] for subspace clustering. We then analyze two efficient algorithms based on the sample-aggregate framework and prove formal privacy and utility guarantees. We also prove new results in our analysis regarding the stability of k-means subspace clustering (Lem. 3.3) and the graph connectivity (i.e., consistency) of noisy threshold-based subspace clustering (TSC, [14]) under a stochastic model (Lem. 3.5).
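Two quantities recur throughout the analysis: the clustering cost of Eq. (3) and the Wasserstein-type distance of Eq. (2), which also serves as the semimetric $d_M$ in Algorithm 1. A minimal NumPy sketch (ours; the permutation in Eq. (2) is found by brute force, which is adequate for small $k$):

```python
import itertools
import numpy as np

def proj_dist2(x, U):
    """Squared distance d^2(x, S) to S = range(U), U with orthonormal columns."""
    r = x - U @ (U.T @ x)
    return float(r @ r)

def clustering_cost(bases, X):
    """cost(C; X) = (1/n) * sum_i min_j d^2(x_i, S_j), as in Eq. (3); X is d x n."""
    n = X.shape[1]
    return sum(min(proj_dist2(X[:, i], U) for U in bases) for i in range(n)) / n

def subspace_dist2(U, V):
    """Squared distance d^2(S, S') = ||U U^T - V V^T||_F^2 from Eq. (1)."""
    return float(np.linalg.norm(U @ U.T - V @ V.T, ord='fro') ** 2)

def wasserstein_dist(C_hat, C_star):
    """d_W of Eq. (2): the square root of the minimum, over all permutations pi of [k],
    of sum_i d^2(S_hat_i, S_star_pi(i))."""
    k = len(C_hat)
    best = min(
        sum(subspace_dist2(C_hat[i], C_star[pi[i]]) for i in range(k))
        for pi in itertools.permutations(range(k))
    )
    return best ** 0.5
```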
### 3.1 Smooth local sensitivity and the sample-aggregate framework

Figure 1: Illustration of the instability of k-means subspace clustering solutions ($d=2$, $k=2$, $q=1$). Blue dots represent evenly spaced data points on the unit circle; blue crosses indicate an additional data point. Red lines are optimal solutions.

Most existing privacy frameworks [7, 8] are based on the notion of global sensitivity, defined as the maximum output perturbation $\|f(\mathcal{X}_1)-f(\mathcal{X}_2)\|_\xi$, where the maximum is over all neighboring databases $\mathcal{X}_1,\mathcal{X}_2$ and $\xi=1$ or $2$. Unfortunately, the global sensitivity of clustering problems is usually high even if only cluster centers are released. For example, Figure 1 shows that the global sensitivity of k-means subspace clustering can be as high as $O(1)$, which ruins the utility of the algorithm.

To circumvent the above-mentioned challenges, Nissim et al. [22] introduced the sample-aggregate framework based on a smooth version of local sensitivity. Unlike global sensitivity, local sensitivity measures the maximum perturbation $\|f(\mathcal{X})-f(\mathcal{X}')\|_\xi$ over all databases $\mathcal{X}'$ neighboring the input database $\mathcal{X}$. The sample-aggregate framework (pseudocode in Alg. 1) adds noise calibrated to a smooth estimate of the local sensitivity and comes with the following guarantee:

Theorem 3.1 ([22], Theorem 4.2). Let $f:\mathcal{D}\to\mathbb{R}^D$ be an efficiently computable function, where $\mathcal{D}$ is the collection of all databases and $D$ is the output dimension, and let $d_M(\cdot,\cdot)$ be a semimetric on the output space of $f$ (that is, $d_M(x,y)\ge 0$, $d_M(x,x)=0$ and $d_M(x,y)\le d_M(x,z)+d_M(y,z)$ for all $x,y,z$). Set $\varepsilon>2D/\sqrt{m}$ and $m=\omega(\log^2 n)$. The sample-aggregate algorithm $\mathcal{A}$ in Algorithm 1 is an efficient $(\varepsilon,\delta)$-differentially private algorithm. Furthermore, if $f$ and $m$ are chosen such that the $\ell_1$ norm of the output of $f$ is bounded by $\Lambda$ and

$$ \Pr_{\mathcal{X}_S\subseteq\mathcal{X}}\left[ d_M(f(\mathcal{X}_S), c) \le r \right] \ge \frac{3}{4} \tag{5} $$

for some $c\in\mathbb{R}^D$ and $r>0$, then the standard deviation of the Gaussian noise added is upper bounded by $O(r/\varepsilon)+\frac{\Lambda}{\varepsilon}e^{-\Omega(\sqrt{m}/D)}$. In addition, when $m$ satisfies $m=\omega(D^2\log^2(r/\Lambda)/\varepsilon^2)$, with high probability each coordinate of $\mathcal{A}(\mathcal{X})-\bar{c}$ is upper bounded by $O(r/\varepsilon)$, where $\bar{c}$, depending on $\mathcal{A}(\mathcal{X})$, satisfies $d_M(\bar{c},c)=O(r)$.

Let $f$ be any subspace clustering solver that outputs $k$ estimated low-dimensional subspaces and let $d_M$ be the Wasserstein-type distance defined in Eq. (2). Theorem 3.1 provides a privacy guarantee for an efficient meta-algorithm with any $f$; in addition, a utility guarantee holds under additional assumptions on the input dataset $\mathcal{X}$. In the following sections we establish such utility guarantees. The main idea is to prove stability results of the form of Eq. (5) for particular subspace clustering solvers and then apply Theorem 3.1.

### 3.2 The agnostic setting

We first consider the setting in which the data points $\{x_i\}_{i=1}^n$ are arbitrarily placed. In such an agnostic setting the optimal solution $\mathcal{C}^*$ is defined as the one that minimizes the k-means cost as in Eq. (3). The solver $f$ is taken to be any $(1+\epsilon)$-approximation of the optimal k-means subspace clustering (here $\epsilon$ is an approximation constant and is not related to the privacy parameter $\varepsilon$); that is, $f$ always outputs subspaces $\hat{\mathcal{C}}$ satisfying $\mathrm{cost}(\hat{\mathcal{C}};\mathcal{X})\le(1+\epsilon)\,\mathrm{cost}(\mathcal{C}^*;\mathcal{X})$. Efficient core-set based approximation algorithms exist, for example, in [12]. The key task of this section is to identify assumptions under which the stability condition in Eq. (5) holds with respect to an approximate solver $f$.

The example in Figure 1 also suggests that an identifiability issue arises when the input data $\mathcal{X}$ itself cannot be well clustered; for example, no two straight lines can well approximate data uniformly distributed on a circle. To circumvent this difficulty, we impose the following well-separation condition on the input data $\mathcal{X}$:

Definition 3.2 (Well-separation condition for k-means subspace clustering).
A dataset $\mathcal{X}$ is $(\varphi,\eta,\psi)$-well separated if there exist constants $\varphi$, $\eta$ and $\psi$, all between 0 and 1, such that

$$ \Delta_k^2(\mathcal{X}) \le \min\left\{ \varphi^2\,\Delta_{k-1}^2(\mathcal{X}),\ \Delta_{k,-}^2(\mathcal{X})-\psi,\ \Delta_{k,+}^2(\mathcal{X})+\eta \right\}, \tag{6} $$

where $\Delta_{k-1}^2$, $\Delta_{k,-}^2$ and $\Delta_{k,+}^2$ are defined as

$$ \Delta_{k-1}^2(\mathcal{X}) = \min_{S_{1:k-1}\subseteq\mathcal{S}_d^q}\mathrm{cost}(\{S_i\};\mathcal{X});\quad \Delta_{k,-}^2(\mathcal{X}) = \min_{S_1\in\mathcal{S}_d^{q-1},\,S_{2:k}\subseteq\mathcal{S}_d^q}\mathrm{cost}(\{S_i\};\mathcal{X});\quad \Delta_{k,+}^2(\mathcal{X}) = \min_{S_1\in\mathcal{S}_d^{q+1},\,S_{2:k}\subseteq\mathcal{S}_d^q}\mathrm{cost}(\{S_i\};\mathcal{X}). $$

The first condition in Eq. (6), $\Delta_k^2(\mathcal{X})\le\varphi^2\Delta_{k-1}^2(\mathcal{X})$, requires that the input dataset $\mathcal{X}$ cannot be well clustered using $k-1$ instead of $k$ clusters. It was introduced in [23] to analyze the stability of k-means solutions. For subspace clustering we need two additional conditions regarding the intrinsic dimension of each subspace: $\Delta_k^2(\mathcal{X})\le\Delta_{k,-}^2(\mathcal{X})-\psi$ asserts that replacing a $q$-dimensional subspace with a $(q-1)$-dimensional one is not sufficient, while $\Delta_k^2(\mathcal{X})\le\Delta_{k,+}^2(\mathcal{X})+\eta$ means that an additional subspace dimension does not help much with clustering $\mathcal{X}$.

The following lemma is our main stability result for subspace clustering on well-separated datasets. It states that when a candidate clustering $\hat{\mathcal{C}}$ is close to the optimal clustering $\mathcal{C}^*$ in terms of clustering cost, they are also close in terms of the Wasserstein-type distance defined in Eq. (2).

Lemma 3.3 (Stability of agnostic k-means subspace clustering). Assume $\mathcal{X}$ is $(\varphi,\eta,\psi)$-well separated with $\varphi^2 < 1/160^2$ and $\psi>\eta$. Suppose a candidate clustering $\hat{\mathcal{C}}=\{\hat{S}_1,\dots,\hat{S}_k\}\subseteq\mathcal{S}_d^q$ satisfies $\mathrm{cost}(\hat{\mathcal{C}};\mathcal{X})\le a\cdot\mathrm{cost}(\mathcal{C}^*;\mathcal{X})$ for some $a<\frac{1-802\varphi^2}{800\varphi^2}$. Then the following holds:

$$ d_W(\hat{\mathcal{C}},\mathcal{C}^*) \le \frac{600\sqrt{k}\,\varphi^2}{(1-150\varphi^2)(\psi-\eta)}. \tag{7} $$

The following theorem is then a simple corollary, with a complete proof in Appendix B.

Theorem 3.4. Fix a $(\varphi,\eta,\psi)$-well separated dataset $\mathcal{X}$ with $n$ data points, $\varphi^2<1/160^2$ and $\psi>\eta$. Suppose $\mathcal{X}_S\subseteq\mathcal{X}$ is a subset of $\mathcal{X}$ of size $m$, sampled uniformly at random without replacement, and let $\hat{\mathcal{C}}=\{\hat{S}_1,\dots,\hat{S}_k\}$ be a $(1+\epsilon)$-approximation of the optimal k-means subspace clustering computed on $\mathcal{X}_S$. If $m=\Omega\!\left(\frac{kqd\,\log(qd/(\gamma\Delta_k^2(\mathcal{X})))}{\gamma^2\Delta_k^4(\mathcal{X})}\right)$ with $\gamma\le\frac{1-802\varphi^2}{800\varphi^2}-2(1+\epsilon)$, then we have

$$ d_W(\hat{\mathcal{C}},\mathcal{C}^*) \le \frac{600\sqrt{k}\,\varphi^2}{(1-150\varphi^2)(\psi-\eta)}, $$

where $\mathcal{C}^*=\{S_1^*,\dots,S_k^*\}$ is the optimal clustering on $\mathcal{X}$; that is, $\mathrm{cost}(\mathcal{C}^*;\mathcal{X})=\Delta_k^2(\mathcal{X})$.

Consequently, applying Theorem 3.4 together with the sample-aggregate framework, we obtain a weak polynomial-time $(\varepsilon,\delta)$-differentially private algorithm for agnostic k-means subspace clustering, with an additional amount of per-coordinate Gaussian noise upper bounded by $O\!\left(\frac{\varphi^2\sqrt{k}}{\varepsilon(\psi-\eta)}\right)$. Our bound is comparable to the one obtained in [22] for private k-means clustering, except for the $(\psi-\eta)$ term, which characterizes the well-separatedness under the subspace clustering scenario.

Algorithm 2 Threshold-based subspace clustering (TSC), a simplified version
1: Input: $\mathcal{X}=\{x_i\}_{i=1}^n\subseteq\mathbb{R}^d$, number of clusters $k$ and number of neighbors $s$.
2: Thresholding: construct $G\in\{0,1\}^{n\times n}$ by connecting each $x_i$ to the other $s$ data points in $\mathcal{X}$ with the largest absolute inner products $|\langle x_i, x'\rangle|$. Complete $G$ so that it is undirected.
3: Clustering: Let $\mathcal{X}^{(1)},\dots,\mathcal{X}^{(\ell)}$ be the connected components in $G$. Construct $\tilde{\mathcal{X}}^{(\ell)}$ by sampling $q$ points from $\mathcal{X}^{(\ell)}$ uniformly at random without replacement.
4: Output: subspaces $\hat{\mathcal{C}}=\{\hat{S}^{(\ell)}\}_{\ell=1}^k$, where $\hat{S}^{(\ell)}$ is the subspace spanned by the $q$ points in $\tilde{\mathcal{X}}^{(\ell)}$.
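Before turning to the stochastic setting, here is a minimal NumPy/SciPy sketch (ours) of the simplified TSC procedure of Algorithm 2. Function names are not from the paper, and degenerate cases (ties, or components with fewer than $q$ points) are not handled.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def tsc_simplified(X, q, s, rng=None):
    """Simplified TSC (Algorithm 2). X is d x n with (approximately) unit-norm columns.
    Returns one d x q orthonormal basis per connected component of the thresholded graph G;
    under the conditions of Lemma 3.5 there are exactly k components."""
    rng = np.random.default_rng() if rng is None else rng
    d, n = X.shape
    # Step 2 (thresholding): connect x_i to the s points with the largest |<x_i, x_j>|.
    A = np.abs(X.T @ X)
    np.fill_diagonal(A, -np.inf)                      # exclude self-connections
    G = np.zeros((n, n), dtype=bool)
    for i in range(n):
        G[i, np.argpartition(A[i], -s)[-s:]] = True
    G = G | G.T                                       # complete G so that it is undirected
    # Step 3 (clustering): connected components of G.
    n_comp, labels = connected_components(csr_matrix(G), directed=False)
    # Step 4: span a q-dimensional subspace from q points sampled in each component.
    bases = []
    for c in range(n_comp):
        idx = np.flatnonzero(labels == c)
        pick = rng.choice(idx, size=q, replace=False)
        Q, _ = np.linalg.qr(X[:, pick])               # orthonormal basis of the span
        bases.append(Q)
    return bases
```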
### 3.3 The stochastic setting

We further consider the case in which the data points are stochastically generated from some underlying true subspace set $\mathcal{C}^*=\{S_1^*,\dots,S_k^*\}$. Such settings have been extensively investigated in previous work on subspace clustering algorithms [24, 25, 14]. Below we give a precise definition of the stochastic subspace clustering model considered in this paper.

The stochastic model: for every cluster $\ell$ associated with subspace $S_\ell^*$, a data point $x_i^{(\ell)}\in\mathbb{R}^d$ belonging to cluster $\ell$ can be written as $x_i^{(\ell)} = y_i^{(\ell)} + \varepsilon_i^{(\ell)}$, where $y_i^{(\ell)}$ is sampled uniformly at random from $\{y\in S_\ell^*: \|y\|_2=1\}$ and $\varepsilon_i^{(\ell)}\sim\mathcal{N}(0,\sigma^2/d\cdot I_d)$ for some noise parameter $\sigma$.

Under the stochastic setting we consider the solver $f$ to be the Threshold-based Subspace Clustering (TSC, [14]) algorithm; a simplified version of TSC is presented in Alg. 2. An alternative idea is to apply the results of the previous section, since the stochastic model yields a well-separated dataset when the noise level $\sigma$ is small. However, TSC runs in $O(n^2 d)$ time, which is much more efficient than core-set based methods. TSC is provably correct in the sense that the similarity graph $G$ has no false connections and is connected within each cluster, as shown in the following lemma:

Lemma 3.5 (Connectivity of TSC). Fix $\gamma>1$ and assume $\max_\ell 0.04\,n_\ell \le s \le \min_\ell n_\ell/6$. Suppose that for every $\ell\in\{1,\dots,k\}$ the number of data points $n_\ell$ and the noise level $\sigma$ satisfy the sample-complexity and signal-to-noise conditions stated in Appendix C: roughly, $n_\ell/\log n_\ell$ must exceed a constant that depends only on $q$ and $\gamma$, and a quantity of order $\sigma(1+\sigma)\sqrt{\log n/d}$ must be small relative to $\min_{\ell\neq\ell'} d(S_\ell^*, S_{\ell'}^*)/\sqrt{q}$. Then, with probability at least $1-p_{\mathrm{fail}}$, where $p_{\mathrm{fail}}$ is a sum of terms that decay polynomially in $n$, polynomially in $n_\ell$ at a rate governed by $\gamma$, and exponentially in $n_\ell$ (quantified exactly in Appendix C), the connected components in $G$ correspond exactly to the $k$ underlying subspaces.

The conditions in Lemma 3.5 characterize the interaction between the sample complexity $n_\ell$, the noise level $\sigma$ and the signal level $\min_{\ell\neq\ell'} d(S_\ell^*, S_{\ell'}^*)$. Theorem 3.6 is then a simple corollary of Lemma 3.5. Complete proofs are deferred to Appendix C.

Theorem 3.6 (Stability of TSC on stochastic data). Assume the conditions in Lemma 3.5 hold with respect to $n'=n/m$ data points, for $\omega(\log^2 n)\le m\le o(n)$. Assume in addition that $\lim_{n\to\infty} n_\ell=\infty$ for all $\ell=1,\dots,k$ and that the failure probability in Lemma 3.5 does not exceed $1/8$. Then for every $\epsilon>0$ we have

$$ \lim_{n\to\infty} \Pr_{\mathcal{X}_S}\!\left[ d_W(\hat{\mathcal{C}},\mathcal{C}^*) > \epsilon \right] = 0. \tag{9} $$

Compared to Theorem 3.4 for the agnostic model, Theorem 3.6 shows that one can achieve consistent estimation of the underlying subspaces under the stochastic model. It is an interesting question to derive finite-sample bounds for the differentially private TSC algorithm.
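For intuition, the following sketch (ours; all names are made up) draws a dataset from the stochastic model above and empirically checks the event behind Lemma 3.5: every point's $s$ largest absolute inner products come from its own subspace, so the thresholded graph $G$ has no false connections.

```python
import numpy as np

def sample_stochastic_model(bases, n_per_cluster, sigma, rng):
    """x = y + eps with y uniform on the unit sphere of S*_ell = range(U)
    and eps ~ N(0, sigma^2/d * I_d), as in the stochastic model of Section 3.3."""
    d = bases[0].shape[0]
    blocks, labels = [], []
    for ell, U in enumerate(bases):
        q = U.shape[1]
        Y = rng.standard_normal((q, n_per_cluster))
        Y /= np.linalg.norm(Y, axis=0)                # uniform unit-norm directions inside S*_ell
        E = rng.standard_normal((d, n_per_cluster)) * (sigma / np.sqrt(d))
        blocks.append(U @ Y + E)
        labels += [ell] * n_per_cluster
    return np.hstack(blocks), np.array(labels)

def no_false_connections(X, labels, s):
    """True iff every point's s nearest neighbors (by |<x_i, x_j>|) share its label."""
    A = np.abs(X.T @ X)
    np.fill_diagonal(A, -np.inf)
    for i in range(X.shape[1]):
        nbrs = np.argpartition(A[i], -s)[-s:]
        if not np.all(labels[nbrs] == labels[i]):
            return False
    return True

# Example: three random 3-dimensional subspaces in R^20, low noise.
rng = np.random.default_rng(0)
bases = [np.linalg.qr(rng.standard_normal((20, 3)))[0] for _ in range(3)]
X, labels = sample_stochastic_model(bases, n_per_cluster=200, sigma=0.01, rng=rng)
print(no_false_connections(X, labels, s=10))          # expected to hold in this low-noise regime
```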
### 3.4 Discussion

It is worth noting that the sample-aggregate framework is an $(\varepsilon,\delta)$-differentially private mechanism for any computational subroutine $f$. However, the utility claim (i.e., the $O(r/\varepsilon)$ bound on each coordinate of $\mathcal{A}(\mathcal{X})-\bar{c}$) requires the stability of the particular subroutine $f$, as outlined in Eq. (5). It is unfortunately hard to theoretically argue for the stability of state-of-the-art subspace clustering methods such as sparse subspace clustering (SSC, [11]), due to the graph connectivity issue [21] (recently, [28] established a full clustering guarantee for SSC, but under strong assumptions). Nevertheless, we observe satisfactory performance of SSC-based algorithms in simulations (see Sec. 5). It remains an open question to derive a utility guarantee for (user-level) differentially private SSC.

## 4 Private subspace clustering via the exponential mechanism

In Section 3 we analyzed two algorithms with provable privacy and utility guarantees for subspace clustering based on the sample-aggregate framework. However, empirical evidence shows that sample-aggregate based private clustering suffers from poor utility in practice [26]. In this section, we propose a practical private subspace clustering algorithm based on the exponential mechanism [18]. In particular, given the dataset $\mathcal{X}$ with $n$ data points, we propose to sample the parameters $\theta=(\{S_\ell\}_{\ell=1}^k, \{z_i\}_{i=1}^n)$, where $S_\ell\in\mathcal{S}_d^q$ and $z_i\in\{1,\dots,k\}$, from the following distribution:

$$ p(\theta;\mathcal{X}) \propto \exp\!\left( -\frac{\varepsilon}{2}\sum_{i=1}^n d^2(x_i, S_{z_i}) \right), \tag{10} $$

where $\varepsilon>0$ is the privacy parameter. The following proposition shows that exact sampling from the distribution in Eq. (10) results in a provably differentially private algorithm. Its proof is straightforward and is deferred to Appendix D.1. Note that, unlike sample-aggregate based methods, the exponential mechanism can privately release the clustering assignment $z$. This does not violate the lower bound in [29], because the released clustering assignment $z$ is not guaranteed to be exactly correct.

Proposition 4.1. The randomized algorithm $\mathcal{A}:\mathcal{X}\mapsto\theta$ that outputs one sample from the distribution defined in Eq. (10) is $\varepsilon$-differentially private.

### 4.1 A Gibbs sampling implementation

It is in general hard to sample parameters from a distribution as complicated as that in Eq. (10). We present a Gibbs sampler that iteratively samples the subspaces $\{S_\ell\}$ and the cluster assignments $\{z_i\}$ from their conditional distributions.

Update of $z_i$: when $\{S_\ell\}$ and $z_{-i}$ are fixed, the conditional distribution of $z_i$ is

$$ p(z_i \mid \{S_\ell\}_{\ell=1}^k, z_{-i}; \mathcal{X}) \propto \exp\!\big(-\tfrac{\varepsilon}{2}\, d^2(x_i, S_{z_i})\big). \tag{11} $$

Since $d(x_i,S_{z_i})$ can be efficiently computed (given an orthonormal basis of $S_{z_i}$), the update of $z_i$ can easily be done by sampling from a categorical distribution.

Update of $S_\ell$: let $\tilde{\mathcal{X}}^{(\ell)}=\{x_i\in\mathcal{X}: z_i=\ell\}$ denote the data points assigned to cluster $\ell$ and let $n_\ell=|\tilde{\mathcal{X}}^{(\ell)}|$. Denote by $\tilde{X}^{(\ell)}\in\mathbb{R}^{d\times n_\ell}$ the matrix whose columns are the data points in $\tilde{\mathcal{X}}^{(\ell)}$. The distribution over $S_\ell$ conditioned on $z$ can then be written as

$$ p(S_\ell=\mathrm{range}(U_\ell)\mid z;\mathcal{X}) \propto \exp\!\big(\tfrac{\varepsilon}{2}\,\mathrm{tr}(U_\ell^\top A_\ell U_\ell)\big),\qquad U_\ell\in\mathbb{R}^{d\times q},\ U_\ell^\top U_\ell=I_{q\times q}, \tag{12} $$

where $A_\ell=\tilde{X}^{(\ell)}\tilde{X}^{(\ell)\top}$ is the unnormalized sample covariance matrix. The distribution in Eq. (12) is a special case of the matrix Bingham distribution, which admits a Gibbs sampler [16]. We give implementation details in Appendix D.2, with modifications that make the resulting Gibbs sampler empirically more efficient for a wide range of parameter settings.

### 4.2 Discussion

The proposed Gibbs sampler resembles the k-plane algorithm for subspace clustering [3]. It is in fact a probabilistic version of k-plane, since sampling is performed at each iteration rather than deterministic updates. Furthermore, the proposed Gibbs sampler can be viewed as posterior sampling for the following generative model: first, sample $U_\ell$ uniformly at random from $\mathcal{S}_d^q$ for each subspace $S_\ell$; afterwards, cluster assignments $\{z_i\}_{i=1}^n$ are sampled such that $\Pr[z_i=j]=1/k$, and $x_i$ is set as $x_i=U_{z_i} y_i + P_{U_{z_i}^\perp} w_i$, where $y_i$ is sampled uniformly at random from the $q$-dimensional unit ball and $w_i\sim\mathcal{N}(0, I_d/\varepsilon)$. The connection between this generative model and the Gibbs sampler is formally justified in Appendix D.3. The generative model is strikingly similar to the well-known mixtures of probabilistic PCA model (MPPCA, [27]) obtained by setting the variance parameters $\sigma_\ell$ in MPPCA to $\sqrt{1/\varepsilon}$. The only differences are that $y_i$ is sampled uniformly at random from a unit ball (in MPPCA the latent variables $y_i$ are sampled from a normal distribution $\mathcal{N}(0,\rho^2 I_q)$) and that the noise $w_i$ is constrained to $U_\ell^\perp$, the orthogonal complement of $\mathrm{range}(U_\ell)$. Note that this is closely related to the earlier observation that posterior sampling is private [20, 6, 31], but differs in that we construct a model from a private procedure rather than the other way round.
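To make the two conditional updates concrete, here is a sketch (ours) of one sweep of the sampler. The $z_i$ update implements Eq. (11); for the $S_\ell$ update we substitute the mode of Eq. (12), i.e. the top-$q$ eigenvectors of $A_\ell$, as a deterministic stand-in, since a faithful implementation must draw $U_\ell$ from the matrix Bingham distribution as in [16] and Appendix D.2.

```python
import numpy as np

def gibbs_sweep(X, bases, z, eps, rng):
    """One sweep over the conditionals of Section 4.1 (sketch).
    X: d x n data, bases: list of k orthonormal d x q bases, z: length-n integer label array,
    eps: privacy parameter.  NOTE: the subspace step below is a deterministic stand-in
    (the Bingham mode), not a draw from Eq. (12)."""
    d, n = X.shape
    k = len(bases)
    q = bases[0].shape[1]
    # Update of z_i (Eq. 11): categorical with log-probabilities -eps/2 * d^2(x_i, S_ell).
    for i in range(n):
        x = X[:, i]
        dist2 = np.array([np.sum((x - U @ (U.T @ x)) ** 2) for U in bases])
        logp = -0.5 * eps * dist2
        p = np.exp(logp - logp.max())
        z[i] = rng.choice(k, p=p / p.sum())
    # Update of S_ell: p(U_ell | z) ~ exp(eps/2 * tr(U_ell^T A_ell U_ell)) (Eq. 12).
    for ell in range(k):
        idx = np.flatnonzero(z == ell)
        if idx.size == 0:
            continue                                   # keep the previous basis for empty clusters
        A = X[:, idx] @ X[:, idx].T                    # unnormalized sample covariance A_ell
        _, V = np.linalg.eigh(A)
        bases[ell] = V[:, -q:]                         # stand-in: Bingham mode, not a random draw
    return bases, z
```

With the deterministic stand-in, the subspace step coincides with a (non-private) k-plane update, which matches the $\varepsilon\to\infty$ behavior discussed next; replacing it with an actual matrix Bingham draw recovers a sampler targeting Eq. (10).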
As the privacy parameter $\varepsilon\to\infty$ (i.e., no privacy guarantee), we arrive immediately at the exact k-plane algorithm, and the posterior distribution concentrates around the optimal k-means solution $(\mathcal{C}^*, z^*)$. This behavior is similar to what a small-variance asymptotic analysis of MPPCA models reveals [30]. On the other hand, the proposed Gibbs sampler is significantly different from previous Bayesian probabilistic PCA formulations [34, 30] in that the subspaces are sampled from a matrix Bingham distribution. Finally, we remark that the proposed Gibbs sampler is only asymptotically private, because Proposition 4.1 requires exact (or nearly exact [31]) sampling from Eq. (10).

## 5 Numerical results

We provide numerical results for both the sample-aggregate and Gibbs sampling algorithms on synthetic and real-world datasets. We also compare with a baseline method implemented on top of the k-plane algorithm [3] with a perturbed sample covariance matrix via the SuLQ framework [2] (details presented in Appendix E). Three solvers are considered for the sample-aggregate framework: threshold-based subspace clustering (TSC, [14]), which has a provable utility guarantee with sample-aggregation on stochastic models, along with sparse subspace clustering (SSC, [11]) and low-rank representation (LRR, [17]), the two state-of-the-art methods for subspace clustering. For Gibbs sampling, we use non-private SSC and LRR solutions to initialize the Gibbs sampler. All methods are implemented in Matlab.

For synthetic datasets, we first generate $k$ random $q$-dimensional linear subspaces. Each subspace is generated by sampling a $d\times q$ random Gaussian matrix and recording its column space. The $n$ data points are then assigned to one of the $k$ subspaces (clusters) uniformly at random. To generate a data point $x_i$ assigned to subspace $S_\ell$, we first sample $y_i\in\mathbb{R}^q$ with $\|y_i\|_2=1$ uniformly at random from the $q$-dimensional unit sphere. Afterwards, $x_i$ is set as $x_i=U_\ell y_i + w_i$, where $U_\ell\in\mathbb{R}^{d\times q}$ is an orthonormal basis associated with $S_\ell$ and $w_i\sim\mathcal{N}(0,\sigma^2 I_d)$ is a noise vector.

Figure 2 compares the utility (measured in terms of the k-means objective $\mathrm{cost}(\hat{\mathcal{C}};\mathcal{X})$ and the Wasserstein distance $d_W(\hat{\mathcal{C}},\mathcal{C}^*)$) of sample aggregation, Gibbs sampling and SuLQ subspace clustering. As shown in the plots, the sample-aggregation algorithms have poor utility unless the privacy parameter $\varepsilon$ is very large (which means very little privacy protection). On the other hand, both Gibbs sampling and SuLQ subspace clustering give reasonably good performance. Figure 2 also shows that SuLQ scales poorly with the ambient dimension $d$. This is because SuLQ subspace clustering requires calibrating noise to a $d\times d$ sample covariance matrix, which induces considerable error when $d$ is large. Gibbs sampling appears robust across the $d$ settings considered.

We also experiment on real-world datasets. The right two plots in Figure 2 report utility on a subset of the extended Yale Face Dataset B [13] for face clustering. Five random individuals are picked, forming a subset of the original dataset with $n=320$ data points (images). The dataset is preprocessed by projecting each individual onto a 9-dimensional affine subspace via PCA. Such a preprocessing step was adopted in [32, 29] and is theoretically justified in [1]. Afterwards, the ambient dimension of the entire dataset is reduced to $d=50$ by random Gaussian projection. The plots show that Gibbs sampling significantly outperforms the other algorithms.
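The dependence of the SuLQ baseline on the ambient dimension can be seen from the object it perturbs. The sketch below (ours) shows the flavor of a perturbed-covariance k-plane subspace update; it is not the calibration used in Appendix E, and the noise scale is left as a free parameter.

```python
import numpy as np

def noisy_topq_subspace(X_cluster, q, noise_scale, rng):
    """Illustrative perturbed-covariance update: add a symmetric Gaussian perturbation to the
    d x d (unnormalized) sample covariance of the points currently assigned to a cluster,
    then take its top-q eigenvectors (the subspace step of a k-plane iteration)."""
    d = X_cluster.shape[0]
    A = X_cluster @ X_cluster.T                        # d x d covariance: the perturbed object
    E = rng.standard_normal((d, d)) * noise_scale
    A_noisy = A + (E + E.T) / 2.0                      # symmetrize so the spectrum stays real
    _, V = np.linalg.eigh(A_noisy)
    return V[:, -q:]
```

Because the perturbation lives on a $d\times d$ matrix while the signal occupies only a $q$-dimensional subspace, the error of this step grows with the ambient dimension, which is consistent with the behavior of the SuLQ curves in Figure 2.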
Figure 2: Utility under a fixed privacy budget $\varepsilon$. The top row shows the k-means cost and the bottom row shows the Wasserstein distance $d_W(\hat{\mathcal{C}},\mathcal{C}^*)$. From left to right: synthetic dataset with $n=5000$, $d=5$, $k=3$, $q=3$, $\sigma=0.01$; synthetic dataset with $n=1000$, $d=10$, $k=3$, $q=3$, $\sigma=0.1$; extended Yale Face Dataset B (a subset) with $n=320$, $d=50$, $k=5$, $q=9$, $\sigma=0.01$. $\delta$ is set to $1/(n\ln n)$ for $(\varepsilon,\delta)$-private algorithms. "s.a." stands for smooth sensitivity (sample-aggregate) and "exp." stands for the exponential mechanism. SuLQ-10 and SuLQ-50 stand for the SuLQ framework run for 10 and 50 iterations, respectively. Gibbs sampling is run for 10,000 iterations and the mean of the last 100 samples is reported.

Figure 3: Test statistic, k-means cost and $d_W(\hat{\mathcal{C}},\mathcal{C}^*)$ over 8 trials of the Gibbs sampler under different privacy settings ($\varepsilon=0.1, 1, 10, 100$). Synthetic dataset setting: $n=1000$, $d=10$, $k=3$, $q=3$, $\sigma=0.1$.

In Figure 3 we investigate the mixing behavior of the proposed Gibbs sampler. We plot, for multiple trials of Gibbs sampling, the k-means objective, the Wasserstein distance and the test statistic $\frac{1}{\sqrt{kq}}\big(\sum_{\ell=1}^k \big\|\frac{1}{T}\sum_{t=1}^T U_\ell^{(t)}\big\|_F^2\big)^{1/2}$, where $U_\ell^{(t)}$ is the sampled basis of $S_\ell$ at the $t$-th iteration. The test statistic has mean zero under the distribution in Eq. (10), and a similar statistic was used in [4] as a diagnostic of the mixing behavior of another Gibbs sampler. Figure 3 shows that under various privacy parameter settings the proposed Gibbs sampler mixes quite well after 10,000 iterations.

## 6 Conclusion

In this paper we considered subspace clustering subject to formal differential privacy constraints. We analyzed two sample-aggregate based algorithms with provable utility guarantees under agnostic and stochastic data models. We also proposed a Gibbs sampling subspace clustering algorithm based on the exponential mechanism that works well in practice. Some interesting future directions include utility bounds for state-of-the-art subspace clustering algorithms like SSC or LRR.

Acknowledgement. This research is supported in part by NSF CAREER grant IIS-1252412, NSF Award BCS-0941518, and a grant from the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative, administered by the IDM Programme Office.

## References

[1] R. Basri and D. Jacobs. Lambertian reflectance and linear subspaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(2):218–233, 2003.
[2] A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the SuLQ framework. In PODS, 2005.
[3] P. S. Bradley and O. L. Mangasarian. k-plane clustering. Journal of Global Optimization, 16(1), 2000.
[4] K. Chaudhuri, A. Sarwate, and K. Sinha. Near-optimal algorithms for differentially private principal components. In NIPS, 2012.
[5] Y. Chen, A. Jalali, S. Sanghavi, and H. Xu. Clustering partially observed graphs via convex optimization. The Journal of Machine Learning Research, 15(1):2213–2238, 2014.
[6] C. Dimitrakakis, B. Nelson, A. Mitrokotsa, and B. I. Rubinstein. Robust and private Bayesian inference. In Algorithmic Learning Theory, pages 291–305. Springer, 2014.
[7] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, 2006.
[8] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.
[9] C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.
[10] C. Dwork, K. Talwar, A. Thakurta, and L. Zhang. Analyze Gauss: Optimal bounds for privacy-preserving principal component analysis. In STOC, 2014.
[11] E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2765–2781, 2013.
[12] D. Feldman, M. Schmidt, and C. Sohler. Turning big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering. In SODA, 2013.
[13] A. Georghiades, P. Belhumeur, and D. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6):643–660, 2001.
[14] R. Heckel and H. Bölcskei. Robust subspace clustering via thresholding. arXiv:1307.4891, 2013.
[15] J. Ho, M.-H. Yang, J. Lim, K.-C. Lee, and D. Kriegman. Clustering appearances of objects under varying illumination conditions. In CVPR, 2003.
[16] P. Hoff. Simulation of the matrix Bingham–von Mises–Fisher distribution, with applications to multivariate and relational data. Journal of Computational and Graphical Statistics, 18(2):438–456, 2009.
[17] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Ma, and Y. Yu. Robust recovery of subspace structures by low-rank representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(1):171–184, 2012.
[18] F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, 2007.
[19] B. McWilliams and G. Montana. Subspace clustering of high-dimensional data: a predictive approach. Data Mining and Knowledge Discovery, 28(3):736–772, 2014.
[20] D. J. Mir. Differential privacy: an exploration of the privacy-utility landscape. PhD thesis, Rutgers University, 2013.
[21] B. Nasihatkon and R. Hartley. Graph connectivity in sparse subspace clustering. In CVPR, 2011.
[22] K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. In STOC, 2007.
[23] R. Ostrovsky, Y. Rabani, L. Schulman, and C. Swamy. The effectiveness of Lloyd-type methods for the k-means problem. In FOCS, 2006.
[24] M. Soltanolkotabi and E. J. Candès. A geometric analysis of subspace clustering with outliers. The Annals of Statistics, 40(4):2195–2238, 2012.
[25] M. Soltanolkotabi, E. Elhamifar, and E. Candès. Robust subspace clustering. The Annals of Statistics, 42(2):669–699, 2014.
[26] D. Su, J. Cao, N. Li, E. Bertino, and H. Jin. Differentially private k-means clustering. arXiv, 2015.
[27] M. Tipping and C. Bishop. Mixtures of probabilistic principal component analysers. Neural Computation, 11(2):443–482, 1999.
[28] Y. Wang, Y.-X. Wang, and A. Singh. Clustering consistent sparse subspace clustering. arXiv, 2015.
[29] Y. Wang, Y.-X. Wang, and A. Singh. A deterministic analysis of noisy sparse subspace clustering for dimensionality-reduced data. In ICML, 2015.
[30] Y. Wang and J. Zhu. DP-space: Bayesian nonparametric subspace clustering with small-variance asymptotic analysis. In ICML, 2015.
[31] Y.-X. Wang, S. Fienberg, and A. Smola. Privacy for free: Posterior sampling and stochastic gradient Monte Carlo. In ICML, 2015.
[32] Y.-X. Wang and H. Xu. Noisy sparse subspace clustering. In ICML, pages 89–97, 2013.
[33] A. Zhang, N. Fawaz, S. Ioannidis, and A. Montanari. Guess who rated this movie: Identifying users through subspace clustering. arXiv, 2012.
[34] Z. Zhang, K. L. Chan, J. Kwok, and D.-Y. Yeung. Bayesian inference on principal component analysis using reversible jump Markov chain Monte Carlo. In AAAI, 2004.