# A Nullspace Property for Subspace-Preserving Recovery

Mustafa D. Kaba 1, Chong You 2, Daniel P. Robinson 3, Enrique Mallada 4, René Vidal 5

1 eBay Inc., San Jose, CA, USA. 2 Dept. of Elect. Eng. & Comp. Sci., University of California at Berkeley, Berkeley, CA, USA. 3 Dept. of Ind. & Sys. Eng., Lehigh University, Bethlehem, PA, USA. 4 Dept. of Elect. & Comp. Eng., Johns Hopkins University, Baltimore, MD, USA. 5 MINDS & Dept. of Biom. Eng., Johns Hopkins University, Baltimore, MD, USA. Correspondence to: Mustafa D. Kaba.

Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).

Abstract

Much of the theory for classical sparse recovery is based on conditions on the dictionary that are either necessary and sufficient (e.g., the nullspace property) or only sufficient (e.g., incoherence and restricted isometry). In contrast, much of the theory for subspace-preserving recovery, the theoretical underpinning for sparse subspace classification and clustering methods, is based on conditions on the subspaces and the data that are only sufficient (e.g., subspace incoherence and data inner-radius). This paper derives a necessary and sufficient condition for subspace-preserving recovery that is inspired by the classical nullspace property. Based on this novel condition, called here the subspace nullspace property, we derive equivalent characterizations that either admit a clear geometric interpretation relating data distribution and subspace separation to recovery success, or can be verified using a finite set of extreme points of a properly defined set. We further exploit these characterizations to derive new sufficient conditions, based on inner-radius and outer-radius measures and dual bounds, that generalize existing conditions and preserve the geometric interpretations. These results fill an important gap in the subspace-preserving recovery literature.

1. Introduction

Many machine learning problems involve the analysis of high-dimensional data whose intrinsic dimension is much lower than the ambient dimension. When such data comes from multiple classes, it can be well approximated by a union of low-dimensional subspaces, where each subspace corresponds to one class. This has motivated the development of many supervised and unsupervised methods for learning a union of subspaces, a.k.a. the subspace classification and subspace clustering tasks (Vidal et al., 2016).

Among existing methods for subspace classification and clustering, sparse representation based methods (Wright et al., 2009; Elhamifar & Vidal, 2009; Soltanolkotabi & Candès, 2012; Soltanolkotabi et al., 2014; You & Vidal, 2015; Robinson et al., 2019) have drawn much attention due to their simplicity, broad theoretical guarantees, and superior empirical performance. Such methods are based on expressing each data point as a linear combination of the other data points while enforcing an $\ell_1$ regularization on the representation coefficients. The idea is that the nonzero entries of the coefficient vector should correspond to data points that lie in the same subspace as the point being represented. Coefficient vectors with this property are referred to as subspace-preserving (Vidal et al., 2016), a notion that plays a central role in establishing the correctness of sparse subspace classification and clustering methods.
In particular, the recovery of subspace-preserving solutions is provably guaranteed when the subspaces are independent (Elhamifar & Vidal, 2009), disjoint (Elhamifar & Vidal, 2010), or non-trivially intersecting (Soltanolkotabi & Candès, 2012; Soltanolkotabi et al., 2014; You & Vidal, 2015). Recently, such guarantees have been further extended to a broader range of applications that deal with dimension-reduced data (Wang et al., 2015), corrupted data (Soltanolkotabi & Candès, 2012; Wang & Xu, 2016; You et al., 2017), data with missing entries (Tsakiris & Vidal, 2018), and affine subspaces (Li et al., 2018; You et al., 2019).

Most of the existing theoretical conditions for subspace-preserving recovery characterize the geometry of the union of subspaces (e.g., via subspace independence (Elhamifar & Vidal, 2009; Lu et al., 2012) and subspace incoherence (Soltanolkotabi & Candès, 2012; You & Vidal, 2015)) and the distribution of points in the subspaces (e.g., via inner-radius and circumradius (Soltanolkotabi & Candès, 2012; You & Vidal, 2015)). Such conditions have clear geometric interpretations and provide an important understanding of the regimes in which sparse methods are applicable. However, such conditions are only sufficient, and a tight characterization of subspace-preserving recovery has been missing from the literature.

In this paper, we derive necessary and sufficient conditions for subspace-preserving recovery. Our conditions are inspired by the classical Nullspace Property (NSP) (Foucart & Rauhut, 2013, Defn. 4.1), which has been widely used in classical sparse recovery, not only as a tool to obtain results of theoretical importance, but also as a powerful tool to obtain stable (Foucart & Rauhut, 2013, Thm. 4.14) and robust (Foucart & Rauhut, 2013, Thm. 4.19) recovery guarantees. (Since the NSP is a necessary and sufficient condition, sufficient conditions such as the Restricted Isometry Property imply the NSP (Foucart & Rauhut, 2013, Sect. 6.2).) However, the NSP appears not to have been explored in the subspace classification and clustering literature: as far as we know, (Robinson et al., 2019) is the only prior work that discusses a necessary and sufficient condition for subspace-preserving recovery and establishes a connection between their condition (based on dual certificate analysis) and the NSP, but the exposition there is limited to a comment and an in-depth analysis is missing. Arguably, one of the reasons is that necessary and sufficient conditions such as the NSP are often difficult to verify. This paper not only introduces an NSP that is tailored to subspace-preserving recovery, but also derives equivalent characterizations of it, based on sets of extreme points, that reduce the verification of the condition to a finite set. Our primary goal is to present an in-depth theoretical analysis of subspace-preserving recovery through a nullspace-property-like condition, and not necessarily to construct computationally efficient tools. Hence, the conditions we derive can sometimes be computationally hard to verify, as is often the case with other conditions in the literature (Elhamifar & Vidal, 2009; Lu et al., 2012; Soltanolkotabi & Candès, 2012; You & Vidal, 2015).

More specifically, the contributions of this paper can be summarized as follows:

- We derive a necessary and sufficient condition for subspace-preserving recovery (Thm. 1) that we call the subspace nullspace property (SNSP).
- We show that the SNSP holds if and only if it holds on a certain subset of the columns of the data matrix $X$, which has the potential of significantly facilitating the verification of the SNSP for the subspace classification task (Lemma 1).
- We provide a characterization of the SNSP based on a comparison of the symmetrized convex hull of the data points on a subspace and the symmetrized convex hull of the remaining data points, which gives a very clear geometric interpretation of the SNSP (Thm. 2).
- We derive a sufficient condition from this characterization, which is also geometric in nature and reduces the decision to a comparison between the inner-radius and outer-radius of two compact convex sets (Cor. 2).
- By exploiting convexity of the recovery problem in the primal space, we provide a novel characterization of the SNSP that reduces its verification to a finite subset of the extreme points of the intersection of the nullspace of $X$ and the $\ell_1$-ball (Thm. 3). This characterization requires solving an $\ell_1$-minimization problem for each extreme point.
- By replacing the $\ell_1$-minimization problem by its dual and exploiting convexity in the dual space, we formulate a characterization of the SNSP that can be verified by a decision on a finite set only (Thm. 5).
- We introduce bounds on the dual of the $\ell_1$-minimization problem when the columns of $X$ are assumed to have unit $\ell_p$-norm (Prop. 1). These bounds allow us to simplify the aforementioned extreme-point characterization of the SNSP and obtain a sufficient condition for subspace-preserving recovery that is simpler to check (Thm. 6).

2. Preliminaries and Problem Formulation

2.1. Notation and Preliminaries

The set of integers from 1 up to $N$ is denoted by $[N] := \{1, \ldots, N\}$. For any $c \in \mathbb{R}^N$, the support of $c$ is $\mathrm{Supp}(c) := \{k \in [N] : c_k \neq 0\}$. The vector $c$ is called $s$-sparse if $|\mathrm{Supp}(c)| \leq s$. For any index set $S \subseteq [N]$, the complement of $S$ in $[N]$ is denoted by $S^c$. For a nonempty set $S \subseteq [N]$, the vector $c_S \in \mathbb{R}^{|S|}$ denotes the part of $c$ that is supported on $S$. We use $\mathrm{Pr}_S \in \mathbb{R}^{N \times N}$ to denote the matrix that projects onto the coordinates in $S$ and sets all other coordinates to zero. For a matrix $X \in \mathbb{R}^{D \times N}$ and an index set $S \subseteq [N]$, the matrix $X_S \in \mathbb{R}^{D \times |S|}$ denotes the submatrix of $X$ consisting of the columns of $X$ indexed by $S$. Therefore, for all $c \in \mathbb{R}^N$, we have $X \,\mathrm{Pr}_S\, c = X_S c_S$. If $S = \{j\}$ for some $j$, we write $x_j$ instead of $X_S$ to refer to the $j$th column of $X$. We prioritize the subscript over the superscript, so that $X_S^\top \equiv (X_S)^\top$, and not $(X^\top)_S$. Finally, $\mathrm{Null}(X)$ denotes the nullspace of the matrix $X$, and $X^{-1}(\cdot)$ denotes the inverse image under $X$.

The $\ell_p$-norm of a vector $x \in \mathbb{R}^D$ is defined as $\|x\|_p := (\sum_{k=1}^D |x_k|^p)^{1/p}$, where $|\cdot|$ denotes the absolute value. The unit $\ell_p$-sphere is denoted by $\mathbb{S}^{D-1}_p := \{x \in \mathbb{R}^D : \|x\|_p = 1\}$ and the unit $\ell_p$-ball by $B^D_p := \{x \in \mathbb{R}^D : \|x\|_p \leq 1\}$. The convex hull is denoted by $\mathrm{conv}(\cdot)$. We denote the convex hull of the union of the columns of $X$ and $-X$ by $K(X)$; we sometimes refer to it as the symmetrized convex hull of the columns of $X$. For a nonempty convex set $C \subseteq \mathbb{R}^D$, the set of extreme points of $C$ is denoted by $\mathrm{Ext}(C)$. These are precisely the points that cannot be written as a nontrivial convex combination of two distinct points in $C$. The affine hull of $C$, denoted by $\mathrm{aff}(C)$, is the smallest affine set in $\mathbb{R}^D$ that contains $C$. The relative interior (Rockafellar, 1970, p. 44) of $C$ is defined as $\mathrm{rinte}(C) := \{x \in \mathrm{aff}(C) : \exists \varepsilon > 0,\ (x + \varepsilon B^D_2) \cap \mathrm{aff}(C) \subseteq C\}$.
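To make the indexing conventions above concrete, here is a minimal NumPy sketch (illustrative only; the matrix, the index set, and the 0-based indexing are our own choices, not the paper's) that checks the identity $X\,\mathrm{Pr}_S\,c = X_S c_S$ and forms the generator set of $K(X)$.

```python
import numpy as np

# A small illustration of the notation above; the data are arbitrary.
rng = np.random.default_rng(0)
D, N = 3, 5
X = rng.standard_normal((D, N))     # X in R^{D x N}
c = rng.standard_normal(N)          # c in R^N
S = [1, 3]                          # an index set S (0-based here)

Pr_S = np.zeros((N, N))             # Pr_S keeps coordinates in S, zeroes the rest
Pr_S[S, S] = 1.0

X_S, c_S = X[:, S], c[S]            # submatrix X_S and restriction c_S

# Identity from the text: X Pr_S c = X_S c_S
assert np.allclose(X @ Pr_S @ c, X_S @ c_S)

# The symmetrized convex hull K(X) is generated by the columns of X and -X.
K_generators = np.hstack([X, -X])   # shape (D, 2N)
```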
The polar (Rockafellar, 1970, p. 125) of $C$ is defined as $C^\circ := \{q \in \mathbb{R}^D : \langle q, x\rangle \leq 1 \text{ for all } x \in C\}$. Note that $C^\circ$ is always a closed, convex set (Rockafellar, 1970, p. 125). We define the inner-$\ell_p$-radius of a nonempty compact convex set $C \subseteq \mathbb{R}^D$ containing the origin as the radius of the largest $\ell_p$-ball (confined to the linear span of $C$) that can be inscribed inside $C$, and denote it by $r_p(C)$. That is, $r_p(C) := \max\{\alpha \in \mathbb{R}_{>0} : \alpha (B^D_p \cap \mathrm{span}(C)) \subseteq C\}$, where $\mathrm{span}(C)$ denotes the subspace spanned by $C$ and $\mathbb{R}_{>0}$ denotes the set of positive real numbers. Likewise, we define the outer-$\ell_p$-radius of $C$ as the radius of the smallest $\ell_p$-ball that contains $C$, and denote it by $R_p(C)$. That is, $R_p(C) := \min\{\beta \in \mathbb{R}_{>0} : C \subseteq \beta B^D_p\}$.

2.2. Sparse Subspace Classification and Clustering

Let $X = [x_1, \ldots, x_N] \in \mathbb{R}^{D \times N}$ be a matrix with nonzero columns drawn from a union of $n$ subspaces $\bigcup_{i=1}^n S_i \subseteq \mathbb{R}^D$ of dimensions $d_i = \dim(S_i)$. Let $\mathcal{P} := \{P_i\}_{i=1}^n$ be a partition of $[N]$ defined according to the memberships of the columns of $X$ to the subspaces, i.e., $P_i := \{k \in [N] : x_k \in S_i\}$. (We implicitly assume that no column of $X$ lies at the intersection of two distinct subspaces from the union, which is not a major assumption for data coming from real applications.) Assuming $\mathcal{P}$ is known, the goal of subspace classification is to assign a new nonzero point $y \in S_i$, for some $i \in [n]$, to the subspace $S_i$ to which it belongs. We assume that the columns of $X_{P_i}$ span $S_i$; hence any point $y \in S_i$ can be expressed as a linear combination of the columns of $X_{P_i}$, i.e., there exists a $c \in \mathbb{R}^N$ with support in $P_i$ such that $y = Xc$. Given $y \in S_i$, a solution $c$ to the system $y = Xc$ is called subspace-preserving if and only if $\mathrm{Supp}(c) \subseteq P_i$. Note that such a vector $c$ need not be unique, since in general the number of columns of $X_{P_i}$ is larger than the dimension of $S_i$. Nonetheless, assuming that the largest subspace dimension $d = \max_i d_i$ is small relative to $N$, there always exists a subspace-preserving vector $c$ that is $d$-sparse. This motivated (Wright et al., 2009) to tackle the subspace classification problem by solving the basis pursuit problem

$$\min_{c \in \mathbb{R}^N} \|c\|_1 \quad \text{s.t.} \quad y = Xc, \tag{1}$$

where the $\ell_1$ norm is used as a proxy for the sparsity of $c$. To understand the correctness of this approach, we consider the following theoretical question concerning subspace-preserving recovery:

(Q) What are the necessary and sufficient conditions on the union of subspaces $\bigcup_{i=1}^n S_i$ and the data $X$ such that, for all $i \in [n]$ and all $y \in S_i$, all solutions to (1) are subspace-preserving?

The answer to (Q) provides necessary and sufficient conditions under which any test data point $y \in \bigcup_{i=1}^n S_i$ is correctly classified by assigning it to the subspace determined by $\mathrm{Supp}(c^\star)$, where $c^\star$ is a solution of (1). (Note that if $y = 0$, then the solution to (1) is $c^\star = 0$. Therefore, the support of $c^\star$ is the empty set, and we have $\emptyset = \mathrm{Supp}(c^\star) \subseteq P_i$ for all $i \in [n]$. That is, when $y = 0$, the solution to (1) is always subspace-preserving. In particular, $y = 0$ is assigned to all subspaces, which is in accordance with the fact that all subspaces contain the origin.)

Closely related to subspace classification is the problem of subspace clustering, where the subspaces and the membership of the columns of $X$ to their respective subspaces $\{S_i\}$ are unknown, and the goal is to segment the columns of $X$ into their respective subspaces, i.e., to find $\mathcal{P}$. The work of (Elhamifar & Vidal, 2013) addressed the subspace clustering problem by solving a modified basis pursuit problem in which each data point is expressed as a combination of the other data points. More precisely, for each $j \in [N]$, the authors propose to solve

$$\min_{c \in \mathbb{R}^{N-1}} \|c\|_1 \quad \text{s.t.} \quad x_j = X_{-j} c,$$

where $X_{-j}$ denotes the submatrix of $X$ consisting of all but the $j$th column. For each $j$, the solution of this optimization problem provides a sparse representation of the data point $x_j$ in terms of the other data points, which is then used to cluster the data.
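As a concrete illustration of how (1) is used for classification, here is a minimal sketch (not from the paper): it solves the basis pursuit problem as a linear program via the standard split $c = u - v$ and assigns $y$ to the block carrying the largest $\ell_1$ mass of the solution, a common practical surrogate for reading off $\mathrm{Supp}(c^\star)$. The toy data, function names, and the LP reformulation are our own assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(X, y):
    """Solve min ||c||_1 s.t. y = X c as an LP via the split c = u - v, u, v >= 0."""
    D, N = X.shape
    cost = np.ones(2 * N)                 # sum(u) + sum(v) = ||c||_1
    A_eq = np.hstack([X, -X])             # X u - X v = y
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v

def classify(X, blocks, y):
    """Assign y to the block P_i determined by the support of the l1-minimizer
    (implemented here as the block with the largest l1 mass of coefficients)."""
    c = basis_pursuit(X, y)
    masses = [np.abs(c[P]).sum() for P in blocks]
    return int(np.argmax(masses)), c

# Toy example: two lines in R^2 with two samples each.
X = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.5]])
blocks = [[0, 1], [2, 3]]                 # P_1 and P_2 (0-based)
y = np.array([2.0, 0.0])                  # a point on the first line
i, c = classify(X, blocks, y)
print(i, np.round(c, 4))                  # expect block 0 and support inside P_1
```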
The conditions under which the representations computed by this modified basis pursuit are subspace-preserving can help guarantee the correct segmentation of the data points. Hence, finding an answer to (Q) has important implications in the subspace clustering setting. In this paper, we focus on answering (Q) for the subspace classification problem, and refer the reader to (You & Vidal, 2015; Robinson et al., 2019) for a detailed discussion of the implications of such answers in the subspace clustering setting.

We end this section with a comparison of the subspace classification problem and the recovery of block sparse signals. Over the years, researchers have studied different aspects of the latter problem in detail. For instance, there is work on exact recovery of block sparse signals (Jenatton et al., 2011; Bajwa et al., 2015) and on recovery of the support of the unknown block sparse signal (Obozinski et al., 2011; Kolar et al., 2011). A nullspace characterization of the recovery of block sparse signals is also available (Stojnic et al., 2009). However, the existing work on block sparse signals differs from the problem we study here in one major way: we are interested in neither the correct recovery of the support nor the correct recovery of the signal itself. Even if the minimizer of (1) has a support different from that of the original signal, the recovery attempt can be deemed successful in the setting we study. To the best of our knowledge, the question (Q) has not been studied in the literature addressing the recovery of block sparse signals.

3. A Nullspace Property for Subspace-Preserving Recovery

This section provides an initial answer to (Q) based on a necessary and sufficient condition that is inspired by the classical NSP for sparse recovery. We call this condition the Subspace Nullspace Property (SNSP) in homage to its ties with subspace classification and clustering, and the NSP.

Definition 1 (SNSP). We denote the set of vectors in the nullspace of $X$ whose support is not contained in any single element of the partition $\mathcal{P}$ by $\mathrm{Null}(X, \mathcal{P})$. That is,

$$\mathrm{Null}(X, \mathcal{P}) := \{\eta \in \mathrm{Null}(X) : \mathrm{Supp}(\eta) \not\subseteq P,\ \forall P \in \mathcal{P}\}. \tag{2}$$

We say that $X$ satisfies the SNSP if and only if for all $\eta \in \mathrm{Null}(X, \mathcal{P})$ and $P \in \mathcal{P}$, we have

$$\min_{z :\, X_P z = X_P \eta_P} \|z\|_1 < \|\eta_{P^c}\|_1. \tag{3}$$

We find it informative to briefly discuss how we interpret the SNSP. It tells us that the vectors that are relevant for characterizing subspace-preserving recovery are those vectors in $\mathrm{Null}(X)$ whose supports are not contained within a single element of the partition $\mathcal{P}$. This is the first level of selection that the SNSP introduces. Then, (3) further rules out certain vectors in $\mathrm{Null}(X, \mathcal{P})$: for any $\eta \in \mathrm{Null}(X, \mathcal{P})$, if there is $\hat\eta \in \mathrm{Null}(X, \mathcal{P})$ with $\hat\eta_{P^c} = \eta_{P^c}$ and $\|\hat\eta_P\|_1 < \|\eta_P\|_1$, then it is enough to check (3) for $\hat\eta$ only. Hence, the variation in $\eta \in \mathrm{Null}(X, \mathcal{P})$ introduced by the nullspace of $X_P$ is irrelevant, unless it reduces the $\ell_1$-norm of $\eta_P$. When $X_P$ has full column rank, the only solution to $X_P z = X_P \eta_P$ is $z = \eta_P$, and the condition in (3) becomes $\|\eta_P\|_1 < \|\eta_{P^c}\|_1$. Such a condition is closely related to the classical NSP of order $s$, which requires $\|\eta_P\|_1 < \|\eta_{P^c}\|_1$ to hold for all $s$-sparse patterns, i.e., for all $P$ with $|P| \leq s$.
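Before comparing the SNSP with the NSP, the following sketch (our own, assuming a vector $\eta \in \mathrm{Null}(X, \mathcal{P})$ and a block $P$ are already in hand) evaluates inequality (3) for that single pair by solving the inner $\ell_1$-minimization as a linear program; verifying the SNSP itself requires this for all such pairs, which is exactly what the later characterizations reduce to finite sets.

```python
import numpy as np
from scipy.optimize import linprog

def min_l1_given_block(X, P, eta):
    """Left-hand side of (3): min ||z||_1 s.t. X_P z = X_P eta_P."""
    XP = X[:, P]
    b = XP @ eta[P]
    m = XP.shape[1]
    cost = np.ones(2 * m)                          # split z = u - v, u, v >= 0
    A_eq = np.hstack([XP, -XP])
    res = linprog(cost, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
    return res.fun

def snsp_inequality_holds(X, blocks, eta, i):
    """Check inequality (3) for one eta in Null(X, P) and one block P = blocks[i]."""
    P = blocks[i]
    Pc = [k for k in range(X.shape[1]) if k not in P]
    return min_l1_given_block(X, P, eta) < np.abs(eta[Pc]).sum()
```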
The SNSP thus differs from the NSP in two ways. First, it does not require the columns of $X_P$ to be linearly independent, thereby allowing for non-unique sparse solutions. This makes sense for subspace classification and clustering, where the goal is to determine the correct subspace rather than a unique sparse representation; the specific support of the solution is irrelevant as long as it identifies the correct subspace. Second, the SNSP is verified over the partition $\mathcal{P}$ of the data according to subspace membership, while the NSP is verified with respect to all $s$-sparse patterns.

The next result shows that the SNSP gives an answer to (Q). (All missing proofs can be found in the supplementary material.)

Theorem 1. The solution to (1) is subspace-preserving for all $i \in [n]$ and $y \in S_i$ if and only if $X$ satisfies the SNSP. That is, for all $i \in [n]$ and for all $\bar c \in \mathbb{R}^N$ supported in $P_i$, the support of any minimizer of

$$\min_{c :\, Xc = X\bar c} \|c\|_1 \tag{4}$$

is contained in $P_i$ if and only if $X$ satisfies the SNSP.

Proof. ($\Rightarrow$): Let $\eta \in \mathrm{Null}(X, \mathcal{P})$ and $P \in \mathcal{P}$. Since $0 = X\eta = X(\mathrm{Pr}_P(\eta) + \mathrm{Pr}_{P^c}(\eta))$, we have

$$X \,\mathrm{Pr}_P(\eta) = X \,\mathrm{Pr}_{P^c}(-\eta). \tag{5}$$

Note that $\mathrm{Supp}(\mathrm{Pr}_P(\eta)) \subseteq P \in \mathcal{P}$. Then, by the hypothesis, the support of any minimizer of

$$\min_{z :\, Xz = X\,\mathrm{Pr}_P(\eta)} \|z\|_1 \tag{6}$$

is a subset of $P$. Therefore,

$$\min_{c :\, X_P c = X_P(\eta_P)} \|c\|_1 = \min_{z :\, Xz = X\,\mathrm{Pr}_P(\eta)} \|z\|_1. \tag{7}$$

By (5), $\mathrm{Pr}_{P^c}(-\eta)$ is feasible for (6). Moreover, $\mathrm{Supp}(\mathrm{Pr}_{P^c}(-\eta)) \neq \emptyset$, because $\eta \in \mathrm{Null}(X, \mathcal{P})$. Hence, $\mathrm{Supp}(\mathrm{Pr}_{P^c}(-\eta)) \not\subseteq P$. Therefore, $\mathrm{Pr}_{P^c}(-\eta)$ cannot be a minimizer of (6). So, $\min_{z :\, Xz = X\,\mathrm{Pr}_P(\eta)} \|z\|_1 < \|\mathrm{Pr}_{P^c}(-\eta)\|_1 = \|\eta_{P^c}\|_1$. Combining with (7), we get $\min_{c :\, X_P c = X_P(\eta_P)} \|c\|_1 < \|\eta_{P^c}\|_1$. Hence, $X$ satisfies the SNSP.

($\Leftarrow$): Suppose that $X$ satisfies the SNSP. Let $\bar c \in \mathbb{R}^N$ with $\mathrm{Supp}(\bar c) \subseteq P$ for some $P \in \mathcal{P}$. Suppose on the contrary that (4) has a minimizer $\hat c$ with $\mathrm{Supp}(\hat c) \setminus P \neq \emptyset$. Let $\eta := \bar c - \hat c$ and note that $\eta \in \mathrm{Null}(X)$ and $\mathrm{Supp}(\eta) \setminus P \neq \emptyset$. If there exists $Q \in \mathcal{P}$ with $\mathrm{Supp}(\eta) \subseteq Q$ (necessarily $Q \neq P$, since $\mathrm{Supp}(\eta) \setminus P \neq \emptyset$), then necessarily $\bar c_P = \hat c_P$, and so $\|\bar c\|_1 = \|\bar c_P\|_1 = \|\hat c_P\|_1 < \|\hat c\|_1$. This contradicts $\hat c$ being a minimizer of (4). Hence, we can assume that $\eta \in \mathrm{Null}(X, \mathcal{P})$, so that

$$\min_{z :\, Xz = X\bar c} \|z\|_1 = \min_{z :\, Xz = X\,\mathrm{Pr}_P(\bar c)} \|z\|_1,$$

since $\mathrm{Supp}(\bar c) \subseteq P$. Setting $y := z - \mathrm{Pr}_P(\hat c)$, we obtain

$$\min_{z :\, Xz = X\bar c} \|z\|_1 = \min_{y :\, Xy = X\,\mathrm{Pr}_P(\bar c - \hat c)} \|y + \mathrm{Pr}_P(\hat c)\|_1.$$

Then, by the triangle inequality,

$$\min_{z :\, Xz = X\bar c} \|z\|_1 \leq \min_{y :\, Xy = X_P(\eta_P)} \|y\|_1 + \|\hat c_P\|_1.$$

Shrinking the constraint set gives us

$$\min_{z :\, Xz = X\bar c} \|z\|_1 \leq \min_{c :\, X_P c = X_P(\eta_P)} \|c\|_1 + \|\hat c_P\|_1.$$

The SNSP implies $\min_{z :\, Xz = X\bar c} \|z\|_1 < \|\eta_{P^c}\|_1 + \|\hat c_P\|_1$. Since $\mathrm{Supp}(\bar c) \subseteq P$, we finally get

$$\min_{z :\, Xz = X\bar c} \|z\|_1 < \|\hat c_{P^c}\|_1 + \|\hat c_P\|_1 = \|\hat c\|_1,$$

which is again a contradiction, since $\hat c$ is a minimizer of (4). So, we must have $\mathrm{Supp}(\hat c) \subseteq P$.

3.1. Verification of the SNSP on a Submatrix

One can work with a submatrix of $X$ and still verify the SNSP for $X$. Specifically, one can discard the data points from subspace $S_i$ that do not correspond to extreme points of the symmetrized convex hull of the points in $S_i$, and verify the SNSP for the matrix defined by the remaining points. Formally, recall that for each $P_i \in \mathcal{P}$, the convex set $K(X_{P_i})$ is the convex hull of the union of the columns of $X_{P_i}$ and $-X_{P_i}$. Let $\tilde P_i := \{l \in P_i : x_l \in \mathrm{Ext}(K(X_{P_i}))\}$ be the collection of indices associated with columns of $X$ that are extreme points of $K(X_{P_i})$, which must be nonempty. Let $\tilde{\mathcal{P}} := \{\tilde P_i\}_{i=1}^n$ and let $\tilde X$ be the submatrix of $X$ consisting of all columns of $X$ whose indices lie in some $\tilde P_i$. The SNSP can now be verified on $\tilde X$.

Lemma 1. The data matrix $X$ satisfies the SNSP if and only if $\tilde X$ (defined in the previous paragraph) satisfies the SNSP.
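The following sketch illustrates the reduction behind Lemma 1 for one block: it flags which columns of $X_{P_i}$ are extreme points of $K(X_{P_i})$ using an LP feasibility test (a column is not extreme if it is a convex combination of the remaining generators). The test and the function name are our own, not the paper's.

```python
import numpy as np
from scipy.optimize import linprog

def extreme_column_indices(XP):
    """Indices l such that x_l is an extreme point of K(XP) = conv(cols of XP, -XP).
    Assumes the columns of XP are pairwise distinct and nonzero."""
    D, m = XP.shape
    keep = []
    for l in range(m):
        others = np.delete(XP, l, axis=1)
        V = np.hstack([others, -others, -XP[:, [l]]])    # remaining generators
        k = V.shape[1]
        # Is x_l a convex combination of V?  Feasibility LP:
        #   exists lam >= 0 with V lam = x_l and sum(lam) = 1.
        A_eq = np.vstack([V, np.ones((1, k))])
        b_eq = np.concatenate([XP[:, l], [1.0]])
        res = linprog(np.zeros(k), A_eq=A_eq, b_eq=b_eq,
                      bounds=(0, None), method="highs")
        if not res.success:      # infeasible: x_l cannot be expressed, so it is extreme
            keep.append(l)
    return keep
```

Applying this per block and stacking the surviving columns gives the submatrix $\tilde X$ on which, by Lemma 1, the SNSP can equivalently be verified.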
Although Lemma 1 shows that one can safely replace the data matrix $X$ with its submatrix $\tilde X$ for the purpose of verifying the SNSP, in this paper we do not assume that this substitution takes place. The reason for this is twofold: 1) it makes no major difference for our theoretical development; 2) if the data is normalized with the $\ell_p$-norm for $p \notin \{1, \infty\}$, then no reduction takes place, since in that case every column of the data matrix $X$ is an extreme point. However, sparse subspace classification might benefit immensely from this reduction when the data set is large and either not normalized or normalized with the $\ell_1$- or $\ell_\infty$-norm. In this case, rather than verifying the SNSP for the data matrix $X$, we can verify it on the much smaller submatrix of extreme points. Such a reduction in matrix size can speed up $\ell_1$-recovery significantly.

Since the SNSP characterizes when subspace-preserving recovery is possible, we devote the remainder of the paper to deriving alternative characterizations of the SNSP. These characterizations are either geometrically interpretable (thus providing insight into the nature of subspace-preserving recovery) or verifiable by considering certain finite sets of points (thus opening the door to developing practical approaches for certifying when the SNSP holds).

4. A Geometrically Interpretable Characterization of SNSP

In this section we introduce a characterization of the SNSP that allows for a clear geometric interpretation. Our main result in this section reads as follows.

Theorem 2. The matrix $X$ satisfies the SNSP if and only if, for all $i \in [n]$, we have

$$S_i \cap K(X_{P_i^c}) \subseteq \mathrm{rinte}\left(K(X_{P_i})\right). \tag{8}$$

Proof. For any $\eta \in \mathrm{Null}(X)$ it holds that $X_P(\eta_P) = -X_{P^c}(\eta_{P^c})$. Combining this with Defn. 1, we have that $X$ satisfies the SNSP if and only if for all $\eta \in \mathrm{Null}(X, \mathcal{P})$ and for all $P \in \mathcal{P}$, we have

$$\min_{z :\, X_P(z) = X_{P^c}\left(\frac{\eta_{P^c}}{\|\eta_{P^c}\|_1}\right)} \|z\|_1 < 1. \tag{9}$$

Since the optimization problem in (9) is feasible, we can replace it with its dual to obtain

$$\max_{w \in (X_P^\top)^{-1} B^{|P|}_\infty} \frac{1}{\|\eta_{P^c}\|_1}\, \eta_{P^c}^\top X_{P^c}^\top w < 1. \tag{10}$$

That is, $X$ satisfies the SNSP if and only if for all $\eta \in \mathrm{Null}(X, \mathcal{P})$ and for all $P \in \mathcal{P}$, we have

$$X_{P^c}\left(\frac{\eta_{P^c}}{\|\eta_{P^c}\|_1}\right) \in \mathrm{rinte}\left[\left((X_P^\top)^{-1} B^{|P|}_\infty\right)^\circ\right] = \mathrm{rinte}\left(X_P B^{|P|}_1\right),$$

where the inclusion follows from the definition of the polar operator, and the equality follows from (Rockafellar, 1970, Cor. 16.3.2). After defining the set

$$\Lambda_P := \left\{X_{P^c}(\eta_{P^c}) : \eta \in \mathrm{Null}(X, \mathcal{P}) \text{ and } \|\eta_{P^c}\|_1 = 1\right\},$$

we conclude that $X$ satisfies the SNSP if and only if for all $P \in \mathcal{P}$, we have $\Lambda_P \subseteq \mathrm{rinte}\left(X_P B^{|P|}_1\right)$. Next, we claim that for all $P \in \mathcal{P}$, the following holds:

$$\Lambda_P \subseteq \mathrm{im}(X_P) \cap X_{P^c}\, \mathbb{S}^{|P^c|-1}_1 \subseteq \Lambda_P \cup \{0\}. \tag{11}$$

The first inclusion is straightforward. To see why the second inclusion also holds, let $z \in \mathrm{im}(X_P) \cap X_{P^c}\, \mathbb{S}^{|P^c|-1}_1$. Hence, there exists $\eta \in \mathrm{Null}(X)$ such that $z = X_P(-\eta_P) = X_{P^c}(\eta_{P^c})$, with $\|\eta_{P^c}\|_1 = 1$. If $\eta \in \mathrm{Null}(X, \mathcal{P})$, then $z \in \Lambda_P$, and we are done. If not, there exists $Q \in \mathcal{P}$, with $Q \neq P$, such that $\mathrm{Supp}(\eta) \subseteq Q$. But if this is the case, we necessarily have $z = X_P(-\eta_P) = 0$. Hence, the second inclusion holds. Since we always have $0 \in \mathrm{rinte}\left(X_P B^{|P|}_1\right)$, by (11) we conclude that $X$ satisfies the SNSP if and only if for all $P \in \mathcal{P}$, we have

$$\mathrm{im}(X_P) \cap X_{P^c}\, \mathbb{S}^{|P^c|-1}_1 \subseteq \mathrm{rinte}\left(X_P B^{|P|}_1\right).$$

The set $\mathrm{rinte}\left(X_P B^{|P|}_1\right)$ is convex. The convex hull of $\{0\} \cup \left(\mathrm{im}(X_P) \cap X_{P^c}\, \mathbb{S}^{|P^c|-1}_1\right)$ is $\mathrm{im}(X_P) \cap X_{P^c} B^{|P^c|}_1$. Hence, we conclude that $X$ satisfies the SNSP if and only if, for all $P \in \mathcal{P}$, we have

$$\mathrm{im}(X_P) \cap X_{P^c} B^{|P^c|}_1 \subseteq \mathrm{rinte}\left(X_P B^{|P|}_1\right).$$

Since, for all $i \in [n]$, we have $\mathrm{im}(X_{P_i}) = S_i$, $X_{P_i^c} B^{|P_i^c|}_1 = K(X_{P_i^c})$, and $X_{P_i} B^{|P_i|}_1 = K(X_{P_i})$, we obtain the result.
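As a concrete (and deliberately special) illustration of condition (8), the sketch below checks the inclusion for a single block when the corresponding subspace is a line spanned by a vector $u$ and all columns indexed by $P$ lie on that line: both sides of (8) then reduce to symmetric segments along the line, and the inclusion becomes a comparison of two scalars, one of which is computed by a small LP. This reduction to the line case and the function name are ours, not the paper's.

```python
import numpy as np
from scipy.optimize import linprog

def snsp_inclusion_for_line(X, P, u):
    """Check (8) for a block P whose subspace S = span(u) is a line and whose
    columns X_P all lie on that line.
    Left set:  S ∩ K(X_{P^c}) = {t*u : |t| <= t_out},
    Right set: rinte(K(X_P))  = {t*u : |t| <  t_in }."""
    D, N = X.shape
    u = u / np.linalg.norm(u)
    Pc = [k for k in range(N) if k not in P]
    XPc = X[:, Pc]
    m = XPc.shape[1]

    # Half-length of the segment K(X_P) along u.
    t_in = np.max(np.abs(X[:, P].T @ u))

    # t_out: largest t with t*u = X_{P^c} c for some ||c||_1 <= 1 (LP over [t, c+, c-]).
    cost = np.concatenate([[-1.0], np.zeros(2 * m)])                 # maximize t
    A_eq = np.hstack([-u.reshape(-1, 1), XPc, -XPc])                 # X_{P^c}(c+ - c-) - t*u = 0
    b_eq = np.zeros(D)
    A_ub = np.concatenate([[0.0], np.ones(2 * m)]).reshape(1, -1)    # sum(c+) + sum(c-) <= 1
    res = linprog(cost, A_ub=A_ub, b_ub=[1.0], A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    t_out = -res.fun
    return t_out < t_in
```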
Thm. 2 has a straightforward geometric interpretation. It tells us that subspace-preserving recovery always succeeds if and only if, for every $i \in [n]$, the intersection of the symmetrized convex hull of the columns of $X_{P_i^c}$ with the subspace $S_i$ is contained in the relative interior of the symmetrized convex hull of the columns of $X_{P_i}$. Although, to the best of our knowledge, this theorem did not appear in the literature before, it is possible to find work that makes similar arguments (Elhamifar & Vidal, 2013, p. 8). We refer the reader to Fig. 1 for an illustration of Thm. 2 and a comparison with similar arguments existing in the literature.

Note that Thm. 2 does not assume that the columns of $X$ are normalized. Therefore, Thm. 2 suggests that an $\ell_p$-normalization of the columns of $X$ may affect subspace-preserving recovery. Intuitively, one would like the right-hand side of (8) to be as large as possible, which suggests that an $\ell_\infty$-normalization of the columns of $X$ may be a good option. However, this normalization also enlarges the left-hand side of (8), and predicting the total effect of this normalization can be non-trivial. In fact, an $\ell_\infty$-normalization can sometimes be counter-productive (see Example 1 in the next section). On the other hand, an $\ell_2$-normalization has an advantage. Suppose that we are given two data matrices $X$ and $\hat X$, and moreover that $\hat X = \Phi X$ for some orthogonal matrix $\Phi \in \mathbb{R}^{D \times D}$. (In special cases, $\Phi$ can be realized as a rotation matrix.) If we normalize $X$ and $\hat X$ using the $\ell_p$-norm for $p \neq 2$, then Thm. 2 tells us that it is possible that the normalized $X$ satisfies the SNSP while the normalized $\hat X$ does not. This is an undesirable effect, since the two (unscaled) data sets differ only by a rotation. However, if we normalize with the $\ell_2$-norm, this problem does not occur, meaning that the decisions for the normalized $X$ and the normalized $\hat X$ coincide.

Corollary 1. Let $\Phi \in \mathbb{R}^{D \times D}$ be an orthogonal matrix. Let $X_{\ell_2}$ and $(\Phi X)_{\ell_2}$ denote the matrices obtained after normalizing the columns of $X$ and $\Phi X$ with the $\ell_2$-norm, respectively. Then, $X_{\ell_2}$ satisfies the SNSP if and only if $(\Phi X)_{\ell_2}$ satisfies the SNSP.

Proof. The result follows from Thm. 2 and the fact that the matrix $\Phi$ preserves the $\ell_2$-norm.

To the best of our knowledge, Thm. 2 provides one of the most geometrically interpretable necessary and sufficient conditions for subspace-preserving recovery in the literature. However, in general, it can be difficult to determine whether one convex set is contained in another, so the verification of (8) can be difficult. Therefore, sufficient conditions that reduce the computational burden can be valuable. With this motivation in mind, we introduce the next corollary, which reduces the verification of the SNSP to a comparison of two numbers: an inner-radius and an outer-radius.

Corollary 2. Suppose that the columns of $X$ have unit $\ell_p$-norm. The data matrix $X$ satisfies the SNSP if, for all $i \in [n]$, we have

$$R_p\!\left(S_i \cap K(X_{P_i^c})\right) < r_p\!\left(K(X_{P_i})\right), \tag{12}$$

i.e., if, for all $i \in [n]$, the outer-$\ell_p$-radius of $S_i \cap K(X_{P_i^c})$ is strictly smaller than the inner-$\ell_p$-radius of $K(X_{P_i})$.

Proof. Since inequality (12) implies (8), the result immediately follows from Thm. 2.

We end this section by connecting Cor. 2 to existing geometric results for subspace-preserving recovery in the literature.

Relation with results for independent subspaces. It is shown in (Elhamifar & Vidal, 2009) that subspace-preserving recovery is guaranteed if the collection of subspaces is linearly independent.
Here, we demonstrate that this result is implied by Cor. 2. Recall that a collection of subspaces $\{S_k\}_{k=1}^n$ is called independent if the dimension of the span of $\{S_k\}_{k=1}^n$ is equal to the sum of the dimensions of the individual subspaces $S_k$, i.e., if $\dim\left(\bigoplus_{k=1}^n S_k\right) = \sum_{k=1}^n \dim(S_k)$. If $\{S_k\}_{k=1}^n$ is a union of independent subspaces, then the left-hand side of (12) is $0$ for every $i \in [n]$, and so (12) is satisfied trivially.

Relation with results characterized by incoherence measures. Many of the existing geometric conditions for subspace-preserving recovery are characterized by the inradius and certain incoherence metrics (Soltanolkotabi & Candès, 2012; You & Vidal, 2015; You et al., 2016). In particular, You & Vidal (2015) show that, under the assumption that the columns of $X$ have unit $\ell_2$-norm, subspace-preserving recovery is guaranteed if the following principal recovery condition is satisfied:

$$\mu(S_i, X_{P_i^c}) < r_2\left(K(X_{P_i})\right), \quad \forall i \in [n], \tag{13}$$

where $\mu(S_i, X_{P_i^c})$ is the incoherence between the subspace $S_i$ and the points $X_{P_i^c}$, defined as $\mu(S_i, X_{P_i^c}) = \max_{x \in S_i \setminus \{0\}} \|X_{P_i^c}^\top x\|_\infty / \|x\|_2$. Note that the right-hand side of (13) is the same as that of (12). On the other hand, the incoherence on the left-hand side of (13) is related to the outer-radius on the left-hand side of (12) as follows:

$$\mu(S_i, X_{P_i^c}) = R_2\left(\mathrm{Pr}_{S_i}(X_{P_i^c})\right) = R_2\left(\mathrm{Pr}_{S_i}(K(X_{P_i^c}))\right) \geq R_2\left(S_i \cap K(X_{P_i^c})\right), \tag{14}$$

where $\mathrm{Pr}_{S_i}(\cdot)$ denotes the projection onto the subspace $S_i$. Therefore, the condition in (13) implies (i.e., is more restrictive than) the condition in (12), showing that Cor. 2 provides a stronger result than the result in You & Vidal (2015).

Figure 1. Elhamifar & Vidal (2013) provide an interpretation of their results on subspace-preserving recovery in terms of these drawings, for the case where the collection of subspaces is disjoint. For a point $x \in S_1 \cap (S_2 \oplus S_3)$ they argue that subspace-preserving recovery in $S_1$ succeeds in the case shown on the left because $\min\{\alpha : x \in \alpha K(X_{P_1})\} < \min\{\alpha : x \in \alpha K(X_{P_1^c})\}$. They argue that recovery fails in the two other cases (shown in the middle and on the right) because this inequality is reversed. According to their interpretation, the failure is due to smaller angles between the subspaces (middle) and a degenerate data distribution (right). Our interpretation of these cases based on Thm. 2 is as follows: we make no specific reference to points in $S_1 \cap (S_2 \oplus S_3)$. We argue that subspace-preserving recovery in $S_1$ succeeds in the case shown on the left because $S_1 \cap K(X_{P_1^c}) \subseteq \mathrm{rinte}(K(X_{P_1}))$, whereas it fails in the next two cases because this strict inclusion fails to hold. (Drawings are borrowed from (Elhamifar & Vidal, 2013) for comparison.)

5. Reduction of the Verification of SNSP to a Decision on Finite Sets

In the previous section we developed a characterization of the SNSP that admits a clear geometric interpretation. We also noted that condition (8) in Thm. 2 can be difficult to verify. In this section we introduce equivalent characterizations of the SNSP that reduce its verification to decisions on finite sets. These finite sets correspond to extreme points of certain compact convex sets. For instance, the next theorem shows that the SNSP admits a characterization in terms of the extreme points of $\mathrm{Null}(X) \cap B^N_1$. Before stating the theorem, we introduce the following notation:

$$\mathrm{Ext}(\mathrm{Null}(X) \cap B^N_1, \mathcal{P}) := \left\{\eta \in \mathrm{Ext}(\mathrm{Null}(X) \cap B^N_1) : \mathrm{Supp}(\eta) \not\subseteq P,\ \forall P \in \mathcal{P}\right\}.$$

That is, $\mathrm{Ext}(\mathrm{Null}(X) \cap B^N_1, \mathcal{P})$ is the collection of all extreme points of $\mathrm{Null}(X) \cap B^N_1$ whose support is not contained in any $P \in \mathcal{P}$.
The next result reduces the verification of the SNSP from $\mathrm{Null}(X, \mathcal{P})$ to $\mathrm{Ext}(\mathrm{Null}(X) \cap B^N_1, \mathcal{P})$.

Theorem 3. The matrix $X$ satisfies the SNSP if and only if, for all $\eta \in \mathrm{Ext}(\mathrm{Null}(X) \cap B^N_1, \mathcal{P})$ and all $P \in \mathcal{P}$, we have

$$\|\eta_P\|_1 + \min_{z :\, X_P(z) = X_P(\eta_P)} \|z\|_1 < 1. \tag{15}$$

While Thm. 3 shows that the verification of the SNSP can be reduced to a decision on a finite set, there is still an $\ell_1$-minimization problem that persists. We address this issue in the next couple of results.

Theorem 4. The matrix $X$ satisfies the SNSP if and only if, for all $P \in \mathcal{P}$ and all $\eta \in \mathrm{Ext}(\mathrm{Null}(X) \cap B^N_1, \mathcal{P})$, we have

$$\|\eta_P\|_1 + \max_{w \in \mathrm{im}(X_P^\top) \cap B^{|P|}_\infty} \eta_P^\top w < 1. \tag{16}$$

Proof. Since there is zero duality gap for the $\ell_1$-minimization problem in (15), we can replace it by its dual, namely $\max_{w \in \mathrm{im}(X_P^\top) \cap B^{|P|}_\infty} \eta_P^\top w$, which gives the desired result.

Now that we have converted the minimization problem to a maximization problem by using its dual, we can reduce the problem further to a decision on a finite set, as shown by the next result.

Theorem 5. The matrix $X$ satisfies the SNSP if and only if, for all $\eta \in \mathrm{Ext}(\mathrm{Null}(X) \cap B^N_1, \mathcal{P})$ and all $P \in \mathcal{P}$, we have

$$\|\eta_P\|_1 + \max_{w \in \mathrm{Ext}\left(\mathrm{im}(X_P^\top) \cap B^{|P|}_\infty\right)} \eta_P^\top w < 1. \tag{17}$$

Proof. Since the optimization problem in (16) is the maximization of a linear objective function over the compact convex feasible set $\mathrm{im}(X_P^\top) \cap B^{|P|}_\infty$, the optimal value must be attained at an extreme point of $\mathrm{im}(X_P^\top) \cap B^{|P|}_\infty$. This gives the desired result.

Thm. 5 is our final characterization of the SNSP. We stress that it does not require the solution of a continuous optimization problem or verification on an infinite set. With Thm. 5, the SNSP can be verified in a finite number of steps by checking (17) at all $(\eta, w) \in \mathrm{Ext}(\mathrm{Null}(X) \cap B^N_1, \mathcal{P}) \times \mathrm{Ext}\left(\mathrm{im}(X_P^\top) \cap B^{|P|}_\infty\right)$ for each $P \in \mathcal{P}$. We illustrate this approach with a simple example.

Example 1. In this example, we consider a union of three subspaces $\{S_i\}_{i=1}^3$ in $\mathbb{R}^3$. We assume that $S_1$ and $S_2$ are lines through the origin spanned by $x_1 := (0, 1, 1)^\top$ and $x_2 := (0, 1, 1/2)^\top$, respectively, and $S_3$ is the $xy$-plane. Furthermore, we assume that we have four points from this union, $\{x_i\}_{i=1}^4$, given by $x_1$ and $x_2$ as defined above, $x_3 := (1, 1, 0)^\top$, and $x_4 := (-1, 1, 0)^\top$. Note that the columns of the data matrix

$$X = \begin{bmatrix} x_1 & x_2 & x_3 & x_4 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 & -1 \\ 1 & 1 & 1 & 1 \\ 1 & 1/2 & 0 & 0 \end{bmatrix}$$

have unit $\ell_\infty$-norm, and we have the partition $\{P_k\}_{k=1}^3$ given by $P_1 = \{1\}$, $P_2 = \{2\}$ and $P_3 = \{3, 4\}$. The nullspace of $X$ is $1$-dimensional and spanned by $\eta = (1/4, -1/2, 1/8, 1/8)^\top$. Then, we trivially have $\mathrm{Ext}(\mathrm{Null}(X) \cap B^4_1) = \{\pm\eta\}$. Moreover, since the support of $\eta$ is not contained in any $P_k$, we also have $\mathrm{Ext}(\mathrm{Null}(X) \cap B^4_1, \mathcal{P}) = \{\pm\eta\}$. A simple observation reveals that $\mathrm{Ext}(\mathrm{im}(X_{P_1}^\top) \cap B^1_\infty) = \{\pm 1\}$, $\mathrm{Ext}(\mathrm{im}(X_{P_2}^\top) \cap B^1_\infty) = \{\pm 1\}$, and $\mathrm{Ext}(\mathrm{im}(X_{P_3}^\top) \cap B^2_\infty) = \{\pm(1, 1)^\top, \pm(1, -1)^\top\}$. We see that (17) is violated for $P_2$. Hence, $X$ does not satisfy the SNSP. Interestingly, if we normalize the columns of $X$ by the $\ell_2$-norm to obtain a new matrix $X_{\ell_2}$, then the nullspace of $X_{\ell_2}$ is spanned by $\tilde\eta := (2, -\sqrt{10}, 1, 1)^\top$ and $X_{\ell_2}$ satisfies the SNSP. Therefore, the effect of normalization can be counter-intuitive, as we already pointed out in the previous section.
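Example 1 can be checked numerically along the lines of Thm. 5. The sketch below (ours; the sign pattern of $x_4$ and of $\eta$ follows the reconstruction above) exploits the facts that the nullspace is one-dimensional, that each $X_{P_k}$ has full column rank (so $\mathrm{im}(X_{P_k}^\top) \cap B^{|P_k|}_\infty$ is the full hypercube, whose extreme points are the sign vectors), and that this extreme-point set is symmetric (so checking $+\eta$ suffices).

```python
import numpy as np
from itertools import product

# Data matrix and partition from Example 1 (columns have unit l_inf-norm).
X = np.array([[0.0, 0.0, 1.0, -1.0],
              [1.0, 1.0, 1.0,  1.0],
              [1.0, 0.5, 0.0,  0.0]])
blocks = [[0], [1], [2, 3]]                         # P_1, P_2, P_3 (0-based)

def check_snsp_1d_nullspace(X, blocks, tol=1e-9):
    """Check (17) assuming dim Null(X) = 1, full-column-rank X_P for every block,
    and Supp(eta) not contained in a single block (all true in Example 1)."""
    _, _, Vt = np.linalg.svd(X)
    eta = Vt[-1]
    eta = eta / np.abs(eta).sum()                   # extreme point of Null(X) ∩ B_1^N
    ok = True
    for P in blocks:
        verts = np.array(list(product([-1.0, 1.0], repeat=len(P))))
        lhs = np.abs(eta[P]).sum() + (verts @ eta[P]).max()
        print(P, round(lhs, 4), "< 1" if lhs < 1 - tol else ">= 1 (violated)")
        ok = ok and lhs < 1 - tol
    return ok

print("SNSP holds:", check_snsp_1d_nullspace(X, blocks))          # violated for P_2

# l2-normalizing the columns flips the verdict, as claimed in Example 1.
X2 = X / np.linalg.norm(X, axis=0)
print("SNSP holds after l2-normalization:", check_snsp_1d_nullspace(X2, blocks))
```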
So far in this section we have derived necessary and sufficient conditions for the SNSP to hold that reduce its verification to a decision on a finite set. Still, these characterizations can be computationally involved, which makes it desirable to obtain sufficient conditions that are simpler to verify. To derive such a condition, we need an upper bound on the dual of the $\ell_1$-minimization problem, which we provide next.

Proposition 1. Let $\Psi \in \mathbb{R}^{r \times s}$ be a matrix with columns of unit $\ell_p$-norm and let $y \in \mathbb{R}^s$. Then,

$$\max_{w \in \mathrm{im}(\Psi^\top) \cap B^s_\infty} y^\top w \leq \frac{\|\Psi y\|_p}{r_p(\Psi B^s_1)}. \tag{18}$$

If $p \in \{1, \infty\}$ and $r_p(\Psi B^s_1) = 1$, then equality holds.

Our characterizations in terms of extreme points require that we either solve an $\ell_1$-minimization problem or check the extreme points of $\mathrm{im}(X_P^\top) \cap B^{|P|}_\infty$ in order to verify the SNSP. The bound in Prop. 1 allows us to simplify these characterizations and remove this requirement, as shown next.

Theorem 6. Suppose that the columns of $X$ have unit $\ell_p$-norm. The matrix $X$ satisfies the SNSP if, for all $\eta \in \mathrm{Ext}(\mathrm{Null}(X) \cap B^N_1, \mathcal{P})$ and all $P \in \mathcal{P}$, we have

$$\|\eta_P\|_1 + \frac{\|X_P(\eta_P)\|_p}{r_p\left(K(X_P)\right)} < 1. \tag{19}$$

Proof. The result is an immediate consequence of Thm. 4 and Prop. 1, since $K(X_P) = X_P B^{|P|}_1$.

Before we end this section, we note that if the dimension of the nullspace of $X$ or the dimensions of the individual subspaces $\{S_i\}$ are large, then the number of extreme points can be large. However, numerical evidence suggests that making decisions based on a random sampling of extreme points can lead to satisfactory results (Kaba et al., 2021). The investigation of such an approach for subspace-preserving recovery is beyond the scope of this paper, so we leave it as future work.

6. Conclusion

In this paper we have provided several necessary and sufficient conditions for subspace-preserving recovery, which is at the heart of the analysis of subspace classification and clustering algorithms in machine learning. Our characterizations are based on a condition that is similar to the classical Nullspace Property (NSP), which we called the Subspace Nullspace Property (SNSP). To the best of our knowledge, the SNSP is the first NSP-like condition for subspace-preserving recovery that has been extensively studied. We showed that some formulations of the SNSP give rise to a clear geometric interpretation or allow us to reduce the verification of the SNSP to decisions on finite sets. These finite sets correspond to certain extreme points, and hence they also admit a clear geometric description. We also showed that all of our equivalent conditions can further be simplified to obtain sufficient conditions that are geometrically intuitive and computationally simpler. The classical NSP plays an important role in the sparse recovery literature: it is arguably the most popular necessary and sufficient condition that guarantees the success of $\ell_1$-minimization in the recovery of sparse signals. However, NSP-like conditions have mostly been overlooked in the subspace-preserving recovery literature; the results in this paper fill that gap.

References

Bajwa, W. U., Duarte, M. F., and Calderbank, R. Conditioning of random block subdictionaries with applications to block-sparse recovery and regression. IEEE Transactions on Information Theory, 61(7):4060–4079, 2015.

Elhamifar, E. and Vidal, R. Sparse subspace clustering. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 2790–2797, 2009.

Elhamifar, E. and Vidal, R. Clustering disjoint subspaces via sparse representation. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1926–1929, 2010.

Elhamifar, E. and Vidal, R. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2765–2781, 2013.

Foucart, S. and Rauhut, H. A mathematical introduction to compressive sensing. Applied and Numerical Harmonic Analysis. Birkhäuser Basel, 2013.
Jenatton, R., Audibert, J.-Y., and Bach, F. Structured variable selection with sparsity-inducing norms. The Journal of Machine Learning Research, 12:2777–2824, 2011.

Kaba, M. D., Zhao, M., Vidal, R., Robinson, D. P., and Mallada, E. What is the largest sparsity pattern that can be recovered by $\ell_1$-norm minimization? IEEE Transactions on Information Theory, 67(5):3060–3074, 2021.

Kolar, M., Lafferty, J., and Wasserman, L. Union support recovery in multi-task learning. Journal of Machine Learning Research, 12(7), 2011.

Li, C.-G., You, C., and Vidal, R. On geometric analysis of affine sparse subspace clustering. IEEE Journal on Selected Topics in Signal Processing, 12(6):1520–1533, 2018.

Lu, C.-Y., Min, H., Zhao, Z.-Q., Zhu, L., Huang, D.-S., and Yan, S. Robust and efficient subspace segmentation via least squares regression. In European Conference on Computer Vision, pp. 347–360, 2012.

Obozinski, G., Wainwright, M. J., Jordan, M. I., et al. Support union recovery in high-dimensional multivariate regression. The Annals of Statistics, 39(1):1–47, 2011.

Robinson, D. P., Vidal, R., and You, C. Basis pursuit and orthogonal matching pursuit for subspace-preserving recovery: Theoretical analysis. arXiv preprint arXiv:1912.13091, 2019.

Rockafellar, R. T. Convex Analysis. Princeton University Press, 1970.

Soltanolkotabi, M. and Candès, E. J. A geometric analysis of subspace clustering with outliers. Annals of Statistics, 40(4):2195–2238, 2012.

Soltanolkotabi, M., Elhamifar, E., and Candès, E. J. Robust subspace clustering. Annals of Statistics, 42(2):669–699, 2014.

Stojnic, M., Parvaresh, F., and Hassibi, B. On the reconstruction of block-sparse signals with an optimal number of measurements. IEEE Transactions on Signal Processing, 57(8):3075–3085, 2009.

Tsakiris, M. C. and Vidal, R. Theoretical analysis of sparse subspace clustering with missing entries. In International Conference on Machine Learning, pp. 4975–4984, 2018.

Vidal, R., Ma, Y., and Sastry, S. Generalized Principal Component Analysis. Springer Verlag, 2016.

Wang, Y., Wang, Y., and Singh, A. A deterministic analysis of noisy sparse subspace clustering for dimensionality-reduced data. In International Conference on Machine Learning, pp. 1422–1431, 2015.

Wang, Y.-X. and Xu, H. Noisy sparse subspace clustering. Journal of Machine Learning Research, 17(12):1–41, 2016.

Wright, J., Yang, A. Y., Ganesh, A., Sastry, S. S., and Ma, Y. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210–227, Feb. 2009.

You, C. and Vidal, R. Geometric conditions for subspace-sparse recovery. In International Conference on Machine Learning, pp. 1585–1593, 2015.

You, C., Robinson, D. P., and Vidal, R. Scalable sparse subspace clustering by orthogonal matching pursuit. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 3918–3927, 2016.

You, C., Robinson, D. P., and Vidal, R. Provable self-representation based outlier detection in a union of subspaces. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 4323–4332, 2017.

You, C., Li, C.-G., Robinson, D. P., and Vidal, R. Is an affine constraint needed for affine subspace clustering? In IEEE International Conference on Computer Vision, 2019.