# Homomorphic Sensing

Manolis C. Tsakiris, Liangzu Peng

School of Information Science and Technology, ShanghaiTech University, Shanghai, China. Correspondence to: Manolis C. Tsakiris.

Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).

**Abstract.** A recent line of research termed unlabeled sensing and shuffled linear regression has been exploring, under great generality, the recovery of signals from subsampled and permuted measurements, a challenging problem in diverse fields of data science and machine learning. In this paper we introduce an abstraction of this problem which we call homomorphic sensing. Given a linear subspace and a finite set of linear transformations, we develop an algebraic theory which establishes conditions guaranteeing that points in the subspace are uniquely determined from their homomorphic image under some transformation in the set. As a special case, we recover known conditions for unlabeled sensing, as well as new results and extensions. On the algorithmic level we exhibit two dynamic-programming-based algorithms, which to the best of our knowledge are the first working solutions for the unlabeled sensing problem for small dimensions. One of them, additionally based on branch-and-bound, when applied to image registration under affine transformations, performs on par with or outperforms state-of-the-art methods on benchmark datasets.

## 1. Introduction

In a recent line of research termed unlabeled sensing, it has been established that uniquely recovering a signal from shuffled and subsampled measurements is possible as long as the number of measurements is at least twice the intrinsic dimension of the signal (Unnikrishnan et al., 2018). The special case where the signal is fully observed but subject to a permutation of its values is known as shuffled linear regression (Hsu et al., 2017; Pananjady et al., 2018; Song et al., 2018; Abid & Zou, 2018). In its simplest form, it consists of solving a linear system of equations with the entries of the right-hand-side vector permuted (Unnikrishnan et al., 2015; Tsakiris et al., 2018).

The unlabeled sensing and shuffled linear regression problems and their variations naturally arise in many applications in data science and engineering, such as 1) record linkage for data integration (Lahiri & Larsen, 2005; Shi et al., 2018), a particularly important problem in medical data analysis where publicly available health records are anonymized, 2) image registration (Lian et al., 2017), multi-target tracking (Poore & Gadaleta, 2006) and pose/correspondence estimation (David et al., 2004; Marques et al., 2009), 3) header-free communications in Internet-of-Things networks (Pananjady et al., 2017; Song et al., 2018; Peng et al., 2019) and user de-anonymization (Narayanan & Shmatikov, 2008; Keller et al., 2009), 4) acoustic wave field reconstruction (Dokmanić, 2019), 5) system identification under asynchronous input/output samples (Balakrishnan, 1962), and many more; see, e.g., Unnikrishnan et al. 2018; Pananjady et al. 2018.

### 1.1. Prior Art

**Theory.** Suppose that $y = \Pi^* A x^* + \varepsilon \in \mathbb{R}^m$ is a noisy and shuffled version of some signal $Ax^*$, where $x^* \in \mathbb{R}^n$ is an unknown regression vector, $\Pi^*$ is an unknown permutation, and $\varepsilon$ is noise. What can be said about the estimation of $x^*$ and $\Pi^*$ given $y$, $A$ and the distribution of $\varepsilon$?
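To fix ideas, the following toy snippet instantiates this observation model. It is our own illustration, not part of the paper; the sizes and random seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, sigma = 8, 3, 0.01

A = rng.standard_normal((m, n))       # known sensing matrix
x_star = rng.standard_normal(n)       # unknown regression vector
Pi = np.eye(m)[rng.permutation(m)]    # unknown permutation matrix
eps = sigma * rng.standard_normal(m)  # noise

y = Pi @ A @ x_star + eps             # observed shuffled measurements
```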
This shuffled linear regression problem has been a classic subject of research in the area of record linkage, where the predominant methods study maximum-likelihood estimators under the working hypothesis that an accurate estimate of the probabilities of transpositions between samples is available (Fellegi & Sunter, 1969; Lahiri & Larsen, 2005). However, this is a strong hypothesis that does not extend to many applications beyond record linkage. Very recently, important theoretical advances have been made towards understanding this problem in greater generality. Specifically, Slawski & Ben-David 2019; Abid et al. 2017 and Hsu et al. 2017 have demonstrated that, in the absence of any further assumptions, the maximum-likelihood estimator $\hat{x}_{\mathrm{ML}}$ given by

$$(\hat{\Pi}_{\mathrm{ML}},\, \hat{x}_{\mathrm{ML}}) = \operatorname*{argmin}_{\Pi,\, x} \; \|y - \Pi A x\|_2, \tag{1}$$

where $\Pi$ ranges over all permutations, is biased. On the other hand, if the SNR is large enough, Pananjady et al. 2016; 2018 have asserted that $\hat{\Pi}_{\mathrm{ML}} = \Pi^*$ with high probability. If $\Pi^*$ is sparse enough, i.e., only a small percentage of the entries of $Ax^*$ have been shuffled (these entries form the support of $\Pi^*$), Slawski & Ben-David 2019 have shown that under weaker SNR conditions the supports of $\hat{\Pi}_{\mathrm{ML}}$ and $\Pi^*$ coincide. Moreover, they provide well-behaved error bounds for $\|\hat{x}_{\mathrm{ML}} - x^*\|_2$ as well as for $\|\hat{x}_{\mathrm{RR}} - x^*\|_2$, where $\hat{x}_{\mathrm{RR}}$ is the solution to the convex $\ell_1$ robust regression problem

$$\min_{x,\, e} \; \|y - Ax - \sqrt{m}\, e\|_2^2 + m\lambda \|e\|_1, \quad \lambda > 0, \tag{2}$$

in which the support of the sparse error $e$ is meant to capture the support of the sparse permutation $\Pi^*$.

Another interesting line of work, related more to algebraic geometry than to statistics, is that of Song et al. 2018, which for the noiseless case ($\varepsilon = 0$) proposed the use of symmetric polynomials for extracting permutation-invariant constraints that $x^* \in \mathbb{R}^n$ must satisfy. Such a self-moment estimator had already been briefly investigated by Abid et al. 2017 from a statistical point of view, where the authors noted that in the presence of noise it is unclear whether the resulting system of equations has any solutions. Perhaps surprisingly, working with $n$ non-homogeneous polynomials of degrees $1, 2, \dots, n$ in $n$ variables, the work of Tsakiris et al. 2018 established that regardless of the value of the noise $\varepsilon$, and under the sole requirement that $A$ is generic (a rigorous definition of *generic* is given in §2.1), the polynomial system always has a solution, and in fact at most $n!$ of them, thus proving the existence of a purely algebraic estimator for $x^*$.

Much less is known for the more challenging and realistic case of unlabeled sensing, where now $y \in \mathbb{R}^k$ consists of a shuffled noisy subset of the entries of $Ax^* \in \mathbb{R}^m$, i.e., there is no longer a 1-1 correspondence between $y$ and $Ax^*$. The main theoretical finding to date comes from the seminal work of Unnikrishnan et al. 2018, according to which, in the absence of noise, $x^*$ is uniquely recoverable from $y$ and $A$ as long as 1) $k \ge 2n$ and 2) $A$ is generic. Inspired by a certain duality between compressed and unlabeled sensing, a recovery condition for noisy data has further been given by Haghighatshoar & Caire 2018 in terms of a restricted isometry property. However, this approach is valid only in the special case where $y$ is obtained by subsampling $Ax^*$ while maintaining the relative order of the samples.

**Algorithms.** Towards computing a solution to the shuffled linear regression problem, which can be solved by brute force in $O(m!)$ (a minimal sketch below makes this baseline concrete), the algorithms presented by Hsu et al. 2017; Pananjady et al. 2017 are conceptually important but applicable only to noiseless data, or they have a complexity of at least $O(m^7)$.
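The brute-force baseline for (1) can be stated in a few lines. The following sketch is our own illustration; it is only feasible for very small $m$, since each of the $m!$ permutations reduces (1) to an ordinary least-squares problem.

```python
import itertools
import numpy as np

def shuffled_lstsq_bruteforce(y, A):
    """Solve (1) exactly by enumeration: ||y - Pi A x|| = ||Pi^T y - A x||,
    so fit x by least squares against every reordering of y and keep
    the best residual. O(m!) -- a baseline, not a practical method."""
    best_res, best_x, best_perm = np.inf, None, None
    for perm in itertools.permutations(range(len(y))):
        y_perm = y[list(perm)]
        x, *_ = np.linalg.lstsq(A, y_perm, rcond=None)
        res = np.linalg.norm(y_perm - A @ x)
        if res < best_res:
            best_res, best_x, best_perm = res, x, perm
    return best_x, best_perm, best_res
```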
When the ratio of shuffled data is small, one may apply the $\ell_1$ robust regression method (2) (Slawski & Ben-David, 2019). Other approaches use alternating minimization or multi-start gradient descent to solve (1) (Abid et al., 2017; Abid & Zou, 2018), an NP-hard problem for $n > 1$ (Pananjady et al., 2018). Due to the high non-convexity, such methods are very sensitive to initialization. This is remedied by the algebraically initialized expectation-maximization method of Tsakiris et al. 2018, which uses the solution to the polynomial system of equations mentioned above to obtain a high-quality initialization. This approach is robust to small levels of noise, efficient for $n \le 5$, and able to handle fully shuffled data; its main drawback is its exponential complexity in $n$.

In the unlabeled sensing case, which may be thought of as shuffled linear regression with outliers, the above methods in principle break down. Instead, we are aware of only two relevant algorithms, and these are suitable only under strong structural assumptions on the data. The $O(nm^{n+1})$ method of Elhami et al. 2017 applies a brute-force solution, which explicitly relies on the data being noiseless and whose theoretical guarantees require a particular exponentially spaced structure on $A$. On the other hand, Haghighatshoar & Caire 2018 attempt to solve

$$\min_{S,\, x} \; \|y - SAx\|_2 \tag{3}$$

via alternating minimization, with $S$ in (3) being a selection matrix (an *order-preserving* selection matrix is a row-submatrix of the identity matrix; a *selection matrix* is a row-permutation of an order-preserving selection matrix, equivalently a permutation matrix composed with a coordinate projection). Their main algorithmic insight is to solve for $S$ given $x$ via dynamic programming. However, their algorithm works only for order-preserving selection matrices $S$, a rather strong limitation, and seems to fail otherwise. It is thus fair to conclude that, to the best of our knowledge, there does not seem to exist a satisfactory algorithm for unlabeled sensing, even for small values of $n$.

### 1.2. Contributions

**Theory.** In this work we adopt an abstract view of the shuffled linear regression and unlabeled sensing problems, which naturally leads us to a more general formulation that we refer to as *homomorphic sensing*. In homomorphic sensing one is given a finite set $\mathcal{T}$ of linear transformations $\mathbb{R}^m \to \mathbb{R}^m$ (to be called endomorphisms) and a linear subspace $\mathcal{V} \subseteq \mathbb{R}^m$ of dimension $n$, and asks under what conditions the image $\tau(v)$ of some unknown $v \in \mathcal{V}$ under some unknown $\tau \in \mathcal{T}$ suffices to uniquely determine $v$. This is equivalent to asking under what conditions the relation $\tau_1(v_1) = \tau_2(v_2)$ implies $v_1 = v_2$ whenever $\tau_1, \tau_2 \in \mathcal{T}$ and $v_1, v_2 \in \mathcal{V}$. E.g., in shuffled linear regression these endomorphisms are permutations, while in unlabeled sensing they are compositions of permutations with coordinate projections, and the unlabeled sensing theorem of Unnikrishnan et al. 2018 asserts that a sufficient condition for unique recovery is that 1) the coordinate projections preserve at least $2n$ coordinates and 2) $\mathcal{V}$ is generic.

The first theoretical contribution of this paper is a general homomorphic sensing result (Theorem 1) applicable to arbitrary endomorphisms, and thus of potential interest in a broad spectrum of applications. For generic $\mathcal{V}$, the key condition asks that the dimension $n$ of $\mathcal{V}$ not exceed the codimension of a certain algebraic variety associated with pairs of endomorphisms from $\mathcal{T}$.
This in turn can be used to obtain, within a principled framework, the unlabeled sensing result of Unnikrishnan et al. 2018. Our second theoretical contribution is a recovery result (Theorem 2) for generic points in $\mathcal{V}$ (as opposed to all points), which in the unlabeled sensing case says that the coordinate projections need only preserve at least $n + 1$ coordinates (as opposed to $2n$).

**Algorithms.** Inspired by Haghighatshoar & Caire 2018; Elhami et al. 2017 and Yang et al. 2016, we make three algorithmic contributions. First, we introduce a branch-and-bound algorithm for the unlabeled sensing problem that globally minimizes (3). Instead of branching over the space of selection matrices, which is known to be intractable (Li & Hartley, 2007), our algorithm branches only over the space of $x \in \mathbb{R}^n$, relying on a locally optimal computation of the selection matrix via dynamic programming. Second, it is this dynamic programming feature that also allows us to turn the purely theoretical algorithm of Elhami et al. 2017 into a robust and efficient method for small dimensions $n$. These two algorithms constitute, to the best of our knowledge, the first working solutions for the unlabeled sensing problem. Third, when customized for image registration under an affine transformation, our branch-and-bound algorithm is on par with or outperforms state-of-the-art methods (Lian et al., 2017; Jian & Vemuri, 2011; Myronenko & Song, 2010) on benchmark datasets.

## 2. Homomorphic Sensing: Algebraic Theory

The main results of this section are Theorems 1 and 2. We sketch the proofs wherever space allows and refer the reader to Tsakiris 2019 for the complete arguments. All results in this section refer to the noiseless case; an analysis for corrupted data is left to future research.

### 2.1. Preliminaries

For an integer $k$, $[k] = \{1, \dots, k\}$. For a non-negative real number $\alpha$, $\lfloor \alpha \rfloor$ is the greatest integer $k$ such that $k \le \alpha$.

#### 2.1.1. ℝ versus ℂ

We work over the complex numbers $\mathbb{C}$. This does not contradict the fact that in this paper we are primarily interested in $\mathbb{R}^m$; rather, it facilitates the analysis. E.g., a matrix $T \in \mathbb{R}^{m \times m}$ may be diagonalizable over $\mathbb{C}$ but not over $\mathbb{R}$. This is the case with permutations, whose eigenvalues are complex roots of unity. Hence our philosophy is: "check the conditions over $\mathbb{C}$, then draw a conclusion over $\mathbb{R}$"; see Remark 1.

#### 2.1.2. Abstract linear algebra

We adopt the terminology of abstract linear algebra (Roman, 2008), since the ideas we discuss in this paper are best delivered in a coordinate-free way. The reader who insists on thinking in terms of matrices may safely replace linear transformations, kernels and images by matrices, nullspaces and range-spaces, respectively.

We work in $\mathbb{C}^m$. For a subspace $\mathcal{V}$ we denote by $\dim(\mathcal{V})$ its dimension. For subspaces $\mathcal{V}, \mathcal{W}$ we say that $\mathcal{V}$ and $\mathcal{W}$ do not intersect if $\mathcal{V} \cap \mathcal{W} = 0$. An *endomorphism* is a linear transformation $\tau: \mathbb{C}^m \to \mathbb{C}^m$; an *automorphism* is an invertible endomorphism. We denote by $i$ the identity map, $i(w) = w$ for all $w \in \mathbb{C}^m$. If $\tau$ is an endomorphism, its kernel $\ker(\tau)$ is the set of all $v \in \mathbb{C}^m$ such that $\tau(v) = 0$, and its image $\operatorname{im}(\tau)$ is the set of all $\tau(w)$ for $w \in \mathbb{C}^m$. By $\operatorname{rank}(\tau)$ we mean $\dim(\operatorname{im}(\tau))$. The preimage of $\tau(v)$ is the set of all $w \in \mathbb{C}^m$ such that $\tau(v) = \tau(w)$, i.e., the set of all $v + \xi$ for $\xi \in \ker(\tau)$. If $\mathcal{V}$ is a linear subspace, by $\tau(\mathcal{V})$ we mean the set of all vectors $\tau(v)$ for $v \in \mathcal{V}$. We denote by $\mathcal{E}_{\tau,\lambda}$ the eigenspace of $\tau$ associated to eigenvalue $\lambda$, i.e., the set of all $v \in \mathbb{C}^m$ such that $\tau(v) = \lambda v$.
For endomorphisms $\tau_1, \tau_2$, the *generalized eigenspace* of the pair $(\tau_1, \tau_2)$ of eigenvalue $\lambda$ is the set of all $w \in \mathbb{C}^m$ for which $\tau_1(w) = \lambda \tau_2(w)$. By a *projection* $\rho$ of $\mathbb{C}^m$ we mean an idempotent ($\rho^2 = \rho$) endomorphism. By a *coordinate projection* we mean a projection that sets to zero certain coordinates of $\mathbb{C}^m$ while preserving the rest.

#### 2.1.3. Algebraic geometry

By an *algebraic variety* (or variety) of $\mathbb{C}^m$ we mean the zero locus of a set of polynomials in $m$ variables. The study of such varieties is facilitated by the use of the Zariski topology, in which every variety is a closed set. In particular, there is a well-developed theory in which topological and algebraic notions of dimension coincide (Hartshorne, 1977). This allows us to assign dimensions to sets such as the intersection of a variety with the complement of another, called a *quasi-variety*. A linear subspace $\mathcal{S}$ is an algebraic variety, and its linear-algebra dimension coincides with its algebraic-geometric dimension. The union $\mathcal{A} = \bigcup_{i \in [\ell]} \mathcal{S}_i$ of $\ell$ linear subspaces is also an algebraic variety, and $\dim(\mathcal{A}) = \max_{i \in [\ell]} \dim(\mathcal{S}_i)$. For a variety $\mathcal{Y}$ defined by homogeneous polynomials, $\dim(\mathcal{Y})$ can be characterized as the smallest number of hyperplanes through the origin that one needs to intersect $\mathcal{Y}$ with in order to obtain the origin. For $\mathcal{Y}$ a variety of $\mathbb{C}^m$, we set $\operatorname{codim}(\mathcal{Y}) = m - \dim(\mathcal{Y})$.

The set of all $n$-dimensional linear subspaces of $\mathbb{C}^m$ is itself an algebraic variety of $\mathbb{C}^{\binom{m}{n}}$ called the *Grassmannian*, denoted $\operatorname{Gr}(n, m)$. By a *generic* subspace $\mathcal{V}$ of dimension $n$ we mean a subspace lying in a non-empty open subset $\mathcal{U} \subseteq \operatorname{Gr}(n, m)$ in the Zariski topology of $\operatorname{Gr}(n, m)$. Such a $\mathcal{U}$ is dense in $\operatorname{Gr}(n, m)$, and if one endows $\operatorname{Gr}(n, m)$ with a continuous probability measure, then $\mathcal{U}$ has measure 1 (Bürgisser & Cucker, 2013). Hence the reader may safely think of a generic subspace as a random subspace. When we say "for a generic $\mathcal{V}$ property P is true", we mean that the set of all $\mathcal{V}$ for which property P is true contains a non-empty Zariski-open subset of $\operatorname{Gr}(n, m)$. Hence for a randomly drawn $\mathcal{V}$, property P will be true with probability 1. We will make repeated use of the following fact:

**Lemma 1.** *Let $\mathcal{Y}$ be a variety defined by homogeneous polynomials and $\mathcal{V}$ a generic linear subspace. Then*

$$\dim(\mathcal{Y} \cap \mathcal{V}) = \max\{\dim(\mathcal{V}) - \operatorname{codim}(\mathcal{Y}),\, 0\}. \tag{4}$$

A property of the Zariski topology that we need is that the intersection of finitely many non-empty open sets of an irreducible space (such as $\operatorname{Gr}(n, m)$) is open and non-empty.

### 2.2. Formulation and first insights

Let $\mathcal{V} \subseteq \mathbb{C}^m$ be a linear subspace of dimension $n$ and let $\mathcal{T}$ be a finite set of endomorphisms of $\mathbb{C}^m$. Let $v$ be some point (vector) in $\mathcal{V}$. Suppose that we know the image $\tau(v) \in \mathbb{C}^m$ of $v$ for some unspecified $\tau \in \mathcal{T}$. Can we uniquely recover $v$ from knowledge of $\mathcal{V}$, $\mathcal{T}$ and $\tau(v)$?

**Example 1.** In shuffled linear regression, $\mathcal{T}$ consists of the $m!$ permutations of the $m$ coordinates of $\mathbb{C}^m$. In unlabeled sensing, $\mathcal{T}$ consists of all possible compositions of permutations with coordinate projections. In both cases $\mathcal{V}$ is the range-space of some matrix $A \in \mathbb{C}^{m \times n}$ and $v = Ax$ for some $x \in \mathbb{C}^n$. The meaning of $A$ and $x$ may vary depending on the application. E.g., in signal processing/control systems $x$ may be the impulse response of some linear filter, while in image registration $x$ may represent the parameters of some affine transformation.

The above question motivates the following definition.

**Definition 1.** Let $\mathcal{V} \subseteq \mathbb{C}^m$ be a subspace and $\mathcal{T}$ a finite set of endomorphisms of $\mathbb{C}^m$. We say that $v_1 \in \mathcal{V}$ is *uniquely recoverable in $\mathcal{V}$ under $\mathcal{T}$* if whenever $\tau_1(v_1) = \tau_2(v_2)$ for some $\tau_1, \tau_2 \in \mathcal{T}$ and $v_2 \in \mathcal{V}$, then $v_1 = v_2$. If this holds for every $v \in \mathcal{V}$, we say we have *unique recovery in $\mathcal{V}$ under $\mathcal{T}$*.
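Definition 1 can be probed numerically in the unlabeled sensing setting of Example 1. The following toy experiment is our own illustration (sizes and seed are arbitrary): with $k = 2n$ preserved coordinates and generic $A$, every candidate transformation that exactly explains the observation yields the same point $v$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 2, 4                        # k = 2n entries survive
A = rng.standard_normal((m, n))          # generic V = range(A)
v = A @ rng.standard_normal(n)           # the point to recover

rows = rng.permutation(m)[:k]            # unknown tau: keep k entries,
y = v[rows]                              # in an arbitrary order

# try every candidate tau, i.e. every ordered k-subset of rows of A
recovered = set()
for comb in itertools.permutations(range(m), k):
    Ak = A[list(comb)]
    x, *_ = np.linalg.lstsq(Ak, y, rcond=None)
    if np.linalg.norm(Ak @ x - y) < 1e-9:        # exact fit
        recovered.add(tuple(np.round(A @ x, 6)))
print(len(recovered))                    # 1: v is uniquely recoverable
```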
**Remark 1.** Let $\mathcal{T}$ be a set of endomorphisms of $\mathbb{R}^m$. These can also be viewed as endomorphisms of $\mathbb{C}^m$ (Theorem 2.29 of Roman 2008). Let $\mathcal{V}$ be a subspace of $\mathbb{R}^m$ with basis $v_1, \dots, v_n$ and $\mathcal{V}_{\mathbb{C}} = \operatorname{Span}_{\mathbb{C}}(v_1, \dots, v_n)$ the subspace of $\mathbb{C}^m$ generated by the basis of $\mathcal{V}$. Then unique recovery in $\mathcal{V}_{\mathbb{C}}$ under $\mathcal{T}$ implies unique recovery in $\mathcal{V}$ under $\mathcal{T}$.

To build some intuition about the notion of unique recovery in $\mathcal{V}$ under $\mathcal{T}$, consider first the case where $\mathcal{T} = \{i, \tau\}$, with $i$ the identity map and $\tau$ some automorphism. As a first step, we characterize the purely combinatorial condition of Definition 1, given in terms of points, by the geometric condition of the next lemma, given in terms of subspaces.

**Lemma 2.** *Let $\tau$ be any automorphism of $\mathbb{C}^m$ and let $\mathcal{V}$ be any linear subspace. Then we have unique recovery in $\mathcal{V}$ under $\{i, \tau\}$ if and only if $\mathcal{V} \cap \tau(\mathcal{V}) \subseteq \mathcal{E}_{\tau,1}$.*

*Proof.* (Necessity) Suppose that $\mathcal{V} \cap \tau(\mathcal{V}) \not\subseteq \mathcal{E}_{\tau,1}$. Then there exists some $v_1 \in \mathcal{V} \cap \tau(\mathcal{V})$ not inside $\mathcal{E}_{\tau,1}$. Note that this also implies $v_1 \notin \mathcal{E}_{\tau^{-1},1}$. Since $v_1 \in \tau(\mathcal{V})$, there exists some $v_2 \in \mathcal{V}$ such that $v_1 = \tau(v_2)$. Since $v_1 \notin \mathcal{E}_{\tau^{-1},1}$, we must have $v_1 \ne \tau^{-1}(v_1) = v_2$. Hence $v_1 = \tau(v_2)$ with $v_1 \ne v_2$. (Sufficiency) Suppose that $v_1 = \tau(v_2)$ for some $v_1, v_2 \in \mathcal{V}$, whence $v_1 \in \mathcal{V} \cap \tau(\mathcal{V})$. By hypothesis $v_1 \in \mathcal{E}_{\tau,1}$, so $\tau(v_1) = v_1$. Hence $\tau(v_1) = \tau(v_2)$, and since $\tau$ is invertible, $v_1 = v_2$. ∎

Notice that the condition $\mathcal{V} \cap \tau(\mathcal{V}) \subseteq \mathcal{E}_{\tau,1}$ of Lemma 2 prevents $\mathcal{V} \cap \tau(\mathcal{V})$ from intersecting $\mathcal{E}_{\tau,\lambda}$ for any $\lambda \ne 1$. Hence a necessary condition for unique recovery is that $\mathcal{V} \cap \tau(\mathcal{V})$ does not intersect $\mathcal{E}_{\tau,\lambda}$ for $\lambda \ne 1$. Notice also that $\mathcal{V} \cap \tau(\mathcal{V}) \cap \mathcal{E}_{\tau,\lambda} = 0$ if and only if $\mathcal{V} \cap \mathcal{E}_{\tau,\lambda} = 0$. This places a restriction on the dimension of $\mathcal{V}$: if $\dim(\mathcal{V}) > \operatorname{codim}(\mathcal{E}_{\tau,\lambda})$, then $\mathcal{V}$ and $\mathcal{E}_{\tau,\lambda}$ necessarily intersect. Hence a necessary condition for unique recovery in $\mathcal{V}$ under $\{i, \tau\}$ is

$$\dim(\mathcal{V}) \le \min_{\lambda \ne 1} \operatorname{codim} \mathcal{E}_{\tau,\lambda}. \tag{5}$$

Since the algebraic-geometric dimension of a finite union of linear subspaces is the maximum among the subspace dimensions (§2.1.3), condition (5) is equivalent to

$$\dim(\mathcal{V}) \le \operatorname{codim} \Big( \bigcup_{\lambda \ne 1} \mathcal{E}_{\tau,\lambda} \Big). \tag{6}$$

Then for a generic $\mathcal{V}$ satisfying condition (6), Lemma 1 guarantees that $\mathcal{V} \cap \bigcup_{\lambda \ne 1} \mathcal{E}_{\tau,\lambda} = 0$.
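As a quick numerical sanity check of this last guarantee (our own illustration, not part of the paper's development), take $\tau$ to be a single 6-cycle: its eigenvalues are the 6th roots of unity, each eigenspace is 1-dimensional, so the codimension in (6) is $m - 1 = 5 \ge n$, and a random 3-dimensional $\mathcal{V}$ should meet no eigenspace with $\lambda \ne 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 3
tau = np.roll(np.eye(m), 1, axis=0)   # permutation matrix of a 6-cycle
V = rng.standard_normal((m, n))       # generic n-dimensional subspace

vals, vecs = np.linalg.eig(tau)       # 6th roots of unity
for lam, e in zip(vals, vecs.T):
    if abs(lam - 1) < 1e-9:
        continue                      # eigenvalue 1 is irrelevant here
    M = np.column_stack([V, e])       # V meets span(e) iff rank drops
    assert np.linalg.matrix_rank(M) == n + 1
print("V intersects no eigenspace with eigenvalue != 1")
```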
### 2.3. Recovery under diagonalizable automorphisms

It is not hard to show that when $\tau$ is diagonalizable and $n = \dim(\mathcal{V})$ is small enough compared to $m$, condition (6) is also sufficient for unique recovery in $\mathcal{V}$ under $\{i, \tau\}$:

**Lemma 3.** *Let $\tau$ be a diagonalizable automorphism of $\mathbb{C}^m$ and $\mathcal{V}$ a generic subspace with $\dim(\mathcal{V}) \le \lfloor m/2 \rfloor$. We have unique recovery in $\mathcal{V}$ under $\{i, \tau\}$ if and only if (6) is true.*

(Lemma 3, with a different proof, is also the main insight of the parallel and independent work of Dokmanić 2019. That work studies the same problem as the present paper but focuses only on diagonalizable automorphisms; on the other hand, it has the advantage of considering countably many automorphisms, while here we consider only finitely many.)

*Proof.* ($\Rightarrow$) By Lemma 2, $\mathcal{V} \cap \tau(\mathcal{V}) \subseteq \mathcal{E}_{\tau,1}$. Hence $\mathcal{V}$ does not intersect $\mathcal{E}_{\tau,\lambda}$ for every $\lambda \ne 1$. This implies (6). ($\Leftarrow$) Suppose first that $\dim(\mathcal{E}_{\tau,1}) \le m - n$. The subset $\mathcal{X}$ of $\operatorname{Gr}(n, m)$ on which $\mathcal{V} \cap \tau(\mathcal{V}) = 0$ contains the open subset $\mathcal{U}_\tau$ on which $\dim(\mathcal{V} + \tau(\mathcal{V})) = 2n$. Then $\mathcal{U}_\tau$ is non-empty. To see this, note that there exist $2n$ linearly independent eigenvectors $w_1, \dots, w_{2n}$ of $\tau$ such that no more than $n$ of them correspond to the same eigenvalue. Ordering the $w_i$ so that eigenvectors corresponding to the same eigenvalue are placed consecutively, we define $v_i = w_i + w_{i+n}$, $i \in [n]$. Then $\mathcal{V} = \operatorname{Span}(v_1, \dots, v_n) \in \mathcal{U}_\tau$. Next, suppose that $\dim(\mathcal{E}_{\tau,1}) > m - n$. Since $n \le \lfloor m/2 \rfloor$ we have $\dim(\mathcal{E}_{\tau,1}) \ge n$. Suppose that $v_1 = \tau(v_2)$ for $v_1, v_2 \in \mathcal{V}$. Let $\mathcal{B}$ be a basis of $\mathbb{C}^m$ on which $\tau$ is represented by a diagonal matrix $T \in \mathbb{C}^{m \times m}$, and let $V \in \mathbb{C}^{m \times n}$ and $V\xi_1, V\xi_2 \in \mathbb{C}^m$ be the corresponding representations of a basis of $\mathcal{V}$ and of $v_1, v_2$, respectively, with $\xi_1, \xi_2 \in \mathbb{C}^n$. Then the equation $v_1 = \tau(v_2)$ is equivalent to $V\xi_1 = TV\xi_2$. Since $\dim(\mathcal{E}_{\tau,1}) \ge n$, we may assume without loss of generality that the first $n$ diagonal elements of $T$ are equal to 1. Letting $V_1 \in \mathbb{C}^{n \times n}$ be the top $n \times n$ submatrix of $V$, this implies $V_1 \xi_1 = V_1 \xi_2$. Now $V_1$ is invertible on a non-empty open subset $\mathcal{U}'_\tau$ of $\operatorname{Gr}(n, m)$, on which $v_1 = v_2$. Thus $\mathcal{V} \cap \tau(\mathcal{V}) \subseteq \mathcal{E}_{\tau,1}$ for every $\mathcal{V} \in \mathcal{U}'_\tau$. In conclusion, there is an open set $\mathcal{U} = \mathcal{U}_\tau$ or $\mathcal{U} = \mathcal{U}'_\tau$ such that for any $\mathcal{V} \in \mathcal{U}$ we have $\mathcal{V} \cap \tau(\mathcal{V}) \subseteq \mathcal{E}_{\tau,1}$, and so we are done by Lemma 2. ∎

The extension to multiple automorphisms follows from Lemma 3 and the fact that the intersection of finitely many non-empty open sets of $\operatorname{Gr}(n, m)$ is non-empty and open:

**Proposition 1.** *Let $\mathcal{T}$ be a finite set of automorphisms of $\mathbb{C}^m$ such that for any $\tau_1, \tau_2 \in \mathcal{T}$ the composition $\tau_1^{-1}\tau_2$ is diagonalizable. Let $\mathcal{V}$ be a generic subspace with $\dim(\mathcal{V}) \le \lfloor m/2 \rfloor$. We have unique recovery in $\mathcal{V}$ under $\mathcal{T}$ if and only if (6) is true for every $\tau = \tau_1^{-1}\tau_2$ with $\tau_1, \tau_2 \in \mathcal{T}$.*

Even though the invertibility and diagonalizability requirements of Proposition 1 may seem too strong, they are satisfied by our canonical example where $\mathcal{T}$ is the set of $m!$ permutations of the $m$ coordinates of $\mathbb{C}^m$. In fact, more is true:

**Proposition 2.** *Let $\mathcal{T}$ be the permutations of the $m$ coordinates of $\mathbb{C}^m$. Then $\dim(\mathcal{E}_{\pi,\lambda}) \le m - \lfloor m/2 \rfloor$ for every $\pi \in \mathcal{T}$ and $\lambda \ne 1$. Hence, for a generic subspace $\mathcal{V}$ with $\dim(\mathcal{V}) \le \lfloor m/2 \rfloor$ we have unique recovery in $\mathcal{V}$ under $\mathcal{T}$.*

*Proof.* This follows from basic structural facts about permutations. Let $\pi \in \mathcal{T}$ be a permutation. Then $\pi$ is the product of $c \ge 1$ disjoint cycles, say $\pi = \pi_1 \cdots \pi_c$. Suppose that cycle $\pi_i$ cycles $m_i$ coordinates, i.e., it has length $m_i$. Since the cycles are disjoint, we have $m = \sum_{i=1}^{c} m_i$. Now, each cycle is diagonalizable, with $m_i$ eigenvalues equal to the $m_i$ complex roots of unity, i.e., the roots of the equation $x^{m_i} = 1$. Since the cycles are disjoint, the dimensions of the eigenspaces of $\pi$ are counted additively across cycles. Hence for $\lambda \ne 1$ the dimension of $\mathcal{E}_{\pi,\lambda}$ is at most equal to the number of cycles of length at least 2. But the number of such cycles is at most $\lfloor m/2 \rfloor$, so $\dim(\mathcal{E}_{\pi,\lambda}) \le \lfloor m/2 \rfloor \le m - \lfloor m/2 \rfloor$. The rest of the statement is a corollary of Proposition 1. ∎

### 2.4. Recovery under arbitrary endomorphisms

#### 2.4.1. Unique recovery for all points

The arguments that led to Proposition 1 relied heavily on the invertibility of the endomorphisms in $\mathcal{T}$: in that case, unique recovery in $\mathcal{V}$ under $\{\tau_1, \tau_2\}$ is equivalent to unique recovery in $\mathcal{V}$ under $\{i, \tau_1^{-1}\tau_2\}$, where $i$ is the identity map. It was this feature that helped us understand the homomorphic sensing property of Definition 1 in terms of $\mathcal{V}$ intersecting its image $\tau_1^{-1}\tau_2(\mathcal{V})$. In turn, the key objects controlling this intersection turned out to be the eigenspaces of $\tau = \tau_1^{-1}\tau_2$ corresponding to eigenvalues different from 1, as per Lemma 3, whose proof, however, made explicit use of the diagonalizability of $\tau$. As a consequence, generalizing Proposition 1 to arbitrary endomorphisms, for which $\tau_1$ might not even be invertible, let alone $\tau_1^{-1}\tau_2$ diagonalizable, is not straightforward.

**Example 2.** In unlabeled sensing, a permutation composed with a coordinate projection is in general neither invertible nor diagonalizable. For example, consider a cycle $\pi$ of length 3, the coordinate projection $\rho$ onto the first two coordinates, and their composition $\rho\pi$:

$$\pi = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \qquad \rho = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad \rho\pi = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

First, $\operatorname{rank}(\rho\pi) = 2$, so $\rho\pi$ is not invertible. Second, $\rho\pi$ is nilpotent, i.e., $(\rho\pi)^3 = 0$, and so the only eigenvalue of $\rho\pi$ is zero. This means that $\rho\pi$ is similar to a $3 \times 3$ Jordan block of eigenvalue 0, i.e., $\rho\pi$ is far from diagonalizable.
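These claims are straightforward to verify numerically (a toy check of our own, not from the paper):

```python
import numpy as np

pi  = np.array([[0, 0, 1],
                [1, 0, 0],
                [0, 1, 0]])                  # the 3-cycle
rho = np.diag([1.0, 1.0, 0.0])               # keep first two coordinates
rp  = rho @ pi                               # the composition rho.pi

print(np.linalg.matrix_rank(rp))             # 2: not invertible
print(np.linalg.matrix_power(rp, 3))         # zero matrix: nilpotent
print(np.round(np.linalg.eigvals(rp), 9))    # all eigenvalues are 0
```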
We proceed by developing two devices. The first is a generalization of Lemma 3, and it overcomes the challenge of the potential non-diagonalizability of the endomorphisms.

**Lemma 4.** *Let $\mathcal{V}$ be a generic subspace with $\dim(\mathcal{V}) \le \lfloor m/2 \rfloor$, and $\tau$ any endomorphism of $\mathbb{C}^m$ for which (6) is true. Then we have unique recovery in $\mathcal{V}$ under $\{i, \tau\}$.*

*Proof.* (Sketch) The arguments are similar in spirit to those in the proof of Lemma 3 but technically more involved. Let $n = \dim(\mathcal{V})$. The difficult part is when $\dim(\mathcal{E}_{\tau,1}) \le m - n$, where we prove the existence of a non-empty open set $\mathcal{U}$ of $\operatorname{Gr}(n, m)$ such that for every $\mathcal{V} \in \mathcal{U}$ we have $\dim(\mathcal{V} + \tau(\mathcal{V})) = n + \min\{n, \operatorname{rank}(\tau)\}$, which is the maximal dimension that the subspace $\mathcal{V} + \tau(\mathcal{V})$ can have. In analogy with the diagonalizable case, this can be done by working with the Jordan canonical form of $\tau$ and constructing a $\mathcal{V} = \operatorname{Span}(v_1, \dots, v_n) \in \mathcal{U}$ for which the $v_i$ are suitably paired (generalized) eigenvectors of $\tau$. ∎

Our second device overcomes the challenge of the potential lack of invertibility. We need some notation. Let $\tau_1, \tau_2$ be endomorphisms of $\mathbb{C}^m$ and let $\rho$ be a projection onto $\operatorname{im}(\tau_2)$. Define the variety $\mathcal{Y}_{\rho\tau_1,\tau_2}$ as the set of $w \in \mathbb{C}^m$ for which $\rho\tau_1(w)$ and $\tau_2(w)$ are linearly dependent, i.e.,

$$\mathcal{Y}_{\rho\tau_1,\tau_2} = \big\{\, w : \dim \operatorname{Span}(\rho\tau_1(w),\, \tau_2(w)) \le 1 \,\big\}. \tag{7}$$

This is indeed a variety, because if $\tau_1, \tau_2, \rho$ are represented by matrices $T_1, T_2, P$, then $\mathcal{Y}_{\rho\tau_1,\tau_2}$ is defined by the vanishing of all $2 \times 2$ minors of the matrix $[P T_1 w \;\; T_2 w]$, which are quadratic polynomials in $w$.
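In matrix terms, membership in $\mathcal{Y}_{\rho\tau_1,\tau_2}$ can be tested numerically. The following helper is our own illustration (names are ours); it replaces the vanishing of all $2 \times 2$ minors with the equivalent condition that the $m \times 2$ matrix has rank at most 1:

```python
import numpy as np

def in_variety_Y(w, T1, T2, P, tol=1e-9):
    """Numerical membership test for Y_{rho.tau1, tau2} of (7):
    the 2x2 minors of [P T1 w | T2 w] all vanish exactly when the
    two columns are linearly dependent, i.e. rank <= 1."""
    M = np.column_stack([P @ T1 @ w, T2 @ w])
    return np.linalg.matrix_rank(M, tol=tol) <= 1
```

Note that any $w$ in $\ker(\tau_2)$ or $\ker(\rho\tau_1)$ trivially passes this test, which is one reason the kernels are removed next.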
If $w \in \mathcal{Y}_{\rho\tau_1,\tau_2}$, then there exists some $\lambda_w \in \mathbb{C}$ such that either $\tau_2(w) = \lambda_w \rho\tau_1(w)$ or $\rho\tau_1(w) = \lambda_w \tau_2(w)$. Hence $\mathcal{Y}_{\rho\tau_1,\tau_2}$ is the union of all generalized eigenspaces of the endomorphism pairs $(\rho\tau_1, \tau_2)$ and $(\tau_2, \rho\tau_1)$. Note that $\ker(\rho\tau_1 - \tau_2)$ is the generalized eigenspace corresponding to eigenvalue 1, while $\ker(\rho\tau_1)$ and $\ker(\tau_2)$ are the generalized eigenspaces of $(\rho\tau_1, \tau_2)$ and $(\tau_2, \rho\tau_1)$, respectively, corresponding to eigenvalue 0. In analogy with the automorphism case of Lemma 2, where the eigenspace of eigenvalue 1 was irrelevant for unique recovery, it turns out that in the general case the same is true for the generalized eigenspaces of eigenvalues 1 and 0. Removing their union

$$\mathcal{Z}_{\rho\tau_1,\tau_2} = \ker(\tau_2) \cup \ker(\rho\tau_1) \cup \ker(\rho\tau_1 - \tau_2) \tag{8}$$

from $\mathcal{Y}_{\rho\tau_1,\tau_2}$ yields the quasi-variety

$$\mathcal{U}_{\rho\tau_1,\tau_2} = \mathcal{Y}_{\rho\tau_1,\tau_2} \setminus \mathcal{Z}_{\rho\tau_1,\tau_2}. \tag{9}$$

$\mathcal{U}_{\rho\tau_1,\tau_2}$ plays a role analogous to $\bigcup_{\lambda \ne 1} \mathcal{E}_{\tau,\lambda}$ when $\tau$ is an automorphism. More precisely, in analogy with (6), the next theorem shows that the condition that controls homomorphic sensing in general is

$$\dim(\mathcal{V}) \le \operatorname{codim}\, \mathcal{U}_{\rho\tau_1,\tau_2}. \tag{10}$$

**Theorem 1.** *Let $\mathcal{T}$ be a finite set of endomorphisms of $\mathbb{C}^m$ such that for every $\tau \in \mathcal{T}$ we have $\operatorname{rank}(\tau) \ge 2n$, for some $n \le \lfloor m/2 \rfloor$. Then for a generic subspace $\mathcal{V}$ of dimension $n$, we have unique recovery in $\mathcal{V}$ under $\mathcal{T}$ as long as for every $\tau_1, \tau_2 \in \mathcal{T}$ there is a projection $\rho$ onto $\operatorname{im}(\tau_2)$ such that, for $\mathcal{U}_{\rho\tau_1,\tau_2}$ defined in (9), condition (10) is true.*

*Proof.* (Sketch) The key idea for the case $\mathcal{T} = \{\tau_1, \tau_2\}$ is to view $\mathcal{V}$ as a generic $n$-dimensional subspace of a generic $k$-dimensional subspace $\mathcal{H}$, where $k = \operatorname{rank}(\tau_2)$. Then $\tau_2|_{\mathcal{H}}$ is an isomorphism from $\mathcal{H}$ onto $\operatorname{im}(\tau_2)$, and so unique recovery in $\mathcal{V}$ under $\{\tau_1, \tau_2\}$ follows from unique recovery in $\mathcal{V}$ under $\{i|_{\mathcal{H}}, \tau_{\mathcal{H}}\}$, with $\tau_{\mathcal{H}} = (\tau_2|_{\mathcal{H}})^{-1} \rho \tau_1|_{\mathcal{H}}$ an endomorphism of $\mathcal{H}$. By Lemma 4 we are done if $\dim \mathcal{E}_{\tau_{\mathcal{H}},\lambda} \le k - n$ for all $\lambda \ne 1$. Let $\tau_{\mathcal{H}}(w) = \lambda w$; then $\tau_2 (\tau_2|_{\mathcal{H}})^{-1} \rho\tau_1(w) = \lambda \tau_2(w)$. Now $\tau_2 (\tau_2|_{\mathcal{H}})^{-1} \rho = \rho$, thus $\rho\tau_1(w) = \lambda\tau_2(w)$. Hence $\mathcal{E}_{\tau_{\mathcal{H}},\lambda} \subseteq \mathcal{U}_{\rho\tau_1,\tau_2} \cap \mathcal{H}$, and the rest follows from dimension considerations. ∎

In unlabeled sensing the endomorphisms in $\mathcal{T}$ have the form $\rho\pi$, where $\pi$ is a permutation and $\rho$ is a coordinate projection. Then, as per Theorem 1, if $\dim(\mathcal{V}) \le \operatorname{codim}\, \mathcal{U}_{\rho_2\rho_1\pi_1,\, \rho_2\pi_2}$, one has unique recovery in $\mathcal{V}$ under $\{\rho_1\pi_1, \rho_2\pi_2\}$. Furthermore, via a combinatorial algebraic-geometric argument we obtain a convenient lower bound on $\operatorname{codim}\, \mathcal{U}_{\rho_2\rho_1\pi_1,\, \rho_2\pi_2}$:

**Proposition 3.** *Let $\pi_1, \pi_2$ be permutations of the $m$ coordinates of $\mathbb{C}^m$ and $\rho_1, \rho_2$ coordinate projections. For $\mathcal{U}_{\rho_2\rho_1\pi_1,\, \rho_2\pi_2}$ defined in (9) we have*

$$\lfloor \operatorname{rank}(\rho_2)/2 \rfloor \le \operatorname{codim}\, \mathcal{U}_{\rho_2\rho_1\pi_1,\, \rho_2\pi_2}. \tag{11}$$

As a consequence of Theorem 1 and Proposition 3, one has unique recovery in the unlabeled sensing case as long as the dimension of $\mathcal{V}$ does not exceed half the number of coordinates preserved by each coordinate projection. This is precisely the result of Unnikrishnan et al. 2018, which they obtained by attacking the problem directly via ingenious yet complicated combinatorial arguments. Even though our proof is not necessarily less complicated, it has the advantage of using a framework that generalizes relatively easily. For example, one may consider entry-wise sign corruptions on top of coordinate projections and permutations. In such a case, it is not hard to show that, under the same condition as for unlabeled sensing, one has unique recovery up to a sign. In general, even though analytically computing $\operatorname{codim}\, \mathcal{U}_{\rho\tau_1,\tau_2}$ may be challenging, performing this computation in an algebraic geometry software environment such as Macaulay2 is in principle straightforward.

#### 2.4.2. Unique recovery for generic points

Often the requirement that every $v \in \mathcal{V}$ be uniquely recoverable is unnecessarily strict. Instead, it may be of interest to ask whether unique recovery holds for a generic $v \in \mathcal{V}$. In this situation a less demanding technical analysis gives unique recovery under much weaker conditions:

**Theorem 2.** *Let $\mathcal{T}$ be a finite set of endomorphisms of $\mathbb{C}^m$. Then a generic point $v$ inside a generic subspace $\mathcal{V}$ of dimension $n$ is uniquely recoverable in $\mathcal{V}$ under $\mathcal{T}$ as long as 1) $\operatorname{rank}(\tau) \ge n + 1$ for every $\tau \in \mathcal{T}$, and 2) no two endomorphisms in $\mathcal{T}$ are scalar multiples of each other.*

*Proof.* (Sketch) Let $V \in \mathbb{C}^{m \times n}$ be a basis of $\mathcal{V}$. If $\tau_1(v_1) = \tau_2(v_2)$, then $\tau_2(v_2) \in \tau_1(\mathcal{V})$, and so $\operatorname{rank}([T_1 V \;\; T_2 V \xi]) \le n$ for $\xi \in \mathbb{C}^n$ with $v_2 = V\xi$. The proof then proceeds by exhibiting, for $\tau_1 \ne \tau_2$, a $V$ and a $\xi$ for which $\operatorname{rank}([T_1 V \;\; T_2 V \xi]) = n + 1$. This implies that for generic $\mathcal{V}$ and $v_2 \in \mathcal{V}$, $\tau_1(v_1) = \tau_2(v_2)$ for some $v_1 \in \mathcal{V}$ only when $\tau_1 = \tau_2$. In that case $v_1 - v_2 \in \ker(\tau_1)$, and Lemma 1 implies that $v_1 - v_2 = 0$. ∎

A consequence of Theorem 2 is the unique recovery of a generic vector in the unlabeled sensing case as soon as the coordinate projections preserve at least $n + 1$ entries:

**Corollary 1.** *Let $\mathcal{T}$ be a set of endomorphisms of $\mathbb{C}^m$ such that every $\tau \in \mathcal{T}$ has the form $\tau = \rho\pi$, where $\pi$ is a permutation and $\rho$ a coordinate projection. Then for a generic subspace $\mathcal{V}$ of dimension $n$ and a generic $v \in \mathcal{V}$, we have unique recovery of $v$ in $\mathcal{V}$ under $\mathcal{T}$, as long as $\operatorname{rank}(\rho) \ge n + 1$ for every $\rho\pi \in \mathcal{T}$.*
## 3. Algorithms and Application

### 3.1. Branch-and-bound for unlabeled sensing

In this section we propose a globally optimal method for the unlabeled sensing problem, which minimizes (3) via a dynamic-programming-based branch-and-bound scheme. Both ingredients are standard, and we only describe how to combine them; see Emiya et al. 2014; Yang et al. 2016 for transparent discussions of branch-and-bound in related contexts. Set $f(x, S) = \|y - SAx\|_2$, with $x \in \mathbb{R}^n$ and $S$ a selection matrix. As branching over the space of selection/permutation matrices $S$ is known to be inefficient (Li & Hartley, 2007), the crucial aspect of our approach is to branch only over the space of $x$, while relying on a local computation of the optimal $S$, say $S_x$, given $x$. Here is where dynamic programming comes into play: Haghighatshoar & Caire 2018 showed that if there exists an order-preserving $S_x$ such that $f(x, S_x) = \min_S \|y - SAx\|_2$, then $S_x$ can be computed via dynamic programming in $O(mk)$ operations (that such an assignment problem can be solved via dynamic programming was already known to Aggarwal et al. 1992). At first sight this does not generalize to arbitrary $y, A, x$, as none of the minimizers over $S$ is expected to be order-preserving. However, if we sort $y$ and $Ax$ in descending order, obtaining say $y'$ and $(Ax)'$, then 1) there is an order-preserving selection matrix $S'_x$ such that $\|y' - S'_x (Ax)'\|_2 = \min_S \|y' - S(Ax)'\|_2$, and 2) $S_x$ can easily be obtained from $S'_x$; a sketch of this dynamic program is given below. In conclusion, given $x$ we can compute $S_x$ in $O(mk)$; this is in sharp contrast to other linear assignment algorithms such as the Hungarian algorithm, most of which have complexity $O(m^3)$ (Burkard et al., 2009). Finally, our strategy becomes that of computing an upper bound of $f$ on a hypercube with center $x_0$ via alternating minimization between $x$ and $S$, initialized at $x_0$. Computing a tight lower bound $\ell$ of $f$ for a given hypercube is challenging, and our choice here is a crude one: $\ell = \|y - S_{x_0} A x_0\|_2 - \sigma_1(A)\,\epsilon$, where $\epsilon$ is half the hypercube diagonal and $\sigma_1(A)$ is the largest singular value of $A$. We refer to this method as Algorithm-A.
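The following sketch of the inner dynamic program is our own illustration, assuming squared-error costs and inputs already sorted in a common (descending) order; function and variable names are ours.

```python
import numpy as np

def best_monotone_assignment(y, z):
    """O(mk) dynamic program: given y (length k) and z (length m),
    both sorted in the same order, find strictly increasing indices
    j_1 < ... < j_k minimizing sum_i (y[i] - z[j_i])^2.  This is the
    inner step min_S ||y' - S (Ax)'|| over order-preserving S."""
    k, m = len(y), len(z)
    D = np.full((k + 1, m + 1), np.inf)
    D[0, :] = 0.0                       # matching nothing costs nothing
    take = np.zeros((k + 1, m + 1), dtype=bool)
    for i in range(1, k + 1):
        for j in range(i, m + 1):       # need at least i candidates
            skip_cost = D[i, j - 1]     # do not use z[j-1]
            take_cost = D[i - 1, j - 1] + (y[i - 1] - z[j - 1]) ** 2
            if take_cost <= skip_cost:
                D[i, j], take[i, j] = take_cost, True
            else:
                D[i, j] = skip_cost
    idx, i, j = [], k, m                # backtrack the chosen indices
    while i > 0:
        if take[i, j]:
            idx.append(j - 1)
            i -= 1
        j -= 1
    return D[k, m], idx[::-1]
```

Within Algorithm-A, one sorts $y$ and $Ax$ in descending order, runs the dynamic program, and then undoes the two sorting permutations to recover $S_x$, as described above.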
### 3.2. A robust version of Elhami et al. 2017

It turns out that the dynamic programming trick of §3.1 is also the key to a robust version of the theoretical algorithm of Elhami et al. 2017: we randomly select a sub-vector $y'$ of $y$ of length $n$, and for each $A_i$ among the $m!/(m-n)!$ many $n \times n$ matrices that can be formed by concatenating distinct rows of $A$ in any order, we let $x_i = A_i^{-1} y'$. We then use dynamic programming to select the $x_i$ with the lowest assignment error $\min_S \|y - SAx_i\|_2$. The resulting algorithm has complexity $O(km^{n+1})$; we call it Algorithm-B, and a compact sketch follows.
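This sketch of Algorithm-B is our own illustration: the assignment-error oracle (e.g., the dynamic program of the previous sketch applied to sorted copies of its arguments) is passed in as a callable, and all names are ours.

```python
import itertools
import numpy as np

def algorithm_b(y, A, assignment_error, rng=None):
    """Sketch of Algorithm-B: pick a random length-n subvector of y,
    invert it against every ordered n-tuple of rows of A, and keep
    the candidate x with the smallest assignment error
    min_S ||y - S A x||  (~m^n candidates, O(mk) scoring each)."""
    rng = rng or np.random.default_rng(0)
    m, n = A.shape
    y_sub = rng.choice(y, size=n, replace=False)
    best_err, best_x = np.inf, None
    for rows in itertools.permutations(range(m), n):
        A_i = A[list(rows)]
        if abs(np.linalg.det(A_i)) < 1e-12:   # skip singular systems
            continue
        x = np.linalg.solve(A_i, y_sub)
        err = assignment_error(y, A @ x)      # e.g. DP on sorted copies
        if err < best_err:
            best_err, best_x = err, x
    return best_x, best_err
```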
### 3.3. Evaluation on synthetic data

We compare the proposed 1) Algorithm-A of §3.1 and 2) Algorithm-B of §3.2 to other state-of-the-art methods (§1.1), using normally distributed $A, x, \varepsilon$ with $n = 3$, $m = 100$ and $\sigma = 0.01$ for the noise. For shuffled linear regression ($k = m$) we compare with 3) Slawski19, which solves (2), 4) Tsakiris18, the algebraic-geometric method of Tsakiris et al. 2018, and 5) Abid18, which performs alternating minimization on (1) via least-squares and sorting (Abid & Zou, 2018); the soft-EM algorithm of Abid & Zou 2018 consistently fails in our experiment, so we do not include it in the figure. For unlabeled sensing ($k < m$) we compare with 6) Haghighatshoar18 (Haghighatshoar & Caire, 2018). As seen from Fig. 1, the proposed methods perform uniformly better than the other methods, often by a large margin, when tested for robustness against the percentage of shuffled data, the outlier ratio and the noise level. In particular, we see that for the unlabeled sensing problems of Figs. 1b-1d, Algorithm-A and Algorithm-B are the only working solutions.

*Figure 1: Relative estimation error vs. percentage of shuffled data ((a) shuffled linear regression, (b) unlabeled sensing), outlier ratio $1 - k/m$ (c), and noise level $\sigma$ (d); $n = 3$, $m = 100$, $k = 80$, $\sigma = 0.01$, 1000 trials. Compared methods: Tsakiris18, Abid18, Slawski19, Haghighatshoar18; proposed algorithms in red.*

Encouraging as these results may be, we note that an important weakness of these methods is their scalability. For Algorithm-B this is an inherent issue due to its brute-force nature, while for Algorithm-A it is due to its naive lower-bounding scheme: the consequence of the bound being far from tight manifests itself at higher dimensions ($n \ge 4$) or large outlier ratios ($k \ll m$), where the method becomes too slow. It runs in 1 sec for $k = m = 100$, 30 sec for $k = 80$ and 5 min for $k = 50$ (in this experiment Algorithm-A stops splitting a hypercube once a depth of 6 has been reached for that hypercube; all runs on an Intel(R) i7-8650U, 1.9 GHz, 16 GB machine). In contrast, Algorithm-B is insensitive to $k$ and runs in about 40 sec.

### 3.4. Application to image registration

Registering point sets $P, Q$ between two images is a classical problem in computer vision. Assuming that $P, Q$ are related by an affine transformation $T$ and that each point in $P$ (model set) has a counterpart in $Q$ (scene set) (Lian et al., 2017), one can jointly search for the affine transformation and the registration by minimizing the function $F(T, S) = \|PT - SQ\|_F$, where $Q \in \mathbb{R}^{m \times 2}$, $P \in \mathbb{R}^{k \times 3}$, $T \in \mathbb{R}^{3 \times 2}$, with homogeneous coordinates used for $P$, and $S$ a selection matrix. This is a matrix version of the unlabeled sensing objective (3). Our contribution here is to adjust the algorithm of §3.1 to the image registration problem. This involves branching over a 6-dimensional space to compute $T$, i.e., $n = 6$. This does not contradict the remark of the previous section regarding scalability: the key here is that each point correspondence imposes two constraints on $T$ (as opposed to one constraint in the general case), so that, loosely speaking, the effective outlier ratio is $1 - 2k/m$ (as opposed to $1 - k/m$). As we will soon see, this has a crucial effect on performance. Finally, as dynamic programming is not applicable for obtaining $S$ given $T$, we employ a standard linear assignment algorithm of complexity $O(m^3)$ (Jonker & Volgenant, 1987). We refer to this algorithm as Algorithm-C.

We compare 1) Algorithm-C with the state-of-the-art image registration techniques 2) CPD (Myronenko & Song, 2010), 3) GMMREG (Jian & Vemuri, 2011) and 4) APM (Lian et al., 2017), using a subset of the benchmark datasets used by Lian et al. 2017. Since APM is the most competitive of these last three, we let it run to convergence with a tolerance parameter of 0.1 and set its running time as a time budget for our method; CPD and GMMREG are local methods and run very fast. When the affine transformation is a rotation (Fig. 2a), CPD and GMMREG work only for small angles, while they fail for general affine transformations (Figs. 2b-2c). On the other hand, Algorithm-C performs comparably to APM, with the following twist in Fig. 2c: when the outlier ratio is small, APM converges very quickly, resulting in an inadequate time budget for our method. Conversely, when the outlier ratio is large, APM's accuracy becomes inadequate, while it is slow enough for our method to perform even better than for fewer outliers. These running times for APM are shown in Fig. 2e, where the same experiment is run for noiseless data: APM still uses a tolerance of 0.1, while, to make a point, we set the tolerance of our method to zero and let it terminate. As seen in Figs. 2e-2f, our method terminates significantly faster than APM for large outlier ratios, suggesting that its branch-and-bound structure may have an advantage over that of APM.

*Figure 2: Image registration using the synthetic benchmark dataset "The Chinese Character" (Chui & Rangarajan, 2003) (panels a-c and e-f, 100 trials) and the collection of real images used in Lian et al. 2017 (panel d). Panels: (a) registration error vs. rotation angle (2D rotation); (b) registration error vs. noise level; (c) registration error vs. outlier ratio; (d) registration error on real-image categories (car, eiffel, motobike, revolver); (e) running time vs. outlier ratio; (f) registration error vs. outlier ratio (affine, noiseless). Compared methods: GMMREG, CPD, APM; proposed method in red.*

## Acknowledgement

The first author is grateful to Dr. Aldo Conca for first noting and sharing the phenomenon of Lemmas 2 and 3. We also thank Dr. Laurent Kneip for suggesting the case study of image registration under general affine transformations.

## References

- Abid, A. and Zou, J. Stochastic EM for shuffled linear regression. Technical report, arXiv:1804.00681v1 [stat.ML], 2018.
- Abid, A., Poon, A., and Zou, J. Linear regression with shuffled labels. Technical report, arXiv:1705.01342v2 [stat.ML], 2017.
- Aggarwal, A., Bar-Noy, A., Khuller, S., Kravets, D., and Schieber, B. Efficient minimum cost matching using quadrangle inequality. In Annual Symposium on Foundations of Computer Science, pp. 583-592, 1992.
- Balakrishnan, A. On the problem of time jitter in sampling. IRE Transactions on Information Theory, 8(3):226-236, 1962.
- Bürgisser, P. and Cucker, F. Condition: The Geometry of Numerical Algorithms. Springer Science & Business Media, vol. 349, 2013.
- Burkard, R. E., Dell'Amico, M., and Martello, S. Assignment Problems. Springer, 2009.
- Chui, H. and Rangarajan, A. A new point matching algorithm for non-rigid registration. Computer Vision and Image Understanding, 89(2-3):114-141, 2003.
- David, P., DeMenthon, D., Duraiswami, R., and Samet, H. SoftPOSIT: Simultaneous pose and correspondence determination. International Journal of Computer Vision, 59(3):259-284, 2004.
- Dokmanić, I. Permutations unlabeled beyond sampling unknown. IEEE Signal Processing Letters, 26(6):823-827, 2019.
- Elhami, G., Scholefield, A., Haro, B. B., and Vetterli, M. Unlabeled sensing: Reconstruction algorithm and theoretical guarantees. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4566-4570, 2017.
- Emiya, V., Bonnefoy, A., Daudet, L., and Gribonval, R. Compressed sensing with unknown sensor permutation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1040-1044, 2014.
- Fellegi, I. P. and Sunter, A. B. A theory for record linkage. Journal of the American Statistical Association, 64(328):1183-1210, 1969.
- Haghighatshoar, S. and Caire, G. Signal recovery from unlabeled samples. IEEE Transactions on Signal Processing, 66(5):1242-1257, 2018.
- Hartshorne, R. Algebraic Geometry. Springer, 1977.
- Hsu, D. J., Shi, K., and Sun, X. Linear regression without correspondence. In Advances in Neural Information Processing Systems (NIPS), pp. 1531-1540, 2017.
- Jian, B. and Vemuri, B. C. Robust point set registration using Gaussian mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(8):1633-1645, 2011.
- Jonker, R. and Volgenant, A. A shortest augmenting path algorithm for dense and sparse linear assignment problems. Computing, 38(4):325-340, 1987.
- Keller, L., Siavoshani, M. J., Fragouli, C., Argyraki, K., and Diggavi, S. Identity-aware sensor networks. In IEEE INFOCOM, pp. 2177-2185, 2009.
- Lahiri, P. and Larsen, M. D. Regression analysis with linked data. Journal of the American Statistical Association, 100(469):222-230, 2005.
- Li, H. and Hartley, R. The 3D-3D registration problem revisited. In International Conference on Computer Vision (ICCV), pp. 1-8, 2007.
- Lian, W., Zhang, L., and Yang, M.-H. An efficient globally optimal algorithm for asymmetric point matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(7):1281-1293, 2017.
- Marques, M., Stošić, M., and Costeira, J. Subspace matching: Unique solution to point matching with geometric constraints. In International Conference on Computer Vision (ICCV), pp. 1288-1294, 2009.
- Myronenko, A. and Song, X. Point set registration: Coherent point drift. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(12):2262-2275, 2010.
- Narayanan, A. and Shmatikov, V. Robust de-anonymization of large sparse datasets. In IEEE Symposium on Security and Privacy, pp. 111-125, 2008.
- Pananjady, A., Wainwright, M. J., and Courtade, T. A. Linear regression with shuffled data: Statistical and computational limits of permutation recovery. In 54th Annual Allerton Conference on Communication, Control and Computing, pp. 417-424, 2016.
- Pananjady, A., Wainwright, M. J., and Courtade, T. A. Denoising linear models with permuted data. In IEEE International Symposium on Information Theory (ISIT), pp. 446-450, 2017.
- Pananjady, A., Wainwright, M. J., and Courtade, T. A. Linear regression with shuffled data: Statistical and computational limits of permutation recovery. IEEE Transactions on Information Theory, 64(5):3286-3300, 2018.
- Peng, L., Song, X., Tsakiris, M., Choi, H., Kneip, L., and Shi, Y. Algebraically-initialized expectation maximization for header-free communication. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5182-5186, 2019.
- Poore, A. B. and Gadaleta, S. Some assignment problems arising from multiple target tracking. Mathematical and Computer Modelling, 43(9):1074-1091, 2006.
- Roman, S. Advanced Linear Algebra. Springer, 2008.
- Shi, X., Li, X., and Cai, T. Spherical regression under mismatch corruption with application to automated knowledge translation. Technical report, arXiv:1810.05679v1 [stat.ME], 2018.
- Slawski, M. and Ben-David, E. Linear regression with sparsely permuted data. Electronic Journal of Statistics, 13(1):1-36, 2019.
- Song, X., Choi, H., and Shi, Y. Permuted linear model for header-free communication via symmetric polynomials. In IEEE International Symposium on Information Theory (ISIT), 2018.
- Tsakiris, M. C. Eigenspace conditions for homomorphic sensing. Technical report, arXiv:1812.07966v3 [math.CO], 2019.
- Tsakiris, M. C., Peng, L., Conca, A., Kneip, L., Shi, Y., and Choi, H. An algebraic-geometric approach to shuffled linear regression. Technical report, arXiv:1810.05440v1 [cs.LG], 2018.
- Unnikrishnan, J., Haghighatshoar, S., and Vetterli, M. Unlabeled sensing: Solving a linear system with unordered measurements. In 53rd Annual Allerton Conference on Communication, Control and Computing, 2015.
- Unnikrishnan, J., Haghighatshoar, S., and Vetterli, M. Unlabeled sensing with random linear measurements. IEEE Transactions on Information Theory, 64(5):3237-3253, 2018.
- Yang, J., Li, H., Campbell, D., and Jia, Y. Go-ICP: A globally optimal solution to 3D ICP point-set registration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(11):2241-2254, 2016.