The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19)

On Geometric Alignment in Low Doubling Dimension

Hu Ding
School of Computer Science and Technology / School of Data Science
University of Science and Technology of China
Hefei, China, 230026
huding@ustc.edu.cn

Mingquan Ye
Department of Computer Science and Engineering
Michigan State University
East Lansing, MI, USA, 48824
yemingqu@msu.edu

Abstract

In the real world, many problems can be formulated as the alignment between two geometric patterns. Previously, a great amount of research focused on the alignment of 2D or 3D patterns, especially in the field of computer vision. Recently, the alignment of geometric patterns in high dimension has found several novel applications and attracted more and more attention. However, the research is still rather limited in terms of algorithms. To the best of our knowledge, most existing approaches for high dimensional alignment are just simple extensions of their counterparts for the 2D and 3D cases, and often suffer from issues such as high complexity. In this paper, we propose an effective framework to compress high dimensional geometric patterns while approximately preserving the alignment quality. As a consequence, existing alignment approaches can be applied to the compressed geometric patterns, and the time complexity is significantly reduced. Our idea is inspired by the observation that high dimensional data often have a low intrinsic dimension. We adopt the widely used notion of doubling dimension to measure the extent of our compression and the resulting approximation. Finally, we test our method on both random and real datasets; the experimental results reveal that running the alignment algorithm on the compressed patterns achieves quality similar to that on the original patterns, while the running times (including the time cost for compression) are substantially lower.
Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

1 Introduction

Given two geometric patterns, the problem of alignment is to find their appropriate spatial positions so as to minimize the difference between them. In general, a geometric pattern is represented by a set of (weighted) points in the space, and the difference between two patterns is measured by some objective function. In particular, geometric alignment finds many applications in the field of computer vision, such as image retrieval, pattern recognition, and fingerprint and facial shape alignment (Cohen and Guibas 1999; Maltoni et al. 2009; Cao et al. 2014). Different applications may impose different constraints on the alignment; e.g., we allow rigid transformations for fingerprint alignment. In addition, the Earth Mover's Distance (EMD) (Rubner, Tomasi, and Guibas 2000) has been widely adopted as the metric for measuring the difference between patterns in computer vision, where its major advantage over other measures is its robustness with respect to noise in practice. Besides the computer vision applications in 2D or 3D, recent research shows that a number of high dimensional problems can be solved by geometric alignment. We briefly introduce several interesting examples below. (1) Research on natural language processing has revealed that different languages often share similar structure at the word level (Youn et al. 2016); in particular, the recent study on word semantics embedding has also shown the existence of structural isomorphism across languages (Mikolov, Le, and Sutskever 2013), and further work found that EMD can serve as a good distance for languages or documents (Zhang et al. 2017; Kusner et al. 2015). Therefore, (Zhang et al.
2017) proposed to learn the transformation between different languages without any cross-lingual supervision, and the problem is reduced to minimizing the EMD via finding the optimal geometric alignment in high dimension. (2) A Protein-Protein Interaction (PPI) network is a graph representing the interactions among proteins. Given two PPI networks, finding their alignment is a fundamental bioinformatics problem for understanding the correspondences between different species (Malod-Dognin, Ban, and Pržulj 2017). However, most existing approaches require solving the NP-hard subgraph isomorphism problem and often suffer from high computational complexity. To resolve this issue, (Liu et al. 2017) recently applied geometric embedding techniques to develop a new framework based on geometric alignment in Euclidean space. (3) Other applications of high dimensional alignment include domain adaptation and indoor localization. Domain adaptation is an important problem in machine learning, where the goal is to predict the annotations of a given unlabeled dataset by determining the transformation (in the form of EMD) from a labeled dataset (Pan and Yang 2010); with the rapid development of wireless technology, indoor localization has become rather important for locating a person or device inside a building. Recent studies show that both problems can be well formulated as geometric alignment in high dimension. We refer the reader to (Courty et al. 2017) and (Yang, Wu, and Liu 2012) for more details. Despite the above studies in terms of applications, the research on algorithms is still rather limited and far from satisfactory. Basically, we need to take into account the high dimensionality and the large number of points of the geometric patterns simultaneously. In particular, with the development of data acquisition techniques, data sizes are increasing fast, and designing efficient algorithms for high dimensional alignment will be important for some applications.
For example, due to the rapid progress of high-throughput sequencing technologies, biological data are growing exponentially (Yin et al. 2017). In fact, even in the 2D and 3D cases, solving the geometric alignment problem is still quite challenging. For example, the well-known iterative closest point (ICP) method (Besl and McKay 1992) can only guarantee a local optimum. More previous works will be discussed in Section 1.2. To the best of our knowledge, we are the first to consider developing an efficient algorithmic framework for the geometric alignment problem at large scale and in high dimension. Our idea is inspired by the observation that many real-world datasets often manifest low intrinsic dimensions (Belkin 2003). For example, human handwriting images can be well embedded into some low dimensional manifold although the Euclidean dimension can be very high (Tenenbaum, De Silva, and Langford 2000). Following this observation, we exploit the widely used notion of doubling dimension (Krauthgamer and Lee 2004; Talwar 2004; Karger and Ruhl 2002; Har-Peled and Mendel 2006; Dasgupta and Sinha 2013) to deal with the geometric alignment problem. Doubling dimension is particularly suitable for depicting data with low intrinsic dimension. We prove that geometric patterns with low doubling dimension can be substantially compressed so as to save a large amount of running time when computing the alignment. More importantly, our compression approach is an independent step, and hence can serve as a preprocessing step for various alignment methods. The rest of the paper is organized as follows. We provide the definitions that are used throughout this paper in Section 1.1, and discuss some existing approaches and our main idea in Section 1.2. Then, we present our algorithm, its analysis, and the time complexity in detail in Sections 2 and 3. Finally, we study the practical performance of our proposed algorithm in Section 4.
1.1 Preliminaries

Before introducing the formal definition of geometric alignment, we need to define EMD and rigid transformations first.

Definition 1 (Earth Mover's Distance (EMD)). Let A = {a_1, a_2, ..., a_{n_1}} and B = {b_1, b_2, ..., b_{n_2}} be two sets of weighted points in R^d with nonnegative weights α_i and β_j for each a_i ∈ A and b_j ∈ B respectively, and let W_A and W_B be their respective total weights. The earth mover's distance between A and B is

EMD(A, B) = (1 / min{W_A, W_B}) · min_F Σ_{i=1}^{n_1} Σ_{j=1}^{n_2} f_ij ||a_i − b_j||^2,   (1)

where F = {f_ij} is a feasible flow from A to B, i.e., each f_ij ≥ 0, Σ_{i=1}^{n_1} f_ij ≤ β_j, Σ_{j=1}^{n_2} f_ij ≤ α_i, and Σ_{i=1}^{n_1} Σ_{j=1}^{n_2} f_ij = min{W_A, W_B}.

Intuitively, EMD can be viewed as the minimum transportation cost between A and B, where the weights of A and B are the supplies and demands respectively, and the cost of each edge connecting a pair of points from A and B is their "ground distance". In general, the ground distance can be defined in various forms, and here we simply use the squared distance because it is widely adopted in practice.

Definition 2 (Rigid Transformation). Let P be a set of points in R^d. A rigid transformation T on P is a transformation (i.e., rotation, translation, reflection, or their combination) which preserves the pairwise distances of the points in P.

We consider rigid transformations for alignment because they are very natural to interpret in the real world and have already been used in the aforementioned applications.

Definition 3 (Geometric Alignment). Given two weighted point sets A and B as described in Definition 1, the problem of geometric alignment between A and B under rigid transformation is to determine a rigid transformation T for B so as to minimize the earth mover's distance EMD(A, T(B)).

As previously mentioned, we use doubling dimension to describe high dimensional data having low intrinsic dimension. We denote a metric space by (X, d_X), where d_X is the distance function of the set X.
For instance, we can imagine that X is a set of points in a low dimensional manifold and d_X is simply the Euclidean distance. For any x ∈ X and r ≥ 0, Ball(x, r) = {p ∈ X | d_X(x, p) ≤ r} denotes the ball of radius r around x (note that Ball(x, r) is a subset of X).

Definition 4 (Doubling Dimension). The doubling dimension of a metric space (X, d_X) is the smallest number ρ such that for any x ∈ X and r ≥ 0, Ball(x, 2r) is always covered by the union of at most 2^ρ balls with radius r.

Doubling dimension describes the expansion rate of (X, d_X); intuitively, we can imagine a set of points uniformly distributed inside a ρ-dimensional hypercube, where its doubling dimension is O(ρ) but the Euclidean dimension can be very high. For a more general case, a manifold in high dimensional Euclidean space may have a very low doubling dimension, as in many examples studied in machine learning (Belkin 2003). Unfortunately, as shown before (Laakso 2002), such low doubling dimensional metrics cannot always be embedded into low dimensional Euclidean spaces with low distortion in terms of Euclidean distance. Therefore, we need to design techniques able to manipulate the data in high dimensional Euclidean space directly.

1.2 Existing Results and Our Approach

If we build a bipartite graph, where the two columns of vertices correspond to the points of A and B respectively and each edge connecting (a_i, b_j) has the weight ||a_i − b_j||^2, we can see that computing EMD is actually a min-cost flow problem. Many min-cost flow algorithms have been developed in the past decades, such as the network simplex algorithm (Ahuja, Magnanti, and Orlin 1993), a specialized version of the simplex algorithm. Since EMD is an instance of the min-cost flow problem in Euclidean space and geometric techniques (e.g., the geometric sparse spanner) are applicable, a number of faster algorithms have been proposed in the area of computational geometry (Agarwal et al. 2017; Cabello et al.
2008; Indyk 2007); however, most of them only work for the low dimensional case. Several EMD algorithms with additional assumptions (or some modifications of the objective function of EMD) have been studied in practical areas (Pele and Werman 2009; Ling and Okada 2007; Benamou et al. 2015). Computing the geometric alignment of A and B is more challenging, since we need to determine the rigid transformation and the EMD flow simultaneously. Moreover, due to the flexibility of rigid transformations, we cannot apply EMD embedding techniques (Indyk and Thaper 2003; Andoni et al. 2009) to relieve these challenges. For example, the embedding can only preserve the EMD between A and B; however, since there are an infinite number of possible rigid transformations T for B (note that we do not know T in advance), it is difficult to also preserve the EMD between A and T(B). In theory, (Cabello et al. 2008) presented a (2 + ϵ)-approximation for the 2D case, and later (Klein and Veltkamp 2005) achieved an O(2^{d−1})-approximation in R^d; (Ding and Xu 2017) proposed a PTAS for constant dimension. However, these theoretical algorithms cannot be efficiently implemented when the dimensionality is not constant. (Ding and Xu 2017) also mentioned that any constant factor approximation needs a time complexity of at least n^{Ω(d)} based on some reasonable assumption in the theory of computational complexity, where n = max{|A|, |B|}. That is, it is unlikely to obtain a (1 + ϵ)-approximation within a practical running time, especially when n is very large. In practice, (Cohen and Guibas 1999) proposed an alternating minimization approach for computing the geometric alignment of A and B. Several other approaches (Cornea et al. 2005; Todorovic and Ahuja 2008) are based on graph matching and cannot easily be extended to high dimensional alignment.
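To make the EMD objective of Definition 1 concrete, consider the special case where both sets have unit weights and equal size; the optimal flow is then a one-to-one matching, which for tiny inputs can be found by brute force over all permutations. The sketch below is only an illustration of the definition (practical solvers use min-cost flow, as discussed above); the function and variable names are ours, not from any library or from this paper.

```python
from itertools import permutations

def squared_dist(a, b):
    # squared Euclidean ground distance ||a - b||^2
    return sum((x - y) ** 2 for x, y in zip(a, b))

def emd_unit_weights(A, B):
    """EMD of Definition 1 in the special case of unit weights and
    |A| == |B|: an optimal flow can be taken as a permutation matrix
    (a perfect matching), so for tiny inputs we can brute-force over
    all n! matchings.  The cost is normalized by min{W_A, W_B} = n."""
    n = len(A)
    assert n == len(B)
    best = min(
        sum(squared_dist(a, B[j]) for a, j in zip(A, perm))
        for perm in permutations(range(n))
    )
    return best / n

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.0, 1.0), (1.0, 1.0)]
print(emd_unit_weights(A, B))  # -> 1.0 (the identity matching is optimal here)
```

The factorial blow-up makes this usable only for very small n; it is meant purely to pin down what quantity the min-cost flow algorithms above compute.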
In machine learning, a related topic is called manifold alignment (Ham, Lee, and Saul 2005; Wang, Krafft, and Mahadevan 2011); however, it usually has different settings and applications, and thus is out of the scope of this paper. Because the approach of (Cohen and Guibas 1999) is closely related to our proposed algorithm, we introduce it in more detail for the sake of completeness. Roughly speaking, their approach is similar to the Iterative Closest Point (ICP) method (Besl and McKay 1992), where each iteration alternately updates the EMD flow and the rigid transformation. Thus it converges to some local optimum eventually. To update the rigid transformation, we can apply Orthogonal Procrustes (OP) analysis (Schönemann 1966). The original OP analysis is only for unweighted point sets, and hence we need some significant modification for our problem. Suppose that the EMD flow F = {f_ij} is fixed and the rigid transformation is to be updated in the current stage. We imagine two new sets of weighted points

Â = {a_1^1, a_1^2, ..., a_1^{n_2}; a_2^1, a_2^2, ..., a_2^{n_2}; ...; a_{n_1}^1, a_{n_1}^2, ..., a_{n_1}^{n_2}};   (2)
B̂ = {b_1^1, b_2^1, ..., b_{n_2}^1; b_1^2, b_2^2, ..., b_{n_2}^2; ...; b_1^{n_1}, b_2^{n_1}, ..., b_{n_2}^{n_1}},   (3)

where each a_i^j (resp., b_j^i) has the weight f_ij and the same spatial position as a_i (resp., b_j). With a slight abuse of notation, we also use a_i^j (resp., b_j^i) to denote the corresponding d-dimensional column vector in the following description. First, we take a translation vector v such that the weighted mean points of Â and B̂ + v coincide with each other (this can easily be proved to be optimal, due to the fact that the objective function uses squared distance (Cohen and Guibas 1999)). Second, by OP analysis, we compute an orthogonal matrix R for B̂ + v to minimize its weighted squared L2 difference to Â.
For this purpose, we generate two d × (n_1 n_2) matrices M_A and M_B, where each point of Â (resp., B̂ + v) corresponds to an individual column of M_A (resp., M_B); for example, a point a_i^j ∈ Â (resp., b_j^i + v ∈ B̂ + v) corresponds to a column √(f_ij) a_i^j (resp., √(f_ij) (b_j^i + v)) in M_A (resp., M_B). Let the SVD of M_A M_B^T be U Σ V^T; then the optimal orthogonal matrix R is U V^T by OP analysis. Actually, we do not need to really construct the large matrices M_A and M_B, since many of their columns are identical up to scaling. Instead, we can compute the product M_A M_B^T in O(n_1 n_2 d + min{n_1, n_2} · d^2) time (see Lemma 3 in the Appendix). Therefore, the time complexity for obtaining the optimal R is O(n_1 n_2 d + min{n_1, n_2} · d^2 + d^3).

Proposition 1. Each iteration of the approach of (Cohen and Guibas 1999) takes Γ(n_1, n_2, d) + O(n_1 n_2 d + min{n_1, n_2} · d^2 + d^3) time, where Γ(n_1, n_2, d) denotes the time complexity of the EMD algorithm it adopts.

In practice, we usually assume n_1, n_2 = O(n) with some n ≫ d, and then the complexity can be simplified to Γ(n, d) + O(n^2 d). The bottleneck is that the algorithm needs to repeatedly compute the EMD and the transformation, especially when n and d are large (usually Γ(n, d) = Ω(n^2 d)). Based on the property of low doubling dimension, we construct a pair of compressed point sets to replace the original A and B, and run the same algorithm on the compressed data instead. As a consequence, the running time is reduced significantly. Note that our compression step is independent of the approach of (Cohen and Guibas 1999); actually, any alignment method with the same objective function as in Definition 3 can benefit from our compression idea. Recently, (Nasser, Jubran, and Feldman 2015) proposed a core-set based compression approach to speed up the computation of alignment.
However, their method requires knowing the correspondences between the point sets in advance and therefore is not suitable for handling EMD; moreover, their compression achieves an advantage only when d is small.

2 The Algorithm and Analysis

Our idea starts from the widely studied k-center clustering. Given an integer k ≥ 1 and a point set P in some metric space, k-center clustering is to partition P into k clusters and cover each cluster by an individual ball, such that the maximum radius of the balls is minimized. (Gonzalez 1985) presented an elegant 2-approximation algorithm, where the radius of each resulting ball (i.e., cluster) is at most two times the optimum. Initially, it selects an arbitrary point, say c_1, from the input P and sets S = {c_1}; then it iteratively selects a new point which has the largest distance to S among the points of P and adds it to S, until |S| = k (the distance between a point q and S is defined as min{||q − p|| | p ∈ S}). Suppose S = {c_1, ..., c_k}; then P is covered by the k balls Ball(c_1, r), ..., Ball(c_k, r) with r ≤ min{||c_i − c_j|| | 1 ≤ i ≠ j ≤ k}. One can prove that r is at most two times the optimal radius of the given instance. Using the property of doubling dimension, we have the following lemma.

Lemma 1. Let P be a point set in R^d with doubling dimension ρ ≪ d. Denote the diameter of P by Δ, i.e., Δ = max{||p − q|| | p, q ∈ P}. Given a small parameter ϵ > 0, if one runs Gonzalez's k-center clustering algorithm with k = (2/ϵ)^ρ, the radius of each resulting ball is at most ϵΔ.

Proof. Let S be the set of k points selected by Gonzalez's algorithm, and let the resulting radius be r. We also define the aspect ratio of S as the ratio between the maximum and minimum pairwise distances in S. Then, it is easy to see that the aspect ratio of S is at most Δ/r. Now, we need the following Claim 1 from (Krauthgamer and Lee 2004; Talwar 2004). Actually, the claim can be obtained by recursively applying the definition of doubling dimension.

Claim 1.
Let (X, d_X) be a metric space with doubling dimension ρ, and let Y ⊆ X. If the aspect ratio of Y is upper bounded by some positive value α, then |Y| ≤ 2^{ρ⌈log_2 α⌉}.

Replacing X and Y by P and S respectively in the above claim, we have

|S| ≤ 2^{ρ⌈log_2(Δ/r)⌉} ≤ 2^{ρ(1 + log_2(Δ/r))}.   (4)

Since |S| = (2/ϵ)^ρ, (4) implies r ≤ ϵΔ.

Let A and B be the two given point sets in Definition 3, and let the maximum of their diameters be Δ. We also assume that they both have doubling dimension at most ρ. Our idea for compressing A and B is as follows. As described in Lemma 1, we set k = (2/ϵ)^ρ and run Gonzalez's algorithm on A and B respectively. We denote by SA = {c^A_1, ..., c^A_k} and SB = {c^B_1, ..., c^B_k} the obtained sets of k cluster centers. For each cluster center c^A_j (resp., c^B_j), we assign a weight equal to the total weight of the points in the corresponding cluster. As a consequence, we obtain a new instance (SA, SB) for geometric alignment. It is easy to see that the total weight of SA (resp., SB) is equal to W_A (resp., W_B). The following theorem shows that we can achieve an approximate solution for the instance (A, B) by solving the alignment of (SA, SB).

Theorem 1. Suppose ϵ > 0 is the small parameter in Lemma 1. Given any c ≥ 1, let T̄ be a rigid transformation yielding a c-approximation for minimizing EMD(SA, T(SB)) in the sense of Definition 3. Then

EMD(A, T̄(B)) ≤ c(1 + 2ϵ)^2 · min_T EMD(A, T(B)) + 2ϵ(c + 1 + 2cϵ)(1 + 2ϵ)Δ^2
             = c(1 + O(ϵ)) · min_T EMD(A, T(B)) + 2ϵ(1 + O(ϵ))(c + 1)Δ^2.   (5)

Proof. First, we denote by T_opt the optimal rigid transformation achieving min_T EMD(A, T(B)). Since T̄ yields a c-approximation for minimizing EMD(SA, T(SB)), we have

EMD(SA, T̄(SB)) ≤ c · min_T EMD(SA, T(SB)) ≤ c · EMD(SA, T_opt(SB)).   (6)

Recall that each point c^A_j (resp., c^B_j) has weight equal to the total weight of the points in the corresponding cluster.
For instance, if the cluster contains {a_{j(1)}, a_{j(2)}, ..., a_{j(h)}}, the weight of c^A_j should be Σ_{l=1}^{h} α_{j(l)}; actually, we can view c^A_j as h overlapping points {a′_{j(1)}, a′_{j(2)}, ..., a′_{j(h)}} with each a′_{j(l)} having the weight α_{j(l)}. Therefore, for the sake of convenience, we use another representation for SA and SB in the proof below:

SA = {a′_1, ..., a′_{n_1}} and SB = {b′_1, ..., b′_{n_2}},   (7)

where each a′_j (resp., b′_j) has the weight α_j (resp., β_j). Note that SA and SB only have k distinct positions each in the space. Moreover, due to Lemma 1, we know that ||a′_i − a_i|| ≤ ϵΔ and ||b′_j − b_j|| ≤ ϵΔ for any 1 ≤ i ≤ n_1 and 1 ≤ j ≤ n_2, and these bounds are invariant under any rigid transformation of the space. Consequently, for any pair (i, j) and any rigid transformation T, we have

||a_i − T(b_j)||^2 ≤ ( ||a_i − a′_i|| + ||a′_i − T(b′_j)|| + ||T(b′_j) − T(b_j)|| )^2
                 ≤ ( ||a′_i − T(b′_j)|| + 2ϵΔ )^2
                 = ||a′_i − T(b′_j)||^2 + 4ϵΔ · ||a′_i − T(b′_j)|| + 4ϵ^2 Δ^2
                 ≤ ||a′_i − T(b′_j)||^2 + 2ϵ( Δ^2 + ||a′_i − T(b′_j)||^2 ) + 4ϵ^2 Δ^2
                 = (1 + 2ϵ)||a′_i − T(b′_j)||^2 + (2ϵ + 4ϵ^2)Δ^2   (8)

by applying the triangle inequality (and the AM-GM inequality 2Δx ≤ Δ^2 + x^2 in the fourth step). Using exactly the same idea, we have

||a′_i − T(b′_j)||^2 ≤ (1 + 2ϵ)||a_i − T(b_j)||^2 + (2ϵ + 4ϵ^2)Δ^2.   (9)

Based on Definition 1, we denote by F̃ = {f̃_ij} the flow induced by EMD(SA, T̄(SB)) (using the representation (7) for SA and SB). Then (8) directly implies that

EMD(A, T̄(B)) ≤ (1 / min{W_A, W_B}) Σ_{i=1}^{n_1} Σ_{j=1}^{n_2} f̃_ij ||a_i − T̄(b_j)||^2
             ≤ ((1 + 2ϵ) / min{W_A, W_B}) Σ_{i=1}^{n_1} Σ_{j=1}^{n_2} f̃_ij ||a′_i − T̄(b′_j)||^2 + (2ϵ + 4ϵ^2)Δ^2
             = (1 + 2ϵ) EMD(SA, T̄(SB)) + (2ϵ + 4ϵ^2)Δ^2.   (10)

By a similar idea (replacing T̄ by T_opt, and exchanging the roles of (A, B) and (SA, SB)), (9) directly implies that

EMD(SA, T_opt(SB)) ≤ (1 + 2ϵ) EMD(A, T_opt(B)) + (2ϵ + 4ϵ^2)Δ^2.   (11)

Combining (6), (10), and (11), we have

EMD(A, T̄(B)) ≤ (1 + 2ϵ) EMD(SA, T̄(SB)) + (2ϵ + 4ϵ^2)Δ^2
             ≤ (1 + 2ϵ) c · EMD(SA, T_opt(SB)) + (2ϵ + 4ϵ^2)Δ^2
             ≤ c(1 + 2ϵ)^2 EMD(A, T_opt(B)) + 2ϵ(c + 1 + 2cϵ)(1 + 2ϵ)Δ^2,   (12)

and the proof is completed.

When ϵ is small enough, Theorem 1 shows that EMD(A, T̄(B)) is roughly at most c · EMD(A, T_opt(B)).
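As a concrete illustration of the compression analyzed in Lemma 1 and Theorem 1, the sketch below runs Gonzalez's farthest-point traversal to pick k centers and then gives each center the total weight of its cluster. This is a minimal sketch under our own naming, not the authors' implementation; it uses a plain O(kn) per-step scan rather than the faster data structures mentioned later.

```python
import math

def dist(p, q):
    # Euclidean distance between two points given as tuples
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def gonzalez_compress(points, weights, k):
    """Pick k centers by farthest-point traversal (Gonzalez's
    2-approximation for k-center), assign every point to its nearest
    center, and give each center the total weight of its cluster."""
    centers = [0]  # start from an arbitrary point (index 0)
    d = [dist(p, points[0]) for p in points]  # distance to the nearest chosen center
    while len(centers) < k:
        far = max(range(len(points)), key=lambda i: d[i])  # farthest point from S
        centers.append(far)
        d = [min(d[i], dist(points[i], points[far])) for i in range(len(points))]
    # aggregate weights: each center inherits the weight of its cluster
    agg = [0.0] * len(centers)
    for i, p in enumerate(points):
        nearest = min(range(len(centers)), key=lambda c: dist(p, points[centers[c]]))
        agg[nearest] += weights[i]
    return [points[c] for c in centers], agg

pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
S, w = gonzalez_compress(pts, [1.0, 2.0, 1.0, 3.0], k=2)
print(S, w)  # two well-separated centers; the total weight 7.0 is preserved
```

Running both point sets through such a routine produces the compressed instance (SA, SB) on which the alignment algorithm is then executed.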
That is, T̄, the solution obtained on (SA, SB), achieves roughly the same performance on (A, B). Consequently, we propose the approximation algorithm for geometric alignment (see Algorithm 1). We would like to emphasize that although we use the algorithm from (Cohen and Guibas 1999) in Step 3, Theorem 1 is an independent result; that is, any alignment method with the same objective function as in Definition 3 can benefit from Theorem 1.

Algorithm 1 Geometric Alignment
1: Given ϵ ∈ (0, 1), set k = (2/ϵ)^ρ.
2: Run Gonzalez's k-center clustering algorithm on A and B, and obtain the sets of cluster centers SA and SB respectively.
3: Apply an existing alignment algorithm, e.g., (Cohen and Guibas 1999), to (SA, SB).
4: Obtain the rigid transformation T̄ from Step 3, and compute the corresponding EMD flow between A and T̄(B).
5: Output: the rigid transformation T̄ of B and the EMD flow between A and T̄(B).

3 The Time Complexity

We analyze the time complexity of Algorithm 1, considering Steps 2-4 separately. To simplify the description, we use n to denote max{n_1, n_2}. In Step 3, we suppose that the iterative approach of (Cohen and Guibas 1999) takes λ ≥ 1 rounds.

Step 2. A straightforward implementation of Gonzalez's algorithm selects the k cluster centers iteratively, with a resulting running time of O(knd). Several faster implementations for the high dimensional case with low doubling dimension have been studied before; their idea is to maintain data structures that reduce the amortized complexity of each iteration. We refer the reader to (Har-Peled and Mendel 2006) for more details.

Step 3. Since we run the algorithm of (Cohen and Guibas 1999) on the smaller instance (SA, SB) instead of (A, B), the complexity of Step 3 is O(λ(Γ(k, d) + k^2 d + kd^2 + d^3)) by Proposition 1.

Step 4. We need to compute the transformed T̄(B) first and then EMD(A, T̄(B)).
Note that the transformation T̄ is not off-the-shelf, because it is the combination of a sequence of rigid transformations from the iterative approach of (Cohen and Guibas 1999) in Step 3. Since the approach takes λ rounds, T̄ is the composition of λ rigid transformations. We use (R_l, v_l) to denote the orthogonal matrix and translation vector obtained in the l-th round for 1 ≤ l ≤ λ. We can update B round by round: starting from l = 1, update B to R_l B + v_l in each round; the whole time complexity is then O(λnd^2). In fact, there is a more efficient way: compute T̄ first, before transforming B.

Lemma 2. Let (R, v) be the orthogonal matrix and translation vector of T̄. Then

R = Π_{l=1}^{λ} R_l,  v = (Π_{l=2}^{λ} R_l) v_1 + (Π_{l=3}^{λ} R_l) v_2 + ... + R_λ v_{λ−1} + v_λ,   (13)

and T̄(B) can be obtained in O(λd^3 + nd^2) time.

Proof. The equations (13) can be easily verified by simple calculation. In addition, we can recursively compute the products Π_{l=i}^{λ} R_l for i = λ, λ−1, ..., 1. Consequently, the orthogonal matrix R and translation vector v can be obtained in O(λd^3) time. In addition, the complexity for computing T̄(B) = RB + v is O(nd^2).

Lemma 2 provides a complexity significantly lower than the previous O(λnd^2) (usually n is much larger than d in practice). After obtaining T̄(B), we can compute EMD(A, T̄(B)) in Γ(n, d) time. Note that the complexity Γ(n, d) is usually Ω(n^2 d), which dominates the complexity of Step 2 and the second term nd^2 in the complexity of Lemma 2. As a consequence, we have the following theorem for the running time.

Theorem 2. Suppose n = max{n_1, n_2} ≫ d and the algorithm of (Cohen and Guibas 1999) takes λ ≥ 1 rounds. The running time of Algorithm 1 is O(λ(Γ(k, d) + k^2 d + kd^2 + d^3)) + Γ(n, d).

Table 1: Random dataset: EMDs and running times of different compression levels.

γ         1/50    1/40    1/30    1/20    1/10    1
EMD       0.948   0.946   0.945   0.943   0.941   0.933
Time (s)  48.7    54.2    61.0    80.6    144.6   1418.2

Table 2: Linguistic datasets: EMDs and running times of different compression levels.
γ               1/50    1/40    1/30    1/20    1/10    1
sp-en  EMD      0.989   0.969   0.955   0.931   0.892   0.820
       Time (s) 3.4     3.6     3.7     3.9     5.3     100.5
it-en  EMD      0.983   0.966   0.951   0.935   0.899   0.847
       Time (s) 5.4     5.9     6.1     6.5     8.8     162.6
ja-ch  EMD      0.991   0.975   0.962   0.941   0.910   0.836
       Time (s) 1.6     2.1     2.2     2.0     3.0     67.0
tu-en  EMD      0.982   0.960   0.946   0.922   0.889   0.839
       Time (s) 10.1    10.4    11.2    11.9    15.9    287.0
ch-en  EMD      1.014   1.012   0.990   0.962   0.923   0.842
       Time (s) 1.9     1.7     2.2     2.4     2.7     51.6

If we run the same number of rounds on the original instance (A, B) with the approach of (Cohen and Guibas 1999), the total running time is O(λ(Γ(n, d) + n^2 d)) by Proposition 1. When k ≪ n, Algorithm 1 achieves a significant reduction in running time.

4 Experiments

We implemented our proposed algorithm and tested its performance on both random and real datasets. All of the experimental results were obtained on a Windows workstation with a 2.4GHz Intel Xeon CPU and 32GB DDR4 memory. For each dataset, we run 20 trials and report the average results. We set the iterative approach of (Cohen and Guibas 1999) to terminate when the change of the objective value is less than 10^{-3}. To construct a random dataset, we randomly generate two manifolds in R^500, represented by polynomial functions of low dimension (≤ 50); from the manifolds, we take two sets of randomly sampled points of sizes n_1 = 2 × 10^4 and n_2 = 3 × 10^4, respectively; also, for each sampled point, we randomly assign a positive weight; finally, we obtain two weighted point sets as an instance of geometric alignment. For the real datasets, we consider the two applications mentioned in Section 1, unsupervised bilingual lexicon induction and PPI network alignment. For the first application, we have 5 pairs of languages: Chinese-English, Spanish-English, Italian-English, Japanese-Chinese, and Turkish-English. Given by (Zhang et al.
2017), each language has a vocabulary list containing 3000 to 13000 words; we also follow their preprocessing idea to represent all the words by vectors in R^50 through the embedding technique of (Mikolov, Le, and Sutskever 2013). Actually, each vocabulary list is represented by a distribution in the space, where each vector has a weight equal to the corresponding frequency in the language. For the second application, we use the popular benchmark dataset NAPAbench (Sahraeian and Yoon 2012) of PPI networks. It contains 3 pairs of PPI networks, where each network is a graph of 3000-10000 nodes. As a preprocessing step, we apply the recent node2vec technique (Grover and Leskovec 2016) to represent each network by a group of vectors in R^100; following (Liu et al. 2017), we assign a unit weight to each vector.

Results. For each instance, we try different compression levels. According to Algorithm 1, we compress the size of each point set to k. We set k = γ max{n_1, n_2} with γ ∈ {1/50, 1/40, 1/30, 1/20, 1/10, 1}; in particular, γ = 1 indicates that we directly run the algorithm of (Cohen and Guibas 1999) without compression. The purpose of our proposed approach is to design a compression method such that the resulting qualities on the original and compressed datasets are close (as stated in Section 1.2 and the paragraph after the proof of Theorem 1). So in the experiments, we focus on the comparison with the results on the original data (i.e., the results of γ = 1). The results on the random dataset are shown in Table 1. The EMDs obtained with compression are only slightly higher than those of γ = 1, while the advantage of compression in running time is significant. For example, the running time of γ = 1/50 is less than 5% of that of γ = 1. We obtain similar performance on the real datasets.
The results on the linguistic datasets are shown in Table 2 (due to the space limit, we put the results on the PPI network dataset in the full version of our paper); the EMD for each compression level is always at most 1.2 times the baseline with γ = 1, but the corresponding running time is dramatically reduced. To further show the robustness of our method, we add Gaussian noise to the random dataset and study the change of the objective value under varying noise levels. We set the standard deviation of the Gaussian noise to be ηΔ, where Δ is the maximum diameter of the point sets and η ranges from 0.5 × 10^{-2} to 2.5 × 10^{-2}. Figure 1 shows that the obtained EMD over baseline remains very stable (slightly higher than 1) for each noise level η.

[Figure 1: The EMDs over baseline for different noise levels. x-axis: compression level (1/50, 1/40, 1/30, 1/20, 1/10); y-axis: EMD over baseline; curves: η = 0.005, 0.010, 0.015, 0.020, 0.025.]

5 Conclusion

In this paper, we propose a novel framework for compressing point sets in high dimension so as to approximately preserve the quality of alignment. This work is motivated by several emerging applications in the fields of machine learning, bioinformatics, and wireless networks. Our method utilizes the property of low doubling dimension and yields a significant speedup of alignment. In the experiments on random and real datasets, we show that the proposed compression approach can efficiently reduce the running time to a great extent.

6 Acknowledgments

The research of this work was supported in part by NSF through grant CCF-1656905 and a start-up fund from Michigan State University. The authors also want to thank the anonymous reviewers for their helpful comments and suggestions for improving the paper.

7 Appendix

Lemma 3. The product M_A M_B^T can be computed in O(n_1 n_2 d + min{n_1, n_2} · d^2) time.

Proof. With a slight abuse of notation, we also use F to denote the n_1 × n_2 matrix of the EMD flow where each entry is f_ij; also, F_{i,:} represents the i-th row of the matrix F.
Given a vector t, we use √t to denote the vector whose entries are the square roots of the corresponding entries of t. Also, we use diag(t) to denote the diagonal matrix whose i-th diagonal entry is the i-th entry of t. Following the constructions of M_A and M_B, by some simple calculation we have M_A M_B^T = Σ_{i=1}^{n1} (a_i √F_{i,:}) (B diag(√F_{i,:}))^T = Σ_{i=1}^{n1} a_i √F_{i,:} diag(√F_{i,:}) B^T = Σ_{i=1}^{n1} a_i F_{i,:} B^T = A F B^T. Computing A F B^T takes O(n1 n2 d + min{n1, n2} d^2) time: when n1 ≤ n2 we first compute F B^T in O(n1 n2 d) time and then multiply by A in O(n1 d^2) time, and symmetrically when n2 < n1. Agarwal, P. K.; Fox, K.; Panigrahi, D.; Varadarajan, K. R.; and Xiao, A. 2017. Faster algorithms for the geometric transportation problem. In 33rd International Symposium on Computational Geometry, SoCG 2017, July 4-7, 2017, Brisbane, Australia, 7:1-7:16. Ahuja, R. K.; Magnanti, T. L.; and Orlin, J. B. 1993. Network flows: theory, algorithms, and applications. Prentice Hall. Andoni, A.; Do Ba, K.; Indyk, P.; and Woodruff, D. 2009. Efficient sketches for earth-mover distance, with applications. In Foundations of Computer Science, 2009. FOCS '09. 50th Annual IEEE Symposium on, 324-330. IEEE. Belkin, M. 2003. Problems of learning on manifolds. The University of Chicago. Benamou, J.; Carlier, G.; Cuturi, M.; Nenna, L.; and Peyré, G. 2015. Iterative Bregman projections for regularized transportation problems. SIAM J. Scientific Computing 37(2). Besl, P., and McKay, N. D. 1992. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 14(2):239-256. Cabello, S.; Giannopoulos, P.; Knauer, C.; and Rote, G. 2008. Matching point sets with respect to the earth mover's distance. Computational Geometry 39(2):118-133. Cao, X.; Wei, Y.; Wen, F.; and Sun, J. 2014. Face alignment by explicit shape regression. International Journal of Computer Vision 107(2):177-190. Cohen, S., and Guibas, L. 1999. The earth mover's distance under transformation sets. In Proceedings of the 7th IEEE International Conference on Computer Vision, 1.
Cornea, N. D.; Demirci, M. F.; Silver, D.; Dickinson, S.; and Kantor, P. 2005. 3D object retrieval using many-to-many matching of curve skeletons. In Shape Modeling and Applications, 2005 International Conference, 366-371. IEEE. Courty, N.; Flamary, R.; Habrard, A.; and Rakotomamonjy, A. 2017. Joint distribution optimal transportation for domain adaptation. In Advances in Neural Information Processing Systems, 3733-3742. Dasgupta, S., and Sinha, K. 2013. Randomized partition trees for exact nearest neighbor search. In Conference on Learning Theory, 317-337. Ding, H., and Xu, J. 2017. FPTAS for minimizing the earth mover's distance under rigid transformations and related problems. Algorithmica 78(3):741-770. Gonzalez, T. F. 1985. Clustering to minimize the maximum intercluster distance. Theoretical Computer Science 38:293-306. Grover, A., and Leskovec, J. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 855-864. ACM. Ham, J.; Lee, D. D.; and Saul, L. K. 2005. Semisupervised alignment of manifolds. In AISTATS, 120-127. Har-Peled, S., and Mendel, M. 2006. Fast construction of nets in low-dimensional metrics and their applications. SIAM Journal on Computing 35(5):1148-1184. Indyk, P., and Thaper, N. 2003. Fast color image retrieval via embeddings. In Workshop on Statistical and Computational Theories of Vision (at ICCV). Indyk, P. 2007. A near linear time constant factor approximation for Euclidean bichromatic matching (cost). In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, 39-42. Society for Industrial and Applied Mathematics. Karger, D. R., and Ruhl, M. 2002. Finding nearest neighbors in growth-restricted metrics. In Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, 741-750. ACM. Klein, O., and Veltkamp, R. C. 2005.
Approximation algorithms for computing the earth mover's distance under transformations. In International Symposium on Algorithms and Computation, 1019-1028. Springer. Krauthgamer, R., and Lee, J. R. 2004. Navigating nets: simple algorithms for proximity search. In Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms, 798-807. Society for Industrial and Applied Mathematics. Kusner, M.; Sun, Y.; Kolkin, N.; and Weinberger, K. 2015. From word embeddings to document distances. In International Conference on Machine Learning, 957-966. Laakso, T. J. 2002. Plane with A∞-weighted metric not bi-Lipschitz embeddable to R^n. Bulletin of the London Mathematical Society 34(6):667-676. Ling, H., and Okada, K. 2007. An efficient earth mover's distance algorithm for robust histogram comparison. IEEE Transactions on Pattern Analysis and Machine Intelligence 29(5):840-853. Liu, Y.; Ding, H.; Chen, D.; and Xu, J. 2017. Novel geometric approach for global alignment of PPI networks. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, 31-37. Malod-Dognin, N.; Ban, K.; and Pržulj, N. 2017. Unified alignment of protein-protein interaction networks. Scientific Reports 7(1):953. Maltoni, D.; Maio, D.; Jain, A. K.; and Prabhakar, S. 2009. Handbook of fingerprint recognition. Springer Science & Business Media. Mikolov, T.; Le, Q. V.; and Sutskever, I. 2013. Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168. Nasser, S.; Jubran, I.; and Feldman, D. 2015. Low-cost and faster tracking systems using core-sets for pose-estimation. CoRR abs/1511.09120. Pan, S. J., and Yang, Q. 2010. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22(10):1345-1359. Pele, O., and Werman, M. 2009. Fast and robust earth mover's distances. In Computer Vision, 2009 IEEE 12th International Conference on, 460-467. IEEE.
Rubner, Y.; Tomasi, C.; and Guibas, L. J. 2000. The earth mover's distance as a metric for image retrieval. International Journal of Computer Vision 40(2):99-121. Sayed Mohammad, E. S., and Yoon, B.-J. 2012. A network synthesis model for generating protein interaction network families. PLoS ONE 7. Schönemann, P. H. 1966. A generalized solution of the orthogonal Procrustes problem. Psychometrika 31(1):1-10. Talwar, K. 2004. Bypassing the embedding: algorithms for low dimensional metrics. In Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, 281-290. Tenenbaum, J. B.; De Silva, V.; and Langford, J. C. 2000. A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319-2323. Todorovic, S., and Ahuja, N. 2008. Region-based hierarchical image matching. International Journal of Computer Vision 78(1):47-66. Wang, C.; Krafft, P.; and Mahadevan, S. 2011. Manifold alignment. Yang, Z.; Wu, C.; and Liu, Y. 2012. Locating in fingerprint space: wireless indoor localization with little human intervention. In Proceedings of the 18th annual international conference on Mobile computing and networking, 269-280. ACM. Yin, Z.; Lan, H.; Tan, G.; Lu, M.; Vasilakos, A. V.; and Liu, W. 2017. Computing platforms for big biological data analytics: perspectives and challenges. Computational and Structural Biotechnology Journal 15:403-411. Youn, H.; Sutton, L.; Smith, E.; Moore, C.; Wilkins, J. F.; Maddieson, I.; Croft, W.; and Bhattacharya, T. 2016. On the universal structure of human lexical semantics. Proceedings of the National Academy of Sciences 113(7):1766-1771. Zhang, M.; Liu, Y.; Luan, H.; and Sun, M. 2017. Earth mover's distance minimization for unsupervised bilingual lexicon induction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, 1934-1945.