Multi-Frequency Vector Diffusion Maps

Yifeng Fan, Zhizhen Zhao

Abstract

We introduce multi-frequency vector diffusion maps (MFVDM), a new framework for organizing and analyzing high-dimensional datasets. The new method is a mathematical and algorithmic generalization of vector diffusion maps (VDM) and other non-linear dimensionality reduction methods. MFVDM combines different nonlinear embeddings of the data points, defined with multiple unitary irreducible representations of the alignment group connecting two nodes in the graph. We illustrate the efficacy of MFVDM on synthetic data generated according to a random graph model and on a cryo-electron microscopy image dataset. The new method achieves better nearest neighbor search and alignment estimation than the state-of-the-art VDM and diffusion maps (DM) on extremely noisy data.

1. Introduction

Nonlinear dimensionality reduction methods, such as locally linear embedding (LLE) (Roweis & Saul, 2000), ISOMAP (Tenenbaum et al., 2000), Hessian LLE (Donoho & Grimes, 2003), Laplacian eigenmaps (Belkin & Niyogi, 2002; 2003), and diffusion maps (DM) (Coifman & Lafon, 2006), are invaluable tools for embedding complex data in a low-dimensional space and for regression problems on graphs and manifolds. To this end, these methods assume that the high-dimensional data lie on a low-dimensional manifold, and local affinities in a weighted neighborhood graph are used to learn the global structure of the data. Spectral clustering (Nadler et al., 2006; Von Luxburg, 2007), semi-supervised learning (Zhu, 2006; Goldberg et al., 2009; Yang et al., 2016), out-of-sample extension (Belkin et al., 2006), and image denoising (Gong et al., 2010; Singer et al., 2009) share similar geometrical considerations. These techniques are either directly or indirectly related to the heat kernel for functions on the data.
Vector diffusion maps (VDM) (Singer & Wu, 2012) generalizes DM to define a heat kernel for vector fields on the data manifold. The corresponding adjacency matrix is based on edge weights and orthogonal transformations between connected nodes. Using the spectral decomposition of this matrix, VDM defines a metric on the data that indicates the closeness of the data points on the manifold. For some applications, the vector diffusion metric is beneficial, since it takes into account linear transformations and, as a result, provides a better organization of the data. However, for extremely noisy data, VDM nearest neighbor search may fail to identify the true nearby points on the manifold. This results in shortcut edges that connect points with large geodesic distances on the manifold. To address this issue, we introduce a new algorithm called multi-frequency vector diffusion maps (MFVDM) to represent and organize complex high-dimensional data exhibiting a non-trivial group invariance. To this end, we augment VDM with multiple irreducible representations of the compact group to improve the rotationally invariant nearest neighbor search and the alignment estimation between nearest neighbor pairs, when the initial estimation contains a large number of outliers due to noise. Specifically, we define a set of kernels, denoted by W_k, using multiple irreducible representations of the compact alignment group indexed by an integer k, and introduce the corresponding frequency-k-VDMs. The MFVDM embedding is constructed by concatenating all the frequency-k-VDMs up to a cutoff k_max. We use the new embeddings to identify nearest neighbors.

¹Department of Electrical and Computer Engineering, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Illinois, USA. Correspondence to: Yifeng Fan. Proceedings of the 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s).
The eigenvectors of the normalized W_k are used to estimate the pairwise alignments between nearest neighbors. This framework also extends the mathematical theory of cryo-electron microscopy (EM) image analysis (Singer et al., 2011; Hadani & Singer, 2011; Giannakis et al., 2012; Schwander et al., 2012; Dashti et al., 2014). We show that MFVDM outperforms VDM and DM for data sampled from low-dimensional manifolds when a large proportion of the edge connections are corrupted. MFVDM is also able to improve the nearest neighbor search and rotational alignment for 2-D class averaging in cryo-EM.

2. Preliminaries and Problem Setup

Given a dataset x_i ∈ ℝ^l for i = 1, ..., n, we assume that the data lie on or close to a low-dimensional smooth manifold X of intrinsic dimension d ≪ l. Suppose that G is a compact Lie group, which has unitary irreducible representations according to the Peter-Weyl theorem. The data space X is closed under G if for all g ∈ G and all x ∈ X, g ∘ x ∈ X, where ∘ denotes the group action. The G-invariant distance between two data points is defined as

    d_{ij} = \min_{g \in G} \| x_i - g \circ x_j \|,    (1)

and the associated optimal alignment is

    g_{ij} = \arg\min_{g \in G} \| x_i - g \circ x_j \|.    (2)

We assume that the optimal alignment is unique and construct an undirected graph G = (V, E) based on the distances in (1), using either the ε-neighborhood criterion, i.e. (i, j) ∈ E iff d_{ij} < ε, or the κ-nearest-neighbor criterion, i.e. (i, j) ∈ E iff j is one of the κ nearest neighbors of i. The edge weights w_ij are defined by a kernel function of the G-invariant distance, w_ij = K_σ(d_ij). For example, the Gaussian kernel leads to weights of the form

    w_{ij} = K_\sigma(d_{ij}) = \exp\left( - \min_{g \in G} \| x_i - g \circ x_j \|^2 / \sigma^2 \right).    (3)

The resulting graph is defined on the quotient space M := X/G and is invariant to the group transformation of the individual data points. Under certain conditions, the quotient space M is also a smooth manifold. We can identify each data point x_i with a point v_i ∈ M, and the dimension of M is lower than the dimension of X.
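For G = SO(2) acting on 2-D images by in-plane rotation, the distance (1) and alignment (2) can be approximated by scanning a discretized set of rotation angles. The following minimal sketch is our own illustration, not the authors' implementation; the function name and the use of `scipy.ndimage.rotate` are our choices:

```python
import numpy as np
from scipy.ndimage import rotate

def g_invariant_distance(xi, xj, n_angles=36):
    """Approximate d_ij = min_g ||x_i - g.x_j|| for G = SO(2) acting on 2-D
    images, by scanning a discretized grid of rotation angles (illustrative
    sketch only). Returns the distance and the minimizing angle in radians."""
    angles = np.linspace(0.0, 360.0, n_angles, endpoint=False)
    dists = [np.linalg.norm(xi - rotate(xj, a, reshape=False)) for a in angles]
    l = int(np.argmin(dists))
    return dists[l], np.deg2rad(angles[l])
```

A finer angular grid (or local refinement around the coarse minimizer) trades computation for accuracy.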
The unitary irreducible representation of a group element g is denoted by ρ_k(g). If v_i and v_j are close on the manifold, then the representation ρ_1(g_ij) of the optimal alignment g_ij approximates the local parallel transport operator P_{x_i,x_j} : T_{v_j}M → T_{v_i}M (Singer et al., 2011; Singer & Wu, 2012). Taking cryo-EM imaging as an example: each image is a tomographic projection of a 3-D object at an unknown orientation x ∈ SO(3), represented by a 3 × 3 orthogonal matrix R = [R_1, R_2, R_3] satisfying R^⊤R = RR^⊤ = I and det R = 1 (Singer et al., 2011; Hadani & Singer, 2011; Zhao & Singer, 2014). The viewing direction of each image can be represented as a point on the unit sphere, denoted by v = R_3. The first two columns R_1 and R_2 of the orthogonal matrix correspond to the lifted vertical and horizontal axes of the image in the tangent plane T_v S². Therefore, each image can be represented by a unit tangent vector on the sphere, and the base manifold is M = SO(3)/SO(2) = S². Images with similar v's are identified as nearest neighbors, and these can be accurately estimated using (1) from clean images. Registering the centered images corresponds to in-plane rotational alignment of the nearest neighbor images according to (2).

In many applications, noise in the observed data affects the estimation of the G-invariant distances d_ij and the optimal alignments g_ij. This results in shortcut edges in the ε-neighborhood or κ-nearest-neighbor graph, which connect points on M whose underlying geodesic distances are large.

3. Algorithm

To address the issue of shortcut edges induced by noise, we extend VDM using multiple irreducible representations of the compact alignment group.

3.1. Affinity and mapping

We assume the initial graph G is given along with the optimal alignments on the connected edges. For simplicity, and because of our interest in cryo-EM image classification, we focus on G = SO(2) and denote the optimal alignment angle by α_ij.
The corresponding frequency-k unitary irreducible representation is e^{ıkα_ij}, where ı = √−1. For points that are nearby on M, the alignments should exhibit cycle consistency in the clean case: for example, k(α_ij + α_jl + α_li) ≈ 0 mod 2π for integers k ∈ ℤ, if nodes i, j, and l are true nearest neighbors. To systematically incorporate the alignment information and impose consistency of the alignments, for a given graph G = (V, E) we construct a set of n × n affinity matrices W_k,

    W_k(i, j) = \begin{cases} w_{ij} e^{\imath k \alpha_{ij}} & (i, j) \in E, \\ 0 & \text{otherwise,} \end{cases}    (4)

where the edge weights defined in (3) are real, with w_ij = w_ji and α_ij = −α_ji for all (i, j) ∈ E. At frequency k, the weighted degree of node i is

    \deg(i) := \sum_{j:(i,j) \in E} |W_k(i, j)| = \sum_{j:(i,j) \in E} w_{ij},    (5)

and the degree is identical across all frequencies. We define a diagonal degree matrix D of size n × n, where the i-th diagonal entry is D(i, i) = deg(i). We construct the normalized matrix A_k = D^{−1}W_k, which is applied to complex vectors z of length n; each entry z(i) ∈ ℂ can be viewed as a vector in T_{v_i}M. The matrix A_k is an averaging operator for vector fields, i.e.,

    (A_k z)(i) = \frac{1}{\deg(i)} \sum_{j:(i,j) \in E} w_{ij} e^{\imath k \alpha_{ij}} z(j).

In our framework, we define the affinity between i and j by considering the consistency of the transformations over all paths of length 2t that connect i and j, and in addition the consistency of the transported vectors at frequency k (see Fig. 1). Intuitively, A_k^{2t}(i, j) sums the transformations along all length-2t paths from i to j, and a large value of |A_k^{2t}(i, j)| indicates not only a strong connection between i and j but also a high level of consistency of the alignments along all connecting paths. We obtain the affinity of i and j through the following decomposition:

    A_k = D^{-1} W_k = D^{-1/2} \underbrace{\left( D^{-1/2} W_k D^{-1/2} \right)}_{S_k} D^{1/2}.    (6)

Since S_k is Hermitian, it has a complete set of real eigenvalues λ_1^{(k)}, λ_2^{(k)}, ..., λ_n^{(k)} and orthonormal eigenvectors u_1^{(k)}, u_2^{(k)}, ..., u_n^{(k)}, where λ_1^{(k)} ≥ λ_2^{(k)} ≥ ... ≥ λ_n^{(k)}.
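Assembling W_k and its normalization S_k per (4)–(6) can be sketched as follows; the function name and sparse-matrix representation are our own choices, not the paper's code:

```python
import numpy as np
import scipy.sparse as sp

def build_Sk(n, edges, angles, weights, k):
    """Assemble the frequency-k affinity W_k(i,j) = w_ij exp(i*k*alpha_ij)
    of eq. (4) and its normalization S_k = D^{-1/2} W_k D^{-1/2} of eq. (6).
    `edges` lists each undirected pair (i, j) once; Hermitian symmetry is
    enforced via alpha_ji = -alpha_ij and w_ji = w_ij."""
    rows, cols, vals = [], [], []
    for (i, j), a, w in zip(edges, angles, weights):
        rows += [i, j]
        cols += [j, i]
        vals += [w * np.exp(1j * k * a), w * np.exp(-1j * k * a)]
    Wk = sp.csr_matrix((vals, (rows, cols)), shape=(n, n))
    deg = np.abs(Wk).sum(axis=1).A.ravel()      # = sum_j w_ij, same for every k
    Dinv_sqrt = sp.diags(1.0 / np.sqrt(deg))
    return Dinv_sqrt @ Wk @ Dinv_sqrt
```

For cycle-consistent alignments α_ij = α_i − α_j on a connected graph, the top eigenvalue of S_k is 1, attained by the eigenvector with entries proportional to √deg(i)·e^{ıkα_i}.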
We can express S_k^{2t}(i, j) in terms of the eigenvalues and eigenvectors of S_k:

    S_k^{2t}(i, j) = \sum_{l} \left( \lambda_l^{(k)} \right)^{2t} u_l^{(k)}(i) \overline{u_l^{(k)}(j)}.    (7)

Figure 1. Illustration of multi-frequency edge connection for frequencies k = 1, 2, ..., k_max. The ⊕-operation denotes concatenation.

Therefore the affinity of i and j at the k-th frequency is given by

    |S_k^{2t}(i, j)|^2 = \sum_{l,r} \left( \lambda_l^{(k)} \lambda_r^{(k)} \right)^{2t} u_l^{(k)}(i) \overline{u_r^{(k)}(i)} \, \overline{u_l^{(k)}(j)} u_r^{(k)}(j) = \left\langle V_t^{(k)}(i), V_t^{(k)}(j) \right\rangle,    (8)

which is an inner product between two vectors V_t^{(k)}(i), V_t^{(k)}(j) ∈ ℂ^{n²} defined via the mapping

    V_t^{(k)} : i \mapsto \left( \left( \lambda_l^{(k)} \lambda_r^{(k)} \right)^t u_l^{(k)}(i) \overline{u_r^{(k)}(i)} \right)_{l,r=1}^{n}.    (9)

We call this the frequency-k-VDM.

Truncated mapping: Notice that the matrices I + S_k and I − S_k are both positive semi-definite (PSD), since for all z ∈ ℂ^n,

    z^* (I \pm S_k) z = \sum_{(i,j) \in E} w_{ij} \left| \frac{z(i)}{\sqrt{\deg(i)}} \pm \frac{e^{\imath k \alpha_{ij}} z(j)}{\sqrt{\deg(j)}} \right|^2 \geq 0.    (10)

Therefore all eigenvalues {λ_i^{(k)}}_{i=1}^n of S_k lie in the interval [−1, 1]. Consequently, for large t, most of the (λ_l^{(k)} λ_r^{(k)})^{2t} terms in (8) are close to 0, and |S_k^{2t}(i, j)|² can be well approximated using only a few of the largest eigenvalues and their corresponding eigenvectors. Hence, we truncate the frequency-k-VDM mapping V_t^{(k)} using a cutoff m_k for each frequency k:

    \hat{V}_t^{(k)} : i \mapsto \left( \left( \lambda_l^{(k)} \lambda_r^{(k)} \right)^t u_l^{(k)}(i) \overline{u_r^{(k)}(i)} \right)_{l,r=1}^{m_k}.    (11)

The affinity of i and j at frequency k after truncation is given by

    |\hat{S}_k^{2t}(i, j)|^2 = \left\langle \hat{V}_t^{(k)}(i), \hat{V}_t^{(k)}(j) \right\rangle \approx |S_k^{2t}(i, j)|^2.    (12)

Remark 1: The truncated mapping not only has the advantage of computational efficiency, but also enhances robustness to noise, since the eigenvectors with smaller eigenvalues are more oscillatory and more sensitive to noise.

Multi-frequency mapping: Consider the affinity in (8) for k = 1, ..., k_max: if i and j are connected by multiple paths with consistent transformations, the affinity |Ŝ_k^{2t}(i, j)|² should be large for all k.
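The truncated embedding (11) and the affinity identity (12) can be sketched as below (our illustration; `eigsh` computes only the top m_k eigenpairs of the Hermitian S_k, and integer t is assumed):

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def frequency_k_embedding(Sk, mk, t):
    """Truncated frequency-k-VDM of eq. (11): keep the top m_k eigenpairs of
    the Hermitian matrix S_k and map node i to the m_k^2 coordinates
    (lam_l * lam_r)^t * u_l(i) * conj(u_r(i))."""
    lam, U = eigsh(Sk, k=mk, which='LA')          # m_k algebraically largest
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    outer = U[:, :, None] * np.conj(U[:, None, :])   # n x mk x mk
    scale = (lam[:, None] * lam[None, :]) ** t
    return (outer * scale).reshape(Sk.shape[0], -1)  # n x mk^2
```

With V = frequency_k_embedding(Sk, mk, t), the Gram matrix V @ V.conj().T reproduces the truncated affinity |Ŝ_k^{2t}(i, j)|² of (12), and equals |S_k^{2t}(i, j)|² exactly when the discarded eigenvalues are zero.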
Then we can combine multiple representations (i.e., multiple k) to evaluate the consistency of the group transformations along connecting paths. A straightforward way is to concatenate the truncated mappings V̂_t^{(k)} for all k = 1, 2, ..., k_max:

    \hat{V}_t : i \mapsto \left( \hat{V}_t^{(1)}(i); \hat{V}_t^{(2)}(i); \ldots; \hat{V}_t^{(k_{\max})}(i) \right),    (13)

which we call multi-frequency vector diffusion maps (MFVDM). We define the new affinity of i and j as the inner product of V̂_t(i) and V̂_t(j):

    |\hat{S}^{2t}(i, j)|^2 := \sum_{k=1}^{k_{\max}} |\hat{S}_k^{2t}(i, j)|^2 = \sum_{k=1}^{k_{\max}} \left\langle \hat{V}_t^{(k)}(i), \hat{V}_t^{(k)}(j) \right\rangle = \left\langle \hat{V}_t(i), \hat{V}_t(j) \right\rangle.    (14)

MFVDM systematically incorporates the cycle consistencies on the geometric graph across multiple irreducible representations of the transformation group elements (in-plane rotational alignments in this case; see Fig. 1). Using information from multiple irreducible group representations leads to a more robust measure of rotationally invariant similarity.

Remark 2: Empirically, we find the normalized mapping i ↦ V̂_t(i)/‖V̂_t(i)‖ to be more robust to noise than V̂_t(i). A similar phenomenon was discussed for VDM (Singer & Wu, 2012). The normalized affinity is defined as

    N_t(i, j) = \left\langle \frac{\hat{V}_t(i)}{\|\hat{V}_t(i)\|}, \frac{\hat{V}_t(j)}{\|\hat{V}_t(j)\|} \right\rangle.    (15)

Comparison with DM and VDM: Diffusion maps (DM) only consider scalar weights on the edges, and vector diffusion maps (VDM) only take into account consistency of the transformations along connected edges using a single representation of SO(2), i.e., e^{ıα_ij}. In this paper, we generalize VDM to use not only the first irreducible representation, k = 1, but also higher orders k up to k_max.

3.2. Nearest neighbor search and rotational alignment

In this section we introduce our method for joint nearest neighbor search and rotational alignment.
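The nearest neighbor search step, built from the concatenated embedding (13), the normalization of Remark 2, and the affinity (15), might be sketched as follows (function name and dense-matrix form are our simplifications):

```python
import numpy as np

def knn_from_embeddings(V_list, kappa):
    """Sketch of the MFVDM nearest neighbor search: concatenate the truncated
    per-frequency embeddings (eq. (13)), normalize each row (Remark 2), and
    rank neighbors by the normalized affinity N_t of eq. (15); this is
    equivalent to ranking by the distance d^2 = 2 - 2*N_t."""
    V = np.concatenate(V_list, axis=1)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    N = np.real(V @ V.conj().T)          # N_t(i, j)
    np.fill_diagonal(N, -np.inf)         # exclude self-matches
    return np.argsort(-N, axis=1)[:, :kappa]
```

For large n, the dense Gram matrix would be replaced by an approximate nearest neighbor search over the normalized embedding vectors.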
Nearest neighbor search: Based on the extended and normalized mapping i ↦ V̂_t(i)/‖V̂_t(i)‖, we define the multi-frequency vector diffusion distance d_MFVDM,t(i, j) between nodes i and j as

    d_{\mathrm{MFVDM},t}^2(i, j) = \left\| \frac{\hat{V}_t(i)}{\|\hat{V}_t(i)\|} - \frac{\hat{V}_t(j)}{\|\hat{V}_t(j)\|} \right\|^2 = 2 - 2 N_t(i, j),    (16)

which is the Euclidean distance between the mappings of i and j. We define the nearest neighbors of a node i to be the nodes j with smallest d²_MFVDM,t(i, j). Similarly, for VDM and DM we define the distances d_VDM,t and d_DM,t and perform the nearest neighbor search accordingly.

Figure 2. Illustration of MFVDM rotational alignment. Solid lines indicate the local frames at node i, dashed lines those at node j.

Rotational alignment: We notice that the eigenvectors of S_k encode the alignment information between neighboring nodes, as illustrated in Fig. 2. Assume that two nodes i and j are located at the same point of the base manifold, for example the same point on S², but their tangent-plane frames are oriented differently, with an in-plane rotational angle α_ij. Then the corresponding entries of the eigenvectors are vectors in the complex plane, and the following holds:

    u_l^{(k)}(i) = e^{\imath k \alpha_{ij}} u_l^{(k)}(j), \quad l = 1, 2, \ldots, n.    (17)

When i and j are close but not identical, (17) holds approximately. Recalling Remark 1, due to noise we approximate the alignment e^{ıkα_ij} using only the top m_k eigenvectors at each frequency k. We then use weighted least squares to estimate α_ij, which can be written as the optimization problem

    \hat{\alpha}_{ij} = \arg\min_{\alpha} \sum_{k=1}^{k_{\max}} \sum_{l=1}^{m_k} \left( \lambda_l^{(k)} \right)^{2t} \left| u_l^{(k)}(i) - e^{\imath k \alpha} u_l^{(k)}(j) \right|^2
                      = \arg\max_{\alpha} \mathrm{Re} \sum_{k=1}^{k_{\max}} \sum_{l=1}^{m_k} \left( \lambda_l^{(k)} \right)^{2t} u_l^{(k)}(i) \overline{u_l^{(k)}(j)} \, e^{-\imath k \alpha}
                      = \arg\max_{\alpha} \mathrm{Re} \sum_{k=1}^{k_{\max}} \hat{S}_k^{2t}(i, j) \, e^{-\imath k \alpha}.    (18)

To solve this, we define a sequence z and set z(k) for k = 1, 2, ..., k_max to be

    z(k) = \hat{S}_k^{2t}(i, j) = \sum_{l=1}^{m_k} \left( \lambda_l^{(k)} \right)^{2t} u_l^{(k)}(i) \overline{u_l^{(k)}(j)}.    (19)

According to (18) and (19), the alignment angle α̂_ij can be efficiently estimated by applying an FFT to the zero-padded sequence z and identifying its peak.
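The FFT-based peak search of (18)–(19) can be sketched as follows (the grid size T and the function name are our choices):

```python
import numpy as np

def estimate_alignment(z, T=1024):
    """Recover alpha_ij from z(k) = S_k^{2t}(i,j), k = 1..k_max (eq. (19)),
    by zero-padding and locating the peak of Re sum_k z(k) e^{-i k alpha}
    (eq. (18)) on the grid alpha = 2*pi*m/T via a single FFT."""
    kmax = len(z)
    zpad = np.zeros(T, dtype=complex)
    zpad[1:kmax + 1] = z
    # fft evaluates sum_k zpad[k] e^{-2 pi i k m / T}, i.e. the objective
    # of (18) sampled at alpha = 2 pi m / T
    vals = np.real(np.fft.fft(zpad))
    m = int(np.argmax(vals))
    return 2.0 * np.pi * m / T
```

The angular resolution is 2π/T, so T trades accuracy against the O(T log T) cost quoted in the complexity discussion.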
Due to the use of multiple unitary irreducible representations of SO(2), this approximation is more accurate and robust to noise than VDM. The improvement of the alignment estimation using higher-order trigonometric moments has also been observed in phase synchronization (Gao & Zhao, 2019).

Algorithm 1: Joint nearest neighbor search and alignment
Input: initial noisy nearest neighbor graph G = (V, E), the corresponding edge weights w_ij e^{ıkα_ij} defined on the edges, and truncation cutoffs m_k for k = 1, ..., k_max.
Output: κ nearest neighbors for each data point and the corresponding alignments α̂_ij.
1. For k = 1, ..., k_max:
   (a) Construct the affinity matrix W_k and the normalized matrix S_k according to (4) and (6).
   (b) Compute the largest m_k eigenvalues λ_1^{(k)} ≥ λ_2^{(k)} ≥ ... ≥ λ_{m_k}^{(k)} of S_k and the corresponding eigenvectors {u_l^{(k)}}_{l=1}^{m_k}.
   (c) Compute the truncated frequency-k embedding V̂_t^{(k)} according to (11).
2. Concatenate the truncated embeddings {V̂_t^{(k)}}_{k=1}^{k_max} and compute the normalized affinity by (15).
3. Identify the κ nearest neighbors of each data point.
4. Compute α̂_ij for the nearest neighbor pairs using (18).

Computational complexity: Our joint nearest neighbor search and alignment algorithm is summarized in Alg. 1. The computational complexity is dominated by the eigendecompositions: computing the top m_k eigenvectors of the sparse Hermitian matrices S_k for k = 1, ..., k_max requires O(Σ_{k=1}^{k_max} n(m_k² + m_k l)) flops, where l is the average number of non-zero elements per row of S_k (e.g., the number of nearest neighbors). If we use an identical truncation m (i.e., m_k = m for all k) and express the cost in terms of the mapping dimension d = k_max m², then the complexity is O(n(d + l√(k_max d))). For large d and moderate k_max, the dominant term is O(nd); therefore MFVDM and VDM (k_max = 1) have similar computational complexity for generating the mapping. Moreover, MFVDM can be accelerated by parallelizing over the frequencies k. Next, searching for the κ nearest neighbors takes O(nκd log n) flops.
The alignment step requires an FFT of the zero-padded z of length T; identifying the alignments therefore takes O(nκ(k_max m + T log T)) = O(nκ(√(k_max d) + T log T)) flops.

4. Analysis

We use a probabilistic model to illustrate the noise robustness of our embedding based on the top eigenvectors and eigenvalues of the W_k's. We start with the clean neighborhood graph, i.e., (i, j) ∈ E if i is among j's κ nearest neighbors or j is among i's κ nearest neighbors according to the G-invariant distances. We construct a noisy graph from the clean graph edges by the following process: with probability p, the distance d_ij is still small and we keep the edge between i and j; with probability 1 − p, we remove the edge (i, j) and link i to a vertex drawn uniformly at random from the remaining vertices not already connected to i. We assume that if the link between i and j is a random link, then the optimal alignment α_ij is uniformly distributed over [0, 2π). Our model thus assumes that the underlying graph of links between noisy data points is a small-world graph (Watts & Strogatz, 1998) on the manifold, with edges randomly rewired with probability 1 − p. The alignments take their correct values on the true links and random values on the rewired edges. The parameter p controls the signal-to-noise ratio of the graph connections, where p = 1 corresponds to the clean graph. Under this model, the matrix W_k is a random matrix. Since the expected value of the random variable e^{ıkθ} vanishes for θ ∼ Uniform[0, 2π), the expected value of the matrix W_k is

    \mathbb{E} W_k = p \, W_k^{\mathrm{clean}},    (20)

where W_k^clean is the clean matrix corresponding to p = 1, obtained when all links and angles are set up correctly.
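The rewiring model can be simulated directly; the following sketch is under our own naming assumptions:

```python
import numpy as np

def rewire_graph(edges, align, n, p, rng):
    """Noise model of Sec. 4: each clean edge (i, j) is kept with probability
    p (alignment unchanged); otherwise it is rewired to a uniformly random
    new endpoint and given an alignment drawn uniformly from [0, 2*pi)."""
    noisy_edges, noisy_align = [], []
    existing = set(map(tuple, edges))
    for (i, j), a in zip(edges, align):
        if rng.random() < p:
            noisy_edges.append((i, j))
            noisy_align.append(a)
        else:
            # rewire: new endpoint distinct from i and not already linked to i
            while True:
                jn = int(rng.integers(n))
                if jn != i and (i, jn) not in existing and (jn, i) not in existing:
                    break
            existing.add((i, jn))
            noisy_edges.append((i, jn))
            noisy_align.append(rng.uniform(0.0, 2 * np.pi))
    return noisy_edges, noisy_align
```

Setting p = 1 reproduces the clean graph; smaller p corrupts a larger fraction of edges and alignments.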
At a single frequency k, the matrix W_k can be decomposed as

    W_k = p \, W_k^{\mathrm{clean}} + R_k,    (21)

where R_k is a random matrix whose elements are independent and identically distributed (i.i.d.) zero-mean random variables with finite moments, since the elements of R_k are bounded for 1 ≤ k ≤ k_max. The top eigenvectors of W_k approximate the top eigenvectors of W_k^clean as long as the spectral norm of R_k is not too large. Various bounds on the spectral norm of random sparse matrices are proven in (Khorunzhy, 2001; Khorunzhiy, 2003). This ensures the noise robustness of each frequency-k-VDM. Combining an ensemble of classifiers can further boost performance (Zhou, 2012). Across different frequencies, the entries of the R_k are dependent through the relations of the irreducible representations; we will provide a detailed analysis across frequency channels in future work.

Spectral properties for SO(3): Motivated by the application to cryo-EM image analysis, we assume that the data points x_i are uniformly distributed over SO(3) according to the Haar measure. The base manifold characterized by the viewing directions v_i is the unit two-sphere S², and the pairwise alignment group is SO(2). Then e^{ıkα_ij} approximates the local parallel transport operator from T_{v_j}S² to T_{v_i}S² whenever x_i and x_j have similar viewing directions v_i and v_j satisfying ⟨v_i, v_j⟩ ≥ 1 − h, where h characterizes the size of the small spherical cap defining the neighborhood. The matrices W_k^clean approximate the local parallel transport operators P_h^{(k)}, which are integral operators over SO(3). These integral operators have the following spectral properties.

Theorem 1. The operator P_h^{(k)} has a discrete spectrum λ_l^{(k)}(h), l ∈ ℕ, with multiplicities equal to 2(l + k) − 1, for every h ∈ (0, 2]. Moreover, in the regime h ≪ 1, the eigenvalue λ_l^{(k)}(h) has the asymptotic expansion

    \lambda_l^{(k)}(h) = 1 - \frac{h}{2} - \frac{k + (l-1)(l+2k)}{8} h^2 + O(h^3).    (22)

The proof of Theorem 1 is detailed in Appendix A.1 of (Gao et al., 2019b).
Each eigenvalue λ_l^{(k)}(h), as a function of h, is a polynomial of degree l + k. This extends Theorem 3 of (Hadani & Singer, 2011) to frequencies k > 1. The multiplicities of the eigenvalues can be seen in the last column of Fig. 3 and in Fig. 11. A direct consequence of Theorem 1 is that the top spectral gap of P_h^{(k)} for small h > 0 can be obtained explicitly: when h ≪ 1, the top spectral gap is G^{(k)}(h) ≈ (1 + k)h²/4, which increases with the angular frequency. If we use the top m_k = 2k + 1 eigenvectors for the frequency-k-VDM, then from a perturbation-analysis perspective it is well known (see e.g. (Rohe et al., 2011; Eldridge et al., 2018; Fan et al., 2018) and the references therein) that the stability of the eigenmaps essentially depends on the top spectral gap. Therefore, we are able to jointly achieve a more robust embedding and nearest neighbor search under high levels of noise or a large number of outliers. Moreover, we are not restricted to the top 2k + 1 eigenvectors, and incorporating more eigenvectors can improve the results (Singer et al., 2011).

5. Experiments

5.1. Synthetic examples on the 2-D sphere and torus

We test MFVDM on two synthetic examples: the 2-D sphere S² and the torus T². For the first example, we simulate n = 10⁴ points x_i uniformly distributed over SO(3) according to the Haar measure. Each x_i can be represented by a 3 × 3 orthogonal matrix R_i whose determinant is equal to 1. The third column of the rotation matrix R_i (denoted v_i) gives a point on the manifold

    S^2 = \{ v \in \mathbb{R}^3 : \|v\| = 1 \}.    (23)

The pairwise alignment α_ij is computed from (2). The hairy ball theorem (Milnor, 1978) states that a continuous tangent vector field on the two-dimensional sphere must vanish at some point; therefore, we cannot identify angles α_i ∈ [0, 2π) for i = 1, ..., n such that α_ij = α_i − α_j for all i and j. As a result, we cannot globally align the tangent vectors.
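Haar-uniform rotations and their viewing directions can be generated, for instance, by QR factorization of Gaussian matrices, a standard construction (the sketch below is ours, not the paper's code):

```python
import numpy as np

def sample_so3(n, rng):
    """Sample n rotations Haar-uniformly on SO(3) via QR of Gaussian
    matrices (sign-fixed), and return the rotations together with the
    viewing directions v_i = third column, which are uniform on S^2."""
    Rs = np.empty((n, 3, 3))
    for i in range(n):
        A = rng.standard_normal((3, 3))
        Q, R = np.linalg.qr(A)
        Q = Q * np.sign(np.diag(R))     # fix column signs: Haar on O(3)
        if np.linalg.det(Q) < 0:        # restrict O(3) -> SO(3)
            Q[:, 2] *= -1
        Rs[i] = Q
    return Rs, Rs[:, :, 2]
```

The sign fix makes the QR factorization unique, which is what guarantees the Haar distribution; flipping one column maps the determinant −1 component onto SO(3) without disturbing uniformity.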
For the torus, we sample n = 10⁴ points uniformly distributed on the manifold, embedded in three-dimensional space according to

    x = (R + r\cos u)\cos v, \quad y = (R + r\cos u)\sin v, \quad z = r\sin u,    (24)

where R = 1, r = 0.2, and (u, v) ∈ [0, 2π) × [0, 2π). For each node i we assign an angle α_i uniformly distributed in [0, 2π); since a continuous tangent vector field exists on the torus, we set the pairwise alignment α_ij = α_i − α_j. For both examples, we connect each node with its top 150 nearest neighbors based on geodesic distance on the base manifold, and then corrupt the edges following the random graph model described in Sec. 4 with parameter p. Finally, we build the affinity matrices W_k for k = 1, 2, ..., k_max, setting the weights w_ij = 1 for (i, j) ∈ E.

Parameter setting: For MFVDM, we set the maximum frequency k_max = 50 and, for each k, select the top m_k = 50 eigenvectors. For VDM and DM, we set the number of eigenvectors to m = 50. In addition, we set the random walk step size t = 1.

Figure 3. S² case: bar plots of the 30 smallest eigenvalues 1 − λ^{(k)} of the graph connection Laplacian I − S_k on S², for different p's (p = 1, 0.4, 0.2) and k's.

Figure 4. S² case: the normalized d_MFVDM,t, d_VDM,t, and d_DM,t between a reference point (marked in red) and the other points, with t = 1, 10, and 100, at p = 1.

Spectral properties on S²: We numerically verify the spectrum of the graph connection Laplacian I − S_k on S² for different k and random rewiring parameters p; smaller p means that more edges are corrupted by noise. Fig. 3 shows that the eigenvalue multiplicities of S_k (the normalized W_k matrix) agree with Theorem 1. The spectral gaps persist even when 80% of the edges are corrupted (see the right column of Fig. 3).

Multi-frequency vector diffusion distances on S²: Based on (16), Fig.
4 displays the normalized and truncated multi-frequency vector diffusion distances d²_MFVDM,t(i, j), vector diffusion distances d²_VDM,t(i, j), and diffusion distances d²_DM,t(i, j) between a reference point (marked in red) and the other points, on S² at p = 1 (clean graph). Moreover, we increase the diffusion step size from t = 1 to t = 10 and 100. In this clean case, all three distances are highly correlated with the geodesic distance; in particular, MFVDM and VDM perform similarly.

To demonstrate the noise robustness of d_MFVDM,t, we compare d_MFVDM,t, d_VDM,t, and d_DM,t against the geodesic distance on S² in Fig. 5 at different noise levels. When p = 1, all the distances are highly correlated with the geodesic distance, e.g., small d_MFVDM,t, d_VDM,t, and d_DM,t all correspond to small geodesic distances. However, at high noise levels such as p = 0.4 or 0.2, both d_VDM,t and d_DM,t become more scattered, while d_MFVDM,t remains correlated with the geodesic distance. Here the random walk step is t = 10, and the results are similar for t = 1 or 100.

Figure 5. S² case: scatter plots comparing the normalized d_MFVDM,t, d_VDM,t, and d_DM,t against the geodesic distance at p = 0.2, 0.4, and 1.

Nearest neighbor search and rotational alignment: We test the nearest neighbor (NN) search and rotational alignment results on both the sphere and the torus at different noise levels p. As mentioned, one advantage of MFVDM is its robustness to noise: even at a high noise level, the true affinity between nearest neighbors can still be preserved. In our experiments, for each node we identify its κ = 50 nearest neighbors. We evaluate the NN search by the geodesic distance between each node and its identified nearest neighbors; a better method finds more neighbors with geodesic distance close to 0. In the top rows of Fig. 6 and Fig.
7 we show the histograms of these geodesic distances. Note that in the low-noise regime (p ≥ 0.2), MFVDM, VDM, and DM all perform well, with MFVDM slightly better. When the noise level increases to p = 0.1, both VDM and DM give poor results while MFVDM still works well. These comparisons show that MFVDM, which benefits from multiple irreducible representations, is very robust to noise.

We evaluate the rotational alignment estimation by computing the alignment errors α_ij − α̂_ij over all pairs of nearest neighbors (i, j), where α_ij is the ground truth and α̂_ij is the estimate. In the bottom rows of Fig. 6 and Fig. 7, we show the histograms of these alignment errors. The results demonstrate that for a wide range of p, i.e., p ≥ 0.1, the MFVDM alignment errors are closer to 0 than those of the baseline VDM. At p = 0.08, the VDM errors disperse between 0 and 180 degrees, whereas a large number of the MFVDM alignment errors remain close to 0.

At each frequency k, we can individually perform NN search based on the frequency-k-VDM and the corresponding affinity in (12). For the S² example, we find that all single-frequency mappings achieve similar accuracies when the m_k are identical (see Fig. 8). MFVDM combines these weak single-frequency classifiers into a strong classifier that boosts the accuracy of the nearest neighbor search.

Figure 6. S² case: top: histograms of the viewing direction differences between nearest neighbors found by MFVDM, VDM, and DM; bottom: the accuracy of the rotational alignment estimated by MFVDM and VDM (p = 0.2, 0.1, 0.08).

Figure 7. T² case: top: histograms of the geodesic distances between nearest neighbors identified by MFVDM, VDM, and DM; bottom: the accuracy of the rotational alignment estimated by MFVDM and VDM (p = 0.2, 0.1, 0.08).

Figure 8. S² case: weak classifiers versus strong classifier: histograms of the angles between nearest neighbors found using a single frequency-k-VDM (weak classifier) and MFVDM with k = 1, ..., k_max (strong classifier, shown as "all"). Here k_max = 10.

Choice of parameters: The performance of MFVDM depends on two parameters: the maximum frequency cutoff k_max and the number of top eigenvectors m_k. We assume the m_k are the same for all frequencies, i.e., m_1 = m_2 = ... = m_{k_max}. In the top row of Fig. 9, we show the average geodesic distance between the nearest neighbor pairs identified by MFVDM for different values of k_max and m_k. First, we fix m_k = 50 and vary k_max: the performance of MFVDM improves with increasing k_max and plateaus as k_max approaches 50 (see the upper left panel of Fig. 9). Then we fix k_max = 10 and vary m_k: the upper right panel of Fig. 9 shows that m_k = 50 achieves the best performance. Using a larger number of eigenvectors, e.g., m_k = 100, does not lead to higher accuracy in the nearest neighbor search, because the eigenvectors of S_k with small eigenvalues are more sensitive to noise, and including them reduces the noise robustness of the mappings.

Figure 9. S² case: nearest neighbor search by MFVDM, VDM, and DM under varying parameters (maximum frequency k_max and number of eigenvectors m_k). Upper left: MFVDM with varying k_max and m_k = 50; upper right: MFVDM with varying m_k and k_max = 10; lower left: VDM with varying m (k_max = 1); lower right: DM with varying m. Horizontal axis: the parameter p of the random graph model (lower p means more outliers in the edge connections); vertical axis: the average geodesic distance of nearest neighbor pairs (lower is better).

In addition, we evaluate the performance of VDM and DM under a varying number of eigenvectors m in the bottom row of Fig. 9; VDM and DM also achieve their best performance at m = 50. Comparing the upper left and lower left panels of Fig. 9, we find that MFVDM greatly improves the nearest neighbor search accuracy over VDM even when 90% of the true edges are rewired. Note that the solid blue line in the upper left panel of Fig. 9 corresponds to the best performance curve in the lower left panel (green line, m = 50).

5.2. Application: Cryo-EM 2-D image analysis

MFVDM is motivated by the cryo-EM 2-D class averaging problem. In these experiments, protein samples are frozen in a very thin ice layer, and each image is a tomographic projection of the protein density map at an unknown random orientation. Each image is associated with a 3 × 3 rotation matrix R_i, whose third column indicates the projection direction v_i, which can be realized as a point on S². Projection images I_i and I_j that share the same view look the same up to some in-plane rotation. The goal is to identify images with similar views, and then perform local rotational alignment and averaging to denoise the images.
Therefore, MFVDM is suitable for performing the nearest neighbor search and rotational alignment estimation. In our experiment, we simulate n = 10^4 projection images from a 3-D electron density map of the 70S ribosome (see Fig. 10). The orientations of the projection images are uniformly distributed over SO(3), and the images are contaminated by additive white Gaussian noise at signal-to-noise ratio (SNR) equal to 0.05. Note that such a high noise level is commonly observed in real experiments. In Fig. 10, we display samples of the clean and noisy images.

Figure 10. Cryo-EM 2-D image analysis: Left: reference volume of the 70S ribosome; Middle: clean projection images; Right: noisy projection images at SNR = 0.05.

Figure 11. Bar plots of the 30 smallest eigenvalues of the graph connection Laplacian I − Sk (k = 1 and k = 4) built upon the initial NN search and alignment results on cryo-EM images (MFVDM), and the corresponding eigenvalues of the steerable graph Laplacian (SGL) in (Landa & Shkolnisky, 2018).

As preprocessing, we use fast steerable PCA (sPCA) (Zhao et al., 2016) and rotationally invariant features (Zhao & Singer, 2014) to initially identify the images with similar views and the in-plane rotational alignment angles according to (Zhao & Singer, 2014). We then take the initial graph structure and the estimates of the optimal alignments as the input of MFVDM. As a comparison to (4), we also consider the kernel introduced in the steerable graph Laplacian (SGL) (Landa & Shkolnisky, 2018), which is defined on the images together with all their rotated versions. Then, similar to the synthetic examples, in Fig. 11 we present the spectrum of the graph connection Laplacian I − Sk. A spectral gap clearly exists in both the clean and noisy cases. Fig. 12 shows the NN search and rotational alignment results.
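The spectral-gap diagnostic of Fig. 11 is straightforward to compute. Below is a minimal sketch (our illustration, not the authors' code), assuming a symmetric 0/1 adjacency matrix `W` and an antisymmetric matrix `alpha` of estimated alignment angles:

```python
import numpy as np

def connection_laplacian_spectrum(W, alpha, k, n_eigs=30):
    """Smallest eigenvalues of the frequency-k graph connection
    Laplacian I - S_k, with S_k = D^{-1/2} (W * e^{ik*alpha}) D^{-1/2}."""
    deg = np.maximum(W.sum(axis=1), 1e-12)
    d = 1.0 / np.sqrt(deg)
    Sk = (W * np.exp(1j * k * alpha)) * d[:, None] * d[None, :]
    # S_k is Hermitian because W is symmetric and alpha is antisymmetric.
    eigs = 1.0 - np.linalg.eigvalsh(Sk)
    return np.sort(eigs)[:n_eigs]

# On a clean, fully consistent graph (alpha[i, j] = theta_i - theta_j),
# the smallest eigenvalue of I - S_k is 0, followed by a large gap.
n = 12
theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
alpha = theta[:, None] - theta[None, :]
W = np.ones((n, n)) - np.eye(n)
spectrum = connection_laplacian_spectrum(W, alpha, k=1, n_eigs=5)
```

In the noisy regime, the size of this gap indicates how much reliable cycle-consistent alignment information survives at frequency k.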
We set t = 10, kmax = 10, mk = 10, and m = 10 for MFVDM, VDM, and DM, respectively. Although using the SGL kernel achieves slightly better NN search results, its performance on alignment estimation is worse than that of MFVDM.

Figure 12. Nearest neighbor search and rotational alignment results for simulated cryo-EM images of the 70S ribosome at SNR = 0.05. Left: distribution of the angles between the viewing directions of nearest neighbors (MFVDM, SGL, VDM, DM, sPCA); Right: rotational alignment accuracy (MFVDM, SGL, VDM, sPCA). See text for the method descriptions.

6. Discussion

In the current probabilistic model, we only consider independent edge noise, i.e., the entries in Rk for a fixed k are independent. This does not cover the measurement scenarios in some applications. For example, in cryo-EM 2-D image analysis, each image is corrupted by independent noise. Therefore, the entries in Rk become dependent, since the edge connections and alignments are affected by the noise in each image node. Empirically, our new algorithm is still applicable and results in improved nearest neighbor search and rotational alignment estimation compared to the state-of-the-art VDM. We leave the analysis of node-level noise to future work. In addition, there can be other approaches to defining the multi-frequency mapping, such as weighted averaging among different frequencies or majority voting. We will explore other ways to integrate multi-frequency information in the future. The current analysis focuses on data points that are uniformly distributed on the manifold. For non-uniformly distributed points, different normalization techniques introduced in DM (Coifman & Lafon, 2006) and (Zelnik-Manor & Perona, 2005) are needed to compensate for the sampling density.
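As a toy illustration of the majority-voting alternative mentioned above (a hypothetical variant, not part of the paper's algorithm), per-frequency candidate neighbor lists could be combined as follows:

```python
import numpy as np

def majority_vote_neighbors(candidate_lists, n_neighbors=5):
    """Combine per-frequency candidate neighbor lists by majority voting.

    candidate_lists : list of (n, m) integer arrays, one per frequency k;
                      row i holds the m candidate neighbors of node i
                      proposed by the frequency-k embedding.
    Returns an (n, n_neighbors) array of the most frequently proposed
    neighbors per node.
    """
    n = candidate_lists[0].shape[0]
    votes = np.zeros((n, n), dtype=int)
    for cand in candidate_lists:
        for i in range(n):
            votes[i, cand[i]] += 1  # one vote per proposed neighbor
    np.fill_diagonal(votes, -1)  # never vote for self
    # Highest vote counts first; ties broken by index via stable argsort.
    return np.argsort(-votes, axis=1, kind="stable")[:, :n_neighbors]
```

A neighbor proposed consistently across many frequencies is more likely to be a true neighbor, which is the same ensemble rationale that motivates the affinity-based combination used in MFVDM.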
Since our framework is motivated by the cryo-EM nearest neighbor image search and alignment, we have so far only considered compact manifolds M of intrinsic dimension 2, where the local parallel transport operator can be well approximated by the in-plane rotational alignment of the images or the alignment of the local tangent bundles, as discussed in VDM (Singer & Wu, 2012). In the future, we will extend the current algorithm to manifolds with higher intrinsic dimension and to other compact group alignments g ∈ G with their corresponding irreducible representations ρk(g), for example, the symmetric group, which is widely used in computer vision (Bajaj et al., 2018).

7. Conclusion

In this paper, we have introduced MFVDM for joint nearest neighbor search and rotational alignment estimation. The key idea is to extend VDM using multiple irreducible representations of the compact Lie group. Enforcing the consistency of the rotational transformations at different frequencies allows us to achieve better nearest neighbor identification and to accurately estimate the alignments between the updated nearest neighbor pairs. The approach is based on the spectral decomposition of multiple kernel matrices. We use random matrix theory and the rationale of ensemble methods to justify the robustness of MFVDM. Experimental results show the efficacy of our approach compared to the state-of-the-art methods. This general framework can be applied to many other problems, such as joint synchronization and clustering (Gao et al., 2019a) and multi-frame alignment in computer vision.

References

Bajaj, C., Gao, T., He, Z., Huang, Q., and Liang, Z. SMAC: Simultaneous mapping and clustering using spectral decompositions. In International Conference on Machine Learning, pp. 334–343, 2018.

Belkin, M. and Niyogi, P. Laplacian eigenmaps and spectral techniques for embedding and clustering. In NIPS, 2002.

Belkin, M. and Niyogi, P.
Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 2003.

Belkin, M., Niyogi, P., and Sindhwani, V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7(Nov):2399–2434, 2006.

Coifman, R. R. and Lafon, S. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.

Dashti, A., Schwander, P., Langlois, R., Fung, R., Li, W., Hosseinizadeh, A., Liao, H. Y., Pallesen, J., Sharma, G., Stupina, V. A., et al. Trajectories of the ribosome as a Brownian nanomachine. Proceedings of the National Academy of Sciences, 111(49):17492–17497, 2014.

Donoho, D. L. and Grimes, C. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 100(10):5591–5596, 2003.

Eldridge, J., Belkin, M., and Wang, Y. Unperturbed: spectral analysis beyond Davis-Kahan. In Algorithmic Learning Theory, pp. 321–358, 2018.

Fan, J., Wang, W., and Zhong, Y. An ℓ∞ eigenvector perturbation bound and its application. Journal of Machine Learning Research, 18(207):1–42, 2018.

Gao, T. and Zhao, Z. Multi-frequency phase synchronization. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 2132–2141, 2019.

Gao, T., Brodzki, J., and Mukherjee, S. The geometry of synchronization problems and learning group actions. Discrete & Computational Geometry, 2019a.

Gao, T., Fan, Y., and Zhao, Z. Representation theoretic patterns in multi-frequency class averaging for three-dimensional cryo-electron microscopy. arXiv preprint arXiv:1906.01082, 2019b.

Giannakis, D., Schwander, P., and Ourmazd, A. The symmetries of image formation by scattering. I. Theoretical framework. Optics Express, 20(12):12799–12826, 2012.

Goldberg, A., Zhu, X., Singh, A., Xu, Z., and Nowak, R. Multi-manifold semi-supervised learning.
In Artificial Intelligence and Statistics, pp. 169–176, 2009.

Gong, D., Sha, F., and Medioni, G. Locally linear denoising on image manifolds. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 265–272, 2010.

Hadani, R. and Singer, A. Representation theoretic patterns in three-dimensional cryo-electron microscopy II: the class averaging problem. Foundations of Computational Mathematics, 11(5):589–616, 2011.

Khorunzhiy, O. Rooted trees and moments of large sparse random matrices. In Discrete Mathematics and Theoretical Computer Science, pp. 145–154, 2003.

Khorunzhy, A. Sparse random matrices: spectral edge and statistics of rooted trees. Advances in Applied Probability, 33(1):124–140, 2001.

Landa, B. and Shkolnisky, Y. The steerable graph Laplacian and its application to filtering image datasets. SIAM Journal on Imaging Sciences, 11(4):2254–2304, 2018.

Milnor, J. Analytic proofs of the hairy ball theorem and the Brouwer fixed point theorem. The American Mathematical Monthly, 85(7):521–524, 1978.

Nadler, B., Lafon, S., Kevrekidis, I., and Coifman, R. R. Diffusion maps, spectral clustering and eigenfunctions of Fokker-Planck operators. In Advances in Neural Information Processing Systems, pp. 955–962, 2006.

Rohe, K., Chatterjee, S., and Yu, B. Spectral clustering and the high-dimensional stochastic block model. The Annals of Statistics, 39(4):1878–1915, 2011.

Roweis, S. T. and Saul, L. K. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.

Schwander, P., Giannakis, D., Yoon, C. H., and Ourmazd, A. The symmetries of image formation by scattering. II. Applications. Optics Express, 20(12):12827–12849, 2012.

Singer, A. and Wu, H.-T. Vector diffusion maps and the connection Laplacian. Communications on Pure and Applied Mathematics, 65(8):1067–1144, 2012.

Singer, A., Shkolnisky, Y., and Nadler, B.
Diffusion interpretation of nonlocal neighborhood filters for signal denoising. SIAM Journal on Imaging Sciences, 2(1):118–139, 2009.

Singer, A., Zhao, Z., Shkolnisky, Y., and Hadani, R. Viewing angle classification of cryo-electron microscopy images using eigenvectors. SIAM Journal on Imaging Sciences, 4(2):723–759, 2011.

Tenenbaum, J. B., De Silva, V., and Langford, J. C. A global geometric framework for nonlinear dimensionality reduction. Science, 2000.

Von Luxburg, U. A tutorial on spectral clustering. Statistics and Computing, 2007.

Watts, D. J. and Strogatz, S. H. Collective dynamics of 'small-world' networks. Nature, 393(6684):440, 1998.

Yang, Z., Cohen, W., and Salakhudinov, R. Revisiting semi-supervised learning with graph embeddings. In International Conference on Machine Learning, pp. 40–48, 2016.

Zelnik-Manor, L. and Perona, P. Self-tuning spectral clustering. In Advances in Neural Information Processing Systems, pp. 1601–1608, 2005.

Zhao, Z. and Singer, A. Rotationally invariant image representation for viewing direction classification in cryo-EM. Journal of Structural Biology, 186(1):153–166, 2014.

Zhao, Z., Shkolnisky, Y., and Singer, A. Fast steerable principal component analysis. IEEE Transactions on Computational Imaging, 2(1):1–12, 2016.

Zhou, Z.-H. Ensemble methods: foundations and algorithms. Chapman and Hall/CRC, 2012.

Zhu, X. Semi-supervised learning literature survey. Computer Science, University of Wisconsin-Madison, 2(3):4, 2006.