# multifrequency_phase_synchronization__9222de0a.pdf Multi-Frequency Phase Synchronization Tingran Gao * 1 Zhizhen Zhao * 2 Abstract We propose a novel formulation for phase synchronization the statistical problem of jointly estimating alignment angles from noisy pairwise comparisons as a nonconvex optimization problem that enforces consistency among the pairwise comparisons in multiple frequency channels. Inspired by harmonic retrieval in signal processing, we develop a simple yet efficient two-stage algorithm that leverages the multi-frequency information. We demonstrate in theory and practice that the proposed algorithm significantly outperforms state-of-the-art phase synchronization algorithms, at a mild computational costs incurred by using the extra frequency channels. We also extend our algorithmic framework to general synchronization problems over compact Lie groups. 1. Introduction Angular or phase synchronization (Singer, 2011; Boumal, 2016) concerns estimating angles θ1, . . . , θn in [0, 2π) from a subset of possibly noise-contaminated relative offsets (θi θj) mod 2π. An instance of phase synchronization can be encoded on an observation graph G = (V, E), where each angle is assigned to a vertex i V and relative offsets are measured between θi and θj if and only if there is an edge in G connecting vertices i and j. Equivalently, the angles can be encoded into a column phase vector z = (exp ιθ1, , exp ιθn) , and measurements constitute a Hermitian matrix H = A [zz + ] , (1) where A is the adjacency matrix of the observation graph G, is the entrywise product, and the Hermitian matrix *Equal contribution 1Committee on Computational and Applied Mathematics, Department of Statistics, University of Chicago, Chicago IL, USA 2Department of Electrical and Computer Engineering, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana IL, USA. Correspondence to: Tingran Gao , Zhizhen Zhao . Proceedings of the 36 th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019. Copyright 2019 by the author(s). Cn n encodes measurement noise. As a prototypical example of more general synchronization problems arising from many scientific fields concerning consistent pairwise comparisons within large collections of objects (e.g., cryogenic electron microscopy (Singer et al., 2011) and comparative biology (Gao et al., 2019a)), phase synchronization attracted much attention due to its simple yet rich mathematical structure. One mathematical formulation is through nonconvex optimization max x Cn 1 x Hx (2) where Cn 1 is the Cartesian product of n copies of U(1). Depending on the context of the scientific problem, H may be assumed to arise from an additive Gaussian noise model (Boumal, 2016; Bandeira et al., 2017), in which the Hermitian matrix in (1) is a Wigner matrix with i.i.d. complex Gaussian entries above the diagonal, or from a random corruption model (Singer, 2011; Chen et al., 2016) that assumes ( zi zj with prob. r [0, 1] w Unif (U(1)) with prob. 1 r (3) for each edge (i, j) E. Note that the random corruption model can also be cast in the form (1) after proper shifting and scaling. In general, the additive Gaussian noise model is more amenable to analysis, while the random corruption model is better at capturing the behavior of physical or imaging models where many outliers exist. In this paper, we propose to tackle the phase synchronization problem by solving an alternative nonconvex optimization problem of multi-frequency nature, namely, k=1 (xk) H(k)xk (4) where kmax is the number of frequency channels, xk is the entrywise kth power of x, and H(k) Cn n is a Hermitian matrix containing information of the true signal z in the kth frequency component: For the random corruption model (3), we construct H(k) directly from H by entrywise power: H(k) ij = Hk ij, k = 1, . . . , kmax; 1 i, j n. (5) Multi-Frequency Phase Synchronization For the additive Gaussian noise model, following (Bandeira et al., 2015; Perry et al., 2018), we assume H(k) = A [zk (z )k + σk (k)], (6) where each (k) is a complex Hermitian random matrix with independent upper diagonal entries, and the scaling σk is chosen such that the operator norm of (k) is upper bounded by n. Unlike (Bandeira et al., 2015; Perry et al., 2018), we allow entries of (k) to be general sub-Gaussian random variables rather than restrictively complex Gaussian, and we do not assume independence of the (k) s across different k s. We treat the two types of noise (5) (6) in a unified model, under which we design and analyze our multi-frequency phase synchronization algorithm. We demonstrate surprising theoretical and empirical results that drastically outperform all existing phase synchronization algorithms in their corresponding settings, measured in terms of the correlation between the output and the true phase vector z, at a mild increase in the computational cost incurred by parallelizing the computation in kmax frequency channels. As will be demonstrated in Section 4, in the noise regime where phase synchronization is tractable, the number of frequencies kmax needed to outperform single frequency algorithms is at most polylogarithmically dependent on the problem size n, while the estimation error decays polynomially in kmax. Motivation The rationale behind the multi-frequency formulation (4) lies at the observation that statistical estimation can often benefit from higher moment estimates, even without introducing new measurements. As a motivating example, let G be a complete graph, and consider the following kmax = 2 coupled problems: max x Cn 1 x H(1)x; max x Cn 1 x2 H(2)x2 (7) where H(1) = H, and H(2) is generated according to (5). Up to rescaling by a factor of 1/r, H(1) and H(2) fit into model (6) with σk (k) ij = ( (1 r) zi zj with prob. r [0, 1] eιkϕij rzi zj with prob. 1 r (8) where ϕij are i.i.d. uniform on R/2π for (i, j) E, and ϕji = ϕij. Note that σ1 (1) and σ2 (2) are by no means independent, but for all practical purposes satisfy the same sub-Gaussian bounds since eιϕij and eι2ϕij are identically distributed; we thus assume without loss of generality that σ1 = σ2. If we can find ˆx Cn 1 satisfying jointly (ˆxk) H(k)ˆxk (zk) H(k)zk k = 1, 2 (9) then, by Lemma 1 of (Boumal, 2016) (assuming without loss of generality that ˆx z = |ˆx z|), we have for k = 1, 2 ˆxk zk 2 2 = 2(n |z ˆx|k) 16σ2 (k) 2 2/n, (10) which gives |z ˆx| maxk=1,2 n n 8σ2 (k) 2 2/n 1 k o , a tighter bound than one could obtain from (9) with k = 1 alone, especially for large σ (with n 8σ2 (k) 2 2/n < 1). The lesson we learn from this motivating example is that statistical estimation can benefit from leveraging higher-order moment information, even when the moment measurements are not essentially independent of each other. This is particularly prominent for the random corruption model, where all the higher-order trigonometric moments in H(k), k > 1 come from the first moments in H = H(1) by taking entrywise powers. In drastic contrast is the message-passing algorithm in (Perry et al., 2018), for which independence of the complex Gaussian Wigner noise (k) s across the frequency channels play an essential role. The AMP approach was motivated by the non-unique games (NUG) framework in (Bandeira et al., 2015). Our algorithm follows an efficient two-stage paradigm (initialization and iterative refinement) popularized by recent progress in nonconvex optimization (see, e.g. (Candes et al., 2015; Chen & Candes, 2015)), and combines the trigonometric moments information across frequency channels in a manner akin to classical harmonic retrieval techniques in signal processing (Stoica & Moses, 1997; Tufts & Kumaresan, 1982; Bresler & Macovski, 1986; Ziskind & Wax, 1988; Schmidt, 1986; Roy & Kailath, 1989; Sorensen & De Lathauwer, 2017a;b) and the generalized power method (Boumal, 2016). This strategy easily extends to synchronization over general compact Lie groups, as illustrated in Section 5. Notations Upper case letters A, B, C, and lower case letters a, b, c, will be used to denote matrices and vectors, respectively. A , A are the transpose of A with or without conjugation, respectively. The entrywise (Hadamard) product of matrix A and B will be denoted as A B. Graphs G = (V, E) are always undirected and connected. Vertices of the graph will be denoted as integers 1, 2, , |V |; pairs of integers (i, j) denote edges in E. For n N we write [n] := {1, , n}. Norms 2, stand for matrix or vector norms, depending on the context; op, F are matrix operator and Frobenius norms, respectively. The Cartesian product of n copies of U(1) is denoted as Cn 1. The quotient space R/2π is identified with the unit circle. 2. Related Work Phase synchronization Directly solving (2) is NP-hard (Zhang & Huang, 2006), but many convex and nonconvex methods have been proposed to find high quality approximate solutions. These include spectral and semi-definite programming (SDP) relaxations (Singer, 2011; Cucuringu et al., 2012; Chaudhury et al., 2015; Bandeira et al., 2016; 2017). An alternative approach using generalized power method (GPM) is also studied (Boumal, 2016; Liu et al., 2017; Zhong & Boumal, 2018). Multi-Frequency Phase Synchronization Phase synchronization in multiple frequency channels (Bandeira et al., 2015) proposed the non-unique games (NUG) SDP optimization framework for synchronization over compact Lie groups. The SDP is based on quadratically lifting the irreducible representations of the group elements, and imposing consistency among variables across frequency channels via a F ejer kernel; it is computationally expensive. (Perry et al., 2018) introduced an iterative approximate message passing (AMP) algorithm for noise model (6), assuming the noise are Gaussian and independent across frequency channels. Each iteration of the AMP performs matrix-vector multiplication and entrywise nonlinear transformation, followed by an extra Onsager correction term; it is conjectured to be asymptotically optimal. The multifrequency methodology has also been applied to cryo-EM image analysis (Fan & Zhao, 2019a;b; Gao et al., 2019b). 3. Algorithm In this section we formally state the two-stage multifrequency phase synchronization algorithmic paradigm. Stage One combines phase synchronization outcomes from individual frequency channels with harmonic retrieval, aiming at producing a high-quality initialization; Stage Two iteratively refines an input by an extended generalized power method that works concurrently in multiple frequency channels while striving to maintain entrywise consistency. 3.1. Stage One: Initialization Strategy Our algorithm takes as input kmax Hermitian measurement matrices H(k), k = 1, . . . , kmax, arising from the general sub-Gaussian model (6) (which includes (5) as a special case). This stage can be divided into three steps. Step 1. Individual Frequency Synchronization: Apply any phase synchronization algorithm (spectral/SDP relaxation or GPM) to get phase vector estimate u(k) Cn from each H(k), k = 1, , kmax, and form W (k) := u(k)(u(k)) ; Step 2. Entrywise Harmonic Retrieval: For each (i, j) E, use any harmonic retrieval technique to estimate θi θj from W (k) ij , k = 1, 2, , kmax, call the estimators ˆθij; Step 3. Final Phase Synchronization: Construct another Hermitian matrix b H Cn n by b Hij := eιˆθij, and apply any phase synchronization algorithm to estimate the true phases {eιθ1, , eιθn} from matrix b H. The flexibility of the multi-frequency phase synchronization framework lies at the various choices to be made in each step. As a concrete example, we detail in Algorithm 1 a simple version that uses spectral relaxation for phase synchronization and periodogram-based harmonic retrieval. We will henceforth refer to Algorithm 1 as the periodogram peak extraction with spectral methods (PPE-SPC). If a different Algorithm 1 Periodogram Peak Extraction with Spectral Methods (PPE-SPC) Input: Hermitian matrices H(k) | 1 k kmax Output: Initialization ˆx Cn 1 Step 1: Individual Frequency Synchronization for k = 1 to kmax do Extract the leading eigenvector u(k) of H(k) with scaling u(k) 2 = n W (k) u(k) u(k) end for Step 2: Entrywise Harmonic Retrieval for i = 1 to n do ˆθij argmax φ R/2π k=1 W (k) ij e ιkφ ) end for Step 3: Final Phase Synchronization Construct Hermitian b H Cn n by b Hij = eιˆθij Extract the leading eigenvector ˆu = (ˆu1, . . . , ˆun) of b H ˆx (ˆu1/|ˆu1|, . . . , ˆun/|ˆun|) phase synchronization method is used, for instance, SDP relaxation, our nomenclature refers to it as PPE-SDP. We will focus on analyzing PPE-SPC in depth in Section 4, but the analysis strategy can be seamlessly carried in principle to other variants of this algorithmic paradigm. We briefly motivate the argmax operation in Step 2 as follows. If our measurement matrices are noise-free, then the (i, j)th entry of W (k) from Step 1 should equal to eιk(θi θj); in this case, the goal of Step 2 is to reconstruct (θi θj) from its trigonometric moments, for which any harmonic retrieval technique can be applied; the periodogram method in Algorithm 1 is among the most na ıve approach for this purpose. For clean signal, the periodogram |Re{Pkmax k=1 W (k) ij e ιkφ}| is equal to the modulus of the Dirichlet kernel Dirkmax (θij θ) := k= kmax eιk(θij θ) (11) which attains its maximum at θij = θi θj (mod2π). Since the peak of Dirkmax becomes sharper and sharper as kmax increases, we expect the periodogram peak identification step to be robust to noise, which will produce a very high quality estimate b H for Step 3. In fact, our analysis in Section 4 suggests that this initialization stage alone can produce highly accurate phase vectors for sufficiently large kmax, and the estimation error drops inverse-polynomially in kmax. 3.2. Stage Two: Iterative Refinement In this stage, we use an iterative refinement scheme that takes an initial phase vector and enhances it successively. Multi-Frequency Phase Synchronization In our implementation we warm-start this iterative algorithm with the ˆx produced from the PPE-SPC Algorithm 1, but any initialization scheme can be applied in principle, including random initialization. This iterative refinement concurrently performs the generalized power method (GPM) (Boumal, 2016) in multiple frequency channels consistently: at each frequency k, we perform power iteration by multiplication with H(k); the results are combined across frequency channels to obtain one periodogram for each vertex i followed by a soft harmonic retrieval step that soft-thresholds (Donoho, 1995) the periodogram in frequency domain. We pick a relatively lower threshold at the beginning of this iterative scheme, but gradually raise the threshold over 0.99 to reveal the true peak that persists. Details can be found in Algorithm 2, henceforth referred to as multi-frequency generalized power method (MFGPM). Algorithm 2 Multi-Frequency Generalized Power Method (MFGPM) Input: Hermitian matrices H(k) | 1 k kmax , initialization ˆx Cn 1; threshold τ Output: Phase vector ˆz Cn 1 for k = 1 to kmax do end for for t = 1 to T do for k = 1 to kmax do u(k,t) H(k)z(k,t 1). end for for i = 1 to n do h(t) i (θ) = Re k=1 u(k,t) i eıkθ ) Soft-thresholding: ˆu(k,t) i R 2π 0 ητ(h(t) i (θ))e ıkθdθ Rounding: z(k,t) i ˆu(k,t) i |ˆu(k,t) i | end for end for MFGPM can be viewed as an iterative version of PPE-SPC, except that the stringent peak extraction step is replaced with the more malleable soft-thresholding. Periodograms h(t) i are virtually the Dirichlet kernels, which truncate a Dirac delta function in the frequency domain; one can also take Ces aro means of these periodograms, or equivalently, work with the F ejer kernels that are known to converge faster to the Dirac delta function. We omit those results as no significant difference is observed in performance. As an integral part of our two-stage algorithmic framework, MFGPM works most efficiently with initialization from PPE-SPC, but we also observed empirically that the MFGPM outperforms other methods given identical random initialization, illustrated in Figure 1, in the sense that Figure 1. Evolution of correlations between iterates and the true phase vector z Cn 1 for three iterative algorithms with the same random initialization. The three iterative algorithms in comparison are generalized power method (GPM, (Boumal, 2016)), multifrequency generalized power method (MFGPM, Algorithm 2), and approximate message passing (AMP, (Perry et al., 2018)). The figure illustrates a typical success run (left) where MFGPM started with a random initialization of low correlation (around 0.1) but terminated with an output of correlation > 0.5, along with a failure run (right) started with another random initialization of comparable correlation but terminated with correlation < 0.5. In both circumstances, outputs from MFGPM attain higher correlation than GPM or AMP. Input data are generated from the random corruption model (5) with r = 0.1 and n = 100. See Figure 3 for more systematic comparison results under this noise model. MFGPM often produces phase vectors that correlate more strongly with the true phase vector z. See Section 6 for more comprehensive comparisons results. The computational complexities of PPE-SPC and MFGPM are O kmaxn3 and O Tkmaxn2 , respectively. 4. Analysis In this section we analyze PPE-SPC in theory, under the general sub-Gaussian noise model (6). We assume the observation graph G is generated from a Erd os R enyi model with edge connectivity p [0, 1] independent of the (k) s. Assumption 1 For σ > 0 and each k [kmax], assume H(k) = A [zk zk + σ (k)] (12) where zk Cn 1 is the entrywise kth power of z, and (k), k = 1, . . . , kmax are complex random Wigner matrices satisfying the following assumptions: (1) For any fixed k [kmax], {Re( (k) ℓj ), Im( (k) ℓj ) | 1 ℓ< j n} are jointly independent with zero mean, and unit sub-Gaussian norm (Vershynin, 2018); (2) (k) ii = 0 for all 1 k kmax and 1 i n; (3) (k) ℓj = (k) jℓfor all k = 1, . . . , kmax and ℓ< j. Furthermore, assume A is the adjacency matrix of a Erd os R enyi random graph independent of all the (k) s, with edge connecting probability p [0, 1]. Multi-Frequency Phase Synchronization We emphasize again that Assumption 1 assumes no independence for the (k) s across frequency channels; only entries within the same (k) are assumed independent. As explained in Introduction, this enables us to unify our discussions on the random corruption model and additive Gaussian model in a single pass (see e.g., (8)). Another advantage for such generality is that we can focus on analyzing complete observation graphs, since EH(k) = pzk zk p In where In is the identify matrix of dimension n-by-n, and thus we can apply the theoretical analysis in this section to 1 p H(k) + p In = zk zk + E(k) where n A [zk zk + σ (k)] o zk zk + In satisfies the same conditions as (k) in Assumption 1 with different absolute constants. Therefore, in the rest of this section we focus on complete observation graph G only, i.e., H(k) = zk zk + σ (k), 1 k kmax. (13) Our first goal is to understand the spectral method in PPESPC Step 1 and Step 3. Since Step 2 is entrywise, it is crucial to bound the ℓ distance between zk and the leading eigenvector u(k) (scaled to u(k) 2 = n). The proof of the following Lemma 1 uses recent ℓ perturbation results of eigenvectors of random matrices (Eldridge et al., 2017; Abbe et al., 2017; Fan et al., 2018; Zhong & Boumal, 2018) and can be found in the supplemental material. Lemma 1 Assume Assumption 1 is satisfied, and the observation graph G is a complete graph. Let ϵ (0, 2] be an arbitrarily chosen but fixed absolute constant. For any k [kmax], denote u(k) for the leading eigenvector of H(k) scaled such that u(k) 2 = n and (zk) u(k) = |(zk) u(k)|. There exist absolute (in particular, independent of k and n) constants c0, C0, C2 > 0 such that, if σ < c0 p n/ log n, there holds with probability 1 O n (2+ϵ) u(k) zk C0σ p log n/n, (14) W (k) ij zk i zk j C2σ p log n/n. (15) The inequality (15) is a direct consequence of (14), which is identical to Theorem 8 of (Zhong & Boumal, 2018), but we verify in the proof that the event probability 1 O(n 2) in (Zhong & Boumal, 2018) can be made slightly higher. This is necessary for taking the union bound across all O(n2) entries in the main Theorem 2. A quick consequence of Lemma 1 is the uniform proximity of the periodogram to a Dirichlet kernel up to constant scaling and shifts, with high probability. More specifically, Re k=1 W (k) ij e ιkφ ) 2 [Dirkmax (θi θj φ) 1] W (k) ij zk i zk j e ιkφ 2C2kmaxσ p with probability 1 O n (2+ϵ) . Clearly, the maximum of |Dirkmax (θi θj φ) 1| is attained at θ = θi θj. We thus expect the argmax operation in Step 2 of PPE-SPC to produce high accuracy estimates of θi θj as long as the difference between the optimization landscape of the periodogram and the Dirichlet kernel is small enough. This is formalized in the following lemma, which exploits the geometry of the Dirichlet kernel. Lemma 2 Under the same conditions as in Lemma 1, if [2kmax sin (π/ (2kmax + 1))] 1 + 4C2σ p log n/n < 1 (16) then with probability at least 1 O n (2+ϵ) ˆθij (θi θj) 4π/ (2kmax + 1) . (17) It is straightforward to check that (16) holds for sufficiently large kmax as long as 4C2σ p log n/n is bounded from above by 1 1/π. This can be seen by noticing that the function [2x sin (π/ (2x + 1))] 1 is differentiable and monotonically decreasing for all x 2, and for sufficiently large kmax it infinitesimally approaches 1/π < 1. The most important message from Lemma 2 is the following: At the beginning of the Step 3 of PPE-SPC, the newly constructed Hermitian matrix b H is entrywise O k 1 max close to the ground truth rank-one matrix zz . We emphasize that this error incurred in b H is significantly smaller than the noise level σ in the raw input data, and can be made arbitrarily small by choosing large kmax. We formalize this key observation in the main theorem below, for which the proof is deferred to the supplemental material. Theorem 2 Under the same conditions as Lemma 1 and Lemma 2, if (16) holds and 4c0C2 < 1 2/π, then there exists an absolute constant C3 > 0 such that, with probability 1 O (n ϵ), the correlation between the true phase vector z and the leading eigenvector ˆu (scaled to ˆu 2 = n) of b H in PPE-SPC Step 3 is at least Corr (ˆu, z) 1 C3/k2 max (18) provided that kmax > max 5, 2 π 1 4C2σ p log n/n 2 1 . Moreover, for the phase vector ˆx output from PPE-SPC, Corr (ˆx, z) 1 4C3/k2 max. Following the discussion after Lemma 2, it is not surprising to see in Theorem 2 that the correlation can be Multi-Frequency Phase Synchronization made arbitrarily close to 1 (or equivalently, the ℓ2 distance between the estimated and true phase vectors can be made arbitrarily close to 0). Moreover, it doesn t take excessively large kmax for PPE-SPC to outperform all existing phase synchronization algorithms in fact, for σ O( p n/ log n) which is the highest level of noise tolerable to ensure the validity of Lemma 1, it suffices to take kmax = O ( n/σ) O log n to suppress the ℓ2 estimation error below the established near-optimal bound O (σ) for eigenvector based phase synchronization methods (Bandeira et al., 2017; Zhong & Boumal, 2018). We believe (18) can still be improved by a factor of n by leveraging the randomness in the residue error in (15), but such finer analysis relies on more detailed analysis on the ℓ perturbation and the change in the optimization landscape, which will be pursued in a future work. 5. Extension to General Synchronization The algorithmic framework of multi-frequency phase synchronization proposed in this paper can be extended to synchronization over any compact Lie group G, by the representation-theoretic analogue of Fourier series the Peter Weyl decomposition. In a nutshell, the Peter Weyl theorem states that, for square integrable functions f L2 (G), we have decomposition k=0 dktr ˆf(k)ρk(g) (19) where each ρk : G Cdk dk is an irreducible, unitary representation of G, and ˆf (k) is the Fourier coefficient G f(g)ρk(g) dg, (20) where the integral is take with respect to the Haar measure. On a connected observation graph G, the input data to a synchronization problem over group G are pairwise measurements gij G on edges (i, j) E satisfying gij = g 1 ji . The goal is to find n group elements g1, . . . , gn G, one for each vertex, that satisfy as many constraints gij = gig 1 j as possible. Mathematically, this type of problems can often be formulated as an optimization problem (Bandeira et al., 2015) min g1,...,gn G i,j=1 fij(gig 1 j ), (21) where each fij L2 (G) measures the compatibility between the relative alignment gig 1 j and the observation data gij on edge (i, j) E. The fij s are nonlinear and nonconvex in general. If fij are bandlimited, we can expand (21) using the Peter Weyl decomposition i,j=1 fij(gig 1 j ) = i,j=1 dktr h ˆfij(k)ρk(gi)ρ k(gj) i which can be viewed as a generalization of the multifrequency phase synchronization problem (4). For simplicity of statement, we assume the observation graph G is complete in this section. Since ρk s are unitary representations, the matrices ρk (g) s are unitary matrices for any g G, and it is natural to solve for gi from its irreducible representations ρk (gi). Vertically stacking the kth irreducible representations together, the variable can be organized in matrices X(k) Cndk dk, k Z defined by X(k) = [ρk(g1), . . . , ρk(gn)] . (22) Analogies of the noise models also exist in this more general setting. The additive Gaussian noise model, following (Perry et al., 2018), amounts to n X(k)(X(k)) + 1 ndk (k) (23) where the parameter λk > 0 stands for the signal-to-noise ratio (SNR) at frequency k, (k) Cndk ndk is a Wigner matrix with i.i.d. standard complex Gaussian entries in the upper triangular part. For the random corruption model, let ( gig 1 j , with probability r g Unif (G) , with probability 1 r (24) and set the (i, j)th sub-block of H(k) to ρk(gij). As we elaborate in the remainder of this section, all the key ingredients in PPE-SPC and MFGPM can be extended to this more general setting. We demonstrate the efficacy of this algorithm for SO(3) synchronization in Section 6. Spectral relaxation: Compute the top dk eigenvectors and stack them horizontally to form U (k) = [u(k) 1 , . . . , u(k) dk ]. Approximate H(k) with b H(k) = U (k) U (k) . Generalized harmonic retrieval: For each (i, j) E, set ˆgij = argmax g G k=1 dktr h b H(k) ij ρ k(g) i . (25) Based on these new estimates for the pairwise alignments, we build matrix b H with n2 blocks with e Hij = ρ1(ˆgij). We then extract the top d1 eigenvectors of b H, stack them horizontally to form e U = [u1, u2, . . . ud1], and project each of its n vertical blocks e U1 . . . , e Un Cd1 d1 to a unitary matrix through singular value decomposition (SVD) Proj(e Ui) = ΦΨ where e Ui = ΦΣΨ . (26) Iterative refinement: At the tth iteration, denoting X(k,t) for the current stacked kth representations (22), we construct Y (k) = H(k)X(k,t), Multi-Frequency Phase Synchronization (b) PPE-SPC (d) PPE-SDP (e) PPE-SDP (f) PPE-SPC3 (g) PPE-SPC + AMP (h) PPE-SPC + MFGPM Figure 2. U(1) synchronization under Gaussian noise model for n = 100. Here σ = n/λ. Every data point is a median over 40 trials in (a) (c) and (f) (h), and over 5 trials in (d) (e). and compute the inverse Fourier transform for each of the n vertical sub-blocks Y (k) 1 , . . . , Y (k) n Cdk dk of Y (k): k=1 Re n dktr h Y (k) i ρ k(g) io , i = 1, . . . , n. Note that we only need toe evaluate Ci(g) on a finite number of uniformly sampled elements of G, from which the inverse Fourier transform can be applied U (k) i = Z G ητ (hi(g)) ρk(g) dg (27) along with the soft-thresholding ητ. We again project each U (k) i to the closest unitary matrix by SVD (26), then form X(k,t+1) by vertically stacking the Proj(U (k) i ) s. The final outputs are b X(k) = X(k,T ) for k = 1, . . . , kmax. 6. Numerical Experiments This section contains detailed numerical results under both additive Gaussian noise and random corruption models, for both U(1) and SO(3). In all experiments with Gaussian noise, we keep σk σ n/λ where λ > 0 is the signalto-noise ratio (SNR); for the random corruption model (3) we set r λ/ n. We fix n = 100 and vary λ and kmax to evaluate and compare the performance of different algorithms. When comparing iterative algorithms (AMP, GPM, MFGPM), within each random trial the random initialization is kept identical for all three algorithms and across frequency channels; between trials both data and initialization are redrawn. The remainder of the section contains (b) PPE-SPC (d) PPE-SDP (e) PPE-SDP (f) PPE-SPC3 (g) PPE-SPC + AMP (h) PPE-SPC + MFGPM Figure 3. U(1) synchronization under random corruption model for n = 100. Here r = λ/ n. Every data point is a median over 40 trials in (a) (c) and (f) (h), and over 5 trials in (d) (e). results for U(1) and SO(3) synchronization with complete observation graphs only; incomplete observation graph results are similar and included in the supplemental material. U(1) synchronization: In Figure 2 and Figure 3, we measure the correlation between the output and the truth phase vector for various singleand multi-frequency synchronization methods, under the additive Gaussian and random corruption noise model, respectively. The SNR λ varies between 0.7 and 1.3, which is in the extremely noisy regime: under the random corruption model, for instance, with n = 100, between 87% and 93% of the pairwise alignments are corrupted with random elements. In each subplot, the vertical axis varies kmax from 1 to 1024, and the horizontal axis marks the change in λ. The bottom row in each subplot thus represents the single-frequency (kmax = 1) version of the algorithm. The methods under comparison are: (a) AMP (Perry et al., 2018) with random initialization; (b) PPE-SPC; (c) MFGPM with random initialization; (d) PPE-SDP (replacing the spectral methods in Algorithm 1 with SDP relaxation); (e) PPE-SDP with an additional projection to rank-one matrices in each iteration; (f) Iterating PPE-SPC three times; (g) AMP initialized with PPE-SPC; (h) MFGPM initialized with PPE-SPC. It is clear from Figure 2 and Figure 3 that leveraging information in multiple frequency channels produces superior results than single-frequency approaches. Most shockingly, in Figure 3 our proposed PPE-SPC method and variants [subplots (b) (h)] are capable of recovering the true phase vector when the SNR is well below the critical threshold λ = 1 (corresponding to r < 1/ n) determined in (Singer, Multi-Frequency Phase Synchronization (a) Gaussian Noise Model (b) Random Corruption Model Figure 4. Accuracy and error-bars for SO(3) synchronization with Gaussian noise (left) and random corruption (right) model with K frequencies, for various K. The noise levels are kept as λ = n/σ for Gaussian models and r = λ/ n for random corruption models, where n = 100. Accuracy is measured by the correlation (X(1)) b X(1) F /( 3n) between estimates and the ground truth. Each data point is the median of 50 trials. 2011) by random matrix arguments. This is surprising because, according to (Singer, 2011), for single frequency phase synchronization one can not expect correlation to be much higher than 1/ n, which is 0.1 in our experiments. This is confirmed by looking at the bottom row of each subplot of Figure 3, but with suitably large kmax this barrier no longer exists, even though in model (5) our high-frequency measurements are generated from the single frequency data. In Figure 2 and Figure 3, (d) and (e) illustrates the performance of the SDP variant of PPE-SPC. The difference between (d) and (e) is the following: in (d) we use directly estimated W (k) by solving the SDP in (Singer, 2011), but in (e) we apply project the SDP solution to a rank-one matrix using eigen-decomposition. The results from these SDP variants are occasionally slightly better PPE-SPC, but the computational cost is expensive: the runtime is over 40 times longer, and a lot more memory is required. The SDP relaxation in (Bandeira et al., 2015) is even more demanding on computation resources so is not included here. Figures 2f and 3f explore another possibility of extending PPE-SPC: After recovering b H, take entrywise powers of ˆH and treat them as multi-frequency data input to another fresh run of PPE-SPC. Unlike the iterative refinement algorithm MFGPM, we observed empirically that the performance boost saturate quickly after just a couple of such repeated calls to PPE-SPC. The result in (f) from both figures are obtained from performing 3 such repetitions. Compared with (b), this strategy improves the estimation accuracy for smaller λ, but the performance gain is not as significant as using MFGPM for iterative refinements (h). Initialization turns out to be important for AMP: As shown in Figure 2a, when the SNR is below the critical threshold predicted in (Perry et al., 2018) (λ < 1), increasing kmax does not lead to performance improvement; the critical threshold appears even higher for random corruption model (Figure 3a). In contrast, PPE-SPC and MFGPM can Figure 5. Comparison of random and spectral initialization for SO(3) synchronization, under noise model (23) where kmax = 8, n = 100. Each data point is averaged from 20 random trials. always benefit from sufficiently larger kmax. SO(3) synchronization: Comparison results for SO(3) synchronization under Gaussian noise model and random corruption model are shown in Figure 4a and 4b, respectively. In all these experiments, the Fourier transform (27) is numerically evaluated using m = 1000 elements uniformly sampled in SO(3). Clearly, the proposed method outperforms single frequency methods and achieve higher accuracy as kmax increases; moreover, the multi-frequency formulation and algorithm lead to drastic performance boost especially at the low SNR regime. In Figure 5 we compare AMP and MFGPM with different initialization strategies PPE-SPC vs. random initialization under the additive Gaussian noise model (23) with kmax = 8. We plot the accuracy of using PPE-SPC alone without iterative refinement as a baseline. The results demonstrate the performance boost from using PPE-SPC for initialization, as well as improvements gain from using iterative refinements on top of the initialization PPE-SPC. 7. Conclusion In this paper, we propose a novel, mult-frequency formulation for phase synchronization as a nonconvex optimization problem, for which we develop a two-stage algorithm inspired by harmonic retrieval and generalized power method that produces high accuracy approximate solutions. We demonstrate in theory and experiments that the new framework significantly outperform all existing phase synchronization algorithms. There are many opportunities for future research. We are particularly interested in gaining deeper theoretical understandings for the multi-frequency GPM algorithm, especially its performance guarantees and behavior near local optimum. More general harmonic retrieval techniques can be potentially used in place of the periodogram-based peak extraction. We are also working on extending the algorithmic framework beyond compact Lie groups, such as Euclidean groups and symmetric groups, with applications to object matching (Shen et al., 2016; Pachauri et al., 2013). Multi-Frequency Phase Synchronization Acknowledgement. Tingran Gao acknowledges partial support from DARPA D15AP00109 and NSF IIS 1546413. Abbe, E., Fan, J., Wang, K., and Zhong, Y. Entrywise Eigenvector Analysis of Random Matrices with Low Expected Rank. arxiv preprint, 2017. URL http: //arxiv.org/abs/1709.09565. Bandeira, A., Chen, Y., and Singer, A. Non-unique games over compact groups and orientation estimation in cryo EM. ar Xiv preprint ar Xiv:1505.03840, 2015. Bandeira, A. S., Kennedy, C., and Singer, A. Approximating the little Grothendieck problem over the orthogonal and unitary groups. Mathematical Programming, 160(1-2):433 475, 2016. ISSN 0025-5610. doi: 10.1007/s10107-016-0993-7. Bandeira, A. S., Boumal, N., and Singer, A. Tightness of the maximum likelihood semidefinite relaxation for angular synchronization. Mathematical Programming, 163(1-2):145 167, 2017. ISSN 0025-5610. doi: 10.1007/ s10107-016-1059-6. Boumal, N. Nonconvex Phase Synchronization. SIAM Journal on Optimization, 26(4):2355 2377, 2016. ISSN 1052-6234. doi: 10.1137/16M105808X. Bresler, Y. and Macovski, A. Exact maximum likelihood parameter estimation of superimposed exponential signals in noise. IEEE Transactions on Acoustics, Speech, and Signal Processing, 34(5):1081 1089, oct 1986. ISSN 0096-3518. doi: 10.1109/TASSP.1986.1164949. Candes, E. J., Li, X., and Soltanolkotabi, M. Phase Retrieval via Wirtinger Flow: Theory and Algorithms. IEEE Transactions on Information Theory, 61(4):1985 2007, 2015. Chaudhury, K. N., Khoo, Y., and Singer, A. Global Registration of Multiple Point Clouds Using Semidefinite Programming. SIAM Journal on Optimization, 25(1):468 501, 2015. ISSN 1052-6234. doi: 10.1137/130935458. Chen, Y. and Candes, E. Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems. In Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 28, pp. 739 747. Curran Associates, Inc., 2015. Chen, Y., Suh, C., and Goldsmith, A. J. Information Recovery From Pairwise Measurements. IEEE Transactions on Information Theory, 62(10):5881 5905, 2016. ISSN 0018-9448. doi: 10.1109/TIT.2016.2600566. Cucuringu, M., Singer, A., and Cowburn, D. Eigenvector synchronization, graph rigidity and the molecule problem. Information and inference : a journal of the IMA, 1(1): 21, 2012. ISSN 2049-8764. Donoho, D. L. De-noising by soft-thresholding. IEEE transactions on information theory, 41(3):613 627, 1995. Eldridge, J., Belkin, M., and Wang, Y. Unperturbed: spectral analysis beyond Davis-Kahan. arxiv preprint. arxiv: 1706.06516, jun 2017. URL http://arxiv.org/ abs/1706.06516. Fan, J., Wang, W., and Zhong, Y. An ℓ Eigenvector Perturbation Bound and Its Application. Journal of Machine Learning Research, 18(207):1 42, 2018. URL http: //jmlr.org/papers/v18/16-140.html. Fan, Y. and Zhao, Z. Multi-frequency vector diffusion maps. In Chaudhuri, K. and Salakhutdinov, R. (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 1843 1852, Long Beach, California, USA, 09 15 Jun 2019a. PMLR. URL http:// proceedings.mlr.press/v97/fan19a.html. Fan, Y. and Zhao, Z. Cryo-electron microscopy image analysis using multi-frequency vector diffusion maps. ar Xiv preprint ar Xiv:1904.07772, 2019b. Gao, T., Brodzki, J., and Mukherjee, S. The geometry of synchronization problems and learning group actions. Discrete & Computational Geometry, 2019a. ISSN 14320444. doi: 10.1007/s00454-019-00100-2. URL https: //doi.org/10.1007/s00454-019-00100-2. Gao, T., Fan, Y., and Zhao, Z. Representation theoretic patterns in multi-frequency class averaging for threedimensional cryo-electron microscopy. ar Xiv preprint ar Xiv:1906.01082, 2019b. Liu, H., Yue, M.-C., and Man-Cho So, A. On the Estimation Performance and Convergence Rate of the Generalized Power Method for Phase Synchronization. SIAM Journal on Optimization, 27(4):2426 2446, 2017. Pachauri, D., Kondor, R., and Singh, V. Solving the multiway matching problem by permutation synchronization. In Advances in neural information processing systems, pp. 1860 1868, 2013. Perry, A., Wein, A. S., Bandeira, A. S., and Moitra, A. Message-Passing Algorithms for Synchronization Problems over Compact Groups. Communications on Pure and Applied Mathematics, 2018. ISSN 00103640. doi: 10.1002/cpa.21750. Multi-Frequency Phase Synchronization Roy, R. and Kailath, T. ESPRIT-estimation of signal parameters via rotational invariance techniques. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(7):984 995, jul 1989. ISSN 00963518. doi: 10.1109/29.32276. Schmidt, R. Multiple emitter location and signal parameter estimation. IEEE Transactions on Antennas and Propagation, 34(3):276 280, mar 1986. ISSN 0096-1973. doi: 10.1109/TAP.1986.1143830. Shen, Y., Huang, Q., Srebro, N., and Sanghavi, S. Normalized spectral map synchronization. In Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I., and Garnett, R. (eds.), Advances in Neural Information Processing Systems 29, pp. 4925 4933. Curran Associates, Inc., 2016. Singer, A. Angular synchronization by eigenvectors and semidefinite programming. Applied and Computational Harmonic Analysis, 30(1):20 36, 2011. ISSN 1063-5203. doi: 10.1016/J.ACHA.2010.02.001. Singer, A., Zhao, Z., Shkolnisky, Y., and Hadani, R. Viewing angle classification of cryo-electron microscopy images using eigenvectors. SIAM Journal on Imaging Sciences, 4(2):723 759, 2011. Sorensen, M. and De Lathauwer, L. Multidimensional Harmonic Retrieval via Coupled Canonical Polyadic Decomposition Part I: Model and Identifiability. IEEE Transactions on Signal Processing, 65(2):517 527, jan 2017a. ISSN 1053-587X. doi: 10.1109/TSP.2016.2614796. Sorensen, M. and De Lathauwer, L. Multidimensional Harmonic Retrieval via Coupled Canonical Polyadic Decomposition Part II: Algorithm and Multirate Sampling. IEEE Transactions on Signal Processing, 65(2):528 539, jan 2017b. ISSN 1053-587X. doi: 10.1109/TSP.2016. 2614797. Stoica, P. and Moses, R. L. Introduction to spectral analysis. Prentice Hall, 1997. ISBN 0132584190. Tufts, D. W. and Kumaresan, R. Estimation of frequencies of multiple sinusoids: Making linear prediction perform like maximum likelihood. Proceedings of the IEEE, 70 (9):975 989, 1982. ISSN 0018-9219. doi: 10.1109/ PROC.1982.12428. Vershynin, R. High-Dimensional Probability: An Introduction with Applications in Data Science, volume 47. Cambridge University Press, 2018. Zhang, S. and Huang, Y. Complex Quadratic Optimization and Semidefinite Programming. SIAM Journal on Optimization, 16(3):871 890, 2006. ISSN 1052-6234. doi: 10.1137/04061341X. Zhong, Y. and Boumal, N. Near-Optimal Bounds for Phase Synchronization. SIAM Journal on Optimization, 28 (2):989 1016, 2018. ISSN 1052-6234. doi: 10.1137/ 17M1122025. Ziskind, I. and Wax, M. Maximum likelihood localization of multiple sources by alternating projection. IEEE Transactions on Acoustics, Speech, and Signal Processing, 36(10):1553 1560, 1988. ISSN 00963518. doi: 10.1109/29.7543.