# robust_group_synchronization_via_quadratic_programming__7f02f1b5.pdf

Robust Group Synchronization via Quadratic Programming

Yunpeng Shi * 1 Cole Wyeth * 2 Gilad Lerman 2

We propose a novel quadratic programming formulation for estimating the corruption levels in group synchronization, and use these estimates to solve this problem. Our objective function exploits the cycle consistency of the group and we thus refer to our method as detection and estimation of structural consistency (DESC). This general framework can be extended to other algebraic and geometric structures. Our formulation has the following advantages: it can tolerate corruption as high as the information-theoretic bound, it does not require a good initialization for the estimates of group elements, it has a simple interpretation, and under some mild conditions the global minimum of our objective function exactly recovers the corruption levels. We demonstrate the competitive accuracy of our approach on both synthetic and real data experiments of rotation averaging.

1. Introduction

Group synchronization (GS) is a critical mathematical problem that has broad applications in statistics and computer science. It assumes a mathematical group G and a graph G([n], E) where [n] = {1, 2, . . . , n} and E is the set of edges. Each node i of the graph is assigned an unknown group element g i , where the star superscript designates ground-truth information. At each edge ij E, one observes noisy and corrupted measurements gij of the ground-truth group ratio g ij = g i g 1 j . The GS problem asks to recover the unknown group elements, {g i }i [n], from the observed group ratios, {gij}ij E. The most well-known group synchronization problem is rotation averaging in 3D computer vision, where G is SO(3). It asks to recover the absolute rotations of objects from the possibly corrupted

*Equal contribution 1Program in Applied and Computational Mathematics, Princeton University 2School of Mathematics, University of Minnesota. Correspondence to: Yunpeng Shi <yunpengs@princeton.edu>, Cole Wyeth <wyeth008@umn.edu>, Gilad Lerman <lerman@umn.edu>.

Proceedings of the 39 th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022. Copyright 2022 by the author(s).

and noisy relative rotations between pairs of objects. The rotation averaging problem plays a central role in many 3D computer vision tasks such as structure from motion (Sf M) and simultaneous localization and mapping (SLAM). Other examples of group synchronization include phase synchronization (G = SO(2)) with applications to cryoelectron microscopy imaging, permutation synchronization (G = Sn) with applications to multi-object matching, and Z2-synchronization (G = Z2) with applications to correlation clustering and community detection.

In many real scenarios, the observed group ratios are highly corrupted. In order to handle the high corruption, it is critical to estimate the corruption levels. In order to quantify them we assume a predefined bi-invariant metric on G, which we denote by d. The bi-invariance property of d means that for any g1, g2, g3 G, d(g1, g2) = d(g3g1, g3g2) = d(g1g3, g2g3). Using d, the corruption level of edge ij E is s ij = d(gij, g ij). (1) The primary goal of this work is to develop a principled and interpretable framework to estimate {s ij}ij E without requiring a good initialization, and to robustly estimate the group elements using the estimated {s ij}ij E.

For this purpose we exploit the cycle-consistency constraint. It uses the group identity e as follows: g ijg jkg ki = e for any ij, jk, ki E, (2) In principle, one can utilize the cycle-consistency constraints for longer cycles, whereas we focus on 3-cycles for simplicity and for computational efficiency.

We propose a general solution for GS, which we refer to as Detection and Estimation of Structural Consistency (DESC). The terminology structural consistency refers to the induced cycle consistency constraint. We believe that this framework can also be generalized to other algebraic or geometric structure constraints in other application areas, such as low-rankness, low dimensionality, coplanarity of points in a Euclidean space, transitivity of ordering in the ranking problem, etc. Nevertheless, in order to keep this presentation focused and clear, we restrict it to GS. Similarly, we discuss the most refined procedures of DESC for the group SO(3) only.

1.1. Previous Works

Group synchronization was commonly applied to the discrete groups Z2 and Sn (Bandeira, 2018; Cucuringu,

Robust Group Synchronization via Quadratic Programming

2015; Ling, 2020; Chen et al., 2014; Pachauri et al., 2013) and the Lie group SO(d) (Singer, 2011; Wang & Singer, 2013; Eriksson et al., 2018). Its most common formulation uses least squares (LS) minimization:

gi = arg min gi G

ij E d2(gig 1 j , gij). (3)

Since the domain of this minimization is a nonconvex group, it is often relaxed to an eigendecomposition problem (Cucuringu, 2015; Singer, 2011; Pachauri et al., 2013; Ling, 2020), or a semidefinite programming (SDP) problem (Bandeira, 2018; Singer, 2011; Chen et al., 2014), or when G is a Lie group it is solved locally and iteratively by tangent space approximations of the manifold (Govindu, 2004). For discrete groups, such as Z2 and Sn, this formulation is robust to corruptions. However, when G is a Lie group, such as G = SO(d), then it is sensitive to outliers. In this case, robustness to outliers can be achieved by either introducing a robust objective function, or applying an outlier detection algorithm to preprocess the corrupted data.

A common robust formulation is ℓp-minimization with 0 < p 1 (Wang & Singer, 2013; Chatterjee & Govindu, 2017; Hartley et al., 2011), where d2 in (3) is replaced with dp. It is often solved by the iteratively reweighted least squares (IRLS). The main limitation of IRLS is that it requires good initialization of group elements and may easily get stuck at local minima in the presence of high noise and corruption. A recentworkby Maunu&Lerman(2020)usesanenergybased on Tukey depth to achieve provable robust synchronization to arbitrary outliers, but it also requires local initialization.

Outlier detection methods for GS utilize cycle-consistency constraints to distinguish between clean and corrupted edges. In particular, Zach et al. (2010) suggested two methods, based on belief propagation and linear programming. Agarwal et al. (2020) proposed a similar linear programming approach for the different ranking problem. Shen et al. (2016) classified an edge as clean according to its appearance in a consistent cycle (note that such an approach cannot handle self-consistent corruption). We remark that both IRLS and outlier detection methods eventually rely on accurate assignment of edge weights. For outlier detection, the edge weights are binary, where zero weight corresponds to removing an edge. However, the binary weights do not exactly reflect the corruption levels and may thus result in suboptimal estimates for group elements. IRLS updates the edge weights from the estimated corruption levels. However, the corruption levels are heuristically estimated in an iterative procedure, which is sensitive to the initialized estimates of the group elements.

The recent cycle-edge message passing (CEMP) (Lerman & Shi, 2021) overcomes the aforementioned drawbacks of both IRLS and outlier detection methods. It estimates the corruption levels without requiring a good initialization or solving weighted least squares, even when the corruption is high.

Given a set of 3-cycles, for each 3-cycle ijk, CEMP first computes the cycle inconsistency: dij,k = d(gijgjkgji, e). It then estimates the edge corruption level from those cycle inconsistencies via message passing between cycles and edges. However, the message passing procedure is hard to interpret as it does not explicitly optimize an objective function. Moreover, both its performance and theory rely on a set of reweighting parameters. A recent message passing least squares (Shi & Lerman, 2020) framework combines CEMP-like iterations with IRLS and achieves superior performance on a variety of datasets. However, it remains heuristic and lacks theoretical guarantees.

Few previous works for GS (Birdal et al., 2018; Sun et al., 2019; Birdal et al., 2020) further estimate the distribution of the group elements. They aim to address scene ambiguities and for this purpose assume special probabilistic models. Our proposed work estimates the distribution of corruption levels, assuming that a deterministic condition holds. This estimated distribution is merely used to improve the estimation of the corruption levels, though it might be used in the future for statistical inference and uncertainty quantification.

A common theoretical setting in GS assumes the uniform corruption model (UCM). In this model, the graph G is generated by the Erd os-R enyi model G(n, p), where p is the probability of connecting two nodes. An edge is then independently corrupted with probability q. If ij E is corrupted, then gij is i.i.d. sampled from a Haar measure on G, otherwise, gij = g ij. Under UCM, the information theoretic sample complexity for the exact recovery of group elements is n/ log n = O(p 1(1 q) 2) (Chen et al., 2016). For Z2 and Sn-synchronization, spectral and SDP methods match this bound (Cucuringu, 2015; Bandeira, 2018; Chen et al., 2014; Ling, 2020). However, for Lie group synchronization, it is a challenging open problem to prove that an algorithm can match the information theoretic sample complexity. The best sample complexity bound for SO(2) and SO(3) was established for CEMP: n/ log n = O(p 2(1 q) 8) (Lerman & Shi, 2021), but there is a clear gap with the desired bound.

1.2. This Work

In view of the limitations of the previous methods, we summarize the contributions of our proposed DESC.

DESC is a novel quadratic programming framework for estimating the corruption levels of edges in GS. Its minimization formulation provides a simple interpretation. We prove that under mild conditions, the global minimum recovers the corruption levels We show that under UCM, the sample complexity of our QP formulation is n/ log n = O(p 2(1 q) 2). It matches the dependence of the information-theoretic bound on q, unlike previous methods Our QP formulation is parameter free and does not require

Robust Group Synchronization via Quadratic Programming

good initialization even in highly corrupted scenarios. We demonstrate that a naive projected gradient descent is able to obtain satisfying corruption estimates For rotation averaging, we develop an algorithm for estimating the absolute rotations using the corruption levels that are estimated from our QP framework (we refer to both the latter framework and the former algorithm as DESC). Both synthetic and real data experiments demonstrate the state-of-the-art accuracy of this algorithm

The rest of the paper is organized as follows: 2 presents the framework of DESC for GS, its theoretical guarantees and the DESC algorithm for rotation averaging; 3 tests DESC on synthetic and real datasets; and 4 concludes this work. The supplemental code is in https://github.com/Cole Wyeth/DESC

2. The DESC Framework

We explain DESC as follows: 2.1 introduces notation and preliminary observations; 2.2 and 2.3 formulate and motivate the DESC framework for corruption estimation; 2.4 demonstrates the theory of this formulation under UCM; 2.5 describes the optimization method that we adopted to solve this formulation; 2.6 clarifies its computational complexity; 2.7 explains how to generally recover group elements using the output of our framework; and 2.8 presents our refined DESC algorithm for rotation averaging, and, in particular, its initialization, which we refer to as DESC-init.

2.1. Preliminaries

The (noiseless) adversarial corruption model assumes that the set of edges E is partitioned to Eg of good (clean) edges, where gij = g ij, and Eb of bad (corrupted) edges, where gij = g ij. For each ij E, let Cij := {k : ik, jk E} and Gij := {k : ik, jk Eg}. While Cij and Gij are sets of the nodes, we also view them as sets of cycles ijk containing edge ij, where k Cij or Gij, respectively. When addressing the adversarial corruption model we further assume that for each ij E, Gij is nonempty. Recall that d is a bi-invariant metric on G and assume WLOG that it is scaled so that d(g1, g2) [0, 1] for any g1, g2 G. Therefore, s ij [0, 1] for all ij E. We take advantage of the cycle inconsistency dij,k = d(gijgjkgji, e) and its following property (Lerman & Shi, 2021):

Proposition 2.1. For any ij E and k Cij, |dij,k s ij| s ik + s jk, (4) and consequently, dij,k = s ij if ik, jk Eg. (5)

Note that (5) states that the cycle inconsistency of cycle ijk is an exact estimator of the corruption level of edge ij whenever k Gij. Since we assumed that Gij is nonempty for any ij E, s ij must be supported on the set {dij,k}k Cij. The

distribution of s ij can thus be written as the probability mass for k |Cij|: p ij(k) = 1{k Gij}/|Gij|, where 1 is the indicator function. Indeed, by (5), p ij only has positive mass on the k s such that dij,k = s ij, so the distribution concentrates on the true value of s ij. Let (m) denote the simplex of length m, that is, (m) = {x Rm : x 0, x 1 = 1}, where x 0 means that all coordinates of x are nonnegative. Let (|Cij|) denote the subset of (|Cij|) with zero coordinates whenever k / Gij. Using this notation, p ij (|Cij|) (|Cij|).

2.2. General Formulation and Motivation

Let dij, v ij R|Cij| denote the vectors such that dij(k) = dij,k and v ij(k) = s ik + s jk for k Cij, respectively. We notice the following interesting relationship of p ij and s ij:

Proposition 2.2. For any ij E p ij dij = s ij and p ij v ij = 0. (6)

Proof. We prove the following more general result that implies (6) and we will be used later: p ijdij = s ij and p ijv ij = 0, for pij (|Cij|). (7) The definition of (|Cij|) and (5) imply the first equality as follows: p ijdij = P k Gij pij(k)dij,k = s ij. The second equality follows from pij(k)v ij(k) = pij(k)1{k Gij}(s ik + s jk) = pij(k)1{s ik+s jk=0}(s ik + s jk) = 0. Since pij (|Cij|), (7) implies (6).

We remark that (6) only relies on the assumption that |Gij| > 0, and does not assume any probabilistic model.

DESC aims to estimate both {p ij}ij E and {s ij}ij E, while directly using Proposition (2.2). The detection task in DESC refers to finding p ij, which is equivalent to detecting the set of good cycles ijk such that dij,k = s ij. The estimation task in DESC refers to estimating s ij. For ij E, let pij R|Cij| and sij R denote the estimates by DESC of p ij and s ij. We also define vij R|Cij| by vij(k) = sik + sjk for k Cij, so vij estimates v ij.

DESC aims to solve the following optimization problem. min {sij}ij E,{pij}ij E

ij E p ijvij (8)

subject to sij = p ijdij, ij E

pij (|Cij|), ij E. Note that the constraints in (8) and the fact that d was scaled to be in [0, 1] (so dij,k 1) imply that pij(k) and sij in (8) are between 0 and 1. This formulation is clearly motivated by Proposition (2.2). Indeed, the first equation of (6) is the first constraint of DESC (the other constraint of DESC just specifies the domain of pij). In order to try to enforce the second equation of (6) we aim to minimize the cumulative sum of the positive terms {p ij v ij}ij E. When the minimum value is 0, then the second equation of (6) is satisfied for all ij E.

Robust Group Synchronization via Quadratic Programming

Another interpretation of (8) arises when bounding the cumulative estimation error of {s ij}ij E, using the constraints in (8) as well as (4): X

ij E |sij s ij| = X

p ijdij s ij (9)

k Cij pij(k) dij(k) s ij

k Cij pij(k) dij(k) s ij

k Cij pij(k)(s ik + s jk) = X

ij E p ijv ij.

Therefore, (8) minimizes an approximate upper bound for the cumulative error, where vij replaces v ij in the right hand side (RHS) of (9).

2.3. DESC as a Quadratic Program

Plugging the first constraint of (8) into the objective function of (8) yields a quadratic objective function in {pij}ij E: X

k Cij pij(k)(sik + sjk)

k Cij pij(k) p ikdik + p jkdjk . (10)

Therefore, in order to minimize (8) it is sufficient to find all minimizers of the RHS of (10) of the form {bpij}ij E. The first constraint of (8) is not needed in the latter minimization, but it yields the additional minimizers of the form {bsij}ij E of (8) as follows: bsij = bp ijdij for ij E. (11) Thus, the quadratic programming formulation of DESC is min {pij }ij E (|Cij |)

k Cij pij(k) p ikdik + p jkdjk , (12)

where in view of (11), there is a one-to-one correspondence between the minimizers of (12) and (8).

2.4. Theory for the DESC Framework

We show that under some mild conditions, the global minimum of the DESC formulation exactly recovers s ij.

Theorem 2.3. If |Gij| 1 ij E and k Cij dij,k > 0 whenever ij, jk or ki / Eg, then any global minimum of (8) exactly recovers the corruption levels {s ij}ij E

Proof. Let {bpij}ij E be a minimizer of (12) and {bsij}ij E be defined by (11) (so {bpij}ij E, {bsij}ij E is a minimizer of (8)). We will show that bsij = s ij for all ij E and will thus conclude the stated exact recovery.

If bpij (|Cij|), then sij = p ijdij = s ij, where the first equality is due to the first constraint in (8) and the second one is due to (7). Consequently, bsij = s ij and the corruption levels are exactly recovered.

To conclude the proof we will show that bpij (|Cij|). Assume on the contrary that bpij(k) > 0 for some k / Gij. Since k / Gij, WLOG we assume that ik Eb. By our assumption on cycle-inconsistencies, we obtain that dik,l > 0 for all l Cik. Thus, sik = p ikdik > 0 (since all elements of dik are positive and at least one element of pik is positive). Consequently, pij(k)sik > 0 and the value of (10) is strictly greater than 0. This contradicts the assumption that {pij}ij E is a global minimum.

We provide two immediate corollaries of Theorem 2.3.

Corollary 2.4. Assume that G is a compact Lie group. If for any ij E, |Gij| 1, and for any ij Eb, gij is i.i.d. sampled from an absolutely continuous distribution over G, then with probability 1 any global minimum of (8) exactly recovers the corruption levels {s ij}ij E.

Proof. We claim that the assumptions of this corollary imply the conditions of Theorem 2.3 with probability 1. Indeed, if ijk is a 3-cycle that contains at least one edge in Eb, where WLOG this bad edge is ij, then Pr(dij,k = 0) = Pr(gij = gik gkj) = 0 due to the continuity of the density of gij. Thus with probability 1, dij,k > 0.

We remark that Corollary 2.4 does not assume a specific probabilistic distribution and thus it is more general than previous probabilistic results by Wang & Singer (2013).

Corollary 2.5. Assume a compact Lie group G and data generated by UCM with n nodes, probability p of connecting two nodes p, and probability q of corrupting an edge. Then for n/ log n 10/(p2(1 q)2), with probability at least 1 n 0.7 any global minimum of (8) exactly recovers the corruption levels {s ij}ij E.

Proof. It suffices to show that under the assumption of this corollary, |Gij| 1 is satisfied with high probability. We first observe that Xk := 1{k Gij}, for k [n], are i.i.d. Bernoulli random variables with mean µ = p2(1 q)2. Applying the Chernoff bound to Xk yields

Pr(|Gij| 1) = Pr 1

k [n] Xk µ np2(1 q)2

3 1 1 np2(1 q)2 2p2(1 q)2n. (13) If n/ log n 10/(p2(1 q)2), then 1/(np2(1 q)2) < 1/10 for n > 2 and thus (13) implies that Pr(|Gij| 1) > 1 e 27

100 p2(1 q)2n. By taking a union bound over ij E and applying the assumption n/ log n c/(p2(1 q)2) for c 10, we obtain that Pr(min ij E |Gij| 1) > 1 n2e 27

100 p2(1 q)2n

100 log n = 1 n2 27c

100 1 n 0.7.

Robust Group Synchronization via Quadratic Programming

Corollary 2.4 implies that the sample complexity for the DESC framework is n/ log n = Ω(p 2(1 q) 2), where the order of 1 q matches the information-theoretic one. On the other hand, as explained in Lerman & Shi (2021), due to the use of 3-cycles one cannot improve the dependence on p.

2.5. Optimization of the DESC Framework

We optimize the quadratic program in (12) by a projected gradient descent (PGD) method. At each iteration t, let {s(t) ij }ij E and {p(t) ij }ij E be the corresponding estimates of a minimizer of (8) (which is equivalently obtained via (12) and (11)). Denote the objective function in (12) by f({pij}ij E) and its gradient with respect to pij at the estimates p(t) ij and s(t) ij , ij E, by

(t) ij f :=

{pij }ij E ={p(t) ij }ij E

{sij }ij E ={s(t) ij }ij E

where f pij(k) (15)

= sik + sjk +

l Cij pil(j) + pjl(i)

Recall that for each ij E, pij (|Cij|). Note that (|Cij|) is contained in the hyperplane

Hij := {x R|Cij| :

i=1 xi = 1}

and that Hij is the tangent space of (|Cij|) in R|Cij|. We further note that the orthogonal projector onto Hij is 11 /|Cij|, where 1 is the all-one vector of length |Cij|. Therefore the corresponding Riemannian gradient (Boumal, 2020) is e (t) ij f = (I 11 /|Cij|) (t) ij f, where I denotes the identity matrix.

Our projected gradient descent method updates p(t+1) ij at each iteration using the Riemannian gradient and then projects onto (|Cij|). That is, for ij E ep(t+1) ij = p(t) ij αt e (t) ij f and

p(t+1) ij = Proj (|Cij|)(ep(t+1) ij ). The projector onto (|Cij|) can be computed following the method of Wang & Carreira-Perpin an (2013): For each fixed ij E, p(t+1) ij = max(ep(t+1) ij τ, 0), (16) where the parameter τ R is the solution of X

k Cij max(ep(t+1) ij (k) τ, 0) = 1.

We remark that h(τ) = P

k Cij max(ep(t+1) ij (k) τ, 0) is a piecewise linear function where the endpoints of the piecewise intervals are {ep(t+1) ij (k)}k. We order ep(t+1) ij (k)

by their values from low to high, and find k such that h(ep(t+1) ij (k)) 1 and h(ep(t+1) ij (k + 1)) > 1. In this way,

therangeofτ isnarroweddownto[ep(t+1) ij (k), ep(t+1) ij (k+1)), and on this interval h(τ) is a linear function and h(τ) = 1 can be easily solved.

We refer to this procedure as DESC-PGD and summarize it in Algorithm 1. We remark that our proposed PGD is analogous to the Riemannian gradient descent method in Boumal (2020), except that our projection onto the simplex is not a valid retraction. Algorithm 1 DESC-PGD Input: {gij}ij E, {dij,k}k Cij, tmax

Steps: p(0) ij = 1/|Cij| ij E for t = 1 : tmax do

ep(t) ij = p(t 1) ij αt 1 e (t 1) ij f ij E

p(t) ij = Proj (|Cij|)(ep(t) ij ) ij E

s(t) ij = (p(t) ij ) dij ij E end for Output: bsij = stmax ij

In practice, to accelerate the implementation of DESC, instead of using all the 3-cycles, one may use a randomly sampled subset. That is, for each ij E, Cij is a randomly sampled set of nodes k such that ik, jk E.

2.6. Computational Complexity of DESC-PGD

At each iteration of DESC-PGD, the gradient computation in (15) requires the sum of pil(j) and pjl(i) over l Cij for each ij E, which takes O(|E|c) computation time where c is the average of |Cij|. The projection onto (|Cij|) has the same O(|E|c) complexity. Since the complexity of computing dij,k is also O(|E|c), the per-iteration time complexity of DESC is O(|E|c), which is exactly the same as that of CEMP.

2.7. Estimation of General Group Elements

We follow ideas of Lerman & Shi (2021) to estimate the group elements, {g i }i [n] G, using {bsij}ij E. We assume that G is a subgroup of the orthogonal group O(D). For this purpose we use the graph connection weight (GCW) matrix (Singer & Wu, 2012), which aims to approximately solve the following weighted least squares problem: min {gi}i [n] G

j Ni wijd2(gig 1 j , gij), (17)

where Ni = {j : ij E} is a set of neighboring nodes of i and wij is a normalized graph weight such that P

1. In practice, we compute wij by normalizing bs 3/2 ij . We represent each group element, gi, by a D D orthogonal matrix and stack these matrices to form an n D D block matrix Y whose i-th block is the matrix representation of gi.

Robust Group Synchronization via Quadratic Programming

We initially estimate Y by finding the top D eigenvectors of the block matrix X, where the (i, j)-th black of X is wijgij, and each gij is represented as its corresponding orthogonal matrix. Then we project each block of the initially estimated Y onto G to obtain the estimated group elements.

2.8. A Refined Solution for Rotation Averaging

For rotation averaging, we propose using the DESC-based GCW procedure of 2.7 to initialize the absolute rotations. We then suggest using the bsij s obtained by DESC-PGD to improve the IRLS algorithm of Chatterjee & Govindu (2017) and thus refine the initialized rotations.

We first briefly review the latter IRLS algorithm. For i [n] and t N, let R(t) i denote the absolute rotation matrix estimated by IRLS at iteration t. For ij E, let Rij denote the input relative rotation matrix. IRLS updates at each iteration the estimated absolute rotations. Given R(t 1) i , i [n], it solves an optimization problem for the matrices R(t) i , i [n], which satisfy R(t) i = R(t 1) i R(t) i (note that R(t) i approaches I as t approaches infinity). The desired optimization is the weighted least squares of (17) with iteratively updated edge weights, w(t) ij , ij E, where gi, gj, gij are

replaced by R(t) i , R(t) j , (R(t 1) i ) Rij R(t 1) j , respectively. This formulation is further approximated by mapping, at each iteration, the rotation matrices in SO(3) to the tangent space of I, so(3), by the matrix logarithm, log. We denote the mapped elements by Ω(t) i = log R(t) i , i [n], and

Ω(t) ij = log((R(t 1) i ) Rij R(t 1) j ), ij E. (18) Chatterjee & Govindu (2017) approximate (17) by minimizing over { Ω(t) i }i [n] so(3) the function X

ij E w(t) ij Ω(t) i Ω(t) j Ω(t) ij 2 F , (19)

Next, they compute for any edge ij E the residual r(t) ij := Ω(t) i Ω(t) j Ω(t) ij F /

2π2 and update

the weights by w(t+1) ij = (r(t) ij ) 3/2. The basic idea is that edges with higher residuals are likelier to be corrupted and thus should be assigned smaller weights.

Wemodifythis IRLSprocedureasfollows. First, weinitialize the rotations by GCW, which uses the output of DESC-PGD (see 2.7). Our numerical experiments indicate that this initialization is often more accurate than IRLS (Chatterjee & Govindu, 2017). Second, we replace the residuals r(t) ij in

IRLS by a convex combination of r(t) ij and DESC-estimated

bsij, where the coefficient of r(t) ij is t/(t + 1). Consequently, the information from the residual is increasingly emphasized and bsij is mainly used to guide IRLS to escape the local minima in the first few iterations. At last, after computing the edge weights, we assign the weight 10 8 to a certain percentage of the edges with the lowest weights (the chosen percent-

age at iteration t is min(5t , 20)). We do not assign 0 weights (i.e., completely remove them) in order to avoid a disconnected graph. The last two ideas are also used in MPLS (Shi & Lerman, 2020)). Nevertheless, our rotation refinement is also different from MPLS in the following ways. First, MPLS uses a minimal spanning tree to initialize rotations which results in inaccuracies when all edges are noisy. Second, MPLS also uses a message passing unit to update edge weights in each iteration, which is more complex than our method.

Algorithm 2 describes our overall solution to rotation averaging, which we refer to as DESC-SO(3), or just DESC. We refer to the initialization of this solution (obtained in the second step of Algorithm 2) by DESC-init.

Algorithm 2 DESC-SO(3) (DESC) Input: {Rij}ij E, {dij,k}k Cij

Steps: Compute {bsij}ij E by DESC-PGD Initialize {R0 i }i [n] by DESC-based GCW (see 2.7). t = 0 w(0) ij = min(bs 3/2 ij , 108) ij E while not convergent do

t = t + 1 Compute Ω(t) ij according to (18) ij E

Find { Ω(t) i }i [n] as the minimizer of (19) over so(3)n

R(t) i = R(t 1) i exp( Ω(t) i ) i [n] r(t) ij = Ω(t) i Ω(t) j Ω(t) ij F /(

h(t) ij = (t r(t) ij + bsij)/(t + 1) ij E

w(t) ij = min((h(t) ij ) 3/2, 108) ij E τt = min(5t , 20) w(t) ij = 10 8 for τt% of edges with the highest h(t) ij end while Output:

3. Experiments

We test our methods for rotation averaging. In 3.1, we describe the implementation details of all tested algorithms. In 3.2 we report the estimation of the corruption levels and rotations on synthetic data generated by UCM for SO(3). In 3.3, we compare the performance of different algorithms on the Photo Tourism dataset (Wilson & Snavely, 2014).

3.1. Implementation Details of All Algorithms

We first compare our QP scheme with the following linear programming (LP) method for estimating corruption levels:

ij E sij (20)

subject to |sij dij,k| sik + sjk 0 sij 1,

Robust Group Synchronization via Quadratic Programming

where the first constraint is due to (4). We solve (20) using the default Matlab CVX LP solver. Using this solution for the corruption levels, one can apply the same post-processing as DESC to estimate the rotations (see 2.7). This LP formulation is very similar to that of Agarwal et al. (2020) except that Agarwal et al. (2020) is designed for rank aggregation. It is also similar to that of Zach et al. (2010), but Zach et al. (2010) with an additional penalty in the form of the sum over cycles of maximal corruption levels within each cycle. The objective function in (20) is based on the assumption that the overall corruption level of the graph is small and thus does not apply to highly corrupted scenarios. In contrast, DESC aims to enforce the orthogonality of vij and pij (see (8)), which seems also relevant to high corruption.

We also compare DESC with DESC-init and competitive GS methods. We test two versions of IRLS: IRLS-GM (Chatterjee & Govindu, 2013) and IRLS-ℓ1/2 (Chatterjee & Govindu, 2017), while using their default implementations. These versions use the Geman Mc Clure (GM) and ℓ1/2 losses. We implement CEMP (Lerman & Shi, 2021) and MPLS (Shi & Lerman, 2020) using the codes provided by the respective papers, with their default parameters. Following Lerman & Shi (2021), we use CEMP for recovering the corruption levels and both CEMP+MST and CEMP+GCW to recover rotations. CEMP+MST uses the minimum spanning tree (MST) as a post-processing step to estimate rotations, and CEMP+GCW uses GCW as in 2.7.

For the synthetic data experiments, we ran DESC with a constant step size of 0.01. The maximum number of iterations was set to 100. We noticed that increasing this number improved the accuracy, but we preferred a reasonable runtime (we will discuss the tradeoff between the two later). To further reduce the runtime, we also sampled (without replacement) a subset of the cycles of each edge. The number of cycles sampled was chosen as one quarter of the median number of cycles per edge, or at least 30. For edges with fewer cycles than the sample number, all cycles were used. No other parameters needed to be tuned.

For real data, due to the large sizes of the datasets, we increased the step size to 1 in order to accelerate the convergence and we decreased the maximum number of iterations to 30. Otherwise, all parameter settings were identical.

3.2. Synthetic Data Experiments

We compare DESC and other algorithms on synthetic data generated according to UCM, with and without noise. The underlying graph is generated by an Er os-R enyi model G(n, p) where n = 100 and p = 0.5 (two nodes are connected by an edge with probability p). The group is G = SO(3) and we represent its elements (rotations) by 3 3 rotation matrices. Let Haar(SO(3)) denote the Haar (or uniform ) probability measure on SO(3) and for

ij E, let Wij be a 3 3 Wigner matrix (with i.i.d. standard normal elements). For 0 q < 1 and σ 0, the following corruption model generates the rotation measurements:

( Proj(g ij + σWij), with probability 1 q; gij Haar(SO(3)), with probability q. That is, a group element is corrupted with probability q and in this case it is i.i.d. sampled from the uniform measure on SO(3) and otherwise its value is obtained by adding noise to the the ground-truth group ratio g ij with constant noise level σ. The resulting noisy matrix is then projected to SO(3).

Figure 1. Left: mean absolute error for corruption estimation of both DESC and linear programming, Right: log mean error in degrees for the rotation estimates of DESC and linear programming.

Using synthetic data generated from this model, we first compare our QP formulation with the LP formulation of 3.1. Since both of them aim to find the corruption levels, we compute the following absolute error for corruption estimation: 1 |E|

ij E |bsij s ij|. (21)

Figure 1 shows the absolute mean errors for corruption estimation and log mean errors for rotation estimation of both LP and QP with σ = 0 and varying q. In its first plot (on left), QP performs significantly better than LP for corruption estimation when q 0.4. This is due to the underlying assumption of the LP formulation that the overall edge corruption level is small. In its second plot (on right), DESC significantly outperforms LP with all values of q > 0. For fair comparison, we post-processed LP for rotation estimation with the same steps of Algorithm 2. Interestingly, even when q is small, the rotation estimates of LP are much worse than DESC, unlike the corruption estimates. The reason is that LP tends to underestimate the corruption levels due to its objective function. Underestimation of a small fraction of corruption levels as nearly 0 results in nearly infinite edge weights of the corresponding edges and consequently inaccurate rotation estimation. Due to the overall poor performance of LP, we ignore it in the rest of the experiments.

Next, we ran all algorithms, except LP, on synthetic datasets generated with q = 0, 0.1, 0.2, . . . , 0.8 and both σ = 0

Robust Group Synchronization via Quadratic Programming

Table 1. Average of the mean and median errors (in degrees) for rotation estimates across the 13 datasets of Photo Tourism

DESC DESC-init IRLS-GM IRLS-L 1

2 CEMP-MST CEMP-GCW MPLS

mean 3.5119 3.8354 3.9644 3.8447 4.1447 3.9191 3.7142 median 1.5938 1.8516 1.7255 1.7201 1.7975 2.0339 1.7032

Figure 2. Mean and median errors (in degrees) for rotation estimation of different algorithms (see legend) using the synthetic data with varying q and σ. Top: mean, bottom: median, left: σ = 0 and right σ = 0.1. We applied log base 10 to the y axis.

and σ = 0.1. Figure 2 reports the mean and median errors of rotation estimates by all tested methods. Because the values varied by several orders of magnitude, we used a logarithmic scale (base 10) for the y-axis. In all cases, DESC is comparable to MPLS. We note that DESC-init consistently outperforms CEMP-GCW, where both methods use the same GCW postprocessing for rotation estimation.

Figure 3 shows the absolute estimation errors of the corruption levels (see (21)) by DESC and CEMP using a logarithmic y-axis scale as in Figure 2. We note that overall the accuracy of DESC and CEMP for corruption estimation is comparable. In particular, in terms of the mean estimation error, DESC is more successful when q is small, whereas CEMP is more advantageous when q is large. In terms of the median error, DESC consistently outperforms CEMP for almost all values of q when σ = 0 and is several orders of magnitudes more accurate. When σ = 0.1, DESC yields slightly higher median error than that of CEMP for high q, and has much lower median error than CEMP for low q.

The per-iteration runtimes of DESC and CEMP on synthetic data were 0.06 and 0.02 seconds, respectively. While DESC is fast per iteration, it requires dozens of iterations to converge, making it slower than CEMP, even though both

Figure 3. Mean and median absolute error of corruption estimation for DESC and CEMP using the synthetic data. The upper two plots show the means and the lower two plots show the medians. The y axis uses a logarithmic scale with base 10.

of them have the same order of computational complexity.

3.3. Real Data Experiments

For experiments with real data, we used the Photo Tourism dataset, which was introduced in Wilson & Snavely (2014). It contains hundreds of images along with the approximate ground truth rotations estimated by the bundler software (Snavely et al., 2006). The relative rotations are estimated following the pipeline presented in Ozyesil & Singer (2015). We ran DESC along with the above benchmarks (excluding LP) on the 14 Photo Tourism datasets.

Figures 4 and 5 report the mean and median rotation errors, respectively, in degrees. The Gendarmenmarkt dataset is not included because all methods performed very poorly on it, with over 30 degrees error, which skewed the scale of the y axis. We note that DESC is overall competitive. The performance of all methods widely vary, though they are fairly consistent with each other for most datasets. Table 1 shows the average of the mean and median errors of all methods across all datasets. DESC performs the best by both metrics.

Next, we tested the ability of DESC to estimate edge corruptions. Figures 6 and 7 report the mean and median error of

Robust Group Synchronization via Quadratic Programming

Ellis Island

Madrid Metropolis

Montreal Notre Dame

NYC Library

Piazza del Popolo

Roman Forum

Tower of London

Union Square

Vienna Cathedral

Yorkminster

Mean Error (Degrees)

DESC DESC-init IRLS-GM IRLS-L12 CEMP-MST CEMP-GCW MPLS

Figure 4. Mean error (in degrees) of rotation estimation for each algorithm on 13 of the Photo Tourism datasets.

Ellis Island

Madrid Metropolis

Montreal Notre Dame

NYC Library

Piazza del Popolo

Roman Forum

Tower of London

Union Square

Vienna Cathedral

Yorkminster

Median Error (Degrees)

DESC DESC-init IRLS-GM IRLS-L12 CEMP-MST CEMP-GCW MPLS

Figure 5. Median error (in degrees) of rotation estimation for each algorithm on 13 of the Photo Tourism datasets.

corruption estimation, respectively, of both DESC and CEMP. Clearly, DESC is more successful than CEMP in recovering the corruption levels. In particular, the median error of DESC is more than 50% lower than that of CEMP on six datasets.

4. Conclusion

We proposed DESC, a novel framework for estimating the corruption levels of group ratios in group synchronization. It has a clear interpretation and we proved its exact recovery under a mild generic condition. We also established a tight recovery bound in terms of the corruption parameter under UCM. We proposed a simple numerical strategy that aimed to solve the optimization problem of DESC. We explained how to use it to solve the underlying group elements. We further refined this solution for the special case of rotation averaging. Our experiments on synthetic and real data of rotation averaging indicated that our proposed method often outperforms CEMP in corruption estimation and is competitive with state-of-the-art algorithms for rotation averaging.

Ellis Island

Madrid Metropolis

Montreal Notre Dame

NYC Library

Piazza del Popolo

Roman Forum

Tower of London

Union Square

Vienna Cathedral

Yorkminster

Mean absolute error for corruption estimates

Figure 6. Mean absolute error for the corruption estimates of DESC and CEMP on 13 of the Photo Tourism datasets.

Ellis Island

Madrid Metropolis

Montreal Notre Dame

NYC Library

Piazza del Popolo

Roman Forum

Tower of London

Union Square

Vienna Cathedral

Yorkminster

Median absolute error for corruption estimates

Figure 7. Median absolute error for the corruption estimates of DESC and CEMP on 13 of the Photo Tourism datasets.

Nevertheless, our method also has some limitations. First, our gradient descent algorithm is typically slower than CEMP. Second, when initializing the group elements, our edge weights are updated by bs 3/2 ij , which is quite heuristic. Consequently, an improvement in corruption estimation may not always result in improvement in rotation estimation.

In the future, we would like to study faster algorithms for optimizingour DESCformulation, andoptimalwaysofassigning edge weights under certain probabilistic models. We also plan to extend the idea behind our DESC framework to other tasks with structural consistency, such as subspace recovery and rank aggregation. We can also generalize our method to incorporate longer cycles in order to handle sparse graphs. For better numerical efficiency, we can use the ideas of Guibas et al. (2019) for sampling a smaller set of clean cycles.

Acknowledgement

This work was supported by NSF awards 1821266, 2124913.

Robust Group Synchronization via Quadratic Programming

Agarwal, A., Agarwal, S., Khanna, S., and Patil, P. Rank aggregation from pairwise comparisons in the presence of adversarial corruptions. In International Conference on Machine Learning, pp. 85 95. PMLR, 2020.

Bandeira, A. S. Random Laplacian matrices and convex relaxations. Foundations of Computational Mathematics, 18(2):345 379, 2018. doi: 10.1007/s10208-016-9341-9.

Birdal, T., Simsekli, U., Eken, M. O., and Ilic, S. Bayesian pose graph optimization via Bingham distributions and tempered geodesic MCMC. Advances in Neural Information Processing Systems, 31, 2018.

Birdal, T., Arbel, M., Simsekli, U., and Guibas, L. J. Synchronizing probability measures on rotations via optimal transport. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1569 1579, 2020.

Boumal, N. An introduction to optimization on smooth manifolds. Available online, Aug, 2020.

Chatterjee, A. and Govindu, V. M. Efficient and robust large-scale rotation averaging. In IEEE International Conference on Computer Vision, ICCV 2013, Sydney, Australia, December 1-8, 2013, pp. 521 528, 2013.

Chatterjee, A. and Govindu, V. M. Robust relative rotation averaging. IEEE transactions on pattern analysis and machine intelligence, 40(4):958 972, 2017.

Chen, Y., Guibas, L. J., and Huang, Q. Near-optimal joint object matching via convex relaxation. In Proceedings of the 31th International Conference on Machine Learning, ICML 2014, Beijing, China, 21-26 June 2014, pp. 100 108, 2014.

Chen, Y., Suh, C., and Goldsmith, A. J. Information recovery from pairwise measurements. IEEE Trans. Inf. Theory, 62(10):5881 5905, 2016.

Cucuringu, M. Synchronization over Z2 and community detection in signed multiplex networks with constraints. J. Complex Networks, 3(3):469 506, 2015.

Eriksson, A., Olsson, C., Kahl, F., and Chin, T.-J. Rotation averaging and strong duality. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 127 135, 2018.

Govindu, V. M. Lie-algebraic averaging for globally consistent motion estimation. In 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), 27 June - 2 July 2004, Washington, DC, USA, pp. 684 691, 2004.

Guibas, L. J., Huang, Q., and Liang, Z. A condition number for joint optimization of cycle-consistent networks. Advances in Neural Information Processing Systems, 32, 2019.

Hartley, R. I., Aftab, K., and Trumpf, J. L1 rotation averaging using the weiszfeld algorithm. In The 24th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, Colorado Springs, CO, USA, 20-25 June 2011, pp. 3041 3048, 2011.

Lerman, G. and Shi, Y. Robust group synchronization via cycle-edge message passing. Foundations of Computational Mathematics, pp. 1 77, 2021.

Ling, S. Near-optimal performance bounds for orthogonal and permutation group synchronization via spectral methods. ar Xiv preprint ar Xiv:2008.05341, 2020.

Maunu, T. and Lerman, G. Depth descent synchronization in SO(d), 2020. URL https: //arxiv.org/abs/2002.05299.

Ozyesil, O. and Singer, A. Robust camera location estimation by convex programming. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2674 2683, 2015.

Pachauri, D., Kondor, R., and Singh, V. Solving the multiway matching problem by permutation synchronization. In Advances in Neural Information Processing Systems 26, pp. 1860 1868. Curran Associates, Inc., 2013.

Shen, T., Zhu, S., Fang, T., Zhang, R., and Quan, L. Graphbased consistent matching for structure-from-motion. In European Conference on Computer Vision, pp. 139 155. Springer, 2016.

Shi, Y. and Lerman, G. Message passing least squares framework and its application to rotation synchronization. In Proceedings of the 37th International Conference on Machine Learning (ICML), 2020.

Singer, A. Angular synchronization by eigenvectors and semidefinite programming. Applied and computational harmonic analysis, 30(1):20 36, 2011.

Singer, A. and Wu, H.-T. Vector diffusion maps and the connection Laplacian. Comm. Pure Appl. Math., 65(8):1067 1144, 2012. ISSN 0010-3640. doi: 10.1002/cpa.21395.

Snavely, N., Seitz, S. M., and Szeliski, R. Photo tourism: Exploring photo collections in 3d. In SIGGRAPH Conference Proceedings, pp. 835 846, New York, NY, USA, 2006. ACM Press. ISBN 1-59593-364-6.

Sun, Y., Zhuo, J., Mohan, A., and Huang, Q. K-best transformation synchronization. In Proceedings of the

Robust Group Synchronization via Quadratic Programming

IEEE/CVF International Conference on Computer Vision, pp. 10252 10261, 2019.

Wang, L. and Singer, A. Exact and stable recovery of rotations for robust synchronization. Information and Inference, 2013.

Wang, W. and Carreira-Perpin an, M. A. Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application. ar Xiv preprint ar Xiv:1309.1541, 2013.

Wilson, K. and Snavely, N. Robust global translations with 1dsfm. In Computer Vision - ECCV 2014 - 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part III, pp. 61 75, 2014.

Zach, C., Klopschitz, M., and Pollefeys, M. Disambiguating visual relations using loop constraints. In The Twenty Third IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2010, San Francisco, CA, USA, 13-18 June 2010, pp. 1426 1433, 2010.