Journal of Machine Learning Research 22 (2021) 1-50. Submitted 12/20; Revised 8/21; Published 10/21

Nonparametric Continuous Sensor Registration

William Clark, wac76@cornell.edu, Department of Mathematics, Cornell University, Ithaca, NY 14850, USA

Maani Ghaffari, maanigj@umich.edu, Department of Naval Architecture and Marine Engineering, University of Michigan, Ann Arbor, MI 48109, USA

Anthony Bloch, abloch@umich.edu, Department of Mathematics, University of Michigan, Ann Arbor, MI 48109, USA

Editor: Sayan Mukherjee

Abstract

This paper develops a new mathematical framework that enables nonparametric joint semantic and geometric representation of continuous functions using data. The joint embedding is modeled by representing the processes in a reproducing kernel Hilbert space. The functions can be defined on arbitrary smooth manifolds where the action of a Lie group aligns them. The continuous functions allow the registration to be independent of a specific signal resolution. The framework is fully analytical with a closed-form derivation of the Riemannian gradient and Hessian. We study a more specialized but widely used case where the Lie group acts on functions isometrically. We solve the problem by maximizing the inner product between two functions defined over data, while the continuous action of the rigid body motion Lie group is captured through the integration of the flow in the corresponding Lie algebra. Low-dimensional cases are derived with numerical examples to show the generality of the proposed framework. The high-dimensional derivation for the special Euclidean group acting on the Euclidean space showcases the point cloud registration and bird's-eye view map registration abilities. An implementation of this framework for RGB-D cameras outperforms the state-of-the-art robust visual odometry and performs well in texture and structure-scarce environments.
Keywords: kernel methods, nonparametric representation, sensor registration, Lie groups, Riemannian geometry, equivariant models

© 2021 William Clark, Maani Ghaffari, and Anthony Bloch. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v22/20-1468.html.

1. Introduction

We consider the problem of sensor registration, which is defined as finding the transformation between two sensors with overlapping measurements or tracking a moving sensor using its sequential measurements. This problem frequently arises in engineering domains such as point cloud registration (Chen and Medioni, 1991; Censi, 2008; Segal et al., 2009; Stoyanov et al., 2012; Servos and Waslander, 2017; Parkison et al., 2018; Zaganidis et al., 2018; Parkison et al., 2019), visual odometry and SLAM (Nistér et al., 2006; Scaramuzza and Fraundorfer, 2011; Fraundorfer and Scaramuzza, 2012; Strasdat, 2012; Kerl et al., 2013a; Engel et al., 2015; Ghaffari et al., 2019; Rosinol et al., 2019), satellite image registration (Kim and Im, 2003; Bentoutou et al., 2005), and photogrammetry (Mikhail et al., 2001). In terms of applications, the above-mentioned problems are fundamental to many computer vision (Hartley and Zisserman, 2003; Szeliski, 2010) and robotics (Thrun et al., 2005; Barfoot, 2017) algorithms. In this paper, we formulate a fundamentally novel nonparametric framework for sensor registration that constructs continuous functions from raw measurements; hence our approach can be considered a direct method.

Figure 1: Conceptual illustration of nonparametric continuous sensor registration. Continuous functions are constructed from measurements shown in red circles. The domain of the functions can be an arbitrary smooth manifold, $M$, where the action of a Lie group, $G$, is used to align them via $h.f_Z$, $h \in G$.
The functions can be thought of as curves, surfaces, or hyper-surfaces that live in a Reproducing Kernel Hilbert Space (RKHS). For example, given an RGB-D image, a function can be constructed that maps a three-dimensional (3D) point to the RGB color space. Therefore, such functions inherently represent a joint model for semantic/appearance and geometry. Furthermore, the domain of the functions can be an arbitrary smooth manifold where the action of a Lie group is used to align them. This concept is illustrated in Figure 1. The continuous functions allow the registration to be independent of a specific signal resolution, and the framework is fully analytical with a closed-form derivation of the Riemannian gradient and Hessian. In the present work, we study a restriction of the framework where the Lie group acts on functions isometrically. This restriction has a variety of applications, and its high-dimensional derivation for the special Euclidean group acting on the Euclidean space showcases the point cloud registration and bird's-eye view map registration abilities. In particular, this work has the following contributions:

1. a new mathematical framework for sensor registration that enables nonparametric joint semantic/appearance and geometric representation of continuous functions using data;

2. low-dimensional cases derived with numerical examples to show the generality of the proposed framework;

3. a high-dimensional derivation for the special Euclidean group acting on the Euclidean space that showcases the point cloud registration and bird's-eye view map registration abilities; a specific derivation and implementation of this framework for RGB-D cameras performs well in texture- and structure-scarce environments;

4.
the open-source implementation of the derived cases in this work, available for download at https://github.com/MaaniGhaffari/c-sensor-registration and https://github.com/MaaniGhaffari/cvo-rgbd.

An early idea of this work in the context of RGB-D visual odometry was presented at the 2019 Robotics: Science and Systems conference in Freiburg, Germany (Ghaffari et al., 2019). This work generalizes the previous framework and places it into a broader context. In particular, an abstract sensor registration problem is defined in Section 2, where low-dimensional and high-dimensional examples together with their second-order geometry on flat or curved spaces are studied. The generalization to curved spaces also connects this work to an ongoing research direction regarding positive-definite kernel design and hyperparameter learning on curved spaces (Jayasumana et al., 2015; Feragen et al., 2015; Borovitskiy et al., 2020). Furthermore, the derivation of the Riemannian Hessian has important implications for quantifying the pose estimation covariance (Censi, 2007; Bonnabel et al., 2016; Landry et al., 2019; Brossard et al., 2020) and for the symmetry of the problem, studied via the vector fields generated by the eigenvectors of the Hessian (Zhu et al., 2021), and is therefore included for completeness. However, its applications in those research areas and second-order solvers are left for future studies. Finally, the new experiments on RGB-D visual odometry in Section 9 are done via an open-source and efficient C++ implementation and compared against DVO (Kerl et al., 2013a,b).

The remainder of this paper is organized as follows. A review of the required mathematical machinery and notation is given in Appendix A. Section 2 presents the novel continuous sensor registration framework. Section 3 provides low-dimensional examples of the framework, including point cloud registration on the circle and the torus.
We discuss some theoretical limitations of the kernel design on compact Riemannian manifolds in Section 4. The high-dimensional example for the special Euclidean group is given in Section 5. A brief theoretical analysis for the verification of the idea is provided in Section 6. The integration of the flow for the special Euclidean group to obtain the solution is explained in Section 7. Section 8 deals with the Riemannian geometry of the special Euclidean groups. Experimental evaluations of the proposed method for registration and tracking using RGB-D images are presented in Section 9. Section 10 provides discussions regarding several important topics related to the proposed approach. Finally, Section 11 concludes the paper and provides future work ideas.

2. Problem Setup

Let $M$ be a smooth manifold and consider two finite collections of points, $X = \{x_i\}, Z = \{z_j\} \subset M$. Also, suppose we have a Lie group, $G$, acting on $M$. We want to determine which element $h \in G$ best aligns the two point clouds $X$ and $h.Z = \{h z_j\}$. To assist with this, we will assume that each point contains information described by a point in an inner product space, $(\mathcal{I}, \langle \cdot, \cdot \rangle_{\mathcal{I}})$. To this end, we will introduce two labeling functions, $\ell_X : X \to \mathcal{I}$ and $\ell_Z : Z \to \mathcal{I}$. In order to measure their alignment, we will be turning the clouds, $X$ and $Z$, into functions $f_X, f_Z : M \to \mathcal{I}$ that live in some reproducing kernel Hilbert space, $(\mathcal{H}, \langle \cdot, \cdot \rangle_{\mathcal{H}})$.

Remark 1 The action $G \curvearrowright M$ induces an action $G \curvearrowright C^\infty(M)$ by $h.f(x) := f(h^{-1}x)$. Inspired by this observation, we will set $h.f_Z := f_{h^{-1}Z}$.

In the following, we give a summary of the recipe for cloud alignment as well as formal sensor registration problems.
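The action of Remark 1 can be checked numerically. The following minimal sketch (our own illustration, not part of the paper's implementation; the helper names are ours) verifies that $h.f(x) := f(h^{-1}x)$ defines a left action on functions, i.e., $g.(h.f) = (gh).f$:

```python
import numpy as np

def act(h, f):
    """Induced action on functions: (h.f)(x) := f(h^{-1} x), h an invertible matrix."""
    h_inv = np.linalg.inv(h)
    return lambda x: f(h_inv @ x)

def rot(theta):
    """Counterclockwise rotation of R^2 by theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

f = lambda x: np.exp(-x @ x)          # a sample scalar field on R^2
g, h = rot(0.3), rot(0.5)
x = np.array([1.0, 2.0])

# left-action property: g.(h.f) = (gh).f, since (gh)^{-1} = h^{-1} g^{-1}
lhs = act(g, act(h, f))(x)
rhs = act(g @ h, f)(x)
assert np.isclose(lhs, rhs)
```

The key point is that the inverse in the definition is exactly what makes the composition come out in the right order.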
Definition 2 A sensor registration problem is given by a 5-tuple $(G, M, \phi, \langle \cdot, \cdot \rangle_{\mathfrak{g}}, k)$ where

(G1) $G$ is a Lie group,
(G2) $M$ is a smooth manifold,
(G3) $\phi : G \to \mathrm{Diff}(M)$ is a smooth group action,
(G4) $\langle \cdot, \cdot \rangle_{\mathfrak{g}}$ is an inner product on $\mathfrak{g} = \mathrm{Lie}(G)$, and
(G5) $k : M \times M \to \mathbb{R}$ is a symmetric positive-definite function, called the kernel,

while the information is given by a 3-tuple $(\mathcal{I}, (X, \ell_X), (Z, \ell_Z))$ where

(I1) $\mathcal{I}$ is an inner product space, called the information space,
(I2) $X \subset M$ is a finite collection of points called the target and $\ell_X : X \to \mathcal{I}$ is its label, and
(I3) $Z \subset M$ is a finite collection of points called the source and $\ell_Z : Z \to \mathcal{I}$ is its label.

The information (G1)-(G5) is required to build the general form of the gradient, while the information (I1)-(I3) encodes the actual clouds, which is subsequently plugged into the gradient. The remainder of this work consists of understanding cases with varying (G1)-(G5) information. We first define the general form of the problem as follows.

Problem 1 Suppose $f_X$ and $f_Z$ are the constructed functions over point clouds $X$ and $Z$, respectively. The problem of aligning two point clouds can be formulated as minimizing the distance between $f_X$ and $h.f_Z$ in the sense of the RKHS norm. That is, we want to solve

$\arg\min_{h \in G} J(h), \quad J(h) := \| f_X - h.f_Z \|^2_{\mathcal{H}},$ (1)

where $\mathcal{H}$ is the RKHS induced by the kernel $k$ from (G5).

In this work, we restrict the problem to a special case. If the action $G \curvearrowright C^\infty(M)$ is by isometries, we have

$\| f_X - h.f_Z \|^2_{\mathcal{H}} = \| f_X \|^2_{\mathcal{H}} + \| f_Z \|^2_{\mathcal{H}} - 2\langle f_X, h.f_Z \rangle_{\mathcal{H}}.$

Then we can reformulate Problem 1 into a reduced form as follows.

Problem 2 The problem of aligning the point clouds can now be rephrased as maximizing the scalar product of $f_X$ and $h.f_Z$, i.e., we want to solve

$\arg\max_{h \in G} F(h), \quad F(h) := \langle f_X, h.f_Z \rangle_{\mathcal{H}}.$ (2)

2.1 Constructing the Functions

We first choose a symmetric function $k : M \times M \to \mathbb{R}$ to be the kernel of our RKHS, $\mathcal{H}$.
This allows us to turn the point clouds into functions via

$f_X(\cdot) := \sum_{x_i \in X} \ell_X(x_i) k(\cdot, x_i), \qquad f_Z(\cdot) := \sum_{z_j \in Z} \ell_Z(z_j) k(\cdot, z_j).$ (3)

We can now define the inner product of $f_X$ and $f_Z$ by

$\langle f_X, f_Z \rangle_{\mathcal{H}} := \sum_{x_i \in X,\, z_j \in Z} \langle \ell_X(x_i), \ell_Z(z_j) \rangle_{\mathcal{I}} \, k(x_i, z_j).$ (4)

We note that it is possible to use the well-known kernel trick in machine learning (Bishop, 2006; Williams and Rasmussen, 2006; Murphy, 2012) to substitute the inner products in (4) with the appearance/semantic kernel, i.e., $k_c : \mathcal{I} \times \mathcal{I} \to \mathbb{R}$. The kernel trick can be applied to carry out computations implicitly in the high-dimensional space, which leads to computational savings when the dimensionality of the feature space is large compared to the number of data points (Williams and Rasmussen, 2006). After applying the kernel trick to (4), we get

$\langle f_X, f_Z \rangle_{\mathcal{H}} = \sum_{x_i \in X,\, z_j \in Z} k_c(\ell_X(x_i), \ell_Z(z_j)) \, k(x_i, z_j) := \sum_{x_i \in X,\, z_j \in Z} c_{ij} \, k(x_i, z_j).$ (5)

Remark 3 We note two advantages of measuring the alignment of $X$ and $Z$ by (4). The first is that we do not need to identify which point of $X$ should be paired with which point of $Z$. The second is that the number of points in $X$ does not even need to be equal to the number of points in $Z$!

Remark 4 Although the double sum in (4) seems to make the problem computationally intractable when dealing with a large number of data points, the inner product in (4) is sparse. The sparse structure is due to the kernelized formulation and the fact that the kernel bandwidth is spatially limited. As such, most kernel terms are zero. See Fig. 1 in Ghaffari et al. (2019). This notion of sparsity extends to labels, i.e., semantic information, once we kernelize the information inner product as done in (5). Furthermore, exploiting this sparse structure is the motivation for the development of a custom solver. However, the specific solver used in this work is not tied to the proposed framework.
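As a concrete illustration of (3)-(5), the following minimal Python sketch (our own; the function names and parameter values are illustrative assumptions, not the paper's implementation) evaluates the double-sum inner product for two clouds of different sizes with scalar labels:

```python
import numpy as np

def k_spatial(x, y, ell=0.5, sigma=1.0):
    # squared-exponential spatial kernel k(x, y); ell, sigma are assumed values
    d2 = np.sum((np.asarray(x, float) - np.asarray(y, float))**2)
    return sigma**2 * np.exp(-d2 / (2.0 * ell**2))

def rkhs_inner(X, lX, Z, lZ, ell=0.5):
    # <f_X, f_Z>_H = sum_{i,j} <l_X(x_i), l_Z(z_j)>_I k(x_i, z_j), eq. (4)
    return sum(np.dot(li, lj) * k_spatial(xi, zj, ell)
               for xi, li in zip(X, lX) for zj, lj in zip(Z, lZ))

# two clouds on the line with scalar labels; note that no point
# correspondences are needed and the clouds have different sizes
X = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
Z = np.linspace(0.0, 1.0, 7).reshape(-1, 1)
lX = [np.array([1.0])] * len(X)
lZ = [np.array([1.0])] * len(Z)

aligned = rkhs_inner(X, lX, Z, lZ)
shifted = rkhs_inner(X, lX, Z + 3.0, lZ)
assert aligned > shifted   # alignment increases the inner product
```

The comparison at the end is exactly the objective of Problem 2: the aligned configuration scores a larger RKHS inner product than a displaced one.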
2.2 Building the Gradient Flow

In order to (at least locally) solve (2), we will construct a gradient flow: $\dot{h} = \nabla F(h)$ (this is similar to the treatment in Bloch et al. (1992), where a gradient flow is used to maximize the inner product between elements of a Lie algebra corresponding to a compact Lie group). Before we proceed, we will first determine the differential, $dF$. In order to do this, we will need the notion of an infinitesimal generator for a group action (see chapter 4 of Berndt and Klucznik 2001).

Definition 5 Suppose that a Lie group $G$ acts diffeomorphically on a smooth manifold $M$ via $\phi$; that is, $\phi : G \to \mathrm{Diff}(M)$. For a given $\xi \in \mathfrak{g} = \mathrm{Lie}(G)$, we denote by $\xi_M$ the vector field (called the infinitesimal generator) on $M$ given by the rule:

$df_x(\xi_M) := \frac{d}{dt}\Big|_{t=0} f(\phi_{\exp(t\xi)}(x)), \quad f \in C^\infty(M).$ (6)

This lets us compute the differential, $dF$.

Theorem 6 Suppose that $F(h) = \langle f_X, h.f_Z \rangle_{\mathcal{H}}$ as described above. Then

$dF_e(\xi) = \sum_{i,j} c_{ij} \, dk_{x_i}\big|_{z_j}\left( -\xi_M(z_j) \right),$ (7)

where $k_{x_i} = k(x_i, \cdot)$.

Remark 7 The notation for the differential of a function used throughout this paper is $df_x(v)$, where $x \in M$ and $v \in T_xM$:

$df_x(v) := \frac{d}{dt}\Big|_{t=0} f(c(t)), \quad c(0) = x, \ c'(0) = v.$

Proof This follows from a straightforward application of the chain rule:

$dF_e(\xi) = \frac{d}{dt}\Big|_{t=0} \langle f_X, \exp(t\xi).f_Z \rangle_{\mathcal{H}} = \sum_{i,j} c_{ij} \frac{d}{dt}\Big|_{t=0} k(x_i, \exp(-t\xi)z_j) = \sum_{i,j} c_{ij} \, dk_{x_i}\big|_{z_j}\left( -\xi_M(z_j) \right).$

This matches equation (7).

Of course, to construct the gradient flow we are interested in computing $dF_h$ instead of just $dF_e$. We will accomplish this via left-translation. Left-translation is given by the smooth map $\ell_h : G \to G$ where $x \mapsto hx$. Its differential gives rise to an isomorphism of tangent spaces, $(\ell_h)_* : \mathfrak{g} \to T_hG$.

Corollary 8 Under the identification $T_hG = (\ell_h)_*\mathfrak{g}$, we have

$dF_h\left( (\ell_h)_*\xi \right) = \sum_{i,j} c_{ij} \, dk_{x_i}\big|_{h^{-1}z_j}\left( -\xi_M(h^{-1}z_j) \right).$ (8)

In order to turn the co-vector $dF_h \in T^*_hG$ into a vector $\nabla F_h \in T_hG$, we will use a left-invariant metric. This can be accomplished by defining an inner product $\langle \cdot, \cdot \rangle_{\mathfrak{g}}$ on $\mathfrak{g}$ and lifting it to a (Riemannian) metric on $G$ via left-translation, i.e.
$\langle (\ell_h)_*\eta, (\ell_h)_*\xi \rangle_{T_hG} := \langle \eta, \xi \rangle_{\mathfrak{g}}.$

This allows us to define the gradient of $F$ via

$\langle \nabla F_h, (\ell_h)_*\xi \rangle_{T_hG} = dF_h\left( (\ell_h)_*\xi \right).$ (9)

This provides a way to obtain a (local) solution to (2) by following

$\dot{h} = \nabla F(h).$ (10)

To demonstrate the generality of this registration algorithm, the cases in Table 1 will be examined in the remainder of this paper.

M: S^1 | T^2 | S^2 | R^n
G: S^1 | T^2 | SO(3) | SE(n)

Table 1: A list of the manifolds / Lie groups to be studied in this paper. Particular interest will be placed on the $(\mathbb{R}^2, \mathrm{SE}(2))$ and $(\mathbb{R}^3, \mathrm{SE}(3))$ examples.

3. Low-dimensional Examples

While the primary focus of this paper is to align RGB-D images in $\mathbb{R}^3$, we take the opportunity to show how the above algorithm works on other spaces: specifically $S^1$, $T^2$, and $S^2$: the circle, the 2-torus, and the sphere.

3.1 The Circle

To demonstrate our algorithm, we will first work out the case where the manifold containing the clouds is the circle, $M = S^1$. $S^1$ acts naturally on itself via left-translations, so we shall take $G = S^1$ as well and $\phi$ to be left-translation, i.e. $\phi(\alpha)(\theta) = \alpha + \theta \mod 2\pi$. The Lie algebra of the circle is the real line and we will therefore take the inner product to be scalar multiplication, $\langle a, b \rangle_{\mathbb{R}} = ab$. The last piece of geometry we need is the kernel (the information space and the clouds will be determined later). To build $k$, we will use the Euclidean distance of points on the circle: for $x, y \in [0, 2\pi) \cong S^1$,

$d_2(x, y)^2 := (\cos x - \cos y)^2 + (\sin x - \sin y)^2 = 2\left( 1 - \cos(x - y) \right).$ (11)

We will use the Gaussian kernel based on this distance function:

$k(x, y) = \sigma^2 \exp\left( -\frac{d_2(x, y)^2}{2\ell^2} \right),$ (12)

where $\ell > 0$ is a parameter called the length-scale. If we again call $c_{ij} = \langle \ell_X(x_i), \ell_Z(z_j) \rangle_{\mathcal{I}}$, then using (8), the differential is (for $s \in \mathbb{R}$)

$dF_h\left( (\ell_h)_* s \right) = \frac{1}{\ell^2} \sum_{x_i \in X,\, z_j \in Z} c_{ij} k(x_i, z_j - h) \sin(z_j - h - x_i) \, s.$

Therefore, the gradient is

$\nabla F_h = \frac{1}{\ell^2} \sum_{x_i \in X,\, z_j \in Z} c_{ij} k(x_i, z_j - h) \sin(z_j - h - x_i).$ (13)

Finally, the Hessian can be found by simply differentiating the gradient:

$H = \frac{1}{\ell^2} \sum_{x_i \in X,\, z_j \in Z} c_{ij} k(x_i, z_j - h) \left[ \frac{1}{\ell^2}\sin^2(z_j - h - x_i) - \cos(z_j - h - x_i) \right].$
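The circle case is simple enough to run end to end. The sketch below (our own; all labels are set to 1 so that $c_{ij} = 1$, and the step size and iteration count are arbitrary choices) performs gradient ascent on $F$ using the gradient (13), for two uniformly spaced arcs offset by $\pi/2$:

```python
import numpy as np

sigma, ell = 1.0, 1.0
X = np.linspace(0.0, np.pi / 2, 10)      # target cloud on S^1
Z = np.linspace(np.pi / 2, np.pi, 12)    # source cloud on S^1

def k(x, y):
    # k(x, y) = sigma^2 exp(-d_2(x, y)^2 / (2 ell^2)), with d_2^2 = 2(1 - cos(x - y))
    return sigma**2 * np.exp(-(1.0 - np.cos(x - y)) / ell**2)

def grad_F(h):
    # eq. (13) with all c_ij = 1; h^{-1} z_j = z_j - h on the circle
    U = Z[None, :] - h - X[:, None]
    return np.sum(k(X[:, None], Z[None, :] - h) * np.sin(U)) / ell**2

h = 0.0
for _ in range(2000):                    # gradient ascent on S^1
    h += 1e-3 * grad_F(h)

# Z is X shifted by pi/2, so the maximizer is h* = pi/2
assert abs(h - np.pi / 2) < 0.05
```

Because both grids are uniform, the multiset of differences $z_j - x_i$ is symmetric about $\pi/2$, which forces the maximizer to sit exactly at $h^* = \pi/2$.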
(14)

Example 1 We perform the registration problem on the circle with X = linspace(0, π/2, 10) and Z = linspace(π/2, π, 12). Figure 2 shows the two point clouds before and after registration. The labels, $\ell_X$ and $\ell_Z$, are set to 1 for all points.

3.2 The Torus

The torus is the product of two circles: $T^2 = S^1 \times S^1$. Therefore, both the gradient and the Hessian will be two copies of the corresponding objects from the circle case. Similarly to the circle case, the group will act via left-translations and we will take $(x^1, x^2) \in [0, 2\pi) \times [0, 2\pi) \cong T^2$ for coordinates. The kernel will be

$k(x, y) = \sigma^2 \exp\left( -\frac{d_{T^2}(x, y)^2}{2\ell^2} \right), \quad d_{T^2}(x, y)^2 = d_2(x^1, y^1)^2 + d_2(x^2, y^2)^2.$ (15)

Figure 2: Left: The clouds $X$ and $A.Z$. Right: The clouds $X$ and $TA.Z$.

The gradient is then

$\nabla F_h = \frac{1}{\ell^2} \sum_{x_i \in X,\, z_j \in Z} c_{ij} k(x_i, z_j - h) \begin{bmatrix} \sin(z_j^1 - h^1 - x_i^1) \\ \sin(z_j^2 - h^2 - x_i^2) \end{bmatrix}.$ (16)

The Hessian is straightforward to calculate due to the fact that $T^2$ is abelian:

$H_{T^2} = \frac{1}{\ell^2} \sum_{x_i \in X,\, z_j \in Z} c_{ij} k(x_i, z_j - h) \, h_{ij},$ (17)

$h_{ij} = \mathrm{diag}\left( \frac{1}{\ell^2}\sin^2(z_j^1 - h^1 - x_i^1) - \cos(z_j^1 - h^1 - x_i^1),\ \frac{1}{\ell^2}\sin^2(z_j^2 - h^2 - x_i^2) - \cos(z_j^2 - h^2 - x_i^2) \right).$

This procedure naturally generalizes to $T^n$.

Example 2 We perform the registration problem on the torus with a 5-pointed star as shown in Figure 3. X contains 50 points while Z contains 75. The displacement is [1, 1] and the transformation found was [1.0077, 0.9962], which is an error of [0.0077, 0.0038] with norm 0.0085.

3.3 The Sphere

The other initial example we wish to develop is the case where $M = S^2$, the sphere. The Lie group we choose to act is $\mathrm{SO}(3)$, the special orthogonal group. This acts via the usual matrix multiplication: for $x \in S^2 \subset \mathbb{R}^3$,

$\phi(g)(x) = gx.$ (18)

Figure 3: Top: The clouds $X$ and $A.Z$. Bottom: The clouds $X$ and $TA.Z$.

In terms of the point cloud data, assume that we have clouds with points in $\mathbb{R}^3$. These points will be projected to $S^2$ (via normalizing), which is compatible with the group action described above.
Given two points $x, y \in \mathbb{R}^3$, their arc-length distance, once projected to $S^2$, is

$d_{S^2}(x, y) = \arccos\left( \frac{\langle x, y \rangle_3}{\|x\|_3 \|y\|_3} \right).$ (19)

With this distance (which is different from the distance used in the circle example), we define the kernel to be

$k(x, y) = \exp\left( -\frac{d_{S^2}(x, y)^2}{2\ell^2} \right).$ (20)

In order to determine the gradient, we still need a metric on $\mathfrak{so}(3)$, for which we will use the (negative of the) Killing form. This gives a gradient of $\nabla F(R) = R\hat{\omega}$, where $R \in \mathrm{SO}(3)$, $\hat{\cdot} : \mathbb{R}^3 \to \mathrm{skew}(3)$ (three-by-three skew-symmetric matrices), and

$\omega = \frac{1}{\ell^2} \sum_{i,j} c_{ij} \frac{k(x_i, R^{-1}z_j) \, d_{S^2}(x_i, R^{-1}z_j)}{\sqrt{\|x_i\|^2 \|z_j\|^2 - \langle x_i, R^{-1}z_j \rangle^2}} \left( x_i \times R^{-1}z_j \right).$ (21)

3.3.1 The Hessian

Unlike the circle and torus examples, calculating the Hessian for the sphere is substantially more involved. This follows from the fact that $\mathrm{SO}(3)$ is no longer abelian. In order to determine the Hessian, we need to know the connection on $\mathrm{SO}(3)$ induced by the Killing form on $\mathfrak{so}(3)$. Let us choose the standard basis $\{e_x, e_y, e_z\} \subset \mathfrak{so}(3)$ where

$e_x = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}, \quad e_y = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix}, \quad e_z = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$

Additionally, let $\{E_x, E_y, E_z\}$ be their corresponding left-invariant vector fields. The connection, calculated from (58), is $\nabla_{E_\alpha}E_\beta = \frac{1}{2}[E_\alpha, E_\beta]$; explicitly,

$\nabla_{E_x}E_y = \tfrac{1}{2}E_z, \quad \nabla_{E_y}E_z = \tfrac{1}{2}E_x, \quad \nabla_{E_z}E_x = \tfrac{1}{2}E_y,$

with the remaining covariant derivatives determined by antisymmetry of the bracket. Writing (21) as $\hat{\omega} = \omega_x e_x + \omega_y e_y + \omega_z e_z$, the Hessian is given by

$H(e_\alpha, e_\beta) = \Big\langle \sum_{\gamma \in \{x,y,z\}} (L_{E_\alpha}\omega_\gamma) E_\gamma + \omega_\gamma \nabla_{E_\alpha}E_\gamma,\ e_\beta \Big\rangle = L_{E_\alpha}\omega_\beta + \tfrac{1}{2}\sum_\gamma \omega_\gamma \varepsilon_{\alpha\gamma\beta},$

where $\varepsilon_{\alpha\gamma\beta} = \langle [e_\alpha, e_\gamma], e_\beta \rangle$ are the structure constants. Combining, the Hessian is

$\begin{bmatrix} L_{E_x}\omega_x & L_{E_x}\omega_y - \tfrac{1}{2}\omega_z & L_{E_x}\omega_z + \tfrac{1}{2}\omega_y \\ L_{E_y}\omega_x + \tfrac{1}{2}\omega_z & L_{E_y}\omega_y & L_{E_y}\omega_z - \tfrac{1}{2}\omega_x \\ L_{E_z}\omega_x - \tfrac{1}{2}\omega_y & L_{E_z}\omega_y + \tfrac{1}{2}\omega_x & L_{E_z}\omega_z \end{bmatrix},$ (22)

which, although tedious, can be computed exactly.

Remark 9 The reason that (22) returns the negative of the Hessian is that we are actually differentiating the inverse of the group element.

Example 3 To illustrate registration in this case, we will align two pictures of the globe. This will be done with a topographic map of the Earth using Matlab's topo.
Both $X$ and $Z$ are initialized to be identical; however, $Z$ is perturbed by a transformation $A \in \mathrm{SO}(3)$. The goal is to recover $A^{-1}$ to realign $Z$ with $X$. The algorithm reports $T \in \mathrm{SO}(3)$, which should be the inverse of $A$. The values of $A$, $T$, $TA$, and $\|TA - \mathrm{Id}\|_F$ are shown in (23):

A = [0.9386 0.3170 0.1361; 0.3230 0.9461 0.0244; 0.1210 0.0668 0.9904],
T = [0.9391 0.3217 0.1205; 0.3160 0.9466 0.0643; 0.1347 0.0223 0.9906],
TA = [1.0000 0.0014 0.0007; 0.0014 1.0000 0.0024; 0.0007 0.0024 1.0000],
$\|TA - \mathrm{Id}\|_F = 0.0040.$ (23)

Parameter | Value
transformation convergence threshold ϵ | 1e-4
gradient norm convergence threshold ϵ | 5e-4
kernel characteristic length-scale ℓ | 0.25
kernel characteristic length-scale ℓ (iteration > 3) | 0.15
kernel characteristic length-scale ℓ (iteration > 10) | 0.10
kernel characteristic length-scale ℓ (iteration > 20) | 0.05
kernel signal variance σ | 0.1
step length | 0.1
color space inner product scale | 1e-5
kernel sparsification threshold | 1e-3

Table 2: Parameters used for evaluation of the globe registration. The kernel characteristic length-scale is chosen to be adaptive as the algorithm converges; intuitively, we prefer a large neighborhood of correlation for each point, but as the algorithm approaches convergence, reducing the local correlation neighborhood allows for faster convergence and better refinement.

Figure 4: Left (Reference): The image of the contour map of the Earth. Center (Perturbed): The plot of $X$ along with $A.Z$. Right (Aligned): The plot of $X$ along with $TA.Z$.

The parameters used in this registration are included in Table 2. The initial and final images are shown in Figure 4.

4. Limitations on Compact Riemannian Manifolds

In this section, we discuss an important problem regarding the positive-definiteness of the kernel on compact Riemannian manifolds (Jayasumana et al., 2015; Feragen et al., 2015; Borovitskiy et al., 2020). The key result is the following theorem.

Theorem 10 (Theorem 5 of Feragen et al. 2015, Bekka et al.
2008) Let $X$ be a topological space and let $\Psi : X \times X \to \mathbb{R}$ be a continuous kernel such that $\Psi(x, x) = 0$ and $\Psi(x, y) = \Psi(y, x)$ for all $x, y \in X$. Then the following are equivalent:

1. $\Psi$ is conditionally of negative type, and
2. the kernel $\exp(-t\Psi)$ is of positive type for every $t \geq 0$.

However, the issue is that for $\Psi = d(x, y)^2$ to be conditionally of negative type, we essentially need to be on a Euclidean space. Fortunately, this is more than what we require in this work, as we are not concerned with whether positive-definiteness holds for all $t \geq 0$. Even though we are not on a Euclidean space, it may still happen that some $t \geq 0$ makes the kernel positive definite. The following theorem ensures the existence of such a $t$.

Theorem 11 Let $M \subset \mathbb{R}^m$ be an embedded manifold. Let $d_c$ be the chordal distance on $M$ (straight lines in $\mathbb{R}^m$) and $d_g$ the geodesic distance. For a finite collection of points $\{x_i\} \subset M$, there exists a $t \geq 0$ such that the matrix $K_{ij} = \exp\left( -t \, d_g(x_i, x_j)^2 \right)$ is positive definite.

Proof It is clear that $K_{ii} = 1$ for all $i$. As long as the diagonal elements dominate, the matrix will be positive definite. Let

$R_i := \sum_{j \neq i} \exp\left( -t \, d_g(x_i, x_j)^2 \right).$

Choose $t \geq 0$ large enough that $R_i < 1$ for all $i$. Then, by the Gershgorin circle theorem, all the eigenvalues of $K$ are positive. A bound on $t$ can be found explicitly as follows. Let $\Lambda := \min_{i \neq j} d_g(x_i, x_j)$. Then $R_i \leq (N - 1)\exp(-t\Lambda^2)$, where $N$ is the number of points. This produces

$t > \frac{\ln(N - 1)}{\Lambda^2}.$

This is not a particularly useful result, as the parameter $t$ depends on the point cloud. It therefore remains an open problem whether we can systematically find a $t$ (or an interval of them) that works for all possible point clouds. This is a particularly attractive problem in machine learning, as one ultimately wants to learn all hyperparameters. In particular, the derivation of useful bounds on $t$ as well as algorithmic implementations for learning it are interesting future research directions.
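The bound in the proof of Theorem 11 is easy to verify numerically. The following sketch (our own; the cloud size and the 1.1 safety factor are arbitrary choices) draws a random cloud on $S^2$, computes $t$ from $\Lambda$, and checks positive definiteness:

```python
import numpy as np

rng = np.random.default_rng(0)

# a random finite cloud on the unit sphere, with geodesic (arc-length) distances
N = 40
P = rng.normal(size=(N, 3))
P /= np.linalg.norm(P, axis=1, keepdims=True)
D = np.arccos(np.clip(P @ P.T, -1.0, 1.0))      # d_g(x_i, x_j)

# bound from the proof: with Lambda = min_{i != j} d_g(x_i, x_j),
# any t > ln(N - 1) / Lambda^2 makes K strictly diagonally dominant
Lam = D[~np.eye(N, dtype=bool)].min()
t = 1.1 * np.log(N - 1) / Lam**2                 # 1.1 = safety margin

K = np.exp(-t * D**2)
assert np.linalg.eigvalsh(K).min() > 0           # Gershgorin guarantees this
```

As the proof notes, $t$ scales with $1/\Lambda^2$, so densely sampled clouds force a very large $t$ (a very small length-scale); this is the practical drawback discussed above.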
Example 4 (The Sphere) Let us see how the case of the sphere holds up. Here, we take the kernel

$k(x, y) = \exp\left( -\frac{d_{S^2}(x, y)^q}{2\ell^2} \right), \quad d_{S^2}(x, y) = \arccos\left( \frac{\langle x, y \rangle_3}{\|x\|_3\|y\|_3} \right).$ (24)

In particular, $t = 1/(2\ell^2)$. For the usual case, the exponent is $q = 2$; however, this may be changed as well. The idea is that $k$ will not be positive definite for all $\ell$, but it should be when $\ell$ is small enough. This will be numerically explored by taking a random sample of $N$ points on $S^2$ and computing the matrix $K$. If all of its eigenvalues are positive, then it is positive definite. As such, we are interested in finding the smallest eigenvalue.

Figure 5: Top left: The geodesic kernel (24) fails to be positive definite as $\ell$ becomes large. As long as $\ell$ is small (which corresponds to $t$ being large), the kernel appears to be positive definite. Top right: A zoomed version. Bottom left: The positive definiteness appears not to depend too heavily on the number of points. Bottom right: All of the previous figures were with the exponent fixed at $q = 2$; this figure varies the exponent. The chordal kernel appears to be positive definite when $q \leq 2$, which is in agreement with the work of Feragen et al. (2015).

5. Special Euclidean Groups

We now move on to the more involved examples where $M = \mathbb{R}^n$ and $G = \mathrm{SE}(n)$, the special Euclidean group in $n$ dimensions. $G$ acts on $M$ in the standard fashion: for $(R, T) \in \mathrm{SE}(n)$ with $R \in \mathrm{SO}(n)$ and $T \in \mathbb{R}^n$,

$(R, T).x = Rx + T, \quad x \in \mathbb{R}^n.$

We will also choose the squared exponential kernel for $k : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$:

$k(x, y) = \sigma^2 \exp\left( -\frac{\|x - y\|_n^2}{2\ell^2} \right)$ (25)

for some fixed real parameters $\sigma$ and $\ell$, where $\|\cdot\|_n$ is the standard Euclidean norm on $\mathbb{R}^n$. In order to determine the gradient flow (10), we need to compute the infinitesimal generators of the action $\mathrm{SE}(n) \curvearrowright \mathbb{R}^n$ as well as decide on a left-invariant metric for $\mathfrak{se}(n) = \mathrm{Lie}(\mathrm{SE}(n))$.

5.1 Infinitesimal Generator

For a fixed $\xi \in \mathfrak{se}(n)$, it has the form $\xi = (\omega, v)$ where $\omega \in \mathfrak{so}(n)$ and $v \in \mathbb{R}^n$.
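The numerical exploration described in Example 4 can be reproduced in a few lines. The sketch below (our own; $N$ and the length-scales are arbitrary choices) compares the chordal and geodesic Gaussian kernels on a random spherical cloud. The chordal kernel is the restriction of a Euclidean Gaussian to the sphere and is therefore positive definite for every length-scale, while the geodesic kernel carries no such guarantee:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 25
P = rng.normal(size=(N, 3))
P /= np.linalg.norm(P, axis=1, keepdims=True)

G = np.arccos(np.clip(P @ P.T, -1.0, 1.0))                   # geodesic distance
C = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)   # chordal distance

for ell in (0.3, 0.6, 1.0):
    Kc = np.exp(-C**2 / (2 * ell**2))
    Kg = np.exp(-G**2 / (2 * ell**2))
    # chordal case: restriction of the Euclidean squared-exponential kernel,
    # hence positive definite for every ell
    assert np.linalg.eigvalsh(Kc).min() > 0
    # geodesic case: no guarantee; inspect the smallest eigenvalue
    print("ell =", ell, " min eig (geodesic):", np.linalg.eigvalsh(Kg).min())
```

This mirrors Figure 5: the smallest eigenvalue of the geodesic kernel matrix is the quantity plotted there as a function of $\ell$.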
Because the infinitesimal generator map $\mathfrak{g} \to \mathfrak{X}(M)$ is a Lie algebra homomorphism (where $\mathfrak{X}(M)$ is the space of all vector fields over $M$; see §27 of Tu 2017), we see that $\xi_M = \omega_M + v_M$. A straightforward computation leads to

$\xi_{\mathbb{R}^n}(x) = \hat{\omega}x + v, \quad \hat{\omega} \in \mathrm{skew}(n), \ v \in \mathbb{R}^n.$ (26)

We need to choose a metric on $\mathfrak{se}(n)$ to turn $dF_h$ into $\nabla F_h$. For this example, we will take a multiple of the Killing form on $\mathfrak{so}(n)$ and the Euclidean norm on $\mathbb{R}^n$. That is,

$\langle (\omega, v), (\eta, u) \rangle_{\mathfrak{se}(n)} = b^2 \langle v, u \rangle_n + a^2 \frac{2 - n}{2}\mathrm{Tr}(\omega\eta),$ (27)

where $\langle \cdot, \cdot \rangle_n$ is the standard Euclidean inner product on $\mathbb{R}^n$ (see Park and Brockett (1994) for a discussion in three dimensions), and $a$ and $b$ are tuning parameters. The reason for the $(2 - n)/2$ term is that with this normalization (with $a = 1$) the skew matrices $E_{ij} - E_{ji}$ are orthonormal. Here $E_{ij}$ denotes the matrix with all zeros except for a 1 in the $(i, j)$-coordinate.

5.3 Calculating the Gradient

Before we find the gradient, let us first determine its differential (at the identity for simplicity):

$dF_e(\xi) = \sum_{i,j} c_{ij} \, dk_{x_i}\big|_{z_j}\left( -\xi_M(z_j) \right) = \frac{1}{\ell^2}\sum_{i,j} c_{ij} k(x_i, z_j) \left\langle x_i - z_j,\ -(\hat{\omega}z_j + v) \right\rangle_n.$

To turn $dF_e$ into $\nabla F_e$, we will compute $\nabla_\omega F_e$ and $\nabla_v F_e$ separately:

$a^2\frac{2-n}{2}\mathrm{Tr}\left[ (\nabla_\omega F_e)\hat{\omega} \right] = \frac{1}{\ell^2}\sum_{i,j} c_{ij} k(x_i, z_j) \left\langle x_i - z_j,\ -\hat{\omega}z_j \right\rangle_n,$
$b^2\left\langle \nabla_v F_e, v \right\rangle_n = \frac{1}{\ell^2}\sum_{i,j} c_{ij} k(x_i, z_j) \left\langle x_i - z_j,\ -v \right\rangle_n.$

To solve for this in coordinates, we will let $\{e_m\}_{m=1}^n$ be the standard orthonormal basis for $(\mathbb{R}^n, \langle \cdot, \cdot \rangle_n)$ and $\{J_{pq}\}_{p<q}$ ... $t > 0$ that maximizes the quartic (30).

Figure 7: Left: The clouds $X$ and $A.Z$. Right: The clouds $X$ and $TA.Z$.

7.3 Special Case: n = 2

When we take the special case of $\mathrm{SE}(2)$, the gradient (28) takes the special form:

$\omega = \frac{1}{a^2\ell^2}\sum_{i,j} c_{ij} k(x_i, h^{-1}z_j)\left( x_i \times (h^{-1}z_j) \right),$
$v = \frac{1}{b^2\ell^2}\sum_{i,j} c_{ij} k(x_i, h^{-1}z_j)\left( h^{-1}z_j - x_i \right),$ (31)

where $\times$ is the two-dimensional cross product, $x \times y = x_1y_2 - x_2y_1$. Finally, the exponential map for $\mathrm{SE}(2)$ can be solved exactly:

$R = \begin{bmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{bmatrix}, \quad T = \frac{1}{\omega}\begin{bmatrix} \sin\omega & -(1 - \cos\omega) \\ 1 - \cos\omega & \sin\omega \end{bmatrix} v.$ (32)

Example 5 We perform a bird's-eye view map registration.
For the purposes of this example, we choose $X$ to be a contour plot of Matlab's peaks(100) while $Z$ is of peaks(120). Specifically, $X$ is made up of 40 contour lines while $Z$ is made up of 43. For both images, the coordinates are constrained as $x, y \in [-3, 3]$. The two images are shown in Figure 6. As these images are initially aligned, we first perturb them and attempt to realign them. The initial perturbation is given by $A \in \mathrm{SE}(2)$ while the algorithm determines $T \in \mathrm{SE}(2)$. In this sense, we want $TA = \mathrm{Id}$. The values of $A$, $T$, $TA$, and $\|TA - \mathrm{Id}\|_F$ are shown in (33):

A = [0.9654 0.2607 0.7250; 0.2607 0.9654 0.6074; 0 0 1.0000],
T = [0.9659 0.2591 0.8698; 0.2591 0.9659 0.4049; 0 0 1.0000],
TA = [1.0000 0.0017 0.0122; 0.0017 1.0000 0.0060; 0 0 1.0000],
$\|TA - \mathrm{Id}\|_F = 0.0138.$ (33)

Parameter | Value
transformation convergence threshold ϵ | 1e-4
gradient norm convergence threshold ϵ | 5e-4
kernel characteristic length-scale ℓ | 0.25
kernel characteristic length-scale ℓ (iteration > 3) | 0.15
kernel characteristic length-scale ℓ (iteration > 10) | 0.10
kernel characteristic length-scale ℓ (iteration > 20) | 0.05
kernel signal variance σ | 1
minimum step length | 0.2
color space inner product scale | 1e-5
kernel sparsification threshold | 1e-3

Table 3: Parameters used for evaluation of the bird's-eye map registration. The kernel characteristic length-scale is chosen to be adaptive as the algorithm converges; intuitively, we prefer a large neighborhood of correlation for each point, but as the algorithm approaches convergence, reducing the local correlation neighborhood allows for faster convergence and better refinement.

The parameters used in this registration are included in Table 3. The initial and final images are in Figure 7. For each cloud, the color information is generated based on the level set numbers.
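The SE(2) pieces used in this example, the gradient (31) and the closed-form exponential (32), can be sketched as follows (our own reconstruction; the function names, vectorization, and sign conventions are ours and should be checked against the released implementation):

```python
import numpy as np

def se2_exp(omega, v):
    """Lie-group exponential of (omega, v) in se(2), following eq. (32)."""
    c, s = np.cos(omega), np.sin(omega)
    R = np.array([[c, -s], [s, c]])
    if abs(omega) < 1e-12:                  # limit: exp of a pure translation
        return R, np.asarray(v, float)
    V = np.array([[s, -(1.0 - c)], [1.0 - c, s]]) / omega
    return R, V @ np.asarray(v, float)

def se2_gradient(X, Z, cij, h_R, h_T, ell=1.0, a=1.0, b=1.0):
    """Gradient (omega, v) of eq. (31); cross(x, y) = x1*y2 - x2*y1."""
    Y = (Z - h_T) @ h_R                      # h^{-1} z_j = R^T (z_j - T), row-wise
    K = np.exp(-np.sum((X[:, None, :] - Y[None, :, :])**2, axis=-1) / (2 * ell**2))
    W = cij * K
    cross = X[:, None, 0] * Y[None, :, 1] - X[:, None, 1] * Y[None, :, 0]
    omega = np.sum(W * cross) / (a**2 * ell**2)
    v = np.einsum('ij,ijk->k', W, Y[None, :, :] - X[:, None, :]) / (b**2 * ell**2)
    return omega, v

# sanity checks on the exponential
R, T = se2_exp(0.0, [1.0, 2.0])
assert np.allclose(R, np.eye(2)) and np.allclose(T, [1.0, 2.0])
R, T = se2_exp(np.pi / 2, [1.0, 0.0])
assert np.allclose(T, [2 / np.pi, 2 / np.pi])

# identical clouds at the identity: the gradient should vanish
Xc = np.array([[1.0, 0.0], [0.0, 1.0]])
w, vv = se2_gradient(Xc, Xc.copy(), 1.0, np.eye(2), np.zeros(2))
assert abs(w) < 1e-12 and np.allclose(vv, 0.0)
```

The vanishing gradient for identical clouds at the identity is the expected first-order optimality condition of Problem 2.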
7.4 Special Case: n = 3

When we restrict attention to $\mathrm{SE}(3)$, the gradient (28) takes a special form:

$\omega = \frac{1}{a^2\ell^2}\sum_{i,j} c_{ij} k(x_i, h^{-1}z_j)\left( x_i \times (h^{-1}z_j) \right),$
$v = \frac{1}{b^2\ell^2}\sum_{i,j} c_{ij} k(x_i, h^{-1}z_j)\left( h^{-1}z_j - x_i \right).$ (34)

Additionally, an explicit formula for the exponential map $\exp : \mathfrak{se}(3) \to \mathrm{SE}(3)$ exists (see Ivancevic and Ivancevic (2011) and Rohan (2013) for example). This gives an exact way to solve (29):

$R = I + \frac{\sin t\theta}{\theta}\hat{\omega} + \frac{1 - \cos t\theta}{\theta^2}\hat{\omega}^2, \quad T = \left( tI + \frac{1 - \cos t\theta}{\theta^2}\hat{\omega} + \frac{t\theta - \sin t\theta}{\theta^3}\hat{\omega}^2 \right)v,$ (35)

where $\theta = \|\omega\|_3$ with $\omega \in \mathbb{R}^3$ and $t$ is taken to maximize $G(t)$ in equation (30).

8. Riemannian Geometry of the Special Euclidean Groups

This section deals with the geometry of $\mathrm{SE}(2)$ and $\mathrm{SE}(3)$. Specifically, the Riemannian exponential corresponding to the metric (27) (and the analogous version for $\mathrm{SE}(2)$) as well as the Hessians will be computed. Additionally, there will be a discussion on using Newton's method as an alternate update rule. We begin with the simpler case of $\mathrm{SE}(2)$. We will compute the exponential first and the Hessian second.

8.1.1 The Riemann Exponential

To compute the Riemann exponential, we will use the Euler-Poincaré equations as described by Theorem 38. The Lagrangian will be $L : \mathfrak{se}(2) \to \mathbb{R}$ where

$L(\omega; v_1, v_2) = \frac{1}{2}a^2\omega^2 + \frac{1}{2}b^2\left( v_1^2 + v_2^2 \right).$

The Euler-Poincaré equations are then

$\dot{\omega} = 0, \quad \dot{v}_1 = \omega v_2, \quad \dot{v}_2 = -\omega v_1.$

These can be integrated to get a path $(\omega(t); v_1(t), v_2(t)) \in \mathfrak{se}(2)$. However, this only defines a path in the Lie algebra and it needs to be lifted to the group. Doing so results in the following system of differential equations:

$\dot{R}(t) = R(t)\hat{\omega}(t), \quad \dot{T}(t) = R(t)\begin{bmatrix} v_1(t) \\ v_2(t) \end{bmatrix}.$

The rotation part gives the same answer as the Lie exponential, while cancellation occurs in the translation part:

$\dot{T} = R(t)\begin{bmatrix} v_1(t) \\ v_2(t) \end{bmatrix} = R(t)R(t)^{-1}\begin{bmatrix} v_1(0) \\ v_2(0) \end{bmatrix} = \begin{bmatrix} v_1(0) \\ v_2(0) \end{bmatrix}.$

Figure 8: Left: The images of the two point clouds. The blue stars represent $X$ while the red are $Z$. Center: The vector field corresponding to the eigenvector of the Hessian with the weakest eigenvalue.
Right: The vector field corresponding to the eigenvector of the Hessian with the strongest eigenvalue.

This provides us with the Riemann exponential (at the identity), Exp₀ : se(2) → SE(2),
\[
\begin{pmatrix} 0 & -\omega & v_1 \\ \omega & 0 & v_2 \\ 0 & 0 & 0 \end{pmatrix} \longmapsto
\begin{pmatrix} \cos\omega & -\sin\omega & v_1 \\ \sin\omega & \cos\omega & v_2 \\ 0 & 0 & 1 \end{pmatrix}.
\]

Remark 16 It is interesting to compare the Riemann and Lie exponentials for SE(2). When we restrict to SO(2), both agree. This follows from the fact that SO(2) is a compact Lie group and the metric restricted to this subgroup is bi-invariant. The second interesting phenomenon is that the translation part is the identity for the Riemann case while the Lie case is more involved. The reason for this is that the metric has no cross terms intertwining rotation with translation, i.e. the Riemann exponential is the Lie exponential for the group SO(2) × R² as opposed to SE(2) = SO(2) ⋉ R².

8.1.2 The Hessian

To calculate the Hessian for the SE(2) case, we first choose an orthogonal basis {e_z, e_1, e_2} ⊂ se(2) where
\[
e_z = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad
e_1 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad
e_2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
\]
Additionally, we will take {E_z, E_1, E_2} to be the corresponding left-invariant vector fields. The connection is given by
\[
\nabla_{E_z} E_1 = E_2, \qquad \nabla_{E_z} E_2 = -E_1,
\]
while all other combinations are zero. The gradient in these coordinates is ω e_z + v_1 e_1 + v_2 e_2. The Hessian in these coordinates is
\[
\begin{pmatrix}
a^2 L_{E_z}\omega & b^2\left(L_{E_z}v_1 + v_2\right) & b^2\left(L_{E_z}v_2 - v_1\right) \\
a^2 L_{E_1}\omega & b^2 L_{E_1}v_1 & b^2 L_{E_1}v_2 \\
a^2 L_{E_2}\omega & b^2 L_{E_2}v_1 & b^2 L_{E_2}v_2
\end{pmatrix}.
\]
For the sake of simplicity, let the Hessian be
\[
H = \sum_{ij} c_{ij}\, k(x_i, z_j)\, S_{ij},
\]
where, writing $w_{ij} = x_i^1 z_j^2 - x_i^2 z_j^1$ for the planar cross product,
\[
S_{ij} = \begin{pmatrix}
\frac{1}{\ell^2}w_{ij}^2 - \langle x_i, z_j\rangle &
 x_i^2 + \frac{1}{\ell^2}(z_j^1 - x_i^1)\,w_{ij} &
 -x_i^1 + \frac{1}{\ell^2}(z_j^2 - x_i^2)\,w_{ij} \\
x_i^2 + \frac{1}{\ell^2}(z_j^1 - x_i^1)\,w_{ij} &
 \frac{1}{\ell^2}(z_j^1 - x_i^1)^2 - 1 &
 \frac{1}{\ell^2}(x_i^1 - z_j^1)(x_i^2 - z_j^2) \\
-x_i^1 + \frac{1}{\ell^2}(z_j^2 - x_i^2)\,w_{ij} &
 \frac{1}{\ell^2}(x_i^1 - z_j^1)(x_i^2 - z_j^2) &
 \frac{1}{\ell^2}(z_j^2 - x_i^2)^2 - 1
\end{pmatrix}.
\]

Example 6 (Hessian Eigenvalues) As an initial thought experiment, what happens if each cloud contains only a single point? As we are interested in the registration problem, let us assume that each of these points is at the same location x ∈ R².
Then, the Hessian is
\[
H = \begin{pmatrix} -\|x\|^2 & x_2 & -x_1 \\ x_2 & -1 & 0 \\ -x_1 & 0 & -1 \end{pmatrix}.
\]
It is interesting to note that this matrix is singular and its kernel is the span of [1, x_2, -x_1]^T. This Lie algebra element corresponds to a vector field on R² (the infinitesimal generator, see Definition 5). The corresponding vector field is
\[
\xi(v) = -(v_2 - x_2)\frac{\partial}{\partial v_1} + (v_1 - x_1)\frac{\partial}{\partial v_2},
\]
which is merely rotation about the point x. This rotation does not move either cloud, and the degeneracy of H predicts this! In the more general context, the eigenvalues/vectors of H carry much information about the structure and symmetry of the problem. At the solution, the Hessian is negative-definite, and so the eigenvector corresponding to the largest eigenvalue (smallest in absolute value) reports the direction of greatest symmetry. That is, suppose that Hξ = λξ for ξ ∈ g and |λ| is small. Then moving Z along the vector field ξ_M is not noticeable under F. To see this, we consider the case where X and Z are images of a smiley face as in Figure 8. The Hessian for this example is
\[
H = \begin{pmatrix} -422.1054 & 55.3179 & 75.8545 \\ 55.3179 & -16.1852 & 0.8360 \\ 75.8545 & 0.8360 & -23.3209 \end{pmatrix}.
\]
The weakest eigenvalue is −0.1907 with a corresponding eigenvector of ξ = [0.2140, 0.7048, 0.6764]^T. The vector field for this eigenvector is shown in Figure 8. The impressive aspect of this is that it is possible to learn symmetries in images!

This section will closely mimic the previous one; however, it will be more involved.

8.2.1 The Riemann Exponential

The Euler-Poincaré equations for SE(3) are
\[
\dot\omega = 0, \qquad \dot v = -\omega \times v.
\]
Integrating these in conjunction with the reconstruction equations produces the Lie exponential for the rotation part and the identity for the translations, the same as in the SE(2) case above. That is,
\[
\mathrm{Exp}_0(R, T) = (\exp(R), T).
\]
Remark 16 holds in this case as well. Moreover, this holds for the higher-dimensional SE(n) as well.
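The translational cancellation behind Exp₀(R, T) = (exp(R), T) can be checked numerically in the SE(2) case: integrating the lifted system with v(t) obeying the Euler-Poincaré equations, T(1) recovers v(0). A small numerical sketch of ours (not the paper's code):

```python
import numpy as np

def rot(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

def riemann_exp_se2(omega, v0, steps=1000):
    """Integrate R' = R (omega J), v' = -omega J v, T' = R v from the identity.
    The claim is that the translational part collapses: T(1) = v(0)."""
    dt = 1.0 / steps
    R, T, v = np.eye(2), np.zeros(2), np.array(v0, dtype=float)
    for _ in range(steps):
        T = T + dt * (R @ v)       # T' = R v
        R = R @ rot(omega * dt)    # exact step for R' = R (omega J)
        v = rot(-omega * dt) @ v   # exact step for v' = -omega J v
    return R, T

R1, T1 = riemann_exp_se2(1.3, [0.4, -0.7])
```

Along the flow, R(t)v(t) stays constant at v(0), so the translation integrates to v(0) while the rotation agrees with the Lie exponential.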
8.2.2 The Hessian

To compute the Hessian, we choose the following orthogonal coordinates for se(3):
\[
e_x = \begin{pmatrix} 0&0&0&0 \\ 0&0&-1&0 \\ 0&1&0&0 \\ 0&0&0&0 \end{pmatrix},\quad
e_y = \begin{pmatrix} 0&0&1&0 \\ 0&0&0&0 \\ -1&0&0&0 \\ 0&0&0&0 \end{pmatrix},\quad
e_z = \begin{pmatrix} 0&-1&0&0 \\ 1&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{pmatrix},
\]
\[
e_1 = \begin{pmatrix} 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{pmatrix},\quad
e_2 = \begin{pmatrix} 0&0&0&0 \\ 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \end{pmatrix},\quad
e_3 = \begin{pmatrix} 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&1 \\ 0&0&0&0 \end{pmatrix}.
\]
Likewise, let {E_x, E_y, E_z, E_1, E_2, E_3} be their corresponding left-invariant vector fields. With this, the connection is given by
\[
\begin{aligned}
&\nabla_{E_x} E_y = \tfrac{1}{2}E_z, \quad \nabla_{E_x} E_z = -\tfrac{1}{2}E_y, \quad \nabla_{E_y} E_z = \tfrac{1}{2}E_x, \quad \nabla_{E_y} E_x = -\tfrac{1}{2}E_z, \quad \nabla_{E_z} E_x = \tfrac{1}{2}E_y, \quad \nabla_{E_z} E_y = -\tfrac{1}{2}E_x, \\
&\nabla_{E_x} E_2 = E_3, \quad \nabla_{E_x} E_3 = -E_2, \quad \nabla_{E_y} E_3 = E_1, \quad \nabla_{E_y} E_1 = -E_3, \quad \nabla_{E_z} E_1 = E_2, \quad \nabla_{E_z} E_2 = -E_1,
\end{aligned} \tag{37}
\]
while all other combinations are zero.

Remark 17 Notice that the coefficients for the connection in (37) depend on neither a nor b! This is to be expected because the Levi-Civita connection needs to be torsion free. This means that ∇_X Y − ∇_Y X = [X, Y]. Since the coordinates we are taking are the left-invariant vector fields, their bracket is just the bracket of the Lie algebra, which does not depend on a or b.

We are now ready to write down the Hessian matrix (we will assume that a = b = 1):
\[
H = \begin{pmatrix}
L_{E_x}\omega_x & L_{E_x}\omega_y + \tfrac{1}{2}\omega_z & L_{E_x}\omega_z - \tfrac{1}{2}\omega_y & L_{E_x}v_1 & L_{E_x}v_2 + v_3 & L_{E_x}v_3 - v_2 \\
L_{E_y}\omega_x - \tfrac{1}{2}\omega_z & L_{E_y}\omega_y & L_{E_y}\omega_z + \tfrac{1}{2}\omega_x & L_{E_y}v_1 - v_3 & L_{E_y}v_2 & L_{E_y}v_3 + v_1 \\
L_{E_z}\omega_x + \tfrac{1}{2}\omega_y & L_{E_z}\omega_y - \tfrac{1}{2}\omega_x & L_{E_z}\omega_z & L_{E_z}v_1 + v_2 & L_{E_z}v_2 - v_1 & L_{E_z}v_3 \\
L_{E_1}\omega_x & L_{E_1}\omega_y & L_{E_1}\omega_z & L_{E_1}v_1 & L_{E_1}v_2 & L_{E_1}v_3 \\
L_{E_2}\omega_x & L_{E_2}\omega_y & L_{E_2}\omega_z & L_{E_2}v_1 & L_{E_2}v_2 & L_{E_2}v_3 \\
L_{E_3}\omega_x & L_{E_3}\omega_y & L_{E_3}\omega_z & L_{E_3}v_1 & L_{E_3}v_2 & L_{E_3}v_3
\end{pmatrix}. \tag{38}
\]
Notice that the first 3-by-3 block, A, is identical to (22). This reflects the copy of SO(3) lying inside SE(3). For the sake of brevity, we will use the following abbreviation:
\[
H = \sum_{ij} c_{ij}\, k(x_i, z_j) \begin{pmatrix} A_{ij} & B_{ij} \\ C_{ij} & D_{ij} \end{pmatrix}.
\]
Notice that by symmetry of H, B_{ij}^T = C_{ij}. Therefore, we will only compute A_{ij}, C_{ij}, and D_{ij}. To simplify expressions, we will call v = [v_1, v_2, v_3]^T := x_i × z_j and u = [u_1, u_2, u_3]^T := z_j − x_i.
\[
A_{ij} = \begin{pmatrix}
\frac{1}{\ell^2}v_1^2 - (x_i^2 z_j^2 + x_i^3 z_j^3) & \frac{1}{\ell^2}v_1 v_2 + \frac{1}{2}(x_i^1 z_j^2 + x_i^2 z_j^1) & \frac{1}{\ell^2}v_1 v_3 + \frac{1}{2}(x_i^1 z_j^3 + x_i^3 z_j^1) \\
\frac{1}{\ell^2}v_1 v_2 + \frac{1}{2}(x_i^1 z_j^2 + x_i^2 z_j^1) & \frac{1}{\ell^2}v_2^2 - (x_i^1 z_j^1 + x_i^3 z_j^3) & \frac{1}{\ell^2}v_2 v_3 + \frac{1}{2}(x_i^2 z_j^3 + x_i^3 z_j^2) \\
\frac{1}{\ell^2}v_1 v_3 + \frac{1}{2}(x_i^1 z_j^3 + x_i^3 z_j^1) & \frac{1}{\ell^2}v_2 v_3 + \frac{1}{2}(x_i^2 z_j^3 + x_i^3 z_j^2) & \frac{1}{\ell^2}v_3^2 - (x_i^1 z_j^1 + x_i^2 z_j^2)
\end{pmatrix},
\]
\[
C_{ij} = \begin{pmatrix}
\frac{1}{\ell^2}u_1 v_1 & -x_i^3 + \frac{1}{\ell^2}u_1 v_2 & x_i^2 + \frac{1}{\ell^2}u_1 v_3 \\
x_i^3 + \frac{1}{\ell^2}u_2 v_1 & \frac{1}{\ell^2}u_2 v_2 & -x_i^1 + \frac{1}{\ell^2}u_2 v_3 \\
-x_i^2 + \frac{1}{\ell^2}u_3 v_1 & x_i^1 + \frac{1}{\ell^2}u_3 v_2 & \frac{1}{\ell^2}u_3 v_3
\end{pmatrix},
\qquad
D_{ij} = \begin{pmatrix}
\frac{1}{\ell^2}u_1^2 - 1 & \frac{1}{\ell^2}u_1 u_2 & \frac{1}{\ell^2}u_1 u_3 \\
\frac{1}{\ell^2}u_1 u_2 & \frac{1}{\ell^2}u_2^2 - 1 & \frac{1}{\ell^2}u_2 u_3 \\
\frac{1}{\ell^2}u_1 u_3 & \frac{1}{\ell^2}u_2 u_3 & \frac{1}{\ell^2}u_3^2 - 1
\end{pmatrix}.
\]

Example 7 (Hessian Eigenvalues, Cont.) The same analysis as presented in Example 6 can be carried out for SE(3) as well. Again, we will use the Hessian to identify symmetries within the point clouds. Specifically, we consider the case where both X and Z are spheres living in R³. The spheres can be aligned via translations, but rotations do nothing. This should manifest in the Hessian: the block D in (38) should dominate the Hessian while A should be the smallest. In what follows, X contains 1200 points on S² ⊂ R³ and Z contains 1500 points (both of these clouds are already on S², so the registration problem is already solved), and all c_ij = 3. The remaining parameters are ℓ = 0.25 and σ = 10⁻⁴. The Hessian is
\[
H = \begin{pmatrix}
1.3995 & 4.0561 & 1.4875 & 2.1207 & 0.9970 & 1.2481 \\
4.0561 & -6.3054 & 0.3125 & 0.7368 & 1.8921 & 0.8737 \\
1.4875 & 0.3125 & 0.0554 & 1.5803 & 0.3814 & 0.2286 \\
2.1207 & 0.7368 & 1.5803 & -594.3227 & 1.9310 & 23.2125 \\
0.9970 & 1.8921 & 0.3814 & 1.9310 & -603.8121 & 13.3898 \\
1.2481 & 0.8737 & 0.2286 & 23.2125 & 13.3898 & -561.8939
\end{pmatrix}. \tag{39}
\]
It can be seen that the translation component is on the order of 100 times larger than the rotation component. More formally, we can examine the eigenvalues/vectors of H. Table 4 shows that the eigenvalues corresponding to rotations are around 100 times smaller in magnitude than the eigenvalues corresponding to translations.
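The translational block D_{ij} = uuᵀ/ℓ² − I can be sanity-checked against finite differences of the squared exponential kernel with respect to a translation: the second derivative equals k/ℓ² times that block. A numerical sketch of our own, with arbitrary test points:

```python
import numpy as np

ell, sigma = 0.6, 1.0
x = np.array([0.3, -0.2, 0.5])
z = np.array([-0.1, 0.4, 0.2])

def k(t):
    """Squared exponential kernel with z translated by t."""
    d = x - (z + t)
    return sigma**2 * np.exp(-np.dot(d, d) / (2 * ell**2))

# central-difference Hessian of k with respect to the translation t at t = 0
h = 1e-4
H = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        def kpm(si, sj):
            t = np.zeros(3)
            t[i] += si * h
            t[j] += sj * h
            return k(t)
        H[i, j] = (kpm(1, 1) - kpm(1, -1) - kpm(-1, 1) + kpm(-1, -1)) / (4 * h * h)

u = z - x
D = np.outer(u, u) / ell**2 - np.eye(3)   # block D_ij from the text
```

The numerical Hessian matches k(0)/ℓ² times D to finite-difference accuracy, confirming the structure (and sign) of the block.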
As a result, the Hessian can report on the regularity of the solution and identify infinitesimal symmetries in the pictures.

9. Experimental Results: RGB-D Visual Odometry

The RGB-D visual odometry problem is illustrated in Figure 9. In the context of visual odometry, we call our method Continuous Visual Odometry (CVO), which has appeared in our earlier work (Ghaffari et al., 2019; Lin et al., 2019). We compare CVO with the state-of-the-art direct (and dense) RGB-D visual odometry (DVO) (Kerl et al., 2013a,b). Since the original DVO source code requires an outdated ROS dependency (Kerl, 2013), we reproduced the DVO results using the version provided by Matthieu Pizenberg (Pizenberg, 2019), which only removes the ROS dependency while keeping the DVO core source code unchanged. To improve the computational efficiency, we adopted an approach similar to Direct Sparse Visual Odometry (DSO) by Engel et al. (2014) to create a semi-dense point cloud (around

Eigenvalue   -613.9730  -598.4636  -547.6175   -8.1304   -0.2898    3.5950
Eigenvector    0.0000     0.0022     0.0021     0.5827    0.6858    0.4360
               0.0032     0.0016     0.0015     0.6897    0.7011    0.1810
               0.0033     0.0027     0.0010     0.4298    0.1952    0.8815
               0.4029     0.9089     0.1073     0.0030    0.0023    0.0025
               0.2685     0.2295     0.9355     0.0032    0.0006    0.0000
               0.8750     0.3481     0.3365     0.0019    0.0027    0.0015

Table 4: The eigenvalues/eigenvectors of H in (39); each column lists an eigenvalue and the entries of its eigenvector.

Figure 9: Visual odometry using depth cameras is the problem of finding a rigid body transformation between two colored point clouds.

Parameters                                  Symbol   Value
Transformation convergence threshold        ϵ1       1e-5
Gradient norm convergence threshold         ϵ2       5e-5
Minimum step length                                  0.2
Kernel sparsification threshold                      8.315e-3
Spatial kernel initial length-scale         ℓ_init   0.1
Spatial kernel signal variance              σ        0.1
Color kernel length-scale                   ℓ_c      0.1
Color kernel signal variance                σ_c      1

Table 5: Parameters used for evaluation using the TUM RGB-D Benchmark; similar values are chosen for all experiments.
The kernel characteristic length-scale is chosen to be adaptive as the algorithm converges (Ghaffari et al., 2019); intuitively, we prefer a large neighborhood of correlation for each point, but as the algorithm approaches convergence, reducing the local correlation neighborhood allows for faster convergence and better refinement.

3000 points) for each scan. To prevent too few points being selected in environments that lack rich visual information, we also used a Canny edge detector (Canny, 1987) from OpenCV (Bradski, 2000). When the number of points selected by the DSO point selector is less than one-third of the desired number of points, more points are selected by downsampling the pixels highlighted by the Canny detector. While generating the point cloud, RGB values are first transformed into the HSV colormap and normalized. The normalized HSV values are then combined with the normalized intensity gradients and used as the labels of the selected points in the color space. For all experiments, we used the same set of parameters, which are listed in Table 5. All experiments are performed on a Dell XPS15 9750 laptop with an Intel i7-8750H CPU (6 cores at 2.20 GHz each) and 32 GB RAM. The source code is implemented in C++

Sequence      Training                       Validation
              CVO            DVO             CVO            DVO
              Trans.  Rot.   Trans.  Rot.    Trans.  Rot.   Trans.  Rot.
fr1/desk      0.0486  2.4860 0.0387  2.3589  0.0401  2.0148 0.0371  2.0645
fr1/desk2     0.0535  3.0383 0.0583  3.6529  0.0225  1.7691 0.0208  1.7416
fr1/room      0.0560  2.4566 0.0518  2.8686  0.0446  3.9183 0.2699  7.4144
fr1/360       0.0991  3.0025 0.1602  4.4407  0.1420  3.0746 0.2811  7.0876
fr1/teddy     0.0671  4.8089 0.0948  2.5495  n/a     n/a    n/a     n/a
fr1/floor     0.0825  2.3745 0.0635  2.2805  n/a     n/a    n/a     n/a
fr1/xyz       0.0240  1.1703 0.0327  1.8751  0.0154  1.3872 0.0453  3.0061
fr1/rpy       0.0457  3.3073 0.0336  2.6701  0.1138  3.6423 0.3607  7.9991
fr1/plant     0.0316  1.9973 0.0272  1.5523  0.0630  4.9185 0.0660  2.5865
Average       0.0561  2.7380 0.0623  2.6943  0.0631  2.9607 0.1544  4.5571

Table 6: The RMSE of the Relative Pose Error (RPE) for the fr1 sequences. The Trans. columns show the RMSE of the translational drift in m/s and the Rot. columns show the RMSE of the rotational error in deg/s. There are no corresponding validation data sets for fr1/teddy and fr1/floor.

and compiled with the Intel Compiler. The kernel computations are parallelized using Intel Threading Building Blocks (TBB) (Intel Corporation, 2019). Using compiler auto-vectorization and the parallelization, the average time for frame-to-frame registration is 0.5 sec. Software is available for download at https://github.com/MaaniGhaffari/cvo-rgbd.

9.1 TUM RGB-D Benchmark

We performed experiments on two parts of the RGB-D SLAM data set and benchmark by the Technical University of Munich (Sturm et al., 2012). This data set was collected indoors with a Microsoft Kinect using a motion capture system as a proxy for ground truth trajectory. For all tracking experiments, the entire image sequence was used without any skipping, i.e., at full frame rate. We evaluated CVO and DVO on the training and validation sets for all the fr1 sequences and the structure versus texture sequences. The RGB-D benchmark tools (Sturm et al., 2012) were then used to evaluate the Relative Pose Error (RPE) of both methods. Table 6 shows the Root-Mean-Squared Error (RMSE) of the RPE for the fr1 sequences.
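For reference, the RPE statistic reported here follows the TUM benchmark definition: the discrepancy between the estimated and ground-truth relative motions between frames. A minimal sketch of the translational RPE RMSE (ours, per frame rather than per second, on hypothetical poses):

```python
import numpy as np

def trans(t):
    M = np.eye(4)
    M[:3, 3] = t
    return M

def rpe_rmse(gt, est):
    """RMSE of the translational relative pose error over consecutive frames.
    gt, est: lists of 4x4 homogeneous poses (hypothetical data below)."""
    errs = []
    for i in range(len(gt) - 1):
        dg = np.linalg.inv(gt[i]) @ gt[i + 1]    # ground-truth relative motion
        de = np.linalg.inv(est[i]) @ est[i + 1]  # estimated relative motion
        E = np.linalg.inv(dg) @ de               # relative pose error
        errs.append(np.linalg.norm(E[:3, 3]))    # translational component
    return np.sqrt(np.mean(np.square(errs)))

# a trajectory drifting 0.02 m per frame relative to ground truth
gt = [trans([0.10 * i, 0.0, 0.0]) for i in range(6)]
est = [trans([0.12 * i, 0.0, 0.0]) for i in range(6)]
drift = rpe_rmse(gt, est)
```

On this synthetic example the per-frame translational drift comes out as 0.02 m, as constructed.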
The Trans. columns show the RMSE of the translational drift in m/s and the Rot. columns show the RMSE of the rotational drift in deg/s. There are no corresponding validation sequences for fr1/teddy and fr1/floor. On the validation set, CVO has improved performance compared with DVO, which shows it can generalize better across different scenarios. From the results, we can see that CVO is intrinsically robust. The next experiment further reveals that CVO has the advantage of performing well in extreme environments that lack rich structure or texture.

structure-texture-dist.   Training                        Validation
                          CVO            DVO              CVO            DVO
                          Trans.  Rot.   Trans.   Rot.    Trans.  Rot.   Trans.  Rot.
near                      0.0279  1.3470 0.0563   1.7560  0.0310  1.6367 0.0315  1.1498
far                       0.0609  1.2342 0.1612   3.4135  0.1374  2.3929 0.5351  8.2529
near                      0.0221  1.3689 0.1906  10.6424  0.0465  2.0359 0.1449  4.9022
far                       0.0372  1.3061 0.1171   2.4044  0.0603  1.8142 0.1375  2.2728
near                      0.0236  1.2972 0.0175   0.9315  0.0306  1.8694 0.0217  1.2653
far                       0.0409  1.1640 0.0171   0.5717  0.0616  1.4760 0.0230  0.6312
near                      0.2119  9.7944 0.3506  13.3127  0.1729  5.8674 0.1747  6.0443
far                       0.0799  3.0978 0.1983   6.8419  0.0899  2.6199 0.2000  6.5192
Average                   0.0631  2.5762 0.1386   4.9843  0.0787  2.4640 0.1586  3.8797

Table 7: The RMSE of the Relative Pose Error (RPE) for the structure vs. texture sequences. The Trans. columns show the RMSE of the translational drift in m/s and the Rot. columns show the RMSE of the rotational error in deg/s. The results show that while DVO performs better in the case with both structure and texture, CVO has significantly better performance in the environments that lack structure and texture.

9.2 Structure vs. Texture Sequences

Table 7 shows the RMSE of the RPE for the structure vs. texture sequences. This data set contains image sequences in structure/nostructure and texture/notexture environments. As elaborated by Ghaffari et al.
(2019), by treating point clouds as points in the function space (RKHS), CVO is inherently robust to the lack of features in the environment. CVO shows the best performance in cases where either structure or texture is not rich in the environment, which reinforces the claim that CVO is robust to such scenes. We also note that DVO has the best performance in the case where the environment contains rich texture and structure information. This could be due to two factors: 1) CVO adopts a semi-dense point cloud construction from DSO (Engel et al., 2018), while DVO uses the entire dense image without subsampling. Although the semi-dense tracking approach of Engel et al. (2014, 2018) is computationally attractive and we advocate it, the semi-dense point cloud construction process used in this work is a heuristic and might not capture the relevant information in each frame optimally; 2) DVO uses a motion prior as a regularizer, whereas CVO solely depends on the camera information with no regularizer. We conjecture the latter is the reason DVO, relative to the training set, does not perform well on the validation sequences. The motion prior is a useful assumption when it is true: it can help to tune the method better on the training sets, but if the assumption is violated it can lead to poor performance. The addition of an IMU sensor can of course improve the performance of both methods and is an interesting future research direction.

Figure 10: Left: The images of the two point clouds. The blue stars represent X while the red are Z. Center: A contour plot of F : T² → R where ℓ = 0.5. Right: A contour plot of F : T² → R where ℓ = 0.2. A total of 33 local maxima are found in this picture. A video of the effect of varying ℓ is available at https://youtu.be/ETr6-c0VapQ.

Figure 11: The images above display the number of local maxima of the function F : T² → R as a function of the length-scale.
The black curve shows the exact number of local maxima as reported by Matlab's imregionalmax, while the blue curve is a plot of N = 1.16602 ℓ⁻².

10. Discussions and Further Considerations

In this section, we present discussions on a number of topics that are worth investigating.

10.1 Locality of Solutions

It is important to stress that the registration problem laid out here is only solved locally. However, how detrimental is this? For the purposes of gaining intuition, we compute F : T² → R for the T² case; see Section 3.2. For the point clouds, we use the same clouds as in Example 6 (although viewing them in [0, 2π)² ≅ T² rather than R²). The torus is discretized into a 500-by-500 grid, and length-scales are chosen in the range [0.01, 0.5]. For each value of ℓ, a contour plot of F : T² → R is constructed and the number of local maxima is recorded (with Matlab's imregionalmax function). Two contour plots corresponding to ℓ = 0.5 and ℓ = 0.2 are shown in Figure 10. It is important to note that there exists a plethora of local maxima. This illustrates how important a good initialization is; otherwise the gradient flow will end up trapped at a false solution. However, the number of local solutions depends heavily on ℓ. This can be used in practice by having ℓ start off large and shrink between iterations, although care must be taken with positive-definite kernel design and hyperparameter learning on curved spaces, as the Gaussian kernel might not be positive-definite for all ℓ (Feragen et al., 2015; Borovitskiy et al., 2020). Another observation is that the number of local solutions seems to grow according to a power law of the length-scale, as shown in Figure 11. Controlling the number of modes is an important problem because, if we can control the number of modes, we can initially generate a convex problem and guide the algorithm to the global solution.
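The power-law reading of Figure 11 amounts to a linear fit in log-log coordinates. A minimal sketch on synthetic data (ours, not the paper's measurements) confirming that such a fit recovers the exponent −2:

```python
import numpy as np

# Synthetic check: if the number of local maxima scales as N = c * ell**(-2),
# a least-squares line in log-log coordinates recovers the exponent -2.
# (The constant 1.16602 is taken from the fit quoted for Figure 11.)
ell = np.linspace(0.01, 0.5, 40)
N = 1.16602 * ell**(-2.0)
slope, log_c = np.polyfit(np.log(ell), np.log(N), 1)
```

With noise-free data the fitted slope is exactly −2; on measured counts the same fit estimates the empirical exponent.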
This approach can be seen as the continuous analogue of the image pyramid method in computer vision (Szeliski, 2010), where low-resolution images are processed first and the next level has a higher resolution initialized by the previous step. A heuristic explanation can be that the Gaussian kernel has more modes at lower bandwidths, since a higher bandwidth makes the bumps around points coalesce. Assuming an inversely proportional relation to the bandwidth, Figure 11 is a reasonable verification of the number of maxima. Nevertheless, a systematic approach for learning the bandwidth and controlling the complexity of the problem remains an open problem and is an interesting future study.

10.2 Reduction to the Weighted Least Squares

The standard solution to sensor registration is typically formulated as a maximum likelihood or least-squares problem and solved via numerical optimization techniques. In the maximum likelihood setting, the measurement noise distribution significantly affects the robustness of the solver to outliers. Hence, in practice, the problem is necessarily solved using a robust estimator such as the iteratively reweighted least-squares algorithm or truncated least squares. The central shortcoming of this approach is the reliance on point-wise correspondences between the two measurement sets. As a result, the implicit assumption is that the data points perceived by digital sensors come from the same points in reality. Of course, this is seldom true, as the nature of digital sensing means that at best a discretization of reality is captured. Sparse and uniform (feature-scarce) measurement sets are where this shortcoming can become problematic. A prime example is the lack of robustness of visual odometry algorithms to texture-less and structure-less scenes. Without loss of generality, we use the Gaussian (squared exponential) kernel based on a distance function d(·,·):
\[
k(x, y) = \sigma^2 \exp\left(-\frac{d(x, y)^2}{2\ell^2}\right).
\]
Further, let us define w_{ij} := σ²⟨ℓ_X(x_i), ℓ_Z(z_j)⟩_I and d_{ij} := d(x_i, z_j). Then we have
\[
\langle f_X, f_Z \rangle_{\mathcal{H}} = \sum_{ij} w_{ij} \exp\left(-\frac{d_{ij}^2}{2\ell^2}\right). \tag{40}
\]
Now we explicitly apply the three assumptions that are used in the least squares problem.

Assumption 1 The assumptions used in the least squares problem are:
1. The point clouds X and Z have the same number of points.
2. The point cloud Z is ordered such that matching indices correspond to corresponding measurements.
3. The residuals are only computed between corresponding measurements.

By applying the above-mentioned assumptions to (40), we get
\[
\langle f_X, f_Z \rangle_{\mathcal{H}} \approx \sum_i w_i \exp\left(-\frac{d_i^2}{2\ell^2}\right),
\]
where w_i = w_{ii} and d_i = d_{ii}; the double sum in (40) is reduced to a single sum over matching indices. Finally, the exponential can be approximated using a Taylor expansion as
\[
\exp\left(-\frac{d_i^2}{2\ell^2}\right) \approx 1 - \frac{d_i^2}{2\ell^2} + \cdots.
\]

Problem 3 The problem of aligning the point clouds can now be rephrased in the following weighted least squares form:
\[
\arg\min_{h \in G} S(h), \qquad S(h) := \sum_i w_i\, d(x_i, h z_i)^2.
\]

10.3 Feature Embedding via Tensor Product Representation

For completeness, we include an extension of the proposed framework developed by Zhang et al. (2020) that is of particular interest to practitioners. The framework contains a specialized solver implemented on a GPU that exploits the sparse structure of the double sum in the objective function. In addition, it incorporates deep learning-based features, such as semantic segmentation labels, by extending the feature space to a hierarchical distributed representation. For extensive experimental results and open source software that validates the topics presented in this work, please see Zhang et al. (2020). Let (V_1, V_2, . . . ) be different inner product spaces describing different types of non-geometric features of a point, such as color, intensity, and semantics. Their tensor product, V_1 ⊗ V_2 ⊗ · · · , is also an inner product space. For any x ∈ X, z ∈ Z with features ℓ_X(x) = (u_1, u_2, . . . ) and ℓ_Z(z) = (v_1, v_2, . . . ), with u_1, v_1 ∈ V_1, u_2, v_2 ∈ V_2, .
. . . , we have
\[
\langle \ell_X(x), \ell_Z(z) \rangle_I = \langle u_1 \otimes u_2 \otimes \cdots,\; v_1 \otimes v_2 \otimes \cdots \rangle = \langle u_1, v_1 \rangle \langle u_2, v_2 \rangle \cdots. \tag{41}
\]
By substituting (41) into (4), we obtain
\[
\langle f_X, f_{h^{-1}Z} \rangle_{\mathcal{H}} = \sum_{x_i \in X,\, z_j \in Z} \langle u_{1i}, v_{1j} \rangle \langle u_{2i}, v_{2j} \rangle \cdots\; k(x_i, h^{-1}z_j).
\]
After applying the kernel trick (Bishop, 2006; Williams and Rasmussen, 2006; Murphy, 2012), we arrive at
\[
\langle f_X, f_{h^{-1}Z} \rangle_{\mathcal{H}} = \sum_{x_i \in X,\, z_j \in Z} k(x_i, h^{-1}z_j) \prod_k k_{V_k}(u_{ki}, v_{kj}) =: \sum_{x_i \in X,\, z_j \in Z} k(x_i, h^{-1}z_j)\, c_{ij}. \tag{42}
\]
Equation (42) describes the full geometric and non-geometric relationship between the two point clouds. Each c_{ij} does not depend on the relative transformation; thus, it is a constant when computing the gradient and the step size for solving registration problems. Interestingly, the double sum in (42) is sparse because a point x_i ∈ X is far away from the majority of the points z_j ∈ Z, either in the spatial (geometric) space or in one of the feature (semantic) spaces. This formulation can be further simplified to a purely geometric model if we let the label functions be ℓ_X(x_i) = ℓ_Z(z_j) = 1. Then (42) becomes
\[
\langle f_X, f_{h^{-1}Z} \rangle_{\mathcal{H}} = \sum_{x_i \in X,\, z_j \in Z} k(x_i, h^{-1}z_j). \tag{43}
\]
Through (43), the proposed method can register point clouds that do not have appearance measurements. It is worth noting that, when choosing the squared exponential kernel, (43) has the same formulation as Kernel Correlation (Tsin and Kanade, 2004).

10.4 Non-Isotropic Kernel

For the 3D space, the kernel used throughout is
\[
k(x, y) = \sigma^2 \exp\left(-\frac{\|x - y\|^2}{2\ell^2}\right),
\]
where ℓ ∈ R. What happens if we change this so that the length-scale depends on the direction? This makes the kernel into
\[
k_\Lambda(x, y) = \sigma^2 \exp\left(-\langle x - y, \Lambda(x - y) \rangle\right), \tag{44}
\]
where Λ is positive-definite and symmetric. Going through the same calculations as before, we get that
\[
\omega = \frac{2}{a^2} \sum_{ij} c_{ij}\, k_\Lambda(x_i, h^{-1}z_j)\; x_i \times \Lambda(h^{-1}z_j), \qquad
v = \frac{2}{b^2} \sum_{ij} c_{ij}\, k_\Lambda(x_i, h^{-1}z_j)\, \Lambda\left(h^{-1}z_j - x_i\right). \tag{45}
\]
Note that in the original case,
\[
\Lambda = \begin{pmatrix} \frac{1}{2\ell^2} & 0 & 0 \\ 0 & \frac{1}{2\ell^2} & 0 \\ 0 & 0 & \frac{1}{2\ell^2} \end{pmatrix}.
\]
In terms of applications, modeling directional similarities has been successful in the Generalized-ICP framework developed by Segal et al. (2009). In particular, it was shown that in structured environments, e.g., flat walls, ceilings, and floors, defining a norm weighted by local empirical covariances of the two point clouds effectively reduces the pose estimation error. If such a distance metric is used inside the kernel, the kernel can be considered anisotropic. In terms of modeling, this is similar to a covariance function with automatic relevance determination (Neal, 1996).

10.5 Matérn Kernel

Rather than using the Gaussian (squared exponential) kernel, we will use the Matérn kernel (Williams and Rasmussen, 2006). The following derivations are only valid, i.e., positive definite, for all hyperparameters in Euclidean spaces. For the analogue of squared exponential and Matérn kernels on compact Riemannian manifolds, please see the work of Borovitskiy et al. (2020). This is an infinite family of kernels. The Matérn kernel, for points a distance d apart, is
\[
C_\nu(d) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\, d}{\rho}\right)^{\nu} K_\nu\!\left(\frac{\sqrt{2\nu}\, d}{\rho}\right),
\]
where Γ is the gamma function, K_ν is the modified Bessel function of the second kind, and ρ and ν are positive parameters. We have the following derivative property:
\[
\frac{d}{dt} K_\nu(t) = -K_{\nu-1}(t) - \frac{\nu}{t} K_\nu(t).
\]
With half-integers, ν = p + 1/2, p ∈ N,
\[
C_{p+1/2}(d) = \sigma^2 \exp\left(-\frac{\sqrt{2p+1}\, d}{\rho}\right) \frac{p!}{(2p)!} \sum_{i=0}^{p} \frac{(p+i)!}{i!(p-i)!} \left(\frac{2\sqrt{2p+1}\, d}{\rho}\right)^{p-i}.
\]
Differentiating the kernel with respect to d, we see that
\[
\frac{d}{dd}\, C_\nu(d) = -\sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \frac{\sqrt{2\nu}}{\rho} \left(\frac{\sqrt{2\nu}\, d}{\rho}\right)^{\nu} K_{\nu-1}\!\left(\frac{\sqrt{2\nu}\, d}{\rho}\right).
\]
Working with this does not seem appealing, so let us work with half-integer values instead. Below, the kernels and their corresponding gradients are listed. In what follows, d_{ij} = ∥x_i − h⁻¹z_j∥.

10.5.1 ν = 1/2
\[
C_{1/2}(d) = \sigma^2 \exp\left(-\frac{d}{\rho}\right),
\]
\[
\omega = \frac{\sigma^2}{a^2\rho} \sum_{ij} \frac{c_{ij}}{d_{ij}} \exp\left(-\frac{d_{ij}}{\rho}\right) x_i \times (h^{-1}z_j), \qquad
v = \frac{\sigma^2}{b^2\rho} \sum_{ij} \frac{c_{ij}}{d_{ij}} \exp\left(-\frac{d_{ij}}{\rho}\right) \left(h^{-1}z_j - x_i\right). \tag{48}
\]

10.5.2 ν = 3/2
\[
C_{3/2}(d) = \sigma^2 \left(1 + \frac{\sqrt{3}\, d}{\rho}\right) \exp\left(-\frac{\sqrt{3}\, d}{\rho}\right),
\]
\[
\omega = \frac{3\sigma^2}{a^2\rho^2} \sum_{ij} c_{ij} \exp\left(-\frac{\sqrt{3}\, d_{ij}}{\rho}\right) x_i \times (h^{-1}z_j), \qquad
v = \frac{3\sigma^2}{b^2\rho^2} \sum_{ij} c_{ij} \exp\left(-\frac{\sqrt{3}\, d_{ij}}{\rho}\right) \left(h^{-1}z_j - x_i\right).
\]
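The ν = 3/2 gradient above rests on the closed-form derivative C'_{3/2}(d) = −(3σ²d/ρ²) exp(−√3 d/ρ), which can be confirmed by finite differences; a quick check of ours with arbitrary hyperparameters:

```python
import numpy as np

sigma, rho = 1.2, 0.8

def C32(d):
    """Matern kernel with nu = 3/2."""
    s = np.sqrt(3.0) * d / rho
    return sigma**2 * (1.0 + s) * np.exp(-s)

# central difference of C32 at d, against the closed-form derivative
d, h = 0.37, 1e-6
numeric = (C32(d + h) - C32(d - h)) / (2 * h)
closed = -(3.0 * sigma**2 * d / rho**2) * np.exp(-np.sqrt(3.0) * d / rho)
```

Dividing the closed-form derivative by d gives the coefficient 3σ²/ρ² appearing in the gradient, with the exponential factor carried along.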
10.5.3 ν = 5/2
\[
C_{5/2}(d) = \sigma^2 \left(1 + \frac{\sqrt{5}\, d}{\rho} + \frac{5 d^2}{3\rho^2}\right) \exp\left(-\frac{\sqrt{5}\, d}{\rho}\right),
\]
\[
\omega = \frac{5\sigma^2}{3a^2\rho^3} \sum_{ij} c_{ij} \left(\rho + \sqrt{5}\, d_{ij}\right) \exp\left(-\frac{\sqrt{5}\, d_{ij}}{\rho}\right) x_i \times (h^{-1}z_j), \qquad
v = \frac{5\sigma^2}{3b^2\rho^3} \sum_{ij} c_{ij} \left(\rho + \sqrt{5}\, d_{ij}\right) \exp\left(-\frac{\sqrt{5}\, d_{ij}}{\rho}\right) \left(h^{-1}z_j - x_i\right).
\]

10.6 Relationship with Deep Learning

Point clouds obtained by modern sensors such as RGB-D cameras, stereo cameras, and LIDARs contain up to 300,000 points per scan at 10-60 Hz, together with rich color and intensity (reflectivity of a material sensed by an active light beam) measurements besides the geometric information. In addition, deep learning (LeCun et al., 2015) can provide semantic attributes of the scene as measurements (Long et al., 2015; Chen et al., 2017; Zhu et al., 2019). Representation learning provides a way to perform semi-supervised learning (Chapelle et al., 2006), self-supervised learning (Dahlkamp et al., 2006; Sofman et al., 2006; Doersch and Zisserman, 2017; Pathak et al., 2017), and unsupervised learning (Goodfellow et al., 2016). Although the motivation behind learning a good representation from data varies across different fields and research communities, its mathematical foundation is moving in a more unifying direction (Bengio et al., 2013; Litjens et al., 2017) where exploiting well-established classical tools together with modern data-driven approaches such as deep learning seems inevitable (Girshick et al., 2014). Perhaps the main undesirable aspect of modern deep learning-based methods is the heavy dependency on large labeled data sets, i.e., supervised learning. An intrinsic metric available in our framework is the angle, θ, between two functions. This indicator can be computed to track the optimization progress. The cosine of the angle is defined as
\[
\cos(\theta) = \frac{\langle f_X, f_Z \rangle_{\mathcal{H}}}{\|f_X\| \|f_Z\|}. \tag{53}
\]
However, calculating ∥f_X∥ and ∥f_Z∥ is time-consuming, as it requires evaluating the double sum for each of the two point clouds. To approximate (53), we use the following result.
Remark 18 Suppose k(x_i, x_j) = δ_{ij} and c_{ii} = 1, where δ_{ij} is the Kronecker delta; then ∥f_X∥ = √|X|.

Corollary 19 Using the previous assumption, we define the following alignment indicator:
\[
\frac{1}{\sqrt{|X|\,|Z|}} \sum_{x_i \in X,\, z_j \in Z} c_{ij}\, k(x_i, z_j). \tag{54}
\]

The behavior of the alignment indicator with respect to the rotation and translation errors is studied by Zhang et al. (2020). As a connection with deep neural networks, OverlapNet (Chen et al., 2020) uses a neural network to predict a similar metric and detect loop closures. The cosine of the angle in (53) or the indicator in (54) provides such a metric for self-supervised learning while taking into account the semantic information. Given the promising results of the work of Chen et al. (2020), the combination of our metric with deep learning is an interesting future research direction. The expected outcomes include a scalable diagnostic tool for monitoring the quality of point cloud alignments and representation learning for the model in (42). A more direct approach is the development of differentiable loss functions for deep learning tasks. Zhu et al. (2020) developed a new continuous 3D loss function for monocular single-view depth prediction based on our function representation. The proposed loss function addresses the gap between dense image prediction and sparse LIDAR supervision. By simply adding this new loss function to existing network architectures, the accuracy and geometric consistency of depth predictions improved significantly on all three state-of-the-art baseline networks tested by Zhu et al. (2020).

11. Conclusion

We developed a new mathematical framework for sensor registration that enables nonparametric joint semantic/appearance and geometric representation of continuous functions using data. The continuous functions allow the registration to be independent of a specific signal resolution, and the framework is fully analytical with a closed-form derivation of the Riemannian gradient and Hessian.
We studied a restriction of the framework where the Lie group acts on functions isometrically. This restriction has a variety of applications, and its high-dimensional derivation for the special Euclidean group acting on the Euclidean space showcases the point cloud registration and bird's-eye view map registration abilities. We derived low-dimensional cases with numerical examples to show the generality of the proposed framework. A specific implementation of this framework for RGB-D cameras performs well in texture- and structure-scarce environments and is inherently robust to input noise. The general problem where the Lie group does not act isometrically on the functions has interesting applications such as image registration and nonrigid structure from motion. Although the Lie algebra of the affine matrix group does not possess any particular structure, i.e., it includes all square matrices, these applications make it an attractive future research direction. The development of a globally optimal solver is, of course, another challenging and attractive future research direction. The developed framework is a data-driven analytical model that simultaneously possesses interesting intrinsic mathematical structures, i.e., equivariance, and computational structures such as sparsity and parallel processing. Autonomy via computational intelligence is a multifaceted research domain that integrates mathematics, computer science, and engineering and can have enormous impacts on our future and improve our quality of life. We developed a novel framework that can spark new research directions in robotics via sensor registration and mapping applications, computer vision via 3D scene reconstruction applications, medical imaging via nonrigid 3D reconstruction and registration, machine learning via Lie group machine learning, and, in general, artificial intelligence via an equivariant cognitive model.

Acknowledgments

A.
Bloch and W. Clark were supported in part by NSF grant DMS-1613819 and AFOSR grant FA 0550-18-0028. A. Bloch was also supported in part by NSF grant DMS-2103026. Toyota Research Institute provided funds to support this work. Funding for M. Ghaffari was in part provided by NSF Award No. 2118818. The authors thank the anonymous reviewers for their constructive feedback and in-depth review of this work.

Appendix A. Required Mathematical Machinery and Notation

We review some mathematical preliminaries to establish the notation.

A.1 Matrix Lie Group of Motion in $\mathbb{R}^n$

The general linear group of degree $n$, denoted $\mathrm{GL}_n(\mathbb{R})$, is the set of all $n \times n$ real invertible matrices, where the group binary operation is ordinary matrix multiplication. The $n$-dimensional special orthogonal group, denoted
$$\mathrm{SO}(n) = \{ R \in \mathrm{GL}_n(\mathbb{R}) \mid RR^{\mathsf{T}} = I_n,\ \det R = +1 \},$$
is the rotation group on $\mathbb{R}^n$. The $n$-dimensional special Euclidean group, denoted
$$\mathrm{SE}(n) = \left\{ h = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \in \mathrm{GL}_{n+1}(\mathbb{R}) \ \middle|\ R \in \mathrm{SO}(n),\ T \in \mathbb{R}^n \right\},$$
is the group of rigid transformations, i.e., direct isometries, on $\mathbb{R}^n$. A transformation such as $h \in \mathrm{SE}(3)$, consisting of a rotation and a translation component, is the parameter space in many sensor registration problems. Let $\hat{h} \in \mathrm{SE}(3)$ be an estimate of $h$. We compute the rotational and translational distances using $\lVert \log(\hat{R}R^{\mathsf{T}}) \rVert_F$ and $\lVert \hat{T} - \hat{R}R^{\mathsf{T}}T \rVert$, respectively, where $\log(\cdot)$ is the Lie logarithm map, which is, here, the matrix logarithm. These definitions are consistent with the transformation distance that can be directly computed using $\lVert \log(\hat{h}h^{-1}) \rVert_F$. Here, $\lVert A \rVert_F^2 = \mathrm{Tr}(A^{\mathsf{T}}A)$ is the Frobenius norm. This paper studies not only matrix Lie groups, but also their actions on manifolds. We have the following definition:

Definition 20 Let $G$ be a group and $X$ a set. A (left) group action of $G$ on $X$, denoted $G \curvearrowright X$, is a group homomorphism $G \to \mathrm{Aut}(X)$ (the automorphisms of $X$). If $X$ is a smooth manifold, the action is smooth if its image lies in $\mathrm{Diff}(X)$ (the diffeomorphisms of $X$).
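The rotational and translational error metrics defined in this subsection can be sketched numerically; the following is a minimal illustration assuming SciPy is available (the function name `se3_errors` is ours, introduced only for illustration):

```python
import numpy as np
from scipy.linalg import logm

def se3_errors(R, T, R_hat, T_hat):
    """Rotational distance ||log(R_hat R^T)||_F and translational
    distance ||T_hat - R_hat R^T T|| between a ground-truth pose
    (R, T) and an estimate (R_hat, T_hat) in SE(3)."""
    dR = logm(R_hat @ R.T)                   # matrix (Lie) logarithm
    rot_err = np.linalg.norm(dR, ord="fro")  # Frobenius norm
    trans_err = np.linalg.norm(T_hat - R_hat @ R.T @ T)
    return rot_err, trans_err
```

For a pure rotation by an angle $\theta$ about a fixed axis, the rotational distance evaluates to $\sqrt{2}\,\theta$, which gives a quick sanity check.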
Remark 21 A group action can equivalently be viewed as a function $\phi : G \times X \to X$ satisfying two conditions (where $\phi(g, x)$ is denoted by $g.x$):
1. Identity: if $e \in G$ is the identity element, then $e.x = x$ for all $x \in X$.
2. Compatibility: $(gh).x = g.(h.x)$ for all $g, h \in G$ and $x \in X$.

Remark 22 The standard action of $\mathrm{SE}(n)$ on $\mathbb{R}^n$ is given by $(R, T).x = Rx + T$ for $R \in \mathrm{SO}(n)$ and $T \in \mathbb{R}^n$.

A.2 Hilbert Space

Let $V$ be a finite-dimensional vector space over the field of real numbers $\mathbb{R}$. An inner product on $V$ is a function $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ that is bilinear, symmetric, and positive definite. The pair $(V, \langle \cdot, \cdot \rangle)$ is called an inner product space. The inner product induces a norm that measures the magnitude, or length, of a vector: $\lVert v \rVert = \sqrt{\langle v, v \rangle}$. The norm in turn induces a metric that allows for calculating the distance between two vectors: $d(v, w) = \lVert v - w \rVert = \sqrt{\langle v - w, v - w \rangle}$. Such a metric is homogeneous, $d(av, aw) = |a|\, d(v, w)$ for $a \in \mathbb{R}$, and translation invariant, $d(v + x, w + x) = d(v, w)$; these properties are inherited from the induced norm. The distance metric is positive definite, symmetric, and satisfies the triangle inequality. In addition to measuring distances, it is important to be able to understand limits. This leads to the definitions of a Cauchy sequence and completeness.

Definition 23 (Cauchy Sequence) A Cauchy sequence is a sequence $\{x_i\}_{i=1}^{\infty}$ such that for any real number $\epsilon > 0$ there exists a natural number $N \in \mathbb{N}$ such that $d(x_n, x_m) < \epsilon$ for all $n, m > N$, where $d$ is the distance metric.

Definition 24 (Completeness) A metric space $(M, d)$ is complete if every Cauchy sequence in $M$ converges in $M$, i.e., to a limit that is in $M$. Such a metric space contains all its limit points. Note that completeness is with respect to the metric $d$ and not the topology of the space. Now, we can give a definition of a Hilbert space.
Definition 25 (Hilbert Space) A Hilbert space, $\mathcal{H}$, is a complete inner product space; that is, any Cauchy sequence in $\mathcal{H}$, using the metric induced by the inner product, converges to an element of $\mathcal{H}$.

The typical example of a Hilbert space is the space of square-integrable functions on $\mathbb{R}$, i.e., $\mathcal{H} = L^2(\mathbb{R}, \mu)$, where $\mu$ is the Lebesgue measure on $\mathbb{R}$. The inner product is defined as
$$\langle f, g \rangle_{\mathcal{H}} := \int f(x)g(x)\, d\mu(x). \qquad (55)$$
Similarly, the norm induced by the inner product is $\lVert f \rVert_{\mathcal{H}} = \sqrt{\langle f, f \rangle_{\mathcal{H}}}$. The Hilbert space of functions can be thought of as an infinite-dimensional counterpart of the finite-dimensional vector spaces discussed earlier. However, the Hilbert spaces of interest in this work are the reproducing kernel Hilbert spaces discussed below.

A.3 Representation and Reproducing Kernel Hilbert Space

We now move to a more special type of Hilbert space called a Reproducing Kernel Hilbert Space (RKHS) (Berlinet and Thomas-Agnan, 2004), which we will use in this work.

Definition 26 (Kernel) Let $x, x' \in \mathcal{X}$ be a pair of inputs for a function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ known as the kernel. A kernel is symmetric if $k(x, x') = k(x', x)$, and is positive definite if for any nonzero $f \in \mathcal{H}$ (or $L^2(\mathcal{X}, \mu)$):
$$\iint k(x, x') f(x) f(x')\, d\mu(x)\, d\mu(x') > 0.$$

Definition 27 (Reproducing Kernel Hilbert Space) Let $\mathcal{H}$ be a real-valued Hilbert space of functions on a non-empty set $\mathcal{X}$. A function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a reproducing kernel of the Hilbert space $\mathcal{H}$ iff:
1. $\forall x \in \mathcal{X}$, $k(\cdot, x) \in \mathcal{H}$;
2. $\forall x \in \mathcal{X}$, $\forall f \in \mathcal{H}$, $\langle f, k(\cdot, x) \rangle = f(x)$.

A Hilbert space $\mathcal{H}$ which possesses a reproducing kernel $k$ is called a Reproducing Kernel Hilbert Space, or a proper Hilbert space. The second property is called the reproducing property: using the inner product of $f$ with $k(\cdot, x)$, the value of the function $f$ is reproduced at the point $x$. Also, using both conditions, we have, for all $x, z \in \mathcal{X}$, $k(x, z) = \langle k(\cdot, x), k(\cdot, z) \rangle$.

Lemma 28 Any reproducing kernel is a positive definite function (Berlinet and Thomas-Agnan, 2004).
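Lemma 28 can be checked empirically on a finite sample: the Gram matrix of a positive-definite kernel is symmetric with nonnegative eigenvalues (up to floating-point round-off). A small sketch with the squared-exponential kernel, a standard example; the kernel choice and length-scale below are ours, for illustration only:

```python
import numpy as np

def sq_exp(x, y, ell=1.0):
    """Squared-exponential kernel k(x, y) = exp(-(x - y)^2 / (2 ell^2)),
    a standard symmetric positive-definite kernel on R."""
    return np.exp(-np.subtract.outer(x, y) ** 2 / (2.0 * ell ** 2))

x = np.linspace(-2.0, 2.0, 9)
K = sq_exp(x, x)              # Gram matrix K_ij = k(x_i, x_j)
eigs = np.linalg.eigvalsh(K)  # eigenvalues should all be nonnegative
```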
Finding a reproducing kernel of an RKHS might seem difficult, but fortunately there is a one-to-one relation between a reproducing kernel and its associated RKHS, and such a reproducing kernel is unique. Therefore, our problem reduces to finding an appropriate kernel.

Theorem 29 (Moore-Aronszajn Theorem, Berlinet and Thomas-Agnan, 2004) Let $k$ be a positive definite function on $\mathcal{X} \times \mathcal{X}$. There exists only one Hilbert space $\mathcal{H}$ of functions on $\mathcal{X}$ with $k$ as reproducing kernel. The subspace $\mathcal{H}_0$ of $\mathcal{H}$ spanned by the functions $k(\cdot, x)$, $x \in \mathcal{X}$, is dense¹ in $\mathcal{H}$, and $\mathcal{H}$ is the set of functions on $\mathcal{X}$ which are point-wise limits of Cauchy sequences in $\mathcal{H}_0$ with the inner product
$$\langle f, g \rangle = \sum_{i=1}^{n} \sum_{j=1}^{m} \alpha_i \beta_j\, k(z_j, x_i), \qquad (56)$$
where $f = \sum_{i=1}^{n} \alpha_i k(\cdot, x_i)$ and $g = \sum_{j=1}^{m} \beta_j k(\cdot, z_j)$.

An important property of working in an RKHS is that convergence in norm implies point-wise convergence; the converse need not be true. In other words, if two functions in an RKHS are close in the norm sense, they are also close point-wise. We will rely on this property to solve the problem discussed in this paper. In Theorem 29, $f$ and $g$ are defined only in $\mathcal{H}_0$. The following theorem, known as the representer theorem, ensures that the solution of minimizing the regularized risk functional admits such a representation.

Theorem 30 (Nonparametric Representer Theorem, Schölkopf et al., 2001) Let $\mathcal{X}$ be a nonempty set and $\mathcal{H}_k$ an RKHS with reproducing kernel $k$ on $\mathcal{X} \times \mathcal{X}$. Suppose we are given a training sample $(x_1, y_1), \dots, (x_m, y_m) \in \mathcal{X} \times \mathbb{R}$, a strictly monotonically increasing real-valued function $h$ on $[0, \infty)$, an arbitrary cost function $c : (\mathcal{X} \times \mathbb{R}^2)^m \to \mathbb{R} \cup \{\infty\}$, and a class of functions²
$$\mathcal{F} = \Big\{ f \in \mathbb{R}^{\mathcal{X}} \,\Big|\, f(\cdot) = \sum_{i=1}^{\infty} \beta_i k(\cdot, z_i),\ \beta_i \in \mathbb{R},\ z_i \in \mathcal{X},\ \lVert f \rVert_{\mathcal{H}_k} < \infty \Big\},$$
where $\lVert \cdot \rVert_{\mathcal{H}_k}$ is the induced norm of the RKHS $\mathcal{H}_k$ associated with $k$. Then any $f \in \mathcal{F}$ minimizing the regularized risk functional
$$c\big((x_1, y_1, f(x_1)), \dots, (x_m, y_m, f(x_m))\big) + h(\lVert f \rVert_{\mathcal{H}_k})$$
admits a representation of the form
$$f(\cdot) = \sum_{i=1}^{m} \alpha_i k(\cdot, x_i). \qquad (57)$$
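The representer theorem is what makes such problems computationally tractable: the minimizer is a finite expansion over the training points. As a concrete illustrative instance (not the registration problem of this paper), kernel ridge regression with squared-error cost and $h(\lVert f\rVert) = \lambda \lVert f\rVert^2$ reduces to a linear solve for the coefficients $\alpha_i$ of the expansion; the function names below are ours:

```python
import numpy as np

def krr_fit(x, y, k, lam=1e-3):
    """Kernel ridge regression: by the representer theorem the
    minimizer is f(.) = sum_i alpha_i k(., x_i), with coefficients
    solving (K + lam * I) alpha = y."""
    K = k(x[:, None], x[None, :])
    return np.linalg.solve(K + lam * np.eye(len(x)), y)

def krr_predict(xq, x, alpha, k):
    """Evaluate the finite kernel expansion at query points xq."""
    return k(xq[:, None], x[None, :]) @ alpha

# Illustrative squared-exponential kernel with length-scale 0.3.
kern = lambda a, b: np.exp(-(a - b) ** 2 / (2.0 * 0.3 ** 2))
```

Fitting a smooth target with this sketch recovers the training values closely, consistent with the representation (57).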
A.4 Riemannian Geometry

Let $M$ be a smooth manifold. This paper is concerned with the problem of finding the maximum of a function $f : M \to \mathbb{R}$, which will be accomplished (locally) via gradient ascent. Because $M$ will not generally be Euclidean, we use tools from Riemannian geometry. Recall that a vector field $X : M \to TM$ is a section of the tangent bundle, i.e., for the projection $\pi : TM \to M$ we have $\pi \circ X = \mathrm{Id}$. The set of all (smooth) vector fields on $M$ is denoted by $\mathfrak{X}(M)$. In contrast to vector fields, 1-forms are sections of the cotangent bundle, $M \to T^*M$, and the set of all 1-forms is denoted by $\Omega^1(M)$.

1. A subset being dense in a space means the closure of the subset equals the space.
2. $\mathbb{R}^{\mathcal{X}}$ is the space of functions mapping $\mathcal{X}$ to $\mathbb{R}$.

For a function $f : M \to \mathbb{R}$, the differential is naturally a covector rather than a vector:
$$df_x : T_x M \to \mathbb{R}, \qquad df_x(v) = \frac{d}{dt}\Big|_{t=0} f(\gamma(t)), \quad \gamma(0) = x, \ \dot{\gamma}(0) = v.$$
In order to determine the gradient, $\mathrm{grad}\, f \in \mathfrak{X}(M)$, we need an identification $\Omega^1(M) \cong \mathfrak{X}(M)$, which is where the Riemannian metric is used.

Definition 31 A Riemannian metric on a manifold $M$ is the assignment, to each point $p \in M$, of an inner product $\langle \cdot, \cdot \rangle_p$ on $T_pM$ which is smooth in the following sense: for $X, Y \in \mathfrak{X}(M)$, the map $p \mapsto \langle X(p), Y(p) \rangle_p$ is a smooth function on $M$.

This gives a way to turn $df$ into $\mathrm{grad}\, f$: the gradient is the unique vector field satisfying $\langle \mathrm{grad}\, f, v \rangle_p = df_p(v)$ for all $v \in T_pM$. On a general manifold, if we wish to differentiate a vector field $Y$ in the direction of $X$ (i.e., take a directional derivative), we cannot compute it in the traditional sense, since different tangent vectors lie in different tangent spaces and so cannot be subtracted. A way around this issue is to use a connection (see, e.g., Section 6 of Tu, 2017):

Definition 32 An affine connection on a manifold $M$ is an $\mathbb{R}$-bilinear map $\nabla : \mathfrak{X}(M) \times \mathfrak{X}(M) \to \mathfrak{X}(M)$, written $(X, Y) \mapsto \nabla_X Y$, such that for all $f \in C^{\infty}(M)$ and $X, Y \in \mathfrak{X}(M)$:
i. $\nabla_{fX} Y = f \nabla_X Y$,
ii. $\nabla_X fY = (\mathcal{L}_X f) Y + f \nabla_X Y$,
where $\mathcal{L}$ is the usual Lie derivative (Tu, 2011, Section 20).
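In local coordinates, the identification of $df$ with the gradient through the metric is simply a linear solve: if $G(p)$ is the matrix of the metric at $p$ and $df$ the vector of coordinate partials of $f$, then the Riemannian gradient satisfies $G\,\mathrm{grad} = df$. A minimal sketch (the function name is ours):

```python
import numpy as np

def riemannian_gradient(df, G):
    """Turn the differential df (coordinate partials of f at p) into
    the Riemannian gradient by solving G(p) * grad = df."""
    return np.linalg.solve(G, df)
```

With the Euclidean metric $G = I$ this reduces to the ordinary gradient; scaling the metric by a constant $c$ scales the gradient by $1/c$.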
For a Riemannian metric there exists a unique affine connection, called the Levi-Civita connection, that is both torsion-free and compatible with the metric. A connection is compatible with the metric if, for all $X, Y, Z \in \mathfrak{X}(M)$,
$$Z\langle X, Y \rangle = \langle \nabla_Z X, Y \rangle + \langle X, \nabla_Z Y \rangle,$$
and torsion-free if $\nabla_X Y - \nabla_Y X = [X, Y]$. This connection can be constructed in the following way (the Koszul formula):
$$2\langle \nabla_X Y, Z \rangle = X\langle Y, Z \rangle + Y\langle Z, X \rangle - Z\langle X, Y \rangle - \langle X, [Y, Z] \rangle + \langle Y, [Z, X] \rangle + \langle Z, [X, Y] \rangle. \qquad (58)$$

Remark 33 Suppose that $G$ is a commutative Lie group and we choose an orthogonal basis $\{e_i\}$ of its Lie algebra $\mathfrak{g}$. If we let $\{E_i\}$ be the corresponding left-invariant vector fields, then the connection vanishes in these coordinates: $\nabla_{E_i} E_j = 0$ for all $i, j$. This will be useful in Sections 3.1 and 3.2, where we compute the registration problem for $G = S^1$ and $G = T^2$.

The Levi-Civita connection is used to define a geodesic on a Riemannian manifold, i.e., the analogue of a straight line.

Definition 34 A geodesic is a curve $\gamma : I \to M$, where $I$ is an interval in $\mathbb{R}$, such that $\nabla_{\dot{\gamma}} \dot{\gamma} = 0$, where $\nabla_{\dot{\gamma}} \dot{\gamma}$ is the covariant derivative of the velocity vector field $\dot{\gamma}(t) = \frac{d\gamma}{dt}$.

Theorem 35 Let $\gamma : I \to M$ be a curve. Then the following are equivalent:
1. $\gamma$ is a geodesic;
2. in coordinates $\gamma = (y^1, \dots, y^n)$, the curve satisfies the following system of differential equations:
$$\ddot{y}^k + \sum_{ij} \Gamma^k_{ij} \dot{y}^i \dot{y}^j = 0,$$
where $\Gamma^k_{ij}$ are the Christoffel symbols (Tu, 2017, Sec. 13.3, p. 99); and
3. $\gamma$ satisfies the Euler-Lagrange equations with Lagrangian $L(y, \dot{y}) = \frac{1}{2}\langle \dot{y}, \dot{y} \rangle_y$.

Example 8 Consider the case where $M = \mathbb{R}^2$ and the metric is given by $\langle u, v \rangle_p = u^{\mathsf{T}} m(p) v$ for a symmetric, positive-definite matrix $m$ which depends on the point $p = (x, y) \in \mathbb{R}^2$. The Lagrangian is $L = \frac{1}{2}\big(m_{11}\dot{x}^2 + 2m_{12}\dot{x}\dot{y} + m_{22}\dot{y}^2\big)$, and the Euler-Lagrange equations are
$$\frac{d}{dt}\frac{\partial L}{\partial \dot{x}} - \frac{\partial L}{\partial x} = 0, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot{y}} - \frac{\partial L}{\partial y} = 0.$$
Computing the first equation results in
$$\frac{d}{dt}\big(m_{11}\dot{x} + m_{12}\dot{y}\big) = m_{11}\ddot{x} + m_{12}\ddot{y} + \frac{\partial m_{11}}{\partial x}\dot{x}^2 + \Big(\frac{\partial m_{11}}{\partial y} + \frac{\partial m_{12}}{\partial x}\Big)\dot{x}\dot{y} + \frac{\partial m_{12}}{\partial y}\dot{y}^2.$$
Combining these (as well as performing the analogous computation for the $y$ equation), we obtain the Euler-Lagrange equations.
$$m_{11}\ddot{x} + m_{12}\ddot{y} + \tfrac{1}{2}\frac{\partial m_{11}}{\partial x}\dot{x}^2 + \frac{\partial m_{11}}{\partial y}\dot{x}\dot{y} + \Big(\frac{\partial m_{12}}{\partial y} - \tfrac{1}{2}\frac{\partial m_{22}}{\partial x}\Big)\dot{y}^2 = 0,$$
$$m_{12}\ddot{x} + m_{22}\ddot{y} + \Big(\frac{\partial m_{12}}{\partial x} - \tfrac{1}{2}\frac{\partial m_{11}}{\partial y}\Big)\dot{x}^2 + \frac{\partial m_{22}}{\partial x}\dot{x}\dot{y} + \tfrac{1}{2}\frac{\partial m_{22}}{\partial y}\dot{y}^2 = 0.$$
Decoupling the acceleration terms produces
$$\ddot{x} + \Gamma^x_{xx}\dot{x}^2 + 2\Gamma^x_{xy}\dot{x}\dot{y} + \Gamma^x_{yy}\dot{y}^2 = 0, \qquad \ddot{y} + \Gamma^y_{xx}\dot{x}^2 + 2\Gamma^y_{xy}\dot{x}\dot{y} + \Gamma^y_{yy}\dot{y}^2 = 0,$$
with Christoffel symbols
$$\Gamma^x_{xx} = \frac{1}{2\det(m)}\Big(m_{22}\frac{\partial m_{11}}{\partial x} - m_{12}\Big(2\frac{\partial m_{12}}{\partial x} - \frac{\partial m_{11}}{\partial y}\Big)\Big), \qquad
\Gamma^x_{xy} = \frac{1}{2\det(m)}\Big(m_{22}\frac{\partial m_{11}}{\partial y} - m_{12}\frac{\partial m_{22}}{\partial x}\Big),$$
$$\Gamma^x_{yy} = \frac{1}{2\det(m)}\Big(m_{22}\Big(2\frac{\partial m_{12}}{\partial y} - \frac{\partial m_{22}}{\partial x}\Big) - m_{12}\frac{\partial m_{22}}{\partial y}\Big), \qquad
\Gamma^y_{xx} = \frac{1}{2\det(m)}\Big(m_{11}\Big(2\frac{\partial m_{12}}{\partial x} - \frac{\partial m_{11}}{\partial y}\Big) - m_{12}\frac{\partial m_{11}}{\partial x}\Big),$$
$$\Gamma^y_{xy} = \frac{1}{2\det(m)}\Big(m_{11}\frac{\partial m_{22}}{\partial x} - m_{12}\frac{\partial m_{11}}{\partial y}\Big), \qquad
\Gamma^y_{yy} = \frac{1}{2\det(m)}\Big(m_{11}\frac{\partial m_{22}}{\partial y} - m_{12}\Big(2\frac{\partial m_{12}}{\partial y} - \frac{\partial m_{22}}{\partial x}\Big)\Big).$$

In particular, due to the existence and uniqueness of solutions to ordinary differential equations, geodesics always exist locally. Moreover, when the manifold is a Lie group and the metric is invariant under left-translations, the geodesic equations can be described via the Euler-Poincaré equations (Bloch et al., 1996).

Definition 36 Let $G$ be a Lie group. The function $\ell_g : G \to G$, $h \mapsto gh$, for $g \in G$ is called left-translation by $g$. Its tangent lift $(\ell_g)_* : TG \to TG$ is defined as follows: let $v \in T_hG$ and $\gamma : (-\varepsilon, \varepsilon) \to G$ be such that $\gamma(0) = h$ and $\dot{\gamma}(0) = v$. Then we have
$$(\ell_g)_* v := \frac{d}{dt}\Big|_{t=0} \ell_g \circ \gamma(t).$$

Definition 37 For a Lie group $G$, a function $L : TG \to \mathbb{R}$ is said to be a left-invariant Lagrangian if $(\ell_g)^* L = L$ for all $g \in G$; that is, $L(x, v) = L(gx, (\ell_g)_* v)$.

Theorem 38 Let $G$ be a Lie group and $L : TG \to \mathbb{R}$ a left-invariant Lagrangian. Let $L : \mathfrak{g} \to \mathbb{R}$ be its restriction to the tangent space of $G$ at the identity (the corresponding Lie algebra). For a curve $g(t) \in G$, let $\xi(t) = g(t)^{-1}\dot{g}(t)$; i.e., $\xi(t) = (\ell_{g(t)^{-1}})_* \dot{g}(t)$. Then the following are equivalent:
1. $g(t)$ satisfies the Euler-Lagrange equations for $L$ on $G$;
2. the Euler-Poincaré equations hold:
$$\frac{d}{dt}\frac{\delta L}{\delta \xi} = \mathrm{ad}^*_{\xi}\, \frac{\delta L}{\delta \xi}. \qquad (59)$$

Proof See Theorem 5.2 of Bloch et al. (1996).

Remark 39 The reason for using $g^{-1}\dot{g}$ rather than just $\dot{g}$ is that $\dot{g} \in T_gG$ and $(\ell_{g^{-1}})_* : T_gG \to T_eG = \mathfrak{g}$, and therefore $g^{-1}\dot{g} \in \mathfrak{g}$. In coordinates, let $\xi = \sum_i \xi^i e_i \in \mathfrak{g}$, where $\{e_i\}$ forms a basis. Then equation (59) takes the form
$$\frac{d}{dt}\frac{\partial L}{\partial \xi^j} = \sum_{i,k} c^k_{ij} \frac{\partial L}{\partial \xi^k} \xi^i, \qquad [e_i, e_j] = \sum_k c^k_{ij} e_k. \qquad (60)$$

The idea of a geodesic is useful because it provides a map $T_xM \to M$ in the following way: $v \mapsto \gamma(1)$, where $\gamma$ is the geodesic with $\gamma(0) = x$ and $\dot{\gamma}(0) = v$.
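The map $v \mapsto \gamma(1)$ can be realized numerically by integrating the geodesic equations of Theorem 35(2). The sketch below, written for a hypothetical conformal metric $m(p) = e^{2x} I$ on $\mathbb{R}^2$ (our choice, purely for illustration), approximates the Christoffel symbols by finite differences of the metric and integrates the ODE with SciPy; along a geodesic the speed $\langle \dot{\gamma}, \dot{\gamma} \rangle_{\gamma}$ is conserved, which serves as a sanity check:

```python
import numpy as np
from scipy.integrate import solve_ivp

def metric(p):
    # Hypothetical conformal metric m(p) = exp(2x) * I on R^2.
    return np.exp(2.0 * p[0]) * np.eye(2)

def christoffel(p, h=1e-5):
    """Gamma^k_ij = 0.5 g^{kl} (d_i g_lj + d_j g_li - d_l g_ij),
    with metric derivatives taken by central finite differences."""
    dg = np.empty((2, 2, 2))
    for l in range(2):
        e = np.zeros(2)
        e[l] = h
        dg[l] = (metric(p + e) - metric(p - e)) / (2.0 * h)
    g_inv = np.linalg.inv(metric(p))
    G = np.zeros((2, 2, 2))
    for k in range(2):
        for i in range(2):
            for j in range(2):
                G[k, i, j] = 0.5 * sum(
                    g_inv[k, l] * (dg[i][l, j] + dg[j][l, i] - dg[l][i, j])
                    for l in range(2))
    return G

def geodesic_rhs(t, s):
    # State s = (position, velocity); geodesic ODE y'' = -Gamma y' y'.
    p, v = s[:2], s[2:]
    acc = -np.einsum("kij,i,j->k", christoffel(p), v, v)
    return np.concatenate([v, acc])

# Integrate from gamma(0) = (0, 0) with initial velocity (0, 1).
sol = solve_ivp(geodesic_rhs, (0.0, 1.0), [0.0, 0.0, 0.0, 1.0],
                rtol=1e-8, atol=1e-10)
```

For this metric, $\Gamma^x_{yy} = -1$, so a geodesic launched in the $y$ direction bends toward increasing $x$ while its metric speed stays constant.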
This map is called the Riemannian exponential map and is denoted by $\mathrm{Exp}_x : T_xM \to M$.

Remark 40 There are two important aspects of the Riemannian exponential that need to be discussed. (1) In general, $\mathrm{Exp}_x$ is only defined on a neighborhood of $0 \in T_xM$ and not on the whole tangent space. (2) When $M = G$ is a Lie group, the Lie exponential map $\exp : \mathfrak{g} \to G$ will usually not agree with the Riemannian exponential, although they have matching domain and codomain.

The last tool from Riemannian geometry we will use is the Hessian, a generalization of the second derivative. For a function $f : M \to \mathbb{R}$, the Hessian is a second-order tensor given in coordinates by
$$H(f)_{ij} = \frac{\partial^2 f}{\partial x^i \partial x^j} - \sum_k \Gamma^k_{ij} \frac{\partial f}{\partial x^k}.$$
A different, equivalent, definition of the Hessian is
$$H(f)(X, Y) = \langle \nabla_X \mathrm{grad}(f), Y \rangle = X(Yf) - df(\nabla_X Y).$$
This object can be used to calculate the covariance, as well as serving as an ingredient in Newton's root-finding method.

References

Timothy D. Barfoot. State Estimation for Robotics. Cambridge University Press, 2017.

Bachir Bekka, Pierre de La Harpe, and Alain Valette. Kazhdan's Property (T). Cambridge University Press, 2008.

Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.

Youcef Bentoutou, Nasreddine Taleb, Kidiyo Kpalma, and Joseph Ronsin. An automatic image registration for applications in remote sensing. IEEE Transactions on Geoscience and Remote Sensing, 43(9):2127–2137, 2005.

Alain Berlinet and Christine Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic, 2004.

Rolf Berndt and Michael Klucznik. An Introduction to Symplectic Geometry. Graduate Studies in Mathematics. American Mathematical Society, 2001. ISBN 9780821820568.

Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

Anthony Bloch, P. S. Krishnaprasad, Jerrold E. Marsden, and Tudor S. Ratiu.
The Euler-Poincaré equations and double bracket dissipation. Communications in Mathematical Physics, 175(1):1–42, January 1996.

Anthony M. Bloch, Roger W. Brockett, and Tudor S. Ratiu. Completely integrable gradient flows. Communications in Mathematical Physics, 147(1):57–74, June 1992.

Silvère Bonnabel, Martin Barczyk, and François Goulette. On the covariance of ICP-based scan-matching techniques. In Proceedings of the American Control Conference, pages 5498–5503. IEEE, 2016.

Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Peter Deisenroth. Matérn Gaussian processes on Riemannian manifolds. arXiv preprint arXiv:2006.10160, 2020.

Gary Bradski. The OpenCV Library. Dr. Dobb's Journal of Software Tools, 2000.

Martin Brossard, Silvère Bonnabel, and Axel Barrau. A new approach to 3D ICP covariance estimation. IEEE Robotics and Automation Letters, 5(2):744–751, 2020.

John Canny. A computational approach to edge detection. In Readings in Computer Vision, pages 184–203. Elsevier, 1987.

Andrea Censi. An accurate closed-form estimate of ICP's covariance. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 3167–3172. IEEE, 2007.

Andrea Censi. An ICP variant using a point-to-line metric. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 19–25. IEEE, 2008.

Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning. Adaptive Computation and Machine Learning. MIT Press, 2006. ISBN 9780262033589. URL https://books.google.com/books?id=kfqvQgAACAAJ.

Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848, 2017.
Xieyuanli Chen, Thomas Läbe, Andres Milioto, Timo Röhling, Olga Vysotska, Alexandre Haag, Jens Behley, and Cyrill Stachniss. OverlapNet: Loop closing for LiDAR-based SLAM. In Proceedings of the Robotics: Science and Systems Conference, 2020.

Yang Chen and Gérard Medioni. Object modeling by registration of multiple range images. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 2724–2729. IEEE, 1991.

Hendrik Dahlkamp, Adrian Kaehler, David Stavens, Sebastian Thrun, and Gary R. Bradski. Self-supervised monocular road detection in desert terrain. In Proceedings of the Robotics: Science and Systems Conference, Philadelphia, USA, August 2006. doi: 10.15607/RSS.2006.II.005.

Carl Doersch and Andrew Zisserman. Multi-task self-supervised visual learning. In Proceedings of the IEEE International Conference on Computer Vision, pages 2051–2060, 2017.

Jakob Engel, Thomas Schöps, and Daniel Cremers. LSD-SLAM: Large-scale direct monocular SLAM. In European Conference on Computer Vision, pages 834–849. Springer, 2014.

Jakob Engel, Jörg Stückler, and Daniel Cremers. Large-scale direct SLAM with stereo cameras. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1935–1942. IEEE, 2015.

Jakob Engel, Vladlen Koltun, and Daniel Cremers. Direct sparse odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(3):611–625, 2018.

Aasa Feragen, François Lauze, and Søren Hauberg. Geodesic exponential kernels: When curvature and linearity conflict. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3032–3042, 2015.

Friedrich Fraundorfer and Davide Scaramuzza. Visual odometry: Part II: Matching, robustness, optimization, and applications. IEEE Robotics & Automation Magazine, 19(2):78–90, 2012.

Maani Ghaffari, William Clark, Anthony Bloch, Ryan M. Eustice, and Jessy W. Grizzle.
Continuous direct sparse visual odometry from RGB-D images. In Proceedings of the Robotics: Science and Systems Conference, Freiburg, Germany, June 2019.

Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2014.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

Richard Hartley and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.

Intel Corporation. Official Threading Building Blocks (TBB) GitHub repository. https://github.com/intel/tbb, 2019.

Vladimir G. Ivancevic and Tijana T. Ivancevic. Lecture notes in Lie groups, 2011.

Sadeep Jayasumana, Richard Hartley, Mathieu Salzmann, Hongdong Li, and Mehrtash Harandi. Kernel methods on Riemannian manifolds with Gaussian RBF kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(12):2464–2477, 2015.

Christian Kerl. Dense Visual Odometry (dvo). https://github.com/tum-vision/dvo, 2013.

Christian Kerl, Jürgen Sturm, and Daniel Cremers. Dense visual SLAM for RGB-D cameras. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2100–2106. IEEE, 2013a.

Christian Kerl, Jürgen Sturm, and Daniel Cremers. Robust odometry estimation for RGB-D cameras. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 3748–3754. IEEE, 2013b.

Taejung Kim and Yong-Jo Im. Automatic satellite image registration by combination of matching and random sample consensus. IEEE Transactions on Geoscience and Remote Sensing, 41(5):1111–1117, 2003.

David Landry, François Pomerleau, and Philippe Giguère. CELLO-3D: Estimating the covariance of ICP in the real world.
In Proceedings of the IEEE International Conference on Robotics and Automation, pages 8190–8196. IEEE, 2019.

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

Tzu-Yuan Lin, William Clark, Ryan M. Eustice, Jessy W. Grizzle, Anthony Bloch, and Maani Ghaffari. Adaptive continuous visual odometry from RGB-D images. arXiv preprint arXiv:1910.00713, 2019.

Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen AWM Van Der Laak, Bram Van Ginneken, and Clara I. Sánchez. A survey on deep learning in medical image analysis. Medical Image Analysis, 42:60–88, 2017.

Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.

Edward M. Mikhail, James S. Bethel, and J. Chris McGlone. Introduction to Modern Photogrammetry. Wiley, 2001.

Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.

Radford M. Neal. Bayesian Learning for Neural Networks, volume 118. Springer New York, 1996.

David Nistér, Oleg Naroditsky, and James Bergen. Visual odometry for ground vehicle applications. Journal of Field Robotics, 23(1):3–20, 2006.

Frank C. Park and Roger W. Brockett. Kinematic dexterity of robotic mechanisms. International Journal of Robotics Research, 13(1):1–15, 1994.

Steven A. Parkison, Lu Gan, Maani Ghaffari Jadidi, and Ryan M. Eustice. Semantic iterative closest point through expectation-maximization. In Proceedings of the British Machine Vision Conference, pages 1–17, Newcastle, UK, September 2018.

Steven A. Parkison, Maani Ghaffari, Lu Gan, Ray Zhang, Arash K. Ushani, and Ryan M. Eustice. Boosting shape registration algorithms via reproducing kernel Hilbert space regularizers. IEEE Robotics and Automation Letters, 4(4):4563–4570, 2019.
Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017.

Matthieu Pizenberg. DVO core (without ROS dependency). https://github.com/mpizenberg/dvo/tree/76f65f0c9b438675997f595471d39863901556a9, 2019.

Ramona-Andreea Rohan. Some remarks on the exponential map on the groups SO(n) and SE(n). In Proceedings of the Fourteenth International Conference on Geometry, Integrability and Quantization, pages 160–175, Sofia, Bulgaria, 2013. Avangard Prima.

Antoni Rosinol, Marcus Abate, Yun Chang, and Luca Carlone. Kimera: an open-source library for real-time metric-semantic localization and mapping. arXiv preprint arXiv:1910.02490, 2019.

Davide Scaramuzza and Friedrich Fraundorfer. Visual odometry. IEEE Robotics & Automation Magazine, 18(4):80–92, 2011.

Bernhard Schölkopf, Ralf Herbrich, and Alex Smola. A generalized representer theorem. In Computational Learning Theory, pages 416–426. Springer, 2001.

Aleksandr Segal, Dirk Haehnel, and Sebastian Thrun. Generalized-ICP. In Proceedings of the Robotics: Science and Systems Conference, Seattle, USA, June 2009. doi: 10.15607/RSS.2009.V.021.

James Servos and Steven L. Waslander. Multi-Channel Generalized-ICP: A robust framework for multi-channel scan registration. Robotics and Autonomous Systems, 87:247–257, 2017.

Boris Sofman, Ellie Lin, J. Andrew Bagnell, John Cole, Nicolas Vandapel, and Anthony Stentz. Improving robot navigation through self-supervised online learning. Journal of Field Robotics, 23(11-12):1059–1075, 2006.

Todor Stoyanov, Martin Magnusson, Henrik Andreasson, and Achim J. Lilienthal. Fast and accurate scan registration through minimization of the distance between compact 3D NDT representations. International Journal of Robotics Research, 31(12):1377–1393, 2012.

Hauke Strasdat.
Local accuracy and global consistency for efficient visual SLAM. PhD thesis, Imperial College London, 2012.

Jürgen Sturm, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 573–580. IEEE, 2012.

Richard Szeliski. Computer Vision: Algorithms and Applications. Springer Science & Business Media, 2010.

Sebastian Thrun, Wolfram Burgard, and Dieter Fox. Probabilistic Robotics. MIT Press, 2005.

Yanghai Tsin and Takeo Kanade. A correlation-based approach to robust point set registration. In Proceedings of the European Conference on Computer Vision, pages 558–569. Springer, 2004.

Loring W. Tu. An Introduction to Manifolds. Springer, New York, 2011.

Loring W. Tu. Differential Geometry. Graduate Texts in Mathematics. Springer, 2017. ISBN 9783319660824.

Christopher K. Williams and Carl Edward Rasmussen. Gaussian Processes for Machine Learning, volume 1. MIT Press, 2006.

Anestis Zaganidis, Li Sun, Tom Duckett, and Grzegorz Cielniak. Integrating deep semantic segmentation into 3-D point cloud registration. IEEE Robotics and Automation Letters, 3(4):2942–2949, October 2018.

Ray Zhang, Tzu-Yuan Lin, Chien Erh Lin, Steven A. Parkison, William Clark, Jessy W. Grizzle, Ryan M. Eustice, and Maani Ghaffari. A new framework for registration of semantic point clouds from stereo and RGB-D cameras. In Proceedings of the IEEE International Conference on Robotics and Automation, 2020.

Minghan Zhu, Maani Ghaffari, Yuanxin Zhong, Pingping Lu, Zhong Cao, Ryan M. Eustice, and Huei Peng. Monocular depth prediction through continuous 3D loss. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2020.

Minghan Zhu, Maani Ghaffari, and Huei Peng. Correspondence-free point cloud registration with SO(3)-equivariant implicit shape representations.
In Proceedings of the Conference on Robot Learning, 2021. URL https://openreview.net/forum?id=KOq9qDgn-Ta.

Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, and Bryan Catanzaro. Improving semantic segmentation via video propagation and label relaxation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8856–8865, 2019.