Journal of Machine Learning Research 22 (2021) 1-50. Submitted 12/20; Revised 8/21; Published 10/21

Nonparametric Continuous Sensor Registration

William Clark, wac76@cornell.edu, Department of Mathematics, Cornell University, Ithaca, NY 14850, USA

Maani Ghaffari, maanigj@umich.edu, Department of Naval Architecture and Marine Engineering, University of Michigan, Ann Arbor, MI 48109, USA

Anthony Bloch, abloch@umich.edu, Department of Mathematics, University of Michigan, Ann Arbor, MI 48109, USA

Editor: Sayan Mukherjee

Abstract

This paper develops a new mathematical framework that enables nonparametric joint semantic and geometric representation of continuous functions using data. The joint embedding is modeled by representing the processes in a reproducing kernel Hilbert space. The functions can be defined on arbitrary smooth manifolds where the action of a Lie group aligns them. The continuous functions allow the registration to be independent of a specific signal resolution. The framework is fully analytical with a closed-form derivation of the Riemannian gradient and Hessian. We study a more specialized but widely used case where the Lie group acts on functions isometrically. We solve the problem by maximizing the inner product between two functions defined over data, while the continuous action of the rigid body motion Lie group is captured through the integration of the flow in the corresponding Lie algebra. Low-dimensional cases are derived with numerical examples to show the generality of the proposed framework. The high-dimensional derivation for the special Euclidean group acting on the Euclidean space showcases the point cloud registration and bird's-eye view map registration abilities. An implementation of this framework for RGB-D cameras outperforms the state-of-the-art robust visual odometry and performs well in texture and structure-scarce environments.
Keywords: kernel methods, nonparametric representation, sensor registration, Lie groups, Riemannian geometry, equivariant models

© 2021 William Clark, Maani Ghaffari, and Anthony Bloch. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v22/20-1468.html.

1. Introduction

We consider the problem of sensor registration, which is defined as finding the transformation between two sensors with overlapping measurements or tracking a moving sensor using its sequential measurements. This problem frequently arises in engineering domains such as point cloud registration (Chen and Medioni, 1991; Censi, 2008; Segal et al., 2009; Stoyanov et al., 2012; Servos and Waslander, 2017; Parkison et al., 2018; Zaganidis et al., 2018; Parkison et al., 2019), visual odometry and SLAM (Nistér et al., 2006; Scaramuzza and Fraundorfer, 2011; Fraundorfer and Scaramuzza, 2012; Strasdat, 2012; Kerl et al., 2013a; Engel et al., 2015; Ghaffari et al., 2019; Rosinol et al., 2019), satellite image registration (Kim and Im, 2003; Bentoutou et al., 2005), and photogrammetry (Mikhail et al., 2001). In terms of applications, the above-mentioned problems are fundamental to many computer vision (Hartley and Zisserman, 2003; Szeliski, 2010) and robotics (Thrun et al., 2005; Barfoot, 2017) algorithms. In this paper, we formulate a fundamentally novel nonparametric framework for sensor registration that constructs continuous functions from raw measurements; hence our approach can be considered a direct method.

Figure 1: Conceptual illustration of nonparametric continuous sensor registration. Continuous functions are constructed from measurements shown in red circles. The domain of the functions can be an arbitrary smooth manifold, $M$, where the action of a Lie group, $G$, is used to align them via $h.f_Z$, $h \in G$.
The functions can be thought of as curves, surfaces, or hyper-surfaces that live in a Reproducing Kernel Hilbert Space (RKHS). For example, given an RGB-D image, a function can be constructed that maps a three-dimensional (3D) point to the RGB color space. Therefore, such functions inherently represent a joint model for semantic/appearance and geometry. Furthermore, the domain of the functions can be an arbitrary smooth manifold where the action of a Lie group is used to align them. This concept is illustrated in Figure 1. The continuous functions allow the registration to be independent of a specific signal resolution, and the framework is fully analytical with a closed-form derivation of the Riemannian gradient and Hessian. In the present work, we study a restriction of the framework where the Lie group acts on functions isometrically. This restriction has a variety of applications, and its high-dimensional derivation for the special Euclidean group acting on the Euclidean space showcases the point cloud registration and bird's-eye view map registration abilities. In particular, this work has the following contributions:

1. a new mathematical framework for sensor registration that enables nonparametric joint semantic/appearance and geometric representation of continuous functions using data;

2. low-dimensional cases derived with numerical examples to show the generality of the proposed framework;

3. a high-dimensional derivation for the special Euclidean group acting on the Euclidean space that showcases the point cloud registration and bird's-eye view map registration abilities; a specific derivation and implementation of this framework for RGB-D cameras performs well in texture- and structure-scarce environments;

4.
the open-source implementation of the derived cases in this work, available for download at https://github.com/MaaniGhaffari/c-sensor-registration and https://github.com/MaaniGhaffari/cvo-rgbd.

An early idea of this work in the context of RGB-D visual odometry was presented at the 2019 Robotics: Science and Systems conference in Freiburg, Germany (Ghaffari et al., 2019). This work generalizes the previous framework and places it into a broader context. In particular, an abstract sensor registration problem is defined in Section 2, where low-dimensional and high-dimensional examples together with their second-order geometry on flat or curved spaces are studied. The generalization to curved spaces also connects this work to an ongoing research direction regarding positive-definite kernel design and hyperparameter learning on curved spaces (Jayasumana et al., 2015; Feragen et al., 2015; Borovitskiy et al., 2020). Furthermore, the derivation of the Riemannian Hessian has important implications for quantifying the pose estimation covariance (Censi, 2007; Bonnabel et al., 2016; Landry et al., 2019; Brossard et al., 2020) and for the symmetry of the problem, studied via the vector fields generated by the eigenvectors of the Hessian (Zhu et al., 2021), and is therefore included for completeness. However, its applications in those research areas and second-order solvers are left for future studies. Finally, the new experiments on RGB-D visual odometry in Section 9 are done via an open-source and efficient C++ implementation and compared against DVO (Kerl et al., 2013a,b).

The remainder of this paper is organized as follows. A review of the required mathematical machinery and notation is given in Appendix A. Section 2 presents the novel continuous sensor registration framework. Section 3 provides low-dimensional examples of the framework, including point cloud registration on the circle and the torus.
We discuss some theoretical limitations of the kernel design on compact Riemannian manifolds in Section 4. The high-dimensional example for the special Euclidean group is given in Section 5. A brief theoretical analysis for the verification of the idea is provided in Section 6. The integration of the flow for the special Euclidean group to obtain the solution is explained in Section 7. Section 8 deals with the Riemannian geometry of the special Euclidean groups. Experimental evaluations of the proposed method for registration and tracking using RGB-D images are presented in Section 9. Section 10 provides discussions regarding several important topics related to the proposed approach. Finally, Section 11 concludes the paper and provides future work ideas.

2. Problem Setup

Let $M$ be a smooth manifold and consider two finite collections of points, $X = \{x_i\}, Z = \{z_j\} \subset M$. Also, suppose we have a Lie group, $G$, acting on $M$. We want to determine which element $h \in G$ best aligns the two point clouds $X$ and $h.Z = \{h z_j\}$. To assist with this, we will assume that each point contains information described by a point in an inner product space, $(\mathcal{I}, \langle \cdot, \cdot \rangle_{\mathcal{I}})$. To this end, we will introduce two labeling functions, $\ell_X : X \to \mathcal{I}$ and $\ell_Z : Z \to \mathcal{I}$. In order to measure their alignment, we will be turning the clouds, $X$ and $Z$, into functions $f_X, f_Z : M \to \mathcal{I}$ that live in some reproducing kernel Hilbert space, $(\mathcal{H}, \langle \cdot, \cdot \rangle_{\mathcal{H}})$.

Remark 1 The action $G \curvearrowright M$ induces an action $G \curvearrowright C^\infty(M)$ by $h.f(x) := f(h^{-1}x)$. Inspired by this observation, we will set $h.f_Z := f_{h^{-1}Z}$.

In the following, we give a summary of the recipe for cloud alignment as well as formal sensor registration problems.
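The action of Remark 1 can be checked numerically. The following minimal sketch (our own illustration, not part of the paper's implementation; the helper names are ours) verifies that $h.f(x) := f(h^{-1}x)$ defines a left action on functions, i.e., $g.(h.f) = (gh).f$:

```python
import numpy as np

def act(h, f):
    """Induced action on functions: (h.f)(x) := f(h^{-1} x), h an invertible matrix."""
    h_inv = np.linalg.inv(h)
    return lambda x: f(h_inv @ x)

def rot(theta):
    """Counterclockwise rotation of R^2 by theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

f = lambda x: np.exp(-x @ x)          # a sample scalar field on R^2
g, h = rot(0.3), rot(0.5)
x = np.array([1.0, 2.0])

# left-action property: g.(h.f) = (gh).f, since (gh)^{-1} = h^{-1} g^{-1}
lhs = act(g, act(h, f))(x)
rhs = act(g @ h, f)(x)
assert np.isclose(lhs, rhs)
```

The key point is that the inverse in the definition is exactly what makes the composition come out in the right order.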
Definition 2 A sensor registration problem is given by a 5-tuple $(G, M, \phi, \langle \cdot, \cdot \rangle_{\mathfrak{g}}, k)$ where

(G1) $G$ is a Lie group,
(G2) $M$ is a smooth manifold,
(G3) $\phi : G \to \mathrm{Diff}(M)$ is a smooth group action,
(G4) $\langle \cdot, \cdot \rangle_{\mathfrak{g}}$ is an inner product on $\mathfrak{g} = \mathrm{Lie}(G)$, and
(G5) $k : M \times M \to \mathbb{R}$ is a symmetric positive-definite function, called the kernel,

while the information is given by a 3-tuple $(\mathcal{I}, (X, \ell_X), (Z, \ell_Z))$ where

(I1) $\mathcal{I}$ is an inner product space, called the information space,
(I2) $X \subset M$ is a finite collection of points called the target and $\ell_X : X \to \mathcal{I}$ is its label, and
(I3) $Z \subset M$ is a finite collection of points called the source and $\ell_Z : Z \to \mathcal{I}$ is its label.

The information (G1)-(G5) is required to build the general form of the gradient, while the information (I1)-(I3) encodes the actual clouds, which is subsequently plugged into the gradient. The remainder of this work consists of understanding cases with varying (G1)-(G5) information. We first define the general form of the problem as follows.

Problem 1 Suppose $f_X$ and $f_Z$ are the constructed functions over point clouds $X$ and $Z$, respectively. The problem of aligning two point clouds can be formulated as minimizing the distance between $f_X$ and $h.f_Z$ in the sense of the RKHS norm. That is, we want to solve

$\arg\min_{h \in G} J(h), \quad J(h) := \| f_X - h.f_Z \|^2_{\mathcal{H}},$ (1)

where $\mathcal{H}$ is the RKHS induced by the kernel $k$ from (G5).

In this work, we restrict the problem to a special case. If the action $G \curvearrowright C^\infty(M)$ is by isometries, we have

$\| f_X - h.f_Z \|^2_{\mathcal{H}} = \| f_X \|^2_{\mathcal{H}} + \| f_Z \|^2_{\mathcal{H}} - 2\langle f_X, h.f_Z \rangle_{\mathcal{H}}.$

Then we can reformulate Problem 1 into a reduced form as follows.

Problem 2 The problem of aligning the point clouds can now be rephrased as maximizing the scalar product of $f_X$ and $h.f_Z$, i.e., we want to solve

$\arg\max_{h \in G} F(h), \quad F(h) := \langle f_X, h.f_Z \rangle_{\mathcal{H}}.$ (2)

2.1 Constructing the Functions

We first choose a symmetric function $k : M \times M \to \mathbb{R}$ to be the kernel of our RKHS, $\mathcal{H}$.
This allows us to turn the point clouds into functions via

$f_X(\cdot) := \sum_{x_i \in X} \ell_X(x_i) k(\cdot, x_i), \qquad f_Z(\cdot) := \sum_{z_j \in Z} \ell_Z(z_j) k(\cdot, z_j).$ (3)

We can now define the inner product of $f_X$ and $f_Z$ by

$\langle f_X, f_Z \rangle_{\mathcal{H}} := \sum_{x_i \in X,\, z_j \in Z} \langle \ell_X(x_i), \ell_Z(z_j) \rangle_{\mathcal{I}} \, k(x_i, z_j).$ (4)

We note that it is possible to use the well-known kernel trick in machine learning (Bishop, 2006; Williams and Rasmussen, 2006; Murphy, 2012) to substitute the inner products in (4) with the appearance/semantic kernel, i.e., $k_c : \mathcal{I} \times \mathcal{I} \to \mathbb{R}$. The kernel trick can be applied to carry out computations implicitly in the high-dimensional space, which leads to computational savings when the dimensionality of the feature space is large compared to the number of data points (Williams and Rasmussen, 2006). After applying the kernel trick to (4), we get

$\langle f_X, f_Z \rangle_{\mathcal{H}} = \sum_{x_i \in X,\, z_j \in Z} k_c(\ell_X(x_i), \ell_Z(z_j)) \, k(x_i, z_j) := \sum_{x_i \in X,\, z_j \in Z} c_{ij} \, k(x_i, z_j).$ (5)

Remark 3 We note two advantages of measuring the alignment of $X$ and $Z$ by (4). The first is that we do not need to identify which point of $X$ should be paired with which point of $Z$. The second is that the number of points in $X$ does not even need to be equal to the number of points in $Z$!

Remark 4 Although the double sum in (4) seems to make the problem computationally intractable when dealing with a large number of data points, the inner product in (4) is sparse. The sparse structure is due to the kernelized formulation and the fact that the kernel bandwidth is spatially limited. As such, most kernel terms are zero. See Fig. 1 in Ghaffari et al. (2019). This notion of sparsity extends to labels, i.e., semantic information, once we kernelize the information inner product as done in (5). Furthermore, exploiting this sparse structure is the motivation for the development of a custom solver. However, the specific solver used in this work is not tied to the proposed framework.
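As a concrete illustration of (3)-(5), the following minimal Python sketch (our own; the function names and parameter values are illustrative assumptions, not the paper's implementation) evaluates the double-sum inner product for two clouds of different sizes with scalar labels:

```python
import numpy as np

def k_spatial(x, y, ell=0.5, sigma=1.0):
    # squared-exponential spatial kernel k(x, y); ell, sigma are assumed values
    d2 = np.sum((np.asarray(x, float) - np.asarray(y, float))**2)
    return sigma**2 * np.exp(-d2 / (2.0 * ell**2))

def rkhs_inner(X, lX, Z, lZ, ell=0.5):
    # <f_X, f_Z>_H = sum_{i,j} <l_X(x_i), l_Z(z_j)>_I k(x_i, z_j), eq. (4)
    return sum(np.dot(li, lj) * k_spatial(xi, zj, ell)
               for xi, li in zip(X, lX) for zj, lj in zip(Z, lZ))

# two clouds on the line with scalar labels; note that no point
# correspondences are needed and the clouds have different sizes
X = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
Z = np.linspace(0.0, 1.0, 7).reshape(-1, 1)
lX = [np.array([1.0])] * len(X)
lZ = [np.array([1.0])] * len(Z)

aligned = rkhs_inner(X, lX, Z, lZ)
shifted = rkhs_inner(X, lX, Z + 3.0, lZ)
assert aligned > shifted   # alignment increases the inner product
```

The comparison at the end is exactly the objective of Problem 2: the aligned configuration scores a larger RKHS inner product than a displaced one.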
2.2 Building the Gradient Flow

In order to (at least locally) solve (2), we will construct a gradient flow: $\dot{h} = \nabla F(h)$ (this is similar to the treatment in Bloch et al. (1992), where a gradient flow is used to maximize the inner product between elements of a Lie algebra corresponding to a compact Lie group). Before we proceed, we will first determine the differential, $dF$. In order to do this, we will need the notion of an infinitesimal generator for a group action (see chapter 4 of Berndt and Klucznik 2001).

Definition 5 Suppose that a Lie group $G$ acts diffeomorphically on a smooth manifold $M$ via $\phi$; that is, $\phi : G \to \mathrm{Diff}(M)$. For a given $\xi \in \mathfrak{g} = \mathrm{Lie}(G)$, we denote by $\xi_M$ the vector field (called the infinitesimal generator) on $M$ given by the rule:

$df_x(\xi_M) := \frac{d}{dt}\Big|_{t=0} f(\phi_{\exp(t\xi)}(x)), \quad f \in C^\infty(M).$ (6)

This lets us compute the differential, $dF$.

Theorem 6 Suppose that $F(h) = \langle f_X, h.f_Z \rangle_{\mathcal{H}}$ as described above. Then

$dF_e(\xi) = \sum_{i,j} c_{ij} \, dk_{x_i}\big|_{z_j}\left( -\xi_M(z_j) \right),$ (7)

where $k_{x_i} = k(x_i, \cdot)$.

Remark 7 The notation for the differential of a function used throughout this paper is $df_x(v)$, where $x \in M$ and $v \in T_xM$:

$df_x(v) := \frac{d}{dt}\Big|_{t=0} f(c(t)), \quad c(0) = x, \ c'(0) = v.$

Proof This follows from a straightforward application of the chain rule:

$dF_e(\xi) = \frac{d}{dt}\Big|_{t=0} \langle f_X, \exp(t\xi).f_Z \rangle_{\mathcal{H}} = \sum_{i,j} c_{ij} \frac{d}{dt}\Big|_{t=0} k(x_i, \exp(-t\xi)z_j) = \sum_{i,j} c_{ij} \, dk_{x_i}\big|_{z_j}\left( -\xi_M(z_j) \right).$

This matches equation (7).

Of course, to construct the gradient flow we are interested in computing $dF_h$ instead of just $dF_e$. We will accomplish this via left-translation. Left-translation is given by the smooth map $\ell_h : G \to G$ where $x \mapsto hx$. Its differential gives rise to an isomorphism of tangent spaces, $(\ell_h)_* : \mathfrak{g} \to T_hG$.

Corollary 8 Under the identification $T_hG = (\ell_h)_*\mathfrak{g}$, we have

$dF_h\left( (\ell_h)_*\xi \right) = \sum_{i,j} c_{ij} \, dk_{x_i}\big|_{h^{-1}z_j}\left( -\xi_M(h^{-1}z_j) \right).$ (8)

In order to turn the co-vector $dF_h \in T^*_hG$ into a vector $\nabla F_h \in T_hG$, we will use a left-invariant metric. This can be accomplished by defining an inner product $\langle \cdot, \cdot \rangle_{\mathfrak{g}}$ on $\mathfrak{g}$ and lifting it to a (Riemannian) metric on $G$ via left-translation, i.e.
$\langle (\ell_h)_*\eta, (\ell_h)_*\xi \rangle_{T_hG} := \langle \eta, \xi \rangle_{\mathfrak{g}}.$

This allows us to define the gradient of $F$ via

$\langle \nabla F_h, (\ell_h)_*\xi \rangle_{T_hG} = dF_h\left( (\ell_h)_*\xi \right).$ (9)

This provides a way to obtain a (local) solution to (2) by following

$\dot{h} = \nabla F(h).$ (10)

To demonstrate the generality of this registration algorithm, the cases in Table 1 will be examined in the remainder of this paper.

M: S^1 | T^2 | S^2 | R^n
G: S^1 | T^2 | SO(3) | SE(n)

Table 1: A list of the manifolds / Lie groups to be studied in this paper. Particular interest will be placed on the $(\mathbb{R}^2, \mathrm{SE}(2))$ and $(\mathbb{R}^3, \mathrm{SE}(3))$ examples.

3. Low-dimensional Examples

While the primary focus of this paper is to align RGB-D images in $\mathbb{R}^3$, we take the opportunity to show how the above algorithm works on other spaces: specifically $S^1$, $T^2$, and $S^2$: the circle, the 2-torus, and the sphere.

3.1 The Circle

To demonstrate our algorithm, we will first work out the case where the manifold containing the clouds is the circle, $M = S^1$. $S^1$ acts naturally on itself via left-translations, so we shall take $G = S^1$ as well and $\phi$ to be left-translation, i.e. $\phi(\alpha)(\theta) = \alpha + \theta \mod 2\pi$. The Lie algebra of the circle is the real line and we will therefore take the inner product to be scalar multiplication, $\langle a, b \rangle_{\mathbb{R}} = ab$. The last piece of geometry we need is the kernel (the information space and the clouds will be determined later). To build $k$, we will use the Euclidean distance of points on the circle: for $x, y \in [0, 2\pi) \cong S^1$,

$d_2(x, y)^2 := (\cos x - \cos y)^2 + (\sin x - \sin y)^2 = 2\left( 1 - \cos(x - y) \right).$ (11)

We will use the Gaussian kernel based on this distance function:

$k(x, y) = \sigma^2 \exp\left( -\frac{d_2(x, y)^2}{2\ell^2} \right),$ (12)

where $\ell > 0$ is a parameter called the length-scale. If we again call $c_{ij} = \langle \ell_X(x_i), \ell_Z(z_j) \rangle_{\mathcal{I}}$, then using (8), the differential is (for $s \in \mathbb{R}$)

$dF_h\left( (\ell_h)_* s \right) = \frac{1}{\ell^2} \sum_{x_i \in X,\, z_j \in Z} c_{ij} k(x_i, z_j - h) \sin(z_j - h - x_i) \, s.$

Therefore, the gradient is

$\nabla F_h = \frac{1}{\ell^2} \sum_{x_i \in X,\, z_j \in Z} c_{ij} k(x_i, z_j - h) \sin(z_j - h - x_i).$ (13)

Finally, the Hessian can be found by simply differentiating the gradient:

$H = \frac{1}{\ell^2} \sum_{x_i \in X,\, z_j \in Z} c_{ij} k(x_i, z_j - h) \left[ \frac{1}{\ell^2}\sin^2(z_j - h - x_i) - \cos(z_j - h - x_i) \right].$
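The circle case is simple enough to run end to end. The sketch below (our own; all labels are set to 1 so that $c_{ij} = 1$, and the step size and iteration count are arbitrary choices) performs gradient ascent on $F$ using the gradient (13), for two uniformly spaced arcs offset by $\pi/2$:

```python
import numpy as np

sigma, ell = 1.0, 1.0
X = np.linspace(0.0, np.pi / 2, 10)      # target cloud on S^1
Z = np.linspace(np.pi / 2, np.pi, 12)    # source cloud on S^1

def k(x, y):
    # k(x, y) = sigma^2 exp(-d_2(x, y)^2 / (2 ell^2)), with d_2^2 = 2(1 - cos(x - y))
    return sigma**2 * np.exp(-(1.0 - np.cos(x - y)) / ell**2)

def grad_F(h):
    # eq. (13) with all c_ij = 1; h^{-1} z_j = z_j - h on the circle
    U = Z[None, :] - h - X[:, None]
    return np.sum(k(X[:, None], Z[None, :] - h) * np.sin(U)) / ell**2

h = 0.0
for _ in range(2000):                    # gradient ascent on S^1
    h += 1e-3 * grad_F(h)

# Z is X shifted by pi/2, so the maximizer is h* = pi/2
assert abs(h - np.pi / 2) < 0.05
```

Because both grids are uniform, the multiset of differences $z_j - x_i$ is symmetric about $\pi/2$, which forces the maximizer to sit exactly at $h^* = \pi/2$.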
(14)

Example 1 We perform the registration problem on the circle with X = linspace(0, π/2, 10) and Z = linspace(π/2, π, 12). Figure 2 shows the two point clouds before and after registration. The labels, $\ell_X$ and $\ell_Z$, are set to 1 for all points.

3.2 The Torus

The torus is the product of two circles: $T^2 = S^1 \times S^1$. Therefore, both the gradient and the Hessian will be two copies of the corresponding objects from the circle case. Similarly to the circle case, the group will act via left-translations and we will take $(x^1, x^2) \in [0, 2\pi) \times [0, 2\pi) \cong T^2$ for coordinates. The kernel will be

$k(x, y) = \sigma^2 \exp\left( -\frac{d_{T^2}(x, y)^2}{2\ell^2} \right), \quad d_{T^2}(x, y)^2 = d_2(x^1, y^1)^2 + d_2(x^2, y^2)^2.$ (15)

Figure 2: Left: The clouds $X$ and $A.Z$. Right: The clouds $X$ and $TA.Z$.

The gradient is then

$\nabla F_h = \frac{1}{\ell^2} \sum_{x_i \in X,\, z_j \in Z} c_{ij} k(x_i, z_j - h) \begin{bmatrix} \sin(z_j^1 - h^1 - x_i^1) \\ \sin(z_j^2 - h^2 - x_i^2) \end{bmatrix}.$ (16)

The Hessian is straightforward to calculate due to the fact that $T^2$ is abelian:

$H_{T^2} = \frac{1}{\ell^2} \sum_{x_i \in X,\, z_j \in Z} c_{ij} k(x_i, z_j - h) \, h_{ij},$ (17)

$h_{ij} = \mathrm{diag}\left( \frac{1}{\ell^2}\sin^2(z_j^1 - h^1 - x_i^1) - \cos(z_j^1 - h^1 - x_i^1),\ \frac{1}{\ell^2}\sin^2(z_j^2 - h^2 - x_i^2) - \cos(z_j^2 - h^2 - x_i^2) \right).$

This procedure naturally generalizes to $T^n$.

Example 2 We perform the registration problem on the torus with a 5-pointed star as shown in Figure 3. X contains 50 points while Z contains 75. The displacement is [1, 1] and the transformation found was [1.0077, 0.9962], which is an error of [0.0077, 0.0038] with norm 0.0085.

3.3 The Sphere

The other initial example we wish to develop is the case where $M = S^2$, the sphere. The Lie group we choose to act is $\mathrm{SO}(3)$, the special orthogonal group. This acts via the usual matrix multiplication: for $x \in S^2 \subset \mathbb{R}^3$,

$\phi(g)(x) = gx.$ (18)

Figure 3: Top: The clouds $X$ and $A.Z$. Bottom: The clouds $X$ and $TA.Z$.

In terms of the point cloud data, assume that we have clouds with points in $\mathbb{R}^3$. These points will be projected to $S^2$ (via normalizing), which is compatible with the group action described above.
Given two points $x, y \in \mathbb{R}^3$, their arc-length distance, once projected to $S^2$, is

$d_{S^2}(x, y) = \arccos\left( \frac{\langle x, y \rangle_3}{\|x\|_3 \|y\|_3} \right).$ (19)

With this distance (which is different from the distance used in the circle example), we define the kernel to be

$k(x, y) = \exp\left( -\frac{d_{S^2}(x, y)^2}{2\ell^2} \right).$ (20)

In order to determine the gradient, we still need a metric on $\mathfrak{so}(3)$, for which we will use the (negative of the) Killing form. This gives a gradient of $\nabla F(R) = R\hat{\omega}$, where $R \in \mathrm{SO}(3)$, $\hat{\cdot} : \mathbb{R}^3 \to \mathrm{skew}(3)$ (three-by-three skew-symmetric matrices), and

$\omega = \frac{1}{\ell^2} \sum_{i,j} c_{ij} \frac{k(x_i, R^{-1}z_j) \, d_{S^2}(x_i, R^{-1}z_j)}{\sqrt{\|x_i\|^2 \|z_j\|^2 - \langle x_i, R^{-1}z_j \rangle^2}} \left( x_i \times R^{-1}z_j \right).$ (21)

3.3.1 The Hessian

Unlike the circle and torus examples, calculating the Hessian for the sphere is substantially more involved. This follows from the fact that $\mathrm{SO}(3)$ is no longer abelian. In order to determine the Hessian, we need to know the connection on $\mathrm{SO}(3)$ induced by the Killing form on $\mathfrak{so}(3)$. Let us choose the standard basis $\{e_x, e_y, e_z\} \subset \mathfrak{so}(3)$ where

$e_x = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}, \quad e_y = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix}, \quad e_z = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$

Additionally, let $\{E_x, E_y, E_z\}$ be their corresponding left-invariant vector fields. The connection, calculated from (58), is $\nabla_{E_\alpha}E_\beta = \frac{1}{2}[E_\alpha, E_\beta]$; explicitly,

$\nabla_{E_x}E_y = \tfrac{1}{2}E_z, \quad \nabla_{E_y}E_z = \tfrac{1}{2}E_x, \quad \nabla_{E_z}E_x = \tfrac{1}{2}E_y,$

with the remaining covariant derivatives determined by antisymmetry of the bracket. Writing (21) as $\hat{\omega} = \omega_x e_x + \omega_y e_y + \omega_z e_z$, the Hessian is given by

$H(e_\alpha, e_\beta) = \Big\langle \sum_{\gamma \in \{x,y,z\}} (L_{E_\alpha}\omega_\gamma) E_\gamma + \omega_\gamma \nabla_{E_\alpha}E_\gamma,\ e_\beta \Big\rangle = L_{E_\alpha}\omega_\beta + \tfrac{1}{2}\sum_\gamma \omega_\gamma \varepsilon_{\alpha\gamma\beta},$

where $\varepsilon_{\alpha\gamma\beta} = \langle [e_\alpha, e_\gamma], e_\beta \rangle$ are the structure constants. Combining, the Hessian is

$\begin{bmatrix} L_{E_x}\omega_x & L_{E_x}\omega_y - \tfrac{1}{2}\omega_z & L_{E_x}\omega_z + \tfrac{1}{2}\omega_y \\ L_{E_y}\omega_x + \tfrac{1}{2}\omega_z & L_{E_y}\omega_y & L_{E_y}\omega_z - \tfrac{1}{2}\omega_x \\ L_{E_z}\omega_x - \tfrac{1}{2}\omega_y & L_{E_z}\omega_y + \tfrac{1}{2}\omega_x & L_{E_z}\omega_z \end{bmatrix},$ (22)

which, although tedious, can be computed exactly.

Remark 9 The reason that (22) returns the negative of the Hessian is that we are actually differentiating the inverse of the group element.

Example 3 To illustrate registration in this case, we will align two pictures of the globe. This will be done with a topographic map of the Earth using Matlab's topo.
Both $X$ and $Z$ are initialized to be identical; however, $Z$ is perturbed by a transformation $A \in \mathrm{SO}(3)$. The goal is to recover $A^{-1}$ to realign $Z$ with $X$. The algorithm reports $T \in \mathrm{SO}(3)$, which should be the inverse of $A$. The values of $A$, $T$, $TA$, and $\|TA - \mathrm{Id}\|_F$ are shown in (23):

A = [0.9386 0.3170 0.1361; 0.3230 0.9461 0.0244; 0.1210 0.0668 0.9904],
T = [0.9391 0.3217 0.1205; 0.3160 0.9466 0.0643; 0.1347 0.0223 0.9906],
TA = [1.0000 0.0014 0.0007; 0.0014 1.0000 0.0024; 0.0007 0.0024 1.0000],
$\|TA - \mathrm{Id}\|_F = 0.0040.$ (23)

Parameter | Value
transformation convergence threshold ϵ | 1e-4
gradient norm convergence threshold ϵ | 5e-4
kernel characteristic length-scale ℓ | 0.25
kernel characteristic length-scale ℓ (iteration > 3) | 0.15
kernel characteristic length-scale ℓ (iteration > 10) | 0.10
kernel characteristic length-scale ℓ (iteration > 20) | 0.05
kernel signal variance σ | 0.1
step length | 0.1
color space inner product scale | 1e-5
kernel sparsification threshold | 1e-3

Table 2: Parameters used for evaluation of the globe registration. The kernel characteristic length-scale is chosen to be adaptive as the algorithm converges; intuitively, we prefer a large neighborhood of correlation for each point, but as the algorithm approaches convergence, reducing the local correlation neighborhood allows for faster convergence and better refinement.

Figure 4: Left (Reference): The image of the contour map of the Earth. Center (Perturbed): The plot of $X$ along with $A.Z$. Right (Aligned): The plot of $X$ along with $TA.Z$.

The parameters used in this registration are included in Table 2. The initial and final images are shown in Figure 4.

4. Limitations on Compact Riemannian Manifolds

In this section, we discuss an important problem regarding the positive-definiteness of the kernel on compact Riemannian manifolds (Jayasumana et al., 2015; Feragen et al., 2015; Borovitskiy et al., 2020). The key result is the following theorem.

Theorem 10 (Theorem 5 of Feragen et al. 2015, Bekka et al.
2008) Let $X$ be a topological space and let $\Psi : X \times X \to \mathbb{R}$ be a continuous kernel such that $\Psi(x, x) = 0$ and $\Psi(x, y) = \Psi(y, x)$ for all $x, y \in X$. Then the following are equivalent:

1. $\Psi$ is conditionally of negative type, and
2. the kernel $\exp(-t\Psi)$ is of positive type for every $t \geq 0$.

However, the issue is that for $\Psi = d(x, y)^2$ to be conditionally of negative type, we essentially need to be on a Euclidean space. Fortunately, this is more than what we require in this work, as we are not concerned with whether positive-definiteness holds for all $t \geq 0$. Even though we are not on a Euclidean space, it may still happen that some $t \geq 0$ makes the kernel positive definite. The following theorem ensures the existence of such a $t$.

Theorem 11 Let $M \subset \mathbb{R}^m$ be an embedded manifold. Let $d_c$ be the chordal distance on $M$ (straight lines in $\mathbb{R}^m$) and $d_g$ the geodesic distance. For a finite collection of points $\{x_i\} \subset M$, there exists a $t \geq 0$ such that the matrix $K_{ij} = \exp\left( -t \, d_g(x_i, x_j)^2 \right)$ is positive definite.

Proof It is clear that $K_{ii} = 1$ for all $i$. As long as the diagonal elements dominate, the matrix will be positive definite. Let

$R_i := \sum_{j \neq i} \exp\left( -t \, d_g(x_i, x_j)^2 \right).$

Choose $t \geq 0$ large enough that $R_i < 1$ for all $i$. Then, by the Gershgorin circle theorem, all the eigenvalues of $K$ are positive. A bound on $t$ can be found explicitly as follows. Let $\Lambda := \min_{i \neq j} d_g(x_i, x_j)$. Then $R_i \leq (N - 1)\exp(-t\Lambda^2)$, where $N$ is the number of points. This produces

$t > \frac{\ln(N - 1)}{\Lambda^2}.$

This is not a particularly useful result, as the parameter $t$ depends on the point cloud. It therefore remains an open problem whether we can systematically find a $t$ (or an interval of them) that works for all possible point clouds. This is a particularly attractive problem in machine learning, as one ultimately wants to learn all hyperparameters. In particular, the derivation of useful bounds on $t$ as well as algorithmic implementations for learning it are interesting future research directions.
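The bound in the proof of Theorem 11 is easy to verify numerically. The following sketch (our own; the cloud size and the 1.1 safety factor are arbitrary choices) draws a random cloud on $S^2$, computes $t$ from $\Lambda$, and checks positive definiteness:

```python
import numpy as np

rng = np.random.default_rng(0)

# a random finite cloud on the unit sphere, with geodesic (arc-length) distances
N = 40
P = rng.normal(size=(N, 3))
P /= np.linalg.norm(P, axis=1, keepdims=True)
D = np.arccos(np.clip(P @ P.T, -1.0, 1.0))      # d_g(x_i, x_j)

# bound from the proof: with Lambda = min_{i != j} d_g(x_i, x_j),
# any t > ln(N - 1) / Lambda^2 makes K strictly diagonally dominant
Lam = D[~np.eye(N, dtype=bool)].min()
t = 1.1 * np.log(N - 1) / Lam**2                 # 1.1 = safety margin

K = np.exp(-t * D**2)
assert np.linalg.eigvalsh(K).min() > 0           # Gershgorin guarantees this
```

As the proof notes, $t$ scales with $1/\Lambda^2$, so densely sampled clouds force a very large $t$ (a very small length-scale); this is the practical drawback discussed above.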
Example 4 (The Sphere) Let us see how the case of the sphere holds up. Here, we take the kernel

$k(x, y) = \exp\left( -\frac{d_{S^2}(x, y)^q}{2\ell^2} \right), \quad d_{S^2}(x, y) = \arccos\left( \frac{\langle x, y \rangle_3}{\|x\|_3\|y\|_3} \right).$ (24)

In particular, $t = 1/(2\ell^2)$. For the usual case, the exponent is $q = 2$; however, this may be changed as well. The idea is that $k$ will not be positive definite for all $\ell$, but it should be when $\ell$ is small enough. This will be numerically explored by taking a random sample of $N$ points on $S^2$ and computing the matrix $K$. If all of its eigenvalues are positive, then it is positive definite. As such, we are interested in finding the smallest eigenvalue.

Figure 5: Top left: The geodesic kernel (24) fails to be positive definite as $\ell$ becomes large. As long as $\ell$ is small (which corresponds to $t$ being large), the kernel appears to be positive definite. Top right: A zoomed version. Bottom left: The positive definiteness appears not to depend too heavily on the number of points. Bottom right: All of the previous figures were with the exponent fixed at $q = 2$; this figure varies the exponent. The chordal kernel appears to be positive definite when $q \leq 2$, which is in agreement with the work of Feragen et al. (2015).

5. Special Euclidean Groups

We now move on to the more involved examples where $M = \mathbb{R}^n$ and $G = \mathrm{SE}(n)$, the special Euclidean group in $n$ dimensions. $G$ acts on $M$ in the standard fashion: for $(R, T) \in \mathrm{SE}(n)$ with $R \in \mathrm{SO}(n)$ and $T \in \mathbb{R}^n$,

$(R, T).x = Rx + T, \quad x \in \mathbb{R}^n.$

We will also choose the squared exponential kernel for $k : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$:

$k(x, y) = \sigma^2 \exp\left( -\frac{\|x - y\|_n^2}{2\ell^2} \right)$ (25)

for some fixed real parameters $\sigma$ and $\ell$, where $\|\cdot\|_n$ is the standard Euclidean norm on $\mathbb{R}^n$. In order to determine the gradient flow (10), we need to compute the infinitesimal generators of the action $\mathrm{SE}(n) \curvearrowright \mathbb{R}^n$ as well as decide on a left-invariant metric for $\mathfrak{se}(n) = \mathrm{Lie}(\mathrm{SE}(n))$.

5.1 Infinitesimal Generator

For a fixed $\xi \in \mathfrak{se}(n)$, it has the form $\xi = (\omega, v)$ where $\omega \in \mathfrak{so}(n)$ and $v \in \mathbb{R}^n$.
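The numerical exploration described in Example 4 can be reproduced in a few lines. The sketch below (our own; $N$ and the length-scales are arbitrary choices) compares the chordal and geodesic Gaussian kernels on a random spherical cloud. The chordal kernel is the restriction of a Euclidean Gaussian to the sphere and is therefore positive definite for every length-scale, while the geodesic kernel carries no such guarantee:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 25
P = rng.normal(size=(N, 3))
P /= np.linalg.norm(P, axis=1, keepdims=True)

G = np.arccos(np.clip(P @ P.T, -1.0, 1.0))                   # geodesic distance
C = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)   # chordal distance

for ell in (0.3, 0.6, 1.0):
    Kc = np.exp(-C**2 / (2 * ell**2))
    Kg = np.exp(-G**2 / (2 * ell**2))
    # chordal case: restriction of the Euclidean squared-exponential kernel,
    # hence positive definite for every ell
    assert np.linalg.eigvalsh(Kc).min() > 0
    # geodesic case: no guarantee; inspect the smallest eigenvalue
    print("ell =", ell, " min eig (geodesic):", np.linalg.eigvalsh(Kg).min())
```

This mirrors Figure 5: the smallest eigenvalue of the geodesic kernel matrix is the quantity plotted there as a function of $\ell$.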
Because the infinitesimal generator map $\mathfrak{g} \to \mathfrak{X}(M)$ is a Lie algebra homomorphism (where $\mathfrak{X}(M)$ is the space of all vector fields over $M$; see §27 of Tu 2017), we see that $\xi_M = \omega_M + v_M$. A straightforward computation leads to

$\xi_{\mathbb{R}^n}(x) = \hat{\omega}x + v, \quad \hat{\omega} \in \mathrm{skew}(n), \ v \in \mathbb{R}^n.$ (26)

We need to choose a metric on $\mathfrak{se}(n)$ to turn $dF_h$ into $\nabla F_h$. For this example, we will take a multiple of the Killing form on $\mathfrak{so}(n)$ and the Euclidean norm on $\mathbb{R}^n$. That is,

$\langle (\omega, v), (\eta, u) \rangle_{\mathfrak{se}(n)} = b^2 \langle v, u \rangle_n + a^2 \frac{2 - n}{2}\mathrm{Tr}(\omega\eta),$ (27)

where $\langle \cdot, \cdot \rangle_n$ is the standard Euclidean inner product on $\mathbb{R}^n$ (see Park and Brockett (1994) for a discussion in three dimensions), and $a$ and $b$ are tuning parameters. The reason for the $(2 - n)/2$ term is that with this normalization (with $a = 1$) the skew matrices $E_{ij} - E_{ji}$ are orthonormal. Here $E_{ij}$ denotes the matrix with all zeros except for a 1 in the $(i, j)$-coordinate.

5.3 Calculating the Gradient

Before we find the gradient, let us first determine its differential (at the identity for simplicity):

$dF_e(\xi) = \sum_{i,j} c_{ij} \, dk_{x_i}\big|_{z_j}\left( -\xi_M(z_j) \right) = \frac{1}{\ell^2}\sum_{i,j} c_{ij} k(x_i, z_j) \left\langle x_i - z_j,\ -(\hat{\omega}z_j + v) \right\rangle_n.$

To turn $dF_e$ into $\nabla F_e$, we will compute $\nabla_\omega F_e$ and $\nabla_v F_e$ separately:

$a^2\frac{2-n}{2}\mathrm{Tr}\left[ (\nabla_\omega F_e)\hat{\omega} \right] = \frac{1}{\ell^2}\sum_{i,j} c_{ij} k(x_i, z_j) \left\langle x_i - z_j,\ -\hat{\omega}z_j \right\rangle_n,$
$b^2\left\langle \nabla_v F_e, v \right\rangle_n = \frac{1}{\ell^2}\sum_{i,j} c_{ij} k(x_i, z_j) \left\langle x_i - z_j,\ -v \right\rangle_n.$

To solve for this in coordinates, we will let $\{e_m\}_{m=1}^n$ be the standard orthonormal basis for $(\mathbb{R}^n, \langle \cdot, \cdot \rangle_n)$ and $\{J_{pq}\}_{p<q}$ ... $t > 0$ that maximizes the quartic (30).

Figure 7: Left: The clouds $X$ and $A.Z$. Right: The clouds $X$ and $TA.Z$.

7.3 Special Case: n = 2

When we take the special case of $\mathrm{SE}(2)$, the gradient (28) takes the special form:

$\omega = \frac{1}{a^2\ell^2}\sum_{i,j} c_{ij} k(x_i, h^{-1}z_j)\left( x_i \times (h^{-1}z_j) \right),$
$v = \frac{1}{b^2\ell^2}\sum_{i,j} c_{ij} k(x_i, h^{-1}z_j)\left( h^{-1}z_j - x_i \right),$ (31)

where $\times$ is the two-dimensional cross product, $x \times y = x_1y_2 - x_2y_1$. Finally, the exponential map for $\mathrm{SE}(2)$ can be solved exactly:

$R = \begin{bmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{bmatrix}, \quad T = \frac{1}{\omega}\begin{bmatrix} \sin\omega & -(1 - \cos\omega) \\ 1 - \cos\omega & \sin\omega \end{bmatrix} v.$ (32)

Example 5 We perform a bird's-eye view map registration.
For the purposes of this example, we choose $X$ to be a contour plot of Matlab's peaks(100) while $Z$ is of peaks(120). Specifically, $X$ is made up of 40 contour lines while $Z$ is made up of 43. For both images, the coordinates are constrained as $x, y \in [-3, 3]$. The two images are shown in Figure 6. As these images are initially aligned, we first perturb them and attempt to realign them. The initial perturbation is given by $A \in \mathrm{SE}(2)$ while the algorithm determines $T \in \mathrm{SE}(2)$. In this sense, we want $TA = \mathrm{Id}$. The values of $A$, $T$, $TA$, and $\|TA - \mathrm{Id}\|_F$ are shown in (33):

A = [0.9654 0.2607 0.7250; 0.2607 0.9654 0.6074; 0 0 1.0000],
T = [0.9659 0.2591 0.8698; 0.2591 0.9659 0.4049; 0 0 1.0000],
TA = [1.0000 0.0017 0.0122; 0.0017 1.0000 0.0060; 0 0 1.0000],
$\|TA - \mathrm{Id}\|_F = 0.0138.$ (33)

Parameter | Value
transformation convergence threshold ϵ | 1e-4
gradient norm convergence threshold ϵ | 5e-4
kernel characteristic length-scale ℓ | 0.25
kernel characteristic length-scale ℓ (iteration > 3) | 0.15
kernel characteristic length-scale ℓ (iteration > 10) | 0.10
kernel characteristic length-scale ℓ (iteration > 20) | 0.05
kernel signal variance σ | 1
minimum step length | 0.2
color space inner product scale | 1e-5
kernel sparsification threshold | 1e-3

Table 3: Parameters used for evaluation of the bird's-eye map registration. The kernel characteristic length-scale is chosen to be adaptive as the algorithm converges; intuitively, we prefer a large neighborhood of correlation for each point, but as the algorithm approaches convergence, reducing the local correlation neighborhood allows for faster convergence and better refinement.

The parameters used in this registration are included in Table 3. The initial and final images are in Figure 7. For each cloud, the color information is generated based on the level set numbers.
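The SE(2) pieces used in this example, the gradient (31) and the closed-form exponential (32), can be sketched as follows (our own reconstruction; the function names, vectorization, and sign conventions are ours and should be checked against the released implementation):

```python
import numpy as np

def se2_exp(omega, v):
    """Lie-group exponential of (omega, v) in se(2), following eq. (32)."""
    c, s = np.cos(omega), np.sin(omega)
    R = np.array([[c, -s], [s, c]])
    if abs(omega) < 1e-12:                  # limit: exp of a pure translation
        return R, np.asarray(v, float)
    V = np.array([[s, -(1.0 - c)], [1.0 - c, s]]) / omega
    return R, V @ np.asarray(v, float)

def se2_gradient(X, Z, cij, h_R, h_T, ell=1.0, a=1.0, b=1.0):
    """Gradient (omega, v) of eq. (31); cross(x, y) = x1*y2 - x2*y1."""
    Y = (Z - h_T) @ h_R                      # h^{-1} z_j = R^T (z_j - T), row-wise
    K = np.exp(-np.sum((X[:, None, :] - Y[None, :, :])**2, axis=-1) / (2 * ell**2))
    W = cij * K
    cross = X[:, None, 0] * Y[None, :, 1] - X[:, None, 1] * Y[None, :, 0]
    omega = np.sum(W * cross) / (a**2 * ell**2)
    v = np.einsum('ij,ijk->k', W, Y[None, :, :] - X[:, None, :]) / (b**2 * ell**2)
    return omega, v

# sanity checks on the exponential
R, T = se2_exp(0.0, [1.0, 2.0])
assert np.allclose(R, np.eye(2)) and np.allclose(T, [1.0, 2.0])
R, T = se2_exp(np.pi / 2, [1.0, 0.0])
assert np.allclose(T, [2 / np.pi, 2 / np.pi])

# identical clouds at the identity: the gradient should vanish
Xc = np.array([[1.0, 0.0], [0.0, 1.0]])
w, vv = se2_gradient(Xc, Xc.copy(), 1.0, np.eye(2), np.zeros(2))
assert abs(w) < 1e-12 and np.allclose(vv, 0.0)
```

The vanishing gradient for identical clouds at the identity is the expected first-order optimality condition of Problem 2.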
7.4 Special Case: n = 3

When we restrict attention to $\mathrm{SE}(3)$, the gradient (28) takes a special form:

$\omega = \frac{1}{a^2\ell^2}\sum_{i,j} c_{ij} k(x_i, h^{-1}z_j)\left( x_i \times (h^{-1}z_j) \right),$
$v = \frac{1}{b^2\ell^2}\sum_{i,j} c_{ij} k(x_i, h^{-1}z_j)\left( h^{-1}z_j - x_i \right).$ (34)

Additionally, an explicit formula for the exponential map $\exp : \mathfrak{se}(3) \to \mathrm{SE}(3)$ exists (see Ivancevic and Ivancevic (2011) and Rohan (2013) for example). This gives an exact way to solve (29):

$R = I + \frac{\sin t\theta}{\theta}\hat{\omega} + \frac{1 - \cos t\theta}{\theta^2}\hat{\omega}^2, \quad T = \left( tI + \frac{1 - \cos t\theta}{\theta^2}\hat{\omega} + \frac{t\theta - \sin t\theta}{\theta^3}\hat{\omega}^2 \right)v,$ (35)

where $\theta = \|\omega\|_3$ with $\omega \in \mathbb{R}^3$ and $t$ is taken to maximize $G(t)$ in equation (30).

8. Riemannian Geometry of the Special Euclidean Groups

This section deals with the geometry of $\mathrm{SE}(2)$ and $\mathrm{SE}(3)$. Specifically, the Riemannian exponential corresponding to the metric (27) (and the analogous version for $\mathrm{SE}(2)$) as well as the Hessians will be computed. Additionally, there will be a discussion on using Newton's method as an alternate update rule. We begin with the simpler case of $\mathrm{SE}(2)$. We will compute the exponential first and the Hessian second.

8.1.1 The Riemann Exponential

To compute the Riemann exponential, we will use the Euler-Poincaré equations as described by Theorem 38. The Lagrangian will be $L : \mathfrak{se}(2) \to \mathbb{R}$ where

$L(\omega; v_1, v_2) = \frac{1}{2}a^2\omega^2 + \frac{1}{2}b^2\left( v_1^2 + v_2^2 \right).$

The Euler-Poincaré equations are then

$\dot{\omega} = 0, \quad \dot{v}_1 = \omega v_2, \quad \dot{v}_2 = -\omega v_1.$

These can be integrated to get a path $(\omega(t); v_1(t), v_2(t)) \in \mathfrak{se}(2)$. However, this only defines a path in the Lie algebra and it needs to be lifted to the group. Doing so results in the following system of differential equations:

$\dot{R}(t) = R(t)\hat{\omega}(t), \quad \dot{T}(t) = R(t)\begin{bmatrix} v_1(t) \\ v_2(t) \end{bmatrix}.$

The rotation part gives the same answer as the Lie exponential, while cancellation occurs in the translation part:

$\dot{T} = R(t)\begin{bmatrix} v_1(t) \\ v_2(t) \end{bmatrix} = R(t)R(t)^{-1}\begin{bmatrix} v_1(0) \\ v_2(0) \end{bmatrix} = \begin{bmatrix} v_1(0) \\ v_2(0) \end{bmatrix}.$

Figure 8: Left: The images of the two point clouds. The blue stars represent $X$ while the red are $Z$. Center: The vector field corresponding to the eigenvector of the Hessian with the weakest eigenvalue.
Right: The vector field corresponding to the eigenvector of the Hessian with the strongest eigenvalue.

This provides us with the Riemann exponential (at the identity), Exp₀ : se(2) → SE(2),
\[
\begin{pmatrix} 0 & -\omega & v_1 \\ \omega & 0 & v_2 \\ 0 & 0 & 0 \end{pmatrix} \longmapsto
\begin{pmatrix} \cos\omega & -\sin\omega & v_1 \\ \sin\omega & \cos\omega & v_2 \\ 0 & 0 & 1 \end{pmatrix}.
\]

Remark 16 It is interesting to compare the Riemann and Lie exponentials for SE(2). When we restrict to SO(2), both agree. This follows from the fact that SO(2) is a compact Lie group and the metric restricted to this subgroup is bi-invariant. The second interesting phenomenon is that the translation part is the identity for the Riemann case while the Lie case is more involved. The reason for this is that the metric has no cross terms intertwining rotation with translation, i.e. the Riemann exponential is the Lie exponential for the group SO(2) × R² as opposed to SE(2) = SO(2) ⋉ R².

8.1.2 The Hessian

To calculate the Hessian for the SE(2) case, we first choose an orthogonal basis {e_z, e_1, e_2} ⊂ se(2) where
\[
e_z = \begin{pmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad
e_1 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad
e_2 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.
\]
Additionally, we will take {E_z, E_1, E_2} to be the corresponding left-invariant vector fields. The connection is given by
\[
\nabla_{E_z} E_1 = E_2, \qquad \nabla_{E_z} E_2 = -E_1,
\]
while all other combinations are zero. The gradient in these coordinates is ω e_z + v_1 e_1 + v_2 e_2. The Hessian in these coordinates is
\[
\begin{pmatrix}
a^2 L_{E_z}\omega & b^2\left(L_{E_z}v_1 + v_2\right) & b^2\left(L_{E_z}v_2 - v_1\right) \\
a^2 L_{E_1}\omega & b^2 L_{E_1}v_1 & b^2 L_{E_1}v_2 \\
a^2 L_{E_2}\omega & b^2 L_{E_2}v_1 & b^2 L_{E_2}v_2
\end{pmatrix}.
\]
For the sake of simplicity, let the Hessian be
\[
H = \sum_{ij} c_{ij}\, k(x_i, z_j)\, S_{ij},
\]
where, writing $w_{ij} = x_i^1 z_j^2 - x_i^2 z_j^1$ for the planar cross product,
\[
S_{ij} = \begin{pmatrix}
\frac{1}{\ell^2}w_{ij}^2 - \langle x_i, z_j\rangle &
 x_i^2 + \frac{1}{\ell^2}(z_j^1 - x_i^1)\,w_{ij} &
 -x_i^1 + \frac{1}{\ell^2}(z_j^2 - x_i^2)\,w_{ij} \\
x_i^2 + \frac{1}{\ell^2}(z_j^1 - x_i^1)\,w_{ij} &
 \frac{1}{\ell^2}(z_j^1 - x_i^1)^2 - 1 &
 \frac{1}{\ell^2}(x_i^1 - z_j^1)(x_i^2 - z_j^2) \\
-x_i^1 + \frac{1}{\ell^2}(z_j^2 - x_i^2)\,w_{ij} &
 \frac{1}{\ell^2}(x_i^1 - z_j^1)(x_i^2 - z_j^2) &
 \frac{1}{\ell^2}(z_j^2 - x_i^2)^2 - 1
\end{pmatrix}.
\]

Example 6 (Hessian Eigenvalues) As an initial thought experiment, what happens if each cloud contains only a single point? As we are interested in the registration problem, let us assume that each of these points is at the same location x ∈ R².
Then, the Hessian is
\[
H = \begin{pmatrix} -\|x\|^2 & x_2 & -x_1 \\ x_2 & -1 & 0 \\ -x_1 & 0 & -1 \end{pmatrix}.
\]
It is interesting to note that this matrix is singular and its kernel is the span of [1, x_2, -x_1]^T. This Lie algebra element corresponds to a vector field on R² (the infinitesimal generator, see Definition 5). The corresponding vector field is
\[
\xi(v) = -(v_2 - x_2)\frac{\partial}{\partial v_1} + (v_1 - x_1)\frac{\partial}{\partial v_2},
\]
which is merely rotation about the point x. This rotation does not move either cloud, and the degeneracy of H predicts this! In the more general context, the eigenvalues/vectors of H carry much information about the structure and symmetry of the problem. At the solution, the Hessian is negative-definite, and so the eigenvector corresponding to the largest eigenvalue (smallest in absolute value) reports the direction of greatest symmetry. That is, suppose that Hξ = λξ for ξ ∈ g and |λ| is small. Then moving Z along the vector field ξ_M is not noticeable under F. To see this, we consider the case where X and Z are images of a smiley face as in Figure 8. The Hessian for this example is
\[
H = \begin{pmatrix} -422.1054 & 55.3179 & 75.8545 \\ 55.3179 & -16.1852 & 0.8360 \\ 75.8545 & 0.8360 & -23.3209 \end{pmatrix}.
\]
The weakest eigenvalue is −0.1907 with a corresponding eigenvector of ξ = [0.2140, 0.7048, 0.6764]^T. The vector field for this eigenvector is shown in Figure 8. The impressive aspect of this is that it is possible to learn symmetries in images!

This section will closely mimic the previous one; however, it will be more involved.

8.2.1 The Riemann Exponential

The Euler-Poincaré equations for SE(3) are
\[
\dot\omega = 0, \qquad \dot v = -\omega \times v.
\]
Integrating these in conjunction with the reconstruction equations produces the Lie exponential for the rotation part and the identity for the translations, the same as in the SE(2) case above. That is,
\[
\mathrm{Exp}_0(R, T) = (\exp(R), T).
\]
Remark 16 holds in this case as well. Moreover, this holds for the higher-dimensional SE(n) as well.
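The translational cancellation behind Exp₀(R, T) = (exp(R), T) can be checked numerically in the SE(2) case: integrating the lifted system with v(t) obeying the Euler-Poincaré equations, T(1) recovers v(0). A small numerical sketch of ours (not the paper's code):

```python
import numpy as np

def rot(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

def riemann_exp_se2(omega, v0, steps=1000):
    """Integrate R' = R (omega J), v' = -omega J v, T' = R v from the identity.
    The claim is that the translational part collapses: T(1) = v(0)."""
    dt = 1.0 / steps
    R, T, v = np.eye(2), np.zeros(2), np.array(v0, dtype=float)
    for _ in range(steps):
        T = T + dt * (R @ v)       # T' = R v
        R = R @ rot(omega * dt)    # exact step for R' = R (omega J)
        v = rot(-omega * dt) @ v   # exact step for v' = -omega J v
    return R, T

R1, T1 = riemann_exp_se2(1.3, [0.4, -0.7])
```

Along the flow, R(t)v(t) stays constant at v(0), so the translation integrates to v(0) while the rotation agrees with the Lie exponential.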
8.2.2 The Hessian

To compute the Hessian, we choose the following orthogonal coordinates for se(3):
\[
e_x = \begin{pmatrix} 0&0&0&0 \\ 0&0&-1&0 \\ 0&1&0&0 \\ 0&0&0&0 \end{pmatrix},\quad
e_y = \begin{pmatrix} 0&0&1&0 \\ 0&0&0&0 \\ -1&0&0&0 \\ 0&0&0&0 \end{pmatrix},\quad
e_z = \begin{pmatrix} 0&-1&0&0 \\ 1&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{pmatrix},
\]
\[
e_1 = \begin{pmatrix} 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{pmatrix},\quad
e_2 = \begin{pmatrix} 0&0&0&0 \\ 0&0&0&1 \\ 0&0&0&0 \\ 0&0&0&0 \end{pmatrix},\quad
e_3 = \begin{pmatrix} 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&1 \\ 0&0&0&0 \end{pmatrix}.
\]
Likewise, let {E_x, E_y, E_z, E_1, E_2, E_3} be their corresponding left-invariant vector fields. With this, the connection is given by
\[
\begin{aligned}
&\nabla_{E_x} E_y = \tfrac{1}{2}E_z, \quad \nabla_{E_x} E_z = -\tfrac{1}{2}E_y, \quad \nabla_{E_y} E_z = \tfrac{1}{2}E_x, \quad \nabla_{E_y} E_x = -\tfrac{1}{2}E_z, \quad \nabla_{E_z} E_x = \tfrac{1}{2}E_y, \quad \nabla_{E_z} E_y = -\tfrac{1}{2}E_x, \\
&\nabla_{E_x} E_2 = E_3, \quad \nabla_{E_x} E_3 = -E_2, \quad \nabla_{E_y} E_3 = E_1, \quad \nabla_{E_y} E_1 = -E_3, \quad \nabla_{E_z} E_1 = E_2, \quad \nabla_{E_z} E_2 = -E_1,
\end{aligned} \tag{37}
\]
while all other combinations are zero.

Remark 17 Notice that the coefficients for the connection in (37) depend on neither a nor b! This is to be expected because the Levi-Civita connection needs to be torsion free. This means that ∇_X Y − ∇_Y X = [X, Y]. Since the coordinates we are taking are the left-invariant vector fields, their bracket is just the bracket of the Lie algebra, which does not depend on a or b.

We are now ready to write down the Hessian matrix (we will assume that a = b = 1):
\[
H = \begin{pmatrix}
L_{E_x}\omega_x & L_{E_x}\omega_y + \tfrac{1}{2}\omega_z & L_{E_x}\omega_z - \tfrac{1}{2}\omega_y & L_{E_x}v_1 & L_{E_x}v_2 + v_3 & L_{E_x}v_3 - v_2 \\
L_{E_y}\omega_x - \tfrac{1}{2}\omega_z & L_{E_y}\omega_y & L_{E_y}\omega_z + \tfrac{1}{2}\omega_x & L_{E_y}v_1 - v_3 & L_{E_y}v_2 & L_{E_y}v_3 + v_1 \\
L_{E_z}\omega_x + \tfrac{1}{2}\omega_y & L_{E_z}\omega_y - \tfrac{1}{2}\omega_x & L_{E_z}\omega_z & L_{E_z}v_1 + v_2 & L_{E_z}v_2 - v_1 & L_{E_z}v_3 \\
L_{E_1}\omega_x & L_{E_1}\omega_y & L_{E_1}\omega_z & L_{E_1}v_1 & L_{E_1}v_2 & L_{E_1}v_3 \\
L_{E_2}\omega_x & L_{E_2}\omega_y & L_{E_2}\omega_z & L_{E_2}v_1 & L_{E_2}v_2 & L_{E_2}v_3 \\
L_{E_3}\omega_x & L_{E_3}\omega_y & L_{E_3}\omega_z & L_{E_3}v_1 & L_{E_3}v_2 & L_{E_3}v_3
\end{pmatrix}. \tag{38}
\]
Notice that the first 3-by-3 block, A, is identical to (22). This reflects the copy of SO(3) lying inside SE(3). For the sake of brevity, we will use the following abbreviation:
\[
H = \sum_{ij} c_{ij}\, k(x_i, z_j) \begin{pmatrix} A_{ij} & B_{ij} \\ C_{ij} & D_{ij} \end{pmatrix}.
\]
Notice that by symmetry of H, B_{ij}^T = C_{ij}. Therefore, we will only compute A_{ij}, C_{ij}, and D_{ij}. To simplify expressions, we will call v = [v_1, v_2, v_3]^T := x_i × z_j and u = [u_1, u_2, u_3]^T := z_j − x_i.
\[
A_{ij} = \begin{pmatrix}
\frac{1}{\ell^2}v_1^2 - (x_i^2 z_j^2 + x_i^3 z_j^3) & \frac{1}{\ell^2}v_1 v_2 + \frac{1}{2}(x_i^1 z_j^2 + x_i^2 z_j^1) & \frac{1}{\ell^2}v_1 v_3 + \frac{1}{2}(x_i^1 z_j^3 + x_i^3 z_j^1) \\
\frac{1}{\ell^2}v_1 v_2 + \frac{1}{2}(x_i^1 z_j^2 + x_i^2 z_j^1) & \frac{1}{\ell^2}v_2^2 - (x_i^1 z_j^1 + x_i^3 z_j^3) & \frac{1}{\ell^2}v_2 v_3 + \frac{1}{2}(x_i^2 z_j^3 + x_i^3 z_j^2) \\
\frac{1}{\ell^2}v_1 v_3 + \frac{1}{2}(x_i^1 z_j^3 + x_i^3 z_j^1) & \frac{1}{\ell^2}v_2 v_3 + \frac{1}{2}(x_i^2 z_j^3 + x_i^3 z_j^2) & \frac{1}{\ell^2}v_3^2 - (x_i^1 z_j^1 + x_i^2 z_j^2)
\end{pmatrix},
\]
\[
C_{ij} = \begin{pmatrix}
\frac{1}{\ell^2}u_1 v_1 & -x_i^3 + \frac{1}{\ell^2}u_1 v_2 & x_i^2 + \frac{1}{\ell^2}u_1 v_3 \\
x_i^3 + \frac{1}{\ell^2}u_2 v_1 & \frac{1}{\ell^2}u_2 v_2 & -x_i^1 + \frac{1}{\ell^2}u_2 v_3 \\
-x_i^2 + \frac{1}{\ell^2}u_3 v_1 & x_i^1 + \frac{1}{\ell^2}u_3 v_2 & \frac{1}{\ell^2}u_3 v_3
\end{pmatrix},
\qquad
D_{ij} = \begin{pmatrix}
\frac{1}{\ell^2}u_1^2 - 1 & \frac{1}{\ell^2}u_1 u_2 & \frac{1}{\ell^2}u_1 u_3 \\
\frac{1}{\ell^2}u_1 u_2 & \frac{1}{\ell^2}u_2^2 - 1 & \frac{1}{\ell^2}u_2 u_3 \\
\frac{1}{\ell^2}u_1 u_3 & \frac{1}{\ell^2}u_2 u_3 & \frac{1}{\ell^2}u_3^2 - 1
\end{pmatrix}.
\]

Example 7 (Hessian Eigenvalues, Cont.) The same analysis as presented in Example 6 can be carried out for SE(3) as well. Again, we will use the Hessian to identify symmetries within the point clouds. Specifically, we consider the case where both X and Z are spheres living in R³. The spheres can be aligned via translations, but rotations do nothing. This should manifest in the Hessian: the block D in (38) should dominate the Hessian while A should be the smallest. In what follows, X contains 1200 points on S² ⊂ R³ and Z contains 1500 points (both of these clouds are already on S², so the registration problem is already solved), and all c_ij = 3. The remaining parameters are ℓ = 0.25 and σ = 10⁻⁴. The Hessian is
\[
H = \begin{pmatrix}
1.3995 & 4.0561 & 1.4875 & 2.1207 & 0.9970 & 1.2481 \\
4.0561 & -6.3054 & 0.3125 & 0.7368 & 1.8921 & 0.8737 \\
1.4875 & 0.3125 & 0.0554 & 1.5803 & 0.3814 & 0.2286 \\
2.1207 & 0.7368 & 1.5803 & -594.3227 & 1.9310 & 23.2125 \\
0.9970 & 1.8921 & 0.3814 & 1.9310 & -603.8121 & 13.3898 \\
1.2481 & 0.8737 & 0.2286 & 23.2125 & 13.3898 & -561.8939
\end{pmatrix}. \tag{39}
\]
It can be seen that the translation component is on the order of 100 times larger than the rotation component. More formally, we can examine the eigenvalues/vectors of H. Table 4 shows that the eigenvalues corresponding to rotations are around 100 times smaller in magnitude than the eigenvalues corresponding to translations.
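The translational block D_{ij} = uuᵀ/ℓ² − I can be sanity-checked against finite differences of the squared exponential kernel with respect to a translation: the second derivative equals k/ℓ² times that block. A numerical sketch of our own, with arbitrary test points:

```python
import numpy as np

ell, sigma = 0.6, 1.0
x = np.array([0.3, -0.2, 0.5])
z = np.array([-0.1, 0.4, 0.2])

def k(t):
    """Squared exponential kernel with z translated by t."""
    d = x - (z + t)
    return sigma**2 * np.exp(-np.dot(d, d) / (2 * ell**2))

# central-difference Hessian of k with respect to the translation t at t = 0
h = 1e-4
H = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        def kpm(si, sj):
            t = np.zeros(3)
            t[i] += si * h
            t[j] += sj * h
            return k(t)
        H[i, j] = (kpm(1, 1) - kpm(1, -1) - kpm(-1, 1) + kpm(-1, -1)) / (4 * h * h)

u = z - x
D = np.outer(u, u) / ell**2 - np.eye(3)   # block D_ij from the text
```

The numerical Hessian matches k(0)/ℓ² times D to finite-difference accuracy, confirming the structure (and sign) of the block.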
As a result, the Hessian can report on the regularity of the solution and identify infinitesimal symmetries in the pictures.

9. Experimental Results: RGB-D Visual Odometry

The RGB-D visual odometry problem is illustrated in Figure 9. In the context of visual odometry, we call our method Continuous Visual Odometry (CVO), which has appeared in our earlier work (Ghaffari et al., 2019; Lin et al., 2019). We compare CVO with the state-of-the-art direct (and dense) RGB-D visual odometry (DVO) (Kerl et al., 2013a,b). Since the original DVO source code requires an outdated ROS dependency (Kerl, 2013), we reproduced the DVO results using the version provided by Matthieu Pizenberg (Pizenberg, 2019), which only removes the ROS dependency while keeping the DVO core source code unchanged. To improve the computational efficiency, we adopted an approach similar to Direct Sparse Visual Odometry (DSO) by Engel et al. (2014) to create a semi-dense point cloud (around

Eigenvalue   -613.9730  -598.4636  -547.6175   -8.1304   -0.2898    3.5950
Eigenvector    0.0000     0.0022     0.0021     0.5827    0.6858    0.4360
               0.0032     0.0016     0.0015     0.6897    0.7011    0.1810
               0.0033     0.0027     0.0010     0.4298    0.1952    0.8815
               0.4029     0.9089     0.1073     0.0030    0.0023    0.0025
               0.2685     0.2295     0.9355     0.0032    0.0006    0.0000
               0.8750     0.3481     0.3365     0.0019    0.0027    0.0015

Table 4: The eigenvalues/eigenvectors of H in (39); each column lists an eigenvalue and the entries of its eigenvector.

Figure 9: Visual odometry using depth cameras is the problem of finding a rigid body transformation between two colored point clouds.

Parameters                                  Symbol   Value
Transformation convergence threshold        ϵ1       1e-5
Gradient norm convergence threshold         ϵ2       5e-5
Minimum step length                                  0.2
Kernel sparsification threshold                      8.315e-3
Spatial kernel initial length-scale         ℓ_init   0.1
Spatial kernel signal variance              σ        0.1
Color kernel length-scale                   ℓ_c      0.1
Color kernel signal variance                σ_c      1

Table 5: Parameters used for evaluation using the TUM RGB-D Benchmark; similar values are chosen for all experiments.
The kernel characteristic length-scale is chosen to be adaptive as the algorithm converges (Ghaffari et al., 2019); intuitively, we prefer a large neighborhood of correlation for each point, but as the algorithm approaches convergence, reducing the local correlation neighborhood allows for faster convergence and better refinement.

3000 points) for each scan. To prevent too few points being selected in environments that lack rich visual information, we also used a Canny edge detector (Canny, 1987) from OpenCV (Bradski, 2000). When the number of points selected by the DSO point selector is less than one-third of the desired number of points, more points are selected by downsampling the pixels highlighted by the Canny detector. While generating the point cloud, RGB values are first transformed into the HSV colormap and normalized. The normalized HSV values are then combined with the normalized intensity gradients and used as the labels of the selected points in the color space. For all experiments, we used the same set of parameters, which are listed in Table 5. All experiments are performed on a Dell XPS15 9750 laptop with an Intel i7-8750H CPU (6 cores at 2.20 GHz each) and 32 GB RAM. The source code is implemented in C++

Sequence      Training                       Validation
              CVO            DVO             CVO            DVO
              Trans.  Rot.   Trans.  Rot.    Trans.  Rot.   Trans.  Rot.
fr1/desk      0.0486  2.4860 0.0387  2.3589  0.0401  2.0148 0.0371  2.0645
fr1/desk2     0.0535  3.0383 0.0583  3.6529  0.0225  1.7691 0.0208  1.7416
fr1/room      0.0560  2.4566 0.0518  2.8686  0.0446  3.9183 0.2699  7.4144
fr1/360       0.0991  3.0025 0.1602  4.4407  0.1420  3.0746 0.2811  7.0876
fr1/teddy     0.0671  4.8089 0.0948  2.5495  n/a     n/a    n/a     n/a
fr1/floor     0.0825  2.3745 0.0635  2.2805  n/a     n/a    n/a     n/a
fr1/xyz       0.0240  1.1703 0.0327  1.8751  0.0154  1.3872 0.0453  3.0061
fr1/rpy       0.0457  3.3073 0.0336  2.6701  0.1138  3.6423 0.3607  7.9991
fr1/plant     0.0316  1.9973 0.0272  1.5523  0.0630  4.9185 0.0660  2.5865
Average       0.0561  2.7380 0.0623  2.6943  0.0631  2.9607 0.1544  4.5571

Table 6: The RMSE of the Relative Pose Error (RPE) for the fr1 sequences. The Trans. columns show the RMSE of the translational drift in m/s and the Rot. columns show the RMSE of the rotational error in deg/s. There are no corresponding validation data sets for fr1/teddy and fr1/floor.

and compiled with the Intel Compiler. The kernel computations are parallelized using Intel Threading Building Blocks (TBB) (Intel Corporation, 2019). Using compiler auto-vectorization and the parallelization, the average time for frame-to-frame registration is 0.5 sec. Software is available for download at https://github.com/MaaniGhaffari/cvo-rgbd.

9.1 TUM RGB-D Benchmark

We performed experiments on two parts of the RGB-D SLAM data set and benchmark by the Technical University of Munich (Sturm et al., 2012). This data set was collected indoors with a Microsoft Kinect using a motion capture system as a proxy for ground truth trajectory. For all tracking experiments, the entire image sequence was used without any skipping, i.e., at full frame rate. We evaluated CVO and DVO on the training and validation sets for all the fr1 sequences and the structure versus texture sequences. The RGB-D benchmark tools (Sturm et al., 2012) were then used to evaluate the Relative Pose Error (RPE) of both methods. Table 6 shows the Root-Mean-Squared Error (RMSE) of the RPE for the fr1 sequences.
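For reference, the RPE statistic reported here follows the TUM benchmark definition: the discrepancy between the estimated and ground-truth relative motions between frames. A minimal sketch of the translational RPE RMSE (ours, per frame rather than per second, on hypothetical poses):

```python
import numpy as np

def trans(t):
    M = np.eye(4)
    M[:3, 3] = t
    return M

def rpe_rmse(gt, est):
    """RMSE of the translational relative pose error over consecutive frames.
    gt, est: lists of 4x4 homogeneous poses (hypothetical data below)."""
    errs = []
    for i in range(len(gt) - 1):
        dg = np.linalg.inv(gt[i]) @ gt[i + 1]    # ground-truth relative motion
        de = np.linalg.inv(est[i]) @ est[i + 1]  # estimated relative motion
        E = np.linalg.inv(dg) @ de               # relative pose error
        errs.append(np.linalg.norm(E[:3, 3]))    # translational component
    return np.sqrt(np.mean(np.square(errs)))

# a trajectory drifting 0.02 m per frame relative to ground truth
gt = [trans([0.10 * i, 0.0, 0.0]) for i in range(6)]
est = [trans([0.12 * i, 0.0, 0.0]) for i in range(6)]
drift = rpe_rmse(gt, est)
```

On this synthetic example the per-frame translational drift comes out as 0.02 m, as constructed.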
The Trans. columns show the RMSE of the translational drift in m/s and the Rot. columns show the RMSE of the rotational drift in deg/s. There are no corresponding validation sequences for fr1/teddy and fr1/floor. On the validation set, CVO has improved performance compared with DVO, which shows it can generalize better across different scenarios. From the results, we can see that CVO is intrinsically robust. The next experiment further reveals that CVO has the advantage of performing well in extreme environments that lack rich structure or texture.

structure-texture-dist.   Training                        Validation
                          CVO            DVO              CVO            DVO
                          Trans.  Rot.   Trans.   Rot.    Trans.  Rot.   Trans.  Rot.
near                      0.0279  1.3470 0.0563   1.7560  0.0310  1.6367 0.0315  1.1498
far                       0.0609  1.2342 0.1612   3.4135  0.1374  2.3929 0.5351  8.2529
near                      0.0221  1.3689 0.1906  10.6424  0.0465  2.0359 0.1449  4.9022
far                       0.0372  1.3061 0.1171   2.4044  0.0603  1.8142 0.1375  2.2728
near                      0.0236  1.2972 0.0175   0.9315  0.0306  1.8694 0.0217  1.2653
far                       0.0409  1.1640 0.0171   0.5717  0.0616  1.4760 0.0230  0.6312
near                      0.2119  9.7944 0.3506  13.3127  0.1729  5.8674 0.1747  6.0443
far                       0.0799  3.0978 0.1983   6.8419  0.0899  2.6199 0.2000  6.5192
Average                   0.0631  2.5762 0.1386   4.9843  0.0787  2.4640 0.1586  3.8797

Table 7: The RMSE of the Relative Pose Error (RPE) for the structure vs. texture sequences. The Trans. columns show the RMSE of the translational drift in m/s and the Rot. columns show the RMSE of the rotational error in deg/s. The results show that while DVO performs better in the case with both structure and texture, CVO has significantly better performance in the environments that lack structure and texture.

9.2 Structure vs. Texture Sequences

Table 7 shows the RMSE of the RPE for the structure vs. texture sequences. This data set contains image sequences in structure/nostructure and texture/notexture environments. As elaborated by Ghaffari et al.
(2019), by treating point clouds as points in the function space (RKHS), CVO is inherently robust to the lack of features in the environment. CVO shows the best performance in cases where either structure or texture is not rich in the environment, which reinforces the claim that CVO is robust to such scenes. We also note that DVO has the best performance in the case where the environment contains rich texture and structure information. This could be due to two factors: 1) CVO adopts a semi-dense point cloud construction from DSO (Engel et al., 2018), while DVO uses the entire dense image without subsampling. Although the semi-dense tracking approach of Engel et al. (2014, 2018) is computationally attractive and we advocate it, the semi-dense point cloud construction process used in this work is a heuristic and might not capture the relevant information in each frame optimally; 2) DVO uses a motion prior as a regularizer, whereas CVO solely depends on the camera information with no regularizer. We conjecture the latter is the reason DVO, relative to the training set, does not perform well on the validation sequences. The motion prior is a useful assumption when it is true: it can help to tune the method better on the training sets, but if the assumption is violated it can lead to poor performance. The addition of an IMU sensor can of course improve the performance of both methods and is an interesting future research direction.

Figure 10: Left: The images of the two point clouds. The blue stars represent X while the red are Z. Center: A contour plot of F : T² → R where ℓ = 0.5. Right: A contour plot of F : T² → R where ℓ = 0.2. A total of 33 local maxima are found in this picture. A video of the effect of varying ℓ is available at https://youtu.be/ETr6-c0VapQ.

Figure 11: The images above display the number of local maxima of the function F : T² → R as a function of the length-scale.
The black curve shows the exact number of local maxima as reported by Matlab's imregionalmax, while the blue curve is a plot of N = 1.16602 ℓ⁻².

10. Discussions and Further Considerations

In this section, we present discussions on a number of topics that are worth investigating.

10.1 Locality of Solutions

It is important to stress that the registration problem laid out here is only solved locally. However, how detrimental is this? For the purposes of gaining intuition, we compute F : T² → R for the T² case; see Section 3.2. For the point clouds, we use the same clouds as in Example 6 (although viewing them in [0, 2π)² ≅ T² rather than R²). The torus is discretized into a 500-by-500 grid, and length-scales are chosen in the range [0.01, 0.5]. For each value of ℓ, a contour plot of F : T² → R is constructed and the number of local maxima is recorded (with Matlab's imregionalmax function). Two contour plots corresponding to ℓ = 0.5 and ℓ = 0.2 are shown in Figure 10. It is important to note that there exists a plethora of local maxima. This illustrates how important a good initialization is; otherwise the gradient flow will end up trapped at a false solution. However, the number of local solutions depends heavily on ℓ. This can be used in practice by having ℓ start off large and shrink between iterations, although care must be taken with positive-definite kernel design and hyperparameter learning on curved spaces, as the Gaussian kernel might not be positive-definite for all ℓ (Feragen et al., 2015; Borovitskiy et al., 2020). Another observation is that the number of local solutions seems to grow according to a power law of the length-scale, as shown in Figure 11. Controlling the number of modes is an important problem because, if we can control the number of modes, we can initially generate a convex problem and guide the algorithm to the global solution.
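The power-law reading of Figure 11 amounts to a linear fit in log-log coordinates. A minimal sketch on synthetic data (ours, not the paper's measurements) confirming that such a fit recovers the exponent −2:

```python
import numpy as np

# Synthetic check: if the number of local maxima scales as N = c * ell**(-2),
# a least-squares line in log-log coordinates recovers the exponent -2.
# (The constant 1.16602 is taken from the fit quoted for Figure 11.)
ell = np.linspace(0.01, 0.5, 40)
N = 1.16602 * ell**(-2.0)
slope, log_c = np.polyfit(np.log(ell), np.log(N), 1)
```

With noise-free data the fitted slope is exactly −2; on measured counts the same fit estimates the empirical exponent.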
This approach can be seen as the continuous analogue of the image pyramid method in computer vision (Szeliski, 2010), where low-resolution images are processed first and the next level has a higher resolution initialized by the previous step. A heuristic explanation can be that the Gaussian kernel has more modes at lower bandwidths, since a higher bandwidth makes the bumps around points coalesce. Assuming an inversely proportional relation to the bandwidth, Figure 11 is a reasonable verification of the number of maxima. Nevertheless, a systematic approach for learning the bandwidth and controlling the complexity of the problem remains an open problem and is an interesting future study.

10.2 Reduction to the Weighted Least Squares

The standard solution to sensor registration is typically formulated as a maximum likelihood or least-squares problem and solved via numerical optimization techniques. In the maximum likelihood setting, the measurement noise distribution significantly affects the robustness of the solver to outliers. Hence, in practice, the problem is necessarily solved using a robust estimator such as the iteratively reweighted least-squares algorithm or truncated least squares. The central shortcoming of this approach is the reliance on point-wise correspondences between the two measurement sets. As a result, the implicit assumption is that the data points perceived by digital sensors come from the same points in reality. Of course, this is seldom true, as the nature of digital sensing means that at best a discretization of reality is captured. Sparse and uniform (feature-scarce) measurement sets are where this shortcoming can become problematic. A prime example is the lack of robustness of visual odometry algorithms to texture-less and structure-less scenes. Without loss of generality, we use the Gaussian (squared exponential) kernel based on a distance function d(·,·):
\[
k(x, y) = \sigma^2 \exp\left(-\frac{d(x, y)^2}{2\ell^2}\right).
\]
Further, let us define w_{ij} := σ²⟨ℓ_X(x_i), ℓ_Z(z_j)⟩_I and d_{ij} := d(x_i, z_j). Then we have
\[
\langle f_X, f_Z \rangle_{\mathcal{H}} = \sum_{ij} w_{ij} \exp\left(-\frac{d_{ij}^2}{2\ell^2}\right). \tag{40}
\]
Now we explicitly apply the three assumptions that are used in the least squares problem.

Assumption 1 The assumptions used in the least squares problem are:
1. The point clouds X and Z have the same number of points.
2. The point cloud Z is ordered such that matching indices correspond to corresponding measurements.
3. The residuals are only computed between corresponding measurements.

By applying the above-mentioned assumptions to (40), we get
\[
\langle f_X, f_Z \rangle_{\mathcal{H}} \approx \sum_i w_i \exp\left(-\frac{d_i^2}{2\ell^2}\right),
\]
where w_i = w_{ii} and d_i = d_{ii}; the double sum in (40) is reduced to a single sum over matching indices. Finally, the exponential can be approximated using a Taylor expansion as
\[
\exp\left(-\frac{d_i^2}{2\ell^2}\right) \approx 1 - \frac{d_i^2}{2\ell^2} + \cdots.
\]

Problem 3 The problem of aligning the point clouds can now be rephrased in the following weighted least squares form:
\[
\arg\min_{h \in G} S(h), \qquad S(h) := \sum_i w_i\, d(x_i, h z_i)^2.
\]

10.3 Feature Embedding via Tensor Product Representation

For completeness, we include an extension of the proposed framework developed by Zhang et al. (2020) that is of particular interest to practitioners. The framework contains a specialized solver implemented on a GPU that exploits the sparse structure of the double sum in the objective function. In addition, it incorporates deep learning-based features, such as semantic segmentation labels, by extending the feature space to a hierarchical distributed representation. For extensive experimental results and open source software that validates the topics presented in this work, please see Zhang et al. (2020). Let (V_1, V_2, . . . ) be different inner product spaces describing different types of non-geometric features of a point, such as color, intensity, and semantics. Their tensor product, V_1 ⊗ V_2 ⊗ · · · , is also an inner product space. For any x ∈ X, z ∈ Z with features ℓ_X(x) = (u_1, u_2, . . . ) and ℓ_Z(z) = (v_1, v_2, . . . ), with u_1, v_1 ∈ V_1, u_2, v_2 ∈ V_2, .
. . . , we have
\[
\langle \ell_X(x), \ell_Z(z) \rangle_I = \langle u_1 \otimes u_2 \otimes \cdots,\; v_1 \otimes v_2 \otimes \cdots \rangle = \langle u_1, v_1 \rangle \langle u_2, v_2 \rangle \cdots. \tag{41}
\]
By substituting (41) into (4), we obtain
\[
\langle f_X, f_{h^{-1}Z} \rangle_{\mathcal{H}} = \sum_{x_i \in X,\, z_j \in Z} \langle u_{1i}, v_{1j} \rangle \langle u_{2i}, v_{2j} \rangle \cdots\; k(x_i, h^{-1}z_j).
\]
After applying the kernel trick (Bishop, 2006; Williams and Rasmussen, 2006; Murphy, 2012), we arrive at
\[
\langle f_X, f_{h^{-1}Z} \rangle_{\mathcal{H}} = \sum_{x_i \in X,\, z_j \in Z} k(x_i, h^{-1}z_j) \prod_k k_{V_k}(u_{ki}, v_{kj}) =: \sum_{x_i \in X,\, z_j \in Z} k(x_i, h^{-1}z_j)\, c_{ij}. \tag{42}
\]
Equation (42) describes the full geometric and non-geometric relationship between the two point clouds. Each c_{ij} does not depend on the relative transformation; thus, it is a constant when computing the gradient and the step size for solving registration problems. Interestingly, the double sum in (42) is sparse because a point x_i ∈ X is far away from the majority of the points z_j ∈ Z, either in the spatial (geometric) space or in one of the feature (semantic) spaces. This formulation can be further simplified to a purely geometric model if we let the label functions be ℓ_X(x_i) = ℓ_Z(z_j) = 1. Then (42) becomes
\[
\langle f_X, f_{h^{-1}Z} \rangle_{\mathcal{H}} = \sum_{x_i \in X,\, z_j \in Z} k(x_i, h^{-1}z_j). \tag{43}
\]
Through (43), the proposed method can register point clouds that do not have appearance measurements. It is worth noting that, when choosing the squared exponential kernel, (43) has the same formulation as Kernel Correlation (Tsin and Kanade, 2004).

10.4 Non-Isotropic Kernel

For the 3D space, the kernel used throughout is
\[
k(x, y) = \sigma^2 \exp\left(-\frac{\|x - y\|^2}{2\ell^2}\right),
\]
where ℓ ∈ R. What happens if we change this so that the length-scale depends on the direction? This makes the kernel into
\[
k_\Lambda(x, y) = \sigma^2 \exp\left(-\langle x - y, \Lambda(x - y) \rangle\right), \tag{44}
\]
where Λ is positive-definite and symmetric. Going through the same calculations as before, we get that
\[
\omega = \frac{2}{a^2} \sum_{ij} c_{ij}\, k_\Lambda(x_i, h^{-1}z_j)\; x_i \times \Lambda(h^{-1}z_j), \qquad
v = \frac{2}{b^2} \sum_{ij} c_{ij}\, k_\Lambda(x_i, h^{-1}z_j)\, \Lambda\left(h^{-1}z_j - x_i\right). \tag{45}
\]
Note that in the original case,
\[
\Lambda = \begin{pmatrix} \frac{1}{2\ell^2} & 0 & 0 \\ 0 & \frac{1}{2\ell^2} & 0 \\ 0 & 0 & \frac{1}{2\ell^2} \end{pmatrix}.
\]
In terms of applications, modeling directional similarities has been successful in the Generalized-ICP framework developed by Segal et al. (2009). In particular, it was shown that in structured environments, e.g., flat walls, ceilings, and floors, defining a norm weighted by local empirical covariances of the two point clouds effectively reduces the pose estimation error. If such a distance metric is used inside the kernel, the kernel can be considered anisotropic. In terms of modeling, this is similar to a covariance function with automatic relevance determination (Neal, 1996).

10.5 Matérn Kernel

Rather than using the Gaussian (squared exponential) kernel, we will use the Matérn kernel (Williams and Rasmussen, 2006). The following derivations are only valid, i.e., positive definite, for all hyperparameters in Euclidean spaces. For the analogue of squared exponential and Matérn kernels on compact Riemannian manifolds, please see the work of Borovitskiy et al. (2020). This is an infinite family of kernels. The Matérn kernel, for points a distance d apart, is
\[
C_\nu(d) = \sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\, d}{\rho}\right)^{\nu} K_\nu\!\left(\frac{\sqrt{2\nu}\, d}{\rho}\right),
\]
where Γ is the gamma function, K_ν is the modified Bessel function of the second kind, and ρ and ν are positive parameters. We have the following derivative property:
\[
\frac{d}{dt} K_\nu(t) = -K_{\nu-1}(t) - \frac{\nu}{t} K_\nu(t).
\]
With half-integers, ν = p + 1/2, p ∈ N,
\[
C_{p+1/2}(d) = \sigma^2 \exp\left(-\frac{\sqrt{2p+1}\, d}{\rho}\right) \frac{p!}{(2p)!} \sum_{i=0}^{p} \frac{(p+i)!}{i!(p-i)!} \left(\frac{2\sqrt{2p+1}\, d}{\rho}\right)^{p-i}.
\]
Differentiating the kernel with respect to d, we see that
\[
\frac{d}{dd}\, C_\nu(d) = -\sigma^2 \frac{2^{1-\nu}}{\Gamma(\nu)} \frac{\sqrt{2\nu}}{\rho} \left(\frac{\sqrt{2\nu}\, d}{\rho}\right)^{\nu} K_{\nu-1}\!\left(\frac{\sqrt{2\nu}\, d}{\rho}\right).
\]
Working with this does not seem appealing, so let us work with half-integer values instead. Below, the kernels and their corresponding gradients are listed. In what follows, d_{ij} = ∥x_i − h⁻¹z_j∥.

10.5.1 ν = 1/2
\[
C_{1/2}(d) = \sigma^2 \exp\left(-\frac{d}{\rho}\right),
\]
\[
\omega = \frac{\sigma^2}{a^2\rho} \sum_{ij} \frac{c_{ij}}{d_{ij}} \exp\left(-\frac{d_{ij}}{\rho}\right) x_i \times (h^{-1}z_j), \qquad
v = \frac{\sigma^2}{b^2\rho} \sum_{ij} \frac{c_{ij}}{d_{ij}} \exp\left(-\frac{d_{ij}}{\rho}\right) \left(h^{-1}z_j - x_i\right). \tag{48}
\]

10.5.2 ν = 3/2
\[
C_{3/2}(d) = \sigma^2 \left(1 + \frac{\sqrt{3}\, d}{\rho}\right) \exp\left(-\frac{\sqrt{3}\, d}{\rho}\right),
\]
\[
\omega = \frac{3\sigma^2}{a^2\rho^2} \sum_{ij} c_{ij} \exp\left(-\frac{\sqrt{3}\, d_{ij}}{\rho}\right) x_i \times (h^{-1}z_j), \qquad
v = \frac{3\sigma^2}{b^2\rho^2} \sum_{ij} c_{ij} \exp\left(-\frac{\sqrt{3}\, d_{ij}}{\rho}\right) \left(h^{-1}z_j - x_i\right).
\]
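The ν = 3/2 gradient above rests on the closed-form derivative C'_{3/2}(d) = −(3σ²d/ρ²) exp(−√3 d/ρ), which can be confirmed by finite differences; a quick check of ours with arbitrary hyperparameters:

```python
import numpy as np

sigma, rho = 1.2, 0.8

def C32(d):
    """Matern kernel with nu = 3/2."""
    s = np.sqrt(3.0) * d / rho
    return sigma**2 * (1.0 + s) * np.exp(-s)

# central difference of C32 at d, against the closed-form derivative
d, h = 0.37, 1e-6
numeric = (C32(d + h) - C32(d - h)) / (2 * h)
closed = -(3.0 * sigma**2 * d / rho**2) * np.exp(-np.sqrt(3.0) * d / rho)
```

Dividing the closed-form derivative by d gives the coefficient 3σ²/ρ² appearing in the gradient, with the exponential factor carried along.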
10.5.3 ν = 5/2
\[
C_{5/2}(d) = \sigma^2 \left(1 + \frac{\sqrt{5}\, d}{\rho} + \frac{5 d^2}{3\rho^2}\right) \exp\left(-\frac{\sqrt{5}\, d}{\rho}\right),
\]
\[
\omega = \frac{5\sigma^2}{3a^2\rho^3} \sum_{ij} c_{ij} \left(\rho + \sqrt{5}\, d_{ij}\right) \exp\left(-\frac{\sqrt{5}\, d_{ij}}{\rho}\right) x_i \times (h^{-1}z_j), \qquad
v = \frac{5\sigma^2}{3b^2\rho^3} \sum_{ij} c_{ij} \left(\rho + \sqrt{5}\, d_{ij}\right) \exp\left(-\frac{\sqrt{5}\, d_{ij}}{\rho}\right) \left(h^{-1}z_j - x_i\right).
\]

10.6 Relationship with Deep Learning

Point clouds obtained by modern sensors such as RGB-D cameras, stereo cameras, and LIDARs contain up to 300,000 points per scan at 10-60 Hz, together with rich color and intensity (reflectivity of a material sensed by an active light beam) measurements besides the geometric information. In addition, deep learning (LeCun et al., 2015) can provide semantic attributes of the scene as measurements (Long et al., 2015; Chen et al., 2017; Zhu et al., 2019). Representation learning provides a way to perform semi-supervised learning (Chapelle et al., 2006), self-supervised learning (Dahlkamp et al., 2006; Sofman et al., 2006; Doersch and Zisserman, 2017; Pathak et al., 2017), and unsupervised learning (Goodfellow et al., 2016). Although the motivation behind learning a good representation from data varies across different fields and research communities, its mathematical foundation is moving in a more unifying direction (Bengio et al., 2013; Litjens et al., 2017) where exploiting well-established classical tools together with modern data-driven approaches such as deep learning seems inevitable (Girshick et al., 2014). Perhaps the main undesirable aspect of modern deep learning-based methods is the heavy dependency on large labeled data sets, i.e., supervised learning. An intrinsic metric available in our framework is the angle, θ, between two functions. This indicator can be computed to track the optimization progress. The cosine of the angle is defined as
\[
\cos(\theta) = \frac{\langle f_X, f_Z \rangle_{\mathcal{H}}}{\|f_X\| \|f_Z\|}. \tag{53}
\]
However, calculating ∥f_X∥ and ∥f_Z∥ is time-consuming, as it requires evaluating the double sum for each of the two point clouds. To approximate (53), we use the following result.
Remark 18 Suppose k(x_i, x_j) = δ_{ij} and c_{ii} = 1, where δ_{ij} is the Kronecker delta; then ∥f_X∥ = √|X|.

Corollary 19 Using the previous assumption, we define the following alignment indicator:
\[
\frac{1}{\sqrt{|X|\,|Z|}} \sum_{x_i \in X,\, z_j \in Z} c_{ij}\, k(x_i, z_j). \tag{54}
\]

The behavior of the alignment indicator with respect to the rotation and translation errors is studied by Zhang et al. (2020). As a connection with deep neural networks, OverlapNet (Chen et al., 2020) uses a neural network to predict a similar metric and detect loop closures. The cosine of the angle in (53) or the indicator in (54) provides such a metric for self-supervised learning while taking into account the semantic information. Given the promising results of the work of Chen et al. (2020), the combination of our metric with deep learning is an interesting future research direction. The expected outcomes include a scalable diagnostic tool for monitoring the quality of point cloud alignments and representation learning for the model in (42). A more direct approach is the development of differentiable loss functions for deep learning tasks. Zhu et al. (2020) developed a new continuous 3D loss function for monocular single-view depth prediction based on our function representation. The proposed loss function addresses the gap between dense image prediction and sparse LIDAR supervision. By simply adding this new loss function to existing network architectures, the accuracy and geometric consistency of depth predictions improved significantly on all three state-of-the-art baseline networks tested by Zhu et al. (2020).

11. Conclusion

We developed a new mathematical framework for sensor registration that enables nonparametric joint semantic/appearance and geometric representation of continuous functions using data. The continuous functions allow the registration to be independent of a specific signal resolution, and the framework is fully analytical with a closed-form derivation of the Riemannian gradient and Hessian.
We studied a restriction of the framework where the Lie group acts on functions isometrically. This restriction has a variety of applications, and its high-dimensional derivation for the special Euclidean group acting on the Euclidean space showcases the point cloud registration and bird's-eye view map registration abilities. We derived low-dimensional cases with numerical examples to show the generality of the proposed framework. A specific implementation of this framework for RGB-D cameras performs well in texture- and structure-scarce environments and is inherently robust to input noise. The general problem where the Lie group does not act isometrically on the functions has interesting applications such as image registration and nonrigid structure from motion. Although the Lie algebra of the affine matrix group does not possess any particular structure, i.e., it includes all square matrices, these applications make it an attractive future research direction. The development of a globally optimal solver is, of course, another challenging and attractive future research direction. The developed framework is a data-driven analytical model that simultaneously possesses interesting intrinsic mathematical structures, i.e., equivariance, and computational structures such as sparsity and parallel processing. Autonomy via computational intelligence is a multifaceted research domain that integrates mathematics, computer science, and engineering and can have enormous impacts on our future and improve our quality of life. We developed a novel framework that can spark new research directions in robotics via sensor registration and mapping applications, computer vision via 3D scene reconstruction applications, medical imaging via nonrigid 3D reconstruction and registration, machine learning via Lie group machine learning, and, in general, artificial intelligence via an equivariant cognitive model.

Acknowledgments

A.
Bloch and W. Clark were supported in part by NSF grant DMS-1613819 and AFOSR grant FA 0550-18-0028. A. Bloch was also supported in part by NSF grant DMS-2103026. Toyota Research Institute provided funds to support this work. Funding for M. Ghaffari was in part provided by NSF Award No. 2118818. The authors thank the anonymous reviewers for their constructive feedback and in-depth review of this work.

Appendix A. Required Mathematical Machinery and Notation

We review some mathematical preliminaries to establish the notation.

A.1 Matrix Lie Group of Motion in $\mathbb{R}^n$

The general linear group of degree $n$, denoted $\mathrm{GL}_n(\mathbb{R})$, is the set of all $n \times n$ real invertible matrices, where the group binary operation is ordinary matrix multiplication. The $n$-dimensional special orthogonal group, denoted
$$\mathrm{SO}(n) = \{ R \in \mathrm{GL}_n(\mathbb{R}) \mid RR^{\mathsf{T}} = I_n,\ \det R = +1 \},$$
is the rotation group on $\mathbb{R}^n$. The $n$-dimensional special Euclidean group, denoted
$$\mathrm{SE}(n) = \left\{ h = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \in \mathrm{GL}_{n+1}(\mathbb{R}) \ \middle|\ R \in \mathrm{SO}(n),\ T \in \mathbb{R}^n \right\},$$
is the group of rigid transformations, i.e., direct isometries, on $\mathbb{R}^n$. A transformation such as $h \in \mathrm{SE}(3)$, consisting of a rotation and a translation component, is the parameter space in many sensor registration problems. Let $\hat{h} \in \mathrm{SE}(3)$ be an estimate of $h$. We compute the rotational and translational distances using $\lVert \log(\hat{R}R^{\mathsf{T}}) \rVert_F$ and $\lVert \hat{T} - \hat{R}R^{\mathsf{T}}T \rVert$, respectively, where $\log(\cdot)$ is the Lie logarithm map, which is, here, the matrix logarithm. These definitions are consistent with the transformation distance that can be directly computed using $\lVert \log(\hat{h}h^{-1}) \rVert_F$. Here, $\lVert A \rVert_F^2 = \mathrm{Tr}(A^{\mathsf{T}}A)$ is the Frobenius norm. This paper studies not only matrix Lie groups, but also their actions on manifolds. We have the following definition:

Definition 20 Let $G$ be a group and $X$ a set. A (left) group action of $G$ on $X$, denoted $G \curvearrowright X$, is a group homomorphism $G \to \mathrm{Aut}(X)$ (the automorphisms of $X$). If $X$ is a smooth manifold, the action is smooth if its image lies in $\mathrm{Diff}(X)$ (the diffeomorphisms of $X$).
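The rotational and translational error metrics defined in this subsection can be sketched numerically; the following is a minimal illustration assuming SciPy is available (the function name `se3_errors` is ours, introduced only for illustration):

```python
import numpy as np
from scipy.linalg import logm

def se3_errors(R, T, R_hat, T_hat):
    """Rotational distance ||log(R_hat R^T)||_F and translational
    distance ||T_hat - R_hat R^T T|| between a ground-truth pose
    (R, T) and an estimate (R_hat, T_hat) in SE(3)."""
    dR = logm(R_hat @ R.T)                   # matrix (Lie) logarithm
    rot_err = np.linalg.norm(dR, ord="fro")  # Frobenius norm
    trans_err = np.linalg.norm(T_hat - R_hat @ R.T @ T)
    return rot_err, trans_err
```

For a pure rotation by an angle $\theta$ about a fixed axis, the rotational distance evaluates to $\sqrt{2}\,\theta$, which gives a quick sanity check.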
Remark 21 A group action can equivalently be viewed as a function $\phi : G \times X \to X$ satisfying two conditions (where $\phi(g, x)$ is denoted by $g.x$):
1. Identity: if $e \in G$ is the identity element, then $e.x = x$ for all $x \in X$.
2. Compatibility: $(gh).x = g.(h.x)$ for all $g, h \in G$ and $x \in X$.

Remark 22 The standard action of $\mathrm{SE}(n)$ on $\mathbb{R}^n$ is given by $(R, T).x = Rx + T$ for $R \in \mathrm{SO}(n)$ and $T \in \mathbb{R}^n$.

A.2 Hilbert Space

Let $V$ be a finite-dimensional vector space over the field of real numbers $\mathbb{R}$. An inner product on $V$ is a function $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ that is bilinear, symmetric, and positive definite. The pair $(V, \langle \cdot, \cdot \rangle)$ is called an inner product space. The inner product induces a norm that measures the magnitude, or length, of a vector: $\lVert v \rVert = \sqrt{\langle v, v \rangle}$. The norm in turn induces a metric that allows for calculating the distance between two vectors: $d(v, w) = \lVert v - w \rVert = \sqrt{\langle v - w, v - w \rangle}$. Such a metric is homogeneous, $d(av, aw) = |a|\, d(v, w)$ for $a \in \mathbb{R}$, and translation invariant, $d(v + x, w + x) = d(v, w)$; these properties are inherited from the induced norm. The distance metric is positive definite, symmetric, and satisfies the triangle inequality. In addition to measuring distances, it is important to be able to understand limits. This leads to the definitions of a Cauchy sequence and completeness.

Definition 23 (Cauchy Sequence) A Cauchy sequence is a sequence $\{x_i\}_{i=1}^{\infty}$ such that for any real number $\epsilon > 0$ there exists a natural number $N \in \mathbb{N}$ such that $d(x_n, x_m) < \epsilon$ for all $n, m > N$, where $d$ is the distance metric.

Definition 24 (Completeness) A metric space $(M, d)$ is complete if every Cauchy sequence in $M$ converges in $M$, i.e., to a limit that is in $M$. Such a metric space contains all its limit points. Note that completeness is with respect to the metric $d$ and not the topology of the space. Now, we can give a definition of a Hilbert space.
Definition 25 (Hilbert Space) A Hilbert space, $\mathcal{H}$, is a complete inner product space; that is, any Cauchy sequence in $\mathcal{H}$, using the metric induced by the inner product, converges to an element of $\mathcal{H}$.

The typical example of a Hilbert space is the space of square-integrable functions on $\mathbb{R}$, i.e., $\mathcal{H} = L^2(\mathbb{R}, \mu)$, where $\mu$ is the Lebesgue measure on $\mathbb{R}$. The inner product is defined as
$$\langle f, g \rangle_{\mathcal{H}} := \int f(x)g(x)\, d\mu(x). \qquad (55)$$
Similarly, the norm induced by the inner product is $\lVert f \rVert_{\mathcal{H}} = \sqrt{\langle f, f \rangle_{\mathcal{H}}}$. The Hilbert space of functions can be thought of as an infinite-dimensional counterpart of the finite-dimensional vector spaces discussed earlier. However, the Hilbert spaces of interest in this work are the reproducing kernel Hilbert spaces discussed below.

A.3 Representation and Reproducing Kernel Hilbert Space

We now move to a more special type of Hilbert space called a Reproducing Kernel Hilbert Space (RKHS) (Berlinet and Thomas-Agnan, 2004), which we will use in this work.

Definition 26 (Kernel) Let $x, x' \in \mathcal{X}$ be a pair of inputs for a function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ known as the kernel. A kernel is symmetric if $k(x, x') = k(x', x)$, and is positive definite if for any nonzero $f \in \mathcal{H}$ (or $L^2(\mathcal{X}, \mu)$):
$$\iint k(x, x') f(x) f(x')\, d\mu(x)\, d\mu(x') > 0.$$

Definition 27 (Reproducing Kernel Hilbert Space) Let $\mathcal{H}$ be a real-valued Hilbert space of functions on a non-empty set $\mathcal{X}$. A function $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a reproducing kernel of the Hilbert space $\mathcal{H}$ iff:
1. $\forall x \in \mathcal{X}$, $k(\cdot, x) \in \mathcal{H}$;
2. $\forall x \in \mathcal{X}$, $\forall f \in \mathcal{H}$, $\langle f, k(\cdot, x) \rangle = f(x)$.

A Hilbert space $\mathcal{H}$ which possesses a reproducing kernel $k$ is called a Reproducing Kernel Hilbert Space, or a proper Hilbert space. The second property is called the reproducing property: using the inner product of $f$ with $k(\cdot, x)$, the value of the function $f$ is reproduced at the point $x$. Also, using both conditions, we have, for all $x, z \in \mathcal{X}$, $k(x, z) = \langle k(\cdot, x), k(\cdot, z) \rangle$.

Lemma 28 Any reproducing kernel is a positive definite function (Berlinet and Thomas-Agnan, 2004).
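Lemma 28 can be checked empirically on a finite sample: the Gram matrix of a positive-definite kernel is symmetric with nonnegative eigenvalues (up to floating-point round-off). A small sketch with the squared-exponential kernel, a standard example; the kernel choice and length-scale below are ours, for illustration only:

```python
import numpy as np

def sq_exp(x, y, ell=1.0):
    """Squared-exponential kernel k(x, y) = exp(-(x - y)^2 / (2 ell^2)),
    a standard symmetric positive-definite kernel on R."""
    return np.exp(-np.subtract.outer(x, y) ** 2 / (2.0 * ell ** 2))

x = np.linspace(-2.0, 2.0, 9)
K = sq_exp(x, x)              # Gram matrix K_ij = k(x_i, x_j)
eigs = np.linalg.eigvalsh(K)  # eigenvalues should all be nonnegative
```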
Finding a reproducing kernel of an RKHS might seem difficult, but fortunately there is a one-to-one relation between a reproducing kernel and its associated RKHS, and such a reproducing kernel is unique. Therefore, our problem reduces to finding an appropriate kernel.

Theorem 29 (Moore-Aronszajn Theorem, Berlinet and Thomas-Agnan, 2004) Let $k$ be a positive definite function on $\mathcal{X} \times \mathcal{X}$. There exists only one Hilbert space $\mathcal{H}$ of functions on $\mathcal{X}$ with $k$ as reproducing kernel. The subspace $\mathcal{H}_0$ of $\mathcal{H}$ spanned by the functions $k(\cdot, x)$, $x \in \mathcal{X}$, is dense¹ in $\mathcal{H}$, and $\mathcal{H}$ is the set of functions on $\mathcal{X}$ which are point-wise limits of Cauchy sequences in $\mathcal{H}_0$ with the inner product
$$\langle f, g \rangle = \sum_{i=1}^{n} \sum_{j=1}^{m} \alpha_i \beta_j\, k(z_j, x_i), \qquad (56)$$
where $f = \sum_{i=1}^{n} \alpha_i k(\cdot, x_i)$ and $g = \sum_{j=1}^{m} \beta_j k(\cdot, z_j)$.

An important property of working in an RKHS is that convergence in norm implies point-wise convergence; the converse need not be true. In other words, if two functions in an RKHS are close in the norm sense, they are also close point-wise. We will rely on this property to solve the problem discussed in this paper. In Theorem 29, $f$ and $g$ are defined only in $\mathcal{H}_0$. The following theorem, known as the representer theorem, ensures that the solution of minimizing the regularized risk functional admits such a representation.

Theorem 30 (Nonparametric Representer Theorem, Schölkopf et al., 2001) Let $\mathcal{X}$ be a nonempty set and $\mathcal{H}_k$ an RKHS with reproducing kernel $k$ on $\mathcal{X} \times \mathcal{X}$. Suppose we are given a training sample $(x_1, y_1), \dots, (x_m, y_m) \in \mathcal{X} \times \mathbb{R}$, a strictly monotonically increasing real-valued function $h$ on $[0, \infty)$, an arbitrary cost function $c : (\mathcal{X} \times \mathbb{R}^2)^m \to \mathbb{R} \cup \{\infty\}$, and a class of functions²
$$\mathcal{F} = \Big\{ f \in \mathbb{R}^{\mathcal{X}} \,\Big|\, f(\cdot) = \sum_{i=1}^{\infty} \beta_i k(\cdot, z_i),\ \beta_i \in \mathbb{R},\ z_i \in \mathcal{X},\ \lVert f \rVert_{\mathcal{H}_k} < \infty \Big\},$$
where $\lVert \cdot \rVert_{\mathcal{H}_k}$ is the induced norm of the RKHS $\mathcal{H}_k$ associated with $k$. Then any $f \in \mathcal{F}$ minimizing the regularized risk functional
$$c\big((x_1, y_1, f(x_1)), \dots, (x_m, y_m, f(x_m))\big) + h(\lVert f \rVert_{\mathcal{H}_k})$$
admits a representation of the form
$$f(\cdot) = \sum_{i=1}^{m} \alpha_i k(\cdot, x_i). \qquad (57)$$
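The representer theorem is what makes such problems computationally tractable: the minimizer is a finite expansion over the training points. As a concrete illustrative instance (not the registration problem of this paper), kernel ridge regression with squared-error cost and $h(\lVert f\rVert) = \lambda \lVert f\rVert^2$ reduces to a linear solve for the coefficients $\alpha_i$ of the expansion; the function names below are ours:

```python
import numpy as np

def krr_fit(x, y, k, lam=1e-3):
    """Kernel ridge regression: by the representer theorem the
    minimizer is f(.) = sum_i alpha_i k(., x_i), with coefficients
    solving (K + lam * I) alpha = y."""
    K = k(x[:, None], x[None, :])
    return np.linalg.solve(K + lam * np.eye(len(x)), y)

def krr_predict(xq, x, alpha, k):
    """Evaluate the finite kernel expansion at query points xq."""
    return k(xq[:, None], x[None, :]) @ alpha

# Illustrative squared-exponential kernel with length-scale 0.3.
kern = lambda a, b: np.exp(-(a - b) ** 2 / (2.0 * 0.3 ** 2))
```

Fitting a smooth target with this sketch recovers the training values closely, consistent with the representation (57).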
A.4 Riemannian Geometry

Let $M$ be a smooth manifold. This paper is concerned with the problem of finding the maximum of a function $f : M \to \mathbb{R}$, which will be accomplished (locally) via gradient ascent. Because $M$ will not generally be Euclidean, we use tools from Riemannian geometry. Recall that a vector field $X : M \to TM$ is a section of the tangent bundle, i.e., for the projection $\pi : TM \to M$ we have $\pi \circ X = \mathrm{Id}$. The set of all (smooth) vector fields on $M$ is denoted by $\mathfrak{X}(M)$. In contrast to vector fields, 1-forms are sections of the cotangent bundle, $M \to T^*M$, and the set of all 1-forms is denoted by $\Omega^1(M)$.

1. A subset being dense in a space means the closure of the subset equals the space.
2. $\mathbb{R}^{\mathcal{X}}$ is the space of functions mapping $\mathcal{X}$ to $\mathbb{R}$.

For a function $f : M \to \mathbb{R}$, the differential is naturally a covector rather than a vector:
$$df_x : T_x M \to \mathbb{R}, \qquad df_x(v) = \frac{d}{dt}\Big|_{t=0} f(\gamma(t)), \quad \gamma(0) = x, \ \dot{\gamma}(0) = v.$$
In order to determine the gradient, $\mathrm{grad}\, f \in \mathfrak{X}(M)$, we need an identification $\Omega^1(M) \cong \mathfrak{X}(M)$, which is where the Riemannian metric is used.

Definition 31 A Riemannian metric on a manifold $M$ is the assignment, to each point $p \in M$, of an inner product $\langle \cdot, \cdot \rangle_p$ on $T_pM$ which is smooth in the following sense: for $X, Y \in \mathfrak{X}(M)$, the map $p \mapsto \langle X(p), Y(p) \rangle_p$ is a smooth function on $M$.

This gives a way to turn $df$ into $\mathrm{grad}\, f$: the gradient is the unique vector field satisfying $\langle \mathrm{grad}\, f, v \rangle_p = df_p(v)$ for all $v \in T_pM$. On a general manifold, if we wish to differentiate a vector field $Y$ in the direction of $X$ (i.e., take a directional derivative), we cannot compute it in the traditional sense, since different tangent vectors lie in different tangent spaces and so cannot be subtracted. A way around this issue is to use a connection (see, e.g., Section 6 of Tu, 2017):

Definition 32 An affine connection on a manifold $M$ is an $\mathbb{R}$-bilinear map $\nabla : \mathfrak{X}(M) \times \mathfrak{X}(M) \to \mathfrak{X}(M)$, written $(X, Y) \mapsto \nabla_X Y$, such that for all $f \in C^{\infty}(M)$ and $X, Y \in \mathfrak{X}(M)$:
i. $\nabla_{fX} Y = f \nabla_X Y$,
ii. $\nabla_X fY = (\mathcal{L}_X f) Y + f \nabla_X Y$,
where $\mathcal{L}$ is the usual Lie derivative (Tu, 2011, Section 20).
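In local coordinates, the identification of $df$ with the gradient through the metric is simply a linear solve: if $G(p)$ is the matrix of the metric at $p$ and $df$ the vector of coordinate partials of $f$, then the Riemannian gradient satisfies $G\,\mathrm{grad} = df$. A minimal sketch (the function name is ours):

```python
import numpy as np

def riemannian_gradient(df, G):
    """Turn the differential df (coordinate partials of f at p) into
    the Riemannian gradient by solving G(p) * grad = df."""
    return np.linalg.solve(G, df)
```

With the Euclidean metric $G = I$ this reduces to the ordinary gradient; scaling the metric by a constant $c$ scales the gradient by $1/c$.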
For a Riemannian metric there exists a unique affine connection, called the Levi-Civita connection, that is both torsion-free and compatible with the metric. A connection is compatible with the metric if, for all $X, Y, Z \in \mathfrak{X}(M)$,
$$Z\langle X, Y \rangle = \langle \nabla_Z X, Y \rangle + \langle X, \nabla_Z Y \rangle,$$
and torsion-free if $\nabla_X Y - \nabla_Y X = [X, Y]$. This connection can be constructed in the following way (the Koszul formula):
$$2\langle \nabla_X Y, Z \rangle = X\langle Y, Z \rangle + Y\langle Z, X \rangle - Z\langle X, Y \rangle - \langle X, [Y, Z] \rangle + \langle Y, [Z, X] \rangle + \langle Z, [X, Y] \rangle. \qquad (58)$$

Remark 33 Suppose that $G$ is a commutative Lie group and we choose an orthogonal basis $\{e_i\}$ of its Lie algebra $\mathfrak{g}$. If we let $\{E_i\}$ be the corresponding left-invariant vector fields, then the connection vanishes in these coordinates: $\nabla_{E_i} E_j = 0$ for all $i, j$. This will be useful in Sections 3.1 and 3.2, where we compute the registration problem for $G = S^1$ and $G = T^2$.

The Levi-Civita connection is used to define a geodesic on a Riemannian manifold, i.e., the analogue of a straight line.

Definition 34 A geodesic is a curve $\gamma : I \to M$, where $I$ is an interval in $\mathbb{R}$, such that $\nabla_{\dot{\gamma}} \dot{\gamma} = 0$, where $\nabla_{\dot{\gamma}} \dot{\gamma}$ is the covariant derivative of the velocity vector field $\dot{\gamma}(t) = \frac{d\gamma}{dt}$.

Theorem 35 Let $\gamma : I \to M$ be a curve. Then the following are equivalent:
1. $\gamma$ is a geodesic;
2. in coordinates $\gamma = (y^1, \dots, y^n)$, the curve satisfies the following system of differential equations:
$$\ddot{y}^k + \sum_{ij} \Gamma^k_{ij} \dot{y}^i \dot{y}^j = 0,$$
where $\Gamma^k_{ij}$ are the Christoffel symbols (Tu, 2017, Sec. 13.3, p. 99); and
3. $\gamma$ satisfies the Euler-Lagrange equations with Lagrangian $L(y, \dot{y}) = \frac{1}{2}\langle \dot{y}, \dot{y} \rangle_y$.

Example 8 Consider the case where $M = \mathbb{R}^2$ and the metric is given by $\langle u, v \rangle_p = u^{\mathsf{T}} m(p) v$ for a symmetric, positive-definite matrix $m$ which depends on the point $p = (x, y) \in \mathbb{R}^2$. The Lagrangian is $L = \frac{1}{2}\big(m_{11}\dot{x}^2 + 2m_{12}\dot{x}\dot{y} + m_{22}\dot{y}^2\big)$, and the Euler-Lagrange equations are
$$\frac{d}{dt}\frac{\partial L}{\partial \dot{x}} - \frac{\partial L}{\partial x} = 0, \qquad \frac{d}{dt}\frac{\partial L}{\partial \dot{y}} - \frac{\partial L}{\partial y} = 0.$$
Computing the first equation results in
$$\frac{d}{dt}\big(m_{11}\dot{x} + m_{12}\dot{y}\big) = m_{11}\ddot{x} + m_{12}\ddot{y} + \frac{\partial m_{11}}{\partial x}\dot{x}^2 + \Big(\frac{\partial m_{11}}{\partial y} + \frac{\partial m_{12}}{\partial x}\Big)\dot{x}\dot{y} + \frac{\partial m_{12}}{\partial y}\dot{y}^2.$$
Combining these (as well as performing the analogous computation for the $y$ equation), we obtain the Euler-Lagrange equations.
$$m_{11}\ddot{x} + m_{12}\ddot{y} + \tfrac{1}{2}\frac{\partial m_{11}}{\partial x}\dot{x}^2 + \frac{\partial m_{11}}{\partial y}\dot{x}\dot{y} + \Big(\frac{\partial m_{12}}{\partial y} - \tfrac{1}{2}\frac{\partial m_{22}}{\partial x}\Big)\dot{y}^2 = 0,$$
$$m_{12}\ddot{x} + m_{22}\ddot{y} + \Big(\frac{\partial m_{12}}{\partial x} - \tfrac{1}{2}\frac{\partial m_{11}}{\partial y}\Big)\dot{x}^2 + \frac{\partial m_{22}}{\partial x}\dot{x}\dot{y} + \tfrac{1}{2}\frac{\partial m_{22}}{\partial y}\dot{y}^2 = 0.$$
Decoupling the acceleration terms produces
$$\ddot{x} + \Gamma^x_{xx}\dot{x}^2 + 2\Gamma^x_{xy}\dot{x}\dot{y} + \Gamma^x_{yy}\dot{y}^2 = 0, \qquad \ddot{y} + \Gamma^y_{xx}\dot{x}^2 + 2\Gamma^y_{xy}\dot{x}\dot{y} + \Gamma^y_{yy}\dot{y}^2 = 0,$$
with Christoffel symbols
$$\Gamma^x_{xx} = \frac{1}{2\det(m)}\Big(m_{22}\frac{\partial m_{11}}{\partial x} - m_{12}\Big(2\frac{\partial m_{12}}{\partial x} - \frac{\partial m_{11}}{\partial y}\Big)\Big), \qquad
\Gamma^x_{xy} = \frac{1}{2\det(m)}\Big(m_{22}\frac{\partial m_{11}}{\partial y} - m_{12}\frac{\partial m_{22}}{\partial x}\Big),$$
$$\Gamma^x_{yy} = \frac{1}{2\det(m)}\Big(m_{22}\Big(2\frac{\partial m_{12}}{\partial y} - \frac{\partial m_{22}}{\partial x}\Big) - m_{12}\frac{\partial m_{22}}{\partial y}\Big), \qquad
\Gamma^y_{xx} = \frac{1}{2\det(m)}\Big(m_{11}\Big(2\frac{\partial m_{12}}{\partial x} - \frac{\partial m_{11}}{\partial y}\Big) - m_{12}\frac{\partial m_{11}}{\partial x}\Big),$$
$$\Gamma^y_{xy} = \frac{1}{2\det(m)}\Big(m_{11}\frac{\partial m_{22}}{\partial x} - m_{12}\frac{\partial m_{11}}{\partial y}\Big), \qquad
\Gamma^y_{yy} = \frac{1}{2\det(m)}\Big(m_{11}\frac{\partial m_{22}}{\partial y} - m_{12}\Big(2\frac{\partial m_{12}}{\partial y} - \frac{\partial m_{22}}{\partial x}\Big)\Big).$$

In particular, due to the existence and uniqueness of solutions to ordinary differential equations, geodesics always exist locally. Moreover, when the manifold is a Lie group and the metric is invariant under left-translations, the geodesic equations can be described via the Euler-Poincaré equations (Bloch et al., 1996).

Definition 36 Let $G$ be a Lie group. The function $\ell_g : G \to G$, $h \mapsto gh$, for $g \in G$ is called left-translation by $g$. Its tangent lift $(\ell_g)_* : TG \to TG$ is defined as follows: let $v \in T_hG$ and $\gamma : (-\varepsilon, \varepsilon) \to G$ be such that $\gamma(0) = h$ and $\dot{\gamma}(0) = v$. Then we have
$$(\ell_g)_* v := \frac{d}{dt}\Big|_{t=0} \ell_g \circ \gamma(t).$$

Definition 37 For a Lie group $G$, a function $L : TG \to \mathbb{R}$ is said to be a left-invariant Lagrangian if $(\ell_g)^* L = L$ for all $g \in G$; that is, $L(x, v) = L(gx, (\ell_g)_* v)$.

Theorem 38 Let $G$ be a Lie group and $L : TG \to \mathbb{R}$ a left-invariant Lagrangian. Let $L : \mathfrak{g} \to \mathbb{R}$ be its restriction to the tangent space of $G$ at the identity (the corresponding Lie algebra). For a curve $g(t) \in G$, let $\xi(t) = g(t)^{-1}\dot{g}(t)$; i.e., $\xi(t) = (\ell_{g(t)^{-1}})_* \dot{g}(t)$. Then the following are equivalent:
1. $g(t)$ satisfies the Euler-Lagrange equations for $L$ on $G$;
2. the Euler-Poincaré equations hold:
$$\frac{d}{dt}\frac{\delta L}{\delta \xi} = \mathrm{ad}^*_{\xi}\, \frac{\delta L}{\delta \xi}. \qquad (59)$$

Proof See Theorem 5.2 of Bloch et al. (1996).

Remark 39 The reason for using $g^{-1}\dot{g}$ rather than just $\dot{g}$ is that $\dot{g} \in T_gG$ and $(\ell_{g^{-1}})_* : T_gG \to T_eG = \mathfrak{g}$, and therefore $g^{-1}\dot{g} \in \mathfrak{g}$. In coordinates, let $\xi = \sum_i \xi^i e_i \in \mathfrak{g}$, where $\{e_i\}$ forms a basis. Then equation (59) takes the form
$$\frac{d}{dt}\frac{\partial L}{\partial \xi^j} = \sum_{i,k} c^k_{ij} \frac{\partial L}{\partial \xi^k} \xi^i, \qquad [e_i, e_j] = \sum_k c^k_{ij} e_k. \qquad (60)$$

The idea of a geodesic is useful because it provides a map $T_xM \to M$ in the following way: $v \mapsto \gamma(1)$, where $\gamma$ is the geodesic with $\gamma(0) = x$ and $\dot{\gamma}(0) = v$.
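The map $v \mapsto \gamma(1)$ can be realized numerically by integrating the geodesic equations of Theorem 35(2). The sketch below, written for a hypothetical conformal metric $m(p) = e^{2x} I$ on $\mathbb{R}^2$ (our choice, purely for illustration), approximates the Christoffel symbols by finite differences of the metric and integrates the ODE with SciPy; along a geodesic the speed $\langle \dot{\gamma}, \dot{\gamma} \rangle_{\gamma}$ is conserved, which serves as a sanity check:

```python
import numpy as np
from scipy.integrate import solve_ivp

def metric(p):
    # Hypothetical conformal metric m(p) = exp(2x) * I on R^2.
    return np.exp(2.0 * p[0]) * np.eye(2)

def christoffel(p, h=1e-5):
    """Gamma^k_ij = 0.5 g^{kl} (d_i g_lj + d_j g_li - d_l g_ij),
    with metric derivatives taken by central finite differences."""
    dg = np.empty((2, 2, 2))
    for l in range(2):
        e = np.zeros(2)
        e[l] = h
        dg[l] = (metric(p + e) - metric(p - e)) / (2.0 * h)
    g_inv = np.linalg.inv(metric(p))
    G = np.zeros((2, 2, 2))
    for k in range(2):
        for i in range(2):
            for j in range(2):
                G[k, i, j] = 0.5 * sum(
                    g_inv[k, l] * (dg[i][l, j] + dg[j][l, i] - dg[l][i, j])
                    for l in range(2))
    return G

def geodesic_rhs(t, s):
    # State s = (position, velocity); geodesic ODE y'' = -Gamma y' y'.
    p, v = s[:2], s[2:]
    acc = -np.einsum("kij,i,j->k", christoffel(p), v, v)
    return np.concatenate([v, acc])

# Integrate from gamma(0) = (0, 0) with initial velocity (0, 1).
sol = solve_ivp(geodesic_rhs, (0.0, 1.0), [0.0, 0.0, 0.0, 1.0],
                rtol=1e-8, atol=1e-10)
```

For this metric, $\Gamma^x_{yy} = -1$, so a geodesic launched in the $y$ direction bends toward increasing $x$ while its metric speed stays constant.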
This map is called the Riemannian exponential map and is denoted by $\mathrm{Exp}_x : T_xM \to M$.

Remark 40 There are two important aspects of the Riemannian exponential that need to be discussed. (1) In general, $\mathrm{Exp}_x$ is only defined on a neighborhood of $0 \in T_xM$ and not on the whole tangent space. (2) When $M = G$ is a Lie group, the Lie exponential map $\exp : \mathfrak{g} \to G$ will usually not agree with the Riemannian exponential, although they have matching domain and codomain.

The last tool from Riemannian geometry we will use is the Hessian, a generalization of the second derivative. For a function $f : M \to \mathbb{R}$, the Hessian is a second-order tensor given in coordinates by
$$H(f)_{ij} = \frac{\partial^2 f}{\partial x^i \partial x^j} - \sum_k \Gamma^k_{ij} \frac{\partial f}{\partial x^k}.$$
A different, equivalent, definition of the Hessian is
$$H(f)(X, Y) = \langle \nabla_X \mathrm{grad}(f), Y \rangle = X(Yf) - df(\nabla_X Y).$$
This object can be used to calculate the covariance, as well as serving as an ingredient in Newton's root-finding method.

References

Timothy D. Barfoot. State Estimation for Robotics. Cambridge University Press, 2017.

Bachir Bekka, Pierre de La Harpe, and Alain Valette. Kazhdan's Property (T). Cambridge University Press, 2008.

Yoshua Bengio, Aaron Courville, and Pascal Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828, 2013.

Youcef Bentoutou, Nasreddine Taleb, Kidiyo Kpalma, and Joseph Ronsin. An automatic image registration for applications in remote sensing. IEEE Transactions on Geoscience and Remote Sensing, 43(9):2127–2137, 2005.

Alain Berlinet and Christine Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic, 2004.

Rolf Berndt and Michael Klucznik. An Introduction to Symplectic Geometry. Graduate Studies in Mathematics. American Mathematical Society, 2001. ISBN 9780821820568.

Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

Anthony Bloch, P. S. Krishnaprasad, Jerrold E. Marsden, and Tudor S. Ratiu.
The Euler-Poincaré equations and double bracket dissipation. Communications in Mathematical Physics, 175(1):1–42, January 1996.

Anthony M. Bloch, Roger W. Brockett, and Tudor S. Ratiu. Completely integrable gradient flows. Communications in Mathematical Physics, 147(1):57–74, June 1992.

Silvère Bonnabel, Martin Barczyk, and François Goulette. On the covariance of ICP-based scan-matching techniques. In Proceedings of the American Control Conference, pages 5498–5503. IEEE, 2016.

Viacheslav Borovitskiy, Alexander Terenin, Peter Mostowsky, and Marc Peter Deisenroth. Matérn Gaussian processes on Riemannian manifolds. arXiv preprint arXiv:2006.10160, 2020.

Gary Bradski. The OpenCV Library. Dr. Dobb's Journal of Software Tools, 2000.

Martin Brossard, Silvère Bonnabel, and Axel Barrau. A new approach to 3D ICP covariance estimation. IEEE Robotics and Automation Letters, 5(2):744–751, 2020.

John Canny. A computational approach to edge detection. In Readings in Computer Vision, pages 184–203. Elsevier, 1987.

Andrea Censi. An accurate closed-form estimate of ICP's covariance. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 3167–3172. IEEE, 2007.

Andrea Censi. An ICP variant using a point-to-line metric. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 19–25. IEEE, 2008.

Olivier Chapelle, Bernhard Schölkopf, and Alexander Zien. Semi-Supervised Learning. Adaptive Computation and Machine Learning. MIT Press, 2006. ISBN 9780262033589. URL https://books.google.com/books?id=kfqvQgAACAAJ.

Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):834–848, 2017.
Xieyuanli Chen, Thomas Läbe, Andres Milioto, Timo Röhling, Olga Vysotska, Alexandre Haag, Jens Behley, and Cyrill Stachniss. OverlapNet: Loop closing for LiDAR-based SLAM. In Proceedings of the Robotics: Science and Systems Conference, 2020.

Yang Chen and Gérard Medioni. Object modeling by registration of multiple range images. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 2724–2729. IEEE, 1991.

Hendrik Dahlkamp, Adrian Kaehler, David Stavens, Sebastian Thrun, and Gary R. Bradski. Self-supervised monocular road detection in desert terrain. In Proceedings of the Robotics: Science and Systems Conference, Philadelphia, USA, August 2006. doi: 10.15607/RSS.2006.II.005.

Carl Doersch and Andrew Zisserman. Multi-task self-supervised visual learning. In Proceedings of the IEEE International Conference on Computer Vision, pages 2051–2060, 2017.

Jakob Engel, Thomas Schöps, and Daniel Cremers. LSD-SLAM: Large-scale direct monocular SLAM. In European Conference on Computer Vision, pages 834–849. Springer, 2014.

Jakob Engel, Jörg Stückler, and Daniel Cremers. Large-scale direct SLAM with stereo cameras. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 1935–1942. IEEE, 2015.

Jakob Engel, Vladlen Koltun, and Daniel Cremers. Direct sparse odometry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(3):611–625, 2018.

Aasa Feragen, François Lauze, and Søren Hauberg. Geodesic exponential kernels: When curvature and linearity conflict. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3032–3042, 2015.

Friedrich Fraundorfer and Davide Scaramuzza. Visual odometry: Part II: Matching, robustness, optimization, and applications. IEEE Robotics & Automation Magazine, 19(2):78–90, 2012.

Maani Ghaffari, William Clark, Anthony Bloch, Ryan M. Eustice, and Jessy W. Grizzle.
Continuous direct sparse visual odometry from RGB-D images. In Proceedings of the Robotics: Science and Systems Conference, Freiburg, Germany, June 2019.

Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2014.

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

Richard Hartley and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.

Intel Corporation. Official Threading Building Blocks (TBB) GitHub repository. https://github.com/intel/tbb, 2019.

Vladimir G. Ivancevic and Tijana T. Ivancevic. Lecture notes in Lie groups, 2011.

Sadeep Jayasumana, Richard Hartley, Mathieu Salzmann, Hongdong Li, and Mehrtash Harandi. Kernel methods on Riemannian manifolds with Gaussian RBF kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(12):2464–2477, 2015.

Christian Kerl. Dense Visual Odometry (dvo). https://github.com/tum-vision/dvo, 2013.

Christian Kerl, Jürgen Sturm, and Daniel Cremers. Dense visual SLAM for RGB-D cameras. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2100–2106. IEEE, 2013a.

Christian Kerl, Jürgen Sturm, and Daniel Cremers. Robust odometry estimation for RGB-D cameras. In Proceedings of the IEEE International Conference on Robotics and Automation, pages 3748–3754. IEEE, 2013b.

Taejung Kim and Yong-Jo Im. Automatic satellite image registration by combination of matching and random sample consensus. IEEE Transactions on Geoscience and Remote Sensing, 41(5):1111–1117, 2003.

David Landry, François Pomerleau, and Philippe Giguère. CELLO-3D: Estimating the covariance of ICP in the real world.
In Proceedings of the IEEE International Conference on Robotics and Automation, pages 8190–8196. IEEE, 2019.

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.

Tzu-Yuan Lin, William Clark, Ryan M. Eustice, Jessy W. Grizzle, Anthony Bloch, and Maani Ghaffari. Adaptive continuous visual odometry from RGB-D images. arXiv preprint arXiv:1910.00713, 2019.

Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen AWM Van Der Laak, Bram Van Ginneken, and Clara I. Sánchez. A survey on deep learning in medical image analysis. Medical Image Analysis, 42:60–88, 2017.

Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3431–3440, 2015.

Edward M. Mikhail, James S. Bethel, and J. Chris McGlone. Introduction to Modern Photogrammetry. Wiley, 2001.

Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.

Radford M. Neal. Bayesian Learning for Neural Networks, volume 118. Springer New York, 1996.

David Nistér, Oleg Naroditsky, and James Bergen. Visual odometry for ground vehicle applications. Journal of Field Robotics, 23(1):3–20, 2006.

Frank C. Park and Roger W. Brockett. Kinematic dexterity of robotic mechanisms. International Journal of Robotics Research, 13(1):1–15, 1994.

Steven A. Parkison, Lu Gan, Maani Ghaffari Jadidi, and Ryan M. Eustice. Semantic iterative closest point through expectation-maximization. In Proceedings of the British Machine Vision Conference, pages 1–17, Newcastle, UK, September 2018.

Steven A. Parkison, Maani Ghaffari, Lu Gan, Ray Zhang, Arash K. Ushani, and Ryan M. Eustice. Boosting shape registration algorithms via reproducing kernel Hilbert space regularizers. IEEE Robotics and Automation Letters, 4(4):4563–4570, 2019.
Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, July 2017.

Matthieu Pizenberg. DVO core (without ROS dependency). https://github.com/mpizenberg/dvo/tree/76f65f0c9b438675997f595471d39863901556a9, 2019.

Ramona-Andreea Rohan. Some remarks on the exponential map on the groups SO(n) and SE(n). In Proceedings of the Fourteenth International Conference on Geometry, Integrability and Quantization, pages 160–175, Sofia, Bulgaria, 2013. Avangard Prima.

Antoni Rosinol, Marcus Abate, Yun Chang, and Luca Carlone. Kimera: an open-source library for real-time metric-semantic localization and mapping. arXiv preprint arXiv:1910.02490, 2019.

Davide Scaramuzza and Friedrich Fraundorfer. Visual odometry. IEEE Robotics & Automation Magazine, 18(4):80–92, 2011.

Bernhard Schölkopf, Ralf Herbrich, and Alex Smola. A generalized representer theorem. In Computational Learning Theory, pages 416–426. Springer, 2001.

Aleksandr Segal, Dirk Haehnel, and Sebastian Thrun. Generalized-ICP. In Proceedings of the Robotics: Science and Systems Conference, Seattle, USA, June 2009. doi: 10.15607/RSS.2009.V.021.

James Servos and Steven L. Waslander. Multi-Channel Generalized-ICP: A robust framework for multi-channel scan registration. Robotics and Autonomous Systems, 87:247–257, 2017.

Boris Sofman, Ellie Lin, J. Andrew Bagnell, John Cole, Nicolas Vandapel, and Anthony Stentz. Improving robot navigation through self-supervised online learning. Journal of Field Robotics, 23(11-12):1059–1075, 2006.

Todor Stoyanov, Martin Magnusson, Henrik Andreasson, and Achim J. Lilienthal. Fast and accurate scan registration through minimization of the distance between compact 3D NDT representations. International Journal of Robotics Research, 31(12):1377–1393, 2012.

Hauke Strasdat.
Local accuracy and global consistency for efficient visual SLAM. PhD thesis, Imperial College London, 2012.

Jürgen Sturm, Nikolas Engelhard, Felix Endres, Wolfram Burgard, and Daniel Cremers. A benchmark for the evaluation of RGB-D SLAM systems. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 573–580. IEEE, 2012.

Richard Szeliski. Computer Vision: Algorithms and Applications. Springer Science & Business Media, 2010.

Sebastian Thrun, Wolfram Burgard, and Dieter Fox. Probabilistic Robotics. MIT Press, 2005.

Yanghai Tsin and Takeo Kanade. A correlation-based approach to robust point set registration. In Proceedings of the European Conference on Computer Vision, pages 558–569. Springer, 2004.

Loring W. Tu. An Introduction to Manifolds. Springer, New York, 2011.

Loring W. Tu. Differential Geometry. Graduate Texts in Mathematics. Springer, 2017. ISBN 9783319660824.

Christopher K. Williams and Carl Edward Rasmussen. Gaussian Processes for Machine Learning, volume 1. MIT Press, 2006.

Anestis Zaganidis, Li Sun, Tom Duckett, and Grzegorz Cielniak. Integrating deep semantic segmentation into 3-D point cloud registration. IEEE Robotics and Automation Letters, 3(4):2942–2949, October 2018.

Ray Zhang, Tzu-Yuan Lin, Chien Erh Lin, Steven A. Parkison, William Clark, Jessy W. Grizzle, Ryan M. Eustice, and Maani Ghaffari. A new framework for registration of semantic point clouds from stereo and RGB-D cameras. In Proceedings of the IEEE International Conference on Robotics and Automation, 2020.

Minghan Zhu, Maani Ghaffari, Yuanxin Zhong, Pingping Lu, Zhong Cao, Ryan M. Eustice, and Huei Peng. Monocular depth prediction through continuous 3D loss. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2020.

Minghan Zhu, Maani Ghaffari, and Huei Peng. Correspondence-free point cloud registration with SO(3)-equivariant implicit shape representations.
In Proceedings of the Conference on Robot Learning, 2021. URL https://openreview.net/forum?id=KOq9qDgn-Ta.

Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, and Bryan Catanzaro. Improving semantic segmentation via video propagation and label relaxation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8856–8865, 2019.