# regionspecific_diffeomorphic_metric_mapping__07d7c520.pdf

Region-speciﬁc Diffeomorphic Metric Mapping

Zhengyang Shen UNC Chapel Hill zyshen@cs.unc.edu

François-Xavier Vialard LIGM, UPEM francois-xavier.vialard@u-pem.fr

Marc Niethammer UNC Chapel Hill mn@cs.unc.edu

We introduce a region-speciﬁc diffeomorphic metric mapping (RDMM) registration approach. RDMM is non-parametric, estimating spatio-temporal velocity ﬁelds which parameterize the sought-for spatial transformation. Regularization of these velocity ﬁelds is necessary. In contrast to existing non-parametric registration approaches using a ﬁxed spatially-invariant regularization, for example, the large displacement diffeomorphic metric mapping (LDDMM) model, our approach allows for spatially-varying regularization which is advected via the estimated spatio-temporal velocity ﬁeld. Hence, not only can our model capture large displacements, it does so with a spatio-temporal regularizer that keeps track of how regions deform, which is a more natural mathematical formulation. We explore a family of RDMM registration approaches: 1) a registration model where regions with separate regularizations are pre-deﬁned (e.g., in an atlas space or for distinct foreground and background regions), 2) a registration model where a general spatially-varying regularizer is estimated, and 3) a registration model where the spatially-varying regularizer is obtained via an end-to-end trained deep learning (DL) model. We provide a variational derivation of RDMM, showing that the model can assure diffeomorphic transformations in the continuum, and that LDDMM is a particular instance of RDMM. To evaluate RDMM performance we experiment 1) on synthetic 2D data and 2) on two 3D datasets: knee magnetic resonance images (MRIs) of the Osteoarthritis Initiative (OAI) and computed tomography images (CT) of the lung. Results show that our framework achieves comparable performance to state-of-the-art image registration approaches, while providing additional information via a learned spatio-temporal regularizer. Further, our deep learning approach allows for very fast RDMM and LDDMM estimations. Code is available at https://github.com/uncbiag/registration.

1 Introduction

Quantitative analysis of medical images frequently requires the estimation of spatial correspondences, i.e.image registration. For example, one may be interested in capturing knee cartilage changes over time, localized changes of brain structures, or how organs at risk move between planning and treatment for radiation treatment. Speciﬁcally, image registration seeks to estimate the spatial transformation between a source image and a target image, subject to a chosen transformation model.

Transformations can be parameterized via low-dimensional parametric models (e.g., an afﬁne transformation), but more ﬂexible models are required to capture subtle local deformations. Such registration models [3, 32] may have large numbers of parameters, e.g., a large number of B-spline control points [30] or may even be non-parametric where vector ﬁelds are estimated [5, 22]. Spatial regularity can be achieved by appropriate constraints on displacement ﬁelds [14] or by parameterizing the transformation via integration of a sufﬁciently regular stationary or time-dependent velocity ﬁeld [5, 15, 38, 8, 41]. Given sufﬁcient regularization, diffeomorphic transformations can be assured in the continuum. A popular approach based on time-dependent velocity ﬁelds is LDDMM [5, 15].

33rd Conference on Neural Information Processing Systems (Neur IPS 2019), Vancouver, Canada.

Optimal LDDMM solutions are geodesics and minimize a geodesic distance. Consequentially, one may directly optimize over a geodesic s initial conditions in a shooting approach [40].

Most existing non-parametric image registration approaches use spatially-invariant regularizers. However, this may not be realistic. E.g., when registering inhale to exhale images of a lung one expects large deformations of the lung, but not of the surrounding tissue. Hence, a spatially-varying regularization would be more appropriate. As the regularizer encodes the deformation model this then allows anticipating different levels of deformation at different image locations.

While spatially-varying regularizers may be used in LDDMM variants [31] existing approaches do not allow for time-varying spatially-varying regularizers. However, such regularizers would be natural for large displacement as they can move with a deforming image. Hence, we propose a family of registration approaches with spatio-temporal regularizers based on advecting spatially-varying regularizers via an estimated spatio-temporal velocity ﬁeld. Speciﬁcally, we extend LDDMM theory, where the original LDDMM model becomes a special case. In doing so, our entire model, including the spatio-temporal regularizer is expressed via the initial conditions of a partial differential equation. We propose three different approaches based on this model: 1) A model for which the regularizer is speciﬁed region-by-region. This would, for example, be natural when registering a labeled atlas image to a target image, as illustrated in Fig. 1. 2) A model in which the regularizer is estimated jointly with the spatio-temporal velocity ﬁelds. 3) A deep learning model which predicts the regularizer and the initial velocity ﬁeld, thereby resulting in a very fast registration approach.

Related Work Only limited work [26, 27, 36, 39] on spatially-varying regularizers exists. The most related work [23] learns a spatially-varying regularizer for a stationary velocity ﬁeld registration model. In [23] both the velocity ﬁeld and the regularizer are assumed to be constant in time. In contrast, both are time-dependent in RDMM. Spatially-varying regularization has also been addressed from a Bayesian view in [34] by putting priors on B-spline transformation parameters. However, metric estimation in [34] is in a ﬁxed atlas-space, whereas RDMM addresses general pairwise image registration.

Contributions: 1) We propose RDMM, a new registration model for large diffeomorphic deformations with a spatio-temporal regularizer capable of following deforming objects and thereby providing a more natural representation of deformations than existing non-parametric models, such as LDDMM. 2) Via a variational formulation we derive shooting equations that allow specifying RDMM solutions entirely based on their initial conditions: an initial momentum ﬁeld and an initial spatially-varying regularizer. 3) We prove that diffeomorphisms can be obtained for RDMM in the continuum for sufﬁciently regular regularizers. 4) We explore an entire new family of registration models based on RDMM and provide optimization-based and very fast deep-learning-based approaches to estimate the initial conditions of these registration models. 5) We demonstrate the utility of our approach via experiments on synthetic data and on two 3D medical image datasets.

2 Standard LDDMM Model

LDDMM [5] is a non-parametric registration approach based on principles from ﬂuid mechanics. It is based on the estimation of a spatio-temporal velocity ﬁeld v(t, x) from which the sought-for spatial transformation ϕ can be computed via integration of tϕ(t, x) = v(t, ϕ(t, x)) . For appropriately regularized velocity ﬁelds [13], diffeomorphic transformations can be guaranteed. The optimization problem underlying LDDMM for images can be written as ( being the gradient; , indicating the inner product)

v = argmin v 1 2

0 v(t) 2 L dt + Sim(I(1), I1), s.t. t I + I, v = 0; I(0) = I0 . (2.1)

Here, the goal is to register the source image I0 to the target image I1 in unit time. Sim(A, B) is a similarity measure between images, often sum of squared differences, normalized cross correlation, or mutual information. Furthermore, we note that I(1, y) = I0 ϕ 1(1, y), where ϕ 1 denotes the inverse of ϕ in the target image space. The evolution of this map can be expressed as

tϕ 1 + Dϕ 1v = 0 , (2.2)

Figure 1: RDMM registration example. The goal is to register the dark blue area with high ﬁdelity (i.e., allowing large local deformations), while assuring small deformations in the cyan area. Initially (t = 0), a spatially-varying regularizer (ﬁfth column) is speciﬁed in the source image space, where dark blue indicates small regularization and yellow large regularization. Speciﬁcally, regularizer values indicate effective local standard deviations of a local multi-Gaussian regularizer. Since the transformation map and the regularizer are both advected according to the estimated velocity ﬁeld, the shape of the regularizer follows the shape of the deforming dark blue region and is of the same shape as the region of interest in the target space at the ﬁnal time (t = 1) (as can be seen in the second and the last columns). Furthermore, objects inside the dark blue region are indeed aligned well, whereas objects in the cyan region were not strongly deformed due to the larger regularization there.

where D denotes the Jacobian. Equivalently, in Eq. (2.1), we directly advect the image [15, 40] via t I + I, v = 0. To assure smooth transformations, LDDMM penalizes non-smooth velocity ﬁelds via the norm v 2 L = Lv, v , where L is a differential operator.

At optimality of Eq. (2.1) the Euler-Lagrange equations are (div denoting the divergence)

t I+ I, v = 0, I(0) = I0; tλ+div(λv) = 0, λ(1) = δ δI(1) Sim(I(1), I1) = 0; v = L 1(λ I).

(2.3) Here, λ is the adjoint variable to I, also known as the scalar momentum [15, 40]. As L 1 is a smoother it is often chosen as a convolution, i.e., v = K (λ I). Note that m(t, x) := λ I is the vector-valued momentum and thus v = K m. One can directly optimize over v solving Eq. (2.3) [5] or regard Eq. (2.3) as a constraint deﬁning a geodesic path [40] and optimize over all such solutions subject to a penalty on the initial scalar momentum as well as the similarity measure. Alternatively, one can express 1 these equations entirely with respect to the vector-valued momentum, m, resulting in the Euler-Poincaré equation for diffeomorphisms (EPDiff) [44]:

tm + div(v)m + Dv T (m) + Dm(v) = 0, m(0) = m0, v = K m , (2.4)

which deﬁnes the evolution of the spatio-temporal velocity ﬁeld based on the initial condition, m0, of the momentum, from which the transformation ϕ can be computed using Eq. (2.2). Both (2.3) and (2.4) can be used to implement shooting-based LDDMM [40, 35]. As LDDMM preserves the momentum, v 2 L is constant over time and hence a shooting formulation can be written as

m(0) = argmin m(0)

1 2 v(0) 2 L + Sim(I(1), I1), (2.5)

subject to the EPDiff equation (2.4) including the advection of ϕ 1, where I(1) = I0 ϕ 1(1).

A shooting-formulation has multiple beneﬁts: 1) it allows for a compact representation of ϕ via its initial conditions; 2) as the optimization is w.r.t. the initial conditions, a solution is a geodesic by construction; 3) these initial conditions can be predicted via deep-learning, resulting in very fast registration algorithms which inherit the theoretical properties of LDDMM [43, 42]. We therefore use this formulation as the starting point for RDMM in Sec 3.

1Thm. (2) in suppl. 7.2 provides a more generalized derivation.

3 Region-Speciﬁc Diffeomorphic Metric Mapping (RDMM)

In standard LDDMM approaches, the regularizer L (equivalently the kernel K) is spatially invariant. While recent work introduced spatially-varying metrics in LDDMM, for stationary velocity ﬁelds, or for displacement-regularized registration [27, 31, 23, 26], all of these approaches use a temporally ﬁxed regularizer. Hereafter, we generalize LDDMM by advecting a spatially-varying regularizer via the estimated spatio-temporal velocity ﬁeld. Standard LDDMM is a special case of our model.

Following [23], we introduce (Vi)i=0,...,N 1 a ﬁnite family of reproducing kernel Hilbert spaces (RKHS) which are deﬁned by the pre-deﬁned Gaussian kernels Kσi with σ0 < . . . < σN 1. We use a partition of unity wi(x, t), i = 0, . . . , N 1, on the image domain. As we want the kernel Kσi to be active on the region determined by wi we introduce the vector ﬁeld v = PN 1 i=0 wiνi for νi Vi. On this new space of vector ﬁelds, there exists a natural RKHS structure deﬁned by

v 2 L := inf

i=0 νi 2 Vi | v =

whose kernel is K = PN 1 i=0 wi Kσiwi. Thus the velocity reads (see suppl. 7.1 for the derivation)

v = K m def. =

i=0 wi Kσi (wim), wi 0 , (3.2)

which can capture multi-scale aspects of deformation [28]. In LDDMM, the weights are constant and pre-deﬁned. Here, we allow spatially-dependent weights wi(x). In particular (see formulation below), we advect them via, v(t, x) thereby making them spatio-temporal, i.e., wi(t, x). In this setup, weights only need to be speciﬁed at initial time t = 0. As the Gaussian kernels are ﬁxed convolutions can still be efﬁciently computed in the Fourier domain.

We prove (see suppl. 7.4 for the proof) that, for sufﬁciently smooth weights wi, the velocity ﬁeld is bounded and its ﬂow is a diffeomorphism. Following [23], to assure the smoothness of the initial weights we instead optimize over initial pre-weights, hi(0, x) 0, s.t. wi(0, x) = Gσ hi(0, x), where Gσ is a ﬁxed Gaussian with a small standard deviation, σ. In addition, we constrain PN 1 i=0 h2 i (0, x) to locally sum to one. The optimization problem for our RDMM model then becomes

v , {h i } = argmin v,{hi}

0 v(t) 2 L dt + Sim(I(1), I1) + Reg({hi(0)}) , (3.3)

subject to the constraints

t I + I, v = 0, I(0) = I0; thi + hi, v = 0, hi(0) = (hi)0; (3.4)

νi = Kσi (wim); v =

i=0 wiνi; wi = Gσ hi .

As for LDDMM, we can compute the optimality conditions for Eq. (3.3) which we use for shooting.

Theorem 1 (Image-based RDMM optimality conditions). With the adjoints γi (for hi) and λ (for I) and the momentum m := λ I + PN 1 i=0 γi hi the optimality conditions for (3.3) are:

t I + I, v = 0, I(0) = I0; tλ + div(λv) = 0, λ(1) + δ δI(1) Sim(I(1), I1) = 0; (3.5)

thi + hi, v = 0, hi(0) = (hi)0; (3.6)

tγi + div(γiv) = Gσ (m νi), γi(0) + δ δhi(0) Reg({hi(0)}) = 0 . (3.7)

νi = Kσi (wim); v =

i=0 wiνi; wi = Gσ hi . (3.8)

Theorem 2 (Momentum-based RDMM optimality conditions). The RDMM optimality conditions of Thm. (1) can be written entirely w.r.t. the momentum (as deﬁned in Thm (1)). They are:

tϕ 1 + Dϕ 1v = 0, ϕ 1(0, x) = x , (3.9)

tm + div(v)m + Dv T (m) + Dm(v) =

i=0 Gσ (m νi) hi, m(0) = m0 , (3.10)

where hi(t, x) = hi(0, x) ϕ(t, x) 1 and subject to the constraints of Eq. (3.8) which deﬁne the relationship between the velocity and the momentum.

For spatially constant pre-weights, we recover EPDiff from the momentum-based RDMM optimality conditions. Instead of advecting hi via ϕ(t, x) 1, we can alternatively advect the pre-weights directly, as thi + hi, v = 0, hi(0) = (hi)0. For the image-based and the momentum-based formulations, the velocity ﬁeld v is obtained by smoothing the momentum, v = PN 1 i=0 wi Kσi (wim).

Regularization of the Regularizer

Given the standard deviations σ0 < . . . < σN 1 of Eq. (3.2), assigning larger weights to the Gaussians with larger standard deviation will result in smoother (i.e., more regular) and therefore simpler transformations. To encourage choosing simpler transformations we follow [23], where a simple optimal mass transport (OMT) penalty on the weights is used. Such an OMT penalty is sensible as the non-negative weights, {wi}, sum to one and can therefore be considered a discrete probability distribution. The chosen OMT penalty is designed to measure differences from the probability distribution assigning all weight to the Gaussian with the largest standard deviation, i.e., w N 1 = 1 and all other

weights being zero. Speciﬁcally, the OMT penalty of [23] is log σN 1

s PN 1 i=0 wi

s , where s is a chosen power. To make the penalty more consistent with penalizing weights for a standard multi-Gaussian regularizer (as our regularizer contains effectively the weights squared) we do not penalize the weights directly, but instead penalize their squares using the same form of OMT penalty. Further, as the regularization only affects the initial conditions for the pre-weights, the evolution equations for the optimality conditions (i.e., the modiﬁed EPDiff equation) do not change. Additional regularizers, such as total variation terms as proposed in [23], are possible and easy to integrate into our RDMM framework as they only affect initial conditions. For simplicity, we focus on regularizing via OMT.

Shooting Formulation

As energy is conserved (see suppl. 7.3 for the proof) the momentum-based shooting formulation becomes

m(0) , {hi(0) } = argmin m(0),{hi(0)}

1 2 v(0) 2 L + Sim(I(1), I1) + Reg({hi(0)}) , (3.11)

subject to the evolution equations of Thm. 2. Similarly, the shooting formulation can use the image-based evolution equations of Thm. 1 where optimization would be over λ(0) instead of m(0).

4 Learning Framework

The parameters for RDMM, i.e., the initial momentum and the initial pre-weights, can be obtained by numerical optimization, either over the momentum and the pre-weights or only over the momentum if the pre-weights are prescribed. Just as for the LDDMM model, such a numerical optimization is computationally costly. Consequentially, various deep-learning (DL) approaches have been proposed to instead predict displacements [4, 6], stationary velocity [29] or momentum ﬁelds [43, 23]. Supervised [43, 42] and unsupervised [16, 11, 18, 4, 10] DL registration approaches exist. All of them are fast as only a regression solution needs to be evaluated at test time and no further numerical optimization is necessary. Additionally, such DL models beneﬁt from learning over an entire population instead of relying only on information from given image-pairs.

Most non-parametric registration approaches are not invariant to afﬁne transformations based on the chosen regularizers. Hence, for such non-parametric methods, afﬁne registration is typically performed ﬁrst as a pre-registration step to account for large, global displacements or rotations. Similarly, we make use of a two-step learning framework (Fig. 2 (left)) learning afﬁne transformations

Figure 2: Illustration of the learning framework (left) and its non-parametric registration component (right). A multi-step afﬁne-network ﬁrst predicts the afﬁne transformation map [33] followed by an iterable non-parametric registration to estimate the ﬁnal transformation map. The LDDMM component uses one network to generate the initial momentum. RDMM also uses a second network to predict the initial regularizer pre-weights. We integrate the RDMM evolution equations at lowresolution (based on the predicted initial conditions) to save memory. The ﬁnal transformation map is obtained via upsampling. See suppl. 7.6 for the detailed structure of the LDDMM/RDMM units.

and subsequent non-parametric deformations separately. For the afﬁne part, a multi-step afﬁne network is used to predict the afﬁne transformation following [33]. For the non-parametric part, we use two different deep learning approaches, illustrated in the right part of Fig. 2, to predict 1) the LDDMM initial momentum, m0, in Eq. (2.5) and 2) the RDMM initial momentum, m0, and the pre-weights, {hi(0) }, in Eq. (3.11). Overall, including the afﬁne part, we use two networks for LDDMM prediction and three networks for RDMM prediction.

We use low-resolution maps and map compositions for the momentum and the pre-weight networks. This reduces computational cost signiﬁcantly. The ﬁnal transformation map is obtained via upsampling, which is reasonable as we assume smooth transformations. We use 3D UNets [9] for momentum and pre-weight prediction. Both the afﬁne and the non-parametric networks can be iterated to reﬁne the prediction results: i.e. the input source image and the initial map are replaced with the currently warped image and the transformation map respectively for the next iteration.

During training of the non-parametric part, the gradient is ﬁrst backpropagated through the differentiable interpolation operator, then through the LDDMM/RDMM unit, followed by the momentum generation network and the pre-weight network.

Inverse Consistency: For the DL approaches we follow [33] and compute bidirectional (source to target denoted as st and target to source denoted as ts) registration losses and an additional symmetry loss, (ϕst) 1 (ϕts) 1 id 2 2, where id refers to the identity map. This encourages symmetric consistency.

5 Experimental Results and Setup

Datasets: To demonstrate the behavior of RDMM, we evaluate the model on three datasets: 1) a synthetic dataset for illustration, 2) a 3D computed tomography dataset (CT) of a lung, and 3) a large 3D magnetic resonance imaging (MRI) dataset of the knee from the Osteoarthritis Initiative (OAI). The synthetic dataset consists of three types of shapes (rectangles, triangles, and ellipses). There is one foreground object in each image with two objects inside and at most ﬁve objects outside. Each source image object has a counterpart in the target image; the shift, scale, and rotations are random. We generated 40 image pairs of size 2002 for evaluation. Fig. 1 shows example synthetic images. The lung dataset consists of 49 inspiration/expiration image pairs with lung segmentations. Each image is of size 1603. We register from the expiration phase to the inspiration phase for all 49 pairs. The OAI dataset consists of 176 manually labeled MRI from 88 patients (2 longitudinal scans per patient) and 22,950 unlabeled MR images from 2,444 patients. Labels are available for femoral and tibial cartilage. We divide the patients into training (2,800 pairs), validation (50 pairs) and testing groups (300 pairs), with the same evaluation settings as for the cross-subject experiments in [33].

Deformation models: Afﬁne registration is performed before each LDDMM/RDMM registration.

Afﬁne model: We implemented a multi-scale afﬁne model solved via numerical optimization and a multi-step deep neural network to predict the afﬁne transformation parameters.

Family of non-parametric models: We implemented both optimization and deep-learning versions of a family of non-parametric registration methods: a vector-momentum based stationary velocity ﬁeld model (v SVF) (v(x), w = const), LDDMM (v(t, x), w = const), and RDMM (v(t, x), w(t, x)). We use the dopri5 solver using the adjoint sensitivity method [7] to integrate the evolution equations in time. For solutions based on numerical optimization, we use a multi-scale strategy with L-BGFS [19] as the optimizer. For the deep learning models, we compute solutions for a low-resolution map (factor of 0.5) which is then upsampled. We use Adam [17] for optimization.

Image similarity measure: We use multi-kernel Localized Normalized Cross Correlation (mk LNCC) [33]. mk-LNCC computes localized normalized cross correlation (NCC) with different window sizes and combines these measures via a weighted sum.

Weight visualization: To illustrate the behavior of the RDMM model, we visualize the estimated standard deviations, i.e.the square root of the local variance σ2(x) = PN 1 i=0 w2 i (x)σ2 i .

Estimation approaches: To illustrate different aspects of our approach we perform three types of RDMM registrations: 1) registration with a pre-deﬁned regularizer (Sec. 5.1), 2) registration with simultaneous optimization of the regularizer, via optimization of the initial momentum and pre-weights (Sec. 5.2), and 3) registration via deep learning predicting the initial momentum and regularizer pre-weights (Sec. 5.3). Detailed settings for all approaches are in the suppl. 7.8.

5.1 Registration with a pre-deﬁned regularizer

To illustrate the base capabilities of our models, we prescribe an initial spatially-varying regularizer in the source image space. We show experimental results for pair-wise registration of the synthetic data as well as for the 3D lung volumes.

Fig. 1 shows the registration result for an example synthetic image pair. We use small regularization in the blue area and large regularization in the surrounding area. As expected, most of the deformations occur inside the blue area as the regularizer is more permissive there. We also observe that the regularizer is indeed advected with the image. For the real lung image data we use a small regularizer inside the lung (as speciﬁed by the given lung mask) and a large regularizer in the surrounding tissue. Fig. 3 shows that most of the deformations are indeed inside the lung area while the deformation outside the lung is highly regularized as desired. We evaluate the Dice score between the warped lung and the target lung, achieving 95.22% on average (over all inhalation/exhalation pairs). Fig. 3 also shows the determinant of the Jacobian of the transformation map Jϕ 1(x) := |Dϕ 1(x)| (deﬁned in target space): the lung region shows small values (illustrating expansion) while other region are either volume preserved (close to 1) or compressed (bigger than 1). Overall the deformations are smooth.

Figure 3: RDMM lung registration result with a pre-deﬁned regularizer. Lung images at expiration (source) are registered to the corresponding inspiration (target) images. Last column: Inside the lung a regularizer with small standard deviation (σi = {0.04, 0.06, 0.08} , h2 0 = {0.1, 0.4, 0.5}) and outside the lung with large standard deviation is used (σi = {0.2}, h2 0 = {1.0}). Deformations are largely inside the lung, the surrounding tissue is well regularized as expected. Columns 1 to 3 show results in image space while columns 4 to 6 refer to the results in label space. The second to last column shows the determinant of the Jacobian of the spatial transformation, ϕ 1.

5.2 Registration with an optimized regularizer

In contrast to the experiments in Sec. 5.1, we jointly estimate the initial momentum and the initial regularizer pre-weights for pairwise registration. We use the synthetic data and the knee MRIs.

Fig. 4 shows that the registration warped every object in the source image to the corresponding object in the target image. The visualization of the initial regularizer shows that low regularization is

Figure 4: Illustration of the RDMM registration results with an optimized regularizer on the synthetic dataset. All objects are warped from the source image space to the target image space. The last two columns show the regularizer (σ(x)) at t = 0 and t = 1 respectively.

Figure 5: Average Dice scores for all objects. Left to right: SVF, LDDMM and RDMM.

Method OAI Dataset Dice Folds Time (s) Afﬁne Methods - afﬁne-Nifty Reg 30.43 (12.11) 0 45 afﬁne-opt 34.49 (18.07) 0 8 afﬁne-net 44.58 (7.74) 0 0.20 Optimization Methods - - Demons[38, 37] 63.47 (9.52) 0.56 114 Sy N[2, 1] 65.71 (15.01) 0 1330 Nifty Reg-NMI[25, 20, 30, 21] 59.65 (7.62) 0 143 Nifty Reg-LNCC 67.92 (5.24) 35.19 270 v SVF-opt 67.35 (9.73) 0 79 LDDMM-opt 67.72(8.94) 0 457 RDMM-opt 68.18(8.36) 17.37 627 - Learning-based Methods - - Voxel Morph[10](with aff) 66.08 (5.13) 3.31 0.31 v SVF-net [33] 67.59 (4.47) 0.39 0.62 LDDMM-net 67.63(4.51) 0 0.85 RDMM-net 67.94(4.40) 0.47 1.1

Figure 6: Comparison of registration methods for cross-subject registrations on the OAI dataset based on Dice scores. -opt and -net refer to optimizationand DL-based methods respectively. For all DL methods, we report performance after two-step reﬁnement. Folds refers to the absolute value of the sum of the determinant of the Jacobian in the folding area (i.e., where the determinant of the Jacobian is negative); Time refers to the average registration time for a single image pair.

assigned close to object edges making them locally deformable. The visualizations of the regularizer at the initial time point (t = 0) and the ﬁnal time point (t = 1) show that it deforms with the image. That low regularization values are localized is sensible as the OMT regularizer prefers spatially sparse solutions (in the sense of sparsely assigning low levels of regularity). If the desired deformation model is piecewise constant our RDMM model could readily be combined with a total variation penalty as in [23]. Fig. 5 compares average Dice scores for all objects for v SVF, LDDMM and RDMM separately. They all achieve high and comparable performance indicating good registration quality. But only RDMM provides additional information about local regularity.

We further evaluate RDMM on 300 images pairs from the OAI dataset. The optimization methods section of Tab. 6 compares registration performance for different optimization-based algorithms. RDMM achieves high performance. While RDMM is in theory diffeomorphic, we observe some foldings, whereas no such foldings appear for LDDMM and SVF. This is likely due to inaccuracies when discretizing the evolution equations and when discretizing the determinant of the Jacobian. Further, RDMM may locally exhibit stronger deformations than LDDMM or v SVF, especially when the local regularization (via OMT) is small, making it numerically more challenging. Most of the folds appear at the image boundary (and are hence likely due to boundary discretization artifacts) or due to anatomical inconsistency in the source and target images, where large deformations may be estimated.

5.3 Registration with a learnt regularizer via deep learning

Finally, we evaluate our learning framework for non-parametric registration approaches on the OAI dataset. For v SVF, we follow [33] to predict the momentum. We implement the same approach for LDDMM, where the v SVF inference unit is replaced by one for LDDMM (i.e., instead of advecting via a stationary velocity ﬁeld we integrate EPDiff). Similarly, for RDMM, the inference unit is replaced by the evolution equation for RDMM and we use an additional network to predict the regularizer pre-weights. Fig. 6 shows the results. The non-parametric DL models achieve comparable

Figure 7: Illustration of RDMM registration results on the OAI dataset. "* s" and "* b" refer to the regularizer with σ in Gσ set to 0.04 and 0.06 respectively; "learn *" and "opt *" refer to a learnt regularizer and an optimized one respectively; "Jaco" refers to the absolute value of the determinant of the Jacobian; "init_w" refers to the initial weight map of the regularizer (as visualized via σ(x)). The ﬁrst four columns refer to registration results in image space.

performance to their optimization counterparts, but are much faster, while learning-based RDMM simultaneously predicts the initial regularizer which captures aspects of the knee anatomy. Fig. 7 shows the determinant of the Jacobian of the transformation map. It is overall smooth and folds are much less frequent than for the corresponding optimization approach, because the DL model penalizes transformation asymmetry. Fig. 7 also clearly illustrates the beneﬁt of training the DL model based on a large image dataset: compared with the optimization approach (which only works on individual image pairs), the initial regularizer predicted by the deep network captures anatomically meaningful information much better: the bone (femur and tibia) and the surrounding tissue show large regularity.

6 Conclusion and Future Work

We introduced RDMM, a generalization of LDDMM registration which allows for spatially-varying regularizers advected with a deforming image. In RDMM, both the estimated velocity ﬁeld and the estimated regularizer are timeand spatially-varying. We used a variational approach to derive shooting equations which generalize EPDiff and allow the parameterization of RDMM using only the initial momentum and regularizer pre-weights. We also prove that diffeomorphic transformation can be obtained for RDMM with sufﬁciently regular regularizers. Experiments with pre-deﬁned, optimized, and learnt regularizers show that RDMM is ﬂexible and its solutions can be estimated quickly via deep learning. Future work could focus on numerical aspects and explore different initial constraints, such as totalvariation constraints, depending on the desired deformation model. Indeed, a promising avenue of research consists in learning regularizers which include more physical/mechanical a-priori information in the deformation model. For instance, a possible ﬁrst step in this direction consists in parameterizing non-isotropic kernels to favor deformations in particular directions.

Acknowledgements Research reported in this work was supported by the National Institutes of Health (NIH) and the National Science Foundation (NSF) under award numbers NSF EECS-1711776 and NIH 1R01AR072013. The content is solely the responsibility of the authors and does not necessarily represent the ofﬁcial views of the NIH or the NSF. We would also like to thank Dr. Raúl San José Estépar for providing the lung data.

[1] B. B. Avants, C. L. Epstein, M. Grossman, and J. C. Gee. Symmetric diffeomorphic image registration with cross-correlation: evaluating automated labeling of elderly and neurodegenerative brain. Medical image analysis, 12(1):26 41, 2008.

[2] B. B. Avants, N. Tustison, and G. Song. Advanced normalization tools (ANTS). Insight j, 2:1 35, 2009.

[3] R. Bajcsy and S. Kovaˇciˇc. Multiresolution elastic matching. CVGIP, 46(1):1 21, 1989.

[4] G. Balakrishnan, A. Zhao, M. R. Sabuncu, J. Guttag, and A. V. Dalca. An unsupervised learning model for deformable medical image registration. In CVPR, pages 9252 9260, 2018.

[5] M. F. Beg, M. I. Miller, A. Trouvé, and L. Younes. Computing large deformation metric mappings via geodesic ﬂows of diffeomorphisms. IJCV, 61(2):139 157, 2005.

[6] X. Cao, J. Yang, J. Zhang, Q. Wang, P.-T. Yap, and D. Shen. Deformable image registration using a cue-aware deep regression network. IEEE Transactions on Biomedical Engineering, 65(9):1900 1911, 2018.

[7] T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, pages 6571 6583, 2018.

[8] Z. Chen, H. Jin, Z. Lin, S. Cohen, and Y. Wu. Large displacement optical ﬂow from nearest neighbor ﬁelds. In CVPR, pages 2443 2450, 2013.

[9] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger. 3D U-Net: learning dense volumetric segmentation from sparse annotation. In International conference on medical image computing and computer-assisted intervention, pages 424 432. Springer, 2016.

[10] A. V. Dalca, G. Balakrishnan, J. Guttag, and M. R. Sabuncu. Unsupervised learning for fast probabilistic diffeomorphic registration. MICCAI, 2018.

[11] B. D. de Vos, F. F. Berendsen, M. A. Viergever, M. Staring, and I. Išgum. End-to-end unsupervised deformable image registration with a convolutional neural network. In DLMIA, pages 204 212. Springer, 2017.

[12] J. Delon, J. Salomon, and A. Sobolevski. Local matching indicators for transport problems with concave costs. SIAM Journal on Discrete Mathematics, 26(2):801 827, 2012.

[13] P. Dupuis, U. Grenander, and M. I. Miller. Variational problems on ﬂows of diffeomorphisms for image matching. Quarterly of applied mathematics, pages 587 600, 1998.

[14] E. Haber and J. Modersitzki. Image registration with guaranteed displacement regularity. International Journal of Computer Vision, 71(3):361 372, 2007.

[15] G. L. Hart, C. Zach, and M. Niethammer. An optimal control approach for deformable registration. In CVPR, pages 9 16. IEEE, 2009.

[16] M. Jaderberg, K. Simonyan, A. Zisserman, et al. Spatial transformer networks. In NIPS, pages 2017 2025, 2015.

[17] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. ICLR, 2014.

[18] H. Li and Y. Fan. Non-rigid image registration using fully convolutional networks with deep selfsupervision. ISBI, 2018.

[19] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical programming, 45(1-3):503 528, 1989.

[20] M. Modat, D. M. Cash, P. Daga, G. P. Winston, J. S. Duncan, and S. Ourselin. Global image registration using a symmetric block-matching approach. Journal of Medical Imaging, 1(2):024003, 2014.

[21] M. Modat, G. R. Ridgway, Z. A. Taylor, M. Lehmann, J. Barnes, D. J. Hawkes, N. C. Fox, and S. Ourselin. Fast free-form deformation using graphics processing units. Computer methods and programs in biomedicine, 98(3):278 284, 2010.

[22] J. Modersitzki. Numerical methods for image registration. Oxford University Press on Demand, 2004.

[23] M. Niethammer, R. Kwitt, and F.-X. Vialard. Metric learning for image registration. CVPR, 2019.

[24] J. Nocedal and S. Wright. Numerical optimization. Springer Science & Business Media, 2006.

[25] S. Ourselin, A. Roche, G. Subsol, X. Pennec, and N. Ayache. Reconstructing a 3D structure from serial histological sections. Image and vision computing, 19(1-2):25 31, 2001.

[26] D. F. Pace, S. R. Aylward, and M. Niethammer. A locally adaptive regularization based on anisotropic diffusion for deformable image registration of sliding organs. IEEE transactions on medical imaging, 32(11):2114 2126, 2013.

[27] L. Risser, F.-X. Vialard, H. Y. Baluwala, and J. A. Schnabel. Piecewise-diffeomorphic image registration: Application to the motion estimation between 3D CT lung images with sliding conditions. Medical image analysis, 17(2):182 193, 2013.

[28] L. Risser, F.-X. Vialard, R. Wolz, D. D. Holm, and D. Rueckert. Simultaneous ﬁne and coarse diffeomorphic registration: application to atrophy measurement in Alzheimer s disease. In MICCAI, pages 610 617. Springer, 2010.

[29] M.-M. Rohé, M. Datar, T. Heimann, M. Sermesant, and X. Pennec. SVF-Net: Learning deformable image registration using shape matching. In MICCAI, pages 266 274. Springer, 2017.

[30] D. Rueckert, L. I. Sonoda, C. Hayes, D. L. Hill, M. O. Leach, and D. J. Hawkes. Nonrigid registration using free-form deformations: application to breast MR images. TMI, 18(8):712 721, 1999.

[31] T. Schmah, L. Risser, and F.-X. Vialard. Left-invariant metrics for diffeomorphic image registration with spatially-varying regularisation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 203 210. Springer, 2013.

[32] D. Shen and C. Davatzikos. HAMMER: hierarchical attribute matching mechanism for elastic registration. TMI, 21(11):1421 1439, 2002.

[33] Z. Shen, X. Han, Z. Xu, and M. Niethammer. Networks for joint afﬁne and non-parametric image registration. CVPR, 2019.

[34] I. J. Simpson, M. J. Cardoso, M. Modat, D. M. Cash, M. W. Woolrich, J. L. Andersson, J. A. Schnabel, S. Ourselin, A. D. N. Initiative, et al. Probabilistic non-linear registration with spatially adaptive regularisation. Medical image analysis, 26(1):203 216, 2015.

[35] N. Singh, J. Hinkle, S. Joshi, and P. T. Fletcher. A vector momenta formulation of diffeomorphisms for improved geodesic regression and atlas construction. In 2013 IEEE 10th International Symposium on Biomedical Imaging, pages 1219 1222. IEEE, 2013.

[36] R. Stefanescu, X. Pennec, and N. Ayache. Grid powered nonlinear image registration with locally adaptive regularization. Medical image analysis, 8(3):325 342, 2004.

[37] T. Vercauteren, X. Pennec, A. Perchant, and N. Ayache. Symmetric log-domain diffeomorphic registration: A demons-based approach. In MICCAI, pages 754 761. Springer, 2008.

[38] T. Vercauteren, X. Pennec, A. Perchant, and N. Ayache. Diffeomorphic demons: Efﬁcient non-parametric image registration. Neuro Image, 45(1):S61 S72, 2009.

[39] F.-X. Vialard and L. Risser. Spatially-varying metric learning for diffeomorphic image registration: A variational framework. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 227 234. Springer, 2014.

[40] F.-X. Vialard, L. Risser, D. Rueckert, and C. J. Cotter. Diffeomorphic 3D image registration via geodesic shooting using an efﬁcient adjoint calculation. International Journal of Computer Vision, 97(2):229 241, 2012.

[41] J. Wulff and M. J. Black. Efﬁcient sparse-to-dense optical ﬂow estimation using a learned basis and layers. In CVPR, pages 120 130, 2015.

[42] X. Yang, R. Kwitt, and M. Niethammer. Fast predictive image registration. In DLMIA, pages 48 57. Springer, 2016.

[43] X. Yang, R. Kwitt, M. Styner, and M. Niethammer. Quicksilver: Fast predictive image registration a deep learning approach. Neuro Image, 158:378 396, 2017.

[44] L. Younes, F. Arrate, and M. I. Miller. Evolutions equations in computational anatomy. Neuro Image, 45(1):S40 S50, 2009.