# Learning the Geometry of Wave-Based Imaging

Konik Kothari (UIUC, kkothar3@illinois.edu), Maarten de Hoop (Rice University, mvd2@rice.edu), Ivan Dokmanić (University of Basel, ivan.dokmanic@unibas.ch)

We propose a general physics-based deep learning architecture for wave-based imaging problems. A key difficulty in imaging problems with a varying background wave speed is that the medium bends the waves differently depending on their position and direction. This space-bending geometry makes the equivariance to translations of convolutional networks an undesired inductive bias. We build an interpretable neural architecture inspired by Fourier integral operators (FIOs) which approximate the wave physics. FIOs model a wide range of imaging modalities, from seismology and radar to Doppler and ultrasound. We focus on learning the geometry of wave propagation captured by FIOs, which is implicit in the data, via a loss based on optimal transport. The proposed FIONet performs significantly better than the usual baselines on a number of imaging inverse problems, especially in out-of-distribution tests.

1 Introduction

Figure 1: A: Sources S1, S2 and S3 are recorded at the surface at times t1, t2 and t3. B: The recording surface generates sensor traces that are used to image the sources. C: Wave packets are localized in frequency by directional bandpass filters $\hat\chi^2_{\nu,k}$. D: In reflection imaging, reflections of waves are recorded on the surface (see Appendix A.6).

We propose a deep learning approach for wave-based imaging with applications ranging from medical photoacoustic tomography to reflection seismology. A simple intuition for imaging with waves can be gleaned from Figure 1. Elementary wave packets propagate from where they are created (sources S1,2,3 in 1A), and then possibly scattered (an interface, 1D), to where they are sensed. When and where a wave packet arrives at a sensor (1B) depends on its orientation and position and on the geometry associated with the background wave speed. To a first approximation, imaging is accomplished by routing the wave packets back to where they were created or scattered.

We consider the problem of estimating an image $v$ from measurements $u$ obtained by an imaging forward operator $A_\sigma$, given as

$$u = A_\sigma v \;(+\ \text{errors}). \quad (1)$$

The unknown $v$ could, for example, represent the interfaces within the subsurface of the Earth. The forward operator (and, hence, its inverse) is parameterized by $\sigma$. Here, $\sigma$ is the background wave speed, a material parameter of the medium. Note that $\sigma$ varies across the domain and controls the ray paths of wave packets (see Figure 1A,D), thus defining a geometry that enables wave-based imaging. $A_\sigma$ depends on $\sigma$ in a strongly nonlinear fashion. In this work we assume that $\sigma$, and hence $A_\sigma$, is unknown; we only know $A_\sigma$ up to a class.

We aim to estimate the inverse of (1) by a neural network $f$ trained with a loss $L$. The central question is how to design $f$ and $L$. Our approach is based on the physics of wave propagation as captured by Fourier integral operators (FIOs), with a loss based on optimal transport. As a consequence, our network exhibits strong out-of-distribution (OoD) generalization and improved interpretability.
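For concreteness, the forward map (1) can be illustrated with a toy finite-difference acoustic propagator. The sketch below is illustrative only (our datasets are generated as described in Appendix D); it shows how the spatially varying wave speed $\sigma$ enters $A_\sigma$ nonlinearly through the wave equation. All names and parameters here are assumptions for illustration.

```python
import numpy as np

def acoustic_step(p, p_prev, sigma, dt, dx):
    """One leapfrog step of the 2D acoustic wave equation
    p_tt = sigma^2 * (p_xx + p_yy), 5-point Laplacian, periodic BCs."""
    lap = (np.roll(p, 1, 0) + np.roll(p, -1, 0) +
           np.roll(p, 1, 1) + np.roll(p, -1, 1) - 4 * p) / dx**2
    return 2 * p - p_prev + (sigma * dt)**2 * lap

def forward_operator(v, sigma, n_steps=200, dt=1e-3, dx=1e-2):
    """A toy A_sigma: propagate initial pressure v for n_steps and
    return the final snapshot (the reverse-time-continuation setting)."""
    p_prev, p = v.copy(), v.copy()   # zero initial velocity
    for _ in range(n_steps):
        p_prev, p = p, acoustic_step(p, p_prev, sigma, dt, dx)
    return p

# v: a thick box source; sigma: a smoothly varying background wave speed
# that bends rays in a position-dependent way.
v = np.zeros((128, 128)); v[40:44, 60:70] = 1.0
xx, yy = np.meshgrid(np.linspace(0, 1, 128), np.linspace(0, 1, 128))
sigma = 1.0 + 0.5 * np.sin(2 * np.pi * xx) * yy
u = forward_operator(v, sigma)
```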
FIOs describe a vast variety of imaging modalities including reflection seismology [1, 22], thermoacoustic [28, 44] and photoacoustic [17] tomography, radar [3, 35, 10], and single photon emission computed tomography [23], modeling both forward and inverse maps. Our network design is therefore applicable to many imaging modalities. An FIO $F_\sigma$ maps the input $u \in L^2(\mathbb{R}^2)$ (for example, a record of pressure time traces; see Figure 1B) to its output $F_\sigma[u]$ (for example, an image of a human brain), as

$$F_\sigma[u](y) = \frac{1}{(2\pi)^2} \int_{\mathbb{R}^2} a(y, \xi)\, \hat u(\xi)\, e^{i S_\sigma(y, \xi)}\, d\xi, \quad (2)$$

where $\hat u$ denotes the Fourier transform of $u$, $a(y, \xi)$ is called the symbol of $F_\sigma$, and $S_\sigma(y, \xi)$ is a suitable phase function (cf. Section 2.2). FIOs are a natural extension of convolutions. If $S(y, \xi) = \langle y, \xi \rangle$ and $a(y, \xi) \equiv \hat a(\xi)$, then (2) is indeed a convolution; it models simple deblurring or denoising. Allowing a general $a(y, \xi)$ makes it a pseudodifferential operator; these appear as approximate solutions of elliptic PDEs [46] or normal operators in imaging [27]. For a general phase $S(y, \xi)$, $F$ becomes powerful: it can deform the domain in an orientation-dependent way. This models approximate solutions of hyperbolic PDEs and therefore wave propagation. The geometry of an FIO (Figure 1), dictated by the medium parameter $\sigma$, is completely captured in its phase $S_\sigma(y, \xi)$.

1.1 Our results

Our architecture design, based on a discretization of FIOs [8, 25], improves interpretability and enables strong out-of-distribution generalization without any additional transfer- or meta-learning schemes [6, 31, 32]. This is essential to imaging in exploratory sciences and medical applications where failing out-of-distribution can be disastrous [5]. A key ingredient that allows this is a module that learns geometry: the wave packet (WP) routing network. This module is interpretable in that its output provides physically meaningful deformation maps of the domain. The WP routing network warps pixel grids and never looks at pixel intensities. Hence, once trained, it is data-independent. Another key ingredient to learning this geometry is a training strategy and a loss function based on optimal transport.

1.2 Relation to prior work

Existing physics-based approaches either substitute forward models into unrolled networks or apply auto-differentiation to spatiotemporal fields parameterized by neural networks [7, 36, 37]. In either case the forward operator should be known in closed form and should be simple to implement; neither is true in our case. The most popular choice for end-to-end learning in imaging are convolutional neural networks (CNNs). There is a vast number of papers on supervised learning for inverse imaging problems; we mention a small selection [41, 40, 38, 27]. CNNs are (approximately) translation-covariant and they excel in problems that are classically solved by filtering. Examples are deblurring, denoising, or inverting the Radon transform, which becomes a Fourier multiplier upon composition with its adjoint. Versatile architectures like the U-Net [39] can be applied to more general problems but they lack the right structure to capture wave physics and therefore fail out-of-distribution. A related issue with current CNN architectures is the lack of interpretability: it is not straightforward to associate different parts of a CNN with corresponding physical processes. In the context of waves, architectures based on wavelet transforms [18] were applied to various imaging modalities [19, 20, 21].
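To make (2) concrete, a brute-force numerical sketch of an FIO acting on a discrete image is given below. With the linear phase $S(y,\xi) = \langle y, \xi\rangle$ and unit symbol it reduces to the identity convolution, as noted above. This is purely illustrative and $O(N^4)$; fast algorithms are the subject of [8, 25]. The discrete normalization $1/n^2$ plays the role of $(2\pi)^{-2}\, d\xi$.

```python
import numpy as np

def apply_fio(u, symbol, phase):
    """Brute-force discretization of the FIO in (2):
    F[u](y) ~ sum_xi a(y, xi) u_hat(xi) exp(i S(y, xi)) / n^2."""
    n = u.shape[0]
    u_hat = np.fft.fft2(u)
    xi = np.fft.fftfreq(n, d=1.0 / n)          # integer frequencies
    XI1, XI2 = np.meshgrid(xi, xi, indexing="ij")
    out = np.zeros((n, n), dtype=complex)
    ys = np.arange(n) / n                      # unit-square pixel grid
    for i, y1 in enumerate(ys):
        for j, y2 in enumerate(ys):
            S = phase(y1, y2, XI1, XI2)
            a = symbol(y1, y2, XI1, XI2)
            out[i, j] = (a * u_hat * np.exp(1j * S)).sum() / n**2
    return out

# With S(y, xi) = <y, xi> and a = 1 this is the inverse FFT, i.e.,
# a plain (identity) convolution:
identity_phase = lambda y1, y2, X1, X2: 2 * np.pi * (y1 * X1 + y2 * X2)
unit_symbol    = lambda y1, y2, X1, X2: np.ones_like(X1)

u = np.random.default_rng(0).standard_normal((16, 16))
v = apply_fio(u, unit_symbol, identity_phase)
assert np.allclose(v.real, u, atol=1e-8)
```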
It is, however, unclear whether these architectures generalize out-of-distribution or how they compare to standard high-quality baselines such as the U-Net, which performs surprisingly well on simple generalization tasks. Finally, we point out the work on meta-learning for Calderón-Zygmund operators [24]; our $\sigma$ is similar to their parameterizations.

2 Imaging with Fourier integral operators

The true $A_\sigma$ and its inverse can be approximated by FIOs. Our aim, however, is not to simply replicate the functionality of FIOs in a neural network. We rather follow the structure of FIOs to tease out and generalize the key components required to build a more general neural wave-imaging operator. The geometry of wave propagation in a medium depends on the orientation of the elementary wave packets (see Figure 1). This suggests decomposing the input into its directional components via a bank of oriented bandpass filters. Analysis of FIOs provides a geometrically and computationally optimal choice of these filters.

2.1 Filtering u to a box in the dyadic-parabolic tiling of Fourier space

It has been shown in [45] that for wave propagators the so-called dyadic-parabolic tiling of the Fourier space, shown in Figure 1C, is optimal [43]. Such a tiling divides the Fourier space into overlapping boxes $B_{\nu,k}$, where the length of each box is approximately the square of its width. The boxes are indexed by $(\nu, k)$, where $\nu$ is a unit vector denoting the orientation of each box and $k$ is its scale. We define smooth directional bandpass filters $\hat\chi^2_{\nu,k}$ supported on $B_{\nu,k}$ such that they form a partition of unity,

$$\hat\chi_0^2(\xi) + \sum_{\nu,k} \hat\chi^2_{\nu,k}(\xi) = 1 \quad \forall \xi.$$

We filter $\hat u := \mathcal{F}u$ into its directional components as $\hat u_{\nu,k}(\xi) = \hat\chi^2_{\nu,k}(\xi)\, \hat u(\xi)$. Note that $\hat u(\xi) = \sum_{\nu,k} \hat u_{\nu,k}(\xi)$ by definition of $\hat\chi^2_{\nu,k}$.

2.2 Geometry of FIOs: diffeomorphisms

We now show how the phase function of an FIO characterizes the geometry of wave propagation. The phase $S_\sigma$ is positive homogeneous of degree 1 in $\xi$. A Taylor expansion of $S_\sigma(y, \xi)$ in $B_{\nu,k}$ around $(y, \nu)$ is then

$$S_\sigma(y, \xi) = \langle \xi,\, \partial_\xi S_\sigma(y, \nu) \rangle + S_2(y, \xi) + \text{higher order terms}. \quad (3)$$

The second-order term $S_2(y, \xi)$ varies only slowly within a box, so $\exp(i S_2(y, \xi))$ can be absorbed into the amplitude $a(y, \xi)$. Following this expansion, if we discretize the Fourier transform in (2) and ignore the $a$ and $S_2$ terms, then for a box-filtered $u_{\nu,k}$ we have

$$(F_\sigma u_{\nu,k})(y) \approx \frac{1}{(2\pi)^2} \sum_{\xi} 1_{\nu,k}(\xi)\, \hat u_{\nu,k}(\xi)\, e^{i \langle \xi,\, \partial_\xi S_\sigma(y, \nu) \rangle} = u_{\nu,k}(T_\sigma(y, \nu)).$$

Therefore, for each box $B_{\nu,k}$ (or equivalently each $\nu$), the imaging operator can be coarsely approximated via a diffeomorphism, or warping, $y \mapsto T_\sigma(y, \nu) = \partial_\xi S_\sigma(y, \nu)$. We implement this warping via a bilinear resampling of $u_{\nu,k}$. Note that the medium's wave speed $\sigma$ dictates the warping.

2.3 Low-rank separated representations

To get a more accurate approximation, we incorporate the amplitude $a(y, \xi)$ and the second-order term $S_2(y, \xi)$ via a low-rank separated representation [25],

$$a(y, \xi) \exp[i S_2(y, \xi)]\, 1_{\nu,k}(\xi) \approx \sum_{r=1}^{R_k} \alpha^{(r)}_{\nu,k}(y)\, \hat\vartheta^{(r)}_{\nu,k}(\xi),$$

where $R_k \sim k / \log(k)$. Multiplications by $\hat\vartheta^{(r)}_{\nu,k}(\xi)$ act as convolutions in space, while the $\alpha^{(r)}_{\nu,k}(y)$ correspond to simple (diagonal) spatial multipliers. We can now use (3) in (2) to get

$$F_\sigma[u](y) \approx \sum_{\nu,k} \sum_{r=1}^{R_k} \alpha^{(r)}_{\nu,k}(y)\, \frac{1}{(2\pi)^2} \int e^{i \langle T_\sigma(y,\nu),\, \xi \rangle}\, \hat\vartheta^{(r)}_{\nu,k}(\xi)\, \hat\chi^2_{\nu,k}(\xi)\, \hat u(\xi)\, d\xi. \quad (4)$$

Note that the $B_{\nu,k}$'s overlap; we can therefore generalize (4) to allow the convolutions with $\vartheta_{\nu,k}$ to interact across the boxes. This parallels a sum across channels in a CNN convolution layer.
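A minimal sketch of the directional decomposition of Section 2.1 follows, using simplified angular wedges times dyadic radial annuli in place of the true dyadic-parabolic curvelet tiling (our implementation uses PyCurvelab [47] instead). The filters are pointwise-normalized so that, together with a low-pass, their squares form an exact partition of unity, which gives perfect reconstruction from the components $u_{\nu,k}$.

```python
import numpy as np

def wedge_filters(n, n_angles=8, n_scales=3):
    """Simplified directional filter bank chi_{nu,k}: smooth angular
    wedges times dyadic radial annuli, normalized pointwise so that
    chi_0^2 + sum chi_{nu,k}^2 = 1 (a partition of unity)."""
    f = np.fft.fftfreq(n) * n
    X, Y = np.meshgrid(f, f, indexing="ij")
    R = np.hypot(X, Y) + 1e-12
    Theta = np.arctan2(Y, X)
    raw = []
    for k in range(n_scales):
        center = n / 2**(k + 2)                        # dyadic radii
        radial = np.exp(-0.5 * (np.log2(R / center) / 0.5)**2)
        for a in range(n_angles):
            nu = np.pi * a / n_angles
            d = np.angle(np.exp(2j * (Theta - nu))) / 2   # pi-periodic
            raw.append(radial * np.exp(-0.5 * (d / (np.pi / n_angles))**2))
    raw = np.stack(raw)
    low = np.exp(-0.5 * (R / (n / 16))**2)             # residual low-pass
    norm = np.sqrt(low**2 + (raw**2).sum(0))
    return raw / norm, low / norm                      # chi_{nu,k}, chi_0

# Decompose and perfectly reconstruct a test image.
u = np.random.default_rng(1).standard_normal((64, 64))
chi, chi0 = wedge_filters(64)
u_hat = np.fft.fft2(u)
u_parts = np.fft.ifft2(chi**2 * u_hat).real            # components u_{nu,k}
recon = u_parts.sum(0) + np.fft.ifft2(chi0**2 * u_hat).real
assert np.allclose(recon, u)
```

Note that a true parabolic tiling would widen the wedges with scale ($\propto 2^{k/2}$); the fixed angular width above is a simplification.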
We denote $C_{\nu,k}(u) := u_{\nu,k}$, $A^{(r)}_{\nu,k}(w) := \alpha^{(r)}_{\nu,k} \cdot w$, and introduce $H^{(r)}_{\nu,k}(u) := \sum_{\nu',k'} \vartheta^{(r)}_{\nu,k;\nu',k'} * u_{\nu',k'}$ to generalize the convolutions with $\vartheta_{\nu,k}$. We further introduce the resampling operator $I$ as

$$I(w, T_\sigma(\cdot, \nu))(y) := w(T_\sigma(y, \nu)).$$

That is, $I$ resamples $w$ via bilinear interpolation on a grid warped by $T_\sigma$. Note that $I$ is fixed and only $T_\sigma$ is learned. This gives

$$F_\sigma[u](y) \approx \sum_{\nu,k} \sum_{r=1}^{R_k} A^{(r)}_{\nu,k}\, I\big(H^{(r)}_{\nu,k} \circ C_{\nu,k}(u),\, T_\sigma(\cdot, \nu)\big). \quad (5)$$

Figure 2: The FIONet. Upsampling blocks use bilinear interpolation.

3 FIONet: the architecture for wave-based imaging

The action of an FIO $F_\sigma$ in (5) suggests incorporating spatial multipliers ($A^{(r)}_{\nu,k}$), convolutions ($H^{(r)}_{\nu,k}$), grid generators ($T_\sigma$), and directional filters ($C_{\nu,k}$) in our architecture. The dyadic-parabolic tiling of the frequency domain corresponds to the map $C_{\nu,k}$. It is a fundamental property tied to the structure of wave propagation. We thus implement it using fixed, non-trainable box filters constructed from PyCurvelab [47]. These filters correspond to the fixed curvelet transform layer in Figure 2 that takes an input $u$ defined on an $M \times M$ pixel grid $G$ and returns an $M \times M \times N_b$ output of spectrally filtered $u_{\nu,k}$'s, where $N_b$ is the number of boxes in the tiling (see Figure 1C). We then design a convolutional module $f_{H,\theta_H}$, a wave packet (WP) routing module $f_{T,\theta_T}$, and a spatial multiplier module $f_{A,\theta_A}$ with parameters $\theta_H, \theta_T, \theta_A$ such that

$$f_{H,\theta_H} : \mathbb{R}^{M \times M \times N_b} \to \mathbb{R}^{M \times M \times N_b \times R}, \qquad [f_{H,\theta_H}(w)]^{(r)}_{\nu,k} \approx H^{(r)}_{\nu,k}(w_{\nu,k}),$$
$$f_{T,\theta_T} : \mathbb{R}^p \times \mathbb{R}^2 \to \mathbb{R}^{M \times M \times 2}, \qquad f_{T,\theta_T}(z, \nu) \approx \big(T_{\sigma(z)}(y, \nu)\big)_{y \in G},$$
$$f_{A,\theta_A} : \mathbb{R}^{M \times M \times N_b \times R} \to \mathbb{R}^{M \times M \times N_b}, \qquad [f_{A,\theta_A}(w)]_{\nu,k} \approx \textstyle\sum_{r=1}^{R} A^{(r)}_{\nu,k}\, w^{(r)}_{\nu,k}. \quad (6)$$

The map $f_{H,\theta_H}$ operates on the entire stack of box-filtered inputs $u_{\nu,k}$, allowing for channel interaction. Moreover, it has nonlinear activation units which generalize $H_{\nu,k}$. $A^{(r)}_{\nu,k}$ is implemented via simple multiplication layers. Here $R := R_{k_{\max}}$ is the maximum number of terms in (5) and is a hyperparameter in our training. Note that we assume that $\sigma$ belongs to a set of natural medium wave speeds parametrized by a low-dimensional code $z \in Z \subset \mathbb{R}^p$, and we write $\sigma$ as $\sigma(z)$. We denote the full set of trainable FIONet parameters by $\Theta = (\theta_H, \theta_T, \theta_A)$. The network output on an $M \times M$ input $u$ is

$$\mathrm{FIONet}_\Theta(u) = \sum_{\nu,k} f_{A,\theta_A}\, I\big(f^{\nu,k,:}_{H,\theta_H}(C_{\nu,k}(u)),\, f_{T,\theta_T}(z, \nu)\big). \quad (7)$$

3.1 Geometry module: warped grids and resampling

The geometry module takes as input an $M \times M \times C$ tensor and resamples each of the $C$ channels, defined on the pixel grid $G$, to a new grid given by $f_{T,\theta_T}$. The WP routing network $f_{T,\theta_T}$ is the central component of the geometry module; it routes wave packets via the diffeomorphisms introduced in (3).¹ Note that the grid given by $f_{T,\theta_T}$ depends on the direction $\nu$. Due to the fixed filtering map $C_{\nu,k}$, the set of all $\nu$'s is fixed. We associate each of the $C$ channels with one $\nu$; this mapping is fixed. Therefore, for each channel of the input to the geometry module, we receive a warped output grid $G_\nu := f_{T,\theta_T}(z, \nu)$ for a given $z$ (see Figure 2).

¹We can learn more general transformations than diffeomorphisms; for example, we can handle caustics [25].

Figure 3: Left: Small shifts of the $(\nu, k)$ channels introduce strong distortion. Right: Comparison of metrics between oscillatory images: MSE$(x, x_\delta)$, SSIM$(|x|, |x_\delta|)$, and entropically smoothed $W_{2,\ell_2}$ between $|x|/\|x\|_1$ and $|x_\delta|/\|x_\delta\|_1$ via Sinkhorn iteration [42].
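To fix ideas, below is a compact PyTorch sketch of the forward pass (7): a stack of fixed box-filtered inputs, a small convolutional stand-in for the module $f_H$, per-$\nu$ warped grids from $f_T$, bilinear resampling with zero fill (torch's `grid_sample`), and diagonal multipliers for $f_A$. Module sizes and names are illustrative assumptions (single rank $R = 1$), not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FIONetSketch(nn.Module):
    """Minimal sketch of (7): fixed filtering is assumed precomputed
    (PyCurvelab in the paper); f_H is a small conv stack standing in
    for the U-Net; one warped grid per orientation nu; R = 1."""

    def __init__(self, M, Nb, p_code=8):
        super().__init__()
        self.M, self.Nb = M, Nb
        # f_H: channel-interacting convolutions on the filtered stack.
        self.f_H = nn.Sequential(
            nn.Conv2d(Nb, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, Nb, 3, padding=1))
        # f_T: maps (code z, orientation nu) to an M x M x 2 grid.
        self.f_T = nn.Sequential(
            nn.Linear(p_code + 2, 256), nn.ReLU(),
            nn.Linear(256, M * M * 2), nn.Tanh())
        # f_A: diagonal spatial multipliers, one map per channel.
        self.alpha = nn.Parameter(torch.ones(1, Nb, M, M))

    def forward(self, u_filtered, z, nus):
        """u_filtered: (B, Nb, M, M) box-filtered inputs C_{nu,k}(u);
        z: (p_code,) medium code; nus: (Nb, 2) orientations."""
        B = u_filtered.shape[0]
        w = self.f_H(u_filtered)                      # generalizes H
        outs = []
        for c in range(self.Nb):                      # one grid per nu
            grid = self.f_T(torch.cat([z, nus[c]]))   # values in [-1, 1]
            grid = grid.view(1, self.M, self.M, 2).expand(B, -1, -1, -1)
            # Bilinear resampling I with zero fill outside the grid.
            warped = F.grid_sample(w[:, c:c + 1], grid, mode="bilinear",
                                   padding_mode="zeros",
                                   align_corners=False)
            outs.append(warped)
        w = torch.cat(outs, dim=1)
        return (self.alpha * w).sum(dim=1, keepdim=True)  # f_A and sum
```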
The channel is then warped via the bilinear resampling operator $I$ using the grid $G_\nu$. We zero-fill if points in $G_\nu$ lie outside $G$. An important input to the WP routing network is $z$, which represents a low-dimensional code for $\sigma$, the wave speed of the medium. The WP routing network can therefore learn the geometry for multiple backgrounds; see Appendix E for some preliminary results. However, in this work we assume that all our data was generated over a fixed but still unknown wave speed corresponding to a code $z = z^*$. Not knowing the wave speed means not knowing the exact forward operator, which renders reconstruction methods like Tikhonov-regularized inverses and sparsity-promoting inverses, along with more modern methods like the deep image prior [50], infeasible. Note that since the WP routing network only works on fixed $\nu$'s and does not look at image pixel intensities, once trained it is data-independent. This is key for OoD generalization, as the implicitly learned geometry essentially captures the effect of $\sigma$. It is absent in popular neural architectures like the U-Net [39].

3.2 Learning diffeomorphisms via optimal transport

We train the FIONet using a labeled set of input-output images $\{(u_i, v_i)\}$ (see (1)). We do not assume having any direct geometric information about the medium. The geometric routing information is implicit in $\{(u_i, v_i)\}$ and we aim to infer it by a suitable training strategy. We note that warping was previously used in spatial transformers [26], but those act independently on each channel. Here it is essential that the different $G_\nu$ grids are tightly coordinated so as to get a meaningful reconstruction (see Figure 3A).

We devise a two-stage training strategy: first, we only train the geometry module, which captures the bulk of the physics. That is, we let $f_\Theta(u) = \sum_{\nu,k} I(C_{\nu,k}(u), f_{T,\theta_T}(z, \nu))$ and minimize an appropriate loss metric $L(v, f_\Theta(u))$. This stage is important to prevent the convolutional module $f_{H,\theta_H}$ from overfitting the training data. After $e_0$ epochs, we train the full network as in (7) using the MSE loss.

We illustrate in Figure 3 that for the first stage of training, popular metrics such as MSE for $L(v, f_\Theta(u))$ fail for filtered images $u_{\nu,k} = C_{\nu,k}(u)$. Since the $u_{\nu,k}$ are oscillatory, $L$ has many local minima, which leads to the problem of cycle skipping [52, 53, 34]. SSIM is smoother but still does not give good gradients for training. A natural optimal transport metric based on the entropically smoothed $W_{2,\ell_2}$ [16] gives a consistently increasing distance. While smoothed Wasserstein metrics have been used for inverse problems [2] via the iterative Sinkhorn-Knopp algorithm [11], we find that backpropagating through the iterates is unstable. We thus adopt the method of [16] and weaken the loss to an unsupervised one: instead of matching the $u_i$'s to the $v_i$'s in a paired fashion, we match the marginal distributions $P_{|v|}$ and $P_{|f_\Theta(u)|}$ while, importantly, using the $W_{2,\ell_2}$ distance between images as the ground metric. Note that we use the absolute value to ensure positivity of measures.

Figure 4: Three inverse problems. (a) Reverse time continuation: the initial pressure (boxes) propagates over the shown background wave speed. (b) Inverse source problem: waves are recorded on the blue sensor line, giving sensor traces. (c) Reflector imaging: a source (blue dot) sends a pulse that is reflected at the interfaces. The dashed white line shows an example ray path.
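The behavior in Figure 3 (right) can be reproduced in a few lines: the entropically smoothed $W_2$ computed by the Sinkhorn iteration [42, 11] grows consistently with the shift of an oscillatory signal, while MSE does not. A minimal sketch follows; note the actual stage-1 loss instead follows [16] with a critic network, as described next.

```python
import numpy as np

def sinkhorn_w2(a, b, x, eps=0.05, n_iter=200):
    """Entropically smoothed squared-W2 between histograms a, b on grid
    x via the Sinkhorn iteration. Illustrative only; backpropagating
    through these iterates is what we found unstable."""
    C = (x[:, None] - x[None, :])**2              # squared ground cost
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]               # transport plan
    return (P * C).sum()

# An oscillatory signal and a shifted copy: MSE oscillates with the
# shift (cycle skipping), the smoothed W2 of the normalized magnitudes
# grows consistently.
x = np.linspace(0, 1, 200)
sig = lambda s: (np.sin(40 * np.pi * (x - s))
                 * np.exp(-((x - s - 0.5) / 0.1)**2))
a = np.abs(sig(0.0)); a /= a.sum()
for shift in [0.01, 0.05, 0.1]:
    b = np.abs(sig(shift)); b /= b.sum()
    mse = np.mean((sig(0.0) - sig(shift))**2)
    print(f"shift={shift:.2f}  MSE={mse:.4f}  "
          f"W2^2={sinkhorn_w2(a, b, x):.5f}")
```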
As in [16], we use a critic network $f_{D,\theta_D}$ that is employed only during stage 1 of the geometric training. The critic network estimates $W_{1, W_{2,\ell_2}}(P_{|v|}, P_{|f_{\theta_T}(u)|})$, the Wasserstein-1 distance with the $W_{2,\ell_2}$ distance between images as ground metric, giving the final learning objective

$$\min_{\theta_T} \max_{\theta_D}\; \mathbb{E}_{\hat v \sim P_{f_{\Theta_T}(u)}}\, f_{D,\theta_D}(|\hat v|) - \mathbb{E}_{v \sim P_v}\, f_{D,\theta_D}(|v|) + \lambda\, \mathbb{E}_{\bar v \sim P_{\mathrm{int}}} \big( \|\nabla f_{D,\theta_D}(\bar v)\|_{W_{2,\ell_2}} - 1 \big)^2,$$

where $P_{\mathrm{int}}$ is the density generated via linear interpolations of samples from $P_{|f_\Theta(u)|}$ and $P_{|v|}$. From Figure 3, we see that minor misalignments of the $(\nu, k)$ channels strongly distort the output. Distribution matching synchronizes the diffeomorphisms to produce sharp images in the first stage of training. However, this metric only matches distributions and not actual data pairs, hence giving us only plausible-looking images. In the second stage, we train the entire FIONet (including the convolutions) using only the standard MSE loss. Much of the required geometry is already learnt in the first stage and is only tuned further via the MSE loss, along with training the other modules of the network. Please see Figure 2 for the architecture details of the FIONet.

3.3 Modeling the low-rank separated representation by the U-Net

Finally, we implement the map $H_{\nu,k}$. We want to use standard convolutional layers. However, it is essential to ensure that we can implement the large receptive-field filters $\vartheta_{\nu,k;\nu',k'}$ of $H_{\nu,k}$. We choose the U-Net [39] owing to its success in convolutional tasks [27]. We give an argument, based on the polyphase decomposition, for why the U-Net is successful at modeling arbitrarily large filters in Appendix B. Note that there are several other ways to implement large filters, such as factorized filters or implementing filters in the Fourier domain; in our experiments we found that the U-Net worked best.

Approximating FIOs by the FIONet. While the FIONet architecture generalizes FIOs, it is important to show that as a special case it can approximate exact FIOs. We make the following simplifying assumptions: 1) the WP routing network is implemented using fully connected layers; 2) the U-Net uses regular downsampling instead of max pooling. The first assumption gives us access to standard approximation theorems; in practice it only makes the forward pass slower.

Theorem 1. There exists a set of weights $\Theta = (\theta_H, \theta_T, \theta_A)$ such that

$$\big\| F[u] - \mathrm{FIONet}_\Theta[u] \big\| = O(2^{-k_{\min}/2})\, \| u \|. \quad (8)$$

This parallels [25, Theorem 2.1]. Here, the presumed sampling density in the space domain is naturally of order $2^{2 k_{\max}}$, though an oversampling factor is required.
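A minimal sketch of one alternating stage-1 update under the objective above is given below. For simplicity it uses the standard Euclidean gradient penalty of WGAN-GP in place of the $W_{2,\ell_2}$-norm penalty of [16], and follows the usual WGAN-GP sign convention; all helper names are hypothetical.

```python
import torch

def stage1_step(geom_net, critic, u_batch, v_batch, opt_T, opt_D,
                lam=10.0):
    """One alternating stage-1 step. geom_net plays f_Theta restricted
    to the geometry module; critic plays f_D. Euclidean gradient
    penalty stands in for the W_{2,l2} penalty of [16]."""
    # --- critic update (approximates the max over theta_D) ---
    fake = geom_net(u_batch).abs().detach()
    real = v_batch.abs()
    t = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (t * real + (1 - t) * fake).requires_grad_(True)  # P_int
    grads = torch.autograd.grad(critic(interp).sum(), interp,
                                create_graph=True)[0]
    gp = ((grads.flatten(1).norm(dim=1) - 1) ** 2).mean()
    loss_D = critic(fake).mean() - critic(real).mean() + lam * gp
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    # --- geometry update (the min over theta_T) ---
    loss_T = -critic(geom_net(u_batch).abs()).mean()
    opt_T.zero_grad(); loss_T.backward(); opt_T.step()
    return loss_D.item(), loss_T.item()
```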
4 Experiments

We showcase the advantages of learning geometry and the fact that the same network architecture can be applied to various problems. We choose three inverse problems, shown in Figure 4: reverse time continuation, the inverse source problem, and reflector imaging. We discuss reflector imaging and provide additional results in Appendix A. In all problems we learn the geometry induced by the background wave speed directly from data. In all our experiments we invoke scale separation: the coarse scale is implicit in the learned geometry, so we aim to image the fine scales and hence high-pass our target reconstructions. The dataset and training details are given in Appendix D. We choose as baseline the U-Net, arguably the most successful architecture in imaging [40, 27].

Figure 5: Reverse time continuation results. FIONet performs better in training distribution and is significantly better in out-of-distribution generalization.

Figure 6: Inductive bias of FIONet: for each dataset the topmost row shows the ground truth, the left column per dataset shows baseline U-Net results, and the right column shows FIONet results. Each row shows the result trained with N samples from the training distribution (N = 1000, 3000, 30000).

4.1 Reverse time continuation

In this problem a source pressure field $p_0$ propagates for time $T$ over an unknown background. We are given the final pressure $p_T$ at $t = T$ and we wish to estimate $p_0$. This problem most intuitively illustrates the geometry of wave propagation. Formally, it corresponds to a sum of two FIOs, one per half-wave propagation (Appendix C). We therefore train two copies of $f_{H,\theta_H}$ in parallel to model the convolutions (see (4)) in the two FIO branches, but follow them by a single WP routing network which now outputs two warped grids per $\nu$; see the sketch after this subsection. In Figure 7, we show the two learned grids $G_\nu$ for each $\nu$. The outputs of $f_{H,\theta_H,1}$ and $f_{H,\theta_H,2}$ are resampled on the grids given by the WP routing network.

We train on 3000 samples of randomly oriented short thick box sources and test on samples from completely different distributions. As shown in Figure 5, in the training distribution the FIONet performs slightly better than the U-Net. In out-of-distribution testing the U-Net seems to synthesize outputs from box-like patterns seen during training and therefore does considerably worse than the FIONet. For numerical results, please see Table 1 in Appendix A.2. We attribute the out-of-distribution results to our architecture design: due to the geometry module, the effect of the background wave speed $\sigma$ is completely captured within the learned grids $G_\nu$, which are data-independent once trained. We also study the invariance to noise in measurements in Appendix A.3 and show intermediate results from the first stage of training in Appendix A.4.

Figure 7: Diffeomorphisms learnt in reverse time continuation by the WP routing network at different orientations $\nu$ of the wave packet; the red line indicates the orientation of the wave packet in space. The background wave speed is shown in m/s.

Favorable inductive bias: Figure 6 shows that even with a small training set (1000 samples), the FIONet achieves good performance both within and out of the training distribution, which improves with the dataset size. The U-Net still synthesizes outputs using box-like patterns seen in the training set.

Interpretability: Since the WP routing network explicitly models the geometry of the operator, the $G_\nu$'s are physically meaningful estimates (Figure 7). The deformed grids clearly show the propagation of the two half-wave solutions (Appendix C). We also found that whenever the FIONet did not give reasonable warped grids, the out-of-distribution performance suffered. This suggests that getting the geometry right is indeed central to imaging. This information is not explicitly encoded in any previous architecture.
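For completeness, a sketch of the two-branch variant described above: two convolutional modules, one per half-wave FIO branch, share a single routing network that emits two grids per $\nu$. Names and sizes are illustrative, extending the sketch in Section 3.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchHead(nn.Module):
    """Sketch of the reverse-time-continuation variant: two conv
    modules (one per half-wave branch) and one routing network that
    emits two warped grids per orientation nu."""

    def __init__(self, M, Nb, p_code=8):
        super().__init__()
        self.M, self.Nb = M, Nb
        mk = lambda: nn.Sequential(
            nn.Conv2d(Nb, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, Nb, 3, padding=1))
        self.f_H1, self.f_H2 = mk(), mk()          # two FIO branches
        self.f_T = nn.Sequential(
            nn.Linear(p_code + 2, 256), nn.ReLU(),
            nn.Linear(256, M * M * 2 * 2), nn.Tanh())  # two grids

    def forward(self, u_filtered, z, nus):
        B = u_filtered.shape[0]
        w1, w2 = self.f_H1(u_filtered), self.f_H2(u_filtered)
        out = 0.0
        for c in range(self.Nb):
            g = self.f_T(torch.cat([z, nus[c]]))       # both grids
            g = g.view(2, 1, self.M, self.M, 2).expand(-1, B, -1, -1, -1)
            out = out + F.grid_sample(w1[:, c:c + 1], g[0],
                                      padding_mode="zeros",
                                      align_corners=False)
            out = out + F.grid_sample(w2[:, c:c + 1], g[1],
                                      padding_mode="zeros",
                                      align_corners=False)
        return out
```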
4.2 Inverse source problem

In many imaging modalities (e.g., photoacoustic tomography, seismic imaging) sensors are placed at the domain boundary. Instead of having a snapshot of the wavefield at time $T$, we have the pressure trace at the sensor locations for all times $[0, T]$. The inverse map in such scenarios is modeled by a single FIO [29]. In Figure 8 we show how the FIONet handles such a scenario. Note that the sensor trace is in the $(x_1, t)$ domain while the source is in the $(x_1, x_2)$ domain. We deliberately choose a background such that not all wave packets reach the sensor boundary. In Figure 8, we show the interfaces that are seen by the sensor as the masked source pressure. We see that these are faithfully recovered by the FIONet.

Figure 8: Inverse source problem results: we faithfully recover what the sensor sees.

Often in deep learning approaches to imaging, one claims that a baseline U-Net is better because it also reconstructs unseen data. However, such networks can be unreliable when tested out-of-distribution [5]. Here we aim to be faithful to the physics. The FIONet does not predict below the black line which demarcates the seen and unseen regions as dictated by the physics. Nonetheless, from the seen data it still reconstructs more faithfully out-of-distribution than a black-box U-Net, even without knowing the background. For numerical results see Table 2 in Appendix A.2.

5 Conclusion and future work

We proposed a general architecture, FIONet, and a training strategy for solving inverse problems in wave-based imaging. The wave packet routing network central to our proposal manifests the geometry of wave propagation and scattering in its warped output grids. We showed that explicitly learning the geometry enables strong out-of-distribution generalization, outperforming competitive baselines on a variety of imaging problems. This is essential in applications of machine learning in exploratory science. FIOs model a remarkable collection of inverse problems in exploratory imaging, all of which can be addressed with the FIONet. This points to exciting opportunities in applying machine learning to relevant problems in medicine, Earth and planetary sciences, and astronomy. Our code is available² for the community to reproduce our results and use our architectures for their own imaging modalities.

²https://github.com/kkothari93/fionet

Broader Impact

We do not see any major ethical consequences of this work. Our work has implications in the fields of exploratory imaging (earthquake detection, medical imaging, etc.) and improves the quality and reliability of imaging in these fields. Improving these fields has direct societal impact, e.g., in finding new natural preserves and improving diagnosis in healthcare. A failure of our system leaves machine learning unreliable in exploratory imaging. Our method provides strong out-of-distribution generalization and hence is not biased according to the data.

Acknowledgments and Disclosure of Funding

The authors would like to thank Herwig Wendt for helpful discussions. MVdH gratefully acknowledges support from the Department of Energy under grant DE-SC0020345, the Simons Foundation under the MATH + X program, and the corporate members of the Geo-Mathematical Imaging Group at Rice University. ID was supported by the European Research Council Starting Grant 852821 SWING.

References

[1] Tim J. P. M. Op 't Root, Christiaan C. Stolk, and Maarten V. de Hoop. "Linearized inverse scattering based on seismic reverse time migration". In: Journal de mathématiques pures et appliquées 98.2 (2012), pp. 211-238.
[2] Jonas Adler et al.
"Learning to solve inverse problems using Wasserstein loss". In: arXiv preprint arXiv:1710.10898 (2017).
[3] Gaik Ambartsoumian et al. "A class of singular Fourier integral operators in synthetic aperture radar imaging". In: Journal of Functional Analysis 264.1 (2013), pp. 246-269.
[4] Fredrik Andersson, Maarten V. de Hoop, and Herwig Wendt. "Multiscale discrete approximation of Fourier integral operators". In: Multiscale Modeling & Simulation 10.1 (2012), pp. 111-145.
[5] Vegard Antun et al. "On instabilities of deep learning in image reconstruction and the potential costs of AI". In: Proceedings of the National Academy of Sciences (2020).
[6] Yogesh Balaji, Swami Sankaranarayanan, and Rama Chellappa. "MetaReg: Towards Domain Generalization using Meta-Regularization". In: Advances in Neural Information Processing Systems. 2018, pp. 998-1008.
[7] Tatiana A. Bubba et al. "Learning the invisible: A hybrid deep learning-shearlet framework for limited angle computed tomography". In: Inverse Problems 35.6 (2019), p. 064002.
[8] Emmanuel Candès, Laurent Demanet, and Lexing Ying. "Fast computation of Fourier integral operators". In: SIAM Journal on Scientific Computing 29.6 (2007), pp. 2464-2493.
[9] Emmanuel Candès et al. "Fast discrete curvelet transforms". In: Multiscale Modeling & Simulation 5.3 (2006), pp. 861-899.
[10] Margaret Cheney and Brett Borden. Fundamentals of radar imaging. Vol. 79. SIAM, 2009.
[11] Marco Cuturi. "Sinkhorn distances: Lightspeed computation of optimal transport". In: Advances in Neural Information Processing Systems. 2013, pp. 2292-2300.
[12] Maarten V. de Hoop et al. "Seismic imaging with the generalized Radon transform: A curvelet transform perspective". In: Inverse Problems 25.2 (2009), p. 025005.
[13] Robert B. Dingle. Asymptotic expansions: their derivation and interpretation. Vol. 48. Academic Press, London, 1973.
[14] Minh N. Do and Yue M. Lu. "Multidimensional filter banks and multiscale geometric representations". In: Foundations and Trends in Signal Processing 5.3 (2011), pp. 157-264.
[15] Anton A. Duchkov, Fredrik Andersson, and Maarten V. de Hoop. "Discrete almost-symmetric wave packets and multiscale geometrical representation of (seismic) waves". In: IEEE Transactions on Geoscience and Remote Sensing 48.9 (2010), pp. 3408-3423.
[16] Yonatan Dukler et al. "Wasserstein of Wasserstein loss for learning generative models". In: International Conference on Machine Learning (2019).
[17] Peter Elbau, Otmar Scherzer, and Rainer Schulze. "Reconstruction formulas for photoacoustic sectional imaging". In: Inverse Problems 28.4 (2012), p. 045004.
[18] Yuwei Fan, Cindy Orozco Bohorquez, and Lexing Ying. "BCR-Net: A neural network based on the nonstandard wavelet form". In: Journal of Computational Physics 384 (2019), pp. 1-15.
[19] Yuwei Fan and Lexing Ying. "Solving Inverse Wave Scattering with Deep Learning". In: arXiv preprint arXiv:1911.13202 (2019).
[20] Yuwei Fan and Lexing Ying. "Solving optical tomography with deep learning". In: arXiv preprint arXiv:1910.04756 (2019).
[21] Yuwei Fan and Lexing Ying. "Solving Traveltime Tomography with Deep Learning". In: arXiv preprint arXiv:1911.11636 (2019).
[22] Hongjian Fang et al. "Parsimonious Seismic Tomography with Poisson Voronoi Projections: Methodology and Validation". In: Seismological Research Letters 91.1 (2020), pp. 343-355.
[23] Raluca Felea and Eric Todd Quinto. "The microlocal properties of the local 3-D SPECT operator". In: SIAM Journal on Mathematical Analysis 43.3 (2011), pp. 1145-1157.
[24] Jordi Feliu-Fabà, Yuwei Fan, and Lexing Ying. "Meta-learning pseudo-differential operators with deep neural networks". In: Journal of Computational Physics 408 (2020), p. 109309.
[25] Maarten V. de Hoop et al. "Multiscale discrete approximations of Fourier integral operators associated with canonical transformations and caustics". In: Multiscale Modeling & Simulation 11.2 (2013), pp. 566-585.
[26] Max Jaderberg, Karen Simonyan, Andrew Zisserman, et al. "Spatial transformer networks". In: Advances in Neural Information Processing Systems. 2015, pp. 2017-2025.
[27] Kyong Hwan Jin et al. "Deep convolutional neural network for inverse problems in imaging". In: IEEE Transactions on Image Processing 26.9 (2017), pp. 4509-4522.
[28] Geng Ku et al. "Thermoacoustic and photoacoustic tomography of thick biological tissues toward breast imaging". In: Technology in Cancer Research & Treatment 4.5 (2005), pp. 559-565.
[29] Peter Kuchment. "Generalized transforms of Radon type and their applications". In: Proceedings of Symposia in Applied Mathematics. Vol. 63. 2006, p. 67.
[30] Richard C. Lee and Lane R. Johnson. "Extremal bounds on the seismic velocities in the Earth's mantle". In: Geophysical Journal International 77.3 (1984), pp. 667-681.
[31] Da Li et al. "Deeper, broader and artier domain generalization". In: Proceedings of the IEEE International Conference on Computer Vision. 2017, pp. 5542-5550.
[32] Da Li et al. "Learning to generalize: Meta-learning for domain generalization". In: Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
[33] Ziwei Liu et al. "Deep Learning Face Attributes in the Wild". In: Proceedings of the International Conference on Computer Vision (ICCV). Dec. 2015.
[34] Ludovic Métivier et al. "Optimal transport for mitigating cycle skipping in full-waveform inversion: A graph-space transform approach". In: Geophysics 83.5 (2018), R515-R540.
[35] Eric Todd Quinto, Andreas Rieder, and Thomas Schuster. "Local inversion of the sonar transform regularized by the approximate inverse". In: Inverse Problems 27.3 (2011), p. 035006.
[36] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. "Physics informed deep learning (part I): Data-driven solutions of nonlinear partial differential equations". In: arXiv preprint arXiv:1711.10561 (2017).
[37] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. "Physics informed deep learning (part II): Data-driven discovery of nonlinear partial differential equations". In: arXiv preprint arXiv:1711.10566 (2017).
[38] Yair Rivenson et al. "Deep learning microscopy". In: Optica 4.11 (2017), pp. 1437-1443.
[39] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. "U-Net: Convolutional networks for biomedical image segmentation". In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234-241.
[40] Johannes Schwab et al. "Real-time photoacoustic projection imaging using deep learning". In: arXiv preprint arXiv:1801.06693 (2018).
[41] Dinggang Shen, Guorong Wu, and Heung-Il Suk. "Deep learning in medical image analysis". In: Annual Review of Biomedical Engineering 19 (2017), pp. 221-248.
[42] Richard Sinkhorn and Paul Knopp. "Concerning nonnegative matrices and doubly stochastic matrices". In: Pacific Journal of Mathematics 21.2 (1967), pp. 343-348.
[43] Hart F. Smith. "A parametrix construction for wave equations with C^{1,1} coefficients". In: Annales de l'Institut Fourier. Vol. 48. 1998, pp. 797-835.
[44] Plamen Stefanov and Gunther Uhlmann.
"Thermoacoustic tomography arising in brain imaging". In: Inverse Problems 27.4 (2011), p. 045004.
[45] Elias M. Stein and Timothy S. Murphy. Harmonic analysis: real-variable methods, orthogonality, and oscillatory integrals. Vol. 3. Princeton University Press, 1993.
[46] Michael Taylor. Pseudodifferential operators and nonlinear PDE. Vol. 100. Springer Science & Business Media, 2012.
[47] Darren Thomson and Gilles Hennenfent. PyCurvelab. 2020. URL: https://github.com/slimgroup/PyCurvelab.
[48] Bradley E. Treeby and Benjamin T. Cox. "k-Wave: MATLAB toolbox for the simulation and reconstruction of photoacoustic wave fields". In: Journal of Biomedical Optics 15.2 (2010), p. 021314.
[49] Bradley E. Treeby et al. "Rapid calculation of acoustic fields from arbitrary continuous-wave sources". In: The Journal of the Acoustical Society of America 143.1 (2018), pp. 529-537.
[50] Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. "Deep image prior". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018, pp. 9446-9454.
[51] Martin Vetterli, Jelena Kovačević, and Vivek K. Goyal. Foundations of Signal Processing. Cambridge University Press, 2014.
[52] Michael Warner and Lluís Guasch. "Adaptive waveform inversion: FWI without cycle skipping - theory". In: 76th EAGE Conference and Exhibition 2014. European Association of Geoscientists & Engineers, 2014, pp. 1-5.
[53] Michael Warner et al. "Full-waveform inversion of cycle-skipped seismic data by frequency down-shifting". In: SEG Technical Program Expanded Abstracts 2013. Society of Exploration Geophysicists, 2013, pp. 903-907.
[54] Dmitry Yarotsky. "Error bounds for approximations with deep ReLU networks". In: Neural Networks 94 (2017), pp. 103-114.
[55] Jinming Zhu, Larry Lines, and Sam Gray. "Smiles and frowns in migration/velocity analysis". In: Geophysics 63.4 (1998), pp. 1200-1209.