Journal of Machine Learning Research 25 (2024) 1-52 Submitted 11/23; Revised 7/24; Published 8/24

Gaussian Interpolation Flows

Yuan Gao yuan0.gao@connect.polyu.hk
Department of Applied Mathematics
The Hong Kong Polytechnic University
Hong Kong SAR, China

Jian Huang j.huang@polyu.edu.hk
Departments of Data Science and AI, and Applied Mathematics
The Hong Kong Polytechnic University
Hong Kong SAR, China

Yuling Jiao yulingjiaomath@whu.edu.cn
School of Mathematics and Statistics and Hubei Key Laboratory of Computational Science
Wuhan University, Wuhan, China

Editor: Mingyuan Zhou

Abstract

Gaussian denoising has emerged as a powerful method for constructing simulation-free continuous normalizing flows for generative modeling. Despite their empirical successes, theoretical properties of these flows and the regularizing effect of Gaussian denoising have remained largely unexplored. In this work, we aim to address this gap by investigating the well-posedness of simulation-free continuous normalizing flows built on Gaussian denoising. Through a unified framework termed Gaussian interpolation flow, we establish the Lipschitz regularity of the flow velocity field, the existence and uniqueness of the flow, and the Lipschitz continuity of the flow map and the time-reversed flow map for several rich classes of target distributions. This analysis also sheds light on the auto-encoding and cycle consistency properties of Gaussian interpolation flows. Additionally, we study the stability of these flows in source distributions and perturbations of the velocity field, using the quadratic Wasserstein distance as a metric. Our findings offer valuable insights into the learning techniques employed in Gaussian interpolation flows for generative modeling, providing a solid theoretical foundation for end-to-end error analyses of learning Gaussian interpolation flows with empirical observations.

Keywords: Continuous normalizing flows, Gaussian denoising, generative modeling, Lipschitz transport maps, stochastic interpolation.

1. Introduction

Generative modeling, which aims to learn the underlying data generating distribution from a finite sample, is a fundamental task in the field of machine learning and statistics (Salakhutdinov, 2015). Deep generative models (DGMs) find wide-ranging applications across diverse domains such as computer vision, natural language processing, drug discovery, and recommendation systems. The core objective of DGMs is to learn a nonlinear mapping, either deterministic or stochastic (with outsourcing randomness), which transforms latent samples drawn from a simple reference distribution into samples that closely resemble the target distribution. Generative adversarial networks (GANs) have emerged as a prominent class of DGMs (Goodfellow et al., 2014; Arjovsky et al., 2017; Goodfellow et al., 2020). Through an adversarial training process, GANs learn to approximately generate samples from the data distribution. Variational auto-encoders (VAEs) are another category of DGMs (Kingma and Welling, 2014; Rezende et al., 2014; Kingma and Welling, 2019). In VAEs, the encoding and decoding procedures produce a compressed and structured latent representation, enabling efficient sampling and interpolation.
Score-based diffusion models are a promising approach to deep generative modeling that has evolved rapidly since its emergence (Song and Ermon, 2019, 2020; Ho et al., 2020; Song et al., 2021b,a). The basis of score-based diffusion models lies in the notion of the score function, which characterizes the gradient of the log-density function of a given distribution. In addition, normalizing flows have gained attention as another powerful class of DGMs (Tabak and Vanden-Eijnden, 2010; Tabak and Turner, 2013; Kobyzev et al., 2020; Papamakarios et al., 2021). In normalizing flows, an invertible mapping is learned to transform a simple source distribution into a more complex target distribution by a composition of a series of parameterized, invertible and differentiable intermediate transformations. This framework allows for efficient sampling and training by maximum likelihood estimation (Dinh et al., 2014; Rezende and Mohamed, 2015). Continuous normalizing flows (CNFs) pursue this idea further by performing the transformation over continuous time, enabling fine-grained modeling of dynamic systems from the source distribution to the target distribution. The essence of CNFs lies in defining ordinary differential equations (ODEs) that govern the evolution of CNFs in terms of continuous trajectories. Inspired by the Gaussian denoising approach, which learns a target distribution by denoising its Gaussian smoothed counterpart, many authors have considered simulation-free estimation methods that have shown great potential in large-scale applications (Song et al., 2021a; Liu et al., 2023; Albergo and Vanden-Eijnden, 2023; Lipman et al., 2023; Neklyudov et al., 2023; Tong et al., 2023; Chen and Lipman, 2023; Albergo et al., 2023b; Shaul et al., 2023; Pooladian et al., 2023). However, despite the empirical success of simulation-free CNFs based on Gaussian denoising, a rigorous theoretical analysis of these CNFs has received limited attention thus far. In this work, we explore an ODE flow-based approach for generative modeling, which we refer to as Gaussian Interpolation Flows (GIFs). This method is derived from the Gaussian stochastic interpolation detailed in Section 3. GIFs represent a straightforward extension of the stochastic interpolation method (Albergo and Vanden-Eijnden, 2023; Liu et al., 2023; Lipman et al., 2023). They can be considered a class of CNFs and encompass various ODE flows as special cases. According to the classical Cauchy-Lipschitz theorem, also known as the Picard-Lindel of theorem (Hartman, 2002b, Theorem 1.1), a unique solution to the initial value problem for an ODE flow exists if the velocity field is continuous in the time variable and uniformly Lipschitz continuous in the space variable. In the case of GIFs, the velocity field depends on the score function of the push-forward measure. Therefore, it remains to be shown that this velocity field satisfies the regularity conditions stipulated Gaussian Interpolation Flows by the Cauchy-Lipschitz theorem. These regularity conditions are commonly assumed in the literature when analyzing the convergence properties of CNFs or general neural ODEs (Chen et al., 2018; Biloˇs et al., 2021; Marion et al., 2023; Marion, 2023; Marzouk et al., 2023). However, there is a theoretical gap in understanding how to translate these regularity conditions on velocity fields into conditions on target distributions. 
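For reference, the classical criterion can be written out for the flows considered here; the following is a standard paraphrase of the Cauchy–Lipschitz (Picard–Lindelöf) theorem rather than a quotation of the cited statement.

```latex
\[
  \frac{\mathrm{d}X_t}{\mathrm{d}t} = v(t, X_t), \qquad X_0 = x_0 \in \mathbb{R}^d, \qquad t \in [0,1].
\]
If $v(\cdot, x)$ is continuous in $t$ for every $x$ and $v(t, \cdot)$ is uniformly Lipschitz in $x$,
that is, there exists $L > 0$ such that
\[
  \|v(t, x) - v(t, y)\| \le L \, \|x - y\| \qquad \text{for all } t \in [0,1], \ x, y \in \mathbb{R}^d,
\]
then this initial value problem admits a unique solution $(X_t)_{t \in [0,1]}$ for every initial point $x_0$.
```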
The main focus of this work is to study and establish the theoretical properties of Gaussian interpolation flow and its corresponding flow map. We show that the regularity conditions of the Cauchy-Lipschitz theorem are satisfied for several rich classes of probability distributions using variance inequalities. Based on the obtained regularity results, we further expose the well-posedness of GIFs, the Lipschitz continuity of flow mappings, and applications to generative modeling. The well-posedness results are crucial for studying the approximation and convergence properties of GIFs learned with the flow or score matching method. When applied to generative modeling, our results further elucidate the auto-encoding and cycle consistency properties exhibited by GIFs.

Figure 1: Roadmap of the main results, connecting geometric regularity (Assumption 2.2), Lipschitz velocity fields (Proposition 4.1), well-posedness (Theorems 5.1, 5.2), Lipschitz flow maps (Propositions 5.1, 5.2), auto-encoding and cycle consistency, and stability in source distributions and in velocity fields (Propositions 6.1, 6.2), together with Corollaries 5.1, 6.3, and B.1.

1.1 Our main contributions

We provide an overview of the main results in Figure 1, in which we indicate the assumptions used in our analysis and the relationship between the results. We also summarize our main contributions below.

In Section 3, we extend the framework of stochastic interpolation proposed in Albergo and Vanden-Eijnden (2023). Various ODE flows can be considered special cases of the extended framework. We prove that the marginal distributions of GIFs satisfy the continuity equation converging to the target distribution in the weak sense. Several explicit formulas of the velocity field and its derivatives are derived, which can facilitate computation and regularity estimation.

In Sections 4 and 5, we establish the spatial Lipschitz regularity of the velocity field for a range of target measures with rich structures, which is sufficient to guarantee the well-posedness of GIFs. Additionally, we deduce the Lipschitz regularity of both the flow map and its time-reversed counterpart. The well-posedness of GIFs is an essential attribute, serving as a foundational requirement for investigating numerical solutions of GIFs. It is important to note that while the flow maps are demonstrated to be Lipschitz continuous transport maps for generative modeling, the Lipschitz regularity for optimal transport maps has only been partially established to date.

In Section 6, we show that the auto-encoding and cycle consistency properties of GIFs are inherently satisfied when the flow maps exhibit Lipschitz continuity with respect to the spatial variable. This demonstrates that exact auto-encoding and cycle consistency are intrinsic characteristics of GIFs. Our findings lend theoretical support to the findings made by Su et al. (2023), as illustrated in Figures 3 and 4.

In Section 6, we conduct the stability analysis of GIFs, examining how they respond to changes in source distributions and to perturbations in the velocity field. This analysis, conducted in terms of the quadratic Wasserstein distance, provides valuable insights that justify the use of learning techniques such as Gaussian initialization and flow or score matching.

2. Preliminaries

In this section, we include several preliminary setups to show notations, basic assumptions, and several useful variance inequalities.

Notation. Here we summarize the notation.
The space R^d is endowed with the Euclidean metric, and we denote by ∥·∥ and ⟨·,·⟩ the corresponding norm and inner product. Let S^{d−1} := {x ∈ R^d : ∥x∥ = 1}. For a matrix A ∈ R^{k×d}, we use A^⊤ for the transpose, and the spectral norm is denoted by ∥A∥_{2,2} := sup_{x∈S^{d−1}} ∥Ax∥. For a square matrix A ∈ R^{d×d}, we use det(A) for the determinant and Tr(A) for the trace. We use I_d to denote the d×d identity matrix. For two symmetric matrices A, B ∈ R^{d×d}, we denote A ⪰ B or B ⪯ A if A − B is positive semi-definite. For two vectors x, y ∈ R^d, we denote x ⊗ y := xy^⊤. For Ω₁ ⊆ R^k, Ω₂ ⊆ R^d, and n ≥ 1, we denote by C^n(Ω₁; Ω₂) the space of continuous functions f : Ω₁ → Ω₂ that are n times differentiable and whose partial derivatives of order n are continuous. If Ω₂ ⊆ R, we simply write C^n(Ω₁). For any f(x) ∈ C²(R^d), let ∇_x f, ∇²_x f, ∇_x · f, and Δ_x f denote its gradient, Hessian, divergence, and Laplacian, respectively. We use X ≲ Y to denote X ≤ CY for some constant C > 0. The function composition operation is written as (g ∘ f)(x) := g(f(x)) for functions f and g. The Borel σ-algebra of R^d is denoted by B(R^d). The space of probability measures defined on (R^d, B(R^d)) is denoted as P(R^d). For any R^d-valued random variable X, we use E[X] and Cov(X) to denote its expectation and covariance matrix, respectively. We use µ ∗ ν to denote the convolution of any two probability measures µ and ν, and we write X =ᵈ Y to indicate that two random variables have the same probability distribution. For a random variable X, let Law(X) denote its probability distribution. Let g : R^k → R^d be a measurable mapping and µ be a probability measure on R^k. The push-forward measure g#µ is defined by g#µ(A) := µ(g^{−1}(A)) for any measurable set A. Let N(m, Σ) denote a d-dimensional Gaussian distribution with mean vector m ∈ R^d and covariance matrix Σ ∈ R^{d×d}. For simplicity, let γ_{d,σ²} := N(0, σ²I_d), and let ϕ_{m,σ²}(x) denote the probability density function of N(m, σ²I_d) with respect to the Lebesgue measure. If m = 0 and σ = 1, we abbreviate these as γ_d and ϕ(x). Let L^p(R^d; R^ℓ, µ) denote the L^p space with the L^p norm for p ∈ [1, ∞] w.r.t. a measure µ. To simplify the notation, we write L^p(R^d, µ) if ℓ = 1, L^p(R^d; R^ℓ) if the Lebesgue measure is used, and L^p(R^d) if both hold.

2.1 Assumptions

We focus on probability distributions satisfying several types of weak-convexity assumptions, which offer a geometric notion of regularity that is dimension-free in the study of high-dimensional distributions (Klartag, 2010). On the one hand, weak-convexity regularity conditions are useful in deriving dimension-free guarantees for generative modeling and sampling from high-dimensional distributions. On the other hand, they accommodate distributions with complex shapes, including those with multiple modes.

Definition 2.1 (Cattiaux and Guillin, 2014) A probability measure µ(dx) = exp(−U(x))dx is κ-semi-log-concave for some κ ∈ R if its support Ω ⊆ R^d is convex and its potential function U ∈ C²(Ω) satisfies

∇²_x U(x) ⪰ κ I_d, ∀x ∈ Ω.

The κ-semi-log-concavity condition is a relaxed notion of log-concavity, since here κ < 0 is allowed. When κ ≥ 0, we are considering a log-concave probability measure that is proved to be unimodal (Saumard and Wellner, 2014). However, when κ < 0, a κ-semi-log-concave probability measure can be multimodal.

Definition 2.2 (Eldan and Lee, 2018) A probability measure µ(dx) = exp(−U(x))dx is β-semi-log-convex for some β > 0 if its support Ω ⊆ R^d is convex and its potential function U ∈ C²(Ω) satisfies

∇²_x U(x) ⪯ β I_d, ∀x ∈ Ω.

The following definition of L-log-Lipschitz continuity is a variant of L-Lipschitz continuity.
It characterizes a first-order condition on the target function rather than a second-order condition such as κ-semi-log-concavity and β-semi-log-convexity in Definitions 2.1 and 2.2. Definition 2.3 A function f : Rd R+ is L-log-Lipschitz continuous if its logarithm is L-Lipschitz continuous for some L 0. Based on the definitions, we present two assumptions on the target distribution. Assumption 2.1 concerns the absolute continuity and the moment condition. Assumption 2.2 imposes geometric regularity conditions. Assumption 2.1 The probability measure ν is absolutely continuous with respect to the Lebesgue measure and has a finite second moment. Gao, Huang, and Jiao Assumption 2.2 Let D := (1/ 2)diam(supp(ν)). The probability measure ν satisfies one or more of the following conditions: (i) ν is β-semi-log-convex for some β > 0 and κ-semi-log-concave for some κ > 0 with supp(ν) = Rd; (ii) ν is κ-semi-log-concave for some κ R with D (0, ); (iii) ν = γd,σ2 ρ where ρ is a probability measure supported on a Euclidean ball of radius R on Rd; (iv) ν is β-semi-log-convex for some β > 0, κ-semi-log-concave for some κ 0, and dν dγd (x) is L-log-Lipschitz in x for some L 0 with supp(ν) = Rd. Multimodal distributions. Assumption 2.2 enumerates scenarios where probability distributions are endowed with geometric regularity. We examine the scenarios and clarify whether they cover multimodal distributions. Scenario (i) is referred to as the classical strong log-concavity case (κ > 0), and thus, describes unimodal distributions. Scenario (ii) allows κ 0 and requires that the support is bounded. Mixtures of Gaussian distributions are considered in Scenario (iii), and typically are multimodal distributions. Scenario (iv) also allows κ 0 when considering a log-Lipschitz perturbation of the standard Gaussian distribution. Both Scenario (ii) and Scenario (iv) incorporate multimodal distributions due to the potential negative lower bound κ. Lipschitz score. Lipschitz continuity of the score function is a basic regularity assumption on target distributions in the study of sampling algorithms based on Langevin and Hamiltonian dynamics. Even for high-dimensional distributions, this assumption endows a great source of regularity. For an L-Lipschitz score function, its corresponding distribution is both L-semi-log-convex and ( L)-semi-log-concave for some L 0. 2.2 Variance inequalities Variance inequalities like the Brascamp-Lieb inequality and the Cram er-Rao inequality are fundamental inequalities for explaining the regularizing effect of Gaussian denoising. Combined with κ-semi-log-concavity and β-semi-log-convexity, these inequalities are crucial for deducing the Lipschitz regularity of the velocity fields of GIFs in Proposition 4.1-(b) and (c). Lemma 2.1 (Brascamp-Lieb inequality) Let µ(dx) = exp( U(x))dx be a probability measure on a convex set Ω Rd whose potential function U : Ω R is of class C2 and strictly convex. Then for every locally Lipschitz function f L2(Ω, µ), Varµ(f) Eµ xf, ( 2 x U) 1 xf . (2.1) When applied to functions of the form f : x 7 x, e for any e Sd 1, the Brascamp Lieb inequality yields an upper bound of the covariance matrix Covµ(X) Eµ ( 2 x U(x)) 1 (2.2) Gaussian Interpolation Flows with equality if X N(m, Σ) with Σ positive definite. 
Under the strong log-concavity condition, that is, µ is κ-semi-log-concave with κ > 0, and if the Euclidean Bakry Emery criterion is satisfied (Bakry and Emery, 1985), the Brascamp-Lieb inequality instantly recovers the Poincar e inequality (see Definition G.2). The Brascamp-Lieb inequality originally appeared in (Brascamp and Lieb, 1976, Theorem 4.1). Alternative proofs are provided in Bobkov and Ledoux (2000); Bakry et al. (2014); Cordero-Erausquin (2017). The dimension-free inequality (2.1) can be further strengthened to obtain several variants with dimensional improvement. Lemma 2.2 (Cram er-Rao inequality) Let µ(dx) = exp( U(x))dx be a probability measure on Rd whose potential function U : Rd R is of class C2. Then for every f C1(Rd), Varµ(f) Eµ[ xf], Eµ[ 2 x U] 1 Eµ[ xf] . (2.3) When applied to functions of the form f : x 7 x, e for any e Sd 1, the Cram er-Rao inequality yields a lower bound of the covariance matrix Covµ(X) Eµ[ 2 x U(x)] 1 (2.4) with equality as well if X N(m, Σ) with Σ positive definite. The Cram er-Rao inequality plays a central role in asymptotic statistics as well as in information theory. The inequality (2.4) has an alternative derivation from the Cram er Rao bound for the location parameter. For detailed proofs of the Cram er-Rao inequality, readers are referred to Chewi and Pooladian (2022); Dai et al. (2023), and the references therein. 3. Gaussian interpolation flows Simulation-free CNFs represent a potent class of generative models based on ODE flows. Albergo and Vanden-Eijnden (2023) and Albergo et al. (2023b) introduce an innovative CNF that is constructed using stochastic interpolation techniques, such as Gaussian denoising. They conduct a thorough investigation of this flow, particularly examining its applications and effectiveness in generative modeling. We study the ODE flow and its associated flow map as defined by the Gaussian denoising process. This process has been explored from various perspectives, including diffusion models and stochastic interpolants. Building upon the work of Albergo and Vanden-Eijnden (2023) and Albergo et al. (2023b), we expand the stochastic interpolant framework by relaxing certain conditions on the functions at and bt, offering a more comprehensive perspective on the Gaussian denoising process. In our generalization, we introduce an adaptive starting point to the stochastic interpolation framework, which allows for greater flexibility in the modeling process. By examining this modified framework, we aim to demonstrate that the Gaussian denoising principle is effectively implemented within the context of stochastic interpolation. Definition 3.1 (Vector interpolation) Let z Rd, x1 Rd be two vectors in the Euclidean space and let x0 := a0z +b0x1 with a0 > 0, b0 0. Then we construct an interpolant Gao, Huang, and Jiao between x0 and x1 over time t [0, 1] through It(x0, x1), defined by It(x0, x1) = atz + btx1, (3.1) where at, bt satisfy at 0, bt 0, a0 > 0, b0 0, a1 = 0, b1 = 1, at > 0 for any t (0, 1), bt > 0 for any t (0, 1), at, bt C2([0, 1)), a2 t C1([0, 1]), bt C1([0, 1]). Remark 3.1 Compared with the vector interpolant defined by Albergo and Vanden-Eijnden (2023) (a.k.a. one-sided interpolant in Albergo et al. (2023b)), we extend its definition by relaxing the requirements that a0 = 1, b0 = 0 with a0 > 0, b0 0. This consideration is largely motivated by analyzing the probability flow ODEs of the variance-exploding (VE) SDE and the variance-preserving (VP) SDE (Song et al., 2021b). 
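To make the admissible coefficient schedules in Definition 3.1 concrete, the sketch below encodes three schedules that will appear in Table 1 (linear, Föllmer, and trigonometric) and numerically checks the boundary conditions together with the monotonicity of the signal-to-noise ratio b_t²/a_t²; the function names and tolerances are our own illustrative choices, not part of the paper.

```python
import numpy as np

# Illustrative coefficient schedules (a_t, b_t); each maps an array of times
# t in [0, 1] to the pair (a_t, b_t) appearing in Definition 3.1.
schedules = {
    "linear":        lambda t: (1.0 - t, t),
    "follmer":       lambda t: (np.sqrt(1.0 - t**2), t),
    "trigonometric": lambda t: (np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)),
}

def check_schedule(name, ab, n=1001, tol=1e-8):
    """Check a_1 = 0, b_1 = 1, positivity on (0, 1), and that the
    signal-to-noise ratio b_t^2 / a_t^2 is non-decreasing in t."""
    t = np.linspace(0.0, 1.0, n)
    a, b = ab(t)
    assert abs(a[-1]) < tol and abs(b[-1] - 1.0) < tol   # a_1 = 0, b_1 = 1
    assert np.all(a[1:-1] > 0) and np.all(b[1:-1] > 0)   # positivity on (0, 1)
    snr = b[1:-1] ** 2 / a[1:-1] ** 2
    assert np.all(np.diff(snr) > -tol)                   # SNR is non-decreasing
    print(f"{name:14s} a_0 = {a[0]:.2f}, b_0 = {b[0]:.2f}: conditions hold")

for name, ab in schedules.items():
    check_schedule(name, ab)
```

The VE and VP schedules from Table 1 could be checked in the same way once a concrete choice of α_t is fixed.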
We illustrate examples of interpolants incorporated by Definition 3.1 in Table 1. Remark 3.2 We have eased the smoothness conditions for the functions at and bt required in Albergo and Vanden-Eijnden (2023). Specifically, we consider the case where at, bt C2([0, 1)), a2 t C1([0, 1]), and bt C1([0, 1]). This relaxation enables us to include the F ollmer flow into our framework, characterized by at = 1 t2 and bt = t. It is evident that at = 1 t2 does not fulfill the condition at C2([0, 1]), but it does meet the requirements at C2([0, 1)) and a2 t C1([0, 1]). Remark 3.3 The C2 regularity of at, bt is necessary to derive the regularity of the velocity field v(t, x) in Eq. (3.5) concerning the time variable t. In addition, the C1 regularity of a2 t , bt is sufficient to ensure the Lipschitz regularity of the velocity field v(t, x) in Eq. (3.5) concerning the space variable x. A natural generalization of the vector interpolant (3.1) is to construct a set interpolant between two convex sets through Minkowski sum, which is common in convex geometry. A set interpolant stimulates the construction of a measure interpolant between a structured source measure and a target measure. As noted, we can construct a measure interpolation using a Gaussian convolution path. The measure interpolation is particularly relevant to Gaussian denoising and Gaussian channels in information theory as elucidated in Remark 3.8. Because of this connection with Gaussian denoising, we call the measure interpolation a Gaussian stochastic interpolation. The Gaussian stochastic interpolation can be understood as a collection of linear combinations of a standard Gaussian random variable and the target random variable. The coefficients of the linear combinations vary with time t [0, 1] as shown in Definition 3.1. Later in this section, we will show this Gaussian stochastic interpolation can be transformed into a deterministic ODE flow. Gaussian stochastic interpolation has been investigated from several perspectives in the literature. The rectified flow has been proposed in Liu et al. (2023), and its theoretical connection with optimal transport has been investigated in Liu (2022). The formulation of the rectified flow is to learn the ODE flow defined by stochastic interpolation with linear Gaussian Interpolation Flows time coefficients. In Appendix C of Liu et al. (2023), there is a nonlinear extension of the rectified flow in which the linear coefficients are replaced by general nonlinear coefficients. Albergo et al. (2023b) extends the stochastic interpolant framework proposed in (Albergo and Vanden-Eijnden, 2023) by considering a linear combination among three random variables. In Section 3 of Albergo et al. (2023b), the original stochastic interpolant framework is recovered as a one-sided interpolant between the Gaussian distribution and the target distribution. Moreover, Lipman et al. (2023) propose a flow matching method which directly learns a Gaussian conditional probability path with a neural ODE. In Section 4.1 of Lipman et al. (2023), the velocity fields of the variance exploding and variance preserving probability flows are shown as special instances of the flow matching framework. We summarize these formulations as Gaussian stochastic interpolation by slightly extending the original stochastic interpolant framework. 
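Before turning to the concrete coefficient choices collected in Table 1 below, the following toy sketch shows how samples from the marginal X_t = a_t Z + b_t X_1 of a Gaussian stochastic interpolation are drawn in practice; the one-dimensional two-mode target and the linear schedule are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_target(n):
    """Toy one-dimensional target: an equal mixture of N(-2, 0.1^2) and N(2, 0.1^2)."""
    centers = rng.choice(np.array([-2.0, 2.0]), size=n)
    return centers + 0.1 * rng.standard_normal(n)

def interpolation_marginal(t, n=10_000):
    """Draw n samples from X_t = a_t Z + b_t X_1 with the linear schedule a_t = 1 - t, b_t = t."""
    a_t, b_t = 1.0 - t, t
    z = rng.standard_normal(n)   # Z ~ N(0, 1), independent of X_1
    x1 = sample_target(n)        # X_1 ~ nu
    return a_t * z + b_t * x1

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    xt = interpolation_marginal(t)
    print(f"t = {t:.2f}: mean = {xt.mean():+.3f}, std = {xt.std():.3f}")
```

At t = 0 the samples are pure Gaussian noise, and as t increases the noise level shrinks and the two modes of the target emerge.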
| Type | VE | VP | Linear | Föllmer | Trigonometric |
| --- | --- | --- | --- | --- | --- |
| a_t | α_t | α_t | 1 − t | √(1 − t²) | cos(πt/2) |
| b_t | 1 | √(1 − α_t²) | t | t | sin(πt/2) |
| a_0 | α_0 | α_0 | 1 | 1 | 1 |
| b_0 | 1 | √(1 − α_0²) | 0 | 0 | 0 |
| Source | Convolution | Convolution | γ_d | γ_d | γ_d |

Table 1: Summary of various measure interpolants including the VE interpolant (Song et al., 2021b), the VP interpolant (Song et al., 2021b), the linear interpolant (Liu et al., 2023), the Föllmer interpolant (Dai et al., 2023), and the trigonometric interpolant (Albergo and Vanden-Eijnden, 2023). There are two types of source measures: the standard Gaussian distribution γ_d, and a convolution-type distribution formed from the target distribution and γ_d.

Definition 3.2 (Measure interpolation) Let µ = Law(X_0) and ν = Law(X_1) be two probability measures satisfying X_0 = a_0 Z + b_0 X_1, where Z ∼ γ_d := N(0, I_d) is independent of X_1. We call (X_t)_{t∈[0,1]} a Gaussian stochastic interpolation from the source measure µ to the target measure ν, which is defined through I_t over the time interval [0, 1] as follows:

X_t = I_t(X_0, X_1), X_0 = a_0 Z + b_0 X_1, Z ∼ γ_d, X_1 ∼ ν. (3.3)

Remark 3.4 It is obvious that the marginal distribution of X_t satisfies X_t =ᵈ a_t Z + b_t X_1 with Z ∼ γ_d, X_1 ∼ ν.

Motivated by the time-varying properties of the Gaussian stochastic interpolation, we derive that its marginal flow satisfies the continuity equation. This result characterizes the dynamics of the marginal density flow of the Gaussian stochastic interpolation.

Theorem 3.1 Suppose that Assumption 2.1 holds. Then the marginal flow (p_t)_{t∈[0,1]} of the Gaussian stochastic interpolation (X_t)_{t∈[0,1]} between µ and ν satisfies the continuity equation

∂_t p_t + ∇_x · (p_t v(t, x)) = 0, (t, x) ∈ [0, 1] × R^d, p_0(x) = dµ/dx (x), p_1(x) = dν/dx (x) (3.4)

in the weak sense with the velocity field

v(t, x) := E[ȧ_t Z + ḃ_t X_1 | X_t = x], t ∈ (0, 1), (3.5)

v(0, x) := lim_{t↓0} v(t, x), v(1, x) := lim_{t↑1} v(t, x). (3.6)

Remark 3.5 We notice that x = a_t E[Z | X_t = x] + b_t E[X_1 | X_t = x] due to Eq. (3.3). Then it holds that

v(t, x) = (ȧ_t/a_t) x + (ḃ_t − (ȧ_t/a_t) b_t) E[X_1 | X_t = x], t ∈ (0, 1). (3.7)

We also notice that, according to Tweedie's formula (cf. Lemma G.1 in the appendix), it holds that

s(t, x) = (b_t/a_t²) E[X_1 | X_t = x] − (1/a_t²) x, t ∈ (0, 1), (3.8)

where s(t, x) is the score function of the marginal distribution of X_t ∼ p_t. Combining (3.7) and (3.8), it follows that the velocity field is a gradient field and its nonlinear term is the score function s(t, x), namely, for any t ∈ (0, 1),

v(t, x) = (ḃ_t/b_t) x + ((ḃ_t/b_t) a_t² − a_t ȧ_t) s(t, x). (3.9)

Remark 3.6 A relevant result has been provided in the proof of (Albergo and Vanden-Eijnden, 2023, Proposition 4) in the restricted case that a_0 = 1, b_0 = 0. In this case, if ȧ_0, ȧ_1, ḃ_0, ḃ_1 are well-defined, the velocity field reads v(0, x) = ȧ_0 x + ḃ_0 E_ν[X_1] and v(1, x) = ḃ_1 x + ȧ_1 E_{γ_d}[Z] at times 0 and 1. Otherwise, if any one of ȧ_0, ȧ_1, ḃ_0, ḃ_1 is not well-defined, the velocity field v(0, x) or v(1, x) should be considered on a case-by-case basis. In addition, we provide an alternative viewpoint of the relationship between the velocity field associated with stochastic interpolation and the score function of its marginal flow using Tweedie's formula in Lemma G.1.

Remark 3.7 (Diffusion process) The marginal flow of the Gaussian stochastic interpolation (3.3) coincides with the time-reversed marginal flow of a diffusion process (X̃_t)_{t∈[0,1)} (Albergo et al., 2023b, Theorem 3.5) defined by

dX̃_t = −(ḃ_{1−t}/b_{1−t}) X̃_t dt + √( 2 ḃ_{1−t} a²_{1−t}/b_{1−t} − 2 a_{1−t} ȧ_{1−t} ) dW̃_t.

Remark 3.8 (Gaussian denoising) The Gaussian stochastic interpolation has an information-theoretic interpretation as a time-varying Gaussian channel.
Here a2 t and b2 t /a2 t stand for the noise level and signal-to-noise ratio (SNR) for time t [0, 1], respectively. As time t 1, we are approaching the high-SNR regime, that is, the SNR b2 t /a2 t grows to . Moreover, the SNR b2 t /a2 t is monotonically increasing in time t over [0, 1]. The Gaussian noise level gets reduced through this Gaussian denoising process. Gaussian Interpolation Flows Figure 2: Snapshots of a Gaussian interpolation flow based on the F ollmer interpolant. The source distribution is the standard two-dimensional Gaussian distribution γ2, and the target distribution is a mixture of six two-dimensional Gaussian distributions as the shape of a circle. The image panels are placed sequentially from time t = 0 to time t = 1. We are now ready to define Gaussian interpolation flows by representing the continuity equation (3.4) with Lagrangian coordinates (Ambrosio and Crippa, 2014). A basic observation is that GIFs share the same marginal density flow with Gaussian stochastic interpolations. The continuity equation (3.4) plays a central role in the derandomization procedure from Gaussian stochastic interpolations to GIFs. We additionally illustrate GIFs using a two-dimensional example as in Figure 2. Definition 3.3 (Gaussian interpolation flow) Suppose that probability measure ν satisfies Assumption 2.1. If (Xt)t [0,1] solves the initial value problem (IVP) dt (x) = v(t, Xt(x)), X0(x) µ, t [0, 1], (3.10) where µ is defined in Definition 3.2 and the velocity field v is given by Eq. (3.5) and (3.6), we call (Xt)t [0,1] a Gaussian interpolation flow associated with the target measure ν. 4. Spatial Lipschitz estimates for the velocity field We have explicated the idea of Gaussian denoising with the procedure of Gaussian stochastic interpolation or a Gaussian channel with increasing SNR w.r.t. time. By interpreting the process as an ODE flow, we derive the framework of Gaussian interpolation flows. First and foremost, an intuition is that the regularizing effect of Gaussian denoising would ensure Gao, Huang, and Jiao the Lipschitz smoothness of the velocity field. Since the standard Gaussian distribution is both 1-semi-log-concave and 1-semi-log-convex, its convolution with a target distribution will maintain its high regularity as long as the target distribution satisfies the regularity conditions. We rigorously justify this intuition by establishing spatial Lipschitz estimates for the velocity field. These estimates are established based on the upper bounds and lower bounds regarding the Jacobian matrix of the velocity field v(t, x) according to the Cauchy Lipschitz theorem, which are given in Proposition 4.1 below. To deal with the Jacobian matrix xv(t, x), we introduce a covariance expression of it and present the associated upper bounds and lower bounds. The velocity field v(t, x) is decomposed into a linear term and a nonlinear term, the score function s(t, x). To analyze the Jacobian xv(t, x), we only need to focus on xs(t, x), that is, 2 x log pt(x). To ease the notation, we would henceforth use Y for X1. Correspondingly, we replace p1(x) with p1(y) for the density function of Y. According to Bayes theorem, the marginal density pt of Xt satisfies pt(x) = Z p(t, x|y)p1(y)dy where Y p1(y) and p(t, x|y) = ϕbty,a2 t (x) is a conditional distribution induced by the Gaussian noise. 
Due to the factorization pt(x)p(y|t, x) = p(t, x|y)p1(y), the score function s(t, x) and its derivative xs(t, x) have the following expressions s(t, x) = x log p(y|t, x) x bty a2 t , xs(t, x) = 2 x log p(y|t, x) 1 Thanks to the expressions above, a covariance matrix expression of xs(t, x) is endowed by the exponential family property of p(y|t, x). Lemma 4.1 The conditional distribution p(y|t, x) is an exponential family distribution and a covariance matrix expression of the log-Hessian matrix 2 x log p(y|t, x) for any t (0, 1) is given by 2 x log p(y|t, x) = b2 t a4 t Cov(Y|Xt = x), (4.1) where Cov(Y|Xt = x) is the covariance matrix of Y|Xt = x p(y|t, x). Moreover, for any t (0, 1), it holds that xs(t, x) = b2 t a4 t Cov(Y|Xt = x) 1 a2 t Id, (4.2) and that xv(t, x) = b2 t a2 t Cov(Y|Xt = x) + at at Id. (4.3) Remark 4.1 Since t b2 t a2 t = 2b2 t a2 t , it follows from (4.3) that the derivative of the SNR with respect to time t controls the dependence of xv(t, x) on Cov(Y|Xt = x). The representation (4.3) can be used to upper bound and lower bound xv(t, x). This technique has been widely used to deduce the regularity of the score function concerning the space variable (Mikulincer and Shenfeld, 2021, 2023; Chen et al., 2023b; Lee et al., 2023; Gaussian Interpolation Flows Chen et al., 2023a). The covariance matrix expression (4.2) of the score function has a close connection with the Hatsell-Nolte identity in information theory (Hatsell and Nolte, 1971; Palomar and Verd u, 2005; Wu and Verd u, 2011; Cai and Wu, 2014; Wibisono et al., 2017; Wibisono and Jog, 2018a,b; Dytso et al., 2023a,b). Employing the covariance expression in Lemma 4.1, we establish several bounds on xv(t, x) in the following proposition. Proposition 4.1 Let ν(dy) = p1(y)dy be a probability measure on Rd with D := (1/ 2)diam(supp(ν)). (a) For any t (0, 1), at at Id xv(t, x) ( bt(at bt atbt) a3 t D2 + at (b) Suppose that p1 is β-semi-log-convex with β > 0 and supp(p1) = Rd. Then for any t (0, 1], xv(t, x) βat at + bt bt βa2 t + b2 t Id. (c) Suppose that p1 is κ-semi-log-concave with κ R. Then for any t (t0, 1], xv(t, x) κat at + bt bt κa2 t + b2 t Id, where t0 is the root of the equation κ + b2 t a2 t = 0 over t (0, 1) if κ < 0 and t0 = 0 if κ 0. (d) Fix a probability measure ρ on Rd supported on a Euclidean ball of radius R, and let ν := γd,σ2 ρ with σ > 0. Then for any t (0, 1), atat + σ2 btbt a2 t + σ2b2 t Id xv(t, x) ( atbt(at bt atbt) (a2 t + σ2b2 t )2 R2 + atat + σ2 btbt a2 t + σ2b2 t (e) Suppose that dν dγd (x) is L-log-Lipschitz for some L 0. Then for any t (0, 1), bt a2 t atat Bt L2 bt a2 t +b2 t 2 + atat+ btbt xv(t, x) n bt bt a2 t atat Bt + atat+ btbt where Bt := 5Lbt(a2 t + b2 t ) 3 2 (L + (log( p a2 t + b2 t /bt)) 1 Comparing part (a) with part (d) in Proposition 4.1, we can see that the bounds in (a) are consistent with those in (d) in the sense that (a) is a limiting case of part (d) as σ 0. The lower bound in part (a) blows up at time t = 1 owing to a1 = 0, while in part (d) it Gao, Huang, and Jiao behaves well since the lower bound in part (d) coincides with a lower bound indicated by the 1 σ2 -semi-log-convex property. It reveals that the regularity of the velocity field v(t, x) with respect to the space variable x improves when the target random variable is bounded and is subject to Gaussian perturbation. 
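As a sanity check on the covariance representation behind Lemma 4.1, the following sketch evaluates the posterior mean and covariance in closed form for a toy two-dimensional Gaussian-mixture target, assembles the velocity field via Eq. (3.7) and its Jacobian via the conditional covariance, and compares the latter with finite differences. The mixture parameters, the Föllmer schedule, and all function names are our own illustrative choices, and the Jacobian coefficient is written out from Eqs. (3.7) and (3.8) rather than copied from Eq. (4.3).

```python
import numpy as np

# Toy 2-D Gaussian-mixture target: nu = (1/K) * sum_j N(mu_j, sigma^2 I), equal weights.
K, sigma = 3, 0.3
mus = np.array([[2.0, 0.0], [-1.0, 1.5], [0.0, -2.0]])

def follmer(t):
    """Föllmer schedule a_t = sqrt(1 - t^2), b_t = t, together with time derivatives."""
    a, b = np.sqrt(1.0 - t**2), t
    da, db = -t / np.sqrt(1.0 - t**2), 1.0
    return a, b, da, db

def posterior_moments(t, x):
    """Closed-form E[X_1 | X_t = x] and Cov(X_1 | X_t = x) for the mixture target."""
    a, b, _, _ = follmer(t)
    s2 = a**2 + (b * sigma) ** 2                      # marginal variance of each component
    logw = -np.sum((x - b * mus) ** 2, axis=1) / (2 * s2)
    w = np.exp(logw - logw.max()); w /= w.sum()       # responsibilities r_j(x)
    gain = b * sigma**2 / s2
    cond_means = mus + gain * (x - b * mus)           # per-component conditional means
    cond_cov = (sigma**2 * a**2 / s2) * np.eye(2)     # per-component conditional covariance
    mean = w @ cond_means
    spread = cond_means - mean
    cov = cond_cov + (w[:, None, None] * np.einsum("ji,jk->jik", spread, spread)).sum(0)
    return mean, cov

def velocity_and_jacobian(t, x):
    """v(t, x) from the posterior mean, and its Jacobian via the conditional covariance."""
    a, b, da, db = follmer(t)
    mean, cov = posterior_moments(t, x)
    v = (da / a) * x + (db - da * b / a) * mean
    jac = (da / a) * np.eye(2) + (b * (a * db - da * b) / a**3) * cov
    return v, jac

# Finite-difference check of the covariance representation of the Jacobian.
t, x, eps = 0.6, np.array([0.5, -0.4]), 1e-5
_, jac = velocity_and_jacobian(t, x)
num = np.column_stack([
    (velocity_and_jacobian(t, x + eps * e)[0] - velocity_and_jacobian(t, x - eps * e)[0]) / (2 * eps)
    for e in np.eye(2)
])
print("max abs deviation:", np.abs(jac - num).max())  # small, limited only by finite-difference error
```

In the single-component case the conditional covariance is constant in x, which is consistent with the Gaussian equality case discussed next.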
The lower bound in part (b) and the upper bound in part (c) are tight in the sense that both of them are attainable for a Gaussian target distribution, that is, xv(t, x) = βat at + bt bt βa2 t + b2 t Id if ν = γd,1/β. The upper and lower bounds in Proposition 4.1-(a) and (e) become vacuous as they both blow up at time t = 1. The intuition behind is that the Jacobian matrix of the velocity field can be both lower and upper bounded at time t = 1 only if the score function of the target measure is Lipschitz continuous in the space variable x. Under an additional Lipschitz score assumption (equivalently, β-semi-log-convex and κ-semi-log-concave for some β = κ 0), the upper and lower bounds in part (a) and part (e) can be strengthened at time t = 1 based on the lower bound in (b) and the upper bound in part (c). According to Proposition 4.1-(a) and (c), there are two upper bounds available that shall be compared with each other. One is the D2-based bound in part (a), and the other is the κ-based bound in part (c). According to the proof of Proposition 4.1 given in the Appendix, these two upper bounds are equal if and only if the corresponding upper bounds on Cov(Y|Xt = x) are equal, that is, D2 = κ + b2 t a2 t Then the critical case is κD2 = 1 since simplifying Eq. (4.4) reveals that D 2 κ = b2 t a2 t . (4.5) We note that b2 t /a2 t , ranging over (0, ), is monotonically increasing w.r.t. t (0, 1). Suppose that κD2 > 1. Then (4.5) has no root over t (0, 1), which implies that the κ-based bound is tighter over [0, 1), i.e., D2 > κ + b2 t a2 t 1 , t [0, 1). Otherwise, suppose that κD2 < 1. Then (4.5) has a root t1 (0, 1), which implies that the D2-based bound is tighter over [0, t1), i.e., D2 < κ + b2 t a2 t 1 , t [0, t1), and that the κ-based bound is tighter over [t1, 1), i.e., D2 κ + b2 t a2 t 1 , t [t1, 1). Gaussian Interpolation Flows Next, we present several upper bounds on the maximum eigenvalue of the Jacobian matrix of the velocity field λmax( xv(t, x)) and its exponential estimates for studying the Lipschitz regularity of the flow maps as noted in Lemma 5.1. Corollary 4.1 Let ν be a probability measure on Rd with D := (1/ 2)diam(supp(ν)) and suppose that ν is κ-semi-log-concave with κ 0. (a) If κD2 1, then λmax( xv(t, x)) θt := κat at + bt bt κa2 t + b2 t , t [0, 1]. (4.6) (b) If κD2 < 1, then λmax( xv(t, x)) θt := at , t [0, t1), κat at+bt bt κa2 t +b2 t , t [t1, 1], (4.7) where t1 solves (4.5). Corollary 4.2 Let ν be a probability measure on Rd with D := (1/ 2)diam(supp(ν)) < and suppose that ν is κ-semi-log-concave with κ < 0. Then λmax( xv(t, x)) θt := at , t [0, t1), κat at+bt bt κa2 t +b2 t , t [t1, 1], (4.8) where t1 solves (4.5). Corollary 4.3 Fix a probability measure ρ on Rd supported on a Euclidean ball of radius R and let ν := γd,σ2 ρ with σ > 0. Then λmax( xv(t, x)) θt := atat + σ2 btbt a2 t + σ2b2 t + atbt(at bt atbt) (a2 t + σ2b2 t )2 R2. (4.9) Corollary 4.4 Suppose that ν is κ-semi-log-concave for some κ 0, and dν dγd (x) is L-log Lipschitz for some L 0. Then λmax( xv(t, x)) θt := bt a2 t atat Bt + atat+ btbt a2 t +b2 t , t [0, t2), κat at+bt bt κa2 t +b2 t , t [t2, 1], (4.10) where Bt := 5Lbt(a2 t + b2 t ) 3 2 (L + (log( p a2 t + b2 t /bt)) 1 2 ) and t2 (t0, 1). Gao, Huang, and Jiao 5. Well-posedness and Lipschtiz flow maps In this section, we study the well-posedness of GIFs and the Lipschitz properties of their flow maps. 
We also show that the marginal distributions of GIFs satisfy the log-Sobolev inequality and the Poincar e inequality if Assumptions 2.1 and 2.2 are satisfied. Theorem 5.1 (Well-posedness) Suppose Assumptions 2.1 and 2.2-(i), (iii), or (iv) are satisfied. Then there exists a unique solution (Xt)t [0,1] to the IVP (3.10). Moreover, the push-forward measure satisfies Xt#µ = Law(at Z + bt X1) with Z γd, X1 ν. Theorem 5.2 Suppose Assumptions 2.1 and 2.2-(ii) are satisfied. For any t (0, 1), there exists a unique solution (Xt)t [0,1 t] to the IVP (3.10). Moreover, the push-forward measure satisfies Xt#µ = Law(at Z + bt X1) with Z γd, X1 ν. Corollary 5.1 (Time-reversed flow) Suppose Assumptions 2.1 and 2.2-(i), (iii), or (iv) are satisfied. Then the time-reversed flow (X t )t [0,1] associated with ν is a unique solution to the IVP: d X t dt (x) = v(1 t, X t (x)), X 0(x) ν, t [0, 1]. (5.1) The push-forward measure satisfies X t #ν = Law(a1 t Z + b1 t X1) where Z γd, X1 ν. Moreover, the flow map satisfies X t (x) = X 1 t (x). Corollary 5.2 Suppose Assumptions 2.1 and 2.2-(ii) are satisfied. For any t (0, 1), the time-reversed flow (X t )t [t,1] associated with ν is a unique solution to the IVP: d X t dt (x) = v(1 t, X t (x)), X t (x) Law(a1 t Z + b1 t X1), t [t, 1], (5.2) where Z γd, X1 ν. The push-forward measure satisfies X t #ν = Law(a1 t Z + b1 t X1). Moreover, the flow map satisfies X t (x) = X 1 t (x). Based on the well-posedness of the flow, we can provide an upper bound on the Lipschitz constant of the induced flow map. Lemma 5.1 Suppose that a flow (Xt)t [0,1] is well-posed with a velocity field v(t, x) : [0, 1] Rd Rd of class C1 in x, and that for any (t, x) [0, 1] Rd, it holds xv(t, x) θt Id. Let the flow map Xs,t : Rd Rd be of class C1 in x for any 0 s t 1. Then the flow map Xs,t is Lipschitz continuous with an upper bound of its Lipschitz constant given by x Xs,t(x) 2,2 exp Z t s θudu . (5.3) Using Lemma 5.1, we show that the flow map of a GIF is Lipschitz continuous in the space variable x. Proposition 5.1 (Lipschitz mappings) Suppose that Assumptions 2.1 and 2.2-(i) hold. Gaussian Interpolation Flows (i) If ν is κ-semi-log-concave for some κ > 0, then the flow map X1(x) is a Lipschitz mapping, that is, x X1(x) 2,2 1 p κa2 0 + b2 0 , x Rd. In particular, if a0 = 1 and b0 = 0, then x X1(x) 2,2 1 κ, x Rd. (ii) If ν is β-semi-log-convex for some β > 0, then the time-reversed flow map X 1(x) is a Lipschitz mapping, that is, x X 1(x) 2,2 q βa2 0 + b2 0, x supp(ν). In particular, if a0 = 1 and b0 = 0, then x X 1(x) 2,2 p β, x supp(ν). Proposition 5.2 (Gaussian mixtures) Suppose that Assumptions 2.1 and 2.2-(iii) hold. Then the flow map X1(x) is a Lipschitz mapping, that is, x X1(x) 2,2 σ p a2 0 + σ2b2 0 exp a2 0 a2 0 + σ2b2 0 R2 In particular, if a0 = 1 and b0 = 0, then x X1(x) 2,2 σ exp R2 Moreover, the time-reversed flow map X 1(x) is a Lipschitz mapping, that is, x X 1(x) 2,2 q σ 2a2 0 + b2 0, x supp(ν). In particular, if a0 = 1 and b0 = 0, then x X 1(x) 2,2 1 σ, x supp(ν). Remark 5.1 Well-posed GIFs produce diffeomorphisms that transport the source measure onto the target measure. The diffeomorphism property of the transport maps are relevant to the auto-encoding and cycle consistency properties of their generative modeling applications. We defer a detailed discussion to Section 6. Early stopping implicitly mollifies the target measure with a small Gaussian noise. 
For image generation tasks (with bounded pixel values), the mollified target measure is indeed a Gaussian mixture distribution considered in Theorem 5.2. The regularity of the target measure largely gets enhanced through such mollification, especially when the target measure is supported on a low-dimensional manifold in accordance with the data manifold hypothesis. Therefore, although such a diffeomorphism X1(x) may not be well-defined for general bounded target measures, an off-the-shelf solution would be to perturb the target measure with a small Gaussian noise or to employ the early stopping technique. Both approaches will smooth the landscape of the target measure. Gao, Huang, and Jiao Proposition 5.3 Suppose the target measure ν satisfies the log-Sobolev inequality with constant CLS(ν). Then the marginal distribution of the GIF (pt)t [0,1] satisfies the log-Sobolev inequality, and its log-Sobolev constant CLS(pt) is bounded as CLS(pt) a2 t + b2 t CLS(ν). Moreover, suppose the target measure ν satisfies the Poincar e inequality with constant CP (ν). Then the marginal distribution of the GIF (pt)t [0,1] satisfies the Poincar e inequality, and its Poincar e constant CP(pt) is bounded as CP(pt) a2 t + b2 t CP(ν). The log-Sobolev and Poincar e inequalities (see Definitions G.1 and G.2) are fundamental tools for establishing convergence guarantees for Langevin Monte Carlo algorithms. From an algorithmic viewpoint, the predictor-corrector algorithm in score-based diffusion models and the corresponding probability flow ODEs essentially combine the ODE numerical solver (performing as the predictor) and the overdamped Langevin diffusion (performing as the corrector) to simulate samples from the marginal distributions (Song et al., 2021b). Proposition 5.3 shows that the marginal distributions all satisfy the log-Sobolev and Poincar e inequalities under mild assumptions on the target distribution. This conclusion suggests that Langevin Monte Carlo algorithms are certified to have convergence guarantees for sampling from the marginal distributions of GIFs. Furthermore, the target distributions covered in Assumption 2.2 are shown to satisfy the log-Sobolev and Poincar e inequalities (Mikulincer and Shenfeld, 2021; Dai et al., 2023; Fathi et al., 2023), which suggests that the assumptions of Proposition 5.3 generally hold. 6. Applications to generative modeling Auto-encoding is a primary principle in learning a latent representation with generative models (Goodfellow et al., 2016, Chapter 14). Meanwhile, the concept of cycle consistency is important to unpaired image-to-image translation between the source and target domains (Zhu et al., 2017). The recent work by Su et al. (2023) propose the dual diffusion implicit bridges (DDIB) for image-to-image translation, which shows a strong pattern of exact auto-encoding and image-to-image translation. DDIBs are built upon the denoising diffusion implicit models (DDIM), which share the same probability flow ODE with VESDE (considered as VE interpolant in Table 1), as pointed out by (Song et al., 2021a, Proposition 1). First, DDIBs attain latent embeddings of source images encoded with one DDIM operating in the source domain. The encoding embeddings are then decoded using another DDIM trained in the target domain to construct target images. The whole process consisting of two DDIMs seems to be cycle consistent up to numerical errors. Several phenomena of auto-encoding and cycle consistency are observed in the unpaired data generation procedure with DDIBs. 
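To make the auto-encoding property concrete at the sample level, the following self-contained sketch integrates a flow forward and backward for a toy Gaussian target, for which the velocity field is available in closed form, with early stopping at time 1 − δ in the spirit of Remark 5.1; the target parameters, the linear schedule, and the RK4 solver settings are our own illustrative choices.

```python
import numpy as np

# Toy setting with a closed-form velocity field:
# target nu = N(m, sigma^2 I) and linear schedule a_t = 1 - t, b_t = t.
m, sigma, delta = np.array([1.0, -1.0]), 0.5, 1e-3

def velocity(t, x):
    a, b, da, db = 1.0 - t, t, -1.0, 1.0
    s2 = a**2 + (b * sigma) ** 2
    post_mean = m + (b * sigma**2 / s2) * (x - b * m)   # E[X_1 | X_t = x]
    return (da / a) * x + (db - da * b / a) * post_mean

def integrate(x, t0, t1, steps=2000):
    """Classical RK4 integration of dx/dt = v(t, x) from t0 to t1 (t1 may be < t0)."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = velocity(t, x)
        k2 = velocity(t + h / 2, x + h / 2 * k1)
        k3 = velocity(t + h / 2, x + h / 2 * k2)
        k4 = velocity(t + h, x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

rng = np.random.default_rng(2)
x1 = m + sigma * rng.standard_normal(2)       # a draw from nu, treated as a state at time 1 - delta
latent = integrate(x1, 1.0 - delta, 0.0)      # encoder: time-reversed flow
recon = integrate(latent, 0.0, 1.0 - delta)   # decoder: forward flow
print("reconstruction error:", np.linalg.norm(recon - x1))  # small, up to solver error
```

Up to solver error, the decoded point recovers the encoded one, mirroring the identity established in Corollary 6.1 below.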
We replicate the two-dimensional experiments by Su et al. (2023) in Figures 3 and 4 to show the phenomena of approximate auto-encoding and cycle consistency of GIFs1. To elucidate the empirical auto-encoding and cycle consistency for measure transport, we derive 1. The implementation is based on the Git Hub repository at https://github.com/suxuann/ddib. Gaussian Interpolation Flows Figure 3: An illustration of auto-encoding using DDIBs. The Concentric Rings data in the source domain (the first panel) is encoded into the latent domain (the second panel), and then decoded into the source domain (the third panel). According to the consistent color pattern and pointwise correspondences across the domains, both the learned encoder mapping and the learned decoder mapping exhibit approximate Lipschitz continuity with respect to the space variable. One justification of such auto-encoding observation is presented in Corollary 6.1 where we prove that the composition of the encoder map and the decoder map yields an identity map. Corollaries 6.1 and 6.2 below and analyze the transport maps defined by GIFs (covering the probability flow ODE of VESDE used by DDIBs). We consider the continuous-time framework and the population level, which precludes learning errors including the time discretization errors and velocity field estimation errors, and show that the transport maps naturally possess the exact auto-encoding and cycle consistency properties at the population level. Corollary 6.1 (Auto-encoding) Suppose Assumptions 2.1 and 2.2-(i), (iii), or (iv) hold for a target measure ν. The Gaussian interpolation flow (Xt)t [0,1] and its time-reversed flow (X t )t [0,1] form an auto-encoder with a Lipschitz encoder X 1(x) and a Lipschitz decoder X1(x). The auto-encoding property holds in the sense that X1 X 1 = Id. (6.1) Corollary 6.2 (Cycle consistency) Suppose Assumptions 2.1 and 2.2-(i), (iii), or (iv) hold for the target measures ν1 and ν2. For the target measure ν1, we define the Gaussian interpolation flow (X1,t)t [0,1] and its time-reversed flow (X 1,t)t [0,1]. We also define the Gaussian interpolation flow (X2,t)t [0,1] and its time-reversed flow (X 2,t)t [0,1] for the target measure ν2 using the same at and bt. Then the transport maps X1,1(x), X 1,1(x), X2,1(x), and X 2,1(x) are Lipschitz continuous in the space variable x. Furthermore, the cycle consistency property holds in the sense that X1,1 X 2,1 X2,1 X 1,1 = Id. (6.2) Corollaries 6.1 and 6.2 show that the auto-encoding and cycle consistency properties hold for the flows at the population level. These results provide insights to the approximate auto-encoding and cycle consistency properties at the sample level. Gao, Huang, and Jiao Figure 4: An illustration of cycle consistency using DDIBs. The cycle consistency property is manifested through the consistency of color patterns across the transformations. We transform the Moons data in the source domain onto the Concentric Squares data in the target domain, and then complete the cycle by mapping the target data back to the source domain. The latent spaces play a central role in the bidirectional translation. We provide a proof in Corollary 6.2 accounting for the cycle consistency property. There are several types of errors introduced in the training of GIFs. On the one hand, the approximation in specifying source measures would exert influence on modeling the distribution. On the other hand, the approximation in the velocity field also affects the distribution learning error. 
We use the stability analysis method in the differential equations theory to address the potential effects of these errors. Corollary 6.3 Suppose Assumptions 2.1 and 2.2-(i), (iii), or (iv) hold. It holds that C1 := sup x Rd x X1(x) 2,2 < , C2 := sup (t,x) [0,1] Rd xv(t, x) 2,2 < . Proposition 6.1 (Stability in the source distribution) Suppose Assumptions 2.1 and 2.2-(i), (iii), or (iv) hold. If the source measure µ = Law(a0Z + b0X1) is replaced with the Gaussian measure γd,a2 0, then the stability of the transport map X1 is guaranteed by the W2 distance between the push-forward measure X1#γd,a2 0 and the target measure ν = Law(X1) as follows W2(X1#γd,a2 0, ν) C1b0 p Eν[ X1 2] exp(C2d). (6.3) The stability analysis in Proposition 6.1 provides insights into the selection of source measures for learning probability flow ODEs and GIFs. The error bound (6.3) demonstrates that when the signal intensity is reasonably small in the source measure, that is, b0 1, the Gaussian Interpolation Flows distribution estimation error, induced by the approximation with a Gaussian source measure, is small as well in the sense of the quadratic Wasserstein distance. Using a Gaussian source measure to replace the true convolution source measure is a common approximation method for learning probability flow ODEs and GIFs. Our analysis shows this replacement is reasonable for the purpose of distribution estimation. The Alekseev-Gr obner formula and its stochastic variants (Del Moral and Singh, 2022) have been shown effective in quantifying the stability of well-posed ODE and SDE flows against perturbations of its velocity field or drift (Bortoli, 2022; Benton et al., 2023). We state these results below for convenience. Lemma 6.1 (Hairer et al., 1993, Theorem 14.5) Let (Xt)t [0,1] and (Yt)t [0,1] solve the following IVPs, respectively dt = v(t, Xt), X0 = x0, t [0, 1], dt = v(t, Yt), Y0 = x0, t [0, 1], where v(t, x) : [0, 1] Rd Rd and v(t, x) : [0, 1] Rd Rd are the velocity fields. (i) Suppose that v is of class C1 in x. Then the Alekseev-Gr obner formula for the difference Xt(x0) Yt(x0) is given by Xt(x0) Yt(x0) = Z t 0 ( x Xs,t)(Ys(x0)) (v(s, Ys(x0)) v(s, Ys(x0))) ds (6.4) where x Xs,t(x) satisfies the variational equation t( x Xs,t(x)) = ( xv)(t, Xs,t(x)) x Xs,t(x), x Xs,s(x) = Id. (6.5) (ii) Suppose that v is of class C1 in x. Then the Alekseev-Gr obner formula for the difference Yt(x0) Xt(x0) is given by Yt(x0) Xt(x0) = Z t 0 ( x Ys,t)(Xs(x0)) ( v(s, Xs(x0)) v(s, Xs(x0))) ds (6.6) where x Ys,t(x) satisfies the variational equation t( x Ys,t(x)) = ( x v)(t, Ys,t(x)) x Ys,t(x), x Ys,s(x) = Id. (6.7) Exploiting the Alekseev-Gr obner formulas in Lemma 6.1 and uniform Lipschitz properties of the velocity field, we deduce two error bounds in terms of the quadratic Wasserstein (W2) distance to show the stability of the ODE flow when the velocity field is not accurate. Proposition 6.2 (Stability in the velocity field) Suppose Assumptions 2.1 and 2.2 hold. Let qt denote the density function of Yt#µ. Gao, Huang, and Jiao (i) Suppose that Z 1 Rd v(t, x) v(t, x) 2 qt(x)dxdt ε. (6.8) W 2 2 (Y1#µ, ν) ε Z 1 0 exp 2 Z 1 s θudu ds. (6.9) (ii) Suppose that sup (t,x) [0,1] Rd x v(t, x) 2,2 C3. W 2 2 (Y1#µ, ν) exp(2C3) 1 Rd v(t, x) v(t, x) 2pt(x)dxdt. (6.10) Proposition 6.2 provides a stability analysis against the estimation error of the velocity field using the W2 distance. 
The estimation error originates from the flow matching or score matching procedures and the approximation error rising from using deep neural networks in estimating the velocity field or the score function. These two W2 bounds imply that the distribution estimation error is controlled by the L2 estimation error of flow matching and score matching. Indeed, this point justifies the soundness of the approximation method through flow matching and score matching. The first W2 bound (6.9) relies on the L2 control (6.8) of the perturbation error of the velocity field. The second W2 bound (6.10) is slightly better than that provided in (Albergo and Vanden-Eijnden, 2023, Proposition 3) but still has exponential dependence on the Lipschitz constant of v(t, x). 0.05 0.10 0.15 0.20 0.25 b0 W2 distance Figure 5: An approximately linear relation between b0 and the Wasserstein-2 distance. To demonstrate the bounds presented in Propositions 6.1 and 6.2, we conducted further experiments with a mixture of eight two-dimensional Gaussian distributions. These Gaussian Interpolation Flows propositions provide bounds for the stability of the flow when subjected to perturbations in either the source distribution or the velocity field. Let the target distribution be the following two-dimensional Gaussian mixture j=1 φ(x; µj, Σj), where φ(x; µj, Σj) is the probability density function for the Gaussian distribution with mean µj = 12(sin(2(j 1)π/8), cos(2(j 1)π/8)) and covariance matrix Σj = 0.032I2 for j = 1, , 8. For Gaussian mixtures, the velocity field has an explicit formula, which facilitates the perturbation analysis. To illustrate the bound in Proposition 6.1, we consider a perturbation of the source distribution for the following model: Xt = at Z + bt X with at = 1 t + ζ 1 + ζ , bt = t + ζ where ζ [0, 0.3] is a value controlling the perturbation level. It is easy to see a0 = 1/(1 + ζ), b0 = ζ/(1 + ζ). Thus, the source distribution Law(a0Z + b0X) is a mixture of Gaussian distributions. Practically, we can use a Gaussian distribution γ2,a2 0 to replace this source distribution. In Proposition 6.1, we bound the error between the distributions of generated samples due to the replacement, that is, W2(X1#γd,a2 0, ν) Cb0, where C is a constant. We illustrate this theoretical bound using the mixture of Gaussian distributions and the Gaussian interpolation flow given above. We consider a mesh for the variable ζ and plot the curve for b0 and W2(X1#γd,a2 0, ν) in Figure 5. Through Figure 5, an approximate linear relation between b0 and W2(X1#γd,a2 0, ν) is observed, which supports the results of Proposition 6.1. We now consider perturbing the velocity field vt by adding random noise. Let ϵ [0.5, 5.5]. The random noise is generated using a Bernoulli random variable supported on { ϵ, ϵ}. Let vt denote the perturbed velocity field. Then we can compute vt := vt vt 2 = 2ϵ2. We use the velocity field vt and the perturbed velocity field t to generate samples and compute the squared Wasserstein-2 distance between the sample distributions. According to Proposition 6.2, the squared Wasserstein-2 distance should be linearly upper bounded as O( vt), that is, W 2 2 (Y1#µ, ν) C Z 1 R2 ϵ2pt(x)dxdt = Cϵ2, where C is a constant. This theoretical insight is illustrated in Figure 6, where a linear relationship between these two variables is observed. Gao, Huang, and Jiao 0 10 20 30 40 50 60 vt Squared W2 distance Figure 6: A linear relation between vt and the squared Wasserstein-2 distance. 7. 
Related work GIFs and the induced transport maps are related to CNFs and score-based diffusion models. Mathematically, they interrelate with the literature on Lipschitz mass transport and Wasserstein gradient flows. A central question in developing the ODE flow or transport map method for generative modeling is how to construct an ODE flow or transport map that are sufficiently smooth and enable efficient computation. Various approaches have been proposed to answer the question. CNFs construct invertible mappings between an isotropic Gaussian distribution and a complex target distribution (Chen et al., 2018; Grathwohl et al., 2019). They fall within the broader framework of neural ODEs (Chen et al., 2018; Ruiz-Balet and Zuazua, 2023). A major challenge for CNFs is designing a time-dependent ODE flow whose marginal distribution converges to the target distribution while allowing for efficient estimation of its velocity field. Previous work has explored several principles to construct such flows, including optimal transport, Wasserstein gradient flows, and diffusion processes. Additionally, Gaussian denoising has emerged as an effective principle for constructing simulation-free CNFs in generative modeling. Liu et al. (2023) propose the rectified flow, which is based on a linear interpolation between a standard Gaussian distribution and the target distribution, mimicking the Gaussian denoising procedure. Albergo and Vanden-Eijnden (2023) study a similar formulation called stochastic interpolation, defining a trigonometric interpolant between a standard Gaussian distribution and the target distribution. Albergo et al. (2023b) extend this idea by proposing a stochastic bridge interpolant between two arbitrary distributions. Under a few regularity assumptions, the velocity field of the ODE flow modeling the stochastic bridge interpolant is proven to be continuous in the time variable and smooth in the space variable. Gaussian Interpolation Flows Lipman et al. (2023) introduce a nonlinear least squares method called flow matching to directly estimate the velocity field of probability flow ODEs. All of these models are encompassed within the framework of simulation-free CNFs, which have been the focus of numerous ongoing research efforts (Neklyudov et al., 2023; Tong et al., 2023; Chen and Lipman, 2023; Albergo et al., 2023b; Shaul et al., 2023; Pooladian et al., 2023; Albergo et al., 2023a,c). Furthermore, Marzouk et al. (2023) provide the first statistical convergence rate for the simulation-based method by placing neural ODEs within the nonparametric estimation framework. Score-based diffusion models integrate the time reversal of stochastic differential equations (SDEs) with the score matching technique (Sohl-Dickstein et al., 2015; Song and Ermon, 2019; Ho et al., 2020; Song and Ermon, 2020; Song et al., 2021b,a; De Bortoli et al., 2021). These models are capable of modeling highly complex probability distributions and have achieved state-of-the-art performance in image synthesis tasks (Dhariwal and Nichol, 2021; Rombach et al., 2022). The probability flow ODEs of diffusion models can be considered as CNFs, whose velocity field incorporates the nonlinear score function (Song et al., 2021b; Karras et al., 2022; Lu et al., 2022b,a; Zheng et al., 2023). In addition to the score matching method, Lu et al. (2022a) and Zheng et al. (2023) explore maximum likelihood estimation for probability flow ODEs. 
However, the regularity of these probability flow ODEs has not been studied, and their well-posedness properties remain to be established.

A key concept in defining measure transport is Lipschitz mass transport, where the transport maps are required to be Lipschitz continuous. This ensures the smoothness and stability of the measure transport. There is a substantial body of research on the Lipschitz properties of transport maps. The celebrated Caffarelli's contraction theorem (Caffarelli, 2000, Theorem 2) establishes the Lipschitz continuity of optimal transport maps that push the standard Gaussian measure onto a log-concave measure. Colombo et al. (2017) study a Lipschitz transport map between perturbations of log-concave measures using optimal transport theory. Mikulincer and Shenfeld (2021) demonstrate that the Brownian transport map, defined by the Föllmer process, is Lipschitz continuous when it pushes forward the Wiener measure on the Wiener space to the target measure on the Euclidean space. Additionally, Neeman (2022) and Mikulincer and Shenfeld (2023) prove that the transport map along the reverse heat flow of certain target measures is Lipschitz continuous. Beyond studying Lipschitz transport maps, significant effort has been devoted to applying optimal transport theory in generative modeling. Zhang et al. (2018) propose the Monge-Ampère flow for generative modeling by solving the linearized Monge-Ampère equation. Optimal transport theory has been utilized as a general principle to regularize the training of continuous normalizing flows or generators for generative modeling (Finlay et al., 2020; Yang and Karniadakis, 2020; Onken et al., 2021; Makkuva et al., 2020). Liang (2021) leverages the regularity theory of optimal transport to formalize the generator-discriminator-pair regularization of GANs under a minimax rate framework. In our work, we study the Lipschitz transport maps defined by GIFs, which differ from optimal transport maps. GIFs naturally fit within the framework of continuous normalizing flows, and their flow mappings are examined from the perspective of Lipschitz mass transport.

Wasserstein gradient flows offer another principled approach to constructing ODE flows for generative modeling. A Wasserstein gradient flow is derived from the gradient descent minimization of a certain energy functional over probability measures endowed with the quadratic Wasserstein metric (Ambrosio et al., 2008). The Eulerian formulation of Wasserstein gradient flows produces the continuity equations that govern the evolution of the marginal distributions. After being transferred into a Lagrangian formulation, Wasserstein gradient flows define ODE flows that have been widely explored for generative modeling (Johnson and Zhang, 2018; Gao et al., 2019; Liutkus et al., 2019; Johnson and Zhang, 2019; Arbel et al., 2019; Mroueh et al., 2019; Ansari et al., 2021; Mroueh and Nguyen, 2021; Fan et al., 2022; Gao et al., 2022; Duncan et al., 2023; Xu et al., 2022). Wasserstein gradient flows are shown to be connected with the forward process of diffusion models. The variance preserving SDE of diffusion models is equivalent to the Langevin dynamics towards the standard Gaussian distribution, which can be interpreted as a Wasserstein gradient flow of the Kullback–Leibler divergence relative to the standard Gaussian distribution (Song et al., 2021b). In the meantime, the probability flow ODE of the variance preserving SDE conforms to the Eulerian formulation of this Wasserstein gradient flow.
However, when a general target distribution is assigned in place of the standard Gaussian distribution, it remains unclear whether the ODE formulation of Wasserstein gradient flows is well-posed.

The main contribution of our work lies in establishing the theoretical properties of GIFs and their associated flow maps in a unified way. Our theoretical results encompass the Lipschitz continuity of both the flow velocity field and the flow map, addressing the existence, uniqueness, and stability of the flow. We also demonstrate that both the flow map and its inverse possess Lipschitz properties. Our proposed framework for Gaussian interpolation flow builds upon previous research on probability flow methods in diffusion models (Song et al., 2021b,a) and stochastic interpolation methods for generative modeling (Liu et al., 2023; Albergo and Vanden-Eijnden, 2023; Lipman et al., 2023). Rather than adopting a methodological perspective, we focus on elucidating the theoretical aspects of these flows from a unified standpoint, thereby enhancing the understanding of various methodological approaches. Our theoretical results are derived from geometric considerations of the target distribution and from analytic calculations that exploit the Gaussian denoising property.

8. Conclusions and discussion

Gaussian denoising as a framework for constructing continuous normalizing flows holds great promise in generative modeling. Through a unified framework and rigorous analysis, we have established the well-posedness of these flows, shedding light on their capabilities and limitations. We have examined the Lipschitz regularity of the corresponding flow maps for several rich classes of probability measures. When applied to generative modeling based on Gaussian denoising, we have shown that GIFs possess auto-encoding and cycle consistency properties at the population level. Additionally, we have established stability bounds for the errors accumulated during the process of learning GIFs.

The regularity properties of the velocity field established in this paper provide a solid theoretical basis for end-to-end error analyses of learning GIFs using deep neural networks with empirical data. Another potential application is to perform rigorous analyses of consistency models, a nascent family of ODE-based deep generative models designed for one-step generation (Song et al., 2023; Kim et al., 2023; Song and Dhariwal, 2023). We intend to investigate these intriguing problems in our subsequent work. We expect that our analytical results will facilitate further studies and advancements in applying simulation-free CNFs, including GIFs, to a diverse range of generative modeling tasks.

Acknowledgements

The authors wish to thank the action editor and reviewers for their valuable and constructive comments, which have significantly improved the quality of this paper. The work of J. Huang is supported by the research grants (1-BDCC, 4-ZZ4B, 1-WZ3P, and 4-ZZPN) from The Hong Kong Polytechnic University and the National Natural Science Foundation of China (Grant No. 72331005). The work of Y. Jiao is supported by the National Natural Science Foundation of China (Grant No. 12371441), the Fundamental Research Funds for the Central Universities, and the research fund of KLATASDSMOE of China.

In the appendices, we prove the results stated in the paper and provide the necessary technical details and discussions.
Appendix A. Proofs of Theorem 3.1 and Lemma 4.1

Dynamical properties of the Gaussian interpolation flow (X_t)_{t∈[0,1]} form the cornerstone of the measure interpolation method. Following Albergo and Vanden-Eijnden (2023); Albergo et al. (2023b), we leverage an argument based on characteristic functions to quantify the dynamics of its marginal flow and, as a result, to prove Theorem 3.1.

Proof [Proof of Theorem 3.1] Let ω ∈ ℝ^d. For the Gaussian stochastic interpolation (X_t)_{t∈[0,1]}, we define the characteristic function of X_t by

Ψ(t, ω) := E[exp(i⟨ω, X_t⟩)] = E[exp(i⟨ω, a_t Z + b_t X_1⟩)] = E[exp(i a_t⟨ω, Z⟩)] E[exp(i b_t⟨ω, X_1⟩)],

where the last equality is due to the independence between Z ∼ γ_d and X_1 ∼ ν. Taking the time derivative of Ψ(t, ω) for t ∈ (0, 1), we derive that

∂_t Ψ(t, ω) = i⟨ω, ψ(t, ω)⟩, where ψ(t, ω) := E[exp(i⟨ω, X_t⟩)(ȧ_t Z + ḃ_t X_1)].

We first define

v(t, X_t) := E[ȧ_t Z + ḃ_t X_1 | X_t]. (A.1)

Using the double expectation formula, we deduce that

ψ(t, ω) = E[exp(i⟨ω, X_t⟩) E[ȧ_t Z + ḃ_t X_1 | X_t]] = E[exp(i⟨ω, X_t⟩) v(t, X_t)].

Applying the inverse Fourier transform to ψ(t, ω), it holds that

j(t, x) := (2π)^{-d} ∫_{ℝ^d} exp(−i⟨ω, x⟩) ψ(t, ω) dω = p_t(x) v(t, x),

where v(t, x) := E[ȧ_t Z + ḃ_t X_1 | X_t = x]. It then follows that ∂_t p_t + ∇_x · j(t, x) = 0, that is,

∂_t p_t + ∇_x · (p_t v(t, x)) = 0.

Next, we study the behavior of v(t, x) at t = 0 and t = 1. Notice that

x = a_t E[Z | X_t = x] + b_t E[X_1 | X_t = x]. (A.2)

Combining Eq. (A.1) and (A.2), it follows that

v(t, x) = (ȧ_t/a_t) x + (ḃ_t − (ȧ_t/a_t) b_t) E[X_1 | X_t = x], t ∈ (0, 1). (A.3)

According to Tweedie's formula in Lemma G.1, it holds that

s(t, x) = (b_t/a_t^2) E[X_1 | X_t = x] − (1/a_t^2) x, t ∈ (0, 1), (A.4)

where s(t, x) is the score function of the marginal distribution p_t of X_t. Combining Eq. (A.3) and (A.4), it holds that the velocity field is a gradient field and its nonlinear term is the score function s(t, x), namely, for any t ∈ (0, 1),

v(t, x) = (ḃ_t/b_t) x + (ḃ_t a_t^2/b_t − ȧ_t a_t) s(t, x). (A.5)

By the regularity properties a_t, b_t ∈ C^2([0, 1)), a_t^2 ∈ C^1([0, 1]), and b_t ∈ C^1([0, 1]), the quantities ȧ_0, ḃ_0, ȧ_1 a_1, and ḃ_1 are well-defined. Then, by Eq. (A.3), we define

v(0, x) := lim_{t→0} v(t, x) = (ȧ_0/a_0) x + (ḃ_0 − (ȧ_0/a_0) b_0) E[X_1 | X_0 = x].

Using Eq. (A.5) yields

v(1, x) := lim_{t→1} v(t, x) = (ḃ_1/b_1) x − ȧ_1 a_1 s(1, x). (A.6)

This completes the proof.

Lemma 4.1 presents several standard properties of Gaussian channels in information theory (Wibisono and Jog, 2018a,b; Dytso et al., 2023b) that will facilitate our proof.

Proof [Proof of Lemma 4.1] By Bayes' rule, Law(Y | X_t = x) = p(y | t, x) can be represented as

p(y | t, x) = φ_{b_t y, a_t^2}(x) p_1(y)/p_t(x)
= (2π)^{-d/2} a_t^{-d} exp(−‖x − b_t y‖^2/(2 a_t^2)) p_1(y)/p_t(x)
= (2π)^{-d/2} a_t^{-d} exp(−‖x‖^2/(2 a_t^2) + b_t⟨x, y⟩/a_t^2 − b_t^2‖y‖^2/(2 a_t^2)) p_1(y)/p_t(x)
= exp(b_t⟨x, y⟩/a_t^2 − b_t^2‖y‖^2/(2 a_t^2)) p_1(y) / ((2π)^{d/2} a_t^d exp(‖x‖^2/(2 a_t^2)) p_t(x)).

Let θ = b_t x/a_t^2, h(y) = p_1(y) exp(−b_t^2‖y‖^2/(2 a_t^2)), and define the logarithmic partition function A(θ) = log ∫_{ℝ^d} h(y) exp(⟨y, θ⟩) dy. Then, by the definition of exponential family distributions, we conclude that p(y | t, x) = h(y) exp(⟨y, θ⟩ − A(θ)) is an exponential family distribution in y. By a direct calculation, it follows that

∇_x^2 log p(y | t, x) = −(b_t^2/a_t^4) ∇_θ^2 A(θ).

For an exponential family distribution, a basic equality shows that ∇_θ^2 A(θ) = Cov(Y | X_t = x), which further yields that

∇_x^2 log p(y | t, x) = −(b_t^2/a_t^4) Cov(Y | X_t = x).
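As a quick numerical sanity check on the computations above (our illustration, not part of the paper), the identities (A.3), (A.4), and (A.5) can be verified for a Gaussian target ν = N(μ, σ^2 I_d), for which E[X_1 | X_t = x], the score s(t, x), and the marginal p_t are all available in closed form. The linear schedule a_t = 1 − t, b_t = t and the specific values of μ, σ, and t below are illustrative choices.

```python
# Minimal check that the conditional-expectation form (A.3) and the score form
# (A.5) of the velocity field coincide for a Gaussian target, and that Tweedie's
# formula (A.4) holds, using closed-form expressions for this special case.
import numpy as np

rng = np.random.default_rng(1)
d, mu, sigma = 3, np.array([1.0, -2.0, 0.5]), 0.7

def coeffs(t):
    # a_t, b_t and their time derivatives for the linear schedule a_t = 1 - t, b_t = t
    return 1.0 - t, t, -1.0, 1.0

def check(t, x):
    a, b, da, db = coeffs(t)
    s2 = a**2 + b**2 * sigma**2                    # variance of X_t = a_t Z + b_t X1
    m = mu + (b * sigma**2 / s2) * (x - b * mu)    # E[X1 | X_t = x]
    score = -(x - b * mu) / s2                     # s(t, x) = grad_x log p_t(x)
    v_A3 = (da / a) * x + (db - da * b / a) * m              # Eq. (A.3)
    v_A5 = (db / b) * x + (db * a**2 / b - da * a) * score   # Eq. (A.5)
    tweedie = (b / a**2) * m - x / a**2                      # right-hand side of Eq. (A.4)
    return np.max(np.abs(v_A3 - v_A5)), np.max(np.abs(tweedie - score))

for t in [0.1, 0.5, 0.9]:
    x = rng.standard_normal(d)
    gap_v, gap_s = check(t, x)
    print(f"t = {t}: |v_(A.3) - v_(A.5)| = {gap_v:.2e}, Tweedie residual = {gap_s:.2e}")
```

Both reported gaps should be at the level of floating-point round-off, reflecting the algebraic equivalence of (A.3) and (A.5) through Tweedie's formula (A.4).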
Appendix B. Auxiliary lemmas for Lipschitz flow maps

The following lemma, due to G. Peano (Hartman, 2002a, Theorem 3.1), describes several meaningful differential equations associated with well-posed flows and supports the derivation of the Lipschitz continuity of their flow maps.

Lemma B.1 (Ambrosio et al., 2023, Lemma 3.4) Suppose that a flow (X_t)_{t∈[0,1]} is well-posed and its velocity field v(t, x) : [0, 1] × ℝ^d → ℝ^d is of class C^1. Then the flow map X_{s,t} : ℝ^d → ℝ^d is of class C^1 for any 0 ≤ s ≤ t ≤ 1. Fix (s, x) ∈ [0, 1] × ℝ^d and define the following functions for t ∈ [s, 1]:

y(t) := ∇_x X_{s,t}(x), J(t) := (∇_x v)(t, X_{s,t}(x)), w(t) := det(∇_x X_{s,t}(x)), b(t) := (∇_x · v)(t, X_{s,t}(x)) = Tr(J(t)).

Then y(t) and w(t) are the unique C^1 solutions of the following IVPs:

ẏ(t) = J(t) y(t), y(s) = I_d, (B.1)
ẇ(t) = b(t) w(t), w(s) = 1. (B.2)

We present an upper bound on the Lipschitz constant of the flow map X_{s,t}(x) in Lemma 5.1. This upper bound has been deduced in Mikulincer and Shenfeld (2023); Ambrosio et al. (2023); Dai et al. (2023). For completeness, we derive it as a direct implication of Eq. (B.1) in Lemma B.1 together with an upper bound on the Jacobian matrix of the velocity field.

Proof [Proof of Lemma 5.1] Let y(u) = ∇_x X_{s,u}(x) and J(u) = (∇_x v)(u, X_{s,u}(x)). Owing to Lemma B.1, y(u) is of class C^1, and the function u ↦ ‖y(u)‖_{2,2} is absolutely continuous over [s, t]. By Lemma B.1, it follows that

∂_u ‖y(u)‖_{2,2}^2 = 2⟨y(u), ẏ(u)⟩ = 2⟨y(u), J(u) y(u)⟩ ≤ 2 θ_u ‖y(u)‖_{2,2}^2.

Applying Grönwall's inequality yields ‖y(t)‖_{2,2} ≤ exp(∫_s^t θ_u du), which concludes the proof.

Another result concerns the instantaneous change of variables theorem, which is widely deployed in studying neural ODEs (Chen et al., 2018, Theorem 1). We also exploit the instantaneous change of variables to prove Proposition 6.1. To make the proof self-contained, we show that the instantaneous change of variables directly follows from Eq. (B.2) in Lemma B.1. Compared with the original proof in (Chen et al., 2018, Theorem 1), we illustrate that the well-posedness of a flow is sufficient to ensure the instantaneous change of variables property, without a boundedness condition on the flow.

Corollary B.1 (Instantaneous change of variables) Suppose that a flow (X_t)_{t∈[0,1]} is well-posed with a velocity field v(t, x) : [0, 1] × ℝ^d → ℝ^d of class C^1 in x. Let X_0(x) ∼ π_0 be the initial value. Then the law of X_t(x) satisfies the following differential equation:

∂_t log π_t(X_t(x)) = −Tr((∇_x v)(t, X_t(x))).

Proof Let δ(t) := det(∇_x X_t(x)). Thanks to Eq. (B.2) in Lemma B.1, it holds that

δ̇(t) = Tr((∇_x v)(t, X_t(x))) δ(t), δ(0) = 1,

which implies δ(t) > 0 for t ∈ [0, 1]. Notice that log π_t(X_t(x)) = log π_0(X_0(x)) − log |δ(t)| by the change of variables formula. Then it follows that

∂_t log π_t(X_t(x)) = −Tr((∇_x v)(t, X_t(x))).

Appendix C. Proofs of spatial Lipschitz estimates for the velocity field

The main results in Section 4 are proved in this appendix. We first present some ancillary lemmas before proceeding to give the proofs.

Lemma C.1 (Fathi et al., 2023) Suppose that f : ℝ^d → ℝ_+ is L-log-Lipschitz for some L ≥ 0. Let P_t be the Ornstein–Uhlenbeck semigroup defined by P_t h(x) := E_{Z∼γ_d}[h(e^{−t} x + √(1 − e^{−2t}) Z)] for any h ∈ C(ℝ^d) and t ≥ 0. Then it holds that

−{5L e^{−t}(L + t^{−1/2}) + L^2 e^{−2t}} I_d ⪯ ∇_x^2 log P_t f(x) ⪯ {5L e^{−t}(L + t^{−1/2})} I_d.

Proof This is a restatement of known results. See Proposition 2, Proposition 6, Theorem 6, and their proofs in Fathi et al. (2023).

Corollary C.1 Suppose that f : ℝ^d → ℝ_+ is L-log-Lipschitz for some L ≥ 0. Let Q_t be an operator defined by

Q_t h(x) := E_{Z∼γ_d}[h(β_t x + α_t Z)] (C.1)

for any h ∈ C(ℝ^d) and t ∈ [0, 1], where 0 ≤ α_t ≤ 1 and β_t ≥ 0 for any t ∈ [0, 1]. Then it holds that

−(A_t + L^2 β_t^2) I_d ⪯ ∇_x^2 log Q_t f(x) ⪯ A_t I_d, where A_t := 5L β_t^2 (1 − α_t^2)^{−1/2} (L + (−(1/2) log(1 − α_t^2))^{−1/2}).

Proof It is easy to notice that Q_t f(x) = P_s f(β_t e^s x) where s = −(1/2) log(1 − α_t^2).
Then it follows that 2 x log Qtf(x) = (βtes)2( 2 x log Psf)(βtesx) which yields At L2β2 t Id 2 x log Qtf(x) At Id, where At := 5Lβ2 t (1 α2 t ) 1 2 log(1 α2 t )) 1 Gao, Huang, and Jiao Lemma C.2 The Jacobian matrix of the velocity field (3.5) has an alternative expression over time t (0, 1), that is, xv(t, x) = bt bt a2 t atat 2 x log e Qtf(x) 1 a2 t +b2 t Id + bt bt Id, where f(x) := dν dγd (x) and e Qtf(x) := EZ γd[f( bt a2 t +b2 t x + at a2 t +b2 t Z)]. Proof By direct calculations, it holds that pt(x) = a d t Rd p1(y)ϕ x bty Rd f(y)ϕ(y)ϕ x bty = a d t ϕ (a2 t + b2 t ) 1 a2 t + b2 t ! 1 y bt a2 t + b2 t x = a d t ϕ (a2 t + b2 t ) 1 a2 t + b2 t bt a2 t + b2 t x + at p a2 t + b2 t z = (a2 t + b2 t ) d/2ϕ (a2 t + b2 t ) 1 2 x e Qtf(x). Taking the logarithm and then the second-order derivative of the equation above, it yields xs(t, x) = 2 x log e Qtf(x) 1 a2 t +b2 t Id. Recalling that xv(t, x) = bt bt a2 t atat xs(t, x) + bt bt Id, it further yields that xv(t, x) = bt bt a2 t atat 2 x log e Qtf(x) + atat+ btbt a2 t +b2 t Id, which completes the proof. Corollary C.2 Suppose that f(x) := dν dγd (x) is L-log-Lipschitz for some L 0. Then for t (0, 1), it holds that bt bt a2 t atat Bt L2 bt a2 t +b2 t 2 + atat+ btbt xv(t, x) n bt bt a2 t atat Bt + atat+ btbt where Bt := 5Lbt(a2 t + b2 t ) 3 2 (L + (log( p a2 t + b2 t /bt)) 1 Proof Let αt = at a2 t +b2 t and βt = bt a2 t +b2 t . Then these bounds hold according to Corollary C.1 and Lemma C.2. Then we are prepared to prove Proposition 4.1. The proof is mainly based on the techniques for bounding conditional covariance matrices that are developed in a series of work (Wibisono and Jog, 2018a,b; Mikulincer and Shenfeld, 2021, 2023; Chewi and Pooladian, 2022; Dai et al., 2023). Proof [Proof of Proposition 4.1] Gaussian Interpolation Flows (a) By Jung s theorem (Danzer et al., 1963, Theorem 2.6), there exists a closed Euclidean ball with radius less than D := (1/ 2)diam(supp(ν)) that contains supp(ν) in Rd. Then the desired bounds hold due to 0Id Cov(Y|Xt = x) D2Id and Eq. (4.3). (b) Let p1 be β-semi-log-convex for some β > 0 on Rd. Then for any t [0, 1), the conditional distribution p(y|t, x) is β + b2 t a2 t -semi-log-convex because 2 y log p(y|t, x) = 2 y log p1(y) 2 y log p(t, x|y) β + b2 t a2 t By the Cram er-Rao inequality (2.4), we obtain Cov(Y|Xt = x) β + b2 t a2 t Therefore, by Eq. (4.3), we obtain ! b2 t βa2 t + b2 t + at which implies xv(t, x) βat at + bt bt βa2 t + b2 t Id. In addition, the bound above can be verified at time t = 1 by the definition (A.6). (c) Let p1 be κ-semi-log-concave for some κ R. Then for any t [0, 1), the conditional distribution p(y|t, x) is κ + b2 t a2 t -semi-log-concave because 2 y log p(y|t, x) = 2 y log p1(y) 2 y log p(t, x|y) κ + b2 t a2 t When t n t : κ + b2 t a2 t > 0, t (0, 1) o , by the Brascamp-Lieb inequality (2.2), we obtain Cov(Y|Xt = x) κ + b2 t a2 t Therefore, by Eq. (4.3), we obtain ! b2 t κa2 t + b2 t + at which implies xv(t, x) κat at + bt bt κa2 t + b2 t Id. Moreover, the bound above can be verified at time t = 1 by the definition (A.6). Gao, Huang, and Jiao (d) Notice that p(y|t, x) = p(t, x|y) pt(x) d(γd,σ2 ρ) Rd ϕz,σ2(y)ϕ x bt , a2 t b2 t (y)ρ(dz), where the prefactor Ax,t only depends on x and t. Then it follows that p(y|t, x) = Z Rd ϕ a2 t z+σ2btx a2 t +σ2b2 t , σ2a2 t a2 t +σ2b2 t (y) ρ(dz) where ρ is a probability measure on Rd whose density function is a multiple of ρ by a positive function. It also indicates that ρ is supported on the same Euclidean ball as ρ. 
To further illustrate p(y|t, x), let Q ρ and Z γd be independent. Then it holds that a2 t a2 t + σ2b2 t Q + σ2a2 t a2 t + σ2b2 t Z + σ2bt a2 t + σ2b2 t x p(y|t, x). Thus, it holds that Cov(Y|Xt = x) = a2 t a2 t + σ2b2 t 2 Cov(Q) + σ2a2 t a2 t + σ2b2 t Id ( a2 t a2 t + σ2b2 t 2 R2 + σ2a2 t a2 t + σ2b2 t By Eq. (4.3), it holds that xv(t, x) b2 t a2 t ! a2 t a2 t + σ2b2 t 2 R2 + σ2a2 t a2 t + σ2b2 t which implies ( atbt(at bt atbt) (a2 t + σ2b2 t )2 R2 + atat + σ2 btbt a2 t + σ2b2 t Analogously, due to Cov(Q) 0Id, a lower bound would be yielded as follows xv(t, x) atat + σ2 btbt a2 t + σ2b2 t Id. Then the results follow by combining the upper and lower bounds. (e) The result follows from Corollary C.2. Gaussian Interpolation Flows We complete the proof. Proof [Proof of Corollary 4.1] Let us consider that κ > 0 which is divided into two cases where κD2 1 and κD2 < 1. On the one hand, suppose that the first case κD2 1 holds. By Proposition 4.1, the κ-based upper bound is tighter, that is, λmax( xv(t, x)) θt := κat at + bt bt κa2 t + b2 t . On the other hand, suppose that the second case κD2 < 1 holds. Let t1 be defined in Eq. (4.5). Again, by Proposition 4.1, the D2-based upper bound is tighter over [0, t1) and the κ-based upper bound is tighter over [t1, 1], which is denoted by λmax( xv(t, x)) θt := at , t [0, t1), κat at+bt bt κa2 t +b2 t , t [t1, 1]. This completes the proof. Proof [Proof of Corollary 4.2] Let κ < 0, D < such that κD2 < 1 is fulfilled. Then an argument similar to the proof of Corollary 4.1 yields the desired bounds. Proof [Proof of Corollary 4.3] The result follows from Proposition 4.1-(d). Proof [Proof of Corollary 4.4] The L-based upper and lower bounds in Proposition 4.1-(e) would blow up at time t = 1 because the term (log( p a2 t + b2 t /bt)) 1 2 in Bt goes to as t 1. To ensure the spatial derivative of the velocity field v(t, x) is upper bounded at time t = 1, we additionally require the target measure is κ-semi-log-concave with κ 0. Hence, a κ-based upper bound is available for t (t0, 1] as shown in Proposition 4.1-(c). Next, these two upper bounds are combined by choosing any t2 (t0, 1) first. Then we exploit the L-based bound over [0, t2) and κ-based bound over [t0, 1]. This completes the proof. Appendix D. Proofs of well-posedness and Lipschitz flow maps The proofs of main results in Section 5 are offered in the following. Before proceeding, let us introduce some definitions and notations about function spaces that are collected in (Evans, 2010, Chapter 5). Let L1 loc(Rd; Rℓ) := {locally integrable function u : Rd Rℓ}. For integers k 0 and 1 p , we define the Sobolev space W k,p(Rd) := {u L1 loc(Rd)|Dαu exists and Dαu Lp(Rd) for |α| k}, where Dαu is the weak derivative of u. Then the local Sobolev space W k,p loc (Rd) is defined as the function space such that for any u W k,p loc (Rd) and any compact set Ω Rd, u W k,p(Ω). As a result, we denote the vectorvalued local Sobolev space by W k,p loc (Rd; Rℓ). Provided that v(t, x) : [0, 1] Rd Rd, we use v L1([0, 1]; W 1, loc (Rd; Rd)) to indicate that v has a finite L1 norm over (t, x) [0, 1] Rd Gao, Huang, and Jiao and v(t, ) W 1, loc (Rd; Rd) for any t [0, 1]. Similarly, we say v L1([0, 1]; L (Rd; Rd)) when v has a finite L1 norm over (t, x) [0, 1] Rd and v(t, ) L (Rd; Rd) for every t [0, 1]. We will use the definitions and notations in the following proof. 
Proof [Proof of Theorem 5.1] Under Assumptions 1 and 2, we claim that the velocity field v(t, x) satisfies v L1([0, 1]; W 1, loc (Rd; Rd)), v 2 1 + x 2 L1([0, 1]; L (Rd; Rd)). where the first condition indicates the velocity field v is locally bounded and locally Lipschitz continuous in x, and the second condition is a growth condition on v. According to the Cauchy-Lipschitz theorem (Ambrosio and Crippa, 2014, Remark 2.4), we have the representation formulae for solutions of the continuity equation. As a result, there exists a flow (Xt)t [0,1] uniquely solves the IVP (3.10). Furthermore, the marginal flow of (Xt)t [0,1] satisfies the continuity equation (3.4) in the weak sense. Then it remains to show the velocity field v is locally bounded and locally Lipschitz continuous in x, and satisfies the growth condition. By the lower and upper bounds given in Proposition 4.1, we know that v is globally Lipschitz continuous in x under Assumptions 1 and 2. Indeed, the global Lipschitz continuity leads to local boundedness and linear growth properties by simple arguments. More concretely, for any t (0, 1), it holds that v(t, 0) = bt at E[X1|Xt = 0] = bt at Rd yp(y|t, 0)dy Rd yp1(y)a d t exp b2 t y 2 2 2a2 t which implies v(t, 0) 2 < due to fast growth of the exponential function. Besides, it holds that v(0, 0) = ( b0 a0 a0 b0)E[X1|X0 = x] < , v(1, 0) = a1a1s(1, 0) < . Then by the boundedness of v(t, 0) 2 and the global Lipschitz continuity in x over t [0, 1], we bound v(t, x) as follows v(t, x) 2 v(t, 0) 2 + v(t, x) v(t, 0) 2 v(t, 0) 2 + sup (t,y) [0,1] Rd yv(t, y) 2,2 max{ x 2, 1}. Hence, the local boundedness and linear growth properties of v are proved. This completes the proof. Proof [Proof of Theorem 5.2] The proof is similar to that of Theorem 5.1. Proof [Proof of Corollary 5.1] A well-posed ODE flow has the time-reversal symmetry (Lamb and Roberts, 1998). By Theorem 5.1, the desired results are proved. Gaussian Interpolation Flows Proof [Proof of Corollary 5.2] The proof is similar to that of Corollary 5.1. Proof [Proof of Proposition 5.1] Combining Proposition 4.1-(b), (c), and Lemma 5.1, we complete the proof. Proof [Proof of Proposition 5.2] Combining Proposition 4.1-(d) and Lemma 5.1, we complete the proof. Proof [Proof of Corollary 6.1] By Theorem 5.1 and Corollary 5.1, it holds that X1 X 1 = X1 X 1 1 = Id. This completes the proof. Proof [Proof of Corollary 6.2] By Theorem 5.1 and Corollary 5.1, it holds that X1,1 X 2,1 X2,1 X 1,1 = X1,1 X 1 2,1 X2,1 X 1 1,1 = Id. This completes the proof. Proof [Proof of Corollary 6.3] Let Assumptions 2.1 and 2.2 hold. According to Propositions 5.1 and 5.2, x X1(x) 2,2 is uniformly bounded for Case (i)-(iii) in Assumption 2.2. For Case (iv), the boundedness of x X1(x) 2,2 holds by combining Corollary 4.4 and Lemma 5.1. Using Proposition 4.1, we know that xv(t, x) 2,2 is uniformly bounded. Proof [Proof of Proposition 5.3] The proof idea is similar to those of (Ball et al., 2003, Proposition 1) and (Cattiaux and Guillin, 2014, Proposition 18). Let f : Ω R be of class C1 and Xt pt. First, we consider the case of log-Sobolev inequalities. 
Using that Z γd and X1 ν both satisfy the log-Sobolev inequalities in Definition G.1, we have E[(f2 log f2)(Xt)] = E[(f2 log f2)(at Z + bt X1)] Z Z f2(atz + btx)dγd(z) log Z f2(atz + btx)dγd(z) dν(x) + Z 2CLS(γd) Z a2 t ( f 2 2)(atz + btx)dγd(z) dν(x) Z Z f2(atz + btx)dγd(z)dν(x) log Z Z f2(atz + btx)dγd(z)dν(x) + 2CLS(ν) Z x Z f2(atz + btx)dγd(z) 1 + 2a2 t CLS(γd) Z Z ( f 2 2)(atz + btx)dγd(z)dν(x) E[f2(Xt)] log E[f2(Xt)] + 2a2 t CLS(γd)E[ f(Xt) 2 2] + 2CLS(ν) Z x Z f2(atz + btx)dγd(z) 1 Gao, Huang, and Jiao By Jensen s inequality and the Cauchy-Schwartz inequality, it holds that Z x Z f2(atz + btx)dγd(z) 1 R R ( f f 2)(atz + btx)dγd(z) 2 dν(x) R R f2(atz + btx)dγd(z)dν(x) Z Z ( f 2 2)(atz + btx)dγd(z)dν(x) b2 t E[ f(Xt) 2 2]. Hence, combining the equations above and the fact that CLS(γd) 1 (Gross, 1975), it implies that E[(f2 log f2)(Xt)] E[f2(Xt)] log E[f2(Xt)] 2 a2 t + b2 t CLS(ν) E[ f(Xt) 2 2], that is, CLS(pt) a2 t + b2 t CLS(ν). Next, we tackle the case of Poincar e inequalities by similar calculations. Using that Z γd and X1 ν both satisfy the Poincar e inequalities in Definition G.2, we have E[f2(Xt)] = E[f2(at Z + bt X1)] Z Z f(atz + btx)dγd(z) 2 dν(x) + Z CP(γd) Z a2 t ( f 2 2)(atz + btx)dγd(z) dν(x) Z Z f(atz + btx)dγd(z)dν(x) 2 + CP(ν) Z x Z f(atz + btx)dγd(z) 2 + a2 t CP(γd) Z Z ( f 2 2)(atz + btx)dγd(z)dν(x) (E[f(Xt)])2 + a2 t CP(γd) + b2 t CP(ν) E[ f(Xt) 2 2]. Combining the expression above and CP(γd) 1, it implies that E[f2(Xt)] (E[f(Xt)])2 a2 t + b2 t CP(ν) E[ f(Xt) 2 2], that is, CP(pt) a2 t + b2 t CP(ν). This completes the proof. Appendix E. Proofs of the stability results We provide the proofs of the stability results in Section 6. Proof [Proof of Proposition 6.1] Let x0 = a0z + b0x1 and suppose X0(x0) µ, X0(a0z) γd,a2 0. According to Corollary 6.3, the Lipschitz property of X1(x) implies that X1(x0) X1(a0z) C1 x0 a0z . We consider an integral defined by It := Z x0 a0z 2dπt(Xt(x0), Xt(a0z)), Gaussian Interpolation Flows where πt is a coupling made of the joint distribution of (Xt(x0), Xt(a0z)). In particular, the initial value I0 is computed by I0 = Z x0 a0z 2p0(x0)ϕ(z)dx0dz = Z b0x1 2p1(x1)dx1 = b2 0Eν[ X1 2]. Since (Xt)t [0,1] is well-posed with X0(x0) µ or X0(a0z) γd,a2 0, according to Corollary B.1, the coupling πt satisfies the following differential equation t log πt(Xt(x0), Xt(a0z)) = Tr(( xv)(t, Xt(x0))) Tr(( xv)(t, Xt(a0z))). (E.1) Taking the derivative of It and using Eq. (E.1), it implies that sup (s,x) [0,1] Rd Tr( xv(s, x)) Thanks to Tr( xv(s, x)) d xv(s, x) 2,2, it follows that dt 2C2d It, I0 = b2 0Eν[ X1 2]. By Gr onwall s inequality, it holds that It b2 0Eν[ X1 2] exp(2C2dt). Therefore, we obtain the following W2 bound W2(X1#γd,a2 0, ν) = W2(X1#γd,a2 0, X1#µ) C1 p Eν[ X1 2] exp(C2d), which completes the proof. Proof [Proof of Proposition 6.2] (i) On the one hand, by Corollary 6.3, v(t, x) is Lipschitz continuous in x uniformly over (t, x) [0, 1] Rd with Lipschitz constant C2. By the variational equation (6.5) and Lemma 5.1, it follows that x Xs,t(x) 2 2,2 exp 2 Z t Due to the equality (6.4), we deduce that X1(x0) Y1(x0) 2 0 ( x Xs,1)(Ys(x0)) 2,2 v(s, Ys(x0)) v(s, Ys(x0)) ds 2 0 ( x Xs,1)(Ys(x0)) 2 2,2ds Z 1 0 v(s, Ys(x0)) v(s, Ys(x0)) 2ds 0 exp 2 Z 1 s θudu ds Z 1 0 v(s, Ys(x0)) v(s, Ys(x0)) 2ds. 
Gao, Huang, and Jiao Take expectation and it follows that W 2 2 (Y1#µ, ν) Ex0 µ Y1(x0) X1(x0) 2 0 exp 2 Z 1 s θudu ds Z 1 Rd v(t, x) v(t, x) 2 qt(x)dxdt 0 exp 2 Z 1 where qt denotes the density function of Yt#µ, and we use the assumption that Rd v(t, x) v(t, x) 2 qt(x)dxdt ε in the last inequality. (ii) On the other hand, suppose that v(t, x) is Lipschitz continuous in x uniformly over (t, x) [0, 1] Rd with Lipschitz constant C3. Applying Gr onwall s inequality to the variational equation (6.7), it follows that x Ys,t(x) 2 2,2 exp(2C3(t s)). By the equality (6.6), it holds that Y1(x0) X1(x0) 2 0 ( x Ys,1)(Xs(x0)) 2,2 v(s, Xs(x0)) v(s, Xs(x0)) ds 2 0 ( x Ys,1)(Xs(x0)) 2 2,2ds Z 1 0 v(s, Xs(x0)) v(s, Xs(x0)) 2ds 0 v(s, Xs(x0)) v(s, Xs(x0)) 2ds. Taking expectations, it further yields that W 2 2 (Y1#µ, ν) Ex0 µ Y1(x0) X1(x0) 2 Rd v(t, x) v(t, x) 2pt(x)dxdt where Xt(x0) pt. Appendix F. Time derivative of the velocity field In this appendix, we are interested in representing the time derivative of the velocity field via moments of Y|Xt = x. The result is efficacious for controlling the time derivative with moment estimates, though the computation is somehow tedious. Gaussian Interpolation Flows Proposition F.1 The time derivative of the velocity field v(t, x) has an expression with moments of X1|Xt for any t (0, 1) as follows tv(t, x) = at at a2 t a2 t a2 t bt bt atat bt bt atat + a2 t ! bt a2 t M1 + b2 t a2 t Mc 2x b3 t a2 t !2 (M3 M2M1) , where M1 := E[X1|Xt = x], M2 := E[X 1 X1|Xt = x], Mc 2 := Cov(X1|Xt = x), M3 := E[X1X 1 X1|Xt = x]. Proof By direct differentiation, it implies that tv(t, x) = t bt a2 t atat bt a2 t atat = btbt b2 t b2 t x + btbt b2 t b2 t a2 t + bt bt 2 atat atat a2 t bt a2 t atat We first focus on ts(t, x). Since pt satisfies the continuity equation (3.4), it holds that ts(t, x) = x( t log pt(x)) x (pt(x)v(t, x)) ( xpt(x)) v(t, x) + pt(x)( x v(t, x)) = x s(t, x) v(t, x) + x v(t, x) = ( xs(t, x)) v(t, x) + ( xv(t, x)) s(t, x) + x( x v(t, x)) = ( xs(t, x)v(t, x) + xv(t, x)s(t, x) + x Tr( xv(t, x))) . By direct computation, it holds that xs(t, x)v(t, x) + xv(t, x)s(t, x) bt a2 t atat bt a2 t atat = bt bt xs(t, x)x + bt a2 t atat xs(t, x)s(t, x) + bt bt s(t, x) + bt a2 t atat xs(t, x)s(t, x) = bt bt s(t, x) + bt bt xs(t, x)x + 2 bt a2 t atat xs(t, x)s(t, x). Gao, Huang, and Jiao Then we focus on the trace term x Tr( xv(t, x)) ! b2 t a2 t Cov(Y|Xt = x) + at ! b2 t a2 t x Tr(Cov(Y|Xt = x)) ! b2 t a2 t x Z y 2p(y|t, x)dy Z yp(y|t, x)dy ! b2 t a2 t Z y 2 xp(y|t, x)dy 2 Z xp(y|t, x) ydy Z yp(y|t, x)dy , where we notice that xp(y|t, x) = x p(t, x|y)p1(y) = xp(t, x|y)p1(y) pt(x) p(t, x|y)p1(y) pt(x) s(t, x) = p(y|t, x) bty x a2 t s(t, x) . For ease of presentation, we introduce the following notations to denote several moments of Y|Xt = x M1 := E[Y|Xt = x], M2 := E[Y Y|Xt = x], Mc 2 := Cov(Y|Xt = x), M3 := E[YY Y|Xt = x]. By Tweedie s formula in Lemma G.1, it yields s(t, x) = bt a2 t x. By this expression of s(t, x), it yields xs(t, x)v(t, x) + xv(t, x)s(t, x) = bt bt s(t, x) + bt bt xs(t, x)x + 2 bt a2 t atat xs(t, x)s(t, x) a2 t x + bt bt b2 t a4 t Mc 2 1 bt a2 t atat ! b2 t a4 t Mc 2 1 a3 t x + bt M1 + b2 t a4 t Mc 2x + 2 b3 t a4 t Gaussian Interpolation Flows and xp(y|t, x) = bt a2 t (y M1) p(y|t, x). Therefore, we obtain Z y 2 xp(y|t, x)dy 2 Z xp(y|t, x) ydy Z yp(y|t, x)dy a2 t (y M1) p(y|t, x)dy 2 Z bt a2 t (y M1) yp(y|t, x)dy Z yp(y|t, x)dy Z y 2yp(y|t, x)dy Z y 2p(y|t, x)dy M1 2 Z y yp(y|t, x)dy M1 Z yp(y|t, x)dy Z yp(y|t, x)dy a2 t (M3 M2M1 2Mc 2M1) . 
Combining the equations above, we obtain tv(t, x) = btbt b2 t b2 t x + btbt b2 t b2 t a2 t + bt bt 2 atat atat a2 t bt a2 t atat a3 t x + bt + b2 t a4 t Mc 2x + 2 b3 t a4 t bt a2 t atat ! b3 t a4 t (M3 M2M1 2Mc 2M1) at a2 t a2 t a2 t bt bt atat bt bt atat + a2 t ! bt a2 t M1 + b2 t a2 t Mc 2x b3 t a2 t !2 (M3 M2M1) . Then we complete the proof. Appendix G. Functional inequalities and Tweedie s formula This appendix is devoted to an exposition of functional inequalities and Tweedie s formula that would assist in our proof. For a probability measure µ on a compact set Ω Rd, we define the variance of a function f L2(Ω, µ) as Varµ(f) := Z Gao, Huang, and Jiao Moreover, for a probability measure µ on a compact set Ω Rd and any positive integrable function f : Ω R such that R Ωf log f dν < , we define the entropy of f as Entµ(f) := Z Ω f log fdµ Z Ω fdµ log Z Definition G.1 (Log-Sobolev inequality) A probability measure µ P(Ω) is said to satisfy a log-Sobolev inequality with constant C > 0, if for all functions f : Ω R, it holds that Entµ(f2) 2C Z The best constant C > 0 for which such an inequality holds is referred to as the log-Sobolev constant CLS(µ). Definition G.2 (Poincar e inequality) A probability measure µ P(Ω) is said to satisfy a Poincar e inequality with constant C > 0, if for all functions f : Ω R, it holds that Varµ(f) C Z The best constant C > 0 for which such an inequality holds is referred to as the Poincar e constant CP(µ). Finally, for ease of reference, we present Tweedie s formula that was first reported in Robbins (1956), and then was used as a simple empirical Bayes approach for correcting selection bias (Efron, 2011). Here, we use Tweedie s formula to link the score function with the expectation conditioned on an observation with Gaussian noise. Lemma G.1 (Tweedie s formula) Suppose that X µ and ϵ γd,σ2. Let Y = X+ϵ and p(y) be the marginal density of Y. Then E[X|Y = y] = y + σ2 y log p(y). Michael S. Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants. In The Eleventh International Conference on Learning Representations, 2023. Michael S. Albergo, Nicholas M. Boffi, Michael Lindsey, and Eric Vanden-Eijnden. Multimarginal generative modeling with stochastic interpolants. ar Xiv preprint ar Xiv:2310.03695, 2023a. Michael S. Albergo, Nicholas M. Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions. ar Xiv preprint ar Xiv:2303.08797, 2023b. Michael S. Albergo, Mark Goldstein, Nicholas M. Boffi, Rajesh Ranganath, and Eric Vanden-Eijnden. Stochastic interpolants with data-dependent couplings. ar Xiv preprint ar Xiv:2310.03725, 2023c. Gaussian Interpolation Flows Luigi Ambrosio and Gianluca Crippa. Continuity equations and ODE flows with nonsmooth velocity. Proceedings of the Royal Society of Edinburgh Section A: Mathematics, 144(6):1191 1244, 2014. Luigi Ambrosio, Nicola Gigli, and Giuseppe Savar e. Gradient flows: In metric spaces and in the space of probability measures. Springer Science & Business Media, 2008. Luigi Ambrosio, Sebastiano N. Golo, and Francesco S. Cassano. Classical flows of vector fields with exponential or sub-exponential summability. Journal of Differential Equations, 372:458 504, 2023. Abdul Fatir Ansari, Ming Liang Ang, and Harold Soh. Refining deep generative models via discriminator gradient flow. In International Conference on Learning Representations, 2021. Michael Arbel, Anna Korba, Adil Salim, and Arthur Gretton. Maximum mean discrepancy gradient flow. 
In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. Martin Arjovsky, Soumith Chintala, and L eon Bottou. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 214 223. PMLR, 2017. Dominique Bakry and Michel Emery. Diffusions hypercontractives. In Seminaire de probabilit es XIX 1983/84, pages 177 206. Springer, 1985. Dominique Bakry, Ivan Gentil, and Michel Ledoux. Analysis and geometry of Markov diffusion operators, volume 103. Springer, 2014. Keith Ball, Franck Barthe, and Assaf Naor. Entropy jumps in the presence of a spectral gap. Duke Mathematical Journal, 119(1):41 63, 2003. Joe Benton, George Deligiannidis, and Arnaud Doucet. Error bounds for flow matching methods. ar Xiv preprint ar Xiv:2305.16860, 2023. Marin Biloˇs, Johanna Sommer, Syama Sundar Rangapuram, Tim Januschowski, and Stephan G unnemann. Neural flows: Efficient alternative to neural ODEs. In Advances in Neural Information Processing Systems, volume 34, pages 21325 21337. Curran Associates, Inc., 2021. Sergey G. Bobkov and Michel Ledoux. From Brunn-Minkowski to Brascamp-Lieb and to logarithmic Sobolev inequalities. Geometric and Functional Analysis, 10(5):1028 1052, 2000. Valentin De Bortoli. Convergence of denoising diffusion models under the manifold hypothesis. Transactions on Machine Learning Research, 2022. Herm J. Brascamp and Elliott H. Lieb. On extensions of the Brunn-Minkowski and Pr ekopa Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. Journal of Functional Analysis, 22(4):366 389, 1976. Gao, Huang, and Jiao Luis A. Caffarelli. Monotonicity properties of optimal transportation and the FKG and related inequalities. Communications in Mathematical Physics, 214(3):547 563, 2000. Tony T. Cai and Yihong Wu. Optimal detection of sparse mixtures against a given null distribution. IEEE Transactions on Information Theory, 60(4):2217 2232, 2014. Patrick Cattiaux and Arnaud Guillin. Semi log-concave Markov diffusions. In Catherine Donati-Martin, Antoine Lejay, and Alain Rouault, editors, S eminaire de probabilit es XLVI, pages 231 292. Springer International Publishing, Cham, 2014. Hongrui Chen, Holden Lee, and Jianfeng Lu. Improved analysis of score-based generative modeling: User-friendly bounds under minimal smoothness assumptions. In Proceedings of the 40th International Conference on Machine Learning, volume 202, pages 4735 4763. PMLR, 2023a. Ricky T.Q. Chen and Yaron Lipman. Riemannian flow matching on general geometries. ar Xiv preprint ar Xiv:2302.03660, 2023. Ricky T.Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. Sitan Chen, Sinho Chewi, Jerry Li, Yuanzhi Li, Adil Salim, and Anru Zhang. Sampling is as easy as learning the score: Theory for diffusion models with minimal data assumptions. In The Eleventh International Conference on Learning Representations, 2023b. Sinho Chewi and Aram-Alexandre Pooladian. An entropic generalization of Caffarelli s contraction theorem via covariance inequalities. ar Xiv preprint ar Xiv:2203.04954, 2022. Maria Colombo, Alessio Figalli, and Yash Jhaveri. Lipschitz changes of variables between perturbations of log-concave measures. Annali della Scuola Normale Superiore di Pisa. Classe di scienze, 17(4):1491 1519, 2017. 
Dario Cordero-Erausquin. Transport inequalities for log-concave measures, quantitative forms, and applications. Canadian Journal of Mathematics, 69(3):481 501, 2017. Yin Dai, Yuan Gao, Jian Huang, Yuling Jiao, Lican Kang, and Jin Liu. Lipschitz transport maps via the F ollmer flow. ar Xiv preprint ar Xiv:2309.03490, 2023. Ludwig Danzer, Branko Gr unbaum, and Victor Klee. Helly s theorem and its relatives. In Proceedings of Symposia in Pure Mathematics: Convexity, volume VII, pages 101 180, Providence, RI, 1963. American Mathematical Society. Valentin De Bortoli, James Thornton, Jeremy Heng, and Arnaud Doucet. Diffusion Schr odinger bridge with applications to score-based generative modeling. In Advances in Neural Information Processing Systems, volume 34, pages 17695 17709. Curran Associates, Inc., 2021. Pierre Del Moral and Sumeetpal S. Singh. Backward Itˆo Ventzell and stochastic interpolation formulae. Stochastic Processes and their Applications, 154:197 250, 2022. Gaussian Interpolation Flows Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. In Advances in Neural Information Processing Systems, volume 34, pages 8780 8794, 2021. Laurent Dinh, David Krueger, and Yoshua Bengio. NICE: Non-linear independent components estimation. ar Xiv preprint ar Xiv:1410.8516, 2014. Andrew Duncan, Nikolas N usken, and Lukasz Szpruch. On the geometry of Stein variational gradient descent. Journal of Machine Learning Research, 24(56):1 39, 2023. Alex Dytso, Martina Cardone, and Ian Zieder. Meta derivative identity for the conditional expectation. IEEE Transactions on Information Theory, 69(7):4284 4302, 2023a. Alex Dytso, H. Vincent Poor, and Shlomo Shamai Shitz. Conditional mean estimation in Gaussian noise: A meta derivative identity with applications. IEEE Transactions on Information Theory, 69(3):1883 1898, 2023b. Bradley Efron. Tweedie s formula and selection bias. Journal of the American Statistical Association, 106(496):1602 1614, 2011. Ronen Eldan and James R. Lee. Regularization under diffusion and anticoncentration of the information content. Duke Mathematical Journal, 167(5):969 993, 2018. Lawrence C. Evans. Partial differential equations, volume 19 of Graduate studies in mathematics. American Mathematical Society, Providence, Rhode Island, second edition edition, 2010. Jiaojiao Fan, Qinsheng Zhang, Amirhossein Taghvaei, and Yongxin Chen. Variational Wasserstein gradient flow. In Proceedings of the 39th International Conference on Machine Learning, volume 162, pages 6185 6215. PMLR, 2022. Max Fathi, Dan Mikulincer, and Yair Shenfeld. Transportation onto log-Lipschitz perturbations. ar Xiv preprint ar Xiv:2305.03786, 2023. Chris Finlay, J orn-Henrik Jacobsen, Levon Nurbekyan, and Adam Oberman. How to train your neural ODE: The world of Jacobian and kinetic regularization. In Proceedings of the 37th International Conference on Machine Learning, volume 119, pages 3154 3164. PMLR, 2020. Yuan Gao, Yuling Jiao, Yang Wang, Yao Wang, Can Yang, and Shunkang Zhang. Deep generative learning via variational gradient flow. In Proceedings of the 36th International Conference on Machine Learning, volume 97, pages 2093 2101. PMLR, 2019. Yuan Gao, Jian Huang, Yuling Jiao, Jin Liu, Xiliang Lu, and Zhijian Yang. Deep generative learning via Euler particle transport. In Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, volume 145, pages 336 368. PMLR, 2022. 
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems 27, pages 2672 2680. Curran Associates, Inc., 2014. Gao, Huang, and Jiao Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Communications of the ACM, 63(11):139 144, 2020. Will Grathwohl, Ricky T.Q. Chen, Jesse Bettencourt, and David Duvenaud. FFJORD: Free-form continuous dynamics for scalable reversible generative models. In International Conference on Learning Representations, 2019. Leonard Gross. Logarithmic Sobolev inequalities. American Journal of Mathematics, 97 (4):1061 1083, 1975. Ernst Hairer, Gerhard Wanner, and Syvert P. Nørsett. Classical Mathematical Theory, chapter I, pages 1 128. Springer Berlin Heidelberg, Berlin, Heidelberg, 1993. Philip Hartman. Dependence on Initial Conditions and Parameters, chapter V, pages 93 116. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2002a. Philip Hartman. Existence, chapter II, pages 8 23. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2002b. Charles P. Hatsell and Loren W. Nolte. Some geometric properties of the likelihood ratio (corresp.). IEEE Transactions on Information Theory, 17(5):616 618, 1971. Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pages 6840 6851, 2020. Rie Johnson and Tong Zhang. Composite functional gradient learning of generative adversarial models. In Proceedings of the 35th International Conference on Machine Learning, volume 80, pages 2371 2379. PMLR, 2018. Rie Johnson and Tong Zhang. A framework of composite functional gradient methods for generative adversarial models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1):17 32, 2019. Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems, volume 35, pages 26565 26577. Curran Associates, Inc., 2022. Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, and Stefano Ermon. Consistency trajectory models: Learning probability flow ODE trajectory of diffusion. ar Xiv preprint ar Xiv:2310.02279, 2023. Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. In International Conference on Learning Representations, 2014. Diederik P. Kingma and Max Welling. An introduction to variational autoencoders. Foundations and Trends R in Machine Learning, 12(4):307 392, 2019. Gaussian Interpolation Flows Bo az Klartag. High-dimensional distributions with convexity properties. In Proceedings of the Fifth European Congress of Mathematics, pages 401 417, Amsterdam, 14 July 18 July 2010. European Mathematical Society Publishing House. Ivan Kobyzev, Simon J.D. Prince, and Marcus A. Brubaker. Normalizing flows: An introduction and review of current methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11):3964 3979, 2020. Jeroen S.W. Lamb and John A.G. Roberts. Time-reversal symmetry in dynamical systems: A survey. Physica D: Nonlinear Phenomena, 112(1-2):1 39, 1998. 
Holden Lee, Jianfeng Lu, and Yixin Tan. Convergence of score-based generative modeling for general data distributions. In Proceedings of The 34th International Conference on Algorithmic Learning Theory, volume 201, pages 946 985. PMLR, 2023. Tengyuan Liang. How well generative adversarial networks learn distributions. Journal of Machine Learning Research, 22(228):1 41, 2021. Yaron Lipman, Ricky T.Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2023. Qiang Liu. Rectified flow: A marginal preserving approach to optimal transport. ar Xiv preprint ar Xiv:2209.14577, 2022. Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. In The Eleventh International Conference on Learning Representations, 2023. Antoine Liutkus, Umut Simsekli, Szymon Majewski, Alain Durmus, and Fabian-Robert St oter. Sliced-Wasserstein flows: Nonparametric generative modeling via optimal transport and diffusions. In Proceedings of the 36th International Conference on Machine Learning, volume 97, pages 4104 4113. PMLR, 2019. Cheng Lu, Kaiwen Zheng, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Maximum likelihood training for score-based diffusion ODEs by high order denoising score matching. In Proceedings of the 39th International Conference on Machine Learning, pages 14429 14460. PMLR, 2022a. Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. DPMSolver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. In Advances in Neural Information Processing Systems, volume 35, pages 5775 5787. Curran Associates, Inc., 2022b. Ashok Makkuva, Amirhossein Taghvaei, Sewoong Oh, and Jason Lee. Optimal transport mapping via input convex neural networks. In Proceedings of the 37th International Conference on Machine Learning, volume 119, pages 6672 6681. PMLR, 2020. Pierre Marion. Generalization bounds for neural ordinary differential equations and deep residual networks. ar Xiv preprint ar Xiv:2305.06648, 2023. Gao, Huang, and Jiao Pierre Marion, Yu-Han Wu, Michael E Sander, and G erard Biau. Implicit regularization of deep residual networks towards neural ODEs. ar Xiv preprint ar Xiv:2309.01213, 2023. Youssef Marzouk, Zhi Ren, Sven Wang, and Jakob Zech. Distribution learning via neural differential equations: A nonparametric statistical perspective. ar Xiv preprint ar Xiv:2309.01043, 2023. Dan Mikulincer and Yair Shenfeld. The Brownian transport map. ar Xiv preprint ar Xiv:2111.11521, 2021. Dan Mikulincer and Yair Shenfeld. On the Lipschitz properties of transportation along heat flows. In Geometric Aspects of Functional Analysis: Israel Seminar (GAFA) 2020-2022, pages 269 290. Springer, 2023. Youssef Mroueh and Truyen Nguyen. On the convergence of gradient descent in GANs: MMD GAN as a gradient flow. In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, volume 130, pages 1720 1728. PMLR, 2021. Youssef Mroueh, Tom Sercu, and Anant Raj. Sobolev descent. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, volume 89, pages 2976 2985. PMLR, 2019. Joe Neeman. Lipschitz changes of variables via heat flow. ar Xiv preprint ar Xiv:2201.03403, 2022. Kirill Neklyudov, Rob Brekelmans, Daniel Severo, and Alireza Makhzani. Action matching: Learning stochastic dynamics from samples. 
In Proceedings of the 40th International Conference on Machine Learning, volume 202, pages 25858 25889. PMLR, 2023. Derek Onken, Samy Wu Fung, Xingjian Li, and Lars Ruthotto. OT-flow: Fast and accurate continuous normalizing flows via optimal transport. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 9223 9232, 2021. Daniel P. Palomar and Sergio Verd u. Gradient of mutual information in linear vector Gaussian channels. IEEE Transactions on Information Theory, 52(1):141 154, 2005. George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. Normalizing flows for probabilistic modeling and inference. Journal of Machine Learning Research, 22(57):1 64, 2021. Aram-Alexandre Pooladian, Heli Ben-Hamu, Carles Domingo-Enrich, Brandon Amos, Yaron Lipman, and Ricky T.Q. Chen. Multisample flow matching: Straightening flows with minibatch couplings. In Proceedings of the 40th International Conference on Machine Learning, volume 202, pages 28100 28127. PMLR, 2023. Danilo J. Rezende and Shakir Mohamed. Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning, volume 37, pages 1530 1538. PMLR, 2015. Gaussian Interpolation Flows Danilo J. Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31st International Conference on Machine Learning, pages 1278 1286. PMLR, 2014. Herbert E. Robbins. An empirical Bayes approach to statistics. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1954 1955, volume I, page 157 163, Berkeley and Los Angeles, 1956. University of California Press. Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj orn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684 10695, 2022. Domenec Ruiz-Balet and Enrique Zuazua. Neural ODE control for classification, approximation, and transport. SIAM Review, 65(3):735 773, 2023. Ruslan Salakhutdinov. Learning deep generative models. Annual Review of Statistics and Its Application, 2(1):361 385, 2015. Adrien Saumard and Jon A. Wellner. Log-concavity and strong log-concavity: A review. Statistics Surveys, 8:45 114, 2014. Neta Shaul, Ricky T.Q. Chen, Maximilian Nickel, Matthew Le, and Yaron Lipman. On kinetic optimal probability paths for generative models. In Proceedings of the 40th International Conference on Machine Learning, volume 202, pages 30883 30907. PMLR, 2023. Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning, volume 37, pages 2256 2265. PMLR, 2015. Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021a. Yang Song and Prafulla Dhariwal. Improved techniques for training consistency models. ar Xiv preprint ar Xiv:2310.14189, 2023. Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, volume 32, pages 11895 11907. Curran Associates, Inc., 2019. Yang Song and Stefano Ermon. Improved techniques for training score-based generative models. 
In Advances in Neural Information Processing Systems, volume 33, pages 12438 12448. Curran Associates, Inc., 2020. Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021b. Gao, Huang, and Jiao Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 32211 32252. PMLR, 23 29 Jul 2023. Xuan Su, Jiaming Song, Chenlin Meng, and Stefano Ermon. Dual diffusion implicit bridges for image-to-image translation. In The Eleventh International Conference on Learning Representations, 2023. Esteban G. Tabak and Cristina V. Turner. A family of nonparametric density estimation algorithms. Communications on Pure and Applied Mathematics, 66(2):145 164, 2013. Esteban G. Tabak and Eric Vanden-Eijnden. Density estimation by dual ascent of the log-likelihood. Communications in Mathematical Sciences, 8(1):217 233, 2010. Alexander Tong, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Kilian Fatras, Guy Wolf, and Yoshua Bengio. Conditional flow matching: Simulation-free dynamic optimal transport. ar Xiv preprint ar Xiv:2302.00482, 2023. Andre Wibisono and Varun Jog. Convexity of mutual information along the heat flow. In 2018 IEEE International Symposium on Information Theory (ISIT), pages 1615 1619. IEEE, 2018a. Andre Wibisono and Varun Jog. Convexity of mutual information along the Ornstein Uhlenbeck flow. In 2018 International Symposium on Information Theory and Its Applications (ISITA), pages 55 59. IEEE, 2018b. Andre Wibisono, Varun Jog, and Po-Ling Loh. Information and estimation in Fokker Planck channels. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 2673 2677. IEEE, 2017. Yihong Wu and Sergio Verd u. Functional properties of minimum mean-square error and mutual information. IEEE Transactions on Information Theory, 58(3):1289 1301, 2011. Chen Xu, Xiuyuan Cheng, and Yao Xie. Invertible normalizing flow neural networks by JKO scheme. ar Xiv preprint ar Xiv:2212.14424, 2022. Liu Yang and George E. Karniadakis. Potential flow generator with L2 optimal transport regularity for generative models. IEEE Transactions on Neural Networks and Learning Systems, 33(2):528 538, 2020. Linfeng Zhang, Weinan E, and Lei Wang. Monge-Amp ere flow for generative modeling. ar Xiv preprint ar Xiv:1809.10188, 2018. Kaiwen Zheng, Cheng Lu, Jianfei Chen, and Jun Zhu. Improved techniques for maximum likelihood estimation for diffusion ODEs. ar Xiv preprint ar Xiv:2305.03935, 2023. Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2223 2232, 2017.