Journal of Machine Learning Research 25 (2024) 1-52 Submitted 11/23; Revised 7/24; Published 8/24

Gaussian Interpolation Flows

Yuan Gao yuan0.gao@connect.polyu.hk
Department of Applied Mathematics
The Hong Kong Polytechnic University
Hong Kong SAR, China

Jian Huang j.huang@polyu.edu.hk
Departments of Data Science and AI, and Applied Mathematics
The Hong Kong Polytechnic University
Hong Kong SAR, China

Yuling Jiao yulingjiaomath@whu.edu.cn
School of Mathematics and Statistics and Hubei Key Laboratory of Computational Science
Wuhan University, Wuhan, China

Editor: Mingyuan Zhou

Abstract

Gaussian denoising has emerged as a powerful method for constructing simulation-free continuous normalizing flows for generative modeling. Despite their empirical successes, theoretical properties of these flows and the regularizing effect of Gaussian denoising have remained largely unexplored. In this work, we aim to address this gap by investigating the well-posedness of simulation-free continuous normalizing flows built on Gaussian denoising. Through a unified framework termed Gaussian interpolation flow, we establish the Lipschitz regularity of the flow velocity field, the existence and uniqueness of the flow, and the Lipschitz continuity of the flow map and the time-reversed flow map for several rich classes of target distributions. This analysis also sheds light on the auto-encoding and cycle consistency properties of Gaussian interpolation flows. Additionally, we study the stability of these flows in source distributions and perturbations of the velocity field, using the quadratic Wasserstein distance as a metric. Our findings offer valuable insights into the learning techniques employed in Gaussian interpolation flows for generative modeling, providing a solid theoretical foundation for end-to-end error analyses of learning Gaussian interpolation flows with empirical observations.

Keywords: Continuous normalizing flows, Gaussian denoising, generative modeling, Lipschitz transport maps, stochastic interpolation.

1. Introduction

Generative modeling, which aims to learn the underlying data generating distribution from a finite sample, is a fundamental task in the field of machine learning and statistics (Salakhutdinov, 2015). Deep generative models (DGMs) find wide-ranging applications across diverse domains such as computer vision, natural language processing, drug discovery, and recommendation systems. The core objective of DGMs is to learn a nonlinear mapping, either deterministic or stochastic (with outsourcing randomness), which transforms latent samples drawn from a simple reference distribution into samples that closely resemble the target distribution. Generative adversarial networks (GANs) have emerged as a prominent class of DGMs (Goodfellow et al., 2014; Arjovsky et al., 2017; Goodfellow et al., 2020). Through an adversarial training process, GANs learn to approximately generate samples from the data distribution. Variational auto-encoders (VAEs) are another category of DGMs (Kingma and Welling, 2014; Rezende et al., 2014; Kingma and Welling, 2019). In VAEs, the encoding and decoding procedures produce a compressed and structured latent representation, enabling efficient sampling and interpolation.
Score-based diffusion models are a promising approach to deep generative modeling that has evolved rapidly since its emergence (Song and Ermon, 2019, 2020; Ho et al., 2020; Song et al., 2021b,a). The basis of score-based diffusion models lies in the notion of the score function, which characterizes the gradient of the log-density function of a given distribution. In addition, normalizing flows have gained attention as another powerful class of DGMs (Tabak and Vanden-Eijnden, 2010; Tabak and Turner, 2013; Kobyzev et al., 2020; Papamakarios et al., 2021). In normalizing flows, an invertible mapping is learned to transform a simple source distribution into a more complex target distribution by a composition of a series of parameterized, invertible and differentiable intermediate transformations. This framework allows for efficient sampling and training by maximum likelihood estimation (Dinh et al., 2014; Rezende and Mohamed, 2015). Continuous normalizing flows (CNFs) pursue this idea further by performing the transformation over continuous time, enabling fine-grained modeling of dynamic systems from the source distribution to the target distribution. The essence of CNFs lies in defining ordinary differential equations (ODEs) that govern the evolution of CNFs in terms of continuous trajectories. Inspired by the Gaussian denoising approach, which learns a target distribution by denoising its Gaussian smoothed counterpart, many authors have considered simulation-free estimation methods that have shown great potential in large-scale applications (Song et al., 2021a; Liu et al., 2023; Albergo and Vanden-Eijnden, 2023; Lipman et al., 2023; Neklyudov et al., 2023; Tong et al., 2023; Chen and Lipman, 2023; Albergo et al., 2023b; Shaul et al., 2023; Pooladian et al., 2023). However, despite the empirical success of simulation-free CNFs based on Gaussian denoising, a rigorous theoretical analysis of these CNFs has received limited attention thus far. In this work, we explore an ODE flow-based approach for generative modeling, which we refer to as Gaussian Interpolation Flows (GIFs). This method is derived from the Gaussian stochastic interpolation detailed in Section 3. GIFs represent a straightforward extension of the stochastic interpolation method (Albergo and Vanden-Eijnden, 2023; Liu et al., 2023; Lipman et al., 2023). They can be considered a class of CNFs and encompass various ODE flows as special cases. According to the classical Cauchy-Lipschitz theorem, also known as the Picard-Lindel of theorem (Hartman, 2002b, Theorem 1.1), a unique solution to the initial value problem for an ODE flow exists if the velocity field is continuous in the time variable and uniformly Lipschitz continuous in the space variable. In the case of GIFs, the velocity field depends on the score function of the push-forward measure. Therefore, it remains to be shown that this velocity field satisfies the regularity conditions stipulated Gaussian Interpolation Flows by the Cauchy-Lipschitz theorem. These regularity conditions are commonly assumed in the literature when analyzing the convergence properties of CNFs or general neural ODEs (Chen et al., 2018; Biloˇs et al., 2021; Marion et al., 2023; Marion, 2023; Marzouk et al., 2023). However, there is a theoretical gap in understanding how to translate these regularity conditions on velocity fields into conditions on target distributions. 
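For reference, the classical criterion can be written out for the flows considered here; the following is a standard paraphrase of the Cauchy–Lipschitz (Picard–Lindelöf) theorem rather than a quotation of the cited statement.

```latex
\[
  \frac{\mathrm{d}X_t}{\mathrm{d}t} = v(t, X_t), \qquad X_0 = x_0 \in \mathbb{R}^d, \qquad t \in [0,1].
\]
If $v(\cdot, x)$ is continuous in $t$ for every $x$ and $v(t, \cdot)$ is uniformly Lipschitz in $x$,
that is, there exists $L > 0$ such that
\[
  \|v(t, x) - v(t, y)\| \le L \, \|x - y\| \qquad \text{for all } t \in [0,1], \ x, y \in \mathbb{R}^d,
\]
then this initial value problem admits a unique solution $(X_t)_{t \in [0,1]}$ for every initial point $x_0$.
```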
The main focus of this work is to study and establish the theoretical properties of Gaussian interpolation flow and its corresponding flow map. We show that the regularity conditions of the Cauchy-Lipschitz theorem are satisfied for several rich classes of probability distributions using variance inequalities. Based on the obtained regularity results, we further expose the well-posedness of GIFs, the Lipschitz continuity of flow mappings, and applications to generative modeling. The well-posedness results are crucial for studying the approximation and convergence properties of GIFs learned with the flow or score matching method. When applied to generative modeling, our results further elucidate the auto-encoding and cycle consistency properties exhibited by GIFs.

Figure 1: Roadmap of the main results, connecting geometric regularity (Assumption 2.2), Lipschitz velocity fields (Proposition 4.1), well-posedness (Theorems 5.1, 5.2), Lipschitz flow maps (Propositions 5.1, 5.2), auto-encoding and cycle consistency, and stability in source distributions and in velocity fields (Propositions 6.1, 6.2), together with Corollaries 5.1, 6.3, and B.1.

1.1 Our main contributions

We provide an overview of the main results in Figure 1, in which we indicate the assumptions used in our analysis and the relationship between the results. We also summarize our main contributions below.

In Section 3, we extend the framework of stochastic interpolation proposed in Albergo and Vanden-Eijnden (2023). Various ODE flows can be considered special cases of the extended framework. We prove that the marginal distributions of GIFs satisfy the continuity equation converging to the target distribution in the weak sense. Several explicit formulas of the velocity field and its derivatives are derived, which can facilitate computation and regularity estimation.

In Sections 4 and 5, we establish the spatial Lipschitz regularity of the velocity field for a range of target measures with rich structures, which is sufficient to guarantee the well-posedness of GIFs. Additionally, we deduce the Lipschitz regularity of both the flow map and its time-reversed counterpart. The well-posedness of GIFs is an essential attribute, serving as a foundational requirement for investigating numerical solutions of GIFs. It is important to note that while the flow maps are demonstrated to be Lipschitz continuous transport maps for generative modeling, the Lipschitz regularity for optimal transport maps has only been partially established to date.

In Section 6, we show that the auto-encoding and cycle consistency properties of GIFs are inherently satisfied when the flow maps exhibit Lipschitz continuity with respect to the spatial variable. This demonstrates that exact auto-encoding and cycle consistency are intrinsic characteristics of GIFs. Our findings lend theoretical support to the findings made by Su et al. (2023), as illustrated in Figures 3 and 4.

In Section 6, we conduct the stability analysis of GIFs, examining how they respond to changes in source distributions and to perturbations in the velocity field. This analysis, conducted in terms of the quadratic Wasserstein distance, provides valuable insights that justify the use of learning techniques such as Gaussian initialization and flow or score matching.

2. Preliminaries

In this section, we include several preliminary setups to show notations, basic assumptions, and several useful variance inequalities.

Notation. Here we summarize the notation.
The space R^d is endowed with the Euclidean metric, and we denote by ∥·∥ and ⟨·,·⟩ the corresponding norm and inner product. Let S^{d−1} := {x ∈ R^d : ∥x∥ = 1}. For a matrix A ∈ R^{k×d}, we use A^⊤ for the transpose, and the spectral norm is denoted by ∥A∥_{2,2} := sup_{x∈S^{d−1}} ∥Ax∥. For a square matrix A ∈ R^{d×d}, we use det(A) for the determinant and Tr(A) for the trace. We use I_d to denote the d×d identity matrix. For two symmetric matrices A, B ∈ R^{d×d}, we denote A ⪰ B or B ⪯ A if A − B is positive semi-definite. For two vectors x, y ∈ R^d, we denote x ⊗ y := xy^⊤. For Ω₁ ⊆ R^k, Ω₂ ⊆ R^d, and n ≥ 1, we denote by C^n(Ω₁; Ω₂) the space of continuous functions f : Ω₁ → Ω₂ that are n times differentiable and whose partial derivatives of order n are continuous. If Ω₂ ⊆ R, we simply write C^n(Ω₁). For any f(x) ∈ C²(R^d), let ∇_x f, ∇²_x f, ∇_x · f, and Δ_x f denote its gradient, Hessian, divergence, and Laplacian, respectively. We use X ≲ Y to denote X ≤ CY for some constant C > 0. The function composition operation is written as (g ∘ f)(x) := g(f(x)) for functions f and g. The Borel σ-algebra of R^d is denoted by B(R^d). The space of probability measures defined on (R^d, B(R^d)) is denoted as P(R^d). For any R^d-valued random variable X, we use E[X] and Cov(X) to denote its expectation and covariance matrix, respectively. We use µ ∗ ν to denote the convolution of any two probability measures µ and ν, and we write X =ᵈ Y to indicate that two random variables have the same probability distribution. For a random variable X, let Law(X) denote its probability distribution. Let g : R^k → R^d be a measurable mapping and µ be a probability measure on R^k. The push-forward measure g#µ is defined by g#µ(A) := µ(g^{−1}(A)) for any measurable set A. Let N(m, Σ) denote a d-dimensional Gaussian distribution with mean vector m ∈ R^d and covariance matrix Σ ∈ R^{d×d}. For simplicity, let γ_{d,σ²} := N(0, σ²I_d), and let ϕ_{m,σ²}(x) denote the probability density function of N(m, σ²I_d) with respect to the Lebesgue measure. If m = 0 and σ = 1, we abbreviate these as γ_d and ϕ(x). Let L^p(R^d; R^ℓ, µ) denote the L^p space with the L^p norm for p ∈ [1, ∞] w.r.t. a measure µ. To simplify the notation, we write L^p(R^d, µ) if ℓ = 1, L^p(R^d; R^ℓ) if the Lebesgue measure is used, and L^p(R^d) if both hold.

2.1 Assumptions

We focus on probability distributions satisfying several types of weak-convexity assumptions, which offer a geometric notion of regularity that is dimension-free in the study of high-dimensional distributions (Klartag, 2010). On the one hand, weak-convexity regularity conditions are useful in deriving dimension-free guarantees for generative modeling and sampling from high-dimensional distributions. On the other hand, they accommodate distributions with complex shapes, including those with multiple modes.

Definition 2.1 (Cattiaux and Guillin, 2014) A probability measure µ(dx) = exp(−U(x))dx is κ-semi-log-concave for some κ ∈ R if its support Ω ⊆ R^d is convex and its potential function U ∈ C²(Ω) satisfies

∇²_x U(x) ⪰ κ I_d, ∀x ∈ Ω.

The κ-semi-log-concavity condition is a relaxed notion of log-concavity, since here κ < 0 is allowed. When κ ≥ 0, we are considering a log-concave probability measure that is proved to be unimodal (Saumard and Wellner, 2014). However, when κ < 0, a κ-semi-log-concave probability measure can be multimodal.

Definition 2.2 (Eldan and Lee, 2018) A probability measure µ(dx) = exp(−U(x))dx is β-semi-log-convex for some β > 0 if its support Ω ⊆ R^d is convex and its potential function U ∈ C²(Ω) satisfies

∇²_x U(x) ⪯ β I_d, ∀x ∈ Ω.

The following definition of L-log-Lipschitz continuity is a variant of L-Lipschitz continuity.
It characterizes a first-order condition on the target function rather than a second-order condition such as κ-semi-log-concavity and β-semi-log-convexity in Definitions 2.1 and 2.2. Definition 2.3 A function f : Rd R+ is L-log-Lipschitz continuous if its logarithm is L-Lipschitz continuous for some L 0. Based on the definitions, we present two assumptions on the target distribution. Assumption 2.1 concerns the absolute continuity and the moment condition. Assumption 2.2 imposes geometric regularity conditions. Assumption 2.1 The probability measure ν is absolutely continuous with respect to the Lebesgue measure and has a finite second moment. Gao, Huang, and Jiao Assumption 2.2 Let D := (1/ 2)diam(supp(ν)). The probability measure ν satisfies one or more of the following conditions: (i) ν is β-semi-log-convex for some β > 0 and κ-semi-log-concave for some κ > 0 with supp(ν) = Rd; (ii) ν is κ-semi-log-concave for some κ R with D (0, ); (iii) ν = γd,σ2 ρ where ρ is a probability measure supported on a Euclidean ball of radius R on Rd; (iv) ν is β-semi-log-convex for some β > 0, κ-semi-log-concave for some κ 0, and dν dγd (x) is L-log-Lipschitz in x for some L 0 with supp(ν) = Rd. Multimodal distributions. Assumption 2.2 enumerates scenarios where probability distributions are endowed with geometric regularity. We examine the scenarios and clarify whether they cover multimodal distributions. Scenario (i) is referred to as the classical strong log-concavity case (κ > 0), and thus, describes unimodal distributions. Scenario (ii) allows κ 0 and requires that the support is bounded. Mixtures of Gaussian distributions are considered in Scenario (iii), and typically are multimodal distributions. Scenario (iv) also allows κ 0 when considering a log-Lipschitz perturbation of the standard Gaussian distribution. Both Scenario (ii) and Scenario (iv) incorporate multimodal distributions due to the potential negative lower bound κ. Lipschitz score. Lipschitz continuity of the score function is a basic regularity assumption on target distributions in the study of sampling algorithms based on Langevin and Hamiltonian dynamics. Even for high-dimensional distributions, this assumption endows a great source of regularity. For an L-Lipschitz score function, its corresponding distribution is both L-semi-log-convex and ( L)-semi-log-concave for some L 0. 2.2 Variance inequalities Variance inequalities like the Brascamp-Lieb inequality and the Cram er-Rao inequality are fundamental inequalities for explaining the regularizing effect of Gaussian denoising. Combined with κ-semi-log-concavity and β-semi-log-convexity, these inequalities are crucial for deducing the Lipschitz regularity of the velocity fields of GIFs in Proposition 4.1-(b) and (c). Lemma 2.1 (Brascamp-Lieb inequality) Let µ(dx) = exp( U(x))dx be a probability measure on a convex set Ω Rd whose potential function U : Ω R is of class C2 and strictly convex. Then for every locally Lipschitz function f L2(Ω, µ), Varµ(f) Eµ xf, ( 2 x U) 1 xf . (2.1) When applied to functions of the form f : x 7 x, e for any e Sd 1, the Brascamp Lieb inequality yields an upper bound of the covariance matrix Covµ(X) Eµ ( 2 x U(x)) 1 (2.2) Gaussian Interpolation Flows with equality if X N(m, Σ) with Σ positive definite. 
Under the strong log-concavity condition, that is, µ is κ-semi-log-concave with κ > 0, and if the Euclidean Bakry Emery criterion is satisfied (Bakry and Emery, 1985), the Brascamp-Lieb inequality instantly recovers the Poincar e inequality (see Definition G.2). The Brascamp-Lieb inequality originally appeared in (Brascamp and Lieb, 1976, Theorem 4.1). Alternative proofs are provided in Bobkov and Ledoux (2000); Bakry et al. (2014); Cordero-Erausquin (2017). The dimension-free inequality (2.1) can be further strengthened to obtain several variants with dimensional improvement. Lemma 2.2 (Cram er-Rao inequality) Let µ(dx) = exp( U(x))dx be a probability measure on Rd whose potential function U : Rd R is of class C2. Then for every f C1(Rd), Varµ(f) Eµ[ xf], Eµ[ 2 x U] 1 Eµ[ xf] . (2.3) When applied to functions of the form f : x 7 x, e for any e Sd 1, the Cram er-Rao inequality yields a lower bound of the covariance matrix Covµ(X) Eµ[ 2 x U(x)] 1 (2.4) with equality as well if X N(m, Σ) with Σ positive definite. The Cram er-Rao inequality plays a central role in asymptotic statistics as well as in information theory. The inequality (2.4) has an alternative derivation from the Cram er Rao bound for the location parameter. For detailed proofs of the Cram er-Rao inequality, readers are referred to Chewi and Pooladian (2022); Dai et al. (2023), and the references therein. 3. Gaussian interpolation flows Simulation-free CNFs represent a potent class of generative models based on ODE flows. Albergo and Vanden-Eijnden (2023) and Albergo et al. (2023b) introduce an innovative CNF that is constructed using stochastic interpolation techniques, such as Gaussian denoising. They conduct a thorough investigation of this flow, particularly examining its applications and effectiveness in generative modeling. We study the ODE flow and its associated flow map as defined by the Gaussian denoising process. This process has been explored from various perspectives, including diffusion models and stochastic interpolants. Building upon the work of Albergo and Vanden-Eijnden (2023) and Albergo et al. (2023b), we expand the stochastic interpolant framework by relaxing certain conditions on the functions at and bt, offering a more comprehensive perspective on the Gaussian denoising process. In our generalization, we introduce an adaptive starting point to the stochastic interpolation framework, which allows for greater flexibility in the modeling process. By examining this modified framework, we aim to demonstrate that the Gaussian denoising principle is effectively implemented within the context of stochastic interpolation. Definition 3.1 (Vector interpolation) Let z Rd, x1 Rd be two vectors in the Euclidean space and let x0 := a0z +b0x1 with a0 > 0, b0 0. Then we construct an interpolant Gao, Huang, and Jiao between x0 and x1 over time t [0, 1] through It(x0, x1), defined by It(x0, x1) = atz + btx1, (3.1) where at, bt satisfy at 0, bt 0, a0 > 0, b0 0, a1 = 0, b1 = 1, at > 0 for any t (0, 1), bt > 0 for any t (0, 1), at, bt C2([0, 1)), a2 t C1([0, 1]), bt C1([0, 1]). Remark 3.1 Compared with the vector interpolant defined by Albergo and Vanden-Eijnden (2023) (a.k.a. one-sided interpolant in Albergo et al. (2023b)), we extend its definition by relaxing the requirements that a0 = 1, b0 = 0 with a0 > 0, b0 0. This consideration is largely motivated by analyzing the probability flow ODEs of the variance-exploding (VE) SDE and the variance-preserving (VP) SDE (Song et al., 2021b). 
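To make the admissible coefficient schedules in Definition 3.1 concrete, the sketch below encodes three schedules that will appear in Table 1 (linear, Föllmer, and trigonometric) and numerically checks the boundary conditions together with the monotonicity of the signal-to-noise ratio b_t²/a_t²; the function names and tolerances are our own illustrative choices, not part of the paper.

```python
import numpy as np

# Illustrative coefficient schedules (a_t, b_t); each maps an array of times
# t in [0, 1] to the pair (a_t, b_t) appearing in Definition 3.1.
schedules = {
    "linear":        lambda t: (1.0 - t, t),
    "follmer":       lambda t: (np.sqrt(1.0 - t**2), t),
    "trigonometric": lambda t: (np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)),
}

def check_schedule(name, ab, n=1001, tol=1e-8):
    """Check a_1 = 0, b_1 = 1, positivity on (0, 1), and that the
    signal-to-noise ratio b_t^2 / a_t^2 is non-decreasing in t."""
    t = np.linspace(0.0, 1.0, n)
    a, b = ab(t)
    assert abs(a[-1]) < tol and abs(b[-1] - 1.0) < tol   # a_1 = 0, b_1 = 1
    assert np.all(a[1:-1] > 0) and np.all(b[1:-1] > 0)   # positivity on (0, 1)
    snr = b[1:-1] ** 2 / a[1:-1] ** 2
    assert np.all(np.diff(snr) > -tol)                   # SNR is non-decreasing
    print(f"{name:14s} a_0 = {a[0]:.2f}, b_0 = {b[0]:.2f}: conditions hold")

for name, ab in schedules.items():
    check_schedule(name, ab)
```

The VE and VP schedules from Table 1 could be checked in the same way once a concrete choice of α_t is fixed.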
We illustrate examples of interpolants incorporated by Definition 3.1 in Table 1. Remark 3.2 We have eased the smoothness conditions for the functions at and bt required in Albergo and Vanden-Eijnden (2023). Specifically, we consider the case where at, bt C2([0, 1)), a2 t C1([0, 1]), and bt C1([0, 1]). This relaxation enables us to include the F ollmer flow into our framework, characterized by at = 1 t2 and bt = t. It is evident that at = 1 t2 does not fulfill the condition at C2([0, 1]), but it does meet the requirements at C2([0, 1)) and a2 t C1([0, 1]). Remark 3.3 The C2 regularity of at, bt is necessary to derive the regularity of the velocity field v(t, x) in Eq. (3.5) concerning the time variable t. In addition, the C1 regularity of a2 t , bt is sufficient to ensure the Lipschitz regularity of the velocity field v(t, x) in Eq. (3.5) concerning the space variable x. A natural generalization of the vector interpolant (3.1) is to construct a set interpolant between two convex sets through Minkowski sum, which is common in convex geometry. A set interpolant stimulates the construction of a measure interpolant between a structured source measure and a target measure. As noted, we can construct a measure interpolation using a Gaussian convolution path. The measure interpolation is particularly relevant to Gaussian denoising and Gaussian channels in information theory as elucidated in Remark 3.8. Because of this connection with Gaussian denoising, we call the measure interpolation a Gaussian stochastic interpolation. The Gaussian stochastic interpolation can be understood as a collection of linear combinations of a standard Gaussian random variable and the target random variable. The coefficients of the linear combinations vary with time t [0, 1] as shown in Definition 3.1. Later in this section, we will show this Gaussian stochastic interpolation can be transformed into a deterministic ODE flow. Gaussian stochastic interpolation has been investigated from several perspectives in the literature. The rectified flow has been proposed in Liu et al. (2023), and its theoretical connection with optimal transport has been investigated in Liu (2022). The formulation of the rectified flow is to learn the ODE flow defined by stochastic interpolation with linear Gaussian Interpolation Flows time coefficients. In Appendix C of Liu et al. (2023), there is a nonlinear extension of the rectified flow in which the linear coefficients are replaced by general nonlinear coefficients. Albergo et al. (2023b) extends the stochastic interpolant framework proposed in (Albergo and Vanden-Eijnden, 2023) by considering a linear combination among three random variables. In Section 3 of Albergo et al. (2023b), the original stochastic interpolant framework is recovered as a one-sided interpolant between the Gaussian distribution and the target distribution. Moreover, Lipman et al. (2023) propose a flow matching method which directly learns a Gaussian conditional probability path with a neural ODE. In Section 4.1 of Lipman et al. (2023), the velocity fields of the variance exploding and variance preserving probability flows are shown as special instances of the flow matching framework. We summarize these formulations as Gaussian stochastic interpolation by slightly extending the original stochastic interpolant framework. 
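Before turning to the concrete coefficient choices collected in Table 1 below, the following toy sketch shows how samples from the marginal X_t = a_t Z + b_t X_1 of a Gaussian stochastic interpolation are drawn in practice; the one-dimensional two-mode target and the linear schedule are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_target(n):
    """Toy one-dimensional target: an equal mixture of N(-2, 0.1^2) and N(2, 0.1^2)."""
    centers = rng.choice(np.array([-2.0, 2.0]), size=n)
    return centers + 0.1 * rng.standard_normal(n)

def interpolation_marginal(t, n=10_000):
    """Draw n samples from X_t = a_t Z + b_t X_1 with the linear schedule a_t = 1 - t, b_t = t."""
    a_t, b_t = 1.0 - t, t
    z = rng.standard_normal(n)   # Z ~ N(0, 1), independent of X_1
    x1 = sample_target(n)        # X_1 ~ nu
    return a_t * z + b_t * x1

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    xt = interpolation_marginal(t)
    print(f"t = {t:.2f}: mean = {xt.mean():+.3f}, std = {xt.std():.3f}")
```

At t = 0 the samples are pure Gaussian noise, and as t increases the noise level shrinks and the two modes of the target emerge.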
| Type | VE | VP | Linear | Föllmer | Trigonometric |
| --- | --- | --- | --- | --- | --- |
| a_t | α_t | α_t | 1 − t | √(1 − t²) | cos(πt/2) |
| b_t | 1 | √(1 − α_t²) | t | t | sin(πt/2) |
| a_0 | α_0 | α_0 | 1 | 1 | 1 |
| b_0 | 1 | √(1 − α_0²) | 0 | 0 | 0 |
| Source | Convolution | Convolution | γ_d | γ_d | γ_d |

Table 1: Summary of various measure interpolants including the VE interpolant (Song et al., 2021b), the VP interpolant (Song et al., 2021b), the linear interpolant (Liu et al., 2023), the Föllmer interpolant (Dai et al., 2023), and the trigonometric interpolant (Albergo and Vanden-Eijnden, 2023). There are two types of source measures: the standard Gaussian distribution γ_d, and a convolution-type distribution formed from the target distribution and γ_d.

Definition 3.2 (Measure interpolation) Let µ = Law(X_0) and ν = Law(X_1) be two probability measures satisfying X_0 = a_0 Z + b_0 X_1, where Z ∼ γ_d := N(0, I_d) is independent of X_1. We call (X_t)_{t∈[0,1]} a Gaussian stochastic interpolation from the source measure µ to the target measure ν, which is defined through I_t over the time interval [0, 1] as follows:

X_t = I_t(X_0, X_1), X_0 = a_0 Z + b_0 X_1, Z ∼ γ_d, X_1 ∼ ν. (3.3)

Remark 3.4 It is obvious that the marginal distribution of X_t satisfies X_t =ᵈ a_t Z + b_t X_1 with Z ∼ γ_d, X_1 ∼ ν.

Motivated by the time-varying properties of the Gaussian stochastic interpolation, we derive that its marginal flow satisfies the continuity equation. This result characterizes the dynamics of the marginal density flow of the Gaussian stochastic interpolation.

Theorem 3.1 Suppose that Assumption 2.1 holds. Then the marginal flow (p_t)_{t∈[0,1]} of the Gaussian stochastic interpolation (X_t)_{t∈[0,1]} between µ and ν satisfies the continuity equation

∂_t p_t + ∇_x · (p_t v(t, x)) = 0, (t, x) ∈ [0, 1] × R^d, p_0(x) = dµ/dx (x), p_1(x) = dν/dx (x) (3.4)

in the weak sense with the velocity field

v(t, x) := E[ȧ_t Z + ḃ_t X_1 | X_t = x], t ∈ (0, 1), (3.5)

v(0, x) := lim_{t↓0} v(t, x), v(1, x) := lim_{t↑1} v(t, x). (3.6)

Remark 3.5 We notice that x = a_t E[Z | X_t = x] + b_t E[X_1 | X_t = x] due to Eq. (3.3). Then it holds that

v(t, x) = (ȧ_t/a_t) x + (ḃ_t − (ȧ_t/a_t) b_t) E[X_1 | X_t = x], t ∈ (0, 1). (3.7)

We also notice that, according to Tweedie's formula (cf. Lemma G.1 in the appendix), it holds that

s(t, x) = (b_t/a_t²) E[X_1 | X_t = x] − (1/a_t²) x, t ∈ (0, 1), (3.8)

where s(t, x) is the score function of the marginal distribution of X_t ∼ p_t. Combining (3.7) and (3.8), it follows that the velocity field is a gradient field and its nonlinear term is the score function s(t, x), namely, for any t ∈ (0, 1),

v(t, x) = (ḃ_t/b_t) x + ((ḃ_t/b_t) a_t² − a_t ȧ_t) s(t, x). (3.9)

Remark 3.6 A relevant result has been provided in the proof of (Albergo and Vanden-Eijnden, 2023, Proposition 4) in the restricted case that a_0 = 1, b_0 = 0. In this case, if ȧ_0, ȧ_1, ḃ_0, ḃ_1 are well-defined, the velocity field reads v(0, x) = ȧ_0 x + ḃ_0 E_ν[X_1] and v(1, x) = ḃ_1 x + ȧ_1 E_{γ_d}[Z] at times 0 and 1. Otherwise, if any one of ȧ_0, ȧ_1, ḃ_0, ḃ_1 is not well-defined, the velocity field v(0, x) or v(1, x) should be considered on a case-by-case basis. In addition, we provide an alternative viewpoint of the relationship between the velocity field associated with stochastic interpolation and the score function of its marginal flow using Tweedie's formula in Lemma G.1.

Remark 3.7 (Diffusion process) The marginal flow of the Gaussian stochastic interpolation (3.3) coincides with the time-reversed marginal flow of a diffusion process (X̃_t)_{t∈[0,1)} (Albergo et al., 2023b, Theorem 3.5) defined by

dX̃_t = −(ḃ_{1−t}/b_{1−t}) X̃_t dt + √( 2 ḃ_{1−t} a²_{1−t}/b_{1−t} − 2 a_{1−t} ȧ_{1−t} ) dW̃_t.

Remark 3.8 (Gaussian denoising) The Gaussian stochastic interpolation has an information-theoretic interpretation as a time-varying Gaussian channel.
Here a2 t and b2 t /a2 t stand for the noise level and signal-to-noise ratio (SNR) for time t [0, 1], respectively. As time t 1, we are approaching the high-SNR regime, that is, the SNR b2 t /a2 t grows to . Moreover, the SNR b2 t /a2 t is monotonically increasing in time t over [0, 1]. The Gaussian noise level gets reduced through this Gaussian denoising process. Gaussian Interpolation Flows Figure 2: Snapshots of a Gaussian interpolation flow based on the F ollmer interpolant. The source distribution is the standard two-dimensional Gaussian distribution γ2, and the target distribution is a mixture of six two-dimensional Gaussian distributions as the shape of a circle. The image panels are placed sequentially from time t = 0 to time t = 1. We are now ready to define Gaussian interpolation flows by representing the continuity equation (3.4) with Lagrangian coordinates (Ambrosio and Crippa, 2014). A basic observation is that GIFs share the same marginal density flow with Gaussian stochastic interpolations. The continuity equation (3.4) plays a central role in the derandomization procedure from Gaussian stochastic interpolations to GIFs. We additionally illustrate GIFs using a two-dimensional example as in Figure 2. Definition 3.3 (Gaussian interpolation flow) Suppose that probability measure ν satisfies Assumption 2.1. If (Xt)t [0,1] solves the initial value problem (IVP) dt (x) = v(t, Xt(x)), X0(x) µ, t [0, 1], (3.10) where µ is defined in Definition 3.2 and the velocity field v is given by Eq. (3.5) and (3.6), we call (Xt)t [0,1] a Gaussian interpolation flow associated with the target measure ν. 4. Spatial Lipschitz estimates for the velocity field We have explicated the idea of Gaussian denoising with the procedure of Gaussian stochastic interpolation or a Gaussian channel with increasing SNR w.r.t. time. By interpreting the process as an ODE flow, we derive the framework of Gaussian interpolation flows. First and foremost, an intuition is that the regularizing effect of Gaussian denoising would ensure Gao, Huang, and Jiao the Lipschitz smoothness of the velocity field. Since the standard Gaussian distribution is both 1-semi-log-concave and 1-semi-log-convex, its convolution with a target distribution will maintain its high regularity as long as the target distribution satisfies the regularity conditions. We rigorously justify this intuition by establishing spatial Lipschitz estimates for the velocity field. These estimates are established based on the upper bounds and lower bounds regarding the Jacobian matrix of the velocity field v(t, x) according to the Cauchy Lipschitz theorem, which are given in Proposition 4.1 below. To deal with the Jacobian matrix xv(t, x), we introduce a covariance expression of it and present the associated upper bounds and lower bounds. The velocity field v(t, x) is decomposed into a linear term and a nonlinear term, the score function s(t, x). To analyze the Jacobian xv(t, x), we only need to focus on xs(t, x), that is, 2 x log pt(x). To ease the notation, we would henceforth use Y for X1. Correspondingly, we replace p1(x) with p1(y) for the density function of Y. According to Bayes theorem, the marginal density pt of Xt satisfies pt(x) = Z p(t, x|y)p1(y)dy where Y p1(y) and p(t, x|y) = ϕbty,a2 t (x) is a conditional distribution induced by the Gaussian noise. 
Due to the factorization pt(x)p(y|t, x) = p(t, x|y)p1(y), the score function s(t, x) and its derivative xs(t, x) have the following expressions s(t, x) = x log p(y|t, x) x bty a2 t , xs(t, x) = 2 x log p(y|t, x) 1 Thanks to the expressions above, a covariance matrix expression of xs(t, x) is endowed by the exponential family property of p(y|t, x). Lemma 4.1 The conditional distribution p(y|t, x) is an exponential family distribution and a covariance matrix expression of the log-Hessian matrix 2 x log p(y|t, x) for any t (0, 1) is given by 2 x log p(y|t, x) = b2 t a4 t Cov(Y|Xt = x), (4.1) where Cov(Y|Xt = x) is the covariance matrix of Y|Xt = x p(y|t, x). Moreover, for any t (0, 1), it holds that xs(t, x) = b2 t a4 t Cov(Y|Xt = x) 1 a2 t Id, (4.2) and that xv(t, x) = b2 t a2 t Cov(Y|Xt = x) + at at Id. (4.3) Remark 4.1 Since t b2 t a2 t = 2b2 t a2 t , it follows from (4.3) that the derivative of the SNR with respect to time t controls the dependence of xv(t, x) on Cov(Y|Xt = x). The representation (4.3) can be used to upper bound and lower bound xv(t, x). This technique has been widely used to deduce the regularity of the score function concerning the space variable (Mikulincer and Shenfeld, 2021, 2023; Chen et al., 2023b; Lee et al., 2023; Gaussian Interpolation Flows Chen et al., 2023a). The covariance matrix expression (4.2) of the score function has a close connection with the Hatsell-Nolte identity in information theory (Hatsell and Nolte, 1971; Palomar and Verd u, 2005; Wu and Verd u, 2011; Cai and Wu, 2014; Wibisono et al., 2017; Wibisono and Jog, 2018a,b; Dytso et al., 2023a,b). Employing the covariance expression in Lemma 4.1, we establish several bounds on xv(t, x) in the following proposition. Proposition 4.1 Let ν(dy) = p1(y)dy be a probability measure on Rd with D := (1/ 2)diam(supp(ν)). (a) For any t (0, 1), at at Id xv(t, x) ( bt(at bt atbt) a3 t D2 + at (b) Suppose that p1 is β-semi-log-convex with β > 0 and supp(p1) = Rd. Then for any t (0, 1], xv(t, x) βat at + bt bt βa2 t + b2 t Id. (c) Suppose that p1 is κ-semi-log-concave with κ R. Then for any t (t0, 1], xv(t, x) κat at + bt bt κa2 t + b2 t Id, where t0 is the root of the equation κ + b2 t a2 t = 0 over t (0, 1) if κ < 0 and t0 = 0 if κ 0. (d) Fix a probability measure ρ on Rd supported on a Euclidean ball of radius R, and let ν := γd,σ2 ρ with σ > 0. Then for any t (0, 1), atat + σ2 btbt a2 t + σ2b2 t Id xv(t, x) ( atbt(at bt atbt) (a2 t + σ2b2 t )2 R2 + atat + σ2 btbt a2 t + σ2b2 t (e) Suppose that dν dγd (x) is L-log-Lipschitz for some L 0. Then for any t (0, 1), bt a2 t atat Bt L2 bt a2 t +b2 t 2 + atat+ btbt xv(t, x) n bt bt a2 t atat Bt + atat+ btbt where Bt := 5Lbt(a2 t + b2 t ) 3 2 (L + (log( p a2 t + b2 t /bt)) 1 Comparing part (a) with part (d) in Proposition 4.1, we can see that the bounds in (a) are consistent with those in (d) in the sense that (a) is a limiting case of part (d) as σ 0. The lower bound in part (a) blows up at time t = 1 owing to a1 = 0, while in part (d) it Gao, Huang, and Jiao behaves well since the lower bound in part (d) coincides with a lower bound indicated by the 1 σ2 -semi-log-convex property. It reveals that the regularity of the velocity field v(t, x) with respect to the space variable x improves when the target random variable is bounded and is subject to Gaussian perturbation. 
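As a sanity check on the covariance representation behind Lemma 4.1, the following sketch evaluates the posterior mean and covariance in closed form for a toy two-dimensional Gaussian-mixture target, assembles the velocity field via Eq. (3.7) and its Jacobian via the conditional covariance, and compares the latter with finite differences. The mixture parameters, the Föllmer schedule, and all function names are our own illustrative choices, and the Jacobian coefficient is written out from Eqs. (3.7) and (3.8) rather than copied from Eq. (4.3).

```python
import numpy as np

# Toy 2-D Gaussian-mixture target: nu = (1/K) * sum_j N(mu_j, sigma^2 I), equal weights.
K, sigma = 3, 0.3
mus = np.array([[2.0, 0.0], [-1.0, 1.5], [0.0, -2.0]])

def follmer(t):
    """Föllmer schedule a_t = sqrt(1 - t^2), b_t = t, together with time derivatives."""
    a, b = np.sqrt(1.0 - t**2), t
    da, db = -t / np.sqrt(1.0 - t**2), 1.0
    return a, b, da, db

def posterior_moments(t, x):
    """Closed-form E[X_1 | X_t = x] and Cov(X_1 | X_t = x) for the mixture target."""
    a, b, _, _ = follmer(t)
    s2 = a**2 + (b * sigma) ** 2                      # marginal variance of each component
    logw = -np.sum((x - b * mus) ** 2, axis=1) / (2 * s2)
    w = np.exp(logw - logw.max()); w /= w.sum()       # responsibilities r_j(x)
    gain = b * sigma**2 / s2
    cond_means = mus + gain * (x - b * mus)           # per-component conditional means
    cond_cov = (sigma**2 * a**2 / s2) * np.eye(2)     # per-component conditional covariance
    mean = w @ cond_means
    spread = cond_means - mean
    cov = cond_cov + (w[:, None, None] * np.einsum("ji,jk->jik", spread, spread)).sum(0)
    return mean, cov

def velocity_and_jacobian(t, x):
    """v(t, x) from the posterior mean, and its Jacobian via the conditional covariance."""
    a, b, da, db = follmer(t)
    mean, cov = posterior_moments(t, x)
    v = (da / a) * x + (db - da * b / a) * mean
    jac = (da / a) * np.eye(2) + (b * (a * db - da * b) / a**3) * cov
    return v, jac

# Finite-difference check of the covariance representation of the Jacobian.
t, x, eps = 0.6, np.array([0.5, -0.4]), 1e-5
_, jac = velocity_and_jacobian(t, x)
num = np.column_stack([
    (velocity_and_jacobian(t, x + eps * e)[0] - velocity_and_jacobian(t, x - eps * e)[0]) / (2 * eps)
    for e in np.eye(2)
])
print("max abs deviation:", np.abs(jac - num).max())  # small, limited only by finite-difference error
```

In the single-component case the conditional covariance is constant in x, which is consistent with the Gaussian equality case discussed next.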
The lower bound in part (b) and the upper bound in part (c) are tight in the sense that both of them are attainable for a Gaussian target distribution, that is, xv(t, x) = βat at + bt bt βa2 t + b2 t Id if ν = γd,1/β. The upper and lower bounds in Proposition 4.1-(a) and (e) become vacuous as they both blow up at time t = 1. The intuition behind is that the Jacobian matrix of the velocity field can be both lower and upper bounded at time t = 1 only if the score function of the target measure is Lipschitz continuous in the space variable x. Under an additional Lipschitz score assumption (equivalently, β-semi-log-convex and κ-semi-log-concave for some β = κ 0), the upper and lower bounds in part (a) and part (e) can be strengthened at time t = 1 based on the lower bound in (b) and the upper bound in part (c). According to Proposition 4.1-(a) and (c), there are two upper bounds available that shall be compared with each other. One is the D2-based bound in part (a), and the other is the κ-based bound in part (c). According to the proof of Proposition 4.1 given in the Appendix, these two upper bounds are equal if and only if the corresponding upper bounds on Cov(Y|Xt = x) are equal, that is, D2 = κ + b2 t a2 t Then the critical case is κD2 = 1 since simplifying Eq. (4.4) reveals that D 2 κ = b2 t a2 t . (4.5) We note that b2 t /a2 t , ranging over (0, ), is monotonically increasing w.r.t. t (0, 1). Suppose that κD2 > 1. Then (4.5) has no root over t (0, 1), which implies that the κ-based bound is tighter over [0, 1), i.e., D2 > κ + b2 t a2 t 1 , t [0, 1). Otherwise, suppose that κD2 < 1. Then (4.5) has a root t1 (0, 1), which implies that the D2-based bound is tighter over [0, t1), i.e., D2 < κ + b2 t a2 t 1 , t [0, t1), and that the κ-based bound is tighter over [t1, 1), i.e., D2 κ + b2 t a2 t 1 , t [t1, 1). Gaussian Interpolation Flows Next, we present several upper bounds on the maximum eigenvalue of the Jacobian matrix of the velocity field λmax( xv(t, x)) and its exponential estimates for studying the Lipschitz regularity of the flow maps as noted in Lemma 5.1. Corollary 4.1 Let ν be a probability measure on Rd with D := (1/ 2)diam(supp(ν)) and suppose that ν is κ-semi-log-concave with κ 0. (a) If κD2 1, then λmax( xv(t, x)) θt := κat at + bt bt κa2 t + b2 t , t [0, 1]. (4.6) (b) If κD2 < 1, then λmax( xv(t, x)) θt := at , t [0, t1), κat at+bt bt κa2 t +b2 t , t [t1, 1], (4.7) where t1 solves (4.5). Corollary 4.2 Let ν be a probability measure on Rd with D := (1/ 2)diam(supp(ν)) < and suppose that ν is κ-semi-log-concave with κ < 0. Then λmax( xv(t, x)) θt := at , t [0, t1), κat at+bt bt κa2 t +b2 t , t [t1, 1], (4.8) where t1 solves (4.5). Corollary 4.3 Fix a probability measure ρ on Rd supported on a Euclidean ball of radius R and let ν := γd,σ2 ρ with σ > 0. Then λmax( xv(t, x)) θt := atat + σ2 btbt a2 t + σ2b2 t + atbt(at bt atbt) (a2 t + σ2b2 t )2 R2. (4.9) Corollary 4.4 Suppose that ν is κ-semi-log-concave for some κ 0, and dν dγd (x) is L-log Lipschitz for some L 0. Then λmax( xv(t, x)) θt := bt a2 t atat Bt + atat+ btbt a2 t +b2 t , t [0, t2), κat at+bt bt κa2 t +b2 t , t [t2, 1], (4.10) where Bt := 5Lbt(a2 t + b2 t ) 3 2 (L + (log( p a2 t + b2 t /bt)) 1 2 ) and t2 (t0, 1). Gao, Huang, and Jiao 5. Well-posedness and Lipschtiz flow maps In this section, we study the well-posedness of GIFs and the Lipschitz properties of their flow maps. 
We also show that the marginal distributions of GIFs satisfy the log-Sobolev inequality and the Poincar e inequality if Assumptions 2.1 and 2.2 are satisfied. Theorem 5.1 (Well-posedness) Suppose Assumptions 2.1 and 2.2-(i), (iii), or (iv) are satisfied. Then there exists a unique solution (Xt)t [0,1] to the IVP (3.10). Moreover, the push-forward measure satisfies Xt#µ = Law(at Z + bt X1) with Z γd, X1 ν. Theorem 5.2 Suppose Assumptions 2.1 and 2.2-(ii) are satisfied. For any t (0, 1), there exists a unique solution (Xt)t [0,1 t] to the IVP (3.10). Moreover, the push-forward measure satisfies Xt#µ = Law(at Z + bt X1) with Z γd, X1 ν. Corollary 5.1 (Time-reversed flow) Suppose Assumptions 2.1 and 2.2-(i), (iii), or (iv) are satisfied. Then the time-reversed flow (X t )t [0,1] associated with ν is a unique solution to the IVP: d X t dt (x) = v(1 t, X t (x)), X 0(x) ν, t [0, 1]. (5.1) The push-forward measure satisfies X t #ν = Law(a1 t Z + b1 t X1) where Z γd, X1 ν. Moreover, the flow map satisfies X t (x) = X 1 t (x). Corollary 5.2 Suppose Assumptions 2.1 and 2.2-(ii) are satisfied. For any t (0, 1), the time-reversed flow (X t )t [t,1] associated with ν is a unique solution to the IVP: d X t dt (x) = v(1 t, X t (x)), X t (x) Law(a1 t Z + b1 t X1), t [t, 1], (5.2) where Z γd, X1 ν. The push-forward measure satisfies X t #ν = Law(a1 t Z + b1 t X1). Moreover, the flow map satisfies X t (x) = X 1 t (x). Based on the well-posedness of the flow, we can provide an upper bound on the Lipschitz constant of the induced flow map. Lemma 5.1 Suppose that a flow (Xt)t [0,1] is well-posed with a velocity field v(t, x) : [0, 1] Rd Rd of class C1 in x, and that for any (t, x) [0, 1] Rd, it holds xv(t, x) θt Id. Let the flow map Xs,t : Rd Rd be of class C1 in x for any 0 s t 1. Then the flow map Xs,t is Lipschitz continuous with an upper bound of its Lipschitz constant given by x Xs,t(x) 2,2 exp Z t s θudu . (5.3) Using Lemma 5.1, we show that the flow map of a GIF is Lipschitz continuous in the space variable x. Proposition 5.1 (Lipschitz mappings) Suppose that Assumptions 2.1 and 2.2-(i) hold. Gaussian Interpolation Flows (i) If ν is κ-semi-log-concave for some κ > 0, then the flow map X1(x) is a Lipschitz mapping, that is, x X1(x) 2,2 1 p κa2 0 + b2 0 , x Rd. In particular, if a0 = 1 and b0 = 0, then x X1(x) 2,2 1 κ, x Rd. (ii) If ν is β-semi-log-convex for some β > 0, then the time-reversed flow map X 1(x) is a Lipschitz mapping, that is, x X 1(x) 2,2 q βa2 0 + b2 0, x supp(ν). In particular, if a0 = 1 and b0 = 0, then x X 1(x) 2,2 p β, x supp(ν). Proposition 5.2 (Gaussian mixtures) Suppose that Assumptions 2.1 and 2.2-(iii) hold. Then the flow map X1(x) is a Lipschitz mapping, that is, x X1(x) 2,2 σ p a2 0 + σ2b2 0 exp a2 0 a2 0 + σ2b2 0 R2 In particular, if a0 = 1 and b0 = 0, then x X1(x) 2,2 σ exp R2 Moreover, the time-reversed flow map X 1(x) is a Lipschitz mapping, that is, x X 1(x) 2,2 q σ 2a2 0 + b2 0, x supp(ν). In particular, if a0 = 1 and b0 = 0, then x X 1(x) 2,2 1 σ, x supp(ν). Remark 5.1 Well-posed GIFs produce diffeomorphisms that transport the source measure onto the target measure. The diffeomorphism property of the transport maps are relevant to the auto-encoding and cycle consistency properties of their generative modeling applications. We defer a detailed discussion to Section 6. Early stopping implicitly mollifies the target measure with a small Gaussian noise. 
For image generation tasks (with bounded pixel values), the mollified target measure is indeed a Gaussian mixture distribution considered in Theorem 5.2. The regularity of the target measure largely gets enhanced through such mollification, especially when the target measure is supported on a low-dimensional manifold in accordance with the data manifold hypothesis. Therefore, although such a diffeomorphism X1(x) may not be well-defined for general bounded target measures, an off-the-shelf solution would be to perturb the target measure with a small Gaussian noise or to employ the early stopping technique. Both approaches will smooth the landscape of the target measure. Gao, Huang, and Jiao Proposition 5.3 Suppose the target measure ν satisfies the log-Sobolev inequality with constant CLS(ν). Then the marginal distribution of the GIF (pt)t [0,1] satisfies the log-Sobolev inequality, and its log-Sobolev constant CLS(pt) is bounded as CLS(pt) a2 t + b2 t CLS(ν). Moreover, suppose the target measure ν satisfies the Poincar e inequality with constant CP (ν). Then the marginal distribution of the GIF (pt)t [0,1] satisfies the Poincar e inequality, and its Poincar e constant CP(pt) is bounded as CP(pt) a2 t + b2 t CP(ν). The log-Sobolev and Poincar e inequalities (see Definitions G.1 and G.2) are fundamental tools for establishing convergence guarantees for Langevin Monte Carlo algorithms. From an algorithmic viewpoint, the predictor-corrector algorithm in score-based diffusion models and the corresponding probability flow ODEs essentially combine the ODE numerical solver (performing as the predictor) and the overdamped Langevin diffusion (performing as the corrector) to simulate samples from the marginal distributions (Song et al., 2021b). Proposition 5.3 shows that the marginal distributions all satisfy the log-Sobolev and Poincar e inequalities under mild assumptions on the target distribution. This conclusion suggests that Langevin Monte Carlo algorithms are certified to have convergence guarantees for sampling from the marginal distributions of GIFs. Furthermore, the target distributions covered in Assumption 2.2 are shown to satisfy the log-Sobolev and Poincar e inequalities (Mikulincer and Shenfeld, 2021; Dai et al., 2023; Fathi et al., 2023), which suggests that the assumptions of Proposition 5.3 generally hold. 6. Applications to generative modeling Auto-encoding is a primary principle in learning a latent representation with generative models (Goodfellow et al., 2016, Chapter 14). Meanwhile, the concept of cycle consistency is important to unpaired image-to-image translation between the source and target domains (Zhu et al., 2017). The recent work by Su et al. (2023) propose the dual diffusion implicit bridges (DDIB) for image-to-image translation, which shows a strong pattern of exact auto-encoding and image-to-image translation. DDIBs are built upon the denoising diffusion implicit models (DDIM), which share the same probability flow ODE with VESDE (considered as VE interpolant in Table 1), as pointed out by (Song et al., 2021a, Proposition 1). First, DDIBs attain latent embeddings of source images encoded with one DDIM operating in the source domain. The encoding embeddings are then decoded using another DDIM trained in the target domain to construct target images. The whole process consisting of two DDIMs seems to be cycle consistent up to numerical errors. Several phenomena of auto-encoding and cycle consistency are observed in the unpaired data generation procedure with DDIBs. 
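To make the auto-encoding property concrete at the sample level, the following self-contained sketch integrates a flow forward and backward for a toy Gaussian target, for which the velocity field is available in closed form, with early stopping at time 1 − δ in the spirit of Remark 5.1; the target parameters, the linear schedule, and the RK4 solver settings are our own illustrative choices.

```python
import numpy as np

# Toy setting with a closed-form velocity field:
# target nu = N(m, sigma^2 I) and linear schedule a_t = 1 - t, b_t = t.
m, sigma, delta = np.array([1.0, -1.0]), 0.5, 1e-3

def velocity(t, x):
    a, b, da, db = 1.0 - t, t, -1.0, 1.0
    s2 = a**2 + (b * sigma) ** 2
    post_mean = m + (b * sigma**2 / s2) * (x - b * m)   # E[X_1 | X_t = x]
    return (da / a) * x + (db - da * b / a) * post_mean

def integrate(x, t0, t1, steps=2000):
    """Classical RK4 integration of dx/dt = v(t, x) from t0 to t1 (t1 may be < t0)."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = velocity(t, x)
        k2 = velocity(t + h / 2, x + h / 2 * k1)
        k3 = velocity(t + h / 2, x + h / 2 * k2)
        k4 = velocity(t + h, x + h * k3)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

rng = np.random.default_rng(2)
x1 = m + sigma * rng.standard_normal(2)       # a draw from nu, treated as a state at time 1 - delta
latent = integrate(x1, 1.0 - delta, 0.0)      # encoder: time-reversed flow
recon = integrate(latent, 0.0, 1.0 - delta)   # decoder: forward flow
print("reconstruction error:", np.linalg.norm(recon - x1))  # small, up to solver error
```

Up to solver error, the decoded point recovers the encoded one, mirroring the identity established in Corollary 6.1 below.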
We replicate the two-dimensional experiments by Su et al. (2023) in Figures 3 and 4 to show the phenomena of approximate auto-encoding and cycle consistency of GIFs1. To elucidate the empirical auto-encoding and cycle consistency for measure transport, we derive 1. The implementation is based on the Git Hub repository at https://github.com/suxuann/ddib. Gaussian Interpolation Flows Figure 3: An illustration of auto-encoding using DDIBs. The Concentric Rings data in the source domain (the first panel) is encoded into the latent domain (the second panel), and then decoded into the source domain (the third panel). According to the consistent color pattern and pointwise correspondences across the domains, both the learned encoder mapping and the learned decoder mapping exhibit approximate Lipschitz continuity with respect to the space variable. One justification of such auto-encoding observation is presented in Corollary 6.1 where we prove that the composition of the encoder map and the decoder map yields an identity map. Corollaries 6.1 and 6.2 below and analyze the transport maps defined by GIFs (covering the probability flow ODE of VESDE used by DDIBs). We consider the continuous-time framework and the population level, which precludes learning errors including the time discretization errors and velocity field estimation errors, and show that the transport maps naturally possess the exact auto-encoding and cycle consistency properties at the population level. Corollary 6.1 (Auto-encoding) Suppose Assumptions 2.1 and 2.2-(i), (iii), or (iv) hold for a target measure ν. The Gaussian interpolation flow (Xt)t [0,1] and its time-reversed flow (X t )t [0,1] form an auto-encoder with a Lipschitz encoder X 1(x) and a Lipschitz decoder X1(x). The auto-encoding property holds in the sense that X1 X 1 = Id. (6.1) Corollary 6.2 (Cycle consistency) Suppose Assumptions 2.1 and 2.2-(i), (iii), or (iv) hold for the target measures ν1 and ν2. For the target measure ν1, we define the Gaussian interpolation flow (X1,t)t [0,1] and its time-reversed flow (X 1,t)t [0,1]. We also define the Gaussian interpolation flow (X2,t)t [0,1] and its time-reversed flow (X 2,t)t [0,1] for the target measure ν2 using the same at and bt. Then the transport maps X1,1(x), X 1,1(x), X2,1(x), and X 2,1(x) are Lipschitz continuous in the space variable x. Furthermore, the cycle consistency property holds in the sense that X1,1 X 2,1 X2,1 X 1,1 = Id. (6.2) Corollaries 6.1 and 6.2 show that the auto-encoding and cycle consistency properties hold for the flows at the population level. These results provide insights to the approximate auto-encoding and cycle consistency properties at the sample level. Gao, Huang, and Jiao Figure 4: An illustration of cycle consistency using DDIBs. The cycle consistency property is manifested through the consistency of color patterns across the transformations. We transform the Moons data in the source domain onto the Concentric Squares data in the target domain, and then complete the cycle by mapping the target data back to the source domain. The latent spaces play a central role in the bidirectional translation. We provide a proof in Corollary 6.2 accounting for the cycle consistency property. There are several types of errors introduced in the training of GIFs. On the one hand, the approximation in specifying source measures would exert influence on modeling the distribution. On the other hand, the approximation in the velocity field also affects the distribution learning error. 
We use the stability analysis method in the differential equations theory to address the potential effects of these errors. Corollary 6.3 Suppose Assumptions 2.1 and 2.2-(i), (iii), or (iv) hold. It holds that C1 := sup x Rd x X1(x) 2,2 < , C2 := sup (t,x) [0,1] Rd xv(t, x) 2,2 < . Proposition 6.1 (Stability in the source distribution) Suppose Assumptions 2.1 and 2.2-(i), (iii), or (iv) hold. If the source measure µ = Law(a0Z + b0X1) is replaced with the Gaussian measure γd,a2 0, then the stability of the transport map X1 is guaranteed by the W2 distance between the push-forward measure X1#γd,a2 0 and the target measure ν = Law(X1) as follows W2(X1#γd,a2 0, ν) C1b0 p Eν[ X1 2] exp(C2d). (6.3) The stability analysis in Proposition 6.1 provides insights into the selection of source measures for learning probability flow ODEs and GIFs. The error bound (6.3) demonstrates that when the signal intensity is reasonably small in the source measure, that is, b0 1, the Gaussian Interpolation Flows distribution estimation error, induced by the approximation with a Gaussian source measure, is small as well in the sense of the quadratic Wasserstein distance. Using a Gaussian source measure to replace the true convolution source measure is a common approximation method for learning probability flow ODEs and GIFs. Our analysis shows this replacement is reasonable for the purpose of distribution estimation. The Alekseev-Gr obner formula and its stochastic variants (Del Moral and Singh, 2022) have been shown effective in quantifying the stability of well-posed ODE and SDE flows against perturbations of its velocity field or drift (Bortoli, 2022; Benton et al., 2023). We state these results below for convenience. Lemma 6.1 (Hairer et al., 1993, Theorem 14.5) Let (Xt)t [0,1] and (Yt)t [0,1] solve the following IVPs, respectively dt = v(t, Xt), X0 = x0, t [0, 1], dt = v(t, Yt), Y0 = x0, t [0, 1], where v(t, x) : [0, 1] Rd Rd and v(t, x) : [0, 1] Rd Rd are the velocity fields. (i) Suppose that v is of class C1 in x. Then the Alekseev-Gr obner formula for the difference Xt(x0) Yt(x0) is given by Xt(x0) Yt(x0) = Z t 0 ( x Xs,t)(Ys(x0)) (v(s, Ys(x0)) v(s, Ys(x0))) ds (6.4) where x Xs,t(x) satisfies the variational equation t( x Xs,t(x)) = ( xv)(t, Xs,t(x)) x Xs,t(x), x Xs,s(x) = Id. (6.5) (ii) Suppose that v is of class C1 in x. Then the Alekseev-Gr obner formula for the difference Yt(x0) Xt(x0) is given by Yt(x0) Xt(x0) = Z t 0 ( x Ys,t)(Xs(x0)) ( v(s, Xs(x0)) v(s, Xs(x0))) ds (6.6) where x Ys,t(x) satisfies the variational equation t( x Ys,t(x)) = ( x v)(t, Ys,t(x)) x Ys,t(x), x Ys,s(x) = Id. (6.7) Exploiting the Alekseev-Gr obner formulas in Lemma 6.1 and uniform Lipschitz properties of the velocity field, we deduce two error bounds in terms of the quadratic Wasserstein (W2) distance to show the stability of the ODE flow when the velocity field is not accurate. Proposition 6.2 (Stability in the velocity field) Suppose Assumptions 2.1 and 2.2 hold. Let qt denote the density function of Yt#µ. Gao, Huang, and Jiao (i) Suppose that Z 1 Rd v(t, x) v(t, x) 2 qt(x)dxdt ε. (6.8) W 2 2 (Y1#µ, ν) ε Z 1 0 exp 2 Z 1 s θudu ds. (6.9) (ii) Suppose that sup (t,x) [0,1] Rd x v(t, x) 2,2 C3. W 2 2 (Y1#µ, ν) exp(2C3) 1 Rd v(t, x) v(t, x) 2pt(x)dxdt. (6.10) Proposition 6.2 provides a stability analysis against the estimation error of the velocity field using the W2 distance. 
The estimation error originates from the flow matching or score matching procedures and the approximation error rising from using deep neural networks in estimating the velocity field or the score function. These two W2 bounds imply that the distribution estimation error is controlled by the L2 estimation error of flow matching and score matching. Indeed, this point justifies the soundness of the approximation method through flow matching and score matching. The first W2 bound (6.9) relies on the L2 control (6.8) of the perturbation error of the velocity field. The second W2 bound (6.10) is slightly better than that provided in (Albergo and Vanden-Eijnden, 2023, Proposition 3) but still has exponential dependence on the Lipschitz constant of v(t, x). 0.05 0.10 0.15 0.20 0.25 b0 W2 distance Figure 5: An approximately linear relation between b0 and the Wasserstein-2 distance. To demonstrate the bounds presented in Propositions 6.1 and 6.2, we conducted further experiments with a mixture of eight two-dimensional Gaussian distributions. These Gaussian Interpolation Flows propositions provide bounds for the stability of the flow when subjected to perturbations in either the source distribution or the velocity field. Let the target distribution be the following two-dimensional Gaussian mixture j=1 φ(x; µj, Σj), where φ(x; µj, Σj) is the probability density function for the Gaussian distribution with mean µj = 12(sin(2(j 1)π/8), cos(2(j 1)π/8)) and covariance matrix Σj = 0.032I2 for j = 1, , 8. For Gaussian mixtures, the velocity field has an explicit formula, which facilitates the perturbation analysis. To illustrate the bound in Proposition 6.1, we consider a perturbation of the source distribution for the following model: Xt = at Z + bt X with at = 1 t + ζ 1 + ζ , bt = t + ζ where ζ [0, 0.3] is a value controlling the perturbation level. It is easy to see a0 = 1/(1 + ζ), b0 = ζ/(1 + ζ). Thus, the source distribution Law(a0Z + b0X) is a mixture of Gaussian distributions. Practically, we can use a Gaussian distribution γ2,a2 0 to replace this source distribution. In Proposition 6.1, we bound the error between the distributions of generated samples due to the replacement, that is, W2(X1#γd,a2 0, ν) Cb0, where C is a constant. We illustrate this theoretical bound using the mixture of Gaussian distributions and the Gaussian interpolation flow given above. We consider a mesh for the variable ζ and plot the curve for b0 and W2(X1#γd,a2 0, ν) in Figure 5. Through Figure 5, an approximate linear relation between b0 and W2(X1#γd,a2 0, ν) is observed, which supports the results of Proposition 6.1. We now consider perturbing the velocity field vt by adding random noise. Let ϵ [0.5, 5.5]. The random noise is generated using a Bernoulli random variable supported on { ϵ, ϵ}. Let vt denote the perturbed velocity field. Then we can compute vt := vt vt 2 = 2ϵ2. We use the velocity field vt and the perturbed velocity field t to generate samples and compute the squared Wasserstein-2 distance between the sample distributions. According to Proposition 6.2, the squared Wasserstein-2 distance should be linearly upper bounded as O( vt), that is, W 2 2 (Y1#µ, ν) C Z 1 R2 ϵ2pt(x)dxdt = Cϵ2, where C is a constant. This theoretical insight is illustrated in Figure 6, where a linear relationship between these two variables is observed. Gao, Huang, and Jiao 0 10 20 30 40 50 60 vt Squared W2 distance Figure 6: A linear relation between vt and the squared Wasserstein-2 distance. 7. 
Related work GIFs and the induced transport maps are related to CNFs and score-based diffusion models. Mathematically, they interrelate with the literature on Lipschitz mass transport and Wasserstein gradient flows. A central question in developing the ODE flow or transport map method for generative modeling is how to construct an ODE flow or transport map that are sufficiently smooth and enable efficient computation. Various approaches have been proposed to answer the question. CNFs construct invertible mappings between an isotropic Gaussian distribution and a complex target distribution (Chen et al., 2018; Grathwohl et al., 2019). They fall within the broader framework of neural ODEs (Chen et al., 2018; Ruiz-Balet and Zuazua, 2023). A major challenge for CNFs is designing a time-dependent ODE flow whose marginal distribution converges to the target distribution while allowing for efficient estimation of its velocity field. Previous work has explored several principles to construct such flows, including optimal transport, Wasserstein gradient flows, and diffusion processes. Additionally, Gaussian denoising has emerged as an effective principle for constructing simulation-free CNFs in generative modeling. Liu et al. (2023) propose the rectified flow, which is based on a linear interpolation between a standard Gaussian distribution and the target distribution, mimicking the Gaussian denoising procedure. Albergo and Vanden-Eijnden (2023) study a similar formulation called stochastic interpolation, defining a trigonometric interpolant between a standard Gaussian distribution and the target distribution. Albergo et al. (2023b) extend this idea by proposing a stochastic bridge interpolant between two arbitrary distributions. Under a few regularity assumptions, the velocity field of the ODE flow modeling the stochastic bridge interpolant is proven to be continuous in the time variable and smooth in the space variable. Gaussian Interpolation Flows Lipman et al. (2023) introduce a nonlinear least squares method called flow matching to directly estimate the velocity field of probability flow ODEs. All of these models are encompassed within the framework of simulation-free CNFs, which have been the focus of numerous ongoing research efforts (Neklyudov et al., 2023; Tong et al., 2023; Chen and Lipman, 2023; Albergo et al., 2023b; Shaul et al., 2023; Pooladian et al., 2023; Albergo et al., 2023a,c). Furthermore, Marzouk et al. (2023) provide the first statistical convergence rate for the simulation-based method by placing neural ODEs within the nonparametric estimation framework. Score-based diffusion models integrate the time reversal of stochastic differential equations (SDEs) with the score matching technique (Sohl-Dickstein et al., 2015; Song and Ermon, 2019; Ho et al., 2020; Song and Ermon, 2020; Song et al., 2021b,a; De Bortoli et al., 2021). These models are capable of modeling highly complex probability distributions and have achieved state-of-the-art performance in image synthesis tasks (Dhariwal and Nichol, 2021; Rombach et al., 2022). The probability flow ODEs of diffusion models can be considered as CNFs, whose velocity field incorporates the nonlinear score function (Song et al., 2021b; Karras et al., 2022; Lu et al., 2022b,a; Zheng et al., 2023). In addition to the score matching method, Lu et al. (2022a) and Zheng et al. (2023) explore maximum likelihood estimation for probability flow ODEs. 
However, the regularity of these probability flow ODEs has not been studied, and their well-posedness properties remain to be established.

A key concept in defining measure transport is Lipschitz mass transport, where the transport maps are required to be Lipschitz continuous. This ensures the smoothness and stability of the measure transport. There is a substantial body of research on the Lipschitz properties of transport maps. The celebrated Caffarelli's contraction theorem (Caffarelli, 2000, Theorem 2) establishes the Lipschitz continuity of optimal transport maps that push the standard Gaussian measure onto a log-concave measure. Colombo et al. (2017) study a Lipschitz transport map between perturbations of log-concave measures using optimal transport theory. Mikulincer and Shenfeld (2021) demonstrate that the Brownian transport map, defined by the Föllmer process, is Lipschitz continuous when it pushes forward the Wiener measure on the Wiener space to the target measure on the Euclidean space. Additionally, Neeman (2022) and Mikulincer and Shenfeld (2023) prove that the transport map along the reverse heat flow of certain target measures is Lipschitz continuous. Beyond studying Lipschitz transport maps, significant effort has been devoted to applying optimal transport theory in generative modeling. Zhang et al. (2018) propose the Monge-Ampère flow for generative modeling by solving the linearized Monge-Ampère equation. Optimal transport theory has been utilized as a general principle to regularize the training of continuous normalizing flows or generators for generative modeling (Finlay et al., 2020; Yang and Karniadakis, 2020; Onken et al., 2021; Makkuva et al., 2020). Liang (2021) leverages the regularity theory of optimal transport to formalize the generator-discriminator-pair regularization of GANs under a minimax rate framework. In our work, we study the Lipschitz transport maps defined by GIFs, which differ from optimal transport maps. GIFs naturally fit within the framework of continuous normalizing flows, and their flow mappings are examined from the perspective of Lipschitz mass transport.

Wasserstein gradient flows offer another principled approach to constructing ODE flows for generative modeling. A Wasserstein gradient flow is derived from the gradient descent minimization of a certain energy functional over probability measures endowed with the quadratic Wasserstein metric (Ambrosio et al., 2008). The Eulerian formulation of Wasserstein gradient flows produces the continuity equations that govern the evolution of the marginal distributions. After being transferred into a Lagrangian formulation, Wasserstein gradient flows define ODE flows that have been widely explored for generative modeling (Johnson and Zhang, 2018; Gao et al., 2019; Liutkus et al., 2019; Johnson and Zhang, 2019; Arbel et al., 2019; Mroueh et al., 2019; Ansari et al., 2021; Mroueh and Nguyen, 2021; Fan et al., 2022; Gao et al., 2022; Duncan et al., 2023; Xu et al., 2022). Wasserstein gradient flows are shown to be connected with the forward process of diffusion models. The variance preserving SDE of diffusion models is equivalent to the Langevin dynamics towards the standard Gaussian distribution, which can be interpreted as a Wasserstein gradient flow of the Kullback–Leibler divergence relative to the standard Gaussian distribution (Song et al., 2021b). In the meantime, the probability flow ODE of the variance preserving SDE conforms to the Eulerian formulation of this Wasserstein gradient flow.
However, when a general target distribution is assigned in place of the standard Gaussian distribution, it remains unclear whether the ODE formulation of Wasserstein gradient flows is well-posed.

The main contribution of our work lies in establishing the theoretical properties of GIFs and their associated flow maps in a unified way. Our theoretical results encompass the Lipschitz continuity of both the flow velocity field and the flow map, addressing the existence, uniqueness, and stability of the flow. We also demonstrate that both the flow map and its inverse possess Lipschitz properties. Our proposed framework for Gaussian interpolation flow builds upon previous research on probability flow methods in diffusion models (Song et al., 2021b,a) and stochastic interpolation methods for generative modeling (Liu et al., 2023; Albergo and Vanden-Eijnden, 2023; Lipman et al., 2023). Rather than adopting a methodological perspective, we focus on elucidating the theoretical aspects of these flows from a unified standpoint, thereby enhancing the understanding of various methodological approaches. Our theoretical results are derived from geometric considerations of the target distribution and from analytic calculations that exploit the Gaussian denoising property.

8. Conclusions and discussion

Gaussian denoising as a framework for constructing continuous normalizing flows holds great promise in generative modeling. Through a unified framework and rigorous analysis, we have established the well-posedness of these flows, shedding light on their capabilities and limitations. We have examined the Lipschitz regularity of the corresponding flow maps for several rich classes of probability measures. When applied to generative modeling based on Gaussian denoising, we have shown that GIFs possess auto-encoding and cycle consistency properties at the population level. Additionally, we have established stability bounds for the errors accumulated during the process of learning GIFs.

The regularity properties of the velocity field established in this paper provide a solid theoretical basis for end-to-end error analyses of learning GIFs using deep neural networks with empirical data. Another potential application is to perform rigorous analyses of consistency models, a nascent family of ODE-based deep generative models designed for one-step generation (Song et al., 2023; Kim et al., 2023; Song and Dhariwal, 2023). We intend to investigate these intriguing problems in our subsequent work. We expect that our analytical results will facilitate further studies and advancements in applying simulation-free CNFs, including GIFs, to a diverse range of generative modeling tasks.

Acknowledgements

The authors wish to thank the action editor and reviewers for their valuable and constructive comments, which have significantly improved the quality of this paper. The work of J. Huang is supported by the research grants (1-BDCC, 4-ZZ4B, 1-WZ3P, and 4-ZZPN) from The Hong Kong Polytechnic University and the National Natural Science Foundation of China (Grant No. 72331005). The work of Y. Jiao is supported by the National Natural Science Foundation of China (Grant No. 12371441), the Fundamental Research Funds for the Central Universities, and the research fund of KLATASDSMOE of China.

In the appendices, we prove the results stated in the paper and provide the necessary technical details and discussions.
Appendix A. Proofs of Theorem 3.1 and Lemma 4.1

Dynamical properties of the Gaussian interpolation flow (X_t)_{t∈[0,1]} form the cornerstone of the measure interpolation method. Following Albergo and Vanden-Eijnden (2023); Albergo et al. (2023b), we leverage an argument based on characteristic functions to quantify the dynamics of its marginal flow and, as a result, to prove Theorem 3.1.

Proof [Proof of Theorem 3.1] Let ω ∈ ℝ^d. For the Gaussian stochastic interpolation (X_t)_{t∈[0,1]}, we define the characteristic function of X_t by

Ψ(t, ω) := E[exp(i⟨ω, X_t⟩)] = E[exp(i⟨ω, a_t Z + b_t X_1⟩)] = E[exp(i a_t⟨ω, Z⟩)] E[exp(i b_t⟨ω, X_1⟩)],

where the last equality is due to the independence between Z ∼ γ_d and X_1 ∼ ν. Taking the time derivative of Ψ(t, ω) for t ∈ (0, 1), we derive that

∂_t Ψ(t, ω) = i⟨ω, ψ(t, ω)⟩, where ψ(t, ω) := E[exp(i⟨ω, X_t⟩)(ȧ_t Z + ḃ_t X_1)].

We first define

v(t, X_t) := E[ȧ_t Z + ḃ_t X_1 | X_t]. (A.1)

Using the double expectation formula, we deduce that

ψ(t, ω) = E[exp(i⟨ω, X_t⟩) E[ȧ_t Z + ḃ_t X_1 | X_t]] = E[exp(i⟨ω, X_t⟩) v(t, X_t)].

Applying the inverse Fourier transform to ψ(t, ω), it holds that

j(t, x) := (2π)^{-d} ∫_{ℝ^d} exp(−i⟨ω, x⟩) ψ(t, ω) dω = p_t(x) v(t, x),

where v(t, x) := E[ȧ_t Z + ḃ_t X_1 | X_t = x]. It then follows that ∂_t p_t + ∇_x · j(t, x) = 0, that is,

∂_t p_t + ∇_x · (p_t v(t, x)) = 0.

Next, we study the behavior of v(t, x) at t = 0 and t = 1. Notice that

x = a_t E[Z | X_t = x] + b_t E[X_1 | X_t = x]. (A.2)

Combining Eq. (A.1) and (A.2), it follows that

v(t, x) = (ȧ_t/a_t) x + (ḃ_t − (ȧ_t/a_t) b_t) E[X_1 | X_t = x], t ∈ (0, 1). (A.3)

According to Tweedie's formula in Lemma G.1, it holds that

s(t, x) = (b_t/a_t^2) E[X_1 | X_t = x] − (1/a_t^2) x, t ∈ (0, 1), (A.4)

where s(t, x) is the score function of the marginal distribution p_t of X_t. Combining Eq. (A.3) and (A.4), it holds that the velocity field is a gradient field and its nonlinear term is the score function s(t, x), namely, for any t ∈ (0, 1),

v(t, x) = (ḃ_t/b_t) x + (ḃ_t a_t^2/b_t − ȧ_t a_t) s(t, x). (A.5)

By the regularity properties a_t, b_t ∈ C^2([0, 1)), a_t^2 ∈ C^1([0, 1]), and b_t ∈ C^1([0, 1]), the quantities ȧ_0, ḃ_0, ȧ_1 a_1, and ḃ_1 are well-defined. Then, by Eq. (A.3), we define

v(0, x) := lim_{t→0} v(t, x) = (ȧ_0/a_0) x + (ḃ_0 − (ȧ_0/a_0) b_0) E[X_1 | X_0 = x].

Using Eq. (A.5) yields

v(1, x) := lim_{t→1} v(t, x) = (ḃ_1/b_1) x − ȧ_1 a_1 s(1, x). (A.6)

This completes the proof.

Lemma 4.1 presents several standard properties of Gaussian channels in information theory (Wibisono and Jog, 2018a,b; Dytso et al., 2023b) that will facilitate our proof.

Proof [Proof of Lemma 4.1] By Bayes' rule, Law(Y | X_t = x) = p(y | t, x) can be represented as

p(y | t, x) = φ_{b_t y, a_t^2}(x) p_1(y)/p_t(x)
= (2π)^{-d/2} a_t^{-d} exp(−‖x − b_t y‖^2/(2 a_t^2)) p_1(y)/p_t(x)
= (2π)^{-d/2} a_t^{-d} exp(−‖x‖^2/(2 a_t^2) + b_t⟨x, y⟩/a_t^2 − b_t^2‖y‖^2/(2 a_t^2)) p_1(y)/p_t(x)
= exp(b_t⟨x, y⟩/a_t^2 − b_t^2‖y‖^2/(2 a_t^2)) p_1(y) / ((2π)^{d/2} a_t^d exp(‖x‖^2/(2 a_t^2)) p_t(x)).

Let θ = b_t x/a_t^2, h(y) = p_1(y) exp(−b_t^2‖y‖^2/(2 a_t^2)), and define the logarithmic partition function A(θ) = log ∫_{ℝ^d} h(y) exp(⟨y, θ⟩) dy. Then, by the definition of exponential family distributions, we conclude that p(y | t, x) = h(y) exp(⟨y, θ⟩ − A(θ)) is an exponential family distribution in y. By a direct calculation, it follows that

∇_x^2 log p(y | t, x) = −(b_t^2/a_t^4) ∇_θ^2 A(θ).

For an exponential family distribution, a basic equality shows that ∇_θ^2 A(θ) = Cov(Y | X_t = x), which further yields that

∇_x^2 log p(y | t, x) = −(b_t^2/a_t^4) Cov(Y | X_t = x).
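As a quick numerical sanity check on the computations above (our illustration, not part of the paper), the identities (A.3), (A.4), and (A.5) can be verified for a Gaussian target ν = N(μ, σ^2 I_d), for which E[X_1 | X_t = x], the score s(t, x), and the marginal p_t are all available in closed form. The linear schedule a_t = 1 − t, b_t = t and the specific values of μ, σ, and t below are illustrative choices.

```python
# Minimal check that the conditional-expectation form (A.3) and the score form
# (A.5) of the velocity field coincide for a Gaussian target, and that Tweedie's
# formula (A.4) holds, using closed-form expressions for this special case.
import numpy as np

rng = np.random.default_rng(1)
d, mu, sigma = 3, np.array([1.0, -2.0, 0.5]), 0.7

def coeffs(t):
    # a_t, b_t and their time derivatives for the linear schedule a_t = 1 - t, b_t = t
    return 1.0 - t, t, -1.0, 1.0

def check(t, x):
    a, b, da, db = coeffs(t)
    s2 = a**2 + b**2 * sigma**2                    # variance of X_t = a_t Z + b_t X1
    m = mu + (b * sigma**2 / s2) * (x - b * mu)    # E[X1 | X_t = x]
    score = -(x - b * mu) / s2                     # s(t, x) = grad_x log p_t(x)
    v_A3 = (da / a) * x + (db - da * b / a) * m              # Eq. (A.3)
    v_A5 = (db / b) * x + (db * a**2 / b - da * a) * score   # Eq. (A.5)
    tweedie = (b / a**2) * m - x / a**2                      # right-hand side of Eq. (A.4)
    return np.max(np.abs(v_A3 - v_A5)), np.max(np.abs(tweedie - score))

for t in [0.1, 0.5, 0.9]:
    x = rng.standard_normal(d)
    gap_v, gap_s = check(t, x)
    print(f"t = {t}: |v_(A.3) - v_(A.5)| = {gap_v:.2e}, Tweedie residual = {gap_s:.2e}")
```

Both reported gaps should be at the level of floating-point round-off, reflecting the algebraic equivalence of (A.3) and (A.5) through Tweedie's formula (A.4).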
Appendix B. Auxiliary lemmas for Lipschitz flow maps

The following lemma, due to G. Peano (Hartman, 2002a, Theorem 3.1), describes several meaningful differential equations associated with well-posed flows and supports the derivation of the Lipschitz continuity of their flow maps.

Lemma B.1 (Ambrosio et al., 2023, Lemma 3.4) Suppose that a flow (X_t)_{t∈[0,1]} is well-posed and its velocity field v(t, x) : [0, 1] × ℝ^d → ℝ^d is of class C^1. Then the flow map X_{s,t} : ℝ^d → ℝ^d is of class C^1 for any 0 ≤ s ≤ t ≤ 1. Fix (s, x) ∈ [0, 1] × ℝ^d and define the following functions for t ∈ [s, 1]:

y(t) := ∇_x X_{s,t}(x), J(t) := (∇_x v)(t, X_{s,t}(x)), w(t) := det(∇_x X_{s,t}(x)), b(t) := (∇_x · v)(t, X_{s,t}(x)) = Tr(J(t)).

Then y(t) and w(t) are the unique C^1 solutions of the following IVPs:

ẏ(t) = J(t) y(t), y(s) = I_d, (B.1)
ẇ(t) = b(t) w(t), w(s) = 1. (B.2)

We present an upper bound on the Lipschitz constant of the flow map X_{s,t}(x) in Lemma 5.1. This upper bound has been deduced in Mikulincer and Shenfeld (2023); Ambrosio et al. (2023); Dai et al. (2023). For completeness, we derive it as a direct implication of Eq. (B.1) in Lemma B.1 together with an upper bound on the Jacobian matrix of the velocity field.

Proof [Proof of Lemma 5.1] Let y(u) = ∇_x X_{s,u}(x) and J(u) = (∇_x v)(u, X_{s,u}(x)). Owing to Lemma B.1, y(u) is of class C^1, and the function u ↦ ‖y(u)‖_{2,2} is absolutely continuous over [s, t]. By Lemma B.1, it follows that

∂_u ‖y(u)‖_{2,2}^2 = 2⟨y(u), ẏ(u)⟩ = 2⟨y(u), J(u) y(u)⟩ ≤ 2 θ_u ‖y(u)‖_{2,2}^2.

Applying Grönwall's inequality yields ‖y(t)‖_{2,2} ≤ exp(∫_s^t θ_u du), which concludes the proof.

Another result concerns the instantaneous change of variables theorem, which is widely deployed in studying neural ODEs (Chen et al., 2018, Theorem 1). We also exploit the instantaneous change of variables to prove Proposition 6.1. To make the proof self-contained, we show that the instantaneous change of variables directly follows from Eq. (B.2) in Lemma B.1. Compared with the original proof in (Chen et al., 2018, Theorem 1), we illustrate that the well-posedness of a flow is sufficient to ensure the instantaneous change of variables property, without a boundedness condition on the flow.

Corollary B.1 (Instantaneous change of variables) Suppose that a flow (X_t)_{t∈[0,1]} is well-posed with a velocity field v(t, x) : [0, 1] × ℝ^d → ℝ^d of class C^1 in x. Let X_0(x) ∼ π_0 be the initial value. Then the law of X_t(x) satisfies the following differential equation:

∂_t log π_t(X_t(x)) = −Tr((∇_x v)(t, X_t(x))).

Proof Let δ(t) := det(∇_x X_t(x)). Thanks to Eq. (B.2) in Lemma B.1, it holds that

δ̇(t) = Tr((∇_x v)(t, X_t(x))) δ(t), δ(0) = 1,

which implies δ(t) > 0 for t ∈ [0, 1]. Notice that log π_t(X_t(x)) = log π_0(X_0(x)) − log |δ(t)| by the change of variables formula. Then it follows that

∂_t log π_t(X_t(x)) = −Tr((∇_x v)(t, X_t(x))).

Appendix C. Proofs of spatial Lipschitz estimates for the velocity field

The main results in Section 4 are proved in this appendix. We first present some ancillary lemmas before proceeding to give the proofs.

Lemma C.1 (Fathi et al., 2023) Suppose that f : ℝ^d → ℝ_+ is L-log-Lipschitz for some L ≥ 0. Let P_t be the Ornstein–Uhlenbeck semigroup defined by P_t h(x) := E_{Z∼γ_d}[h(e^{−t} x + √(1 − e^{−2t}) Z)] for any h ∈ C(ℝ^d) and t ≥ 0. Then it holds that

−{5L e^{−t}(L + t^{−1/2}) + L^2 e^{−2t}} I_d ⪯ ∇_x^2 log P_t f(x) ⪯ {5L e^{−t}(L + t^{−1/2})} I_d.

Proof This is a restatement of known results. See Proposition 2, Proposition 6, Theorem 6, and their proofs in Fathi et al. (2023).

Corollary C.1 Suppose that f : ℝ^d → ℝ_+ is L-log-Lipschitz for some L ≥ 0. Let Q_t be an operator defined by

Q_t h(x) := E_{Z∼γ_d}[h(β_t x + α_t Z)] (C.1)

for any h ∈ C(ℝ^d) and t ∈ [0, 1], where 0 ≤ α_t ≤ 1 and β_t ≥ 0 for any t ∈ [0, 1]. Then it holds that

−(A_t + L^2 β_t^2) I_d ⪯ ∇_x^2 log Q_t f(x) ⪯ A_t I_d, where A_t := 5L β_t^2 (1 − α_t^2)^{−1/2} (L + (−(1/2) log(1 − α_t^2))^{−1/2}).

Proof It is easy to notice that Q_t f(x) = P_s f(β_t e^s x) where s = −(1/2) log(1 − α_t^2).
Then it follows that 2 x log Qtf(x) = (βtes)2( 2 x log Psf)(βtesx) which yields At L2β2 t Id 2 x log Qtf(x) At Id, where At := 5Lβ2 t (1 α2 t ) 1 2 log(1 α2 t )) 1 Gao, Huang, and Jiao Lemma C.2 The Jacobian matrix of the velocity field (3.5) has an alternative expression over time t (0, 1), that is, xv(t, x) = bt bt a2 t atat 2 x log e Qtf(x) 1 a2 t +b2 t Id + bt bt Id, where f(x) := dν dγd (x) and e Qtf(x) := EZ γd[f( bt a2 t +b2 t x + at a2 t +b2 t Z)]. Proof By direct calculations, it holds that pt(x) = a d t Rd p1(y)ϕ x bty Rd f(y)ϕ(y)ϕ x bty = a d t ϕ (a2 t + b2 t ) 1 a2 t + b2 t ! 1 y bt a2 t + b2 t x = a d t ϕ (a2 t + b2 t ) 1 a2 t + b2 t bt a2 t + b2 t x + at p a2 t + b2 t z = (a2 t + b2 t ) d/2ϕ (a2 t + b2 t ) 1 2 x e Qtf(x). Taking the logarithm and then the second-order derivative of the equation above, it yields xs(t, x) = 2 x log e Qtf(x) 1 a2 t +b2 t Id. Recalling that xv(t, x) = bt bt a2 t atat xs(t, x) + bt bt Id, it further yields that xv(t, x) = bt bt a2 t atat 2 x log e Qtf(x) + atat+ btbt a2 t +b2 t Id, which completes the proof. Corollary C.2 Suppose that f(x) := dν dγd (x) is L-log-Lipschitz for some L 0. Then for t (0, 1), it holds that bt bt a2 t atat Bt L2 bt a2 t +b2 t 2 + atat+ btbt xv(t, x) n bt bt a2 t atat Bt + atat+ btbt where Bt := 5Lbt(a2 t + b2 t ) 3 2 (L + (log( p a2 t + b2 t /bt)) 1 Proof Let αt = at a2 t +b2 t and βt = bt a2 t +b2 t . Then these bounds hold according to Corollary C.1 and Lemma C.2. Then we are prepared to prove Proposition 4.1. The proof is mainly based on the techniques for bounding conditional covariance matrices that are developed in a series of work (Wibisono and Jog, 2018a,b; Mikulincer and Shenfeld, 2021, 2023; Chewi and Pooladian, 2022; Dai et al., 2023). Proof [Proof of Proposition 4.1] Gaussian Interpolation Flows (a) By Jung s theorem (Danzer et al., 1963, Theorem 2.6), there exists a closed Euclidean ball with radius less than D := (1/ 2)diam(supp(ν)) that contains supp(ν) in Rd. Then the desired bounds hold due to 0Id Cov(Y|Xt = x) D2Id and Eq. (4.3). (b) Let p1 be β-semi-log-convex for some β > 0 on Rd. Then for any t [0, 1), the conditional distribution p(y|t, x) is β + b2 t a2 t -semi-log-convex because 2 y log p(y|t, x) = 2 y log p1(y) 2 y log p(t, x|y) β + b2 t a2 t By the Cram er-Rao inequality (2.4), we obtain Cov(Y|Xt = x) β + b2 t a2 t Therefore, by Eq. (4.3), we obtain ! b2 t βa2 t + b2 t + at which implies xv(t, x) βat at + bt bt βa2 t + b2 t Id. In addition, the bound above can be verified at time t = 1 by the definition (A.6). (c) Let p1 be κ-semi-log-concave for some κ R. Then for any t [0, 1), the conditional distribution p(y|t, x) is κ + b2 t a2 t -semi-log-concave because 2 y log p(y|t, x) = 2 y log p1(y) 2 y log p(t, x|y) κ + b2 t a2 t When t n t : κ + b2 t a2 t > 0, t (0, 1) o , by the Brascamp-Lieb inequality (2.2), we obtain Cov(Y|Xt = x) κ + b2 t a2 t Therefore, by Eq. (4.3), we obtain ! b2 t κa2 t + b2 t + at which implies xv(t, x) κat at + bt bt κa2 t + b2 t Id. Moreover, the bound above can be verified at time t = 1 by the definition (A.6). Gao, Huang, and Jiao (d) Notice that p(y|t, x) = p(t, x|y) pt(x) d(γd,σ2 ρ) Rd ϕz,σ2(y)ϕ x bt , a2 t b2 t (y)ρ(dz), where the prefactor Ax,t only depends on x and t. Then it follows that p(y|t, x) = Z Rd ϕ a2 t z+σ2btx a2 t +σ2b2 t , σ2a2 t a2 t +σ2b2 t (y) ρ(dz) where ρ is a probability measure on Rd whose density function is a multiple of ρ by a positive function. It also indicates that ρ is supported on the same Euclidean ball as ρ. 
To further illustrate p(y|t, x), let Q ρ and Z γd be independent. Then it holds that a2 t a2 t + σ2b2 t Q + σ2a2 t a2 t + σ2b2 t Z + σ2bt a2 t + σ2b2 t x p(y|t, x). Thus, it holds that Cov(Y|Xt = x) = a2 t a2 t + σ2b2 t 2 Cov(Q) + σ2a2 t a2 t + σ2b2 t Id ( a2 t a2 t + σ2b2 t 2 R2 + σ2a2 t a2 t + σ2b2 t By Eq. (4.3), it holds that xv(t, x) b2 t a2 t ! a2 t a2 t + σ2b2 t 2 R2 + σ2a2 t a2 t + σ2b2 t which implies ( atbt(at bt atbt) (a2 t + σ2b2 t )2 R2 + atat + σ2 btbt a2 t + σ2b2 t Analogously, due to Cov(Q) 0Id, a lower bound would be yielded as follows xv(t, x) atat + σ2 btbt a2 t + σ2b2 t Id. Then the results follow by combining the upper and lower bounds. (e) The result follows from Corollary C.2. Gaussian Interpolation Flows We complete the proof. Proof [Proof of Corollary 4.1] Let us consider that κ > 0 which is divided into two cases where κD2 1 and κD2 < 1. On the one hand, suppose that the first case κD2 1 holds. By Proposition 4.1, the κ-based upper bound is tighter, that is, λmax( xv(t, x)) θt := κat at + bt bt κa2 t + b2 t . On the other hand, suppose that the second case κD2 < 1 holds. Let t1 be defined in Eq. (4.5). Again, by Proposition 4.1, the D2-based upper bound is tighter over [0, t1) and the κ-based upper bound is tighter over [t1, 1], which is denoted by λmax( xv(t, x)) θt := at , t [0, t1), κat at+bt bt κa2 t +b2 t , t [t1, 1]. This completes the proof. Proof [Proof of Corollary 4.2] Let κ < 0, D < such that κD2 < 1 is fulfilled. Then an argument similar to the proof of Corollary 4.1 yields the desired bounds. Proof [Proof of Corollary 4.3] The result follows from Proposition 4.1-(d). Proof [Proof of Corollary 4.4] The L-based upper and lower bounds in Proposition 4.1-(e) would blow up at time t = 1 because the term (log( p a2 t + b2 t /bt)) 1 2 in Bt goes to as t 1. To ensure the spatial derivative of the velocity field v(t, x) is upper bounded at time t = 1, we additionally require the target measure is κ-semi-log-concave with κ 0. Hence, a κ-based upper bound is available for t (t0, 1] as shown in Proposition 4.1-(c). Next, these two upper bounds are combined by choosing any t2 (t0, 1) first. Then we exploit the L-based bound over [0, t2) and κ-based bound over [t0, 1]. This completes the proof. Appendix D. Proofs of well-posedness and Lipschitz flow maps The proofs of main results in Section 5 are offered in the following. Before proceeding, let us introduce some definitions and notations about function spaces that are collected in (Evans, 2010, Chapter 5). Let L1 loc(Rd; Rℓ) := {locally integrable function u : Rd Rℓ}. For integers k 0 and 1 p , we define the Sobolev space W k,p(Rd) := {u L1 loc(Rd)|Dαu exists and Dαu Lp(Rd) for |α| k}, where Dαu is the weak derivative of u. Then the local Sobolev space W k,p loc (Rd) is defined as the function space such that for any u W k,p loc (Rd) and any compact set Ω Rd, u W k,p(Ω). As a result, we denote the vectorvalued local Sobolev space by W k,p loc (Rd; Rℓ). Provided that v(t, x) : [0, 1] Rd Rd, we use v L1([0, 1]; W 1, loc (Rd; Rd)) to indicate that v has a finite L1 norm over (t, x) [0, 1] Rd Gao, Huang, and Jiao and v(t, ) W 1, loc (Rd; Rd) for any t [0, 1]. Similarly, we say v L1([0, 1]; L (Rd; Rd)) when v has a finite L1 norm over (t, x) [0, 1] Rd and v(t, ) L (Rd; Rd) for every t [0, 1]. We will use the definitions and notations in the following proof. 
Proof [Proof of Theorem 5.1] Under Assumptions 1 and 2, we claim that the velocity field v(t, x) satisfies v L1([0, 1]; W 1, loc (Rd; Rd)), v 2 1 + x 2 L1([0, 1]; L (Rd; Rd)). where the first condition indicates the velocity field v is locally bounded and locally Lipschitz continuous in x, and the second condition is a growth condition on v. According to the Cauchy-Lipschitz theorem (Ambrosio and Crippa, 2014, Remark 2.4), we have the representation formulae for solutions of the continuity equation. As a result, there exists a flow (Xt)t [0,1] uniquely solves the IVP (3.10). Furthermore, the marginal flow of (Xt)t [0,1] satisfies the continuity equation (3.4) in the weak sense. Then it remains to show the velocity field v is locally bounded and locally Lipschitz continuous in x, and satisfies the growth condition. By the lower and upper bounds given in Proposition 4.1, we know that v is globally Lipschitz continuous in x under Assumptions 1 and 2. Indeed, the global Lipschitz continuity leads to local boundedness and linear growth properties by simple arguments. More concretely, for any t (0, 1), it holds that v(t, 0) = bt at E[X1|Xt = 0] = bt at Rd yp(y|t, 0)dy Rd yp1(y)a d t exp b2 t y 2 2 2a2 t which implies v(t, 0) 2 < due to fast growth of the exponential function. Besides, it holds that v(0, 0) = ( b0 a0 a0 b0)E[X1|X0 = x] < , v(1, 0) = a1a1s(1, 0) < . Then by the boundedness of v(t, 0) 2 and the global Lipschitz continuity in x over t [0, 1], we bound v(t, x) as follows v(t, x) 2 v(t, 0) 2 + v(t, x) v(t, 0) 2 v(t, 0) 2 + sup (t,y) [0,1] Rd yv(t, y) 2,2 max{ x 2, 1}. Hence, the local boundedness and linear growth properties of v are proved. This completes the proof. Proof [Proof of Theorem 5.2] The proof is similar to that of Theorem 5.1. Proof [Proof of Corollary 5.1] A well-posed ODE flow has the time-reversal symmetry (Lamb and Roberts, 1998). By Theorem 5.1, the desired results are proved. Gaussian Interpolation Flows Proof [Proof of Corollary 5.2] The proof is similar to that of Corollary 5.1. Proof [Proof of Proposition 5.1] Combining Proposition 4.1-(b), (c), and Lemma 5.1, we complete the proof. Proof [Proof of Proposition 5.2] Combining Proposition 4.1-(d) and Lemma 5.1, we complete the proof. Proof [Proof of Corollary 6.1] By Theorem 5.1 and Corollary 5.1, it holds that X1 X 1 = X1 X 1 1 = Id. This completes the proof. Proof [Proof of Corollary 6.2] By Theorem 5.1 and Corollary 5.1, it holds that X1,1 X 2,1 X2,1 X 1,1 = X1,1 X 1 2,1 X2,1 X 1 1,1 = Id. This completes the proof. Proof [Proof of Corollary 6.3] Let Assumptions 2.1 and 2.2 hold. According to Propositions 5.1 and 5.2, x X1(x) 2,2 is uniformly bounded for Case (i)-(iii) in Assumption 2.2. For Case (iv), the boundedness of x X1(x) 2,2 holds by combining Corollary 4.4 and Lemma 5.1. Using Proposition 4.1, we know that xv(t, x) 2,2 is uniformly bounded. Proof [Proof of Proposition 5.3] The proof idea is similar to those of (Ball et al., 2003, Proposition 1) and (Cattiaux and Guillin, 2014, Proposition 18). Let f : Ω R be of class C1 and Xt pt. First, we consider the case of log-Sobolev inequalities. 
Using that Z γd and X1 ν both satisfy the log-Sobolev inequalities in Definition G.1, we have E[(f2 log f2)(Xt)] = E[(f2 log f2)(at Z + bt X1)] Z Z f2(atz + btx)dγd(z) log Z f2(atz + btx)dγd(z) dν(x) + Z 2CLS(γd) Z a2 t ( f 2 2)(atz + btx)dγd(z) dν(x) Z Z f2(atz + btx)dγd(z)dν(x) log Z Z f2(atz + btx)dγd(z)dν(x) + 2CLS(ν) Z x Z f2(atz + btx)dγd(z) 1 + 2a2 t CLS(γd) Z Z ( f 2 2)(atz + btx)dγd(z)dν(x) E[f2(Xt)] log E[f2(Xt)] + 2a2 t CLS(γd)E[ f(Xt) 2 2] + 2CLS(ν) Z x Z f2(atz + btx)dγd(z) 1 Gao, Huang, and Jiao By Jensen s inequality and the Cauchy-Schwartz inequality, it holds that Z x Z f2(atz + btx)dγd(z) 1 R R ( f f 2)(atz + btx)dγd(z) 2 dν(x) R R f2(atz + btx)dγd(z)dν(x) Z Z ( f 2 2)(atz + btx)dγd(z)dν(x) b2 t E[ f(Xt) 2 2]. Hence, combining the equations above and the fact that CLS(γd) 1 (Gross, 1975), it implies that E[(f2 log f2)(Xt)] E[f2(Xt)] log E[f2(Xt)] 2 a2 t + b2 t CLS(ν) E[ f(Xt) 2 2], that is, CLS(pt) a2 t + b2 t CLS(ν). Next, we tackle the case of Poincar e inequalities by similar calculations. Using that Z γd and X1 ν both satisfy the Poincar e inequalities in Definition G.2, we have E[f2(Xt)] = E[f2(at Z + bt X1)] Z Z f(atz + btx)dγd(z) 2 dν(x) + Z CP(γd) Z a2 t ( f 2 2)(atz + btx)dγd(z) dν(x) Z Z f(atz + btx)dγd(z)dν(x) 2 + CP(ν) Z x Z f(atz + btx)dγd(z) 2 + a2 t CP(γd) Z Z ( f 2 2)(atz + btx)dγd(z)dν(x) (E[f(Xt)])2 + a2 t CP(γd) + b2 t CP(ν) E[ f(Xt) 2 2]. Combining the expression above and CP(γd) 1, it implies that E[f2(Xt)] (E[f(Xt)])2 a2 t + b2 t CP(ν) E[ f(Xt) 2 2], that is, CP(pt) a2 t + b2 t CP(ν). This completes the proof. Appendix E. Proofs of the stability results We provide the proofs of the stability results in Section 6. Proof [Proof of Proposition 6.1] Let x0 = a0z + b0x1 and suppose X0(x0) µ, X0(a0z) γd,a2 0. According to Corollary 6.3, the Lipschitz property of X1(x) implies that X1(x0) X1(a0z) C1 x0 a0z . We consider an integral defined by It := Z x0 a0z 2dπt(Xt(x0), Xt(a0z)), Gaussian Interpolation Flows where πt is a coupling made of the joint distribution of (Xt(x0), Xt(a0z)). In particular, the initial value I0 is computed by I0 = Z x0 a0z 2p0(x0)ϕ(z)dx0dz = Z b0x1 2p1(x1)dx1 = b2 0Eν[ X1 2]. Since (Xt)t [0,1] is well-posed with X0(x0) µ or X0(a0z) γd,a2 0, according to Corollary B.1, the coupling πt satisfies the following differential equation t log πt(Xt(x0), Xt(a0z)) = Tr(( xv)(t, Xt(x0))) Tr(( xv)(t, Xt(a0z))). (E.1) Taking the derivative of It and using Eq. (E.1), it implies that sup (s,x) [0,1] Rd Tr( xv(s, x)) Thanks to Tr( xv(s, x)) d xv(s, x) 2,2, it follows that dt 2C2d It, I0 = b2 0Eν[ X1 2]. By Gr onwall s inequality, it holds that It b2 0Eν[ X1 2] exp(2C2dt). Therefore, we obtain the following W2 bound W2(X1#γd,a2 0, ν) = W2(X1#γd,a2 0, X1#µ) C1 p Eν[ X1 2] exp(C2d), which completes the proof. Proof [Proof of Proposition 6.2] (i) On the one hand, by Corollary 6.3, v(t, x) is Lipschitz continuous in x uniformly over (t, x) [0, 1] Rd with Lipschitz constant C2. By the variational equation (6.5) and Lemma 5.1, it follows that x Xs,t(x) 2 2,2 exp 2 Z t Due to the equality (6.4), we deduce that X1(x0) Y1(x0) 2 0 ( x Xs,1)(Ys(x0)) 2,2 v(s, Ys(x0)) v(s, Ys(x0)) ds 2 0 ( x Xs,1)(Ys(x0)) 2 2,2ds Z 1 0 v(s, Ys(x0)) v(s, Ys(x0)) 2ds 0 exp 2 Z 1 s θudu ds Z 1 0 v(s, Ys(x0)) v(s, Ys(x0)) 2ds. 
Gao, Huang, and Jiao Take expectation and it follows that W 2 2 (Y1#µ, ν) Ex0 µ Y1(x0) X1(x0) 2 0 exp 2 Z 1 s θudu ds Z 1 Rd v(t, x) v(t, x) 2 qt(x)dxdt 0 exp 2 Z 1 where qt denotes the density function of Yt#µ, and we use the assumption that Rd v(t, x) v(t, x) 2 qt(x)dxdt ε in the last inequality. (ii) On the other hand, suppose that v(t, x) is Lipschitz continuous in x uniformly over (t, x) [0, 1] Rd with Lipschitz constant C3. Applying Gr onwall s inequality to the variational equation (6.7), it follows that x Ys,t(x) 2 2,2 exp(2C3(t s)). By the equality (6.6), it holds that Y1(x0) X1(x0) 2 0 ( x Ys,1)(Xs(x0)) 2,2 v(s, Xs(x0)) v(s, Xs(x0)) ds 2 0 ( x Ys,1)(Xs(x0)) 2 2,2ds Z 1 0 v(s, Xs(x0)) v(s, Xs(x0)) 2ds 0 v(s, Xs(x0)) v(s, Xs(x0)) 2ds. Taking expectations, it further yields that W 2 2 (Y1#µ, ν) Ex0 µ Y1(x0) X1(x0) 2 Rd v(t, x) v(t, x) 2pt(x)dxdt where Xt(x0) pt. Appendix F. Time derivative of the velocity field In this appendix, we are interested in representing the time derivative of the velocity field via moments of Y|Xt = x. The result is efficacious for controlling the time derivative with moment estimates, though the computation is somehow tedious. Gaussian Interpolation Flows Proposition F.1 The time derivative of the velocity field v(t, x) has an expression with moments of X1|Xt for any t (0, 1) as follows tv(t, x) = at at a2 t a2 t a2 t bt bt atat bt bt atat + a2 t ! bt a2 t M1 + b2 t a2 t Mc 2x b3 t a2 t !2 (M3 M2M1) , where M1 := E[X1|Xt = x], M2 := E[X 1 X1|Xt = x], Mc 2 := Cov(X1|Xt = x), M3 := E[X1X 1 X1|Xt = x]. Proof By direct differentiation, it implies that tv(t, x) = t bt a2 t atat bt a2 t atat = btbt b2 t b2 t x + btbt b2 t b2 t a2 t + bt bt 2 atat atat a2 t bt a2 t atat We first focus on ts(t, x). Since pt satisfies the continuity equation (3.4), it holds that ts(t, x) = x( t log pt(x)) x (pt(x)v(t, x)) ( xpt(x)) v(t, x) + pt(x)( x v(t, x)) = x s(t, x) v(t, x) + x v(t, x) = ( xs(t, x)) v(t, x) + ( xv(t, x)) s(t, x) + x( x v(t, x)) = ( xs(t, x)v(t, x) + xv(t, x)s(t, x) + x Tr( xv(t, x))) . By direct computation, it holds that xs(t, x)v(t, x) + xv(t, x)s(t, x) bt a2 t atat bt a2 t atat = bt bt xs(t, x)x + bt a2 t atat xs(t, x)s(t, x) + bt bt s(t, x) + bt a2 t atat xs(t, x)s(t, x) = bt bt s(t, x) + bt bt xs(t, x)x + 2 bt a2 t atat xs(t, x)s(t, x). Gao, Huang, and Jiao Then we focus on the trace term x Tr( xv(t, x)) ! b2 t a2 t Cov(Y|Xt = x) + at ! b2 t a2 t x Tr(Cov(Y|Xt = x)) ! b2 t a2 t x Z y 2p(y|t, x)dy Z yp(y|t, x)dy ! b2 t a2 t Z y 2 xp(y|t, x)dy 2 Z xp(y|t, x) ydy Z yp(y|t, x)dy , where we notice that xp(y|t, x) = x p(t, x|y)p1(y) = xp(t, x|y)p1(y) pt(x) p(t, x|y)p1(y) pt(x) s(t, x) = p(y|t, x) bty x a2 t s(t, x) . For ease of presentation, we introduce the following notations to denote several moments of Y|Xt = x M1 := E[Y|Xt = x], M2 := E[Y Y|Xt = x], Mc 2 := Cov(Y|Xt = x), M3 := E[YY Y|Xt = x]. By Tweedie s formula in Lemma G.1, it yields s(t, x) = bt a2 t x. By this expression of s(t, x), it yields xs(t, x)v(t, x) + xv(t, x)s(t, x) = bt bt s(t, x) + bt bt xs(t, x)x + 2 bt a2 t atat xs(t, x)s(t, x) a2 t x + bt bt b2 t a4 t Mc 2 1 bt a2 t atat ! b2 t a4 t Mc 2 1 a3 t x + bt M1 + b2 t a4 t Mc 2x + 2 b3 t a4 t Gaussian Interpolation Flows and xp(y|t, x) = bt a2 t (y M1) p(y|t, x). Therefore, we obtain Z y 2 xp(y|t, x)dy 2 Z xp(y|t, x) ydy Z yp(y|t, x)dy a2 t (y M1) p(y|t, x)dy 2 Z bt a2 t (y M1) yp(y|t, x)dy Z yp(y|t, x)dy Z y 2yp(y|t, x)dy Z y 2p(y|t, x)dy M1 2 Z y yp(y|t, x)dy M1 Z yp(y|t, x)dy Z yp(y|t, x)dy a2 t (M3 M2M1 2Mc 2M1) . 
Combining the equations above, we obtain tv(t, x) = btbt b2 t b2 t x + btbt b2 t b2 t a2 t + bt bt 2 atat atat a2 t bt a2 t atat a3 t x + bt + b2 t a4 t Mc 2x + 2 b3 t a4 t bt a2 t atat ! b3 t a4 t (M3 M2M1 2Mc 2M1) at a2 t a2 t a2 t bt bt atat bt bt atat + a2 t ! bt a2 t M1 + b2 t a2 t Mc 2x b3 t a2 t !2 (M3 M2M1) . Then we complete the proof. Appendix G. Functional inequalities and Tweedie s formula This appendix is devoted to an exposition of functional inequalities and Tweedie s formula that would assist in our proof. For a probability measure µ on a compact set Ω Rd, we define the variance of a function f L2(Ω, µ) as Varµ(f) := Z Gao, Huang, and Jiao Moreover, for a probability measure µ on a compact set Ω Rd and any positive integrable function f : Ω R such that R Ωf log f dν < , we define the entropy of f as Entµ(f) := Z Ω f log fdµ Z Ω fdµ log Z Definition G.1 (Log-Sobolev inequality) A probability measure µ P(Ω) is said to satisfy a log-Sobolev inequality with constant C > 0, if for all functions f : Ω R, it holds that Entµ(f2) 2C Z The best constant C > 0 for which such an inequality holds is referred to as the log-Sobolev constant CLS(µ). Definition G.2 (Poincar e inequality) A probability measure µ P(Ω) is said to satisfy a Poincar e inequality with constant C > 0, if for all functions f : Ω R, it holds that Varµ(f) C Z The best constant C > 0 for which such an inequality holds is referred to as the Poincar e constant CP(µ). Finally, for ease of reference, we present Tweedie s formula that was first reported in Robbins (1956), and then was used as a simple empirical Bayes approach for correcting selection bias (Efron, 2011). Here, we use Tweedie s formula to link the score function with the expectation conditioned on an observation with Gaussian noise. Lemma G.1 (Tweedie s formula) Suppose that X µ and ϵ γd,σ2. Let Y = X+ϵ and p(y) be the marginal density of Y. Then E[X|Y = y] = y + σ2 y log p(y). Michael S. Albergo and Eric Vanden-Eijnden. Building normalizing flows with stochastic interpolants. In The Eleventh International Conference on Learning Representations, 2023. Michael S. Albergo, Nicholas M. Boffi, Michael Lindsey, and Eric Vanden-Eijnden. Multimarginal generative modeling with stochastic interpolants. ar Xiv preprint ar Xiv:2310.03695, 2023a. Michael S. Albergo, Nicholas M. Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions. ar Xiv preprint ar Xiv:2303.08797, 2023b. Michael S. Albergo, Mark Goldstein, Nicholas M. Boffi, Rajesh Ranganath, and Eric Vanden-Eijnden. Stochastic interpolants with data-dependent couplings. ar Xiv preprint ar Xiv:2310.03725, 2023c. Gaussian Interpolation Flows Luigi Ambrosio and Gianluca Crippa. Continuity equations and ODE flows with nonsmooth velocity. Proceedings of the Royal Society of Edinburgh Section A: Mathematics, 144(6):1191 1244, 2014. Luigi Ambrosio, Nicola Gigli, and Giuseppe Savar e. Gradient flows: In metric spaces and in the space of probability measures. Springer Science & Business Media, 2008. Luigi Ambrosio, Sebastiano N. Golo, and Francesco S. Cassano. Classical flows of vector fields with exponential or sub-exponential summability. Journal of Differential Equations, 372:458 504, 2023. Abdul Fatir Ansari, Ming Liang Ang, and Harold Soh. Refining deep generative models via discriminator gradient flow. In International Conference on Learning Representations, 2021. Michael Arbel, Anna Korba, Adil Salim, and Arthur Gretton. Maximum mean discrepancy gradient flow. 
In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. Martin Arjovsky, Soumith Chintala, and L eon Bottou. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70, pages 214 223. PMLR, 2017. Dominique Bakry and Michel Emery. Diffusions hypercontractives. In Seminaire de probabilit es XIX 1983/84, pages 177 206. Springer, 1985. Dominique Bakry, Ivan Gentil, and Michel Ledoux. Analysis and geometry of Markov diffusion operators, volume 103. Springer, 2014. Keith Ball, Franck Barthe, and Assaf Naor. Entropy jumps in the presence of a spectral gap. Duke Mathematical Journal, 119(1):41 63, 2003. Joe Benton, George Deligiannidis, and Arnaud Doucet. Error bounds for flow matching methods. ar Xiv preprint ar Xiv:2305.16860, 2023. Marin Biloˇs, Johanna Sommer, Syama Sundar Rangapuram, Tim Januschowski, and Stephan G unnemann. Neural flows: Efficient alternative to neural ODEs. In Advances in Neural Information Processing Systems, volume 34, pages 21325 21337. Curran Associates, Inc., 2021. Sergey G. Bobkov and Michel Ledoux. From Brunn-Minkowski to Brascamp-Lieb and to logarithmic Sobolev inequalities. Geometric and Functional Analysis, 10(5):1028 1052, 2000. Valentin De Bortoli. Convergence of denoising diffusion models under the manifold hypothesis. Transactions on Machine Learning Research, 2022. Herm J. Brascamp and Elliott H. Lieb. On extensions of the Brunn-Minkowski and Pr ekopa Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. Journal of Functional Analysis, 22(4):366 389, 1976. Gao, Huang, and Jiao Luis A. Caffarelli. Monotonicity properties of optimal transportation and the FKG and related inequalities. Communications in Mathematical Physics, 214(3):547 563, 2000. Tony T. Cai and Yihong Wu. Optimal detection of sparse mixtures against a given null distribution. IEEE Transactions on Information Theory, 60(4):2217 2232, 2014. Patrick Cattiaux and Arnaud Guillin. Semi log-concave Markov diffusions. In Catherine Donati-Martin, Antoine Lejay, and Alain Rouault, editors, S eminaire de probabilit es XLVI, pages 231 292. Springer International Publishing, Cham, 2014. Hongrui Chen, Holden Lee, and Jianfeng Lu. Improved analysis of score-based generative modeling: User-friendly bounds under minimal smoothness assumptions. In Proceedings of the 40th International Conference on Machine Learning, volume 202, pages 4735 4763. PMLR, 2023a. Ricky T.Q. Chen and Yaron Lipman. Riemannian flow matching on general geometries. ar Xiv preprint ar Xiv:2302.03660, 2023. Ricky T.Q. Chen, Yulia Rubanova, Jesse Bettencourt, and David K. Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. Sitan Chen, Sinho Chewi, Jerry Li, Yuanzhi Li, Adil Salim, and Anru Zhang. Sampling is as easy as learning the score: Theory for diffusion models with minimal data assumptions. In The Eleventh International Conference on Learning Representations, 2023b. Sinho Chewi and Aram-Alexandre Pooladian. An entropic generalization of Caffarelli s contraction theorem via covariance inequalities. ar Xiv preprint ar Xiv:2203.04954, 2022. Maria Colombo, Alessio Figalli, and Yash Jhaveri. Lipschitz changes of variables between perturbations of log-concave measures. Annali della Scuola Normale Superiore di Pisa. Classe di scienze, 17(4):1491 1519, 2017. 
Dario Cordero-Erausquin. Transport inequalities for log-concave measures, quantitative forms, and applications. Canadian Journal of Mathematics, 69(3):481 501, 2017. Yin Dai, Yuan Gao, Jian Huang, Yuling Jiao, Lican Kang, and Jin Liu. Lipschitz transport maps via the F ollmer flow. ar Xiv preprint ar Xiv:2309.03490, 2023. Ludwig Danzer, Branko Gr unbaum, and Victor Klee. Helly s theorem and its relatives. In Proceedings of Symposia in Pure Mathematics: Convexity, volume VII, pages 101 180, Providence, RI, 1963. American Mathematical Society. Valentin De Bortoli, James Thornton, Jeremy Heng, and Arnaud Doucet. Diffusion Schr odinger bridge with applications to score-based generative modeling. In Advances in Neural Information Processing Systems, volume 34, pages 17695 17709. Curran Associates, Inc., 2021. Pierre Del Moral and Sumeetpal S. Singh. Backward Itˆo Ventzell and stochastic interpolation formulae. Stochastic Processes and their Applications, 154:197 250, 2022. Gaussian Interpolation Flows Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. In Advances in Neural Information Processing Systems, volume 34, pages 8780 8794, 2021. Laurent Dinh, David Krueger, and Yoshua Bengio. NICE: Non-linear independent components estimation. ar Xiv preprint ar Xiv:1410.8516, 2014. Andrew Duncan, Nikolas N usken, and Lukasz Szpruch. On the geometry of Stein variational gradient descent. Journal of Machine Learning Research, 24(56):1 39, 2023. Alex Dytso, Martina Cardone, and Ian Zieder. Meta derivative identity for the conditional expectation. IEEE Transactions on Information Theory, 69(7):4284 4302, 2023a. Alex Dytso, H. Vincent Poor, and Shlomo Shamai Shitz. Conditional mean estimation in Gaussian noise: A meta derivative identity with applications. IEEE Transactions on Information Theory, 69(3):1883 1898, 2023b. Bradley Efron. Tweedie s formula and selection bias. Journal of the American Statistical Association, 106(496):1602 1614, 2011. Ronen Eldan and James R. Lee. Regularization under diffusion and anticoncentration of the information content. Duke Mathematical Journal, 167(5):969 993, 2018. Lawrence C. Evans. Partial differential equations, volume 19 of Graduate studies in mathematics. American Mathematical Society, Providence, Rhode Island, second edition edition, 2010. Jiaojiao Fan, Qinsheng Zhang, Amirhossein Taghvaei, and Yongxin Chen. Variational Wasserstein gradient flow. In Proceedings of the 39th International Conference on Machine Learning, volume 162, pages 6185 6215. PMLR, 2022. Max Fathi, Dan Mikulincer, and Yair Shenfeld. Transportation onto log-Lipschitz perturbations. ar Xiv preprint ar Xiv:2305.03786, 2023. Chris Finlay, J orn-Henrik Jacobsen, Levon Nurbekyan, and Adam Oberman. How to train your neural ODE: The world of Jacobian and kinetic regularization. In Proceedings of the 37th International Conference on Machine Learning, volume 119, pages 3154 3164. PMLR, 2020. Yuan Gao, Yuling Jiao, Yang Wang, Yao Wang, Can Yang, and Shunkang Zhang. Deep generative learning via variational gradient flow. In Proceedings of the 36th International Conference on Machine Learning, volume 97, pages 2093 2101. PMLR, 2019. Yuan Gao, Jian Huang, Yuling Jiao, Jin Liu, Xiliang Lu, and Zhijian Yang. Deep generative learning via Euler particle transport. In Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference, volume 145, pages 336 368. PMLR, 2022. 
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems 27, pages 2672 2680. Curran Associates, Inc., 2014. Gao, Huang, and Jiao Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Communications of the ACM, 63(11):139 144, 2020. Will Grathwohl, Ricky T.Q. Chen, Jesse Bettencourt, and David Duvenaud. FFJORD: Free-form continuous dynamics for scalable reversible generative models. In International Conference on Learning Representations, 2019. Leonard Gross. Logarithmic Sobolev inequalities. American Journal of Mathematics, 97 (4):1061 1083, 1975. Ernst Hairer, Gerhard Wanner, and Syvert P. Nørsett. Classical Mathematical Theory, chapter I, pages 1 128. Springer Berlin Heidelberg, Berlin, Heidelberg, 1993. Philip Hartman. Dependence on Initial Conditions and Parameters, chapter V, pages 93 116. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2002a. Philip Hartman. Existence, chapter II, pages 8 23. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2002b. Charles P. Hatsell and Loren W. Nolte. Some geometric properties of the likelihood ratio (corresp.). IEEE Transactions on Information Theory, 17(5):616 618, 1971. Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pages 6840 6851, 2020. Rie Johnson and Tong Zhang. Composite functional gradient learning of generative adversarial models. In Proceedings of the 35th International Conference on Machine Learning, volume 80, pages 2371 2379. PMLR, 2018. Rie Johnson and Tong Zhang. A framework of composite functional gradient methods for generative adversarial models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1):17 32, 2019. Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems, volume 35, pages 26565 26577. Curran Associates, Inc., 2022. Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, and Stefano Ermon. Consistency trajectory models: Learning probability flow ODE trajectory of diffusion. ar Xiv preprint ar Xiv:2310.02279, 2023. Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. In International Conference on Learning Representations, 2014. Diederik P. Kingma and Max Welling. An introduction to variational autoencoders. Foundations and Trends R in Machine Learning, 12(4):307 392, 2019. Gaussian Interpolation Flows Bo az Klartag. High-dimensional distributions with convexity properties. In Proceedings of the Fifth European Congress of Mathematics, pages 401 417, Amsterdam, 14 July 18 July 2010. European Mathematical Society Publishing House. Ivan Kobyzev, Simon J.D. Prince, and Marcus A. Brubaker. Normalizing flows: An introduction and review of current methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(11):3964 3979, 2020. Jeroen S.W. Lamb and John A.G. Roberts. Time-reversal symmetry in dynamical systems: A survey. Physica D: Nonlinear Phenomena, 112(1-2):1 39, 1998. 
Holden Lee, Jianfeng Lu, and Yixin Tan. Convergence of score-based generative modeling for general data distributions. In Proceedings of The 34th International Conference on Algorithmic Learning Theory, volume 201, pages 946 985. PMLR, 2023. Tengyuan Liang. How well generative adversarial networks learn distributions. Journal of Machine Learning Research, 22(228):1 41, 2021. Yaron Lipman, Ricky T.Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2023. Qiang Liu. Rectified flow: A marginal preserving approach to optimal transport. ar Xiv preprint ar Xiv:2209.14577, 2022. Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. In The Eleventh International Conference on Learning Representations, 2023. Antoine Liutkus, Umut Simsekli, Szymon Majewski, Alain Durmus, and Fabian-Robert St oter. Sliced-Wasserstein flows: Nonparametric generative modeling via optimal transport and diffusions. In Proceedings of the 36th International Conference on Machine Learning, volume 97, pages 4104 4113. PMLR, 2019. Cheng Lu, Kaiwen Zheng, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. Maximum likelihood training for score-based diffusion ODEs by high order denoising score matching. In Proceedings of the 39th International Conference on Machine Learning, pages 14429 14460. PMLR, 2022a. Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. DPMSolver: A fast ODE solver for diffusion probabilistic model sampling in around 10 steps. In Advances in Neural Information Processing Systems, volume 35, pages 5775 5787. Curran Associates, Inc., 2022b. Ashok Makkuva, Amirhossein Taghvaei, Sewoong Oh, and Jason Lee. Optimal transport mapping via input convex neural networks. In Proceedings of the 37th International Conference on Machine Learning, volume 119, pages 6672 6681. PMLR, 2020. Pierre Marion. Generalization bounds for neural ordinary differential equations and deep residual networks. ar Xiv preprint ar Xiv:2305.06648, 2023. Gao, Huang, and Jiao Pierre Marion, Yu-Han Wu, Michael E Sander, and G erard Biau. Implicit regularization of deep residual networks towards neural ODEs. ar Xiv preprint ar Xiv:2309.01213, 2023. Youssef Marzouk, Zhi Ren, Sven Wang, and Jakob Zech. Distribution learning via neural differential equations: A nonparametric statistical perspective. ar Xiv preprint ar Xiv:2309.01043, 2023. Dan Mikulincer and Yair Shenfeld. The Brownian transport map. ar Xiv preprint ar Xiv:2111.11521, 2021. Dan Mikulincer and Yair Shenfeld. On the Lipschitz properties of transportation along heat flows. In Geometric Aspects of Functional Analysis: Israel Seminar (GAFA) 2020-2022, pages 269 290. Springer, 2023. Youssef Mroueh and Truyen Nguyen. On the convergence of gradient descent in GANs: MMD GAN as a gradient flow. In Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, volume 130, pages 1720 1728. PMLR, 2021. Youssef Mroueh, Tom Sercu, and Anant Raj. Sobolev descent. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, volume 89, pages 2976 2985. PMLR, 2019. Joe Neeman. Lipschitz changes of variables via heat flow. ar Xiv preprint ar Xiv:2201.03403, 2022. Kirill Neklyudov, Rob Brekelmans, Daniel Severo, and Alireza Makhzani. Action matching: Learning stochastic dynamics from samples. 
In Proceedings of the 40th International Conference on Machine Learning, volume 202, pages 25858 25889. PMLR, 2023. Derek Onken, Samy Wu Fung, Xingjian Li, and Lars Ruthotto. OT-flow: Fast and accurate continuous normalizing flows via optimal transport. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 9223 9232, 2021. Daniel P. Palomar and Sergio Verd u. Gradient of mutual information in linear vector Gaussian channels. IEEE Transactions on Information Theory, 52(1):141 154, 2005. George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. Normalizing flows for probabilistic modeling and inference. Journal of Machine Learning Research, 22(57):1 64, 2021. Aram-Alexandre Pooladian, Heli Ben-Hamu, Carles Domingo-Enrich, Brandon Amos, Yaron Lipman, and Ricky T.Q. Chen. Multisample flow matching: Straightening flows with minibatch couplings. In Proceedings of the 40th International Conference on Machine Learning, volume 202, pages 28100 28127. PMLR, 2023. Danilo J. Rezende and Shakir Mohamed. Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning, volume 37, pages 1530 1538. PMLR, 2015. Gaussian Interpolation Flows Danilo J. Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31st International Conference on Machine Learning, pages 1278 1286. PMLR, 2014. Herbert E. Robbins. An empirical Bayes approach to statistics. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1954 1955, volume I, page 157 163, Berkeley and Los Angeles, 1956. University of California Press. Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj orn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684 10695, 2022. Domenec Ruiz-Balet and Enrique Zuazua. Neural ODE control for classification, approximation, and transport. SIAM Review, 65(3):735 773, 2023. Ruslan Salakhutdinov. Learning deep generative models. Annual Review of Statistics and Its Application, 2(1):361 385, 2015. Adrien Saumard and Jon A. Wellner. Log-concavity and strong log-concavity: A review. Statistics Surveys, 8:45 114, 2014. Neta Shaul, Ricky T.Q. Chen, Maximilian Nickel, Matthew Le, and Yaron Lipman. On kinetic optimal probability paths for generative models. In Proceedings of the 40th International Conference on Machine Learning, volume 202, pages 30883 30907. PMLR, 2023. Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning, volume 37, pages 2256 2265. PMLR, 2015. Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021a. Yang Song and Prafulla Dhariwal. Improved techniques for training consistency models. ar Xiv preprint ar Xiv:2310.14189, 2023. Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, volume 32, pages 11895 11907. Curran Associates, Inc., 2019. Yang Song and Stefano Ermon. Improved techniques for training score-based generative models. 
In Advances in Neural Information Processing Systems, volume 33, pages 12438 12448. Curran Associates, Inc., 2020. Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021b. Gao, Huang, and Jiao Yang Song, Prafulla Dhariwal, Mark Chen, and Ilya Sutskever. Consistency models. In Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 32211 32252. PMLR, 23 29 Jul 2023. Xuan Su, Jiaming Song, Chenlin Meng, and Stefano Ermon. Dual diffusion implicit bridges for image-to-image translation. In The Eleventh International Conference on Learning Representations, 2023. Esteban G. Tabak and Cristina V. Turner. A family of nonparametric density estimation algorithms. Communications on Pure and Applied Mathematics, 66(2):145 164, 2013. Esteban G. Tabak and Eric Vanden-Eijnden. Density estimation by dual ascent of the log-likelihood. Communications in Mathematical Sciences, 8(1):217 233, 2010. Alexander Tong, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Kilian Fatras, Guy Wolf, and Yoshua Bengio. Conditional flow matching: Simulation-free dynamic optimal transport. ar Xiv preprint ar Xiv:2302.00482, 2023. Andre Wibisono and Varun Jog. Convexity of mutual information along the heat flow. In 2018 IEEE International Symposium on Information Theory (ISIT), pages 1615 1619. IEEE, 2018a. Andre Wibisono and Varun Jog. Convexity of mutual information along the Ornstein Uhlenbeck flow. In 2018 International Symposium on Information Theory and Its Applications (ISITA), pages 55 59. IEEE, 2018b. Andre Wibisono, Varun Jog, and Po-Ling Loh. Information and estimation in Fokker Planck channels. In 2017 IEEE International Symposium on Information Theory (ISIT), pages 2673 2677. IEEE, 2017. Yihong Wu and Sergio Verd u. Functional properties of minimum mean-square error and mutual information. IEEE Transactions on Information Theory, 58(3):1289 1301, 2011. Chen Xu, Xiuyuan Cheng, and Yao Xie. Invertible normalizing flow neural networks by JKO scheme. ar Xiv preprint ar Xiv:2212.14424, 2022. Liu Yang and George E. Karniadakis. Potential flow generator with L2 optimal transport regularity for generative models. IEEE Transactions on Neural Networks and Learning Systems, 33(2):528 538, 2020. Linfeng Zhang, Weinan E, and Lei Wang. Monge-Amp ere flow for generative modeling. ar Xiv preprint ar Xiv:1809.10188, 2018. Kaiwen Zheng, Cheng Lu, Jianfei Chen, and Jun Zhu. Improved techniques for maximum likelihood estimation for diffusion ODEs. ar Xiv preprint ar Xiv:2305.03935, 2023. Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 2223 2232, 2017.