# Light and Optimal Schrödinger Bridge Matching

Nikita Gushchin*¹, Sergei Kholkin*¹, Evgeny Burnaev¹², Alexander Korotin¹²

**Abstract.** Schrödinger Bridges (SB) have recently gained the attention of the ML community as a promising extension of classic diffusion models, which is also interconnected with the Entropic Optimal Transport (EOT). Recent solvers for SB exploit the pervasive bridge matching procedures. Such procedures aim to recover a stochastic process transporting the mass between distributions given only a transport plan between them. In particular, given the EOT plan, these procedures can be adapted to solve SB. This fact is heavily exploited by recent works, giving rise to matching-based SB solvers. The cornerstone here is recovering the EOT plan: recent works either use heuristic approximations (e.g., the minibatch OT) or establish iterative matching procedures which by design accumulate error during training. We address these limitations and propose a novel procedure to learn SB which we call the **optimal Schrödinger bridge matching**. It exploits the optimal parameterization of the diffusion process and provably recovers the SB process (a) with a single bridge matching step and (b) with an arbitrary transport plan as the input. Furthermore, we show that the optimal bridge matching objective coincides with the recently discovered energy-based modeling (EBM) objectives for learning EOT/SB. Inspired by this observation, we develop a light solver (which we call LightSB-M) that implements optimal matching in practice using the Gaussian mixture parameterization of the adjusted Schrödinger potential. We experimentally showcase the performance of our solver in a range of practical tasks. The code for our solver can be found at https://github.com/SKholkin/LightSB-Matching.

*Equal contribution. ¹Skolkovo Institute of Science and Technology, ²Artificial Intelligence Research Institute. Correspondence to: Nikita Gushchin, Sergei Kholkin. Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

[Figure 1: Unpaired adult→child translation with our LightSB-M solver applied in the latent space of ALAE (Pidhorskyi et al., 2020) for 1024x1024 FFHQ images (Karras et al., 2019). Our LightSB-M solver converges on 4 CPU cores in several minutes.]

## 1. Introduction

Diffusion models are a powerful type of generative model that show impressive quality of image generation (Ho et al., 2020; Rombach et al., 2022). However, several directions for improvement remain on which the research community is actively working: speeding up generation (Wang et al., 2022; Song et al., 2023), application to image-to-image transfer (Liu et al., 2023a), extension to unpaired image transfer (Meng et al., 2021) and domain adaptation (Vargas et al., 2021), including biological tasks with single-cell data. A promising way to advance these directions is the development of new theoretical frameworks for learning flows and diffusions. Recently proposed techniques such as flow matching (Lipman et al., 2022) and bridge matching (Shi et al., 2023) for flow- and diffusion-based models show promising potential for further extending and improving generative and translation models.
Furthermore, by exploiting the theoretical links between flow/diffusion models and the Optimal Transport (Villani, 2008, OT) and Schrödinger Bridge (Léonard, 2013, SB) problems, several new methods have been proposed to speed up inference (Liu et al., 2023b), to improve the quality of image generation (Liu et al., 2023a), and to solve unpaired image and domain translation (De Bortoli et al., 2021; Shi et al., 2023). Recent approaches (Tong et al., 2023; Shi et al., 2023; Liu et al., 2022) to OT and SB based on flow and bridge matching either use iterative bridge matching procedures or employ heuristic approximations (e.g., the minibatch OT) to recover the SB through its relation to the Entropic OT problem. Unfortunately, iterative methods imply solving a sequence of time-consuming optimization problems and suffer from error accumulation. In turn, minibatch OT approximations can lead to biased solutions.

**Contributions.** We show that the above-mentioned issues can be eliminated by proposing a novel bridge matching-based approach that solves the SB in one iteration.

1. We propose a new bridge matching-based approach to solve the SB problem. Our approach exploits a novel *optimal projection* for stochastic processes that projects directly onto the set of SBs (§3.1).
2. Based on the new theoretical results, we develop a new fast solver for the SB problem. We use the light parameterization of SBs (Korotin et al., 2024) and our new theory on optimal projections to solve the SB problem in one bridge matching iteration (§3.2).
3. We perform extensive comparisons of this new solver on many setups where SB solvers are widely used, including the SB benchmark (§5.2), single-cell data (§5.3) and unpaired image translation (§5.4).

**Notations.** The notations of our paper mostly follow those of the LightSB authors (Korotin et al., 2024). We work in $\mathbb{R}^D$, the $D$-dimensional Euclidean space equipped with the Euclidean norm $\|\cdot\|$. We use $\mathcal{P}(\mathbb{R}^D)$ to denote the absolutely continuous Borel probability distributions whose variance and differential entropy are finite. To denote the density of $p \in \mathcal{P}(\mathbb{R}^D)$ at a point $x \in \mathbb{R}^D$, we write $p(x)$. We use $\mathcal{N}(x|\mu, \Sigma)$ for the density at a point $x \in \mathbb{R}^D$ of the normal distribution with mean $\mu \in \mathbb{R}^D$ and covariance $0 \prec \Sigma \in \mathbb{R}^{D\times D}$. We write $\mathrm{KL}(\cdot\|\cdot)$ for the Kullback-Leibler divergence between two distributions and $H(\cdot)$ for the differential entropy of a distribution. We use $\Omega$ to denote the space of trajectories, i.e., continuous $\mathbb{R}^D$-valued functions of $t \in [0,1]$. We write $\mathcal{P}(\Omega)$ for the probability distributions on trajectories $\Omega$ whose marginals at $t=0$ and $t=1$ belong to $\mathcal{P}(\mathbb{R}^D)$; this is the set of stochastic processes. We use $dW_t$ to denote the differential of the standard Wiener process $W \in \mathcal{P}(\Omega)$. For a process $T \in \mathcal{P}(\Omega)$, we denote its joint distribution at $t=0,1$ by $\pi^T \in \mathcal{P}(\mathbb{R}^D\times\mathbb{R}^D)$. In turn, we use $T_{|x_0,x_1}$ to denote the distribution of $T$ for $t\in(0,1)$ conditioned on $T$'s values $x_0, x_1$ at $t=0,1$.

## 2. Preliminaries

We start by recalling the main concepts of the Schrödinger Bridge problem (§2.1). Next, we discuss the SB solvers which are the most relevant to our study (§2.2, §2.3).

### 2.1. Background on Schrödinger Bridges

To begin with, we recall the SB problem with the Wiener prior and its equivalent Entropic Optimal Transport problem with the quadratic cost. We start from the latter as it is easier to introduce and interpret. For a detailed discussion of both problems, we refer to (Léonard, 2013; Chen et al., 2016).
Next, we describe the computational setup for learning SBs which we consider in the paper.

**Entropic Optimal Transport (EOT) with the quadratic cost.** Consider distributions $p_0 \in \mathcal{P}(\mathbb{R}^D)$, $p_1 \in \mathcal{P}(\mathbb{R}^D)$. For $\epsilon > 0$, the EOT problem with the quadratic cost is to find the minimizer of

$$\min_{\pi \in \Pi(p_0,p_1)} \int_{\mathbb{R}^D\times\mathbb{R}^D} \frac{\|x_0 - x_1\|^2}{2}\,\pi(x_0,x_1)\,dx_0\,dx_1 - \epsilon H(\pi), \tag{1}$$

where $\Pi(p_0,p_1)$ is the set of transport plans, i.e., probability distributions on $\mathbb{R}^D\times\mathbb{R}^D$ whose marginals are $p_0$ and $p_1$, respectively. The minimizer $\pi^*$ of (1) exists, is unique, and is absolutely continuous; it is called the EOT plan.

**Schrödinger Bridge with the Wiener prior.** Consider the Wiener process $W^\epsilon \in \mathcal{P}(\Omega)$ with volatility $\epsilon > 0$ which starts at $p_0$ at $t=0$. Its differential satisfies the stochastic differential equation (SDE) $dW^\epsilon_t = \sqrt{\epsilon}\,dW_t$. The SB problem with the Wiener prior $W^\epsilon$ between $p_0$, $p_1$ is

$$\min_{T \in \mathcal{F}(p_0,p_1)} \mathrm{KL}(T \,\|\, W^\epsilon), \tag{2}$$

where $\mathcal{F}(p_0,p_1) \subset \mathcal{P}(\Omega)$ is the subset of stochastic processes which start at distribution $p_0$ (at $t=0$) and end at $p_1$ (at $t=1$). There exists a unique minimizer $T^*$. Furthermore, it is a diffusion process described by the SDE $dx_t = g^*(x_t,t)\,dt + \sqrt{\epsilon}\,dW_t$ (Léonard, 2013, Prop. 2.3). The optimal process $T^*$ is called the Schrödinger Bridge, and $g^*: \mathbb{R}^D\times[0,1]\to\mathbb{R}^D$ is the optimal drift.

**Relation of EOT and SB.** EOT (1) and SB (2) are closely related: the joint marginal distribution $\pi^{T^*}$ of $T^*$ at times $0,1$ coincides with the EOT plan $\pi^*$ solving (1), i.e., $\pi^{T^*} = \pi^*$. Hence, the solution $\pi^*$ of the EOT problem (1) can be recovered from $T^*$. Thus, SB can be viewed as a dynamic extension of EOT: a user is interested not only in the optimal mass transport plan $\pi^*$, but in the entire time-dependent mass transport process $T^*$. Given just the optimal plan $\pi^*$, one may also complete it to get the full process $T^*$. It suffices to consider a process whose joint marginal distribution at $t=0,1$ is $\pi^*$ and whose trajectory distribution $T^*_{|x_0,x_1}$ at $t\in(0,1)$ conditioned on the ends $(x_0,x_1)$ coincides with the Wiener prior's, i.e., $T^*_{|x_0,x_1} = W^\epsilon_{|x_0,x_1}$. The latter is known as the Brownian Bridge (Pinsky & Karlin, 2011, Sec. 8.3.3). Thus, we obtain $T^* = \int_{\mathbb{R}^D\times\mathbb{R}^D} W^\epsilon_{|x_0,x_1}\,d\pi^*(x_0,x_1)$. This strategy does not directly give the optimal drift $g^*$, but it can be recovered by other means, e.g., with bridge matching (§2.3).

**Characterization of EOT and SB solutions.** It is known that the EOT plan $\pi^*$ can be represented through the input density $p_0$ and a function $v^*: \mathbb{R}^D\to\mathbb{R}_+$:

$$\pi^*(x_0,x_1) = \underbrace{p_0(x_0)}_{=\pi^*(x_0)}\,\underbrace{\exp(\langle x_0,x_1\rangle/\epsilon)\,v^*(x_1)/c_{v^*}(x_0)}_{=\pi^*(x_1|x_0)}, \tag{3}$$

where $c_{v^*}(x_0) := \int_{\mathbb{R}^D}\exp(\langle x_0,x_1\rangle/\epsilon)\,v^*(x_1)\,dx_1$. Following the notation of (Korotin et al., 2024), we call $v^*$ the adjusted Schrödinger potential. The optimal drift of $T^*$ can also be expressed using $v^*$:

$$g^*(x_t,t) = \epsilon\nabla_{x_t}\log\int_{\mathbb{R}^D}\mathcal{N}\big(x'|x_t,(1-t)\epsilon I_D\big)\,\exp\!\Big(\frac{\|x'\|^2}{2\epsilon}\Big)\,v^*(x')\,dx', \tag{4}$$

see (Korotin et al., 2024, §2, 3) for a deeper discussion. Note that $v^*$ is defined up to a multiplicative constant.

**Computational SB/EOT setup.** In practice, distributions $p_0$ and $p_1$ are usually not available explicitly but only through empirical samples $\{x^1_0,\dots,x^N_0\}\sim p_0$ and $\{x^1_1,\dots,x^M_1\}\sim p_1$. The typical task is to obtain a good approximation $\hat g \approx g^*$ of the drift of the SB process $T^*$, or to explicitly/implicitly approximate the EOT plan's conditional distributions $\hat\pi(\cdot|x_0)\approx\pi^*(\cdot|x_0)$ for all $x_0\in\mathbb{R}^D$. This is needed for out-of-sample estimation, i.e., for new (test) points $x^{\mathrm{new}}_0\sim p_0$, sampling $x_1\sim\pi^*(\cdot|x^{\mathrm{new}}_0)$ or simulating $T^*$'s trajectories starting at the point $x^{\mathrm{new}}_0$ at time $t=0$.
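The completion $T^* = \int W^\epsilon_{|x_0,x_1}\,d\pi^*(x_0,x_1)$ above is easy to simulate: given endpoint pairs from (an approximation of) the plan, intermediate points come from the Brownian bridge, whose time marginals are Gaussian. A minimal PyTorch sketch (the function name is ours):

```python
import torch

def sample_bridge(x0, x1, t, eps):
    """Draw x_t from the Brownian bridge W^eps conditioned on the endpoints:
    x_t ~ N(t * x1 + (1 - t) * x0, eps * t * (1 - t) * I)."""
    mean = t * x1 + (1.0 - t) * x0
    return mean + (eps * t * (1.0 - t)) ** 0.5 * torch.randn_like(mean)
```

This single primitive underlies both the bridge matching objectives of §2.3 and our solver in §3.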
This setup widely appears in generative modeling (De Bortoli et al., 2021; Gushchin et al., 2023a) and in the analysis of biological single-cell data (Vargas et al., 2021; Koshizuka & Sato, 2022; Tong et al., 2023). The setup above is usually called continuous EOT or SB and should not be confused with the discrete setup widely studied in the discrete OT literature (Peyré et al., 2019; Cuturi, 2013). There, one is mostly interested in computing the EOT plan directly between the empirical samples (possibly weighted), i.e., in matching them with each other; there is usually no need for out-of-sample estimation.

### 2.2. Energy-based EOT/SB Solvers

Given a good approximation of the optimal potential $v^*$, one may approximate the conditional EOT plans and the optimal drift via (3) and (4), respectively (using $v^*$'s approximation). Inspired by this idea, the papers (Korotin et al., 2024; Mokrov et al., 2024) provide related approaches to learn this potential. They show that $v^*$ can be learned by solving $\min_v \mathcal{L}_0(v)$, where

$$\mathcal{L}_0(v) := \int_{\mathbb{R}^D}\log c_v(x_0)\,p_0(x_0)\,dx_0 - \int_{\mathbb{R}^D}\log v(x_1)\,p_1(x_1)\,dx_1 \tag{5}$$

and $c_v(x) := \int_{\mathbb{R}^D}\exp(\langle x,y\rangle/\epsilon)\,v(y)\,dy$. This objective magically turns out to be equal, up to an additive $v$-independent constant, to $\mathrm{KL}(\pi^*\|\pi_v) = \mathrm{KL}(T^*\|S_v)$, where

$$\pi_v(x_0,x_1) := p_0(x_0)\,\underbrace{\frac{\exp(\langle x_0,x_1\rangle/\epsilon)\,v(x_1)}{c_v(x_0)}}_{=\pi_v(x_1|x_0)} \tag{6}$$

is an approximation of the optimal plan constructed via $v$ instead of $v^*$ in (3). In turn, $S_v\in\mathcal{P}(\Omega)$ is the process whose joint marginal at $t=0,1$ is $\pi_v$ and $S_{v|x_0,x_1} = W^\epsilon_{|x_0,x_1}$. Its drift $g_v$ can be recovered by using (4) with $v$ instead of $v^*$. Here we use the letter $S$ instead of $T$ to denote the process, and this is for a reason: under mild assumptions on $v$, the process $S_v$ is the Schrödinger Bridge between $p_0$ and $p_v(x_1):=\int_{\mathbb{R}^D}\pi_v(x_0,x_1)\,dx_0$, i.e., its marginal at $t=1$. This follows from the EOT benchmark constructor theorem (Gushchin et al., 2023b, Theorem 3.2). Hence, minimization of (5) can be viewed as optimization over processes $S_v$, which are SBs determined by their potential $v$.

Unfortunately, optimizing (5) is tricky. While the potential $v$ can be directly parameterized, e.g., with a neural network $v_\theta$, the key challenge is to compute $c_v$, which is a non-trivial integral. Note that due to (3), one has $\pi^*(x_1|x_0{=}0)\propto v^*(x_1)$, i.e., $v^*$ is an unnormalized density of some distribution. This fact is exploited in (Mokrov et al., 2024; Korotin et al., 2024) to establish ways to optimize (5).

**Energy-guided EOT solver (EgNOT).** In (Mokrov et al., 2024), the authors observe that, informally, objective (5) aims to find an unnormalized density $v$ by optimizing a KL divergence. Therefore, it resembles the objectives of Energy-Based Models (LeCun et al., 2006, EBM). Inspired by this discovery, the authors show how standard EBM approaches can be modified to optimize (5) and later sample from the learned plan $\pi_v$. The limitation of the approach is the necessity to use time-consuming MCMC techniques.

**Light Schrödinger Bridge solver (LightSB).** In (Korotin et al., 2024), the authors use the fact from (Gushchin et al., 2023b) that the Gaussian mixture parameterization

$$v_\theta(x_1) = \sum_{k=1}^K \alpha_k\,\mathcal{N}(x_1|\mu_k,\epsilon\Sigma_k) \tag{7}$$

for $v$ provides a closed-form analytic expression for $c_\theta$. This removes the necessity to use time-consuming MCMC approaches at both training and inference. Furthermore, the Gaussian parameterization provides a closed-form expression for the drift of $S_v$ and allows lightspeed sampling from the conditional distributions $\pi_v(x_1|x_0)$, see (Korotin et al., 2024, Propositions 3.2, 3.3).
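To see why the parameterization (7) makes $c_\theta$ tractable, note that for a single Gaussian component the integral reduces to a Gaussian moment-generating function: $\int \exp(\langle x,y\rangle/\epsilon)\,\mathcal{N}(y|\mu,\epsilon\Sigma)\,dy = \exp(\langle x,\mu\rangle/\epsilon + x^\top\Sigma x/(2\epsilon))$. A minimal sketch based on our own derivation of this identity (diagonal $\Sigma_k$ for brevity; all names are ours):

```python
import torch

def log_c_theta(x, alpha, mu, sigma2, eps):
    """log c_v(x) for the mixture v(y) = sum_k alpha_k N(y | mu_k, eps * Sigma_k)
    with diagonal Sigma_k. By the Gaussian moment-generating function,
    int exp(<x,y>/eps) N(y | mu, eps*Sigma) dy = exp(<x,mu>/eps + x^T Sigma x / (2 eps)).
    Shapes: x (D,), alpha (K,), mu and sigma2 (K, D)."""
    lin = (x * mu).sum(-1) / eps                     # <x, mu_k> / eps
    quad = (sigma2 * x ** 2).sum(-1) / (2.0 * eps)   # x^T Sigma_k x / (2 eps)
    return torch.logsumexp(torch.log(alpha) + lin + quad, dim=0)
```

The log-sum-exp keeps the mixture sum numerically stable; no MCMC or quadrature is needed.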
### 2.3. Bridge Matching Procedures for EOT/SB

**Recovering the SB process from the EOT plan (OT-CFM).** Since the SB solution is given by the EOT plan $\pi^*$ and the Brownian Bridges $W^\epsilon_{|x_0,x_1}$, i.e., $T^* = \int_{\mathbb{R}^D\times\mathbb{R}^D} W^\epsilon_{|x_0,x_1}\,d\pi^*(x_0,x_1)$, the solution $\pi^*$ of the EOT problem already provides a way to sample from the marginal distributions $p_{T^*}(x_t,t)$ of $T^*$ at each time $t\in[0,1]$. The authors of (Tong et al., 2023) propose to use this property to recover the drift $g^*(x_t,t)$ of the process $T^*$ using flow (Lipman et al., 2022) and score matching techniques. They use flow matching to fit the drift $\bar g(x_t,t)$ of the probability flow ODE for the marginals $p_{T^*}(x_t,t)$ of the process $T^*$, i.e., the $\bar g(x_t,t)$ for which the continuity equation $\frac{\partial p_{T^*}(x_t,t)}{\partial t} = -\nabla\cdot\big(p_{T^*}(x_t,t)\,\bar g(x_t,t)\big)$ holds. In turn, score matching is used to fit the score functions $\nabla\log p_{T^*}(x_t,t)$ of the marginal distributions. Then they recover the Schrödinger Bridge drift via the relationship between the probability flow ODE and the SDE representation of stochastic processes: $\bar g(x_t,t) + \frac{\epsilon}{2}\nabla\log p_{T^*}(x_t,t) = g^*(x_t,t)$. Unfortunately, the solution $\pi^*$ of the EOT problem for two arbitrary distributions $p_0$ and $p_1$ is unknown. The authors use the discrete (minibatch) OT plan between the empirical distributions $\hat p_0 := \frac1N\sum_{n=1}^N\delta_{x_n}$ and $\hat p_1 := \frac1M\sum_{m=1}^M\delta_{y_m}$ constructed from the available samples instead. However, the empirical EOT plan $\hat\pi$ may be highly biased from the true $\pi^*$, which potentially leads to undesirable errors in approximating the SB.

**Learning the SB process without the EOT solution (DSBM).** Another matching method has been proposed by (Shi et al., 2023) to get the SB without knowing the EOT plan $\pi^*$. To begin with, for any $\pi\in\Pi(p_0,p_1)$, define $T_\pi$ (called the reciprocal process of $\pi$) as the mixture of Brownian Bridges with weights given by $\pi$, i.e., $T_\pi = \int_{\mathbb{R}^D\times\mathbb{R}^D}W^\epsilon_{|x_0,x_1}\,d\pi(x_0,x_1)$. To get $\pi^*$ and $T^*$, the authors alternate between two projections of stochastic processes: the reciprocal and the Markovian. For a process $T\in\mathcal{P}(\Omega)$, its reciprocal projection is the mixture of Brownian bridges given by the plan $\pi^T$:

$$\mathrm{proj}_{\mathcal{R}}(T):=\int_{\mathbb{R}^D\times\mathbb{R}^D}W^\epsilon_{|x_0,x_1}\,d\pi^T(x_0,x_1). \tag{8}$$

This is a reciprocal process with the same joint marginal $\pi^T$ at times $t=0,1$ as $T$ (one may write $\mathrm{proj}_{\mathcal{R}}(T)=T_{\pi^T}$). Consider any reciprocal process $T_\pi$. Its Markovian projection $\mathrm{proj}_{\mathcal{M}}(T_\pi)$ is the diffusion process defined by an SDE $dx_t = g(x_t,t)\,dt + \sqrt{\epsilon}\,dW_t$ that preserves all time marginals of $T_\pi$. Its drift function is analytically given by

$$g(x_t,t) = \int_{\mathbb{R}^D}\frac{x_1-x_t}{1-t}\,dp_{T_\pi}(x_1|x_t), \tag{9}$$

where $p_{T_\pi}$ denotes the distribution of $T_\pi$. Drift (9) is the solution to the following optimization problem:

$$\min_g\int_0^1\int_{\mathbb{R}^D\times\mathbb{R}^D}\Big\|g(x_t,t)-\frac{x_1-x_t}{1-t}\Big\|^2 dp_{T_\pi}(x_t,x_1)\,dt \tag{10}$$

and can be learned by sampling $(x_0,x_1)\sim\pi$, $x_t\sim W^\epsilon_{|x_0,x_1}$ and parametrizing $g$ by a neural network. This procedure is the so-called bridge matching procedure. The authors prove (Shi et al., 2023, Theorem 8) that the sequence $(T^l)_{l\in\mathbb{N}}$ constructed by alternating the projections

$$T^{2l+1} = \mathrm{proj}_{\mathcal{M}}(T^{2l}),\qquad T^{2l+2} = \mathrm{proj}_{\mathcal{R}}(T^{2l+1}), \tag{11}$$

with $T^0 = T_\pi$ and any $\pi\in\Pi(p_0,p_1)$ converges to the SB solution $T^*$ between $p_0$ and $p_1$. When $\epsilon\to 0$, the Markovian projection transforms into the well-known flow matching procedure (Lipman et al., 2022), and the whole iterative procedure becomes the Rectified Flow (Liu et al., 2022).

Markovian projection (9) is the bottleneck of the iterative procedure: in practice, the method uses a neural network to learn the drift of the projection, which introduces approximation errors at each iteration. The errors lead to differences between the process $T^l$'s marginal distribution at time $t=1$ and the actual $p_1$. These errors accumulate after each iteration and affect convergence, motivating the search for a bridge matching procedure that converges in a single iteration. A sketch of the inner bridge-matching regression step is given below.
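For concreteness, the inner bridge-matching step on objective (10) is an ordinary regression. A minimal sketch with a generic neural drift (the architecture and names are ours, not the DSBM authors' implementation):

```python
import torch
import torch.nn as nn

g = nn.Sequential(nn.Linear(3, 128), nn.SiLU(), nn.Linear(128, 2))  # drift net, D = 2
opt = torch.optim.Adam(g.parameters(), lr=1e-4)

def bridge_matching_step(x0, x1, eps=0.1):
    """One stochastic step on objective (10): regress g(x_t, t) onto
    (x_1 - x_t) / (1 - t), with x_t sampled from the Brownian bridge."""
    t = torch.rand(x0.shape[0], 1) * 0.999                 # avoid division by 1 - t = 0
    xt = t * x1 + (1 - t) * x0 + (eps * t * (1 - t)).sqrt() * torch.randn_like(x0)
    pred = g(torch.cat([xt, t], dim=1))
    loss = ((pred - (x1 - xt) / (1 - t)) ** 2).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

In DSBM this regression must be re-run after every reciprocal projection; the point of our method (§3) is that a single such regression, with the right drift parameterization, already suffices.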
## 3. Light and Optimal SB Matching Solver

In §3.1, we present the main theoretical development of our paper: the optimal Schrödinger bridge matching method. Next, in §3.2, we propose our novel LightSB-M solver, which implements the method in practice. In §3.3, we discuss its connections with related EOT/SB solvers. In Appendix A we provide proofs of all theorems.

### 3.1. Theory: Optimal Schrödinger Bridge Matching

Our algorithm is based on the properties of KL projections of stochastic processes onto the set $\mathcal{S}$ of Schrödinger Bridges:

$$\mathcal{S}:=\Big\{S\in\mathcal{P}(\Omega)\ \text{such that}\ \exists\, p^S_0, p^S_1\in\mathcal{P}(\mathbb{R}^D)\ \text{for which}\ S=\arg\min_{T\in\mathcal{F}(p^S_0,p^S_1)}\mathrm{KL}(T\,\|\,W^\epsilon)\Big\}. \tag{12}$$

In addition to the reciprocal and Markovian projections, we define a new optimal projection (OP). Consider any plan $\pi\in\Pi(p_0,p_1)$, e.g., independent, minibatch, optimal, etc. Given a reciprocal process $T_\pi$, its optimal projection is the process

$$\mathrm{proj}_{\mathcal{S}}(T_\pi):=\arg\min_{S\in\mathcal{S}}\mathrm{KL}(T_\pi\,\|\,S). \tag{13}$$

We prove that the optimal projection allows one to obtain the Schrödinger Bridge solution in just one projection step.

**Theorem 3.1 (OP of a reciprocal process).** The optimal projection of a reciprocal process $T_\pi$ given by a joint distribution $\pi\in\Pi(p_0,p_1)$ yields the Schrödinger Bridge $T^*$ between the distributions $p_0$ and $p_1$, i.e.,

$$\mathrm{proj}_{\mathcal{S}}(T_\pi)=\arg\min_{S\in\mathcal{S}}\mathrm{KL}(T_\pi\,\|\,S)=T^*. \tag{14}$$

To implement this in practice, we need to (a) have a tractable estimator of $\mathrm{KL}(T_\pi\|S)$ and (b) be able to optimize over $\mathcal{S}$. We denote by $\mathcal{S}(p_0)$ the subset of $\mathcal{S}$ of processes which start at $p_0$ at $t=0$. Since $T^*\in\mathcal{S}(p_0)$, it suffices to optimize over $\mathcal{S}(p_0)$ in (14). As noted in the background §2.2, processes $S\in\mathcal{S}(p_0)$ are determined by their adjusted Schrödinger potential $v$; we will write $S_v$ instead of $S$ for convenience.

**Theorem 3.2 (Tractable objective for the OP).** For the SB $S_v\in\mathcal{S}(p_0)$ and a reciprocal process $T_\pi$ with $\pi\in\Pi(p_0,p_1)$, the optimal projection objective (13) satisfies

$$\mathrm{KL}(T_\pi\,\|\,S_v)=C(\pi)+\frac{1}{2\epsilon}\int_0^1\int_{\mathbb{R}^D\times\mathbb{R}^D}\Big\|g_v(x_t,t)-\frac{x_1-x_t}{1-t}\Big\|^2 dp_{T_\pi}(x_t,x_1)\,dt, \tag{15}$$

where $g_v$ is the drift of $S_v$ given by (4) (with $v$ instead of $v^*$) and the constant $C(\pi)$ does not depend on $S_v$.

This result provides an opportunity to optimize $S_v$ by fitting its drift $g_v$. Indeed, we can estimate $\mathrm{KL}(T_\pi\|S_v)$ up to a constant by sampling from $T_\pi$: it suffices to sample a pair $(x_0,x_1)\sim\pi$ and then to sample $x_t$ from the Brownian bridge $W^\epsilon_{|x_0,x_1}$. The natural remaining question is how to parameterize the drifts of the SB processes $S_v\in\mathcal{S}$. We explain this in the section below.

### 3.2. Practice: LightSB-M Optimization Procedure

To solve the Schrödinger Bridge between two distributions $p_0$ and $p_1$ using the optimal projection (13) and its tractable objective (15), we use any plan $\pi\in\Pi(p_0,p_1)$ accessible by samples. It can be the independent plan, i.e., just independent samples from $p_0, p_1$; any minibatch OT plan, i.e., one obtained by solving discrete OT on minibatches from $p_0$ and $p_1$; etc. To optimize over Schrödinger Bridges $S_v\in\mathcal{S}$, we use the parametrization of $v$ as a Gaussian mixture (7) from LightSB (§2.2), which for every $v_\theta$ provides $g_\theta:=g_{v_\theta}$ (4) in closed form (Korotin et al., 2024, Proposition 3.3):

$$g_\theta(x,t)=\epsilon\nabla_x\log\Big(\mathcal{N}\big(x|0,\epsilon(1{-}t)I_D\big)\sum_{k=1}^K\alpha_k\,\mathcal{N}(\mu_k|0,\epsilon\Sigma_k)\,\mathcal{N}\big(h_k(x,t)|0,A^t_k\big)\Big) \tag{16}$$

with $A^t_k:=\frac{t}{\epsilon(1-t)}I_D+\frac{1}{\epsilon}\Sigma_k^{-1}$ and $h_k(x,t):=\frac{x}{\epsilon(1-t)}+\frac{1}{\epsilon}\Sigma_k^{-1}\mu_k$.
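To make the closed-form drift concrete, below is a small PyTorch sketch based on our own re-derivation of the Gaussian convolution integral in (4) for the mixture potential (7) with diagonal $\Sigma_k$; all variable names are ours, and this is a sketch rather than the authors' implementation. A useful sanity check of the derivation: for $K=1$, $\Sigma_1=I_D$ it reduces to the constant drift $g_\theta(x,t)=\mu_1$. This function is reused in the training-loop sketch at the end of §3.3.

```python
import torch

# Hypothetical learnable mixture parameters (K components, diagonal Sigma_k).
D, K, eps = 2, 5, 0.1
raw_alpha = torch.zeros(K, requires_grad=True)        # softmax -> weights alpha_k
mu = torch.randn(K, D, requires_grad=True)            # means mu_k
log_sigma2 = torch.zeros(K, D, requires_grad=True)    # log-diagonals of Sigma_k

def drift(x, t):
    """g_theta(x,t) = eps * grad_x log int N(x'|x,(1-t)eps I) e^{|x'|^2/(2 eps)} v(x') dx',
    evaluated in closed form for the mixture (7). Shapes: x (B, D), t (B, 1)."""
    x = x.detach().requires_grad_(True)
    alpha, sigma2 = torch.softmax(raw_alpha, 0), log_sigma2.exp()
    s = eps * (1.0 - t)                                       # (B, 1)
    A = (t / s)[:, :, None] + 1.0 / (eps * sigma2)[None]      # diag of A_k^t, (B, K, D)
    h = (x / s)[:, None, :] + (mu / (eps * sigma2))[None]     # h_k(x, t), (B, K, D)
    # log of each component's Gaussian integral; the k-dependent normalizers
    # matter because they sit inside the sum over k
    log_w = (torch.log(alpha) - 0.5 * torch.log(sigma2).sum(-1)
             - 0.5 * (mu ** 2 / (eps * sigma2)).sum(-1))[None] \
            - 0.5 * torch.log(A).sum(-1) + 0.5 * (h ** 2 / A).sum(-1)      # (B, K)
    log_int = -0.5 * (x ** 2).sum(-1) / s.squeeze(-1) + torch.logsumexp(log_w, -1)
    # create_graph=True lets gradients flow back to the mixture parameters later
    return torch.autograd.grad(eps * log_int.sum(), x, create_graph=True)[0]
```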
Using this parametrization and any $\pi\in\Pi(p_0,p_1)$, we optimize objective (15) with stochastic gradient descent.

**Algorithm 1: LightSB Matching (LightSB-M)**
- **Input:** plan $\pi\in\Pi(p_0,p_1)$ accessible by samples; adjusted Schrödinger potential $v_\theta$ parametrized by a Gaussian mixture ($\theta=\{\alpha_k,\mu_k,\Sigma_k\}_{k=1}^K$).
- **Output:** learned drift $g_\theta$ approximating the optimal $g^*$.
- **repeat**
  - Sample a batch of pairs $\{x^n_0,x^n_1\}_{n=1}^N\sim\pi$;
  - Sample a batch $\{t_n\}_{n=1}^N\sim U[0,1]$;
  - Sample a batch $\{x^n_{t_n}\}_{n=1}^N$ with $x^n_{t_n}\sim W^\epsilon_{|x^n_0,x^n_1}$;
  - $\mathcal{L}_\theta\leftarrow\frac1N\sum_{n=1}^N\big\|g_\theta(x^n_{t_n},t_n)-\frac{1}{1-t_n}(x^n_1-x^n_{t_n})\big\|^2$;
  - Update $\theta$ using $\frac{\partial\mathcal{L}_\theta}{\partial\theta}$;
- **until** converged

The training procedure is described in Algorithm 1 (a PyTorch-style sketch is given at the end of §3.3). We recall that the Brownian bridge $W^\epsilon_{|x_0,x_1}$ has time marginals $p_{BB}(x_t|x_0,x_1):=\mathcal{N}\big(x_t\,|\,t x_1+(1-t)x_0,\ \epsilon t(1-t)I_D\big)$, i.e., a normal distribution with a scalar covariance matrix.

After learning the drift $g_v(x,t)$ of the Schrödinger Bridge SDE $dx_t=g_v(x_t,t)\,dt+\sqrt{\epsilon}\,dW_t$, one can use any SDE solver to infer trajectories, e.g., the simplest and most popular Euler-Maruyama scheme (Kloeden et al., 1992, §9.2). However, SDE solvers introduce errors due to discrete approximations. Using the LightSB parameterization of Schrödinger bridges from (Korotin et al., 2024), we can sample trajectories without having to solve the learned SDE numerically. To do so, we first sample from the learned plan $\pi_v(x_1|x_0)$ given by (6) and then sample the trajectory of the Brownian bridge $W^\epsilon_{|x_0,x_1}$ using its self-similarity property (Korotin et al., 2024, §3.2). Self-similarity of the Brownian bridge means that, given a trajectory $x_0,x_{t_1},\dots,x_{t_L},x_1$, we can sample a new point at a time $t_l<t'<t_{l+1}$ via

$$x_{t'}\sim\mathcal{N}\Big(x_{t'}\,\Big|\,x_{t_l}+\frac{t'-t_l}{t_{l+1}-t_l}\big(x_{t_{l+1}}-x_{t_l}\big),\ \epsilon\,\frac{(t'-t_l)(t_{l+1}-t')}{t_{l+1}-t_l}\,I_D\Big).$$

### 3.3. Connections to the Most Related Prior Works

**DSBM (Shi et al., 2023).** The Schrödinger Bridge $T^*$ between $p_0$ and $p_1$ is the only process that is simultaneously Markovian and reciprocal (Léonard, 2013, Proposition 2.3). This fact lies at the core of DSBM's iterative approach of alternating Markovian and reciprocal projections. In turn, our optimal projection (13) provides the SB in one step, projecting a process onto the set of processes that are both reciprocal and Markovian, i.e., Schrödinger Bridges.

**OT-CFM (Tong et al., 2023).** Our optimal projection (13) of the reciprocal process $T_\pi$ with any $\pi\in\Pi(p_0,p_1)$ is the same Schrödinger Bridge between $p_0$ and $p_1$; thus, the optimal projection does not depend on the choice of the plan $\pi$. In turn, OT-CFM provides theoretical guarantees of finding the Schrödinger Bridge only if one chooses the EOT plan $\pi^*$ as the plan $\pi$, which is unknown for arbitrary distributions $p_0, p_1$.

**EgNOT/LightSB (Mokrov et al., 2024; Korotin et al., 2024).** Our main objective (13) resembles objective (5) of EgNOT and LightSB, as the latter equals $\mathrm{KL}(T^*\|S_v)$ up to a constant. At the same time, our objective allows using any reciprocal process $T_\pi$ instead of $T^*=T_{\pi^*}$. Interestingly, our tractable bridge matching objective turns out to be closely related to the EgNOT/LightSB objective (5).

**Theorem 3.3 (Equivalence to the EgNOT/LightSB objective).** The OP objective (15) for a reciprocal process $T_\pi$ with $\pi\in\Pi(p_0,p_1)$ is equivalent to the LightSB objective $\mathcal{L}_0$ (5):

$$\frac{1}{2\epsilon}\int_0^1\int_{\mathbb{R}^D\times\mathbb{R}^D}\Big\|g_v(x_t,t)-\frac{x_1-x_t}{1-t}\Big\|^2 dp_{T_\pi}(x_t,x_1)\,dt=\widetilde C(\pi)+\mathcal{L}_0(v).$$

One interesting conclusion from this equivalence is that our LightSB-M solver automatically inherits the theoretical generalization and approximation properties of the LightSB solver; see (Korotin et al., 2024, §3) for details.
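Putting §3.1-3.2 together, here is the condensed PyTorch-style sketch of Algorithm 1 announced above; it reuses the `drift` function from the sketch after (16), and `sample_plan` is a hypothetical sampler of pairs from $\pi$ (independent, or minibatch-OT re-paired as in Appendix B):

```python
import torch

opt = torch.optim.Adam([raw_alpha, mu, log_sigma2], lr=1e-3)

for step in range(10_000):
    x0, x1 = sample_plan(batch_size=128)                  # (x0, x1) ~ pi
    t = torch.rand(x0.shape[0], 1) * 0.999                # avoid division by 1 - t = 0
    # Brownian bridge sample: x_t ~ N(t x1 + (1 - t) x0, eps t (1 - t) I)
    xt = t * x1 + (1 - t) * x0 + (eps * t * (1 - t)).sqrt() * torch.randn_like(x0)
    loss = ((drift(xt, t) - (x1 - xt) / (1 - t)) ** 2).sum(-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

Because `drift` is built with `create_graph=True`, `loss.backward()` differentiates through the inner gradient and updates the mixture parameters directly; there is no inner/outer iteration as in DSBM.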
## 4. Other Related Works

Here, we overview other existing works related to solving SB/EOT. Unlike the works described above, these are less relevant to our study; still, we want to highlight some aspects of other solvers related to ours.

### 4.1. Iterative Proportional Fitting (IPF) Solvers

There are several Schrödinger Bridge solvers (Vargas et al., 2021; De Bortoli et al., 2021; Chen et al., 2021a) for continuous probability distributions based on the Iterative Proportional Fitting (IPF) procedure (Fortet, 1940; Kullback, 1968; Rüschendorf, 1995). The IPF procedure is related to the Sinkhorn algorithm (Cuturi, 2013) and, as was recently shown in (Vargas & Nüsken, 2023), coincides with the expectation-maximization (EM) algorithm (Dempster et al., 1977). All three IPF-based SB solvers consist of iteratively reversing Markovian processes and differ only in the particular method used to fit the reversal of a process with a neural network. The first two methods (Vargas et al., 2021; De Bortoli et al., 2021) use similar mean-matching procedures, while the last (Chen et al., 2021a) utilizes a different approach which includes the estimation of a divergence. In (Shi et al., 2023), the authors show that, due to the iterative nature of one of these solvers (De Bortoli et al., 2021), it can diverge because of error accumulation at each iteration. Furthermore, the authors of (Vargas & Nüsken, 2023) show that these solvers tend to lose the information about the Wiener prior of the Schrödinger Bridge and converge to a Markovian process that does not solve the SB problem. In turn, our approach eliminates the need for iterative learning of a sequence of Markovian processes and is free from the possible issues of divergence or obtaining a biased solution.

### 4.2. EOT Solvers and EOT-based SB Solvers

Recall that the EOT and SB problems are closely related: SB solutions can be recovered from EOT solutions by using the Brownian Bridge $W^\epsilon_{|x_0,x_1}$ or by recovering the drift $g(x_t,t)$, e.g., as in (Tong et al., 2023). Due to this, we also give a quick overview of EOT solvers for continuous distributions. Several works (Genevay et al., 2016; Seguy et al., 2018; Daniels et al., 2021) consider solving the EOT problem by utilizing the classic dual EOT problem (Genevay et al., 2019). The classic dual EOT problem for continuous $p_0$ and $p_1$ is an unconstrained maximization problem over dual variables, also called potentials, which can be parameterized by neural networks and trained. After training, these potentials can be used to directly sample from the distribution $\pi^*(x_1|x_0)$ by using an additional score model for $\nabla_x\log p_1(x)$ (Daniels et al., 2021), or to train a neural network model to predict the conditional expectation $\mathbb{E}_{\pi^*(x_1|x_0)}x_1$, i.e., the barycentric projection. However, the main disadvantage of these methods is that, in practice, the dual EOT problem cannot be solved by neural networks for practically meaningful (small) coefficients $\epsilon$ due to numerical errors in calculating the dual EOT objective, since it includes exponential terms of the form $\mathbb{E}_{x_0\sim p_0,x_1\sim p_1}\exp\big(f(x_0,x_1)/\epsilon\big)$, which blow up as $\epsilon$ decreases.

[Figure 2: The process $S_\theta$ learned with LightSB-M (ours) in the Gaussian→Swiss roll example (§5.1). Panels: (a) $x\sim p_0$, $y\sim p_1$; (b) $\epsilon=0.01$; (c) $\epsilon=0.1$.]
There is also one SB solver based on the theory of the dual EOT problem (Gushchin et al., 2023a). This solver directly fits the drift $g^*$ of the Schrödinger Bridge by using a maximin reformulation of the dual EOT problem and its link to the SB problem. This allows it to overcome the numerical problems and solve SB for practically meaningful values of $\epsilon$. Our solver is also based on solving EOT and SB using the theory behind the dual EOT problem. Thanks to the parametrization of the adjusted Schrödinger potential as in (Korotin et al., 2024) instead of the EOT potentials as in (Seguy et al., 2018; Daniels et al., 2021), and to the novel optimization objective based on bridge matching, our method overcomes the numerical issues of the previously developed dual EOT-based methods without maximin optimization.

### 4.3. Other SB Solvers

The authors of (Kim et al., 2024) propose a different minimax SB solver by considering the self-similarity of the SB in the learning objectives and an additional consistency regularization. While showing good results, their approach requires a neural estimation of entropy, which involves solving an additional optimization problem at every minimization step. All previously considered solvers are designed to solve SB as the problem of finding the optimal translation between two distributions $p_0, p_1$ without any paired data, but there are also several SB solvers (Liu et al., 2023a; Somnath et al., 2023) for setups with paired training data, such as super-resolution. In fact, the concept of bridge matching was introduced in (Liu et al., 2023a), but for the paired setup. The authors work under the assumption that the available paired data is a good approximation of the EOT plan and propose using bridge matching to recover the SB from this data, which makes their method related to (Tong et al., 2023). As noted earlier, our solver provably recovers the SB using data provided by an arbitrary plan $\pi$ between $p_0$ and $p_1$.

## 5. Experimental Illustrations

To evaluate our new LightSB-M solver, we consider several setups from related works. The code for our solver is written in PyTorch and available at https://github.com/SKholkin/LightSB-Matching. For each experiment, we provide a separate self-explanatory Jupyter notebook which can be used to reproduce the results of our solver. We give the technical details in Appendix B.

### 5.1. Qualitative 2D Example

We start our evaluation with an illustrative 2D setup: we solve the SB between a Gaussian distribution $p_0$ and a Swiss roll $p_1$. We run our LightSB-M solver with minibatch (MB) discrete OT as the plan $\pi$ for different values of the coefficient $\epsilon$ and present the results in Figure 2. As expected, the amount of noise in the trajectories and the stochasticity of the learned map are proportional to the coefficient $\epsilon$. The technical details of this setup are given in Appendix B.1.

### 5.2. Quantitative Evaluation on the SB Benchmark

We use the SB mixtures benchmark proposed by (Gushchin et al., 2023b, §4) to experimentally verify that our approach based on the optimal projection is indeed able to solve the Schrödinger Bridge between $p_0$ and $p_1$ using any reciprocal process $T_\pi$, $\pi\in\Pi(p_0,p_1)$. The benchmark provides continuous probability distribution pairs $p_0, p_1$ for dimensions $D\in\{2,16,64,128\}$ with the known EOT plan $\pi^*(x_0,x_1)$ for the parameter $\epsilon\in\{0.1,1,10\}$. To evaluate the quality of the SB solution (EOT plan), we use the $\mathrm{cBW}^2_2\text{-UVP}$ metric as suggested by the authors (Gushchin et al., 2023b, §5).
Additionally, we study how well the solvers restore the target distribution $p_1$ in Appendix B.3. We provide the results of our LightSB-M solver with the independent (ID) plan and the minibatch discrete OT (MB) plan as $\pi$ in $T_\pi$ for the mixture benchmark pairs in Table 1. Since the benchmark provides the ground-truth EOT plan $\pi^*$ (GT), we also run our solver with it. Note that we have access to the GT EOT plan only thanks to the benchmark; in regular setups there is, of course, no access to it. As shown in Table 1, our solver demonstrates performance comparable to the best among the other solvers for all considered plans $\pi$. As noted in (Korotin et al., 2024, §5.2), the mixture parameterization used by LightSB, which we adapt in our LightSB-M solver, may introduce some inductive bias, since it uses principles analogous to those used to construct the benchmark. We empirically see that our LightSB-M solver finds the same (optimal) solution for all considered plans $\pi$.

Table 1: Comparison of $\mathrm{cBW}^2_2\text{-UVP}$ (%) between the optimal plan $\pi^*$ and the learned plan $\pi_\theta$ on the EOT/SB benchmark (§5.2). The best metric over bridge matching solvers is bolded in the original paper. Results marked with † are taken from (Korotin et al., 2024).

| Solver | Type | ε=0.1, D=2 | D=16 | D=64 | D=128 | ε=1, D=2 | D=16 | D=64 | D=128 | ε=10, D=2 | D=16 | D=64 | D=128 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Best solver on benchmark† | Varies | 1.94 | 13.67 | 11.74 | 11.4 | 1.04 | 9.08 | 18.05 | 15.23 | 1.40 | 1.27 | 2.36 | 1.31 |
| LightSB† | KL minimization | 0.03 | 0.08 | 0.28 | 0.60 | 0.05 | 0.09 | 0.24 | 0.62 | 0.07 | 0.11 | 0.21 | 0.37 |
| DSBM | Bridge matching | 5.2 | 16.8 | 37.3 | 35 | 0.3 | 1.1 | 9.7 | 31 | 3.7 | 105 | 3557 | 15000 |
| SF2M-Sink | Bridge matching | 0.54 | 3.7 | 9.5 | 10.9 | 0.2 | 1.1 | 9 | 23 | 0.31 | 4.9 | 319 | 819 |
| LightSB-M (ID, ours) | Bridge matching | 0.04 | 0.18 | 0.77 | 1.66 | 0.09 | 0.18 | 0.47 | 1.2 | 0.12 | 0.19 | 0.36 | 0.71 |
| LightSB-M (MB, ours) | Bridge matching | 0.02 | 0.1 | 0.56 | 1.32 | 0.09 | 0.18 | 0.46 | 1.2 | 0.13 | 0.18 | 0.36 | 0.71 |
| LightSB-M (GT, ours) | Bridge matching | 0.02 | 0.1 | 0.49 | 1.16 | 0.09 | 0.18 | 0.47 | 1.2 | 0.13 | 0.18 | 0.36 | 0.69 |

Table 2: Energy distance (averaged over two setups and 5 random seeds) on the MSCI dataset (§5.3) along with 95%-confidence intervals (± intervals) and average training times (s = seconds, m = minutes). The best bridge matching solver according to the mean value is bolded in the original paper. Results marked with † are taken from (Korotin et al., 2024).

| Solver type | Solver | DIM=50 | DIM=100 | DIM=1000 |
|---|---|---|---|---|
| Langevin-based | (Mokrov et al., 2024) [1 GPU V100]† | 2.39±0.06 (19 m) | 2.32±0.15 (19 m) | 1.46±0.20 (15 m) |
| Minimax | (Gushchin et al., 2023a) [1 GPU V100]† | 2.44±0.13 (43 m) | 2.24±0.13 (45 m) | 1.32±0.06 (71 m) |
| IPF | (Vargas et al., 2021) [1 GPU V100]† | 3.14±0.27 (8 m) | 2.86±0.26 (8 m) | 2.05±0.19 (11 m) |
| KL minimization | LightSB (Korotin et al., 2024) [4 CPU cores]† | 2.31±0.27 (65 s) | 2.16±0.26 (66 s) | 1.27±0.19 (146 s) |
| Bridge matching | DSBM (Shi et al., 2023) [1 GPU V100] | 2.46±0.1 (6.6 m) | 2.35±0.1 (6.6 m) | 1.36±0.04 (8.9 m) |
| Bridge matching | SF2M-Sink (Tong et al., 2023) [1 GPU V100] | 2.66±0.18 (8.4 m) | 2.52±0.17 (8.4 m) | 1.38±0.05 (13.8 m) |
| Bridge matching | LightSB-M (ID, ours) [4 CPU cores] | 2.347±0.11 (58 s) | 2.174±0.08 (60 s) | 1.35±0.05 (147 s) |
| Bridge matching | LightSB-M (MB, ours) [4 CPU cores] | 2.33±0.09 (80 s) | 2.172±0.08 (80 s) | 1.33±0.05 (176 s) |

**Baselines.** We present results for other bridge matching methods: DSBM (Shi et al., 2023), which uses Markovian and reciprocal projections, and SF2M-Sink (Tong et al., 2023), which uses an approximation of the EOT plan by the Sinkhorn algorithm (Cuturi, 2013). On the setups with $\epsilon=10$, both methods exhibit difficulties due to the necessity of learning an SDE with a high-magnitude drift. On the setups with $\epsilon=0.1$ and $\epsilon=1$, SF2M-Sink works better than DSBM. This result may seem counterintuitive at first, since DSBM should find the true SB solution, while SF2M-Sink should find only an approximation to it, depending on how closely the minibatch discrete EOT approximates the GT EOT plan. One possible reason is that DSBM simply requires more iterations of Markovian/reciprocal projections; however, in our experiments we observe that increasing the number of iterations does not improve the quality.
We provide an additional study of dynamic metrics and of the inference speed of our solver in Appendix B.3.

### 5.3. Quantitative Evaluation on Biological Data

We evaluate our algorithm on the problem of inferring cell trajectories from unpaired single-cell data, where OT/SB is widely used (Vargas et al., 2021; Tong et al., 2023; Koshizuka & Sato, 2022). We consider the recent high-dimensional single-cell setup provided by (Tong et al., 2023), based on the dataset from the Kaggle competition "Open Problems - Multimodal Single-Cell Integration". This dataset provides single-cell data from four human donors on days 2, 3, 4 and 7 and describes the gene expression levels of distinct cells. The task is to learn a trajectory model for the cell dynamics given only unpaired samples at two time points representing distributions $p_0$ and $p_1$. As in related works (Tong et al., 2023; Korotin et al., 2024), we use PCA projections of the original data with DIM $\in\{50,100,1000\}$ components. In our experiments, we consider two setups, taking data from two different days as $p_0, p_1$ to solve the Schrödinger Bridge and one intermediate day for evaluation. The first setup takes day 2 as $p_0$, day 4 as $p_1$, and day 3 for evaluation, while the second takes day 3 as $p_0$, day 7 as $p_1$, and day 4 for evaluation. At evaluation time, we use the learned models to sample one trajectory for each cell from the initial distribution $p_0$ and then compare the predicted distribution at the intermediate time point with the ground-truth data distribution. For comparison, we use the energy distance (Rizzo & Székely, 2016); the results are presented in Table 2. We see that our LightSB-M's solutions with the independent (ID) and minibatch discrete OT (MB) plans for $T_\pi$ provide the same metrics, since it learns the same solution, as follows from the developed theory. It also shows performance on the same level as the other neural network-based matching methods such as DSBM and SF2M-Sink, but converges faster, even without using a GPU, similarly to the LightSB solver. In Appendix B.2, we provide the technical details for this setup and additional results for different values of $\epsilon$.
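The energy distance used above has the standard form $\mathrm{ED}(P,Q)=2\,\mathbb{E}\|X-Y\|-\mathbb{E}\|X-X'\|-\mathbb{E}\|Y-Y'\|$; the paper does not spell out its estimator, so below is a minimal plug-in sketch (our own, not the evaluation code used in the experiments):

```python
import torch

def energy_distance(x, y):
    """Plug-in estimator of the energy distance (Rizzo & Szekely, 2016)
    between samples x ~ P (n, D) and y ~ Q (m, D).
    Note: the zero diagonal of cdist(x, x) adds a small O(1/n) bias."""
    return (2.0 * torch.cdist(x, y).mean()
            - torch.cdist(x, x).mean()
            - torch.cdist(y, y).mean())
```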
[Figure 3: Unpaired translation between subsets of the FFHQ dataset (1024x1024) performed by various SB solvers (§5.4) in the latent space of ALAE (Pidhorskyi et al., 2020). Panels: (a) Adult→Child, (b) Man→Woman.]

### 5.4. Comparison on Unpaired Image-to-image Transfer

Another popular setup that involves learning a translation between two distributions without paired data is image-to-image translation (Zhu et al., 2017). Methods based on SB show promising results on this problem thanks to the perfect theoretical agreement of the setup with the SB formulation (Shi et al., 2023). Due to the used Gaussian mixture parameterization, learning a translation between low-dimensional image manifolds is difficult for LightSB-M. Fortunately, many approaches use autoencoders (Rombach et al., 2022) for more efficient generation and translation. We follow the setup of (Korotin et al., 2024) with the ALAE autoencoder (Pidhorskyi et al., 2020) pre-trained on the 1024×1024 FFHQ dataset (Karras et al., 2019). We present the qualitative results of our solver with the discrete minibatch OT plan (MB) and the independent plan (ID) in Figure 3. For comparison, we also provide the results of DSBM and SF2M-Sink. Our LightSB-M solver converges to nearly the same solution for both the ID and MB plans and demonstrates good results. The samples provided by DSBM are close to the samples of LightSB-M, which is expected since both methods provide theoretical guarantees for solving the SB problem. The samples obtained by SF2M-Sink differ slightly, probably due to the bias of the discrete EOT plans. We provide additional examples of translation in Appendix B.4; the details of the baselines are given in Appendix B.5.

## 6. Discussion

**Potential impact.** Our main contribution is methodological: we show that one may perform just a single (but optimal) bridge matching step to learn the SB. This finding helps eliminate the limitations of existing bridge matching-based approaches, such as heuristic minibatch OT approximations or error accumulation during training. We believe that this insight is a significant step towards developing novel efficient computational approaches for SB/EOT tasks.

**Limitations.** Given an adjusted Schrödinger potential $v$, it may not be easy to compute the drift $g_v$ (4) of $S_v$ needed to perform the optimal SB matching. We employ the Gaussian mixture parameterization for $v$, for which this drift $g_v$ is analytically known (16). This allows us to easily implement our optimal SB matching in practice and obtain a fast bridge matching-based solver. Still, such a parameterization may sometimes be insufficient, e.g., for large-scale generative modeling tasks. We point to developing more general, e.g., neural-network-based, parameterizations of $v$ for our optimal SB matching as a promising research avenue; we show possible steps in this direction in Appendix C. Another limitation of our LightSB-M solver is that it is applicable to a limited set of priors. In this paper, we only consider the Wiener prior, which is one of the most popular priors used for SB. However, our method can be applied to other priors by a change of variables. These include Arithmetic Brownian Motion and Geometric Brownian Motion, also known as the Black-Scholes model, which is widely used in mathematical finance. Developing light solvers for Schrödinger Bridges with more general priors is a promising direction for future research.

**Acknowledgements.** The work was supported by the Analytical center under the RF Government (subsidy agreement 000000D730321P5Q0002, Grant No. 70-2021-00145 02.11.2021).

**Impact Statement.** This paper presents work whose goal is to advance the field of Machine Learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

## References

Chen, T., Liu, G.-H., and Theodorou, E. Likelihood training of Schrödinger bridge using forward-backward SDEs theory. In International Conference on Learning Representations, 2021a.

Chen, Y., Georgiou, T. T., and Pavon, M. On the relation between optimal transport and Schrödinger bridges: A stochastic control viewpoint. Journal of Optimization Theory and Applications, 169:671-691, 2016.
Chen, Y., Georgiou, T. T., and Pavon, M. Stochastic control liaisons: Richard Sinkhorn meets Gaspard Monge on a Schrödinger bridge. SIAM Review, 63(2):249-313, 2021b.

Cuturi, M. Sinkhorn distances: Lightspeed computation of optimal transport. Advances in Neural Information Processing Systems, 26, 2013.

Daniels, M., Maunu, T., and Hand, P. Score-based generative neural networks for large-scale optimal transport. Advances in Neural Information Processing Systems, 34:12955-12965, 2021.

De Bortoli, V., Thornton, J., Heng, J., and Doucet, A. Diffusion Schrödinger bridge with applications to score-based generative modeling. Advances in Neural Information Processing Systems, 34:17695-17709, 2021.

Dempster, A. P., Laird, N. M., and Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1):1-22, 1977.

Flamary, R., Courty, N., Gramfort, A., Alaya, M. Z., Boisbunon, A., Chambon, S., Chapel, L., Corenflos, A., Fatras, K., Fournier, N., Gautheron, L., Gayraud, N. T., Janati, H., Rakotomamonjy, A., Redko, I., Rolet, A., Schutz, A., Seguy, V., Sutherland, D. J., Tavenard, R., Tong, A., and Vayer, T. POT: Python Optimal Transport. Journal of Machine Learning Research, 22(78):1-8, 2021. URL http://jmlr.org/papers/v22/20-451.html.

Fortet, R. Résolution d'un système d'équations de M. Schrödinger. Journal de Mathématiques Pures et Appliquées, 19(1-4):83-105, 1940.

Genevay, A., Cuturi, M., Peyré, G., and Bach, F. Stochastic optimization for large-scale optimal transport. In Advances in Neural Information Processing Systems, pp. 3440-3448, 2016.

Genevay, A., Chizat, L., Bach, F., Cuturi, M., and Peyré, G. Sample complexity of Sinkhorn divergences. In The 22nd International Conference on Artificial Intelligence and Statistics, pp. 1574-1583. PMLR, 2019.

Gushchin, N., Kolesov, A., Korotin, A., Vetrov, D., and Burnaev, E. Entropic neural optimal transport via diffusion processes. In Advances in Neural Information Processing Systems, 2023a.

Gushchin, N., Kolesov, A., Mokrov, P., Karpikova, P., Spiridonov, A., Burnaev, E., and Korotin, A. Building the bridge of Schrödinger: A continuous entropic optimal transport benchmark. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023b.

Ho, J., Jain, A., and Abbeel, P. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840-6851, 2020.

Karras, T., Laine, S., and Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4401-4410, 2019.

Kim, B., Kwon, G., Kim, K., and Ye, J. C. Unpaired image-to-image translation via neural Schrödinger bridge. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=uQBW7ELXfO.

Kingma, D. P. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Kloeden, P. E. and Platen, E. Stochastic differential equations. Springer, 1992.

Korotin, A., Gushchin, N., and Burnaev, E. Light Schrödinger bridge. In International Conference on Learning Representations, 2024.

Koshizuka, T. and Sato, I. Neural Lagrangian Schrödinger bridge: Diffusion modeling for population dynamics. In The Eleventh International Conference on Learning Representations, 2022.

Kullback, S. Probability densities with given marginals. The Annals of Mathematical Statistics, 39(4):1236-1243, 1968.
LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M., and Huang, F. A tutorial on energy-based learning. Predicting Structured Data, 1(0), 2006.

Léonard, C. A survey of the Schrödinger problem and some of its connections with optimal transport. arXiv preprint arXiv:1308.0215, 2013.

Lipman, Y., Chen, R. T., Ben-Hamu, H., Nickel, M., and Le, M. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2022.

Liu, G.-H., Vahdat, A., Huang, D.-A., Theodorou, E. A., Nie, W., and Anandkumar, A. I²SB: Image-to-image Schrödinger bridge. arXiv preprint arXiv:2302.05872, 2023a.

Liu, X., Gong, C., et al. Flow straight and fast: Learning to generate and transfer data with rectified flow. In The Eleventh International Conference on Learning Representations, 2022.

Liu, X., Zhang, X., Ma, J., Peng, J., and Liu, Q. InstaFlow: One step is enough for high-quality diffusion-based text-to-image generation. arXiv preprint arXiv:2309.06380, 2023b.

Meng, C., He, Y., Song, Y., Song, J., Wu, J., Zhu, J.-Y., and Ermon, S. SDEdit: Guided image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073, 2021.

Mokrov, P., Korotin, A., Kolesov, A., Gushchin, N., and Burnaev, E. Energy-guided entropic neural optimal transport. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=d6tUsZeVs7.

Pavon, M. and Wakolbinger, A. On free energy, stochastic control, and Schrödinger processes. In Modeling, Estimation and Control of Systems with Uncertainty: Proceedings of a Conference held in Sopron, Hungary, September 1990, pp. 334-348. Springer, 1991.

Peyré, G., Cuturi, M., et al. Computational optimal transport. Foundations and Trends in Machine Learning, 11(5-6):355-607, 2019.

Pidhorskyi, S., Adjeroh, D. A., and Doretto, G. Adversarial latent autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14104-14113, 2020.

Pinsky, M. A. and Karlin, S. 8 - Brownian motion and related processes. In An Introduction to Stochastic Modeling (Fourth Edition), pp. 391-446. Academic Press, Boston, 2011. ISBN 978-0-12-381416-6. doi: 10.1016/B978-0-12-381416-6.00008-3. URL https://www.sciencedirect.com/science/article/pii/B9780123814166000083.

Rizzo, M. L. and Székely, G. J. Energy distance. Wiley Interdisciplinary Reviews: Computational Statistics, 8(1):27-38, 2016.

Roberts, G. O. and Tweedie, R. L. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4):341-363, 1996.

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684-10695, 2022.

Rüschendorf, L. Convergence of the iterative proportional fitting procedure. The Annals of Statistics, pp. 1160-1174, 1995.

Seguy, V., Damodaran, B. B., Flamary, R., Courty, N., Rolet, A., and Blondel, M. Large scale optimal transport and mapping estimation. In International Conference on Learning Representations, 2018.

Shi, Y., De Bortoli, V., Campbell, A., and Doucet, A. Diffusion Schrödinger bridge matching. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=qy07OHsJT5.
Somnath, V. R., Pariset, M., Hsieh, Y.-P., Martinez, M. R., Krause, A., and Bunne, C. Aligned diffusion Schrödinger bridges. arXiv preprint arXiv:2302.11419, 2023.

Song, Y., Dhariwal, P., Chen, M., and Sutskever, I. Consistency models. arXiv preprint arXiv:2303.01469, 2023.

Tong, A., Malkin, N., Fatras, K., Atanackovic, L., Zhang, Y., Huguet, G., Wolf, G., and Bengio, Y. Simulation-free Schrödinger bridges via score and flow matching. arXiv preprint arXiv:2307.03672, 2023.

Vargas, F. and Nüsken, N. Transport, variational inference and diffusions: with applications to annealed flows and Schrödinger bridges. arXiv preprint arXiv:2307.01050, 2023.

Vargas, F., Thodoroff, P., Lamacraft, A., and Lawrence, N. Solving Schrödinger bridges via maximum likelihood. Entropy, 23(9):1134, 2021.

Villani, C. Optimal Transport: Old and New, volume 338. Springer Science & Business Media, 2008.

Wang, Z., Zheng, H., He, P., Chen, W., and Zhou, M. Diffusion-GAN: Training GANs with diffusion. In The Eleventh International Conference on Learning Representations, 2022.

Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2223-2232, 2017.

## A. Proofs

**Proof of Theorem 3.1.** Let $p_0, p_1$ denote the marginals of $\pi$ and let $\pi^*$ be the EOT plan between $p_0, p_1$. Let $p^S_0, p^S_1$ denote the distributions of $S$ at $t=0$ and $t=1$, respectively. We use the fact that each element $S$ of $\mathcal{S}$ is a reciprocal process with some EOT plan $\pi^S\in\Pi(p^S_0,p^S_1)$, i.e., $S=\int W^\epsilon_{|x_0,x_1}\,d\pi^S(x_0,x_1)$, recall §2.1. In turn, $\pi^S$ can be represented through the input density $p^S_0$ and the potential $v^S$ as in (6), i.e.,

$$\pi^S(x_0,x_1)=p^S_0(x_0)\,\frac{\exp(\langle x_0,x_1\rangle/\epsilon)\,v^S(x_1)}{c_{v^S}(x_0)}. \tag{17}$$

We write

$$\mathrm{KL}(T_\pi\,\|\,S)=\mathrm{KL}(\pi\,\|\,\pi^S)+\int_{\mathbb{R}^D\times\mathbb{R}^D}\mathrm{KL}\big(T_{\pi|x_0,x_1}\,\|\,S_{|x_0,x_1}\big)\,\pi(x_0,x_1)\,dx_0\,dx_1 \tag{18}$$

$$=\mathrm{KL}(\pi\,\|\,\pi^S)+\int_{\mathbb{R}^D\times\mathbb{R}^D}\mathrm{KL}\big(W^\epsilon_{|x_0,x_1}\,\|\,W^\epsilon_{|x_0,x_1}\big)\,\pi(x_0,x_1)\,dx_0\,dx_1=\mathrm{KL}(\pi\,\|\,\pi^S) \tag{19}$$

$$=\int\log\pi(x_0,x_1)\,d\pi(x_0,x_1)-\int\log\pi^S(x_0,x_1)\,d\pi(x_0,x_1) \tag{20}$$

$$=\int\log\pi\,d\pi-\int\Big[\log p^S_0(x_0)+\frac{\langle x_0,x_1\rangle}{\epsilon}+\log v^S(x_1)-\log c_{v^S}(x_0)\Big]\,d\pi(x_0,x_1). \tag{21}$$

Since $\pi$ and $\pi^*$ share the same marginals $p_0, p_1$, the terms in (21) depending only on $x_0$ or only on $x_1$ have the same integrals under $\pi$ and under $\pi^*$; only the cross term $\langle x_0,x_1\rangle/\epsilon$ differs. Adding and subtracting $\frac1\epsilon\int\langle x_0,x_1\rangle\,d\pi^*(x_0,x_1)$ and recombining the $\pi^*$-integral of $\log\pi^S$ into a KL divergence, we obtain

$$\mathrm{KL}(T_\pi\,\|\,S)=\underbrace{\int\log\pi\,d\pi-\frac1\epsilon\int\langle x_0,x_1\rangle\,d(\pi-\pi^*)-\int\log\pi^*\,d\pi^*}_{:=\,\widehat C(\pi)}+\mathrm{KL}\big(\pi^*\,\|\,\pi^S\big).$$
In (18) we use the disintegration theorem for the KL divergence to separate the processes' plans and their inner (conditional) parts (Vargas et al., 2021, Appendix C, D). In the transition from (18) to (19) we note that $T_{\pi|x_0,x_1}=W^\epsilon_{|x_0,x_1}$ and $S_{|x_0,x_1}=W^\epsilon_{|x_0,x_1}$, since $T_\pi$ is a reciprocal process, as is the Schrödinger Bridge $S$. In the transition from (20) to (21) we use the fact that $\pi^S$ is given by (17). Since $\mathrm{KL}(T_\pi\|S)=\widehat C(\pi)+\mathrm{KL}(\pi^*\|\pi^S)$ and $\widehat C(\pi)$ does not depend on $S$, the minimum of $\mathrm{KL}(T_\pi\|S)$ is achieved for $S$ such that $\pi^S=\pi^*$, i.e., when $S$ is the SB between $p_0$ and $p_1$. ∎

**Proof of Theorem 3.2.** We start from the Pythagorean theorem for the Markovian projection (Shi et al., 2023, Lemma 6):

$$\mathrm{KL}(T_\pi\,\|\,S_v)=\mathrm{KL}\big(T_\pi\,\|\,\mathrm{proj}_{\mathcal{M}}(T_\pi)\big)+\mathrm{KL}\big(\mathrm{proj}_{\mathcal{M}}(T_\pi)\,\|\,S_v\big), \tag{22}$$

where the drift $g^M$ of the Markovian projection $\mathrm{proj}_{\mathcal{M}}(T_\pi)$ is given by (9):

$$g^M(x_t,t)=\int_{\mathbb{R}^D}\frac{x_1-x_t}{1-t}\,dp_{T_\pi}(x_1|x_t). \tag{23}$$

We use the expression of the KL divergence between two Markovian processes starting from the same distribution $p_0$ through their drifts (Pavon & Wakolbinger, 1991) and note that the Markovian projection preserves the time marginals $p_{T_\pi}(x_t)$:

$$\mathrm{KL}\big(\mathrm{proj}_{\mathcal{M}}(T_\pi)\,\|\,S_v\big)=\frac{1}{2\epsilon}\int_0^1\int_{\mathbb{R}^D}\|g_v(x_t,t)-g^M(x_t,t)\|^2\,dp_{T_\pi}(x_t)\,dt. \tag{24}$$

Substituting (23) and expanding the square, the cross term integrates against $dp_{T_\pi}(x_1|x_t)\,dp_{T_\pi}(x_t)=dp_{T_\pi}(x_t,x_1)$, while the terms $\|g^M(x_t,t)\|^2$ and $\|\frac{x_1-x_t}{1-t}\|^2$ do not depend on $v$ and collect into a constant $C'(\pi)$. This yields

$$\mathrm{KL}\big(\mathrm{proj}_{\mathcal{M}}(T_\pi)\,\|\,S_v\big)=\frac{1}{2\epsilon}\int_0^1\int_{\mathbb{R}^D\times\mathbb{R}^D}\Big\|g_v(x_t,t)-\frac{x_1-x_t}{1-t}\Big\|^2\,dp_{T_\pi}(x_t,x_1)\,dt+C'(\pi).$$

Combining this with (22) and setting $C(\pi):=\mathrm{KL}(T_\pi\|\mathrm{proj}_{\mathcal{M}}(T_\pi))+C'(\pi)$, we get

$$\mathrm{KL}(T_\pi\,\|\,S_v)=C(\pi)+\frac{1}{2\epsilon}\int_0^1\int_{\mathbb{R}^D\times\mathbb{R}^D}\Big\|g_v(x_t,t)-\frac{x_1-x_t}{1-t}\Big\|^2\,dp_{T_\pi}(x_t,x_1)\,dt. \tag{25}$$

∎

**Proof of Theorem 3.3.** From Theorem 3.2 it follows that

$$\mathrm{KL}(T_\pi\,\|\,S_v)=C(\pi)+\frac{1}{2\epsilon}\int_0^1\int\Big\|g_v(x_t,t)-\frac{x_1-x_t}{1-t}\Big\|^2\,dp_{T_\pi}(x_t,x_1)\,dt. \tag{26}$$

In turn, from the proof of Theorem 3.1 it holds that

$$\mathrm{KL}(T_\pi\,\|\,S_v)=\widehat C(\pi)+\mathrm{KL}\big(\pi^*\,\|\,\pi^{S_v}\big). \tag{27}$$

From (Korotin et al., 2024, Proposition 3.1) it follows that $\mathrm{KL}(\pi^*\|\pi^{S_v})=\mathcal{L}_0(v)-\mathcal{L}^*$, where $\mathcal{L}^*$ is a constant depending on the distributions $p_0, p_1$ and the value $\epsilon$. Combining these expressions, we get

$$\frac{1}{2\epsilon}\int_0^1\int\Big\|g_v(x_t,t)-\frac{x_1-x_t}{1-t}\Big\|^2\,dp_{T_\pi}(x_t,x_1)\,dt=\widetilde C(\pi)+\mathcal{L}_0(v),$$

where $\widetilde C(\pi):=\widehat C(\pi)-\mathcal{L}^*-C(\pi)$. ∎

## B. Experiments Details and Extra Results

We build our LightSB-M implementation upon the official LightSB implementation, https://github.com/ngushchin/LightSB. All the parametrization, optimization and initialization details are the same as in (Korotin et al., 2024) if not stated otherwise. In the minibatch (MB) setting, the discrete OT algorithm `ot.emd` is taken from the POT library (Flamary et al., 2021); a minimal sketch of the minibatch re-pairing is given below. The batch size is always 128.

### B.1. Qualitative 2D Setup Hyperparameters

We use $K=250$ potentials (mixture components) and the Adam optimizer with $lr=10^{-3}$ in all cases to train LightSB-M.
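As mentioned at the start of this appendix, the MB plan uses POT's exact solver `ot.emd`. Below is a minimal sketch of the minibatch re-pairing step (the helper name and the sampling-from-plan strategy are ours; implementations may instead take the row-wise argmax of the plan):

```python
import numpy as np
import ot  # POT: Python Optimal Transport (Flamary et al., 2021)

def minibatch_ot_pairs(x0, x1):
    """Re-pair a minibatch according to the exact discrete OT plan between
    its empirical marginals -- this gives the 'MB' plan pi fed to LightSB-M.
    x0: (n, D), x1: (m, D) numpy arrays."""
    n, m = len(x0), len(x1)
    cost = ot.dist(x0, x1)                                 # squared Euclidean costs
    plan = ot.emd(np.ones(n) / n, np.ones(m) / m, cost)    # exact EMD plan, (n, m)
    # draw one x1 index per x0 row, proportionally to the plan's rows
    idx = [np.random.choice(m, p=row / row.sum()) for row in plan]
    return x0, x1[idx]
```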
### B.2. Evaluation on Biological Single-cell Data

We follow the same setup as (Korotin et al., 2024) and use their code and data from https://github.com/ngushchin/LightSB. All models are trained with $\epsilon=0.1$ if not stated otherwise. For completeness, we provide additional results of our solver trained with the independent plan for different values of the parameter $\epsilon$, see Table 3.

Table 3: Energy distance (averaged over two setups and 5 random seeds) on the MSCI dataset (§5.3) along with 95%-confidence intervals (± intervals) for LightSB-M (ID).

| ε | DIM=50 | DIM=100 | DIM=1000 |
|---|---|---|---|
| 0.3 | 2.37±0.11 | 2.169±0.11 | 1.310±0.06 |
| 0.1 | 2.347±0.11 | 2.174±0.08 | 1.35±0.05 |
| 0.03 | 2.349±0.09 | 2.32±0.09 | 1.279±0.05 |
| 0.01 | 2.404±0.12 | 2.28±0.07 | 1.309±0.04 |

### B.3. Evaluation on the Schrödinger Bridge Benchmark

Here we first provide an additional evaluation of solvers using target matching and dynamic metrics. Then we study the inference speed of our LightSB-M solver when sampling via the Brownian bridge versus the Euler-Maruyama simulation.

**Target metric evaluation.** We additionally study how well each solver maps the initial distribution $p_0$ into $p_1$ by measuring the $\mathrm{BW}^2_2\text{-UVP}$ metric, also proposed by the authors of the benchmark (Gushchin et al., 2023b, §4). We present the results in Table 4 and observe that our method performs better than the other bridge matching approaches.

Table 4: Comparison of $\mathrm{BW}^2_2\text{-UVP}$ (%) between the ground-truth target distribution $p_1$ and the learned target distribution $\pi_\theta(x_1)$. The best metric over bridge matching solvers is bolded in the original paper. Results marked with † are taken from (Korotin et al., 2024).

| Solver | Type | ε=0.1, D=2 | D=16 | D=64 | D=128 | ε=1, D=2 | D=16 | D=64 | D=128 | ε=10, D=2 | D=16 | D=64 | D=128 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Best solver on benchmark† | Varies | 0.016 | 0.05 | 0.25 | 0.22 | 0.005 | 0.09 | 0.56 | 0.12 | 0.01 | 0.02 | 0.15 | 0.23 |
| LightSB† | KL minimization | 0.005 | 0.017 | 0.037 | 0.069 | 0.004 | 0.01 | 0.03 | 0.07 | 0.03 | 0.04 | 0.17 | 0.30 |
| DSBM | Bridge matching | 0.03 | 0.18 | 0.7 | 2.26 | 0.04 | 0.09 | 1.9 | 7.3 | 0.26 | 102 | 3563 | 15000 |
| SF2M-Sink | Bridge matching | 0.04 | 0.18 | 0.39 | 1.1 | 0.07 | 0.3 | 4.5 | 17.7 | 0.17 | 4.7 | 316 | 812 |
| LightSB-M (ID, ours) | Bridge matching | 0.02 | 0.03 | 0.2 | 0.46 | 0.005 | 0.04 | 0.11 | 0.27 | 0.07 | 0.03 | 0.11 | 0.21 |
| LightSB-M (MB, ours) | Bridge matching | 0.005 | 0.07 | 0.27 | 0.63 | 0.002 | 0.04 | 0.12 | 0.36 | 0.04 | 0.07 | 0.11 | 0.23 |
| LightSB-M (GT, ours) | Bridge matching | 0.02 | 0.03 | 0.21 | 0.55 | 0.011 | 0.03 | 0.11 | 0.26 | 0.016 | 0.04 | 0.09 | 0.21 |

**Dynamic metrics evaluation.** Following the authors of the benchmark paper (Gushchin et al., 2023b, Appendix F), we provide additional metrics for the learned dynamics of the Schrödinger Bridge. The authors of the benchmark measure the forward $\mathrm{KL}(T^*\|S)$ and reverse $\mathrm{KL}(S\|T^*)$ divergences between the ground-truth process $T^*$ and the learned process $S$. To do so, they define two auxiliary quantities:

$$L^2_{fwd}[t]=\mathbb{E}_{x_t\sim T^*}\big\|g^*(x_t,t)-g^S(x_t,t)\big\|^2,\qquad L^2_{rev}[t]=\mathbb{E}_{x_t\sim S}\big\|g^*(x_t,t)-g^S(x_t,t)\big\|^2,$$

and use the facts that $\mathrm{KL}(T^*\|S)=\frac{1}{2\epsilon}\int_0^1 L^2_{fwd}[t]\,dt$ and $\mathrm{KL}(S\|T^*)=\frac{1}{2\epsilon}\int_0^1 L^2_{rev}[t]\,dt$. The values of $L^2_{fwd}[t]$ and $L^2_{rev}[t]$ over $t\in[0,1]$ are plotted in Figure 4. We observe that LightSB-M has lower errors $L^2_{fwd}[t]$ and $L^2_{rev}[t]$, as well as lower $\mathrm{KL}(T^*\|S)$ and $\mathrm{KL}(S\|T^*)$, in approximating the ground-truth optimal drift $g^*(x_t,t)$ than the other algorithms, including DSBM (Shi et al., 2023) and SF2M (Tong et al., 2023). The values of $\mathrm{KL}(T^*\|S)$ and $\mathrm{KL}(S\|T^*)$ are given in Table 5; since $L^2_{fwd}[t]$ and $L^2_{rev}[t]$ are lower at most times for our algorithm, both divergences are lower for LightSB-M.

[Figure 4: Dynamic KL evaluation: $L^2_{fwd}[t]$ and $L^2_{rev}[t]$ values w.r.t. time for different algorithms. Results denoted as "Best solver (benchmark)" are taken from the benchmark paper (Gushchin et al., 2023b).]

Table 5: Dynamic KL values for different algorithms. Results denoted as "Best solver (benchmark)" are taken from the benchmark paper (Gushchin et al., 2023b).

| Metric | LightSB-M (ID, ours) | Best solver (benchmark) | SF2M | DSBM |
|---|---|---|---|---|
| KL(T*‖S) | 0.0093 | 1.64 | 0.6422 | 0.2950 |
| KL(S‖T*) | 0.0099 | 49.65 | 1.0765 | 0.39 |
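The dynamic metrics above are straightforward to estimate by Monte Carlo once the two drifts can be evaluated; a minimal sketch (all function names are ours; `sample_xt` is a hypothetical sampler of the ground-truth marginals, available on the benchmark):

```python
import torch

def dynamic_kl(g_true, g_learned, sample_xt, eps, n_times=100, n_x=1024):
    """MC estimate of KL(T*||S) = 1/(2 eps) * int_0^1 L2_fwd[t] dt, where
    L2_fwd[t] = E_{x_t ~ T*} ||g*(x_t, t) - g_S(x_t, t)||^2.
    Swapping sample_xt for the learned process's marginals gives KL(S||T*)."""
    kl = 0.0
    for t in torch.linspace(0.01, 0.99, n_times):
        xt = sample_xt(t.item(), n_x)
        l2 = ((g_true(xt, t) - g_learned(xt, t)) ** 2).sum(-1).mean()
        kl += l2 / n_times                    # simple quadrature over t
    return kl / (2.0 * eps)
```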
Study of the efficiency of the sampling. Here we measure the performance of sampling (time and cBW₂²-UVP) directly from the learned plan πθ(x1|x0) versus sampling with the Euler-Maruyama algorithm (Kloeden et al., 1992, §9.2) using the drift function gθ(xt, t). We conduct our experiments on the benchmark setup with ϵ = 0.1 and D = 16. We present our results in Table 6 and Table 7.

Inference type                 Time
Euler-Maruyama, 3 steps        0.046 ± 0.053 sec
Euler-Maruyama, 10 steps       0.19 ± 0.14 sec
Euler-Maruyama, 30 steps       0.365 ± 0.08 sec
Euler-Maruyama, 100 steps      1.268 ± 0.3 sec
Euler-Maruyama, 300 steps      3.931 ± 0.34 sec
Euler-Maruyama, 1000 steps     12.61 ± 1.32 sec
Sampling from the plan πθ      0.00058 ± 0.0001 sec

Table 6: Time measurements for LightSB-M sampling using the SDE approach (Euler-Maruyama) and direct sampling from the plan πθ on the SB benchmark (Gushchin et al., 2023b) with ϵ = 0.1 and D = 16. The number of steps for Euler-Maruyama is the number of SDE solver discretization steps. Results are averaged over 5 runs with the standard deviation given after ±.

Inference type                 cBW₂²-UVP
Euler-Maruyama, 10 steps       1.53
Euler-Maruyama, 50 steps       0.22
Euler-Maruyama, 100 steps      0.126
Euler-Maruyama, 200 steps      0.102
Euler-Maruyama, 500 steps      0.09
Sampling from the plan πθ      0.09

Table 7: cBW₂²-UVP measurements for LightSB-M sampling using the SDE approach (Euler-Maruyama) and sampling from the plan πθ on the SB benchmark (Gushchin et al., 2023b) with ϵ = 0.1 and D = 16. The number of steps for Euler-Maruyama is the number of SDE solver discretization steps.

As we can see from the obtained results, the Euler-Maruyama approach requires up to 500 steps to accurately solve the Schrödinger Bridge SDE. Thanks to the special form of the SDE provided by the used parametrization, we can directly sample from πθ(x1|x0), which is orders of magnitude faster than the full simulation of the trajectories.
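For reference, below is a minimal sketch of the two inference routes compared above: a fixed-step Euler-Maruyama simulation of the learned SDE dXt = gθ(Xt, t)dt + √ϵ dWt versus a single draw from the learned conditional plan. The callables g_theta and sample_plan (one draw from πθ(x1|x0), e.g., from its Gaussian mixture form) are hypothetical placeholders.

```python
import torch

def euler_maruyama(g_theta, x0, eps, n_steps):
    # Simulate dX_t = g_theta(X_t, t) dt + sqrt(eps) dW_t on [0, 1]
    # with n_steps fixed discretization steps of size h = 1 / n_steps.
    x, h = x0.clone(), 1.0 / n_steps
    for k in range(n_steps):
        t = k * h
        x = x + g_theta(x, t) * h + (eps * h) ** 0.5 * torch.randn_like(x)
    return x

# Direct inference needs no simulation at all:
# x1 = sample_plan(x0)   # one draw from pi_theta(x1 | x0)
```

This difference is exactly why the last rows of Tables 6 and 7 match the accuracy of the 500-step simulation while being orders of magnitude faster.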
B.4. Evaluation on unpaired image-to-image translation

We follow the same setup as (Korotin et al., 2024) and use their code and data from https://github.com/ngushchin/LightSB. All models are trained with ϵ = 0.1 if not stated otherwise. Following (Korotin et al., 2024), we first split the FFHQ data into train (the first 60k) and test (the last 10k) images. Then we create subsets of males, females, children and adults in both the train and test splits. For training, we first use the ALAE encoder to extract a 512-dimensional latent vector for each image and then train our solver on the extracted latent vectors. At the inference stage, we first extract the latent vector from the image, translate it with LightSB-M, and then decode the mapped vector to produce the mapped image. In Figure 7, we provide extra examples for our LightSB-M and the other baselines.

Test FID values. The FID values for our man → woman FFHQ image translation setup are provided in Table 8. We measure FID between the decoded translated latents and the encoded-decoded true images from the FFHQ dataset. For all considered solvers, we use the same value of the coefficient ϵ = 0.1, which produces moderate diversity in the generated images. The FID values are similar for all methods, which aligns with the good quality of the images given in Figure 3.

Solver   LightSB-M (ID, ours)   LightSB-M (MB, ours)   DSBM    SF2M-Sink
FID      0.852                  0.859                  0.859   0.8613

Table 8: FID values on unpaired man → woman translation for different solvers applied in the latent space of ALAE (Pidhorskyi et al., 2020) for 1024x1024 FFHQ images (Karras et al., 2019).

Different values of ϵ. We provide extra male → female results for a wide range of values ϵ ∈ {0.01, 0.1, 1, 10} in Figure 8 below. We observe that our solver shows the expected behavior by providing more diversity for larger ϵ.

B.5. Baselines

DSBM (Shi et al., 2023). The implementation is taken from the official repo https://github.com/yuyang-shi/dsbm-pytorch. For the forward and backward drift approximations, instead of the networks used in the official repository, we use MLP neural networks with positional encoding, as they give better results. The number of inner gradient steps per Markovian fitting iteration is 10000, and the number of Markovian fitting iterations is 10. The Adam optimizer (Kingma & Ba, 2014) with lr = 10⁻⁴ is used for optimization.

SF2M-Sink (Tong et al., 2023). The implementation is taken from the official repo https://github.com/atong01/conditional-flow-matching. For the drift and score function approximations, we use MLP neural networks with positional encoding instead of the networks used in the official repository, as they give better results. The number of gradient updates is 50000 for the SB benchmark and single-cell data experiments and 20000 for the unpaired image-to-image translation. The Adam optimizer (Kingma & Ba, 2014) with lr = 10⁻⁴ is used for optimization.

LightSB's results are taken from the paper (Korotin et al., 2024).

C. Neural Network Parametrization

Our LightSB-M is based on the Gaussian mixture parametrization. However, it is not the only way to implement the optimal projection in practice. Here, we additionally propose a method that uses a neural network parametrization. We call this modification of our algorithm Hard Schrödinger Bridge Matching, or HardSB-M.

To begin with, we discuss another type of Schrödinger potential for better clarity. In the main text, we utilize the more convenient adjusted Schrödinger potential v since it simplifies the usage of the Gaussian mixture approximation. However, in the literature, the more popular choice is the Schrödinger potential φ (Chen et al., 2021b, Eq. 4.11), which is in one-to-one correspondence with the adjusted Schrödinger potential v:

$$\varphi(x, t=1) = \varphi(x) = v(x)\exp\Big(\frac{\|x\|^2}{2\epsilon}\Big).$$

Furthermore, the drift g(x, t) of the Schrödinger Bridge with potential φ(x) is given by:

$$g(x_t, t) = \epsilon\nabla_{x_t}\log\int_{\mathbb{R}^D}\mathcal{N}(x'|x_t, (1-t)\epsilon I_D)\exp\Big(\frac{\|x'\|^2}{2\epsilon}\Big)v(x')dx' = \epsilon\nabla_{x_t}\log\int_{\mathbb{R}^D}\mathcal{N}(x'|x_t, (1-t)\epsilon I_D)\,\varphi(x')dx'. \qquad (29)$$

Below we use this non-adjusted Schrödinger potential and denote φ(x, t = 1) as φ(x) to make the derivations more concise. We parametrize this potential by a neural network φθ(x) and use (29) to derive the drift gθ given by φθ(x).

C.1. Drift Estimation

In the case of the neural parametrization of φθ (or vθ), the computation of the drift gθ(xt, t) in (29) becomes a non-trivial task since it is no longer a convolution of a Gaussian mixture with a Gaussian distribution. We propose two ways to tackle this issue.

Variant 1. Monte Carlo (MC) estimator. First, we recall the SB drift expression (29), which for the parametrized Schrödinger potential φθ states that the drift gθ(xt, t) is given by:

$$g_\theta(x_t, t) = \epsilon\nabla_{x_t}\log\int_{\mathbb{R}^D}\mathcal{N}(x'|x_t, (1-t)\epsilon I_D)\,\varphi_\theta(x')dx'.$$

By using the reparametrization trick (introducing $x' \stackrel{\text{def}}{=} z\sqrt{(1-t)\epsilon} + x_t$), we get

$$g_\theta(x_t, t) = \epsilon\nabla_{x_t}\log\int_{\mathbb{R}^D}\mathcal{N}(z|0, I_D)\,\varphi_\theta\big(z\sqrt{(1-t)\epsilon} + x_t\big)dz = \epsilon\,\frac{\int_{\mathbb{R}^D}\nabla_{x_t}\varphi_\theta\big(z\sqrt{(1-t)\epsilon} + x_t\big)\,\mathcal{N}(z|0, I_D)dz}{\int_{\mathbb{R}^D}\varphi_\theta\big(z\sqrt{(1-t)\epsilon} + x_t\big)\,\mathcal{N}(z|0, I_D)dz} = \epsilon\,\frac{\mathbb{E}_{z\sim\mathcal{N}(0, I_D)}\nabla_{x_t}\varphi_\theta\big(z\sqrt{(1-t)\epsilon} + x_t\big)}{\mathbb{E}_{z\sim\mathcal{N}(0, I_D)}\varphi_\theta\big(z\sqrt{(1-t)\epsilon} + x_t\big)}.$$

Then we can estimate gθ(xt, t) simply by drawing samples $\{z_n\}_{n=1}^N$ and $\{z_m\}_{m=1}^M$ from N(0, I_D) and using

$$g_\theta(x_t, t) \approx \epsilon\,\frac{\frac{1}{N}\sum_{n=1}^N \nabla_{x_t}\varphi_\theta\big(z_n\sqrt{(1-t)\epsilon} + x_t\big)}{\frac{1}{M}\sum_{m=1}^M \varphi_\theta\big(z_m\sqrt{(1-t)\epsilon} + x_t\big)}.$$

The calculation of the gradient of the loss KL(Tπ‖Sφθ) given by (15) w.r.t. the parameters is then straightforward using auto-differentiation software.
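As an illustration, here is a minimal sketch of this MC estimator under the assumption that the potential is parametrized as φθ = exp(NNθ), as in our toy experiments (Appendix C.4), so we work with log φθ for numerical stability; log_phi is a hypothetical placeholder for NNθ, mapping (..., D) to (...). For simplicity, the sketch shares one set of samples between the numerator and denominator (N = M), which amounts to differentiating the log of the MC average.

```python
import torch

def drift_mc(log_phi, xt, t, eps, n_mc=1000):
    # MC drift estimator: g_theta(x_t, t) = eps * grad_{x_t} log E_z phi(z*sqrt((1-t)*eps) + x_t),
    # z ~ N(0, I).  Autograd through logsumexp reproduces the ratio estimator above (with shared
    # samples) while staying numerically stable for phi = exp(NN).
    xt = xt.detach().requires_grad_(True)                 # (B, D)
    z = torch.randn(n_mc, *xt.shape, dtype=xt.dtype)      # (n_mc, B, D)
    x_prime = xt.unsqueeze(0) + ((1 - t) * eps) ** 0.5 * z
    # log of the MC average, up to an additive log(n_mc) that the gradient ignores:
    log_conv = torch.logsumexp(log_phi(x_prime), dim=0)   # (B,)
    # create_graph=True keeps the graph so the matching loss can later be differentiated w.r.t. theta.
    (grad,) = torch.autograd.grad(log_conv.sum(), xt, create_graph=True)
    return eps * grad
```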
Variant 2. Markov chain Monte Carlo (MCMC) estimator. The MC estimator proposed above is biased. We therefore also suggest an unbiased estimator based on sampling from the unnormalized density below.

Theorem C.1 (HardSB-M drift expression). The drift g(xt, t) for the Schrödinger potential φ(x) is given by:

$$g(x_t, t) = \frac{1}{1-t}\Big(\mathbb{E}_{x' \sim p_\varphi(x'|x_t)}[x'] - x_t\Big), \qquad (30)$$

where $p_\varphi(x'|x_t) \propto \exp\big(-\tfrac{\|x'-x_t\|^2}{2\epsilon(1-t)}\big)\varphi(x')$.

To estimate the drift via Theorem C.1, one needs to sample from the unnormalized density pφ(x′|xt). To do this, one may use the standard Unadjusted Langevin Algorithm (ULA), also known simply as Langevin dynamics (Roberts & Tweedie, 1996).
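A minimal sketch of this MCMC estimator is given below, again assuming the parametrization φθ = exp(NNθ) from Appendix C.4, with log_phi a hypothetical placeholder for NNθ: ULA chains target pφ(x′|xt), and the chain average is plugged into (30).

```python
import torch

def ula_sample(log_phi, xt, t, eps, n_chains=100, n_steps=50, step=1e-3):
    # ULA chains targeting p_phi(x'|x_t) \propto exp(-||x'-x_t||^2 / (2*eps*(1-t))) * phi(x').
    # xt has shape (D,); log_phi maps (..., D) -> (...).
    x = xt + ((1 - t) * eps) ** 0.5 * torch.randn(n_chains, xt.shape[-1])  # init near x_t
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        log_p = log_phi(x) - (x - xt).pow(2).sum(-1) / (2 * eps * (1 - t))
        (score,) = torch.autograd.grad(log_p.sum(), x)                     # grad log p_phi
        x = x + step * score + (2 * step) ** 0.5 * torch.randn_like(x)     # Langevin update
    return x.detach()

def drift_mcmc(log_phi, xt, t, eps):
    # Theorem C.1, Eq. (30): g(x_t, t) = (E[x'] - x_t) / (1 - t).
    xs = ula_sample(log_phi, xt, t, eps)
    return (xs.mean(dim=0) - xt) / (1 - t)
```

Note that ULA itself is only asymptotically exact, so the estimator is unbiased with respect to exact samples from pφ(x′|xt); in practice the chain length and step size trade accuracy for speed.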
C.2. Loss Gradient Estimation

To optimize the objective (15), one needs to compute its gradient w.r.t. the parameters θ, which also involves ∇θgθ(xt, t). With the MC estimator for gθ(xt, t), the gradient ∇θgθ(xt, t) is trivially computed using automatic differentiation. However, with the MCMC estimator proposed in Theorem C.1, computing ∇θgθ(xt, t) is not trivial. We propose an unbiased estimator of the loss gradient based on sampling from an unnormalized density. For this, we need the following theorem.

Theorem C.2 (HardSB-M loss gradient expression). The gradient of (15) for the Schrödinger potential φθ(x) is given by:

$$\frac{1}{\epsilon}\int_0^1\!\!\int_{\mathbb{R}^D\times\mathbb{R}^D}\big(\nabla_\theta g_\theta(x_t,t)\big)^\top\Big(g_\theta(x_t,t) - \frac{x_1-x_t}{1-t}\Big)\,dp_{T_\pi}(x_t,x_1)\,dt, \qquad \nabla_\theta g_\theta(x_t,t) = \frac{1}{1-t}\nabla_\theta\mathbb{E}_{p_{\varphi_\theta}(x'|x_t)}[x'].$$

In turn, $\nabla_\theta\mathbb{E}_{p_{\varphi_\theta}(x'|x_t)}[x']$ can be computed via

$$\nabla_\theta\mathbb{E}_{p_{\varphi_\theta}(x'|x_t)}[x'] = \mathbb{E}_{p_{\varphi_\theta}(x'|x_t)}\Big[x'\,\big\{\nabla_\theta\log\varphi_\theta(x') - \mathbb{E}_{x'\sim p_{\varphi_\theta}(x'|x_t)}[\nabla_\theta\log\varphi_\theta(x')]\big\}\Big].$$

To use this theorem in practice, we first estimate gθ(xt, t) by MCMC using (30) from Theorem C.1. Then, we estimate the gradient of the objective using Theorem C.2. At both stages, the samples from pφθ(x′|xt) can be drawn, e.g., using the Unadjusted Langevin Algorithm (Roberts & Tweedie, 1996, ULA).

C.3. Inference after Model Training

There are several inference approaches, e.g., energy-based and SDE-based.

Energy-based inference. We can sample directly from the EOT plan (3) using generic MCMC samplers, similar to EgNOT (Mokrov et al., 2024). After sampling the end point x1 given the start point x0, the trajectories can be inferred using the self-similarity property (Korotin et al., 2024, §3.2) of the Brownian bridge $W^\epsilon_{|x_0,x_1}$.

SDE-based inference. Given a way to estimate the drift gθ of the Schrödinger Bridge, e.g., by the MC or MCMC approaches from Appendix C.1, one can use any SDE solver to simulate trajectories. For example, one can use the simplest and most popular Euler-Maruyama scheme (Kloeden et al., 1992, §9.2). One can also combine the two approaches by obtaining an MCMC proposal for energy-based inference via SDE simulation.

C.4. Toy 2D Experimental Illustration

We use the same setups as in §5.1 with ϵ ∈ {0.03, 0.1, 1} and provide results for MC estimation in Figure 5 and for MCMC estimation in Figure 6. The Schrödinger potential φθ(x) : R^D → R₊ is parametrized as exp(NNθ), where NNθ is an MLP. We test both the MC and MCMC approaches.

Hyperparameters. For both the MC and MCMC estimators, we use an MLP with two hidden layers of widths [256, 256] and torch.nn.SiLU activations as NNθ. During training, the Adam optimizer (Kingma & Ba, 2014) with lr = 10⁻⁴ is used, the batch size is 128, and the model is trained for 10⁵ loss gradient updates.

MC estimator. During training and inference, we use 1000 MC samples. Inference is performed with SDE simulation using 1000 Euler-Maruyama discretization steps for ϵ = 1 and 100 Euler-Maruyama discretization steps for the other values of ϵ. Due to the necessity of computing exp(NNθ), which may take very large values, we use double precision (torch.DoubleTensor) for all MC-related calculations.

MCMC estimator. During training and inference, to estimate gθ by (30), we use 100 samples drawn by the Unadjusted Langevin Algorithm (ULA) with 50 steps and step size η = 0.001. Inference is performed in two steps: first, the SDE simulation is performed with gθ estimated by ULA, and then the result is used as a proposal for energy-based sampling from the EOT plan (3). The SDE simulation uses 100 Euler-Maruyama discretization steps (the ULA settings are the same as for training), and the energy-based sampling from the EOT plan uses ULA with 1000 steps and step size η = 10⁻⁴.

Figure 5: The process Sθ learned by HardSB-M (ours) with the MC drift estimator on the Gaussian → Swiss roll example. Panels: (a) x ∼ p0, y ∼ p1; (b) ϵ = 0.03; (c) ϵ = 0.1.

Figure 6: The process Sθ learned by HardSB-M (ours) with the MCMC drift estimator on the Gaussian → Swiss roll example. Panels: (a) x ∼ p0, y ∼ p1; (b) ϵ = 0.03; (c) ϵ = 0.1.

C.5. Proofs

Proof of Theorem C.1. We denote by $Z_{x_t,(1-t)\epsilon} \stackrel{\text{def}}{=} \int_{\mathbb{R}^D}\exp\big(-\frac{\|x'-x_t\|^2}{2(1-t)\epsilon}\big)dx'$ the normalization constant of the normal distribution N(x′|xt, (1−t)ϵ I_D); by translation invariance it does not depend on xt. For a potential φ(x), the corresponding drift g(xt, t) is given by (29):

$$g(x_t,t) = \epsilon\nabla_{x_t}\log\int_{\mathbb{R}^D}\mathcal{N}(x'|x_t,(1-t)\epsilon I_D)\,\varphi(x')dx'.$$

We proceed with this equality to obtain an unbiased estimator:

$$g(x_t,t) = \epsilon\nabla_{x_t}\log\int_{\mathbb{R}^D}\frac{1}{Z_{x_t,(1-t)\epsilon}}\exp\Big(-\frac{\|x'-x_t\|^2}{2(1-t)\epsilon}\Big)\varphi(x')dx' = \epsilon\,\frac{\int_{\mathbb{R}^D}\nabla_{x_t}\big\{\exp\big(-\frac{\|x'-x_t\|^2}{2(1-t)\epsilon}\big)\big\}\varphi(x')dx'}{\int_{\mathbb{R}^D}\exp\big(-\frac{\|x'-x_t\|^2}{2(1-t)\epsilon}\big)\varphi(x')dx'} = \Big[\nabla_x f(x) = f(x)\nabla_x\log f(x)\Big] \qquad (31)$$

$$= \epsilon\,\frac{\int_{\mathbb{R}^D}\nabla_{x_t}\Big(-\frac{\|x'-x_t\|^2}{2(1-t)\epsilon}\Big)\exp\big(-\frac{\|x'-x_t\|^2}{2(1-t)\epsilon}\big)\varphi(x')dx'}{\int_{\mathbb{R}^D}\exp\big(-\frac{\|x'-x_t\|^2}{2(1-t)\epsilon}\big)\varphi(x')dx'} = \Big[p_\varphi(x'|x_t)\propto\exp\Big(-\frac{\|x'-x_t\|^2}{2\epsilon(1-t)}\Big)\varphi(x')\Big]$$

$$= \epsilon\,\mathbb{E}_{x'\sim p_\varphi(x'|x_t)}\Big[\frac{x'-x_t}{(1-t)\epsilon}\Big] = \frac{1}{1-t}\Big(\mathbb{E}_{x'\sim p_\varphi(x'|x_t)}[x'] - x_t\Big). \qquad (32)$$

In line (31), we use the log-derivative trick. ∎

Proof of Theorem C.2. We first derive

$$\nabla_\theta\,\frac{1}{2\epsilon}\int_0^1\!\!\int_{\mathbb{R}^D\times\mathbb{R}^D}\Big\|g_\theta(x_t,t)-\frac{x_1-x_t}{1-t}\Big\|^2 dp_{T_\pi}(x_t,x_1)\,dt = \frac{1}{\epsilon}\int_0^1\!\!\int_{\mathbb{R}^D\times\mathbb{R}^D}\big(\nabla_\theta g_\theta(x_t,t)\big)^\top\Big(g_\theta(x_t,t)-\frac{x_1-x_t}{1-t}\Big)dp_{T_\pi}(x_t,x_1)\,dt. \qquad (33)$$

Now we recall the result of Theorem C.1:

$$g_\theta(x_t,t) = \frac{1}{1-t}\Big(\mathbb{E}_{x'\sim p_{\varphi_\theta}(x'|x_t)}[x'] - x_t\Big).$$

We now derive ∇θgθ(xt, t). Denote $Z(x_t,\varphi_\theta)\stackrel{\text{def}}{=}\int_{\mathbb{R}^D}\exp\big(-\frac{\|x'-x_t\|^2}{2(1-t)\epsilon}\big)\varphi_\theta(x')dx'$. Then

$$\nabla_\theta g_\theta(x_t,t) = \frac{1}{1-t}\nabla_\theta\mathbb{E}_{x'\sim p_{\varphi_\theta}(x'|x_t)}[x'] = \frac{1}{1-t}\nabla_\theta\int_{\mathbb{R}^D}x'\,\frac{\exp\big(-\frac{\|x'-x_t\|^2}{2(1-t)\epsilon}\big)\varphi_\theta(x')}{Z(x_t,\varphi_\theta)}dx'$$

$$= \frac{1}{1-t}\int_{\mathbb{R}^D}x'\,\nabla_\theta\Big\{\frac{\exp\big(-\frac{\|x'-x_t\|^2}{2(1-t)\epsilon}\big)\varphi_\theta(x')}{Z(x_t,\varphi_\theta)}\Big\}dx' = \Big[\nabla_\theta f_\theta(\cdot) = f_\theta(\cdot)\nabla_\theta\log f_\theta(\cdot)\Big] \qquad (34)$$

$$= \frac{1}{1-t}\,\mathbb{E}_{x'\sim p_{\varphi_\theta}(x'|x_t)}\Big[x'\,\nabla_\theta\Big(-\frac{\|x'-x_t\|^2}{2(1-t)\epsilon}+\log\varphi_\theta(x')-\log Z(x_t,\varphi_\theta)\Big)\Big]$$

$$= \frac{1}{1-t}\,\mathbb{E}_{x'\sim p_{\varphi_\theta}(x'|x_t)}\Big[x'\,\big(\nabla_\theta\log\varphi_\theta(x')-\nabla_\theta\log Z(x_t,\varphi_\theta)\big)\Big]$$

$$= \frac{1}{1-t}\,\mathbb{E}_{x'\sim p_{\varphi_\theta}(x'|x_t)}\Big[x'\Big(\nabla_\theta\log\varphi_\theta(x')-\frac{\int_{\mathbb{R}^D}\nabla_\theta\big\{\exp\big(-\frac{\|x'-x_t\|^2}{2\epsilon(1-t)}\big)\varphi_\theta(x')\big\}dx'}{Z(x_t,\varphi_\theta)}\Big)\Big] = \Big[\nabla_\theta f_\theta(\cdot)=f_\theta(\cdot)\nabla_\theta\log f_\theta(\cdot)\Big] \qquad (35)$$

$$= \frac{1}{1-t}\,\mathbb{E}_{x'\sim p_{\varphi_\theta}(x'|x_t)}\Big[x'\Big(\nabla_\theta\log\varphi_\theta(x')-\frac{\int_{\mathbb{R}^D}\exp\big(-\frac{\|x'-x_t\|^2}{2\epsilon(1-t)}\big)\varphi_\theta(x')\,\nabla_\theta\log\varphi_\theta(x')dx'}{Z(x_t,\varphi_\theta)}\Big)\Big]$$

$$= \frac{1}{1-t}\,\mathbb{E}_{x'\sim p_{\varphi_\theta}(x'|x_t)}\Big[x'\,\big(\nabla_\theta\log\varphi_\theta(x')-\mathbb{E}_{x'\sim p_{\varphi_\theta}(x'|x_t)}[\nabla_\theta\log\varphi_\theta(x')]\big)\Big]. \qquad (36)$$

In lines (34) and (35), we use the log-derivative trick. ∎
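To illustrate how Theorems C.1 and C.2 combine into a single training update, we give a minimal sketch of one stochastic gradient step on (15). It assumes the parametrization φθ = exp(NNθ) with log_phi a hypothetical placeholder for NNθ, and repeats the ula_sample routine from the sketch in Appendix C.1 for completeness. For brevity, the same ULA samples are reused for the drift (Theorem C.1) and for the score-function term (Theorem C.2); strictly unbiased gradients would require independent chains. The θ-gradient of the scalar surrogate below reproduces the covariance form of Theorem C.2.

```python
import torch

def ula_sample(log_phi, xt, t, eps, n_chains=100, n_steps=50, step=1e-3):
    # ULA chains targeting p_phi(x'|x_t) \propto exp(-||x'-x_t||^2 / (2*eps*(1-t))) * phi(x').
    x = xt + ((1 - t) * eps) ** 0.5 * torch.randn(n_chains, xt.shape[-1])
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        log_p = log_phi(x) - (x - xt).pow(2).sum(-1) / (2 * eps * (1 - t))
        (score,) = torch.autograd.grad(log_p.sum(), x)
        x = x + step * score + (2 * step) ** 0.5 * torch.randn_like(x)
    return x.detach()

def hard_sbm_step(log_phi, opt, x0, x1, eps):
    # One stochastic gradient step on (15) for a single pair (x0, x1) ~ pi; x0, x1 have shape (D,).
    t = torch.rand(()).clamp(0.01, 0.99)                    # avoid the singular endpoints
    # Brownian bridge sample x_t | x_0, x_1 of the Wiener prior W^eps:
    xt = x0 + t * (x1 - x0) + (eps * t * (1 - t)) ** 0.5 * torch.randn_like(x0)
    xs = ula_sample(log_phi, xt, t, eps)                    # (n_chains, D), detached
    g = (xs.mean(dim=0) - xt) / (1 - t)                     # drift, Theorem C.1 / Eq. (30)
    d = g - (x1 - xt) / (1 - t)                             # residual; a constant w.r.t. theta
    w, lp = xs @ d, log_phi(xs)                             # (n_chains,) weights and log-potentials
    # Scalar surrogate whose theta-gradient matches Theorem C.2:
    # (1/(eps*(1-t))) * { E[(x'^T d) grad log phi] - E[x'^T d] * E[grad log phi] }.
    surrogate = ((w * lp).mean() - w.mean() * lp.mean()) / (eps * (1 - t))
    opt.zero_grad()
    surrogate.backward()
    opt.step()
```

In the toy experiments of Appendix C.4, such a step would be wrapped in a loop over minibatches of pairs (x0, x1) sampled from the plan π.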
Figure 7: Additional examples of image-to-image translation. (a) Man → Woman. (b) Adult → Child.

Figure 8: Image-to-image experiments with ϵ ∈ {0.01, 0.1, 1, 10}.