Published as a conference paper at ICLR 2025

# TOPOLOGICAL SCHRÖDINGER BRIDGE MATCHING

Maosheng Yang, Delft University of Technology (m.yang-2@tudelft.nl)

## ABSTRACT

Given two boundary distributions, the Schrödinger Bridge (SB) problem seeks the most likely random evolution between them with respect to a reference process. It has revealed rich connections to recent machine learning methods for generative modeling and distribution matching. While these methods perform well in Euclidean domains, they are not directly applicable to topological domains such as graphs and simplicial complexes, which are crucial for data defined over network entities, such as node signals and edge flows. In this work, we propose the Topological Schrödinger Bridge problem (TSBP) for matching signal distributions on a topological domain, where we set the reference process to follow some linear, tractable, topology-aware stochastic dynamics such as topological heat diffusion. For the case of Gaussian boundary distributions, we derive a closed-form Gaussian topological SB in terms of its time marginal and stochastic differential. In the general case, leveraging well-known results, we show that the optimal process follows forward-backward topological dynamics governed by some unknowns. Building on these results, we develop TSB-based models for matching topological signals by parameterizing the unknowns in the optimal process as (topological) neural networks and learning them through likelihood training. We validate the theoretical results and demonstrate the practical applications of TSB-based models on both synthetic and real-world networks, emphasizing the role of topology. Additionally, we discuss the connections of TSB-based models to other emerging models, and outline future directions for topological signal matching.
## 1 INTRODUCTION

As a fundamental problem in statistics and optimization, matching distributions aims to find a map that transforms one distribution into another. It has found numerous applications in machine learning tasks, particularly in generative modeling, which often involves learning a transformation from a data distribution to a simple one (often Gaussian) for efficient sampling and inference. While various methods have been proposed, including score-based [Ho et al., 2020; Song et al., 2020b] and flow-based [Lipman et al., 2022] generative models, among others [Neklyudov et al., 2023; Albergo et al., 2024; Tong et al., 2024a], the Schrödinger Bridge (SB)-based methods [De Bortoli et al., 2021; Chen et al., 2022a; Liu et al., 2024] provide a principled framework for matching arbitrary distributions. Inspired by Schrödinger [1931; 1932], the classical SB problem (SBP) aims to find an optimal stochastic process that evolves from an initial distribution to a final distribution, while minimizing the relative entropy (Kullback-Leibler divergence) between the measures of the optimal process and the Wiener process [Léonard, 2014]. Alternatively, the SBP can be cast as a stochastic optimal control (SOC) problem which minimizes the kinetic energy while matching the distributions through a nonlinear stochastic process [Dai Pra, 1991; Pavon & Wakolbinger, 1991]. The optimal solution to this problem satisfies a Schrödinger system of coupled forward-backward (FB) stochastic differential equations (SDEs) [Léonard, 2014]. Traditionally, SB problems have been solved by addressing the unknowns in this system using purely numerical methods [Fortet, 1940; Föllmer, 1988].
More recently, machine learning approaches have been proposed [Pavon et al., 2021; Vargas et al., 2021; Wang et al., 2021; De Bortoli et al., 2021; Chen et al., 2022a] where the unknowns are approximated by learnable models (e.g., Gaussian processes, neural networks) trained on data-driven objectives, such as likelihood training [Chen et al., 2022a]. However, SB-based methods have primarily focused on solving tasks in Euclidean spaces, such as time series, images [Deng et al., 2024] and point clouds. Modern learning tasks often involve data supported on irregular topological domains such as graphs, simplicial and cell complexes. Arising from applications like chemical reaction networks, biological networks, power systems and social networks [Wang et al., 2022; Faskowitz et al., 2022; Bick et al., 2023], the emerging field of topological machine learning [Papamarkou et al., 2024] centers on signals supported on topological objects such as nodes and edges, which can represent sensor data or flow-type data over network entities. A direct application of existing SB models to such topological data may fail due to their inability to account for the underlying topology. Thus, in this work, we investigate the SBP for topological signals, with a focus on node and edge signals over networks modeled as graphs and simplicial complexes. To match distributions of such topological signals, our contributions are threefold. (i) We propose the Topological Schrödinger Bridge problem (TSBP), which seeks an optimal topological stochastic process that minimizes the relative entropy with respect to a reference process, while respecting the initial and final distributions. To incorporate the domain knowledge, we define the reference process to follow topology-aware SDEs (TSDEs) with a linear topological convolution drift term, admitting tractable Gaussian transition kernels.
This subsumes the commonly used stochastic heat diffusions on graphs and simplicial complexes for networked dynamics modeling. (ii) Focusing on the case where the end distributions are Gaussian, we find the closed-form optimal Gaussian TSB and characterize it in terms of a stochastic-interpolant time marginal, as well as its Itô differential. This generalizes the results of Bunne et al. [2023], where the reference process is limited to SDEs with scalar-valued linear coefficients. For the general case, we show that, building upon existing results, the optimal TSB adheres to a pair of FB-TSDEs governed by some unknown terms (also called policies), which in turn satisfy a system driven by topological dynamics. (iii) We propose the TSB-based model for topological signal generative modeling and matching. Specifically, we parameterize the hard-to-solve policies by some (topology-aware) learnable models (e.g., graph/simplicial neural networks), and train them by maximizing the likelihood of the model based on Chen et al. [2022a]. We show that TSB-based models unify the extensions of score-based and diffusion bridge-based models [Song et al., 2020b; Zhou et al., 2024] for topological signals. We validate the theoretical results and demonstrate the practical implications of TSB-based models on synthetic and real-world networks involving brain signals, single-cell data, ocean currents, seismic events and traffic flows. Before concluding the paper, we extensively discuss future directions in generative modeling for topological data. Overall, our work lies at the intersection of SB theory, stochastic dynamics on topology, machine learning and generative modeling for topological signals.

**Notations.** We denote by $X$ a stochastic process $(X_t)_{0\le t\le 1}$, i.e., a map $X: [0,1]\times\mathcal{X}\to\mathbb{R}^n$ from the unit time interval $[0,1]$ (the index space) and sample space $\mathcal{X}$ (e.g., Euclidean space) to $\mathbb{R}^n$ (the state space). Here, $X_t$ is a random variable representing the state at time $t$.
The standard $n$-dimensional Wiener process (Brownian motion) is denoted by $W$. Let $\Omega = C([0,1], \mathbb{R}^n)$ denote the space of all continuous $\mathbb{R}^n$-valued paths on $[0,1]$, and let $\mathcal{P}(\Omega)$ denote the space of probability measures on $\Omega$. For a path measure $P\in\mathcal{P}(\Omega)$ describing the law of the process $X$, we denote by $P_t$ its time marginal that describes the distribution of $X_t$, i.e., if $X\sim P$, then $X_t\sim P_t$. We assume distributions of random variables are associated with measures that have a density with respect to the Lebesgue measure. We also assume locally Lipschitz smoothness on the drift and diffusion coefficients of SDEs.

## 2 BACKGROUND

### 2.1 SCHRÖDINGER BRIDGE PROBLEM

Let $Q_W$ be the path measure of a Wiener process $\mathrm{d}Y_t = \sigma\,\mathrm{d}W_t$ with variance $\sigma^2$. The classical SBP [Léonard, 2014] seeks an optimal path measure $P$ on $\Omega$ by minimizing its relative entropy $D_{\mathrm{KL}}$ with respect to $Q_W$:
$$\min D_{\mathrm{KL}}(P\,\|\,Q_W), \quad \text{s.t.}\ P\in\mathcal{P}(\Omega),\ P_0 = \rho_0,\ P_1 = \rho_1, \tag{SBP}$$
where $\rho_0$ and $\rho_1$ are the prescribed initial and final time marginals on $\mathbb{R}^n$. Intuitively, the SBP aims to find a stochastic process evolving from $\rho_0$ to $\rho_1$ that is most likely with respect to a reference (prior) process, here the Wiener process. This is in fact a dynamic formulation of the entropy-regularized optimal transport (OT) with a quadratic transport cost [Villani, 2009]. The static formulation reads
$$\min_{\pi\in\Pi(\rho_0,\rho_1)} \int_{\mathbb{R}^n\times\mathbb{R}^n} \tfrac{1}{2}\|x_0 - x_1\|^2\,\mathrm{d}\pi(x_0, x_1) + \sigma^2 D_{\mathrm{KL}}(\pi\,\|\,\rho_0\otimes\rho_1), \tag{E-OT}$$
where $\Pi(\rho_0,\rho_1)$ is the set of couplings between $\rho_0$ and $\rho_1$, and $\rho_0\otimes\rho_1$ denotes their product measure. In this static formulation, the optimization is over the coupling of $\rho_0$ and $\rho_1$, as opposed to the full path measure in the dynamic formulation (SBP). As $\sigma\to 0$, E-OT reduces to the classical 2-Wasserstein OT, and the associated dynamic problem is given by Benamou & Brenier [2000].

### 2.2 TOPOLOGICAL SIGNALS

In this work, we are interested in signals defined on a graph, a simplicial complex or a cell complex; we refer to such a topological domain as $\mathcal{T}$.
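On discrete supports, the static problem (E-OT) can be solved with Sinkhorn iterations on the Gibbs kernel $\exp(-C/\sigma^2)$. The following is a minimal sketch of this connection; the function name and the toy discretization are our own illustration, not part of the paper.

```python
import numpy as np

def sinkhorn(mu0, mu1, C, sigma2, n_iters=500):
    """Sinkhorn iterations for a discrete analogue of (E-OT): two discrete
    marginals mu0, mu1, cost matrix C, entropic regularization sigma2."""
    K = np.exp(-C / sigma2)              # Gibbs kernel
    u = np.ones_like(mu0)
    for _ in range(n_iters):
        v = mu1 / (K.T @ u)              # match the second marginal
        u = mu0 / (K @ v)                # match the first marginal
    return u[:, None] * K * v[None, :]   # optimal coupling pi

# toy example: quadratic cost between two sets of points on a line
x0 = np.linspace(-1.0, 1.0, 5)[:, None]
x1 = np.linspace(0.0, 2.0, 5)[:, None]
C = 0.5 * (x0 - x1.T) ** 2
pi = sinkhorn(np.full(5, 0.2), np.full(5, 0.2), C, sigma2=0.1)
```

As $\sigma^2$ shrinks, the coupling `pi` concentrates toward the 2-Wasserstein OT plan, mirroring the limit discussed above.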
If $\mathcal{T}$ is a graph with a node set and an edge set, we may define a node signal $x\in\mathbb{R}^n$ as a collection of values associated with the nodes, where $n$ denotes the number of nodes. Such signals often arise from sensor measurements in sensor networks, user properties in social networks, etc. [Shuman et al., 2013]. Similarly, if $\mathcal{T}$ is a simplicial 2-complex (SC2) with sets of nodes, edges, and triangular faces (or triangles), we can define an edge flow by associating a real value with each oriented edge. Here, for an edge $e = \{i,j\}$, if we choose $[i,j]$ as its positive orientation, then $[j,i]$ represents the opposite [Godsil & Royle, 2001]. The sign of the signal thus indicates the flow orientation relative to the chosen one. Such edge signals often represent flows of information or energy, such as water flows, power flows, or transaction flows [Bick et al., 2023]. Moreover, we may consider signals on general topological objects such as higher-order simplices (or cells). If $n$ is the number of simplices, we refer to $x\in\mathbb{R}^n$ as a topological signal where the $i$-th entry represents the signal value on the $i$-th simplex. In topology, these are called cochains, which are the discrete analogues of differential forms [Lim, 2020]. The emerging field of learning on graphs and topology [Barbarossa & Sardellitti, 2020; Papamarkou et al., 2024] concerns such topological signals, where the central idea is to leverage the underlying topological structure in $\mathcal{T}$. For example, the graph Laplacian or adjacency matrix (or their variants) can be used to encode the graph's structure, acting as a spectral operator for node signals [Chung, 1997]. Similarly, in a SC2, the Hodge Laplacian can be defined as the operator for edge flows, composed of the down and up parts, which encode the edge adjacency through a common node or triangle, respectively. Other variants of Hodge Laplacians can also be defined [Grady & Polimeni, 2010; Schaub et al., 2020].
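The operators above can be built directly from incidence matrices. The following sketch (on a hypothetical SC2: a triangle $\{0,1,2\}$ plus a tail edge $[2,3]$) constructs the graph Laplacian for node signals and the Hodge 1-Laplacian for edge flows, with its down and up parts.

```python
import numpy as np

# B1: node-to-edge incidence; edges oriented as [0,1], [1,2], [0,2], [2,3]
B1 = np.array([
    [-1,  0, -1,  0],
    [ 1, -1,  0,  0],
    [ 0,  1,  1, -1],
    [ 0,  0,  0,  1],
], dtype=float)
# B2: edge-to-triangle incidence for the single triangle [0,1,2]
B2 = np.array([[1.0], [1.0], [-1.0], [0.0]])

L0 = B1 @ B1.T            # graph Laplacian, acts on node signals
L1_down = B1.T @ B1       # edge adjacency through shared nodes
L1_up = B2 @ B2.T         # edge adjacency through shared triangles
L1 = L1_down + L1_up      # Hodge 1-Laplacian, acts on edge flows
```

Note that `B1 @ B2 = 0` ("the boundary of a boundary vanishes"), which is what makes the down/up decomposition of `L1` consistent.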
Thus, for a topological signal $x\in\mathbb{R}^n$, we assume a Laplacian-type, positive semidefinite, topological operator $L\in\mathbb{R}^{n\times n}$ on $\mathcal{T}$ which encodes the topological structure. In a probabilistic setting, a topological signal can be considered random, following some high-dimensional distribution on $\mathbb{R}^n$ associated with the topology $\mathcal{T}$. This allows for the application of probabilistic methods to topological signals, similar to the Euclidean case. Recent works [Borovitskiy et al., 2021; Yang et al., 2024; Alain et al., 2024] have modeled node signals and edge flows using Gaussian processes (GPs) on graphs and simplicial complexes. These GPs encode the topological structure by building their covariance matrix (kernel) $\Sigma$ as a matrix function of the associated Laplacian $L$. For example, a diffusion node GP uses the kernel $\Sigma = \exp(-\frac{\kappa^2}{2}L)$ with a hyperparameter $\kappa$ and the graph Laplacian $L$. Other GPs can be defined as well to model signals in specific subspaces with certain properties [Yang et al., 2024] or to jointly model node-edge signals [Alain et al., 2024].

## 3 TOPOLOGICAL SCHRÖDINGER BRIDGE PROBLEM

In a topological domain $\mathcal{T}$, we consider a topological stochastic process $X$ where the index space is instead the product of $[0,1]$ and the set of topological objects (e.g., nodes, edges) in $\mathcal{T}$, and the state space $\mathbb{R}^n$ is the space of topological signals, with $n$ the cardinality of the set. When we consider the node set in a graph (the edge set in a SC2), $X$ is a stochastic process of node signals (edge flows). We assume that $X$ follows some unknown dynamics with its law described by the path measure $P$. For some prescribed initial and final time marginals, i.e., $X_0\sim P_0 = \nu_0$ and $X_1\sim P_1 = \nu_1$, we then aim to obtain the optimal $P$ by solving the Topological Schrödinger Bridge Problem (TSBP):
$$\min D_{\mathrm{KL}}(P\,\|\,Q_{\mathcal{T}}), \quad \text{s.t.}\ P\in\mathcal{P}(\Omega),\ P_0 = \nu_0,\ P_1 = \nu_1. \tag{TSBP}$$
Here, $Q_{\mathcal{T}}$ is the path measure of a reference process $Y$ which follows some prior topology-aware stochastic dynamics on $\mathcal{T}$.
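The diffusion GP kernel above is a matrix function of $L$ and can be evaluated in the eigenbasis of $L$. A minimal sketch (the path-graph example is hypothetical) of building $\Sigma = \exp(-\frac{\kappa^2}{2}L)$ and drawing a topology-aware sample:

```python
import numpy as np

def diffusion_gp_kernel(L, kappa):
    """Diffusion GP kernel on a graph: Sigma = exp(-kappa^2/2 * L),
    computed via the eigendecomposition of the (symmetric) Laplacian L."""
    lam, U = np.linalg.eigh(L)
    return (U * np.exp(-0.5 * kappa**2 * lam)) @ U.T

# path graph on 4 nodes (hypothetical example)
A = np.diag(np.ones(3), 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A
Sigma = diffusion_gp_kernel(L, kappa=1.0)
x = np.random.default_rng(0).multivariate_normal(np.zeros(4), Sigma)
```

Larger $\kappa$ smooths the samples over the graph; $\kappa = 0$ recovers white noise ($\Sigma = I$).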
Effectively, the solution $P$ to TSBP describes the most likely process $X$ that conforms to the prior $Y$ in the sense of minimizing relative entropy with respect to $Q_{\mathcal{T}}$, while respecting the initial and final distributions $\nu_0$ and $\nu_1$.

Figure 1: Heat diffusion starting from a node over a graph (Left) and an edge over a SC2 (Right), followed by intermediate states, then reaching the steady states where the heat becomes uniform in the node case whereas it circulates around the cycle in the edge case.

**Topological stochastic dynamics.** For the reference process $Y$, given an initial topological signal condition $Y_0 = y_0$, we assume it follows a general class of topological SDEs:
$$\mathrm{d}Y_t = f(t, Y_t; L)\,\mathrm{d}t + g_t\,\mathrm{d}W_t, \tag{TSDE}$$
where $f_t \triangleq f(t,\cdot\,; L): \mathbb{R}^n\to\mathbb{R}^n$ is a time-varying drift that depends on the topological structure $\mathcal{T}$ through the operator $L$, and $g_t \triangleq g(t)\in\mathbb{R}$ is a scalar diffusion coefficient. For tractability, we consider a class of linear dynamics on $\mathcal{T}$ with the following drift term:
$$f(t, Y_t; L) = H_t(L)Y_t + \alpha_t, \quad \text{with } H_t(L) = \sum_{k=0}^{K} h_k(t)\,L^k, \tag{1}$$
and $\alpha_t\in\mathbb{R}^n$ a bias term. Here, $H_t(L)$, denoted simply as $H_t$, is a matrix polynomial of $L$ with time-varying coefficients $h_{k,t} \triangleq h_k(t)$, which is able to approximate any analytic function of $L$ for an appropriate $K$ by the Cayley-Hamilton theorem. The drift $f_t$ is also referred to as a topological convolution of the topological signal in the literature. With a graph Laplacian $L$, this returns the graph convolution of a node signal [Sandryhaila & Moura, 2013; 2014], and with the Hodge Laplacian for edges or general simplices, it yields the simplicial convolution [Yang et al., 2022b]. Various topological machine learning methods have been developed based on such convolutions for their expressivity and efficiency. We provide a few examples of linear TSDEs, which will be used later.
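Before the examples, note that any linear TSDE with drift (1) can be simulated by Euler-Maruyama. The sketch below uses hypothetical coefficient callables (`h_coeffs`, `alpha`, `g`) of our own design, not the paper's implementation.

```python
import numpy as np

def simulate_tsde(L, y0, h_coeffs, alpha, g, n_steps=200, seed=0):
    """Euler-Maruyama simulation of the linear TSDE
        dY_t = (H_t(L) Y_t + alpha_t) dt + g_t dW_t,
    with H_t(L) = sum_k h_k(t) L^k. h_coeffs(t) returns [h_0(t), ..., h_K(t)];
    alpha(t) and g(t) give the bias and the diffusion coefficient."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    y = y0.copy()
    for i in range(n_steps):
        t = i * dt
        drift = sum(h * np.linalg.matrix_power(L, k) @ y
                    for k, h in enumerate(h_coeffs(t))) + alpha(t)
        y = y + drift * dt + g(t) * np.sqrt(dt) * rng.standard_normal(y.shape)
    return y

# heat-diffusion-type dynamics on a triangle graph: H_t = -c L with c = 0.5
L = np.array([[2., -1., -1.], [-1., 2., -1.], [-1., -1., 2.]])
y1 = simulate_tsde(L, np.array([1., 0., 0.]), lambda t: [0.0, -0.5],
                   lambda t: 0.0, lambda t: 0.1)
```

With the noise switched off, the same routine reproduces the deterministic heat flow whose steady state is uniform over a connected graph.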
**Topological stochastic heat diffusion:** TSDE gives the stochastic variant of the heat equation on $\mathcal{T}$,
$$\mathrm{d}Y_t = -cLY_t\,\mathrm{d}t + g_t\,\mathrm{d}W_t, \tag{TSHeat}$$
by setting $H_t = -cL$, with $c > 0$. When $\mathcal{T}$ is a graph with the graph Laplacian $L$, this dynamics enables modeling graph-time GPs [Nikitin et al., 2022], networked dynamical systems [Pereira et al., 2010; Delvenne et al., 2015; Santos et al., 2024], and social opinion dynamics [Gaitonde et al., 2021]. More importantly, in the deterministic case of $g_t = 0$, it has a harmonic steady state, revealing the topological features of $\mathcal{T}$. Specifically, node diffusion converges to a state that can identify the connected components (0-dim holes), while edge diffusion in a SC2 based on the Hodge Laplacian has a converging state circulating around cycles (1-dim holes). We refer to Fig. 1 for such illustrations. In line with our goal of distribution matching for topological signals, we present three examples of TSHeat, inspired by diffusion models for generative modeling [Song et al., 2020b].

**Example 1 (TSHeat-BM).** Consider a constant $g_t = g$ in TSHeat. This results in a mixture of a topological heat diffusion and BM with variance $g^2$, which we refer to as TSHeat-BM.

**Example 2 (TSHeat-VE).** For some noise scales $0 < \sigma_{\min} < \sigma_{\max}$, consider a time-increasing $g_t = \sqrt{\mathrm{d}\sigma^2(t)/\mathrm{d}t}$ with $\sigma(t) = \sigma_{\min}(\sigma_{\max}/\sigma_{\min})^t$, which drives the well-known variance exploding (VE) noising process [Song & Ermon, 2020; Song et al., 2020b]. The resulting form of TSHeat is
$$\mathrm{d}Y_t = -cLY_t\,\mathrm{d}t + \sqrt{\mathrm{d}\sigma^2(t)/\mathrm{d}t}\,\mathrm{d}W_t. \tag{TSHeat-VE}$$

**Example 3 (TSHeat-VP).** When combined with another noising process, known as the variance preserving (VP) process [Sohl-Dickstein et al., 2015; Ho et al., 2020; Song et al., 2020b], we obtain
$$\mathrm{d}Y_t = -\left(\tfrac{1}{2}\beta(t)I + cL\right)Y_t\,\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}W_t, \tag{TSHeat-VP}$$
where $\beta(t) = \beta_{\min} + t(\beta_{\max} - \beta_{\min})$ with scales $0 < \beta_{\min} < \beta_{\max}$. The drift here can be considered as an instantiation of the topological convolution $H_t = -\left(\tfrac{1}{2}\beta(t)I + cL\right)$ in (1).

**Gaussian transition kernels.**
The TSDE, as an Itô process, is fully characterized by its transition kernel in a probabilistic sense. As a result of the linear drift (1) of TSDE, the associated transition kernel $p_{t|s}(y_t|y_s)$ (i.e., the conditional distribution of $Y_t|Y_s$) is Gaussian. Its mean and covariance can be computed according to Särkkä & Solin [2019, Eq. 6.7]. Let the transition matrix of the ODE $\mathrm{d}Y_t = H_t(L)Y_t\,\mathrm{d}t$ be denoted by $\Psi_{ts} \triangleq \Psi(t,s)$, which is given by $\Psi_{ts} = \exp\left(\int_s^t H_\tau\,\mathrm{d}\tau\right)$ [cf. Lemma B.5]. For brevity, we denote $\Psi_{t0}$ simply as $\Psi_t$. We then have the following lemma.

**Lemma 4 (Statistics of transition kernels).** For the TSDE with the linear drift (1), its Gaussian transition kernel $p_{t|0}(y_t|y_0)$ has the mean $m_t$ and the cross-covariance $K_{t_1 t_2}$ at times $t_1$ and $t_2$:
$$m_t = \Psi_t y_0 + \Psi_t \int_0^t \Psi_\tau^{-1}\alpha_\tau\,\mathrm{d}\tau =: \Psi_t y_0 + \xi_t, \tag{cond. mean}$$
$$K_{t_1 t_2} = \Psi_{t_1}\left(\int_0^{\min\{t_1,t_2\}} g_\tau^2\,\Psi_\tau^{-2}\,\mathrm{d}\tau\right)\Psi_{t_2}^\top. \tag{cond. cross cov}$$

More importantly, we may characterize them for TSHeat-BM and TSHeat-VE in closed form. Both have the same mean $m_t = \Psi_t y_0$ with $\Psi_t = \exp(-cLt)$, and their covariances are given by
$$K_{t_1 t_2} = \begin{cases} \dfrac{g^2}{2c}\left[\exp(-cL|t_1 - t_2|) - \exp(-cL(t_1 + t_2))\right]L^{-1}, & \text{for TSHeat-BM} \\[2mm] \sigma_{\min}^2 \ln\dfrac{\sigma_{\max}}{\sigma_{\min}}\,\exp(-cL(t_1 + t_2))\left[\exp(2A\min\{t_1,t_2\}) - I\right]A^{-1}, & \text{for TSHeat-VE} \end{cases} \tag{2}$$
with $A = \ln\frac{\sigma_{\max}}{\sigma_{\min}}I + cL$. If $L$ is singular, we use a perturbed $L + \epsilon I$ for a small $\epsilon > 0$. We detail the derivations in Appendix B. These expressions allow for tractable solutions of the TSBP and, more importantly, facilitate the construction of TSB-based learning models, which we will discuss later.

### 3.1 TOWARDS AN OPTIMAL SOLUTION OF TSBP

To solve the classical SBP, early mathematical treatments [Fortet, 1940; Beurling, 1960; Jamison, 1975; Föllmer, 1988] lead to a Schrödinger system characterizing the SB optimality. Similarly, by disintegration of measures, we can convert the TSBP to a static problem over the joint measure $P_{01}$ of the initial and final states, instead of the full path measure $P$: $\min D_{\mathrm{KL}}(P_{01}\,\|\,Q_{\mathcal{T},01})$, s.t.
$P_{01}\in\mathcal{P}(\mathbb{R}^n\times\mathbb{R}^n)$, $P_0 = \nu_0$, $P_1 = \nu_1$ (TSBP-static), where $Q_{\mathcal{T},01}$ is the joint measure of the reference process $Y$ at $t = 0$ and $1$. The TSBP-static only concerns the boundary times, unlike the (dynamic) TSBP. Using Lagrange multipliers for the linear constraints above, we can arrive at a Schrödinger system that is instead driven by topological dynamics (see Appendix C), differing from the classical case [Jamison, 1975; Léonard, 2014]. This can also be interpreted through the equivalent E-OT formulation of TSBP-static:
$$\min \int_{\mathbb{R}^n\times\mathbb{R}^n} \tfrac{1}{2}\,\|y_1 - \Psi_1 y_0 - \xi_1\|_{K_{11}^{-1}}^2\,\mathrm{d}P_{01}(y_0, y_1) + \int_{\mathbb{R}^n\times\mathbb{R}^n} \log(P_{01})\,\mathrm{d}P_{01}, \tag{T-E-OT}$$
where the transport cost is linked to the TSDE as a $K_{11}^{-1}$-weighted norm of the difference $y_1 - m_1$. On the other hand, this system can also be derived from the SOC view, which makes more apparent connections to machine learning methods. Building on the variational formulations of the classical SBP by Dai Pra [1991]; Pavon & Wakolbinger [1991], Caluya & Halder [2021] extended the analysis to the case of a general nonlinear reference process and derived the corresponding optimality condition. As it is convenient to arrive at an SOC formulation for the TSBP (see Appendix C), we readily obtain the following optimality.

**Proposition 5 (TSBP optimality; Caluya & Halder [2021]; Chen et al. [2022a]).** The optimal solution $P$ of TSBP can be expressed as the path measure of the following forward (3a), or equivalently, backward (3b), TSDE:
$$\mathrm{d}X_t = [f_t + g_t Z_t]\,\mathrm{d}t + g_t\,\mathrm{d}W_t, \quad X_0\sim\nu_0, \quad Z_t \triangleq g_t\nabla\log\varphi_t(X_t), \tag{3a}$$
$$\mathrm{d}X_t = [f_t - g_t\hat{Z}_t]\,\mathrm{d}t + g_t\,\mathrm{d}W_t, \quad X_1\sim\nu_1, \quad \hat{Z}_t \triangleq g_t\nabla\log\hat{\varphi}_t(X_t), \tag{3b}$$
where (3a) runs forward and (3b) runs backward with a backward Wiener process. Here, $\varphi_t \triangleq \varphi_t(X_t)$ and $\hat{\varphi}_t \triangleq \hat{\varphi}_t(X_t)$ satisfy a pair of PDEs (forward-backward Kolmogorov equations).
Using the nonlinear Feynman-Kac formula (or applying Itô's formula to $\log\varphi_t$ and $\log\hat{\varphi}_t$), this PDE system admits the SDEs
$$\mathrm{d}\log\varphi_t = \tfrac{1}{2}\|Z_t\|^2\,\mathrm{d}t + Z_t^\top\mathrm{d}W_t, \quad \mathrm{d}\log\hat{\varphi}_t = \left[\tfrac{1}{2}\|\hat{Z}_t\|^2 + \nabla\cdot(g_t\hat{Z}_t - f_t) + \hat{Z}_t^\top Z_t\right]\mathrm{d}t + \hat{Z}_t^\top\mathrm{d}W_t. \tag{4}$$
Then, the optimal path measure has the time-marginal density $p_t(X_t) = \varphi_t(X_t)\,\hat{\varphi}_t(X_t)$, i.e., $P_t = P_t^{(3a)} = P_t^{(3b)}$.

This optimality condition adapts the result from Chen et al. [2022a] to the TSBP. From the forward-backward TSDEs (FB-TSDEs in (3)), we see that the optimal $Z_t$ guides the forward TSDE to the final $\nu_1$, and likewise $\hat{Z}_t$ adjusts the reverse TSDE to return to the initial $\nu_0$. While solving the system (4) is still highly nontrivial, we highlight that the FB-TSDEs (3) and (4) pave the way for constructing generative models and efficient training algorithms, as demonstrated by recent works, to name a few, [Pavon et al., 2021; Vargas et al., 2021; De Bortoli et al., 2021; Chen et al., 2022a]. We further discuss in detail how to build such models for topological signals in Section 5.

## 4 GAUSSIAN TOPOLOGICAL SBP

In this section, we consider the special case of TSBP where the initial and final measures are Gaussians, to which we refer as the Gaussian topological SBP (GTSBP). We show that there exists a closed-form GTSB by following the idea of Bunne et al. [2023], which focuses on a limited class of reference SDEs with a scalar coefficient in the drift, instead of a convolution operator $H_t(L)$. We establish the first closed-form expression for the GTSB in the following theorem.

**Theorem 6.** Denote by $P$ the solution to GTSBP with $\nu_0 = \mathcal{N}(\mu_0, \Sigma_0)$ and $\nu_1 = \mathcal{N}(\mu_1, \Sigma_1)$. Then, $P$ is the path measure of a Markov Gaussian process whose marginal $X_t\sim\mathcal{N}(\mu_t, \Sigma_t)$ admits an expression in terms of the initial and final variables, $X_0, X_1$, as follows:
$$X_t = \bar{R}_t X_0 + R_t X_1 + \xi_t - R_t\xi_1 + \Gamma_t^{1/2} Z, \tag{5}$$
where $Z\sim\mathcal{N}(0, I)$ is standard Gaussian, independent of $(X_0, X_1)$, and
$$R_t = K_{t1}K_{11}^{-1}, \quad \bar{R}_t = \Psi_t - R_t\Psi_1, \quad \Gamma_t := \mathrm{Cov}[Y_t|(Y_0, Y_1)] = K_{tt} - K_{t1}K_{11}^{-1}K_{1t}. \tag{6}$$

*Proof.*
We provide a sketch of the proof here, with the full derivations presented in Appendix D. 1. By disintegration of measures, we first solve the reduced static Gaussian TSBP-static (i.e., T-E-OT). We can then convert the problem into a classical Gaussian E-OT via a change of variables. The closed-form formula for the latter has been recently found by Janati et al. [2020, Theorem 1]. Via an inverse transform, we can then obtain the optimal coupling $P_{01}$ [i.e., the optimal GTSB-static]. 2. In the disintegration of GTSBP to its static problem, the optimum is achieved when $P$ shares the bridge with the reference $Q_{\mathcal{T}}$ (i.e., $P$ is in the reciprocal class of $Q_{\mathcal{T}}$) [Föllmer, 1988; Léonard, 2014, Proposition 1]. The $Q_{\mathcal{T}}$-bridge, $Q_{\mathcal{T}}^{xy} = Q_{\mathcal{T}}[\,\cdot\,|Y_0 = x, Y_1 = y]$, can be constructed using the conditional Gaussian formula and Lemma 4. Upon this, together with the optimal $P_{01}$, we can construct the optimal $X_t$ and the marginal $P_t$.

At first sight, the construction of the optimal process $X$ in (5) fits the recently proposed stochastic interpolant framework of Albergo et al. [2024, Definition 1], in that $X_{t=0} = X_0$ and $X_{t=1} = X_1$, and $\Gamma_0 = \Gamma_1 = 0$. Moreover, from (5), we can compute the marginal statistics $P_t$ in terms of its mean $\mu_t$ and covariance $\Sigma_t$ in closed form as well, as detailed in Corollary D.1. In the following, we characterize the process $X$ under the optimal $P$ in terms of its Itô differential.

**Theorem 7 (SDE representation).** Under the optimal $P$, the process $X$ admits the SDE dynamics:
$$\mathrm{d}X_t = f^{\mathcal{T}}(t, X_t; L)\,\mathrm{d}t + g_t\,\mathrm{d}W_t, \quad \text{where } f^{\mathcal{T}}(t, x; L) = S_t^\top\Sigma_t^{-1}(x - \mu_t) + \dot{\mu}_t, \tag{7}$$
with $\mu_t, \Sigma_t$ the mean and covariance of $X_t$ [cf. Corollary D.1], and we have
$$S_t = \dot{P}_t + \dot{Q}_t + H_t K_{tt} - K_{t1}K_{11}^{-1}\Upsilon_t^\top, \tag{8}$$
with $P_t = (R_t\Sigma_1 + \bar{R}_t C)R_t^\top$, $Q_t = \bar{R}_t(CR_t^\top + \Sigma_0\bar{R}_t^\top)$, $\Upsilon_t = H_t K_{t1} + g_t^2\Psi_t^{-1}\Psi_1$, where $C$ is the covariance of $X_0, X_1$ under the optimal $P_{01}$.

*Proof.* We detail the proof in Appendix D and outline a sketch here. From Léonard [2014]; Caluya & Halder [2021] [cf.
Theorem C.2], the optimal $P$ is the law of an SDE in the class of (7). To determine the drift, we first compute the associated infinitesimal generator by definition for some test function. Since the generator of an Itô SDE is known (dependent on the drift) [Särkkä & Solin, 2019, Eq. 5.9], we can then match the two expressions and find a closed form for the drift term.

Theorems 6 and 7 characterize the optimal $P$ of the GTSBP from different views. While the stochastic interpolant formula is intuitive and straightforward, it is natural to look for the associated SDE for a Markov measure. Despite the packed variables, both results [cf. Eqs. (5) and (7)] fundamentally depend on the transition matrix $\Psi_t$ and $\xi_t, K_{t_1t_2}$ [cf. Lemma 4], which have closed forms (2) for TSHeat-BM and TSHeat-VE. From a broader perspective, Theorems 6 and 7 extend the existing results of Bunne et al. [2023], where the reference process has a limited drift $cY_t + \alpha_t$ for some scalar $c$. While Chen et al. [2016] aimed to solve for a linear drift with a matrix coefficient, their results lead to the solution of a matrix Riccati equation, which is computationally expensive.

**Solution complexity.** While Eqs. (5) and (7) involve many matrix operations, we remark that (i) the underlying $\Psi_t$ is a matrix function of $L$ and can be computed efficiently [Higham, 2008]. Given the eigendecomposition $L = U\Lambda U^\top$, denote by $h_k^{t,s} = \int_s^t h_k(\tau)\,\mathrm{d}\tau$ the integral of the scalar coefficients in $H_t$ [cf. (1)]; then we have $\Psi_{ts} = U\exp\left(\sum_{k=0}^{K} h_k^{t,s}\Lambda^k\right)U^\top$, where the matrix exponential can be computed elementwise on each diagonal element of $\Lambda$. (ii) The other terms depending on $\Psi_t$ can be computed similarly in the eigenspectrum of $L$.
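Both remarks reduce to elementwise operations on the eigenvalues of $L$. The sketch below (our own illustration, assuming a symmetric $L$, with `eps` handling a singular Laplacian) computes the transition matrix from integrated coefficients and the TSHeat-BM cross-covariance in (2).

```python
import numpy as np

def transition_matrix(L, h_int):
    """Psi_{ts} = U exp(sum_k h_k^{t,s} Lambda^k) U^T, where h_int[k] is the
    integrated coefficient h_k^{t,s} = int_s^t h_k(tau) d tau."""
    lam, U = np.linalg.eigh(L)
    return (U * np.exp(sum(h * lam**k for k, h in enumerate(h_int)))) @ U.T

def tsheat_bm_cov(L, c, g, t1, t2, eps=1e-6):
    """TSHeat-BM cross-covariance
    K_{t1 t2} = g^2/(2c) [exp(-cL|t1-t2|) - exp(-cL(t1+t2))] L^{-1},
    evaluated on the perturbed L + eps*I when L is singular."""
    lam, U = np.linalg.eigh(L + eps * np.eye(L.shape[0]))
    k = g**2 / (2 * c) * (np.exp(-c * lam * abs(t1 - t2))
                          - np.exp(-c * lam * (t1 + t2))) / lam
    return (U * k) @ U.T
```

For TSHeat, $H_t = -cL$, so `h_int = [0.0, -c * (t - s)]` reproduces $\Psi_{ts} = \exp(-cL(t-s))$.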
## 5 FROM TSBP TO TOPOLOGICAL SIGNAL GENERATIVE MODELS

The recent SB-based generative modeling framework primarily relies on learnable parameterizations of the $(Z_t, \hat{Z}_t)$ pair (also viewed as the FB policies) in the FB-SDEs and a trainable objective that approximates the SBP. Specifically, Vargas et al. [2021] and De Bortoli et al. [2021] use GPs and neural networks, respectively, to parameterize the policies, and alternately train them using iterative proportional fitting (IPF) to solve the half-bridge problem. On the other hand, Chen et al. [2022a] derived a likelihood based on the SB optimality condition, generalizing the score matching framework [Song et al., 2020b]. Upon the proposed TSBP, along with the above theoretical results, we now discuss how to build generative models for topological signals using the existing framework designed for Euclidean domains.

**TSB-based model.** Consider the matching task: in some topological domain $\mathcal{T}$, given two sets of signal samples following initial and final distributions $\nu_0, \nu_1$ on $\mathcal{T}$, we aim to learn a Topological Schrödinger Bridge between the two distributions. From Proposition 5, the optimal TSB follows the FB-TSDEs in (3). Moreover, given a path sampled from the forward SDE (3a) with an initial signal $x_0$, one can obtain an unbiased estimate of the log-likelihood $\mathcal{L}(x_0)$ of the TSB model driven by the optimal policies by using (4) [Chen et al., 2022a, Theorem 4]. Similarly, the log-likelihood $\mathcal{L}(x_1)$, given a final sample $x_1$, can be found. This allows us to build a TSB-based model for topological signals, following the ideas of De Bortoli et al. [2021]; Chen et al. [2022a]. We first parameterize the policies, $Z_t$ and $\hat{Z}_t$, by two learnable models $Z_t^\theta \triangleq Z(t, x; \theta)$ and $\hat{Z}_t^{\hat{\theta}} \triangleq \hat{Z}(t, x; \hat{\theta})$ with parameters $\theta$ and $\hat{\theta}$, resulting in the parameterized FB-TSDEs.
Then, we can perform likelihood training by minimizing the following loss functions in an alternating fashion at initial and final signal samples $x_0$ and $x_1$:
$$\ell(x_0; \hat{\theta}) = \mathbb{E}\left[\int_0^1 \left(\tfrac{1}{2}\|\hat{Z}_t^{\hat{\theta}}\|^2 + \nabla\cdot(g_t\hat{Z}_t^{\hat{\theta}}) + (Z_t^\theta)^\top\hat{Z}_t^{\hat{\theta}}\right)\mathrm{d}t \,\Big|\, X_0 = x_0\right],$$
$$\ell(x_1; \theta) = \mathbb{E}\left[\int_0^1 \left(\tfrac{1}{2}\|Z_t^\theta\|^2 + \nabla\cdot(g_t Z_t^\theta) + (\hat{Z}_t^{\hat{\theta}})^\top Z_t^\theta\right)\mathrm{d}t \,\Big|\, X_1 = x_1\right],$$
which are, respectively, upper bounds of the negative log-likelihoods (after dropping unrelated terms) of the signal samples $x_0$ and $x_1$ given paths sampled from the FB-TSDEs.

**Other choices of reference TSDE.** In this work, we mainly consider reference dynamics following TSHeat-BM, TSHeat-VE and TSHeat-VP. For dynamics involving Hodge Laplacians in a SC2, we may further allow heterogeneous diffusion based on the down and up parts of the Laplacian. Bunne et al. [2023] proposed to better initialize the SB model using the closed-form Gaussian SB. Likewise, we can consider the GTSB in (7) as a stronger prior process, which however requires a GP approximation from signal samples. We also consider a fractional Laplacian in some cases to enable a more efficient exploration of the network [Riascos & Mateos, 2014] due to its non-local nature.

**Topological neural networks (TNNs).** While De Bortoli et al. [2021]; Chen et al. [2022a] applied convolutional neural networks for the Euclidean SBP, we naturally consider parameterizing the policies using the emergent TNNs. For node signals, we could consider graph convolutional networks (GCNs) [Kipf & Welling, 2017]; and likewise, for edge flows in a SC2, simplicial neural networks (SNNs) [Roddenberry et al., 2021]. These topology-aware models perform convolutional learning upon the topological structure, and are more efficient with fewer parameters and better in performance.

**Complexity.** Like standard SB models, TSB-based models also require simulations of the FB-TSDEs. The key difference is that these models operate over topological networks where the drift (1) involves a matrix-vector multiplication $H_t Y_t$.
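The path-wise loss in such likelihood training can be sketched as a Monte-Carlo estimate along a simulated forward FB-TSDE (3a). The routine below is our own illustration with a TSHeat drift, hypothetical policy callables, and an analytic divergence supplied by the caller; it is not the paper's implementation.

```python
import numpy as np

def path_loss(x0, L, c, g, Z, Z_hat, div_gZ_hat, n_steps=100, seed=0):
    """Estimate l(x0; theta_hat): simulate the policy-driven forward TSDE
        dX = [-cL X + g Z] dt + g dW
    and accumulate 0.5 ||Z_hat||^2 + div(g Z_hat) + Z^T Z_hat over the path.
    Z, Z_hat: policies (t, x) -> R^n; div_gZ_hat: (t, x) -> divergence of g*Z_hat."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x, loss = x0.copy(), 0.0
    for i in range(n_steps):
        t = i * dt
        z, zh = Z(t, x), Z_hat(t, x)
        loss += (0.5 * zh @ zh + div_gZ_hat(t, x) + z @ zh) * dt
        x = (x + (-c * (L @ x) + g * z) * dt
             + g * np.sqrt(dt) * rng.standard_normal(x.shape))
    return loss
```

For instance, for a linear policy $\hat{Z}(t,x) = Wx$, the divergence term is simply $g\,\mathrm{tr}(W)$, which is what `div_gZ_hat` would return.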
However, the product $H_t Y_t$ is essentially a recursive iteration of $LY_t$, which is efficient due to the typically sparse structure of $L$, reflecting the underlying topological sparsity. Moreover, our TNN-parameterized policies are efficient for the same reason.

**Connection to other models.** As discussed in Chen et al. [2022a], in the special case of $Z_t \equiv 0$ and $\hat{Z}_t$ as the score function (scaled by $g_t$), the likelihood of SB models reduces to that of score-based models [Song et al., 2020b] when $\nu_1$ is a simple Gaussian and the forward process is designed to reach $\nu_1$. Furthermore, even if the reference process is poorly designed, SB models can still guide the process to the target distribution through the learnable policies, thus generalizing score-based models. On the other hand, for FB-TSDEs, we can also obtain probability flow ODEs [Chen et al., 2018; Song et al., 2020b] which share the same time marginals and likelihoods, allowing for exact likelihood evaluation of the model. Training through the likelihood of these flow ODEs naturally links to flow-based models. While there are no direct score-based or flow-based models for topological signals, the above discussions apply to TSB-based models. We refer to Appendix E for more details, where we show how variants of these models, including the diffusion bridge models [Zhou et al., 2024], can be constructed for topological processes.

## 6 EXPERIMENTS

First, we validate the theoretical results on GTSB using the synthetic graph in Fig. 1. Here, we aim to bridge a zero-mean graph Matérn GP $\nu_0$ with $\Sigma_0 = (I + L)^{-1.5}$ and a diffusion GP $\nu_1$ with $\Sigma_1 = \exp(-20L)$. Using the TSHeat-BM and TSHeat-VE reference dynamics, we obtain the closed-form $X_t$ in (5), from which we further compute the covariance $\Sigma_t$ [cf. Corollary D.1]. We measure the Bures-Wasserstein distance between $\Sigma_t$ and $\Sigma_1$. From the right-hand-side figure (curves labeled GTSB-BM, GTSB-VE1, GTSB-VE2), we see that both bridges reach the target distribution.
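The Bures-Wasserstein distance used here depends only on the two covariance matrices. A sketch for symmetric PSD inputs (the helper name is our own):

```python
import numpy as np

def bures_wasserstein(S0, S1):
    """Bures-Wasserstein distance between zero-mean Gaussians N(0, S0), N(0, S1):
    BW^2 = tr(S0) + tr(S1) - 2 tr((S0^{1/2} S1 S0^{1/2})^{1/2})."""
    lam, U = np.linalg.eigh(S0)
    root = (U * np.sqrt(np.clip(lam, 0, None))) @ U.T   # S0^{1/2}
    M = root @ S1 @ root
    mu = np.clip(np.linalg.eigvalsh(M), 0, None)        # eigenvalues of M
    bw2 = np.trace(S0) + np.trace(S1) - 2.0 * np.sum(np.sqrt(mu))
    return np.sqrt(max(bw2, 0.0))
```

The distance vanishes exactly when the two covariances coincide, which is why it is a natural convergence diagnostic for the bridge marginals.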
The bridges exhibit distinct behaviors depending on the reference dynamics, as demonstrated by the disparate curves for TSHeatVE with $c = 0.01$ and $10$. This highlights the flexibility of TSB models in exploring a large space of topological bridges.

We then focus on evaluating TSB-based models for topological signal generation ($\nu_1$ is a Gaussian noise) and matching (general $\nu_1$) in different applications, with the goal of investigating whether TSB-based models are beneficial for these tasks compared to standard SB-based models. To this end, we consider as baselines the SB-based models in Euclidean domains that use BM, VE and VP as reference dynamics [Chen et al., 2022a], labeled SB-BM, SB-VE and SB-VP, respectively. We consider TSB-based models using TSHeatBM, TSHeatVE and TSHeatVP as references, labeled TSB-BM, TSB-VE and TSB-VP, respectively. We refer to Appendix F for the experimental details and additional results, as well as complexity analyses in Appendix F.2.5.

Topological signal matching. We first consider matching two sets of fMRI brain signals from the Human Connectome Project [Van Essen et al., 2013], which represent the liberal (with high energy, as the initial) and aligned (with low energy, as the final) brain activities, respectively. We use the recommended brain graph [Glasser et al., 2016] that connects 360 parcelled brain regions with edge weights denoting the connection strength. From Fig. 2, we see that the TSB-VE model learns to reach a final state with low energy, indicating the aligned activity, whereas SB-VE fails. We then consider the single-cell embryoid body data that describes cell differentiation over 5 timepoints [Moon et al., 2019]. We follow the preprocessing from Tong [2023]; Tong et al. [2024a;b]. We aim to transport the initial observations to the final state. Our method relies on the affinity graph constructed from the entire set of observations (∼18k).
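The Bures–Wasserstein distance used above to compare covariance marginals has a closed form for zero-mean Gaussians: $d^2(\Sigma_0,\Sigma_1)=\operatorname{tr}\Sigma_0+\operatorname{tr}\Sigma_1-2\operatorname{tr}\big((\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2})^{1/2}\big)$. A sketch with scipy, building the two covariances as in the synthetic setup (a toy 3-node path graph and diffusion scale $c=2$ here, purely illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm, expm, fractional_matrix_power

def bures_wasserstein(S0, S1):
    """Bures-Wasserstein distance between zero-mean Gaussians N(0, S0), N(0, S1)."""
    root = sqrtm(S0)
    cross = sqrtm(root @ S1 @ root)
    d2 = np.trace(S0) + np.trace(S1) - 2 * np.trace(cross)
    return np.sqrt(max(d2.real, 0.0))  # clip tiny negative numerical noise

# toy path-graph Laplacian on 3 nodes
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
L = np.diag(A.sum(1)) - A
S0 = fractional_matrix_power(np.eye(3) + L, -1.5)  # Matern-type GP covariance
S1 = expm(-2.0 * L)                                # diffusion GP covariance

d = bures_wasserstein(S0, S1)
```

Since both covariances are functions of the same $L$, they commute, and the distance reduces to the Frobenius distance between their matrix square roots; the general formula above does not rely on this.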
We define two normalized indicator functions as the boundary distributions, which specify the nodes corresponding to the data observed at the first and last timepoints. Fig. 3 shows the two-dimensional PHATE embeddings of the ground-truth and predicted data points using the TSB-BM and SB-BM models. Here, SB-BM gives very noisy predictions, especially for the intermediate ones, even when trained on the full dataset (see Table F.7).

Figure 2: Energies of the true final state (Left) and the predictions obtained from the TSB-VE (Center) and SB-VE (Right) models.

Figure 3: PHATE embeddings of the single-cell data observations (Left) and predictions based on the TSB-BM (Center) and SB-BM (Right) models.

Table 1: 1-Wasserstein distances for generating and matching tasks across datasets over five runs (for ocean currents, the VE variants use GSB-VE and GTSB-VE).

Method   | Seismic magnitudes | Traffic flows  | Brain signals  | Single-cell data | Ocean currents
SB-BM    | 11.73 ± 0.05       | 18.69 ± 0.02   | 12.08 ± 0.08   | 0.33 ± 0.01      | 7.21 ± 0.00
SB-VE    | 11.49 ± 0.04       | 19.04 ± 0.02   | 17.46 ± 0.14   | 0.33 ± 0.01      | 7.17 ± 0.02
SB-VP    | 12.61 ± 0.06       | 18.22 ± 0.03   | 13.41 ± 0.05   | 0.33 ± 0.01      | 0.83 ± 0.01
TSB-BM   | 9.01 ± 0.03        | 10.57 ± 0.02   | 7.51 ± 0.08    | 0.14 ± 0.03      | 6.94 ± 0.01
TSB-VE   | 7.69 ± 0.04        | 10.51 ± 0.02   | 7.59 ± 0.05    | 0.14 ± 0.02      | 6.89 ± 0.00
TSB-VP   | 8.40 ± 0.04        | 9.92 ± 0.02    | 7.67 ± 0.11    | 0.14 ± 0.01      | 0.53 ± 0.00

Edge flows have been used to model vector fields upon a discrete Hodge Laplacian estimate of the manifold Helmholtzian [Chen et al., 2021b]. Following the setup there, we consider the edge-based ocean current matching in an SC2 (∼20k edges). With an edge GP, learned by Yang et al. [2024] from drifter data, as the initial distribution modeling the currents, we synthesize a curl-free edge GP as the final one, modeling a different behavior of the currents. From Fig. 4, we see that SB-BM fails to reach the final curl-free state, while TSB-BM becomes more divergent, ultimately closer to the target.
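Curl-freeness of an edge flow in an SC2 can be checked through the edge-triangle incidence matrix $B_2$: the curl component of a flow $f$ is $B_2^\top f$, and gradient flows $B_1^\top x$ are automatically curl-free since $B_1 B_2 = 0$. A toy sketch on a single filled triangle (the incidence signs follow one common orientation convention; function names are ours):

```python
import numpy as np

# filled triangle on nodes {0,1,2}: edges (0,1),(0,2),(1,2); one 2-simplex (0,1,2)
B1 = np.array([[-1, -1,  0],
               [ 1,  0, -1],
               [ 0,  1,  1]], dtype=float)     # node-edge incidence
B2 = np.array([[1], [-1], [1]], dtype=float)   # edge-triangle incidence

def curl(f):
    """Curl component of an edge flow f."""
    return B2.T @ f

def is_curl_free(f, tol=1e-10):
    return np.linalg.norm(curl(f)) < tol

grad_flow = B1.T @ np.array([0.0, 1.0, 3.0])  # gradient of a node potential: curl-free
rot_flow = B2 @ np.array([1.0])               # purely rotational flow: nonzero curl
```

On the ∼20k-edge ocean SC2, the same check applies with sparse incidence matrices, which is how one would verify that the synthesized final GP samples are indeed curl-free.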
For these matching tasks, we evaluate the forward final predictions using 1-Wasserstein distances in Table 1, showing the consistent superiority of TSB-based models over the SB ones. We attribute this difference to the improper reference in SB-based models, which overlooks the underlying topology. This highlights the role of topology in TSB-based models for these tasks.

Generative modeling. We model the magnitudes of yearly seismic events from IRIS as node signals on a mesh graph of 576 nodes, built from the geodesic distances between the vertices of an icosahedral triangulation of the earth's surface [Moresi & Mather, 2019]. We also consider the traffic flow from the PeMSD4 dataset, modeled as edge flows on an SC2 with 340 edges [Chen et al., 2022b]. From Table 1, we see that TSB-based models consistently outperform SB-based models also on the signal generation tasks, highlighting the importance of topology-aware reference processes.

Effect of policy models. From the training curves of TSB-BM and TSB-VE over the training stages with ResBlock and GCN policy models (shown on the right), we see that training converges much faster and to a better level with GCN than with ResBlock. This underlines the positive effect of TNNs on topological signal generative modeling. We refer to Tables F.2 and F.3 for the performance metrics of the other bridge models with the two policy parameterizations on both the seismic and traffic datasets.

Effect of GTSB prior. Instead of using TSHeatBM or TSHeatVE as the reference, we here consider their corresponding closed-form SDEs (7) as the reference, imposing on the bridges a stronger prior carrying the moment information of the data samples [Bunne et al., 2023]. For ocean current matching, we show the samples from the learned FB-TSDEs using GTSB-BM in Fig. 4, which arrive at a more faithful final state compared to TSB-BM, as also evaluated in Table 1.
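The 1-Wasserstein metric used in Table 1 has a particularly simple empirical form in one dimension, where the optimal coupling sorts both samples and pairs them in order; the multivariate case requires an OT solver, so the sketch below (with our own function name) is a simplified 1-D illustration rather than the paper's exact evaluation protocol:

```python
import numpy as np

def w1_empirical_1d(x, y):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples:
    in 1-D the optimal coupling matches sorted samples in order."""
    x, y = np.sort(np.asarray(x)), np.sort(np.asarray(y))
    assert x.shape == y.shape
    return np.abs(x - y).mean()

rng = np.random.default_rng(0)
a = rng.normal(0.0, 1.0, 5000)
b = rng.normal(0.5, 1.0, 5000)
# for equal-variance Gaussians the population W1 equals the mean shift, here 0.5
d = w1_empirical_1d(a, b)
```

For equal-size samples this coincides with the optimal-assignment cost under the absolute-value ground metric.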
Figure 4: Forward sampled currents using TSB-BM (Top), SB-BM (Center) and GTSB-BM (Bottom).

7 DISCUSSION AND CONCLUSION

In this work, we demonstrated how to construct TSB-based topological signal matching models within the likelihood training framework [Chen et al., 2022a]. We here discuss a few promising future directions based on emerging work and unexplored theoretical results.

On model training. Peluchetti [2023]; Shi et al. [2023] applied iterative Markovian fitting (IMF), as an alternative to IPF, to the classical SBP. This algorithm, trained via score matching, extends to TSB models with TSHeatBM or TSHeatVE as the reference, thanks to their closed-form transition kernels in (2). Recent work has proposed (partially) simulation-free training of SB models. Tong et al. [2024b] learn the optimal SB by flow and score matching of the forward SDE based on a heuristic EOT coupling. Korotin et al. [2024]; Gushchin et al. [2024] modeled the Schrödinger potentials using Gaussian mixtures, enabling lightweight training of the optimal drift, and Deng et al. [2024] linearized the forward policy. However, these methods require the reference dynamics to be either a Wiener process or to have scalar drifts. While training TSB-based models remains scalable w.r.t. the topology size (see Appendix F.2.5), extending these approaches to our models is worthwhile but nontrivial.

On model improving. We focused on reference dynamics driven by a topological convolution $H_t$ up to order one. It is, however, worthwhile to consider more involved (potentially learnable) convolutions to impose more general priors or to incorporate physics knowledge of the process. The scalar diffusion coefficient $g_t$ could also be extended to be matrix-valued, enabling spatially correlated noising processes over the topology. On the other hand, SB models perform a kinetic energy minimization from the SOC view.
Liu et al. [2024] considered a generalized SBP by adding a cost term that can encode other knowledge of the process. This broadens the applicability of SB models, and TSB models could benefit from it when there are external interactions with the topological process or prior knowledge on the process, such as enforcing curl-free edge flows.

On other models. While we showed the connections of TSB to stochastic interpolants, flow- and score-based models, as well as diffusion bridges, we note that the TSB optimality can be interpreted as a Wasserstein gradient flow (WGF) (see Appendix C.2). For example, the TSHeatBM-driven TSB is the WGF of a functional $\mathcal{F}(\nu)$ of a measure $\nu$ with
$$\mathcal{F}(\nu) = c \int \tfrac{1}{2} x^\top L x \,\nu(x)\,\mathrm{d}x + \tfrac{1}{2} g^2 \int \nu \log \nu \,\mathrm{d}x,$$
where $D(x) := \tfrac{1}{2} x^\top L x$ is the Dirichlet energy of $x$ and the second term is the negative entropy. Thus, the TSHeatBM-driven forward TSB essentially reduces the Dirichlet energy. This may not always align with the needs of real-world applications, which in turn motivates developing topological dynamics learning models via the JKO flow [Jordan et al., 1998] of a parameterized functional $D(x)$ on the topology, akin to approaches used in Euclidean domains [Bunne et al., 2022].

We focused on a fixed topological domain, but it is also of interest to study the case where the topological domain itself evolves over time. The TSBP in this scenario may rely on a time-varying operator $L_t$ to guide the reference process. This is relevant for recent generative models for graphs, to name a few [Niu et al., 2020; Jo et al., 2022; Liu et al., 2023], where the graph structure, together with node features, is learned in the latent space based on diffusion models [Song et al., 2020b]. Lastly, we remark that discrete distributions on topological domains may also be defined. For instance, the nodes of a graph can represent discrete states, where node $i$ is associated with a discrete probability $P_i$.
This motivates the emerging generative models for discrete data [Austin et al., 2021; Ye et al., 2022; Haefeli et al., 2023; Campbell et al., 2024]. For a formal treatment of matching such discrete distributions on graphs, we refer to Maas [2011]; Léonard [2013]; Solomon [2018]; Chow et al. [2022].

Conclusion. With the goal of matching topological signal distributions beyond Euclidean domains, we introduced the topological Schrödinger bridge problem (TSBP). We defined the reference process using an SDE driven by a linear topological convolution operator, which is tractable and includes the commonly used heat diffusion on topological domains. When the end distributions are Gaussian, we derived a closed-form TSB, generalizing the existing results by Bunne et al. [2023]. In the general case, we showed that the optimal process satisfies a pair of FB-TSDEs governed by some optimal policies. Building upon these results, we developed TSB-based models where we parameterize the policies as (topological) neural networks and learn them through likelihood training, extending the framework of De Bortoli et al. [2021]; Chen et al. [2022a] to topological domains. We applied TSB-based models to both topological signal generation and matching in various applications, demonstrating their improved performance compared to standard SB-based models. Overall, our work lies at the intersection of SB-based distribution matching and topological machine learning, and we hope it inspires further research in this direction.

ACKNOWLEDGEMENTS

The author's research is funded by the TU Delft AI Labs Programme and supported by professors Elvin Isufi and Geert Leus. The author is grateful for the financial support provided by the Multimedia Computing group at TU Delft. Additionally, the author thanks the anonymous reviewers for their comments and valuable feedback, especially on the single-cell data experiments.
Lastly, the author acknowledges the work of Bunne et al. [2023] for inspiring the proofs in Section 4.

REPRODUCIBILITY STATEMENT

For reproducibility of the theoretical results, the complete proofs of the claims can be found in the appendix. For reproducibility of the experiments, we refer to the GitHub repository at topological SB matching.

Mathieu Alain, So Takao, Brooks Paige, and Marc Peter Deisenroth. Gaussian Processes on Cellular Complexes. In Proceedings of the 41st International Conference on Machine Learning, volume 235, pp. 879–905. PMLR, 2024. Cited on pages 3 and 25.

Michael Samuel Albergo, Mark Goldstein, Nicholas Matthew Boffi, Rajesh Ranganath, and Eric Vanden-Eijnden. Stochastic Interpolants with Data-Dependent Couplings. In Forty-first International Conference on Machine Learning, 2024. Cited on pages 1, 6, and 30.

Luigi Ambrosio, Nicola Gigli, and Giuseppe Savaré. Gradient flows: in metric spaces and in the space of probability measures. Springer Science & Business Media, 2008. Cited on page 26.

Brian D.O. Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, May 1982. ISSN 0304-4149. doi: 10.1016/0304-4149(82)90051-5. Cited on page 26.

Panos J Antsaklis and Anthony N Michel. Linear systems, volume 8. Springer, 1997. Cited on page 20.

Cedric Archambeau, Dan Cornford, Manfred Opper, and John Shawe-Taylor. Gaussian process approximations of stochastic differential equations. In Gaussian Processes in Practice, pp. 1–16. PMLR, 2007. Cited on page 19.

Jacob Austin, Daniel D Johnson, Jonathan Ho, Daniel Tarlow, and Rianne Van Den Berg. Structured denoising diffusion models in discrete state-spaces. Advances in Neural Information Processing Systems, 34:17981–17993, 2021. Cited on page 10.

Sergio Barbarossa and Stefania Sardellitti. Topological Signal Processing Over Simplicial Complexes. IEEE Transactions on Signal Processing, 68:2992–3007, 2020. ISSN 1941-0476.
doi: 10.1109/TSP.2020.2981920. Cited on page 3.

Jean-David Benamou and Yann Brenier. A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numerische Mathematik, 84(3):375–393, January 2000. ISSN 0945-3245. doi: 10.1007/s002110050002. Cited on page 3.

Arne Beurling. An automorphism of product measures. Annals of Mathematics, 72(1):189–200, 1960. Cited on page 5.

Christian Bick, Elizabeth Gross, Heather A Harrington, and Michael T Schaub. What are higher-order networks? SIAM Review, 65(3):686–731, 2023. Cited on pages 2 and 3.

Viacheslav Borovitskiy, Iskander Azangulov, Alexander Terenin, Peter Mostowsky, Marc Peter Deisenroth, and Nicolas Durrande. Matérn Gaussian Processes on Graphs, April 2021. arXiv:2010.15538 [cs, stat]. Cited on pages 3 and 20.

Charlotte Bunne, Laetitia Papaxanthos, Andreas Krause, and Marco Cuturi. Proximal optimal transport modeling of population dynamics. In International Conference on Artificial Intelligence and Statistics, pp. 6511–6528. PMLR, 2022. Cited on page 10.

Charlotte Bunne, Ya-Ping Hsieh, Marco Cuturi, and Andreas Krause. The Schrödinger Bridge between Gaussian Measures has a Closed Form. In Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, pp. 5802–5833. PMLR, April 2023. ISSN: 2640-3498. Cited on pages 2, 6, 7, 9, 10, 11, 28, and 39.

Donald Bures. An extension of Kakutani's theorem on infinite product measures to the tensor product of semifinite w*-algebras. Transactions of the American Mathematical Society, 135:199–212, 1969. Cited on page 36.

Kenneth F Caluya and Abhishek Halder. Wasserstein proximal algorithms for the Schrödinger bridge problem: Density control with nonlinear drift. IEEE Transactions on Automatic Control, 67(3):1163–1178, 2021. Cited on pages 5, 6, 25, 26, and 33.

Andrew Campbell, Jason Yim, Regina Barzilay, Tom Rainforth, and Tommi Jaakkola.
Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design, February 2024. arXiv:2402.04997 [cs, q-bio, stat]. Cited on page 10.

Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. Advances in Neural Information Processing Systems, 31, 2018. Cited on page 8.

Tianrong Chen, Guan-Horng Liu, and Evangelos A Theodorou. Likelihood training of Schrödinger bridge using forward-backward SDEs theory. International Conference on Learning Representations (ICLR), 2022a. Cited on pages 1, 2, 5, 6, 7, 8, 10, 26, 34, and 39.

Yongxin Chen, Tryphon T Georgiou, and Michele Pavon. Optimal transport over a linear dynamical system. IEEE Transactions on Automatic Control, 62(5):2137–2152, 2016. Cited on pages 7 and 25.

Yongxin Chen, Tryphon T Georgiou, and Michele Pavon. Optimal transport in systems and control. Annual Review of Control, Robotics, and Autonomous Systems, 4(1):89–113, 2021a. Cited on page 19.

Yu-Chia Chen and Marina Meilă. The decomposition of the higher-order homology embedding constructed from the k-Laplacian. Advances in Neural Information Processing Systems, 34:15695–15709, 2021. Cited on page 37.

Yu-Chia Chen, Marina Meilă, and Ioannis G Kevrekidis. Helmholtzian Eigenmap: Topological feature discovery & edge flow learning from point cloud data. arXiv preprint arXiv:2103.07626, 2021b. Cited on pages 9, 36, and 37.

Yuzhou Chen, Yulia Gel, and H Vincent Poor. Time-conditioned dances with simplicial complexes: Zigzag filtration curve based supra-Hodge convolution networks for time-series forecasting. Advances in Neural Information Processing Systems, 35:8940–8953, 2022b. Cited on pages 9 and 37.

Shui-Nee Chow, Wuchen Li, Chenchen Mou, and Haomin Zhou. Dynamical Schrödinger Bridge Problems on Graphs. Journal of Dynamics and Differential Equations, 34(3):2511–2530, September 2022. ISSN 1040-7294, 1572-9222. doi: 10.1007/s10884-021-09977-1. Cited on page 10.
Fan RK Chung. Spectral graph theory, volume 92. American Mathematical Soc., 1997. Cited on page 3.

Soon-Yeong Chung, Yun-Sung Chung, and Jong-Ho Kim. Diffusion and elastic equations on networks. Publications of the Research Institute for Mathematical Sciences, 43(3):699–726, 2007. Cited on page 25.

Paolo Dai Pra. A stochastic control approach to reciprocal diffusion processes. Applied Mathematics and Optimization, 23(1):313–329, 1991. Cited on pages 1, 5, and 19.

Valentin De Bortoli, James Thornton, Jeremy Heng, and Arnaud Doucet. Diffusion Schrödinger bridge with applications to score-based generative modeling. Advances in Neural Information Processing Systems, 34:17695–17709, 2021. Cited on pages 1, 6, 7, and 10.

Mauricio Delbracio and Peyman Milanfar. Inversion by direct iteration: An alternative to denoising diffusion for image restoration. arXiv preprint arXiv:2303.11435, 2023. Cited on page 35.

Jean-Charles Delvenne, Renaud Lambiotte, and Luis EC Rocha. Diffusion on networked systems is a question of time or structure. Nature Communications, 6(1):7366, 2015. Cited on page 4.

Wei Deng, Weijian Luo, Yixin Tan, Marin Biloš, Yu Chen, Yuriy Nevmyvaka, and Ricky T. Q. Chen. Variational Schrödinger Diffusion Models. In Proceedings of the 41st International Conference on Machine Learning, volume 235, pp. 10506–10529. PMLR, 2024. Cited on pages 2 and 10.

Joshua Faskowitz, Richard F Betzel, and Olaf Sporns. Edges in brain networks: Contributions to models of structure and function. Network Neuroscience, 6(1):1–28, 2022. Cited on page 2.

Lucas CF Ferreira and Julio C Valencia-Guevara. Gradient flows of time-dependent functionals in metric spaces and applications to PDEs. Monatshefte für Mathematik, 185:231–268, 2018. Cited on page 26.

H Föllmer and A Wakolbinger. Time reversal of infinite-dimensional diffusions. Stochastic Processes and their Applications, 22(1):59–77, 1986. Cited on page 28.

Hans Föllmer.
Random fields and diffusion processes. Lect. Notes Math, 1362:101–204, 1988. Cited on pages 1, 5, 6, and 19.

Robert Fortet. Résolution d'un système d'équations de M. Schrödinger. Journal de Mathématiques Pures et Appliquées, 19(1-4):83–105, 1940. Cited on pages 1 and 5.

Jason Gaitonde, Jon Kleinberg, and Eva Tardos. Polarization in geometric opinion dynamics. In Proceedings of the 22nd ACM Conference on Economics and Computation, pp. 499–519, 2021. Cited on page 4.

Matthew F Glasser, Timothy S Coalson, Emma C Robinson, Carl D Hacker, John Harwell, Essa Yacoub, Kamil Ugurbil, Jesper Andersson, Christian F Beckmann, Mark Jenkinson, et al. A multi-modal parcellation of human cerebral cortex. Nature, 536(7615):171–178, 2016. Cited on pages 8 and 38.

Chris Godsil and Gordon Royle. Algebraic Graph Theory, volume 207 of Graduate Texts in Mathematics. Springer New York, New York, NY, 2001. ISBN 978-0-387-95220-8 978-1-4613-0163-9. doi: 10.1007/978-1-4613-0163-9. Cited on page 3.

Leo J. Grady and Jonathan R. Polimeni. Discrete Calculus. Springer London, London, 2010. ISBN 978-1-84996-289-6 978-1-84996-290-2. doi: 10.1007/978-1-84996-290-2. Cited on page 3.

Nikita Gushchin, Sergei Kholkin, Evgeny Burnaev, and Alexander Korotin. Light and Optimal Schrödinger Bridge Matching. In Forty-first International Conference on Machine Learning, 2024. Cited on page 10.

Kilian Konstantin Haefeli, Karolis Martinkus, Nathanaël Perraudin, and Roger Wattenhofer. Diffusion Models for Graphs Benefit From Discrete State Spaces, August 2023. arXiv:2210.01549 [cs, stat]. Cited on page 10.

Jeremy Heng, Valentin De Bortoli, Arnaud Doucet, and James Thornton. Simulating diffusion bridges with score matching. arXiv preprint arXiv:2111.07243, 2021. Cited on page 35.

NJ Higham. Functions of Matrices: Theory and Computation, 2008. Cited on page 7.

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models.
Advances in Neural Information Processing Systems, 33:6840–6851, 2020. Cited on pages 1 and 4.

Michael F Hutchinson. A stochastic estimator of the trace of the influence matrix for Laplacian smoothing splines. Communications in Statistics - Simulation and Computation, 18(3):1059–1076, 1989. Cited on pages 35 and 39.

Benton Jamison. The Markov processes of Schrödinger. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 32(4):323–331, 1975. Cited on pages 5, 19, and 25.

Hicham Janati, Boris Muzellec, Gabriel Peyré, and Marco Cuturi. Entropic optimal transport between unbalanced gaussian measures has a closed form. Advances in Neural Information Processing Systems, 33:10468–10479, 2020. Cited on pages 6, 18, and 28.

Jaehyeong Jo, Seul Lee, and Sung Ju Hwang. Score-based generative modeling of graphs via the system of stochastic differential equations. In International Conference on Machine Learning, pp. 10362–10383. PMLR, 2022. Cited on page 10.

Richard Jordan, David Kinderlehrer, and Felix Otto. The Variational Formulation of the Fokker–Planck Equation. SIAM Journal on Mathematical Analysis, 29(1):1–17, January 1998. ISSN 0036-1410, 1095-7154. doi: 10.1137/S0036141096303359. Cited on pages 10 and 26.

Thomas N. Kipf and Max Welling. Semi-Supervised Classification with Graph Convolutional Networks. In International Conference on Learning Representations, 2017. Cited on pages 8 and 39.

Alexander Korotin, Nikita Gushchin, and Evgeny Burnaev. Light Schrödinger Bridge. In The Twelfth International Conference on Learning Representations, 2024. Cited on page 10.

Bo Li, Kaitao Xue, Bin Liu, and Yu-Kun Lai. BBDM: Image-to-image translation with Brownian bridge diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1952–1961, 2023. Cited on page 35.

Lek-Heng Lim. Hodge Laplacians on Graphs. SIAM Review, 62(3):685–715, January 2020. ISSN 0036-1445, 1095-7200.
doi: 10.1137/18M1223101. Cited on pages 3 and 20.

Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2022. Cited on page 1.

Chengyi Liu, Wenqi Fan, Yunqing Liu, Jiatong Li, Hang Li, Hui Liu, Jiliang Tang, and Qing Li. Generative Diffusion Models on Graphs: Methods and Applications, August 2023. arXiv:2302.02591 [cs]. Cited on page 10.

Guan-Horng Liu, Yaron Lipman, Maximilian Nickel, Brian Karrer, Evangelos Theodorou, and Ricky T. Q. Chen. Generalized Schrödinger Bridge Matching. In The Twelfth International Conference on Learning Representations, 2024. Cited on pages 1 and 10.

Xingchao Liu, Lemeng Wu, Mao Ye, and Qiang Liu. Let us Build Bridges: Understanding and Extending Diffusion Generative Models, August 2022. arXiv:2208.14699 [cs]. Cited on page 35.

Christian Léonard. Lazy random walks and optimal transport on graphs, November 2013. arXiv:1308.0226 [math]. Cited on page 10.

Christian Léonard. A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete & Continuous Dynamical Systems - A, 34(4):1533–1574, 2014. ISSN 1553-5231. doi: 10.3934/dcds.2014.34.1533. Cited on pages 1, 2, 5, 6, 18, 19, 25, and 33.

Jan Maas. Gradient flows of the entropy for finite Markov chains. Journal of Functional Analysis, 261(8):2250–2292, 2011. Cited on page 10.

Anton Mallasto, Augusto Gerolin, and Hà Quang Minh. Entropy-regularized 2-Wasserstein distance between Gaussian measures. Information Geometry, 5(1):289–323, 2022. Cited on page 28.

Ben Mather, Sandra McLaren, David Taylor, Sukanta Roy, and Louis Moresi. Variations and controls on crustal thermal regimes in Southeastern Australia. Tectonophysics, 723:261–276, 2018. Cited on page 36.

Kevin R Moon, David Van Dijk, Zheng Wang, Scott Gigante, Daniel B Burkhardt, William S Chen, Kristina Yim, Antonia van den Elzen, Matthew J Hirn, Ronald R Coifman, et al.
Visualizing structure and transitions in high-dimensional biological data. Nature Biotechnology, 37(12):1482–1492, 2019. Cited on pages 8 and 38.

Louis Moresi and Ben Mather. Stripy: A Python module for (constrained) triangulation in Cartesian coordinates and on a sphere. Journal of Open Source Software, 4(38):1410, 2019. Cited on pages 9 and 36.

Kirill Neklyudov, Rob Brekelmans, Daniel Severo, and Alireza Makhzani. Action Matching: Learning Stochastic Dynamics from Samples. In Proceedings of the 40th International Conference on Machine Learning, pp. 25858–25889. PMLR, July 2023. ISSN: 2640-3498. Cited on page 1.

Edward Nelson. Dynamical theories of Brownian motion, volume 101. Princeton University Press, 2020. Cited on page 26.

Alexander V. Nikitin, ST John, Arno Solin, and Samuel Kaski. Non-separable Spatio-temporal Graph Kernels via SPDEs. In Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, pp. 10640–10660. PMLR, May 2022. ISSN: 2640-3498. Cited on pages 4 and 25.

Chenhao Niu, Yang Song, Jiaming Song, Shengjia Zhao, Aditya Grover, and Stefano Ermon. Permutation invariant graph generation via score-based generative modeling. In International Conference on Artificial Intelligence and Statistics, pp. 4474–4484. PMLR, 2020. Cited on page 10.

Bernt Oksendal. Stochastic differential equations: an introduction with applications. Springer Science & Business Media, 2013. Cited on page 21.

Felix Otto. The geometry of dissipative evolution equations: the porous medium equation. 2001. Cited on page 26.

Theodore Papamarkou, Tolga Birdal, Michael M Bronstein, Gunnar E Carlsson, Justin Curry, Yue Gao, Mustafa Hajij, Roland Kwitt, Pietro Lio, Paolo Di Lorenzo, et al. Position: Topological Deep Learning is the New Frontier for Relational Learning. In Forty-first International Conference on Machine Learning, 2024. Cited on pages 2 and 3.

Michele Pavon and Anton Wakolbinger.
On free energy, stochastic control, and Schrödinger processes. In Modeling, Estimation and Control of Systems with Uncertainty: Proceedings of a Conference held in Sopron, Hungary, September 1990, pp. 334–348. Springer, 1991. Cited on pages 1, 5, and 19.

Michele Pavon, Giulio Trigila, and Esteban G Tabak. The Data-Driven Schrödinger Bridge. Communications on Pure and Applied Mathematics, 74(7):1545–1573, 2021. Cited on pages 1 and 6.

Stefano Peluchetti. Diffusion bridge mixture transports, Schrödinger bridge problems and generative modeling. Journal of Machine Learning Research, 24(374):1–51, 2023. Cited on page 10.

José Pereira, Morteza Ibrahimi, and Andrea Montanari. Learning networks of stochastic differential equations. Advances in Neural Information Processing Systems, 23, 2010. Cited on page 4.

Alessio Perinelli, Davide Tabarelli, Carlo Miniussi, and Leonardo Ricci. Dependence of connectivity on geometric distance in brain networks. Scientific Reports, 9(1):13412, 2019. Cited on page 38.

Michael Poli, Stefano Massaroli, Junyoung Park, Atsushi Yamashita, Hajime Asama, and Jinkyoo Park. Graph Neural Ordinary Differential Equations, June 2021. arXiv:1911.07532 [cs, stat]. Cited on page 25.

Philip E. Protter. Stochastic Differential Equations, pp. 249–361. Springer Berlin Heidelberg, Berlin, Heidelberg, 2005. ISBN 978-3-662-10061-5. doi: 10.1007/978-3-662-10061-5_6. Cited on page 28.

A. Pérez Riascos and José L Mateos. Fractional dynamics on networks: Emergence of anomalous diffusion and Lévy flights. Physical Review E, 90(3):032809, 2014. Cited on pages 7 and 24.

Hannes Risken. Fokker–Planck equation. Springer, 1996. Cited on page 19.

T Mitchell Roddenberry, Nicholas Glaze, and Santiago Segarra. Principled simplicial neural networks for trajectory prediction. In International Conference on Machine Learning, pp. 9020–9029. PMLR, 2021. Cited on pages 8 and 39.

Aliaksei Sandryhaila and José M. F. Moura. Discrete Signal Processing on Graphs.
IEEE Transactions on Signal Processing, 61(7):1644–1656, April 2013. ISSN 1053-587X, 1941-0476. doi: 10.1109/TSP.2013.2238935. Cited on page 4.

Aliaksei Sandryhaila and José M. F. Moura. Discrete Signal Processing on Graphs: Frequency Analysis. IEEE Transactions on Signal Processing, 62(12):3042–3054, June 2014. ISSN 1941-0476. doi: 10.1109/TSP.2014.2321121. Cited on page 4.

Augusto Santos, Diogo Rente, Rui Seabra, and José MF Moura. Learning the causal structure of networked dynamical systems under latent nodes and structured noise. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pp. 14866–14874, 2024. Cited on page 4.

Michael T. Schaub, Austin R. Benson, Paul Horn, Gabor Lippner, and Ali Jadbabaie. Random Walks on Simplicial Complexes and the Normalized Hodge 1-Laplacian. SIAM Review, 62(2):353–391, January 2020. ISSN 0036-1445, 1095-7200. doi: 10.1137/18M1201019. Cited on page 3.

Erwin Schrödinger. Über die Umkehrung der Naturgesetze. Verlag der Akademie der Wissenschaften in Kommission bei Walter De Gruyter u. Co., 1931. Cited on page 1.

Erwin Schrödinger. Sur la théorie relativiste de l'électron et l'interprétation de la mécanique quantique. In Annales de l'institut Henri Poincaré, volume 2, pp. 269–310, 1932. Cited on page 1.

Yuyang Shi, Valentin De Bortoli, Andrew Campbell, and Arnaud Doucet. Diffusion Schrödinger Bridge Matching. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. Cited on page 10.

David I. Shuman, Sunil K. Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. The Emerging Field of Signal Processing on Graphs: Extending High-Dimensional Data Analysis to Networks and Other Irregular Domains. IEEE Signal Processing Magazine, 30(3):83–98, May 2013. ISSN 1053-5888. doi: 10.1109/MSP.2012.2235192. Cited on page 3.

Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli.
Deep Unsupervised Learning using Nonequilibrium Thermodynamics, November 2015. arXiv:1503.03585 [cond-mat, q-bio, stat]. Cited on page 4.

Justin Solomon. Optimal Transport on Discrete Domains, May 2018. arXiv:1801.07745 [cs, math]. Cited on page 10.

Yang Song and Stefano Ermon. Generative Modeling by Estimating Gradients of the Data Distribution, October 2020. arXiv:1907.05600 [cs, stat]. Cited on page 4.

Yang Song, Sahaj Garg, Jiaxin Shi, and Stefano Ermon. Sliced score matching: A scalable approach to density and score estimation. In Uncertainty in Artificial Intelligence, pp. 574–584. PMLR, 2020a. Cited on page 35.

Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020b. Cited on pages 1, 2, 4, 7, 8, 10, 34, 35, and 39.

Simo Särkkä and Arno Solin. Applied Stochastic Differential Equations. Cambridge University Press, 1 edition, April 2019. ISBN 978-1-108-18673-5 978-1-316-51008-7 978-1-316-64946-6. doi: 10.1017/9781108186735. Cited on pages 4, 6, 21, 22, and 35.

Alexander Tong. Processed Single-cell RNA Time Series Data, 2023. Cited on pages 8 and 38.

Alexander Tong, Kilian Fatras, Nikolay Malkin, Guillaume Huguet, Yanlei Zhang, Jarrid Rector-Brooks, Guy Wolf, and Yoshua Bengio. Improving and generalizing flow-based generative models with minibatch optimal transport. Transactions on Machine Learning Research, 2024a. ISSN 2835-8856. Expert Certification. Cited on pages 1, 8, and 38.

Alexander Y. Tong, Nikolay Malkin, Kilian Fatras, Lazar Atanackovic, Yanlei Zhang, Guillaume Huguet, Guy Wolf, and Yoshua Bengio. Simulation-Free Schrödinger Bridges via Score and Flow Matching. In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, 2024b. Cited on pages 8, 10, and 38.
David C. Van Essen, Stephen M. Smith, Deanna M. Barch, Timothy E. J. Behrens, Essa Yacoub, Kamil Ugurbil, Wu-Minn HCP Consortium, et al. The WU-Minn Human Connectome Project: an overview. NeuroImage, 80:62–79, 2013. Cited on pages 8 and 38. Francisco Vargas, Pierre Thodoroff, Neil D. Lawrence, and Austen Lamacraft. Solving Schrödinger Bridges via Maximum Likelihood. Entropy, 23(9):1134, August 2021. ISSN 1099-4300. doi: 10.3390/e23091134. arXiv:2106.02081 [cs, stat]. Cited on pages 1, 6, and 7. Prakhar Verma, Vincent Adam, and Arno Solin. Variational Gaussian Process Diffusion Processes. In International Conference on Artificial Intelligence and Statistics, pp. 1909–1917. PMLR, 2024. Cited on page 19. Cédric Villani. Optimal transport: old and new. Number 338 in Grundlehren der mathematischen Wissenschaften. Springer, Berlin Heidelberg, 2009. ISBN 978-3-540-71049-3, 978-3-662-50180-1. Cited on page 3. Gefei Wang, Yuling Jiao, Qian Xu, Yang Wang, and Can Yang. Deep generative learning via Schrödinger bridge. In International Conference on Machine Learning, pp. 10794–10804. PMLR, 2021. Cited on page 1. Yuyang Wang, Jianren Wang, Zhonglin Cao, and Amir Barati Farimani. Molecular contrastive learning of representations via graph neural networks. Nature Machine Intelligence, 4(3):279–287, 2022. Cited on page 2. Maosheng Yang, Elvin Isufi, and Geert Leus. Simplicial convolutional neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8847–8851. IEEE, 2022a. Cited on page 39. Maosheng Yang, Elvin Isufi, Michael T. Schaub, and Geert Leus. Simplicial convolutional filters. IEEE Transactions on Signal Processing, 70:4633–4648, 2022b. Cited on pages 4, 20, and 24. Maosheng Yang, Viacheslav Borovitskiy, and Elvin Isufi. Hodge-Compositional Edge Gaussian Processes. In Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, PMLR, volume 238, pp.
3754–3762, 2024. Cited on pages 3, 9, 20, and 38. Mao Ye, Lemeng Wu, and Qiang Liu. First hitting diffusion models for generating manifold, graph and categorical data. Advances in Neural Information Processing Systems, 35:27280–27292, 2022. Cited on page 10. Anthony Zee. Quantum field theory in a nutshell, volume 7. Princeton University Press, 2010. Cited on page 27. Linqi Zhou, Aaron Lou, Samar Khanna, and Stefano Ermon. Denoising Diffusion Bridge Models. In The Twelfth International Conference on Learning Representations, 2024. Cited on pages 2, 8, and 35.

Organization. We include several appendices with additional details, proofs, derivations and experiments. This work concerns the intersection of Schrödinger bridge theory, topological signal processing and learning, stochastic dynamics (on topological domains) and generative modeling. We first introduce the necessary preliminaries on Schrödinger bridge theory in Appendix A, including the theorems and lemmas needed later. In Appendix B, we first provide an overview of topological signals and probabilistic methods. Then, we extensively discuss the topological stochastic dynamics based on the linear TSDE, as well as its three instantiations: TSHeat-BM, TSHeat-VE and TSHeat-VP. Here, we provide the detailed derivations of the transition kernels of the TSDE, which are crucial for the TSBP. In Appendix C, we discuss the optimality of the TSBP, relying on existing results; specifically, it includes the Schrödinger system for the TSBP, an SOC formulation of the TSBP, the optimality conditions, and the WGF interpretation. Appendix D proves the closed-form solution of the GTSBP, along with the marginal and conditional statistics of the optimal path measure. In Appendices E and F, we provide more details on the TSB-based models, their connections to other models, as well as the experimental details and additional results.
A PRELIMINARIES ON SCHRÖDINGER BRIDGES

A.1 ON OPTIMAL TRANSPORT

Theorem A.1 (Static Gaussian OT; [Janati et al., 2020]). Let $\Sigma_0, \Sigma_1$ be positive definite. Given two Gaussian measures $\rho_0 = \mathcal{N}(\mu_0, \Sigma_0)$ and $\rho_1 = \mathcal{N}(\mu_1, \Sigma_1)$, the entropic-regularized optimal transport problem
$$\min_{\pi \in \Pi(\rho_0, \rho_1)} \int_{\mathbb{R}^n \times \mathbb{R}^n} \tfrac{1}{2}\|x_0 - x_1\|^2 \,\mathrm{d}\pi(x_0, x_1) + \sigma^2 D_{\mathrm{KL}}(\pi \,\|\, \rho_0 \otimes \rho_1) \qquad \text{(E-OT)}$$
admits a closed-form solution
$$\pi^\star = \mathcal{N}\!\left( \begin{bmatrix} \mu_0 \\ \mu_1 \end{bmatrix}, \begin{bmatrix} \Sigma_0 & C_\sigma \\ C_\sigma^\top & \Sigma_1 \end{bmatrix} \right), \quad C_\sigma = \tfrac{1}{2}\big(\Sigma_0^{1/2} D_\sigma \Sigma_0^{-1/2} - \sigma^2 I\big), \quad D_\sigma = \big(4\Sigma_0^{1/2} \Sigma_1 \Sigma_0^{1/2} + \sigma^4 I\big)^{1/2}. \qquad \text{(A.2)}$$

Remark 2. Note that while the above result is stated for positive definite covariance matrices (so that $\rho_0$ and $\rho_1$ have Lebesgue densities), the closed-form solution remains well-defined for positive semi-definite covariance matrices.

A.2 ON SCHRÖDINGER BRIDGES

Lemma A.3 (Léonard [2014]). For a given measure P over the path space $\Omega$, let $P^{xy}$ denote the conditioning of P on paths that take values x and y at $t = 0$ and $t = 1$, respectively; that is, $P^{xy} = P[\,\cdot\, | X_0 = x, X_1 = y]$. Let $P_{01}$ denote the joint probability of the path values at the two ends $t = 0, 1$. Then, P can be disintegrated as
$$P(\cdot) = \int_{\mathbb{R}^n \times \mathbb{R}^n} P^{xy}(\cdot)\, P_{01}(\mathrm{d}x\,\mathrm{d}y). \qquad \text{(Disintegration of Measures)}$$

Static SBP. By the Disintegration of Measures, for all $P \in \mathcal{P}(\Omega)$, the relative entropy factorizes as
$$D_{\mathrm{KL}}(P \,\|\, Q) = D_{\mathrm{KL}}(P_{01} \,\|\, Q_{01}) + \int_{\mathbb{R}^n \times \mathbb{R}^n} D_{\mathrm{KL}}(P^{xy} \,\|\, Q^{xy})\, P_{01}(\mathrm{d}x\,\mathrm{d}y), \qquad \text{(A.3)}$$
which implies that $D_{\mathrm{KL}}(P_{01}\|Q_{01}) \le D_{\mathrm{KL}}(P\|Q)$, with equality if and only if $P^{xy} = Q^{xy}$ for each $(x, y) \in \mathbb{R}^n \times \mathbb{R}^n$. This allows us to reduce the (dynamic) SBP to the static one:
$$\min\; D_{\mathrm{KL}}(P_{01} \,\|\, Q_{01}), \quad \text{s.t.}\; P_{01} \in \mathcal{P}(\mathbb{R}^n \times \mathbb{R}^n),\; P_0 = \rho_0,\; P_1 = \rho_1. \qquad \text{(SBPstatic)}$$
Furthermore, it readily gives the following important theorem.

Theorem A.4 (Föllmer [1988]; Léonard [2014]). The SBP and SBPstatic each admit at most one solution. If the SBP has the solution P, then its joint marginal at the end times, $P_{01}$, is the solution of SBPstatic.
Conversely, if $P_{01}$ solves SBPstatic, then the solution of the SBP can be expressed as
$$P(\cdot) = \int_{\mathbb{R}^n \times \mathbb{R}^n} Q^{xy}(\cdot)\, P_{01}(\mathrm{d}x\,\mathrm{d}y), \qquad \text{(A.4)}$$
which means that P shares its bridges with Q (i.e., P is in the reciprocal class of Q):
$$P^{xy} = Q^{xy} \quad \forall (x, y) \in \mathbb{R}^n \times \mathbb{R}^n. \qquad \text{(A.5)}$$

Proposition A.5 (Léonard [2014]). If the reference measure Q is Markov, then the solution P of the SBP is also Markov.

Schrödinger System.

Theorem A.6 (Jamison [1975]; Léonard [2014]; Chen et al. [2021a]). Given two probability measures $\rho_0, \rho_1$ on $\mathbb{R}^n$ and a continuous, everywhere positive Markov kernel $p_{t|s}(y|x)$ (not necessarily associated with a scaled Brownian motion), there exists a unique pair (up to scaling) of functions $(\hat\phi_0, \phi_1)$ on $\mathbb{R}^n$ such that the measure $P_{01}$ on $\mathbb{R}^n \times \mathbb{R}^n$ defined by
$$P_{01}(\mathrm{d}x\,\mathrm{d}y) = p_{1|0}(y|x)\, \hat\phi_0(\mathrm{d}x)\, \phi_1(\mathrm{d}y) \qquad \text{(A.6)}$$
has marginals $\rho_0$ and $\rho_1$. Moreover, the Schrödinger bridge from $\rho_0$ to $\rho_1$ induces the distribution flow $P_t = \phi_t \hat\phi_t$ with
$$\phi_t(x) = \int p_{1|t}(y|x)\, \phi_1(\mathrm{d}y), \qquad \hat\phi_t(x) = \int p_{t|0}(x|y)\, \hat\phi_0(\mathrm{d}y). \qquad \text{(A.7)}$$

SOC Formulation. By Girsanov's theorem, Dai Pra [1991] and Pavon & Wakolbinger [1991] showed an equivalent SOC formulation of the SBP, which minimizes the kinetic energy
$$\min_{u \in \mathcal{U}} \; \mathbb{E}\Big[\int_0^1 \tfrac{1}{2}\|u(t, X_t)\|^2 \,\mathrm{d}t\Big], \quad \text{s.t.}\; \mathrm{d}X_t = u(t, X_t)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t, \quad X_0 \sim \rho_0,\; X_1 \sim \rho_1, \qquad \text{(SBPsoc)}$$
where $\mathcal{U}$ is the set of finite-energy control functions $u_t \equiv u(t, x)$. Given the SDE constraint in SBPsoc, the associated marginal density $\rho_t \equiv \rho(t, x)$ evolves according to the Fokker-Planck-Kolmogorov equation (FPK, Risken [1996]). This allows us to arrive at the equivalent variational formulation
$$\min_{(\rho_t, u_t)} \int_0^1\!\!\int \tfrac{1}{2}\|u(t, x)\|^2 \rho(t, x)\,\mathrm{d}x\,\mathrm{d}t \quad \text{s.t.}\; \partial_t \rho_t + \nabla\cdot(\rho_t u_t) = \tfrac{\sigma^2}{2}\Delta\rho_t, \quad \rho(0, \cdot) = \rho_0,\; \rho(1, \cdot) = \rho_1. \qquad \text{(SBPvar)}$$
Given $\rho_t = \phi_t \hat\phi_t$, the optimal control in SBPsoc is obtained as $u_t^\star = \sigma^2 \nabla \log \phi_t$.

B STOCHASTIC DYNAMICS ON TOPOLOGICAL DOMAINS

Compared to the Euclidean domain, dynamics on topological domains are less studied. Here we review some existing work on dynamics on graphs and simplicial complexes.
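As a concrete illustration of such topological dynamics, the following minimal Python sketch (our own construction; the helper names and parameter values are not from the paper) simulates the linear topological heat SDE $\mathrm{d}Y_t = -cLY_t\,\mathrm{d}t + g\,\mathrm{d}W_t$ on a path graph with Euler-Maruyama, and checks that the empirical mean of the paths follows the deterministic flow $\exp(-cLt)\,y_0$.

```python
# Hedged sketch: Euler-Maruyama for dY_t = -c L Y_t dt + g dW_t on a path graph.
import numpy as np
from scipy.linalg import expm

def path_graph_laplacian(n):
    """Graph Laplacian L = B_1 B_1^T of a path graph on n nodes."""
    B1 = np.zeros((n, n - 1))
    for e in range(n - 1):
        B1[e, e], B1[e + 1, e] = 1.0, -1.0  # oriented node-to-edge incidence
    return B1 @ B1.T

def simulate_heat_sde(L, y0, c=1.0, g=0.5, T=1.0, steps=100, paths=20000, seed=0):
    """Euler-Maruyama for dY = -c L Y dt + g dW; returns samples at time T."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    Y = np.tile(y0, (paths, 1))
    for _ in range(steps):
        Y = Y - dt * c * (Y @ L) + g * np.sqrt(dt) * rng.standard_normal(Y.shape)
    return Y

L = path_graph_laplacian(4)
y0 = np.array([1.0, -1.0, 2.0, 0.0])
Y_T = simulate_heat_sde(L, y0)
mean_exact = expm(-1.0 * L) @ y0  # E[Y_T] = exp(-c L T) y0 with c = T = 1
err = np.abs(Y_T.mean(axis=0) - mean_exact).max()
```

With a fixed seed and many paths, the empirical mean matches the matrix-exponential flow up to discretization and Monte Carlo error.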
Note that our choice of the linear topological drift $f_t$ in (1) is analogous to the ideas in [Archambeau et al., 2007; Verma et al., 2024], which considered linear SDEs to approximate nonlinear dynamics, enabling approximations of more complex topological dynamics.

B.1 PRELIMINARIES ON TOPOLOGICAL SIGNALS

Here we review the standard notions of topological signals, focusing on node signals and edge flows on graphs and simplicial complexes. Note that we abuse some notation in this subsection.

Node signals. Let $G = (V, E)$ be an unweighted graph, where $V = \{1, \dots, n_0\}$ is the set of nodes and E is the set of $n_1$ edges such that if nodes i, j are connected, then $e = (i, j) \in E$. We can define real-valued functions on the node set, $x: V \to \mathbb{R}$, collected into a vector $x = [x(1), \dots, x(n_0)]^\top \in \mathbb{R}^{n_0}$, which is referred to as a node signal (or graph signal). Denote the oriented node-to-edge incidence matrix by $B_1$, of dimension $n_0 \times n_1$. One can obtain the graph Laplacian as $L = B_1 B_1^\top$, which is a positive semi-definite linear operator on the space $\mathbb{R}^{n_0}$ of node signals. A graph GP $x \sim \mathcal{N}(0, \Sigma)$ assumes x is a random function with zero mean and a graph kernel (covariance matrix) $\Sigma$ which encodes the covariance between pairs of nodes. Borovitskiy et al. [2021] constructed the diffusion and Matérn graph GPs by extending the idea of deriving continuous GPs from SDEs; their kernels are
$$\Sigma_{\mathrm{diffu}} = \exp\Big(-\frac{\kappa^2}{2} L\Big), \qquad \Sigma_{\mathrm{Matern}} = \Big(\frac{2\nu}{\kappa^2} I + L\Big)^{-\nu}, \qquad \text{(B.1)}$$
where $\kappa > 0$, $\nu > 0$ are hyperparameters.

Edge flows. While it is possible to define edge flows on graphs, we consider the more general setting of a 2-dimensional simplicial complex (SC2). An SC2 contains three sets V, E, T, where V, E are the sets of nodes and edges, as for graphs, and T is the set of triangular faces (triangles) such that if $(i, j), (j, k), (i, k)$ form a closed triangle, then $t = (i, j, k) \in T$. Not all triples of pairwise connected edges are necessarily closed and included in T.
For each edge and triangle, we take the increasing order of its node labels as its reference orientation. (The orientation of a general simplex is an equivalence class of permutations of its labels; two orientations are equivalent (resp. opposite) if they differ by an even (resp. odd) permutation [Lim, 2020].) An oriented edge, denoted $e = [i, j]$, is an ordering of $\{i, j\}$. This is not a directed edge allowing flow only from i to j, but rather an assignment of the sign of the flow: from i to j it is positive, and the reverse is negative. Likewise for an oriented triangle $t = [i, j, k]$. In an SC2, we can define an edge flow by real-valued functions on its edges, collected in $x = [x(e_1), \dots, x(e_{n_1})]^\top \in \mathbb{R}^{n_1}$, which are required to be alternating, meaning that $x(-e) = -x(e)$ if $-e = [j, i]$ is oriented opposite to the reference $e = [i, j]$. Likewise, a triangle signal can be defined via an alternating function on triangles. In the same spirit as graph Laplacians operating on node functions, we can define the discrete Hodge Laplacian operating on edge flows as $L = L_d + L_u := B_1^\top B_1 + B_2 B_2^\top$, where $B_2$ is the edge-to-triangle incidence matrix. The Hodge Laplacian L describes the connectivity of edges, where the down part $L_d$ and the up part $L_u$ encode how edges are adjacent, respectively, through nodes and via triangles. Moreover, the Hodge Laplacian L is also positive semi-definite. Yang et al. [2024] generalized the graph GPs to edge flows, resulting in diffusion and Matérn edge GPs with kernels of the same forms as in (B.1) but with the Hodge Laplacian L in place of the graph Laplacian.
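The incidence construction above can be made concrete. The following sketch (our own code, with our own naming, not from the paper) builds $B_1$, $B_2$ and the Hodge Laplacian for a tiny SC2 with one filled triangle and one dangling edge, and checks the defining boundary condition $B_1 B_2 = 0$ and positive semi-definiteness.

```python
# Hedged sketch: Hodge Laplacian L = B_1^T B_1 + B_2 B_2^T for a small SC2
# with nodes {0,1,2,3}, edges [0,1],[0,2],[1,2],[2,3] (reference orientation =
# increasing node labels) and one filled triangle [0,1,2].
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
triangles = [(0, 1, 2)]
n0, n1 = 4, len(edges)

B1 = np.zeros((n0, n1))          # node-to-edge incidence: -1 at tail, +1 at head
for idx, (i, j) in enumerate(edges):
    B1[i, idx], B1[j, idx] = -1.0, 1.0

B2 = np.zeros((n1, len(triangles)))  # edge-to-triangle incidence
for idx, (i, j, k) in enumerate(triangles):
    B2[edges.index((i, j)), idx] = 1.0   # [i,j] agrees with the boundary
    B2[edges.index((j, k)), idx] = 1.0   # [j,k] agrees with the boundary
    B2[edges.index((i, k)), idx] = -1.0  # [i,k] enters with a minus sign

L_down, L_up = B1.T @ B1, B2 @ B2.T
L = L_down + L_up

assert np.allclose(B1 @ B2, 0)               # boundary of a boundary vanishes
assert np.linalg.eigvalsh(L).min() > -1e-10  # L is positive semi-definite
```

For this particular complex the harmonic space is trivial, since the gradient and curl subspaces already fill $\mathbb{R}^{n_1}$.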
Moreover, based on the combinatorial Hodge theory [Lim, 2020], we have the Hodge decomposition
$$\mathbb{R}^{n_1} = \mathrm{im}(B_1^\top) \oplus \ker(L) \oplus \mathrm{im}(B_2), \qquad \text{(B.2)}$$
where $\mathrm{im}(B_1^\top)$ is the gradient space, $\ker(L)$ the harmonic space and $\mathrm{im}(B_2)$ the curl space. One can then define edge GPs living in a certain Hodge subspace, $\diamond \in \{H, G, C\}$, with kernels of the forms
$$\Sigma_{\mathrm{diffu},\diamond} = \sigma^2 U_\diamond \exp\Big(-\frac{\kappa_\diamond^2}{2}\Lambda_\diamond\Big) U_\diamond^\top, \qquad \Sigma_{\mathrm{Matern},\diamond} = \sigma^2 U_\diamond \Big(\frac{2\nu_\diamond}{\kappa_\diamond^2} I + \Lambda_\diamond\Big)^{-\nu_\diamond} U_\diamond^\top,$$
where $(U_\diamond, \Lambda_\diamond)$ are the eigenpairs of the Hodge Laplacian restricted to the harmonic (H), gradient (G) and curl (C) subspaces, respectively [Yang et al., 2022b]. The samples from a given Hodge GP lie in the corresponding Hodge subspace, which allows us to model edge flows with different properties.

B.2 PRELIMINARIES ON (STOCHASTIC) DIFFERENTIAL EQUATIONS

This work involves several classes of differential equations. In the following, we review some results on ordinary differential equations (ODEs) and SDEs which are required later.

B.2.1 ON ODES

Given an initial condition $x_0 \in \mathbb{R}^n$, consider a linear differential system of the form
$$\mathrm{d}x_t = A_t x_t \,\mathrm{d}t \qquad \text{(linear ODE)}$$
with $A_t$ a time-varying matrix. To solve this system in closed form, we require an expression for the state transition matrix $\Psi(t, s)$, which transforms the solution at time s to time t: $x_t = \Psi(t, s) x_s$. In general, a closed form of $\Psi(t, s)$ is not available. In the following, we introduce an important class of matrices $A_t$ for which a closed-form solution is possible [Antsaklis & Michel, 1997].

Lemma B.1 (Closed form of the transition matrix of a linear ODE). Given a linear ODE, if for every $s, t \ge 0$ we have
$$A_t \int_s^t A_\tau \,\mathrm{d}\tau = \Big(\int_s^t A_\tau \,\mathrm{d}\tau\Big) A_t, \qquad \text{(B.4)}$$
then the transition matrix is given by
$$\Psi(t, s) = \exp\Big(\int_s^t A_\tau \,\mathrm{d}\tau\Big) = I + \int_s^t A_\tau \,\mathrm{d}\tau + \frac{1}{2!}\Big(\int_s^t A_\tau \,\mathrm{d}\tau\Big)^2 + \cdots. \qquad \text{(B.5)}$$
In the scalar case, when $A_t$ is diagonal, or when $A_t = A$ is constant, (B.4) always holds.

Lemma B.2. For $A_t \in C[\mathbb{R}, \mathbb{R}^{n \times n}]$, (B.4) holds if and only if $A_t A_s = A_s A_t$ for all s, t.

Lemma B.3 (Integration of the matrix exponential). For a nonsingular matrix A, we have
$$\int_s^t \exp(A\tau)\,\mathrm{d}\tau = \big(\exp(At) - \exp(As)\big) A^{-1}.$$
(B.6)

B.2.2 ON SDES

Lemma B.4 (Itô isometry, Oksendal [2013]). Let $W: [0, 1] \times \mathcal{X} \to \mathbb{R}$ denote the canonical real-valued Wiener process defined up to time 1, and let $X, Y: [0, 1] \times \mathcal{X} \to \mathbb{R}$ be stochastic processes adapted to the filtration generated by W. Then
$$\mathbb{E}\Big[\Big(\int_0^1 X_t \,\mathrm{d}W_t\Big)^2\Big] = \mathbb{E}\Big[\int_0^1 X_t^2 \,\mathrm{d}t\Big], \qquad \text{(B.7)}$$
$$\mathbb{E}\Big[\Big(\int_0^1 X_t \,\mathrm{d}W_t\Big)\Big(\int_0^1 Y_t \,\mathrm{d}W_t\Big)\Big] = \mathbb{E}\Big[\int_0^1 X_t Y_t \,\mathrm{d}t\Big]. \qquad \text{(B.8)}$$
The second identity allows us to compute the covariance of two stochastic processes $X_t$ and $Y_t$ adapted to the same filtration.

Transition densities of SDEs. All Itô processes, i.e., solutions to Itô SDEs, are Markov processes. This means that all Itô processes are, in a probabilistic sense, completely characterized by their transition densities (from $x_s$ at time s to $x_t$ at time t, denoted $p_{t|s}(x_t|x_s) \equiv p(x_s, s; x_t, t)$). The transition density is also a solution to the FPK equation with a degenerate (Dirac delta) initial density concentrated at $x_s$ at time s. We refer to Särkkä & Solin [2019, Thm 5.10].

B.3 TRANSITION DENSITIES OF THE TSDE

TSDE. First, we can find the transition matrix of the ODE associated with the TSDE.

Lemma B.5. For the ODE $\mathrm{d}y_t = H_t(L) y_t \,\mathrm{d}t$, the transition matrix is given by
$$\Psi(t, s) =: \Psi_{ts} = \exp\Big(\int_s^t H_\tau \,\mathrm{d}\tau\Big) = I + \sum_{k \ge 1} \frac{1}{k!}\Big(\int_s^t H_\tau \,\mathrm{d}\tau\Big)^k \qquad \text{(transition matrix)}$$
with $y_t = \Psi_{ts} y_s$. Note that $\Psi_{ts}$ is symmetric since $H_t$ is a function of the symmetric L.

This is a direct result of Lemmas B.1 and B.2, since $H_t H_\tau = H_\tau H_t$ for all t, $\tau$. By the definition of the matrix integral, the computation of $\Psi_{t\tau}$ is given by
$$\Psi(t, \tau) = \exp\big(\bar H_{t,\tau}(L)\big) = \exp\Big(\sum_{k=0}^{K} \bar h_k^{t,\tau} L^k\Big), \qquad \text{(B.9)}$$
where $\bar h_k^{t,\tau} = \int_\tau^t h_{k,s}\,\mathrm{d}s$ are the integrals of the scalar coefficients in $H_t$. In the following, we characterize the transition densities of the TSDE, as well as of the three concrete examples in Examples 1 to 3; using the formulas in Särkkä & Solin [2019, Eq. 6.7], we compute the statistics of the transition kernel. The following two lemmas together give Lemma 4.

Lemma B.6.
The transition density $p_{t|s}(y_t|y_s)$ of the TSDE conditioned on $Y_s = y_s$ is Gaussian,
$$p_{t|s}(y_t|y_s) = \mathcal{N}(y_t; m_{t|s}, K_{t|s}), \qquad \text{(B.10)}$$
with mean and covariance, for $t \ge s$, given by
$$m_{t|s} = \Psi_{ts} y_s + \Psi_t \int_s^t \Psi_\tau^{-1} \alpha_\tau \,\mathrm{d}\tau, \qquad K_{t|s} = \Psi_t \Big(\int_s^t g_\tau^2 \Psi_\tau^{-2} \,\mathrm{d}\tau\Big) \Psi_t^\top.$$

Proof. Given the transition matrix, using the transition kernel formula in Särkkä & Solin [2019, Eq. 6.7], we have
$$m_{t|s} = \Psi_{ts} y_s + \int_s^t \Psi_{t\tau} \alpha_\tau \,\mathrm{d}\tau = \Psi_{ts} y_s + \Psi_t \int_s^t \Psi_\tau^{-1} \alpha_\tau \,\mathrm{d}\tau,$$
where we use the property $\Psi_{t\tau} = \Psi_t \Psi_\tau^{-1}$. Likewise, we have
$$K_{t|s} = \int_s^t g_\tau^2 \Psi_{t\tau} \Psi_{t\tau}^\top \,\mathrm{d}\tau = \Psi_t \Big(\int_s^t g_\tau^2 \Psi_\tau^{-2} \,\mathrm{d}\tau\Big) \Psi_t^\top. \qquad \square$$

Lemma B.7. Conditioned on $Y_0 = y_0$, the cross-covariance $K(t_1, t_2)$ of the TSDE at times $t_1, t_2$ is given by
$$K_{t_1, t_2} = \Psi_{t_1} \Big(\int_0^{\min\{t_1, t_2\}} g_\tau^2 \Psi_\tau^{-2} \,\mathrm{d}\tau\Big) \Psi_{t_2}^\top.$$

Proof. By applying the cross-covariance formula in Särkkä & Solin [2019, Sec 6.4], we have
$$K_{t_1, t_2} = \mathrm{Cov}[Y_{t_1}, Y_{t_2}|Y_0] = \mathbb{E}\Big[\Big(\int_0^{t_1} g_{\tau_1} \Psi_{t_1, \tau_1} \,\mathrm{d}W_{\tau_1}\Big)\Big(\int_0^{t_2} g_{\tau_2} \Psi_{t_2, \tau_2} \,\mathrm{d}W_{\tau_2}\Big)^\top\Big] = \int_0^{\min\{t_1, t_2\}} g_\tau^2 \Psi_{t_1, \tau} \Psi_{t_2, \tau}^\top \,\mathrm{d}\tau = \Psi_{t_1} \Big(\int_0^{\min\{t_1, t_2\}} g_\tau^2 \Psi_\tau^{-2} \,\mathrm{d}\tau\Big) \Psi_{t_2}^\top,$$
where the third equality follows from Lemma B.4. $\square$

TSHeat-BM. Given an initial sample $y_0$ of the random topological signal $Y_0$, consider the SDE
$$\mathrm{d}Y_t = -cL Y_t \,\mathrm{d}t + g \,\mathrm{d}W_t. \qquad \text{(TSHeat-BM)}$$
Note that when L is singular, we consider a perturbed version $L + \epsilon I$ with a small constant $\epsilon > 0$. Its steady-state distribution has zero mean and covariance matrix $\Sigma_\infty = \frac{g^2}{2c} L^{-1}$. The transition matrix of the associated ODE $\mathrm{d}Y_t = -cL Y_t \,\mathrm{d}t$ is
$$\Psi_{ts} = \exp(-c(t - s)L). \qquad \text{(B.11)}$$

Lemma B.8. The transition density $p_{t|s}(y_t|y_s)$ of TSHeat-BM conditioned on $Y_s = y_s$ is Gaussian with mean and covariance, for $t \ge s$,
$$m_{t|s} = \Psi_{ts} y_s, \qquad K_{t|s} = \frac{g^2}{2c}\big(I - \exp(-2cL(t - s))\big) L^{-1}.$$
Moreover, the conditional dynamics $Y_t | Y_0 = y_0$ has the covariance process at $t_1, t_2$
$$K_{t_1, t_2} = \frac{g^2}{2c}\big[\exp(-cL|t_2 - t_1|) - \exp(-cL(t_1 + t_2))\big] L^{-1}. \qquad \text{(B.12)}$$

Proof. The conditional mean follows directly from the transition matrix of the associated ODE. For the conditional covariance, we have
$$K_{t|s} = \int_s^t \Psi(t, \tau)\, g^2\, \Psi(t, \tau)^\top \mathrm{d}\tau = \Psi_t \Big(\int_s^t \Psi_\tau^{-2} g^2 \,\mathrm{d}\tau\Big) \Psi_t^\top = g^2 \Psi_t \Big(\int_s^t \exp(2cL\tau) \,\mathrm{d}\tau\Big) \Psi_t^\top \quad (\text{by } \Psi_\tau^{-2} = \exp(2cL\tau))$$
$$= \frac{g^2}{2c} \Psi_t \big[\exp(2cLt) - \exp(2cLs)\big] L^{-1} \Psi_t^\top \quad (\text{by Lemma B.3})$$
$$= \frac{g^2}{2c}\big(I - \exp(-2cL(t - s))\big) L^{-1}.$$
(by $\Psi_t = \exp(-cLt)$). To compute the covariance process of the conditional dynamics $Y_t | Y_0 = y_0$, by definition we have, for $t_1 \le t_2$,
$$K_{t_1, t_2} = \mathrm{Cov}[Y_{t_1}, Y_{t_2}|Y_0] = \Psi_{t_1} \Big(\int_0^{t_1} g^2 \Psi_\tau^{-2} \,\mathrm{d}\tau\Big) \Psi_{t_2}^\top \quad (\text{by Lemma 4})$$
$$= g^2 \Psi_{t_1} \Big(\int_0^{t_1} \exp(2cL\tau) \,\mathrm{d}\tau\Big) \Psi_{t_2}^\top \quad (\text{by } \Psi_\tau^{-2} = \exp(2cL\tau))$$
$$= \frac{g^2}{2c} \Psi_{t_1} \big[\exp(2cLt_1) - I\big] L^{-1} \Psi_{t_2}^\top \quad (\text{by Lemma B.3})$$
$$= \frac{g^2}{2c} \big[\exp(-cL(t_2 - t_1)) - \exp(-cL(t_1 + t_2))\big] L^{-1}.$$
The case $t_1 > t_2$ can be similarly derived, which completes the proof. $\square$

TSHeat-VE.

Lemma B.9. The Gaussian transition kernel $p_{t|s}$ of TSHeat-VE has mean and covariance
$$m_{t|s} = \Psi_{ts} y_s, \qquad K_{t|s} = \sigma_{\min}^2 \ln\frac{\sigma_{\max}}{\sigma_{\min}} \exp(-2cLt)\big[\exp(2At) - \exp(2As)\big] A^{-1}, \qquad \text{(B.13)}$$
where $\Psi_{ts}$ is the same as in (B.11) and $A = \ln\frac{\sigma_{\max}}{\sigma_{\min}} I + cL$. The cross-covariance between $Y_{t_1}$ and $Y_{t_2}$, conditioned on $Y_0$, is given by
$$K_{t_1, t_2} = \sigma_{\min}^2 \ln\frac{\sigma_{\max}}{\sigma_{\min}} \exp(-cL(t_1 + t_2))\big[\exp(2A\min\{t_1, t_2\}) - I\big] A^{-1}. \qquad \text{(B.14)}$$

Proof. As the ODE associated with TSHeat-VE is also a topological heat diffusion, the transition matrix is the same, $\Psi_{ts} = \exp(-cL(t - s))$, as for TSHeat-BM. By substituting $\sigma(t) = \sigma_{\min}(\sigma_{\max}/\sigma_{\min})^t$ into $g_t$, we find that $g_\tau^2 = 2\sigma_{\min}^2 (\sigma_{\max}/\sigma_{\min})^{2\tau} \ln\frac{\sigma_{\max}}{\sigma_{\min}}$. For the covariance of the transition density, we have
$$K_{t|s} = \Psi_t \Big(\int_s^t g_\tau^2 \Psi_\tau^{-2} \,\mathrm{d}\tau\Big) \Psi_t^\top,$$
for which we need to compute the integral
$$\int_s^t g_\tau^2 \Psi_\tau^{-2} \,\mathrm{d}\tau = \int_s^t 2\sigma_{\min}^2 \Big(\frac{\sigma_{\max}}{\sigma_{\min}}\Big)^{2\tau} \ln\frac{\sigma_{\max}}{\sigma_{\min}} \exp(2cL\tau) \,\mathrm{d}\tau \quad (\text{factor out the constant})$$
$$= 2\sigma_{\min}^2 \ln\frac{\sigma_{\max}}{\sigma_{\min}} \int_s^t \exp\Big(2\tau \ln\frac{\sigma_{\max}}{\sigma_{\min}}\Big) \exp(2cL\tau) \,\mathrm{d}\tau \quad (\text{by the identity } \exp(\ln x) = x)$$
$$= 2\sigma_{\min}^2 \ln\frac{\sigma_{\max}}{\sigma_{\min}} \int_s^t \exp(2\tau A) \,\mathrm{d}\tau \quad (\text{by } A = \ln(\sigma_{\max}/\sigma_{\min}) I + cL)$$
$$= \sigma_{\min}^2 \ln\frac{\sigma_{\max}}{\sigma_{\min}} \big(\exp(2At) - \exp(2As)\big) A^{-1}. \quad (\text{by Lemma B.3})$$
Thus, we have
$$K_{t|s} = \sigma_{\min}^2 \ln\frac{\sigma_{\max}}{\sigma_{\min}} \exp(-2cLt)\big[\exp(2At) - \exp(2As)\big] A^{-1}.$$
For the cross-covariance, assuming $t_1 \le t_2$, we find
$$K_{t_1, t_2} = \mathrm{Cov}[Y_{t_1}, Y_{t_2}|Y_0] = \Psi_{t_1} \Big(\int_0^{t_1} g_\tau^2 \Psi_\tau^{-2} \,\mathrm{d}\tau\Big) \Psi_{t_2}^\top \quad (\text{by Lemma 4})$$
$$= \sigma_{\min}^2 \ln\frac{\sigma_{\max}}{\sigma_{\min}} \Psi_{t_1} \big[\exp(2At_1) - I\big] A^{-1} \Psi_{t_2}^\top \quad (\text{using the same steps as above})$$
$$= \sigma_{\min}^2 \ln\frac{\sigma_{\max}}{\sigma_{\min}} \exp(-cL(t_1 + t_2))\big[\exp(2At_1) - I\big] A^{-1}.$$
Similar steps apply for $t_1 > t_2$, which completes the proof. $\square$
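The closed-form kernels above are easy to sanity-check numerically. The following sketch (our own code and parameter values, not from the paper) compares the closed-form covariance of Lemma B.8 against a direct trapezoidal quadrature of $\int_s^t \Psi(t,\tau)\, g^2\, \Psi(t,\tau)^\top \mathrm{d}\tau$; the same template applies to Lemma B.9 with a time-varying $g_\tau$.

```python
# Hedged sketch: quadrature check of K_{t|s} = (g^2/2c)(I - exp(-2cL(t-s)))L^{-1}
# against the integral definition, on a small perturbed path-graph Laplacian.
import numpy as np
from scipy.linalg import expm

n, eps = 3, 1e-2
L = np.array([[1.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 1.0]]) + eps * np.eye(n)  # L + eps*I is invertible
c, g, s, t = 0.8, 0.6, 0.2, 1.0

K_closed = g**2 / (2 * c) * (np.eye(n) - expm(-2 * c * (t - s) * L)) @ np.linalg.inv(L)

taus = np.linspace(s, t, 2001)
vals = np.stack([expm(-c * (t - tau) * L) @ (g**2 * np.eye(n)) @ expm(-c * (t - tau) * L).T
                 for tau in taus])
w = np.full(len(taus), taus[1] - taus[0])  # trapezoidal weights
w[0] *= 0.5
w[-1] *= 0.5
K_quad = np.tensordot(w, vals, axes=1)
```

On this example the two covariances agree to quadrature precision, and both are symmetric as expected from the commuting matrix functions of L.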
TSHeat-VP. For this stochastic process, a closed-form transition kernel cannot be found. Nevertheless, we can proceed as follows for numerical computations. First, the closed-form transition matrix of the associated ODE is
$$\Psi_{ts} = \exp\Big(-\int_s^t \Big[\frac{1}{2}\beta(\tau) I + cL\Big] \mathrm{d}\tau\Big) = \exp\Big(-cL(t - s) - \frac{1}{2}\int_s^t \beta(\tau) \,\mathrm{d}\tau\Big), \qquad \text{(B.16)}$$
where the integral can be easily obtained as
$$\int_s^t \beta(\tau) \,\mathrm{d}\tau = \Big[\frac{1}{2}\tau^2 (\beta_{\max} - \beta_{\min}) + \tau \beta_{\min}\Big]_s^t =: \bar\beta_{ts}. \qquad \text{(B.17)}$$
This allows us to compute the mean of the transition kernel $m_{t|s}$ given an initial solution $y_s$. For the covariance kernel, we have
$$K_{t|s} = \Psi_t \Big(\int_s^t \beta(\tau) \Psi_\tau^{-2} \,\mathrm{d}\tau\Big) \Psi_t^\top,$$
where the integral can be expressed, via integration by parts, as
$$\int_s^t \beta(\tau) \Psi_\tau^{-2} \,\mathrm{d}\tau = \int_s^t \big[\tau(\beta_{\max} - \beta_{\min}) + \beta_{\min}\big] \exp\big(2cL\tau + \bar\beta_{\tau 0}\big) \mathrm{d}\tau = \Big[\big(\tau(\beta_{\max} - \beta_{\min}) + \beta_{\min}\big) v(\tau)\Big]_s^t - (\beta_{\max} - \beta_{\min}) \int_s^t v(\tau) \,\mathrm{d}\tau.$$
Here, $v(\tau) = \int \exp\big(2cL\tau + \bar\beta_{\tau 0}\big) \mathrm{d}\tau$ denotes an antiderivative, i.e., $v'(\tau) = \exp\big(2cL\tau + \bar\beta_{\tau 0}\big)$, which does not have a simple closed form, so we compute it numerically. This gives the covariance kernel. Following similar steps, we can compute the cross-covariance $K_{t_1, t_2}$ of the conditional process $Y_t | Y_0 = y_0$.

B.4 OTHER TOPOLOGICAL DYNAMICS

We may consider a fractional Laplacian in TSHeat, which allows for a more efficient exploration of the network [Riascos & Mateos, 2014] due to its non-local nature. For TSHeat, we can further allow heterogeneous heat diffusion on the edge space as
$$\mathrm{d}Y_t = -(c_1 L_d + c_2 L_u) Y_t \,\mathrm{d}t + g_t \,\mathrm{d}W_t \qquad \text{(B.18)}$$
by setting $H_t = -(c_1 L_d + c_2 L_u)$ with $c_1, c_2 > 0$. Here, the diffusion rates differ for the different edge-adjacency types encoded in $L_d$ and $L_u$. This can in fact be generalized to a more general topological convolution operator; if $L := L_d + L_u$ consists of the down and up parts,
$$H_t = \sum_{k=0}^{K} h_k^1(t) L_d^k + \sum_{k=0}^{K} h_k^2(t) L_u^k, \qquad \text{(B.19)}$$
where $h_k^1(t), h_k^2(t)$ are the coefficients of the topological convolution. We refer to Yang et al. [2022b] for more details on its expressive power compared to the TSDE.

Beyond the TSDE: Instead of first-order dynamics, we can use higher-order dynamics such as wave equations.
Graph wave equations [Chung et al., 2007] have been used for building more expressive graph neural networks [Poli et al., 2021], and their stochastic variants for modeling graph-time GPs [Nikitin et al., 2022]. Moreover, we may allow interactions between node and edge signals, in which case the dynamics is defined over the direct sum of the two spaces. While not considered in this work, we refer to [Alain et al., 2024] for such cases of defining topological GPs on SCs.

C TOWARDS THE OPTIMALITY OF THE TOPOLOGICAL SBP

Proposition C.1 (T-Schrödinger System; [Chen et al., 2016; Jamison, 1975]). The optimal solution of TSBPstatic has the form $P_{01}^\star(\mathrm{d}x_0\,\mathrm{d}x_1) = \hat\phi_0(x_0)\, p_{1|0}(x_1|x_0)\, \phi_1(x_1)\, \mathrm{d}x_0\,\mathrm{d}x_1$, with $\phi$ and $\hat\phi$ satisfying the system
$$\phi_t(x_t) = \int_{\mathbb{R}^n} p_{1|t}(x_1|x_t)\, \phi_1(x_1)\, \mathrm{d}x_1, \qquad \hat\phi_t(x_t) = \int_{\mathbb{R}^n} p_{t|0}(x_t|x_0)\, \hat\phi_0(x_0)\, \mathrm{d}x_0, \qquad \text{(C.1)}$$
where $p_{t|s}(y|x) = \mathcal{N}(y; m_{t|s}, K_{t|s})$ is the Gaussian transition density [cf. Lemma B.6] of the TSDE with drift in (1). Moreover, the time-marginal at t can be factored as $P_t(x) = \phi_t(x)\,\hat\phi_t(x)$.

Proof. This is a direct result of the Schrödinger system in Theorem A.6, replacing the Markov kernel by that [cf. Lemma B.6] of the TSDE. $\square$

From this system, we see that the optimal path measure has its marginal $P_t$ factorized into the two potentials $\phi_t$ and $\hat\phi_t$, which are both governed by the TSDE.

C.1 VARIATIONAL FORMULATIONS OF THE TSBP

By Girsanov's theorem, the TSBP can be formulated as the minimum-energy SOC problem
$$\min_{b} \; \mathbb{E}\Big[\int_0^1 \tfrac{1}{2}\|b(t, X_t)\|^2 \,\mathrm{d}t\Big], \quad \text{s.t.}\; \mathrm{d}X_t = [f_t + g_t\, b(t, X_t)]\,\mathrm{d}t + g_t \,\mathrm{d}W_t, \quad X_0 \sim \nu_0,\; X_1 \sim \nu_1, \qquad \text{(TSBPsoc)}$$
where $b_t \equiv b(t, X_t)$ is the control function. The SDE constraint in TSBPsoc is known as the controlled SDE, in contrast to the uncontrolled reference TSDE. It further leads to the variational problem
$$\min_{(b_t, \nu_t)} \int_0^1\!\!\int_{\mathbb{R}^n} \tfrac{1}{2}\|b_t\|^2\, \nu(t, x)\,\mathrm{d}x\,\mathrm{d}t, \quad \text{s.t.}\; \partial_t \nu_t + \nabla\cdot[\nu_t (f_t + g_t b_t)] = \tfrac{1}{2} g_t^2 \Delta\nu_t, \quad \nu(0, \cdot) = \nu_0,\; \nu(1, \cdot) = \nu_1, \qquad \text{(TSBPvar)}$$
where $\nu_t \equiv \nu(t, x) \equiv P_t$ is the time-marginal of P and obeys the PDE constraint, which is the FPK equation of the SDE constraint in TSBPsoc.
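The FPK constraint above can be checked numerically for the uncontrolled reference ($b_t = 0$): the Gaussian marginal of TSHeat-BM, with mean $\exp(-cLt)y_0$ and covariance $K_{tt}$ from Lemma B.8, must satisfy $\partial_t \nu + \nabla\cdot(\nu f) - \tfrac{1}{2}g^2 \Delta\nu = 0$ with $f(x) = -cLx$. The following finite-difference sketch (our own construction and parameter choices, not from the paper) verifies this at a test point.

```python
# Hedged sketch: finite-difference FPK residual for the TSHeat-BM marginal.
import numpy as np
from scipy.linalg import expm

n = 2
L = np.array([[1.0, -1.0], [-1.0, 1.0]]) + 1e-1 * np.eye(n)  # perturbed Laplacian
c, g = 1.0, 0.7
y0 = np.array([1.0, -0.5])

def mean_cov(t):
    """Closed-form marginal statistics from Lemma B.8 (s = 0)."""
    m = expm(-c * t * L) @ y0
    K = g**2 / (2 * c) * (np.eye(n) - expm(-2 * c * t * L)) @ np.linalg.inv(L)
    return m, K

def density(t, x):
    m, K = mean_cov(t)
    d = x - m
    quad = d @ np.linalg.solve(K, d)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** n * np.linalg.det(K))

t0, x0, ht, hx = 0.5, np.array([0.6, 0.1]), 1e-5, 1e-3

dnu_dt = (density(t0 + ht, x0) - density(t0 - ht, x0)) / (2 * ht)

div_flux, lap = 0.0, 0.0
for i in range(n):
    ei = np.eye(n)[i]
    xp, xm = x0 + hx * ei, x0 - hx * ei
    flux = lambda x, i=i: density(t0, x) * (-c * (L @ x))[i]  # i-th comp. of nu*f
    div_flux += (flux(xp) - flux(xm)) / (2 * hx)
    lap += (density(t0, xp) - 2 * density(t0, x0) + density(t0, xm)) / hx**2

residual = dnu_dt + div_flux - 0.5 * g**2 * lap  # should vanish
```

The residual is zero up to finite-difference error while the individual terms are O(1), confirming that the reference marginal solves the FPK constraint of TSBPvar with zero control.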
Theorem C.2 (TSBP Optimality; Léonard [2014]; Caluya & Halder [2021]). Let $\phi_t \equiv \phi(t, x)$ and $\hat\phi_t \equiv \hat\phi(t, x)$ be the solutions to the pair of PDEs
$$\begin{cases} \partial_t \phi_t = -\nabla \phi_t^\top f_t - \tfrac{1}{2} g_t^2 \Delta \phi_t, \\ \partial_t \hat\phi_t = -\nabla\cdot(\hat\phi_t f_t) + \tfrac{1}{2} g_t^2 \Delta \hat\phi_t, \end{cases} \quad \text{s.t.}\; \phi(0, \cdot)\,\hat\phi(0, \cdot) = \nu_0, \quad \phi(1, \cdot)\,\hat\phi(1, \cdot) = \nu_1. \qquad \text{(C.2)}$$
Then, the optimal control in TSBPvar is $b_t^\star = g_t \nabla \log \phi_t$, and the optimal path measure satisfies $\nu_t^\star = P_t = \phi_t \hat\phi_t$. Moreover, the solution to the TSBP can be represented by the path measure of the following coupled (forward-backward) TSDEs
$$\mathrm{d}X_t = [f_t + g_t^2 \nabla \log \phi(t, X_t)]\,\mathrm{d}t + g_t \,\mathrm{d}W_t, \quad X_0 \sim \nu_0, \qquad \text{(C.3a)}$$
$$\mathrm{d}X_t = [f_t - g_t^2 \nabla \log \hat\phi(t, X_t)]\,\mathrm{d}t + g_t \,\mathrm{d}\bar W_t, \quad X_1 \sim \nu_1, \qquad \text{(C.3b)}$$
where $\nabla \log \phi(t, X_t)$ and $\nabla \log \hat\phi(t, X_t)$ are the forward and backward optimal drift terms, respectively, and (C.3b) runs backward in time with $\bar W_t$ a reverse-time Wiener process.

Proof. The proof is an adaptation of Caluya & Halder [2021] to the topological setting. First, we make the assumptions¹ that $g(t)$ is uniformly lower-bounded and that $f(t, x; L)$ satisfies Lipschitz conditions with at most linear growth in x. From the first-order optimality conditions of the SOC formulation TSBPvar, we obtain a coupled system of nonlinear PDEs for $\psi_t$ (the potential of $b_t$, i.e., $b_t = \nabla\psi_t$) and $\nu_t$, known as the Hamilton-Jacobi-Bellman (HJB) and FPK equations, respectively, as well as the optimal control $b_t^\star = g_t \nabla \log \phi_t$. Via the Hopf-Cole transform, this system yields (C.2) [Caluya & Halder, 2021, Thm 2]. Then, substituting the optimal control into the constraint in TSBPsoc gives the forward SDE, and the backward SDE can be derived from the time-reversal of the forward SDE [Anderson, 1982; Nelson, 2020].

¹The nonexplosive Lipschitz condition on $f_t$ rules out finite-time blow-up of the sample paths of the SDE, and ensures existence and uniqueness. Together with the uniformly lower-bounded diffusion $g_t$, it guarantees that the transition kernel $p_{t|s}$ is positive and everywhere continuous.
Using the nonlinear Feynman-Kac formula (or applying Itô's formula to $\log \phi_t$ and $\log \hat\phi_t$), the PDE system (C.2) admits the SDEs [Chen et al., 2022a]
$$\mathrm{d}\log\phi_t = \tfrac{1}{2}\|Z_t\|^2 \,\mathrm{d}t + Z_t^\top \mathrm{d}W_t, \qquad \text{(C.4)}$$
$$\mathrm{d}\log\hat\phi_t = \Big(\tfrac{1}{2}\|\hat Z_t\|^2 + \nabla\cdot(g_t \hat Z_t - f_t) + \hat Z_t^\top Z_t\Big)\,\mathrm{d}t + \hat Z_t^\top \mathrm{d}W_t, \qquad \text{(C.5)}$$
where $Z_t \equiv g_t \nabla \log \phi_t(X_t)$ and $\hat Z_t \equiv g_t \nabla \log \hat\phi_t(X_t)$. This yields Proposition 5.

C.2 WASSERSTEIN GRADIENT FLOW INTERPRETATION

The gradient flow of a functional over the space of probability measures endowed with the Wasserstein metric, i.e., the Wasserstein gradient flow (WGF), is fundamentally linked to FPK equations [Otto, 2001; Ambrosio et al., 2008]. In the following, we show that solving the TSBP with the TSHeat-BM reference amounts to solving the WGF of a certain functional of a probability measure $\nu$.

Theorem C.3. Consider the TSBPvar with the reference process TSHeat. The SB optimality [cf. (C.2)] respects a pair of FPK equations of the form
$$\partial_t \hat\phi_t(x) = \nabla\cdot(cLx\, \hat\phi_t(x)) + \tfrac{1}{2} g^2 \Delta \hat\phi_t(x), \quad \hat\phi_0(x) = \hat\phi_0(x), \qquad \text{(C.6a)}$$
$$\partial_t \rho_t(x) = \nabla\cdot(cLx\, \rho_t(x)) + \tfrac{1}{2} g^2 \Delta \rho_t(x), \quad \rho_0(x) = \phi_1(x) \exp(-2c\, x^\top L x / g^2). \qquad \text{(C.6b)}$$
Therefore, the Wasserstein gradient flow of the functional
$$F(\nu) = c \int_{\mathbb{R}^n} \tfrac{1}{2} x^\top L x\, \nu(x)\,\mathrm{d}x + \tfrac{1}{2} g^2 \int_{\mathbb{R}^n} \nu \log \nu \,\mathrm{d}x := c\, \mathbb{E}_\nu[D(x)] + \tfrac{1}{2} g^2 S(\nu) \qquad \text{(C.7)}$$
recovers the paired PDEs in solving TSBPvar, where $D(x) = \tfrac{1}{2} x^\top L x$ is the Dirichlet energy of x and $S(\nu)$ is the negative differential entropy.

Proof. First, from Theorem C.2, we have the PDE system in (C.2), which can be rewritten as the pair of PDEs in (C.6) by applying Caluya & Halder [2021, Thm 3]. Both PDEs are of the following FPK form with $V(x) := \tfrac{1}{2} c\, x^\top L x$, on some density $p_t$:
$$\partial_t p_t(x) = \nabla\cdot(p_t(x)\, \nabla V(x)) + \tfrac{1}{2} g^2 \Delta p_t(x), \qquad \text{(C.8)}$$
for some initial condition. We can view $V(x)$ as the potential energy of the drift $f_t(x)$; here, $f_t(x) = -cLx_t = -\nabla V(x)$. Then, from the seminal work of Jordan et al.
[1998], the flows generated by the PDEs in (C.6) (both of FPK form) can be seen as the gradient descent of the Lyapunov functional $F(\cdot)$ of the form
$$F(\cdot) = c \int_{\mathbb{R}^n} \tfrac{1}{2} x^\top L x\, (\cdot)\,\mathrm{d}x + \tfrac{1}{2} g^2 \int_{\mathbb{R}^n} (\cdot) \log(\cdot)\,\mathrm{d}x := c\, \mathbb{E}_{(\cdot)}[D(x)] + \tfrac{1}{2} g^2 S(\cdot) \qquad \text{(C.9)}$$
with respect to the 2-Wasserstein distance on the space $\mathcal{P}_2(\mathbb{R}^n)$ of probability measures on $\mathbb{R}^n$ with finite second moments. Here, $(\cdot)$ can be $\hat\phi_t$ or $\rho_t$. $\square$

We note that it is also possible to obtain the associated functional for the TSBP with the more general reference TSDE by a similar argument, but it would be more involved and lead to a time-dependent functional [Ferreira & Valencia-Guevara, 2018].

D THE CLOSED FORM OF GAUSSIAN TOPOLOGICAL SCHRÖDINGER BRIDGES [THEOREMS 6 AND 7] (PROOFS AND OTHERS)

For convenience, we state the Gaussian TSBP
$$\min\; D_{\mathrm{KL}}(P \,\|\, Q_T), \quad \text{s.t.}\; P \in \mathcal{P}(\Omega), \; \nu_0 = \mathcal{N}(\mu_0, \Sigma_0), \; \nu_1 = \mathcal{N}(\mu_1, \Sigma_1) \qquad \text{(GTSBP)}$$
and its static problem
$$\min\; D_{\mathrm{KL}}(P_{01} \,\|\, Q_{T,01}), \quad \text{s.t.}\; P_{01} \in \mathcal{P}(\mathbb{R}^n \times \mathbb{R}^n), \; P_0 = \nu_0, \; P_1 = \nu_1. \qquad \text{(GTSBPstatic)}$$
We also restate the main results on the Gaussian TSBP.

Theorem 6. Denote by P the solution to the GTSBP with $\nu_0 = \mathcal{N}(\mu_0, \Sigma_0)$ and $\nu_1 = \mathcal{N}(\mu_1, \Sigma_1)$. Then, P is the path measure of a Markov Gaussian process whose marginal $X_t \sim \mathcal{N}(\mu_t, \Sigma_t)$ admits an expression in terms of the initial and final variables $X_0, X_1$:
$$X_t = \bar R_t X_0 + R_t X_1 + \xi_t - R_t \xi_1 + \Gamma_t^{1/2} Z, \qquad \text{(5)}$$
where $Z \sim \mathcal{N}(0, I)$ is standard Gaussian, independent of $(X_0, X_1)$, and
$$R_t = K_{t1} K_{11}^{-1}, \qquad \bar R_t = \Psi_t - R_t \Psi_1, \qquad \Gamma_t := \mathrm{Cov}[Y_t | (Y_0, Y_1)] = K_{tt} - K_{t1} K_{11}^{-1} K_{1t}. \qquad \text{(6)}$$

Corollary D.1 (Marginal Statistics). The time-marginal variable $X_t$ in (5) of the optimal solution to the GTSBP has mean and covariance
$$\mu_t = \bar R_t \mu_0 + R_t \mu_1 + \xi_t - R_t \xi_1, \qquad \text{(D.1a)}$$
$$\Sigma_t = \bar R_t \Sigma_0 \bar R_t^\top + R_t \Sigma_1 R_t^\top + \bar R_t C^\star R_t^\top + R_t C^{\star\top} \bar R_t^\top + \Gamma_t, \qquad \text{(D.1b)}$$
where $C^\star = \Psi_1^{-1} K_{11}^{1/2} \tilde C K_{11}^{1/2}$ with
$$\tilde C = \tfrac{1}{2}\big(\tilde\Sigma_0^{1/2} D \tilde\Sigma_0^{-1/2} - I\big), \quad D = \big(4 \tilde\Sigma_0^{1/2} \tilde\Sigma_1 \tilde\Sigma_0^{1/2} + I\big)^{1/2}, \quad \tilde\Sigma_0 = K_{11}^{-1/2} \Psi_1 \Sigma_0 \Psi_1^\top K_{11}^{-1/2}, \quad \tilde\Sigma_1 = K_{11}^{-1/2} \Sigma_1 K_{11}^{-1/2}. \qquad \text{(D.2)}$$

Theorem 7 (SDE representation).
Under the optimal P, the process X admits the SDE dynamics
$$\mathrm{d}X_t = f_T^\star(t, X_t; L)\,\mathrm{d}t + g_t \,\mathrm{d}W_t, \quad \text{where}\quad f_T^\star(t, x; L) = S_t^\top \Sigma_t^{-1} (x - \mu_t) + \dot\mu_t, \qquad \text{(7)}$$
with $\mu_t, \Sigma_t$ the mean and covariance of $X_t$ [cf. Corollary D.1], and
$$S_t = P_t + Q_t + H_t K_{tt} - K_{t1} K_{11}^{-1} \Upsilon_t^\top, \qquad \text{(8)}$$
with $P_t = (\dot R_t \Sigma_1 + \dot{\bar R}_t C^\star) R_t^\top$, $Q_t = (\dot{\bar R}_t \Sigma_0 + \dot R_t C^{\star\top}) \bar R_t^\top$, $\Upsilon_t = H_t K_{t1} + g_t^2 \Psi_t^{-1} \Psi_1$, where $C^\star$ is the covariance of $X_0, X_1$ under the optimal $P_{01}$.

D.1 PRELIMINARIES FOR THE PROOF

We first introduce the following three lemmas and the definition of infinitesimal generators.

Lemma D.2 (Central identity of Quantum Field Theory [Zee, 2010]). For any matrix $M \succ 0$ and all sufficiently regular analytic functions v (e.g., polynomials or $v \in C^\infty(\mathbb{R}^d)$ with compact support), we have
$$(2\pi)^{-d/2} (\det M)^{-1/2} \int_{\mathbb{R}^d} v(x) \exp\Big(-\frac{1}{2} x^\top M x\Big) \mathrm{d}x = \exp\Big(\frac{1}{2} \nabla^\top M^{-1} \nabla\Big) v(x)\Big|_{x=0}, \qquad \text{(D.3)}$$
where $\exp(\mathcal{D}) = I + \mathcal{D} + \frac{1}{2}\mathcal{D}^2 + \cdots$ for a differential operator $\mathcal{D}$.

Lemma D.3 (Conditional Gaussians). Let
$$\begin{bmatrix} Y_0 \\ Y_1 \end{bmatrix} \sim \mathcal{N}\Big( \begin{bmatrix} \mu_0 \\ \mu_1 \end{bmatrix}, \begin{bmatrix} \Sigma_{00} & \Sigma_{01} \\ \Sigma_{10} & \Sigma_{11} \end{bmatrix} \Big).$$
Then $Y_0 | Y_1 = y$ is Gaussian with
$$\mathbb{E}[Y_0 | Y_1 = y] = \mu_0 + \Sigma_{01} \Sigma_{11}^{-1} (y - \mu_1), \qquad \mathrm{Cov}(Y_0 | Y_1 = y) = \Sigma_{00} - \Sigma_{01} \Sigma_{11}^{-1} \Sigma_{10}. \qquad \text{(D.4)}$$

Definition 4 (Infinitesimal generator of a stochastic process). For a sufficiently regular time-dependent function $\phi(t, x): \mathbb{R}_+ \times \mathbb{R}^n \to \mathbb{R}$, the infinitesimal generator of a stochastic process $X_t$ acting on $\phi(t, x)$ is defined as
$$\mathcal{A}_t \phi(t, x) = \lim_{h \downarrow 0} \frac{\mathbb{E}[\phi(t + h, X_{t+h}) | X_t = x] - \phi(t, x)}{h}. \qquad \text{(D.5)}$$
For an Itô process defined as the solution to the SDE
$$\mathrm{d}X_t = f(t, X_t)\,\mathrm{d}t + g(t, X_t)\,\mathrm{d}W_t, \qquad \text{(D.6)}$$
with $f(t, x), g(t, x): \mathbb{R}_+ \times \mathbb{R}^n \to \mathbb{R}^n$, the generator is given by
$$\mathcal{A}_t \phi(t, x) = \partial_t \phi(t, x) + f(t, x)^\top \nabla_x \phi(t, x) + \frac{1}{2} \mathrm{Tr}\big[g(t, x) g(t, x)^\top \nabla_x^2 \phi(t, x)\big], \qquad \text{(D.7)}$$
where $\nabla_x^2$ denotes the Hessian and $\Delta := \nabla_x^\top \nabla_x$ the Euclidean Laplacian operator.

D.2 OUTLINE OF THE PROOF

Our proofs follow the idea of Bunne et al. [2023, Theorem 3].

For Theorem 6. We proceed in the following two steps: 1. We first solve the associated static GTSBP. Specifically, we formulate the equivalent E-OT problem, whose transport cost depends on the transition kernel of the TSDE.
By introducing new variables, we can convert this involved transport cost into a quadratic cost over the new variables, thus converting GTSBPstatic into a classical Gaussian E-OT problem. Based on existing results [Janati et al., 2020; Mallasto et al., 2022], we then obtain the optimal coupling over the transformed variables. The coupling over the original variables can be recovered via the inverse transform. 2. From Theorem A.4, we can obtain the solution of the GTSBP using the fact that the optimal P is in the reciprocal class of $Q_T$, specifically by composing the static solution with the $Q_T$-bridge. This is an optimality condition obtained from the reduction to the static problem. From this, we know that the solution P is a Markov Gaussian process and shares the same bridge as $Q_T$ [cf. Proposition A.5]. This further allows us to characterize the mean and covariance of the time-marginal.

For Theorem 7. Let $P \in \mathcal{P}(\Omega)$ be a finite-energy diffusion [Föllmer & Wakolbinger, 1986]; that is, under P, the canonical process X has a (forward) Itô differential. Furthermore, since P is in the reciprocal class of $Q_T$, it has an SDE representation of the form (7) from the SOC formulation, where the drift $f_T^\star(t, x; L)$ is to be determined. We then proceed in two steps: 1. For the SDE (7) of the optimal process $X_t$, we first compute its infinitesimal generator [Protter, 2005] for a test function $\phi(t, x): \mathbb{R}_+ \times \mathbb{R}^n \to \mathbb{R}$, by definition, using (D.5). 2. Second, we express the generator in terms of the drift and diffusion, as in (D.7), for the SDE (7):
$$\mathcal{A}_t \phi(t, x) = \partial_t \phi(t, x) + f_T^{\star\top} \nabla_x \phi(t, x) + \tfrac{1}{2} g_t^2 \Delta_x \phi(t, x). \qquad \text{(D.8)}$$
By matching the generators computed in both ways, we then obtain the closed form of the drift term.

D.3 DETAILED PROOF OF THEOREM 6

D.3.1 STEP 1: SOLVE GTSBPSTATIC

First, recall that $Y_t | Y_0 = y_0$ is a Gaussian process with mean $m_t := \mathbb{E}(Y_t | y_0)$ and covariance $K_{tt}$ [cf. the conditional mean and conditional cross-covariance], respectively.
Thus, we have the transition probability density
$$Q^T_{1|0}(y_1|y_0) \propto \exp\!\big(-\tfrac12(y_1 - \Psi_1 y_0 - \xi_1)^\top K_{11}^{-1}(y_1 - \Psi_1 y_0 - \xi_1)\big) = \exp\!\big(-\tfrac12(y_1 - m_1)^\top K_{11}^{-1}(y_1 - m_1)\big). \quad (\mathrm{D.9})$$
By introducing the variables $\tilde Y_0 = K_{11}^{-1/2}(\Psi_1 Y_0 + \xi_1)$ and $\tilde Y_1 = K_{11}^{-1/2} Y_1$, we have
$$Q^T_{1|0}(y_1|y_0) \propto \exp\!\big(-\tfrac12\|\tilde y_1 - \tilde y_0\|^2\big). \quad (\mathrm{D.10})$$
Furthermore, if the joint distribution $P_{01}$ has marginals $Y_0 \sim \nu_0$ and $Y_1 \sim \nu_1$, then after the change of variables ($Y_0 \to \tilde Y_0$ and $Y_1 \to \tilde Y_1$), it gives rise to a joint distribution $\tilde P_{01}$ with marginals $\tilde Y_0 \sim \tilde\nu_0 = \mathcal{N}(\tilde\mu_0, \tilde\Sigma_0)$ and $\tilde Y_1 \sim \tilde\nu_1 = \mathcal{N}(\tilde\mu_1, \tilde\Sigma_1)$, where
$$\tilde\mu_0 = K_{11}^{-1/2}(\Psi_1\mu_0 + \xi_1), \quad \tilde\Sigma_0 = K_{11}^{-1/2}\Psi_1\Sigma_0\Psi_1^\top K_{11}^{-1/2}, \quad \tilde\mu_1 = K_{11}^{-1/2}\mu_1, \quad \tilde\Sigma_1 = K_{11}^{-1/2}\Sigma_1 K_{11}^{-1/2}. \quad (\mathrm{D.11})$$
That is, there is a one-to-one correspondence between $P_{01}$ and $\tilde P_{01}$. This allows us to expand the objective of the static GTSBP, in terms of minimization, as follows:
$$D_{\mathrm{KL}}(P_{01}\,\|\,Q^T_{01}) = \int_{\mathbb{R}^n\times\mathbb{R}^n} \log\frac{P_{01}(y_0,y_1)}{Q^T_{01}(y_0,y_1)}\,\mathrm{d}P_{01}(y_0,y_1) = -\int \log Q^T_{01}\,\mathrm{d}P_{01} + \int \log P_{01}\,\mathrm{d}P_{01}$$
$$\overset{(\star)}{=} \tfrac12\int \|y_1 - \Psi_1 y_0 - \xi_1\|^2_{K_{11}^{-1}}\,\mathrm{d}P_{01}(y_0,y_1) + \int \log P_{01}\,\mathrm{d}P_{01} + \mathrm{const.}_1$$
$$= \tfrac12\int \|\tilde y_1 - \tilde y_0\|^2\,\mathrm{d}\tilde P_{01}(\tilde y_0,\tilde y_1) + \int \log \tilde P_{01}\,\mathrm{d}\tilde P_{01} + \mathrm{const.}_2 = D_{\mathrm{KL}}(\tilde P_{01}\,\|\,\tilde Q^T_{01}) + \mathrm{const.},$$
where the second-to-last step results from $\int \log \tilde P_{01}\,\mathrm{d}\tilde P_{01} = \int \log P_{01}\,\mathrm{d}P_{01} + \mathrm{const.}$ To obtain $(\star)$, we notice that $Q^T_{01}(y_0, y_1) = Q^T_{1|0}(y_1|y_0)\, Q^T_0(y_0)$, and we have
$$-\int \log Q^T_{01}(y_0,y_1)\,\mathrm{d}P_{01} = -\int \log Q^T_{1|0}(y_1|y_0)\,\mathrm{d}P_{01} - \int \log Q^T_0(y_0)\,\mathrm{d}P_{01} = -\int \log Q^T_{1|0}(y_1|y_0)\,\mathrm{d}P_{01} - \int \log Q^T_0(y_0)\,\mathrm{d}P_0(y_0),$$
where the last equality holds since we can remove the dependence on $y_1$ in the second term by integrating over $y_1$; the term thus appears as a constant in the optimization over $P_{01}$. Moreover, the expression $(\star)$ is in fact the equivalent E-OT problem associated with the TSBP:
$$\min_{P_{01}}\; \tfrac12\int \|y_1 - \Psi_1 y_0 - \xi_1\|^2_{K_{11}^{-1}}\,\mathrm{d}P_{01}(y_0,y_1) + \int \log P_{01}\,\mathrm{d}P_{01}. \quad (\mathrm{TE\text{-}OT})$$
Note that by definition we have $D_{\mathrm{KL}}(P_{01}\,\|\,\nu_0\otimes\nu_1) = \int \log P_{01}\,\mathrm{d}P_{01} - \int \log(\nu_0\otimes\nu_1)\,\mathrm{d}P_{01} = \int \log P_{01}\,\mathrm{d}P_{01} - \int \log\nu_0\,\mathrm{d}\nu_0 - \int \log\nu_1\,\mathrm{d}\nu_1$, where the last two terms are constants.
Thus, solving the static GTSBP is equivalent to solving the following problem:
$$\min_{\tilde P_{01}\in\mathcal{P}(\mathbb{R}^n\times\mathbb{R}^n)} D_{\mathrm{KL}}(\tilde P_{01}\,\|\,\tilde Q^T_{01}) \equiv \tfrac12\int \|\tilde y_1 - \tilde y_0\|^2\,\mathrm{d}\tilde P_{01}(\tilde y_0,\tilde y_1) + \int \log \tilde P_{01}\,\mathrm{d}\tilde P_{01} \quad (\mathrm{D.12})$$
with $\tilde P_0 = \tilde\nu_0$ and $\tilde P_1 = \tilde\nu_1$. This is a classical static Gaussian E-OT between $\tilde\nu_0$ and $\tilde\nu_1$ with $\sigma = 1$. The closed-form solution is given by the joint Gaussian [cf. Theorem A.1]
$$\tilde P^\star_{01} = \mathcal{N}\!\left(\begin{bmatrix}\tilde\mu_0\\ \tilde\mu_1\end{bmatrix}, \begin{bmatrix}\tilde\Sigma_0 & \tilde C\\ \tilde C^\top & \tilde\Sigma_1\end{bmatrix}\right), \quad \text{where } \tilde C = \tfrac12\big(\tilde\Sigma_0^{1/2} D\, \tilde\Sigma_0^{-1/2} - I\big), \; D = \big(4\,\tilde\Sigma_0^{1/2}\tilde\Sigma_1\tilde\Sigma_0^{1/2} + I\big)^{1/2}. \quad (\mathrm{D.14})$$
Finally, via the inverse transforms $Y_0 = \Psi_1^{-1}(K_{11}^{1/2}\tilde y_0 - \xi_1)$ and $Y_1 = K_{11}^{1/2}\tilde y_1$, we obtain the solution to the original static GTSBP as
$$P^\star_{01} = \mathcal{N}\!\left(\begin{bmatrix}\mu_0\\ \mu_1\end{bmatrix}, \begin{bmatrix}\Sigma_0 & C\\ C^\top & \Sigma_1\end{bmatrix}\right), \quad (\mathrm{GTSB\text{-}static})$$
where $C = \Psi_1^{-1} K_{11}^{1/2}\, \tilde C\, K_{11}^{1/2}$.

D.3.2 STEP 2: FROM STATIC TO DYNAMIC VIA THE DISINTEGRATION FORMULA
From Theorem A.4, we know that the solution $P^\star$ to the GTSBP shares its bridges with the reference $Q_T$. We denote by $Q^{y_0 y_1}_T$ the process $Y$ conditioned on $Y_0 = y_0$ and $Y_1 = y_1$ under $Q_T$, i.e., $Y\,|\,y_0, y_1 \sim Q^{y_0 y_1}_T = Q_T[\,\cdot\,|\,Y_0 = y_0, Y_1 = y_1]$. It is the bridge of $Q_T$, satisfying
$$Q_T(\cdot) = \int_{\mathbb{R}^n\times\mathbb{R}^n} Q^{y_0 y_1}_T(\cdot)\, Q^T_{01}(\mathrm{d}y_0\,\mathrm{d}y_1). \quad (\mathrm{D.15})$$
In the classical case of Brownian motion $Y = W \sim Q_W$, $Q^{y_0 y_1}_W$ is often referred to as the Brownian bridge. Here, we aim to first find the $Q^{y_0 y_1}_T$-bridge, and then construct the optimal solution $P^\star$ by composing the static solution $P^\star_{01}$ with the $Q^{y_0 y_1}_T$-bridge [cf. (A.4) in Theorem A.4].
From the transition kernel in Lemma 4, we have the conditional distributions $Y_t\,|\,y_0 \sim \mathcal{N}(m_t, K_{tt})$ and $Y_1\,|\,y_0 \sim \mathcal{N}(m_1, K_{11})$. Thus, the joint distribution of $Y_t$ and $Y_1$ given $y_0$ follows
$$Y_t, Y_1\,|\,y_0 \sim \mathcal{N}\!\left(\begin{bmatrix}m_t\\ m_1\end{bmatrix}, \begin{bmatrix}K_{tt} & K_{t1}\\ K_{1t} & K_{11}\end{bmatrix}\right).$$
Applying Lemma D.3, we know that $Y_t\,|\,y_0, Y_1 = y_1$ is Gaussian with mean
$$\mathbb{E}(Y_t\,|\,y_0, Y_1 = y_1) = m_t + K_{t1}K_{11}^{-1}(y_1 - m_1) = \Psi_t y_0 + \xi_t + K_{t1}K_{11}^{-1}(y_1 - \Psi_1 y_0 - \xi_1)$$
$$= (\Psi_t - K_{t1}K_{11}^{-1}\Psi_1)\,y_0 + K_{t1}K_{11}^{-1}\,y_1 + \xi_t - K_{t1}K_{11}^{-1}\xi_1 =: \bar R_t y_0 + R_t y_1 + \xi_t - R_t\xi_1,$$
where we recall the definitions of $R_t$ and $\bar R_t$ in (8), and covariance $\Gamma_t := \mathrm{Cov}(Y_t\,|\,Y_0 = y_0, Y_1 = y_1) = K_{tt} - K_{t1}K_{11}^{-1}K_{1t}$.
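The closed-form static coupling (D.14) is straightforward to evaluate numerically. Below is a minimal numpy sketch (not from the paper): `psd_sqrt` is a small helper for the symmetric PSD matrix square root, and the `sigma` argument follows the general-σ expression of Janati et al. [2020], with `sigma = 1` recovering the form used in the text.

```python
import numpy as np

def psd_sqrt(A):
    """Symmetric PSD matrix square root via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def gaussian_eot_cross_cov(Sigma0, Sigma1, sigma=1.0):
    """Cross-covariance C of the static Gaussian E-OT coupling, cf. (D.14):
    C = (1/2) (Sigma0^{1/2} D Sigma0^{-1/2} - sigma^2 I),
    D = (4 Sigma0^{1/2} Sigma1 Sigma0^{1/2} + sigma^4 I)^{1/2}."""
    n = Sigma0.shape[0]
    S0h = psd_sqrt(Sigma0)
    D = psd_sqrt(4.0 * S0h @ Sigma1 @ S0h + sigma**4 * np.eye(n))
    return 0.5 * (S0h @ D @ np.linalg.inv(S0h) - sigma**2 * np.eye(n))
```

For commuting covariances (e.g., both identity), the formula reduces to a scalar expression, which makes it easy to sanity-check.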
(D.18) Since a Gaussian process is completely determined by its mean and covariance, we have
$$Y_t\,|\,Y_0, Y_1 \overset{\mathrm{law}}{=} \bar R_t Y_0 + R_t Y_1 + \xi_t - R_t\xi_1 + \Gamma_t^{1/2} Z \sim Q^{y_0 y_1}_{T,t}, \quad (\mathrm{D.19})$$
where $Z \sim \mathcal{N}(0, I)$ is independent of $Y_t$. Now, the disintegration of measures and Theorem A.4 allow us to construct the solution to the GTSBP by first generating $(X_0, X_1) \sim P^\star_{01}$ in (GTSB-static), then connecting $X_0$ and $X_1$ using the $Q^{y_0 y_1}_T$-bridge. This is equivalent to, for $X_0 \sim \nu_0$, $X_1 \sim \nu_1$ and $Z \sim \mathcal{N}(0, I)$ with $Z \perp (X_0, X_1)$, building a process as
$$X_t \overset{\mathrm{law}}{=} \bar R_t X_0 + R_t X_1 + \xi_t - R_t\xi_1 + \Gamma_t^{1/2} Z \sim P^\star_t, \quad (\mathrm{D.20})$$
which is in fact a stochastic interpolant for stochastic processes over topological domains, generalizing the same notion in Euclidean domains in Albergo et al. [2024, Definition 1]. Note that since $Q_T$ is a stochastic process following an Itô SDE, which is a Markov process, the solution $P^\star$ is also a Markov process [cf. Proposition A.5]. Finally, we obtain the mean and covariance of the time-marginal $X_t$ as
$$\mu_t = \bar R_t\mu_0 + R_t\mu_1 + \xi_t - R_t\xi_1, \quad \Sigma_t = \bar R_t\Sigma_0\bar R_t^\top + R_t\Sigma_1 R_t^\top + \bar R_t C R_t^\top + R_t C^\top\bar R_t^\top + K_{tt} - K_{t1}K_{11}^{-1}K_{1t}. \quad (\mathrm{D.21})$$
This concludes the proofs of Theorem 6 and Corollary D.1.

D.4 DETAILED PROOF OF THEOREM 7
D.4.1 STEP 1: COMPUTE THE INFINITESIMAL GENERATOR OF $X_t$ BY DEFINITION
For a time-varying function $\phi(t, x)$, by definition, the infinitesimal generator of $X_t$ is given by (D.5). Since $X_t$ is a Gaussian process, we can express the conditional expectation using Lemma D.3. As we are only interested in the terms of order $O(h)$, we ignore the higher-order terms. First, we compute the derivative $\dot\Sigma_t$ of $\Sigma_t$ in (D.1):
$$\dot\Sigma_t = \dot{\bar R}_t\Sigma_0\bar R_t^\top + \bar R_t\Sigma_0\dot{\bar R}_t^\top + \dot R_t\Sigma_1 R_t^\top + R_t\Sigma_1\dot R_t^\top + \dot{\bar R}_t C R_t^\top + \bar R_t C\dot R_t^\top + \dot R_t C^\top\bar R_t^\top + R_t C^\top\dot{\bar R}_t^\top + \partial_t K_{tt} - (\partial_t K_{t1})K_{11}^{-1}K_{1t} - K_{t1}K_{11}^{-1}\,\partial_t K_{1t}$$
$$= (P_t^\top + P_t) - (\bar Q_t + \bar Q_t^\top) + \partial_t K_{tt} - (\partial_t K_{t1})K_{11}^{-1}K_{1t} - K_{t1}K_{11}^{-1}\,\partial_t K_{1t}, \quad (\mathrm{D.22})$$
where at the last equality we recall the definitions of $P_t$ and $\bar Q_t$ in (8). Next, denote by $\Sigma_{t,t+h}$ the covariance of the process $X$ evaluated at times $t$ and $t+h$.
We can estimate $\Sigma_{t,t+h}$ up to first order in $h$ as
$$\Sigma_{t,t+h} := \mathbb{E}[(X_t - \mu_t)(X_{t+h} - \mu_{t+h})^\top] = \bar R_t\Sigma_0\bar R_{t+h}^\top + R_t\Sigma_1 R_{t+h}^\top + \bar R_t C R_{t+h}^\top + R_t C^\top\bar R_{t+h}^\top + K_{t,t+h} - K_{t1}K_{11}^{-1}K_{1,t+h}$$
$$= \Sigma_t + \bar R_t\Sigma_0(\bar R_{t+h} - \bar R_t)^\top + R_t\Sigma_1(R_{t+h} - R_t)^\top + \bar R_t C(R_{t+h} - R_t)^\top + R_t C^\top(\bar R_{t+h} - \bar R_t)^\top + (K_{t,t+h} - K_{tt}) - K_{t1}K_{11}^{-1}(K_{1,t+h} - K_{1t})$$
$$\overset{(a)}{=} \Sigma_t + h\big(\bar R_t\Sigma_0\dot{\bar R}_t^\top + R_t\Sigma_1\dot R_t^\top + \bar R_t C\dot R_t^\top + R_t C^\top\dot{\bar R}_t^\top\big) + (K_{t,t+h} - K_{tt}) - K_{t1}K_{11}^{-1}(K_{1,t+h} - K_{1t}) + o(h)$$
$$\overset{(b)}{=} \Sigma_t + h\big(P_t - \bar Q_t^\top + \partial_{t_2}K_{t_1,t_2}|_{t_1=t_2=t} - K_{t1}K_{11}^{-1}\,\partial_{t_2}K_{t_1,t_2}|_{t_1=1,t_2=t}\big) + o(h), \quad (\mathrm{D.23})$$
where we obtain (a) by plugging in $\lim_{h\to 0}\frac1h(R_{t+h} - R_t) = \dot R_t$ and $\lim_{h\to 0}\frac1h(\bar R_{t+h} - \bar R_t) = \dot{\bar R}_t$; and likewise, we obtain (b) by recognizing the definitions of $P_t$ and $\bar Q_t^\top$ in (8) and using the partial derivatives
$$\lim_{h\to 0}\tfrac1h(K_{t,t+h} - K_{t,t}) = \partial_{t_2}K_{t_1,t_2}|_{t_1=t_2=t},\; t_1 \le t_2, \qquad \lim_{h\to 0}\tfrac1h(K_{1,t+h} - K_{1,t}) = \partial_{t_2}K_{t_1,t_2}|_{t_1=1,t_2=t} = \partial_t K_{1t},\; t_1 > t_2. \quad (\mathrm{D.24})$$
Following a similar procedure, we can obtain a first-order approximation of $\Sigma_{t+h,t}$ as
$$\Sigma_{t+h,t} := \mathbb{E}[(X_{t+h} - \mu_{t+h})(X_t - \mu_t)^\top] = \Sigma_t + h\big(\dot{\bar R}_t\Sigma_0\bar R_t^\top + \dot R_t\Sigma_1 R_t^\top + \dot{\bar R}_t C R_t^\top + \dot R_t C^\top\bar R_t^\top\big) + (K_{t+h,t} - K_{tt}) - (K_{t+h,1} - K_{t1})K_{11}^{-1}K_{1t} + o(h)$$
$$= \Sigma_t + h\big(P_t^\top - \bar Q_t + \partial_{t_1}K_{t_1,t_2}|_{t_1=t_2=t} - \partial_{t_1}K_{t_1,t_2}|_{t_1=t,t_2=1}\,K_{11}^{-1}K_{1t}\big) + o(h), \quad (\mathrm{D.25})$$
where the partial derivatives are understood as
$$\lim_{h\to 0}\tfrac1h(K_{t+h,t} - K_{t,t}) = \partial_{t_1}K_{t_1,t_2}|_{t_1=t_2=t},\; t_1 > t_2, \qquad \lim_{h\to 0}\tfrac1h(K_{t+h,1} - K_{t,1}) = \partial_{t_1}K_{t_1,t_2}|_{t_1=t,t_2=1} = \partial_t K_{t1},\; t_1 \le t_2. \quad (\mathrm{D.26})$$
Since Eqs. (D.22), (D.23) and (D.25) all involve the partial derivatives of $K_{t_1,t_2}$, we can compute them from the closed form of the transition matrix as
$$\partial_{t_1}K_{t_1,t_2} = \partial_{t_1}\Big(\Psi_{t_1}\int_0^{t_1} g_s^2\Psi_s^{-2}\,\mathrm{d}s\;\Psi_{t_2}^\top\Big) \overset{(a)}{=} \Psi_{t_1}H_{t_1}\int_0^{t_1} g_s^2\Psi_s^{-2}\,\mathrm{d}s\;\Psi_{t_2}^\top + g_{t_1}^2\Psi_{t_1}^{-1}\Psi_{t_2}^\top \overset{(b)}{=} H_{t_1}K_{t_1,t_2} + g_{t_1}^2\Psi_{t_1}^{-1}\Psi_{t_2}, \quad t_1 \le t_2,$$
where we use the symmetry of $\Psi_t$. At (a) we use
$$\dot\Psi_t = \partial_t\exp\Big(\int_0^t H_s\,\mathrm{d}s\Big) = \exp\Big(\int_0^t H_s\,\mathrm{d}s\Big)H_t = \Psi_t H_t, \quad (\mathrm{D.28})$$
and at (b) we use the commutativity of $H_t$ and $\Psi_t$ [cf. Lemmas B.1 and B.5]. Similarly, we have
$$\partial_{t_2}K_{t_1,t_2} = K_{t_1,t_2}H_{t_2}^\top,\; t_1 \le t_2; \qquad \partial_{t_1}K_{t_1,t_2} = H_{t_1}K_{t_1,t_2},\; t_1 > t_2; \qquad \partial_{t_2}K_{t_1,t_2} = g_{t_2}^2\Psi_{t_1}\Psi_{t_2}^{-1} + K_{t_1,t_2}H_{t_2}^\top,\; t_1 > t_2.$$
We notice that $(\partial_t K_{1t})^\top = \partial_t K_{t1} = g_t^2\Psi_t^{-1}\Psi_1 + H_t K_{t1}$. Now, introducing in (D.23) the variable
$$S_t := P_t - \bar Q_t^\top + \partial_{t_2}K_{t_1,t_2}|_{t_1=t_2=t} - K_{t1}K_{11}^{-1}\,\partial_{t_2}K_{t_1,t_2}|_{t_1=1,t_2=t} = P_t - \bar Q_t^\top + K_{tt}H_t^\top - K_{t1}K_{11}^{-1}\big(g_t^2\Psi_1\Psi_t^{-1} + K_{1t}H_t^\top\big) = P_t - \bar Q_t^\top + K_{tt}H_t^\top - K_{t1}K_{11}^{-1}\,\partial_t K_{1t}, \quad (\mathrm{D.30})$$
we can then express the covariance process as
$$\Sigma_{t,t+h} = \Sigma_t + h S_t + o(h), \quad (\mathrm{D.31})$$
$$\Sigma_{t+h,t} = \Sigma_t + h\big(P_t^\top - \bar Q_t + H_t K_{tt} - (g_t^2\Psi_t^{-1}\Psi_1 + H_t K_{t1})K_{11}^{-1}K_{1t}\big) + o(h) = \Sigma_t + h\big(P_t^\top - \bar Q_t + H_t K_{tt} - (\partial_t K_{t1})K_{11}^{-1}K_{1t}\big) + o(h) = \Sigma_t + h S_t^\top + o(h). \quad (\mathrm{D.32})$$
Lastly, using Lemma D.3, we see that the variable $X_{t+h}\,|\,X_t = x$ is Gaussian with mean
$$\check\mu_{t+h} := \mu_{t+h} + \Sigma_{t+h,t}\Sigma_t^{-1}(x - \mu_t) \overset{(a)}{=} \mu_t + h\dot\mu_t + (\Sigma_t + h S_t^\top)\Sigma_t^{-1}(x - \mu_t) + o(h) = \mu_t + h\dot\mu_t + (I + h S_t^\top\Sigma_t^{-1})(x - \mu_t) + o(h) = x + h\big(\dot\mu_t + S_t^\top\Sigma_t^{-1}(x - \mu_t)\big) + o(h), \quad (\mathrm{D.33})$$
where in (a) we used $\Sigma_t = \Sigma_t^\top$, and covariance
$$\check\Sigma_{t+h} = \Sigma_{t+h} - \Sigma_{t+h,t}\Sigma_t^{-1}\Sigma_{t,t+h} = \Sigma_t + h\dot\Sigma_t - (\Sigma_t + h S_t^\top)\Sigma_t^{-1}(\Sigma_t + h S_t) + o(h) = \Sigma_t + h\dot\Sigma_t - (\Sigma_t + h S_t^\top + h S_t) + o(h) = h\big(\dot\Sigma_t - S_t - S_t^\top\big) + o(h). \quad (\mathrm{D.34})$$
By viewing $K_{tt}$ as a matrix function of $t$, we have
$$\partial_t K_{tt} = \partial_t\Big(\Psi_t\int_0^t g_s^2\Psi_s^{-2}\,\mathrm{d}s\;\Psi_t^\top\Big) = H_t K_{tt} + g_t^2 I + K_{tt}H_t^\top. \quad (\mathrm{D.35})$$
This reduces (D.22) to
$$\dot\Sigma_t = (P_t^\top + P_t) - (\bar Q_t + \bar Q_t^\top) + \partial_t K_{tt} - (\partial_t K_{t1})K_{11}^{-1}K_{1t} - (K_{t1}K_{11}^{-1})\,\partial_t K_{1t}, \quad (\mathrm{D.36})$$
where the last two terms appear in $S_t^\top$ and $S_t$, respectively. Thus, we obtain
$$\check\Sigma_{t+h} = h\big(\dot\Sigma_t - S_t - S_t^\top\big) + o(h) = h\Big\{(P_t^\top + P_t) - (\bar Q_t + \bar Q_t^\top) + \partial_t K_{tt} - (\partial_t K_{t1})K_{11}^{-1}K_{1t} - (K_{t1}K_{11}^{-1})\,\partial_t K_{1t}$$
$$\qquad - \big(P_t - \bar Q_t^\top + K_{tt}H_t^\top - K_{t1}K_{11}^{-1}\,\partial_t K_{1t}\big) - \big(P_t^\top - \bar Q_t + H_t K_{tt} - (\partial_t K_{t1})K_{11}^{-1}K_{1t}\big)\Big\} + o(h) = h g_t^2 I + o(h). \quad (\mathrm{D.37})$$
We can now compute $\mathbb{E}[\phi(t+h, X_{t+h})\,|\,X_t = x]$ as follows:
$$\mathbb{E}[\phi(t+h, X_{t+h})\,|\,X_t = x] = (2\pi)^{-\frac d2}(\det\check\Sigma_{t+h})^{-\frac12}\int_{\mathbb{R}^n}\phi(t+h, x')\,\exp\!\big(-\tfrac12(x' - \check\mu_{t+h})^\top\check\Sigma_{t+h}^{-1}(x' - \check\mu_{t+h})\big)\,\mathrm{d}x'$$
$$\overset{(a)}{=} (2\pi)^{-\frac d2}(\det\check\Sigma_{t+h})^{-\frac12}\int_{\mathbb{R}^n}\phi(t+h, \check x + \check\mu_{t+h})\,\exp\!\big(-\tfrac12\check x^\top\check\Sigma_{t+h}^{-1}\check x\big)\,\mathrm{d}\check x,$$
where in (a) we apply the change of variables $\check x := x' - \check\mu_{t+h}$. We further apply Lemma D.2 and arrive at
$$\mathbb{E}[\phi(t+h, X_{t+h})\,|\,X_t = x] = \exp\!\big(\tfrac12\,\partial_{\check x}^\top\check\Sigma_{t+h}\,\partial_{\check x}\big)\,\phi(t+h, \check x + \check\mu_{t+h})\,\big|_{\check x = 0} \overset{(a)}{=} \Big(I + \tfrac12\,\partial_{\check x}^\top\check\Sigma_{t+h}\,\partial_{\check x} + \text{h.o.t.}\Big)
\phi(t+h, \check x + \check\mu_{t+h})\,\big|_{\check x = 0} \overset{(b)}{=} \phi(t+h, \check\mu_{t+h}) + \tfrac12 h g_t^2\,\Delta\phi(t+h, \check\mu_{t+h}) + o(h),$$
where we expand the power series of $\exp(\tfrac12\,\partial_{\check x}^\top\check\Sigma_{t+h}\,\partial_{\check x})$ and ignore the higher-order terms in (a), and plug in (D.37) in (b). Recalling $\check\mu_{t+h}$ in (D.33), we can expand the Taylor series of $\phi(t+h, \check\mu_{t+h})$ in the second variable at $x$ as
$$\phi(t+h, \check\mu_{t+h}) = \phi\big(t+h,\, x + h(\dot\mu_t + S_t^\top\Sigma_t^{-1}(x - \mu_t))\big) = \phi(t+h, x) + h\big\langle\nabla\phi(t+h, x),\, \dot\mu_t + S_t^\top\Sigma_t^{-1}(x - \mu_t)\big\rangle + o(h). \quad (\mathrm{D.40})$$
Therefore, we have
$$\mathbb{E}[\phi(t+h, X_{t+h})\,|\,X_t = x] = \phi(t+h, x) + h\big\langle\nabla\phi(t+h, x),\, \dot\mu_t + S_t^\top\Sigma_t^{-1}(x - \mu_t)\big\rangle + \tfrac12 h g_t^2\,\Delta\phi(t+h, x) + o(h). \quad (\mathrm{D.41})$$
Now we can express the infinitesimal generator of $X_t$ as
$$\lim_{h\downarrow 0}\frac{\mathbb{E}[\phi(t+h, X_{t+h})\,|\,X_t = x] - \phi(t, x)}{h} = \lim_{h\downarrow 0}\frac{\phi(t+h, x) - \phi(t, x)}{h} + \big\langle\nabla\phi(t, x),\, \dot\mu_t + S_t^\top\Sigma_t^{-1}(x - \mu_t)\big\rangle + \tfrac12 g_t^2\,\Delta\phi(t, x)$$
$$= \partial_t\phi(t, x) + \big\langle\nabla\phi(t, x),\, \dot\mu_t + S_t^\top\Sigma_t^{-1}(x - \mu_t)\big\rangle + \tfrac12 g_t^2\,\Delta\phi(t, x). \quad (\mathrm{D.42})$$

D.4.2 STEP 2: MATCH WITH THE GENERATOR OF AN ITÔ SDE
From Léonard [2014]; Caluya & Halder [2021], we search for the optimal solution to the TSBP within the class of stochastic processes following an SDE:
$$\mathrm{d}X_t = f_T(t, X_t)\,\mathrm{d}t + g_t\,\mathrm{d}W_t. \quad (\mathrm{D.43})$$
Recalling the form (D.7) of the infinitesimal generator for this SDE,
$$\mathcal{A}_t\phi(t, x) = \partial_t\phi(t, x) + f_T(t, x)^\top\nabla_x\phi(t, x) + \tfrac12 g_t^2\,\Delta\phi(t, x), \quad (\mathrm{D.44})$$
we then match it with the generator obtained by definition in (D.42). The two are equal if we set
$$f_T(t, x) = \dot\mu_t + S_t^\top\Sigma_t^{-1}(x - \mu_t). \quad (\mathrm{D.45})$$
This concludes the proof of Theorem 7.

D.5 CONDITIONAL DISTRIBUTION OF $X_t\,|\,X_0$
Corollary D.5 (Conditional distribution of $X_t\,|\,X_0$). Let $X_t$ be the stochastic process associated with the solution $P^\star$ to the GTSBP. Given an initial sample $x_0 \sim \nu_0$, the conditional distribution $\nu(X_t\,|\,X_0 = x_0) = \mathcal{N}(\mu_{t|0}, \Sigma_{t|0})$ is Gaussian with
$$\mu_{t|0} = \bar R_t x_0 + R_t\mu_1 + R_t C^\top\Sigma_0^{-1}(x_0 - \mu_0) + \xi_t - R_t\xi_1, \quad \Sigma_{t|0} = R_t\Sigma_1 R_t^\top - R_t C^\top\Sigma_0^{-1} C R_t^\top + K_{tt} - K_{t1}K_{11}^{-1}K_{1t}. \quad (\mathrm{D.46})$$
Similarly, given a final sample $x_1$, the conditional distribution $\nu(X_t\,|\,X_1 = x_1) = \mathcal{N}(\mu_{t|1}, \Sigma_{t|1})$ is Gaussian with
$$\mu_{t|1} = R_t x_1 + \bar R_t\mu_0 + \bar R_t C\Sigma_1^{-1}(x_1 - \mu_1) + \xi_t - R_t\xi_1, \quad \Sigma_{t|1} = \bar R_t\Sigma_0\bar R_t^\top - \bar R_t C\Sigma_1^{-1}C^\top\bar R_t^\top + K_{tt} - K_{t1}K_{11}^{-1}K_{1t}.$$
(D.47)
Proof. First, recall the stochastic interpolant expression in (5) of the solution to the GTSBP and its mean $\mu_t$ and covariance $\Sigma_t$ in (D.1). Due to the Gaussian nature of the process, we can write the joint distribution of $X_t$ and $X_0$ as
$$\begin{bmatrix}X_t\\ X_0\end{bmatrix} \sim \mathcal{N}\!\left(\begin{bmatrix}\mu_t\\ \mu_0\end{bmatrix}, \begin{bmatrix}\Sigma_t & \Sigma_{t,0}\\ \Sigma_{0,t} & \Sigma_0\end{bmatrix}\right),$$
where we work out the covariance between $X_t$ and $X_0$ below:
$$\Sigma_{t,0} = \mathbb{E}[(X_t - \mu_t)(X_0 - \mu_0)^\top] = \bar R_t\Sigma_0 + R_t\,\mathrm{Cov}(X_1, X_0) + \mathrm{Cov}(\xi_t, X_0) - R_t\,\mathrm{Cov}(\xi_1, X_0) + \mathrm{Cov}(\zeta_t, X_0)$$
$$= \bar R_t\Sigma_0 + R_t\,\mathrm{Cov}(X_1, X_0) \quad (\text{since } \xi_t \text{ is deterministic and } \zeta_t \perp X_0)$$
$$= \bar R_t\Sigma_0 + R_t C^\top. \quad (\text{by } P^\star_{01} \text{ in (GTSB-static)})$$
Based on Lemma D.3, we know that $X_t\,|\,X_0 = x_0$ is Gaussian with mean $\mu_{t|0}$ and covariance $\Sigma_{t|0}$ given by
$$\mu_{t|0} = \mu_t + \Sigma_{t,0}\Sigma_0^{-1}(x_0 - \mu_0), \quad \Sigma_{t|0} = \Sigma_t - \Sigma_{t,0}\Sigma_0^{-1}\Sigma_{0,t}. \quad (\mathrm{D.49})$$
By substituting the expressions of $\mu_t$ and $\Sigma_t$ from (D.1) and canceling terms, we complete the proof for $X_t\,|\,X_0 = x_0$. The proof for $X_t\,|\,X_1 = x_1$ follows similarly.

E TOPOLOGICAL SB GENERATIVE MODELS
Here, we provide more details on the TSB-based models. First, we give the likelihood of the model, which yields the training objective in (9), and the probability flow ODEs corresponding to the FB-TSDEs in (3). Then, we discuss the variants of score-based and diffusion bridge models for topological signals, as well as their training objectives, with the goal of illustrating how TSB-based models connect to these models.

E.1 LIKELIHOOD TRAINING FOR THE TOPOLOGICAL SBP
The likelihood for the Euclidean SBP by Chen et al. [2022a] extends to the topological case.
Corollary E.1 (Likelihood for TSB models; Chen et al. [2022a]). Given the optimal solution of the TSBP satisfying the FB-TSDE system in (3), the log-likelihood of the TSB model at an initial signal sample $x_0$ can be expressed as
$$\mathcal{L}_{\mathrm{TSB}}(x_0) = \mathbb{E}[\log\nu_1(X_1)] - \int_0^1 \mathbb{E}\Big[\tfrac12\|Z_t\|^2 + \tfrac12\|\hat Z_t\|^2 + \nabla\cdot(g_t\hat Z_t - f_t) + \hat Z_t^\top Z_t\Big]\,\mathrm{d}t, \quad (\mathrm{E.1})$$
where the expectation is taken over the forward SDE in (3) with the initial condition $X_0 = x_0$.
Corollary E.2 (Probability flow ODE for TSB).
The following ODE characterizes the probability flow of the optimal processes of T SB in (3) d Xt = ft + gt Zt 1 2gt(Zt + ˆZt) dt (E.2) and we have that for all t, Pt = p(E.2) t , i.e., the time marginal of the path measure P is equal to the probability flow pt of this ODE. This is a direct result from the probability flow for general SB [Chen et al., 2022a], which extends the probability flow for score-based models [Song et al., 2020b], and relates to the flow-based training. E.2 SCORE MATCHING FOR TOPOLOGICAL SIGNALS As discussed in Section 5 and by Chen et al. [2022a], SB-based models generalize the score-based models [Song et al., 2020b]. Here, we provide a detailed derivation on how a score-based model can be built for topological signals, specifically, on the score matching objective, since there is no direct literature on this. First, we show in detail that the likelihood training based on (9b) returns a score Published as a conference paper at ICLR 2025 matching objective for topological signals when Zt = 0 and the final ν1 is a simple Gaussian. The backward training objective in this case becomes l(x0; ˆθ) = Z 1 2 ˆZ ˆθ t 2 + gt ˆZ ˆθ t X0 = x0 2g2 t st(ˆθ) 2 + g2 t st(ˆθ) X0 = x0 where we introduce a score function st(ˆθ) to approximate log pt|0(Xt|X0 = x0), following ˆZ ˆθ t = gtst(ˆθ). Here, pt|0 is the transition kernel of the T SDE [cf. Lemma 4]. By using the trace estimator Hutchinson [1989] to compute the divergence, i.e., st(ˆθ) = Eu N(0,I)[u st(ˆθ)u], (E.3) and setting the weighting function λ(t) := g2 t , we then obtain the sliced score matching objective [Song et al., 2020a;b, Eq. 19] for topological signals based on T SDE, which has the form ˆθ = arg min Et U(0,1) λ(t)Ex0Ext Eu N(0,I) 2 st(ˆθ) 2 + u st(ˆθ)u , (E.4) and is equivalent to l(x0; ˆθ). This does not require a closed-form solution for the true score function log pt|0. 
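The divergence in (E.3) only requires Jacobian-vector products of the score network. A minimal sketch of the Hutchinson-type estimator, using central finite differences for the Jacobian-vector product (an assumption for self-containedness; in practice one would compute $u^\top\nabla s_t\,u$ with automatic differentiation):

```python
import numpy as np

def hutchinson_divergence(f, x, rng, n_samples=4000, eps=1e-5):
    """Estimate div f(x) = tr(J_f(x)) via E_u[u^T J_f(x) u] with u ~ N(0, I),
    cf. (E.3). The JVP J_f(x) u is approximated by central finite differences."""
    est = 0.0
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        jvp = (f(x + eps * u) - f(x - eps * u)) / (2.0 * eps)  # ≈ J_f(x) u
        est += u @ jvp
    return est / n_samples
```

For a linear field `f(x) = A @ x` the true divergence is `tr(A)`, which gives a quick check of the estimator.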
The associated FB-TSDEs now become the forward-backward processes for score-based models:
$$\mathrm{d}X_t = f_t\,\mathrm{d}t + g_t\,\mathrm{d}W_t, \quad (\mathrm{E.5})$$
$$\mathrm{d}X_t = \big(f_t - g_t^2 s_t(\hat\theta)\big)\,\mathrm{d}t + g_t\,\mathrm{d}W_t. \quad (\mathrm{E.6})$$
Closed-form score matching. For TSHeat-BM and TSHeat-VE, since we have their closed-form transition kernels in (2), we can use the direct score matching objective [Song et al., 2020b, Eq. 7] to train a score-based model for topological signals:
$$\hat\theta = \arg\min\; \mathbb{E}_{t\sim\mathcal{U}(0,1)}\,\lambda(t)\,\mathbb{E}_{x_0}\mathbb{E}_{x_t|x_0}\big\|s_t(\hat\theta) - \nabla\log p_{t|0}(x_t\,|\,x_0)\big\|^2, \quad (\mathrm{E.7})$$
where $\nabla\log p_{t|0}$ can be readily obtained from (2).

E.3 DIFFUSION BRIDGES FOR TOPOLOGICAL SIGNALS
As discussed earlier, SB models are closely related to stochastic interpolants, flow- and score-based models. We further remark that from the TSDE, we can build a topological diffusion bridge to directly construct transport models between any topological distributions via Doob's $h$-transform [Särkkä & Solin, 2019]. This has been evidenced in Euclidean domains by converting existing diffusion processes (BM, VE, VP) into diffusion bridges so as to arrive at arbitrary distributions, trained via score matching [Heng et al., 2021; Liu et al., 2022; Delbracio & Milanfar, 2023; Li et al., 2023; Zhou et al., 2024]. Specifically, consider the TSDE. To make it arrive at a final sample $x_1$, Doob's $h$-transform gives
$$\mathrm{d}X_t = \big[f_t + g_t^2\,\nabla\log p_{1|t}(x_1\,|\,X_t)\big]\,\mathrm{d}t + g_t\,\mathrm{d}W_t, \quad X_1 = x_1,\; x_0 \sim \nu_0, \quad (\mathrm{E.8})$$
where $p_{1|t}(x_1\,|\,x_t)$ is the transition kernel of the TSDE satisfying the associated backward FPK equation, given by Lemma B.6 (cf. Lemma 4). We can further find the time-reversal process of (E.8) [Zhou et al., 2024, Theorem 1]:
$$\mathrm{d}X_t = \big[f_t - g_t^2\big(\nabla\log q_{t|1}(x_t\,|\,x_1) - \nabla\log p_{1|t}(x_1\,|\,x_t)\big)\big]\,\mathrm{d}t + g_t\,\mathrm{d}W_t, \quad X_1 = x_1, \quad (\mathrm{E.9})$$
where $q_{t|1}(x_t\,|\,x_1)$ is the transition kernel of the new SDE in (E.8) (instead of the TSDE) conditioned on $X_1 = x_1$. The goal is to learn this new score function $\nabla\log q_{t|1}(x_t\,|\,x_1)$, which can be achieved by applying score matching [Song et al., 2020b]. Given paired training samples $(x_0, x_1) \sim q_{01}$, Zhou et al.
[2024, Theorem 2] considered a score matching objective to learn the new score function:
$$\hat\theta = \arg\min\; \mathbb{E}_{x_t, x_0, x_1, t}\,\lambda(t)\big\|s_t(\hat\theta) - \nabla\log q_{t|1}(x_t\,|\,x_1)\big\|^2. \quad (\mathrm{E.10})$$
However, since we cannot directly find a closed-form $q_{t|1}$ in the topological case, we need to use sliced score matching here. Note that we can view the underlying topological process $X$ as now following the new SDE pair (E.8) and (E.9) as the forward and backward processes, respectively. In this sense, the topological diffusion bridge is a special case of the TSB, obtained by setting the policies as $Z_t = g_t\nabla\log p_{1|t}(x_1\,|\,x_t)$ and $\hat Z_t = g_t[\nabla\log q_{t|1}(x_t\,|\,x_1) - \nabla\log p_{1|t}(x_1\,|\,x_t)]$. When performing learning on these policies, since $Z_t$ is fixed once the TSDE is given, the learning boils down to training the parameterized $\hat Z_t^{\hat\theta}$.

F ADDITIONAL EXPERIMENTS AND DETAILS
We first describe the synthetic experiment on matching Gaussian topological signal distributions based on the closed-form GTSB. Then, we detail the generative modeling experiments conducted on real-world datasets based on TSB-based models.

F.1 CLOSED-FORM GTSB CORROBORATION
Graph GP matching: We build a synthetic graph with 30 nodes and 67 edges, as shown in Fig. 1 (Left). From its graph Laplacian $L$, we construct the initial distribution of node signals as a Matérn GP with zero mean and kernel $\Sigma_0 = (I + L)^{-1.5}$, and the final distribution as a diffusion GP with zero mean and kernel $\Sigma_1 = \exp(-20L)$. We consider the GTSB closed form $X_t$ in (5) driven by both TSHeat-BM and TSHeat-VE. For the former, we set $c = 0.5$ and $g = 0.01$, labeled GTSB-BM. For the latter, we consider $\sigma_{\min} = 0.01$ and $\sigma_{\max} = 1$ with $c = 0.01$ and $c = 10$, labeled GTSB-VE1 and GTSB-VE2, respectively. We then compute the covariances $\Sigma_t$ of the time-marginals, which have a closed form given by Corollary D.1, and obtain samples from the closed-form conditional distribution [cf. Corollary D.5] given an initial sample, illustrated in Fig. F.1.
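Sampling from the closed-form conditional distribution given an initial signal reduces to standard Gaussian conditioning (Lemma D.3). A minimal numpy sketch, where the block means and covariances are hypothetical inputs that would come from Corollary D.1 and (D.46):

```python
import numpy as np

def sample_conditional_gaussian(mu_t, mu_0, S_t, S_t0, S_0, x0, rng):
    """Sample X_t | X_0 = x0 for jointly Gaussian (X_t, X_0), cf. Lemma D.3:
    mean = mu_t + S_t0 S_0^{-1} (x0 - mu_0),  cov = S_t - S_t0 S_0^{-1} S_t0^T."""
    gain = S_t0 @ np.linalg.inv(S_0)
    mean = mu_t + gain @ (x0 - mu_0)
    cov = S_t - gain @ S_t0.T
    # draw one sample using the symmetric PSD square root of cov
    w, V = np.linalg.eigh(cov)
    cov_h = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T
    return mean, cov, mean + cov_h @ rng.standard_normal(len(mu_t))
```

In the experiments, repeating the last line for many draws of $Z$ yields the conditioned sample trajectories shown in the figures.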
We also measure the Bures-Wasserstein (BW) distance [Bures, 1969] between $\Sigma_t$ and $\Sigma_1$ to evaluate the bridge quality, shown in Section 6.
Edge GP matching: We also consider matching two edge GPs, which can model the discretized edge flows in an SC2 obtained from vector fields defined on a 2D plane [Chen et al., 2021b]. The initial edge GP has zero mean and a divergence-free diffusion kernel with $\kappa_C = 10$, while the final edge GP has zero mean and a curl-free diffusion kernel with $\kappa_G = 10$ [cf. Appendix B.1]. We construct the closed-form GTSB $X_t$ with TSHeat-BM as the reference dynamics, where $c = 1$ and $g = 0.01$. We obtain samples from the closed-form SDE representation (7), shown in Fig. F.2. We can see that the forward samples are able to reach the final state, and the backward samples are able to reach the initial state, despite some noise due to numerical simulation.

F.2 TSB-BASED GENERATIVE MODELING AND MATCHING
Heat flows: We use the heat-flow dataset from Southeastern Australia from Mather et al. [2018], which collects 294 heat-flow measurements with coordinates from 1982 to 2016. Here we split the data into two parts, before and after 2010 (there is a significant change in the heat-flow pattern), to understand the evolution of the heat flow by modeling the two parts as initial and terminal data. That is, for this dataset, we consider the signal matching task.
Seismic magnitudes: We use the seismic event catalogue for M5.5+ (from 1990 to 2018) from IRIS, which consists of 12,940 recorded earthquake events with magnitudes greater than 5.5. To process these events, we use the stripy toolbox to obtain an icosahedral triangulated mesh of the Earth's surface [Moresi & Mather, 2019]. Using the refinement of level three, this spherical mesh has 1,922 vertices. We refer to Fig. F.3 for a visualization of such a mesh of level one for better clarity.
Upon this mesh, we first associate each earthquake event to the nearest mesh vertex based on its longitude and latitude. All events are located on 576 unique vertices of the mesh. Using these unique vertices, we then construct a 10-nearest-neighbor graph based on the geodesic distance between the vertices, and we use the symmetric normalized graph Laplacian. Lastly, on top of this graph, we associate the yearly earthquake events to its vertices and take the magnitudes as node signals, resulting in 29 such signals. Following this, we preprocess the magnitudes by removing the mean over the years. For this dataset, we consider the signal generation task.

Figure F.1: Covariances of the time marginals of the GTSB driven by TSHeat-BM (Top) and TSHeat-VE (Center), as well as the samples conditioned on the initial signal (Bottom), at t = 0.00, 0.20, 0.40, 0.60, 0.80, 1.00.
Figure F.2: Forward and backward samples based on the closed-form SDE in (7) with respect to TSHeat-BM.
Figure F.3: Earth mesh (Left) and a node signal sample of the earthquake magnitudes (Right).

Traffic flows: We consider the PeMSD4 dataset, which contains traffic flow in California from 01-01-2018 to 28-02-2018 over 307 sensors. We convert the node data into edge flows over an SC2 with 307 nodes, 340 edges and 29 triangles, following Chen et al. [2022b], and use the normalized Hodge Laplacian. For this dataset, we consider the signal generation task.
Ocean currents: We consider the Global Lagrangian Drifter Data, collected by the NOAA Atlantic Oceanographic and Meteorological Laboratory. The dataset is a 3D point cloud after converting the locations of buoys to the earth-centered, earth-fixed (ECEF) coordinate system. We follow the procedure in Chen & Meila [2021]; Chen et al.
[2021b] to first sample 1,500 buoys furthest from each other, then construct a weighted SC2 as a Vietoris-Rips (VR) complex with around 20k edges and around 90k triangles. For this dataset, we consider the signal matching task. Upon the weighted Hodge Laplacian, we use an edge GP learned by Yang et al. [2024] from the drifter data as the initial distribution and synthesize a curl-free edge GP as the final distribution. These two GPs behave rather differently, able to model ocean currents with different behaviors, which makes the matching task challenging. From these GPs, we can generate the samples for training and testing in an efficient way based on the eigenpairs associated with the 500 largest eigenvalues, analogous to using a Karhunen-Loève-type decomposition for continuous GPs.
Brain fMRI signals: We consider the Human Connectome Project (HCP) [Van Essen et al., 2013] Young Adult dataset, where we model the human brain network as a graph and perform the matching task on the measured fMRI signals recorded while the subject performed different tasks. We use the HCP-recommended brain atlas [Glasser et al., 2016], where each hemisphere is divided into 180 cortical parcels. This results in a total of 360 brain regions. We then build a graph based on the physical connection patterns between these regions, where the edge weights measure the strength of the axonal connections between two regions, i.e., proportional to the inverse of the squared distance [Perinelli et al., 2019]. We use the symmetric normalized graph Laplacian. In our experiments, the two sets of fMRI signals, respectively, correspond to the liberal and aligned brain activities.
The former is associated with brain regions involved in high-level cognition, like decision making and memory, whereas the latter is associated with the sensory regions, like visual and auditory, meaning that the functional signals are aligned with the anatomical brain structure, as shown in Fig. F.4.

Figure F.4: The energies of the initial (liberal) (Left) and final (aligned) brain signals (Right).
Figure F.5: Two-dimensional PHATE embedding of the single-cell data [Moon et al., 2019].

Single-cell data: We consider the single-cell embryoid body data from Moon et al. [2019], which describes the differentiation of human embryonic stem cells grown as embryoid bodies into diverse cell lineages over a period of 27 days. These cell data, $X_1, X_2, \ldots, X_5$, are collected at 5 timepoints (day 0-3, day 6-9, day 12-15, day 18-21, day 24-27, indexed by $t \in \{1, 2, 3, 4, 5\}$), resulting in 18,203 observations in total. We followed the preprocessing steps provided by TorchCFM [Tong et al., 2024a;b]. Please refer to this link for the direct use of the preprocessed data [Tong, 2023]. Following this, we consider the two-dimensional PHATE embedding of the data [Moon et al., 2019], resulting in data coordinates of dimension $18{,}203 \times 2$, as illustrated in Fig. F.5. From the preprocessing, we can build a sparse k-nearest-neighbor graph over the entire set of data observations. That is, we have an adjacency matrix of dimension $18{,}203 \times 18{,}203$. In our experiment, we aim to transport the observed data from day 0-3 to day 24-27, i.e., from $t = 1$ to $t = 5$. Thus, we build the two boundary distributions based on normalized indicator functions, which indicate the nodes associated with the data points observed at these two timepoints. That is,
$$\nu_0 := \mathbb{1}_{X_1}\Big/\sum\nolimits_{j\in X_1}\mathbb{1}_{X_1}(j) \quad (\mathrm{F.1})$$
where the sum is over the nodes associated with the first-timepoint observations $X_1$, as the initial distribution, and similarly, $\nu_1 := \mathbb{1}_{X_5}/\sum_{j\in X_5}\mathbb{1}_{X_5}(j)$ as the final one.
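The boundary construction in (F.1) amounts to a normalized indicator vector over the graph nodes; a minimal sketch (the node index set passed in is a placeholder):

```python
import numpy as np

def indicator_distribution(support_nodes, n_nodes):
    """Normalized indicator over a node subset, cf. (F.1):
    nu(j) = 1/|support| for j in the support, and 0 elsewhere."""
    nu = np.zeros(n_nodes)
    nu[np.asarray(support_nodes)] = 1.0
    return nu / nu.sum()
```

For example, `indicator_distribution([0, 3, 4], 10)` puts mass 1/3 on nodes 0, 3 and 4 of a 10-node graph.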
After training the models, using the final sample $\hat X_{t=5}$ obtained from the learned TSB given the initial observations, we can obtain the predictions at the five timepoints by sorting $\hat X_{t=5}$ (from large to small). Specifically, given the indices after sorting, $\mathrm{idx} = \mathrm{argsort}(\hat X_{t=5})$, we partition them into disjoint sets, $\mathrm{idx} = S_1 \cup S_2 \cup \cdots \cup S_5$ with $|S_t| = n_t$ the number of observations at timepoint $t$ for $t = 1, \ldots, 5$. We then have the prediction labels given by $S_t$, which indicate the nodes supporting the data points predicted at timepoint $t$. The disjoint index sets $S_t$ essentially provide a labeling of the whole set of predictions across the five timepoints. We found that using the adjacency matrix as the convolution operator in the reference dynamics performs better in practice.

F.2.2 MODEL
Models. We consider the following two sets of methods: Euclidean SB-based models with BM, VE and VP reference processes [Chen et al., 2022a], which we refer to as SB-BM, SB-VE and SB-VP, respectively; and topological SB-based models with TSHeat-BM, TSHeat-VE and TSHeat-VP as the reference processes, which we refer to as TSB-BM, TSB-VE and TSB-VP, respectively. For some datasets, we also apply the Gaussian SB SDE solution as the reference dynamics: Euclidean SB-based models with the closed-form GSB SDEs (under BM and VE reference processes) as the reference [Bunne et al., 2023], which we refer to as GSB-BM and GSB-VE, respectively; and topological SB-based models with the closed-form GTSB SDEs (7) (under TSHeat-BM and TSHeat-VE) as the reference, which we refer to as GTSB-BM and GTSB-VE, respectively.
Improving reference dynamics. Our proposed three types of reference dynamics have fixed diffusion rates $c$. This may limit the model flexibility in capturing the dynamics of the data, which we found especially when matching ocean current data. Thus, for this task, we allow a time-varying diffusion rate $c_t$.
Specifically, we set it to be linearly increasing as $c_t = c_{\min} + t(c_{\max} - c_{\min})$ for some $c_{\min}, c_{\max}$. Moreover, due to the nonlinearity of the underlying process (from a non-curl-free GP to a curl-free GP), we also consider the heterogeneous heat diffusion, as discussed in (B.18), where the down and up diffusion rates are different.
Policy models. For the parameterization of the optimal policies $(Z_t^\theta, \hat Z_t^{\hat\theta})$, we first obtain the time and signal embeddings individually. To obtain the signal embedding from the input, we consider the following two sets of models as the signal module:
ResBlock model: one multi-layer perceptron (MLP) followed by a number of residual block modules, where each block has three MLPs with sigmoid linear unit (SiLU) activations.
Topological neural network (TNN) model: For node signals, we consider two-layer GCNs [Kipf & Welling, 2017] followed by one MLP. For edge flows in an SC2, we consider two-layer SNNs [Roddenberry et al., 2021; Yang et al., 2022a] followed by one MLP, where each SNN layer has the linear convolution $X \mapsto L_u X W_2 + X W_1 + L_d X W_0$ with the down and up Laplacians $L_d, L_u$ and the learnable weights $W_0, W_1, W_2$.
To obtain the time embedding, we pass the sinusoidal positional encoding of the discretized timepoint through a two-layer MLP module with SiLU activations. We then sum the two embeddings and pass the result through a two-layer MLP output module with SiLU to obtain the final parameterization.

F.2.3 IMPLEMENTATION DETAILS
Our implementation is built upon the SB framework by Chen et al. [2022a]. We use the AdamW optimizer with a learning rate of $10^{-4}$ and Exponential Moving Average (EMA) with a decay rate of 0.99. For the reference processes involving BM, we treat the noise scale $g$ as a hyperparameter and optimize it by grid search. For the reference processes involving VE and VP, we grid-search the noise scales $\sigma_{\min}, \sigma_{\max}$ and $\beta_{\min}, \beta_{\max}$.
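Returning to the policy models in F.2.2: the SNN convolution described there is a single linear map per layer. A minimal numpy sketch (the shapes, and the omission of bias and nonlinearity, are assumptions):

```python
import numpy as np

def snn_layer(X, Ld, Lu, W0, W1, W2):
    """One linear SNN convolution for edge features X (n_edges x f_in):
    X -> Lu X W2 + X W1 + Ld X W0,
    with down/up Hodge Laplacians Ld, Lu (n_edges x n_edges) and learnable
    weights W0, W1, W2 (f_in x f_out)."""
    return Lu @ X @ W2 + X @ W1 + Ld @ X @ W0
```

With $L_d = L_u = 0$ and $W_1 = I$, the layer reduces to the identity, which gives a quick shape and wiring check.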
For TSB-based models, we grid-search the optimal diffusion rate $c$ and the noise scales involved in the TSHeat. In computing the likelihood in (9) during training, we use the trace estimator of Hutchinson [1989] to compute the divergence. In the generative procedures, we apply predictor-corrector sampling [Song et al., 2020b] to improve performance. To evaluate the models, we compute the negative log-likelihoods (NLLs) in (E.1) for generation tasks. For both generation and matching tasks, we assess the 1- and (square-rooted) 2-Wasserstein distances between the predicted and true signals.

F.2.4 RESULTS
Heat flows: For matching the two types of heat flows, we observe from Table F.1 that: (i) TSB-based models are consistently better than SB-based models; and (ii) using GCNs for the policy models increases the performance by a large margin for both sets of models.
Seismic magnitudes: In generative modeling of seismic magnitudes, while we have observations similar to those for the previous datasets, from Table F.3 we also observe that the GTSB-based models achieve the best performance, and likewise GSB-based models also increase the

Figure F.6: Backward sampled ocean currents using TSB-BM (Top), SB-BM (Center) and GTSB-BM (Bottom).

Table F.1: Heat flow matching results.
Method    ResBlock Forward    ResBlock Backward    GCN Forward     GCN Backward
SB-BM     -0.10 ± 0.02        -0.08 ± 0.03         -0.74 ± 0.05    -0.72 ± 0.05
SB-VE     -0.12 ± 0.02        -0.10 ± 0.01         -1.20 ± 0.07    -0.95 ± 0.07
SB-VP     -0.09 ± 0.02        -0.08 ± 0.02         -0.83 ± 0.04    -0.66 ± 0.11
TSB-BM    -0.29 ± 0.02        -0.27 ± 0.02         -0.83 ± 0.05    -0.81 ± 0.05
TSB-VE    -0.31 ± 0.02        -0.29 ± 0.01         -1.26 ± 0.05    -0.97 ± 0.08
TSB-VP    -0.57 ± 0.02        -0.55 ± 0.02         -1.01 ± 0.03    -0.92 ± 0.03

Table F.2: Traffic flow results.
Method    ResBlock        SNN
SB-BM     0.82 ± 0.00     0.18 ± 0.02
SB-VE     0.77 ± 0.00     -0.42 ± 0.01
SB-VP     0.79 ± 0.00     -0.09 ± 0.01
TSB-BM    0.40 ± 0.00     0.02 ± 0.03
TSB-VE    0.01 ± 0.00     -0.89 ± 0.02
TSB-VP    0.02 ± 0.00     -0.32 ± 0.01

Table F.3: Seismic magnitudes results.
Method Res Block GCN SB-BM 2.78 0.01 2.71 0.03 SB-VE 2.97 0.03 2.73 0.05 SB-VP 2.28 0.02 2.01 0.03 GSB-BM 1.86 0.02 1.83 0.05 GSB-VE 1.68 0.03 1.46 0.07 TSB-BM 2.13 0.01 1.82 0.02 TSB-VE 2.22 0.02 1.53 0.03 TSB-VP 2.00 0.02 1.51 0.02 GTSB-BM 1.58 0.01 1.43 0.04 GTSB-VE 1.49 0.02 1.06 0.04 Table F.4: Ocean current matching results. Method Foward Backward SB-BM 7.21 0.00 7.21 0.00 SB-VE 7.17 0.02 7.17 0.02 GSB-BM 1.09 0.01 0.97 0.00 GSB-VE 0.83 0.01 0.49 0.00 TSB-BM 6.94 0.01 3.70 0.00 TSB-VE 6.89 0.00 3.60 0.00 GTSB-BM 1.09 0.01 0.97 0.00 GTSB-VE 0.53 0.00 0.47 0.00 performance of SB-models. In Table F.5, we report the 1and 2-Wasserstein distances between the generated samples and the true ones. Traffic flows: In geneartive modeling of traffic flows, we observe from Table F.2 that: T SB-based models achieve smaller NLLs and using SNNs for both models improves the performance. This observation is consistent with the Wasserstein metrics reported in Table F.5. Ocean currents: In matching ocean currents with two types of different-behaving edge GPs, note that the initial sample in the forward process in Fig. 4 and the final sample in the backward process in Fig. F.6 are the true samples. From these two figures, we first observe that SB-based models fail to learn the dynamics to reach the expected end states, as shown in Figs. 4 and F.6. On the other hand, T SB-based models are able to reach an end state with small curl component in the forward process, yet with some discrepancy from the true one. Moreover, the learned backward dynamics remains noisy and does not completely return to the initial state. This implies that the underlying dynamics cannot be fully captured by the T SHeat-type reference processes. This can be largely alleviated by using GT SB-based models, where both the forward and backward processes reach the expected states with high fidelity. 
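The divergence term in the likelihood (9) is estimated with Hutchinson's trace estimator, as noted in the implementation details above. A minimal NumPy sketch of the idea for a plain matrix trace, under our own assumptions (the helper name `hutchinson_trace` and the Rademacher probe choice are illustrative; in the models the trace is of a Jacobian and everything runs in PyTorch):

```python
import numpy as np

def hutchinson_trace(A, num_probes=1000, rng=None):
    """Estimate tr(A) as the average of z^T A z over random
    Rademacher probes z, using E[z^T A z] = tr(A)."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    est = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe
        est += z @ A @ z
    return est / num_probes
```

For Rademacher probes the diagonal of A contributes exactly, so only the off-diagonal terms introduce variance, which shrinks as the number of probes grows.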
This is, however, because the initial and final distributions are modeled by GPs, allowing GTSB to capture the underlying dynamics better. These observations are reflected in the square-rooted 2-Wasserstein distances in Table F.4 between the samples given by the learned forward process and the true ones, as well as for the backward ones.

Brain fMRI signals: In matching the two brain fMRI signals, we observe from Fig. F.7 that the TSB-VE-based model reaches a final state where the signals have lower energy over the brain, indicating aligned activities, whereas SB-VE fails to do so. This is quantitatively reflected in the Wasserstein metrics between the generated final samples and the ground-truth ones in Table F.5.

Table F.5: Overall 1- and (square-rooted) 2-Wasserstein distances for generating and matching.
Method | Seismic W1 | Seismic W2 | Traffic W1 | Traffic W2 | Brain W1 | Brain W2 | Single-cell W1 | Single-cell W2
SB-BM | 11.73 ± 0.05 | 8.29 ± 0.04 | 18.69 ± 0.02 | 13.36 ± 0.01 | 12.08 ± 0.08 | 8.58 ± 0.05 | 0.33 ± 0.01 | 0.40 ± 0.01
SB-VE | 11.49 ± 0.04 | 8.13 ± 0.03 | 19.04 ± 0.02 | 13.61 ± 0.02 | 17.46 ± 0.14 | 12.42 ± 0.09 | 0.33 ± 0.01 | 0.39 ± 0.01
SB-VP | 12.61 ± 0.06 | 8.92 ± 0.04 | 18.22 ± 0.03 | 13.02 ± 0.02 | 13.41 ± 0.05 | 9.54 ± 0.04 | 0.33 ± 0.01 | 0.40 ± 0.00
TSB-BM | 9.01 ± 0.03 | 6.37 ± 0.03 | 10.57 ± 0.02 | 7.62 ± 0.01 | 7.51 ± 0.08 | 5.51 ± 0.06 | 0.14 ± 0.03 | 0.28 ± 0.05
TSB-VE | 7.69 ± 0.04 | 5.44 ± 0.03 | 10.51 ± 0.02 | 7.58 ± 0.01 | 7.59 ± 0.05 | 5.55 ± 0.04 | 0.14 ± 0.02 | 0.27 ± 0.04
TSB-VP | 8.40 ± 0.04 | 5.95 ± 0.03 | 9.92 ± 0.02 | 7.16 ± 0.01 | 7.67 ± 0.11 | 5.64 ± 0.09 | 0.14 ± 0.01 | 0.22 ± 0.03

Figure F.7: Intermediate samples of brain signals learned using SB-VE (Top) and TSB-VE (Bottom).

Table F.6: Ablation study results on graph normalizations for brain signal matching.
Metric, Laplacian | TSB-BM | TSB-VE | TSB-VP
W1, L_sym | 7.51 ± 0.08 | 7.59 ± 0.05 | 7.67 ± 0.11
W2, L_sym | 5.51 ± 0.06 | 5.55 ± 0.04 | 5.64 ± 0.09
W1, L_RW | 7.51 ± 0.08 | 7.62 ± 0.09 | 7.65 ± 0.09
W2, L_RW | 5.52 ± 0.06 | 5.58 ± 0.07 | 5.62 ± 0.06
W1, L_comb | 8.06 ± 0.05 | 9.21 ± 0.06 | 9.29 ± 0.05
W2, L_comb | 5.80 ± 0.04 | 6.62 ± 0.05 | 6.73 ± 0.03

Ablation study on graph normalizations. Here, we compare the performance of the TSB-based models under different graph Laplacian normalizations. Specifically, we consider the random walk Laplacian L_RW and the combinatorial Laplacian L_comb; for the latter, we divide by its maximal eigenvalue for numerical stability. From Table F.6, we notice that the random walk Laplacian performs comparably to the symmetric normalized Laplacian L_sym, while the combinatorial one is worse than the other two. This is not surprising, since the combinatorial one does not encode the connection strength between brain regions.

Single-cell data: We first measure the Wasserstein distances between the predicted single cells and the ground-truth ones at the final timepoint, as reported in Table F.5. We also provide the predictions in the two-dimensional PHATE embedding space for the SB-BM and TSB-BM models in Fig. F.8, where the latter gives a much better prediction. Moreover, starting from the final sample, we evaluate the predictions at the intermediate timepoints (see Appendix F.2.1) in Table F.7, where the performance of TSB-BM is consistently better than that of SB-BM. Since our method relies on a graph constructed from the entire set of data points, we also provide the leave-one-out accuracy for SB-BM, trained on all data points except those at the to-be-predicted timepoint. We see that while the accuracy for the final timepoint is perfect, the intermediate predictions remain poor. In contrast, TSB-BM, by making use of the topology, captures the underlying dynamics and predicts the intermediate states better.
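The three Laplacian normalizations compared in the ablation above can be sketched in a few lines of NumPy (an illustration under our own naming; the helper `graph_laplacians` is not part of the paper's code): L_comb = D − A, L_sym = I − D^{-1/2} A D^{-1/2}, and L_RW = I − D^{-1} A, with L_comb further divided by its largest eigenvalue for stability.

```python
import numpy as np

def graph_laplacians(A):
    """Return (L_comb_scaled, L_sym, L_rw) for a weighted,
    symmetric adjacency matrix A with positive degrees."""
    d = A.sum(axis=1)
    L_comb = np.diag(d) - A
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(len(d)) - d_inv_sqrt @ A @ d_inv_sqrt
    L_rw = np.eye(len(d)) - np.diag(1.0 / d) @ A
    # scale the combinatorial Laplacian by its largest eigenvalue
    L_comb_scaled = L_comb / np.linalg.eigvalsh(L_comb).max()
    return L_comb_scaled, L_sym, L_rw
```

All three annihilate the constant vector (up to the similarity transform for L_sym), and the scaling bounds the spectrum of the combinatorial variant to [0, 1].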
Figure F.8: Single-cell two-dimensional PHATE embeddings of the observations (Left) and the predictions using TSB-BM (Center) and SB-BM (Right).

Table F.7: Intermediate prediction performance on single-cell data using TSB-BM and SB-BM.
Timepoint | TSB-BM W1 | TSB-BM W2 | TSB-BM acc. | SB-BM W1 | SB-BM W2 | SB-BM acc. | SB-BM leave-one-out acc.
2 | 0.03 ± 0.00 | 0.09 ± 0.00 | 0.80 | 0.52 ± 0.01 | 0.59 ± 0.01 | 0.28 | 0.24
3 | 0.09 ± 0.00 | 0.22 ± 0.01 | 0.42 | 0.12 ± 0.00 | 0.21 ± 0.00 | 0.23 | 0.20
4 | 0.08 ± 0.00 | 0.16 ± 0.01 | 0.45 | 0.19 ± 0.00 | 0.34 ± 0.00 | 0.26 | 0.26
5 | 0.14 ± 0.03 | 0.28 ± 0.05 | 0.70 | 0.33 ± 0.01 | 0.40 ± 0.01 | 0.24 | 1.00

F.2.5 COMPUTATIONAL COMPLEXITY

Compared to SB-based models, the TSB-based models introduce an additional topological convolution [cf. (1)] overhead, which, however, admits an efficient computation, as discussed in Section 5. We here provide a quantitative comparison of the complexity in terms of training time and memory consumption. We measure them using the SB-VE and TSB-VE models on different-sized 10-nearest-neighbor graphs built from Swiss roll point clouds. This comparison is done in a single training stage with 2,000 iterations, running on a single NVIDIA RTX 3080 GPU. As shown in Fig. F.9, we observe that in the moderate-scale regime (around 10,000 nodes), the training time and memory consumption of TSB-VE are only slightly higher than those of SB-VE, with negligible difference. While this overhead becomes more significant as the scale increases further, both training time and memory can be reduced by exploiting the sparse structure of the graph topology (here implemented using torch.Tensor.to_sparse), such that the computational overheads of the SB and TSB models remain comparable. Under the same settings, Table F.8 compares the SB-BM and TSB-BM models across all datasets. The additional memory and training time introduced by TSB-BM remain below 4%.
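The sparse-structure speedup described above can be illustrated with SciPy's CSR format (the paper uses torch.Tensor.to_sparse; this SciPy analogue is our own sketch). Storing a sparse graph operator in CSR makes the matrix product cost scale with the number of nonzeros rather than with n^2, while the result is unchanged:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Build a sparse n x n operator (e.g. a graph Laplacian or shift
# operator) with ~5,000 nonzeros out of 10^6 possible entries.
n, f = 1000, 16
rng = np.random.default_rng(0)
L_dense = np.zeros((n, n))
idx = rng.integers(0, n, size=(5000, 2))
L_dense[idx[:, 0], idx[:, 1]] = 1.0
L_sparse = csr_matrix(L_dense)

# Apply the operator to node features; the sparse kernel touches
# only the stored nonzeros.
X = rng.standard_normal((n, f))
Y_dense = L_dense @ X
Y_sparse = L_sparse @ X
```

The two products agree to numerical precision; only the cost differs, which is what keeps the TSB overhead comparable to SB at larger graph sizes.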
Figure F.9: Training time and memory comparison when training SB-VE and TSB-VE w.r.t. different-sized graphs. (Axes: graph size in nodes, up to 40,000, vs. training time in minutes and memory usage in MiB; curves: SB-VE, TSB-VE, and TSB-VE with the sparse implementation.)

Table F.8: Complexity comparison in terms of memory (MiB) and training time (seconds).
Dataset | TSB-BM memory | SB-BM memory | TSB-BM time | SB-BM time
Seismic | 516 | 512 | 50.17 | 51.48
Traffic | 510 | 504 | 52.25 | 50.62
Ocean | 5976 | 5892 | 106.68 | 102.67
Brain | 486 | 468 | 49.62 | 48.97
Single-cell | 4446 | 4294 | 94.30 | 92.54