# bridging_geometric_states_via_geometric_diffusion_bridge__5f253971.pdf

Bridging Geometric States via Geometric Diffusion Bridge

Shengjie Luo1 , Yixian Xu1,4 , Di He1 , Shuxin Zheng2, Tie-Yan Liu2, Liwei Wang1,3

1State Key Laboratory of General Artificial Intelligence, School of Intelligence Science and Technology, Peking University 2Microsoft Research AI4Science 3Center for Data Science, Peking University 4Pazhou Laboratory (Huangpu), Guangzhou, Guangdong 510555, China luosj@stu.pku.edu.cn, xyx050@stu.pku.edu.cn, {shuz, tyliu}@microsoft.com, {dihe, wanglw}@pku.edu.cn

The accurate prediction of geometric state evolution in complex systems is critical for advancing scientific domains such as quantum chemistry and material modeling. Traditional experimental and computational methods face challenges in terms of environmental constraints and computational demands, while current deep learning approaches still fall short in terms of precision and generality. In this work, we introduce the Geometric Diffusion Bridge (GDB), a novel generative modeling framework that accurately bridges initial and target geometric states. GDB leverages a probabilistic approach to evolve geometric state distributions, employing an equivariant diffusion bridge derived by a modified version of Doob s h-transform for connecting geometric states. This tailored diffusion process is anchored by initial and target geometric states as fixed endpoints and governed by equivariant transition kernels. Moreover, trajectory data can be seamlessly leveraged in our GDB framework by using a chain of equivariant diffusion bridges, providing a more detailed and accurate characterization of evolution dynamics. Theoretically, we conduct a thorough examination to confirm our framework s ability to preserve joint distributions of geometric states and capability to completely model the underlying dynamics inducing trajectory distributions with negligible error. Experimental evaluations across various real-world scenarios show that GDB surpasses existing state-of-theart approaches, opening up a new pathway for accurately bridging geometric states and tackling crucial scientific challenges with improved accuracy and applicability.

1 Introduction

Predicting the evolution of the geometric state of a system is essential across various scientific domains [46, 88, 55, 17, 20, 101], offering valuable insights into difficult tasks such as drug discovery [25, 29], reaction modeling [9, 24], and catalyst analysis [13, 105]. Despite its critical importance, accurately predicting future geometric states of interest is challenging. Experimental approaches often face obstacles due to strict environmental requirements and physical limits of instruments [102, 3, 69]. Computational approaches seek to solve the problem by simulating the dynamics based on underlying equations [81, 88]. Though providing greater flexibility, such calculations are typically driven by first-principle methods or empirical laws, either requiring extensive computational costs [68] or sacrificing accuracy [40].

Equal contribution. Correspondence to: Di He<dihe@pku.edu.cn>, Liwei Wang <wanglw@pku.edu.cn>.

38th Conference on Neural Information Processing Systems (Neur IPS 2024).

In recent years, deep learning has emerged as a pivotal tool in scientific discovery for many fields [43, 23, 69, 107], offering new avenues for tackling this problem. One line of approach aims to train models to predict target geometric states (e.g., equilibrium states) from initial states directly and develop neural network architectures that respect inherent symmetries of geometric states, such as the equivariance of rotation and translation [104, 31, 8, 87, 89, 103]. However, this paradigm requires encoding the iterative evolution into a single-step prediction model, which lacks the ability to fully capture the system s underlying dynamics and potentially leading to reduced accuracy. Another line of research trains machine learning force fields (MLFFs) to simulate the trajectory of geometric states over time [32, 34, 6, 70, 5, 58], showing a better efficiency-accuracy balance [15, 13, 105, 84]. Nevertheless, MLFFs are typically trained to predict intermediate labels, such as the force of the (local) current state. During inference, states are iteratively updated step by step. Since small local errors can accumulate, reliable predictions over long trajectories highly depend on the quality of intermediate labels, which cannot be guaranteed [7, 106, 30]. Therefore, an ideal solution that can precisely bridge initial and target geometric states and effectively leverage trajectory data (if available) as guidance is in great demand.

In this work, we introduce Geometric Diffusion Bridge (GDB), a general framework for bridging geometric states through generative modeling. From a probabilistic perspective, predicting target geometric states from initial states requires modeling the joint state distribution across different time steps. The diffusion models [37, 99] are standard choices to achieve this goal. However, these methods ideally generate data by denoising samples drawn from a Gaussian prior distribution, which makes it challenging to bridge pre-given geometric states or leverage trajectories in a unified manner. To address the issue, we establish a novel equivariant diffusion bridge by developing a modified version of Doob s h-transform [82, 81, 16]. The proposed stochastic differential equation (SDE) is anchored by initial and target geometric states to simultaneously model the joint state distribution and is governed by equivariant transition kernels to satisfy symmetry constraints. Intriguingly, we further demonstrate that this framework can seamlessly leverage trajectory data to improve prediction. With available trajectory data, we can construct chains of equivariant diffusion bridges, each modeling one segment in the trajectory. The segments are interconnected by properly setting the boundary conditions, allowing complete modeling of trajectory data. For model training, we derive a scalable and simulation-free matching objective similar to [59, 61, 77], which requires no computational overhead when trajectory data is leveraged.

Overall, our GDB framework offers a unified solution that precisely bridges geometric states by modeling the joint state distribution and comprehensively leverages available trajectories as finegrained depiction of dynamics for enhanced performance. Mathematically, we prove that the joint distribution of geometric states across different time steps can be completely preserved by our (chains of) equivariant diffusion bridge technique, confirming its expressiveness in bridging geometric states and underscoring the necessity of design choices in our framework. Furthermore, under mild and practical assumptions, we prove that our framework can approximate the underlying dynamics governing the evolution of geometric state trajectories with negligible error in convergence, remarking on the completeness and usefulness of our framework in different scenarios. These advantages show the superiority of our framework over existing approaches.

Practically, we provide a comprehensive guidance for implementing our GDB framework in realworld applications. To verify its effectiveness and generality, we conduct extensive experiments covering diverse data modalities (simple molecules & adsorbate-catalyst complex), scales (small, medium and large scales) and scenarios (with & without trajectory guidance). Numerical results show that our GDB framework consistently outperforms existing state-of-the-art machine learning approaches by a large margin. In particular, our method even surpasses strong MLFF baselines that are trained on 10 more data in the challenging structure relaxation task of OC22 [105], and trajectory guidance can further enhance our performance. The significantly superior performance demonstrates the high capacity of our framework to capture the complex evolution dynamics of geometric states and determine valuable and crucial geometric states of interest in critical real-world challenges.

2 Background

2.1 Problem Definition

Our task of interest is to capture the evolution of geometric states, i.e., predicting future states from initial states. Formally, let S denote a system consisting of a set of objects located in the

three-dimensional Euclidean space. We use H Rn d to denote the objects with features, where n is the number of objects, and d is the feature dimension. For object i, let ri R3 denote its Cartesian coordinate. We define the system as S = (H, R), where R = {r1, ..., rn}. This data structure ubiquitously corresponds to various real-world systems such as molecules and proteins [17, 20, 101]. In practice, the geometric state is governed by physical laws and evolves over time, and we denote the geometric state at a given time t as Rt = {rt 1, ..., rt n}. Given a system St0 = (H, Rt0) at time t0, our goal is to predict St1 = (H, Rt1) at a future time t1. As an example, in a molecular system, Rt1 can be the equilibrium state of interest evolved from the initial state Rt0.

In this problem, inherent symmetries in geometric states should be considered. For example, a rotation that is applied to the coordinate system at time t0 should also be applied to subsequent time steps. These symmetries are related to the concept of equivariance in group theory [19, 18, 91]. Formally, let ϕ : X Y denote a function mapping between two spaces. Given a group G, let ρX and ρY denote its group representations, which describe how the group elements act on these spaces. A function ϕ : X Y is said to be equivariant if it satisfies the following condition: ρY(g)[ϕ(x)] = ϕ ρX (g)[x] , g G, x X. When ρY = IY (identity transformation), it is also known as invariance. SE(3) group, which pertains to translations (T(3)) and rotations (SO(3)) in 3D Euclidean space, is one of the most widely used groups and is employed in our framework.

2.2 Diffusion Models

Diffusion models [95, 37, 99] have emerged as the state-of-the-art generative modeling approaches across various domains [83, 85, 47, 115, 113, 117]. The main idea of this method is to construct a diffusion process that maps data to noise, and train models to reverse such process by using a tractable objective.

Formally, to model the data distribution qdata(X), where X Rd, we construct a diffusion process (Xt)t [0,T ], which is represented as a sequence of random variables indexed by time steps. We set X0 qdata(X) and XT pprior(X), where pprior(X) has a tractable form to generate samples efficiently, e.g. standard Gaussian distribution. Mathematically, we model (Xt)t [0,T ] as the solution to the following stochastic differential equation (SDE):

d Xt = f(Xt, t)dt + σ(t)d Bt, (1)

where f( , ) : Rd [0, T] Rd is a vector-valued function called the drift coefficient, σ( ) : [0, T] R is a scalar function known as the diffusion coefficient, and (Bt)t [0,T ] is the standard Wiener process (a.k.a., Brownian motion) [26]. We hereafter denote by pt(X) the marginal distribution of Xt. Let p(x , t |x, t) denote the transition density function such that P(Xt A|Xt = x) = R

A p(x , t |x, t)dx for any Borel set A. By simulating this diffusion process forward in time, the distribution of Xt will become pprior(X) at the final time T. In the literature, there exist various design choices of the SDE formulation in Eqn. (1) such that it transports the data distribution into the fixed prior distribution [98, 37, 99, 72, 97, 47].

In order to sample X0 p0(X) := qdata(X), an intriguing fact can be leveraged: the reverse of a diffusion process is also a diffusion process [2]. This reverse process runs backward in time and can be formulated by the following time-reversal SDE:

d Xt = f(Xt, t) σ2(t) Xt log pt(Xt) dt + σ(t)d Bt, (2)

where X log pt(X) denote the score of the marginal distribution at time t. If the score is known for all time, then we can derive the reverse diffusion process from Eqn. (2), sample from pprior(X), and simulate this process to generate samples from the data distribution qdata(X). In particular, the score X log pt(X) can be estimated by training a parameterized model sθ(X, t) with a denoising score matching objective [98, 97]. In theory, the minimizer of this objective approximates the ground-truth score [99] and this objective is tractable.

3 Geometric Diffusion Bridge

As discussed in the introduction, effectively capturing the evolution of geometric states is crucial, for which three desiderata should be carefully considered:

Coupling Preservation: From a probabilistic perspective, the evolution of geometric states transports their distribution from qdata(St0) to qdata(St1), and we are interested in modeling the distribution of target geometric states given the initial states, i.e., qdata(St1|St0) := qdata(Rt1|H, Rt0), which can be achieved by preserving the coupling of geometric states, i.e., qdata(Rt0, Rt1|H). For brevity, we hereafter omit the condition of H because it keeps the same along the evolution and can be easily incorporated into the models.

Symmetry Constraints: Since the law governing the evolution is unchanged regardless of how the system is rotated or translated, the distribution of the geometric states should satisfy symmetry constraints, i.e., qdata(ρR(g)[Rt1]|ρR(g)[Rt0]) = qdata(Rt1|Rt0) and qdata(ρR(g)[Rt0], ρR(g)[Rt1]) = qdata(Rt0, Rt1) for all g SE(3), Rt R.

Trajectory Guidance: Trajectories of geometric states are sometimes accessible and provide fine-grained descriptions of the evolution dynamics. For completeness, it is crucial to develop a unified framework that can characterize and leverage trajectory data as guidance for better bridging geometric states and capturing the evolution.

However, existing approaches typically have their limitations for this task, which we thoroughly discuss in Sec. 5 and summarize into Table 1. In this section, we introduce Geometric Diffusion Bridge (GDB), a general framework for bridging geometric states through generative modeling. We will elaborate on key techniques for completely preserving couping under symmetry constraints (Sec. 3.1), and demonstrate how our framework can be seamlessly extended to leverage trajectory data (Sec. 3.2). Theoretically, we conduct a thorough analysis on the capability of our unified framework, showing its completeness and superiority. All proofs of theorems are presented in Appendix B. A detailed guidance of practical implementing our framework is further provided (Sec. 3.3).

Table 1: Comparisons of different candidates for bridging geometric states

Methods Symmetry Constraints Coupling Preservation Trajectory guidance Direct Prediction [104, 31, 87, 89, 8] MLFFs [90, 33, 6, 34, 58] Geometric Diffusion Model [115, 38, 114] Geometric Diffusion Bridge (ours)

3.1 Equivariant Diffusion Bridge

Our key design lies in the construction of equivariant diffusion bridge, a tailored diffusion process (Rt)t [0,T ] for bridging initial states R0 qdata(Rt0) and target states RT qdata(Rt1|Rt0), completely preserving coupling of geometric states and satisfying symmetry constraints. Firstly, we investigate necessary conditions for a diffusion process on geometric states to meet the symmetric constraints:

Proposition 3.1. Let R denote the space of geometric states and f R( , ) : R [0, T] R denote the drift coefficient on R. Let (Wt)t [0,T ] denote the Wiener process on R. Given an SDE on geometric states d Rt = f R(Rt, t)dt + σ(t)d Wt, R0 q(R0), its transition density p R(z , t |z, t), z, z R is SE(3)-equivariant, i.e., p R(Rt , t |Rt, t) = p R(ρR(g)[Rt ], t |ρR(g)[Rt], t), g SE(3), 0 t, t T, if these conditions are satisfied: (1) q(R0) is SE(3)-invariant; (2) f R( , t) is SO(3)- equivariant and T(3)-invariant; (3) the transition density of (Wt)t [0,T ] is SE(3)-equivariant.

Using Proposition 3.1, we can obtain a diffusion process that respect symmetry constraints by properly considering conditions for key components. Next, we modify a useful tool in probability theory called Doob s h-transform [82, 81, 16], which plays an essential role in the construction of our equivariant diffusion bridge for preserving coupling of geometric states:

Proposition 3.2. Let p R(z , t |z, t) be the transition density of the SDE in Proposition 3.1. Let h R( , ) : R [0, T] R>0 be a smooth function satisfying: (1) h R( , t) is SE(3)-invariant; (2) h R(z, t) = R p R(z , t |z, t)h R(z , t )dz . Then we can derive the following h R-transformed SDE on geometric states:

d Rt = f R(Rt, t) + σ2(t) Rt log h R(Rt, t) dt + σ(t)d Wt, (3)

with SE(3)-equivariant transition density ph R(z , t |z, t) equals to p R(z , t |z, t) h R(z ,t )

Proposition 3.2 provides an equivariant version of Doob s h-transform, which can be used to guide a free SDE on geometric states to hit an event almost surely. For example, if we set h R( , t) = p R(z, T| , t), z R, i.e., the transition density of the original SDE evaluated at RT = z, then the h R-transformed SDE in Eqn. (3) arrives at the specific geometric state z almost surely at the final time (see Proposition B.7 in the appendix for more details). Therefore, if we derive a proper h R( , ) function under the symmetry constraints, our target process (Rt)t [0,T ] can be constructed:

Theorem 3.3 (Equivariant Diffusion Bridge). Let d Rt = f R(Rt, t)dt + σ(t)d Wt be an SDE on geometric states with transition density p R(z , t |z, t), z, z R satisfying the conditions in Proposition 3.1. Let h R(z, t; z0) = R p R(z , T|z, t) qdata(z |z0) p R(z ,T |z0,0)dz . By using Proposition 3.2, we can derive the following h R-transformed SDE:

d Rt = f R(Rt, t) + σ2(t)Eq R(RT ,T |Rt,t;R0,0)[ Rt log p R(RT , T|Rt, t)|R0, Rt] dt+σ(t)d Wt, (4) which corresponds to a process (Rt)t [0,T ], R0 qdata(Rt0) satisfying the following properties:

let q( , ) : R R R 0 denote the joint distribution induced by (Rt)t [0,T ], then q(R0, RT ) equals to qdata(Rt0, Rt1);

its transition density q R(Rt , t |Rt, t; R0, 0)=q R(ρR(g)[Rt ], t |ρR(g)[Rt], t; ρR(g)[R0], 0), 0 t,t T, g SE(3),R0 qdata(Rt0).

We call the tailored diffusion process (Rt)t [0,T ] an equivariant diffusion bridge.

According to Theorem 3.3, given an initial geometric state Rt0, we can predict target geometric states Rt1 by simulating the equivariant diffusion bridge (Rt)t [0,T ] from R0 = Rt0, which arrives at RT qdata(Rt1|Rt0). However, the score Eq R(RT ,T |Rt,t;R0,0)[ Rt log p R(RT , T|Rt, t)|R0, Rt] in Eqn. (4) is not tractable in general. Inspired by the score matching objective in diffusion models [99], we use a parameterized model vθ(Rt, t; R0) to estimate the score by using the following training objective: L(θ) = E(z0,z1) qdata(Rt0,Rt1),Rt q R(Rt,t|z1,T ;z0,0)λ(t) vθ(Rt, t; z0) Rt log p R(z1, T|Rt, t) 2, (5)

where t U(0, T) (the uniform distribution on [0, T]), and λ( ) : [0, T] R 0 is a positive weighting function. Theoretically, we prove that the minimizer of Eqn. (5) approximates the groundtruth score (see Appendix B.5 for more details). Moreover, this objective is tractable because the transition density p R and q R can be designed to have simple and explicit forms such as Gaussian, which we will elaborate on in Sec. 3.3.

3.2 Chain of Equivariant Diffusion Bridges for Leveraging Trajectory Guidance

In this subsection, we elaborate on how to leverage trajectories of geometric states as a fine-grained guidance in our framework. Let ( Ri)i [N] denote a trajectory of N + 1 geometric states and qtraj( R0, ..., RN) denote the joint probability density function of geometric states in a trajectory. In practice, the markov property of trajectories typically holds [109, 78]. Under this assumption, qtraj( R0, ..., RN) can be equivalently reformulated into q0 traj( R0) QN i=1 qi traj( Ri| Ri 1) by the chain rule of probability. If qi traj( Ri| Ri 1) can be well modeled, we can capture the distribution of trajectories of geometric states completely.

According to Theorem 3.3, given R0 q0 traj( R0), an equivariant diffusion bridge (Rt)t [0,T ] can be constructed to model the joint distribution qtraj( R0, R1) and hence q1 traj( R1| R0) is preserved. Therefore, if we construct a series of interconnected equivariant diffusion bridges, the distribution of trajectories can be modeled: Theorem 3.4 (Chain of Equivariant Diffusion Bridges). Let {(Rt i)t [0,T ]}i [N 1] denote a series of N equivaraint diffusion bridges defined in Theorem 3.3. For the i-th bridge (Rt i)t [0,T ], if we set

(1) hi R(z, t; z0) = R p R(z , T|z, t) qi+1 traj (z |z0) p R(z ,T |z0,0)dz ; (2) R0 0 q0 traj( R0), R0 i = RT i 1, 0 < i < N, then the joint distribution q R(R0 0, RT 0 , RT 1 , , RT N 1) induced by {(Rt i)t [0,T ]}i [N 1] equals to qtraj( R0, ..., RN). We call this process a chain of equivariant diffusion bridges.

In this way, a chain of equivariant diffusion bridge can be used to model prior trajectory data, and simulating this chain not only bridges initial and target geometric states but also yields intermediate evolving states. Similarly, we can also use a parameterized model to estimate the scores of bridges in this chain. Instead of having only one objective in all time steps, we now have N bridges in total, which categorize the time span into N groups with different time-dependent objectives. Therefore, by properly specifying time steps and initial conditions, the objective in Eqn. (5) can be seamlessly extended (see Appendix B.7 for more details on its provable guarantee):

L (θ) = E(z0,...,z N) qtraj( R0,..., RN),t,Rt i λ(t) vθ(Rt i , t; zi) Rt i log pi R(zi+1, T|Rt i , t ) 2, (6)

where t U(0, N T), i = t

T , t = t i T, Rt i qi R(Rt i , t |zi+1, T; zi, 0).

Lastly, we provide the following theoretical result, which further characterizes our framework s expressiveness to completely model the underlying dynamics that induce the trajectory distributions:

Theorem 3.5. Assume ( Ri)i [N] is sampled by simulating a prior SDE on geometric states d Rt = H R( Rt)dt + σd Wt. Let µ i denote the path measure of this prior SDE when t [i T, (i + 1)T]. Building upon ( Ri)i [N], let {µi R}i [N 1] denote the path measure of our chain of equivariant diffusion bridges. Under mild assumptions, we have lim N max i KL(µ i ||µi R) = 0.

It is noteworthy that the assumption of the prior SDE existence holds in various real-world applications. For example, in geometry optimization, we can formulate the iterative updating process of a molecular system as d Rt = α Rt V (Rt)dt + βd Wt, where V (Rt) denotes the potential energy at Rt and α, β are step sizes [88]. From Theorem 3.5, such prior SDE serves as the underlying law governing the evolution dynamic, and our chain of equivariant diffusion bridges constructed from empirical trajectory data can well approximate it, showing the completeness of our framework.

3.3 Practical Implementation

In this subsection, we elaborate on how to practically implement our framework. According to Eqn. (5), it is necessary to carefully design (1) tractable distribution q R(Rt, t|z1, T; z0, 0) for sampling Rt; (2) closed-form matching objective Rt log p R(z1, T|Rt, t).

Matching objective. Inspired by diffusion models that use Gaussian transition kernels for tractable computation, we design the SDE on geometric states in Proposition 3.1 to be:

d Rt = σd Wt, with transition density p R(z , t |z, t) = N(z0, σ2(t t)I) (7)

The explicit form of the objective can be directly calculated, i.e., Rt log p R(z1, T|Rt, t) = z1 Rt

Sampling distribution. According to Theorem 3.3, the transition density q R(Rt, t|z1, T; z0, 0) can be calculated by using the Doob s h-transform in Proposition 3.2, i.e., q R(Rt, t|z1, T; z0, 0) = p R(Rt, t|z1, T) h R(Rt,t;z0)

h R(z1,T ;z0). Moreover, h R is determined by qdata and p R, which is already specified

in Eqn. (7). Therefore, we can also calculate q R(Rt, t|z1, T; z0, 0) = N( t

T z0, σ2 t(T t)

Symmetry constraints. In proposition 3.1, we have several conditions that should be satisfied to meet the symmetry constraints. Firstly, since a parameterized model vθ(Rt, t; R0) is used to estimate the score of our equivariant diffusion bridge, it should be SO(3)-equivariant and T(3)-invariant. Besides, we follow [50, 115] to consider Co M-free systems: given R = {r1, ..., rn}, we define r = 1

n Pn i=1 ri and the Co M-free version of R = {r1 r, ..., rn r}. To sample from N(z0, σ2I) with z0 R consisting of n objects, we (1) sample ϵ = {ϵi}n i=1 by i.i.d. drawing ϵi N(0, I3); (2) calculate the Co M-free ϵ of ϵ; (3) obtain z0 + σϵ .

Trajectory guidance. According to Eqn. (6), both pi R and qi R for all i [N 1] should be determined. Similarly, we set pi R(zi+1, T|Rt , t )=N(Rt , σ2 i (T t )I), which further induces qi R(Rt , t |zi+1, T; zi, 0) = N( t

T zi+1 + T t

T zi, σ2 i t (T t )

Combining all the above design choices, we have the following algorithms for training our Geometric Diffusion Bridge (Alg. 3) and leveraging trajectory guidance if available (Alg. 4). After the model

is well trained, we leverage ODE numerical solvers [12] to simulate the bridge process by using its equivalent probability flow ODE [99]. In this way, we can effectively and deterministically predict future geometric states of interest from initial states in an efficient iterative process. Lastly, it is also noteworthy that our framework is general to be implemented by using other advanced design strategies [99, 47, 48], which we leave as future work.

Algorithm 1 Training

1: repeat 2: (z0, z1) qdata(Rt0, Rt1) 3: t U[0, T] 4: ϵ N(0, I)

5: Rt = t T z1 + T t

T σϵ 6: Take gradient descent step on

θλ(t) z1 Rt

σ2(T t) vθ(Rt, t; z0) 2

7: until converged

Algorithm 2 Training with trajectory guidance

1: repeat 2: (z0, . . . , z N) qtraj( R0, . . . , RN) 3: t U (0, N T), i = t

T , t = t i T 4: ϵ N(0, I)

5: Rt i = t

T zi+1 + T t

T σiϵ 6: Take gradient descent step on

θλ(t) zi+1 Rt i σ2 i (T t ) vθ(Rt i , t; zi)

7: until converged

4 Experiments

In this section, we empirically study the effectiveness of our Geometric Diffusion Bridge on crucial real-world challenges requiring bridging geometric states. In particular, we carefully design several experiments covering different types of data, scales and scenarios, as shown in Table 2. Due to space limits, we present more details in Appendix D.

Table 2: Summary of experimental setup.

Dataset Task Description Data Type Trajectory data Training set size QM9 [79] Equilibrium State Prediction Simple molecule 110,000 Molecule3D [116] Equilibrium State Prediction Simple molecule 2,339,788 OC22, IS2RS [13] Structure Relaxation Adsorbate-Catalyst complex 45,890

4.1 Equilibrium State Prediction

Task. Equilibrium states typically represent local minima on the Born-Oppenheimer potential energy surface of a molecular system [54], which correspond to its most stable geometric state and play an essential role in determining its properties in various aspects [4, 21]. In this task, our goal is to accurately predict the equilibrium state from the initial geometric state of a molecular system.

Dataset. Two popular datasets are used: (1) QM9 [79] is a medium-scale dataset that has been widely used for molecular modeling, consisting of 130,000 organic molecules. In convention, 110k, 10k, and 11k molecules are used for train/valid/test sets respectively; (2) Molecule3D [116] is a largescale dataset curated from the Pub Chem QC project [67, 71], consisting of 3,899,647 molecules in total and its train/valid/test splitting ratio is 6 : 2 : 2. In particular, both random and scaffold splitting methods are adopted to thoroughly evaluate the in-distribution and out-of-distribution performance. For each molecule, an initial geometric state is generated by using fast and coarse force field [73, 52] and geometry optimization is conducted to obtain DFT-calculated equilibrium geometric structure.

Setting. In this task, we parameterize vθ(Rt, t; R0) by extending a Graph-Transformer based equivariant network [92, 63] to encode both time steps and initial geometric states as conditions. For inference, we use 10 time steps with the Euler solver [12]. Following [111], we choose several strong baselines for a comprehensive comparison, and use three metrics for measuring the error between predicted target states and ground-truth states: C-RMSD, D-MAE and D-RMSE. The detailed descriptions of the baselines, evaluation metrics and training settings are presented in Appendix D.1.

Results. Results on QM9 and Molecule3D are shown in Table 3 and 4 respectively. It can be easily seen that our GDB framework consistently surpasses all baselines by a significantly large margin on

Table 3: Results on the QM9 dataset (Å). We report the official results of baselines from [111]

Validation Test D-MAE D-RMSE C-RMSD D-MAE D-RMSE C-RMSD RDKit DG 0.358 0.616 0.722 0.358 0.615 0.722 RDKit ETKDG 0.355 0.621 0.691 0.355 0.621 0.689 GINE [39] 0.357 0.673 0.685 0.357 0.669 0.693 GATv2 [10] 0.339 0.663 0.661 0.339 0.659 0.666 GPS [80] 0.326 0.644 0.662 0.326 0.640 0.666 GTMGC [111] 0.262 0.468 0.362 0.264 0.470 0.367 GDB (ours) 0.092 0.218 0.143 0.096 0.223 0.148

Table 4: Results on the Molecule3D dataset (Å). We report the official results of baselines from [111]

Validation Test D-MAE D-RMSE C-RMSD D-MAE D-RMSE C-RMSD (a) Random Split RDKit DG 0.581 0.930 1.054 0.582 0.932 1.055 RDKit ETKDG 0.575 0.941 0.998 0.576 0.942 0.999 Deeper GCN-DAGNN [116] 0.509 0.849 * 0.571 0.961 * GINE [39] 0.590 1.014 1.116 0.592 1.018 1.116 GATv2 [10] 0.563 0.983 1.082 0.564 0.986 1.083 GPS [80] 0.528 0.909 1.036 0.529 0.911 1.038 GTMGC [111] 0.432 0.719 0.712 0.433 0.721 0.713 GDB (ours) 0.374 0.631 0.622 0.376 0.626 0.619 (b) Scaffold Split RDKit DG 0.542 0.872 1.001 0.524 0.857 0.973 RDKit ETKDG 0.531 0.874 0.928 0.511 0.859 0.898 Deeper GCN-DAGNN [116] 0.617 0.930 * 0.763 1.176 * GINE [39] 0.883 1.517 1.407 1.400 2.224 1.960 GATv2 [10] 0.778 1.385 1.254 1.238 2.069 1.752 GPS [80] 0.538 0.885 1.031 0.657 1.091 1.136 GTMGC [111] 0.406 0.675 0.678 0.400 0.679 0.693 GDB (ours) 0.335 0.587 0.592 0.341 0.608 0.603

QM9, e.g., 60.5%/59.7% relative C-RMSD reduction on valid/test sets respectively, establishing a new state-of-the-art performance. Similar trends also can be observed in Molecule3D, i.e., 12.6%/13.2% relative C-RMSD reduction for valid/test sets of the random split and 12.7%/13.0% reduction for the scaffold split, largely outperforming the best baseline. These significant error reduction results show the superiority of our GDB framework for bridging geometric states, and its generality on both medium and large-scale challenges. Moreover, our framework performs consistently across valid and tests of both random and scaffold splits, further verifying its robustness in challenging scenarios.

4.2 Structure Relaxation

Task. Catalyst discovery is crucial for various applications. Adsorbate candidates are placed on catalyst surfaces and evolve through structure relaxation to adsorption states, in which the adsorption structures can be determined for measuring catalyst activity and selectivity. Our goal is thus to accurately predict adsorption states from initial states of adsorbate-catalyst complexes.

Dataset. We adopt Open Catalyst 2022 (OC22) dataset [105], which has great significance for the development of Oxygen Evolution Reaction (OER) catalysts. Each data is in the form of the adsorbatecatalyst complex. Both initial and adsorption states with trajectories connecting them are provided. The training set consists of 45,890 catalyst-adsorbate complexes. To better evaluate the model s performance, the validation and test sets consider the in-distribution (ID) and out-of-distribution (OOD) settings which use unseen catalysts, containing approximately 2,624 and 2,780 complexes respectively.

Setting. Following [105], we use the Average Distance within Threshold (ADw T) as the evaluation metric, which reflects the percentage of structures with an atom position MAE below thresholds. We parameterize vθ(Rt, t; R0) by using Gem Net-OC [34], which also serves as a verification that

Table 5: Results on the OC22 IS2RS Validation set. "OC20+OC22" denotes using both OC20 [13] and OC22 data; "OC20 OC22" means pre-training on OC20 data then fine-tuning on OC22 data; "OC22-only" means only using OC22 data. We report the official results of baselines from [105]

Model ADw T [%] (ID) ADw T [%] (OOD) Avg [%] OC20+OC22 Spin Conv [94] 55.79 47.31 51.55 Gem Net-OC [34] 60.99 53.85 57.42 OC20 OC22 Spin Conv [94] 56.69 45.78 51.23 Gem Net-OC [34] 58.03 48.33 53.18 Gem Net-OC-Large [34] 59.69 51.66 55.67 OC22-only IS baseline 44.77 42.59 43.68 Spin Conv [94] 54.53 40.45 47.49 Gem Net-d T [32] 59.68 51.25 55.46 Gem Net-OC [34] 60.69 52.90 56.79 GDB (ours) 63.01 55.78 59.39 trajectory guidance 62.14 54.94 58.54 R0 condition 60.17 49.26 54.71

our framework is compatible with different backbone models. For inference, we also use 10 time steps with the Euler solver. Following [105], we choose strong MLFF baselines trained on force field data for a challenging comparison. The detailed descriptions of baselines and settings are presented in Appendix D.2.

Results. In Table 5, our GDB significantly outperforms the best baseline, e.g., 3.3%/3.6%/3.4% relative improvement on the ADw T metric of ID, OOD and Avg respectively. It is noteworthy that the best baseline is the Gem Net-OC force field trained on both OC20 and OC22 data, which is 10 times more than OC22 data only. Nevertheless, our framework still achieves better performance on predicting the adsorption geometric states. Moreover, our framework without using any trajectory data still can achieve better performance compared to the best baseline, e.g., 58.54 v.s. 57.42 Avg[%]. All the results on this challenging task further demonstrate the superiority and completeness of our framework.

Ablation study. Furthermore, we conduct ablation studies to examine key designs of our framework in Table 5. Firstly, we can see that using trajectory guidance indeed improves the performance of our framework, e.g., 1.4% relative improvement on Avg ADw T. Moreover, we also investigate the impact of R0 condition in vθ(Rt, t; R0), which plays an essential role in preserving the joint distribution of geometric states. Without this condition, we can see a significant drop, e.g., 6.5%/10.3% relative ADw T drop on Avg/OOD respectively. Overall, these ablation studies serve as strong supports on the necessity of developing a unified framework that can precisely bridge geometric states by preserving their joint distributions and effectively leverage trajectory data as guidance for enhanced performance.

5 Related Works

Direct Prediction. One line of approach for bridging geometric states is direct prediction, i.e., training a model to directly predict target geometric states given initial states as input. Models that carefully respect symmetry constraints such as the equivariance to 3D rotations and translations are typically used, which are called Geometric Equivariant Networks [11, 36, 120, 27]. Different techniques have been explored to encode such priors, which mainly include vector operations such as scalar and vector product [35, 87, 89, 41, 103, 14], e.g., the scalar-vector product used in EGNN [87], and tensor product based operations [104, 31, 8, 57, 64]. Despite its simplicity and efficiency, direct prediction requires encoding the iterative evolution of geometric states into a single-step prediction model, which lacks the ability to capture the underlying dynamics and cannot leverage trajectories of geometric states.

Machine Learning Force Field. Another line of approach is called machine learning force field (MLFF) [106, 5, 6, 70, 75, 58], which are trained to predict intermediate labels, such as the potential

energy or force of the (local) current geometric state instead. After training, MLFFs can be used to simulate the trajectory of geometric states over time based on underlying equations. Using Geometric Equivariant Networks as the backbone, MLFFs typically satisfy the symmetry constraints. Besides, trajectory data with additional energy or force labels can directly be used for training MLFFs. However, this paradigm highly depends on the existence and quality of intermediate labels since small local errors in energy or force prediction can accumulate along the simulation process [7, 106, 30]. Moreover, there exists no guarantee that MLFFs can completely model joint state distributions, which is another limitation for bridging geometric states.

Geometric Diffusion Models. In recent years, diffusion models [37, 99] have emerged with stateof-the-art generative modeling performance across various domains [85, 108, 51, 56]. In geometric domain, diffusion models are typically used for molecule conformation generation [115, 114, 38] and protein design [108, 117]. By properly design the noising process and model architectures, symmetry constraints on the transition kernel and prior distribution can be satisfied, which guarantees the generated data is sampled from roto-translational invariant distributions [115, 38]. In addition to the score-based formulation, recent advances further extend new techniques such as flow matching [59, 61, 1] to satisfy symmetry constraints for these generation tasks [49, 100]. Nevertheless, there exists no guaratee that these approaches can model the joint distribution of geometric states [61, 96]. And how to leverage trajectory data as guidance for bridging geometric states is also challenging.

Other techniques. More Red [45] trains a diffusion model on equilibrium molecule conformations with a time step predictor, and directly use it for bridging any conformations to their equilibrium states. GTMGC [111] instead develop a Graph Transformer to directly predict equilibrium conformations from their 2D graph forms. Both of them are limited to the equilibrium conformation prediction task, cannot preserve the joint state distribution and leverage trajectory data. EGNO [112] is a concurrent work that develops a neural operator based approach to model dynamics of trajectories. By carefully designing temporal convolution in fourier spaces, EGNO can learn from trajectory data. However, this tailored approach cannot be directly used without trajectory guidance. To preserve joint data distributions, [22, 121] coincide with us to leverage Doob s h-transform to repurposing standard diffusion processes, but they do not respect symmetry constraints and cannot leverage trajectories. There also exist recent works that study the diffusion bridge framework [76, 93] and apply it to various domains such as images and graphs [110, 62, 42]. Compared to all above approaches, our GDB framework stands out as a unique and ideal solution that can precisely bridge geometric states and effectively leverage trajectory data (if available) in a unified manner.

6 Conclusion

In this work, we introduce Geometric Diffusion Bridge (GDB), a general framework for bridging geometric states through generative modeling. We leverage a modified version of Doob s h-transform to constructe an equivariant diffusion bridge for bridging initial and target geometric states. Trajectory data can further be seamlessly leveraged as guidance by using a chain of equivariant diffusion bridges, allowing complete modeling of trajectory data. Mathematically, we conduct a comprehensive theoretical analysis showing our framework s ability to preserve joint distributions of geometric states and capability to completely model the evolution dynamics. Empirical comparisons on different settings show that our GDB significantly surpasses existing state-of-the-art approaches and ablation studies further underscore the necessity of several key designs in our framework. In the future, it is worth exploring better implementation strategies of our framework for enhanced performance, and applying our GDB to other critical challenges involving bringing geometric states.

Broader Impacts and Limitations

This work newly proposes a general framework to bridge geometric states, which has great significance in various scientific domains. Our experimental results have also demonstrated considerable positive potential for various applications, such as catalyst discovery and molecule optimization, which can significantly contribute to the advancement of renewable energy processes and chemistry discovery. However, it is essential to acknowledge the potential negative impacts including the development of toxic drugs and materials. Thus, stringent measures should be implemented to mitigate these risks.

There also exist some limitations to our work. For the sake of generality, we do not experiment with advanced implementation strategies of training objectives and sampling algorithms, which leave room for further improvement. Besides, the employment of Transformer-based architectures may also limit the efficiency of our framework. This has also become a common issue in transformer-based diffusion models, which we have earmarked for future research.

Acknowledgements

We thank all the anonymous reviewers for the very careful and detailed reviews as well as the valuable suggestions. Their help has further enhanced our work. Liwei Wang is supported by National Science and Technology Major Project (2022ZD0114902) and National Science Foundation of China (NSFC62276005). Di He is supported by National Science Foundation of China (NSFC62376007).

[1] Michael S Albergo, Nicholas M Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions. ar Xiv preprint ar Xiv:2303.08797, 2023.

[2] Brian DO Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313 326, 1982.

[3] Muratahan Aykol, Joseph H Montoya, and Jens Hummelshøj. Rational solid-state synthesis routes for inorganic materials. Journal of the American Chemical Society, 143(24):9244 9259, 2021.

[4] Keld L Bak, Jürgen Gauss, Poul Jørgensen, Jeppe Olsen, Trygve Helgaker, and John F Stanton. The accurate determination of molecular equilibrium structures. The Journal of Chemical Physics, 114(15):6548 6556, 2001.

[5] Ilyes Batatia, David P Kovacs, Gregor Simm, Christoph Ortner, and Gábor Csányi. Mace: Higher order equivariant message passing neural networks for fast and accurate force fields. Advances in Neural Information Processing Systems, 35:11423 11436, 2022.

[6] Simon Batzner, Albert Musaelian, Lixin Sun, Mario Geiger, Jonathan P Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E Smidt, and Boris Kozinsky. E (3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature communications, 13(1):2453, 2022.

[7] Jorg Behler. Perspective: Machine learning potentials for atomistic simulations. The Journal of chemical physics, 145(17), 2016.

[8] Johannes Brandstetter, Rob Hesselink, Elise van der Pol, Erik J Bekkers, and Max Welling. Geometric and physical quantities improve e(3) equivariant message passing. In International Conference on Learning Representations, 2022.

[9] Linda J Broadbelt, Scott M Stark, and Michael T Klein. Computer generated pyrolysis modeling: on-the-fly generation of species, reactions, and rates. Industrial & Engineering Chemistry Research, 33(4):790 799, 1994.

[10] Shaked Brody, Uri Alon, and Eran Yahav. How attentive are graph attention networks? ar Xiv preprint ar Xiv:2105.14491, 2021.

[11] Michael M Bronstein, Joan Bruna, Taco Cohen, and Petar Veliˇckovi c. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. ar Xiv preprint ar Xiv:2104.13478, 2021.

[12] John Charles Butcher. Numerical methods for ordinary differential equations. John Wiley & Sons, 2016.

[13] Lowik Chanussot, Abhishek Das, Siddharth Goyal, Thibaut Lavril, Muhammed Shuaibi, Morgane Riviere, Kevin Tran, Javier Heras-Domingo, Caleb Ho, Weihua Hu, et al. Open catalyst 2020 (oc20) dataset and community challenges. Acs Catalysis, 11(10):6059 6072, 2021.

[14] Tianlang Chen, Shengjie Luo, Di He, Shuxin Zheng, Tie-Yan Liu, and Liwei Wang. Geo MFormer: A general architecture for geometric molecular representation learning. In Forty-first International Conference on Machine Learning, 2024.

[15] Stefan Chmiela, Alexandre Tkatchenko, Huziel E Sauceda, Igor Poltavsky, Kristof T Schütt, and Klaus-Robert Müller. Machine learning of accurate energy-conserving molecular force fields. Science advances, 3(5):e1603015, 2017.

[16] Kai Lai Chung and John B Walsh. Markov processes, Brownian motion, and time symmetry, volume 249. Springer Science & Business Media, 2006.

[17] Jonathan Clayden, Nick Greeves, and Stuart Warren. Organic chemistry. Oxford University Press, USA, 2012.

[18] John F Cornwell. Group theory in physics: An introduction. Academic press, 1997.

[19] F Albert Cotton. Chemical applications of group theory. John Wiley & Sons, 1991.

[20] F Albert Cotton, Geoffrey Wilkinson, Carlos A Murillo, and Manfred Bochmann. Advanced inorganic chemistry. John Wiley & Sons, 1999.

[21] Attila G Császár, Gábor Czakó, Tibor Furtenbacher, Jonathan Tennyson, Viktor Szalay, Sergei V Shirin, Nikolai F Zobov, and Oleg L Polyansky. On equilibrium structures of the water molecule. The Journal of chemical physics, 122(21), 2005.

[22] Valentin De Bortoli, Guan-Horng Liu, Tianrong Chen, Evangelos A Theodorou, and Weilie Nie. Augmented bridge matching. ar Xiv preprint ar Xiv:2311.06978, 2023.

[23] Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de Las Casas, et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature, 602(7897):414 419, 2022.

[24] Amanda L Dewyer, Alonso J Argüelles, and Paul M Zimmerman. Methods for exploring reaction space in molecular systems. Wiley Interdisciplinary Reviews: Computational Molecular Science, 8(2):e1354, 2018.

[25] Jacob D Durrant and J Andrew Mc Cammon. Molecular dynamics simulations and drug discovery. BMC biology, 9:1 9, 2011.

[26] Rick Durrett. Probability: theory and examples, volume 49. Cambridge university press, 2019.

[27] Alexandre Duval, Simon V Mathis, Chaitanya K Joshi, Victor Schmidt, Santiago Miret, Fragkiskos D Malliaros, Taco Cohen, Pietro Liò, Yoshua Bengio, and Michael Bronstein. A hitchhiker s guide to geometric gnns for 3d atomic systems. ar Xiv preprint ar Xiv:2312.07511, 2023.

[28] Andreas Eberle. Stochastic analysis.

[29] Ferran Feixas, Steffen Lindert, William Sinko, and J Andrew Mc Cammon. Exploring the role of receptor flexibility in structure-based drug discovery. Biophysical chemistry, 186:31 45, 2014.

[30] Xiang Fu, Zhenghao Wu, Wujie Wang, Tian Xie, Sinan Keten, Rafael Gomez-Bombarelli, and Tommi S. Jaakkola. Forces are not enough: Benchmark and critical evaluation for machine learning force fields with molecular simulations. Transactions on Machine Learning Research, 2023. Survey Certification.

[31] Fabian Fuchs, Daniel Worrall, Volker Fischer, and Max Welling. Se (3)-transformers: 3d roto-translation equivariant attention networks. Advances in neural information processing systems, 33:1970 1981, 2020.

[32] Johannes Gasteiger, Florian Becker, and Stephan Günnemann. Gemnet: Universal directional graph neural networks for molecules. Advances in Neural Information Processing Systems, 34:6790 6802, 2021.

[33] Johannes Gasteiger, Janek Groß, and Stephan Günnemann. Directional message passing for molecular graphs. ar Xiv preprint ar Xiv:2003.03123, 2020.

[34] Johannes Gasteiger, Muhammed Shuaibi, Anuroop Sriram, Stephan Günnemann, Zachary Ward Ulissi, C. Lawrence Zitnick, and Abhishek Das. Gemnet-OC: Developing graph neural networks for large and diverse molecular simulation datasets. Transactions on Machine Learning Research, 2022.

[35] Mojtaba Haghighatlari, Jie Li, Xingyi Guan, Oufan Zhang, Akshaya Das, Christopher J Stein, Farnaz Heidar-Zadeh, Meili Liu, Martin Head-Gordon, Luke Bertels, et al. Newtonnet: A newtonian message passing network for deep learning of interatomic potentials and forces. Digital Discovery, 1(3):333 343, 2022.

[36] Jiaqi Han, Yu Rong, Tingyang Xu, and Wenbing Huang. Geometrically equivariant graph neural networks: A survey. ar Xiv preprint ar Xiv:2202.07230, 2022.

[37] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in neural information processing systems, 33:6840 6851, 2020.

[38] Emiel Hoogeboom, Vıctor Garcia Satorras, Clément Vignac, and Max Welling. Equivariant diffusion for molecule generation in 3d. In International conference on machine learning, pages 8867 8887. PMLR, 2022.

[39] Weihua Hu, Bowen Liu, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, and Jure Leskovec. Strategies for pre-training graph neural networks. ar Xiv preprint ar Xiv:1905.12265, 2019.

[40] Frank Jensen. Introduction to computational chemistry. John wiley & sons, 2017.

[41] Bowen Jing, Stephan Eismann, Patricia Suriana, Raphael John Lamarre Townshend, and Ron Dror. Learning from protein structure with geometric vector perceptrons. In International Conference on Learning Representations, 2021.

[42] Jaehyeong Jo, Dongki Kim, and Sung Ju Hwang. Graph generation with destination-predicting diffusion mixture, 2024.

[43] John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. Highly accurate protein structure prediction with alphafold. Nature, 596(7873):583 589, 2021.

[44] Wolfgang Kabsch. A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A: Crystal Physics, Diffraction, Theoretical and General Crystallography, 34(5):827 828, 1978.

[45] Khaled Kahouli, Stefaan Simon Pierre Hessmann, Klaus-Robert Müller, Shinichi Nakajima, Stefan Gugler, and Niklas Wolf Andreas Gebauer. Molecular relaxation by reverse diffusion with time step prediction. ar Xiv preprint ar Xiv:2404.10935, 2024.

[46] Martin Karplus and J Andrew Mc Cammon. Molecular dynamics simulations of biomolecules. Nature structural biology, 9(9):646 652, 2002.

[47] Tero Karras, Miika Aittala, Timo Aila, and Samuli Laine. Elucidating the design space of diffusion-based generative models. Advances in Neural Information Processing Systems, 35:26565 26577, 2022.

[48] Diederik Kingma and Ruiqi Gao. Understanding diffusion objectives as the elbo with simple data augmentation. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information Processing Systems, volume 36, pages 65484 65516. Curran Associates, Inc., 2023.

[49] Leon Klein, Andreas Krämer, and Frank Noé. Equivariant flow matching. Advances in Neural Information Processing Systems, 36, 2024.

[50] Jonas Köhler, Leon Klein, and Frank Noé. Equivariant flows: exact likelihood generative learning for symmetric densities. In International conference on machine learning, pages 5361 5370. PMLR, 2020.

[51] Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. Diffwave: A versatile diffusion model for audio synthesis. In International Conference on Learning Representations, 2021.

[52] Greg Landrum. Rdkit: Open-source cheminformatics software. Github, 2016.

[53] Christian Léonard. Girsanov theory under a finite entropy condition. In Séminaire de Probabilités XLIV, pages 429 465. Springer, 2012.

[54] Ira N Levine, Daryle H Busch, and Harrison Shull. Quantum chemistry, volume 6. Pearson Prentice Hall Upper Saddle River, NJ, 2009.

[55] Raphael D Levine. Molecular reaction dynamics. Cambridge University Press, 2009.

[56] Xiang Li, John Thickstun, Ishaan Gulrajani, Percy S Liang, and Tatsunori B Hashimoto. Diffusion-lm improves controllable text generation. Advances in Neural Information Processing Systems, 35:4328 4343, 2022.

[57] Yi-Lun Liao and Tess Smidt. Equiformer: Equivariant graph attention transformer for 3d atomistic graphs. ar Xiv preprint ar Xiv:2206.11990, 2022.

[58] Yi-Lun Liao, Brandon M Wood, Abhishek Das, and Tess Smidt. Equiformerv2: Improved equivariant transformer for scaling to higher-degree representations. In The Twelfth International Conference on Learning Representations, 2024.

[59] Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. In The Eleventh International Conference on Learning Representations, 2023.

[60] Meng Liu, Cong Fu, Xuan Zhang, Limei Wang, Yaochen Xie, Hao Yuan, Youzhi Luo, Zhao Xu, Shenglong Xu, and Shuiwang Ji. Fast quantum property prediction via deeper 2d and 3d graph networks. ar Xiv preprint ar Xiv:2106.08551, 2021.

[61] Xingchao Liu, Chengyue Gong, and qiang liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. In The Eleventh International Conference on Learning Representations, 2023.

[62] Xingchao Liu, Lemeng Wu, Mao Ye, and qiang liu. Learning diffusion bridges on constrained domains. In The Eleventh International Conference on Learning Representations, 2023.

[63] Shuqi Lu, Zhifeng Gao, Di He, Linfeng Zhang, and Guolin Ke. Highly accurate quantum chemical property prediction with uni-mol+. ar Xiv preprint ar Xiv:2303.16982, 2023.

[64] Shengjie Luo, Tianlang Chen, and Aditi S. Krishnapriyan. Enabling efficient equivariant operations in the fourier basis via gaunt tensor products. In The Twelfth International Conference on Learning Representations, 2024.

[65] Shengjie Luo, Tianlang Chen, Yixian Xu, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, and Di He. One transformer can understand both 2d & 3d molecular data. In The Eleventh International Conference on Learning Representations, 2023.

[66] Shengjie Luo, Shanda Li, Shuxin Zheng, Tie-Yan Liu, Liwei Wang, and Di He. Your transformer may not be as powerful as you expect. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022.

[67] Nakata Maho. The pubchemqc project: A large chemical database from the first principle calculations. In AIP conference proceedings, volume 1702, page 090058. AIP Publishing LLC, 2015.

[68] Richard M Martin. Electronic structure: basic theory and practical methods. Cambridge university press, 2020.

[69] Amil Merchant, Simon Batzner, Samuel S Schoenholz, Muratahan Aykol, Gowoon Cheon, and Ekin Dogus Cubuk. Scaling deep learning for materials discovery. Nature, 624(7990):80 85, 2023.

[70] Albert Musaelian, Simon Batzner, Anders Johansson, Lixin Sun, Cameron J Owen, Mordechai Kornbluth, and Boris Kozinsky. Learning local equivariant representations for large-scale atomistic dynamics. Nature Communications, 14(1):579, 2023.

[71] Maho Nakata and Tomomi Shimazaki. Pubchemqc project: a large-scale first-principles electronic structure database for data-driven chemistry. Journal of chemical information and modeling, 57(6):1300 1308, 2017.

[72] Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International conference on machine learning, pages 8162 8171. PMLR, 2021.

[73] Noel M O Boyle, Michael Banck, Craig A James, Chris Morley, Tim Vandermeersch, and Geoffrey R Hutchison. Open babel: An open chemical toolbox. Journal of cheminformatics, 3:1 14, 2011.

[74] Bernt Øksendal and Bernt Øksendal. Stochastic differential equations. Springer, 2003.

[75] Saro Passaro and C Lawrence Zitnick. Reducing so (3) convolutions to so (2) for efficient equivariant gnns. In International Conference on Machine Learning, pages 27420 27438. PMLR, 2023.

[76] Stefano Peluchetti. Diffusion bridge mixture transports, schrödinger bridge problems and generative modeling. Journal of Machine Learning Research, 24(374):1 51, 2023.

[77] Stefano Peluchetti. Non-denoising forward-time diffusions. ar Xiv preprint ar Xiv:2312.14589, 2023.

[78] Jan-Hendrik Prinz, Hao Wu, Marco Sarich, Bettina Keller, Martin Senne, Martin Held, John D Chodera, Christof Schütte, and Frank Noé. Markov models of molecular kinetics: Generation and validation. The Journal of chemical physics, 134(17), 2011.

[79] Raghunathan Ramakrishnan, Pavlo O Dral, Matthias Rupp, and O Anatole Von Lilienfeld. Quantum chemistry structures and properties of 134 kilo molecules. Scientific data, 1(1):1 7, 2014.

[80] Ladislav Rampášek, Michael Galkin, Vijay Prakash Dwivedi, Anh Tuan Luu, Guy Wolf, and Dominique Beaini. Recipe for a general, powerful, scalable graph transformer. Advances in Neural Information Processing Systems, 35:14501 14515, 2022.

[81] Dennis C Rapaport. The art of molecular dynamics simulation. Cambridge university press, 2004.

[82] L Chris G Rogers and David Williams. Diffusions, Markov processes and martingales: Volume 2, Itô calculus, volume 2. Cambridge university press, 2000.

[83] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bjorn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684 10695, June 2022.

[84] David Rosenberger, Justin S Smith, and Angel E Garcia. Modeling of peptides with classical and novel machine learning force fields: A comparison. The Journal of Physical Chemistry B, 125(14):3598 3612, 2021.

[85] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. Advances in neural information processing systems, 35:36479 36494, 2022.

[86] Simo Särkkä and Arno Solin. Applied stochastic differential equations, volume 10. Cambridge University Press, 2019.

[87] Vıctor Garcia Satorras, Emiel Hoogeboom, and Max Welling. E (n) equivariant graph neural networks. In International conference on machine learning, pages 9323 9332. PMLR, 2021.

[88] H Bernhard Schlegel. Geometry optimization. Wiley Interdisciplinary Reviews: Computational Molecular Science, 1(5):790 809, 2011.

[89] Kristof Schütt, Oliver Unke, and Michael Gastegger. Equivariant message passing for the prediction of tensorial properties and molecular spectra. In International Conference on Machine Learning, pages 9377 9388. PMLR, 2021.

[90] Kristof T Schütt, Huziel E Sauceda, P-J Kindermans, Alexandre Tkatchenko, and K-R Müller. Schnet a deep learning architecture for molecules and materials. The Journal of Chemical Physics, 148(24), 2018.

[91] William Raymond Scott. Group theory. Courier Corporation, 2012.

[92] Yu Shi, Shuxin Zheng, Guolin Ke, Yifei Shen, Jiacheng You, Jiyan He, Shengjie Luo, Chang Liu, Di He, and Tie-Yan Liu. Benchmarking graphormer on large-scale molecular modeling datasets. ar Xiv preprint ar Xiv:2203.04810, 2022.

[93] Yuyang Shi, Valentin De Bortoli, Andrew Campbell, and Arnaud Doucet. Diffusion schrödinger bridge matching. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.

[94] Muhammed Shuaibi, Adeesh Kolluru, Abhishek Das, Aditya Grover, Anuroop Sriram, Zachary Ulissi, and C Lawrence Zitnick. Rotation invariant graph neural networks using spin convolutions. ar Xiv preprint ar Xiv:2106.09575, 2021.

[95] Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In Francis Bach and David Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 2256 2265, Lille, France, 07 09 Jul 2015. PMLR.

[96] Vignesh Ram Somnath, Matteo Pariset, Ya-Ping Hsieh, Maria Rodriguez Martinez, Andreas Krause, and Charlotte Bunne. Aligned diffusion schrödinger bridges. In Uncertainty in Artificial Intelligence, pages 1985 1995. PMLR, 2023.

[97] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021.

[98] Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems, 32, 2019.

[99] Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021.

[100] Yuxuan Song, Jingjing Gong, Minkai Xu, Ziyao Cao, Yanyan Lan, Stefano Ermon, Hao Zhou, and Wei-Ying Ma. Equivariant flow matching with hybrid probability transport for 3d molecule generation. Advances in Neural Information Processing Systems, 36, 2024.

[101] Howard Stephen Stoker and G Lynn Carlson. General, organic, and biological chemistry. Houghton Mifflin, 2004.

[102] Challapalli Suryanarayana. Experimental techniques in materials and mechanics. Crc Press, 2011.

[103] Philipp Thölke and Gianni De Fabritiis. Equivariant transformers for neural network based molecular potentials. In International Conference on Learning Representations, 2022.

[104] Nathaniel Thomas, Tess Smidt, Steven Kearnes, Lusann Yang, Li Li, Kai Kohlhoff, and Patrick Riley. Tensor field networks: Rotation-and translation-equivariant neural networks for 3d point clouds. ar Xiv preprint ar Xiv:1802.08219, 2018.

[105] Richard Tran, Janice Lan, Muhammed Shuaibi, Brandon M Wood, Siddharth Goyal, Abhishek Das, Javier Heras-Domingo, Adeesh Kolluru, Ammar Rizvi, Nima Shoghi, et al. The open catalyst 2022 (oc22) dataset and challenges for oxide electrocatalysts. ACS Catalysis, 13(5):3066 3084, 2023.

[106] Oliver T Unke, Stefan Chmiela, Huziel E Sauceda, Michael Gastegger, Igor Poltavsky, Kristof T Schutt, Alexandre Tkatchenko, and Klaus-Robert Muller. Machine learning force fields. Chemical Reviews, 121(16):10142 10186, 2021.

[107] Hanchen Wang, Tianfan Fu, Yuanqi Du, Wenhao Gao, Kexin Huang, Ziming Liu, Payal Chandak, Shengchao Liu, Peter Van Katwyk, Andreea Deac, et al. Scientific discovery in the age of artificial intelligence. Nature, 620(7972):47 60, 2023.

[108] Joseph L Watson, David Juergens, Nathaniel R Bennett, Brian L Trippe, Jason Yim, Helen E Eisenach, Woody Ahern, Andrew J Borst, Robert J Ragotte, Lukas F Milles, et al. De novo design of protein structure and function with rfdiffusion. Nature, 620(7976):1089 1100, 2023.

[109] E Weinan and Eric Vanden-Eijnden. Transition-path theory and path-finding algorithms for the study of rare events. Annual review of physical chemistry, 61(2010):391 420, 2010.

[110] Lemeng Wu, Chengyue Gong, Xingchao Liu, Mao Ye, and qiang liu. Diffusion-based molecule generation with informative prior bridges. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022.

[111] Guikun Xu, Yongquan Jiang, Peng Chuan Lei, Yan Yang, and Jim Chen. Gtmgc: Using graph transformer to predict molecule s ground-state conformation. In The Twelfth International Conference on Learning Representations, 2023.

[112] Minkai Xu, Jiaqi Han, Aaron Lou, Jean Kossaifi, Arvind Ramanathan, Kamyar Azizzadenesheli, Jure Leskovec, Stefano Ermon, and Anima Anandkumar. Equivariant graph neural operator for modeling 3d dynamics. ar Xiv preprint ar Xiv:2401.11037, 2024.

[113] Minkai Xu, Alexander S Powers, Ron O. Dror, Stefano Ermon, and Jure Leskovec. Geometric latent diffusion models for 3D molecule generation. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 38592 38610. PMLR, 23 29 Jul 2023.

[114] Minkai Xu, Alexander S Powers, Ron O Dror, Stefano Ermon, and Jure Leskovec. Geometric latent diffusion models for 3d molecule generation. In International Conference on Machine Learning, pages 38592 38610. PMLR, 2023.

[115] Minkai Xu, Lantao Yu, Yang Song, Chence Shi, Stefano Ermon, and Jian Tang. Geodiff: A geometric diffusion model for molecular conformation generation. In International Conference on Learning Representations, 2022.

[116] Zhao Xu, Youzhi Luo, Xuan Zhang, Xinyi Xu, Yaochen Xie, Meng Liu, Kaleb Dickerson, Cheng Deng, Maho Nakata, and Shuiwang Ji. Molecule3d: A benchmark for predicting 3d geometries from molecular graphs. ar Xiv preprint ar Xiv:2110.01717, 2021.

[117] Jason Yim, Brian L. Trippe, Valentin De Bortoli, Emile Mathieu, Arnaud Doucet, Regina Barzilay, and Tommi Jaakkola. SE(3) diffusion model with application to protein backbone generation. In Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, and Jonathan Scarlett, editors, Proceedings of the 40th International Conference on Machine Learning, volume 202 of Proceedings of Machine Learning Research, pages 40001 40039. PMLR, 23 29 Jul 2023.

[118] Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, and Tie-Yan Liu. Do transformers really perform badly for graph representation? In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, 2021.

[119] Bohang Zhang, Shengjie Luo, Liwei Wang, and Di He. Rethinking the expressive power of GNNs via graph biconnectivity. In The Eleventh International Conference on Learning Representations, 2023.

[120] Xuan Zhang, Limei Wang, Jacob Helwig, Youzhi Luo, Cong Fu, Yaochen Xie, Meng Liu, Yuchao Lin, Zhao Xu, Keqiang Yan, et al. Artificial intelligence for science in quantum, atomistic, and continuum systems. ar Xiv preprint ar Xiv:2307.08423, 2023.

[121] Linqi Zhou, Aaron Lou, Samar Khanna, and Stefano Ermon. Denoising diffusion bridge models. In The Twelfth International Conference on Learning Representations, 2024.

A Organization of the Appendix

The supplementary material is organized as follows. In Appendix B, we first recall some definitions and tools from stochastic calculus and then give the proofs of all theorems. In Appendix C, we give the derivation of our practical objective function and our sampling algorithms. In Appendix D, we give some details of our experiments, including a comprehensive introduction to the datasets, baselines, metrics and settings.

B Proof of Theorems

B.1 Review of Stochastic Calculus

Let (Xt)t [0,T ] be a stochastic process. We use p(x , t |x1, t1; x2, t2; . . . ; xn, tn) to denote its conditional density function satisfying

P(Xt A|Xt1 = x1, Xt2 = x2, . . . , Xtn = xn) = Z

A p(x , t |x1, t1; x2, t2; . . . ; xn, tn)dx

for any Borel set A, where t1 < t2 < < tn. If (Xt)t [0,T ] is a Markov process, p(x , t |x1, t1; x2, t2; . . . ; xn, tn) = p(x , t |xn, tn), which is also called a transition density function.

One of the most important results of stochastic calculus is the Ito s formula. The precise statements are as follows. Theorem B.1 (Ito s formula for Brownian Motion). Let Bt be the d dimensional Brownian Motion. Assume f is a bounded real valued function with continuous second-order partial derivatives, i.e. f C2 b (Rd). Then the Ito s formula is given by

f(Bt) = f(B0) + Z T

0 f(Bt) d Bt + 1

0 2f(Bt)dt. (8)

We follow [86] for the proof of Doob s h-transform. The infinitesimal generator of the Markov process plays an important role in the proof of the Doob s h-transform. The precise definitions are as follows. Definition B.2. (Generator of a Process) The infinitesimal generator At of a stochastic process (Xt) for a function ϕ(x) is

Atϕ(x) = lim s 0+ E[ϕ(Xt+s)|Xt = x] ϕ(x)

where ϕ is a suitably regular function. For an Itô process defined as the solution to the SDE d Xt = f(Xt, t)dt + σ(t)d Bt, the generator is

i=1 f i(x, t)

i=1 σ2(t) 2

x2 i . (10)

The Fokker-Planck s Equation is an useful tool to track the evolution of the transition density function associated with an SDE. The precise statements are as follows. Proposition B.3. (Fokker-Planck s Equation) Let p(x , t |x, t) be the transition density function of the SDE d Xt = f(Xt, t)dt + σ(t)d Bt. Then p(x , t |x, t) satisfies the Fokker-Planck s Equation

p(x, t|x0, 0)

(f i(x, t)p(x, t|x0, 0))

i=1 σ2(t) 2p(x, t|x0, 0)

x2 i = 0, (11)

with the initial condition p(x, 0|x0, 0) = δ(x x0). The Fokker-Planck s Equation can also be written in a compact form using the generator At: tp(x, t|x0, 0) = A t p(x, t|x0, 0), (12)

where A t is the adjoint operator of A:

(f i(x, t) )

i=1 σ2(t) 2( )

x2 i . (13)

When the terminal is fixed, the evolution of the transition density function can also given by a PDE, which is called the Backward Kolmogorov Equation. We give the precise statement as follows. Proposition B.4. (Backward Kolmogorov Equation) Let p(x , t |x, t) be the transition density function of the SDE d Xt = f(Xt, t)dt + σ(t)d Bt. Then p(x , t |x, t) satisfies the Backward Kolmogorov Equation

p(xt, t|x, s)

i=1 f i(x, s) p(xt, t|x, s)

i=1 σ2(s) 2p(xt, t|x, s)

x2 i = 0, (14)

with the initial condition p(xt, t|x, t) = δ(x xt). The Backward Kolmogorov Equation can also be written in a compact form using the generator As:

p(xt, t|x, s) = 0. (15)

B.2 Proof of Proposition 3.1

Proposition B.5. Let R denote the space of geometric states and f R( , ) : R [0, T] R denote the drift coefficient on R. Let (Wt)t [0,T ] denote the Wiener process on R. Given an SDE on geometric states d Rt = f R(Rt, t)dt + σ(t)d Wt, R0 q(R0), its transition density p R(z , t |z, t), z, z R is SE(3)-equivariant, i.e., p R(Rt , t |Rt, t) = p R(ρR(g)[Rt ], t |ρR(g)[Rt], t), g SE(3), 0 t < t T, if the following conditions are satisfied: (1) q(R0) is SE(3)-invariant; (2) f R( , t) is SO(3)-equivariant and T(3)-invariant; (3) the transition density of (Wt)t [0,T ] is SE(3)- equivariant.

Proof. In this section, we view R = {r1, ..., rn} R as r1 r2 rn R3n, which is the concatenation of ri. So from this perspective, the space R is isomorphic to the Euclidean space R3n. Then (Wt)t [0,T ] is the Wiener process with dimension d = 3n.

For any g SE(3), ρR(g) can be characterized by an orthogonal matrix O(g) R3 3, satisfying det(O(g)) = 1, and a translation vector t R3. Then the representation of SE(3) on R3n is given by ρR(g)[R] = OR(g)R + t R, (16) where OR(g) = diag{O(g), O(g), . . . , O(g)}, t R = t t t R3n. It s obvious that OR(g) is also an orthogonal matrix in R3n 3n, satisfying O 1 R (g) = OT R(g).

According to Proposition B.3, the evolution of the transition density function is given by the Fokker-Planck s Equation

p R(x, t|x0, 0)

f i(x, t)p R(x, t|x0, 0)

i=1 σ2(t) 2 (p R(x, t|x0, 0))

x2 i , (17)

with the initial condition p R(x, 0|x0, 0) = δ(x x0).

Let y = OR(g)x + t R, y0 = OR(g)x0 + t R, then we have

p R(ρR(g)[x], t|ρR(g)[x0], 0) = p R(OR(g)x + t R, t|OR(g)x0 + t R, 0) = p R(y, t|y0, 0). (18)

The evolution of the transition density function p R(y, t|y0, 0) is also given by the Fokker-Planck s Equation:

p R(y, t|y0, 0)

f i(y, t)p R(y, t|y0, 0)

i=1 σ2(t) 2 (p R(y, t|y0, 0))

y2 i , (19)

with the boundary condition p R(y, 0|y0, 0) = δ(y y0) = δ(x x0). Since y = OR(g)x + t R, we have x = O 1 R (g)(y t R). Then by the chain rule, we have

j=1 (O 1 R (g))ji xj =

j=1 (OR(g))ij xj . (20)

Since f R( , t) is a SO(3)-equivariant and T(3)-invariant function, we have

f i R(y, t) = f i R(OR(g)x + t R, t) = (OR(g)f R(x, t))i =

k=1 (OR(g))ikf k R(x, t). (21)

Then the Fokker-Planck s equation becomes

p R(y, t|y0, 0)

f i(y, t)p R(y, t|y0, 0)

2 (p R(y, t|y0, 0))

k=1 (OR(g))ij ((OR(g))ikf k R(x, t)p R(y, t|y0, 0))

k=1 (OR(g))ik xk (OR(g))ij (p R(y, t|y0, 0))

k=1 (OR(g))ij(OR(g))ik (f k R(x, t)p R(y, t|y0, 0))

k=1 (OR(g))ik(OR(g))ij xk

(p R(y, t|y0, 0))

Since OR(g) is an orthogonal matrix, the columns of OR(g) are orthogonal to each other, i.e.

i=1 (OR(g))ik(OR(g))ij = δjk = 0 j = k, 1 j = k. (27)

So the Fokker-Planck s equation can be simplified to

p R(y, t|y0, 0)

(f k R(x, t)p R(y, t|y0, 0))

(p R(y, t|y0, 0))

(f j R(x, t)p R(y, t|y0, 0))

2 (p R(y, t|y0, 0))

(xj)2 , (30)

which is same as Eqn.(17). Since the boundary condition p R(y, 0|y0, t0) = δ(y y0) = δ(x x0) = p R(x, 0|x0, 0), then p R(y, t|y0, t0) = p R(x, t|x0, t0), t [0, T]. Thus we have proved that p R(Rt , t |Rt, t) = p R(ρR(g)[Rt ], t |ρR(g)[Rt], t), g SE(3), 0 t < t T.

B.3 Proof of Proposition 3.2

Proposition B.6 (Doob s h-transform). Let p R(z , t |z, t) be the transition density of the SDE in Proposition 3.1. Let h R( , ) : R [0, T] R>0 be a smooth function satisfying: (1) h R( , t) is SE(3)-invariant; (2) h R(z, t) = R p R(z , t |z, t)h R(z , t )dz . We can derive the following h Rtransformed SDE on geometric states:

d Rt = f R(Rt, t) + σ2(t) Rt log h R(Rt, t) dt + σ(t)d Wt, (31)

with transition density ph R(z , t |z, t) = p R(z , t |z, t) h R(z ,t )

h R(z,t) preserving the symmetry constraints.

Proof. We use the definition of the infinitesimal generator to prove the proposition. The infinitesimal generator of ph R(x , t |x, t) for a function ϕ(x) is given by

Ah t ϕ(x) = lim s 0+ Eh[ϕ(Rt+s)|Rt = x] ϕ(x)

Since ph R(z , t |z, t) = p R(z , t |z, t) h R(z ,t )

h R(z,t) , so we have

Eh[ϕ(Rt+s)|Rt = x] = E[ϕ(Rt+s)h(Rt+s, t + s)|Rt = x]

h(x, t) . (33)

Then Ah t ϕ(x) can be simplified as

Ah t ϕ(x) = lim s 0+ Eh[ϕ(Rt+s)|Rt = x] ϕ(x)

= lim s 0+ E[ϕ(Rt+s)h(Rt+s, t + s)|Rt = x] ϕ(x)h(x, t)

sh(x, t) (35)

= 1 h(x, t)[ h(x, t)

xi ϕ(x) + h(x, t) ϕ(x)

f i(x, t) (36)

i=1 σ2(t) 2h(x, t)

x2 i ϕ(x) +

i=1 σ2(t) h(x, t)

i=1 σ2(t) 2ϕ(x)

x2 i h(x, t)] (38)

= 1 h(x, t)[ h(x, t)

t ϕ(x) + (Ath(x, t)) ϕ(x)

i=1 h(x, t) ϕ(x)

xi f i(x, t) (39)

i=1 σ2(t) h(x, t)

i=1 σ2(t) 2ϕ(x)

x2 i h(x, t)] (40)

Since h(x, t) = R p R(x , t |x, t)h(x , t )dx, we have

h(x, t) = Z p(x , t |x, t)

t + Atp(x , t |x, t) h(x , t )dx. (41)

According to the Backward Kolmogorov Equation (Proposition B.4), we get

p(x , t |x, t)

t + Atp(x , t |x, t) = 0. (42)

h(x, t) = 0. (43)

Then Ah t ϕ(x) can be simplified as

Ahϕ(x) = 1 h(x, t)[

i=1 h(x, t) ϕ(x)

xi f i(x, t) +

i=1 σ2(t) h(x, t)

i=1 σ2(t) 2ϕ(x)

x2 i h(x, t)] (45)

xi f i(x, t) +

i=1 σ2(t) 1 h(x, t) h(x, t)

i=1 σ2(t) 2ϕ(x)

f i(x, t) + σ2(t) log h(x, t)

i=1 σ2(t) 2ϕ(x)

x2 i . (47)

So we show that

f i(x, t) + σ2(t) log h(x, t)

i=1 σ2(t) 2

x2 i . (48)

According to the correspondence between SDE and its generator (Definition B.2), the equation above implies that the h-transformed SDE is given by

d Rt = f R(Rt, t) + σ2(t) Rt log h R(Rt, t) dt + σ(t)d Wt. (49)

Additionally, we need to show that the h-transformed transition density function satisfies the symmetric constraints. First, we show that if h( , t0) is SE(3)-invariant, then h( , t) is also SE(3)-invariant t [0, T]. For any g SE(3), assume ρR(g)[z] = OR(g)z + t R, where OR(g) is an orthogonal matrix and det(OR(g)) = 1. Since h R(z, t) satisfies

h R(z, t) = Z p R(z , t0|z, t)h(z , t0)dz , (50)

then we have

h R(ρR(g)[z], t) = Z p R(z , t0|ρR(g)[z], t)h(z , t0)dz (51)

= Z p R ρR(g)(ρR(g)) 1[z ], t0|ρR(g)[z], t h(ρR(g)(ρR(g)) 1[z ], t0)dz .

By Proposition 3.1, p R ρR(g)(ρR(g)) 1[z ], t0|ρR(g)[z], t = p R (ρR(g)) 1[z ], t0|z, t , let z1 = ρR(g)) 1[z ], then

h R(ρR(g)[z], t) = Z p R (ρR(g)) 1[z ], t0|z, t h(ρR(g)(ρR(g)) 1[z ], t0)dz (54)

= Z p R (z1, t0|z, t) h(ρR(g)z1, t0) det(OR(g))dz1 (55)

= Z p R (z1, t0|z, t) h(z1, t0)dz1 (56)

= h R(z, t). (57)

So h( , t) is SE(3)-invariant t [0, T], h( , t) is well-defined under these symmetric constraints. Then we show ph R(z , t |z, t) preserves the symmetric constraints:

ph R(ρR(g)[z ], t |ρR(g)[z], t) = p R(ρR(g)[z ], t |ρR(g)[z], t)h R(ρR(g)[z ], t )

h R(ρR(g)[z], t) (58)

= p R(z , t |z, t)h R(ρR(g)[z ], t )

h R(ρR(g)[z], t) (59)

= p R(z , t |z, t)h R(z , t )

h R(z, t) (60)

= ph R(z , t |z, t). (61)

Thus we have proved that

ph R(ρR(g)[z ], t |ρR(g)[z], t) = ph R(z , t |z, t), (62)

which implies that ph R(z , t |z, t) preserves the symmetric constraints for any g SE(3). So the proof is completed.

Next, we show how to construct a SDE with a fixed terminal point as an simple application of the Doob s h-transform. The result of this example is very useful to construct diffusion bridge. Proposition B.7. Assume the original SDE is given by d Xt = f(Xt, t)dt+σ(t)d Wt. Let h R(x, t) = p R(y, T|x, t) which is the transition density function of the original SDE evaluated at XT = y. Then the h-transformed SDE

d Rt = f(Rt, t) + σ2(t) Rt log p R(y, T|Rt, t) dt + σ(t)d Wt, (63)

arrive at y almost surely at the final time.

Proof. The original SDE is given by

d Xt = f(Xt, t)dt + σ(t)d Wt. (64)

First, we need to verify that h R(x, t) satisfies the condition

h R(x, t) = Z p R(x , t0|x, t)h(x , t0)dx . (65)

Since h R(x, t) = p R(y, T|x, t), we have Z p R(x , t |x, t)h R(x , t ) = Z p R(x , t |x, t)p R(y, T|x , t )dx . (66)

Then by the Chapman Kolmogorov s equation Z p R(x , t |x, t)p R(y, T|x , t )dx = p R(y, T|x, t), (67)

we get Z p R(x , t |x, t)h R(x , t ) = p R(y, T|x, t) = h R(x, t). (68)

So the condition is satisfied. Then we can use the result of the Proposition 3.2. The h-transformed SDE is given by

d Rt = f(Rt, t) + σ2(t) Rt log p R(y, T|Rt, t) dt + σ(t)d Wt. (69)

And the h-transformed transition density function satisfies Z

A ph R(x , t |x, t)dx = Z

A p R(x , t |x, t)h R(x , t )

h R(x, t) dx (70)

A p R(x , t |x, t)p R(y, T|x , t )

p R(y, T|x, t) dx (71)

= P(Xt A|Xt = x, XT = y), (72)

where we use the Bayes theorem to deduce the last equality and A is an arbitrary Borel set. Since Rt is a process conditioning on XT = y, then RT = y almost surly.

B.4 Proof of Theorem 3.3

Theorem B.8 (Equivariant Diffusion Bridge). Given an SDE on geometric states d Rt = f R(Rt, t)dt + σ(t)d Wt with transition density p R(z , t |z, t), z, z R satisfying the conditions in Proposition 3.1. Let h R(z, t; z0) = R p R(z , T|z, t) qdata(z |z0) p R(z ,T |z0,0)dz . By using Proposition 3.2, we can derive the following h R-transformed SDE:

d Rt = f R(Rt, t) + σ2(t)Eq R(RT ,T |Rt,t;R0)[ Rt log p R(RT , T|Rt, t)|R0, Rt] dt + σ(t)d Wt, (73) which corresponds to a process (Rt)t [0,T ], R0 qdata(Rt0) satisfying the following properties:

let q( , ) : R R R 0 denote the joint distribution induced by (Rt)t [0,T ], then q(R0, RT ) equals to qdata(Rt0, Rt1);

its transition density q R(Rt , t |Rt, t; R0)=q R(ρR(g)[Rt ], t |ρR(g)[Rt], t; ρR(g)[R0]), 0 t<t T, g SE(3),R0 qdata(Rt0).

We call the tailored diffusion process (Rt)t [0,T ] an equivariant diffusion bridge.

Proof. Let h R(z, T; z0) = qdata(z|z0) p R(z,T |z0,0), then we define

h R(z, t; z0) = Z p R(z , T|z, t) qdata(z |z0) p R(z , T|z0, 0)dz , t [0, T). (74)

So we can easily show that h R(z, t; z0) satisfies the condition

h R(z, t; z0) = Z p R(z , T|z, t)h(z , T; z0)dz , t [0, T], z, z0 R. (75)

Then we can use the result of Theorem 3.2 to get the h-transformed SDE. By Theorem 3.2, the h-transformed SDE is

d Rt = f R(Rt, t) + σ2(t) Rt log h R(Rt, t; R0) dt + σ(t)d Wt. (76)

Next, we need to find the explicit form of Rt log h R(Rt, t; R0),

z log h R(z, t; z0) = zh R(z, t; z0)

hz(z, t; z0) (77)

= 1 h R(z, t; z0)

Z zp R(z , T|z, t) qdata(z |z0) p R(z , T|z0, 0)dz . (78)

The h-transformed density function is

q R(z , T|z, t; z0, 0) = p R(z , T|z, t)h R(z , T; z0)

h R(z, t; z0) (79)

= p R(z , T|z, t) qdata(z |z0) p R(z , T; z0, 0)h R(z, t; z0). (80)

Then we have

z log h R(z, t; z0) = 1 h R(z, t; z0)

Z zp R(z , T|z, t) qdata(z |z0) p R(z , T|z0, 0)dz (81)

= Z zp R(z , T|z, t)q R(z , T|z, t; z0, 0)

p R(z , T|z, t) dz (82)

= Z z log p R(z , T|z, t)q R(z , T|z, t; z0, 0)dz . (83)

So we get a explicit form of Rt log h R(Rt, t; R0):

Rt log h R(Rt, t; R0) = Eq R(RT ,T |Rt,t;z0)[ Rt log p R(RT , T|Rt, t)|z0, Rt]. (84)

Then the h-transformed SDE becomes

d Rt = f R(Rt, t) + σ2(t)Eq R(RT ,T |Rt,t;z0)[ Rt log p R(RT , T|Rt, t)|z0, Rt] dt + σ(t)d Wt. (85) Since h R(z, 0; z0) = R p R(z , T|z, 0) qdata(z |z0) p R(z ,T |z0,0)dz = R qdata(z |z0)dz = 1, then

q R(z , T|z0, 0) = p R(z , T|z0, 0) qdata(z |z0) p R(z , T; z0, 0)h R(z0, 0; z0) = qdata(z |z0), (86)

which means q R(RT , T|R0, 0) = qdata(RT |R0). Since the initial distribution R0 qdata(Rt0), so q R(R0) = qdata(R0). So we can deduce that

q(R0, RT ) = q R(R0)q R(RT , T|R0, 0) = qdata(R0)qdata(RT |R0) = qdata(R0, RT ). (87)

Finally, we need to show that the transition density function satisfies the corresponding symmetric constrains. Since h R(z , T; z0) = qdata(z |z0) p R(z ,T |z0,0) is SE(3)-invariant, i.e.

h R(ρR(g)[z], T; ρR(g)[z0]) = h R(z , T; z0), g SE(3), (88)

we can show that h( , t; ) is also SE(3)-invariant t [0, T] using the following property

h R(z, t; z0) = Z p R(z , T|z, t)h(z , T; z0)dz . (89)

For any g SE(3), assume ρR(g)[z] = OR(g)z + t R, where OR(g) is an orthogonal matrix satisfying det(OR(g)) = 1, then we have

h R(ρR(g)[z], t; ρR(g)[z0]) = Z p R(z , T|ρR(g)[z], t)h(z , T; ρR(g)[z0])dz (90)

= Z p R ρR(g)(ρR(g)) 1[z ], T|ρR(g)[z], t h(ρR(g)(ρR(g)) 1[z ], T; ρR(g)[z0])dz . (91)

By Proposition 3.1, p R ρR(g)(ρR(g)) 1[z ], t0|ρR(g)[z], t = p R (ρR(g)) 1[z ], t0|z, t , let z1 = ρR(g)) 1[z ], then

h R(ρR(g)[z], t; ρR(g)[z0]) (93)

= Z p R (ρR(g)) 1[z ], T|z, t h(ρR(g)(ρR(g)) 1[z ], T; ρR(g)[z0])dz (94)

= Z p R (z1, T|z, t) h(ρR(g)[z1], T; ρR(g)[z0]) det(OR(g))dz1 (95)

= Z p R (z1, t0|z, t) h(z1, t0; z0)dz1 (96)

= h R(z, t; z0). (97) So h( , t; ) is SE(3)-invariant t [0, T]. Then we show q R(z , t |z, t; z0, 0) preserves the symmetric constraints: q R(ρR(g)[z ], t |ρR(g)[z], t; ρR(g)[z0], 0) (98)

= p R(ρR(g)[z ], t |ρR(g)[z], t)h R(ρR(g)[z ], t ; ρR(g)[z0])

h R(ρR(g)[z], t; ρR(g)[z0]) (99)

= p R(z , t |z, t)h R(ρR(g)[z ], t ; ρR(g)[z0])

h R(ρR(g)[z], t; ρR(g)[z0]) (100)

= p R(z , t |z, t)h R(z , t ; z0)

h R(z, t; z0) (101)

= q R(z , t |z, t; z0, 0), (102) which completes our proof.

B.5 Objective Function of the Equivariant Diffusion Bridge

Lemma B.9. Let X1, , Xn, Y, Z be random variables. Then the optimal approximation of Y based on {X}n i=1 is f (X1, , Xn) = arg min f E Y f(X1, , Xn) 2 = E[Y|X1, , Xn].

Proof. Denote X = (X1, , Xn). We show the following decomposition first:

E Y f(X) 2 = E Y E[Y|X] 2 + E E[Y|X] f(X) 2 . (103)

We can compute E Y f(X) 2 directly by

E Y f(X) 2 = E Y E[Y|X] + E[Y|X] f(X) 2 (104)

= E Y E[Y|X] 2 + E E[Y|X] f(X) 2 (105)

+ E Y E[Y|X], E[Y|X] f(X) . (106) Since E Y E[Y|X], E[Y|X] f(X) = E [E Y E[Y|X], E[Y|X] f(X) |X] = 0, (107) we have E Y f(X) 2 = E Y E[Y|X] 2 + E E[Y|X] f(X) 2 (108)

+ E Y E[Y|X], E[Y|X] f(X) (109)

= E Y E[Y|X] 2 + E E[Y|X] f(X) 2 (110)

E Y E[Y|X1, , Xn] 2. (111)

The inequality becomes equality if and only if f(X1, , Xn) = E[Y|X1, , Xn]. So the the optimal approximation of Y based on {X}n i=1 is E[Y|X1, , Xn], i.e.

f (X1, , Xn) = arg min f E Y f(X1, , Xn) 2 = E[Y|X1, , Xn]. (112)

Proposition B.10. The training objective function of Equivariant Diffusion Bridge is:

L(θ) = E(z0,z1) qdata(Rt0,Rt1),Rt q R(Rt,t|z1,T ;z0,0)λ(t) vθ(Rt, t; z0) Rt log p R(z1, T|Rt, t) 2, (113) where t U(0, T). Then the optimal parameter θ = arg min θ L(θ) satisfies

vθ (Rt, t; z0) = Eq R(RT ,T |Rt,t;R0)[ Rt log p R(RT , T|Rt, t)|R0, Rt]. (114)

Proof. Let L(θ) = Et U(0,T )λ(t)Lt(θ), where

Lt(θ) = E(z0,z1) qdata(Rt0,Rt1),Rt q R(Rt,t|z1,T ;z0,0) vθ(Rt, t; z0) Rt log p R(z1, T|Rt, t) 2. (115) Then by Lemma B.9, vθ(Rt, t; z0) = Eq R(RT ,T |Rt,t;R0)[ Rt log p R(RT , T|Rt, t)|R0, Rt] minimize Lt(θ), t [0, T]. Since λ(t) 0, so the optimal parameter θ = arg min θ L(θ) satisfies

vθ (Rt, t; z0) = Eq R(RT ,T |Rt,t;R0)[ Rt log p R(RT , T|Rt, t)|R0, Rt], t [0, T]. (116)

B.6 Proof of Theorem 3.4

Theorem B.11 (Chain of Equivariant Diffusion Bridges). Let {(Rt i)t [0,T ]}i [N 1] denote a series of N equivaraint diffusion bridges defined in Theorem 3.3. For the i-th bridge (Rt i)t [0,T ], if we set

(1) hi R(z, t; z0) = R p R(z , T|z, t) qi+1 traj (z |z0) p R(z ,T |z0,0)dz ; (2) R0 0 q0 traj( R0), R0 i = RT i 1, 0 < i < N, then the joint distribution q R(R0 0, RT 0 , RT 1 , , RT N 1) induced by {(Rt)t [0,T ]}i [N 1] equals to qtraj( R0, ..., RN). We call this process a chain of equivariant diffusion bridges.

Proof. By Theorem 3.3, the transition density function of (Rt i)t [0,T ] satisfies qi R(RT i |R0 i ) = qi traj(RT i |R0 i ), 0 i N 1. The ground truth probability density function has the decomposition q0 traj( R0) QN i=1 qi traj( Ri| Ri 1). Then we use the boundary condition, R0 0 q0 traj( R0), R0 i = RT i 1, 0 < i < N, we have

q(R0 0, RT 0 , RT 1 , , RT N 1) = q0 R(R0 0)

i=1 qi R(RT i |RT i 1) (117)

= q0 R(R0 0)

i=1 qi R(RT i |R0 i ) (118)

= q0 traj(R0 0)

i=1 qi traj(RT i |R0 i ) (119)

= qtraj(R0 0, RT 0 , RT 1 , , RT N 1). (120)

So the joint distribution q R(R0 0, RT 0 , RT 1 , , RT N 1) induced by {(Rt)t [0,T ]}i [N 1] equals to qtraj( R0, ..., RN).

B.7 Objective of the Chain of Equivariant Diffusion Bridge

Proposition B.12. The training objective function of the Chain of Equivariant Diffusion Bridge is:

L (θ) = E(z0,...,z N) qtraj( R0,..., RN),t,Rt i λ(t) vθ(Rt i , t; zi) Rt i log pi R(zi+1, T|Rt i , t ) 2, (121) where t U(0, N T), i = t

T , t = t i T, Rt i qi R(Rt , t |zi+1, T; zi, 0). Then the optimal parameter θ = arg min θ L (θ) satisfies

vθ (Rt i , t; zi) = Eqi R(RT i ,T |Rt i ,t;R0 i )[ Rt i log pi R(RT i , T|Rt i, t)|R0 i , Rt i]. (122)

Proof. Let L (θ) = Et U(0,NT )λ(t)L t(θ), where

L t(θ) = E(z0,...,z N) qtraj( R0,..., RN),t,Rt i vθ(Rt i , t; zi) Rt i log pi R(zi+1, T|Rt i , t ) 2, (123)

where t U(0, N T), i = t

T , t = t i T, Rt i qi R(Rt , t |zi+1, T; zi, 0). Then by Lemma B.9, vθ (Rt i , t; zi) = Eqi R(RT i ,T |Rt i ,t;R0 i )[ Rt i log pi R(RT i , T|Rt i, t)|R0 i , Rt i] minimize L t(θ), t [0, NT]. Since λ(t) 0, so the optimal parameter θ = arg min θ L(θ) satisfies

vθ (Rt i , t; zi) = Eqi R(RT i ,T |Rt i ,t;R0 i )[ Rt i log pi R(RT i , T|Rt i, t)|R0 i , Rt i]. (124)

B.8 Proof of Theorem 3.5

In this paper, we choose the Brownian bridge as our matching target. Let s first recall the definition and properties of the Brownian bridge. A Brownian bridge (Xt)t [0,T ] with the initial position X0 and the terminal position XT is given by the following SDE

d Xt = XT Xt

T t dt + σd Wt, (125)

where Wt is the standard wiener process. The solution of the Brownian bridge is given by

Xt N (1 t)X0 + t X1, σ2t(1 t) . (126)

Next, we recall the definition of the KL Divergence:

Definition B.13 (KL Divergence). The relative entropy (or Kullback Leibler Divergence) KL(f||g) between two probability density functions f(x) and g(x) is defined by:

KL(f||g) = Z f(x) log f(x)

g(x) dx. (127)

In general, let P and Q be two probability measures on space X. Assume P is absolutely continuous with respect to Q then the Kullback Leibler Divergence between P and Q is defined as follows

KL(P||Q) = Z

d Qd P, (128)

d Q is the Radon Nikodym derivative of P with respect to Q.

When we need to compute the KL divergence between the path measures associated with two SDEs, the Girsanov s theorem [53] is an useful tool to get the Radon Nikodym derivative between the two measure. The precise statements are as follows. Theorem B.14 (Girsanov s Theorem). Let Wt be a d-dimensional Wiener process defined on (Ω, F, (Ft), P). Let Ht be a d-dimensional Ft adapted process such that Z T

0 Ht 2dt < , P a.s. (129)

Zt = exp Z t

0 Hs d Wt 1

0 Ht 2ds . (130)

Assume Zt is a martingale. Define the probability measure Q on FT by

d Q = ZT d P. (131)

Let Mt = Wt R t 0 Hsds, then Mt is a d dimensional Wiener process with respect to Q.

In practice, the condition that Zt is a martingale is hard to vertify. So the condition is often replaced by the Novikov s condition

For more discussions and applications of the Girsanov s theorem, please see [86, 74, 28]. Now we can give the precise assumptions and proof of Theorem 3.5 using the properties of Brownian Bridge and Theorem B.14. Theorem B.15. Assume ( Ri)i [N] is sampled by simulating a prior SDE on geometric states d Rt = H R( Rt)dt + σd Wt. Let µ i denote the path measure of this prior SDE when t [i T, (i + 1)T]. Building upon ( Ri)i [N], let {µi R}i [N 1] denote the path measure of our chain of equivariant diffusion bridges. Assume {µi R}i [N 1] is composed of chain of the Brownian Bridge. Assume the total time is NT = 1. Under the following assumptions:

H R( ) : Rd R is a scalar function with continuous second-order partial derivative;

The drift function is Lipschitz: there exist a constant L such that

H R(x) H R(y) L x y , x, y Rd;

H R( ) satisfies H R(x) K(1 + x ), x Rd;

E Rt 2 < M, t [0, NT];

h(t) = E[H R( Rt)] is a continuous function on t [0, NT];

The Novikov s condition:

0 H R( Wt) 2dt

The function H R satisfies the following regulaity condition: there exist a constant C such that 2H R(x) H R(x) 2/σ2 < C, x Rd;

then we have lim N max i KL(µ i ||µi R) = 0.

Proof. Let p be the probability density function associated with the ground truth SDE d Rt = f R( Rt, t)dt+σd Wt, R0 = R0. Let {(Rt i)t [0,T ]}i [N 1] denote a series of N equivaraint diffusion bridges defined in Theorem 3.4. Then by theorem 3.4, q R(R0 0, RT 0 , RT 1 , , RT N 1) induced by {(Rt)t [0,T ]}i [N 1] equals to p R(R0 0, RT 0 , RT 1 , , RT N 1). Additionally, the conditional probability density function q R(Rt i|RT i , R0 i ) , for i T t < (i + 1)T, is associated with the Brownian bridge

d Rt i = RT i Rt i T t dt + σd Wt, (133)

where t = t i T. Then by the chain rule of KL divergence

KL(µ i ||µi R) = KL(p i ( R(i+1)T , Ri T )||qi R( R(i+1)T , Ri T )+ (134)

Ep i ( R(i+1)T , Ri T ) h KL(µ i ( | R(i+1)T , Ri T )||µi R( | R(i+1)T , Ri T )) i . (135)

Since p i ( R(i+1)T , Ri T ) = qi R( R(i+1)T , Ri T ), we have

KL(µ i ||µi R) = Ep i ( R(i+1)T , Ri T ) h KL(µ i ( | R(i+1)T , Ri T )||µi R( | R(i+1)T , Ri T )) i . (136)

Since the prior SDE is time homogeneous, we can only consider the case i = 0 without loss of generality. Let υ be the path measure of the Brownian motion σ Wt on space R. Since the condition of Theorem B.14 is satisfied, then we can use Theorem B.14 and get

dµ 0( | R0) = exp

0 H R(σ Wt) d Wt 1 2σ2

0 H R(σ Wt) 2dt

(137) Then we can use the Ito s formula (Theorem B.1) to simplify the expression

dµ 0( | R0) = exp 1 σ2 (H R(σ W0) H R(σ WT )) + 1

2 R T 0 ( 2H R(σ Wt) 1 σ2 H R(σ Wt) 2)dt dυ( | R0).

(138) To simplify our notation, we denote

1 σ2 (H R(σ W0) H R(σ WT )) + 1

0 ( 2H R(σ Wt) 1

σ2 H R(σ Wt) 2)dt

(139) Let F, g be measurable functions on C[0, T], Rd, respectively. Then by the disintegration of Wiener measure into pinned Wiener measures (path measure of the Brownian Bridge), we have

Eµ 0( | R0)[Fg( σW T )] = Eυ( | R0)[Fg(σ WT )ZT ] = Z Eυ( | R0, RT =x)[FZT ]g(x)p T (x| R0)dx,

(140) where p T (x| R0) is the transition density function of σ Wt. Let F = 1, we get Z Eυ( | R0, RT =x)[ZT ]g(x)p T (x| R0)dx = Z g(x)p 0(x| R0)dx. (141)

So we have Eυ( | R0, RT =x)[ZT ] = p 0(x| R0)/p T (x| R0). Let g = 1, then we get Z Eµ 0( | R0, RT =x)[F]p 0(x| R0)dx = Z Eυ( | R0, RT =x)[FZT ]p T (x| R0)dx. (142)

So we can conclude that

dµ 0( | R0, RT ) dυ( | R0, RT ) = p T ( RT | R0)

p 0( RT | R0) exp( 1

σ2 (H R( R0))

σ2 (H R( RT )) exp 1 2 R T 0 ( 2H R( ) 1 σ2 H R( ) 2)dt .

(143) Note that µi R( | R0, RT ) = υ( | R0, RT ). Now we can calculate the KL divergence by

KL(µ 0||µ0 R) = Ep 0( RT , R0) h KL(µ 0( | RT , R0)||µ0 R( | RT , R0)) i (144)

Ep 0( RT , R0)

p T ( RT | R0)

p 0( RT | R0) exp 1 σ2 (H R( R0)

exp 1 σ2 (H R( RT )

= Ep 0( R0, Rt)

p T ( RT | R0)

p 0( RT | R0)

σ2 H R( R0)] E[ 1

σ2 H R( RT )] + CT

= Ep 0( R0) KL(p 0( RT | R0)||p T ( RT | R0)) + h(0) h(T)

When N , T = 1 N 0. Since h(t) is continuous by our assumption, then KL(µ 0||µ0 R) 0.

C Derivation of Practical Objective Function

In this subsection, we show the implementation details of our framework. We set T = 1 in all the experiments.

Matching objective. We design the SDE on geometric states in Proposition 3.1 to be:

d Rt = σd Wt, with transition density p R(z , t |z, t) = N(z0, σ2(t t)I) (149)

The explicit form of the objective is

Rt log p R(z1, 1|Rt, t) = Rt log N(z0, σ2(1 t)I) = z1 Rt

σ2(1 t) (150)

Then the h-transformed SDE becomes

d Rt = R1 Rt

1 t dt + σd Wt, (151)

which is known as the Brownian bridge. The corresponding h-transformed density is

q R(Rt, t|z1, 1; z0, 0) = N(tz1 + (1 t)z0, σ2t(1 t)I). (152)

In practice, we do not use q R(R0, 0|z1, 1; z0, 0) = δ(R0 z0) as our initial distribution. We use q R(R0, 0|z1, 1; z0, 0) = N(z0, σ2I) instead. Since the solution of the Brownian bridge is given by

Rt = (1 t)R0 + t R1 + σ p

t(1 t)Z, (153)

where Z N(0, I), then the marginal distribution of Rt becomes N((1 t)z0 + tz1, (1 t)σ2I). We use this distribution to sample geometric state Rt in the training stage.

Trajectory guidance. Similarly, we set T = 1 N , pi R(zi+1, T|Rt , t ) = N(Rt , σ2 i (T t )I) when we use the trajectory guidance. So the h-transformed SDE becomes

d Rt i = RT i Rt i T t dt + σid Wt, (154)

which is a Brownian bridge with T = 1

N . Then associated density function is

qi R(Rt , t |zi+1, T; zi, 0) = N( t

T zi+1 + T t

T zi, σ2 i t (T t )

T 2 I). (155)

Additionally, we set σi decays linearly with respect to i N , i.e. σi = N i

N σ, where σ is a hyperparameter. Again, in training stage, we set qi R(R0 i , 0|z1, 1; z0, 0) = N(z0, σ2 i I) as initial distribution, and the terminal distribution is qi R(R0 i , 0|z1, 1; z0, 0) = N(z1, σ2 i+1I), which is same as the initial distribution of the next bridge.

Sampling Algorithm We use the ODE-based method to generate samples at inference time. After the training process, the neural network vθ is trained as described in Algorithm 3 and Algorithm 4. When the network is trained without trajectory guidance, we simulate the following ODE to generate samples: d Rt

d t = vθ(Rt, t; R0), R0 qdata(Rt0), t [0, T] . (156)

When the network is trained with trajectory guidance, we solve the following ODE to generate samples: d Rt

d t = vθ(Rt, t; R t

T T ), R0 qdata(Rt0), t [0, N T] . (157)

Denote a black box ODE solver by Solver(v, t). Solver(v, t) takes a vector field v and a time point as inputs, then returns the solution of the ODE

d t = v(Xt, t; ϕ), X0 = x0, (158)

at the specific time t, i.e. Solver(v, t) = Xt. Combining all the above design choices, we have the following algorithms for sampling our Geometric Diffusion Bridge (Algorithm 5) and leveraging trajectory guidance if available (Algorithm 6).

Algorithm 3 Training

1: repeat 2: (z0, z1) qdata(Rt0, Rt1) 3: t U[0, T] 4: ϵ N(0, I)

5: Rt = t T z1 + T t

T σϵ 6: Take gradient descent step on

θλ(t) z1 Rt

σ2(T t) vθ(Rt, t; z0) 2

7: until converged

Algorithm 4 Training with trajectory guidance

1: repeat 2: (z0, . . . , z N) qtraj( R0, . . . , RN) 3: t U (0, N T), i = t

T , t = t i T 4: ϵ N(0, I)

5: Rt i = t

T zi+1 + T t

T σiϵ 6: Take gradient descent step on

θλ(t) zi+1 Rt i σ2 i (T t ) vθ(Rt i , t; zi)

7: until converged

Algorithm 5 Sampling

Require: Initial geometric state z0 qdata(Rt0), a trained neural network vθ, a numerical ODE solver Solver(v, t) 1: R0 = z0 2: RT = Solver(vθ(Rt, t; R0), T) Ensure: RT

Algorithm 6 Sampling with trajectory guidance

Require: Initial geometric state z0 qdata(Rt0), a trained neural network vθ, a numerical ODE solver Solver(v, t) 1: R0 = z0 2: RNT = Solver(vθ(Rt, t; R t

T T ), t = NT) Ensure: RNT

D Experiments

D.1 Equilibrium State Prediction

Dataset. QM9 [79] is a quantum chemistry benchmark consisting of 134k stable small organic molecules, which has been widely used for molecular modeling. These molecules correspond to the subset of all 133,885 species out of the GDB-17 chemical universe of 166 billion organic molecules. In convention, 110k, 10k, and 11k molecules are used for train/valid/test sets respectively. The geometric conformations that are minimal in energy are provided in the QM9 dataset. The equilibrium conformation and its relative properties are all calculated at the B3LYP/6-31G(2df,p) level of quantum chemistry.

Molecule3D [116] is a large-scale dataset curated from the Pub Chem QC project [67, 71], consisting of 3,899,647 molecules in total, 2,339,788 molecules in training set, 779,929 molecules in the validation set, 779,930 molecules in the test set, and its train/valid/test splitting ratio is 6 : 2 : 2. For each molecule, the 2D atom graph, the 3D equilibrium geometric conformation, and four extra properties are provided. In particular, both random and scaffold splitting methods are adopted to thoroughly evaluate the in-distribution and out-of-distribution performance. For each molecule, an initial geometric state is generated by using fast and coarse force field [73, 52] and geometry optimization is conducted to obtain B3LYP/6-31G* level DFT-calculated equilibrium geometric structure.

Baselines. We comprehensively compare our GDB framework with previous equilibrium conformation prediction methods. Following [111], we use DG and ETKDG algorithms implemented by RDkit as our fundamental baselines. The benchmark [116] used the Deeper GCN-DAGNN framework [60] which proposed a deep graph neural network architecture to predict 3D geometric conformation of the molecule based on its 2D graph structure, and got impressive performance on the Molecule3D dataset. GINE [39] proposed a method for pretraining GNN to improve the performance and capacity of GNN. GATv2 [10] proposed a dynamic graph attention mechanism and improved the performance of the graph attention network on several tasks. GPS [80] proposed a general framework that supported multiple types of encodings with efficiency and scalability guarantees in both small and large graph prediction tasks. GTMGC [111] proposed a novel neural network based on Graph-Transformer (GT) [118, 66, 119, 65] to predict the equilibrium conformation of the molecule in 3D based on its 2D graph structure.

Metric. Following [116], three metrics are adopted to evaluate predictions of equilibrium states: (1) C-RMSD: given prediction ˆR = {ˆri}N i=1 which is rigidly aligned to the ground-truth R = {r i }N i=1 by the Kabsch algorithm [44], Root Mean Square Deviation between their atoms is calculated,

i.e., C-RMSD( ˆR, R ) = q

1 N PN i=1 ˆri r i 2 2; (2) D-RMSE: based on ˆR and R = {r i }N i=1,

interatomic distances can be calculated, i.e., { ˆdi}N i=1 and { ˆd i }N i=1. Root Mean Square Error be-

tween these distances is calculated, i.e., D-RMSE({ ˆdi}N i=1, { ˆd i }N i=1) = q

1 N PN i=1(di d i )2; (3)

D-MAE({ ˆdi}N i=1, { ˆd i }N i=1) = 1 N PN i=1 |di d i |.

Settings. In this task, we parameterize vθ(Rt, t; R0) by extending a Graph-Transformer based equivariant network [92, 63] to encode both time steps and initial geometric states as conditions. For training, we use Adam W as the optimizer, and set the hyper-parameter ϵ to 1e-8 and (β1, β2) to (0.9,0.999). The gradient clip norm is set to 5.0. The peak learning rate is set to 1e-4. The batch size is set to 512. The weight decay is set to 0.0. The model is trained for 500k steps with a 30k-step warm-up stage. After the warm-up stage, the learning rate decays linearly to zero. The noise scale σ is set to 0.5. For inference, we use 10 time steps with the Euler solver [12]. All models are trained on 16 NVIDIA V100 GPU.

D.2 Structure Relaxation

Dataset. Open Catalyst 2022 (OC22) dataset [105] is a widely used dasaset, which has great significance for the development of Oxygen Evolution Reaction (OER) catalysts. Each data in the dataset is in the form of the adsorbate-catalyst complex. Both initial and adsorption states with trajectories connecting them are provided. The dataset consists of 62,331 Density Functional Theory (DFT) relaxations trajectories, and about 9,854,504 single-point DFT calculations across a range of oxide materials, coverages, and adsorbates.The training set consists of 45,890 catalyst-adsorbate complexes. To better evaluate the model s performance, the validation and test sets consider the in-distribution (ID) and out-of-distribution (OOD) settings which use unseen catalysts, containing approximately 2,624 and 2,780 complexes respectively.

Baselines. Following [105], we choose strong MLFF baselines trained on force field data for a challenging comparison. Spinconv [94] introduced a novel approach called spin convolution to model angular information between sets of neighboring atoms in a graph neural network and got impressive performance in molecular simulation tasks. Gemnet [32] proposed multiple structural improvements for geometric GNN with theoretical insights, which significantly improved the experimental performance as well. Based on Gemnet s framework, Gemnet-OC [34] modified the architecture of the network and improved the experimental performance on more diverse tasks.

In [105], there are still other baseline setting. [105] introduce a large-scale dataset Open Catalyst 2020 (OC20), which consists of 1,281,040 Density Functional Theory (DFT) relaxations and 264,890,000 single point evaluations to help training the baseline model. [105] presented baselines using both OC20 and OC22 data in training stage and baselines using only OC20/OC22 for comparison.

Metric. Following [105], we use the Average Distance within Threshold (ADw T) as the evaluation metric, which reflects the percentage of structures with an atom position MAE below thresholds. To be more precise, the ADWT metric across thresholds ranging from β = 0.01 A to β = 0.5 A in increments of 0.001 A. The computation of ADw T metric is to count the percentage of structures with an atom position MAE below the threshold.

Settings. In this task, We parameterize vθ(Rt, t; R0) by using Gem Net-OC [34], which also serves as a verification that our framework is compatible with different backbone models. For training, we use Adam W as the optimizer, and set the hyper-parameter ϵ to 1e-8 and (β1, β2) to (0.9,0.999). The gradient clip norm is set to 10.0. The peak learning rate is set to 5e-4. The batch size is set to 64. The weight decay is set to 0.0. The model is trained for 200k steps. After the warm-up stage, the learning rate decays linearly to zero. The noise scale σ is set to 0.5. The trajectory length is set to N = 10. For inference, we also use 10 time steps with the Euler solver [12]. All models are trained on 8 NVIDIA A100 GPU.

Neur IPS Paper Checklist

The checklist is designed to encourage best practices for responsible machine learning research, addressing issues of reproducibility, transparency, research ethics, and societal impact. Do not remove the checklist: The papers not including the checklist will be desk rejected. The checklist should follow the references and precede the (optional) supplemental material. The checklist does NOT count towards the page limit.

Please read the checklist guidelines carefully for information on how to answer these questions. For each question in the checklist:

You should answer [Yes] , [No] , or [NA] . [NA] means either that the question is Not Applicable for that particular paper or the relevant information is Not Available. Please provide a short (1 2 sentence) justification right after your answer (even for NA).

The checklist answers are an integral part of your paper submission. They are visible to the reviewers, area chairs, senior area chairs, and ethics reviewers. You will be asked to also include it (after eventual revisions) with the final version of your paper, and its final version will be published with the paper.

The reviewers of your paper will be asked to use the checklist as one of the factors in their evaluation. While "[Yes] " is generally preferable to "[No] ", it is perfectly acceptable to answer "[No] " provided a proper justification is given (e.g., "error bars are not reported because it would be too computationally expensive" or "we were unable to find the license for the dataset we used"). In general, answering "[No] " or "[NA] " is not grounds for rejection. While the questions are phrased in a binary way, we acknowledge that the true answer is often more nuanced, so please just use your best judgment and write a justification to elaborate. All supporting evidence can appear either in the main paper or the supplemental material, provided in appendix. If you answer [Yes] to a question, in the justification please point to the section(s) where related material for the question can be found.

IMPORTANT, please:

Delete this instruction block, but keep the section heading Neur IPS paper checklist", Keep the checklist subsection headings, questions/answers and guidelines below. Do not modify the questions and only use the provided macros for your answers.

Question: Do the main claims made in the abstract and introduction accurately reflect the paper s contributions and scope? Answer: [Yes] Justification: Section 3, 4. Guidelines:

The answer NA means that the abstract and introduction do not include the claims made in the paper. The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers. The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings. It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper. 2. Limitations

Question: Does the paper discuss the limitations of the work performed by the authors? Answer: [Yes] Justification: We have discussed several future directions in Section 3 and 6

Guidelines:

The answer NA means that the paper has no limitation while the answer No means that the paper has limitations, but those are not discussed in the paper. The authors are encouraged to create a separate "Limitations" section in their paper. The paper should point out any strong assumptions and how robust the results are to violations of these assumptions (e.g., independence assumptions, noiseless settings, model well-specification, asymptotic approximations only holding locally). The authors should reflect on how these assumptions might be violated in practice and what the implications would be. The authors should reflect on the scope of the claims made, e.g., if the approach was only tested on a few datasets or with a few runs. In general, empirical results often depend on implicit assumptions, which should be articulated. The authors should reflect on the factors that influence the performance of the approach. For example, a facial recognition algorithm may perform poorly when image resolution is low or images are taken in low lighting. Or a speech-to-text system might not be used reliably to provide closed captions for online lectures because it fails to handle technical jargon. The authors should discuss the computational efficiency of the proposed algorithms and how they scale with dataset size. If applicable, the authors should discuss possible limitations of their approach to address problems of privacy and fairness. While the authors might fear that complete honesty about limitations might be used by reviewers as grounds for rejection, a worse outcome might be that reviewers discover limitations that aren t acknowledged in the paper. The authors should use their best judgment and recognize that individual actions in favor of transparency play an important role in developing norms that preserve the integrity of the community. Reviewers will be specifically instructed to not penalize honesty concerning limitations.

3. Theory Assumptions and Proofs

Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof?

Answer: [Yes]

Justification: All assumptions and complete proofs are provided in the appendix.

Guidelines:

The answer NA means that the paper does not include theoretical results. All the theorems, formulas, and proofs in the paper should be numbered and crossreferenced. All assumptions should be clearly stated or referenced in the statement of any theorems. The proofs can either appear in the main paper or the supplemental material, but if they appear in the supplemental material, the authors are encouraged to provide a short proof sketch to provide intuition. Inversely, any informal proof provided in the core of the paper should be complemented by formal proofs provided in appendix or supplemental material. Theorems and Lemmas that the proof relies upon should be properly referenced.

4. Experimental Result Reproducibility

Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)?

Answer: [Yes]

Justification: Section 4 and Appendix D.

Guidelines:

The answer NA means that the paper does not include experiments.

If the paper includes experiments, a No answer to this question will not be perceived well by the reviewers: Making the paper reproducible is important, regardless of whether the code and data are provided or not. If the contribution is a dataset and/or model, the authors should describe the steps taken to make their results reproducible or verifiable. Depending on the contribution, reproducibility can be accomplished in various ways. For example, if the contribution is a novel architecture, describing the architecture fully might suffice, or if the contribution is a specific model and empirical evaluation, it may be necessary to either make it possible for others to replicate the model with the same dataset, or provide access to the model. In general. releasing code and data is often one good way to accomplish this, but reproducibility can also be provided via detailed instructions for how to replicate the results, access to a hosted model (e.g., in the case of a large language model), releasing of a model checkpoint, or other means that are appropriate to the research performed. While Neur IPS does not require releasing code, the conference does require all submissions to provide some reasonable avenue for reproducibility, which may depend on the nature of the contribution. For example (a) If the contribution is primarily a new algorithm, the paper should make it clear how to reproduce that algorithm. (b) If the contribution is primarily a new model architecture, the paper should describe the architecture clearly and fully. (c) If the contribution is a new model (e.g., a large language model), then there should either be a way to access this model for reproducing the results or a way to reproduce the model (e.g., with an open-source dataset or instructions for how to construct the dataset). (d) We recognize that reproducibility may be tricky in some cases, in which case authors are welcome to describe the particular way they provide for reproducibility. In the case of closed-source models, it may be that access to the model is limited in some way (e.g., to registered users), but it should be possible for other researchers to have some path to reproducing or verifying the results.

5. Open access to data and code

Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?

Answer: [No]

Justification: The code and model checkpoints will be publicly available after the submission is acceptance.

Guidelines:

The answer NA means that paper does not include experiments requiring code. Please see the Neur IPS code and data submission guidelines (https://nips.cc/ public/guides/Code Submission Policy) for more details. While we encourage the release of code and data, we understand that this might not be possible, so No is an acceptable answer. Papers cannot be rejected simply for not including code, unless this is central to the contribution (e.g., for a new open-source benchmark). The instructions should contain the exact command and environment needed to run to reproduce the results. See the Neur IPS code and data submission guidelines (https: //nips.cc/public/guides/Code Submission Policy) for more details. The authors should provide instructions on data access and preparation, including how to access the raw data, preprocessed data, intermediate data, and generated data, etc. The authors should provide scripts to reproduce all experimental results for the new proposed method and baselines. If only a subset of experiments are reproducible, they should state which ones are omitted from the script and why. At submission time, to preserve anonymity, the authors should release anonymized versions (if applicable).

Providing as much information as possible in supplemental material (appended to the paper) is recommended, but including URLs to data and code is permitted. 6. Experimental Setting/Details

Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results? Answer: [Yes] Justification: Section 4 and Appendix D. Guidelines:

The answer NA means that the paper does not include experiments. The experimental setting should be presented in the core of the paper to a level of detail that is necessary to appreciate the results and make sense of them. The full details can be provided either with the code, in appendix, or as supplemental material. 7. Experiment Statistical Significance

Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments? Answer: [No]

Justification: There exists little randomness in all the experiments of this submission, which means that results of using different random seeds are almost the same. Guidelines:

The answer NA means that the paper does not include experiments. The authors should answer "Yes" if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper. The factors of variability that the error bars are capturing should be clearly stated (for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions). The method for calculating the error bars should be explained (closed form formula, call to a library function, bootstrap, etc.) The assumptions made should be given (e.g., Normally distributed errors). It should be clear whether the error bar is the standard deviation or the standard error of the mean. It is OK to report 1-sigma error bars, but one should state it. The authors should preferably report a 2-sigma error bar than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified. For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range (e.g. negative error rates). If error bars are reported in tables or plots, The authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text. 8. Experiments Compute Resources

Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: Section 4 and Appendix D Guidelines:

The answer NA means that the paper does not include experiments. The paper should indicate the type of compute workers CPU or GPU, internal cluster, or cloud provider, including relevant memory and storage.

The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute. The paper should disclose whether the full research project required more compute than the experiments reported in the paper (e.g., preliminary or failed experiments that didn t make it into the paper). 9. Code Of Ethics

Question: Does the research conducted in the paper conform, in every respect, with the Neur IPS Code of Ethics https://neurips.cc/public/Ethics Guidelines? Answer: [Yes] Justification: The research in this work conforms with the Neur IPS Code of Ethics. Guidelines:

The answer NA means that the authors have not reviewed the Neur IPS Code of Ethics. If the authors answer No, they should explain the special circumstances that require a deviation from the Code of Ethics. The authors should make sure to preserve anonymity (e.g., if there is a special consideration due to laws or regulations in their jurisdiction). 10. Broader Impacts

Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed? Answer: [NA] Justification: There is no societal impact of the work performed. Guidelines:

The answer NA means that there is no societal impact of the work performed. If the authors answer NA or No, they should explain why their work has no societal impact or why the paper does not address societal impact. Examples of negative societal impacts include potential malicious or unintended uses (e.g., disinformation, generating fake profiles, surveillance), fairness considerations (e.g., deployment of technologies that could make decisions that unfairly impact specific groups), privacy considerations, and security considerations. The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments. However, if there is a direct path to any negative applications, the authors should point it out. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate deepfakes for disinformation. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate Deepfakes faster. The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from (intentional or unintentional) misuse of the technology. If there are negative societal impacts, the authors could also discuss possible mitigation strategies (e.g., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML). 11. Safeguards

Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)? Answer: [NA] Justification: The paper poses no such risks. Guidelines:

The answer NA means that the paper poses no such risks.

Released models that have a high risk for misuse or dual-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters. Datasets that have been scraped from the Internet could pose safety risks. The authors should describe how they avoided releasing unsafe images. We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort.

12. Licenses for existing assets

Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected?

Answer: [NA]

Justification: The paper does not use existing assets.

Guidelines:

The answer NA means that the paper does not use existing assets. The authors should cite the original paper that produced the code package or dataset. The authors should state which version of the asset is used and, if possible, include a URL. The name of the license (e.g., CC-BY 4.0) should be included for each asset. For scraped data from a particular source (e.g., website), the copyright and terms of service of that source should be provided. If assets are released, the license, copyright information, and terms of use in the package should be provided. For popular datasets, paperswithcode.com/datasets has curated licenses for some datasets. Their licensing guide can help determine the license of a dataset. For existing datasets that are re-packaged, both the original license and the license of the derived asset (if it has changed) should be provided. If this information is not available online, the authors are encouraged to reach out to the asset s creators.

13. New Assets

Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?

Answer: [NA]

Justification: The paper does not release new assets.

Guidelines:

The answer NA means that the paper does not release new assets. Researchers should communicate the details of the dataset/code/model as part of their submissions via structured templates. This includes details about training, license, limitations, etc. The paper should discuss whether and how consent was obtained from people whose asset is used. At submission time, remember to anonymize your assets (if applicable). You can either create an anonymized URL or include an anonymized zip file.

14. Crowdsourcing and Research with Human Subjects

Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?

Answer: [NA]

Justification: The paper does not involve crowdsourcing nor research with human subjects.

Guidelines:

The answer NA means that the paper does not involve crowdsourcing nor research with human subjects. Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper. According to the Neur IPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector. 15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained? Answer: [NA] Justification: The paper does not involve crowdsourcing nor research with human subjects. Guidelines:

The answer NA means that the paper does not involve crowdsourcing nor research with human subjects. Depending on the country in which research is conducted, IRB approval (or equivalent) may be required for any human subjects research. If you obtained IRB approval, you should clearly state this in the paper. We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the Neur IPS Code of Ethics and the guidelines for their institution. For initial submissions, do not include any information that would break anonymity (if applicable), such as the institution conducting the review.