# metropolis_sampling_for_constrained_diffusion_models__39fe78ed.pdf

Metropolis Sampling for Constrained Diffusion Models

Nic Fishman University of Oxford Leo Klarner University of Oxford Emile Mathieu University of Cambridge

Michael Hutchinson University of Oxford Valentin De Bortoli ENS Ulm

{njwfish,leojklarner}@gmail.com

Denoising diffusion models have recently emerged as the predominant paradigm for generative modelling on image domains. In addition, their extension to Riemannian manifolds has facilitated a range of applications across the natural sciences. While many of these problems stand to benefit from the ability to specify arbitrary, domaininformed constraints, this setting is not covered by the existing (Riemannian) diffusion model methodology. Recent work has attempted to address this issue by constructing novel noising processes based on the reflected Brownian motion and logarithmic barrier methods. However, the associated samplers are either computationally burdensome or only apply to convex subsets of Euclidean space. In this paper, we introduce an alternative, simple noising scheme based on Metropolis sampling that affords substantial gains in computational efficiency and empirical performance compared to the earlier samplers. Of independent interest, we prove that this new process corresponds to a valid discretisation of the reflected Brownian motion. We demonstrate the scalability and flexibility of our approach on a range of problem settings with convex and non-convex constraints, including applications from geospatial modelling, robotics and protein design.

1 Introduction

In recent years, denoising diffusion models (Sohl-Dickstein et al., 2015; Song et al., 2019; Song et al., 2021; Ho et al., 2020) have emerged as a powerful paradigm for generative modelling, achieving state-of-the-art performance across a range of domains. They work by progressively adding noise to data following a Stochastic Differential Equation (SDE) the forward noising process until it is close to the invariant distribution of the SDE. The generative model is then given by an approximation of the associated time-reversed denoising process, which is also an SDE whose drift depends on the gradient of the logarithmic densities of the forward process, referred to as the Stein score. Building on the success of diffusion models for image generation tasks, De Bortoli et al. (2022) and Huang et al. (2022) have recently extended this framework to a wide range of Riemannian manifolds, enabling the specification of inherent structural properties of the modelled domain. This has broadened the applicability of diffusion models to problems in the natural and engineering sciences, including the conformational modelling of small molecules (Jing et al., 2022; Corso et al., 2022), proteins (Trippe et al., 2022; Watson et al., 2022; Yim et al., 2023) and robotic platforms (Urain et al., 2022).

However, in many data-scarce or safety-critical settings, researchers may want to restrict the modelled domain even further by specifying problem-informed constraints to make maximal use of limited experimental data or prevent unwanted behaviour (Morris, 2002; Han et al., 2006; Thiele et al., 2013; Lukens et al., 2020). As illustrated in Figure 1, such domain-informed constraints can be naturally represented as a Riemannian manifold with boundary. Training diffusion models on such constrained manifolds is thus an important problem that requires principled noising processes and corresponding discretisations that stay within the constrained set.

37th Conference on Neural Information Processing Systems (Neur IPS 2023).

(c) M S2 R3

Figure 1: When operating in data-scarce settings, it may often be beneficial to specify as much prior knowledge of the modelled domain as possible. Consider a distribution over a subset M of the unit sphere S2 R3. While a Euclidean diffusion model can approximate the distribution in R3 (a), directly modelling it on S2 can make learning significantly easier (b). Restricting the problem space even further by only constructing a diffusion model on M can lead to even better performance (c).

Recent work by Fishman et al. (2023) has attempted to derive such processes and extend the applicability of diffusion models to inequality-constrained manifolds by investigating the generative modelling applications of classic sampling schemes based on log-barrier methods (Kannan et al., 2009; Lee et al., 2017; Noble et al., 2022; Kook et al., 2022; Gatmiry et al., 2022; Lee et al., 2018) and the reflected Brownian motion (Williams, 1987; Petit, 1997; Shkolnikov et al., 2013). While empirically promising, the proposed algorithms can be computationally and numerically burdensome, and require bespoke implementations for different manifolds and constraints. Concurrently, Lou et al. (2023) have investigated the use of reflected diffusion models for image applications. They focus on the high-dimensional hypercube, as this setting admits a theoretically grounded treatment of the static thresholding method which is widely used in image models such as Saharia et al. (2022). More recently, Liu et al. (2023) have investigated the use of log-barrier-based mirror maps to transform a constrained domain into an unconstrained dual space for applications in image watermarking. While both methods exhibit robust scaling properties and impressive results, they only consider convex subsets of Euclidean space and do not extend to more general manifolds.

Here, we propose a new method for generative modelling on constrained manifolds based on a Metropolis-based discretisation of the reflected Brownian motion. The Metropolised process chief advantage is that it is lightweight: the only additional requirement over those outlined in De Bortoli et al. (2022) that is needed to implement a constrained diffusion model is an efficient binary function that indicates whether any given point is within the constrained set. This Metropolised approximation of the reflected Brownian motion is substantially easier to implement, faster to compute and more numerically stable than the previously considered sampling schemes. Our core theoretical contribution is to show that this new discretisation converges to the reflected SDE by using the invariance principle for SDEs with boundary (Stroock et al., 1971). To the best of our knowledge, this is the first time that such a process has been investigated. We demonstrate that our method attains improved empirical results on diverse manifolds with convex and non-convex constraints by applying it to a range of problems from geospatial modelling, robotics and protein design.

2 Background

Riemannian manifolds. A Riemannian manifold is defined as a tuple (M, g) with M a smooth manifold and g a metric defining an inner product on tangent spaces. In this work, we will use the exponential map expx : Tx M M, as well as the extension of the gradient , divergence div and Laplace operators to M. All of these quantities can be defined in local coordinates in terms of the metric. The extension of the Laplace operator to M is called the Laplace-Beltrami operator, also denoted when there is no ambiguity. Using , we can define a Brownian motion on M, denoted (Bt)t 0 and with density w.r.t. the volume form of M denoted pt for any t > 0. We refer to Appendix B for a more detailed exposition, to Lee (2013) for a thorough treatment of Riemannian manifolds and to Hsu (2002) for details on stochastic analysis on manifolds. In the following, we consider a constrained manifold M defined by

M = {x N : fi(x) < 0, i I}, (1)

Unconstrained Reﬂected Metropolis

Figure 2: Visual comparison of a discretisation of the unconstrained Brownian motion (blue) and two discretisations of the reflected Brownian motion: one based on a reflection scheme (green) and the other based on our Metropolis sampler (red). The Metropolised trajectory is very close to that of the reflected one while being significantly easier to implement and cheaper to compute.

where (N, g) is a Riemannian manifold, I is an arbitrary finite indexing family and for any i I, fi C(N, R). Since I is finite and fi continuous for any i I, M is an open set of N and inherits its metric g. This captures simple Euclidean polytopes and complex constrained spaces like Figure 1.

Denoising diffusion models. Denoising diffusion models (Song et al., 2019; Ho et al., 2020; Song et al., 2021) work as follows: let (Xt)t [0,T ] be a noising process that corrupts the original data distribution p0. We assume that (Xt)t 0 converges to N(0, σ2 Id), with σ > 0. Several such processes exist, but in practice we consider the Ornstein-Uhlenbeck (OU) process, also referred to as VP-SDE, which is defined by the following Stochastic Differential Equation (SDE)

2Xtdt + σd Bt, X0 p0. (2)

Under conditions on p0, for any T > 0, (Yt)t [0,T ] = (XT t)t [0,T ] is also the (weak) solution to a SDE (Anderson, 1982; Haussmann et al., 1986; Cattiaux et al., 2021)

2Yt + σ2 log p T t(Yt)}dt + σd Bt, Y0 p T , (3)

where pt denotes the density of Xt. In practice, log pt is approximated with a score network (t, x) 7 sθ(t, x) trained by minimising either a denoising score matching (dsm) loss or an implicit score matching (ism) loss (Vincent, 2011)

ℓ(θ) = Et U([0,T ]),(X0,Xt)[λt( 1

2 sθ(t, Xt) 2 + div(sθ)(t, Xt))], (4)

where λt > 0. For a flexible score network, the global minimiser θ = argminθL(θ) satisfies sθ (t, ) = log pt. De Bortoli et al. (2022) and Huang et al. (2022) have extended denoising diffusion models to the Riemannian setting. The time-reversal formula (3) remains the same, replacing the Euclidean gradient with its Riemannian equivalent. The ism loss can still be computed in that setting. However, the samplers used in the Riemannian setting differ from the classical Euler Maruyama discretisation used in the Euclidean framework. De Bortoli et al. (2022) use Geodesic Random Walks (Jørgensen, 1975), which ensure that the samples remain on the manifold at every step. In this paper, we propose a sampler with similar properties in the case of constrained manifolds.

Reflected SDE. We conclude this section by recalling the framework for studying reflected SDEs, which is introduced via the notion of the Skorokhod problem. For simplicity, we focus on Euclidean space Rd here, but note that reflected processes can be defined on arbitrary smooth manifolds N. In the case of the Brownian motion, a solution to the Skorokhod problem is a process of the form ( Bt, kt)t 0. Locally, ( Bt)t 0 can be seen as a regular Brownian motion (Bt)t 0 while (kt)t 0 forces ( Bt)t 0 to remain in M. Under mild additional regularity conditions on M and ( Bt, kt)t 0, see Skorokhod (1961), ( Bt, kt)t 0 is a solution to the Skorokhod problem if for any t 0

Bt = B0 + Bt kt M, (5)

|k|t = R t 0 1 Bs Md|k|s and kt = R t 0 n( Bs)d|k|s, where (|k|t)t 0 is the total variation of (kt)t 01. Let us provide some intuition on this definition. When ( Bt)t 0 hits the boundary M, kt pushes the process back into M along the inward normal n( Bt), according to kt = R t 0 n( Bs)d|k|s.

1In this case (kt)t 0 is not regular enough, but if it were in the class C1, its total variation would be given by R t 0 | tkt|ds in the one-dimensional case.

(a) Reflected Brownian motion

(b) Metropolised approximation of Brownian motion

Figure 3: Evolution of the density of the reflected Brownian motion and its Metropolis samplingbased approximation on the unit interval starting from a delta mass.

The condition |k|t = R t 0 1 Bs Md|k|s is more technical and can be seen as imposing that kt remains constant so long as ( Bt)t 0 does not hit M. We refer to Fishman et al. (2023) and Lou et al. (2023) for a more thorough introduction of these concepts in the context of diffusion models.

3 Diffusion models for constrained manifolds via Metropolis sampling

In Section 3.1, we highlight the practical limitations of existing constrained diffusion models and propose an alternative Metropolis sampling-based approach. In Section 3.2, we outline our proof that this process corresponds to a valid discretisation of the reflected Brownian motion, justifying its use in diffusion models. An overview of the samplers we cover in this section is presented in Table 1.

3.1 Practical limitations of existing constrained diffusion models

Barrier methods. In the barrier approach, a constrained manifold is transformed into an unconstrained space via a barrier metric. This metric is defined by 2ϕ(x) with ϕ(x) = P

i I ϕi(d(x, fi)) where d(x, fi) is the minimum distance from the point x to the set defined by fi(x) = 0 and ϕi is a monotonically decreasing function such that limz 0 ϕi(z) = . Under additional regularity assumptions, ϕ is called a barrier function (see Nesterov et al. (1994)). This definition ensures that the barrier function induces a well-defined exponential map on the manifold, making the Riemannian diffusion model frameworks of De Bortoli et al. (2022) and Huang et al. (2022) applicable. In the log-barrier method of Fishman et al. (2023), evaluating ϕ requires computing d(x, M) (and its derivatives), which can be prohibitively expensive. Furthermore, since the exponential map under the induced manifold is difficult to compute, it is approximated by projecting the exponential map on the original manifold back onto the constraint set, incurring an additional bias. Liu et al. (2023) propose a more tractable method by constructing a mirror map that transforms a constrained domain into an unconstrained dual space, in which one can train a standard Euclidean diffusion model. However, this approach is only applicable to convex subsets of Rd and does not extend to arbitrary Riemannian manifolds. More generally, warping the geometry of the modelled domain can adversely impact the interpolative performance of log-barrier-based diffusion models, as the space between data points expands rapidly when approaching the boundary.

Reflected stochastic processes. Fishman et al. (2023) and Lou et al. (2023) introduce diffusion models based on the reflected Brownian motion (RBM). In Fishman et al. (2023), the reflected SDE is discretised by (i) considering a classical step of the Euler-Maruyama discretization (or the Geodesic Random Walk in the Riemannian setting) and (ii) reflecting this step according to the boundary defined by M. To compute the reflection, one must check whether the step crosses the boundary. If it does, the point of intersection needs to be calculated in order to reflect the ray and continue the step in the reflected direction. This can require an arbitrarily large number of reflections depending on the step size, the geodesic on the manifold, and the geometry of the bounded region within the manifold. We refer to Appendix C for the pseudocode of the reflection step and additional comments. An alternative approach to discretising a reflected SDE is to replace the reflection with a projection (Słomi nski, 1994). However, the projection requires the most expensive part of the reflection algorithm: computing the intersection of the geodesic with the boundary. Lou et al. (2023) propose a more tractable approach that exploits the product structure of the unit hypercube to afford simulation-free sampling but does not extend to arbitrary Riemannian manifolds. Additionally, specifying convex constraints in their framework requires a bijection onto the hypercube, distorting the modelled geometry and incurring the same issues as outlined above.

Table 1: Comparison of the advantages and disadvantages of the different constrained (Riemannian) diffusion models covered in Section 3.1.

Both required for fast DSM loss

DIFFUSION MODEL TRACTABLE CONDITIONAL SCORE SIMULATION-FREE FORWARD SAMPLING MODELLING DOMAIN PRESERVES METRIC OF M

Reflected Diffusions

LOU ET AL. (2023) convex Rd FISHMAN ET AL. (2023) O(d2) Any M METROPOLIS (OURS) O(d) Any M

Barrier Diffusions

FISHMAN ET AL. (2023) convex any M LIU ET AL. (2023) convex Rd

Metropolis approximation. Existing approaches to constrained (Riemannian) diffusion models either only apply to convex subsets of Rd or require manifoldand constraint-specific implementations that become computationally intractable as the complexity of the modelled geometry increases. This limits their practicality even for relatively simple manifolds with well-defined exponential maps and linear inequality constraints such as for example polytopes.

Algorithm 1 Metropolis approx. of RBM

Require: p M, {fi}i I

Sample v N(0, Id) Tp M p expp(v) if fi(p ) < 0 i then

p p end if return p

In the following, we introduce a method that aims to solve both of these problems. The sampler we propose is similar to a classical Euler-Maruyama discretisation of the Brownian motion, except that, whenever a step would carry the Brownian motion out of the constrained region, we reject it (see Algorithm 1). This is a Metropolised version of the usual discretisation and is trivial to implement compared to the existing barrier, reflection and projection methods. Hence, this method enables the principled extension of diffusion models to arbitrarily constrained manifolds at virtually no added implementational complexity or computational expense.

3.2 Relating the Metropolis sampler to the reflected Brownian motion

In this section, we prove that the proposed Metropolis sampling-based process (Algorithm 1) corresponds to a valid discretisation of the reflected process, justifying its use in diffusion models. Here we focus on a concise presentation of the core concepts and the main results. A full proof can be found in Appendix D. For simplicity, we restrict ourselves to the Euclidean setting. All of our results require particular assumptions on M, which we discuss at the end of this section. We begin with a definition of the Metropolis approximation of RBM. Definition 1. For any γ > 0 and k N, let Xγ 0 M and Xγ k+1 = Xγ k + γZγ k if Xγ k + γZγ k M and Xγ k otherwise. The sequence (Xγ k )k N is called the Metropolis approximation of RBM.

For any γ > 0, we consider (Xγ t )t 0, the linear interpolation of (Xγ k )k N such that for any k N, Xγ kγ = Xγ k . The following result is the main theoretical contribution of our paper.

Theorem 2. Under assumptions on M, for any T 0, (Xγ t )t [0,T ] weakly converges to the RBM ( Bt)t [0,T ] as γ 0.

The rest of the section is devoted to a high level presentation of the proof of Theorem 2. It is theoretically impractical to work directly with the Metropolis approximation of RBM. Instead, we introduce an auxiliary process, show this converges to the RBM, and finally prove that the convergence of the auxiliary process implies the convergence of our Metropolis discretisation.

Definition 3. For any γ > 0 and k N, let ˆXγ 0 = x M and ˆXγ k+1 = ˆXγ k + γZγ k with Zγ k a Gaussian random variable conditioned on ˆXγ k + γZγ k M. The sequence ( ˆXγ k )k N is called the Rejection approximation of RBM.

Algorithm 2 Rejection approx. of RBM

Require: p M, {fi}i I

Sample v N(0, Id) Tp M p expp(v) while fi(p ) 0 for any i do

Sample v Id(0, 1) Tp M p expp(v) end while return p

We call this process Rejection approximation of RBM since in practice, Zγ k is sampled using rejection sampling, see Algorithm 2. For any γ > 0, we also consider ( ˆXγ t )t 0, the linear interpolation of ( ˆXγ k )k N such that for any k N, ˆXγ kγ = ˆXγ k . In Appendix D, we prove the following result.

Theorem 4. Under assumptions on M, for any T 0, ( ˆXγ t )t [0,T ] weakly converges to the Reflected Brownian Motion ( Bt)t [0,T ] as γ 0.

Proof. Here we give some elements of the proof. Details and full derivations are postponed to Appendix D. Our approach is based on the invariance principle of Stroock et al. (1971). More precisely, we show that we can compute an equivalent drift and diffusion matrix for the discretised process and that, as γ 0, the drift converges to zero and the diffusion matrix converges to Id. In the Euclidean setting, this result, accompanied by mild regularity and growth assumptions, ensures that the discretization weakly converges to the original SDE. However, the case with boundary is much more complicated, primarily because the approximate drift might explode near the boundary, thus we need to verify exactly how the drift behaves as γ 0 and as the process approaches the boundary. We show that the normalised drift converges to the inward normal near the boundary. This ensures that (a) in the interior of M the drift converges to zero, i.e. locally in the interior of M the Brownian motion and the Reflected Brownian Motion coincide, (b) on the boundary, the drift pushes the samples inside the manifold according to the inward normal, mimicking (kt)t 0 in (5). Finally, with results from Stroock et al. (1971) and Kang et al. (2017), we show the convergence to the RBM.

Our next step is to show that the approximate drift and diffusion matrix of the Metropolised process are upper and lower bounded by their counterparts in the rejection process. While the upper-bound is easy to derive, the lower-bound requires the following result.

Proposition 5. Under assumptions on M, ε > 0, γ > 0 such that for any γ (0, γ) and for any x M, γ (0, γ) and Z N(0, Id) we have P(x + γZ M) 1/2 ε, with Z N(0, Id).

Proposition 5 tells us that locally the boundary looks like a half-space when integrating w.r.t. a Gaussian measure. A corollary is that, for γ > 0 small enough and for any k N, the probability that Xγ k+1 = Xγ k is upper bounded uniformly w.r.t. Xγ k M. The proof of Proposition 5 uses Theorem 7 in Appendix D, whose proof relies on the concept of tubular neighborhoods (Lee et al., 2012).

Having established the lower and upper bound, we can conclude the proof by noting that the approximate drift and the diffusion matrix in the rejection and Metropolis case coincide as γ 0. This is enough to apply the same results as before, giving the desired convergence.

Assumptions on M. Before concluding this section, we detail the assumptions we make on M. For Theorem 2 to hold, we assume that M = {x Rd : Φ(x) > 0} is bounded, with Φ C2(Rd, R) concave. We have that M = {x Rd : Φ(x) = 0}. In addition, we assume that for any x M, Φ(x) = 1. These assumptions match those Stroock et al. (1971) use for their study of the existence of solutions to the RBM. While it seems possible to relax the global existence of Φ to a local one, the regularity assumption of the domain is key. This regularity is essential to establish Proposition 5 and the associated geometrical result on tubular neighbourhoods. We also emphasize that the smoothness of the domain is central in the results of Kang et al. (2017) on the equivalence of two definitions of RBMs which we rely on.

4 Related work on approximations of reflected SDEs

Several schemes have been introduced to approximately sample from reflected Stochastic Differential Equations. They can be interpreted as modifications of classical Euler-Maruyama schemes used to discretise SDEs without boundary. One of the most common approaches is to use the Euler-Maruyama discretisation and project the solution onto the boundary if it escapes from the domain M.

Table 2: Log-likelihood ( ) of a held-out test set from a synthetic bimodal distribution over convex subsets of Rd bounded by the hypercube [ 1, 1]d and unit simplex d. Means and standard deviations are computed over 3 different runs. Average training time is provided in hours.

MANIFOLD DIMENSION REFLECTED METROPOLIS log-likelihood runtime log-likelihood runtime

[ 1, 1]d 2 2.25 .01 8.95 2.32 .05 0.72 3 3.77 .13 8.97 4.15 .15 0.71 10 7.42 .77 10.1 10.80 .34 0.90

d 2 1.01 .01 9.17 1.06 .02 0.82 3 2.64 .01 9.43 3.23 .17 0.78 10 7.00 .13 10.5 7.81 .20 0.97

4 Reflected Hypercube Reflected Simplex Rejection Hypercube Rejection Simplex O(d) O(d²)

Convergence time rel. to d = 1

Dimension of manifold Figure 4: Convergence time of the Reflected (green) and Metropolis (red) forward noising processes to the uniform distribution on the hypercube [ 1, 1]d and unit simplex d. The lines indicate functions fit with the PYSR symbolic regression package (Cranmer, 2023) and correspond to empirical runtime complexities of O(d2) and O(d), respectively, matching the superimposed scaling law isocontours.

In this case, mean-square error rates of order almost 1/2 have been proven under various conditions (Liu, 1995; Chitashvili et al., 1981; Pettersson, 1995; Słomi nski, 1994). Concretely this means that E[ Bt Xt/n n 2] = O(n 1+ε) with ε > 0 arbitrary small and where (Xγ k )k N is the projection scheme. The rate 1/2 is tight (Pacchiarotti et al., 1998). It is possible to use the Euler-Peano method to get slight improvements for a mean-square error rate of order of 1/2, but this is impractical as it assumes that one can solve a (simplified) Skorokhod problem, which is usually intractable. Liu (1993) introduced a penalised method which pushes the solution away from the boundary and shows a mean-square error of order 1/4, see also Pettersson (1997). Weak errors of order 1 have been obtained in Bossy et al. (2004) and Gobet (2001) by introducing a reflection component in the discretisation or using some local approximation of the domain to a half-space. We refer to Pilipenko (2014) for an introduction to the discretisation of reflected SDEs. Closer to our work, Burdzy et al. (2008) consider three different methods to approximate reflected Brownian motions on general domains (two based on discrete methods and one based on killed diffusions). Only qualitative results are provided. To the best of our knowledge, no previous work in the probability literature has investigated the Metropolised scheme we propose. Our Metropolis scheme is also related to the ball walk (Applegate et al., 1991), which replaces the Gaussian random variable with a uniform over the ball (or the Dikin ellipsoid). Applegate et al. (1991) and Lovász et al. (2007) have studied the asymptotic convergence rate of the ball walk, but, to the best of our knowledge, its limiting behaviour when the step size goes to zero has not been investigated.

(a) Data distribution.

(b) Metropolis samples.

(c) Reflected samples.

(d) Uniform distribution.

Figure 5: A qualitative visual comparison of 106 samples from the data distribution, our Metropolis diffusion model, a reflected diffusion model and the uniform distribution for the constrained configurational modelling of robotic arms on S2 ++ R2.

5 Experimental results

To demonstrate the practical utility and empirical performance of the proposed Metropolis diffusion models, we conduct a comprehensive evaluation on a range of synthetic and real-world tasks. In Section 5.1, we assess the scalability of our method by applying it to synthetic distributions on hypercubes and simplices of increasing dimensionality. In Section 5.2, we extend the evaluation to real-world tasks on manifolds with convex constraints by applying our method to the robotics and protein design datasets presented in Fishman et al. (2023). In Section 5.3, we additionally demonstrate that our method extends to constrained manifolds with highly non-convex boundaries a setting that is intractable with existing approaches.

As we found in line with Fishman et al. (2023) that log-barrier diffusion models perform strictly worse than reflected approaches across all experimental settings, we focus on a more detailed comparison with the latter here and postpone additional empirical results to Appendix F.1. These include additional performance metrics and a comparison to an unconstrained Euclidean diffusion model on the synthetic datasets presented in Section 5.1.

For all experiments, we use a simple 6-layer MLP with sine activations and a score rescaling function to ensure that the score reaches zero at the boundary, scaling linearly into the interior of the constrained set as in Liu et al. (2022) and Fishman et al. (2023). We set T = 1, β0 = 1 10 3 and tune β1 to ensure that the forward process reaches the invariant distribution with a linear β-schedule. We use a learning rate of 2 10 4 with a cosine learning rate schedule and an ism loss with a modified loss weighting function of (1 + t), a batch size of 1024 and 8 repeats per batch. All models were trained on a single NVIDIA Ge Force GTX 1080 GPU. Additional details are provided in Appendix F.2.

All source code that is needed to reproduce the results presented below is made available under https://github.com/oxcsml/score-sde/tree/metropolis, which requires a supporting package to handle the different geometries that is available under https://github.com/oxcsml/geomstats/tree/polytope.

5.1 Synthetic distributions on simple polytopes

In this section, we investigate the scalability of the proposed Metropolis diffusion models by applying them to synthetic bimodal distributions over the d-dimensional hypercube [ 1, 1]d and unit simplex d. A quantitative comparison of the log-likelihood of a held-out test set is presented in Table 2, while a visual comparison is postponed to Appendix F.3. We find that our Metropolis models outperform reflected approaches across all dimensions and constraint geometries by a substantial and statistically significant margin while training in one tenth of the time. The degree of improvement seems to scale with the dimensionality of the problem: the larger the dimension of the experiment, the larger the gain in performance from using our proposed Metropolis scheme.

We observe a similar difference in the scaling properties of reflected and Metropolis models when measuring the convergence times of the respective forward noising processes to the uniform distribution on hypercubes [ 1, 1]d and simplices d of increasing dimensionality. The results are presented in Section 4 and show that the convergence time of the Metropolis process scales linearly in the dimension, while the reflected process scales quadratically.

(a) Data distribution.

(b) Metropolis samples.

(c) Reflected samples.

(d) Uniform distribution.

Figure 6: A qualitative comparison of 105 samples from the data distribution, our Metropolis diffusion model, a reflected diffusion model and the uniform distribution for the constrained conformational modelling of cyclic peptide backbones. For visual clarity, the figures only show the constrained planar projections encoded by P R3.

5.2 Modelling proteins and robotic arms under convex constraints

In addition to illustrating our method s scalability on high-dimensional synthetic tasks, we follow the experimental setup from Fishman et al. (2023) to additionally demonstrate its practical utility and favourable empirical performance on two real-world problems from robotics and protein design.

Constrained configurational modelling of robotic arms. The problem of modelling the configurations and trajectories of a robotic arm can be formulated as learning a distribution over the locations and manipulability ellipsoids of its joints, parameterised on Rd Sd ++, where Sd ++ is the manifold of symmetric positive-definite (SPD) d d matrices (Yoshikawa, 1985; Jaquier et al., 2021). For practical robotics applications, it may be desirable to restrict the maximal velocity with which a robotic arm can move or the maximum force it can exert. This manifests in a trace constraint C > 0 on Sd ++, resulting in a constrained manifold {M Sd ++ : Pd i=1 Mii < C}. Following Fishman et al. (2023), we parametrise this constraint via the Cholesky decomposition (Lin, 2019) and use the resulting setup to model the dataset presented in Jaquier et al. (2021).

Conformational modelling of protein backbones. Modelling the conformational ensembles of proteins is a data-scarce problem with a range of important applications in biotechnology and drug discovery (Lane, 2023). In many practical settings, it may often be unnecessary to model the structural ensembles of an entire protein, as researchers are primarily interested in specific functional sites that are embedded in a structurally conserved scaffold (Huang et al., 2016). Modelling the conformational ensembles of such substructural elements requires positional constraints on their endpoints to ensure that they can be accommodated by the remaining protein. Using the parametrisation and dataset presented in Fishman et al. (2023), we formulate the problem of modelling the backbone conformations of a cyclic peptide of length N = 6 as learning a distribution over the product of a polytope P R3 and the hypertorus T4.

Table 3: Log-likelihood ( ) of a held-out test set for the robotics and protein applications. Means and standard deviations are computed over 3 different runs. Average training time is provided in hours.

DATASET DOMAIN REFLECTED METROPOLIS log-likelihood runtime log-likelihood runtime

Robotics S2 ++ R2 8.39 .06 9.52 9.13 .03 1.36 Proteins P R3 T4 15.20 .06 24.80 15.33 .02 3.12

We quantify the empirical performance of different methods by evaluating the log-likelihood of a held-out test set and present the resulting performance metrics in Table 3. Again, we find that our Metropolis model outperforms the reflected approach by a statistically significant margin while training 7-8 times as fast. Qualitative visual comparisons of samples from the true distribution, the trained diffusion models and the uniform distribution are presented in Figures 5 and 6, with full univariate marginal and pairwise bivariate correlation plots postponed to Appendices F.4 and F.5.

(a) Data distribution.

(b) Metropolis samples.

(c) Uniform distribution.

Figure 7: Orthographic projection of 105 samples from (a) the data distribution, (b) our Metropolis diffusion model, and (c) the uniform distribution, for geospatial data (wildfire incidence rates) within a non-convex boundary (the continental United States). The projections are aligned with the geometric centre of the boundary and zoomed in ten-fold for visual clarity.

5.3 Modelling geospatial data within non-convex country borders

Motivated by the strong empirical performance of our approach on tasks with challenging convex constraints, we investigated its ability to model distributions whose support is restricted to manifolds with highly non-convex boundaries a setting that is intractable with existing approaches. To this end, we derived a geospatial dataset based on wildfire incidence rates within the continental United States (see Appendix E for full details) and trained a Metropolis diffusion model constrained by the corresponding country borders on the sphere S2. A qualitative visual comparison of samples from the true distribution, our model, and the uniform distribution is presented in Figures 7a to 7c and a quantitative comparison to a Riemannian diffusion model on S2 (De Bortoli et al., 2022) is given in Table 4. Both demonstrate that our approach is able to successfully capture challenging multimodal and sparse distributions on constrained manifolds with highly non-convex boundaries.

Table 4: MMD ( ) of a held-out test set for the geospatial modelling dataset. Means and standard deviations are computed over 3 different runs. Average training time is provided in hours.

MODEL DOMAIN MMD RUNTIME % IN BOUNDARY

Unconstrained S2 0.1567 0.013 0.81 63.3 Metropolis P S2 0.1388 0.015 3.86 100.0

6 Conclusion

Accurately modelling distributions on constrained Riemannian manifolds is a challenging problem with a range of impactful practical applications. In this work, we have proposed a mathematically principled and computationally tractable extension of existing diffusion model methodology to this setting. Based on a Metropolisation of random walks in Euclidean spaces and on Riemannian manifolds, we have shown that our approach corresponds to a valid discretisation of the reflected Brownian motion, justifying its use in diffusion models. To demonstrate the practical utility of our method, we have performed an extensive empirical evaluation, showing that it outperforms existing constrained diffusion models on a range of synthetic and real-world tasks defined on manifolds with convex boundaries, including applications from robotics and protein design. Leveraging the flexibility and simplicity of our method, we have also demonstrated that it extends beyond convex constraints and is able to successfully model distributions on manifolds with highly non-convex boundaries. While we found our method to perform well across the synthetic and real-world applications we considered, we expect it to perform poorly on certain constraint geometries. For instance, the current implementation relies on an isotropic noise distribution which could impede its performance on exceedingly narrow constraint geometries, even with correspondingly small step sizes. In this context, an important direction of future research would be to investigate whether we can instead sample from more suitable distributions, e.g. a Dikin ellipsoid, while maintaining the simplicity and efficiency of the Metropolis approach.

Acknowledgements

NF thanks the Rhodes Trust for supporting their studies at Oxford and this work. LK acknowledges support from the University of Oxford s Clarendon Fund.

B. D. Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313 326, 1982. Cited on page 3.

D. Applegate and R. Kannan. Sampling and integration of near log-concave functions. In Proceedings of the twenty-third annual ACM symposium on Theory of computing, pages 156 163, 1991. Cited on page 7.

J. A. Bednar, J. Crail, J. Crist-Harif, P. Rudiger, G. Brener, I. Thomas, C. B, J. Mease, J. Signell, M. Liquet, J.-L. Stevens, B. Collins, A. Thorve, thuydotm, S. H. Hansen, esc, kbowen, N. Abdennur, O. Smirnov, maihde, A. Hawley, A. Oriekhov, A. Ahmadia, B. A. B. Jr, C. H. Brandt, C. Tolboom, E. G., E. Welch, J. Bourbeau, and J. J. Schmidt. Holoviz/datashader: version 0.14.4, version v0.14.4, Feb. 2023. DOI: 10.5281/zenodo.7599872. URL: https://doi.org/10.5281/zenodo. 7599872. Cited on page 16.

M. Bevis and J.-L. Chatelain. Locating a point on a spherical surface relative to a spherical polygon of arbitrary shape. Mathematical geology, 21:811 828, 1989. Cited on pages 15, 16.

M. Bossy, E. Gobet, and D. Talay. A symmetrized euler scheme for an efficient approximation of reflected diffusions. Journal of applied probability, 41(3):877 889, 2004. Cited on page 7.

K. Burdzy and Z.-Q. Chen. Discrete approximations to reflected brownian motion, 2008. Cited on page 7.

P. Cattiaux, G. Conforti, I. Gentil, and C. Léonard. Time reversal of diffusion processes under a finite entropy condition. ar Xiv preprint ar Xiv:2104.07708, 2021. Cited on page 3.

R. Chitashvili and N. Lazrieva. Strong solutions of stochastic differential equations with boundary conditions. Stochastics: an international journal of probability and stochastic processes, 5(4):255 309, 1981. Cited on page 7.

G. Corso, H. Stärk, B. Jing, R. Barzilay, and T. Jaakkola. Diff Dock: Diffusion Steps, Twists, and Turns for Molecular Docking, Oct. 2022. URL: http://arxiv.org/abs/2210.01776. Cited on page 1.

M. Cranmer. Interpretable machine learning for science with pysr and symbolicregression. jl. ar Xiv preprint ar Xiv:2305.01582, 2023. Cited on page 7.

V. De Bortoli, E. Mathieu, M. Hutchinson, J. Thornton, Y. W. Teh, and A. Doucet. Riemannian score-based generative modeling, 2022. ar Xiv: 2202.02763 [cs.LG]. Cited on pages 1 4, 10, 18.

N. Fishman, L. Klarner, V. De Bortoli, E. Mathieu, and M. Hutchinson. Diffusion models for constrained domains. ar Xiv preprint ar Xiv:2304.05364, 2023. Cited on pages 2, 4, 5, 8, 9, 1, 16 18, 20.

K. Gatmiry and S. S. Vempala. Convergence of the riemannian langevin algorithm. ar Xiv preprint ar Xiv:2204.10818, 2022. Cited on page 2.

E. Gobet. Euler schemes and half-space approximation for the simulation of diffusion in a domain. ESAIM: Probability and Statistics, 5:261 297, 2001. Cited on page 7.

A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(null):723 773, Mar. 2012. ISSN: 1532-4435. Cited on page 16.

L. Han and L. Rudolph. Inverse kinematics for a serial chain with joints under distance constraints. In Robotics: Science and systems, 2006. Cited on pages 1, 20.

U. G. Haussmann and E. Pardoux. Time reversal of diffusions. The Annals of Probability:1188 1205, 1986. Cited on page 3.

J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 2020. Cited on pages 1, 3.

E. P. Hsu. Stochastic analysis on manifolds, number 38. American Mathematical Soc., 2002. Cited on page 2.

C.-W. Huang, M. Aghajohari, A. J. Bose, P. Panangaden, and A. Courville. Riemannian Diffusion Models, Aug. 2022. DOI: 10.48550/ar Xiv.2208.07949. Cited on pages 1, 3, 4.

P.-S. Huang, S. E. Boyken, and D. Baker. The coming of age of de novo protein design. Nature, 537(7620):320 327, 2016. Cited on page 9.

N. Jaquier, L. Rozo, D. G. Caldwell, and S. Calinon. Geometry-aware manipulability learning, tracking, and transfer. The International Journal of Robotics Research, 40(2-3):624 650, 2021. Cited on page 9.

B. Jing, G. Corso, J. Chang, R. Barzilay, and T. Jaakkola. Torsional Diffusion for Molecular Conformer Generation, June 2022. URL: http://arxiv.org/abs/2206.01729. Cited on page 1.

E. Jørgensen. The central limit problem for geodesic random walks. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 32(1-2):1 64, 1975. Cited on page 3.

W. Kang and K. Ramanan. On the submartingale problem for reflected diffusions in domains with piecewise smooth boundaries, 2017. Cited on pages 6, 4, 14.

R. Kannan and H. Narayanan. Random walks on polytopes and an affine interior point method for linear programming. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, STOC 09, pages 561 570, Bethesda, MD, USA. Association for Computing Machinery, 2009. ISBN: 9781605585062. DOI: 10.1145/1536414.1536491. URL: https: //doi.org/10.1145/1536414.1536491. Cited on page 2.

R. Ketzner, V. Ravindra, and M. Bramble. A robust, fast, and accurate algorithm for point in spherical polygon classification with applications in geoscience and remote sensing. Computers & Geosciences, 167:105185, 2022. Cited on pages 15, 16.

Y. Kook, Y. T. Lee, R. Shen, and S. S. Vempala. Sampling with riemannian hamiltonian monte carlo in a constrained space. ar Xiv preprint ar Xiv:2202.01908, 2022. Cited on page 2.

T. J. Lane. Protein structure prediction has reached the single-structure frontier. Nature Methods:1 4, 2023. Cited on page 9.

B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by model selection. Annals of Statistics:1302 1338, 2000. Cited on page 6.

J. M. Lee. Smooth manifolds. In Introduction to Smooth Manifolds, pages 1 31. Springer, 2013. Cited on pages 2, 1.

J. M. Lee and J. M. Lee. Smooth manifolds. Springer, 2012. Cited on pages 6, 4, 5.

Y. T. Lee and S. S. Vempala. Convergence rate of riemannian hamiltonian monte carlo and faster polytope volume computation. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, pages 1115 1121, 2018. Cited on page 2.

Y. T. Lee and S. S. Vempala. Geodesic walks in polytopes. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 927 940, Montreal Canada. ACM, June 2017. ISBN: 978-1-4503-4528-6. DOI: 10.1145/3055399.3055416. Cited on page 2.

Z. Lin. Riemannian geometry of symmetric positive definite matrices via cholesky decomposition. SIAM Journal on Matrix Analysis and Applications, 40(4):1353 1370, 2019. Cited on pages 9, 2.

G.-H. Liu, T. Chen, E. A. Theodorou, and M. Tao. Mirror diffusion models for constrained and watermarked generation. ar Xiv preprint ar Xiv:2310.01236, 2023. Cited on pages 2, 4, 5.

S. Liu, T. Kanamori, and D. J. Williams. Estimating density models with truncation boundaries using score matching. Journal of Machine Learning Research, 23(186):1 38, 2022. Cited on page 8.

Y. Liu. Discretization of a class of reflected diffusion processes. Mathematics and computers in simulation, 38(1-3):103 108, 1995. Cited on page 7.

Y. Liu. Numerical approaches to stochastic differential equations with boundary conditions. Purdue University, 1993. Cited on page 7.

A. Lou and S. Ermon. Reflected diffusion models. ar Xiv preprint ar Xiv:2304.04740, 2023. Cited on pages 2, 4, 5.

L. Lovász and S. Vempala. The geometry of logconcave functions and sampling algorithms. Random Structures & Algorithms, 30(3):307 358, 2007. Cited on page 7.

J. M. Lukens, K. J. Law, A. Jasra, and P. Lougovski. A practical and efficient approach for bayesian quantum state estimation. New Journal of Physics, 22(6):063038, 2020. Cited on page 1.

Met Office. Cartopy: a cartographic python library with a Matplotlib interface. Exeter, Devon, 2010 - 2015. URL: https://scitools.org.uk/cartopy. Cited on page 15.

B. J. Morris. Improved bounds for sampling contingency tables. Random Structures & Algorithms, 21(2):135 146, 2002. DOI: https : / / doi . org / 10 . 1002 / rsa . 10049. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/rsa.10049. URL: https: //onlinelibrary.wiley.com/doi/abs/10.1002/rsa.10049. Cited on page 1.

Natural Earth. Natural earth, 2023. URL: https://www.naturalearthdata.com/. [Online; accessed 31-Jan-2023]. Cited on page 15.

Y. Nesterov and A. Nemirovskii. Interior-point polynomial algorithms in convex programming. SIAM, 1994. Cited on page 4.

M. Noble, V. De Bortoli, and A. Durmus. Barrier Hamiltonian Monte Carlo, Oct. 2022. DOI: 10. 48550/ar Xiv.2210.11925. Cited on page 2.

B. Pacchiarotti, C. Costantini, and F. Sartoretto. Numerical approximation for functionals of reflecting diffusion processes. SIAM Journal on Applied Mathematics, 58(1):73 102, 1998. Cited on page 7.

F. Petit. Time reversal and reflected diffusions. Stochastic Processes and their Applications, 69(1):25 53, July 1997. ISSN: 03044149. DOI: 10.1016/S0304-4149(97)00035-5. Cited on page 2.

R. Pettersson. Approximations for stochastic differential equations with reflecting convex boundaries. Stochastic processes and their applications, 59(2):295 308, 1995. Cited on page 7.

R. Pettersson. Penalization schemes for reflecting stochastic differential equations. Bernoulli:403 414, 1997. Cited on page 7.

A. Pilipenko. An introduction to stochastic differential equations with reflection, volume 1. Universitätsverlag Potsdam, 2014. Cited on page 7.

K. Ramanan. Reflected diffusions defined via the extended skorokhod map, 2006. Cited on pages 4,

P. Rudiger, J. A. Bednar, J.-L. Stevens, M. Liquet, C. B, S. H. Hansen, B. Little, Andrew, J. Signell, M. Hedley, C. Bosley, R. Hattersley, G. Brener, ea42gh, kbowen, A. Hilboll, I. Lustig, J. de Werd, N. Hand, R. Signell, J. Bampton, pmav99, scaine1, and zassa. Holoviz/geoviews: version 1.9.6, version v1.9.6, Jan. 2023. DOI: 10.5281/zenodo.7543863. URL: https://doi.org/10. 5281/zenodo.7543863. Cited on page 16.

C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. L. Denton, K. Ghasemipour, R. Gontijo Lopes, B. Karagol Ayan, T. Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35:36479 36494, 2022. Cited on page 2.

M. Shkolnikov and I. Karatzas. Time-reversal of reflected Brownian motions in the orthant, July 2013. URL: http://arxiv.org/abs/1307.4422. Cited on page 2.

A. V. Skorokhod. Stochastic equations for diffusion processes in a bounded region. Theory of Probability & Its Applications, 6(3):264 274, 1961. Cited on page 3.

L. Słomi nski. On approximation of solutions of multidimensional sde s with reflecting boundary conditions. Stochastic processes and their Applications, 50(2):197 219, 1994. Cited on pages 4, 7.

J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256 2265. PMLR, 2015. Cited on page 1.

Y. Song and S. Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, 2019. Cited on pages 1, 3.

Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations, 2021. Cited on pages 1, 3.

D. W. Stroock and S. S. Varadhan. Diffusion processes with boundary conditions. Communications on Pure and Applied Mathematics, 24(2):147 225, 1971. Cited on pages 2, 6, 4, 14, 15.

I. Thiele, N. Swainston, R. Fleming, A. Hoppe, S. Sahoo, M. Aurich, H. Haraldsdottir, M. Mo, O. Rolfsson, M. Stobbe, S. Thorleifsson, R. Agren, C. Bölling, S. Bordel, A. Chavali, P. Dobson, W. Dunn, L. Endler, D. Hala, M. Hucka, D. Hull, D. Jameson, N. Jamshidi, J. Jonsson, N. Juty, S. Keating, I. Nookaew, N. Le Novère, N. Malys, A. Mazein, J. Papin, N. Price, E. Selkov, M. Sigurdsson, E. Simeonidis, N. Sonnenschein, K. Smallbone, A. Sorokin, J. van Beek, D. Weichart, I. Goryanin, J. Nielsen, H. Westerhoff, D. Kell, P. Mendes, and B. Palsson. A community-driven global reconstruction of human metabolism. English. Nature Biotechnology, Mar. 2013. ISSN: 1087-0156. DOI: 10.1038/nbt.2488. Cited on page 1.

B. L. Trippe, J. Yim, D. Tischer, T. Broderick, D. Baker, R. Barzilay, and T. Jaakkola. Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem, June 2022. DOI: 10.48550/ar Xiv.2206.04119. Cited on page 1.

J. Urain, N. Funk, G. Chalvatzaki, and J. Peters. Se (3)-diffusionfields: learning cost functions for joint grasp and motion optimization through diffusion. ar Xiv preprint ar Xiv:2209.03855, 2022. Cited on page 1.

P. Vincent. A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661 1674, 2011. Cited on page 3.

J. L. Watson, D. Juergens, N. R. Bennett, B. L. Trippe, J. Yim, H. E. Eisenach, W. Ahern, A. J. Borst, R. J. Ragotte, L. F. Milles, B. I. M. Wicky, N. Hanikel, S. J. Pellock, A. Courbet, W. Sheffler, J. Wang, P. Venkatesh, I. Sappington, S. V. Torres, A. Lauko, V. D. Bortoli, E. Mathieu, R. Barzilay, T. S. Jaakkola, F. Di Maio, M. Baek, and D. Baker. Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models, Dec. 2022. DOI: 10.1101/2022.12.09.519842. Cited on page 1.

J. Welty and M. Jeffries. Combined wildfire datasets for the united states and certain territories, 1878-2019: us geological survey data release, 2020. Cited on page 15.

R. J. Williams. Reflected Brownian motion with skew symmetric data in a polyhedral domain. Probability Theory and Related Fields, 75(4):459 485, Aug. 1987. ISSN: 0178-8051, 1432-2064. DOI: 10.1007/BF00320328. Cited on page 2.

J. Yim, B. L. Trippe, V. De Bortoli, E. Mathieu, A. Doucet, R. Barzilay, and T. Jaakkola. Se (3) diffusion model with application to protein backbone generation. ar Xiv preprint ar Xiv:2302.02277, 2023. Cited on page 1.

T. Yoshikawa. Manipulability of robotic mechanisms. The international journal of Robotics Research, 4(2):3 9, 1985. Cited on page 9.

Supplementary to:

Metropolis Sampling for Constrained Diffusion Models

In Appendix B, we recall some basic concepts of Riemannian geometry which are key to defining discretisations of the reflected Brownian motion. In Appendix C, we give some details on the reflection step in reflected discretizations. In Appendix D, we prove the convergence of the rejection and Metropolis discretizations to the true reflected Brownian Motion. The geospatial dataset with non-convex constraints based on wildfire incidence rates in the continental United States is presented Appendix E. All supplementary experimental details and empirical results are gathered in Appendix F.

B Manifold concepts

In the following, we aim to introduce key concepts that underpin diffusion models on Riemannian manifolds, with a particular focus on notions relevant to the reflected Brownian motion that we build on in Appendix C. For a more thorough treatment with reference to reflected diffusion models, we refer to (Fishman et al., 2023). For a detailed presentation of smooth manifolds, see Lee (2013).

A Riemannian manifold is a tuple (M, g) with M a smooth manifold and g a metric that imbues the manifold with a notion of distance and curvature and is defined as a smooth positive-definite inner product on each of the tangent spaces of the manifold:

g(p) : Tp M Tp M R.

The tangent space Tp of a point p on a manifold is an extension of the notion of tangent planes and can be thought of as the space of derivatives of scalar functions on the manifold at that point.

To establish how different tangent spaces relate to one another, we need to additionally introduce the concept of a connection. This is a map that takes two vector fields and produces a derivative of the first with respect to the second, typically written as (X, Y ) = XY . While there are infinitely many connections on any given manifold, the Levi-Cevita emerges as a natural choice if we impose the following two conditions:

(i) X (g(Y, Z)) = g( XY, Z) + g(Y, XZ),

(ii) [X, Y ] = XY Y X,

where [ , ] is the Lie bracket. These conditions ensure that the connection is (i) metric-preserving and (ii) torsion-free, with the latter guaranteeing a unique connection and integrability on the manifold.

Using the metric and Levi-Cevita connection, we can define a number of key concepts:

Geodesic. Geodesics extend the Euclidean notion of straight lines to manifolds. They are defined as the unique path γ : (0, 1) M such that γ γ = 0 and are the shortest path between two points on a manifold, in the sense that L(γ) = R 1 0 p

g(γ(t))(γ (t), γ(t))dt is minimal.

Exponential map. The exponential map on a manifold is given by the mapping between an element v Tp M of the tangent space at point p and the endpoint of the unique geodesic γ with γ(0) = p and γ (0) = v.

Intersection. The intersection along a geodesic is the first point at which the geodesic intersects the boundary. We recall that the boundary is defined by f = 0. We can define this via an optimisation problem: compute the minimum t > 0 such that we have that expg(x, tz) is a root of f: f(expg(x, tz)) = 0. We will say that expg(x, tz) = intersectg(x, z; f) and that t = arg intersectg(x, z; f).

Parallel transport. We say that a vector field X is parallel to a curve γ : (0, 1) M if γ X = 0, where γ : (0, 1) Tγ(t)M. For two points on the manifold p, q M that are connected by a curve γ, and an initial vector X0 Tp M, there is a unique vector field X that is parallel to γ such that X(p) = X0. This induces a map between the tangent spaces at p and q τγ : Tp M Tq M, which is referred to as the parallel transport of tangent vectors between p and q and satisfies the condition that for v, u Tp M g(p)(v, u) = g(q)(τγ(v), τγ(u)).

Reflection. For an element v Tp M in the tangent space of the manifold at point p and a constraint characterised by its unit normal vector n Tp M, the reflection of v in the tangent space is given by v = v 2g(v, n)n.

C Full Reflected Discretisation

Here, we reproduce the central algorithm for the full discretisation of the reflected Brownian motion (Algorithm 4) derived for Euclidean models in (Lou et al., 2023) and for Reimannian models in (Fishman et al., 2023). Its central component is the Reflected Step Algorithm (Algorithm 3), which gives a generic computation for the reflection in any manifold. Due to the need to balance speed and numerical instability issues around the boundary, an efficient practical implementation of the reflected step is highly non-trivial, even for simple polytopes in Euclidean space. More complex geometries and boundaries make this problem significantly worse: a constraint on the trace of SPD matrices under the log-Cholesky metric of (Lin, 2019) requires solving complex non-convex optimisation problems for each sample at each discretised sampling step in both the forward and reverse process. This motivates our work in this paper.

These problems motivated the development of our Metropolis approximation, which significantly simplifies the random walk. Instead of requiring the intersection, parallel transport and reflection, we simply need to be able to evaluate the constraint functions fi. We highlight this simplicity in Algorithm 5.

Algorithm 3 Reflected Step Algorithm. The algorithm operates by repeatedly taking geodesic steps until one of the constraints is violated, or until the step is fully taken. Upon hitting the boundary, we parallel-transport the tangent vector to the boundary and then reflect it against it. We then start a new geodesic from this point in the new direction. The arg intersectt function computes the distance one must travel along a geodesic in direction s until constraint fi is violated. For a discussion of paralleltransport, expg and reflect see appendix B.

Require: x M, v Tx M, {fi}i I

ℓ v g s v/ v g while ℓ 0 do

di = arg intersectg(x, z; fi) i arg mini di α min(di, ℓ) x expg(x, αs) s paralleltransportg(x, s, x ) s reflect(s, fi) ℓ ℓ α x x end while return x

Algorithm 4 Reflected Random Walk. Discretisation of the SDE d Xt = b(t, Xt)dt + d Bt dkt.

Require: T, N, Xγ 0 , {fi}i I γ = T/N for k {0, . . . , N 1} do

Zk+1 N(0, Id) Xγ k+1 = Reflected Step[Xγ k , γZk+1, {fi}i I] end for return {Xγ k }N k=0

Algorithm 5 Metropolis Random Walk. Discretisation of the SDE d Xt = b(t, Xt)dt + d Bt dkt.

Require: T, N, Xγ 0 , {fi}i I γ = T/N for k {0, . . . , N 1} do

Zk+1 N(0, Id) X expg Xγ k , γZk+1

if maxi I fi(X ) 0 then

Xγ k+1 = X else

Xγ k+1 = Xγ k end if end for return {Xγ k }N k=0

D Convergence to the reflected process

In this note, we assume that M = {x Rd : Φ(x) > 0} is compact, with Φ C2(Rd, R). We have that M = {x Rd : Φ(x) = 0}. In addition, we assume that for any x M, Φ(x) = 1 and that Φ is concave. The closure of M is denoted M. The assumption that Φ is concave is only used in Theorem 7-(d) and can be dropped. We consider it for simplicity.

Let ( ˆXγ k )k N given for any γ > 0 and k N by ˆXγ 0 = x M and for ˆXγ k+1 = ˆXγ k + γZγ k with Zγ k a Gaussian random variable conditioned on ˆXγ k + γZγ k M. In practice, Zγ k is sampled using rejection sampling. We define ˆXγ : R+ M given for any k N by ˆXγ kγ = ˆXγ k and for

any t [kγ, (k + 1)γ), ˆXγ t = ˆXγ k . Note that (Xt)t [0,T ] is a D([0, T] , M) valued random variable, where D([0, T] , M) is the space of right-continuous with left-limit processes which take values in M. We denote ˆPγ the distribution of ( ˆXγ t )t [0,T ] on D([0, T] , M).

Our goal is to show the following theorem.

Theorem 6. For any T 0, ( ˆXγ t )t [0,T ] weakly converges to (Xt)t [0,T ] such that for any t [0, T]

Xt = x + Bt kt, |k|t = R t 0 1Xs Md|k|s, kt = R t 0 n(Xs)d|k|s. (6)

Proof. In order to prove the result, we prove that the distribution of the Markov chain seen as an element of D([0, T] , M) converges to a solution of the Skorokhod problem (6). In particular, we first show that the limiting distribution satisfies a submartingale problem following (Stroock et al., 1971, Theorem 6.3). The transition from a solution of a submartingale problem to a weak solution of the Skorokhod problem is given by (Kang et al., 2017, Theorem 1, Proposition 2.12) and (Ramanan, 2006, Corollary 2.10). In order to apply (Stroock et al., 1971, Theorem 6.3), we define an intermediate drift and diffusion matrix, see (55) and (51). To prove the theorem one needs to control the drift and diffusion matrix inside M and show that it converges to 0 and Id respectively. The technical part of the proof comes from the control of the drift coefficient near the boundary. In particular, we show that if the intermediate drift is large then we are close to the boundary and the intermediate drift is pointing inward. To investigate the local properties of the drift near the boundary we rely on the notion of tubular neighborhood, see (Lee et al., 2012, Theorem 6.24).

Some key properties of the tubular neighborhood are stated in Appendix D.1. We then establish a few technical lemmas about the tail probability of some distributions in Appendix D.2. Controls on the diffusion matrix and lower bounds on the probability of belonging in M are given in Appendix D.3. Properties of large drift terms are given in Appendix D.4. The convergence of the drift and diffusion matrix on compact sets is given in Appendix D.5. The convergence of the boundary terms is investigated in Appendix D.6. Finally, we conclude the proof in Appendix D.7.

D.1 Properties of the tubular neighborhood

Using the results of (Lee et al., 2012), we establish the existence of an open set of M (for the induced topology of Rd) satisfying several important properties.

Theorem 7. There exist U M open and C 1, r > 0 such that for any γ (0, γ) with γ = 1 the following properties hold:

(a) For any x U, there exist a unique x M and α > 0 such that x = x + α Φ( x).

(b) For any α [0, r] and x M such that x + α Φ( x) M, let x = x + α Φ( x) and C(x, γ) such that x + γz C(x, γ) if

αγ 1/2 α < rγ 1/2, v 2 (αγ1/2 + α)/(Cγ), (7)

with z = α Φ( x) + v, with v Φ( x). Then C(x, γ) M.

(c) V = { x + α Φ( x) : x M, α [0, r)} is open in M.

(d) For any x U, x + γz M C(x, γ)c then α rγ 1/2 or v 2 (αγ1/2 + α)/(Cγ) and αγ1/2 + α 0, with z = α Φ( x) + v, with x given in (a) and v Φ( x). .

(e) There exists R > 0 such that {x M : d(x, M) 2R} V.

Proof. Let γ (0, γ) with γ = 1. First, note that for any x M, the normal space is given by {α Φ( x) : α R}. Using this result and (Lee et al., 2012, Theorem 6.24) there exists r0 > 0 such that U0 = { x + α Φ( x) : x M, α ( r0, r0)} Rd is open2. We have that for any α [ r0, 0) and x M

Φ( x + α Φ( x)) = Φ( x) + α Φ( x) 2 + R 1 0 2Φ( x + tα Φ( x))(α Φ( x)) 2dt (8)

α + C0α2 < 0, (9)

with r0 = min( r0, 1/(2 C0)), where we have used that Φ( x) = 0, Φ( x) = 1 and defined C0 = sup{ 2Φ( x + α Φ( x)) : x M, α [ r0, r0]}. Reciprocally, we have for any α [0, r0) and x M

Φ( x + α Φ( x)) = Φ( x) + α Φ( x) 2 + R 1 0 2Φ( x + tα Φ( x))(α Φ( x)) 2dt α C0α2, (10)

where we have used that Φ( x) = 0, Φ( x) = 1 and defined C0 = sup{ 2Φ( x + α Φ( x)) : x M, α [ r0, r0]}. Let r1 = min(r0, 1/(2C0)). Then, U1 = { x + α Φ( x) : x M, α ( r1, r1)} Rd is open and

U1 M = { x + α Φ( x) : x M, α [0, r1)}. (11)

In what follows, we define U = U1 M. Note that U is open for the induced topology and that M U. In particular, M is compact, Uc is closed and M Uc = . Hence, there exists r > 0 such that d( M, Uc) 4r. Without loss of generality we can assume that r 1/2. We also have {x M : d(x, M) 2r} U. The proof of (a) follows from the definition of U0. In the rest of the proof, we define

C1/2 = 2 max(1, sup{ 2Φ( x+u) : x M, u 2 r(r+1)}), r = min(1/(2C1/2), r/2). (12) Let us prove (b). Consider x + γz C(x, γ) with C(x, γ) given by (7) and x = x + α Φ( x) and z = α Φ( x) + v with v Φ( x). In particular, we recall that we have

αγ 1/2 α < rγ 1/2, v 2 (αγ1/2 + α)/(Cγ). (13)

This implies that α + γα 2 r, γ v 2 2 r/C. (14) First, using that C 1, Φ( x) = 1, (14) and (12), we have

x + γz x 2 = ( α + γα)2 + γ v 2 r2 + r/C r(r + 1). (15)

Then, we have that

Φ(x + γz) = Φ( x) + α + γα + R 1 0 2Φ( x + t(x + γz x))(x + γz x) 2dt (16)

α + γα (C1/2/2)(( α + γα)2 + γ v 2), (17)

where we recall that

C1/2 = 2 max(1, sup{ 2Φ( x+u) : x M, u 2 r(r+1)}), r = min(1/(2C1/2), r/2). (18) First, using that r 1/2 and (14), we have α + γα 2r 1. Since, v 2 ( α + γα)/(Cγ) and we have that v 2 < 1/(Cγ). Let P(X) = X (C1/2/2)X2 (C1/2/2)γ v 2. We have that P(x) 0 if and only if x [xmin, xmax] with

xmin = (1 (1 Cγ v 2)1/2)/C1/2, xmax = (1 + (1 Cγ v 2)1/2)/C1/2. (19)

Using that for any t (0, 1), (1 t)1/2 1 t we have that

xmin γC v 2/2, xmax 1/C1/2. (20)

2This is the tubular neighborhood theorem which is key to the rest of the proof.

Since v 2 ( γα + α)/(γC), we have that α + γα xmin. In addition, using that α + γα 2 r 1/C1/2 xmax, we get that P( α + γα) 0 and therefore x + γz M since Φ(x + γz) 0. This concludes the proof of (b). Note that the condition α γ 1/2 α is implied by the condition v 2 ( γα + α)/(γC). Using that V {x M : d(x, M) 2r} U, (c) is a direct consequence of (Lee et al., 2012, Theorem 6.24)]. Next, we prove (d). Let x + γz M C(x, γ)c. If α < αγ 1/2 then since Φ is concave, we have

Φ(x + γz) = Φ( x) + α + γα + R 1 0 2Φ( x + t(x + γz x))(x + γz x) 2dt < 0, (21)

where we have used that Φ( x) = 0. This is absurd, hence either α rγ 1/2 or v 2 (αγ1/2 + α)/(Cγ) and αγ1/2 + α 0, which concludes the proof. The proof of (e) is similar to the proof that {x M : d(x, M) 2r} U.

The main message of Theorem 7 is that using Theorem 7-(d), if we move in the direction of Φ( x) (the inward normal) with magnitude α then we are allowed to move in the orthonormal direction with magnitude α1/2. In the next paragraph, we discuss this fact in details and shows it is necessary for the rest of our study.

The necessity of Theorem 7-(b). At first sight one can wonder if the statement of Theorem 7- (b) could be simplify. In particular, it would be simpler to replace this statement with: for any α [0, r] and x M such that x + α Φ( x) M, let x = x + α Φ( x) and C(x, γ) such that x + γz C(x, γ) if

αγ 1/2 α < rγ 1/2, v 2 (αγ1/2 + α)2/(Cγ), (22)

with z = α Φ( x) + v, with v Φ( x). Then C(x, γ) M. Note that v 2 (αγ1/2 + α)/(Cγ) is replaced by v 2 (αγ1/2 + α)2/(Cγ), see Figure 8 for an illustration. However, in that case Theorem 7-(d) becomes: in addition, if x + γz M C(x, γ)c then α rγ 1/2 or v 2 (αγ1/2 + α)2/(Cγ) and αγ1/2 + α 0.

In what follows, when controlling the properties of large drift, see the proof of Proposition 18 and the proof of Proposition 21, we need to control quantities of the form P(x+ γZ C(x, γ)c M)/ γ3 Using the original Theorem 7-(d) it is possible to show that this quantity is bounded. However, if one uses the updated version of Theorem 7-(d) then one needs to show that there exists M 0 and γ > 0 such that for any γ (0, γ) (here we have assumed that α = 0, i.e. x M for simplicity) R r/γ 1/2

Φ( x) 1 v 2 α2φ(v)φ(α)dvdα M γ, (23)

which is absurd.

D.2 Technical lemmas

We start with a few technical lemmas which will allow us to control some Gaussian probabilities outside of a compact set. We denote Ψ : R+ N [0, 1] such that for any k N, Ψ( , k) is the tail probability of a χ-squared random variable with parameter k, i.e. for any k N and t 0 we have Ψ(t, k) = P( Z 2 t), (24)

with Z a Gaussian random variable in Rk with zero mean and identity covariance matrix. We will make extensive use of the following lemma which is a direct consequence of (Laurent et al., 2000, Section 4, Lemma 1). Lemma 8. For any k N and t R+ with t 5k, Ψ(t, k) exp[ t/5].

Proof. Let k N. First, note that for any x k, we have that k + 2(kx)1/2 + 2x 5x. Combining this result and (Laurent et al., 2000, Section 4, Lemma 1, Equation (4.3)), we have that for any x k P( X 2 5x) exp[ x], (25)

with X a Rk-valued Gaussian random variable with zero mean and identity covariance matrix. This concludes the proof upon letting t = 5x.

3The division by γ comes from the definition of the intermediate drift (55).

Figure 8: The grey shaded area represents M while the blue shaded area represents C(x, γ) for an arbitrary value of γ and x = x M.

Let φ : Rp R+ given for any u R by φ(u) = (2π) p/2 exp[ u 2/2]4, i.e. the density of a real Gaussian random variable with zero mean and unit variance. While Lemma 9 appears technical, it will be central to provide quantitative upper bounds on the rejection probability, see Lemma 12 for instance. Lemma 9. For any k N, α0 > 0, β0 (0, 1] and δ > 0 we have

ψ(δ) = sup{ R + 0 Ψ(α0t/δ, k)β0φ(t t0/δ)dt : t0 0} C0δ, (26)

with C0 = 5(2π) 1/2(k + 1)/(α0β0).

Proof. Let k N, α0 > 0, β0 (0, 1] and δ > 0. Let tδ = 5kδ/α0. Note that if t tδ then, α0t/δ 5k. In addition, we have R + 0 Ψ(α0t/δ, k)β0φ(t t0/δ)dt (2π) 1/2 R + 0 Ψ(α0t/δ, k)β0dt (27)

(2π) 1/2 R tδ 0 Ψ(α0t/δ, k)β0dt + (2π) 1/2 R + tδ Ψ(α0t/δ, k)β0dt. (28)

Using that for any w > 0, R + 0 exp[ wt]dt (1/w), that for any u 0, Ψ(u, k) 1 and that if u 5k, Ψ(u, k) exp[ u/5], we get for any t0 0 R + 0 Ψ(α0t/δ, k)φ(t t0/δ) (2π) 1/2[5kδ/α0 + 5δ/(α0β0)] (5(2π) 1/2(k + 1)/(α0β0))δ, (29) which concludes the proof.

Finally, we have the following lemma, which is similar to Lemma 8 but will be used to control quantities related to the norm. Lemma 10. For any k N, α0 > 0, β0 (0, 1] and δ > 0 we have

ψ(δ) = R + 0 Ψ(α0t/δ, k)β0tφ(t)dt C0δ2, (30)

with C0 = 25(2π) 1(k2 + 1)/(α0β0)2.

Proof. Let k N, α0 > 0, β0 (0, 1] and δ > 0. Let tδ = 5kδ/α0. Note that if t tδ then, α0t/δ 5k. In addition, we have R + 0 Ψ(α0t/δ, k)β0tφ(t)dt (2π) 1R tδ 0 Ψ(α0t/δ, k)β0tdt + (2π) 1R + tδ Ψ(α0t/δ, k)β0tdt. (31)

4In the rest of the supplementary, we never precise the dimension p N which can be deduced from the variable.

In addition, using that if u 5k then Ψ(u, k) exp[ u/5], we get

(2π) 1 R + tδ Ψ(α0t/δ, k)β0tdt (2π) 1 R + 0 exp[ α0β0t/(5δ)]tdt (2π) 125δ2/(α0β0)2. (32)

Finally, using that for any u 0, Ψ(u, k) 1, we have

(2π) 1R tδ 0 Ψ(α0t/δ, k)β0tdt (2π) 125k2δ2/α2 0, (33)

which concludes the proof.

D.3 Lower bound on the inside probability and control of moments of order two and higher

Lower bound on the inside probability. We begin with the following lemma which controls the expectation of 1 + Z outside of C(x, γ). We recall that V is defined in Theorem 7-(c). Lemma 11. Let γ = 1. Let x V, Z N(0, Id) and γ (0, γ) then we have

max(E[1x+ γZ M C(x,γ)c], E[ Z, Φ( x) 1x+ γZ M C(x,γ)c]) ψ(γ), (34)

with ψ : R+ R+ such that lim supt 0 ψ(t)/t1/2 < + .

Proof. Let r > 0 given by Theorem 7. First, we have that R

Rd 1(1 + |α| + v )1α r/γ1/2φ(α)φ(v)dαdv (35)

R(1 + |α|)1α r/γ1/2φ(α)dα d(Ψ( r2/γ, 1) + exp[ r2/(2γ)]). (36)

Second, using Lemma 9, we have that R

Rd 1 1 v 2 ( α+ γα)/(Cγ)1 α+ γα 0φ(α)φ(v)dαdv (37)

R 1 α+ γα 0Ψ(( α + γα)/(Cγ), d 1)φ(α)dα (38)

R + 0 Ψ(α/Cγ1/2, d 1)φ(α α/γ1/2)dα Ψ1(γ1/2). (39)

Second, using Lemma 10, we have that R

Rd 1 α1 v 2 ( α+ γα)/(Cγ)1 α+ γα 0φ(α)φ(v)dαdv (40)

R αΨ(( α + γα)/(Cγ), d 1)1 α+ γα 0φ(α)dα (41)

R + 0 Ψ(α/Cγ1/2, d 1)αφ(α)dα Ψ2(γ1/2). (42)

Note that we have lim supγ 0 Ψ2(γ1/2) + Ψ1(γ1/2) < + . We conclude upon combining (36), (39) and (42) with Theorem 7-(d) and the fact that Φ( x) = 1.

The following lemma allow us to give a lower bound to the quantity E[1x+ γZ M] uniformly w.r.t

x M. Lemma 12. There exists γ > 0 such that for any γ (0, γ) and for any x M, γ (0, γ) and Z N(0, Id) we have E[1x+ γZ M] 1/4 . (43)

Proof. Let γ (0, γ). If x V then B(x, 2R) M using Theorem 7-(e) and therefore E[1x+ γZ M] 1/4 for γ > 0 small enough. Now, assume that x V. Using Lemma 11, we have that E[1x+ γZ M C(x,γ)c] ψ(γ). In addition, using Theorem 7-(b), we have that for any γ > 0

E[1x+ γZ M] E[1x+ γZ C(x,γ)] (44)

Φ( x) 1 v 2 ( α+γ1/2α)/(Cγ)φ(α)φ(v)dαdv (45)

αγ 1/2(1 Ψ(( α + γ1/2α)/(Cγ), d 1))φ(α)dα (46)

(1/2) Ψ(r2/γ, 1) R + αγ 1/2 Ψ(( α + γ1/2α)/(Cγ), d 1)φ(α)dα. (47)

Hence, using Lemma 8 and Lemma 9, there exists γ > 0 such that for any γ (0, γ), Ψ(r2/γ, 1) + R + 0 Ψ(α/(Cγ1/2), d)φ(α γ1/2 α)dα 1/4, which concludes the proof.

Note that the result of Lemma 12 can be improved to 1/2 ε for any ε > 0. In particular this result tells us that for γ > 0 small enough, M looks like the hyperplane from the point of view of the Gaussian with variance γ centered on M.

Bound on moments of order two and higher. In what follows, we define for any γ > 0, γ : M R+ given for any x M by

γ(x) = (1/γ) R

Rd 1x+ γz M γz 4φ(z)dz/ R

Rd 1x+ γz Mφ(z)dz. (48)

Proposition 13. We have limγ 0 sup{ γ(x) : x M} = 0.

Proof. Let γ > 0 given by Lemma 12. Let x M and γ (0, γ). We have using Lemma 12

Rd 1x+ γz Mφ(z)dz 1/4. (49)

We also have that (1/γ) R

Rd 1x+ γz M γz 4φ(z)dz 3γd2. (50)

Therefore, we get that for any γ (0, γ), γ(x) 12γd2, which concludes the proof.

In what follows, we define for any γ > 0, ˆΣγ : M S+ d (R) given for any x M by

Rd 1x+ γz Mz zφ(z)dz/ R

Rd 1x+ γz Mφ(z)dz. (51)

Proposition 14. There exists γ > 0 such that for any x M and γ (0, γ) we have

ˆΣγ(x) 4d. (52)

Proof. Let x M and γ > 0 given by Lemma 12. For any γ (0, γ), we have using Lemma 12

Rd 1x+ γz Mφ(z)dz 1/4. (53)

We also have that R

Rd 1x+ γz M z 2φ(z)dz d, (54) which concludes the proof.

D.4 Properties of large drift terms

Finally, we define for any γ > 0, ˆbγ : M Rd given for any x M by

ˆbγ(x) = γ 1/2 R

Rd 1x+ γz Mzφ(z)dz/ R

Rd 1x+ γz Mφ(z)dz. (55)

First, we show away from the boundary the drift ˆbγ converges to zero.

Proposition 15. There exists γ > 0 such that for any γ (0, γ), r > 0 and x M such that d(x, M) r we have ˆbγ(x) 2dΨ(r/γ, d)1/2/γ1/2.

Proof. Let x M and γ > 0 given by Lemma 12. For any γ (0, γ) we have using Lemma 12

Rd 1x+ γz Mφ(z)dz 1/4. (56)

We also have that

Rd 1x+ γz Mzφ(z)dz R

Rd 1 z r/γ1/2zφ(z)dz + R

Rd 1 z r/γ1/2 z φ(z)dz (57)

Rd 1 z r/γ1/2 z φ(z)dz 2dΨ(r/γ, d)1/2/γ1/2, (58)

which concludes the proof.

We have the following corollary. Corollary 16. There exists γ > 0 such that for any δ > 0 there exists Mδ > 0 such that for any γ (0, γ) and x M, ˆbγ(x) Mδ, then Φ(x) δ.

Proof. Let γ > 0 given by Lemma 12. Let f : R+ R+ given for any r > 0 by f(r) = sup{γ > 0 : Ψ(r/γ, 1)1/2/γ1/2}. We have that f is non-increasing and limr 0 f(r) = + . Let δ > 0 and Mδ = 2df(δ/C) with C = sup{ Φ(x) : x M}. Let γ (0, γ) and x M such that ˆbγ(x) Mδ then using Proposition 15 we have that d(x, M) δ/C. Let x M such that x x = d(x, M). We have

Φ(x) Φ( x) + R 1 0 Φ( x + t(x x)), x x dt δ, (59)

which concludes the proof.

For ease of notation, for any γ > 0, we define bγ = γ1/2ˆbγ, the renormalized version of the drift. First, we have the following result which will ensure that the drift projected on the normal component does not vanish.

Lemma 17. There exists γ > 0 such that for any γ (0, γ) and x V we have

bγ(x), Φ( x) bγ(x) ψ(γ), (60)

with ψ : R+ R+ such that lim supγ 0 ψ(γ)/ γ < + .

Proof. Let x M and γ > 0 given by Lemma 12. For any γ (0, γ) we have using Lemma 12

Rd 1x+ γz Mφ(z)dz 1/4. (61)

In addition, we have R

Rd 1x+ γz M z, Φ( x) φ(z)dz R

Rd 1x+ γz C(x,γ) z, Φ( x) φ(z)dz (62)

Rd 1x+ γz M C(x,γ)c z, Φ( x) φ(z). (63)

Using Lemma 11, we get that R

Rd 1x+ γz M z, Φ( x) φ(z)dz R

Rd 1x+ γz C(x,γ) z, Φ( x) φ(z)dz ψ(γ). (64)

Let {ei}d 1 i=1 a basis of Φ( x) . Using Theorem 7-(b), we have that for any i {1, . . . , d 1}

Rd 1x+ γz C(x,γ) z, ei φ(z)dz= R r/γ1/2

Φ( x) 1 v 2 (γ1/2α+ α)/γ v, ei φ(v)φ(α)dvdα. (65)

Hence, combining this result and the Cauchy-Schwarz inequality we have for any i {1, . . . , d 1}

Rd 1x+ γz C(x,γ) z, ei φ(z)dz)2 = ( R r/γ1/2

Φ( x) 1 v 2 (γ1/2α+ α)/γ v, ei φ(v)φ(α)dvdα)2

Φ( x) v, ei 2φ(v)dv( R r/γ1/2

α/γ1/2 Ψ(( α + αγ1/2)/γ, d 1)1/2φ(α)dα)2 (67)

α/γ1/2 Ψ(( α + αγ1/2)/γ, d 1)1/2φ(α)dα)2. (68)

Hence, using Lemma 9, we get that Pd 1 i=1 ( R

Rd 1x+ γz C(x,γ) z, ei φ(z)dz)2 (d 1)ψ2(γ), (69)

with ψ given by Lemma 9 with β0 = 1/2. Therefore, we get that

Rd 1x+ γz C(x,γ) z, Φ( z) φ(z)dz)2 (70)

Rd 1x+ γz Mφ(z)dz)2 bγ(x) 2 Pd 1 i=1 ( R

Rd 1x+ γz C(x,γ) z, ei φ(z)dz)2 (71)

Rd 1x+ γz Mφ(z)dz)2 bγ(x) 2 ψ(γ)2. (72)

We conclude the proof upon using that for any a, b 0, (a + b)1/2 a1/2 + b1/2 and (61).

We are now ready to state the following lower bound on the drift.

Proposition 18. There exist γ > 0, M 0 and c > 0 such that for any x M and γ (0, γ) if ˆbγ(x) M then x V and

min( ˆbγ(x), Φ(x) , ˆbγ(x), Φ( x) ) c ˆbγ(x) . (73)

Proof. Let γ > 0 given by Lemma 12 and M0 = 4 sup{ψ(γ)/γ1/2 : γ (0, γ]}. In addition, let c = 1/4. Using Proposition 15 and Theorem 7-(e), there exists M1 0 such that for any any x M, if ˆbγ(x) M1 then x V and x = x + α Φ( x) with α 1/(4C) and C = sup{ 2Φ(x) : x M}. We denote M = max(M0, M1). Let γ (0, γ) and x M such that ˆbγ(x) M. Using Lemma 17, we have that

ˆbγ(x), Φ( x) ˆbγ(x) ψ(γ)/γ1/2. (74)

Using that ψ(γ)/γ1/2 M/2 ˆbγ(x) /2, we have

ˆbγ(x), Φ( x) (1/2) ˆbγ(x) . (75)

Since x x α 1/(4C) we have ˆbγ(x), Φ(x) (1/2 Cα) ˆbγ(x) ˆbγ(x) /4, which concludes the proof.

D.5 Convergence on compact sets

In this section, we show the convergence of the drift and diffusion matrix on compact sets. We recall that M does not include its boundary M.

Proposition 19. For any compact set K M and ε > 0, there exists γ > 0 such that for any γ (0, γ) we have for any x K

ˆbγ(x) ε, ˆΣγ(x) Id ε. (76)

Proof. Let K M be a compact set and γ > 0. Since K M = , there exists r > 0 such that for any x K, d(x, M) > r. Therefore, we have that for any x K

ˆbγ(x) = γ 1/2 R

x+ γz M zφ(z)dz / R

x+ γz M φ(z)dz. (77)

In addition, using the Cauchy-Schwarz inequality we have

x+ γz M zφ(z)dz R

Rd zφ(z)dz + R

Mc z φ(z)dz (78)

Rd 1 z r/γ1/2 z φ(z)dz

dΨ(r2/γ, d)1/2. (79)

Using Lemma 8 and Lemma 12, there exists γ0 > 0 such that for any γ (0, γ0) we have that for any x K

ˆbγ(x) 4dΨ(r2/γ, 1)1/2/γ1/2 ε, (80)

which concludes the first part of the proof. Similarly, we have that for any x K

x+ γz M(z z Id)φ(z)dz R

Rd(z z Id)φ(z)dz + R

Mc z φ(z)dz (81)

Rd 1 z r/γ1/2 z z Id φ(z)dz (82)

2(1 + 3d2)1/2Ψ(r2/γ, d)1/2. (83)

Using Lemma 8 and Lemma 12, there exists γ1 > 0 such that for any γ (0, γ1), we have that for any x K

ˆΣγ(x) Id 4

2(1 + 3d2)1/2Ψ(r2/γ, 1)1/2 ε, (84)

which concludes the proof upon letting γ = min( γ0, γ1).

D.6 Convergence on the boundary

Finally, we investigate the behavior at the boundary of the diffusion matrix and the drift. First, we show that there is a lower bound to the diffusion matrix near the boundary. Second, we show that the renormalized drift converges to the outward normal. Proposition 20. There exist c > 0 and γ > 0 such that for any γ (0, γ), u Rd and x V we have u, ˆΣγ(x)u c u 2. (85)

In particular, there exist r, ε > 0 such that for any γ (0, γ) and x M with d(x, M) r

Φ(x), ˆΣγ(x) Φ(x) ε. (86)

Proof. First, we show (85). Let x V. We have for any u Rd

u, ˆΣγ(x)u = R

Rd 1x+ γz M z, u 2φ(z)dz/ R

Rd 1x+ γz Mdz (87)

Rd 1x+ γz C(x,γ) z, u 2φ(z)dz. (88)

For any u Rd, let αu = u, Φ( x) . Using Theorem 7-(b) we have for any u Rd R

Rd 1x+ γz C(x,γ) z, u 2φ(z)dz (89)

Φ( x) ( u, v + ααu)21 v 2 (αγ1/2+ α)/γφ(v)φ(α)dvdα (90)

Φ( x) ( u, v 2 + α2α2 u)1 v 2 (αγ1/2+ α)/γφ(v)φ(α)dvdα (91)

α2 u R r/γ1/2

0 α2φ(α)dα + R r/γ1/2

Φ( x) u, v 21 v 2 (αγ1/2+ α)/γφ(v)φ(α)dvdα. (92)

Using Cauchy-Schwarz inequality, we have R r/γ1/2

0 α2φ(α)dα = (1/2) R + r/γ1/2 α2φ(α)dα (1/2) 3Φ(r2/γ, 1)1/2. (93)

In addition, using the Cauchy-Schwarz inequality, we have that R r/γ1/2

Φ( x) u, v 21 v 2 (αγ1/2+ α)/γφ(v)φ(α)dvdα (94)

Φ( x) u, v 2φ(v)dv R r/γ1/2

α/γ1/2 φ(α)dα (95)

Φ( x) u, v 21 v 2 (αγ1/2+ α)/γφ(v)φ(α)dvdα (96)

( u 2 α2 u)((1/2) Φ(r2/γ, 1)) (97)

3(d 1) u 2 R + 0 Φ(α/γ1/2, d 1)1/2φ(α α/γ1/2)dα. (98)

Combining this result, (93), (92) and Lemma 9 there exists γ > 0 such that for any γ (0, γ] and u Rd R

Rd 1x+ γz C(x,γ) z, u 2φ(z)dz (1/4) u 2, (99)

which concludes the proof of (85). Finally, using Theorem 7-(e), we have that for any x M if d(x, M) R then x V. Let r = min(R, 1/(2C)) with C = sup{ 2Φ(x) : x M}. We have that for any x M such that d(x, M) r

Φ(x) Φ( x0) Cr 1/2, (100)

where x0 is such that x x0 r and x0 M. Combining this result and (99) concludes the proof upon letting ε = 1/16.

Finally, we investigate the behavior of the normalized drift near the boundary.

Proposition 21. For any x0 M and ε > 0, there exist γ, r, M > 0 such that for any x M and γ (0, γ) with x x0 r and ˆbγ(x) M

ˆbγ(x)/ ˆbγ(x), Φ(x) Φ( x0) ε. (101)

Proof. Let γ be given by Proposition 18. Let ψ given by Lemma 9 and M0 = sup{ψ(γ)/γ1/2 : γ (0, γ)} < + . Let M = 16M0/(cε1/2) with c given in Proposition 18. Let R > 0 given by Theorem 7-(e) such that for any x M with d(x, M) there exist x M and α [0, cε/(4C)] such that x = x + α Φ( x) with C = sup{ 2Φ(x) : x M} and c given in Proposition 18. Let r = min( r, cε/4, R) and x M with x x0 r. First, since d(x, M) R, there exist x M and α [0, ε/(4C)] such that x = x+α Φ( x). Therefore, we get that x x0 ε/(2C) and therefore Φ( x0) Φ( x) ε/2. In addition, we have that

ˆbγ(x)/ ˆbγ(x), Φ(x) ˆbγ(x)/ ˆbγ(x), Φ( x) (102)

ˆbγ(x) 2 Φ(x) Φ( x) /( ˆbγ(x), Φ(x) ˆbγ(x), Φ( x) ). (103)

Using Proposition 18, we get that

ˆbγ(x)/ ˆbγ(x), Φ(x) ˆbγ(x)/ ˆbγ(x), Φ( x) ε/4. (104)

In what follows, we show that

ˆbγ(x)/ ˆbγ(x), Φ( x) Φ( x) 2 ε/2. (105)

In particular, we show that for any u Φ( x) with u = 1,

ˆbγ(x), u 2 (ε/16) ˆbγ(x), Φ( x) 2. (106)

Assuming (106), letting u = (ˆbγ(x) ˆbγ(x), Φ( x ))/( ˆbγ(x) 2 ˆbγ(x), Φ( x 2)1/2 and using that ˆbγ(x) = ˆbγ(x), u u + ˆbγ(x), Φ( x) Φ( x) we have

ˆbγ(x)/ ˆbγ(x), Φ(x) Φ( x) ˆbγ(x)/ ˆbγ(x), Φ( x) Φ( x) (107)

+ ˆbγ(x)/ ˆbγ(x), Φ(x) ˆbγ(x)/ ˆbγ(x), Φ( x) (108)

| ˆbγ(x), u / ˆbγ(x), Φ( x) | + ε/4 ε/2, (109)

which concludes the proof. Let u Φ( x) with u = 1 and {ei}d 1 i=1 an orthonormal basis of Φ( x) . There exist {ai}d 1 i=1 such that Pd 1 i=1 a2 i = 1 and u = Pd 1 i=1 aiei. Using Theorem 7-(b), we have that for any i {1, . . . , d 1}

Rd 1x+ γz C(x,γ) z, ei φ(z)dz= R r/γ1/2

Φ( x) 1 v 2 (γ1/2α+ α)/γ v, ei φ(v)φ(α)dvdα (110)

Φ( x) 1 v 2 (γ1/2α+ α)/γ v, ei φ(v)φ(α)dvdα (111)

Hence, combining this result and the Cauchy-Schwarz inequality we have for any i {1, . . . , d 1}

Rd 1x+ γz C(x,γ) z, ei φ(z)dz)2 = ( R r/γ1/2

Φ( x) 1 v 2 (γ1/2α+ α)/γ v, ei φ(v)φ(α)dvdα)2

Φ( x) v, ei 2φ(v)dv( R r/γ1/2

α/γ1/2 Ψ(( α + αγ1/2)/γ, d 1)1/2φ(α)dα)2. (113)

Hence, we get that Pd 1 i=1 a2 i ( R

Rd 1x+ γz C(x,γ) z, ei φ(z)dz)2 u 2ψ2(γ), (114)

with ψ given by Lemma 9. Recalling that ˆbγ(x) M we have

ˆbγ(x), u 2 16ψ(γ)2/γ c2(ε/16)M 2 (ε/16) ˆbγ(x), Φ( x) 2, (115)

which concludes the proof.

D.7 Submartingale problem and weak solution

We are now ready to conclude the proof.

Theorem 22. There exists P a distribution on D([0, T] , M) such that limγ 0 ˆPγ = P . In addition, for any f C1,2([0, T] M, R) with Φ( x), f(x) 0 for any t [0, T] and x M, we have that the process (f(t, ω(t)))t [0,T ] given for any t [0, T]

f(t, ω(t)) R t 0( sf(s, ω(s) + 1

2 f(s, ω(s)))1M(ω(s))ds, (116)

is a P submartingale.

Proof. Condition (A) (Stroock et al., 1971, p.197) is a consequence of Proposition 13. Condition (B) (Stroock et al., 1971, p.197) is a consequence of Proposition 18. Condition (C) (Stroock et al., 1971, p.198) is a consequence of Corollary 16. Condition (D) (Stroock et al., 1971, p.198) is a consequence of Proposition 14. We fix ρ = 0 and condition (1) (Stroock et al., 1971, p.203) is a consequence of Proposition 19. Condition (2)-(iii) (Stroock et al., 1971, p.203) is a consequence of Proposition 20. Condition (2)-(iv) (Stroock et al., 1971, p.203) is a consequence of Proposition 21. We conclude upon using (Stroock et al., 1971, Theorem 6.3) and (Stroock et al., 1971, Theorem 5.8).

We finally conclude the proof of Theorem 6 upon using the results of (Kang et al., 2017) which establish the link between a weak solution to the reflected SDE and the solution to a submartingale problem.

Theorem 23. For any T 0, ( ˆXγ t )t [0,T ] weakly converges to (Xt)t [0,T ] such that for any t [0, T]

Xt = x + Bt kt, |k|t = R t 0 1Xs Md|k|s, kt = R t 0 n(Xs)d|k|s. (117)

Proof. Using Theorem 22 and (Kang et al., 2017, Theorem 1, Proposition 2.12), we have that P in Theorem 22is associated with a solution to the extended Skorokhod problem. We conclude that a solution to the extended Skorokhod problem is a solution to the Skorokhod problem using (Ramanan, 2006, Corollary 2.10).

D.8 Extension to the Metropolis process

We recall that the Metropolis process is defined as follows. Let (Xγ k )k N given for any γ > 0 and k N by Xγ 0 = x M and for Xγ k+1 = Xγ k + γZk if Xγ k + γZγ k M and Xγ k otherwise, Zk N(0, Id). We recall that ˆbγ, ˆΣγ and ˆ γ are given by (48), (51) and (55). In particular, denoting ˆKγ the Markov kernel associated with ( ˆXγ k )k N, i.e. ˆKγ : M B(M) [0, 1] such that for any x M, ˆKγ(x, ) is a probability measure, for any A B(M), ˆKγ( , A) is a measurable function and E[1A( ˆXγ 1 ) | ˆXγ 0 = x] = ˆKγ(x, A). We have that for any γ > 0 and x M

ˆbγ(x) = (1/γ) R

M(y x)ˆKγ(x, dy), (118) ˆΣγ(x) = (1/γ) R

M(y x) 2 ˆKγ(x, dy), (119) ˆ γ(x) = (1/γ) R

M y x 4 ˆKγ(x, dy). (120)

In what follows, we denote aγ(x) = E[1x+ γZ0 M]. Denote Kγ the kernel associated with (Xγ k )k N. We have that for any A B(M), γ > 0 and x M

Kγ(x, A) = E[1Xγ k+1 A1x+ γZk+1 M] + (1 aγ(x))1A(x) (121)

= aγ(x)ˆKγ(x, A) + (1 aγ(x))1A(x). (122)

We define for any γ > 0 and x M

bγ(x) = (1/γ) R

M(y x)Kγ(x, dy), (123)

Σγ(x) = (1/γ) R

M(y x) 2Kγ(x, dy), (124)

γ(x) = (1/γ) R

M y x 4Kγ(x, dy). (125)

Using (122), we get that for any γ > 0 and x M

bγ(x) = aγ(x)ˆbγ(x), Σγ(x) = aγ(x)ˆΣγ(x), γ(x) = aγ(x) ˆ γ(x). (126)

Using Lemma 12, we have that for any γ (0, γ) and x M, aγ(x) 1/4.

In order to conclude for the convergence of the Metropolis process we adapt Theorem 22 and Theorem 23. We define Xγ : R+ M given for any k N by Xγ kγ = Xγ k and for any t [kγ, (k + 1)γ), Xγ t = Xγ k . Note that (Xt)t [0,T ] is a D([0, T] , M) valued random variable, where D([0, T] , M) is the space of right-continuous with left-limit processes which take values in M. We denote Pγ the distribution of (Xt)t [0,T ] on D([0, T] , M).

Theorem 24. There exists P a distribution on D([0, T] , M) such that limγ 0 Pγ = P . In addition, for any f C1,2([0, T] M, R) with Φ( x), f(x) 0 for any t [0, T] and x M, we have that the process (f(t, ω(t)))t [0,T ] given for any t [0, T]

f(t, ω(t)) R t 0( sf(s, ω(s) + 1

2 f(s, ω(s)))1M(ω(s))ds, (127)

is a P submartingale.

Proof. Condition (A) (Stroock et al., 1971, p.197) is a consequence of Proposition 13 and (126). Condition (B) (Stroock et al., 1971, p.197) is a consequence of Proposition 18 and (126). Condition (C) (Stroock et al., 1971, p.198) is a consequence of Corollary 16 and (126). Condition (D) (Stroock et al., 1971, p.198) is a consequence of Proposition 14 and (126). We fix ρ = 0 and condition (1) (Stroock et al., 1971, p.203) is a consequence of Proposition 19 and that limγ 0 aγ = 1 uniformly on compact subsets K M. Condition (2)-(iii) (Stroock et al., 1971, p.203) is a consequence of Proposition 20 and (126). Condition (2)-(iv) (Stroock et al., 1971, p.203) is a consequence of Proposition 21 and (126). We conclude upon using (Stroock et al., 1971, Theorem 6.3) and (Stroock et al., 1971, Theorem 5.8).

Theorem 25. For any T 0, (Xγ t )t [0,T ] weakly converges to (Xt)t [0,T ] such that for any t [0, T]

Xt = x + Bt kt, |k|t = R t 0 1Xs Md|k|s, kt = R t 0 n(Xs)d|k|s. (128)

Proof. The proof is identical to Theorem 23.

E Modelling geospatial data within non-convex boundaries

To demonstrate the ability of the proposed method to model distributions whose support is restricted to manifolds with highly non-convex boundaries, we derived a geospatial dataset based on the historical wildfire incidence rate within the continental United States (described in in Appendix E.1) and, using the corresponding country borders, trained a constrained diffusion model by adapting the point-in-spherical-polytope conditions outlined in (Ketzner et al., 2022) (described in Appendix E.2).

E.1 Derivation of bounded geospatial dataset

Specifically, we retrieved the rasterised version of the wildfire data provided by Welty et al. (2020), converted it to a spherical geodetic coordinate system using the CARTOPY library (Met Office, 2010 - 2015), and drew a weighted subsample of size 1 106. We then retrieved the country borders of the continental United States from (Natural Earth, 2023) and mapped them to the same geodetic reference frame as the wildfire data. A visualization of the resulting dataset is presented in Figure 9.

E.2 Point-in-spherical-polytope algorithms

The support of the data-generating distribution we aim to approximate is thus restricted to a highly non-convex spherical polytope P S2 given by the country borders of the continental United States. To determine whether a query point q S2 is within P, we adapt an efficient reformulation of the point-in-spherical-polygon algorithm (Bevis et al., 1989) presented in (Ketzner et al., 2022). The algorithm requires the provision of a reference point r S2 known to be located in P and determines

Figure 9: Orthographic projection of the wildfire dataset described in Appendix E. The projection is aligned with the centroid of the continental United States and zoomed in ten-fold for visual clarity. All visualisations of geospatial data were generated using the GEOVIEWS (Rudiger et al., 2023) and DATASHADER (Bednar et al., 2023) libraries.

whether q is inside or outside the polygon by checking whether the geodesic between r and q crosses the polygon an even or odd number of times. Letting ˆx R3 denote the Cartesian coordinates of a point x S2, (Ketzner et al., 2022) rely on a Cartesian reference coordinate system ˆQ (with its z-axis given by ˆr) and the corresponding spherical coordinate system Q to decompose the edge-crossing condition of Bevis et al. (1989) into two efficiently computable parts. That is, the geodesic between q and r crosses an edge ei = (vi, vj) of the polygon if:

(i) the longitude of q in Q is bounded by the longitudes of vi and vj in Q, i.e.

ϕQ(q) [min(ϕQ(vi), ϕQ(vj)), max(ϕQ(vi), ϕQ(vj))],

(ii) the plane specified by the normal vector ˆpi = ˆvi ˆvj represents an equator that separates q and r into two different hemispheres, i.e.

sign( ˆpi, ˆr ˆpi, ˆq ) = 1.

Especially when P is fixed and the corresponding coordinate transformations and normal vectors can be precomputed for each edge, this algorithm affords an efficient and parallelisable approach to determining whether any given point on S2 is contained by a spherical polytope.

F Supplementary Experimental Results

F.1 Evaluating log-barrier and Euclidean models

Following (Fishman et al., 2023), we approached the empirical evaluation of our Metropolis model by computing the maximum mean discrepancy (MMD) (Gretton et al., 2012) between samples from the true distribution and the trained diffusion models. The MMD is a statistic that quantifies the similarity of two samples by computing the distance of their respective mean embeddings in a reproducing kernel Hilbert space. For this, we use an RBF kernel with the same length scales as the standard deviations of the normal distributions used to generate the synthetic distribution. We sum these RBF kernels by the weights of the corresponding components of the synthetic Gaussian mixture model.

This is essential to be able to include the log-barrier in the comparison since the log-barrier methods suffer severe instabilities around the boundary, as the space is stretched to more and more. These instabilities cause the problems in fitting the log-barrier model and in computing the likelihood using the log-barrier model.

Table 5: Maximum mean discrepancy (MMD) ( ) of a held-out test set from a synthetic bimodal distribution over convex subsets of Rd bounded by the hypercube [ 1, 1]d and unit simplex d. Means and standard deviations are computed over 3 different runs.

Manifold Dimension Process MMD % in Manifold mean std mean

Euclidean 0.027 0.011 0.969 Log-Barrier 0.050 0.012. 1.000 Reflected 0.041 0.008 1.000 Rejection 0.030 0.002 1.000

Euclidean 0.032 0.015 0.969 Log-Barrier 0.238 0.009 1.000 Reflected 0.179 0.013 1.000 Rejection 0.111 0.002 1.000

Euclidean 0.028 0.001 0.946 Log-Barrier 0.275 0.0015 1.000 Reflected 0.233 0.004 1.000 Rejection 0.226 0.005 1.000

Euclidean 0.069 0.004 0.992 Log-Barrier 0.66 0.006 1.000 Reflected 0.048 0.012 1.000 Rejection 0.025 0.005 1.000

Euclidean 0.074 0.004 0.991 Log-Barrier 0.209 0.0077 1.000 Reflected 0.085 0.006 1.000 Rejection 0.049 0.006 1.000

Euclidean 0.086 0.007 0.968 Log-Barrier 0.330 0.004 1.000 Reflected 0.314 0.049 1.000 Rejection 0.138 0.007 1.000

From the results in Table 5, it is clear that the log-barrier approach performs significantly worse than the Reflected model and the Metropolis models across all settings. This, in conjunction with numerical instabilities we encountered when attempting to evaluate sample likelihoods with the log-barrier models as presented in (Fishman et al., 2023), motivated us to focus on the Reflected and Metropolis models in the main text.

Additionally, we note that the unconstrained Euclidean models outperform the constrained methods on both the simplex and the hypercube as the dimensionality of the problem space increases. Especially on the simplex, we attribute this performance primarily to the fact that the synthetic distribution is simply a standard Normal with only a small portion close to the boundary. The amount of reflection needed to model the distribution decreases in higher dimensions, as the mass of the Normal distribution gets increasingly concentrated which Euclidean diffusion models will fit well. This same dynamic is partially responsible for the hypercube performance.

F.2 Implementational details

All source code that is needed to reproduce the results presented below is made available under https://github.com/oxcsml/score-sde/tree/metropolis, which requires a supporting package to handle the different geometries that is available under https://github.com/oxcsml/geomstats/tree/polytope.

We use the same architecture in all of our experiments: a 6-layer MLP with 512 hidden units and sine activation functions, except in the output layer, which uses a linear activation function. Following (Fishman et al., 2023), we implement a simple linear function that scales the score by the distance to the boundary, approaching zero within ϵ = 0.01 of the boundary. This ensures the score obeys the Neumann boundary conditions required by the reflected Brownian Motion. For the geospatial dataset within non-convex country borders, we do not use distance rescaling. Instead, we substitute

it with a series of step functions to rescale the score. This is a proof-of-concept to show that even when computing the distance is hard, simple and efficient approximations suffice. When constructing Riemannian diffusion models on the torus and sphere for the protein and geospatial datasets, we follow (De Bortoli et al., 2022) and include an additional preconditioner for the score on the manifold. We do not use the residual trick or the standard deviation trick, which are both common score-rescaling functions in image model architectures; in our setting, we find that they adversely affect model training.

For the forward/reverse process we always set T = 1, β0 = 1 10 3 and then tune β1 to ensure that the forward process just reaches the invariant distribution with a linear β-schedule. At sampling time we use N = 100 steps of the discretised process. We discretise the training process by selecting a random N between 0 and 100 for each example, rolling out to that time point. This lets us cheaply implement a simple variance reduction technique: we take multiple samples from this trajectory by selecting multiple random N to save for each example. This technique was originally described in (Fishman et al., 2023) and we find it is also helpful for our Metropolis models. For all experiments, we use the ism loss with a modified weighting function of (1 + t), which we found to be essential to model training. All experiments use a batch size of 256 with 8 repeats per batch. For training, we use a learning rate of 2 10 4 with a cosine learning rate schedule. We trained for 100,000 batches on the synthetic examples and 300,000 batches on the real-world examples (robotics, proteins, wildfires).

We selected these hyperparameters from a systematic search over learning rates (6 10 4, 2 10 4, 6 10 5, 2 10 5), learning rate schedules (cosine, log-linear), and batch sizes (128, 256, 512, 1024) on synthetic examples for the reflected and log-barrier models. Similar parameters worked well for both, and we used those for our Metropolis models to allow a straightforward comparison. We tried N = 100, 1000 for several synthetic examples but found that very large rollout times actually hurt performance for the Metropolis model, though the log-barrier performed a bit better with longer rollouts and the reflected was the same.

All models were trained on a single NVIDIA Ge Force GTX 1080 GPU. All of the Metropolis models presented here can easily be trained on this hardware in under 4 hours. The runtime for the log-barrier and reflected models is considerably longer.

F.3 Synthetic Distributions on Constrained Manifolds of Increasing Dimensionality

(a) Data distribution

(b) Metropolis model

(c) Reflected model

(d) Uniform distribution

Figure 10: Qualitiative comparison of samples from the data distribution, our Metropolis model, a Reflected model and the uniform distribution for a synthetic bimodal distribution on [ 1, 1]2.

(a) Data distribution

(b) Metropolis model

(c) Reflected model

(d) Uniform distribution

Figure 11: Qualitiative comparison of samples from the data distribution, our Metropolis model, a Reflected model and the uniform distribution for a synthetic bimodal distribution on 2.

F.4 Constrained Configurational Modelling of Robotic Arms

The following univariate marginal and pairwise bivariate plots visualise the distribution of different samples in

(i) the three dimensions needed to describe an ellipsoid M = l1 l2 l2 l3

(ii) the two dimensions needed to describe a location in R2.

F.4.1 Visualisation of samples from the data distribution

(a) Plots of the univariate marginal and pairwise bivariate distributions of 1 105 samples from the data distribution in S2 ++.

(b) Plots of the univariate marginal and pairwise bivariate distributions of 1 105 samples from the data distribution in R2.

Figure 12: Visualisation of the data distribution in S2 ++ R2 using univariate marginal and pairwise bivariate plots.

F.4.2 Visualisation of samples from our Metropolis sampling-based diffusion model

(a) Plots of the univariate marginal and pairwise bivariate distributions of 1 105 samples from our Metropolis sampling-based diffusion model in S2 ++.

(b) Plots of the univariate marginal and pairwise bivariate distributions of 1 105 samples from our Metropolis sampling-based diffusion model in R2.

Figure 13: Visualisation of the distribution learned by our Metropolis sampling-based diffusion model in S2 ++ R2 using univariate marginal and pairwise bivariate plots.

F.4.3 Visualisation of samples from a reflected Brownian motion-based diffusion model

(a) Plots of the univariate marginal and pairwise bivariate distributions of 1 105 samples from a reflected Brownian motion-based diffusion model in S2 ++.

(b) Plots of the univariate marginal and pairwise bivariate distributions of 1 105 samples from a reflected Brownian motion-based diffusion model in R2.

Figure 14: Visualisation of the distribution learned by a reflected Brownian motion-based diffusion model in S2 ++ R2 using univariate marginal and pairwise bivariate plots.

F.4.4 Visualisation of samples from the uniform distribution

(a) Plots of the univariate marginal and pairwise bivariate distributions of 1 105 samples from the uniform distribution in S2 ++.

(b) Plots of the univariate marginal and pairwise bivariate distributions of 1 105 samples from the uniform distribution in R2.

Figure 15: Visualisation of the uniform distribution in S2 ++ R2 using univariate marginal and pairwise bivariate plots.

F.5 Conformational Modelling of Protein Backbones

The following univariate marginal and pairwise bivariate plots visualise the distribution of different samples in (i) the polytope P R3 and (ii) the torus T4 used to parametrise the conformations of a polypeptide chain of length N = 6 with coinciding endpoints. We refer to (Han et al., 2006) for full detail on the reparametrisation and to (Fishman et al., 2023) for a full description of the dataset.

F.5.1 Visualisation of samples from the data distribution

(a) Plots of the univariate marginal and pairwise bivariate distributions of 1 105 samples from the data distribution in P R3.

(b) Plots of the univariate marginal and pairwise bivariate distributions of 1 105 samples from the data distribution in T4.

Figure 16: Visualisation of the data distribution in P R3 T4 using univariate marginal and pairwise bivariate plots.

F.5.2 Visualisation of samples from our Metropolis sampling-based diffusion model

(a) Plots of the univariate marginal and pairwise bivariate distributions of 1 105 samples from our Metropolis model in P R3.

(b) Plots of the univariate marginal and pairwise bivariate distributions of 1 105 samples from our Metropolis model in T4.

Figure 17: Visualisation of the distribution learned by our Metropolis model in P R3 T4 using univariate marginal and pairwise bivariate plots.

F.5.3 Visualisation of samples from a reflected Brownian motion-based diffusion model

(a) Plots of the univariate marginal and pairwise bivariate distributions of 1 105 samples from a reflected Brownian motion-based diffusion model in P R3.

(b) Plots of the univariate marginal and pairwise bivariate distributions of 1 105 samples from a reflected Brownian motion-based diffusion model in T4.

Figure 18: Visualisation of the distribution learned by a reflected Brownian motion-based diffusion model in P R3 T4 using univariate marginal and pairwise bivariate plots.

F.5.4 Visualisation of samples from the uniform distribution

(a) Plots of the univariate marginal and pairwise bivariate distributions of 1 105 samples from the uniform distribution in P R3.

(b) Plots of the univariate marginal and pairwise bivariate distributions of 1 105 samples from the uniform distribution in T4.

Figure 19: Visualisation of the uniform distribution in P R3 T4 using univariate marginal and pairwise bivariate plots.