Journal of Machine Learning Research 26 (2025) 1-94. Submitted 1/23; Revised 9/25; Published 10/25

Piecewise deterministic sampling with splitting schemes

Andrea Bertazzi (andreabertazzi@duck.com)
Centre de mathématiques appliquées, École Polytechnique

Paul Dobson (p.dobson 1@hw.ac.uk)
Department of Mathematics and Computer Science, Heriot-Watt University and Maxwell Institute for Mathematical Sciences

Pierre Monmarché (pierre.monmarche@sorbonne-universite.fr)
Laboratoire Jacques-Louis Lions and Laboratoire de Chimie Théorique, Sorbonne Université

Editor: Anthony Lee

Abstract

We introduce Markov chain Monte Carlo (MCMC) algorithms based on numerical approximations of piecewise-deterministic Markov processes obtained with the framework of splitting schemes. We present unadjusted as well as adjusted algorithms, for which the asymptotic bias due to the discretisation error is removed by applying a non-reversible Metropolis-Hastings filter. In a general framework we demonstrate that the unadjusted schemes have weak error of second order in the step size, while typically maintaining a computational cost of only one gradient evaluation of the negative log-target function per iteration. Focusing then on unadjusted schemes based on the Bouncy Particle and Zig-Zag samplers, we provide conditions ensuring geometric ergodicity and consider the expansion of the invariant measure in terms of the step size. We analyse the dependence of the leading term in this expansion on the refreshment rate and on the structure of the splitting scheme, giving a guideline on which structure is best. Finally, we illustrate promising results for our samplers with numerical experiments on a Bayesian imaging inverse problem and a system of interacting particles.

Keywords: MCMC algorithms, piecewise deterministic Markov processes, splitting schemes, non-reversible samplers, subsampling

1.
Introduction

Piecewise deterministic Markov processes (PDMPs) are non-diffusive Markov processes combining a deterministic motion and random jumps. They appear in a wide range of modelling problems (Cloez, Bertrand et al., 2017; Lemaire et al., 2020; Löcherbach and Monmarché, 2022) and, over the last decade, have gained considerable interest as Markov chain Monte Carlo (MCMC) methods (Peters and De With, 2012; Monmarché, 2016; Bierkens and Roberts, 2017; Bouchard-Côté et al., 2018; Durmus et al., 2020; Vanetti et al., 2017). Their dynamics can be described by their infinitesimal generator, which is of the form

Lf(z) = ⟨Φ(z), ∇_z f(z)⟩ + λ(z) ∫_E (f(y) − f(z)) Q(z, dy),   (1)

where E is the state space and, in this work, Φ is a smooth and globally Lipschitz vector field, λ : E → [0, ∞) is a continuous function and Q is a probability kernel. The associated process follows the ordinary differential equation (ODE) ż = Φ(z) and, at rate λ(z), jumps to a new position distributed according to Q(z, ·). We refer to Davis (1984) and Durmus et al. (2021) for general considerations on PDMPs. We denote the deterministic dynamics by ϕ_t, the integral curve of Φ, that is the solution to

(d/dt) ϕ_t(z) = Φ(ϕ_t(z)),   ϕ_0(z) = z,   for all t ≥ 0, z ∈ E,

which exists since Φ is globally Lipschitz. We also assume that ϕ_t leaves E invariant. For T ∼ Exp(1), the random time of the next jump, τ, is given by

τ = inf{ t > 0 : ∫_0^t λ(ϕ_s(z)) ds ≥ T }.   (2)

This work addresses the question of the simulation of a PDMP with generator (1). The classical method is to use a Poisson thinning procedure (Lewis and Shedler, 1978; Lemaire et al., 2017) to sample the jump times, and then to solve the ODE exactly if possible, or otherwise by a standard numerical scheme.

©2025 Andrea Bertazzi, Paul Dobson, and Pierre Monmarché. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v26/23-0036.html.
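As context for what follows, the thinning procedure just mentioned can be sketched in a few lines. This is a minimal sketch, not the paper's implementation: the name `thinning_jump_time` and its arguments are illustrative, and we assume a constant upper bound `lam_bar` on the rate along the flow is available.

```python
import random

def thinning_jump_time(rate_along_flow, lam_bar, t_max, rng=random):
    """Sample the first jump time of an inhomogeneous Poisson process with
    intensity t -> rate_along_flow(t) using a constant bound lam_bar
    (Lewis-Shedler thinning).  Returns t_max if no jump occurs before t_max."""
    t = 0.0
    while t < t_max:
        # propose a candidate event from a homogeneous process of rate lam_bar
        t += rng.expovariate(lam_bar)
        if t >= t_max:
            break
        # accept the candidate with probability lambda(t) / lam_bar
        if rng.random() < rate_along_flow(t) / lam_bar:
            return t
    return t_max
```

The efficiency of this loop is exactly what degrades when `lam_bar` is a crude bound: most candidates are rejected, which is the situation this paper targets.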
Similar to rejection sampling, which requires a good reference measure, an efficient Poisson thinning algorithm requires knowledge of good bounds for the jump rate λ along the trajectory of the ODE. In this work, we focus on the case in which such bounds are not available, or are so crude that thinning would not be numerically efficient. In Section 6.1 we give a concrete example of the latter situation, showing how in a high-dimensional setting the Poisson thinning approach makes the exact simulation of a PDMP prohibitive even when the negative log-target distribution is gradient Lipschitz (see Equation (31) for more details on the bounds, which in the considered case have efficiency that decreases polynomially with the dimension of the process). In this setting, the random event times have to be approximated even if the ODE can be solved exactly. This question has recently been addressed in Bertazzi et al. (2022), Pagani et al. (2024), and Corbella et al. (2022) with three different schemes. In this paper we define approximations of PDMPs by taking advantage of the core ideas behind splitting schemes, which are widely used and studied for other dynamical systems such as Hamiltonian or underdamped Langevin processes (Leimkuhler and Matthews, 2012; Leimkuhler et al., 2016; Monmarché, 2021), but which have not been considered in the context of PDMPs before. Following the principle of splitting a PDMP into its elementary components, we obtain novel MCMC algorithms which, as we shall prove, have a numerical error of order 2 in the step size. Moreover, the framework is flexible, and such schemes can easily be combined with multi-time-step or factorization methods (Leimkuhler et al., 2013) or integrated in hybrid PDMP/diffusion schemes (Monmarché et al., 2020; Monmarché, 2020).
Note that, by using a numerical approximation, we lose one of the appeals of PDMPs for MCMC purposes, namely exact simulation by thinning: in our case the invariant measure of the scheme has a deterministic bias with respect to the true target measure. However, we still benefit from the good long-time convergence properties of the ballistic non-reversible process and, contrary to Hamiltonian-based dynamics, it is still possible to factorize the target measure and define schemes that are efficient in terms of the number of force computations while using a single step size (see Monmarché et al. (2020); Monmarché (2020) and Section 6.2). We shall also show how the correct stationary distribution can be recovered by means of a non-reversible Metropolis-Hastings acceptance/rejection step (see Section 1.2). Moreover, for classical velocity jump processes used in MCMC, since the norm of the velocity is constant (between possible refreshments, which are independent of the potential), these schemes are numerically stable (see the numerical experiments in Section 6, where the step size of PDMP schemes can be taken larger than for the classical ULA), even for non-globally Lipschitz potentials.

The core idea of splitting schemes is first to split the generator into several parts such that a process associated to each part can be simulated exactly. For instance, when the ODE can be solved exactly, one can write L = L_D + L_J with

L_D f(z) = ⟨Φ(z), ∇_z f(z)⟩,   L_J f(z) = λ(z) ∫_E (f(y) − f(z)) Q(z, dy),

in which case the process associated to L_D is simply the solution of the ODE, hence D stands for drift, while the process associated to L_J is a continuous-time Markov chain, for which the jump rate is constant between two jumps (so that the jump times are simple exponential random variables), hence J stands for jumps.
Then, one approximates the semigroup of the true process by a Strang splitting

P_δ = e^{δ(L_D + L_J)} ≈ e^{(δ/2) L_D} e^{δ L_J} e^{(δ/2) L_D}   (3)

for a small step size δ > 0. Therefore, over one time step the approximation follows L_D for time δ/2, then L_J for time δ, and finally L_D again for time δ/2. Given a step size δ, we now illustrate how the (n+1)-th iteration works. Starting at time t_n = nδ at state Z_{t_n}, the process first moves deterministically for a half step: Z_{t_n+δ/2} = ϕ_{δ/2}(Z_{t_n}). Then we simulate the pure jump part of the process: we generate an event time τ_1 ∼ Exp(λ(Z_{t_n+δ/2})) and, if τ_1 < δ, we draw Z_{t_n+δ/2} ∼ Q(Z_{t_n+δ/2}, ·). Then we repeat this step as long as Σ_i τ_i < δ, though, since we are interested in second order schemes, it is enough to limit ourselves to two jumps per time step. Note that the rate is updated after every jump and is constant between jumps. We conclude the iteration with a final half step of deterministic motion: Z_{t_{n+1}} = ϕ_{δ/2}(Z_{t_n+δ/2}). We refer to this scheme as the splitting scheme DJD, where, consistently with the above, D stands for drift and J for jumps.

When the ODE cannot be solved exactly, any second-order numerical scheme can be used instead of ϕ_t. Moreover, in some cases (typically for the Hamiltonian dynamics) the generator L_D can be further divided into several ODEs. Similarly, for computational purposes, it can be interesting in some cases to split the jump part L_J into several operators. It is also possible to keep in L_D a combination of ODE and jump, simulated e.g. by thinning, while some parts of the jump are treated separately in L_J (this could make sense for instance in the context of Monmarché et al. (2020)). When there are more than two parts in the splitting of L, a scheme is obtained by starting from (3) and using e.g. e^{δ L_J} ≈ e^{(δ/2) L_A} e^{δ L_B} e^{(δ/2) L_A} if L_J = L_A + L_B, etc. Such splitting schemes can be used to simulate any PDMP.
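Assuming the flow map, the jump rate, and the jump kernel can each be evaluated, one DJD iteration as described above can be sketched as follows (a minimal sketch with illustrative names, not the paper's reference implementation):

```python
import random

def djd_step(z, delta, flow, lam, sample_jump, rng=random):
    """One DJD iteration: half step of the flow, jump part run for time delta
    with the rate held constant between jumps (at most two jumps, as in the
    text), then a final half step of the flow."""
    z = flow(z, delta / 2.0)          # D: deterministic half step
    t_left = delta
    for _ in range(2):                # J: at most two jumps per time step
        rate = lam(z)
        if rate <= 0.0:
            break
        tau = rng.expovariate(rate)   # exponential clock at the current rate
        if tau >= t_left:
            break
        z = sample_jump(z, rng)       # draw the post-jump state from Q(z, .)
        t_left -= tau                 # rate is refreshed on the next loop pass
    return flow(z, delta / 2.0)       # D: final half step
```

With `flow(z, t) = z + t` on the real line and a zero rate, a step of size δ simply translates the state by δ, which is a quick sanity check of the D-J-D ordering.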
For some modelling problems, it can be interesting to have estimates on the trajectorial error between the approximated process and the true process, for instance when dynamical properties (like mean squared displacement or transition rates) are of interest. However, in this work we mainly have in mind the PDMPs which are used for MCMC methods; in particular, our recurrent examples will be the Zig-Zag sampler (ZZS) (Bierkens and Roberts, 2017; Bierkens et al., 2019a) and the Bouncy Particle Sampler (BPS) (Peters and De With, 2012; Monmarché, 2016; Bouchard-Côté et al., 2018). As a consequence, we will not discuss trajectorial errors but rather focus on what is relevant for MCMC purposes, namely the long-time convergence of the Markov chain (which should scale properly as the step size vanishes) and the numerical bias on the invariant measure and on empirical averages of the chain.

Main contributions of the paper. The main contributions of this paper are the following:

- We introduce a novel approach to approximate PDMPs based on splitting schemes, an idea which had not been previously considered for processes of this type and which, as we prove in Theorem 6, has the key advantage of giving an approximation of second order at the cost of one gradient evaluation per iteration.

- We define an unbiased version of our splitting schemes by introducing a non-reversible Metropolis adjustment based on the skew detailed balance condition, thus giving a way to eliminate the discretisation error. For these adjusted algorithms we characterise the average rejection rate.

- We prove geometric convergence of the law of the unadjusted splitting schemes to the unique invariant measure and we carefully characterise the dependence of the rate of convergence on the step size (these are Theorems 8 and 10).
- We study the asymptotic bias in the invariant measure of the unadjusted schemes and determine which structure of splitting scheme performs best and is most robust to poor choices of the refreshment rate, an important tuning parameter of our algorithms.

- We demonstrate the advantages of our algorithm based on ZZS on sampling problems in Bayesian imaging and molecular dynamics. In particular, in the imaging context our algorithm gives faster uncertainty quantification compared to the unadjusted Langevin algorithm thanks to its better stability in the step size. In the molecular dynamics setting, we show how to decompose the pairwise interactions between the N particles to reduce the cost of an iteration of our algorithm to O(N), compared to the O(N²) of the Hamiltonian Monte Carlo algorithm.

Organisation of the paper. The article is organised as follows. We conclude this introduction by presenting the algorithms we focus on in this paper. In Section 1.1 we discuss our two main examples and their approximation with splitting schemes. In Sections 1.2 and 1.3 we discuss respectively how we can Metropolis-adjust our schemes in a non-reversible fashion and how we can modify the algorithms to do subsampling. We conclude our introduction with Section 1.4, where we describe how boundaries can be treated with our splitting schemes. Section 2 is devoted to the analysis of the weak error for the finite-time empirical averages of the scheme DJD. The main result, Theorem 6, states that for this scheme the weak error is of order 2 in the step size. The geometric ergodicity of splitting schemes based on our main examples is established in Section 3, with a consistent dependency of the estimates on the step size.
In Section 4, we provide a formal expansion (in terms of the step size) of the invariant measure of the schemes depending on the choice of the splitting, in the spirit of Leimkuhler and Matthews (2012), with a particular focus in Section 4.2 on three one-dimensional examples where everything can be made explicit. In Section 5 we study the average rejection rate of our adjusted schemes, then verify our theoretical results with numerical simulations on two Gaussian distributions. Numerical experiments for applications in Bayesian imaging and molecular dynamics are provided in Section 6. Finally, technical proofs are gathered in an Appendix.

Comparison to related works

Comparison to PDMP based approaches. The work in this paper can be seen as a continuation of the work that two of the authors started with their coauthors in Bertazzi et al. (2022), in which a general framework to approximate PDMPs is introduced and studied. In this previous work, the focus is not a specific scheme and thus the results are mostly general and not tailored to particular processes or schemes, though the ZZS and BPS are considered as recurrent examples. In particular, the schemes introduced in Bertazzi et al. (2022) leave considerable freedom to the user in the choice of some crucial components of the algorithm, namely an approximation of the switching rates or a numerical integrator in place of the exact flow map. On the other hand, in this paper we follow the philosophy of splitting schemes to describe a simple recipe to approximate PDMPs, an approach that was not considered in Bertazzi et al. (2022). The main advantage of splitting schemes is the second order of accuracy with one gradient evaluation per iteration, whereas second order algorithms considered in Bertazzi et al. (2022) relied on second order approximations of the switching rates, which can usually be obtained only with the expensive computation of the Hessian of the negative log-target.
Moreover, in this work we describe how to remove the bias introduced by our approximation with a non-reversible Metropolis-Hastings step. Two other works, Pagani et al. (2024) and Corbella et al. (2022), focus on approximate simulation of the Zig-Zag sampler, which is one of our two main examples. In Pagani et al. (2024) the authors suggest approximating event times by using numerical approximations of the integral of the rates along the dynamics (2), as well as a root finding algorithm. In Corbella et al. (2022), the authors suggest using a numerical optimisation algorithm at each iteration to obtain a suitable bound that enables the use of Poisson thinning. The first difference is that we mainly consider our approximations as discrete time Markov chains, whereas the processes of Pagani et al. (2024) and Corbella et al. (2022) are interpreted in continuous time, although neither resulting process is a Markov process due to the nature of the numerical algorithms that are used. Naturally, one could interpret our algorithms as continuous time processes, which again would not be Markov processes. Secondly, without assuming any properties that we do not verify, we derive theoretical justifications for our proposed algorithms, such as bounds on the weak error and existence, uniqueness, and geometric convergence to a stationary distribution. Moreover, we introduce Metropolis adjusted algorithms to eliminate the error introduced by the numerical approximations, while this aspect is not studied in previous works.

Comparison to SDE based approaches. As far as theoretical results are concerned, notice that over the past few years non-asymptotic efficiency bounds for MCMC algorithms like HMC or Langevin-based methods have been obtained, particularly in high-dimensional settings and for specific families of target measures (e.g. Gaussian, log-concave or mean-field models); see for example Gouraud et al. (2025a); Leimkuhler et al.
(2024); Bou-Rabee et al. (2020); Camrud et al. (2023); Cheng et al. (2018); Durmus and Moulines (2016) and references therein. In this paper, we prove a number of theoretical results for our new algorithms. Our theorems are certainly less quantitative and specialised compared to the aforementioned literature, and this is natural for several reasons. First of all, many results known for HMC and Langevin are not yet established even for continuous-time PDMPs. For instance, direct Wasserstein coupling methods are very efficient for ordinary or stochastic differential equations, but more delicate to implement when the process involves non-homogeneous Poisson jumps (see Monmarché (2023, Section 4.3) in this direction for a result on a mean-field ZZS). In particular, for Gaussian targets the HMC and unadjusted Langevin algorithms give Markov chains that are Gaussian, therefore the study boils down to linear algebra and sharp non-asymptotic bounds can be obtained, see for example Gouraud et al. (2025a); Leimkuhler et al. (2024). This is not true for PDMPs. In the recent Bierkens et al. (2025), an asymptotic study is provided for badly-conditioned Gaussian targets (in a fixed dimension, focusing mainly on the 2-dimensional case) for BPS and ZZS, and in Deligiannidis et al. (2021) the marginals of the BPS with separable target are shown to converge to a randomized HMC process in high dimension. These results are only asymptotic, restricted to very specific targets, and require involved technical proofs. It should be possible to adapt them to splitting schemes of PDMPs, but this requires a study of its own. On the other hand, our theoretical results provide the necessary qualitative convergence guarantees for the algorithms, similar to classical results for Langevin-based algorithms such as Leimkuhler and Matthews (2012) and Talay (1990).

1.1 Main examples

Let us now introduce two examples from the computational statistics literature.
In this setting we have a target probability measure with density π(x) ∝ exp(−ψ(x)) for x ∈ R^d.

Example 1 (Zig-Zag sampler, Bierkens et al. (2019a)) Let E = R^d × {−1, +1}^d. For any z ∈ E, we write z = (x, v) for x ∈ R^d, v ∈ {−1, +1}^d, where x is interpreted as the position of the particle and v denotes the corresponding velocity. The deterministic motion of ZZS is determined by Φ(x, v) = (v, 0)^T, i.e. the particle travels with constant velocity v. For i = 1, ..., d we define the jump rates λ_i(x, v) := (v_i ∂_i ψ(x))_+ + γ_i(x), where γ_i(x) can be any non-negative function and is often chosen to be zero. The corresponding (deterministic) jump kernels are given by Q_i((x, v), (dy, dw)) = δ_{(x, R_i v)}(dy, dw), where δ_z denotes the Dirac delta measure and R_i is the operator that flips the sign of the i-th component of the vector it is applied to, that is R_i v = (v_1, ..., v_{i−1}, −v_i, v_{i+1}, ..., v_d). Hence the i-th component of the velocity is flipped with rate λ_i. The ZZS is described by its generator

Lf(x, v) = ⟨v, ∇_x f(x, v)⟩ + Σ_{i=1}^d λ_i(x, v) [f(x, R_i v) − f(x, v)].   (4)

Simulating the event times with rates of this form is in general a very challenging problem. We can apply the splitting scheme above as follows. For simplicity we consider the process with canonical rates, i.e. γ_i = 0 for all i. Then we can split the generator as

L_D f(x, v) = ⟨v, ∇_x f(x, v)⟩,   L_B f(x, v) = Σ_{i=1}^d λ_i(x, v) [f(x, R_i v) − f(x, v)].

Here we define the scheme DBD, where B stands for bounces.

Algorithm 1: Splitting scheme DBD for ZZS
Input: Number of iterations N, initial condition (x, v), step size δ.
Output: Chain (X_{t_n}, V_{t_n})_{n=0}^N.
Set n = 0, (X_0, V_0) = (x, v);
while n < N do
    Set X_{t_{n+1}} = X_{t_n} + (δ/2) V_{t_n};
    Set V_{t_{n+1}} = V_{t_n};
    for i = 1, ..., d do
        With probability 1 − exp(−δ λ_i(X_{t_{n+1}}, V_{t_{n+1}})) set V_{t_{n+1}} = R_i V_{t_{n+1}};
    end
    Set X_{t_{n+1}} = X_{t_{n+1}} + (δ/2) V_{t_{n+1}};
    Set n = n + 1;
end

Given (X_{t_n}, V_{t_n}), we start with a half step of deterministic motion: X_{t_n+δ/2} = X_{t_n} + (δ/2) V_{t_n}. Then for i = 1, . . .
, d we draw τ_i ∼ Exp(λ_i(X_{t_n+δ/2}, V_{t_n})) independently; these are homogeneous exponential random variables. Then let τ_(1) = min_i τ_i and set

V_{t_{n+1}} = V_{t_n} if τ_(1) > δ,   V_{t_{n+1}} = R_I V_{t_n} if τ_(1) ≤ δ,

where R_I = ∏_{i∈I} R_i, I is the set of indices i for which τ_i ≤ δ, and R_I v = v when I is the empty set. Observe that for canonical rates flipping the sign of a component does not affect the other switching rates, and thus it is not possible to have two flips in the same component when γ_i = 0. Finally, set X_{t_{n+1}} = X_{t_n+δ/2} + (δ/2) V_{t_{n+1}}, which concludes the iteration. The procedure is described in pseudo code form in Algorithm 1. An interesting feature of the algorithm is that the jump part of the chain can be computed in parallel, since in that stage a velocity flip in one component does not affect the other components of the process.

Example 2 (Bouncy Particle Sampler, Bouchard-Côté et al. (2018)) Let E = R^d × R^d, and for any z ∈ E we write z = (x, v) for x ∈ R^d, v ∈ R^d. The deterministic motion is the same as for ZZS: Φ(x, v) = (v, 0)^T. The BPS has two types of random events: reflections and refreshments. These respectively have rates λ_1(x, v) = (⟨v, ∇_x ψ(x)⟩)_+ and λ_2(x, v) = λ_r for λ_r > 0, and corresponding jump kernels

Q_1((x, v), (dy, dw)) = δ_{(x, R(x)v)}(dy, dw),   Q_2((x, v), (dy, dw)) = δ_x(dy) ν(dw),

where ν is a rotation-invariant probability measure on R^d (typically the standard Gaussian measure or the uniform measure on S^{d−1}), and

R(x)v = v − 2 (⟨v, ∇_x ψ(x)⟩ / |∇_x ψ(x)|²) ∇_x ψ(x).

Algorithm 2: Splitting scheme RDBDR for BPS
Input: Number of iterations N, initial condition (x, v), step size δ.
Output: Chain (X_{t_n}, V_{t_n})_{n=0}^N.
Set n = 0, (X_0, V_0) = (x, v);
while n < N do
    Set V_{t_{n+1}} = V_{t_n};
    With probability 1 − exp(−λ_r δ/2) draw V_{t_{n+1}} ∼ Unif(S^{d−1});
    Set X_{t_{n+1}} = X_{t_n} + (δ/2) V_{t_{n+1}};
    With probability 1 − exp(−δ λ_1(X_{t_{n+1}}, V_{t_{n+1}})) set V_{t_{n+1}} = R(X_{t_{n+1}}) V_{t_{n+1}};
    Set X_{t_{n+1}} = X_{t_{n+1}} + (δ/2) V_{t_{n+1}};
    With probability 1 − exp(−λ_r δ/2) set V_{t_{n+1}} ∼ Unif(S^{d−1});
    Set n = n + 1;
end
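Returning briefly to Example 1, one DBD step of Algorithm 1 with canonical rates can be sketched in NumPy as follows (a minimal illustrative sketch, not the authors' reference code; `grad_psi` is assumed to return ∇ψ):

```python
import numpy as np

def zigzag_dbd_step(x, v, delta, grad_psi, rng):
    """One DBD step of the Zig-Zag splitting scheme with canonical rates.
    x : position, shape (d,);  v : velocity with entries +/-1, shape (d,)."""
    x = x + 0.5 * delta * v                   # D: half step of deterministic motion
    lam = np.maximum(v * grad_psi(x), 0.0)    # componentwise rates (v_i d_i psi(x))_+
    # each component flips independently with probability 1 - exp(-delta lam_i)
    flip = rng.random(x.shape[0]) < -np.expm1(-delta * lam)
    v = np.where(flip, -v, v)                 # B: flip the selected components
    x = x + 0.5 * delta * v                   # D: final half step
    return x, v
```

Note that the B stage is vectorised over the d components, mirroring the remark above that the jump part can be computed in parallel.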
The operator R reflects the velocity v off the hyperplane that is tangent to the contour line of ψ passing through the point x. Importantly, the norm of the velocity is unchanged by the application of R, and this corresponds to an elastic collision of the particle on the hyperplane. The BPS has generator

Lf(x, v) = ⟨v, ∇_x f(x, v)⟩ + λ_1(x, v) [f(x, R(x)v) − f(x, v)] + λ_2 ∫ (f(x, w) − f(x, v)) ν(dw).

In this case we split the generator into three parts:

L_D f(x, v) = ⟨v, ∇_x f(x, v)⟩,
L_B f(x, v) = λ_1(x, v) [f(x, R(x)v) − f(x, v)],
L_R f(x, v) = λ_2 ∫ (f(x, w) − f(x, v)) ν(dw).

We then define the scheme RDBDR, where R stands for refreshments. Starting at time t_n = nδ at state (X_{t_n}, V_{t_n}), we begin by drawing τ_1 ∼ Exp(λ_r) and setting

V_{t_n+δ/2} = V_{t_n} if τ_1 > δ/2,   V_{t_n+δ/2} = W_1 if τ_1 ≤ δ/2,

for W_1 ∼ ν. Then the process evolves deterministically for time δ/2: X_{t_n+δ/2} = X_{t_n} + (δ/2) V_{t_n+δ/2}. At this point, we check whether a reflection takes place by drawing τ_2 ∼ Exp(λ_1(X_{t_n+δ/2}, V_{t_n+δ/2})) and setting

V_{t_{n+1}} = V_{t_n+δ/2} if τ_2 > δ,   V_{t_{n+1}} = R(X_{t_n+δ/2}) V_{t_n+δ/2} if τ_2 ≤ δ.

Importantly, λ_1(X_{t_n+δ/2}, R(X_{t_n+δ/2}) V_{t_n+δ/2}) = 0 if a reflection takes place, and thus at most one reflection can happen. This is a consequence of the fact that ⟨R(x)v, ∇ψ(x)⟩ = −⟨v, ∇ψ(x)⟩ by definition of the reflection operator. After this we set X_{t_{n+1}} = X_{t_n+δ/2} + (δ/2) V_{t_{n+1}}, and finally conclude the iteration by drawing τ_3 ∼ Exp(λ_r) and letting V_{t_{n+1}} be unchanged if τ_3 > δ/2 and V_{t_{n+1}} = W_2 if τ_3 ≤ δ/2, where W_2 ∼ ν. The pseudo code can be found in Algorithm 2.

1.2 Metropolis adjusted algorithms

Naturally, the use of splitting schemes to approximate a PDMP introduces a discretisation error. In this section we discuss how to eliminate this bias with the addition of a Metropolis-Hastings (MH) acceptance-rejection step. In Section 1.2.1 we describe the general procedure, which is a non-reversible MH algorithm, and then apply this to ZZS and BPS. Similarly, this can be applied to other kinetic PDMPs used in MCMC.
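Before turning to the adjusted versions, the unadjusted RDBDR step of Algorithm 2 can be sketched as follows. This is a hedged sketch under the assumption ν = Unif(S^{d−1}); the function names are illustrative and not taken from the paper.

```python
import numpy as np

def bps_rdbdr_step(x, v, delta, grad_psi, lam_r, rng):
    """One RDBDR step of the BPS splitting scheme, refreshing from the
    uniform distribution on the unit sphere S^{d-1}."""
    def refresh(v, half):
        # refresh with probability 1 - exp(-lam_r * delta/2)
        if rng.random() < -np.expm1(-lam_r * half):
            w = rng.standard_normal(v.shape[0])
            return w / np.linalg.norm(w)      # uniform direction on the sphere
        return v

    v = refresh(v, delta / 2.0)                # R: first half refreshment
    x = x + 0.5 * delta * v                    # D: half step
    g = grad_psi(x)
    lam1 = max(float(v @ g), 0.0)              # reflection rate (<v, grad psi>)_+
    if rng.random() < -np.expm1(-delta * lam1):
        v = v - 2.0 * (v @ g) / (g @ g) * g    # B: elastic reflection, norm preserved
    x = x + 0.5 * delta * v                    # D: final half step
    return x, refresh(v, delta / 2.0)          # R: second half refreshment
```

Since both the refreshment and the reflection map the velocity to a unit vector, the speed stays constant along the chain, which is the stability property emphasised in the introduction.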
1.2.1 Non-reversible Metropolis-Hastings

The classical MH algorithm builds a µ-invariant Markov chain P by enforcing detailed balance (DB): for all x, y it holds that µ(dx) P(x, dy) = µ(dy) P(y, dx). The chain is then said to be reversible. PDMPs such as BPS and ZZS do not satisfy DB and are said to be non-reversible. Since this property can lead to a faster converging process (see e.g. Diaconis et al. (2000)), it is reasonable here to Metropolise our splitting schemes in a non-reversible fashion. Moreover, as we shall see below, for our chains based on splitting schemes of PDMPs it is not possible to use the standard MH framework, as in general the chain cannot go back to the previous state. For this reason, we rely on a different balance equation known as skew detailed balance: considering a chain for which the state can be decomposed as z = (x, v), for all x, v, y, w,

µ(dx, dv) P((x, v), (dy, dw)) = µ(dy, −dw) P((y, −w), (dx, −dv)).   (5)

Integrating both sides with respect to x and v we can see that µ is a stationary measure for the chain P. This condition is at the basis of the classical non-reversible HMC algorithm of Horowitz (1991) and was considered in several works on the lifting approach, as for instance Turitsyn et al. (2011); Vucelja (2014); Hukushima and Sakai (2013); Michel et al. (2014). More generally, skew detailed balance holds when we compose a reversible kernel with a measure preserving involution (see e.g. Neklyudov et al. (2020) or Thin et al. (2020)), which in our case is the operator that flips the sign of the velocity vector. Here we wish to construct skew-reversible Markov chains P by Metropolising kernels Q which are unadjusted splitting schemes of BPS and ZZS.
Because we only need to adjust the DBD part, it is sufficient to consider kernels Q of the form

Q((x, v), (y, w)) = Σ_{j=1}^n p_j(x, v) 1_{(y, w) = F_j(x, v)},

where p_j is the probability of applying the operator F_j, Σ_{j=1}^n p_j(x, v) = 1 for all (x, v) ∈ E, and finally F_j : E → E are volume preserving maps. Note that this is a more general setting than that of the HMC algorithm, which corresponds to the case n = 1 with F_1 being a splitting scheme for the Hamiltonian dynamics. For Q as above, skew-DB holds as long as the move from (x, v) to (y, w) = F_j(x, v) is accepted with probability

α((x, v), (y, w)) = 1 ∧ (µ(y, −w) p_j(y, −w)) / (µ(x, v) p_j(x, v)).   (6)

If the proposal (y, w) is rejected, the new state of the chain becomes (x, −v), in which case (5) is trivially satisfied.

1.2.2 Non-reversible Metropolis adjusted ZZS

Taking advantage of the skew-reversible Metropolis-Hastings framework described above, we now define an exact version of the splitting DBD of ZZS. Recall that the splitting DBD of Example 1 proposes moves from (x, v) to states of the form (X̄, V̄) = (x + (δ/2)(v + R_I v), R_I v). As we shall motivate below, in this case the acceptance probability (6) becomes

α((x, v), (X̄, V̄)) = 1 ∧ exp( ψ(x) − ψ(X̄) + δ Σ_{i∉I} v_i ∂_i ψ(x + vδ/2) ).   (7)

In case of rejection the state is set to (x, −v).

Algorithm 3: Non-reversible Metropolis adjusted ZZS
Input: Number of iterations N, initial condition (x, v), step size δ.
Output: Chain (X_{t_n}, V_{t_n})_{n=0}^N.
Set n = 0, (X_0, V_0) = (x, v);
while n < N do
    Set X_{t_n+δ/2} = X_{t_n} + (δ/2) V_{t_n};
    Set V̄ = V_{t_n};
    for i = 1, ..., d do
        With probability 1 − exp(−δ λ_i(X_{t_n+δ/2}, V̄)) set V̄ = R_i V̄;
    end
    Set X̄ = X_{t_n+δ/2} + (δ/2) V̄;
    Set (X_{t_{n+1}}, V_{t_{n+1}}) = (X̄, V̄) with probability
        1 ∧ (π(X̄)/π(X_{t_n})) exp( δ Σ_{j=1}^d [λ_j(X_{t_n+δ/2}, V_{t_n}) − λ_j(X_{t_n+δ/2}, −V̄)] ),
    else set (X_{t_{n+1}}, V_{t_{n+1}}) = (X_{t_n}, −V_{t_n});
    Set n = n + 1;
end

The pseudo-code for the resulting adjusted scheme is shown in Algorithm 3, where an equivalent expression for the acceptance probability is used. Derivation of (7).
Let x_{1/2}(x, v) := x + vδ/2. After one iteration the algorithm proposes the state (X̄, V̄) = (x_{1/2} + R_I v δ/2, R_I v) with probability

∏_{i∉I} exp(−δ λ_i(x_{1/2}, v)) ∏_{i∈I} (1 − exp(−δ λ_i(x_{1/2}, v))).   (8)

The classical MH scheme is not directly applicable, as in general the probability that the process goes from (X̄, V̄) to (x, v) is 0. Hence we enforce skew-DB by first computing the probability that the chain goes from (X̄, −V̄) to (x, −v). This can only be achieved by following the same path as (x, v) → (X̄, V̄) backwards, hence flipping the sign of the velocity components in I. Noticing that x_{1/2}(x, v) = x_{1/2}(X̄, −V̄), we find that the probability of this path is the same as (8) but with the terms λ_i(x_{1/2}, v) substituted by λ_i(x_{1/2}, −V̄). Observe that for i ∈ I it holds that (−V̄)_i = v_i, and thus λ_i(x_{1/2}, v) = λ_i(x_{1/2}, −V̄), while for i ∉ I we have (−V̄)_i = −v_i, and hence λ_i(x_{1/2}, v) − λ_i(x_{1/2}, −V̄) = v_i ∂_i ψ(x_{1/2}). Therefore, applying (6), we find that the acceptance probability of the state (X̄, V̄) is (7).

1.2.3 Non-reversible Metropolis adjusted BPS

Here we consider the scheme RDBDR of BPS and derive the appropriate acceptance probability (6). The resulting procedure is written in pseudo code form in Algorithm 4.

Algorithm 4: Non-reversible Metropolis adjusted BPS
Input: Number of iterations N, initial condition (x, v), step size δ.
Output: Chain (X_{t_n}, V_{t_n})_{n=0}^N.
Set n = 0, (X_0, V_0) = (x, v);
while n < N do
    Set V_{t_n+δ/2} = V_{t_n};
    With probability 1 − exp(−λ_r δ/2) draw V_{t_n+δ/2} ∼ Unif(S^{d−1});
    Set X_{t_n+δ/2} = X_{t_n} + (δ/2) V_{t_n+δ/2};
    Set V̄ = V_{t_n+δ/2};
    With probability 1 − exp(−δ λ_1(X_{t_n+δ/2}, V̄)) set V̄ = R(X_{t_n+δ/2}) V̄;
    Set X̄ = X_{t_n+δ/2} + (δ/2) V̄;
    Set (X_{t_{n+1}}, V_{t_{n+1}}) = (X̄, V̄) with probability
        1 ∧ (π(X̄) exp(−δ λ_1(X_{t_n+δ/2}, −V̄))) / (π(X_{t_n}) exp(−δ λ_1(X_{t_n+δ/2}, V_{t_n+δ/2}))),
    else set (X_{t_{n+1}}, V_{t_{n+1}}) = (X_{t_n}, −V_{t_n+δ/2});
    With probability 1 − exp(−λ_r δ/2) set V_{t_{n+1}} ∼ Unif(S^{d−1});
    Set n = n + 1;
end

Derivation of Algorithm 4.
First observe that the refreshment steps ensure irreducibility but do not alter the stationary distribution of the process; thus we focus on the DBD part. Recall the notation x_{1/2}(x, v) = x + δv/2. According to DBD, the process moves from an initial condition (x, v) to

(X̄, V̄) = (x_{1/2} + (δ/2) R(x_{1/2})v, R(x_{1/2})v) with probability 1 − exp(−δ λ(x_{1/2}, v)),
(X̄, V̄) = (x + δv, v) with probability exp(−δ λ(x_{1/2}, v)).   (9)

Observe that for both states in (9) it holds that x_{1/2}(x, v) = x_{1/2}(X̄, −V̄) = x_{1/2}. We now compute the acceptance probability (6) in each of the two cases. Consider first the case in which a reflection took place, which corresponds to the first line of (9). The process goes from (X̄, −V̄) back to (x, −v) with the same probability with which the process has a reflection at x_{1/2}. Recall that, by definition of λ, it holds that λ(x_{1/2}, v) = λ(x_{1/2}, −R(x_{1/2})v), and therefore the probability that the process goes from (X̄, −V̄) to (x, −v) is the same as that of going from (x, v) to (X̄, V̄). This gives that the acceptance probability (6) is

1 ∧ π(x_{1/2} + (δ/2) R(x_{1/2})v) / π(x) = 1 ∧ exp( ψ(x) − ψ(x_{1/2} + δ R(x_{1/2})v/2) ).   (10)

Consider now the second case in (9). The probability that the process goes from (x + δv, −v) to (x, −v) is exp(−δ λ(x_{1/2}, −v)), while the probability of going from (x, v) to (x + δv, v) is exp(−δ λ(x_{1/2}, v)). Observing that λ(x_{1/2}, v) − λ(x_{1/2}, −v) = ⟨v, ∇ψ(x_{1/2})⟩, we find that in this case the MH acceptance probability is

1 ∧ (π(x + δv) exp(−δ λ(x_{1/2}, −v))) / (π(x) exp(−δ λ(x_{1/2}, v))) = 1 ∧ exp( ψ(x) − ψ(x + vδ) + δ ⟨v, ∇ψ(x_{1/2})⟩ ).   (11)

Hence we have shown that the unadjusted proposal (X̄, V̄) is accepted with probability

α((x, v), (X̄, V̄)) = 1 ∧ exp( ψ(x) − ψ(X̄) + δ (λ(x_{1/2}, v) − λ(x_{1/2}, −V̄)) ).   (12)

1.3 Algorithms with subsampling

One of the attractive features of ZZS and BPS is exact subsampling, i.e. the possibility, when the potential is of the form ψ(x) = (1/M) Σ_{j=1}^M ψ_j(x), of using only a randomly chosen ψ_j to simulate the next event time.
The typical application of this technique is Bayesian statistics, where π ∝ exp(−ψ) is the posterior distribution, x is the parameter of the chosen statistical model and, when the data points are independent realisations, ψ_j can be chosen to depend only on the j-th batch of data points and not on the rest of the data set. For large data sets, this technique can greatly reduce the computational cost per event time. In essence, this property is a consequence of the fact that the ZZS with non-canonical rates

λ_i(x, v) = (1/M) Σ_{j=1}^M (v_i ∂_i ψ_j(x))_+ for i = 1, . . . , d (13)

is invariant with respect to π ∝ exp(−ψ) (Bierkens et al., 2019a). If the functions t ↦ (v_i ∂_i ψ_j(x + vt))_+ can be bounded independently of j, then one can generate proposals for the next event time using the Poisson thinning technique. A proposal is then accepted by evaluating a single term (v_i ∂_i ψ_J(x))_+ for J picked uniformly at random from {1, . . . , M}. Overall, this procedure has O(1) cost. Bayesian statistics is not the only area where this structure of ψ arises; see e.g. the interacting particle system of Section 6.2, where subsampling corresponds to a splitting of the forces. Algorithm 1 can be modified to allow for subsampling by adapting the B part, that is the simulation of the jumps. Here we describe two approaches to achieve this, and we give the pseudo-codes for the corresponding modified jump part in Appendix A. Whenever (v_i ∂_i ψ_j(x))_+ can be bounded by a constant β independently of j (and v_i), we can take advantage of Poisson thinning similarly to the case of the continuous-time ZZS. In this case, we can simulate the jump part exactly for each component i by the following iterative procedure, repeated until the end of the time step: (i) draw a proposed jump time with rate β, (ii) if the proposed time is smaller than the time left in the current time step, then accept it with probability β^{−1}(v_i ∂_i ψ_J(x))_+ for a random J ∼ Unif({1, . . .
, M}), and flip the corresponding sign of the velocity, which becomes −v_i. A consequence of the non-canonical rates (13) is that for each component multiple jumps can happen within each time step. We will discuss a concrete example where this approach can be followed in Section 6.2. When this procedure is not applicable, one can simply simulate for each component i = 1, . . . , d a jump process with rate (v_i ∂_i ψ_J(x))_+, where J is refreshed after each jump. This approach does not achieve exact simulation of the jump part of ZZS, hence it introduces further numerical inaccuracies. We remark that this is similar to the algorithm discussed in Example 5.7 of Bertazzi et al. (2022). A splitting scheme with subsampling based on BPS can be defined with similar ideas.

1.4 PDMPs with boundaries

Another interesting feature of PDMPs such as BPS and ZZS is that, thanks to the simple deterministic dynamics, boundary conditions can be included and hitting times of the boundary can be easily computed (see Davis (1993) or Chevallier et al. (2024) for a discussion of PDMPs with boundaries). Here we illustrate how to adapt splitting schemes to these settings by adding the boundary behaviour to the D part of the scheme. Boundary terms appear for instance when the target distribution π is defined on a restricted domain (Bierkens et al., 2018). In this case, Algorithms 1 and 2 can be modified by incorporating the boundary term in part D of the splitting scheme, as the boundary can be hit only if there is deterministic motion. Hence, the continuous deterministic dynamics are applied as in the exact process, while the other jumps are performed in the B steps. Another example of this setting is when π is a mixture of a continuous density and a discrete distribution on finitely many states, as in Bayesian variable selection when a spike and slab prior is chosen. Sticky PDMPs were introduced in Bierkens et al.
(2022a) to target a distribution of the form

µ(dx) ∝ exp(−ψ(x)) ∏_{i=1}^d (dx_i + (1/c_i) δ_0(dx_i)),

which assigns strictly positive mass to the events {x_i = 0}. The sticky ZZS of Bierkens et al. (2022a) is obtained by following the usual dynamics of the standard ZZS and, in addition, freezing the i-th component for a time τ ∼ Exp(c_i) when x_i hits zero. The simulation of this process is challenging for the same reasons as for the standard ZZS, since the two processes have the same switching rates λ_i for i = 1, . . . , d. The i-th component is either frozen, which is denoted by (x_i, v_i) ∈ A_i, or it evolves as given by the usual dynamics of ZZS. The generator can then be decomposed as L = L_D + L_B, where L_D = Σ_{i=1}^d L_{D,i} and L_B = Σ_{i=1}^d L_{B,i}, with

L_{D,i} f(x, v) = v_i ∂_{x_i} f(x, v) 1_{A_i^c}(x_i, v_i) + c_i (f(T_i(x, v)) − f(x, v)) 1_{A_i}(x_i, v_i),
L_{B,i} f(x, v) = λ_i(x, v) [f(x, R_i v) − f(x, v)] 1_{A_i^c}(x_i, v_i),

and T_i(x, v) corresponds to unfreezing the i-th component (we refer to Bierkens et al. (2022a) for a detailed description). An iteration of the scheme DBD in this case proceeds by a first half step of D, which is identical to the continuous sticky ZZS but with λ_i temporarily set to 0. Hence frozen components are unfrozen with rate c_i and then start moving again, while unfrozen components move with their corresponding velocity v_i and become frozen for a random time with rate c_i if they hit x_i = 0. Then a full step of the usual bounce kernel B is performed for the components which are not frozen, while for the frozen components, that is (x_i, v_i) ∈ A_i, the generator L_{B,i} does nothing and so the velocity cannot be flipped; unfreezing is not possible in this step either. The iteration ends with another half step of D in a similar fashion to the first one. These ideas are more general than the two specific examples we considered and do not introduce further difficulties for our schemes.
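The D half-step just described (frozen components unfreeze at rate c_i; moving components stick when they hit zero) can be sketched for a single sticky Zig-Zag component. A minimal sketch under our own conventions; the function name, the boolean `frozen` flag and the return signature are assumptions, not the paper's pseudo-code:

```python
import numpy as np

def sticky_zz_d_halfstep(x_i, v_i, frozen, h, c_i, rng):
    """D half-step of length h for one component of the sticky Zig-Zag splitting
    (the switching rates are handled separately in the B part). A frozen component
    unfreezes at rate c_i; a moving component freezes when it hits x_i = 0."""
    t = 0.0
    while t < h:
        if frozen:
            tau = rng.exponential(1.0 / c_i)        # waiting time until unfreezing
            if t + tau >= h:
                return x_i, v_i, True               # still frozen at the end of the step
            t, frozen = t + tau, False
        else:
            # deterministic time to hit zero (infinite if moving away from it)
            t_hit = -x_i / v_i if x_i * v_i < 0 else np.inf
            if t + t_hit >= h:
                return x_i + (h - t) * v_i, v_i, False
            t, x_i, frozen = t + t_hit, 0.0, True   # stick at zero
    return x_i, v_i, frozen
```

Within one half-step several freeze/unfreeze events can occur, which the loop handles by advancing an internal clock until the step budget h is exhausted.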
Finally, notice that a Metropolis correction can be added following Section 1.2, and subsampling is possible following Section 1.3.

2. Convergence of the splitting scheme

In this section we prove that, under suitable conditions, the splitting scheme DJD described in Section 1 is indeed a second order approximation of the original PDMP (1). Note that in this section the PDMP is defined on some arbitrary space E, so it is not immediately clear what it means to take a derivative; indeed, we will typically be interested in the setting E = ℝ^d × V for some set V which may be discrete. Instead of working with a full derivative we define the directional derivative D_Φ in the direction Φ as

D_Φ g(z) = (d/dt) g(ϕ_t(z)) |_{t=0}

for any g ∈ C(E) for which t ↦ g(ϕ_t(z)) is continuously differentiable in t for every z. Note that if E is a subset of ℝ^d for some d and g is continuously differentiable, then D_Φ g(z) = Φ(z)^T ∇g(z). We extend this definition to vector-valued functions G : E → ℝ^m by setting D_Φ G(z) = (D_Φ G_i(z))_{i=1}^m. We define the space C^{k,m}_Φ to be the set of all functions g : E → ℝ which are k times continuously differentiable in the direction Φ with all derivatives D^ℓ_Φ g(z) up to order k bounded by a polynomial of order m. We endow this space with the norm

‖g‖_{C^{k,m}_Φ} := sup_{z∈E} ( |g(z)| + Σ_{ℓ=1}^k |D^ℓ_Φ g(z)| ) / (1 + |z|^m).

Let us make the following assumptions.

Assumption 1 Let Φ be a globally Lipschitz vector field defined on E and assume that D_Φ Φ is well-defined.

Assumption 2 The switching rate λ : E → [0, ∞) is twice continuously differentiable in the direction Φ and λ, D_Φ λ, D²_Φ λ grow at most polynomially. We denote by m_λ a constant such that ‖λ‖_{C^{2,m_λ}_Φ} < ∞.

Assumption 3 Let Q be a probability kernel defined on E. We shall consider the operator Q : C_b(E) → C_b(E) defined by

Qg(z) = ∫ g(z̃) Q(z, dz̃), for any g ∈ C_b(E). (14)

Moreover, we assume that Q has moments of all orders and that Qg has at most polynomial growth of order m whenever g has at most polynomial growth of order m.
For any m ∈ ℕ and g ∈ C^{1,m}_Φ we assume the following distribution is well-defined:

(D_Φ Q) g(z) = D_Φ (Qg)(z). (15)

As an abuse of notation we shall write D_Φ Q also as a kernel. We assume that for any m ∈ ℕ and g ∈ C^{1,m}_Φ

|Qg(ϕ_s(z)) − Qg(z)| ≤ C s (1 + |z|^m) ‖g‖_{C^{1,m}_Φ}, (16)

and also that there exists a constant C such that for any g ∈ C^{2,m}_Φ

|Qg(ϕ_s(z)) − Qg(z) − s D_Φ Qg(z)| ≤ C s² (1 + |z|^m) ‖g‖_{C^{2,m}_Φ}. (17)

Assumption 4 The closure (L, D(L)) of the operator (L, C¹_c(E)) in L²_µ generates a C₀-semigroup P_t. If g ∈ C^{2,0}_Φ then we assume that P_t g is also twice continuously differentiable in the direction Φ and that L P_t g is continuously differentiable in the direction Φ. Moreover, we assume that D²_Φ P_t g and D_Φ L P_t g are both polynomially bounded for finite t and that, for some C > 0, R ∈ ℝ, m_P ∈ ℕ,

|D_Φ P_t g(z)| + |D²_Φ P_t g(z)| + |D_Φ L P_t g(z)| ≤ C (1 + |z|^{m_P}) e^{Rt} ‖g‖_{C^{2,0}_Φ}.

Assumption 5 Let Z̄_{t_k} denote the approximation obtained by the splitting scheme DJD. Assume that for each k, Z̄_{t_k} has moments of all orders and moreover that for every M ∈ ℕ there exists some G_M such that

sup_{m ≤ M} E_z[|Z̄_{t_k}|^m] ≤ G_M(z).

Theorem 6 Let Z_t be a PDMP corresponding to the generator (1). Assume that Assumptions 1 to 5 hold. Then there exist constants C, R such that for any g ∈ C^{2,0}_Φ ∩ D(L) we have, for some M ∈ ℕ,

sup_{k ≤ n} |E[g(Z̄_{t_k})] − E[g(Z_{t_k})]| ≤ C e^{R t_n} G_M(z) δ³ n ‖g‖_{C^{2,0}_Φ}.

Proof The proof follows a similar approach to Bertazzi et al. (2022, Theorem 4.24) and can be found in Appendix B.1.

The theorem shows that splitting schemes of the form DJD give second order approximations of PDMPs under the assumptions we stated. Indeed, the term δ³ n equals δ² t_n, where t_n is the time horizon of the continuous time process, and thus we find a δ² dependence. Compared to Bertazzi et al. (2022), our estimate contains a term that is exponentially increasing in the time horizon. This term could be handled by assuming e.g. geometric convergence of the derivatives of the semigroup for the continuous time PDMP.
To the best of our knowledge, aside from the results in Bertazzi et al. (2022) there are no known results establishing such estimates for PDMPs. The technical nature of our proof also makes the application of this idea challenging, but we see no reason why this approach should not give a uniform in time estimate of the weak error similarly to Bertazzi et al. (2022). Finally, let us comment on the choice of the class of test functions for which our result holds, that is C2,0 Φ D(L). This choice is explained by the fact that it is necessary to consider functions that are twice differentiable in the direction of the deterministic motion to obtain bounds on the error over one time step. For this reason, it cannot be expected to have a result e.g. in Wasserstein distance, which is not suited for (continuous time) PDMPs as we discussed previously. Piecewise deterministic sampling with splitting schemes Example 3 (ZZS continued) Recall the Zig-Zag sampler from Example 1 let us verify the Assumption 1 to 5 in this case. In order to have a smooth switching rate we replace λi(x, v) by λi(x, v) = log (1 + exp(vi iψ(x))) . This is shown to be a valid switching rate in Andrieu and Livingstone (2021). We will assume that ψ C2 with bounded second and third derivatives. Let us now consider each assumption in turn. Assumption 1: In this case Φ(x, v) = (v, 0)T which is smooth and globally Lipschitz. Assumption 2: Since λi is the composition of smooth maps and ψ we have that λi has the same smoothness in x as ψ and hence x 7 λi(x, v) is C2. As s 7 log(1 + es) grows at most linearly, has first and second derivatives bounded by 1 we have that λi, xλi and 2 xλi are all polynomially bounded. Assumption 3: The proof of this can be found in Appendix B.2. Assumption 4: By Andrieu and Livingstone (2021) we have that Pt is a strongly continuous semigroup on L2 µ with generator (L, D(L)) given as the closure of (L, C1 c (E)). Moreover we have that the assumptions of Durmus et al. 
(2021, Theorem 17) are satisfied and hence P_t g(x, v) is differentiable in x. Following the proof of Durmus et al. (2021, Theorem 17) one also has

|∂_x P_t g| ≤ C (1 + |x|^m) e^{Rt} ‖g‖_{C^{1,0}_Φ}.

Note that since D_Φ g(x, v) = v^T ∂_x g, the space C^{k,0}_Φ coincides with the space of continuous functions which are k times continuously differentiable in the variable x. By the same arguments one can also obtain

|∂²_x P_t g| ≤ C (1 + |x|^m) e^{Rt} ‖g‖_{C^{2,0}_Φ}.

Assumption 5: This will be established in Theorem 21 (see Lemma 24), in which we show that the chain satisfies a geometric drift condition for a function that bounds any power.

3. Ergodicity of splitting schemes of BPS and ZZS

We shall now focus on results on the ergodicity of splitting schemes of BPS and ZZS. In particular, we show existence of an invariant distribution, characterise the set of all invariant distributions, and establish convergence of the law of the process to such distributions at a geometric rate. Importantly, we make sure that the geometric convergence has the expected dependence on the step size and that the estimates are stable as δ decreases to 0. The statements can be found in Section 3.1, in Theorem 8 for BPS and Theorem 10 for ZZS respectively. We obtain our theorems by applying the classical theory of Meyn and Tweedie (1993), which is based on minorisation and drift conditions. In Section 3.2 we explain the strategy that we follow to obtain such results.

3.1 Main results

Let us now state more precisely the ergodicity results we shall obtain for splitting schemes of BPS and ZZS. For a given probability distribution µ, we define its V-norm as ‖µ‖_V := sup_{|g| ≤ V} |µ(g)|. We shall show that the chains admit a unique invariant distribution µ_δ and that there exist constants C⋆, κ, δ₀ > 0 and a function V such that for n ≥ 1 and any probability distributions µ, µ′ it holds

‖µ P^n_δ − µ′ P^n_δ‖_V ≤ C⋆ κ^{nδ} ‖µ − µ′‖_V, for all δ ∈ [0, δ₀]. (18)

Note that C⋆, κ, δ₀, V are all independent of δ.
Clearly, taking µ′ = µ_δ we obtain geometric convergence to the invariant distribution of the splitting scheme. For splitting schemes of the BPS, we work under the following condition.

Assumption 7 The dimension is d ≥ 2 and the velocity equilibrium ν is the uniform measure on S^{d−1}. There exists C > 0 such that

C^{−1}|x|² − C ≤ ψ(x) ≤ C|x|² + C, C^{−1}|x| − C ≤ |∇ψ(x)| ≤ C|x| + C for all x ∈ ℝ^d.

Moreover, ‖∇²ψ‖_∞ < ∞ and, without loss of generality, inf ψ = 1.

Notice that, when d = 1, the BPS and the ZZS coincide, in which case we refer to Theorem 10 below. Our result of ergodicity for splitting schemes of the BPS is the following.

Theorem 8 Consider any scheme of the BPS based on the decomposition D, R, B. Under Assumption 7, the chains have a unique stationary distribution and there exist δ₀, C⋆, a > 0, κ ∈ (0, 1) and V : ℝ^d × S^{d−1} → [1, +∞) satisfying, for all x ∈ ℝ^d, v ∈ S^{d−1},

e^{|x|/a}/a ≤ V(x, v) ≤ a e^{a|x|},

for which (18) holds.

Proof The proof can be found in Appendix C.

More care is required for the DBD scheme of the ZZS, since this Markov chain is periodic and not irreducible, which is reminiscent of the discrete-space Zig-Zag chain studied in Monmarché (2020). Let us illustrate this behaviour by considering the one-dimensional setting. Let (x, v) be the initial condition of the process. Since v has magnitude 1, the position component can only vary by multiples of the step size δ. Thus for a fixed initial condition (x, v) the process remains on the grid (x + δℤ) × {−1, 1}. Moreover, after a single step of the scheme there are two possible outcomes: either the velocity does not change, in which case the position moves to x + δv, or the velocity is flipped and the position remains the same. This means that the change in position (in units of δ) plus half the change in velocity changes by ±1 at each step, and hence equals the number of steps of the scheme up to multiples of two, i.e.

(X_{nδ} − x)/δ + (V_{nδ} − v)/2 ∈ n + 2ℤ.
As a consequence, with probability 1 the chain alternates between two disjoint sets, depending on whether n is even or odd. Therefore, the chain is periodic and not ergodic. To overcome this issue, we consider the chain with one-step transition kernel given by P²_δ = P_δ P_δ, i.e. we restrict to the case of an even number of steps. For a given initial condition (x, v) ∈ ℝ^d × {−1, 1}^d, the Markov chain P²_δ lives on the following grid:

D(x, v) := {(y, w) ∈ ℝ^d × {−1, 1}^d : (y_i, w_i) ∈ D₁(x_i, v_i) for all i = 1, . . . , d}, (19)

where D₁(x_i, v_i) := D₊(x_i, v_i) ∪ D₋(x_i, v_i), with

D₊(x_i, v_i) := {(y_i, w_i) : w_i = v_i, y_i = x_i + mδ, m ∈ 2ℤ},
D₋(x_i, v_i) := {(y_i, w_i) : w_i = −v_i, y_i = x_i + mδ, m ∈ 2ℤ + 1}.

Hence the Markov chain P²_δ is aperiodic, though it is not irreducible on ℝ^d × {−1, 1}^d and therefore has (infinitely) many invariant measures which depend on the initial condition (x, v). The ergodicity of P²_δ can nevertheless be characterised since, for a given initial condition (x, v), it is irreducible on D(x, v). Notice that the chain at odd steps lives on the disjoint set {(y, w) : y_i = x_i + m_i δ, m_i ∈ ℤ, w ∈ {−1, 1}^d} \ D(x, v). In this case we show in Theorem 10 that the Markov chain with transition kernel P²_δ is irreducible on D(x, v), has a unique invariant measure, π^{x,v}_δ, and is geometrically ergodic. We can then characterise all the invariant measures of the Markov chain with transition kernel P²_δ defined on ℝ^d × {−1, 1}^d as the closed convex hull of the set {π^{x,v}_δ : x ∈ ℝ^d, v ∈ {−1, 1}^d}. Now consider the Markov chain with transition kernel P²_δ on ℝ^d × {−1, 1}^d. For any initial distribution µ we have convergence of µP^{2n}_δ to some measure π^µ_δ as n tends to ∞, and π^µ_δ is given by

π^µ_δ(ϕ) = (µ π^{x,v}_δ)(ϕ) := ∫_{ℝ^d × {−1,1}^d} ∫ ϕ(y, w) π^{x,v}_δ(dy, dw) µ(dx, dv). (20)

We use the next assumption to verify that Theorem 11 applies for initial conditions drawn from probability distributions with support on D(x, v).
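To make the periodicity and the grid structure (19) concrete, the following sketch simulates the one-dimensional DBD Zig-Zag chain for a standard Gaussian target and checks both the parity relation above and the membership in D(x, v) at even steps. The step function is a minimal reading of the DBD scheme (rate frozen at the midpoint), not the paper's exact pseudo-code:

```python
import numpy as np

def zz_dbd_step(x, v, delta, dpsi, rng):
    """One DBD step of the one-dimensional Zig-Zag splitting: half drift,
    jump part with the rate frozen at the midpoint, half drift."""
    x += 0.5 * delta * v
    lam = max(0.0, v * dpsi(x))
    if rng.random() < -np.expm1(-delta * lam):
        v = -v
    return x + 0.5 * delta * v, v

rng = np.random.default_rng(1)
delta, x0, v0 = 0.1, 0.4, 1
x, v = x0, v0
for n in range(1, 400):
    x, v = zz_dbd_step(x, v, delta, lambda y: y, rng)  # dpsi for a standard Gaussian
    m = round((x - x0) / delta)                        # position offset in units of delta
    # parity: offset plus half the velocity change equals n modulo 2, so the chain
    # alternates between two disjoint components of the grid
    assert (m + (v - v0) // 2 - n) % 2 == 0
    if n % 2 == 0:  # at even steps the chain is in D(x0, v0): w = v0 iff offset even
        assert (v == v0) == (m % 2 == 0)
```

Every run keeps the chain on the grid (x₀ + δℤ) × {−1, 1} and, at even steps, inside D(x₀, v₀), which is exactly why P²_δ is studied on D(x, v).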
Assumption 9 Consider switching rates λi(x, v) = (vi iψ(x))+ + γi(x) for i = 1, . . . , d. ψ C2(Rd) and the following conditions hold: (a) There exists a scalar y 0 such that for all y > y λ(y) := min i=1,...,d min (x,v): xivi [ y,y], |xj| [ y,y] for all j =i λi(x, v) > 0. (b) For |x| R for some R > 0 sup t (0,1) sup y1,y2 B(x,t d), v,w { 1,1}d e(t2(|(v+w)T 2ψ(y1))i|+2t|(w 2ψ(y2))i|)γi(x + vt)etvi iψ(x) γ0 < 1. (c) Denote as B(x, δ d) the ball with centre at x and radius δ lim x sup y1,y2 B(x,δ max{1, 2ψ(y1) } | iψ(y2)| = 0 for all 0 δ δ0, i = 1, . . . , d, where δ0 = 2(1 + γ0) 1, for γ0 as in part (b). Bertazzi, Dobson, and Monmarch e Part (a) in Assumption 9 is inspired by Bierkens and Roberts (2017, Assumption 3) and is used to show that a minorisation condition holds. This condition is either a consequence of properties of the target, or else can be enforced by taking a non-negative excess switching rate, in which case γi(x) can be chosen to be a continuous function γi : Rd (0, ). In principle one could prove a minorisation condition using the techniques of Bierkens et al. (2019b), but this is beyond the scope of this paper. Part (b) is a condition on the decay of the refreshment rate, while Part (c) is similar to Growth Condition 3 in Bierkens et al. (2019b) and is satisfied for instance if ψ is strongly convex with globally Lipschitz gradient. These two conditions are used to show that a drift condition holds. Theorem 10 Consider the splitting scheme DBD for ZZS. Suppose Assumption 9 holds. Then there exist C , δ0 > 0, κ (0, 1) and V : Rd { 1, 1}d [1, ) satisfying for all (x, v) Rd { 1, 1}d , i=1 (1 + 2| iψ(x)|) 1 2 V (x, v) exp(βψ(x)) i=1 (1 + 2| iψ(x)|) 1 2 for all β (0, 1/2) such that, for all δ (0, δ0], the following holds: 1. Fix (x, v) Rd { 1, 1}d and consider P 2 δ = PδPδ, transition kernel on D(x, v). 
Then P²_δ admits a unique invariant distribution and the inequality (18) holds, with P_δ replaced by P²_δ and with these C⋆, κ, V, for any µ, µ′ having support on D(x, v).

2. For any probability measure µ on ℝ^d × {−1, 1}^d with µ(V) < ∞, we have that µP^{2n}_δ converges as n → ∞ to the measure π^µ_δ given by (20), where π^{x,v}_δ is the unique invariant measure of P²_δ on D(x, v), and we have

‖µ P^{2n}_δ − π^µ_δ‖_V ≤ C⋆ κ^{nδ} ∫ ‖δ_{(x,v)} − π^{x,v}_δ‖_V µ(dx, dv). (22)

Proof The proof can be found in Appendix D.1.

Under similar assumptions we establish geometric ergodicity of the schemes DRBRD and RDBDR of ZZS, where the switching rates in the B part are λ_i(x, v) = (v_i ∂_i ψ(x))_+, i.e. the canonical rates, while refreshments in the R part are independent draws from Unif({−1, 1}^d) with rate γ(x) : ℝ^d → [0, ∞). The rigorous statement of this result, Theorem 26, and its proof can be found in Appendix D.2.

3.2 Proof strategy

Let us start by stating the following classical result, due to Meyn and Tweedie (see Meyn and Tweedie (1993) for the original result; the specific statement here is based on Hairer and Mattingly (2011, Theorem 1.2), see also Durmus et al. (2020, Theorem S.7) for the explicit constants).

Theorem 11 Consider a Markov chain with transition kernel P on a set E. Suppose that there exist constants ρ ∈ [0, 1), C, α > 0, a function V : E → [1, +∞) and a probability measure ν on E such that the two following conditions are verified:

1. Drift condition: for all x ∈ E,

PV(x) ≤ ρ V(x) + C. (23)

2. Local Doeblin condition: for all x ∈ E with V(x) ≤ 4C/(1 − ρ),

P(x, ·) ≥ α ν(·).

Then, for all probability measures µ, µ′ on E and all n ∈ ℕ,

‖µ P^n − µ′ P^n‖_V ≤ (C/α) κ^n ‖µ − µ′‖_V, (24)

where κ = max(1 − α/2, (3 + ρ)/4). Moreover, P admits a unique stationary distribution µ⋆ satisfying µ⋆(V) < ∞.

Remark 12 Under the drift condition (23) alone, following the proof of Hairer and Mattingly (2011, Theorem 1.2) in the case α = 0, we get that for all probability measures µ, µ′ on E,

‖µP − µ′P‖_V ≤ (ρ + 2C) ‖µ − µ′‖_V.
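As a numerical sanity check of how the drift condition (23) behaves under iteration with step-size-dependent constants ρ = 1 − bδ and C = Dδ, the accumulated bound stays below a δ-independent limit e^{−bs} V + D/b. The constants b, D, s, V here are arbitrary illustrative values:

```python
import numpy as np

# Iterating P V <= rho V + C with rho = 1 - b*delta, C = D*delta over ceil(s/delta)
# steps: the result is bounded uniformly in delta by exp(-b*s) * V + D / b.
b, D, s, V = 0.7, 2.0, 3.0, 10.0
for delta in (0.5, 0.1, 0.01, 0.001):
    n = int(np.ceil(s / delta))
    rho = 1.0 - b * delta
    iterated = rho ** n * V + D * delta * sum(rho ** k for k in range(n))
    assert iterated <= np.exp(-b * s) * V + D / b + 1e-9
```

The inequality holds because (1 − bδ)^n ≤ e^{−bδn} ≤ e^{−bs} and the geometric series is bounded by 1/(bδ), which is the uniform-in-δ stability used for the splitting schemes.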
We use this inequality in the discussion that follows. We prove geometric ergodicity of our splitting schemes by showing that the assumptions of Theorem 11 are satisfied. More precisely, both in the case of BPS and of ZZS, we obtain a local Doeblin (or minorisation) condition with constant α after n⋆ = ⌈t⋆/δ⌉ steps, where t⋆ > 0 plays the role of physical time and n⋆ is the number of steps needed to travel for an equivalent time. Here t⋆, α are independent of δ. On the other hand, we show that the drift condition holds for one step of the kernel with constants ρ = 1 − bδ and C = Dδ, where b, D and the Lyapunov function V are independent of δ. This implies that for any s > 0 and any δ ∈ (0, δ₀],

(P_δ)^{⌈s/δ⌉} V ≤ (1 − bδ)^{⌈s/δ⌉} V + Dδ Σ_{k=0}^{⌈s/δ⌉−1} (1 − bδ)^k ≤ e^{−bs} V + D/b.

Applying Theorem 11, we get for P^{n⋆}_δ a long-time convergence estimate which is uniform over δ ∈ (0, δ₀], that is, for all δ ∈ (0, δ₀] and n ≥ 1,

‖µ (P^{n⋆}_δ)^n − µ′ (P^{n⋆}_δ)^n‖_V ≤ (C′/α) κ⋆^n ‖µ − µ′‖_V,

where C′ = D/b and κ⋆ = max(1 − α/2, (3 + e^{−bt⋆})/4). Observe that the right-hand side does not depend on δ. Using the observation in Remark 12, we can get convergence in V-norm for P^n_δ. Indeed, for n = m n⋆ + r with r < n⋆ we have

‖µ P^n_δ − µ′ P^n_δ‖_V ≤ (C′/α) κ⋆^m ‖µ P^r_δ − µ′ P^r_δ‖_V ≤ (C′/α)(1 + 2C′) κ⋆^m ‖µ − µ′‖_V ≤ C⋆ κ^{nδ} ‖µ − µ′‖_V,

where κ = κ⋆^{1/(t⋆+δ₀)} ∈ (0, 1) and C⋆ = C′(1 + 2C′)/(α κ⋆) are independent of δ. Here we used that, by computations identical to the above, the drift condition gives P^r_δ V ≤ (1 − bδ)^r V + D(1 − (1 − bδ)^r)/b ≤ V + C′, which is enough for the current purpose. In conclusion, the estimates given in Theorems 8 and 10 (or in Appendices C and D in more detail) give the expected dependence on δ of the convergence rate of the process toward equilibrium.

4. Expansion of the invariant measure of splitting schemes for BPS and ZZS

In this section we investigate the bias in the invariant measure of different splittings of BPS and ZZS and draw conclusions on which schemes are to be preferred.
This analysis sheds light on the dependence of the bias on the structure of the splitting scheme as well as on the refreshment rate, which is the only parameter of our algorithms other than the step size. Our theoretical and numerical investigations highlight how poor choices of the refreshment rate can have a negative effect on the bias of some schemes. In practice it is hard to define good values of λr a priori, as this is case-dependent and theory only exists for factorised targets (Bierkens et al., 2022b). For these reasons, we shall focus on the robustness of the various schemes to this parameter. In this section we assume the following condition on µδ, which is motivated by Theorems 8 and 10, giving existence and uniqueness of a stationary distribution µδ, and Theorem 6, leading to the conjecture that µδ is a second order approximation of µ. Assumption 13 The processes corresponding to our splitting schemes have an invariant distribution with density µδ(x, v) = µ(x, v)(1 δ2f2(x, v) + O(δ4)), (25) where µ(x, v) = ν(v)π(x), π is the target and ν is a distribution on the velocity vector. Remark 14 In Section 3 we discuss cases where a splitting scheme may admit more than one invariant measure. In such cases it is not immediately clear what the expansion (25) means. In order to make (25) consistent as δ 0, in those cases we consider µδ as the limit of the law of the splitting scheme as the number of steps tends to infinity when the process is started according to µ. Remark 15 In order to establish Assumption 13 rigorously one would typically rely on estimates on the semigroup, for instance establishing decay of its derivatives. For diffusion processes analogous results have been established using a variety of techniques (Crisan and Ottobre, 2016; Kopec, 2014), but adapting these techniques to PDMPs is challenging. 
One of the main reasons for this is that PDE based approaches often make use of smoothing and, unlike diffusion processes, PDMPs do not have smoothing properties (for example, the proof in Kopec (2014) relies on use of the Bismut-Elworthy formula, which is not true for PDMP). One result in this direction is Theorem 5.13 in Bertazzi et al. (2022), which establishes convergence of the first order derivative for the semigroup of a ZZS. Notice that in order to establish (25) we would also require higher order derivatives. Piecewise deterministic sampling with splitting schemes Finding a function f2 as in (25) allows us to compare the asymptotic bias of different splitting schemes of BPS and ZZS and is thus crucial to determine which splitting scheme to use for MCMC computation. Due to the complexity of the equations, we obtain an explicit expression for f2 only in specific cases such as the one-dimensional setting. The extension to the higher dimensional setting is left for future work. In Section 4.1 we give the general, analytic expression for f2 for one-dimensional splitting schemes in the case in which the velocity takes values 1. Observe that in this context the BPS with velocity on the unit sphere and the ZZS coincide, hence this result applies to either process. For the explicit statement we refer the reader to Proposition 16, which is obtained following the approach of Leimkuhler and Matthews (2012). In Section 4.2 we give a heuristic argument suggesting that the scheme RDBDR, that is Algorithm 4 in the case of BPS, performs well compared to the alternatives and is robust to poor choices of the refreshment rate. We support this argument considering a one-dimensional standard normal target distribution and plotting the TV distance to the truth as given by Proposition 16, as well as giving the results from a numerical simulation. We remark that in the case of ZZS the refreshment rate can be set to 0 and thus we prefer the splitting DBD. 
In Section 4.3 we follow a different approach and fully characterise the invariant measure of splitting scheme RDBDR with λr 0 in the one-dimensional case. The result, which can be found in Proposition 17, is obtained verifying that RDBDR is skew-reversible with respect to a particular perturbation of the true target. 4.1 Main result In order to obtain f2 we follow the approach of Leimkuhler and Matthews (2012), which we now briefly illustrate before stating our main result. First, using the Baker-Campbell Hausdorff(BCH) formula (see e.g. Bonfiglioli and Fulci (2012)) we can find L2 such that Ex,v[f(Xδ, V δ)] = f(x, v) + δLf(x, v) + δ3L2f(x, v) + O(δ4). Here L is the infinitesimal generator of the continuous time process. Integrating both sides with respect to µδ and using that µδ is an invariant measure for the splitting scheme we obtain Z f(x, v)µδ(x, v)dxdv = Z f(x, v)µδ(x, v)dxdv + δ Z Lf(x, v)µδ(x, v)dxdv + δ3 Z L2f(x, v)µδ(x, v)dxdv + O(δ4). Substituting µδ with the expansion (25) we have 0 = δ Z Lf(x, v)µ(x, v)dxdv δ3 Z Lf(x, v)µ(x, v)f2(x, v)dxdv + δ3 Z L2f(x, v)µ(x, v)dxdv + O(δ4) and since µ is an invariant measure for BPS we have R Lfdµ = 0 which gives L (µf2) = L 2µ. (26) Bertazzi, Dobson, and Monmarch e Here L and L 2 are the adjoints of L and L2 in L2 with respect to Lebesgue measure1. A compatibility condition is required to ensure that there is a unique solution to (26). Because both µδ and µ are probability densities, integrating (25) gives the requirement Z f2(x, v)µ(x, v)dxdv = 0. (27) Solving the problem defined by (26)-(27) is much more complicated for PDMPs than for the Langevin case considered in Leimkuhler and Matthews (2012). This is due to the difficult expressions for the adjoint L , which in general is an integro-differential operator. Nonetheless, we are able to obtain f2 restricting to the one dimensional case with velocity in { 1}. We are ready to state the main result of this section. 
Proposition 16 Consider the one-dimensional setting with state space R { 1} and target distribution µ(x, v) = π(x)ν(v) with π exp( ψ) and ν = Unif({ 1}). Let λr 0 be the refreshment rate. Then the function f2 that solves (26)-(27) is f2(x, +1) = f+ 2 (0) + Z x 2 + ( ψ(y))+ g(y) L 2µ(y, +1) µ(y, +1) f2(x, 1) = f2(x, +1) + g(x), g(x) = exp (ψ(x)) Z x L 2µ(y, +1) µ(y, +1) + L 2µ(y, 1) µ(y, 1) exp( ψ(y))dy, f+ 2 (0) = Z 2 + ( ψ(y))+ g(y) L 2µ(y, +1) µ(y, +1) Proof The proof can be found in Appendix E.2. Therefore, f2 can be obtained for any splitting scheme by plugging the corresponding L 2 and the particular choice of target π in Proposition 16. For example, in the case of RDBDR an application of Proposition 16 and substitution of the appropriate term L 2 (see Proposition 31 in Appendix E.3) gives f2(x, +1) = f2(x, 1) = 1 ψ (y)π(y)dy ψ (x) . (28) We observe that in the one dimensional case the second order term of the bias of scheme RDBDR is always independent of the refreshment rate and of v. 4.2 Comparison of splitting schemes Proposition 16 gives the analytic expression for f2(x, v), but it remains difficult to conclude which scheme has the smallest bias for a given target due to the complexity of the operators 1. Note that one could alternatively work in the weighted space L2(µ). This proved useful in the case of Langevin diffusions, as shown in Leimkuhler et al. (2016). Piecewise deterministic sampling with splitting schemes Figure 1: Left: total variation distance to the target up to second order obtained with Proposition 16, and right: square root MSE in the estimation of the radius statistic, i.e. x2, for a one-dimensional standard normal target distribution. In both plots δ = 0.5, while for the right plot the number of iterations is N = 2 105, the position is initialised at the target distribution and the velocity from the uniform distribution on the unit sphere, and the experiment is repeated 300 times. L 2 (see Proposition 31 in Appendix E.3). 
Here we give a heuristic argument suggesting that scheme RDBDR is in general a reasonable choice, and support this in the standard Gaussian case from a theoretical and numerical point of view (see Figure 1). First recall that, neglecting refreshment events, ZZS and BPS can be interpreted as limits of lifted Metropolis-Hastings algorithms based on the deterministic proposal mechanism (x, v) 7 (x + vδ, v), where in fact the jump mechanism arises as limits of the acceptancerejection steps (Bierkens and Roberts, 2017). Hence, the jump part B of ZZS and BPS corrects the error incurred by the drift part D and it is then sensible to expect a smaller bias for schemes which divide the drift term in two half steps, interrupted by a correction step B, as opposed to taking a full step of D all at once. Moreover, intuitively it seems best to introduce the refreshment part R without disrupting the interaction between parts D and B. This motivates incorporating refreshment half-steps at the beginning and end of each iteration. For these reasons we expect scheme RDBDR (or alternatively DBD in the case of ZZS) to have the smallest bias. In Figure 1 we consider a standard normal target and confirm our intuitive arguments comparing scheme RDBDR with schemes DRBRD and DBRBD, which include the refreshment part in between the DBD part, and also with scheme BDRDB, which violates our intuitive principles but is in fact equivalent to RDBDR up to a shift of a half-step. All these schemes have the same cost of one gradient computation per iteration, since in BDRDB it is sufficient to keep track of the gradient at the previous iteration. In Figure 1 we consider both the error incurred in the estimation of a test statistic and the theoretical TV distance between the limiting distribution of the schemes and the standard normal distribution. The TV distance is derived from the analytic expression of f2 as follows. 
By marginalising µδ with respect to ν = Unif({−1, +1}) we obtain πδ(x) = π(x)(1 − (δ²/2)(f2(x, +1) + f2(x, −1))) + O(δ⁴). Hence we can express the TV distance between π and πδ as

    ‖π − πδ‖_TV = (δ²/4) ∫ |f2(x, +1) + f2(x, −1)| π(x) dx + O(δ⁴). (29)

The δ² contribution to the TV distance can be computed by plugging in the expressions for f2 found in Proposition 16, where we substitute for each splitting scheme the corresponding operator L2 obtained in Proposition 31 in Appendix E.3. The test statistic we use for this example is the radius statistic, i.e. T(x) = x², the squared Euclidean norm of the position component. For each algorithm we estimate the radius statistic using the ergodic average of the corresponding chain, i.e. T̂ = N⁻¹ Σ_{n=1}^N T(X̄_{t_n}), and this is compared to the true value π(T) = 1. By repeating this over M = 300 independent realisations X̄¹, . . . , X̄ᴹ of the algorithm we obtain an estimate of the root mean square error (RMSE) as

    sqrt( (1/M) Σ_{m=1}^M |T̂(X̄ᵐ) − π(T)|² ).

Figure 1 suggests that indeed RDBDR compares well to the other schemes. In particular, its bias is independent of the refreshment rate, contrary to schemes DRBRD and DBRBD. Finally, scheme BDRDB has a positive bias which is independent of the choice of λr. We notice that the theoretical and numerical behaviour show similar dependence on the structure of the scheme and on the refreshment rate, even though the plots in Figure 1 show a different metric for each case. We refer the reader to Appendix E.4 for further comparisons, which corroborate our heuristic arguments further.

4.3 Characterisation of the invariant measure of RDBDR in one dimension

In fact, in 1D, for the scheme RDBDR, we can obtain an explicit expression for the invariant measure. Note that the result is independent of λr and holds also for the case λr = 0, in which case the scheme coincides with DBD.

Proposition 17 Consider the scheme RDBDR for BPS or ZZS in one dimension, where the velocity is refreshed from ν = Unif({−1, +1}).
For some x ∈ R and δ > 0, define the probability distribution µδ with support on the grid {y ∈ R : y = x + nδ, n ∈ Z} × {−1, +1} given by µδ(y, v) ∝ e^{−ψδ(y)}, where ψδ(x) = ψ(x) and, for y = x + nvδ with n ∈ N,

    ψδ(y) = ψ(x) + δv Σ_{ℓ=1}^n ψ'(x + (ℓ − 1/2)vδ).

Then the distribution µδ is stationary for the chain which is initialised at x and run with step size δ. Moreover, under the conditions of Theorem 10 we obtain that µδ is ergodic, in the sense that for all bounded functions f

    lim_{N→∞} (1/N) Σ_{n=1}^N f(X̄_{t_n}, V̄_{t_n}) = µδ(f), P_{x,v}-almost surely.

Proof The proof can be found in Appendix E.5.

Once again it is clear that the scheme is unbiased in the Gaussian case ψ(x) = x²/(2σ²) (in the sense that ψδ(y) = ψ(y) for all y = x + nδv, i.e. the BPS is ergodic with respect to the restriction of the true Gaussian target to the grid, and moreover the target measure is invariant for the scheme). More generally, for y = x + vnδ with n ∈ N we get

    ψ(y) = ψ(x) + v Σ_{ℓ=1}^n ∫_{−δ/2}^{δ/2} ψ'(x + v(ℓ − 1/2)δ + u) du
         = ψδ(y) + v Σ_{ℓ=1}^n ∫_{−δ/2}^{δ/2} (u²/2) ψ⁽³⁾(x + v(ℓ − 1/2)δ) du + O(nδ⁵)
         = ψδ(y) + (δ²/24) ∫_x^y ψ⁽³⁾(u) du + O(δ⁴|x − y|).

Setting x = 0 this gives ψδ = ψ + δ²f2 + O(δ⁴) with f2(y) = (1/24)(ψ''(0) − ψ''(y)), which is the same as what follows from Proposition 16 (see Equation (28)), since the term f2^+(0) in (28) was introduced to make exp(−ψ)(1 + δ²f2) a probability distribution and would appear also in the present context. Hence Propositions 16 and 17 agree.

5. Scaling of the rejection probability of adjusted algorithms

In this section we study the average rejection probability of Algorithms 3 and 4, focusing in particular on its dependence on the step size. The main results of the section are Proposition 18 and Proposition 19, dealing with ZZS and BPS respectively. In both cases we consider the average rejection probability, where the initial state is fixed and the randomness is in the proposal for the next state, and prove that it is of order three in the step size.
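This cubic dependence is easy to observe numerically in a closely related toy setting: the sketch below computes, by quadrature, the average Metropolis rejection of a single leapfrog step for a one-dimensional standard Gaussian (an illustrative analogue, not one of the paper's algorithms; all names are ours), and checks that halving the step size reduces it by a factor of about 2³ = 8.

```python
import numpy as np

def avg_rejection(delta, n=801, lim=6.0):
    """Average Metropolis rejection of one leapfrog step for H = (x^2+p^2)/2,
    with the initial state integrated against the stationary Gaussian."""
    x, p = np.meshgrid(np.linspace(-lim, lim, n), np.linspace(-lim, lim, n))
    w = np.exp(-(x**2 + p**2) / 2)
    w /= w.sum()                                # quadrature weights
    # one velocity-Verlet (leapfrog) step for the harmonic potential
    p_half = p - 0.5 * delta * x
    x_new = x + delta * p_half
    p_new = p_half - 0.5 * delta * x_new
    dH = (x_new**2 + p_new**2 - x**2 - p**2) / 2
    return (w * np.maximum(0.0, 1.0 - np.exp(-dH))).sum()

r1, r2 = avg_rejection(0.1), avg_rejection(0.05)
print(r1 / r2)   # close to 8, i.e. rejection = O(delta^3)
```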
This behaviour is akin to that of each step of the leapfrog integrator used in HMC, as shown by Stoltz and Trstanova (2018). We begin by considering the average rejection rate of the Metropolis adjusted ZZS (Algorithm 3).

Proposition 18 Suppose ψ is smooth and its derivatives grow at most polynomially, and let δ ∈ [0, δ0] for some δ0 > 0. The average rejection probability of Algorithm 3 satisfies

    E[1 − α((x, v), (X̄, V̄))] = δ³ G(x, v) + G̃(x, v, δ),

where G = G1 + G2 with

    G1(x, v) = (1/8) Σ_{i=1}^d λ_i(x, v) v_i Σ_{k≠i} v_k ∂_{ik}ψ(x),
    G2(x, v) = (1/24) max(0, Σ_{α:|α|=3} D^αψ(x) v^α),

while G̃(x, v, δ) ∈ O(δ⁴) for any x, v and grows at most polynomially in x for any δ ∈ [0, δ0].

Proof The proof can be found in Appendix F.

In the next example we apply Proposition 18 to a factorised target, for which we find that in stationarity the expected value of G grows as √d for large d.

Example 4 (Factorised target distribution) Consider a potential of the form ψ(x) = Σ_{i=1}^d ψ_i(x_i). Proposition 18 gives, for (x, v) ~ µ,

    G(x, v) = (1/24) max(0, Σ_{i=1}^d v_i ψ_i⁽³⁾(x_i)).

Clearly, this equals zero if π is a Gaussian distribution with independent components, in which case by the proof of the Proposition we also find G̃ = 0 and DBD has the correct stationary distribution. Alternatively, consider the assumption |ψ_i⁽³⁾| ≤ M. In this case max(0, Σ_{i=1}^d v_i ψ_i⁽³⁾(x_i)) ≤ M max(0, Σ_{i=1}^d v_i), and we can write Σ_{i=1}^d v_i = 2 Bin(d, 1/2) − d, where the equality is in distribution and Bin denotes the binomial distribution. In the large d regime Σ_{i=1}^d v_i is approximately distributed as a centred Gaussian with variance d, hence E_µ[G(x, v)] = O(M√d), which means that the leading order of the acceptance rate is stable if δ scales as d^{−1/6}. Now we focus on the Metropolis adjusted BPS, stating a similar result to Proposition 18.

Proposition 19 Suppose ψ is smooth and its derivatives grow at most polynomially, and moreover that at every point x for which ∇ψ(x) = 0 it holds that det(∇²ψ(x)) ≠ 0. Let δ ∈ [0, δ0] for some δ0 > 0.
Consider an initial condition x which is a draw from a distribution that is absolutely continuous with respect to π. Then the average rejection probability of Algorithm 4 satisfies

    E[1 − α((x, v), (X̄, V̄))] = δ³ G(x, v) + G̃(x, v, δ),

where G = G1 + G2 with

    G1(x, v) = (1/8) λ(x, v) max(0, ⟨v, ∇²ψ(x) v⟩ − ⟨R(x)v, ∇²ψ(x) R(x)v⟩),
    G2(x, v) = (1/24) max(0, Σ_{α:|α|=3} D^αψ(x) v^α),

while G̃(x, v, δ) ∈ o(δ³) for any x, v and grows at most polynomially in x for any δ ∈ [0, δ0].

Proof The proof can be found in Appendix F.

6. Numerical experiments

In this section we discuss some numerical simulations for the proposed samplers. The code for all these experiments can be found at https://github.com/andreabertazzi/splittingschemes_PDMP.

6.1 Image deconvolution using a total variation prior

In this section we test the unadjusted ZZS (Algorithm 1) on an imaging inverse problem, which we solve with a Bayesian approach. In the following we shall refer to an image either as an N × N matrix or as a vector of length d = N², which is obtained by placing each column of the matrix below the previous one. In both cases each entry corresponds to a pixel. Now denote by x ∈ R^d the image we are interested in estimating and by y ∈ R^d the observation. The observation is related to x via the statistical model y = Ax + ξ, where A is a d × d-dimensional matrix which may be degenerate and ill-conditioned, and ξ is a d-dimensional Gaussian random variable with mean zero and covariance σ²I_d. The forward problem we consider is given by a blurring operator, i.e. A acts by a discrete convolution with a kernel h. In our examples h is a uniform blur operator with blur length 9. The likelihood of y given x is p(y|x) ∝ e^{−f_y(x)}, with

    f_y(x) = (1/(2σ²)) ‖Ax − y‖².

As prior distribution on x we choose the total variation prior p(x) ∝ e^{−g(x)}, where g(x) = θ‖x‖_TV for θ > 0 and ‖x‖_TV is the total variation of the image x (see Rudin et al. (1992)), given by

    ‖x‖_TV = Σ_{i,j} (|x_{i+1,j} − x_{i,j}| + |x_{i,j+1} − x_{i,j}|).
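For concreteness, the discrete TV seminorm can be computed as follows (a minimal sketch; we assume differences across the image boundary are dropped, one of several common conventions):

```python
import numpy as np

def tv_norm(x):
    """Anisotropic discrete TV: sum of absolute vertical and horizontal
    pixel differences of the image x."""
    return np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum()

img = np.zeros((4, 4))
img[1:3, 1:3] = 1.0       # a 2x2 bright square on a dark background
print(tv_norm(img))        # 8.0: TV grows with the total length of edges
```

Piecewise constant images have small TV, which is why this prior promotes piecewise constant reconstructions.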
The total variation prior is a log-concave, non-differentiable, improper prior often used as a benchmark for Bayesian imaging problems (Pereyra et al., 2020; Oliveira et al., 2009; Pereyra, 2016). The total variation prior corresponds to the ℓ1-norm of the discrete gradient of the image and promotes piecewise constant reconstructions. Note that this prior is not smooth, and hence we cannot directly apply gradient based algorithms such as the unadjusted Langevin algorithm (ULA) or ZZS. Therefore we approximate g with a Moreau-Yosida envelope

    g^λ(x) = min_{z∈R^d} ( g(z) + (1/(2λ)) ‖x − z‖² ).

By Rockafellar and Wets (2009, Proposition 12.19) we have that g^λ is Lipschitz differentiable with Lipschitz constant λ⁻¹ and ∇g^λ(x) = λ⁻¹(x − prox_g^λ(x)), where

    prox_g^λ(x) = arg min_{z∈R^d} ( g(z) + (1/(2λ)) ‖x − z‖² ).

Using Bayes' theorem, we have the posterior distribution p(x|y) ∝ e^{−f_y(x) − θg^λ(x)}. We select the optimal θ by using the SAPG algorithm (Vidal et al., 2020; De Bortoli et al., 2020), and we choose λ based on the guidelines given in Durmus et al. (2018), which set λ = 5/L_f, where L_f is the Lipschitz constant of ∇f_y. Despite this being a log-concave probability distribution, sampling from this model using MCMC schemes is difficult because x is usually very high dimensional and the problem is ill-conditioned. In order to be able to characterise the nature of the conditioning we introduce a strongly convex term in the posterior, and consider as target distribution

    π(x) ∝ e^{−f_y(x) − θg^λ(x) − (m/2)‖x‖²}. (30)

In this case the unadjusted Langevin algorithm can be very expensive to run, since the step size is limited by 2/L, where L = L_f + λ⁻¹ + m is the Lipschitz constant of ∇ log π. The values of the parameters in the considered example are summarised in Table 1.
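The Moreau-Yosida construction is easy to illustrate in one dimension with g(x) = |x|, whose envelope is the Huber function; the sketch below (names are ours, not from the paper's code) checks the gradient formula ∇g^λ(x) = λ⁻¹(x − prox_g^λ(x)) and its λ⁻¹-Lipschitz bound.

```python
import numpy as np

lam = 0.5

def prox_abs(x):
    # proximal operator of g = |.| with parameter lam: soft thresholding
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def g_lam(x):
    # Moreau-Yosida envelope: min_z { |z| + (x - z)^2 / (2 lam) }
    z = prox_abs(x)
    return np.abs(z) + (x - z) ** 2 / (2 * lam)

def grad_g_lam(x):
    # gradient formula from the text: (x - prox(x)) / lam
    return (x - prox_abs(x)) / lam

x = np.linspace(-3, 3, 13)
print(g_lam(x))       # quadratic near 0, linear tails (Huber function)
print(grad_g_lam(x))  # values clipped to [-1, 1]; Lipschitz constant 1/lam
```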
We are interested in drawing samples from the distribution (30), and in particular we compare the unadjusted ZZS (Algorithm 1, abbreviated as UZZS in the plots), ULA, the continuous ZZS, as well as the discretisation of the underdamped Langevin algorithm considered by Sanz-Serna and Zygalakis (2021), which is a strongly second-order accurate integrator abbreviated as UBU in the plots. To implement the continuous ZZS we can compute the Lipschitz constant of the gradient of the negative log-posterior, L, and thus we can implement the exact ZZS using the Poisson thinning technique based on the simple bound

    λ_i(x + vt, v) ≤ λ_i(x, v) + tL√d. (31)

In order to compare the computational cost of the continuous ZZS to the unadjusted ZZS, ULA and UBU, we count each proposal for an event time obtained by Poisson thinning as a gradient evaluation and thus as an iteration. Indeed, an update of the computational bounds requires the evaluation of λ_i(x, v) for some i, and thus the proximal operator has to be calculated. This is the dominant cost of the algorithm, so we view it as equivalent to one step of ULA, UZZS and UBU. To estimate the posterior mean with the continuous ZZS we compute the time average T⁻¹ ∫_0^T X_t dt. For each of the algorithms ULA, UZZS and continuous ZZS we have used 10⁶ gradient evaluations to obtain the estimates; this includes a burn-in phase which uses 10% of the iterations. In order to have the best convergence speed for ULA we use a step size of L⁻¹; for UZZS there is no stability barrier, so we may take a much larger step size, and in these experiments we use 2L⁻¹ᐟ². For UBU the performance predicted by the theoretical guarantees is not competitive, since this algorithm scales poorly with the condition number (Cheng et al., 2018; Durmus and Moulines, 2016), which is very large in these examples.
However, we have implemented UBU with a friction parameter γ = 2, which is outside the range covered by the theoretical guarantees, and with step size L⁻¹ᐟ². Note that although the convergence rate of the discretisation is not known in this regime, the continuous SDE converges with rate O(√m) for m-strongly log-concave probability measures (Cao et al., 2023). Therefore using a step size of O(L⁻¹ᐟ²) has the potential of achieving a convergence rate of O(κ⁻¹ᐟ²). For comparison of the bias we have used a long run (10⁷ iterations) of the MYMALA algorithm (Durmus et al., 2018), which produces unbiased samples of π including the Moreau-Yosida approximation of g, and we use these samples as our reference for the true distribution. Note that MYMALA mixes substantially slower than ULA, UZZS, and UBU, hence it is not competitive with these algorithms in terms of convergence speed.

Table 1: Summary of the problem parameters for Section 6.1.

    σ        L            d      λ             θ      m
    0.0024   2.12 × 10⁵   65536  2.83 × 10⁻⁵   10.74  1

Table 2: A summary of the error calculated by the IMMSE for two different statistics: the posterior mean and the pixelwise posterior standard deviation. We compare the posterior mean against the ground truth x and against the posterior mean obtained using a long run of MYMALA. For the pixelwise posterior standard deviation we compare against a long run of MYMALA.

    Algorithm   IMMSE of post. mean vs x   IMMSE of post. mean vs MYMALA   IMMSE of post. std. dev. vs MYMALA
    ULA         1.9 × 10⁻³                 6.4 × 10⁻⁵                      3.1 × 10⁻⁵
    UZZS        1.9 × 10⁻³                 5.9 × 10⁻⁵                      2.9 × 10⁻⁵
    UBU         1.9 × 10⁻³                 5.9 × 10⁻⁵                      2.9 × 10⁻⁵
    cts ZZS     5.2 × 10⁻³                 2.7 × 10⁻³                      2.7 × 10⁻³

Figure 2 shows the ground truth image x, the observed data y, the estimated posterior mean using the different samplers and the true posterior mean obtained using a long run of MYMALA.
These figures show that the posterior means obtained via UZZS, UBU and ULA appear visually identical to the true posterior mean, whereas the continuous ZZS sampler has clearly not converged after 10⁶ gradient evaluations. We have also plotted the standard deviation for each pixel, which provides an indication of the level of confidence in each pixel value as measured by the model; the results are shown in Figure 3. Using the long run of MYMALA for comparison, we see that ULA and UZZS provide a good estimate of the standard deviation of the model; however, looking closely at the boundary of the image one can see that ULA, UBU and UZZS overestimate the standard deviation at these pixels. As a measure of distance between two images we use the image mean square error (IMMSE), defined as

    IMMSE(x, y) = (1/N²) Σ_{i,j=1}^N |x_{i,j} − y_{i,j}|².

The IMMSE between the pixelwise standard deviations is reported in Table 2, along with the IMMSE of the posterior mean; these show that UZZS consistently has smaller error than ULA, and both have significantly smaller IMMSE than the continuous ZZS.

Figure 2: Results for the reconstruction of the cameraman image using a TV prior. The first row shows the original image x, the data y and the posterior mean using a long run of MYMALA. Each panel (d)-(i) shows the posterior mean obtained after 10⁶ iterations with the labelled algorithm.

Figure 3: Results for the pixelwise standard deviation for the cameraman image using a TV prior.

The convergence is further investigated in Figure 4, which shows the IMMSE using MYMALA as the truth for the mean and pixelwise variance for each algorithm (ULA, UBU, UZZS and continuous ZZS). These show that UZZS converges the fastest of the four algorithms in terms of the standard deviation, while UBU performs the best for the first moment.
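The IMMSE used throughout this comparison is just a pixel-averaged squared difference; a minimal implementation:

```python
import numpy as np

def immse(x, y):
    """Image mean square error between two N x N images."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.mean((x - y) ** 2)

a = np.zeros((2, 2))
b = np.array([[1.0, 0.0], [0.0, 1.0]])
print(immse(a, b))   # (1 + 0 + 0 + 1) / 4 = 0.5
```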
This is likely due to the fact that the step size of ULA must be very small, as otherwise the process explodes to infinity, while for UZZS larger step sizes can be selected; in these experiments the step size is 500 times larger for UZZS. This constitutes a major difference because every iteration is computationally intensive, since the dimension is very high (see Table 1) and notably each iteration involves solving an optimisation problem, which in our simulations is solved by the Chambolle-Pock algorithm (Chambolle and Pock, 2011). Finally, let us compare the unadjusted ZZS with the continuous time ZZS. It is clear from our experiments that the continuous ZZS performs poorly compared to its discretisation. The reason is twofold. First, the major drawback of Poisson thinning using the bounds (31) is that a considerable proportion of the proposed event times are rejected (in our examples the rejection rate is around 70-80%). Moreover, the rates λ_i are very large in the current framework and the process can have as many as 10⁹ switches per continuous time unit. In comparison, ULA requires 2 × 10⁵ iterations per unit time, while UZZS and UBU require 460 iterations per unit time for this example. This means that significantly more gradient computations are required to travel a reasonable distance for the continuous time ZZS, and thus the process itself is expensive to run. The combination of these two phenomena implies an important loss of efficiency, which explains the results of our simulations.

Figure 4: IMMSE between posterior mean (left) and pixelwise posterior standard deviation (right) estimated by the algorithms ULA, UBU, UZZS, and the continuous ZZS, using a long run of MYMALA as a reference for the truth.
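The thinning mechanism behind bounds of the form (31) can be sketched as follows: event times are proposed from a dominating affine-rate process λ̄(t) = λ0 + bt by inversion of its integrated rate, and each proposal is accepted with probability (true rate)/(bound). The rates below are illustrative stand-ins, not the imaging model's.

```python
import numpy as np

def first_event_thinning(true_rate, lam0, b, horizon, rng):
    """First jump time on [0, horizon] of a process with rate true_rate(t),
    dominated by the affine bound lam0 + b*t, sampled by Poisson thinning."""
    t = 0.0
    while t < horizon:
        c = lam0 + b * t
        e = rng.exponential()
        # solve int_t^{t+s} (lam0 + b*u) du = e for the proposal increment s
        s = (np.sqrt(c**2 + 2.0 * b * e) - c) / b
        t += s
        if t >= horizon:
            break
        if rng.uniform() * (lam0 + b * t) <= true_rate(t):  # accept/reject
            return t
    return None

rng = np.random.default_rng(0)
lam0, b = 1.0, 2.0
draws = [first_event_thinning(lambda t: 0.5 * (lam0 + b * t), lam0, b, 5.0, rng)
         for _ in range(1000)]
accepted = [t for t in draws if t is not None]
print(len(accepted), min(accepted), max(accepted))
```

Here the true rate is half the bound, so about half of all proposals are rejected before an event is produced, mirroring the high rejection fraction reported above for the continuous ZZS.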
6.2 Chain of interacting particles

In this section we consider a problem which serves as an illustration of a typical context where ZZS is favoured over other samplers. This toy model presents features similar to the molecular system considered in Monmarché et al. (2020), where splitting schemes involving velocity bounces have proven efficient. We consider a chain of N particles in 1D, labelled 1 to N. The particles interact through two potentials: a chain interaction, where particle i interacts with particles i−1 and i+1, and a mean-field interaction, where each particle interacts with all the others. For x ∈ R^N, the potential is thus of the form

    ψ(x) = Σ_{i=1}^{N−1} V(x_i − x_{i+1}) + (1/(2N)) Σ_{i,j=1}^N W(x_i − x_j),

where V is the chain potential and W is the mean-field potential. In the following we take V(s) = s⁴ and W(s) = −√(1 + s²) for s ∈ R, i.e. the chain interaction is an anharmonic quartic potential which constrains two consecutive particles in the chain to stay close, while the mean-field interaction induces a repulsion from the rest of the system. Although this specific ψ is an academic example meant for illustration purposes, its general form is classical in statistical physics. This target distribution presents two significant challenges: the term representing the mean-field interactions requires O(N²) computations, which becomes impractical for large N; and the quartic potential causes numerical instabilities for standard diffusion-based samplers, which then require truncations of their drifts to avoid divergence at infinity. Our samplers are able to tackle both challenges without any modifications. We shall consider as test statistic the empirical variance of the system of particles around their barycentre, that is

    v(x) = (1/(2N²)) Σ_{i,j=1}^N (x_i − x_j)². (32)

Notice that ψ is invariant by translation of the whole system, so that e^{−ψ} is not integrable on R^N.
However, we are interested in a translation-invariant function, so we consider e^{−ψ} as a probability density on the subspace {x ∈ R^N : Σ_i x_i = 0}, which amounts to looking at the system of particles from its centre of mass. In practice, we run particles in R^N without constraining their barycentre to zero, which does not change the output as long as we estimate expectations of translation-invariant functions. The forces ∇ψ can be decomposed into two parts, one of which (the chain interaction) is unbounded and not globally Lipschitz but relatively cheap to compute (with complexity O(N)), while the second part (the mean-field interaction) is bounded but numerically expensive (with complexity O(N²)). If this decomposition is not taken into account and we simply run a classical MCMC sampler based on the computation of ∇ψ, then the step size is required to be very small due to the non-Lipschitz part of the forces, leading to poor numerical efficiency since the costly mean-field force has to be computed at each step. Alternatively, in order to simulate a continuous-time PDMP (e.g. the BPS) via thinning for this model, the non-Lipschitz part of the potential would induce high-order polynomial bounds on the jump rate, leading to many jump proposals and poor numerical efficiency. This issue would be even more critical with 3D particles and singular potentials such as the Lennard-Jones potential considered in Monmarché et al. (2020). Now, as was already discussed in Section 1.3 for subsampling, PDMPs and their splitting schemes can be used with a splitting of the forces². Let us illustrate the idea for ZZS. We consider a ZZS where the switching rate of the i-th velocity has the form

    λ_i(x, v) = (v_i (V'(x_i − x_{i+1}) − V'(x_{i−1} − x_i)))_+ + (1/N) Σ_{j=1}^N (v_i W'(x_i − x_j))_+,

where for the particles 1 and N we set x_0 = x_1 and x_{N+1} = x_N to cancel out the corresponding terms. The corresponding continuous-time ZZS has the correct invariant measure (once centred), as shown by Bierkens et al. (2019a).
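As a concrete sketch of this model (the 1/(2N) mean-field scaling and the repulsive choice W(s) = −√(1 + s²) with |W'| ≤ 1 are our assumptions, consistent with the description above), the following code evaluates the potential and the test statistic (32) and checks their translation invariance:

```python
import numpy as np

def potential(x):
    chain = np.sum((x[:-1] - x[1:]) ** 4)                    # V(s) = s^4
    diffs = x[:, None] - x[None, :]
    mean_field = -np.sum(np.sqrt(1.0 + diffs**2)) / (2 * len(x))
    return chain + mean_field

def empirical_variance(x):
    # v(x) = (1 / (2 N^2)) sum_{i,j} (x_i - x_j)^2, the statistic (32)
    diffs = x[:, None] - x[None, :]
    return np.sum(diffs**2) / (2.0 * len(x) ** 2)

x = np.array([0.0, 1.0, -0.5, 2.0])
# Both quantities are invariant under a global translation of the system.
print(potential(x + 3.0) - potential(x))                        # ~0
print(empirical_variance(x + 3.0) - empirical_variance(x))      # ~0
print(empirical_variance(x))  # equals the variance around the barycentre
```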
We consider our DBD splitting to approximate this ZZS. In particular, we shall take advantage of the fact that |W'(s)| ≤ 1 for all s ∈ R to use the Poisson thinning technique (Lewis and Shedler, 1978) to deal with the mean-field interactions. We can sample the jump times of the i-th velocity as follows. First, we sample two jump times with rates respectively (v_i(V'(x_i − x_{i+1}) − V'(x_{i−1} − x_i)))_+ and 1. The latter rate is an upper bound for the mean-field force. If both times are larger than the step size δ, then the velocity is not flipped. Else, if the time corresponding to the first rate is smaller than δ and than that corresponding to the second, then we flip the i-th velocity. Alternatively, if the second time is smaller than δ and than the first, we draw J ~ Unif({1, . . . , N}) and flip the sign of the i-th velocity with probability (v_i W'(x_i − x_J))_+. Since in this case the rates are not canonical due to the splitting of forces, more than one jump per component is possible, and thus this procedure is repeated until the end of the time step has been reached. This results in O(1) computations per particle on average, hence O(N) for the whole system. For BPS, we consider the splitting scheme RDBDR, where part B uses similar ideas as above.

2. The word splitting is here used for two different notions which should not be confused: on the one hand, the Strang splitting (3) used to define a discrete-time scheme, and on the other hand the decomposition of the forces ∇ψ into several parts, each part being treated with a jump mechanism. It is possible to use a splitting scheme for a PDMP without splitting the forces, just as it is possible to sample by thinning a continuous-time PDMP with split forces (e.g. the ZZS, where ∇ψ is split along the canonical basis). By contrast, for multi-time-step methods for HMC, splitting the forces is directly related to splitting the operator ∇ψ · ∇_v.
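The jump procedure just described for the i-th velocity can be sketched as follows (schematic: the chain-rate function and the mean-field potential below are illustrative choices consistent with the text, not the paper's code):

```python
import numpy as np

def jump_component(i, x, v, delta, chain_rate, W_prime, rng):
    """Jump part for velocity i over one step of length delta: an exact clock
    for the chain rate and a unit-rate clock bounding the mean-field part,
    whose proposals are thinned by drawing a uniform particle index J."""
    t, vi, N = 0.0, v[i], len(x)
    while True:
        r = chain_rate(x, vi, i)
        t_chain = rng.exponential(1.0 / r) if r > 0 else np.inf
        t_bound = rng.exponential(1.0)     # rate 1 >= mean-field rate
        t += min(t_chain, t_bound)
        if t >= delta:
            return vi                       # end of the time step
        if t_chain <= t_bound:
            vi = -vi                        # chain event: flip
        else:
            J = rng.integers(N)             # thinning: pick a particle
            if rng.uniform() <= max(0.0, vi * W_prime(x[i] - x[J])):
                vi = -vi                    # accepted mean-field event

def chain_rate(x, vi, i):                   # (v_i (V'(..) - V'(..)))_+
    lo, hi = max(i - 1, 0), min(i + 1, len(x) - 1)
    return max(0.0, vi * 4 * ((x[i] - x[hi]) ** 3 - (x[lo] - x[i]) ** 3))

rng = np.random.default_rng(1)
x = np.array([0.0, 0.1, 0.3, 1.0])
v = np.array([1.0, -1.0, 1.0, -1.0])
W_prime = lambda s: -s / np.sqrt(1.0 + s * s)  # derivative of -sqrt(1+s^2)
new_vi = jump_component(2, x, v, 0.5, chain_rate, W_prime, rng)
print(new_vi)   # +1.0 or -1.0
```

Since the mean-field clock fires at rate 1 regardless of N, the expected work per particle and per step is O(1), giving the O(N) overall cost mentioned above.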
The main difference lies in the fact that jumps due to the mean-field interaction are performed using the reflection operator defined by the same force that caused the event. Pseudo-codes for the jump part of ZZS and BPS can be found in Appendix G. We shall compare our methods to a suitable Hamiltonian Monte Carlo (HMC) algorithm. First of all, we consider the variant of HMC that has refreshments from the Laplace distribution (Nishimura et al., 2024), since this is more suitable to deal with the quartic interaction (Livingstone et al., 2019). Then, we introduce a splitting of the forces and construct a multi-time-step method following the ideas introduced e.g. in Tuckerman et al. (1992); Morrone et al. (2010); Neal (2011); Leimkuhler et al. (2013); Shahbaba et al. (2014); Lagardere et al. (2019). The general idea is to split the generator as L = L1 + L2, where L1 represents the Hamiltonian dynamics with only the cheap part of the forces ∇ψ, denoted ∇ψ1, while L2 = −∇ψ2 · ∇_v, where ψ2 represents the most expensive part of the forces. Given a step size δ and some K ∈ N, one starts with the Strang splitting approximation

    e^{δL} ≈ e^{(δ/2)L2} (e^{(δ/K)L1})^K e^{(δ/2)L2}.

Then, we use a splitting scheme to approximate e^{(δ/K)L1}, and we take a stochastic gradient version of the expensive force in L2, that is, we draw J ~ Unif({1, . . . , N}) and consider only the force between each particle and the J-th particle. This approximation is then repeated M times for some M ∈ N. The reason we use this stochastic gradient approximation is that otherwise HMC is prohibitively slow due to the O(N²) complexity of the mean-field interaction. The pseudo-code for this HMC algorithm is given in Appendix G. We observe that the ZZS and the HMC algorithm described above are obtained with similar ideas, but applied in a different order.
Indeed, the ZZS was derived by first splitting the forces at the level of the continuous-time process and then applying our splitting scheme, whereas for the HMC algorithm the forces were split at the level of the discretisation (cf. the previous footnote). Hence, in the case of ZZS we do not introduce an additional numerical error, since we are approximating a continuous-time PDMP that has the correct invariant distribution. Moreover, ZZS naturally adapts the (random) step size for the expensive part of the forces by leveraging the Poisson thinning technique, avoiding the tuning of an additional parameter that arises from multi-time-step procedures. For more comparisons between random jumps and deterministic multi-time-step splitting methods in some practical cases, we refer to Monmarché et al. (2020) and to Gouraud et al. (2025b), which was released after (and motivated by) the present work. The results of our simulations are shown in Figure 5. The parameters of all samplers are obtained by performing a grid search over the parameter space.

Figure 5: Numerical simulations in the setting of Section 6.2. In all plots the y-axis shows the relative mean square error (MSE), that is the MSE normalised by the square of the true value, which in this case corresponds to a long run of the Metropolis-adjusted BPS. The left and right plots show the relative MSE for the estimation respectively of the mean and variance of the test function (32). The initial position of each particle is obtained by drawing from the standard Gaussian distribution. (a) Results for N = 25: ZZS has step size 10⁻², BPS has step size 5 × 10⁻³ and refreshment rate λr = 1, while HMC has step size 3.3 × 10⁻³, M = 5, K = 3; the plots show the average over 100 simulations. (b) Results for N = 100: ZZS has step size 10⁻², BPS has step size 5 × 10⁻³ and refreshment rate λr = 1, while HMC has step size 3 × 10⁻², M = 5, K = 3; the plots show the average over 5 simulations.
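The multi-time-step Strang splitting used for the HMC baseline can be sketched as follows: a half kick with the expensive force, K velocity-Verlet substeps with the cheap force, then another half kick. The forces below are illustrative stand-ins, not the actual chain and mean-field forces.

```python
import numpy as np

def mts_step(x, p, delta, K, cheap_force, expensive_force):
    p = p + 0.5 * delta * expensive_force(x)   # e^{(delta/2) L2}
    h = delta / K
    for _ in range(K):                         # (e^{(delta/K) L1})^K
        p = p + 0.5 * h * cheap_force(x)       # inner Strang (leapfrog) step
        x = x + h * p
        p = p + 0.5 * h * cheap_force(x)
    p = p + 0.5 * delta * expensive_force(x)   # e^{(delta/2) L2}
    return x, p

cheap = lambda x: -4.0 * x**3    # stiff quartic part, cheap to evaluate
expensive = lambda x: -0.1 * x   # stands in for the costly mean-field force

x0, p0 = np.array([1.0]), np.array([0.5])
x1, p1 = mts_step(x0, p0, 0.05, 5, cheap, expensive)
# The composition is symmetric, hence time-reversible: integrating back
# with the momentum negated recovers the initial condition up to rounding.
xb, pb = mts_step(x1, -p1, 0.05, 5, cheap, expensive)
print(np.abs(xb - x0).max(), np.abs(pb + p0).max())
```

The point of the inner loop is that the stiff cheap force is integrated with the small step δ/K, while the expensive force is evaluated only twice per step of size δ.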
It is clear from Figure 5 that ZZS gives cheaper and more accurate estimates of the mean and variance of the empirical variance in both cases considered. In particular, we see that ZZS clearly outperforms HMC and BPS. This holds even though the calibration of ZZS is considerably simpler, since it involves only the tuning of the step size, while for HMC and BPS one additionally has to tune, respectively, K and M, and λr. We remark that, in our simulations, BPS is robust to reasonable choices of λr. BPS is competitive for N = 25 particles, but for higher N it requires a step size that is too small to match the accuracy of ZZS, making its convergence slow. This is possibly due to the stochasticity in the reflection operator of BPS corresponding to the mean-field interactions. More refined schemes for BPS might improve its performance, but these are outside the scope of this paper, where we are interested in testing our general-purpose splitting schemes.

Acknowledgments

The authors thank the three anonymous referees for their valuable comments, which helped substantially improve this article. AB acknowledges funding from the Dutch Research Council (NWO) as part of the research programme Zigzagging through computational barriers with project number 016.Vidi.189.043. AB also acknowledges funding by the European Union (ERC-2022-SyG, 101071601). Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them. PD acknowledges funding from the Engineering and Physical Sciences Research Council (EPSRC) grant EP/V006177/1.
PM acknowledges funding from the French ANR grant SWIDIMS (ANR-20-CE40-0022) and from the European Research Council (ERC) under the European Union s Horizon 2020 research and innovation program (grant agreement No 810367), project EMC2. Bertazzi, Dobson, and Monmarch e Appendix A. Pseudo-codes for the algorithms with subsampling Here we give pseudo-codes for the ZZS algorithms with subsampling described in Section 1.3. Algorithms 5 and 6 are suited for scenarios where the target distribution is of the form π(x) exp( 1 M PM j=1 ψj(x)). Algorithm 5 in addition requires availability of a function βi(x, vi) such that (vi iψj(x))+ βi(x, vi) for all j {1, . . . , M}. These can be extended to the case of BPS by using a stochastic version of the reflection operator, which is of the form Rj(x)v = v 2 v, xψj(x) | xψj(x)|2 xψj(x) when the jump is caused by the j-th term of ψ, that is ψj. Algorithm 5: Part B for the ZZS with subsampling and Poisson thinning Input: Initial condition (x, v) Rd { 1}d, step size δ. Output: Updated velocity vector v. for i 1 to d do t 0 Compute the Poisson thinning bound βi(x, vi) τ Exp(βi(x, vi)) while τ δ t do t t + τ J Unif({1, . . . , M}) U Unif[0, 1] if U (vi iψJ(x))+ βi(x,vi) then vi vi τ Exp(βi(x, vi)) Appendix B. Proofs of Section 2 B.1 Proof of Theorem 6 Fix g C2,0 Φ D(L). By a telescoping sum we have Ez[g(Ztn)] Ez[g(Ztn)] = k=0 (Ez[Ptn tk+1g(Ztk+1)] Ez[Ptn tkg(Ztk)]). For each k {0, . . . , n 1}, set fk(y, s) = Ptn tk sg(y) then we have Ez[g(Ztn)] Ez[g(Ztn)] = k=0 Ez[fk(Ztk+1, δ) fk(Ztk, 0)]. Piecewise deterministic sampling with splitting schemes Algorithm 6: Part B for the ZZS with subsampling without Poisson thinning Input: Initial condition (x, v) Rd { 1}d, step size δ. Output: Updated velocity vector v. for i 1 to N do t 0 J Unif({1, . . . , M}) τ Exp((vi iψJ(x))+) while τ δ t do t t + τ vi vi J Unif({1, . . . 
, M}) τ Exp((vi iψJ(x))+) By conditioning on Ztk it is sufficient to prove that |Ez[fk(Zδ, δ)] fk(z, 0)| R(1 + |z|M) g C2,0 Φ δ3. (33) Indeed if we have that (33) holds then by Assumption 5 we have |Ez[g(Ztn)] Ez[g(Ztn)]| Cδ3 n 1 X k=0 e R(tn tk)Ez[G(Ztk)] C g C2,0 Φ e Rtnδ3n GM(z), which gives the desired result. It remains to show that (33) holds. As done in Bertazzi et al. (2022) we rewrite the lhs as Ez[fk(Zδ, δ)] fk(z, 0) = Ez[fk(Zδ, δ)] fk(ϕδ(z), δ) + fk(ϕδ(z), δ) fk(z, 0). (34) In particular, with identical steps to Bertazzi et al. (2022) we can rewrite the last two terms on the left hand side of (34) using the fundamental theorem of calculus and that sfk(z, s) = Lfk(z, s): fk(ϕδ(z), δ) fk(z, 0) = Z δ 0 λ(ϕr(z))[Q(fk( , r))(ϕr(z)) fk(ϕr(z), r)]dr. Then we compute the expectation in the right hand side of (34), collecting a term for the case of no jumps, a single jump and the case of multiple jumps Ez[fk(Zδ, δ)] fk(z, δ) = 0 Q(ϕδ/2(z), d z) fk(ϕδ/2( z), δ) fk(ϕδ(z), δ) λ(ϕδ/2(z))e sλ(ϕδ/2(z))e (δ s)λ( z)ds Bertazzi, Dobson, and Monmarch e ℓ=2 Ez[(fk(Zδ, δ) fk(ϕδ(z), δ))1{ℓevents}] ( ) 0 λ(ϕr(z))[Q(fk( , r))(ϕr(z)) fk(ϕr(z), r)]dr. ( ) Observe that the sum in the second term ( ) can be truncated from ℓ= 3 onward as we only wish to get an order δ3 local error. Indeed, we have |fk| g and hence ℓ=3 Ez[(fk(Zδ, δ) fk(ϕδ(z), δ))1{ℓevents}] 2 g Pz(ℓ 3 events) 0 λ(ϕδ/2(z))e s1λ(ϕδ/2(z))λ(z1)e s2λ(z1)λ(z2)e s3λ(z2) Q(ϕδ/2(z), dz1)Q(z1, dz2)Q(z2, dz3) ds1ds2ds3 Z (1 e δλ(ϕδ/2(z)))(1 e δλ(z1))(1 e δλ(z2))Q(ϕδ/2(z), dz1)Q(z1, dz2) Z λ(ϕδ/2(z))λ(z1)λ(z2)Q(ϕδ/2(z), dz1)Q(z1, dz2) Z λ(ϕδ/2(z))λ(z1)Q(ϕδ/2(z), dz1)Qλ(z1) 2δ3 g λ(ϕδ/2(z))Q(λ( )Qλ( ))(ϕδ/2(z)) where we used that 1 exp( z) z for z 0. By Assumption 3 we have that λQ(λQλ) is polynomially bounded and therefore we can bound ( ) by ( ) = Z Q(ϕδ/2(z), dz1)Q(z1, dz2) fk(ϕδ/2(z2), δ) fk(ϕδ(z), δ) 0 λ(ϕδ/2(z))e s1λ(ϕδ/2(z))λ(z1)e s2λ(z1)e (δ s1 s2)λ(z2)ds2 ds1 + O( g (1 + |z|3mλ)δ3). 
Here and throughout we use the notation F(z, δ, g) = O( g C2 Φ(1 + |z|m)δn) to mean that lim sup δ 0 sup z E sup g |F(z, δ, g)| g C2 Φδn(1 + |z|m) C. We Taylor expand several terms in order to verify that the local error is of order δ3. We use repeatedly the following expansions: λ(ϕs(z)) = λ(z) + s DΦλ(z) + s2R(z, s; λ), fk(ϕs(z), δ) = fk(z, δ) + s DΦfk(z, δ) + s2R(z, s; fk), R(z, s; g) = D2 Φg(ϕ s(z))/2, fk(z, s) = fk(z, 0) s Lfk(z, 0) + s2L2fk(z, s)/2, for some s [0, s] (note that s may vary with each term so when we use these expansions we include an index to distinguish different incidents of s). Note that by Assumption 4 we Piecewise deterministic sampling with splitting schemes have fk C2 Φ Ce R(tn tk)(1+|z|m P) g C2,0 Φ which gives us a bound on the remainder terms. Applying the expansions above to ( ) we obtain ( ) = Z Q(ϕδ/2(z), d z) fk( z, 0) δLfk( z, 0) + δ2 2 L2fk( z, s2) + δ 2DΦfk( z, δ) 8 R( z, s2; fk) fk(z, 0) + δLfk(z, 0) δ2 2 L2fk(z, s3) δDΦfk(z, δ) δ2 2 R(z, s3; fk) 2DΦλ(z) + 1 8δ2R(z, s4; λ) 1 sλ(ϕδ/2(z)) (δ s)λ( z) + 1 2(sλ(ϕδ/2(z)) + (δ s)λ( z))2e η ds = Z Q(ϕδ/2(z), d z) fk( z, 0) fk(z, 0) Z δ 2DΦλ(z) 1 sλ(z) (δ s)λ( z) ds + δ Z Q(ϕδ/2(z), d z) Lfk( z, 0) + 1 2DΦfk( z, 0) + Lfk(z, 0) DΦfk(z, 0) 2DΦλ(z) 1 sλ(z) (δ s)λ( z) ds + e R(tn tk)O( g C2,0 Φ (1 + |z|M)δ3) where we used η [0, sλ(ϕδ/2(z)) + (δ s)λ( z)] in the first equality and further Taylor expansions to obtain the second equality. Here M = 3mλ + m P. 
Now using Assumption 3 we can expand the Q term ( ) = Z Q(z, d z) + δ 2DΦQ(z, d z) fk( z, 0) fk(z, 0) 2DΦλ(z) 1 sλ(z) (δ s)λ( z) ds + δ Z Q(z, d z) Lfk( z, 0) + 1 2DΦfk( z, 0) + Lfk(z, 0) DΦfk(z, 0) 2DΦλ(z) 1 sλ(z) (δ s)λ( z) ds + e R(tn tk)O( g C2,0 Φ (1 + |z|M)δ3) Term ( ) can be expanded as ( ) = Z Q(ϕδ/2(z), d z)Q(z1, dz2)(fk(z2, 0) fk(z, 0)) Z δ λ(z) + δDΦ(λ)(ϕ s4(z)) λ(z1) 1 + ( s1λ(ϕδ/2(z)) s2λ(z1) (δ s1 s2)λ(z2))e ξ ds2 ds1 + e R(tn tk)O( g C1,0 Φ (1 + |z|m P+3mλ)δ3) Z Q(ϕδ/2(z), d z)Q(z1, dz2) fk(z2, 0) fk(z, 0) λ(z)λ(z1) + e R(tn tk)O( g C1,0 Φ (1 + |z|m P+3mλ)δ3) Z Q(z, dz1)Q(z1, dz2) fk(z2, 0) fk(z, 0) λ(z)λ(z1) Bertazzi, Dobson, and Monmarch e + e R(tn tk)O( g C1,0 Φ (1 + |z|m P+3mλ)δ3), where ξ [0, s1λ(ϕδ/2(z)) + s2λ(z1) + (δ s1 s2)λ(z2)]. By Assumption 3 we can expand the term ( ) as follows: 0 λ(ϕr(z)) Qfk( , r)(z) + r Z DΦQ(z, d z)fk( z, r) fk(ϕr(z), r) dr + e R(tn tk)O(δ3 g C2,0 Φ (1 + |z|mλ)). By Taylor s theorem 0 λ(ϕr(z)) Qfk( , 0)(z) r Z Q(z, d z)Lfk( z, 0) + r Z DΦQ(z, d z)(fk( z, 0) r Lfk( z, r)) . (fk(z, 0) + r DΦfk(z, 0) r Lfk(z, 0))) dr + e R(tn tk)O(δ3 g C2,0 Φ (1 + |z|mλ+m P). Note that R DΦQ(z, d z)Lfk( z, r) = Q(DΦLfk( , r))(z) = e R(tn tk)O((1 + |z|m P) g C2,0 Φ ). Using this and Taylor expanding λ(ϕr(z)) we have 0 λ(z) Qfk( , 0)(z) r Z Q(z, d z)Lfk( z, 0) + r Z DΦQ(z, d z)fk( z, 0) (fk(z, 0) + r DΦfk(z, 0) r Lfk(z, 0)) dr Z δ 0 r DΦλ(z) (Qfk( , 0)(z) fk(z, 0)) dr + e R(tn tk)O(δ3 g C2,0 Φ (1 + |z|mλ+m P). Evaluating the integral over r ( ) = λ(z) Qfk( , 0)(z)δ 1 2δ2 Z Q(z, d z)Lfk( z, 0) + 1 2δ2 Z DΦQ(z, d z)fk( z, 0) fk(z, 0)δ + 1 2δ2DΦfk(z, 0) 1 2δ2Lfk(z, 0) 1 2δ2DΦλ(z) (Qfk( , 0)(z) fk(z, 0)) + e R(tn tk)O(δ3 g C2,0 Φ (1 + |z|mλ+m P). First order terms. Terms of order δ appear only in ( ) and ( ) and clearly they cancel out. Second order terms. 
In ( ) we can further expand terms of the form DΦ(f)( z, δ) and rearrange as Order δ2 of ( ) =δ2 " Z λ(z) Q(z, d z)Lfk( z, 0) + Q(z, d z)1 2DΦfk( z, 0) 2DΦQ(z, d z)(fk( z, 0) fk(z, 0)) Piecewise deterministic sampling with splitting schemes + λ(z)(Lfk(z, 0) DΦfk(z, 0) ) + Z Q(z, d z)(fk( z, 0) fk(z, 0)) 1 2λ(z)(λ( z) + λ(z)) + 1 =δ2 " Z λ(z) Q(z, d z)Lfk( z, 0) + Q(z, d z)1 2DΦfk( z, 0) 2DΦQ(z, d z)(fk( z, 0) fk(z, 0)) | {z } Term A + λ(z) Lfk(z, 0) DΦfk(z, 0)) 1 2λ(z) Z Q(z, d z)(fk( z, 0) fk(z, 0)) | {z } Term B 2DΦλ(z) Z Q(z, d z)(fk( z, 0) fk(z, 0)) | {z } Term C Z Q(z, d z)(fk( z, 0) fk(z, 0) | {z } Term D For term ( ) we have Order δ2 of ( ) = 1 2δ2 Z Q(z, dz1)Q(z1, dz2)(fk(z2, 0) fk(z, 0) | {z } Term D )λ(z)λ(z1). Similarly for ( ) we have Order δ2 of ( ) = δ2 Z DΦ(Q)(z)λ(z)(fk( z, 0) fk(z, 0)) | {z } Term A + Z Q(z, d z)DΦ(λ)(z)(fk( z, 0) fk(z, 0)) | {z } Term C + Z Q(z, d z)λ(z)( Lfk( z, 0) + Lfk(z, 0) DΦ(fk)(z, 0) | {z } Term B After cancellations we obtain Order δ2 of ( ) + ( ) + ( ) = δ2λ(z) Z Q(z, d z)Lfk( z, 0) + Q(z, d z)1 2DΦ(fk)( z, 0) Z Q(z, d z)fk( z, 0)λ( z) + 1 Z Q(z, d z)Q( z, dz2)fk(z2, 0)λ( z) + 1 Z Q(z, d z)Lfk( z, 0) Z Q(z, d z) Lfk( z, 0) + DΦ(fk)( z, 0) + λ( z) Z Q( z, dz2)[fk(z2, 0) fk( z, 0)] ! Bertazzi, Dobson, and Monmarch e where the last equality follows by the definition of the generator of the PDMP. Therefore we have shown that second order terms cancel out. B.2 Proofs for Example 3 Let us verify that Assumption 3 holds. Note that (DΦQ)(g)(x, v) = v T d X λ(x, v) g(x, Riv) . Therefore by Taylor s theorem we have for some η between x and x + sv Qg(x + sv, v) Qg(x, v) (DΦQ)(g)(x, v) = 1 i=1 v T 2 x λ(x, v) g(x, Riv) x=η v. It remains to show that λi λ , x λi(x,v) λ(x,v) , 2 x λi(x,v) λ(x,v) are bounded. It is clear that 0 < λi/λ 1. Let us consider the first derivative. 
Set $\Xi(s) = \log(1 + e^s)$, so that $\lambda_i(x, v) = \Xi(v_i \partial_i \psi(x))$, and note that $0 \leq \Xi''(s) \leq \Xi'(s) \leq \min(1, \Xi(s))$. We have
\[
\nabla_x \frac{\lambda_i(x, v)}{\lambda(x, v)} = \frac{\nabla_x \lambda_i(x, v)}{\lambda(x, v)} - \frac{\lambda_i(x, v)}{\lambda(x, v)}\, \frac{\nabla_x \lambda(x, v)}{\lambda(x, v)},
\]
so, since $0 < \lambda_i/\lambda \leq 1$ and $\nabla_x \lambda = \sum_j \nabla_x \lambda_j$, it remains to show that $\nabla_x \lambda_i / \lambda$ is bounded. Using the bounds on $\Xi$ we have
\[
\frac{|\nabla_x \lambda_i(x, v)|}{\lambda(x, v)} \leq \frac{\Xi'(v_i \partial_i \psi(x))}{\Xi(v_i \partial_i \psi(x))}\, |v_i|\, |\nabla_x \partial_i \psi(x)| \leq |\nabla_x \partial_i \psi(x)|.
\]
This is bounded by our assumptions on $\psi$. Let us now consider $\nabla_x^2(\lambda_i/\lambda)$:
\[
\nabla_x^2 \frac{\lambda_i}{\lambda} = \frac{\nabla_x^2 \lambda_i}{\lambda} - \frac{\nabla_x \lambda_i\, \nabla_x \lambda^T + \nabla_x \lambda\, \nabla_x \lambda_i^T}{\lambda^2} + \frac{2\lambda_i\, \nabla_x \lambda\, \nabla_x \lambda^T}{\lambda^3} - \frac{\lambda_i\, \nabla_x^2 \lambda}{\lambda^2}.
\]
Using the bound on $\nabla_x \lambda_i/\lambda$ we can bound all the terms aside from those involving second derivatives; since $\nabla_x^2 \lambda = \sum_j \nabla_x^2 \lambda_j$ and $\lambda_i \leq \lambda$, it suffices to consider
\[
\frac{\nabla_x^2 \lambda_i(x, v)}{\lambda(x, v)} = \frac{\Xi''(v_i \partial_i \psi(x))}{\sum_j \Xi(v_j \partial_j \psi(x))}\, \nabla_x \partial_i \psi(x)\, \nabla_x \partial_i \psi(x)^T + \frac{\Xi'(v_i \partial_i \psi(x))}{\sum_j \Xi(v_j \partial_j \psi(x))}\, v_i\, \nabla_x^2 \partial_i \psi(x).
\]
This is bounded since $\psi$ has bounded second and third order derivatives.

Appendix C. Proofs of ergodicity for splitting schemes of the BPS

To fix ideas, we consider a splitting scheme for a BPS with unitary velocity, with generator $L$ decomposed as $L = L_1 + L_2 + L_3$, where
\[
L_1 f(x, v) = v^T \nabla_x f(x, v), \qquad L_2 f(x, v) = \big(v^T \nabla\psi(x)\big)_+ \big(f(x, R(x)v) - f(x, v)\big), \qquad L_3 f(x, v) = \lambda_r \int_{S^{d-1}} \big(f(x, w) - f(x, v)\big)\, \mathrm{d}w.
\]
Write $P^j_t = e^{t L_j}$ for the associated semigroups, $j \in \{1, 2, 3\}$. We shall show the following statement, which implies Theorem 8.

Theorem 20 Consider any scheme of the BPS based on the decomposition D, R, B. Under Assumption 7, the following hold:
1. There exist $a, b, C, \delta_0 > 0$ and a function $V$ (with $a, b, C, \delta_0, V$ depending only on $\psi$ and $\lambda_r$, but not on $\delta$) such that, for all $(x, v)$, $e^{|x|/a}/a \leq V(x, v) \leq a e^{a|x|}$, and for all $\delta \in (0, \delta_0]$ and all $(x, v)$,
\[
P_\delta V(x, v) \leq (1 - b\delta)\, V(x, v) + C\delta. \tag{35}
\]
2. For all $R > 0$, there exist $c, \delta_0 > 0$ and a probability measure $\nu$ on $E$ such that for all $(x, v)$ with $|x| \leq R$ and all $\delta \in (0, \delta_0]$, setting $n = \lceil 4R/\delta \rceil$,
\[
\delta_{x,v} P^n_\delta \geq c\nu. \tag{36}
\]

C.1 Lyapunov function

In this section, we prove (35). Under Assumption 7, let $W(x) = \sqrt{\psi(x)}$, so that $\nabla W$ is bounded and, as $|x| \to \infty$, $\liminf |\nabla W(x)| > 0$ and $\nabla^2 W(x) \to 0$.
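For intuition, the three required properties of $W = \sqrt{\psi}$ are easy to check on a toy potential. The sketch below is a numerical sanity check under the hypothetical one-dimensional choice $\psi(x) = 1 + x^2$, so that $W(x) = \sqrt{1 + x^2}$ (this concrete $\psi$ is our own illustration, not an assumption of the paper):

```python
import math

# assumed toy potential: psi(x) = 1 + x^2, hence W(x) = sqrt(psi(x)) >= 1
def W(x):
    return math.sqrt(1.0 + x * x)

def dW(x):
    # W'(x) = x / sqrt(1 + x^2): bounded by 1, and |W'(x)| -> 1 as |x| -> infinity
    return x / math.sqrt(1.0 + x * x)

def d2W(x):
    # W''(x) = (1 + x^2)^(-3/2): tends to 0 as |x| -> infinity
    return (1.0 + x * x) ** -1.5

xs = [0.1 * k for k in range(-10000, 10001)]
grad_bounded = max(abs(dW(x)) for x in xs) <= 1.0       # nabla W bounded
liminf_positive = min(abs(dW(x)) for x in (-1000.0, 1000.0)) > 0.9  # liminf |nabla W| > 0
hessian_vanishes = d2W(1000.0) < 1e-8                   # nabla^2 W -> 0
```

Here $W'(x) \to \pm 1$ at infinity while $W''(x) \to 0$, matching the three displayed properties of $W$.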
Let c W > 0 be such that | W(x)| c W for |x| large enough. Let Sd 1 1w1 1/2dw > 0 . Let ϕ C2(R) be a non-decreasing function such that ϕ (0) > 0 and = 1 for θ 1 4c V = 2 for θ 1. Let κ = 4λr c W , M = κ Bertazzi, Dobson, and Monmarch e (where we used that ϕ (0) > 0 so that the denominator is positive). Finally, let ε = λrq 16 ϕ and let R > 0 be such that for all x Rd with |x| R, | W(x)| c W , 2W(x) M, and 2W(x) ε . Preliminary computation In all this section we denote by the same letter C various constants (independent from t) that change from line to line and we assume that t (0, δ0] with δ0 = 1/( W κ). We consider a Lyapunov function V (x, v) = eκW(x)ϕ(θ(x, v)) , θ(x, v) = v T W(x) . This choice is inspired by the Lyapunov function of the continuous time BPS that was studied in Durmus et al. (2020). Notice that |θ| is bounded by W . More generally, for a non-negative C2 function g, we bound P 1 t eκW g θ (x, v) = eκW(x+tv)g(v T W(x + tv)), using that etr 1 + tr + r2t2erδ0/2 for t δ0 as eκW(x+tv) eκ(W(x)+tv T W(x)+ 1 eκW(x) 1 + tκv T W(x) + Ct2 and, similarly, g(v W(x + tv)) g(θ(x, v)) + tv T 2W(x)vg (θ(x, v)) + Ct2 g(θ(x, v)) + ε g + C1|x| R t + Ct2 (here and below the successive constants C implicitly involve the suprema of g, |g | and |g | over [ W , W ]). Combining these two bounds and using that |tκθ| 1, we get P 1 t eκW g θ eκW (1 + tκθ)g θ + ε g t + Ct2 + Ct . (37) Second, for any function g, P 2 t eκW g θ (x, v) = eκW(x) e t(v T ψ(x))+g v T W(x) + 1 e t(v T ψ(x))+ g (R(x)v)T W(x) = eκW(x) g(θ(x, v)) + 1 e 2t W(x)θ+(x,v) (g ( θ(x, v)) g (θ(x, v))) . In the second equality we used (v T ψ(x))+ = 2W(x)(v T W(x))+. In particular, if g is non-decreasing, P 2 t eκW g θ (x, v) Piecewise deterministic sampling with splitting schemes eκW(x) g(θ(x, v)) + 1 e t Mθ+(x,v) (g ( θ(x, v)) g (θ(x, v))) + t C1|x| R eκW(x) g(θ(x, v)) + t Mθ+(x, v) (g ( θ(x, v)) g (θ(x, v))) + Ct2 + t C1|x| R . 
(38) Third, for any function g, similarly, P 3 t eκW g θ (x, v) = eκW(x) g (θ(x, v)) + 1 e λrt Z Sd 1 g(w T W(x))dw g (θ(x, v)) eκW(x) g (θ(x, v)) + λrt Z Sd 1 g(w W(x))dw g (θ(x, v)) + Ct2 (39) In particular, for g = ϕ, using the rotation-invariance of the uniform measure on the sphere and the fact that ϕ is increasing and with values in [1, 2], we bound Z Sd 1 ϕ(w W(x))dw = Z Sd 1 ϕ(w1| W(x)|)dw 1|x| R + A with A = sup |x| R Sd 1 ϕ(w1| W(x)|)dw . As a consequence, using that eκW(x,v)1|x| R is bounded, we obtain P 3 t eκW ϕ θ eκW ϕ θ + λrt (A ϕ θ) + Ct2 + Ct (40) Combining the computations Now, to fix ideas, let us start with the DRBRD case. In the following, for convenience, we write f(θ) = f θ for functions f. From our previous computations, using first (37) and then (39) and (40) (keeping track only of the terms of order 0 or 1 in t, the higher order ones being bounded and gathered in the term t2C), P 1 t P 3 t P 2 2t P 3 t P 1 t V P 1 t P 3 t P 2 2t P 3 t eκW (1 + tκθ)ϕ(θ) + tε ϕ + t2C + t C P 1 t P 3 t P 2 2t eκW (1 + tκθ)ϕ(θ) + λrt(A ϕ(θ)) + tε ϕ + t2C + t C := P 1 t P 3 t P 2 2t eκW Ψ(θ) + t C , where we defined Ψ(z) = (1 + tκz)ϕ(z) + λrt(A ϕ(z)) + tε ϕ + t2C . Assume that t 1/(2λr + 2 W κ). Under this condition Ψ is non-decreasing on [ W , W ] (which, we recall, contains the image of θ), since ϕ is non-decreasing and (1 + t(κz λr)) > 0 if |z| W . Thus, applying (38) but with t replaced by 2t) and then (39), (40) and (37) again, P 1 t P 3 t P 2 2t P 3 t P 1 t V Bertazzi, Dobson, and Monmarch e P 1 t P 3 t eκW Ψ(θ) + 2t Mθ+ (Ψ( θ) Ψ(θ)) + t2C + t C P 1 t P 3 t eκW Ψ(θ) + 2t Mθ+ (ϕ( θ) ϕ(θ)) + t2C + t C P 1 t eκW (1+tκθ)ϕ(θ)+2λrt(A ϕ(θ))+2t Mθ+ (ϕ( θ) ϕ(θ)) + tε ϕ + t2C +t C eκW (1+2tκθ)ϕ(θ)+2λrt(A ϕ(θ))+2t Mθ+ (ϕ( θ) ϕ(θ)) + 2tε ϕ + t2C0 + t C , for some C0, C > 0 (in the last inequality, we use that t is small enough so that the function which multiplies eκW is non-negative in order to apply (37)). The choice of ε ensures that 4ε ϕ λrq/4 =: η. 
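Operationally, the DRBRD composition $P^1_{\delta/2} P^3_{\delta/2} P^2_{\delta} P^3_{\delta/2} P^1_{\delta/2}$ analysed above corresponds to one simple Markov chain step. The sketch below is illustrative only (the function and parameter names, e.g. `grad_psi`, are ours); it uses that both jump parts can be sampled exactly: under $L_3$ the velocity ends up uniform on the sphere with probability $1 - e^{-\lambda_r t}$, and under $L_2$ the position is frozen and at most one reflection can occur, since a bounce makes the rate $(v^T \nabla\psi(x))_+$ vanish.

```python
import math
import random

def refresh(v, lam_r, t, rng):
    """P^3_t: with probability 1 - exp(-lam_r * t), redraw v uniformly on the sphere."""
    if rng.random() < 1.0 - math.exp(-lam_r * t):
        g = [rng.gauss(0.0, 1.0) for _ in v]
        n = math.sqrt(sum(gi * gi for gi in g))
        return [gi / n for gi in g]
    return list(v)

def bounce(x, v, grad_psi, t, rng):
    """P^2_t at the frozen position x: reflect with probability 1 - exp(-t (v.grad)_+)."""
    g = grad_psi(x)
    dot = sum(vi * gi for vi, gi in zip(v, g))
    lam = max(0.0, dot)
    if lam > 0.0 and rng.random() < 1.0 - math.exp(-t * lam):
        n2 = sum(gi * gi for gi in g)
        c = 2.0 * dot / n2
        return [vi - c * gi for vi, gi in zip(v, g)]  # reflection R(x)v, norm-preserving
    return list(v)

def drbrd_step(x, v, grad_psi, lam_r, delta, rng):
    """One step of the DRBRD splitting of the BPS (illustrative sketch)."""
    x = [xi + 0.5 * delta * vi for xi, vi in zip(x, v)]  # D_{delta/2}
    v = refresh(v, lam_r, 0.5 * delta, rng)              # R_{delta/2}
    v = bounce(x, v, grad_psi, delta, rng)               # B_delta
    v = refresh(v, lam_r, 0.5 * delta, rng)              # R_{delta/2}
    x = [xi + 0.5 * delta * vi for xi, vi in zip(x, v)]  # D_{delta/2}
    return x, v

rng = random.Random(0)
x, v = [2.0, 0.0], [1.0, 0.0]
for _ in range(200):
    x, v = drbrd_step(x, v, lambda y: list(y), 1.0, 0.1, rng)  # psi(x) = |x|^2 / 2
# both the reflection and the refreshment preserve the unit norm of v
```

With $\lambda_r = 0$ and $\nabla\psi \equiv 0$ the step reduces to pure transport, consistent with the role of $P^1_t$.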
So, if we have that, for all r R, κr + λ(A ϕ(r)) + Mr+ (ϕ( r) ϕ(r)) η , (41) then, using that ϕ(θ) [1, 2], we will conclude that, for t min(η/C0, 1/(λr + W κ))/2, P 1 t P 3 t P 2 2t P 3 t P 1 t V eκW ϕ(θ) 2ηt + 1 2ηt + t C 1 tη which will conclude the proof of the first part of Theorem 8 for the scheme DRBRD. Let us check what happens for different splitting orders. The computations are similar, in particular an important point is to check that the bound for P 1 t (resp. P 2 t ) is always used for a function g which is non-negative (resp. non-decreasing in θ), which is ensured each time by the assumption that t is small enough, as in the previous DRBRD case. For instance, for BDRDB, we get, for t small enough, P 2 t P 1 t P 3 2t P 1 t P 2 t V P 2 t P 1 t P 3 2t P 1 t eκW ϕ(θ) + t Mθ+(ϕ( θ) ϕ(θ)) + t2C + t C P 2 t P 1 t P 3 2t eκW (1 + tκθ)ϕ(θ) + tε ϕ + t Mθ+(ϕ( θ) ϕ(θ)) + t2C + t C P 2 t P 1 t eκW (1 + tκθ)ϕ(θ) + 2λrt(A ϕ(θ)) + tε ϕ + t Mθ+(ϕ( θ) ϕ(θ)) + t2C + t C P 2 t eκW (1 + 2tκθ)ϕ(θ) + 2λrt(A ϕ(θ)) + 2tε ϕ + t Mθ+(ϕ( θ) ϕ(θ)) + t2C + t C eκW (1 + 2tκθ)ϕ(θ) + 2λrt(A ϕ(θ)) + 2tε ϕ + 2t Mθ+(ϕ( θ) ϕ(θ)) + t2C + t C , which is exactly the same bound as in the DRBRD case, as expected since we only keep track of the first order terms in t, which coincide for all splitting orders. The other cases are similar. Let us check that (41) holds with our choices of parameters. Using that ϕ(r) 2 for all r R and ϕ(r) = 1 for all r c V /2 and that | V (x)| c V if |x| R, A q 1 + (1 q) 2 = 2 q . Then, for r c V /4, since ϕ(r) 2 q/2, A ϕ(r) q/2 . Piecewise deterministic sampling with splitting schemes The choice of κ ensures that, for all r c V /4, κr + λr(A ϕ(r)) κr + λr(1 q) λrq . For r [c V /4, qλr/(4κ)], κr + λr(A ϕ(r)) κr λrq/2 λrq/4 . Finally, for r qλr/(4κ), the choice of M ensures that κr + λr(A ϕ(r)) + Mr (ϕ( r) ϕ(r)) which concludes the proof of the first part of Theorem 8. C.2 Doeblin condition In this section we establish (36). 
The proof is an adaptation from Monmarch e (2016, Lemma 5.2) (see also Durmus et al. (2020, Lemma 11)), with the additional difficulty that time is discrete. The proof essentially relies on a controllability argument: the point is to construct trajectories starting at any point of a given ball and reaching any point of another suitable ball. Since we only need to lower bound the transition probability, we can focus for simplicity on particular trajectories. Specifically, in the following, we will be interested in trajectories which, during the time 4R (i.e. in the first n iterations), have exactly three refreshments, occurring at times close to 0, 2R and 4R, and no bounces (cf. the events A1 and A2 below). Then, for these trajectories, the density of the final state will essentially be given by the densities of the velocities V1, V2, V3 sampled at the refreshment events (the densities of V1 and V2 providing the density of the final position, while the density of V3 gives the density of the final velocity). We formalise this argument. Fix R > 0 and consider an initial condition (x, v) Rd Sd 1 with |x| R. We want to prove that the law of the process at time n = 4R/δ is bounded below by a positive constant (independent from δ small enough) times the Lebesgue measure on some domain, uniformly in x with |x| R. Since the process moves at unitary speed, for all n N, |Xn| R + nδ, and thus P (no bounce in the n first steps) e h(nδ) with h(t) = sup{ ψ(x)|, |x| R+t}. In the absence of bounces, depending on the splitting order, the chain behaves either as the chain given by the splitting DRD or RDR. To fix ideas, we only consider the DRD case in the following (corresponding either to DRBRD, DBRBD or BDRDB), the proof being similar in the other case. For k 1, denote by Tk the number of transitions of the chain between the (k 1)th refreshment and the kth one, so that (Tk)k N is an i.i.d. 
sequence of geometric random variables with parameter 1 e δλr, independent from the random variables used to define the bounces. Similarly, let (Vk)k N be the i.i.d. sequence of random variables uniformly Bertazzi, Dobson, and Monmarch e distributed on the sphere used to define the velocities at the refreshment times. For a small parameter η > 0 to be fixed later on, consider the events A1 = {T1 + T2 + T3 n , |δTj 2R| ηR for j = 2, 3} A2 = {δ(T4 1) > 2ηR, no bounce in the n first steps} . In particular, under A1 A2, exactly three refreshments occur during the n first transitions. Then, for all Borel set B of Rd Sd 1, Zn B, A1, A2 = P ( X, V3) B, A1, A2 = P(A2)P ( X, V3) B, A1 with, in the DRD case for instance, X = x + δ T1 1 v + T2V1 + T3V2 + n T1 T2 T3 + 1 (Indeed, the DRD scheme starts and ends with a half step of transport). Notice that P(A2) is lower bounded uniformly in δ (0, 1] as P(A2) e h(4R+1)e λr(2ηR+1) . On the other hand, writing I = {(t1, t2, t3) N3, t1 + t2 + t3 n , |δtj 2R| ηR for j = 2, 3} , P ( X, V3) B, A1 I (1 e δλr)3e δλr(t1+t2+t3 3) v + t2v1 + t3v2 + n t1 t2 t3 + 1 I (1 e δλr)3e δλr(t1+t2+t3 3) inf |x | R(1+2η) (Sd 1)3 1B x + δ(t2v1 + t3v2), v3 dv1dv2dv3 . By the rotation invariance of the uniform measure on the sphere, for fixed t, s > 0, s V1 +t V2 has the same distribution as V1|sw + t V2| where w is a fixed unitary vector. Then, |sw + t V2|2 = s2 + t2 + 2st V T 2 w and V T 2 w has on [ 1, 1] the probability density proportional to u 7 (1 u2)d/2 1 (this is here we use that d 2), which is lower bounded by a positive constant on [ 1 + ε, 1 ε] for all ε > 0. Assuming that η 1/4 and considering for y Rd the ring R(y) = {x Rd, 4ηR |x y| 4R(1 2η)}, we get that s V2 + t V3 has a density which is lower bounded on R(0) by a constant α > 0 which is independent from t, s [R(2 η), R(2 + η)]. 
As a consequence, for $(t_1, t_2, t_3) \in I$ and $x' \in \mathbb{R}^d$ with $|x'| \leq R(1 + 2\eta)$, the law of $(x' + \delta(t_2 V_1 + t_3 V_2), V_3)$ has a density lower bounded by $\alpha$ on $\mathcal{R}(x') \times S^{d-1}$. Assuming that $\eta \leq 1/16$, let $\bar{\mathcal{R}} = \{y \in \mathbb{R}^d,\ R(1 + 6\eta) \leq |y| \leq R(3 - 10\eta)\}$ (which has a non-zero Lebesgue measure). The triangle inequality implies that $\bar{\mathcal{R}} \subset \mathcal{R}(x')$ if $|x'| \leq R(1 + 2\eta)$. As a consequence,
\[
\mathbb{P}\big((\bar X, V_3) \in B, A_1\big) \geq \mathbb{P}(A_1)\, \alpha \int_{\bar{\mathcal{R}} \times S^{d-1}} \mathbf{1}_B(y, v)\, \mathrm{d}y\, \mathrm{d}v,
\]
which concludes the proof of (36), since $\mathbb{P}(A_1)$ converges to a positive constant as $\delta \to 0$.

Appendix D. Ergodicity for splitting schemes of ZZS

D.1 Splitting DBD

In this section we focus on the splitting scheme DBD for the ZZS. In order to prove Theorem 10, we prove a drift condition for the one-step transition kernel $P_\delta$, implying a similar conclusion for $P^2_\delta$, as well as a minorisation condition after an even number of steps $n^\star$. These two statements, together with the aperiodicity of $P^2_\delta$, enable us to conclude on the geometric ergodicity of the kernel $P^2_\delta$.

Theorem 21 Consider the splitting scheme DBD for the ZZS and suppose Assumption 9 holds. Then the following hold:
1. Let $\beta \in (0, 1/2)$, $\phi(s) = \frac{1}{2}\mathrm{sign}(s)\ln(1 + 2|s|)$ and
\[
V(x, v) = \exp\Big(\beta\psi(x) + \sum_{i=1}^d \phi(v_i \partial_i \psi(x))\Big). \tag{42}
\]
There exist constants $b \in (0, 1)$, $C < \infty$ such that for all $(x, v)$ and all $\delta \in (0, \delta_0)$, with $\delta_0 = 2(1 + \gamma_0)^{-1}$ and $\gamma_0$ as in Assumption 9(b), it holds that
\[
P_\delta V(x, v) \leq (1 - b\delta)\, V(x, v) + C\delta. \tag{43}
\]
2. For any $R > 0$ consider the set $C = [-R, R]^d$. For some $L > 0$ and for $\bar y$ as in Assumption 9(a), let
\[
n^\star := 2 + 2\Big\lceil \frac{2\bar y + R + L}{\delta} \Big\rceil, \tag{44}
\]
which satisfies $n^\star \in 2\mathbb{N}$. For $(x, v) \in C \times \{-1, 1\}^d$ define the set $D(x, v)$ given by (19). Then for any $(y, w) \in D(x, v) \cap (C \times \{-1, 1\}^d)$ and $\delta \in (0, \delta_0]$ for $\delta_0 > 0$ it holds that
\[
\delta_{y,w} P^{n^\star}_\delta \geq c\nu,
\]
where $c$ is independent of $\delta$ and $\nu$ is uniform over $D(x, v) \cap (C \times \{-1, 1\}^d)$.

We observe that the Lyapunov function defined in (42) is the same as that of the continuous-time ZZS of Bierkens et al. (2019b). In Section D.1.1 we prove the minorisation condition, in Section D.1.2 we prove the drift condition, while in Section D.1.3 we prove Equation (22).
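One step of the DBD scheme studied in this section can be sketched as follows. The sketch is illustrative only: `partials` and the constant refreshment term `gamma` are our own stand-ins for $\nabla\psi$ and $\gamma_i(x)$. Since the jump part freezes $x$, each coordinate of $v$ evolves as a two-state Markov chain, and the flip probability over the jump phase can be sampled exactly using the closed form computed in Lemma 23 below.

```python
import math
import random

def dbd_step(x, v, partials, gamma, delta, rng):
    """One DBD step of the discretised Zig-Zag sampler (sketch).

    partials(x) returns the gradient of psi at x; entries of v are +1 or -1;
    gamma >= 0 is a constant refreshment term added to each rate (an assumption
    of this illustration, standing in for gamma_i(x))."""
    x = [xi + 0.5 * delta * vi for xi, vi in zip(x, v)]  # D_{delta/2}: half transport
    v = list(v)
    g = partials(x)
    for i in range(len(v)):                              # B_delta: jump part at frozen x
        lam_out = max(0.0, v[i] * g[i]) + gamma          # rate out of the current v_i
        lam_back = max(0.0, -v[i] * g[i]) + gamma        # rate out of the flipped v_i
        total = lam_out + lam_back
        # exact probability that the two-state jump process ends flipped at time delta
        p_flip = 0.0 if total == 0.0 else lam_out * (1.0 - math.exp(-total * delta)) / total
        if rng.random() < p_flip:
            v[i] = -v[i]
    x = [xi + 0.5 * delta * vi for xi, vi in zip(x, v)]  # D_{delta/2}: half transport
    return x, v

rng = random.Random(0)
x, v = [-5.0], [1]
for _ in range(10):
    x, v = dbd_step(x, v, lambda y: list(y), 0.0, 0.1, rng)  # psi(x) = x^2 / 2
# with gamma = 0 and v pointing downhill the rate is zero, so v is never flipped
```

A cruder sketch would flip with probability $1 - e^{-\delta\lambda_i}$ at the frozen rate; using the exact two-state endpoint distribution is what Lemma 23 makes possible.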
Bertazzi, Dobson, and Monmarch e D.1.1 Minorisation condition We now prove a minorisation condition for splitting scheme DBD of ZZS. In the following Lemma we consider the one-dimensional setting, for which the reasoning is similar to that of the proof of a minorisation condition for the continuous ZZS done in Bertazzi and Bierkens (2022, Lemma B.2). Lemma 22 Consider the splitting scheme DBD of ZZS with step size δ δ0. Suppose Assumption 9(a) holds for some y 0 and consider a set C = [ R, R] for R > 0. For L > 0 and for y as in Assumption 9(a) let n as in (44). For (x, v) C { 1} define the set D(x, v) := D+(x, v) D (x, v), where D+(x, v) := {(y, w) : w = v, y = x + mδ, m 2Z}, D (x, v) := {(y, w) : w = v, y = x + mδ, m 2Z + 1}. (45) Then for any (y, w) D(x, v) (C { 1}) it holds that P(y,w)((Xn , Vn ) ) bν( ) where b is independent of δ and ν is uniform over D(x, v) (C { 1}). Proof Let C = [ R, R] for a fixed R > 0 and let x C. We shall consider only the case of v = +1, as the same arguments extend to the symmetric case v = 1. In particular observe that if the process is started in set D(x, +1) (respectively D(x, 1)), then after an even number of iterations it will again be in D(x, +1) (D(x, 1)). This means that the process lives on D(x, +1) (respectively on D(x, 1)). To shorten the notation we denote by D+, D the sets D+(x, +1), D (x, +1) as defined in (45). Below we focus on the case of an initial condition in D+, while the case of D follows with an identical reasoning and obvious changes. Let n as in (44) and define λ := max x C max y:|y x| n δ,v= 1 λ(y, v) which is the largest switching rate that can be reached within n iterations starting in C. Note that our definition of n , that is (44), implies that n δ is upper bounded by a constant as δ δ0 and thus λ can be chosen independently of the step size δ. Recall λ > 0. 
From here on we shall denote the initial condition as (y, w) D+, and without loss of generality we shall assume y = x + ℓδ for some ℓ N, where y is as in Assumption 9(a). We want to lower bound the probability that after n iterations the process is in measurable sets B D. We consider two cases: in the first one the final state of the process is of the form (Xn , Vn ) = (z, 1) D (C { 1}), while in the second case (Xn , Vn ) = (z, +1) D+ (C { 1}). Consider the case in which the final state has negative velocity, i.e. Vn = 1. To lower bound the probability of reaching this state, we consider the case in which only one switching Piecewise deterministic sampling with splitting schemes event takes place. Let z = y + mδ with m N odd. Then in order to have (Xn , Vn ) = (z, 1) with exactly one event taking place at time N1 it must be that y + (N1 1)δ (n N1)δ = z. Thus we find that the event should take place at In order to guarantee the switching rate is strictly positive it must also be that XN1 y, i.e. y + (N1 1)δ y and thus N1 1 + ( y y)/δ. Note N1 < n as required. Denote the position at the time of the switching event by x = y + δ(N1 1/2). Then the probability of exactly one event taking place at iteration N1 is given by Z δ 0 λ( x, 1) exp( sλ( x, 1)) exp( (δ s)λ( x, 1))ds δλ exp( δλ). The probability of this path is simple to lower bound, since upper bounding the switching rates gives a smaller probability: P(y,+1)((Xn , Vn ) = (z, 1)) n=0 exp( δλ(y + (n + 1/2)δ)) | {z } no jump before N1 δλ exp( δλ) | {z } a jump at N1 n=0 exp( δλ(y + (N1 1 n)δ)) | {z } no jump after N1 exp( (N1 1)δλ) δλ exp( δλ) exp( (n N1)δλ) 2 exp( (n 1)δλ) λ exp( δ0λ) 1 2 exp( (n 1)δλ) λ exp( δ0λ)(2R δ) ν( 1) 1 where M N is the number of points in D+ (C { 1}). In the last line we used that δM 2R δ. Recall that δ δ0. This concludes as (n 1)δ 4 y + R + 2L + 3δ0 and 2R δ 2R δ0. Second case We now focus on the case in which Vn = +1. 
We shall find an appropriate lower bound by restricting to the case in which exactly two switching events take place. Denoting the times of the two events as N1, N2, if the final position is z it must be y + (N1 1)δ (N2 1)δ + (n N1 N2)δ = z which implies Bertazzi, Dobson, and Monmarch e Moreover, at event times the process should be in regions with strictly positive switching rate: y + (N1 1/2)δ y, y + (N1 1)δ (N2 1/2)δ y. These imply respectively δ + 1 =: N1, Since N2 is determined by (46), we enforce that the second inequality holds: which implies 2 y + 2 y + z Now to obtain the right dependence on δ, we shall take n such that N1 N1 is increasing as 1/δ. It holds N1 N1 = n 2 2 4 y y + z and thus it is sufficient to take n = 2 + 4 y y + z for some constant L > 0, as with this choice N1 N1 = L/δ . Using the results above we find P(y,+1)((Xn , Vn ) = (z, +1)) n=0 exp( δλ(y + (n + 1/2)δ)) | {z } no jumps before N1 δλ exp( δλ) | {z } a jump at N1 m=0 exp( δλ(y + (N1 1 (m + 1/2))δ)) | {z } no jumps until N2 δλ exp( δλ) | {z } a jump at N2 ℓ=0 exp( δλ(y + (N1 1 (N2 1) + (ℓ+ 1/2))δ)) | {z } no jumps after N2 Piecewise deterministic sampling with splitting schemes N1=N1 exp( δλn δ))δ2λ2 exp( 2δλ) exp( λn δ))δ2λ2 exp( 2δλ) L exp( δλn ))λ2 exp( 2δ0λ)δ 2L exp( δλn ))λ2 exp( 2δ0λ)(2R δ0) ν(+1) 1 Similarly to above it is now sufficient to note that n δ 4δ0 + 4 y + 2R + 2L. To conclude it is sufficient to observe that the conditions above hold for any choice of x, y, z C since n is as in (44). Multidimensional case To extend to the higher dimensional setting, first observe that it is possible to apply the same ideas in the proof of Lemma 22 to each component, in particular requiring that the events happen when all components of the process are outside of the rectangle [ y, + y]d. This implies that Assumption 9(a) can be used to lower bound the probability of flipping each component of the velocity vector. Hence each coordinate can be controlled independently of the others. 
It is clear that the following minorisation condition is implied: let $C = [-R, R]^d$ for $R > 0$, let $(x, v) \in C \times \{-1, 1\}^d$, and let $D(x, v)$ be as in (19); then for all $(y, w) \in D(x, v) \cap (C \times \{-1, 1\}^d)$ it holds that
\[
\mathbb{P}_{(y,w)}\big((X_{n^\star}, V_{n^\star}) \in \cdot\,\big) \geq b^d\, \nu_d(\cdot),
\]
where $n^\star, b$ are as in Lemma 22 and $\nu_d$ is the uniform distribution over the states in the grid $D(x, v) \cap (C \times \{-1, 1\}^d)$.

D.1.2 Drift condition

Let us first characterise in the following lemma the law of the jump part of the process. This result is then used to prove the desired drift condition in Lemma 24 below.

Lemma 23 Let $\tilde V^x_t$ denote the PDMP corresponding to the generator $L_2$ (for this process $x$ acts as a parameter). Suppose that $\lambda_i(x, v)$ is independent of $v_j$ for $j \neq i$. Then for any $w \in \{-1, 1\}^d$ we have
\[
\mathbb{P}_v\big(\tilde V^x_t = w\big) = \prod_{i=1}^d \frac{\lambda_i(x, R_i w) + w_i v_i\, \lambda_i(x, v)\, e^{-(\lambda_i(x, v) + \lambda_i(x, R_i v)) t}}{\lambda_i(x, v) + \lambda_i(x, R_i v)}.
\]

Proof To simplify notation we suppress the dependence on $x$ and set $\Lambda_i(v) = \lambda_i(v) + \lambda_i(-v) = \lambda_i(x, v) + \lambda_i(x, R_i v)$. Since coordinate $i$ of $\tilde V^x_t$ jumps according to $\lambda_i$, which does not depend on $v_j$ for $j \neq i$, the coordinates of $\tilde V^x_t$ are all independent. Hence it is sufficient to show that
\[
\mathbb{P}_{v_i}\big((\tilde V^x_t)_i = w_i\big) = \frac{\lambda_i(-w_i) + w_i v_i\, \lambda_i(v_i)\, e^{-\Lambda_i(v_i) t}}{\Lambda_i(v_i)}.
\]
Therefore it is sufficient to consider the setting $d = 1$. Define for any $t \geq 0$ and $v, w \in \{-1, 1\}$
\[
\phi_t(v; w) := \frac{\lambda(-w) + w v\, \lambda(v)\, e^{-\Lambda(v) t}}{\Lambda(v)}.
\]
If we show that for all $t \geq 0$ and $v, w \in \{-1, 1\}$
\[
\partial_t \phi_t(v; w) = L_2 \phi_t(v; w), \qquad \phi_0(v; w) = \mathbf{1}_w(v), \tag{47}
\]
then $\phi_t$ coincides with the semigroup applied to $\mathbf{1}_w$ and we have the desired result $\phi_t(v; w) = \mathbb{E}_v[\phi_0(\tilde V_t; w)] = \mathbb{P}_v[\tilde V_t = w]$. It is straightforward to confirm that the initial condition $\phi_0(v; w) = \mathbf{1}_w(v)$ holds. So it remains to show that $\phi_t$ satisfies the ODE in (47). Note that, since $\Lambda(-v) = \Lambda(v)$,
\[
\partial_t \phi_t(v; w) = -w v\, \lambda(v)\, e^{-\Lambda(v) t},
\]
\[
L_2 \phi_t(v; w) = \lambda(v)\big(\phi_t(-v; w) - \phi_t(v; w)\big) = \lambda(v)\, \frac{-w v\, \lambda(-v)\, e^{-\Lambda(v) t} - w v\, \lambda(v)\, e^{-\Lambda(v) t}}{\Lambda(v)} = -w v\, \lambda(v)\, e^{-\Lambda(v) t}.
\]
Therefore we have that (47) holds.

Lemma 24 Consider the splitting scheme DBD of the ZZS. Let $\lambda_i(x, v) = (v_i \partial_i \psi(x))_+ + \gamma_i(x)$ and let Assumption 9 be verified. Let $V$ be the function defined in (42).
We shall now use the bound (51) as a starting point, focusing first on the setting in which (x, v) is outside of a large enough compact set. Bertazzi, Dobson, and Monmarch e Bound outside of a compact set We split the product in (51) into four cases: (i) wi = vi and vi iψ(x + vt) > 0; (ii) wi = vi and vi iψ(x + vt) < 0; (iii) wi = vi and vi iψ(x + vt) > 0; (iv) wi = vi and vi iψ(x + vt) < 0. Consider first case (i). Let i be such that wi = vi and vi iψ(x + vt) > 0. Then I(i) = e2tviβ iψ(x) λi(x + vt, Riv) + λi(x + vt, v)e (λi(x+vt,v)+λi(x+vt,Riv))t λi(x + vt, v) + λi(x + vt, Riv) . Using the form of λi we can write this as I(i) = γi(x + vt)e2βtvi iψ(x)(1 e (| iψ(x+vt)|+2γi(x+vt))t) | iψ(x + vt)| + 2γi(x + vt) + e (| iψ(x+vt)|+2γi(x+vt))t+2tviβ iψ(x). Using that 1 e z z for all z > 0 (note we make use of this inequality several times in the following computations) we find I(i) γi(x + vt)e2βtvi iψ(x)t + e (| iψ(x+vt)|+2γi(x+vt))t+2tviβ iψ(x). By Assumption 9(b) we can bound γi for |x| R with R sufficiently large and we have vi iψ(x + vt) 1 + γi(x + vt) which gives I(i) 1 + (vi iψ(x + vt) + γi(x + vt)) | iψ(x + vt)| + 2γi(x + vt) e | iψ(x+vt)|t+2tviβ iψ(x) 2e | iψ(x+vt)|t+2tviβ iψ(x). For case (ii), let i be such that wi = vi and vi iψ(x + vt) < 0. Then I(i) = e2βtvi iψ(x) | iψ(x + vt)| + γi(x + vt) + γi(x + vt)e (| iψ(x+vt)|+2γi(x+vt))t | iψ(x + vt)| + 2γi(x + vt) e2βtvi iψ(x). For case (iii), let i be such that wi = vi and vi iψ(x + vt) > 0. Then I(i) = eφ( vi iψ(x)) φ(vi iψ(x)) λi(x + vt, v) λi(x + vt, v)e (λi(x+vt,v)+λi(x+vt,Riv))t λi(x + vt, v) + λi(x + vt, Riv) . For s > 0 it holds that φ( s) φ(s) = ln(1 + 2s) and hence I(i) = λi(x + vt, v) 1 + 2vi iψ(x) 1 e (λi(x+vt,v)+λi(x+vt, v))t λi(x + vt, v) + λi(x + vt, Riv) λi(x + vt, v) 1 + 2vi iψ(x)t. For case (iv), let i be such that wi = vi and vi iψ(x + vt) < 0. Then I(i) = eφ( vi iψ(x)) φ(vi iψ(x)) γi(x + vt) γi(x + vt)e (λi(x+vt,v)+λi(x+vt,Riv))t λi(x + vt, v) + λi(x + vt, Riv) . 
For s < 0 we have φ( s) φ(s) = ln(1 + 2|s|), and thus we obtain I(i) γi(x + vt)(1 + 2| iψ(x)|)t. Piecewise deterministic sampling with splitting schemes Combining these estimates we have for |x| R with R sufficiently large P t V (x, v) w { 1}d K1 Y i:wi=vi, vi iψ>0 (γi(x + vt)e2βtvi iψ(x)t + e (| iψ(x+vt)|+2γi(x+vt))t+2tviβ iψ(x)) i:wi=vi, vi iψ<0 e2βtvi iψ(x) Y i:wi= vi, vi iψ>0 λi(x + vt, v) 1 + 2vi iψ(x)t Y i:wi= vi, vi iψ<0 γi(x + vt)(1 + 2| iψ(x)|)t. Now consider K1. By Taylor s theorem there exists x2 B(x, 2 dt) such that 2 |((v + w)T 2ψ(x1))i| + t|(w 2ψ(x2))i| |vi + wi| Using this bound and the four cases above we now obtain P t V (x, v) i:vi iψ>0 e2t2|(v T 2ψ(x1))i|+ 2t 2 |(v 2ψ(x2))i| (γi(x + vt)e2βtvi iψ(x)t + e (1 2β)| iψ(x+vt)|t 2γi(x+vt)t) i:vi iψ<0 e 2 (2|v T 2ψ(x1))i|+2t|(v 2ψ(x2))i| +2βtvi iψ(x) + X w { 1}d\{v} t|{i:wi =vi}| i:wi=vi,vi iψ>0 e(t2(|(v+w)T 2ψ(x1))i|+2t|(w 2ψ(x2))i|)(γi(x + vt)e2βtvi iψ(x)t + e (1 2β)| iψ(x+vt)|t 2γi(x+vt)t) i:wi=vi,vi iψ<0 e(t2(|(v+w)T 2ψ(x1))i|+2t|(w 2ψ(x2))i|)e2βtvi iψ(x) Y i:wi= vi,vi iψ>0 λi(x + vt, v) 1 + 2vi iψ(x) i:wi= vi,vi iψ<0 e t0| iψ(x+vt)|(1 + 2| iψ(x)|). By (21) we have P t V (x, v) i:vi iψ>0 (γ0t + e2t2|(v T 2ψ(x1))i|+ 2t 2 |(v 2ψ(x2))i|e (1 2β)| iψ(x+vt)|t 2γi(x+vt)t) i:vi iψ<0 e 2 (2|v T 2ψ(x1))i|+2t|(v 2ψ(x2))i| +2βtvi iψ(x) w { 1}d\{v} t|{i:wi =vi}| Y i:wi=vi,vi iψ>0 (γ0t + e(t2(|(v+w)T 2ψ(x1))i|+2t|(w 2ψ(x2))i|)e (1 2β)| iψ(x+vt)|t 2γi(x+vt)t) i:wi=vi,vi iψ<0 e(t2(|(v+w)T 2ψ(x1))i|+2t|(w 2ψ(x2))i|)e2βtvi iψ(x) Y i:wi= vi,vi iψ>0 λi(x + vt, v) 1 + 2vi iψ(x) i:wi= vi,vi iψ<0 e t0| iψ(x+vt)|(1 + 2| iψ(x)|). Since β < 1/2, by Assumption 9(c) there exists β1 such that for |x| R with R sufficiently large P t V (x, v) i:vi iψ>0 (γ0t + e β1| iψ(x+vt)|t) Y i:vi iψ<0 e β1| iψ(x)|t Bertazzi, Dobson, and Monmarch e w { 1}d\{v} t|{i:wi =vi}| Y i:wi=vi,vi iψ>0 (γ0t + e β1| iψ(x+vt)|t) Y i:wi=vi,vi iψ<0 e β1t| iψ(x)| i:wi= vi,vi iψ>0 λi(x + vt, v) 1 + 2vi iψ(x) i:wi= vi,vi iψ<0 e t0| iψ(x+vt)|(1 + 2| iψ(x)|). 
For $|x| \ge R$ with $R$ sufficiently large it holds that $\lambda_i(x + vt, v)/(1 + 2 v_i \partial_i\psi(x)) \le 1$, and by (21) we have $\gamma_i(x)(1 + 2|\partial_i\psi(x)|) \le 1$. We also have that $|\nabla\psi(x + vt)| \ge M$ for any $M > 0$ for $|x| \ge R$ with $R$ sufficiently large. Then we have
$$P_t V(x, v) \le V(x, v)\Bigl[(\gamma_0 t + e^{-\beta_1 M t})^{|\{i:\, v_i\partial_i\psi(x+vt)>0\}|}\,e^{-\beta_1 M t\,|\{i:\, v_i\partial_i\psi(x+vt)<0\}|} + \sum_{w \in \{\pm1\}^d\setminus\{v\}} t^{|\{i:\, w_i \ne v_i\}|}\prod_{\substack{i:\, w_i=v_i,\\ v_i\partial_i\psi>0}} (\gamma_0 t + e^{-\beta_1 M t})\prod_{\substack{i:\, w_i=v_i,\\ v_i\partial_i\psi<0}} e^{-\beta_1 M t}\Bigr].$$
Since $e^{-\beta_1 M t} \le \gamma_0 t + e^{-\beta_1 M t}$ we obtain
$$P_t V(x, v) \le V(x, v)\Bigl[(\gamma_0 t + e^{-\beta_1 M t})^{|\{i:\, v_i\partial_i\psi(x+vt)>0\}|}(\gamma_0 t + e^{-\beta_1 M t})^{|\{i:\, v_i\partial_i\psi(x+vt)<0\}|} + \sum_{w \in \{\pm1\}^d\setminus\{v\}} t^{|\{i:\, w_i \ne v_i\}|}(\gamma_0 t + e^{-\beta_1 M t})^{|\{i:\, w_i=v_i,\, v_i\partial_i\psi(x+vt)>0\}|}(\gamma_0 t + e^{-\beta_1 M t})^{|\{i:\, w_i=v_i,\, v_i\partial_i\psi(x+vt)<0\}|}\Bigr].$$
Hence
$$P_t V(x, v) \le V(x, v)\Bigl[(\gamma_0 t + e^{-\beta_1 M t})^{d} + \sum_{w \in \{\pm1\}^d:\, w \ne v} t^{|\{i:\, w_i \ne v_i\}|}(\gamma_0 t + e^{-\beta_1 M t})^{d - |\{i:\, w_i \ne v_i\}|}\Bigr] \le V(x, v)\sum_{k=0}^{d}\binom{d}{k}\, t^{k}(\gamma_0 t + e^{-\beta_1 M t})^{d-k} = V(x, v)\bigl((1 + \gamma_0)t + e^{-\beta_1 M t}\bigr)^{d}.$$
To show that (48) holds for $|x| \ge R$ it is sufficient to show that $(1 + \gamma_0)t + e^{-\beta_1 M t} \le 1 - \rho t$ for some $\rho > 0$. Indeed, in that case $1 - \rho t < 1$ and thus $((1 + \gamma_0)t + e^{-\beta_1 M t})^{d} \le (1 - \rho t)^{d} \le 1 - \rho t$. Note that for $t \le t_0$, with $t_0 \in (0, 1]$, it holds that $e^{-\beta_1 M t} \le 1 - ct$ for $c = \frac{1 - e^{-\beta_1 M t_0}}{t_0}$. Then for $t \le t_0$ we have $(1 + \gamma_0)t + e^{-\beta_1 M t} \le 1 - t(c - 1 - \gamma_0)$. It is then needed that $c > 1 + \gamma_0$, that is, $t_0$ should be such that
$$\frac{1 - e^{-\beta_1 M t_0}}{t_0} > 1 + \gamma_0. \tag{52}$$
Note we can always increase $M$ by taking $R$ larger. Choose $M$ such that $e^{-\beta_1 M t_0} < 1 - t_0(1 + \gamma_0)$, which is possible since $t_0 < (1 + \gamma_0)^{-1}$; then (52) holds. Hence (48) holds for $|x| \ge R$ with $\rho = (1 - e^{-\beta_1 M t_0})t_0^{-1} - 1 - \gamma_0$.

Bound inside of a compact set. It remains to show that (48) holds for $|x| \le R$. Let $C = \{x : |x| \le R\} \times \{\pm 1\}^d$. Recall $t < 1$ and $\psi \in C^2$. We shall use the inequality $e^{tr} \le 1 + tr + t^2 r^2 e^{r}/2 \le 1 + t(r + e^{3r}/2)$, which holds for $t \le 1$, $r > 0$. First of all we consider the term in the sum corresponding to the case $w = v$.
Bounding the probability of this event by 1 we find
$$\prod_{i=1}^{d} I(i) \le e^{t\,A(x, v, t)} \le 1 + t\bigl(A(x, v, t) + e^{3A(x,v,t)}/2\bigr),$$
where $A(x, v, t) = 2t\,v^\top\nabla^2\psi(x_1)v + t\sum_{i=1}^{d}|\langle v, \nabla\partial_i\psi(x_2)\rangle| + 2\beta\langle v, \nabla\psi(x)\rangle$. Taking the maximum $\bar{A}$ of $A$ over $(x, v, t) \in C \times (0, 1)$ we find
$$\prod_{i=1}^{d} I(i) \le 1 + t\,\bar{A}.$$
Let us now consider the remaining elements in the sum. Here we take advantage of the fact that a velocity flip is an order-$t$ event. Consider for the moment only the $i$-th component of the velocity vector. The probability that it is flipped (i.e. $w_i = -v_i$) satisfies
$$\frac{\lambda_i(x + vt, v) - \lambda_i(x + vt, v)\,e^{-(\lambda_i(x+vt,v)+\lambda_i(x+vt,R_i v))t}}{\lambda_i(x + vt, v) + \lambda_i(x + vt, R_i v)} = \frac{\lambda_i(x + vt, v)\bigl(1 - e^{-t(|\partial_i\psi(x+tv)| + 2\gamma_i(x+tv))}\bigr)}{|\partial_i\psi(x + tv)| + 2\gamma_i(x + tv)} \le t\,\lambda_i(x + vt, v) \le t \max_{\substack{i=1,\dots,d,\ (x,v)\in C,\ t\in(0,1)}} \lambda_i(x + vt, v).$$
Here we used that $1 - e^{-z} \le z$ for $z \ge 0$. All other probabilities can be bounded by 1, and hence
$$\sum_{w \ne v}\prod_{i=1}^{d} I(i) \le t\Bigl(\max_{\substack{i=1,\dots,d,\ (x,v)\in C,\ t\in(0,1)}} \lambda_i(x + vt, v)\Bigr)\sum_{w} V(x + vt + wt, w).$$
Since $V$ is continuous under our assumptions, we have proved that for every compact set $C \subset \mathbb{R}^d \times \{\pm 1\}^d$ there exists a constant $\bar{B} > 0$ such that for all $(x, v) \in C$ it holds that
$$P_t V(x, v) \le (1 + t\bar{B})V(x, v). \tag{53}$$
Therefore (48) holds for all $x \in \mathbb{R}^d$, $v \in \{\pm 1\}^d$.

D.1.3 Proof of Equation (22)
Let us prove (22). Fix a probability measure $\mu$ on $\mathbb{R}^d \times \{\pm 1\}^d$, and let $(x, v)$ be a point in the support of $\mu$. Then we can construct the set $D(x, v)$ corresponding to $(x, v)$ and given by (19). By Theorem 10 there is a unique invariant measure $\pi^{x,v}_\delta$ for the Markov process with kernel $P^2_\delta$, and by (18) for any probability measures $\nu, \nu'$ on $D(x, v)$
$$\bigl\|\nu P^{2n}_\delta - \nu' P^{2n}_\delta\bigr\|_V \le C_\alpha\,\kappa^{n\delta}\,\|\nu - \nu'\|_V.$$
Setting $\nu = \delta_{x,v}$, $\nu' = \pi^{x,v}_\delta$ and using that $\pi^{x,v}_\delta$ is an invariant measure for the kernel $P^2_\delta$ we have
$$\bigl\|\delta_{x,v} P^{2n}_\delta - \pi^{x,v}_\delta\bigr\|_V \le C_\alpha\,\kappa^{n\delta}\,\bigl\|\delta_{x,v} - \pi^{x,v}_\delta\bigr\|_V.$$
Then, integrating with respect to the probability measure $\mu$, we obtain
$$\Bigl\|\mu P^{2n}_\delta - \int \pi^{x,v}_\delta\,\mu(dx, dv)\Bigr\|_V = \sup_{|g| \le V}\Bigl|\int \bigl[P^{2n}_\delta g(x, v) - \pi^{x,v}_\delta(g)\bigr]\mu(dx, dv)\Bigr| \le \int \sup_{|g| \le V}\bigl|P^{2n}_\delta g(x, v) - \pi^{x,v}_\delta(g)\bigr|\,\mu(dx, dv) \le \int \bigl\|\delta_{x,v} P^{2n}_\delta - \pi^{x,v}_\delta\bigr\|_V\,\mu(dx, dv) \le C_\alpha\,\kappa^{n\delta}\int \bigl\|\delta_{(x,v)} - \pi^{x,v}_\delta\bigr\|_V\,\mu(dx, dv).$$

D.2 Other splitting schemes
In this section we consider the splitting schemes DRBRD and RDBDR of ZZS and prove geometric ergodicity in Theorem 26. The minorisation and drift conditions are proved in Sections D.2.1 and D.2.2 respectively. We shall work under the following assumption.

Assumption 25 There exists $\gamma_0 \in (0, \infty)$ such that the following conditions on the refreshment rate hold:
(a) There exists $R > 0$ such that for any $|x| \ge R$ it holds that
$$\gamma(x)\prod_{j=1}^{d}\bigl(1 + |\partial_j\psi(x)|\bigr) \le \gamma_0.$$
(b) For $|x| > R$ for some $R > 0$ it holds that
$$\sup\ \gamma(x + vt)\,e^{t|\nabla\psi(x)| + \frac{t^2}{2}(v+w)^\top\nabla^2\psi(y_1)(v+w) + |\nabla\psi(y_2)|}\prod_{i=1}^{d}\bigl(1 + 2|\partial_i\psi(x)|\bigr) \le \gamma_0,$$
where the supremum is over $t \in (0, 1)$, $y_1, y_2 \in B(x, t\sqrt{d})$, $v, w \in \{-1, 1\}^d$.

Theorem 26 Consider the splitting schemes DRBRD and RDBDR of ZZS. Suppose Assumption 9(a), (c) holds for switching rates $\lambda_i(x, v) = (v_i \partial_i\psi(x))_+$. Suppose moreover that the refreshment rate $\gamma$ satisfies Assumption 25(a) for scheme RDBDR and Assumption 25(b) for scheme DRBRD. Then statements (1) and (2), as well as Equation (22), hold. In particular, (2) holds with $\delta_0 < 2(1 + 2\gamma_0 + \gamma_0^2)^{-1}$ for RDBDR and with $\delta_0 < 2(1 + 2\gamma_0)^{-1}$ for DRBRD.

D.2.1 Minorisation condition
Splitting DRBRD. The chain obtained by DRBRD has the same periodic behaviour as DBD. Hence this case can be treated in the same way, and a minorisation condition follows by the same reasoning used in Section D.1.1 for splitting DBD.
Splitting RDBDR. In this case we give a sketch of the proof. The chain obtained by RDBDR breaks the grid behaviour exhibited by DBD because of the two refreshment steps at the beginning and end of each iteration. Indeed, consider the one-dimensional case and recall the definition of the grid $D(x, v)$ as in Lemma 22.
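To make the grid picture concrete, here is a minimal one-dimensional RDBDR zig-zag sketch (illustrative constants, standard Gaussian potential; not the implementation used in the experiments). Since each drift half-step moves the position by $\pm\delta/2$, a full refresh/drift/bounce/drift/refresh step changes $x$ by an element of $\{-\delta, 0, +\delta\}$, so every visited position stays on the grid $x_0 + \delta\mathbb{Z}$:

```python
import numpy as np

# Minimal sketch of a 1d RDBDR zig-zag chain (illustrative constants,
# Gaussian potential psi(x) = x^2/2). One step is:
# refresh / half drift / bounce / half drift / refresh.
rng = np.random.default_rng(0)
delta, gamma = 0.1, 0.3
dpsi = lambda x: x                               # psi'(x) for the Gaussian
x0 = 0.7
x, v = x0, 1.0
p_refresh = 1.0 - np.exp(-gamma * delta / 2.0)   # refresh prob per R half-step

positions = []
for _ in range(5000):
    if rng.random() < p_refresh:                 # R: resample the velocity
        v = rng.choice([-1.0, 1.0])
    x += v * delta / 2.0                         # D: first half drift
    rate = max(v * dpsi(x), 0.0)                 # B: bounce at frozen position
    if rng.random() < 1.0 - np.exp(-rate * delta):
        v = -v
    x += v * delta / 2.0                         # D: second half drift
    if rng.random() < p_refresh:                 # R: second refresh
        v = rng.choice([-1.0, 1.0])
    positions.append(x)

m = (np.array(positions) - x0) / delta           # grid indices: must be integers
assert np.max(np.abs(m - np.round(m))) < 1e-8
```

The chain never leaves the countable grid, which is why the minorisation argument has to track which of the two sub-grids the chain occupies.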
Since $v = \pm 1$, there are two disjoint grids, $D(x, +1)$ and $D(x, -1)$, the idea being that after an even number of steps of DBD the process lives on the same grid it started from. However, the process RDBDR can swap between one grid and the other by having a velocity refreshment. Indeed, starting the process at $(x, +1)$, having a velocity flip due to a refreshment at the end of the first step, and having no other jumps, we find that the state of the process is $(X_2, V_2) = (x, -1)$. Therefore after an even number of steps this process lives on the grid $D(x) = \{y : y = x + m\delta,\ m \in \mathbb{Z}\}$. If the initial and final conditions are on the same grid $D(x, v)$, then no refreshment is required and one can simply use the proof for the scheme DBD. On the other hand, if the two states are on different grids, i.e. one is on $D(x, +1)$ and the other on $D(x, -1)$, then a refreshment is required to choose the right grid. In order to maintain the right dependence on the step size $\delta$, the process is given an additional $M/\delta$ iterations, for a constant $M > 0$. Indeed, with this modification the probability of having a refreshment within the first $M/\delta$ iterations has a lower bound which is independent of $\delta$, assuming $\delta \le \delta_0$ for some $\delta_0 > 0$ (see for instance the second case in the proof of Lemma 22). After the first $M/\delta$ iterations the process is on the right grid, and Lemma 22 can be applied with the further constraint that no (more) refreshments take place. Note that the probability of this event again has a lower bound independent of $\delta$. Since in the first $M/\delta$ iterations the process can exit the initial compact set $C = [-R, R]$, the Lemma should be applied with the set $C = [-R - M, R + M]$. The extension to the multidimensional case follows by applying this same intuition to every component.

D.2.2 Drift condition
Let us start with an auxiliary result.

Lemma 27 Suppose the refreshment rate $\gamma$ satisfies Assumption 25(a).
Then $P^R_t V \le (1 + \gamma_0 t)V + Mt$, where $\gamma_0$ is as in Assumption 25 and $M$ is independent of $t$.

Proof Let $V$ be as in Lemma 24. Applying the transition kernel $P^R_t$ to $V$ we find
$$P^R_t V(x, v) = V(x, v)\,e^{-t\gamma(x)} + \frac{1}{2^d}\bigl(1 - e^{-t\gamma(x)}\bigr)\sum_{w \ne v} V(x, w) \le V(x, v)\Bigl(e^{-t\gamma(x)} + \bigl(1 - e^{-t\gamma(x)}\bigr)\frac{1}{2^d}\sum_{w \ne v}\prod_{j:\, v_j \ne w_j}\bigl(1 + |\partial_j\psi(x)|\bigr)\Bigr) \le V(x, v)\Bigl(e^{-t\gamma(x)} + t\gamma(x)\prod_{j=1}^{d}\bigl(1 + |\partial_j\psi(x)|\bigr)\Bigr).$$
Clearly for $x$ inside a compact set this implies $P^R_t V(x, v) \le (1 + Bt)V(x, v)$ by taking the maximum over $x$. Outside of a compact set we use Assumption 25 to obtain $P^R_t V(x, v) \le V(x, v)(1 + t\gamma_0)$.

Lemma 28 Consider the splitting scheme RDBDR of ZZS. Suppose Assumptions 9(c) and 25(a) hold. Then there exist a function $V$ and constants $\rho \in (0, 1)$, $C > 0$ such that for any $t \in (0, t_0)$ with $t_0 < (1 + 2\gamma_0 + \gamma_0^2)^{-1}$ it holds that
$$P^R_t P^D_t P^B_{2t} P^D_t P^R_t V(x, v) \le (1 - \rho t)V(x, v) + Ct.$$
Proof Let $V$ be as in Lemma 24. In the current context the result of that Lemma is that for all $t \in (0, t_0)$ with $t_0 < 1$ it holds that $P^D_t P^B_{2t} P^D_t V(x, v) \le (1 - \rho t)V(x, v) + Bt$, where $\rho = (1 - e^{-Rt_0})t_0^{-1} - 1$ with $R$ sufficiently large such that $\rho > 0$. Applying Lemmas 24 and 27 we obtain
$$P^R_t P^D_t P^B_{2t} P^D_t P^R_t V(x, v) \le (1 + t\gamma_0)\,P^R_t P^D_t P^B_{2t} P^D_t V(x, v) + Mt \le (1 + t\gamma_0)(1 - \rho t)\,P^R_t V(x, v) + t\bigl(M + (1 + \gamma_0)B\bigr) \le (1 + t\gamma_0)^2(1 - \rho t)V(x, v) + t\bigl(M(2 + \gamma_0) + (1 + \gamma_0)B\bigr).$$
It is left to ensure that $(1 + t\gamma_0)^2(1 - \rho t) \le 1 - \bar\rho t$ for some $\bar\rho > 0$. We have $(1 + t\gamma_0)^2(1 - \rho t) \le 1 - t(\rho - 2\gamma_0 - \gamma_0^2)$. Hence it is needed that
$$\frac{1 - e^{-Rt_0}}{t_0} - 1 > 2\gamma_0 + \gamma_0^2,$$
and thus that $e^{-Rt_0} < 1 - t_0(1 + 2\gamma_0 + \gamma_0^2)$, which is valid as $R$ can be taken as large as needed and $t_0 < (1 + 2\gamma_0 + \gamma_0^2)^{-1}$.

Lemma 29 Consider the splitting scheme DRBRD of ZZS. Suppose Assumptions 9(c) and 25 hold. Then there exist a function $V$ and constants $\rho \in (0, 1)$, $C > 0$ such that for any $t \in (0, t_0)$ with $t_0 < (1 + 2\gamma_0)^{-1}$ it holds that
$$P^D_t P^R_t P^B_{2t} P^R_t P^D_t V(x, v) \le (1 - \rho t)V(x, v) + Ct.$$
Proof Let $V$ be as in Lemma 24.
Observe that by Lemma 27 we have
$$P^R_t P^D_t V(x, v) = P^R_t V(x + vt, v) \le (1 + \gamma_0 t)V(x + vt, v) + Mt,$$
and thus $P^R_t P^D_t V(x, v) \le (1 + \gamma_0 t)P^D_t V(x, v) + Mt$. Then
$$P^D_t P^R_t P^B_{2t} P^R_t P^D_t V(x, v) \le (1 + \gamma_0 t)\,P^D_t P^R_t P^B_{2t} P^D_t V(x, v) + Mt,$$
and
$$P^D_t P^R_t P^B_{2t} P^D_t V(x, v) = e^{-t\gamma(x+vt)}\,P^D_t P^B_{2t} P^D_t V(x, v) + \bigl(1 - e^{-t\gamma(x+vt)}\bigr)\sum_{w}\frac{1}{2^d}\,V(x + vt + wt, w).$$
The first term corresponds to the case of no refreshments, while in the second term a refreshment takes place. For the first term we can directly apply Lemma 24, which in the current context shows that for $t < t_0 < 1$ it holds that $P^D_t P^B_{2t} P^D_t V(x, v) \le (1 - \rho t)V(x, v) + Mt$ for $\rho = (1 - e^{-\beta_1 M t_0})t_0^{-1} - 1$. The second term can be rewritten as in (50), that is, for $x_1 = x_1(x, v, w, t) \in B(x, t\sqrt{d})$,
$$\frac{V(x + vt + wt, w)}{V(x, v)} = e^{\beta(\psi(x + vt + wt) - \psi(x)) + \sum_{i=1}^{d}\bigl(\varphi(w_i \partial_i\psi(x + vt + wt)) - \varphi(v_i \partial_i\psi(x))\bigr)} \le e^{t|\nabla\psi(x)| + \frac{t^2}{2}(v+w)^\top\nabla^2\psi(x_1)(v+w) + |\nabla\psi(x_2)|}\prod_{i=1}^{d}\bigl(1 + 2|\partial_i\psi(x)|\bigr).$$
Using Assumption 25(b) we find
$$\bigl(1 - e^{-t\gamma(x+vt)}\bigr)\sum_{w}\frac{1}{2^d}\,V(x + vt + wt, w) \le t\gamma(x + vt)\sum_{w}\frac{1}{2^d}\,V(x + vt + wt, w) \le t\gamma_0\,V(x, v).$$
Therefore we have shown
$$P^D_t P^R_t P^B_{2t} P^R_t P^D_t V \le (1 + \gamma_0 t)\bigl(1 - \rho t + t\gamma_0\bigr)V + Mt \le \bigl(1 - t(\rho - 2\gamma_0)\bigr)V + Mt.$$
Hence it is sufficient to ensure that $\rho > 2\gamma_0$, which can be done similarly to the proof of Lemma 28.

Appendix E. Proofs of Section 4
In this section we collect statements and proofs that are not included in Section 4.

E.1 The generator of BPS and its adjoint
The generator of BPS and its adjoint play an important role in the proofs below and therefore we recall them in this section. Recall the generator of BPS:
$$L_{\mathrm{BPS}}f(x, v) = \langle v, \nabla_x f(x, v)\rangle + \lambda_1(x, v)\bigl[f(x, R(x)v) - f(x, v)\bigr] + \lambda_r\int\bigl(f(x, w) - f(x, v)\bigr)\nu(w)\,dw.$$
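The computations in this appendix repeatedly pass to the $L^2$ adjoint of this generator, defined by $\int g\,L_{\mathrm{BPS}}f = \int f\,L^*_{\mathrm{BPS}}g$. That relation can be sanity-checked on a discretisation; the sketch below works in one dimension with $\nu$ uniform on $\{\pm 1\}$ (so that BPS coincides with ZZS), with an illustrative potential, rates, and test functions (all assumptions made only for this check):

```python
import numpy as np

# Discretised check, in 1d with nu uniform on {+1, -1}, that the generator L
# (drift + bounce + refreshment) and its L^2 adjoint L* satisfy
#   sum_v \int g (L f) dx  =  sum_v \int f (L* g) dx .
xs = np.linspace(-6.0, 6.0, 4001)
dx = xs[1] - xs[0]
dpsi = lambda x: x                                  # psi(x) = x^2/2
lam1 = lambda x, v: np.maximum(v * dpsi(x), 0.0)    # bounce rate
lam_r = 0.7                                         # refreshment rate

# rapidly decaying test functions on R x {+1, -1}
f = {v: np.exp(-xs**2) * (1 + 0.3 * v * xs) for v in (1, -1)}
g = {v: np.exp(-0.5 * xs**2) * (1 - 0.2 * v * xs**2) for v in (1, -1)}

def ddx(h):                                         # centred finite difference
    out = np.zeros_like(h)
    out[1:-1] = (h[2:] - h[:-2]) / (2 * dx)
    return out

def L(h):
    mean = 0.5 * (h[1] + h[-1])                     # refreshment average over v
    return {v: v * ddx(h[v]) + lam1(xs, v) * (h[-v] - h[v])
               + lam_r * (mean - h[v]) for v in (1, -1)}

def Lstar(h):                                       # adjoint: flipped drift sign,
    mean = 0.5 * (h[1] + h[-1])                     # transported bounce rates
    return {v: -v * ddx(h[v]) + lam1(xs, -v) * h[-v] - lam1(xs, v) * h[v]
               + lam_r * (mean - h[v]) for v in (1, -1)}

lhs = sum(np.sum(g[v] * L(f)[v]) * dx for v in (1, -1))
rhs = sum(np.sum(f[v] * Lstar(g)[v]) * dx for v in (1, -1))
assert abs(lhs - rhs) < 1e-8
```

The bounce and refreshment parts cancel exactly pointwise, while the drift part matches up to negligible boundary terms, which is the integration-by-parts behind the sign flip in $L^*_D$.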
The adjoint of $L_{\mathrm{BPS}}$ in $L^2$, that is, the operator $L^*_{\mathrm{BPS}}$ such that $\int g\,L_{\mathrm{BPS}}f = \int f\,L^*_{\mathrm{BPS}}g$, is given by
$$L^*_{\mathrm{BPS}}g(x, v) = -\langle v, \nabla_x g(x, v)\rangle + \bigl((g\lambda_1)(x, R(x)v) - (g\lambda_1)(x, v)\bigr) + \lambda_r\Bigl(\nu(v)\int g(x, y)\,dy - g(x, v)\Bigr),$$
which can be written as $L^*_{\mathrm{BPS}}g(x, v) = (L^*_D + L^*_B + L^*_R)g(x, v)$ for
$$L^*_D g(x, v) = -\langle v, \nabla_x g(x, v)\rangle, \qquad L^*_B g(x, v) = g(x, R(x)v)\lambda_1(x, R(x)v) - g(x, v)\lambda_1(x, v), \qquad L^*_R g(x, v) = \lambda_r\Bigl(\nu(v)\int g(x, y)\,dy - g(x, v)\Bigr).$$

E.2 Proof of Proposition 16
We start by focusing on the left hand side of (26), i.e. $L^*_{\mathrm{BPS}}(\mu f_2)$. Since $\mu$ is rotationally invariant in $v$, we find
$$L^*_{\mathrm{BPS}}(\mu f_2)(x, v) = \mu(x, v)\Bigl\{\langle v, \nabla\psi(x)\rangle f_2(x, v) - \langle v, \nabla_x f_2(x, v)\rangle + \bigl(\langle -v, \nabla\psi(x)\rangle\bigr)_+ f_2(x, R(x)v) - \bigl(\langle v, \nabla\psi(x)\rangle\bigr)_+ f_2(x, v) + \lambda_r\int f_2(x, y)\,\nu(dy) - \lambda_r f_2(x, v)\Bigr\}.$$
We shall consider the case $v = \pm 1$, hence $\nu = \frac{1}{2}\delta_{+1} + \frac{1}{2}\delta_{-1}$. In particular this choice satisfies Assumption 30 below. Introduce the notation $f^+_2(x) = f_2(x, +1)$, $f^-_2(x) = f_2(x, -1)$. In the one-dimensional setting we have
$$L^*_{\mathrm{BPS}}(\mu f_2)(x, +1) = -\mu(x, +1)\Bigl\{(f^+_2)'(x) + \bigl((-\psi'(x))_+ + \lambda_r/2\bigr)f^+_2(x) - \bigl(\lambda_r/2 + (-\psi'(x))_+\bigr)f^-_2(x)\Bigr\},$$
$$L^*_{\mathrm{BPS}}(\mu f_2)(x, -1) = +\mu(x, -1)\Bigl\{(f^-_2)'(x) + \bigl((+\psi'(x))_+ + \lambda_r/2\bigr)f^+_2(x) - \bigl(\lambda_r/2 + (+\psi'(x))_+\bigr)f^-_2(x)\Bigr\}.$$
Define the function $h$ through $L^*_2\mu = \mu h$, and also $h^+(x) = h(x, +1)$ and $h^-(x) = h(x, -1)$. Therefore we wish to solve the following system of ODEs:
$$\begin{cases} (f^+_2)'(x) = -\bigl(\lambda_r/2 + (-\psi'(x))_+\bigr)f^+_2(x) + \bigl(\lambda_r/2 + (-\psi'(x))_+\bigr)f^-_2(x) - h^+(x), \\ (f^-_2)'(x) = -\bigl(\lambda_r/2 + (+\psi'(x))_+\bigr)f^+_2(x) + \bigl(\lambda_r/2 + (+\psi'(x))_+\bigr)f^-_2(x) + h^-(x), \end{cases} \tag{54}$$
with compatibility condition (27), which in this case can be written as
$$\int\bigl(f^+_2(x) + f^-_2(x)\bigr)\pi(x)\,dx = 0 \tag{55}$$
with $\pi(x) = \mu(x, 1) + \mu(x, -1)$. Let us find a solution to (54) for a generic (continuous and locally Lipschitz) function $h$. Start by subtracting the first line from the second line in (54):
$$(f^-_2)'(x) - (f^+_2)'(x) = \bigl((\psi'(x))_+ - (-\psi'(x))_+\bigr)\bigl(f^-_2(x) - f^+_2(x)\bigr) + h_s(x), \tag{56}$$
where $h_s(x) = h^+(x) + h^-(x)$. Define $g = f^-_2 - f^+_2$ and notice that $(\psi'(x))_+ - (-\psi'(x))_+ = \psi'(x)$. Then we can rewrite (56) as
$$g'(x) = \psi'(x)g(x) + h_s(x).$$
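This linear first-order equation is solved next with an integrating factor. As a numerical sanity check of that route, the sketch below verifies that $g(x) = e^{\psi(x)}\int_{-\infty}^{x} h_s(y)\,e^{-\psi(y)}\,dy$ satisfies $g' = \psi' g + h_s$; the standard Gaussian potential and the particular right-hand side $h_s$ are illustrative choices:

```python
import numpy as np

# Check that  g(x) = exp(psi(x)) * \int_{-inf}^x h_s(y) exp(-psi(y)) dy
# solves  g'(x) = psi'(x) g(x) + h_s(x).
psi = lambda x: 0.5 * x**2
dpsi = lambda x: x
h_s = lambda x: np.exp(-x**2)              # illustrative integrable rhs

xs = np.linspace(-8.0, 3.0, 200001)        # left end stands in for -infinity
integrand = h_s(xs) * np.exp(-psi(xs))
cum = np.concatenate([[0.0],
                      np.cumsum(0.5 * (integrand[1:] + integrand[:-1])
                                * np.diff(xs))])     # trapezoidal antiderivative
g = np.exp(psi(xs)) * cum                  # candidate solution

dg = (g[2:] - g[:-2]) / (xs[2] - xs[0])    # centred finite differences
resid = dg - (dpsi(xs[1:-1]) * g[1:-1] + h_s(xs[1:-1]))
assert np.max(np.abs(resid)) < 1e-4
```

The residual is at finite-difference accuracy over the whole window, consistent with the integrating-factor computation carried out analytically below.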
Solving this ODE using an integrating factor we find
$$g(x) = e^{\psi(x)}\lim_{y \to -\infty}\bigl[e^{-\psi(y)}g(y)\bigr] + e^{\psi(x)}\int_{-\infty}^{x} h_s(y)\,e^{-\psi(y)}\,dy.$$
Recall that $g = f^-_2 - f^+_2$ and that $f^+_2, f^-_2$ satisfy (55). In order for $f_2$ to define a proper density we require
$$\int g(x)\,\pi(x)\,dx < \infty.$$
For this to hold it must be that $\lim_{y \to -\infty} e^{-\psi(y)}g(y) = 0$ and thus
$$g(x) = e^{\psi(x)}\int_{-\infty}^{x} h_s(y)\,e^{-\psi(y)}\,dy. \tag{57}$$
Since $f^-_2(x) = f^+_2(x) + g(x)$, plugging this into the first equation of (54) we obtain the ODE
$$(f^+_2)'(x) = \bigl(\lambda_r/2 + (-\psi'(x))_+\bigr)g(x) - h^+(x),$$
which can be integrated as
$$f^+_2(x) = f^+_2(0) + \int_0^x\Bigl[\bigl(\lambda_r/2 + (-\psi'(y))_+\bigr)g(y) - h^+(y)\Bigr]dy. \tag{58}$$
It follows that
$$f^-_2(x) = f^+_2(0) + \int_0^x\Bigl[\bigl(\lambda_r/2 + (-\psi'(y))_+\bigr)g(y) - h^+(y)\Bigr]dy + e^{\psi(x)}\int_{-\infty}^{x} h_s(y)\,e^{-\psi(y)}\,dy. \tag{59}$$
Finally, we compute $f^+_2(0)$ by enforcing the compatibility condition (55). Plugging (58) and (59) into (55) we find the condition
$$f^+_2(0) = -\int\Bigl(g(x)/2 + \int_0^x\Bigl[\bigl(\lambda_r/2 + (-\psi'(y))_+\bigr)g(y) - h^+(y)\Bigr]dy\Bigr)\pi(x)\,dx. \tag{60}$$

E.3 Computing $L^*_2$
In this section we obtain the terms $L^*_2$ for the splitting schemes DBRBD, RDBDR, DRBRD, BDRDB. This is necessary in order to find the analytic expression of $f_2$ given by Proposition 16. We shall focus on the case of BPS with a general dimensionality of the process. Recall that in the one-dimensional case this coincides with the ZZS. Let us consider the following assumption on the invariant distribution of the velocity vector.

Assumption 30 The invariant measure for the velocity component $\nu$ satisfies the following conditions:
1. Invariance under rotations: $\nu(w) = \nu(v)$ for any $v, w$ such that $|v| = |w|$;
2. Mean zero: $\mathbb{E}_\nu[V] = 0$;
3. Isotropy: for some $b > 0$ it holds that $\mathrm{Cov}_\nu(V) = b\,I$.

These properties hold for instance if $\nu$ is the standard Gaussian distribution, as well as if $\nu$ is the uniform distribution on the unit sphere (in which case $b = 1/d$). We find the following expressions for $L^*_2$.
Proposition 31 Let Assumption 30 hold and define
$$A(x, v) = \frac{3}{2}\lambda_r\,b\bigl(|\nabla\psi(x)|^2 - \Delta\psi(x)\bigr) + 2\langle v, \nabla\psi(x)\rangle\,\lambda_1(x, R(x)v) + \langle v, \nabla^2\psi(x)v\rangle,$$
$$B(x, v) = \frac{3}{2}\lambda_1(x, R(x)v)\bigl[\langle v, \nabla^2\psi(x)v\rangle - \langle R(x)v, \nabla^2\psi(x)R(x)v\rangle\bigr] + \frac{1}{2}\bigl\langle v, \nabla_x\langle v, \nabla^2\psi(x)v\rangle\bigr\rangle,$$
$$C(x, v) = 3\lambda_1(x, R(x)v)\bigl[2\langle v, \nabla^2\psi(x)v\rangle + \langle v, \nabla\psi(x)\rangle^2\bigr] - \bigl\langle v, \nabla_x\langle v, \nabla^2\psi(x)v\rangle\bigr\rangle,$$
$$D(x, v) = \frac{3}{2}\lambda_r\,b\bigl(|\nabla\psi(x)|^2 - \Delta\psi(x)\bigr) + \langle v, \nabla^2\psi(x)v\rangle + \langle v, \nabla\psi(x)\rangle\bigl(3\lambda_1(x, R(x)v) + \lambda_1(x, v)\bigr).$$
The splitting scheme DBRBD satisfies
$$L^*_2\mu(x, v) = \mu(x, v)\bigl[A(x, v) + B(x, v)\bigr].$$
The splitting scheme RDBDR satisfies
$$L^*_2\mu(x, v) = \mu(x, v)\,\tfrac{1}{2}\,B(x, v).$$
The splitting scheme DRBRD satisfies
$$L^*_2\mu(x, v) = \mu(x, v)\Bigl[D(x, v) + B(x, v) + \frac{3}{2}\lambda_r^2\,\langle v, \nabla\psi(x)\rangle\Bigr].$$
The splitting scheme BDRDB satisfies
$$L^*_2\mu(x, v) = \mu(x, v)\bigl[A(x, v) + C(x, v)\bigr].$$

Figure 6: Error for the radius statistic for a one-dimensional standard Gaussian target. Here $\lambda_r = 0$ for both schemes DBD and BDB. The dashed, blue line corresponds to second order convergence. The time horizon is fixed to $T = 10^5$ and the number of iterations is $N = T/\delta$.

Remark 32 Clearly, if $L^*_2\mu = 0$ then $f_2$ must be a constant that satisfies (27), and hence it must be that $f_2 = 0$, i.e. the second order term in $\mu_\delta$ is zero. This is the case for instance for scheme RDBDR when the target is a multidimensional standard Gaussian. Indeed, in Section 1.2 we proved that RDBDR is unbiased for standard Gaussian targets, so this is a consistent result. In the same setting, we observe that $f_2 = 0$ for schemes DBRBD and DRBRD when the refreshment rate is $\lambda_r = 0$. This is an expected result, as when $\lambda_r = 0$ these schemes coincide with RDBDR. In Figure 6 we confirm that, for a one-dimensional standard Gaussian and when $\lambda_r = 0$, the scheme DBD is unbiased, while the scheme BDB is of second order.

In Proposition 31 we consider symmetric splitting schemes of the form
$$e^{\delta L_S} = e^{\frac{\delta}{2}L_A}\,e^{\frac{\delta}{2}L_B}\,e^{\delta L_C}\,e^{\frac{\delta}{2}L_B}\,e^{\frac{\delta}{2}L_A}.$$
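Before expanding this composition, its advertised second-order accuracy can be checked numerically on finite-dimensional stand-ins: the sketch below substitutes random matrices for the sub-generators $L_A, L_B, L_C$ (an assumption purely for illustration) and confirms that the one-step error against $e^{\delta(L_A + L_B + L_C)}$ decays at third order locally, as the BCH expansion predicts for a symmetric splitting:

```python
import numpy as np

# Local-order check for the symmetric composition
#   S(d) = e^{dA/2} e^{dB/2} e^{dC} e^{dB/2} e^{dA/2}:
# BCH predicts S(d) = e^{d(A+B+C)} + O(d^3), so halving d should divide the
# one-step error by roughly 2^3 = 8. Random 4x4 matrices stand in for the
# sub-generators (illustrative only).
def expm(M, terms=40):                     # truncated Taylor series; fine for
    out, term = np.eye(len(M)), np.eye(len(M))   # the small norms used here
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))

def one_step_error(d):
    S = expm(d*A/2) @ expm(d*B/2) @ expm(d*C) @ expm(d*B/2) @ expm(d*A/2)
    return np.linalg.norm(S - expm(d * (A + B + C)))

ratio = one_step_error(0.02) / one_step_error(0.01)
assert 6.0 < ratio < 10.0                  # ~8 for a third order local error
```

Accumulated over $1/\delta$ steps, the local $O(\delta^3)$ defect yields the global second-order error in $\delta$ exploited throughout Section 4.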
For such schemes, the Baker–Campbell–Hausdorff formula gives that the adjoint of the generator of the chain satisfies the following decomposition:
$$L^*_S = L^* + \frac{\delta^2}{12}\Bigl([L^*_C, [L^*_C, L^*_A + L^*_B]] + [L^*_B, [L^*_B, L^*_A]] + [L^*_C, [L^*_B, L^*_A]] + [L^*_B, [L^*_C, L^*_A]] - \frac{1}{2}[L^*_B, [L^*_B, L^*_C]] - \frac{1}{2}[L^*_A, [L^*_A, L^*_C]] - \frac{1}{2}[L^*_A, [L^*_A, L^*_B]]\Bigr) + O(\delta^4) = L^* + \delta^2 L^*_2 + O(\delta^4),$$
where $L^* = L^*_A + L^*_B + L^*_C$ is the adjoint of the generator of the process with generator $L = L_A + L_B + L_C$. In order to prove Proposition 31 it is then sufficient to compute the commutators above for the four schemes we are interested in. In Section E.3.1 we obtain the first and second order commutators of BPS, while in Section E.3.2 we use the BCH formula and the obtained results to prove Proposition 31.

E.3.1 Computing the commutators of BPS
In this section we compute the first and second order commutators for the various components of the adjoint of the BPS. In the following we shall rely on the equalities
$$L^*_D\mu(x, v) = \langle v, \nabla\psi(x)\rangle\,\mu(x, v), \qquad L^*_B\mu(x, v) = -\langle v, \nabla\psi(x)\rangle\,\mu(x, v), \qquad L^*_R\mu(x, v) = 0.$$
To obtain $L^*_D\mu(x, v)$ we used the trivial, but useful, identity $\nabla_x\mu(x, v) = -\nabla\psi(x)\,\mu(x, v)$. The following lemma groups two other useful identities.

Lemma 33 For $\lambda_1(x, v) = \langle v, \nabla\psi(x)\rangle_+$ it holds that
$$\lambda_1(x, R(x)v) - \lambda_1(x, v) = -\langle v, \nabla\psi(x)\rangle, \tag{61}$$
$$\lambda_1(x, R(x)v) + \lambda_1(x, v) = |\langle v, \nabla\psi(x)\rangle|. \tag{62}$$

First order commutators. Let us start by computing the three first order commutators $[L^*_B, L^*_D]$, $[L^*_R, L^*_D]$, and $[L^*_R, L^*_B]$, which are essential to compute higher order commutators. This is done below in Lemmas 34, 36, and 37 respectively.

Lemma 34 Let $g$ be a suitable function. It holds that
$$[L^*_B, L^*_D]g(x, v) = -\langle R(x)v, (\nabla_x g)(x, R(x)v)\rangle\,\lambda_1(x, R(x)v) + \langle v, \nabla_x g(x, v)\rangle\,\lambda_1(x, v) + \bigl\langle v, \nabla_x\bigl(g(x, R(x)v)\lambda_1(x, R(x)v) - g(x, v)\lambda_1(x, v)\bigr)\bigr\rangle.$$
In particular, if $g = \mu$,
$$[L^*_B, L^*_D]\mu(x, v) = \mu(x, v)\Bigl(\langle v, \nabla\psi(x)\rangle^2 - \langle v, \nabla\psi(x)\rangle\,|\langle v, \nabla\psi(x)\rangle| - \langle v, \nabla^2\psi(x)v\rangle\Bigr).$$

Remark 35 Alternative ways to write $[L^*_B, L^*_D]\mu(x, v)$ can be found using the identities in Lemma 33.
We find [L B, L D]µ(x, v) = µ(x, v) λ2 1(x, R(x)v) λ2 1(x, v) + v, ψ(x) 2 v, 2ψ(x)v = µ(x, v) λ2 1(x, R(x)v) λ2 1(x, v) + v, ( ψ(x) ψ(x)T 2ψ(x))v . Proof We have [L B, L D]g(x, v) = L B( v, xg(x, v) ) L D(g(x, R(x)v)λ1(x, R(x)v) g(x, v)λ1(x, v)) = R(x)v, ( xg)(x, R(x)v) λ1(x, R(x)v) + v, xg(x, v) λ1(x, v) + v, x g(x, R(x)v)λ1(x, R(x)v) g(x, v)λ1(x, v) Piecewise deterministic sampling with splitting schemes [L B, L D]µ(x, v) = µ(x, v) v, ψ(x) (λ1(x, R(x)v) + λ1(x, v)) + v, x µ(x, v)(λ1(x, R(x)v) λ1(x, v)) = µ(x, v)(λ2 1(x, R(x)v) λ2 1(x, v)) v, x µ(x, v) v, ψ(x) . Then note that v, x µ(x, v) v, ψ(x) = v, 2ψ(x)v ψ(x) v, ψ(x) µ(x, v). [L B, L D]µ(x, v) = µ(x, v) v, ψ(x) v, ψ(x) | v, ψ(x) | v, 2ψ(x))v . Lemma 36 Let g be a suitable function. It holds that [L R, L D]g(x, v) = λrν(v) v, Z xg(x, y)dy Z y, xg(x, y) dy . In particular if g = µ [L R, L D]µ(x, v) = λr v, ψ(x) µ(x, v). Proof We find [L R, L D]g(x, v) = L R( v, xg(x, v) ) L D ν(v) Z g(x, y)dy g(x, v) ν(v) Z y, xg(x, y) dy v, xg(x, v) + λr v, x(ν(v) Z g(x, y)dy g(x, v)) = λrν(v) v, Z xg(x, y)dy Z y, xg(x, y) dy [L R, L D]µ(x, v) = L R( v, ψ(x) µ(x, v)) = λr v, ψ(x) µ(x, v) Bertazzi, Dobson, and Monmarch e Lemma 37 Let g be a suitable function. It holds that [L R, L B]g(x, v) = λr ν(v) Z (g(x, R(x)y)λ1(x, R(x)y) g(x, y)λ1(x, y))dy + ν(v) Z g(x, y)dy v, ψ(x) In particular if g = µ [L R, L B]µ(x, v) = [L R, L D]µ(x, v) = λr v, ψ(x) µ(x, v). Proof Compute [L R, L B]g(x, v) = L R(g(x, R(x)v)λ1(x, R(x)v) g(x, v)λ1(x, v)) ν(v) Z g(x, y)dy g(x, v) = λr ν(v) Z (g(x, R(x)y)λ1(x, R(x)y) g(x, y)λ1(x, y))dy g(x, R(x)v)λ1(x, R(x)v) | {z } A + g(x, v)λ1(x, v) | {z } B ν(R(x)v) Z g(x, y)dy g(x, R(x)v) | {z } A λ1(x, R(x)v) ν(v) Z g(x, y)dy g(x, v) | {z } B It is now sufficient to cancel out the terms denoted by A and B that appear twice with opposite signs to obtain the final statement. To compute g = µ, we can use that LD + LB and LR both preserve the invariant distribution, hence [L R, L B]µ(x, v) = [L R, L D]µ(x, v). 
Higher order commutators. Let us now compute higher order commutators. Lemma 38 It holds that [L B, [L R, L D]]µ(x, v) = λrµ(x, v) v, ψ(x) λ1(x, R(x)v) + λ1(x, v) + b tr ψ(x)( ψ(x))T 2ψ(x) . Proof Applying Lemma 36 we obtain [L B, [L R, L D]]µ(x, v) = λr L B v, ψ(x) µ(x, v) + [L R, L D] v, ψ(x) µ(x, v) = λr R(x)v, ψ(x) µ(x, v)λ1(x, R(x)v) v, ψ(x) µ(x, v)λ1(x, v) Piecewise deterministic sampling with splitting schemes + λrν(v) v, x Z ( y, ψ(x) µ(x, y))dy Z y, x( y, ψ(x) µ(x, y)) dy = λrµ(x, v) v, ψ(x) λ1(x, R(x)v) + λ1(x, v) λrµ(x, v) Z ( y, 2ψ(x)y y, ψ(x) 2)ν(dy) = λrµ(x, v) v, ψ(x) λ1(x, R(x)v) + λ1(x, v) + b tr ψ(x)( ψ(x))T 2ψ(x) . Note that in the last line we used that a, b 2 = a, bb T a and that Z y, ( ψ(x) ψ(x)T 2ψ(x))y ν(dy) = ℓ=1 ( ψ(x) ψ(x)T 2ψ(x))jℓ Z (yjyℓ)ν(dy) = b tr ψ(x) ψ(x)T 2ψ(x) (63) which is a consequence of Assumption 30. Lemma 39 It holds that [L R, [L R, L B]]µ(x, v) = λ2 rµ(x, v) v, ψ(x) . Proof Since L Rµ(x, v) = 0 we easily find [L R, [L R, L B]]µ(x, v) = L R(λr v, ψ(x) µ(x, v)) = λ2 rµ(x, v) v, ψ(x) . Lemma 40 It holds that [L R, [L R, L D]]µ(x, v) = λ2 rµ(x, v) v, ψ(x) . Proof The result follows from Lemma 39. Lemma 41 It holds that [L R, [L B, L D]]µ(x, v) = λrµ(x, v) b tr ψ(x) ψ(x)T 2ψ(x) λ2 1(x, R(x)v) λ2 1(x, v) + v, ( ψ(x) ψ(x)T 2ψ(x))v ! Bertazzi, Dobson, and Monmarch e Proof Taking advantage of Lemma 34 [L R, [L B, L D]]µ(x, v) = L R µ(x, v) λ2 1(x, R(x)v) λ2 1(x, v) + v, ( ψ(x) ψ(x)T 2ψ(x))v = λrµ(x, v) Z λ2 1(x, R(x)y) λ2 1(x, y) + y, ( ψ(x) ψ(x)T 2ψ(x))y ν(dy) λ2 1(x, R(x)v) λ2 1(x, v) + v, ( ψ(x) ψ(x)T 2ψ(x))v ! Observe that for A = {y : y, ψ(x) 0} we have Z (λ2 1(x, R(x)y) λ2 1(x, y))ν(dy) = Z AC y, ψ(x) 2ν(y)dy Z A y, ψ(x) 2ν(y)dy This can be seen by the change of variables y = R(x)y in the first integral. The result then follows by using (63). Lemma 42 It holds that [L B, [L R, L B]]µ(x, v) = λrµ(x, v) v, ψ(x) (λ1(x, R(x)v) + λ1(x, v)). 
Proof Consider now [L B, [L R, L B]]µ(x, v) = L B(λr v, ψ(x) µ(x, v)) + [L R, L B]( v, ψ(x) µ(x, v)) = λrµ(x, v) R(x)v, ψ(x) λ1(x, R(x)v) v, ψ(x) λ1(x, v) + Z R(x)y, ψ(x) λ1(x, R(x)y y, ψ(x) λ1(x, y) ν(dy) + Z ( y, ψ(x) ν(dy) v, ψ(x) The last term equals zero as ν has mean zero. Then observe that by Identity (61) Z R(x)y, ψ(x) λ1(x, R(x)y) y, ψ(x) λ1(x, y) ν(dy) = = Z y, ψ(x) (λ1(x, R(x)y) + λ1(x, y))ν(dy) = Z λ1(x, R(x)y)2ν(dy) Z λ1(x, y)2ν(dy) where the last equality follows by invariance under rotation of ν as required in Assumption 30. Hence we have obtained the statement. Piecewise deterministic sampling with splitting schemes Lemma 43 It holds that [L B, [L B, L D]]µ(x, v) = 2µ(x, v)λ1(x, R(x)v) v, ψ(x) 2 v, 2ψ(x)v R(x)v, ψ2(x)R(x)v . Proof By Lemma 34 we find [L B, [L B, L D]]µ(x, v) = L B µ(x, v)(λ2 1(x, R(x)v) λ2 1(x, v) + v, ψ(x) 2 v, 2ψ(x)v ) + [L B, L D] v, ψ(x) µ(x, v) . (**) Let us treat the two terms separately, starting with (*). After applying L B and using that R(x)(R(x)v) = v the first term becomes (*) = µ(x, v) h (λ2 1(x, v) λ2 1(x, R(x)v) + R(x)v, ψ(x) 2 R(x)v, 2ψ(x)R(x)v λ1(x, R(x)v) λ2 1(x, R(x)v) λ2 1(x, v) + v, ψ(x) 2 v, 2ψ(x)v ) λ1(x, v) i = µ(x, v) h (λ2 1(x, v) λ2 1(x, R(x)v))(λ1(x, R(x)v) + λ1(x, v)) + v, ψ(x) 2(λ1(x, R(x)v) λ1(x, v)) R(x)v, 2ψ(x)R(x)v λ1(x, R(x)v) + v, 2ψ(x)v λ1(x, v) i . Using Identity (61) we obtain that v, ψ(x) 2(λ1(x, R(x)v) λ1(x, v)) = (λ2 1(x, v) λ2 1(x, R(x)v))(λ1(x, R(x)v) + λ1(x, v)) and thus cancelling out the corresponding terms in (*) it follows that (*) = µ(x, v) v, 2ψ(x)v λ1(x, v) R(x)v, 2ψ(x)R(x)v λ1(x, R(x)v) . Focusing now on (**), we apply Lemma 34 to find (**) = R(x)v, x v, ψ(x) µ(x, v) (x, R(x)v) λ1(x, R(x)v) + v, x v, ψ(x) µ(x, v) λ1(x, v) + v, x R(x)v, ψ(x) µ(x, v)λ1(x, R(x)v) v, ψ(x) µ(x, v)λ1(x, v) . 
Recalling that x( v, ψ(x) µ(x, v)) = µ(x, v)( 2ψ(x)v ψ(x) v, ψ(x) ), (**) = µ(x, v) h R(x)v, 2ψ(x)R(x)v + v, ψ(x) 2 λ1(x, R(x)v) + v, 2ψ(x)v v, ψ(x) 2 λ1(x, v) i Bertazzi, Dobson, and Monmarch e v, x v, ψ(x) µ(x, v)| v, ψ(x) | . In particular we used Lemma 33 to write the last term more compactly. The derivative in the last term can be computed as follows v, x v, ψ(x) µ(x, v)| v, ψ(x) | = = µ(x, v) v, 2ψ(x)v| v, ψ(x) | ψ(x) v, ψ(x) | v, ψ(x) | + v, ψ(x) sign( v, ψ(x) ) 2ψ(x)v = µ(x, v) v, ψ(x) 2| v, ψ(x) | + v, 2ψ(x)v (| v, ψ(x) | + v, ψ(x) sign( v, ψ(x) )) = µ(x, v) v, ψ(x) 2| v, ψ(x) | + v, 2ψ(x)v 2| v, ψ(x) | = µ(x, v)| v, ψ(x) | v, ψ(x) 2 2 v, 2ψ(x)v . Hence re-applying Lemma 33 we find (**) = µ(x, v) h R(x)v, 2ψ(x)R(x)v λ1(x, R(x)v) + 2 v, ψ(x) 2λ1(x, R(x)v) (2λ1(x, R(x)v) + λ1(x, v)) v, 2ψ(x)v i . The proof is now concluded by summing (*) and (**). Lemma 44 It holds that [L D, [L R, L B]]µ(x, v) = λrµ(x, v) v, ψ(x) 2 v, 2ψ(x)v . Proof Consider now [L D, [L R, L B]]: [L D, [L R, L B]]µ(x, v) = L D(λr v, ψ(x) µ(x, v)) [L R, L B]( v, ψ(x) µ(x, v)) = v, x(λr v, ψ(x) µ(x, v)) λr µ(x, v) Z ( y, ψ(x) )(λ1(x, R(x)y) + λ1(x, y))ν(dy) = λrµ(x, v) v, ψ(x) 2 v, 2ψ(x)v . In particular we used that Z ( y, ψ(x) )(λ1(x, R(x)y) + λ1(x, y))ν(dy) = Z λ1(x, y)2ν(dy) Z λ1(x, R(x)y)2ν(dy) = 0 which was shown in (64). Lemma 45 It holds that [L D, [L R, L D]]µ(x, v) = λrµ(x, v) v, 2ψ(x)v v, ψ(x) 2 + b tr 2ψ(x) ψ(x) ψ(x)T Piecewise deterministic sampling with splitting schemes Proof By Lemma 36 [L D, [L R, L D]]µ(x, v) = λr L D( v, ψ(x) µ(x, v)) [L R, L D]( v, ψ(x) µ(x, v)) = λrµ(x, v) v, 2ψ(x)v v, ψ(x) 2 + Z ( y, 2ψ(x)y y, ψ(x) 2)ν(dy) The statement follows by Equation (63). Lemma 46 It holds that [L D, [L B, L D]] = µ(x, v) 4 v, ψ(x) 2λ1(x, R(x)v) + 7 v, 2ψ(x)v λ1(x, R(x)v) + v, x( v, 2ψ(x))v ) + R(x)v, 2ψ(x)R(x)v λ1(x, R(x)v) . Proof By Lemma 34 together with Lemma 33 [L D, [L B, L D]]µ(x, v) = L D µ(x, v) v, ψ(x) v, ψ(x) | v, ψ(x) | v, 2ψ(x)v [L B, L D]( v, ψ(x) µ(x, v)). 
( ) Consider the two terms separately, starting from the first one: ( ) = µ(x, v) v, ψ(x) v, ψ(x) v, ψ(x) | v, ψ(x) | v, 2ψ(x))v µ(x, v) v, 2 v, ψ(x) 2ψ(x)v 2 2ψ(x)v| v, ψ(x) | x( v, 2ψ(x))v ) = µ(x, v) 2 v, ψ(x) 2λ1(x, R(x)v) + v, 2ψ(x) ( 3 v, ψ(x) + 2| v, ψ(x) |) + v, x( v, 2ψ(x))v ) The second term ( ) is the same as term (**) in the proof of Lemma 43. The statement follows taking the difference of the two terms ( ) and ( ) and using Lemma 33. E.3.2 Proof of Proposition 31 Now we apply the expressions for the commutators derived in Section E.3.1, thus obtaining the expressions for L 2 of Proposition 31. Let us start with splitting DBRBD: L 2µ(x, v) = 1 [L R, [L R, L D + L B]] + [L B, [L B, L D]] + [L R, [L B, L D]] + [L B, [L R, L D]] Bertazzi, Dobson, and Monmarch e 2[L B, [L B, L R]] 1 2[L D, [L D, L R]] 1 2[L D, [L D, L B]] 3 2λr b tr ψ(x) ψ(x)T 2ψ(x) + 2 v, ψ(x) λ1(x, R(x)v) + v, 2ψ(x)v 2λ1(x, R(x)v) v, 2ψ(x)v R(x)v, 2ψ(x)R(x)v + 1 2 v, x( v, 2ψ(x)v ) Then focus on splitting BDRDB: L 2µ(x, v) = 1 [L R, [L R, L D + L B]] + [L D, [L D, L B]] + [L R, [L D, L B]] + [L D, [L R, L B]] 2[L D, [L D, L R]] 1 2[L B, [L B, L R]] 1 2[L B, [L B, L D]] 2λr b tr ψ(x) ψ(x)T 2ψ(x) + 2 v, ψ(x) λ1(x, R(x)v) + v, 2ψ(x)v + 3λ1(x, R(x)v) 2 v, 2ψ(x)v + v, ψ(x) 2 v, ( v, 2ψ(x)v ) Consider now RDBDR: L 2µ(x, v) = 1 [L B, [L B, L R + L D]] + [L D, [L D, L R]] + [L B, [L D, L R]] + [L D, [L B, L R]] 2[L D, [L D, L B]] 1 2[L R, [L R, L B]] 1 2[L R, [L R, L D]] 3 2λ1(x, R(x)v) v, 2ψ(x)v R(x)v, 2ψ(x)R(x)v + 1 2 v, x( v, 2ψ(x)v ) Finally focus on DRBRD: L 2µ(x, v) = 1 [L B, [L B, L D + L R]] + [L R, [L R, L D]] + [L B, [L R, L D]] + [L R, [L B, L D]] 2[L R, [L R, L B]] 1 2[L D, [L D, L B]] 1 2[L D, [L D, L R]] 3 2λr b tr ψ(x) ψ(x)T 2ψ(x) + v, ψ(x) 3λ1(x, R(x)v) + λ1(x, v) + v, 2ψ(x)v + 3 2λ1(x, R(x)v) v, 2ψ(x)v R(x)v, 2ψ(x)R(x)v 2 v, ( v, 2ψ(x)v ) + 3 2λ2 r v, ψ(x) E.4 Application of Proposition 16 to three one-dimensional targets In this section we give analytic expressions for f2 for 
the four splitting schemes considered in Section 4.2, for three different one-dimensional target distributions, together with various numerical simulations.

Figure 7: Square root of the MSE in the estimation of the radius statistic, $x^2$, with a one-dimensional standard Gaussian target. The step size is set to $\delta = 0.5$, the number of iterations is $N = 2 \times 10^5$, and the experiment is repeated 300 times. The schemes BDB (left) and DBD (right) correspond to including the refreshment part in B. In schemes B_DR_B (left) and DR_B_DR (right) we denote by B the standard bounce part, by DR the transition kernel which corresponds to having refreshments and deterministic motion together, and we use underscores to divide these two kernels.

The heuristic argument of Section 4.2 and the numerical simulations of Figure 7 suggest that schemes having DBD as their limit as the refreshment rate goes to zero have a smaller bias in the $x$-component compared to those that converge to BDB. Similarly to Section 4.2, here we focus on schemes RDBDR, DBRBD, DRBRD, as well as BDRDB. For these four schemes we computed $L^*_2$ in Proposition 31, and we give the corresponding analytic expressions of $f_2$ for three one-dimensional targets: a standard normal distribution (see Proposition 47), the distribution corresponding to the potential $\psi(x) = x^4$ (see Proposition 48), and the Cauchy distribution (see Proposition 49). The results, both according to the theory and to numerical simulations, are shown in Figure 9. Let us comment on these results. First of all, the theoretical results and the numerical simulations of Figure 9 are consistent, in the sense that they report the same behaviour although they consider two different metrics. In particular, the bias of schemes RDBDR and BDRDB appears to be independent of the refreshment rate, while DBRBD and DRBRD have respectively linear and quadratic dependence.
In the one-dimensional case, the plots show that it is best to choose $\lambda_r = 0$, which is possible since in this case BPS is irreducible. However, in higher dimensional settings it is necessary to take $\lambda_r > 0$, and thus it is essential to use schemes that have good performance for most values of $\lambda_r$. From Figures 1 and 9 it is also clear that RDBDR typically has the smallest bias out of all the considered splittings. The experiments in Figure 10 suggest that the findings of the one-dimensional case extend to multi-dimensional targets. In particular, RDBDR either has a better performance than the other splittings or behaves very similarly to BDRDB, both on an independent as well as a correlated Gaussian.

E.4.1 Gaussian target
Let us start with a one-dimensional Gaussian target with mean zero and variance $\sigma^2 > 0$.

Figure 8: Results for the target distribution with potential $\psi(x) = x^4$. The left plot shows the TV distance up to the second order term according to Proposition 48. Here $\delta = 0.5$, the number of iterations is $N = 2 \times 10^5$, and the experiment is repeated 50 times.

Figure 9: Results for a Cauchy target distribution with $\gamma = 1$. Left: TV distance up to the second order term according to Proposition 49; right: square root of the MSE for the estimation of the truncated radius statistic $2 \wedge x^2$. Here $\delta = 0.5$, the number of iterations is $N = 4 \times 10^6$, and the experiment is repeated 200 times.

Proposition 47 Let $\psi(x) = x^2/(2\sigma^2)$ for $\sigma^2 > 0$. Then:
For the splitting scheme DBRBD it holds that $f_2(x, +1) = f_2(x, -1) = \lambda_r$
For the splitting scheme BDRDB it holds that
$$f_2(x, +1) = \Bigl(\frac{1}{8\sigma^2} - \frac{1}{4\sigma^4}x^2\Bigr)\mathbf{1}_{x<0}, \qquad f_2(x, -1) = \Bigl(\frac{1}{8\sigma^2} - \frac{1}{4\sigma^4}x^2\Bigr)\mathbf{1}_{x>0}.$$
For the splitting scheme RDBDR it holds that $f_2(x, +1) = f_2(x, -1) = 0$.

(a) $\rho = 0$. (b) $\rho = 0.7$.
Figure 10: Square root of the MSE for empirical estimates of the radius statistic for splitting schemes of BPS with a 5-dimensional Gaussian target with covariance $\Sigma_{ii} = 1$, $\Sigma_{ij} = \rho$ for $i \ne j$.
The step size is δ = 0.5 and the number of iterations is N = 2 × 10⁶. The position vector is initialised with a draw from the target distribution and the velocity with a draw from the uniform distribution on the unit sphere.

Figure 11: Plots of the theoretical invariant measure up to second order for a standard Gaussian target, as given by Proposition 47; (a) refreshment rate λ_r = 1.0, (b) refreshment rate λ_r = 3.0. The step size is δ = 0.5.

For the splitting scheme DRBRD it holds that f₂(x, +1) = f₂(x, −1) = λ_r

E.4.2 Non-Lipschitz potential

Now we focus on a target distribution with a non-Lipschitz potential.

Proposition 48 Let ψ(x) = x⁴. Then:

For the splitting scheme DBRBD it holds that f₂(x, +1) = f₂(x, −1) = λ_r (1/(2Γ(5/4))) 2x⁷ sign(x) + (1/Γ(1/4)) x².

For the splitting scheme BDRDB it holds that
f₂(x, +1) = (5Γ(3/4)/(2Γ(1/4))) x² − 4x⁶ 1_{x<0},   f₂(x, −1) = (5Γ(3/4)/(2Γ(1/4))) x² − 4x⁶ 1_{x≥0}.

For the splitting scheme RDBDR it holds that f₂(x, +1) = f₂(x, −1) = Γ(3/4)

For the splitting scheme DRBRD it holds that f₂(x, +1) = f₂(x, −1) = λ_r (1/Γ(5/4)) 4x⁷ sign(x) + 1

E.4.3 Heavy-tailed target

Finally we consider a Cauchy distribution π(x) = γ/(π(γ² + x²)) for γ > 0.

Proposition 49 Let ψ(x) = ln(γ² + x²). Then:

For the splitting scheme DBRBD it holds that
f₂(x, +1) = f₂(x, −1) = (λ_r/4) |arctan(x/γ)| + γ|x|/(γ² + x²) − (1/4)(γ² + x²)/γ²

For the splitting scheme BDRDB it holds that
f₂(x, v) = (x² − 3γ²)²/(48γ²(x² + γ²)²) 1_{xv<0} + (x⁴ − 54x²γ² + 9γ⁴)/(48γ²(x² + γ²)²)

For the splitting scheme RDBDR it holds that f₂(x, +1) = f₂(x, −1) = (1/4)(γ² + x²)/γ²

For the splitting scheme DRBRD it holds that
f₂(x, +1) = f₂(x, −1) = (λ_r/4) |arctan(x/γ)| + γ|x|/(γ² + x²) − (1/4)(γ² + x²)/γ² ln 4 ln(1 + x²

E.5 Proof of Proposition 17

Fix x ∈ R, δ > 0 and let G(x, δ) := {(z, v) ∈ R × {−1, +1} : (z − x)/δ ∈ Z} be the state space of the chain with initial position x.
For now, let µδ be any probability measure on G(x, δ) such that µδ(y, w) = µδ(y, −w) for all (y, w) ∈ G(x, δ), and let us give a necessary and sufficient condition for it to be invariant for the chain. Since such a µδ is invariant for the refreshment step, it is invariant for the scheme RDBDR if and only if it is invariant for the scheme R′DBD, where R′ is a deterministic flip of the velocity (which, like R, preserves µδ). Besides, from a state (y, w) ∈ G(x, δ), one transition of R′DBD can only lead to (y, w) or (y + δw, −w), from which it can only stay or come back to the initial (y, w). In other words the pair {(y, w), (y + δw, −w)} is irreducible for this chain, and thus µδ is invariant for R′DBD if and only if its restrictions to all these pairs for (y, w) ∈ G(x, δ) are invariant under this scheme, which by definition reads

∀(y, w) ∈ G(x, δ),   µδ(y, w) e^{−δλ(y+wδ/2, w)} = µδ(y + δw, −w) e^{−δλ(y+wδ/2, −w)}.

It turns out that this is exactly the skew detailed balance condition (5) for the scheme DBD. Writing µδ(y, w) ∝ exp(−ψδ(y)) for some ψδ, and recalling that λ(y, w) − λ(y, −w) = wψ′(y) for all y, w, this is equivalent to

∀(y, w) ∈ G(x, δ),   ψδ(y + δw) − ψδ(y) = δwψ′(y + δw/2).

Up to an additive constant, the only function ψδ which satisfies this is the one given in the statement of Proposition 17. As a conclusion, we have proven that a probability measure on G(x, δ) which is independent of the velocity is invariant for the scheme RDBDR if and only if it is the one given in the proposition, which concludes the proof of the first statement. Now we focus on the convergence of empirical means, assuming that the conditions of Theorem 10 are met. The reference position x ∈ R is still fixed. The long-time convergence established in Theorem 10 (for Pδ², where Pδ is one step of the scheme) is well known to imply an ergodic Law of Large Numbers.
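The invariance characterisation above is easy to check numerically. The following sketch is our own illustration, not part of the paper: taking the non-Lipschitz potential ψ(y) = y⁴ of Proposition 48 and the Zig-Zag rates λ(y, w) = (w ψ′(y))₊, it builds ψδ on a finite grid through the midpoint recursion ψδ(y + δ) − ψδ(y) = δ ψ′(y + δ/2) and verifies the skew detailed balance identity between neighbouring grid points.

```python
import numpy as np

delta = 0.5

def psi_prime(y):
    return 4.0 * y**3                    # psi(y) = y^4

def lam(y, w):
    return max(w * psi_prime(y), 0.0)    # Zig-Zag switching rate

# Grid x + delta*Z around the reference position x = 0.
grid = delta * np.arange(-10, 11)

# Midpoint recursion defining psi_delta up to an additive constant.
psi_d = np.zeros(len(grid))
for k in range(len(grid) - 1):
    psi_d[k + 1] = psi_d[k] + delta * psi_prime(grid[k] + delta / 2)

mu = np.exp(-(psi_d - psi_d.min()))      # unnormalised invariant weights

# Skew detailed balance: mu(y) e^{-delta lam(y+delta/2, +1)}
#                      = mu(y+delta) e^{-delta lam(y+delta/2, -1)}.
for k in range(len(grid) - 1):
    m = grid[k] + delta / 2
    lhs = mu[k] * np.exp(-delta * lam(m, +1.0))
    rhs = mu[k + 1] * np.exp(-delta * lam(m, -1.0))
    assert np.isclose(lhs, rhs)
```

The identity holds exactly by construction of ψδ, which is the content of the first part of the proof.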
For all initial conditions in G(x, δ) and all bounded f, distinguishing odd and even indices, we see that (1/N) Σ_{k=1}^N f(Z_{t_k}) (where (Z_{t_k})_{k∈N} is a trajectory of the scheme) converges almost surely as N → ∞ to µ̄δ(f) := (µ⁺δ(f) + µ⁻δ(f))/2, where µ⁺δ and µ⁻δ are the unique invariant measures of Pδ² on each periodic component of the state space. In particular, µ̄δ is an invariant measure for Pδ. In dimension 1, the scheme DBD is such that for all y and all times the numbers of visits of the points (y, 1) and (y, −1) differ by at most 1, which implies by ergodicity that µ̄δ(y, w) = µ̄δ(y, −w) for all (y, w) ∈ G(x, δ), and we conclude thanks to the first part of the proof.

Appendix F. Proofs of Section 5

F.1 Proof of Proposition 18

Recall the expression (7) for the acceptance rate for a given initial state (x, v) and proposed state (X̃, R_I v), as well as our notation x_{1/2} = x + vδ/2. Let us rewrite the exponent in the acceptance probability using Taylor's theorem. Expanding ψ(X̃) as well as ∂_iψ(x_{1/2}) around x we find

ψ(x) − ψ(X̃) + δ Σ_{i∉I} v_i ∂_iψ(x_{1/2}) = (δ²/2) Σ_{i∉I} v_i Σ_{j∈I} ∂_{ij}ψ(x) v_j + δ³H(x, v) + O(δ⁴),   (65)

where

H(x, v) = (1/8) Σ_{i∉I} v_i ⟨v, ∇²∂_iψ(x) v⟩ − (1/48) Σ_{α:|α|=3} D^αψ(x) (v + R_I v)^α.

By O(δ⁴) we denote a remainder term that depends on fourth derivatives of ψ, is uniform in δ ∈ [0, δ₀], and under our assumptions grows at most polynomially in x. We shall compute the probability of rejecting the proposed state by partitioning on whether there are no events, or one or more components of the velocity are flipped. Consider first the case of no events. In this scenario the second order term in (65) equals 0 since I is the empty set.
Recalling that x₊ − x₊²/2 ≤ 1 − e^{−x} ≤ x₊, we find

E[(1 − α((x, v), (X̃, Ṽ))) 1_{no events}] = max(0, −δ³H(x, v)) exp(−δλ(x_{1/2}, v)) + O(δ⁴) = (δ³/24) max(0, Σ_{α:|α|=3} D^αψ(x) v^α) + O(δ⁴).

On the complementary event we find

E[(1 − α((x, v), (X̃, Ṽ))) 1_{≥1 events}] = Σ_{I≠∅} max(0, −(δ²/2) Σ_{i∉I} v_i Σ_{j∈I} ∂_{ij}ψ(x) v_j + O(δ³)) ∏_{i∈I}(1 − exp(−δλ_i(x_{1/2}, v))) ∏_{j∉I} exp(−δλ_j(x_{1/2}, v))
= (δ³/2) Σ_{i=1}^d max(0, −λ_i(x_{1/2}, v) v_i Σ_{k≠i} v_k ∂_{ik}ψ(x)) + O(δ⁴).

In the last line we focused on the terms corresponding to only one component of the velocity being flipped, as these are the only contributions at leading order, and we expanded the exponential terms. The term λ_i(x_{1/2}, v) can be replaced by λ_i(x, v) because the function max(0, a) is Lipschitz and ψ is smooth. We obtain the statement in the proposition by summing the case of no events and that of one or more events.

F.2 Proof of Proposition 19

The proof is similar to that of Proposition 18, so we only give the main steps. Observe that we only need to focus on the DBD part of the splitting scheme, as refreshments do not affect the rejection probability. Similarly to Proposition 18 we distinguish two cases based on whether a jump happens or not. In the case in which no reflections take place the proposal is X̃ = x + vδ. We rewrite the exponent in (11), replacing ψ(X̃) and ∇ψ(x_{1/2}) with their Taylor expansions around x. This gives

ψ(x) − ψ(X̃) + δ⟨v, ∇ψ(x_{1/2})⟩ = −(δ³/24) Σ_{α:|α|=3} D^αψ(x) v^α + O(δ⁴).

Following a reasoning similar to that of Proposition 18, we find that in this case

E[(1 − α((x, v), (X̃, Ṽ))) 1_{no reflection}] = (δ³/24) max(0, Σ_{α:|α|=3} D^αψ(x) v^α) + O(δ⁴).

On the complementary event we shall use the following Taylor expansion:

ψ(x) − ψ(x + (v + R(x_{1/2})v)δ/2) = −(δ/2)⟨v + R(x_{1/2})v, ∇ψ(x_{1/2})⟩ + (δ²/8)(⟨v, ∇²ψ(x_{1/2})v⟩ − ⟨R(x_{1/2})v, ∇²ψ(x_{1/2})R(x_{1/2})v⟩) + O(δ³).

Notice that the first order term equals zero by definition of the operator R. Hence on the event that a reflection takes place we find

E[(1 − α((x, v), (X̃, Ṽ))) 1_{reflection}] = (δ³/8) max(0, −λ(x_{1/2}, v)(⟨v, ∇²ψ(x_{1/2})v⟩ − ⟨R(x_{1/2})v, ∇²ψ(x_{1/2})R(x_{1/2})v⟩)) + O(δ⁴).
It remains to show that we can replace x_{1/2} by x, which follows if the δ³ term is continuous in x_{1/2}. Using the definitions of λ and R we find that

F(y) := λ(y, v)(⟨v, ∇²ψ(y)v⟩ − ⟨R(y)v, ∇²ψ(y)R(y)v⟩)
= 4⟨∇ψ(y)/|∇ψ(y)|^{2/3}, v⟩₊ ⟨∇ψ(y)/|∇ψ(y)|^{2/3}, v⟩ ⟨v, ∇²ψ(y) ∇ψ(y)/|∇ψ(y)|^{2/3}⟩
− 4⟨∇ψ(y)/|∇ψ(y)|^{4/5}, v⟩₊ ⟨∇ψ(y)/|∇ψ(y)|^{4/5}, v⟩² ⟨∇ψ(y)/|∇ψ(y)|^{4/5}, ∇²ψ(y) ∇ψ(y)/|∇ψ(y)|^{4/5}⟩.

Since ψ is smooth and z ↦ z/|z|^{4/5} is Hölder continuous with exponent 1/5, the above function is a composition of smooth and Hölder continuous functions, and hence the difference F(x_{1/2}) − F(x) converges to zero as δ tends to 0.

F.3 Bounds on the rejection probability for log-concave targets

Consider a log-concave target distribution for which the potential has L-Lipschitz gradient and the Hessian satisfies m Id ⪯ ∇²ψ(x) ⪯ M Id, where Id is the d-dimensional identity matrix. We suppose G₁ is the leading term in G, which is for instance the case when ∂_{ijk}ψ is non-zero only for a small number of indices, and obtain a bound as follows.

Example 5 (Algorithm 3) Observe that

Σ_{i=1}^d λ_i(x, v) v_i Σ_{k≠i} v_k ∂_{ik}ψ(x) = (1/4) Σ_{i=1}^d λ_i(x, v) (⟨v, ∇²ψ(x)v⟩ − ⟨R_iv, ∇²ψ(x)R_iv⟩).

Using the bounds ⟨v, ∇²ψ(x)v⟩ − ⟨R_iv, ∇²ψ(x)R_iv⟩ ≤ d(M − m) and max(0, a) ≤ |a| twice, we find

Eµ[G₁(x, v)] ≤ (1/4) d(M − m) Σ_{i=1}^d Eπ[|∂_iψ(x)|] ≤ (1/4) d²(M − m)√L.

In the second inequality we applied Jensen's inequality to Dalalyan (2017, Lemma 2), which gives Eπ[|∂_iψ(x)|] ≤ √L. Alternatively, we can assume that for all v ∈ {−1, 1}^d it holds that |v_i Σ_{k≠i} v_k ∂_{ik}ψ(x)| ≤ M̃ for all i = 1, …, d, where M̃ is independent of d. This holds e.g. for a correlated Gaussian distribution. With this assumption we find, with similar computations as above, a corresponding bound on Eµ[G₁(x, v)].

Example 6 (Algorithm 4) In order to have a fair comparison between BPS and ZZS, we consider BPS with Gaussian velocity. This choice ensures that the Euclidean norms of the velocity vectors of the two samplers are equal (on average). Similarly to Example 5, under the assumptions that ∇ψ is L-Lipschitz and m Id ⪯ ∇²ψ(x) ⪯ M Id, it is not hard to obtain a bound on Eµ[G₁(x, v)].

Appendix G.
Pseudo-codes for Section 6.2

Here we give the pseudo-codes for the jump parts of ZZS and BPS considered in Section 6.2. These are respectively Algorithm 7 and Algorithm 8. Both algorithms take as input the gradients ∇ψ₁ and ∇ψ₂, where ψ₁(x) = Σ_{i=1}^N V(x_i − x_{i+1}) and ψ₂(x, j) = Σ_{i=1}^N W(x_i − x_j). The pseudo-code for HMC is given in Algorithm 9.

Algorithm 7: Part B for the ZZS considered in Section 6.2
Input: Initial condition (x, v), step size δ, gradients ∇ψ₁, ∇ψ₂.
Output: Updated velocity vector v.
for i ← 1 to N do
    t ← 0
    τ₁ ∼ Exp((v_i ∂_iψ₁(x))₊)
    τ₂ ∼ Exp(1)
    while min(τ₁, τ₂) ≤ δ − t do
        if τ₁ < τ₂ then
            t ← t + τ₁
            v_i ← −v_i
            τ₂ ← τ₂ − τ₁
            τ₁ ∼ Exp((v_i ∂_iψ₁(x))₊)
        else
            t ← t + τ₂
            J ∼ Unif({1, …, N})
            U ∼ Unif[0, 1]
            if U ≤ (v_i ∂_iψ₂(x, J))₊ then
                v_i ← −v_i
                τ₁ ∼ Exp((v_i ∂_iψ₁(x))₊)
            else
                τ₁ ← τ₁ − τ₂
            τ₂ ∼ Exp(1)

Algorithm 8: Part B for the BPS considered in Section 6.2
Input: Initial condition (x, v), step size δ, gradients ∇ψ₁, ∇ψ₂.
Output: Updated velocity vector v.
t ← 0
β ← N|v|
τ₁ ∼ Exp(⟨∇ψ₁(x), v⟩₊)
τ₂ ∼ Exp(β)
while min(τ₁, τ₂) ≤ δ − t do
    if τ₁ < τ₂ then
        t ← t + τ₁
        v ← v − 2 (⟨v, ∇ψ₁(x)⟩ / |∇ψ₁(x)|²) ∇ψ₁(x)
        τ₂ ← τ₂ − τ₁
        τ₁ ∼ Exp(⟨∇ψ₁(x), v⟩₊)
    else
        t ← t + τ₂
        J ∼ Unif({1, …, N})
        U ∼ Unif[0, 1]
        if U ≤ ⟨v, ∇ψ₂(x, J)⟩₊ then
            v ← v − 2 (⟨v, ∇ψ₂(x, J)⟩ / |∇ψ₂(x, J)|²) ∇ψ₂(x, J)
            τ₁ ∼ Exp(⟨∇ψ₁(x), v⟩₊)
        else
            τ₁ ← τ₁ − τ₂
        τ₂ ∼ Exp(β)

Algorithm 9: HMC algorithm considered in Section 6.2
Input: Initial condition x, step size δ, number of iterations n_iter, parameters M, K, gradients ∇ψ₁, ∇ψ₂.
Output: Markov chain (X_n)_{n=1}^{n_iter}.
for j ← 1 to n_iter do
    Refresh v from the standard Laplace distribution
    for k ← 1 to M do
        J ∼ Unif({1, …, N})
        v ← v − (δ/2) ∇ψ₂(x, J)
        for l ← 1 to K do
            v ← v − (δ/(2K)) ∇ψ₁(x)
            x ← x + (δ/K) sign(v)
            v ← v − (δ/(2K)) ∇ψ₁(x)
        J ∼ Unif({1, …, N})
        v ← v − (δ/2) ∇ψ₂(x, J)
return (X_n)_{n=1}^{n_iter}

References

Christophe Andrieu and Samuel Livingstone. Peskun–Tierney ordering for Markovian Monte Carlo: beyond the reversible scenario. Annals of Statistics, 49(4):1958–1981, 2021.

Andrea Bertazzi and Joris Bierkens.
Adaptive schemes for piecewise deterministic Monte Carlo algorithms. Bernoulli, 28(4):2404–2430, 2022.

Andrea Bertazzi, Joris Bierkens, and Paul Dobson. Approximations of piecewise deterministic Markov processes and their convergence properties. Stochastic Processes and their Applications, 154:91–153, 2022.

Joris Bierkens and Gareth Roberts. A piecewise deterministic scaling limit of lifted Metropolis–Hastings in the Curie–Weiss model. The Annals of Applied Probability, 27(2):846–882, 2017.

Joris Bierkens, Alexandre Bouchard-Côté, Arnaud Doucet, Andrew B. Duncan, Paul Fearnhead, Thibaut Lienart, Gareth Roberts, and Sebastian J. Vollmer. Piecewise deterministic Markov processes for scalable Monte Carlo on restricted domains. Statistics & Probability Letters, 136:148–154, 2018. The role of Statistics in the era of big data.

Joris Bierkens, Paul Fearnhead, and Gareth Roberts. The Zig-Zag process and super-efficient sampling for Bayesian analysis of big data. Annals of Statistics, 47, 2019a.

Joris Bierkens, Gareth O. Roberts, and Pierre-André Zitt. Ergodicity of the zigzag process. The Annals of Applied Probability, 29(4):2266–2301, 2019b.

Joris Bierkens, Sebastiano Grazzi, Frank van der Meulen, and Moritz Schauer. Sticky PDMP samplers for sparse and local inference problems. Statistics and Computing, 33(1):8, 2022a.

Joris Bierkens, Kengo Kamatani, and Gareth O. Roberts. High-dimensional scaling limits of piecewise deterministic sampling algorithms. The Annals of Applied Probability, 32(5):3361–3407, 2022b.

Joris Bierkens, Kengo Kamatani, and Gareth O. Roberts. Scaling of piecewise deterministic Monte Carlo for anisotropic targets. Bernoulli, 31(3):2323–2350, 2025.

Andrea Bonfiglioli and Roberta Fulci. Topics in Noncommutative Algebra: The Theorem of Campbell, Baker, Hausdorff and Dynkin, volume 2034. Springer, 2012. ISBN 978-3-642-22596-3.

Nawaf Bou-Rabee, Andreas Eberle, and Raphael Zimmer.
Coupling and convergence for Hamiltonian Monte Carlo. The Annals of Applied Probability, 30(3), 2020.

Alexandre Bouchard-Côté, Sebastian J. Vollmer, and Arnaud Doucet. The Bouncy Particle Sampler: a nonreversible rejection-free Markov chain Monte Carlo method. Journal of the American Statistical Association, 113(522):855–867, 2018.

Evan Camrud, Alain Oliviero Durmus, Pierre Monmarché, and Gabriel Stoltz. Second order quantitative bounds for unadjusted generalized Hamiltonian Monte Carlo. arXiv preprint arXiv:2306.09513, 2023.

Yu Cao, Jianfeng Lu, and Lihan Wang. On explicit L²-convergence rate estimate for underdamped Langevin dynamics. Archive for Rational Mechanics and Analysis, 247(5):90, 2023.

Antonin Chambolle and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40:120–145, 2011.

Xiang Cheng, Niladri S. Chatterji, Peter L. Bartlett, and Michael I. Jordan. Underdamped Langevin MCMC: a non-asymptotic analysis. In Conference on Learning Theory, pages 300–323. PMLR, 2018.

Augustin Chevallier, Sam Power, Andi Q. Wang, and Paul Fearnhead. PDMP Monte Carlo methods for piecewise smooth densities. Advances in Applied Probability, 56(4):1153–1194, 2024.

Bertrand Cloez, Renaud Dessalles, Alexandre Genadot, Florent Malrieu, Aline Marguet, and Romain Yvinec. Probabilistic and piecewise deterministic models in biology. ESAIM: Proceedings and Surveys, 60:225–245, 2017.

Alice Corbella, Simon E. F. Spencer, and Gareth O. Roberts. Automatic Zig-Zag sampling in practice. Statistics and Computing, 32(6):107, 2022.

Dan Crisan and Michela Ottobre. Pointwise gradient bounds for degenerate semigroups (of UFG type). In Proc. R. Soc. A, volume 472, page 20160442. The Royal Society, 2016.

Arnak S. Dalalyan. Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent.
In Annual Conference on Computational Learning Theory, 2017.

M. H. A. Davis. Piecewise-deterministic Markov processes: a general class of non-diffusion stochastic models. Journal of the Royal Statistical Society. Series B (Methodological), 46(3):353–388, 1984.

M. H. A. Davis. Markov Models & Optimization. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis, 1993.

Valentin De Bortoli, Alain Durmus, Marcelo Pereyra, and Ana Fernandez Vidal. Maximum likelihood estimation of regularization parameters in high-dimensional inverse problems: an empirical Bayesian approach. Part II: theoretical analysis. SIAM Journal on Imaging Sciences, 13(4):1990–2028, 2020.

George Deligiannidis, Daniel Paulin, Alexandre Bouchard-Côté, and Arnaud Doucet. Randomized Hamiltonian Monte Carlo as scaling limit of the bouncy particle sampler and dimension-free convergence rates. The Annals of Applied Probability, 31(6):2612–2662, 2021.

Persi Diaconis, Susan Holmes, and Radford M. Neal. Analysis of a nonreversible Markov chain sampler. The Annals of Applied Probability, 10(3):726–752, 2000.

Alain Durmus and Eric Moulines. Sampling from strongly log-concave distributions with the Unadjusted Langevin Algorithm. arXiv preprint arXiv:1605.01559, 2016.

Alain Durmus, Eric Moulines, and Marcelo Pereyra. Efficient Bayesian computation by proximal Markov chain Monte Carlo: when Langevin meets Moreau. SIAM Journal on Imaging Sciences, 11(1):473–506, 2018.

Alain Durmus, Arnaud Guillin, and Pierre Monmarché. Geometric ergodicity of the Bouncy Particle Sampler. The Annals of Applied Probability, 30(5):2069–2098, 2020.

Alain Durmus, Arnaud Guillin, and Pierre Monmarché. Piecewise deterministic Markov processes and their invariant measures. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 57(3):1442–1475, 2021.

Nicolaï Gouraud, Pierre Le Bris, Adrien Majka, and Pierre Monmarché. HMC and underdamped Langevin united in the unadjusted convex smooth case.
SIAM/ASA Journal on Uncertainty Quantification, 13(1):278–303, 2025a.

Nicolaï Gouraud, Louis Lagardère, Olivier Adjoua, Thomas Plé, Pierre Monmarché, and Jean-Philip Piquemal. Velocity jumps for molecular dynamics. Journal of Chemical Theory and Computation, 21(6):2854–2866, 2025b.

Martin Hairer and Jonathan C. Mattingly. Yet another look at Harris' ergodic theorem for Markov chains. In Seminar on Stochastic Analysis, Random Fields and Applications VI, volume 63 of Progr. Probab., pages 109–117. Birkhäuser/Springer Basel AG, Basel, 2011.

Alan M. Horowitz. A generalized guided Monte Carlo algorithm. Physics Letters B, 268(2):247–252, 1991.

K. Hukushima and Y. Sakai. An irreversible Markov-chain Monte Carlo method with skew detailed balance conditions. Journal of Physics: Conference Series, 473:012012, 2013.

Marie Kopec. Weak backward error analysis for overdamped Langevin processes. IMA Journal of Numerical Analysis, 35(2):583–614, 2014.

Louis Lagardère, Félix Aviat, and Jean-Philip Piquemal. Pushing the limits of multiple-time-step strategies for polarizable point dipole molecular dynamics. The Journal of Physical Chemistry Letters, 10(10):2593–2599, 2019.

Ben Leimkuhler, Daniel T. Margul, and Mark E. Tuckerman. Stochastic, resonance-free multiple time-step algorithm for molecular dynamics with very large time steps. Molecular Physics, 111(22-23):3579–3594, 2013.

Benedict Leimkuhler and Charles Matthews. Rational construction of stochastic numerical methods for molecular sampling. Applied Mathematics Research eXpress, 2013(1):34–56, 2012.

Benedict Leimkuhler, Charles Matthews, and Gabriel Stoltz. The computation of averages from equilibrium and nonequilibrium Langevin molecular dynamics. IMA Journal of Numerical Analysis, 36(1):13–79, 2016.

Benedict Leimkuhler, Daniel Paulin, and Peter A. Whalley. Contraction rate estimates of stochastic gradient kinetic Langevin integrators.
ESAIM: Mathematical Modelling and Numerical Analysis, 58(6):2255–2286, 2024.

Vincent Lemaire, Michèle Thieullen, and Nicolas Thomas. Exact simulation of the jump times of a class of piecewise deterministic Markov processes. Journal of Scientific Computing, 2017.

Vincent Lemaire, Michèle Thieullen, and Nicolas Thomas. Thinning and multilevel Monte Carlo methods for piecewise deterministic (Markov) processes with an application to a stochastic Morris–Lecar model. Advances in Applied Probability, 52(1):138–172, 2020.

P. A. W. Lewis and G. S. Shedler. Simulation of nonhomogeneous Poisson processes by thinning, 1978.

S. Livingstone, M. F. Faulkner, and G. O. Roberts. Kinetic energy choice in Hamiltonian/hybrid Monte Carlo. Biometrika, 106(2):303–319, 2019.

Eva Löcherbach and Pierre Monmarché. Metastability for systems of interacting neurons. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 58(1):343–378, 2022.

Sean P. Meyn and R. L. Tweedie. Stability of Markovian processes III: Foster–Lyapunov criteria for continuous-time processes. Advances in Applied Probability, 25(3):518–548, 1993.

M. Michel, S. C. Kapfer, and W. Krauth. Generalized event-chain Monte Carlo: constructing rejection-free global-balance algorithms from infinitesimal steps. The Journal of Chemical Physics, 140(5), 2014.

Pierre Monmarché. Piecewise deterministic simulated annealing. ALEA, 13(1):357–398, 2016.

Pierre Monmarché. Kinetic walks for sampling. ALEA Lat. Am. J. Probab. Math. Stat., 17:491–530, 2020.

Pierre Monmarché. High-dimensional MCMC with a standard splitting scheme for the underdamped Langevin diffusion. Electronic Journal of Statistics, 15(2):4117–4166, 2021.

Pierre Monmarché. Elementary coupling approach for non-linear perturbation of Markov processes with mean-field jump mechanisms and related problems. ESAIM: PS, 27:278–323, 2023.
Pierre Monmarché, Jeremy Weisman, Louis Lagardère, and Jean-Philip Piquemal. Velocity jump processes: an alternative to multi-timestep methods for faster and accurate molecular dynamics simulations. The Journal of Chemical Physics, 153(2):024101, 2020.

Joseph A. Morrone, Ruhong Zhou, and B. J. Berne. Molecular dynamics with multiple time scales: how to avoid pitfalls. Journal of Chemical Theory and Computation, 6(6):1798–1804, 2010. PMID: 26615840.

Radford M. Neal. MCMC using Hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11):2, 2011.

Kirill Neklyudov, Max Welling, Evgenii Egorov, and Dmitry Vetrov. Involutive MCMC: a unifying framework. In Proceedings of the 37th International Conference on Machine Learning, ICML'20. JMLR.org, 2020.

Akihiko Nishimura, Zhenyu Zhang, and Marc A. Suchard. Zigzag path connects two Monte Carlo samplers: Hamiltonian counterpart to a piecewise deterministic Markov process. Journal of the American Statistical Association, 0(0):1–13, 2024.

João P. Oliveira, José M. Bioucas-Dias, and Mário A. T. Figueiredo. Adaptive total variation image deblurring: a majorization-minimization approach. Signal Processing, 89(9):1683–1693, 2009.

Filippo Pagani, Augustin Chevallier, Sam Power, Thomas House, and Simon Cotter. NuZZ: numerical Zig-Zag for general models. Statistics and Computing, 34(1):61, 2024.

Marcelo Pereyra. Proximal Markov chain Monte Carlo algorithms. Statistics and Computing, 26(4):745–760, 2016.

Marcelo Pereyra, Luis Vargas Mieles, and Konstantinos C. Zygalakis. Accelerating proximal Markov chain Monte Carlo by using an explicit stabilized method. SIAM Journal on Imaging Sciences, 13(2):905–935, 2020.

E. A. J. F. Peters and G. De With. Rejection-free Monte Carlo sampling for general potentials. Physical Review E - Statistical, Nonlinear, and Soft Matter Physics, 85(2):1–5, 2012.

R. Tyrrell Rockafellar and Roger J.-B. Wets. Variational Analysis, volume 317.
Springer Science & Business Media, 2009.

Leonid I. Rudin, Stanley Osher, and Emad Fatemi. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena, 60(1-4):259–268, 1992.

Jesus Maria Sanz-Serna and Konstantinos C. Zygalakis. Wasserstein distance estimates for the distributions of numerical approximations to ergodic stochastic differential equations. Journal of Machine Learning Research, 22(242):1–37, 2021.

Babak Shahbaba, Shiwei Lan, Wesley O. Johnson, and Radford M. Neal. Split Hamiltonian Monte Carlo. Statistics and Computing, 24:339–349, 2014.

Gabriel Stoltz and Zofia Trstanova. Langevin dynamics with general kinetic energies. Multiscale Modeling & Simulation, 16:777–806, 2018.

Denis Talay. Second-order discretization schemes of stochastic differential systems for the computation of the invariant law. Stochastics: An International Journal of Probability and Stochastic Processes, 29(1):13–36, 1990.

Achille Thin, Nikita Kotelevskii, Christophe Andrieu, Alain Durmus, Eric Moulines, and Maxim Panov. Nonreversible MCMC from conditional invertible transforms: a complete recipe with convergence guarantees. arXiv preprint arXiv:2012.15550, 2020.

M. Tuckerman, Bruce J. Berne, and Glenn J. Martyna. Reversible multiple time scale molecular dynamics. The Journal of Chemical Physics, 97(3):1990–2001, 1992.

Konstantin S. Turitsyn, Michael Chertkov, and Marija Vucelja. Irreversible Monte Carlo algorithms for efficient sampling. Physica D: Nonlinear Phenomena, 240(4):410–414, 2011.

Paul Vanetti, Alexandre Bouchard-Côté, George Deligiannidis, and Arnaud Doucet. Piecewise-deterministic Markov chain Monte Carlo. arXiv preprint arXiv:1707.05296, 2017.

Ana Fernandez Vidal, Valentin De Bortoli, Marcelo Pereyra, and Alain Durmus. Maximum likelihood estimation of regularization parameters in high-dimensional inverse problems: an empirical Bayesian approach. Part I: methodology and experiments. SIAM Journal on Imaging Sciences, 13(4):1945–1989, 2020.

Marija Vucelja.
Lifting: a nonreversible Markov chain Monte Carlo algorithm. American Journal of Physics, 84, 2014.