# Probabilistic Exponential Integrators

Nathanael Bosch, Tübingen AI Center, University of Tübingen, nathanael.bosch@uni-tuebingen.de
Philipp Hennig, Tübingen AI Center, University of Tübingen, philipp.hennig@uni-tuebingen.de
Filip Tronarp, Lund University, filip.tronarp@matstat.lu.se

Abstract: Probabilistic solvers provide a flexible and efficient framework for simulation, uncertainty quantification, and inference in dynamical systems. However, like standard solvers, they suffer performance penalties for certain stiff systems, where small steps are required not for reasons of numerical accuracy but for the sake of stability. This issue is greatly alleviated in semi-linear problems by the probabilistic exponential integrators developed in this paper. By including the fast, linear dynamics in the prior, we arrive at a class of probabilistic integrators with favorable properties. Namely, they are proven to be L-stable, and in a certain case reduce to a classic exponential integrator with the added benefit of providing a probabilistic account of the numerical error. The method is also generalized to arbitrary non-linear systems by imposing piece-wise semi-linearity on the prior via Jacobians of the vector field at the previous estimates, resulting in probabilistic exponential Rosenbrock methods. We evaluate the proposed methods on multiple stiff differential equations and demonstrate their improved stability and efficiency over established probabilistic solvers. The present contribution thus expands the range of problems that can be effectively tackled within probabilistic numerics.

1 Introduction

Dynamical systems appear throughout science and engineering, and their accurate and efficient simulation is a key component in many scientific problems.
There has also been a surge of interest in the intersection with machine learning, both regarding the usage of machine learning methods to model and solve differential equations [36, 18, 35], and in a dynamical systems perspective on machine learning methods themselves [8, 5]. This paper focuses on the numerical simulation of dynamical systems within the framework of probabilistic numerics, which treats the numerical solvers themselves as probabilistic inference methods [11, 12, 33]. In particular, we expand the range of problems that can be tackled within this framework and introduce a new class of stable probabilistic numerical methods for stiff ordinary differential equations (ODEs).

Stiff equations are problems for which certain implicit methods perform much better than explicit ones [10]. But implicit methods come with increased computational complexity per step, as they typically require solving a system of equations. Exponential integrators are an alternative class of methods for efficiently solving large stiff problems [48, 16, 7, 15]. They are based on the observation that, if the ODE has a semi-linear structure, the linear part can be solved exactly and only the non-linear part needs to be numerically approximated. The resulting methods are formulated in an explicit manner and do not require solving a system of equations, while achieving similar or better stability than implicit methods. However, such methods have not yet been formulated probabilistically.

37th Conference on Neural Information Processing Systems (NeurIPS 2023).

Figure 1: Probabilistic numerical ODE solvers with different stability properties: (a) explicit method (not A-stable), (b) semi-implicit method (A-stable), (c) exponential integrator (L-stable). Left: The explicit EK0 solver with a 3-times integrated Wiener process prior is unstable and diverges from the true solution.
Center: The semi-implicit EK1 with the same prior does not diverge even though it uses a larger step size, since it is A-stable, but it exhibits oscillations in the initial phase of the solution. Right: The proposed exponential integrator is L-stable and thus does not exhibit any oscillations.

In this paper we develop probabilistic exponential integrators, a new class of probabilistic numerical solvers for stiff semi-linear ODEs. We build on the ODE filters which have emerged as an efficient and flexible class of probabilistic numerical methods for general ODEs [40, 21, 45]. They have known convergence rates [21, 46], which have also been demonstrated empirically [2, 26, 24]; they are applicable to a wide range of numerical differential equation problems [23, 25, 3]; their probabilistic output can be integrated into larger inference problems [20, 39, 47]; and they can be formulated parallel-in-time [4]. But while it has been shown that the choice of the underlying Gauss-Markov prior does influence the resulting ODE solver [30, 45, 2], there has not yet been strong evidence for the utility of priors other than the well-established integrated Wiener process. Probabilistic exponential integrators provide this evidence: in the probabilistic numerics framework, solving the linear part of the ODE exactly corresponds to an appropriate choice of prior.

Contributions. Our main contribution is the development of probabilistic exponential integrators, a new class of stable probabilistic solvers for stiff semi-linear ODEs. We demonstrate the close link of these methods to classic exponential integrators in Proposition 1, provide an equivalence result to a classic exponential integrator in Proposition 2, and prove their L-stability in Proposition 3. To enable a numerically stable implementation, we present a quadrature-based approach to directly compute square-roots of the process noise covariance in Section 3.2.
Finally, in Section 3.6 we also propose probabilistic exponential Rosenbrock methods for problems in which semi-linearity is not known a priori. We evaluate all proposed methods on multiple stiff problems and demonstrate the improved stability and efficiency of the probabilistic exponential integrators over existing probabilistic solvers.

2 Numerical ODE solutions as Bayesian state estimation

Let us first consider an initial value problem with some general non-linear ODE, of the form
$$\dot{y}(t) = f(y(t), t), \qquad t \in [0, T], \tag{1a}$$
$$y(0) = y_0, \tag{1b}$$
with vector field $f : \mathbb{R}^d \times \mathbb{R} \to \mathbb{R}^d$, initial value $y_0 \in \mathbb{R}^d$, and time span $[0, T]$. Probabilistic numerical ODE solvers aim to compute a posterior distribution over the ODE solution $y(t)$ such that it satisfies the ODE on a discrete set of points $\mathbb{T} = \{t_n\}_{n=0}^N \subset [0, T]$, that is
$$p\left( y(t) \,\middle|\, y(0) = y_0, \{ \dot{y}(t_n) = f(y(t_n), t_n) \}_{n=0}^N \right). \tag{2}$$
We call this quantity, and approximations thereof, a probabilistic numerical ODE solution. Probabilistic numerical ODE solvers thus compute not just a single point estimate of the ODE solution, but a posterior distribution which provides a structured estimate of the numerical approximation error. In the following, we briefly recapitulate the probabilistic ODE filter framework of Schober et al. [40] and Tronarp et al. [45] and define the prior, data model, and approximate inference scheme. In Section 3 we build on these foundations to derive the proposed probabilistic exponential integrator.

2.1 Gauss-Markov prior

A priori, we model $y(t)$ with a Gauss-Markov process, defined by a stochastic differential equation
$$dY(t) = A Y(t)\, dt + \kappa B\, dW(t), \qquad Y(0) = Y_0, \tag{3}$$
with state $Y(t) \in \mathbb{R}^{d(q+1)}$, model matrices $A \in \mathbb{R}^{d(q+1) \times d(q+1)}$, $B \in \mathbb{R}^{d(q+1) \times d}$, diffusion scaling $\kappa \in \mathbb{R}$, and smoothness $q \in \mathbb{N}$. More precisely, $A$ and $B$ are chosen such that the state is structured as $Y(t) = [Y^{(0)}(t), \ldots, Y^{(q)}(t)]$, and then $Y^{(i)}(t)$ models the $i$-th derivative of $y(t)$.
The initial value $Y_0 \in \mathbb{R}^{d(q+1)}$ must be chosen such that it enforces the initial condition, that is, $Y^{(0)}(0) = y_0$. One concrete example of such a Gauss-Markov process that is commonly used in the context of probabilistic numerical ODE solvers is the $q$-times integrated Wiener process, with model matrices
$$A_{\mathrm{IWP}(d,q)} = \begin{bmatrix} 0 & I_d & & \\ & \ddots & \ddots & \\ & & 0 & I_d \\ & & & 0 \end{bmatrix}, \qquad B_{\mathrm{IWP}(d,q)} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ I_d \end{bmatrix}. \tag{4}$$
Alternatives include the class of Matérn processes and the integrated Ornstein-Uhlenbeck process [46]; the latter plays a central role in this paper and will be discussed in detail later. $Y(t)$ satisfies linear Gaussian transition densities of the form [44]
$$Y(t+h) \mid Y(t) \sim \mathcal{N}\left( \Phi(h) Y(t), \kappa^2 Q(h) \right), \tag{5}$$
with transition matrices $\Phi(h)$ and $Q(h)$ given by
$$\Phi(h) = \exp(Ah), \qquad Q(h) = \int_0^h \Phi(h - \tau) B B^\top \Phi^\top(h - \tau)\, d\tau. \tag{6}$$
These quantities can be computed with a matrix fraction decomposition [44]. For $q$-times integrated Wiener process priors, closed-form expressions for the transition matrices are available [21].

2.2 Information operator

The likelihood, or data model, of a probabilistic ODE solver relates the uninformed prior to the actual ODE solution of interest with an information operator $\mathcal{I}$ [6], defined as
$$\mathcal{I}[Y](t) := E_1 Y(t) - f(E_0 Y(t), t), \tag{7}$$
where $E_i \in \mathbb{R}^{d \times d(q+1)}$ are selection matrices such that $E_i Y(t) = Y^{(i)}(t)$. $\mathcal{I}[Y]$ then captures how well $Y$ solves the given ODE problem. In particular, $\mathcal{I}$ maps the true ODE solution $y$ to the zero function, i.e. $\mathcal{I}[y] \equiv 0$. Conversely, if $\mathcal{I}[y](t) = 0$ holds for all $t \in [0, T]$ then $y$ solves the given ODE. Unfortunately, it is in general infeasible to solve an ODE exactly and enforce $\mathcal{I}[Y](t) = 0$ everywhere, which is why numerical ODE solvers typically discretize the time interval and take discrete steps. This leads to the data model used in most probabilistic ODE solvers [45]:
$$\mathcal{I}[Y](t_n) = E_1 Y(t_n) - f(E_0 Y(t_n), t_n) = 0, \qquad t_n \in \mathbb{T} \subset [0, T]. \tag{8}$$
Note that this specific information operator is closely linked to the IVP considered in Eq. (1a).
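As an implementation aside, the transition parameters in Eq. (6) can be computed with the matrix fraction decomposition mentioned above. The following minimal sketch (our own Python/NumPy illustration, not the paper's Julia implementation; the function name is ours) computes $\Phi(h)$ and $Q(h)$ for the once-integrated Wiener process and can be checked against the known closed form:

```python
import numpy as np
from scipy.linalg import expm

def transition_matrices(A, B, h):
    """Phi(h) = exp(A h) and Q(h) from Eq. (6), via matrix fraction decomposition:
    exponentiate the block matrix [[A, B B^T], [0, -A^T]] and read off the blocks."""
    n = A.shape[0]
    blk = np.zeros((2 * n, 2 * n))
    blk[:n, :n] = A
    blk[:n, n:] = B @ B.T
    blk[n:, n:] = -A.T
    M = expm(blk * h)
    Phi = M[:n, :n]            # top-left block is exp(A h)
    Q = M[:n, n:] @ Phi.T      # Q(h) = M_12 Phi(h)^T
    return Phi, Q

# Once-integrated Wiener process (d = 1, q = 1): A = [[0,1],[0,0]], B = [0,1]^T.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Phi, Q = transition_matrices(A, B, h=0.5)
# Closed form for this prior: Phi = [[1, h], [0, 1]], Q = [[h^3/3, h^2/2], [h^2/2, h]].
```

For the IWP the closed form is of course available directly [21]; the decomposition is the generic route for priors without one.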
By defining a (slightly) different data model we can also formulate probabilistic numerical IVP solvers for higher-order ODEs or differential-algebraic equations, or encode additional information such as conservation laws or noisy trajectory observations [3, 39].

2.3 Approximate Gaussian inference

The resulting inference problem is described by a Gauss-Markov prior and a Dirac likelihood,
$$Y(t_{n+1}) \mid Y(t_n) \sim \mathcal{N}\left( \Phi_n Y(t_n), \kappa^2 Q_n \right), \tag{9a}$$
$$Z_n \mid Y(t_n) \sim \delta\left( E_1 Y(t_n) - f(E_0 Y(t_n), t_n) \right), \tag{9b}$$
with $\Phi_n := \Phi(t_{n+1} - t_n)$, $Q_n := Q(t_{n+1} - t_n)$, initial value $Y(0) = Y_0$, discrete time grid $\{t_n\}_{n=0}^N$, and zero-valued data $Z_n = 0$ for all $n$. The solution of the resulting non-linear Gauss-Markov regression problem can then be efficiently approximated with Bayesian filtering and smoothing techniques [37]. Notable examples that have been used to construct probabilistic numerical ODE solvers include quadrature filters, the unscented Kalman filter, the iterated extended Kalman smoother, and particle filters [19, 45, 46]. Here, we focus on the well-established extended Kalman filter (EKF), which we briefly discuss for the given state estimation problem in the following.

Prediction. Given a Gaussian state estimate $Y(t_{n-1}) \sim \mathcal{N}(\mu_{n-1}, \Sigma_{n-1})$ and the linear conditional distribution as in Eq. (9a), the marginal distribution $Y(t_n) \sim \mathcal{N}(\mu_n^-, \Sigma_n^-)$ is also Gaussian, with
$$\mu_n^- = \Phi_{n-1} \mu_{n-1}, \tag{10a}$$
$$\Sigma_n^- = \Phi_{n-1} \Sigma_{n-1} \Phi_{n-1}^\top + \kappa^2 Q_{n-1}. \tag{10b}$$

Linearization. To efficiently compute a tractable approximation of the true posterior, the EKF linearizes the information operator $\mathcal{I}$ around the predicted mean $\mu_n^-$, i.e.
$$\mathcal{I}[Y](t_n) \approx H_n Y(t_n) + b_n, \qquad H_n = E_1 - F_y E_0, \tag{11a}$$
$$b_n = F_y E_0 \mu_n^- - f(E_0 \mu_n^-, t_n). \tag{11b}$$
An exact linearization with Jacobian $F_y = \partial_y f(E_0 \mu_n^-, t_n)$ leads to a semi-implicit probabilistic ODE solver, which we call the EK1 [45].
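For concreteness, one full predict-linearize-correct cycle of this EKF-based solver (Eqs. (10)-(12)) can be sketched as follows. This is our own dense-matrix Python illustration with hypothetical names; a practical implementation would instead operate on square-root covariance factors, as discussed in Section 2.4:

```python
import numpy as np

def ekf_ode_step(mu, Sigma, Phi, Q, f, Fy, E0, E1):
    """One predict + correct step of the EKF-based ODE filter.

    Fy is the Jacobian approximation used for the linearization:
    the exact Jacobian gives the EK1, a zero matrix gives the EK0."""
    # Prediction, Eq. (10)
    mu_p = Phi @ mu
    Sigma_p = Phi @ Sigma @ Phi.T + Q
    # Linearized information operator, Eq. (11): H = E1 - Fy E0
    H = E1 - Fy @ E0
    # ODE residual of the predicted mean (the "datum" is zero)
    z = E1 @ mu_p - f(E0 @ mu_p)
    # Correction, Eq. (12); plain inverse for readability only
    S = H @ Sigma_p @ H.T
    K = Sigma_p @ H.T @ np.linalg.inv(S)
    mu_n = mu_p - K @ z
    Sigma_n = (np.eye(len(mu)) - K @ H) @ Sigma_p
    return mu_n, Sigma_n
```

For a linear vector field the linearization is exact, so the corrected mean satisfies the ODE residual exactly, which makes for a simple sanity check.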
Other choices include the zero matrix $F_y = 0$, which results in the explicit EK0 solver [40, 21], or a diagonal Jacobian approximation (the diagonal EK1), which combines some stability benefits of the EK1 with the lower computational cost of the EK0 [24].

Correction step. In the linearized observation model, the posterior distribution of $Y(t_n)$ given the datum $Z_n$ is again Gaussian. Its posterior mean and covariance $(\mu_n, \Sigma_n)$ are given by
$$S_n = H_n \Sigma_n^- H_n^\top, \tag{12a}$$
$$K_n = \Sigma_n^- H_n^\top S_n^{-1}, \tag{12b}$$
$$\mu_n = \mu_n^- - K_n \left( E_1 \mu_n^- - f(E_0 \mu_n^-, t_n) \right), \tag{12c}$$
$$\Sigma_n = (I - K_n H_n) \Sigma_n^-. \tag{12d}$$
This is also known as the update step of the EKF.

Smoothing. To condition the state estimates on all data, the EKF can be followed by a smoothing pass. Starting with $\mu_N^S := \mu_N$ and $\Sigma_N^S := \Sigma_N$, it consists of the following backwards recursion:
$$G_n = \Sigma_n \Phi_n^\top \left( \Sigma_{n+1}^- \right)^{-1}, \tag{13a}$$
$$\mu_n^S = \mu_n + G_n \left( \mu_{n+1}^S - \mu_{n+1}^- \right), \tag{13b}$$
$$\Sigma_n^S = \Sigma_n + G_n \left( \Sigma_{n+1}^S - \Sigma_{n+1}^- \right) G_n^\top. \tag{13c}$$

Result. The above computations result in a probabilistic numerical ODE solution with marginals
$$p\left( Y(t_i) \,\middle|\, \{ E_1 Y(t_n) - f(E_0 Y(t_n), t_n) = 0 \}_{n=0}^N \right) \approx \mathcal{N}\left( \mu_i^S, \Sigma_i^S \right), \tag{14}$$
which, by construction of the state $Y$, also contains estimates of the ODE solution as $y(t) = E_0 Y(t)$. Since the EKF-based probabilistic solver does not compute only the marginals in Eq. (14), but a full posterior distribution for the continuous object $y(t)$, it can be evaluated for times $t \notin \mathbb{T}$ (also known as dense output in the context of ODE solvers); it can produce joint samples from this posterior; and it can be used as a Gauss-Markov prior for subsequent inference tasks [40, 2, 47].

2.4 Practical considerations and implementation details

To improve numerical stability and preserve positive-semidefiniteness of the computed covariance matrices, probabilistic ODE solvers typically operate on square-roots of covariance matrices, defined by a matrix decomposition of the form $M = \sqrt{M} \sqrt{M}^\top$ [26]. For example, the Cholesky factor is one possible square-root of a positive definite matrix.
But in general, the algorithm does not require the square-roots to be upper- or lower-triangular, or even square. Additionally, we compute the exact initial state $Y_0$ from the IVP using Taylor-mode automatic differentiation [9, 26], we compute smoothing estimates with preconditioning [26], and we calibrate uncertainties globally with a quasi-maximum-likelihood approach [45, 2].

3 Probabilistic exponential integrators

In the remainder of the paper, unless otherwise stated, we focus on IVPs with a semi-linear vector field,
$$\dot{y}(t) = f(y(t), t) = L y(t) + N(y(t), t). \tag{15}$$
Assuming $N$ admits a Taylor series expansion around $t$, the variation-of-constants formula provides a formal expression for the solution at time $t + h$:
$$y(t+h) = \exp(Lh) y(t) + \sum_{k=0}^{\infty} h^{k+1} \int_0^1 \exp(Lh(1-\tau)) \frac{\tau^k}{k!}\, d\tau \, \frac{d^k}{dt^k} N(y(t), t). \tag{16}$$
This observation is the starting point for the development of exponential integrators [31, 15]. By further defining the so-called $\varphi$-functions
$$\varphi_k(z) = \int_0^1 \exp(z(1-\tau)) \frac{\tau^{k-1}}{(k-1)!}\, d\tau, \tag{17}$$
the above identity of the ODE solution simplifies to
$$y(t+h) = \exp(Lh) y(t) + \sum_{k=0}^{\infty} h^{k+1} \varphi_{k+1}(Lh) \frac{d^k}{dt^k} N(y(t), t). \tag{18}$$

In this section we develop a class of probabilistic exponential integrators. This is achieved by defining an appropriate class of priors that absorbs the partial linearity, which leads to the integrated Ornstein-Uhlenbeck processes. Proposition 1 below directly relates this choice of prior to the classical exponential integrators. Proposition 2 demonstrates a direct equivalence between the predictor-corrector form of the exponential trapezoidal rule and the once-integrated Ornstein-Uhlenbeck process. Furthermore, the favorable stability properties of classical exponential integrators are retained by their probabilistic counterparts, as shown in Proposition 3.
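The $\varphi$-functions of Eq. (17) satisfy the standard recurrence $\varphi_{k+1}(z) = (\varphi_k(z) - 1/k!)/z$ with $\varphi_0(z) = e^z$, which extends directly to matrix arguments. A minimal sketch (our own Python code; it assumes $Z$ is nonsingular, and for small $\|Z\|$ the recurrence is numerically delicate, so production codes use other evaluation schemes):

```python
import numpy as np
from math import factorial
from scipy.linalg import expm

def phi_functions(Z, kmax):
    """Evaluate phi_0, ..., phi_kmax at the matrix Z via the recurrence
    phi_{k+1}(Z) = Z^{-1} (phi_k(Z) - I / k!), starting from phi_0(Z) = exp(Z).
    Assumes Z is nonsingular."""
    d = Z.shape[0]
    phis = [expm(Z)]
    for k in range(kmax):
        phis.append(np.linalg.solve(Z, phis[-1] - np.eye(d) / factorial(k)))
    return phis

# Scalar sanity check: phi_1(2) = (e^2 - 1) / 2, phi_2(2) = (e^2 - 3) / 4.
phis = phi_functions(np.array([[2.0]]), kmax=2)
```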
3.1 The integrated Ornstein-Uhlenbeck process

In Section 2.1 we highlighted the choice of the $q$-times integrated Wiener process prior, which essentially corresponds to modeling the $(q-1)$-th derivative of the right-hand side $f$ with a Wiener process. Here we follow a similar motivation, but only for the non-linear part $N$. Differentiating both sides of Eq. (15) $q-1$ times with respect to $t$ yields
$$\frac{d^q}{dt^q} y(t) = L \frac{d^{q-1}}{dt^{q-1}} y(t) + \frac{d^{q-1}}{dt^{q-1}} N(y(t), t). \tag{19}$$
Then, modeling $\frac{d^{q-1}}{dt^{q-1}} N(y(t), t)$ as a Wiener process and relating the result to $y(t)$ gives
$$dy^{(i)}(t) = y^{(i+1)}(t)\, dt, \qquad i = 0, \ldots, q-1, \tag{20a}$$
$$dy^{(q)}(t) = L y^{(q)}(t)\, dt + \kappa I_d\, dW(t). \tag{20b}$$
This process is also known as the $q$-times integrated Ornstein-Uhlenbeck process (IOUP), with rate parameter $L$ and diffusion parameter $\kappa$. It can be equivalently stated with the previously introduced notation (Section 2.1), by defining a state $Y(t)$ as the solution of a linear time-invariant (LTI) SDE as in Eq. (3), with system matrices
$$A_{\mathrm{IOUP}(d,q)} = \begin{bmatrix} 0 & I_d & & \\ & \ddots & \ddots & \\ & & 0 & I_d \\ & & & L \end{bmatrix}, \qquad B_{\mathrm{IOUP}(d,q)} = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ I_d \end{bmatrix}. \tag{21}$$

Remark 1 (The mean of the IOUP process solves the linear part of the ODE exactly). By taking the expectation of Eq. (20b) and by linearity of integration, we can see that the mean of the IOUP satisfies
$$\dot{\mu}^{(0)}(t) = L \mu^{(0)}(t), \qquad \mu^{(0)}(0) = y_0. \tag{22}$$
This is in line with the motivation of exponential integrators: the linear part of the ODE is solved exactly, and we only need to approximate the non-linear part. Figure 2 visualizes this idea.

Figure 2: Damped oscillator dynamics and priors with different degrees of encoded information: (a) integrated Wiener process, (b) integrated Ornstein-Uhlenbeck process, (c) IOUP + initial value. Left: Once-integrated Wiener process, a popular prior for probabilistic ODE solvers. Center: Once-integrated Ornstein-Uhlenbeck process (IOUP) with rate parameter chosen to encode the known linearity of the ODE. Right: IOUP with both the ODE information and a specified initial value and derivative.
This is the kind of prior used in the probabilistic exponential integrator.

3.2 The transition parameters of the integrated Ornstein-Uhlenbeck process

Since the process $Y(t)$ is defined as the solution of a linear time-invariant SDE, it satisfies discrete transition densities $p(Y(t+h) \mid Y(t)) = \mathcal{N}\left( \Phi(h) Y(t), \kappa^2 Q(h) \right)$. The following result shows that the transition parameters are intimately connected with the $\varphi$-functions defined in Eq. (17).

Proposition 1. The transition matrix of a $q$-times integrated Ornstein-Uhlenbeck process satisfies
$$\Phi(h) = \begin{bmatrix} \exp\left( A_{\mathrm{IWP}(d,q-1)} h \right) & \Phi_{12}(h) \\ 0 & \exp(Lh) \end{bmatrix}, \qquad \Phi_{12}(h) := \begin{bmatrix} h^q \varphi_q(Lh) \\ h^{q-1} \varphi_{q-1}(Lh) \\ \vdots \\ h \varphi_1(Lh) \end{bmatrix}. \tag{23}$$
Proof in Appendix A.

Although Proposition 1 indicates that $\Phi(h)$ may be computed more efficiently than by calling a matrix exponential on a $d(q+1) \times d(q+1)$ matrix, this is known to be numerically less stable [41]. We therefore compute $\Phi(h)$ with the standard matrix-exponential formulation.

Directly computing square-roots of the process noise covariance. Numerically stable probabilistic ODE solvers require a square-root, $\sqrt{Q(h)}$, of the process noise covariance rather than the full matrix $Q(h)$. For IWP priors this can be computed from the closed-form representation of $Q(h)$ via an appropriately preconditioned Cholesky factorization [26]. However, for IOUP priors we have not found an analogous method that works reliably. Therefore, we compute $\sqrt{Q(h)}$ directly with numerical quadrature. More specifically, given a quadrature rule with nodes $\tau_i \in [0, h]$ and positive weights $w_i > 0$, the integral for $Q(h)$ given in Eq. (6) is approximated by
$$Q(h) \approx \sum_{i=1}^m w_i \exp(A(h - \tau_i)) B B^\top \exp(A^\top (h - \tau_i)) =: \sum_{i=1}^m M_i M_i^\top, \tag{24}$$
with square-roots $M_i = \sqrt{w_i} \exp(A(h - \tau_i)) B$ of the summands, which are well-defined since $w_i > 0$. We can thus compute a square-root representation of the sum with a QR-decomposition
$$\tilde{Q} R = \begin{bmatrix} M_1 & \cdots & M_m \end{bmatrix}^\top. \tag{25}$$
We obtain $Q(h) \approx R^\top R$, and therefore an approximate square-root factor is given by $\sqrt{Q(h)} = R^\top$.
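The quadrature-plus-QR construction of Eqs. (24)-(25) can be sketched compactly. This is our own Python illustration (function name ours); for the once-integrated Wiener process the integrand entries are polynomials of degree at most two, so Gauss-Legendre quadrature with three nodes recovers $Q(h)$ exactly, which serves as a check:

```python
import numpy as np
from scipy.linalg import expm

def sqrt_process_noise(A, B, h, m):
    """Right square-root R of Q(h), i.e. Q(h) ~ R^T R, via Gauss-Legendre
    quadrature of Eq. (6) followed by a QR decomposition (Eqs. 24-25)."""
    # Gauss-Legendre nodes/weights, mapped from [-1, 1] to [0, h]
    x, w = np.polynomial.legendre.leggauss(m)
    tau = 0.5 * h * (x + 1.0)
    w = 0.5 * h * w
    # Square-roots of the summands: M_i = sqrt(w_i) exp(A (h - tau_i)) B
    Ms = np.hstack([np.sqrt(wi) * expm(A * (h - ti)) @ B for wi, ti in zip(w, tau)])
    # QR of the stacked [M_1 ... M_m]^T gives the triangular factor R
    R = np.linalg.qr(Ms.T, mode="r")
    return R

# Check on the once-integrated Wiener process, where Q(h) is known in closed form.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
h = 0.5
R = sqrt_process_noise(A, B, h, m=3)
```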
Similar ideas have previously been employed for time integration of Riccati equations [42, 43]. We use this quadrature trick for all IOUP methods, with Gauss-Legendre quadrature on $m = q$ nodes.

3.3 Linearization and correction

The information operator of the probabilistic exponential integrator is defined exactly as in Section 2.2. But since we now assume a semi-linear vector field $f$, we have an additional option for the linearization: instead of choosing the exact $F_y = \partial_y f$ (EK1) or the zero matrix $F_y = 0$ (EK0), a cheap approximate Jacobian is given by the linear part, $F_y = L$. We denote this approach by EKL, and it is chosen as the default for the probabilistic exponential integrator. Note that the EKL approach can also be combined with an IWP prior, which will serve as an additional baseline in Section 4.

3.4 Equivalence to the classic exponential trapezoidal rule in predict-evaluate-correct mode

Now that the probabilistic exponential integrator has been defined, we can establish an equivalence result to a classic exponential integrator, similarly to the closely related equivalence statement by Schober et al. [40, Proposition 1] for the non-exponential case.

Proposition 2 (Equivalence to the PEC exponential trapezoidal rule). The mean estimate of the probabilistic exponential integrator with a once-integrated Ornstein-Uhlenbeck prior with rate parameter $L$ is equivalent to the classic exponential trapezoidal rule in predict-evaluate-correct mode, with the predictor being the exponential Euler method. That is, it is equivalent to the scheme
$$\hat{y}_{n+1} = \varphi_0(Lh) y_n + h \varphi_1(Lh) N(y_n), \tag{26a}$$
$$y_{n+1} = \varphi_0(Lh) y_n + h \varphi_1(Lh) N(y_n) + h \varphi_2(Lh) \left( N(\hat{y}_{n+1}) - N(y_n) \right), \tag{26b}$$
where Eq. (26a) corresponds to a prediction step with the exponential Euler method, and Eq. (26b) corresponds to a correction step with the exponential trapezoidal rule.

The proof is given in Appendix B. This equivalence result provides another theoretical justification for the proposed probabilistic exponential integrator.
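The predict-evaluate-correct scheme of Eq. (26) is straightforward to implement directly. A sketch in Python (our own code, not the paper's implementation; $\varphi_1$ and $\varphi_2$ are evaluated via the recurrence $\varphi_{k+1}(z) = (\varphi_k(z) - 1/k!)/z$, which assumes $Lh$ is nonsingular):

```python
import numpy as np
from scipy.linalg import expm

def exp_trapezoidal_pec_step(y, L, N, h):
    """One step of the exponential trapezoidal rule in PEC mode (Eq. 26),
    with the exponential Euler method as predictor."""
    d = len(y)
    I = np.eye(d)
    phi0 = expm(L * h)
    phi1 = np.linalg.solve(L * h, phi0 - I)   # phi_1(z) = (e^z - 1)/z
    phi2 = np.linalg.solve(L * h, phi1 - I)   # phi_2(z) = (phi_1(z) - 1)/z
    y_pred = phi0 @ y + h * phi1 @ N(y)       # predict: exponential Euler, Eq. (26a)
    return y_pred + h * phi2 @ (N(y_pred) - N(y))  # correct, Eq. (26b)
```

A quick check: for a constant non-linearity the exponential Euler predictor is already exact, so the correction vanishes; e.g. for $\dot{y} = -y + 1$ with $y(0) = 1$ the solution stays at $1$.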
But note that the result only holds for the mean, while the probabilistic solver computes additional quantities in order to track the solution uncertainty, namely covariances. These are not provided by a classic exponential integrator.

3.5 L-stability of the probabilistic exponential integrator

When solving stiff ODEs, the actual efficiency of a numerical method often depends on its stability. One such property is A-stability: it guarantees that the numerical solution of a decaying ODE will also decay, independently of the chosen step size. In contrast, explicit methods typically only decay for sufficiently small steps. In the context of probabilistic ODE solvers, the EK0 is considered to be explicit, but the EK1 with IWP prior has been shown to be A-stable [45]. Here, we show that the probabilistic exponential integrator satisfies the stronger L-stability: the numerical solution not only decays, but it decays fast, i.e. it goes to zero as the step size goes to infinity. Figure 1 visualizes the different probabilistic solver stabilities. For formal definitions, see for example [27, Section 8.6].

Proposition 3 (L-stability). The probabilistic exponential integrator is L-stable.

The full proof is given in Appendix C. The property essentially follows from Remark 1, which stated that the IOUP solves linear ODEs exactly. This implies fast decay and gives L-stability.

3.6 Probabilistic exponential Rosenbrock-type methods

We conclude with a short excursion into exponential Rosenbrock methods [14, 17, 28]: Given a non-linear ODE $\dot{y}(t) = f(y(t), t)$, exponential Rosenbrock methods perform a continuous linearization of the right-hand side $f$ around the numerical ODE solution and essentially solve a sequence of IVPs
$$\dot{y}(t) = J_n y(t) + \left( f(y(t), t) - J_n y(t) \right), \qquad t \in [t_n, t_{n+1}], \tag{27a}$$
$$y(t_n) = y_n, \tag{27b}$$
where $J_n$ is the Jacobian of $f$ at the numerical solution estimate $\hat{y}(t_n)$. This approach enables exponential integrators for problems where the right-hand side $f$ is not semi-linear.
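In a probabilistic solver, the practical consequence of the per-step linearization in Eq. (27) is that the IOUP prior must be re-discretized at every step, with the local Jacobian as rate parameter. A hedged sketch of that per-step update (our own Python illustration with hypothetical names; the dense `expm` call is exactly the cost discussed in the experiments and limitations below):

```python
import numpy as np
from scipy.linalg import expm

def rosenbrock_prior_update(jac, y_est, q, h):
    """Re-discretize the q-times IOUP prior around the current estimate:
    the Jacobian J_n = jac(y_est) becomes the rate parameter (Eq. 27),
    so the transition matrix Phi_n(h) must be recomputed at every step."""
    J = jac(y_est)
    d = J.shape[0]
    A = np.zeros((d * (q + 1), d * (q + 1)))
    for i in range(q):  # super-diagonal identity blocks, as in Eq. (21)
        A[i * d:(i + 1) * d, (i + 1) * d:(i + 2) * d] = np.eye(d)
    A[q * d:, q * d:] = J  # last block decays with the local Jacobian
    return expm(A * h)     # new transition matrix Phi_n(h)
```

For a fixed linear part this reduces to the (one-time) discretization of the ordinary probabilistic exponential integrator.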
Furthermore, by automatically linearizing along the numerical solution, the linearization can be more accurate, the Lipschitz constant of the non-linear remainder becomes smaller, and the resulting solvers can thus be more efficient than their globally linearized counterparts [17]. This can also be done in the probabilistic setting: by linearizing the ODE right-hand side $f$ at each step of the solver around the filtering mean $E_0 \mu_n$, we (locally) obtain a semi-linear problem. Then, updating the rate parameter of the integrated Ornstein-Uhlenbeck process at each step of the numerical solver results in probabilistic exponential Rosenbrock-type methods. As before, the linearization of the information operator can be done with any of the EK0, EK1, or EKL. But since here the prediction relies on exact local linearization, we will by default also use an exact EK1 linearization. The resulting solver and its stability and efficiency will be evaluated in the following experiments.

Figure 3: The IOUP prior is more beneficial with increasing linearity of the ODE. Work-precision diagrams (final error vs. number of steps) for the logistic problems $\dot{y} = -y + 10^{-1} y^2$, $\dot{y} = -y + 10^{-4} y^2$, $\dot{y} = -y + 10^{-7} y^2$, and $\dot{y} = -y + 10^{-10} y^2$, comparing EK0 & IWP(2), EK1 & IWP(2), EKL & IWP(2), and EKL & IOUP(2). In all examples, the IOUP-based exponential integrator achieves lower error while requiring fewer steps than the IWP-based solvers. This effect is more pronounced for the more linear ODEs.

4 Experiments

In this section we investigate the utility and performance of the proposed probabilistic exponential integrators and compare them to standard non-exponential probabilistic solvers on multiple ODEs. All methods are implemented in the Julia programming language [1], with special care being taken to implement the solvers in a numerically stable manner, that is, with exact state initialization, preconditioned state transitions, and a square-root implementation [26].
Reference solutions are computed with the DifferentialEquations.jl package [34]. All experiments run on a single, consumer-level CPU. Code for the implementation and experiments is publicly available on GitHub.¹

4.1 Logistic equation with varying degrees of non-linearity

We start with a simple one-dimensional initial value problem: a logistic model with negative growth-rate parameter $r = -1$ and carrying capacity $K \in \mathbb{R}_+$, of the form
$$\dot{y}(t) = -y(t) + \frac{1}{K} y(t)^2, \qquad t \in [0, 10], \tag{28a}$$
$$y(0) = 1. \tag{28b}$$
The non-linearity of this problem can be directly controlled through the parameter $K$. Therefore, this test problem lets us investigate the IOUP's capability to leverage known linearity in the ODE. We compare the proposed exponential integrator to all introduced IWP-based solvers, with different linearization strategies: the EK0 approximates $\partial_y f \approx 0$ (and is thus explicit), the EKL uses the linear part $\partial_y f \approx -1$, and the EK1 linearizes with the correct Jacobian $\partial_y f$. The results for four different values of $K$ are shown in Fig. 3. The explicit solver shows the largest error of all compared solvers, likely due to its lacking stability. On the other hand, the proposed exponential integrator behaves as expected: the IOUP prior is most beneficial for larger values of $K$, and as the non-linearity becomes more pronounced the performance of the IOUP approaches that of the IWP-based solver. Though for large step sizes, the IOUP outperforms the IWP prior even for the most non-linear case with $K = 10$.

4.2 Burgers' equation

Here, we consider Burgers' equation, a semi-linear partial differential equation (PDE)
$$\partial_t u(x, t) = D\, \partial_x^2 u(x, t) - u(x, t)\, \partial_x u(x, t), \qquad x \in [0, 1], \quad t \in [0, 1], \tag{29}$$
with diffusion coefficient $D \in \mathbb{R}_+$. We transform the problem into a semi-linear ODE with the method of lines [29, 38]: we discretize the spatial domain on 250 equidistant points and approximate the differential operators with finite differences.
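A method-of-lines split of this kind, separating the linear diffusion term from the non-linear advection term, can be sketched as follows. This is our own small Python illustration (coarse grid, zero-Dirichlet boundaries as a simplifying assumption; the paper's exact discretization is in its Appendix D):

```python
import numpy as np

def burgers_semilinear(d_coef, n, dx):
    """Semi-linear split of Burgers' equation (Eq. 29) on n interior grid points:
    linear part L = D * (finite-difference Laplacian), non-linear part
    N(u) = -u * du/dx with central differences. Zero Dirichlet boundaries
    are assumed here for simplicity."""
    lap = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
           + np.diag(np.ones(n - 1), -1)) / dx**2
    L = d_coef * lap

    def N(u):
        du = np.zeros_like(u)
        du[1:-1] = (u[2:] - u[:-2]) / (2 * dx)  # central first derivative
        return -u * du

    return L, N  # the semi-linear ODE is then  u' = L u + N(u)
```

The resulting right-hand side `L @ u + N(u)` is exactly the structure assumed in Eq. (15), with the stiff diffusion part in $L$.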
The full IVP specification, including all domains, initial and boundary conditions, and additional information on the discretization, is given in Appendix D. The results shown in Fig. 4 demonstrate the different stability properties of the solvers: the explicit EK0 with IWP prior is unable to solve the IVP for any of the step sizes due to its insufficient stability, and even the A-stable EK1 and the more approximate EKL require small enough steps $\Delta t < 10^{-1}$. On the other hand, both exponential integrators are able to compute meaningful solutions for a larger range of step sizes. They both achieve lower errors for most settings than their non-exponential counterparts.

¹https://github.com/nathanaelbosch/probabilistic-exponential-integrators

Figure 4: Benchmarking probabilistic ODE solvers on Burgers' equation. Exponential and non-exponential probabilistic solvers (EK0 & IWP(2), EK1 & IWP(2), EKL & IWP(2), EKL & IOUP(2), EK1 & IOUP(2) (RB)) are compared on Burgers' equation (a) in two work-precision diagrams (b): final error vs. step size and vs. number of $f$ evaluations. Both exponential integrators with IOUP prior achieve lower errors than the existing IWP-based solvers, in particular for large steps. This indicates their stronger stability properties.

Figure 5: Benchmarking probabilistic ODE solvers on a reaction-diffusion model. Exponential and non-exponential probabilistic solvers are compared on a reaction-diffusion model (a) in two work-precision diagrams (b): final error vs. step size and vs. runtime. The proposed exponential integrators with IOUP prior achieve lower errors per step size than the existing IWP-based methods.
The runtime comparison shows the increased cost of the Rosenbrock-type (RB) method, while the non-Rosenbrock probabilistic exponential integrator performs best in this comparison.

The second diagram in Fig. 4 compares the achieved error to the number of vector-field evaluations and points out a trade-off between both exponential methods: since the Rosenbrock method additionally computes two Jacobians (with automatic differentiation) per step, it needs to evaluate the vector field more often than the non-Rosenbrock method. Thus, for expensive-to-evaluate vector fields the standard probabilistic exponential integrator might be preferable.

4.3 Reaction-diffusion model

Finally, we consider a discretized reaction-diffusion model given by a semi-linear PDE
$$\partial_t u(x, t) = D\, \partial_x^2 u(x, t) + R(u(x, t)), \qquad x \in [0, 1], \quad t \in [0, T], \tag{30}$$
where $D \in \mathbb{R}_+$ is the diffusion coefficient and $R(u) = u(1 - u)$ is a logistic reaction term [22]. A finite-difference discretization of the spatial domain transforms this PDE into an IVP with a semi-linear ODE. The full problem specification is provided in Appendix D. Figure 5 shows the results. We again observe the improved stability of the exponential integrator variants from their lower error at large step sizes, and they outperform the IWP-based methods in all settings. The runtime evaluation in Fig. 5 also visualizes another drawback of the Rosenbrock-type method: since the problem is re-linearized at each step, the IOUP also needs to be re-discretized and thus a matrix exponential needs to be computed at every step. In comparison, the non-Rosenbrock method only discretizes the IOUP prior once at the start of the solve. This advantage makes the non-Rosenbrock probabilistic exponential integrator the most performant solver in this experiment.

5 Limitations

The probabilistic exponential integrator shares many properties of both classic exponential integrators and of other filtering-based probabilistic solvers. This also brings some challenges.
Cost of computing matrix exponentials. The IOUP prior is more expensive to discretize than the IWP, as it requires computing a matrix exponential. This trade-off is well-known also in the context of classic exponential integrators. One approach to reduce the computational cost is to compute the matrix exponential only approximately [32], for example with Krylov-subspace methods [13, 17]. Extending these techniques to the probabilistic solver setting thus poses an interesting direction for future work.

Cubic scaling in the ODE dimension. The probabilistic exponential integrator shares the complexity of most (semi-)implicit ODE solvers: while being linear in the number of time steps, it scales cubically in the ODE dimension. By exploiting structure in the Jacobian and in the prior, some filtering-based ODE solvers have been formulated with linear scaling in the ODE dimension [24]. But this approach does not directly extend to the IOUP prior. Nevertheless, exploiting known structure could be particularly relevant to construct solvers for specific ODEs, such as certain discretized PDEs.

6 Conclusion

We have presented probabilistic exponential integrators, a new class of probabilistic solvers for stiff semi-linear ODEs. By incorporating the fast, linear dynamics directly into the prior of the solver, the method essentially solves the linear part exactly, in a similar manner as classic exponential integrators. We also extended the proposed method to general non-linear systems via iterative re-linearization and presented probabilistic exponential Rosenbrock-type methods. Both methods have been shown, both theoretically and empirically, to be more stable than their non-exponential probabilistic counterparts. This work further expands the toolbox of probabilistic numerics and opens up new possibilities for accurate and efficient probabilistic simulation and inference in stiff dynamical systems.
Acknowledgments and Disclosure of Funding

The authors gratefully acknowledge financial support by the German Federal Ministry of Education and Research (BMBF) through Project ADIMEM (FKZ 01IS18052B); financial support by the European Research Council through ERC StG Action 757275 / PANAMA; the DFG Cluster of Excellence "Machine Learning - New Perspectives for Science", EXC 2064/1, project number 390727645; the German Federal Ministry of Education and Research (BMBF) through the Tübingen AI Center (FKZ: 01IS18039A); and funds from the Ministry of Science, Research and Arts of the State of Baden-Württemberg. Filip Tronarp was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. The authors thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Nathanael Bosch. The authors also thank Jonathan Schmidt for many valuable discussions and for helpful feedback on the manuscript.

References

[1] J. Bezanson, A. Edelman, S. Karpinski, and V. B. Shah. Julia: A fresh approach to numerical computing. SIAM Review, 59(1):65–98, 2017.
[2] N. Bosch, P. Hennig, and F. Tronarp. Calibrated adaptive probabilistic ODE solvers. In International Conference on Artificial Intelligence and Statistics. PMLR, 2021.
[3] N. Bosch, F. Tronarp, and P. Hennig. Pick-and-mix information operators for probabilistic ODE solvers. In International Conference on Artificial Intelligence and Statistics. PMLR, 2022.
[4] N. Bosch, A. Corenflos, F. Yaghoobi, F. Tronarp, P. Hennig, and S. Särkkä. Parallel-in-time probabilistic numerical ODE solvers, 2023.
[5] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud. Neural ordinary differential equations. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2018.
[6] J. Cockayne, C. Oates, T. Sullivan, and M. Girolami. Bayesian probabilistic numerical methods. SIAM Review, 61:756–789, 2019.
[7] S. M.
Cox and P. C. Matthews. Exponential time differencing for stiff systems. Journal of Computational Physics, 176(2):430–455, 2002.
[8] W. E. A proposal on machine learning via dynamical systems. Communications in Mathematics and Statistics, 5(1):1–11, Mar 2017.
[9] A. Griewank and A. Walther. Evaluating Derivatives. Society for Industrial and Applied Mathematics, second edition, 2008.
[10] E. Hairer and G. Wanner. Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems. Springer Series in Computational Mathematics. Springer-Verlag, 1991.
[11] P. Hennig, M. A. Osborne, and M. Girolami. Probabilistic numerics and uncertainty in computations. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 471(2179):20150142, Jul 2015.
[12] P. Hennig, M. A. Osborne, and H. P. Kersting. Probabilistic Numerics: Computation as Machine Learning. Cambridge University Press, 2022.
[13] M. Hochbruck and C. Lubich. On Krylov subspace approximations to the matrix exponential operator. SIAM Journal on Numerical Analysis, 34(5):1911–1925, Oct 1997.
[14] M. Hochbruck and A. Ostermann. Explicit integrators of Rosenbrock-type. Oberwolfach Reports, 3(2):1107–1110, 2006.
[15] M. Hochbruck and A. Ostermann. Exponential integrators. Acta Numerica, 19:209–286, 2010.
[16] M. Hochbruck, C. Lubich, and H. Selhofer. Exponential integrators for large systems of differential equations. SIAM Journal on Scientific Computing, 19(5):1552–1574, 1998.
[17] M. Hochbruck, A. Ostermann, and J. Schweitzer. Exponential Rosenbrock-type methods. SIAM Journal on Numerical Analysis, 47(1):786–803, 2009.
[18] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang. Physics-informed machine learning. Nature Reviews Physics, 3(6):422–440, May 2021.
[19] H. Kersting and P. Hennig. Active uncertainty calibration in Bayesian ODE solvers. In Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI), pages 309–318, June 2016.
[20] H.
Kersting, N. Krämer, M. Schiegg, C. Daniel, M. Tiemann, and P. Hennig. Differentiable likelihoods for fast inversion of likelihood-free dynamical systems. In International Conference on Machine Learning. PMLR, 2020.
[21] H. Kersting, T. J. Sullivan, and P. Hennig. Convergence rates of Gaussian ODE filters. Statistics and Computing, 30(6):1791–1816, 2020.
[22] A. N. Kolmogorov. A study of the equation of diffusion with increase in the quantity of matter, and its application to a biological problem. Moscow University Bulletin of Mathematics, 1:1–25, 1937.
[23] N. Krämer and P. Hennig. Linear-time probabilistic solution of boundary value problems. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2021.
[24] N. Krämer, N. Bosch, J. Schmidt, and P. Hennig. Probabilistic ODE solutions in millions of dimensions. In International Conference on Machine Learning. PMLR, 2022.
[25] N. Krämer, J. Schmidt, and P. Hennig. Probabilistic numerical method of lines for time-dependent partial differential equations. In International Conference on Artificial Intelligence and Statistics. PMLR, 2022.
[26] N. Krämer and P. Hennig. Stable implementation of probabilistic ODE solvers, 2020.
[27] J. D. Lambert. Computational Methods in Ordinary Differential Equations. Introductory Mathematics for Scientists and Engineers. Wiley, 1973.
[28] V. T. Luan and A. Ostermann. Exponential Rosenbrock methods of order five: construction, analysis and numerical comparisons. Journal of Computational and Applied Mathematics, 255:417–431, 2014.
[29] N. K. Madsen. The method of lines for the numerical solution of partial differential equations. Proceedings of the SIGNUM meeting on Software for partial differential equations, 1975.
[30] E. Magnani, H. Kersting, M. Schober, and P. Hennig. Bayesian filtering for ODEs with bounded derivatives, 2017.
[31] B. V. Minchev and W. M. Wright. A review of exponential integrators for first order semi-linear problems.
Technical report, Norges Teknisk-Naturvitenskapelige Universitet, 2005.
[32] C. Moler and C. Van Loan. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAM Review, 45(1):3–49, Jan 2003.
[33] C. J. Oates and T. J. Sullivan. A modern retrospective on probabilistic numerics. Statistics and Computing, 29(6):1335–1351, 2019.
[34] C. Rackauckas and Q. Nie. DifferentialEquations.jl: a performant and feature-rich ecosystem for solving differential equations in Julia. Journal of Open Research Software, 5(1), 2017.
[35] C. Rackauckas, Y. Ma, J. Martensen, C. Warner, K. Zubov, R. Supekar, D. Skinner, A. Ramadhan, and A. Edelman. Universal differential equations for scientific machine learning, 2021.
[36] M. Raissi, P. Perdikaris, and G. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, Feb 2019.
[37] S. Särkkä. Bayesian Filtering and Smoothing, volume 3 of Institute of Mathematical Statistics Textbooks. Cambridge University Press, 2013.
[38] W. E. Schiesser. The Numerical Method of Lines: Integration of Partial Differential Equations. Elsevier, 2012.
[39] J. Schmidt, N. Krämer, and P. Hennig. A probabilistic state space model for joint inference from differential equations and data. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 2021.
[40] M. Schober, S. Särkkä, and P. Hennig. A probabilistic model for the numerical solution of initial value problems. Statistics and Computing, 29(1):99–122, Jan 2019.
[41] R. B. Sidje. Expokit: A software package for computing matrix exponentials. ACM Transactions on Mathematical Software, 24(1):130–156, Mar 1998.
[42] T. Stillfjord. Low-rank second-order splitting of large-scale differential Riccati equations. IEEE Transactions on Automatic Control, 60(10):2791–2796, 2015.
[43] T. Stillfjord.
Adaptive high-order splitting schemes for large-scale differential Riccati equations. Numerical Algorithms, 78(4):1129–1151, Sep 2017.
[44] S. Särkkä and A. Solin. Applied Stochastic Differential Equations. Institute of Mathematical Statistics Textbooks. Cambridge University Press, 2019.
[45] F. Tronarp, H. Kersting, S. Särkkä, and P. Hennig. Probabilistic solutions to ordinary differential equations as nonlinear Bayesian filtering: a new perspective. Statistics and Computing, 29(6):1297–1315, 2019.
[46] F. Tronarp, S. Särkkä, and P. Hennig. Bayesian ODE solvers: The maximum a posteriori estimate. Statistics and Computing, 31(3):1–18, 2021.
[47] F. Tronarp, N. Bosch, and P. Hennig. Fenrir: Physics-enhanced regression for initial value problems. In International Conference on Machine Learning. PMLR, 2022.
[48] C. Van Loan. Computing integrals involving the matrix exponential. IEEE Transactions on Automatic Control, 23(3):395–404, 1978.

Appendix

A Proof of Proposition 1: Structure of the transition matrix

Proof of Proposition 1. The drift matrix A_{IOUP(d,q)} as given in Eq. (21) has block structure

A_{IOUP(d,q)} = [ A_{IWP(d,q−1)}   E_{q−1} ;  0   L ],

where E_{q−1} := [0 … 0 I_d]^⊤ ∈ ℝ^{dq×d} and [ · ; · ] separates block rows. From Van Loan [48, Theorem 1], it follows that

Φ(h) = [ exp(A_{IWP(d,q−1)} h)   Φ_{12}(h) ;  0   exp(Lh) ],

which is precisely Eq. (23). The same theorem also gives Φ_{12}(h) as

Φ_{12}(h) = ∫₀ʰ exp(A_{IWP(d,q−1)} (h − τ)) E_{q−1} exp(Lτ) dτ. (33)

Its i-th d × d block is readily given by

(Φ_{12}(h))_i = ∫₀ʰ E_i^⊤ exp(A_{IWP(d,q−1)} (h − τ)) E_{q−1} exp(Lτ) dτ
= ∫₀ʰ ((h − τ)^{q−1−i} / (q−1−i)!) exp(Lτ) dτ
= h^{q−i} ∫₀¹ (u^{q−1−i} / (q−1−i)!) exp(Lh(1 − u)) du
= h^{q−i} ϕ_{q−i}(Lh),

where the second-to-last equality used the change of variables τ = h(1 − u), and the last line follows by the definition of the ϕ-functions.

B Proof of Proposition 2: Equivalence to a classic exponential integrator

We first briefly recapitulate the probabilistic exponential integrator setup for the case of the once-integrated Ornstein–Uhlenbeck process, and then provide some auxiliary results.
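The block structure stated in Proposition 1 can be checked numerically. The following sketch (not from the paper's code; the helper `expm_taylor` and the choices `lam`, `h` are ours) builds the drift matrix of a scalar, twice-integrated Ornstein-Uhlenbeck prior IOUP(1, 2) and verifies that the top-right blocks of exp(Ah) equal h^{q−i} ϕ_{q−i}(Lh). In practice one would use a library routine such as scipy.linalg.expm; a truncated Taylor series keeps the example self-contained.

```python
# Numerical sanity check of Proposition 1 for d = 1, q = 2 (a sketch).
import numpy as np

def expm_taylor(M, terms=60):
    """Matrix exponential via truncated Taylor series (adequate for small ||M||)."""
    out, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms):
        term = term @ M / k
        out = out + term
    return out

lam, h = -2.0, 0.5  # hypothetical rate parameter L and step size

# Drift matrix A_IOUP(1,2): two integrators on top of an OU process with rate lam.
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, lam]])
Phi = expm_taylor(A * h)

# phi-functions: phi_0(z) = e^z, phi_1(z) = (e^z - 1)/z, phi_2(z) = (e^z - 1 - z)/z^2.
z = lam * h
phi0, phi1, phi2 = np.exp(z), (np.exp(z) - 1) / z, (np.exp(z) - 1 - z) / z**2

assert np.isclose(Phi[2, 2], phi0)         # bottom-right block: exp(L h)
assert np.isclose(Phi[1, 2], h * phi1)     # i = 1: h * phi_1(L h)
assert np.isclose(Phi[0, 2], h**2 * phi2)  # i = 0: h^2 * phi_2(L h)
```

As a sanity check on the IWP block, the (0, 1) entry of Phi is exactly h, matching exp(A_IWP h).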
Then, we prove Proposition 2 in Appendix B.3.

B.1 The probabilistic exponential integrator with once-integrated Ornstein–Uhlenbeck prior

The integrated Ornstein–Uhlenbeck process prior with rate parameter L results in transition densities Y(t + h) | Y(t) ∼ N(Y(t + h); Φ(h) Y(t), Q(h)), with transition matrices (from Proposition 1)

Φ(h) = exp(Ah) = [ I   hϕ₁(Lh) ;  0   ϕ₀(Lh) ],

Q(h) = ∫₀ʰ exp(Aτ) B B^⊤ exp(A^⊤τ) dτ (36)
= ∫₀ʰ [ τ² ϕ₁(Lτ) ϕ₁(Lτ)^⊤   τ ϕ₁(Lτ) ϕ₀(Lτ)^⊤ ;  τ ϕ₀(Lτ) ϕ₁(Lτ)^⊤   ϕ₀(Lτ) ϕ₀(Lτ)^⊤ ] dτ,

where the second line uses exp(Aτ) B = [ τϕ₁(Lτ) ; ϕ₀(Lτ) ] and we assume a unit diffusion σ² = 1. To simplify notation, we assume an equidistant time grid T = {t_n}_{n=0}^N with t_n = nh for some step size h, and we denote the constant transition matrices simply by Φ and Q and write Y_n = Y(t_n).

Before getting to the actual proof, let us also briefly recapitulate the filtering formulas that are computed at each solver step. Given a Gaussian distribution Y_n ∼ N(Y_n; μ_n, Σ_n), the prediction step computes

μ⁻_{n+1} = Φ μ_n, (39)
Σ⁻_{n+1} = Φ Σ_n Φ^⊤ + Q. (40)

Then, the combined linearization and correction step computes

ẑ_{n+1} = E₁ μ⁻_{n+1} − f(E₀ μ⁻_{n+1}), (41)
S_{n+1} = H Σ⁻_{n+1} H^⊤, (42)
K_{n+1} = Σ⁻_{n+1} H^⊤ S_{n+1}^{−1}, (43)
μ_{n+1} = μ⁻_{n+1} − K_{n+1} ẑ_{n+1}, (44)
Σ_{n+1} = Σ⁻_{n+1} − K_{n+1} S_{n+1} K_{n+1}^⊤, (45)

with observation matrix H = E₁ − L E₀ = [−L  I], since we perform the proposed EKL linearization.

B.2 Auxiliary results

In the following, we show some properties of the transition matrices and the covariances that will be needed in the proof of Proposition 2 later. First, note that, with ϕ₀(z) = exp(z), the ϕ-functions satisfy the following recurrence formula:

z ϕ_k(z) = ϕ_{k−1}(z) − 1 / (k−1)!; (46)

see e.g. Hochbruck and Ostermann [15]. This property will be used throughout the remainder of the section.

Lemma B.1. The transition matrices Φ(h), Q(h) of the once-integrated Ornstein–Uhlenbeck process with rate parameter L satisfy

H Φ(h) = [−L  I], (47)
Q(h) H^⊤ = [ h² ϕ₂(Lh) ;  h ϕ₁(Lh) ], (48)
H Q(h) H^⊤ = h I. (49)

Proof. For the first identity, we compute

H Φ(h) = (E₁ − L E₀) [ I   hϕ₁(Lh) ;  0   ϕ₀(Lh) ] = [0  ϕ₀(Lh)] − L [I  hϕ₁(Lh)] = [−L   ϕ₀(Lh) − Lhϕ₁(Lh)] = [−L  I].
(50)

Here the last step uses Lhϕ₁(Lh) = ϕ₀(Lh) − I, which follows from Eq. (46). For the second identity, we compute

Q(h) H^⊤ = ∫₀ʰ [ τ² ϕ₁(Lτ) ϕ₁(Lτ)^⊤   τ ϕ₁(Lτ) ϕ₀(Lτ)^⊤ ;  τ ϕ₀(Lτ) ϕ₁(Lτ)^⊤   ϕ₀(Lτ) ϕ₀(Lτ)^⊤ ] [ −L^⊤ ; I ] dτ
= ∫₀ʰ [ τ ϕ₁(Lτ) (ϕ₀(Lτ) − Lτϕ₁(Lτ))^⊤ ;  ϕ₀(Lτ) (ϕ₀(Lτ) − Lτϕ₁(Lτ))^⊤ ] dτ
= ∫₀ʰ [ τ ϕ₁(Lτ) ;  ϕ₀(Lτ) ] dτ (53)
= [ h² ϕ₂(Lh) ;  h ϕ₁(Lh) ],

where we used Lτϕ₁(Lτ) = ϕ₀(Lτ) − I and ∂_τ (τ^k ϕ_k(Lτ)) = τ^{k−1} ϕ_{k−1}(Lτ). It follows that

H Q(h) H^⊤ = H [ h² ϕ₂(Lh) ;  h ϕ₁(Lh) ] = h (ϕ₁(Lh) − Lhϕ₂(Lh)) = h I, (56)

where we used Lhϕ₂(Lh) = ϕ₁(Lh) − I.

Lemma B.2. The prediction covariance Σ⁻_{n+1} satisfies

Σ⁻_{n+1} H^⊤ = Q(h) H^⊤. (57)

Proof. First, since the observation model is noiseless, the filtering covariance Σ_n satisfies

H Σ_n = 0. (58)

This can be shown directly from the correction step formula:

H Σ_n = H Σ⁻_n − H K_n S_n K_n^⊤ (59)
= H Σ⁻_n − H Σ⁻_n H^⊤ S_n^{−1} S_n K_n^⊤ (60)
= H Σ⁻_n − H Σ⁻_n H^⊤ (H Σ⁻_n H^⊤)^{−1} S_n K_n^⊤ (61)
= H Σ⁻_n − S_n K_n^⊤ (62)
= H Σ⁻_n − S_n (Σ⁻_n H^⊤ S_n^{−1})^⊤ (63)
= H Σ⁻_n − S_n S_n^{−1} H Σ⁻_n (64)
= 0. (65)

Next, since the observation matrix is H = [−L  I], the filtering covariance Σ_n is structured as

Σ_n = [ I ; L ] [Σ_n]₀₀ [ I ; L ]^⊤. (66)

This can be shown directly from Eq. (58): writing Σ := Σ_n in blocks, we have

0 = H Σ = [−L  I] [ Σ₀₀  Σ₀₁ ;  Σ₁₀  Σ₁₁ ] = [ Σ₁₀ − LΣ₀₀   Σ₁₁ − LΣ₀₁ ], (67)

and thus

Σ₁₀ = LΣ₀₀, (68)
Σ₁₁ = LΣ₀₁ = LΣ₁₀^⊤ = LΣ₀₀L^⊤, (69)
Σ = [ Σ₀₀   Σ₀₀L^⊤ ;  LΣ₀₀   LΣ₀₀L^⊤ ] = [ I ; L ] Σ₀₀ [ I ; L ]^⊤. (70)

Finally, together with Lemma B.1 we can derive the result:

Σ⁻_{n+1} H^⊤ = Φ(h) Σ_n Φ(h)^⊤ H^⊤ + Q(h) H^⊤ (71)
= Φ(h) Σ_n H^⊤ + Q(h) H^⊤ (72)
= Φ(h) · 0 + Q(h) H^⊤ (73)
= Q(h) H^⊤, (74)

where (72) uses Φ(h)^⊤ H^⊤ = (H Φ(h))^⊤ = H^⊤ from Eq. (47), and (73) uses Σ_n H^⊤ = (H Σ_n)^⊤ = 0 from Eq. (58).

B.3 Proof of Proposition 2

With these results, we can now prove Proposition 2.

Proof of Proposition 2. We prove the proposition by induction, showing that the filtering means are all of the form

μ_n = [ y_n ;  L y_n + N(ỹ_n) ], (75)

where y_n, ỹ_n are defined as

ỹ₀ := y₀, (76)
ỹ_{n+1} := ϕ₀(Lh) y_n + hϕ₁(Lh) N(ỹ_n), (77)
y_{n+1} := ϕ₀(Lh) y_n + hϕ₁(Lh) N(ỹ_n) − hϕ₂(Lh) (N(ỹ_n) − N(ỹ_{n+1})). (78)

This result includes the statement of Proposition 2.

Base case n = 0: The initial distribution of the probabilistic solver is chosen as

μ₀ = [ y₀ ;  L y₀ + N(ỹ₀) ],   Σ₀ = 0. (79)

This proves the base case n = 0.

Induction step n → n + 1: Now, let μ_n = [ y_n ; L y_n + N(ỹ_n) ] be the filtering mean at step n and Σ_n be the filtering covariance.
The prediction mean is of the form

μ⁻_{n+1} = Φ(h) μ_n = [ y_n + hϕ₁(Lh)(L y_n + N(ỹ_n)) ;  ϕ₀(Lh)(L y_n + N(ỹ_n)) ] = [ ϕ₀(Lh) y_n + hϕ₁(Lh) N(ỹ_n) ;  ϕ₀(Lh)(L y_n + N(ỹ_n)) ],

where the first block simplifies via I + Lhϕ₁(Lh) = ϕ₀(Lh) and equals ỹ_{n+1}. The residual ẑ_{n+1} is then of the form

ẑ_{n+1} = E₁ μ⁻_{n+1} − f(E₀ μ⁻_{n+1}) (82)
= ϕ₀(Lh)(L y_n + N(ỹ_n)) − f(ϕ₀(Lh) y_n + hϕ₁(Lh) N(ỹ_n)) (83)
= ϕ₀(Lh)(L y_n + N(ỹ_n)) − L (ϕ₀(Lh) y_n + hϕ₁(Lh) N(ỹ_n)) − N(ỹ_{n+1}) (84)
= ϕ₀(Lh) L y_n + ϕ₀(Lh) N(ỹ_n) − L ϕ₀(Lh) y_n − Lhϕ₁(Lh) N(ỹ_n) − N(ỹ_{n+1}) (85)
= (ϕ₀(Lh) − Lhϕ₁(Lh)) N(ỹ_n) − N(ỹ_{n+1}) (86)
= N(ỹ_n) − N(ỹ_{n+1}), (87)

where we used properties of the ϕ-functions, namely Lhϕ₁(Lh) = ϕ₀(Lh) − I, and the commutativity ϕ₀(Lh) L = L ϕ₀(Lh). With Lemma B.2, the residual covariance S_{n+1} and the Kalman gain K_{n+1} are then of the form

S_{n+1} = H Σ⁻_{n+1} H^⊤ = H Q(h) H^⊤ = h I, (89)
K_{n+1} = Σ⁻_{n+1} H^⊤ S_{n+1}^{−1} = Q(h) H^⊤ (h I)^{−1} = [ hϕ₂(Lh) ;  ϕ₁(Lh) ]. (90)

This gives the updated mean

μ_{n+1} = μ⁻_{n+1} − K_{n+1} ẑ_{n+1} (91)
= [ ϕ₀(Lh) y_n + hϕ₁(Lh) N(ỹ_n) ;  ϕ₀(Lh)(L y_n + N(ỹ_n)) ] − [ hϕ₂(Lh) ;  ϕ₁(Lh) ] (N(ỹ_n) − N(ỹ_{n+1})) (92)
= [ ϕ₀(Lh) y_n + hϕ₁(Lh) N(ỹ_n) − hϕ₂(Lh)(N(ỹ_n) − N(ỹ_{n+1})) ;  ϕ₀(Lh)(L y_n + N(ỹ_n)) − ϕ₁(Lh)(N(ỹ_n) − N(ỹ_{n+1})) ]. (93)

This proves the first half of the mean recursion:

E₀ μ_{n+1} = ϕ₀(Lh) y_n + hϕ₁(Lh) N(ỹ_n) − hϕ₂(Lh)(N(ỹ_n) − N(ỹ_{n+1})) = y_{n+1}. (94)

It is left to show that

E₁ μ_{n+1} = L y_{n+1} + N(ỹ_{n+1}). (95)

Starting from the right-hand side, we have

L y_{n+1} + N(ỹ_{n+1}) (96)
= L (ϕ₀(Lh) y_n + hϕ₁(Lh) N(ỹ_n) − hϕ₂(Lh)(N(ỹ_n) − N(ỹ_{n+1}))) + N(ỹ_{n+1}) (97)
= ϕ₀(Lh) L y_n + Lhϕ₁(Lh) N(ỹ_n) − Lhϕ₂(Lh)(N(ỹ_n) − N(ỹ_{n+1})) + N(ỹ_{n+1}) (98)
= ϕ₀(Lh) L y_n + (ϕ₀(Lh) − I) N(ỹ_n) − (ϕ₁(Lh) − I)(N(ỹ_n) − N(ỹ_{n+1})) + N(ỹ_{n+1}) (99)
= ϕ₀(Lh)(L y_n + N(ỹ_n)) − ϕ₁(Lh)(N(ỹ_n) − N(ỹ_{n+1})) (100)
= E₁ μ_{n+1}. (101)

This concludes the proof of the mean recursion and thus shows the equivalence of the two recursions.

C Proof of Proposition 3: L-stability

We first provide definitions of L-stability and A-stability, following [27, Section 8.6].

Definition 1 (L-stability).
A one-step method is said to be L-stable if it is A-stable and, in addition, when applied to the scalar test equation ẏ(t) = λy(t), with λ ∈ ℂ a complex constant with Re(λ) < 0, it yields y_{n+1} = R(hλ) y_n with R(hλ) → 0 as Re(hλ) → −∞.

Definition 2 (A-stability). A one-step method is said to be A-stable if its region of absolute stability contains the whole of the left complex half-plane. That is, when applied to the scalar test equation ẏ(t) = λy(t), with λ ∈ ℂ a complex constant with Re(λ) < 0, the method yields y_{n+1} = R(hλ) y_n with

{z ∈ ℂ : Re(z) < 0} ⊆ {z ∈ ℂ : |R(z)| < 1}.

Proof of Proposition 3. Both L-stability and A-stability directly follow from Remark 1: since the probabilistic exponential integrator solves linear ODEs exactly, its stability function is the exponential function, i.e. R(z) = exp(z). A-stability and L-stability then follow: since |R(z)| = e^{Re(z)} < 1 for all z with Re(z) < 0, the method is A-stable; and since |R(z)| → 0 as Re(z) → −∞, the method is L-stable.

D Experiment details

D.1 Burgers' equation

Burgers' equation is a semi-linear partial differential equation (PDE) of the form

∂_t u(x, t) = −u(x, t) ∂_x u(x, t) + D ∂_x² u(x, t),   x ∈ Ω, t ∈ [0, T], (102)

with diffusion coefficient D ∈ ℝ₊. We discretize the spatial domain Ω on a finite grid and approximate the spatial derivatives with finite differences to obtain a semi-linear ODE of the form

ẏ(t) = D L y(t) + F(y(t)),   t ∈ [0, T], (103)

with N-dimensional y(t) ∈ ℝᴺ, L ∈ ℝᴺˣᴺ the finite-difference approximation of the Laplace operator ∂_x², and a non-linear part F. More specifically, we consider a domain Ω = (0, 1), which we discretize with a grid of N = 250 equidistant locations; thus we have Δx = 1/N. We consider zero-Dirichlet boundary conditions, that is, u(0, t) = u(1, t) = 0. The discrete Laplacian is then

[L]_{ij} = (1/Δx²) · { −2 if i = j;  1 if i = j ± 1;  0 otherwise }.
(104)

The non-linear part of the discretized Burgers' equation results from another finite-difference approximation of the term u ∂_x u, and is chosen as

[F(y)]_i = −(1/(4Δx)) · { y₂² if i = 1;  −y_{d−1}² if i = d;  y_{i+1}² − y_{i−1}² otherwise }. (105)

The initial condition is chosen as

u(x, 0) = sin(3πx)³ (1 − x)^{3/2}. (106)

We consider an integration time-span t ∈ [0, 1] and choose a diffusion coefficient D = 0.075.

D.2 Reaction-diffusion model

The reaction-diffusion model presented in the paper, with logistic reaction term, has been used to describe the growth and spread of biological populations [22]. It is given by a semi-linear PDE

∂_t u(x, t) = D ∂_x² u(x, t) + R(u(x, t)),   x ∈ Ω, t ∈ [0, T], (107)

where D ∈ ℝ₊ is the diffusion coefficient and R(u) = u(1 − u) is a logistic reaction term. We discretize the spatial domain Ω on a finite grid and approximate the spatial derivatives with finite differences, and obtain a semi-linear ODE of the form

ẏ(t) = D L y(t) + R(y(t)),   t ∈ [0, T], (108)

with N-dimensional y(t) ∈ ℝᴺ, L ∈ ℝᴺˣᴺ the finite-difference approximation of the Laplace operator, and the reaction term R as before but applied element-wise. We again consider a domain Ω = (0, 1), which we discretize on a grid of N = 100 points. This time we consider zero-Neumann boundary conditions, that is, ∂_x u(0, t) = ∂_x u(1, t) = 0. Including these directly into the finite-difference discretization, the discrete Laplacian is then

[L]_{ij} = (1/Δx²) · { −1 if i = j = 1 or i = j = d;  −2 if i = j otherwise;  1 if i = j ± 1;  0 otherwise }. (109)

The initial condition is chosen as

u(x, 0) = 1 / (1 + exp(30x − 10)). (110)

The discrete ODE is then solved on a time-span t ∈ [0, 2], and we choose a diffusion coefficient D = 0.25.
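The Appendix D.2 setup above can be sketched in a few lines. The following is a minimal illustration, not the paper's code (the names `L`, `R`, `rhs`, and `y0` are ours): it builds the Neumann-boundary discrete Laplacian of Eq. (109), the logistic reaction term, and the semi-linear vector field of Eq. (108).

```python
# Sketch of the reaction-diffusion discretization from Appendix D.2.
import numpy as np

N = 100                      # number of grid points
dx = 1.0 / N
D = 0.25                     # diffusion coefficient

# Discrete Laplacian with zero-Neumann boundary conditions, Eq. (109):
# corner diagonal entries -1, interior diagonal -2, off-diagonals 1, scaled by 1/dx^2.
L = (np.diag(-2.0 * np.ones(N))
     + np.diag(np.ones(N - 1), 1)
     + np.diag(np.ones(N - 1), -1))
L[0, 0] = L[-1, -1] = -1.0
L /= dx**2

def R(u):
    """Logistic reaction term R(u) = u (1 - u), applied element-wise."""
    return u * (1.0 - u)

def rhs(y):
    """Semi-linear ODE vector field y' = D L y + R(y), Eq. (108)."""
    return D * (L @ y) + R(y)

x = np.linspace(0.0, 1.0, N)
y0 = 1.0 / (1.0 + np.exp(30.0 * x - 10.0))   # initial condition, Eq. (110)

# Sanity check: the Neumann Laplacian has zero row sums (it annihilates constants).
assert np.allclose(L.sum(axis=1), 0.0)
```

The resulting `rhs` can then be handed to any stiff ODE solver; the linear part `D * L` is exactly the matrix that the probabilistic exponential integrator would place in its IOUP prior.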