Solving Poisson Equations using Neural Walk-on-Spheres

Hong Chul Nam*1, Julius Berner*2, Anima Anandkumar2

*Equal contribution. 1ETH Zurich, 2Caltech. Correspondence to: Hong Chul Nam, Julius Berner. Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024. Copyright 2024 by the author(s).

We propose Neural Walk-on-Spheres (NWoS), a novel neural PDE solver for the efficient solution of high-dimensional Poisson equations. Leveraging stochastic representations and Walk-on-Spheres methods, we develop novel losses for neural networks based on the recursive solution of Poisson equations on spheres inside the domain. The resulting method is highly parallelizable and does not require spatial gradients for the loss. We provide a comprehensive comparison against competing methods based on PINNs, the Deep Ritz method, and (backward) stochastic differential equations. In several challenging, high-dimensional numerical examples, we demonstrate the superiority of NWoS in accuracy, speed, and computational costs. Compared to commonly used PINNs, our approach can reduce memory usage and errors by orders of magnitude. Furthermore, we apply NWoS to problems in PDE-constrained optimization and molecular dynamics to show its efficiency in practical applications.

1. Introduction

Partial differential equations (PDEs) are foundational to our modern scientific understanding in a wide range of domains. While decades of research have been devoted to this topic, numerical methods to solve PDEs remain expensive for many PDEs. In recent years, deep learning has helped to accelerate the solution of PDEs (Azzizadenesheli et al., 2023; Zhang et al., 2023b; Cuomo et al., 2022) as well as to tackle PDEs that had been entirely out of reach for classical methods (Han et al., 2018; Scherbela et al., 2022; Nüsken & Richter, 2021b).

Figure 1. Convergence of the relative L2-error over wall-clock time (in seconds) when solving the 10d Laplace equation in Section 5 using our considered methods: NWoS (ours), Deep Ritz, diffusion loss, PINN, and Neural Cache.

Among the biggest challenges for classical numerical PDE solvers are complex geometries and high dimensions. In particular, grid-based methods, such as finite-element, finite-volume, or finite-difference methods, scale exponentially in the underlying dimension. On the other hand, deep learning approaches have been shown to overcome this so-called curse of dimensionality (De Ryck & Mishra, 2022; Duan et al., 2021; Berner et al., 2020b). Corresponding algorithms are typically based on Monte Carlo (MC) approximations of variational formulations of PDEs.

In this work, we focus on high-dimensional Poisson equations on general domains. We note that the accurate numerical solution of such PDEs is crucial in a wide variety of areas. For instance, Poisson equations are prominent in geometry processing (Sawhney & Crane, 2020), as well as many areas of theoretical physics, e.g., electrostatics and quantum mechanics (Bahrami et al., 2014). In high dimensions, they govern important quantities in molecular dynamics, such as likely transition pathways and transition rates between regions or conformations of interest (Vanden-Eijnden et al., 2006; Lu & Nolen, 2015). Several deep learning methods are amenable to the numerical solution of Poisson equations.
This includes physics-informed neural networks (PINNs) or Deep Galerkin methods (Raissi et al., 2019; Sirignano & Spiliopoulos, 2018), the Deep Ritz method (E et al., 2017), as well as approaches based on (backward) stochastic differential equations (Han et al., 2017; Nüsken & Richter, 2021a; Han et al., 2020). However, previous methods suffer from unnecessarily high computational costs, bias, or instabilities, see Section 3.

Figure 2. Left: Time-discretization of the solution X^ξ to the SDE in (6) with stopping time τ(Ω, ξ) in (7) for the domain Ω = [0, 1]². Right: Realization of the Walk-on-Spheres algorithm in Section 4.

Walk-on-Spheres (WoS): To overcome the above challenges, we propose a novel approach based on so-called Walk-on-Spheres (WoS) methods (Muller, 1956). The WoS method is a Monte Carlo method specifically tailored toward Poisson equations: it rewrites their solutions as an expectation over Brownian motions stopped at the boundary of the domain. Simulating the Brownian motion using time-discretizations is either slow or introduces bias (depending on the chosen time step). Leveraging the isotropy of Brownian motion, WoS accelerates this process by iteratively sampling from spheres around the current position until reaching the boundary, see Figure 2. However, as with all Monte Carlo methods, WoS can only obtain pointwise estimates and suffers from slow convergence w.r.t. the number of trajectories. In particular, every sufficiently accurate estimate of the solution at a single point takes a considerable amount of time. This is prohibitive if many solution evaluations are needed sequentially, e.g., in PDE-constrained optimization problems.

Our approach (NWoS): We develop Neural Walk-on-Spheres (NWoS), a version of WoS that can be combined with neural networks to learn the solution to (parametric families of) Poisson equations on the whole domain. Our method amortizes the cost of WoS during training so that the solution, and its gradients, can be evaluated in fractions of a second afterward (and at arbitrary points in the domain). In particular, in order to obtain accuracy ε, the standard WoS method incurs a cost of O(ε^{-2}) trajectories for the evaluation of the solution, while NWoS has a reduced cost of a single O(1) forward pass of our model. Using the partially trained model as an estimator, we can limit the number of simulations and WoS steps for training without introducing high bias or variance. The resulting objective is more efficient and scalable than competing methods, without the need to balance penalty terms for the boundary condition or compute spatial derivatives (Table 1). In particular, we demonstrate a significant reduction of GPU memory usage in comparison to PINNs (Figure 3) and up to orders of magnitude better performance for a given time and compute budget (Figure 1).

Table 1. Comparison of neural PDE solvers for Poisson equations. #Derivatives, #Loss terms, and Cost denote the order of spatial derivatives, the number of terms required in the loss function, and the computational cost for one gradient step. Propagation speed describes how quickly boundary information can propagate to the interior of the domain, see Section 3 for details.
Method          #Derivatives  #Loss terms  Cost    Propagation speed
PINN            2             2            medium  slow
Deep Ritz       1             2            low     slow
Feynman-Kac     0             1            high    fast
BSDE            1             1            high    fast
Diffusion loss  1             2            medium  medium
NWoS (ours)¹    0             1            low     fast

¹While not necessary for NWoS, we note that the gradient of the model and an additional boundary loss can still be used to improve performance, see Section 4.5.

Figure 3. Peak GPU memory usage (in MB) of NWoS (ours), Deep Ritz, the diffusion loss, and PINN during training with batch size 512 for the Poisson equation in Section 5 in dimensions d ∈ {10, 100, 1000}.

Our contributions can be summarized as follows: We analyze previous neural PDE solvers and their shortcomings when applied to high-dimensional elliptic PDEs, such as Poisson equations (Section 3). We devise novel variational formulations for the solution of Poisson equations based on WoS methods and provide corresponding theoretical guarantees and efficient implementations (Section 4). We compare against previous approaches on a series of benchmarks and demonstrate significant improvements in terms of accuracy, speed, and scalability (Section 5).

2. Related works

Neural PDE solvers: We provide an in-depth comparison to competing deep learning approaches for solving elliptic PDEs in Section 3. These include physics-informed neural networks (PINNs) (Raissi et al., 2019; Sirignano & Spiliopoulos, 2018), the Deep Ritz method (Jin et al., 2017), and the diffusion loss (Nüsken & Richter, 2021a), see also Table 1. The diffusion loss can be viewed as an interpolation between PINNs and losses based on backward SDEs (BSDEs) (Han et al., 2017; E et al., 2017; Beck et al., 2019). Methods based on BSDEs and the Feynman-Kac formula (Beck et al., 2018; Berner et al., 2020a; Richter & Berner, 2022) have been investigated for the solution of parabolic PDEs, where the SDE is stopped at a given terminal time. Due to costly simulation times, they cannot be applied efficiently to elliptic problems. To combat this issue, we draw inspiration from Walk-on-Spheres methods.

Monte Carlo (MC) methods: Since grid-based methods cannot tackle high-dimensional PDEs, MC methods are typically used. For Poisson equations, the WoS method has been developed by Muller (1956) and has since been successfully used in various scientific settings (Sabelfeld, 2017; Juba et al., 2016; Bossy et al., 2010) as well as recently in computer graphics (Qi et al., 2022; Sawhney et al., 2022). In the latter domain, caches based on boundary values (Miller et al., 2023) and neural networks (Li et al., 2023) have been proposed to estimate the PDE solution across the domain and accelerate convergence. Our objective can be scaled to high-dimensional parametric PDEs and guarantees that its minimizer approximates the solution on the whole domain. We refer to Grohs & Herrmann (2022); Beznea et al. (2022) for related neural network approximation results.

3. Neural PDE Solvers for Elliptic PDEs

We start by defining our problem and describing previous deep learning methods for its solution. Our goal is to approximate the solution² u ∈ C(Ω̄) to elliptic PDEs with Dirichlet boundary conditions of the form

P[u] = f on Ω,    u = g on ∂Ω,    (1)

with differential operator P[u] := ½ Tr(σσ^⊤ Hess u) + µ · ∇u. In the above, Ω ⊆ R^d is an open, bounded, connected, and sufficiently regular domain, see, e.g., Baldi (2017); Karatzas & Shreve (2014); Schilling & Partzsch (2014) for suitable regularity assumptions.
Note that the formulation in (1) includes the Poisson equation for µ = 0 and σ = √2 I, i.e.,

∆u = f on Ω,    u = g on ∂Ω.    (2)

In the following, we will summarize existing neural PDE solvers for these PDEs, see Table 1 for an overview. On a high level, they propose different variational formulations min_{v∈V} L[v] with the property that the minimizer over a suitable function space V ⊆ C(Ω̄) is a solution u to the PDE in (1). The space V is then typically approximated by a set of neural networks with a given architecture, such that the minimization problem can be tackled using variants of stochastic gradient descent.

²For simplicity, we assume that a sufficiently smooth, strong solution exists.

3.1. Strong and weak formulations of elliptic PDEs

Let us start with methods based on strong or weak formulations of the PDE in (1).

Physics-informed neural networks (PINNs): In its basic form, the loss of PINNs (Raissi et al., 2019) or Deep Galerkin methods (Sirignano & Spiliopoulos, 2018) is given by

L_PINN[v] := E[(P[v](ξ) − f(ξ))²] + β L_bnd[v],    (3)

where

L_bnd[v] := E[(v(ζ) − g(ζ))²].    (4)

In the above, β ∈ (0, ∞) is a penalty parameter, and ξ and ζ are suitable random variables distributed on Ω and ∂Ω, respectively. While improved sampling methods have been investigated, see, e.g., Tang et al. (2023); Chen et al. (2023), the default choice is to pick uniform distributions. The expectations are then approximated with standard MC or quasi-MC methods based on a suitable set of samples. By minimizing the point-wise residual of the PDE, PINNs have gained popularity as a universal and simple method. However, PINNs are sensitive to hyperparameter choices, such as β, and suffer from training instabilities or high variance (Wang et al., 2021; Krishnapriyan et al., 2021; Nüsken & Richter, 2021b). Moreover, the objective in (3) requires the evaluation of the derivatives appearing in P[v]. While this can be done exactly using automatic differentiation, it leads to high computational costs, see Figure 3.

Deep Ritz method: For the Poisson equation in (2), one can avoid this cost by leveraging weak variational formulations, see, e.g., Evans (2010). Rather than directly optimizing the regression loss in (3), the Deep Ritz method (E et al., 2017) proposes to minimize the objective

L_Ritz[v] := E[½ ‖∇v(ξ)‖² + f(ξ) v(ξ)] + β L_bnd[v].    (5)

Under suitable assumptions, the minimizer again corresponds to the solution to the PDE in (2). The objective only requires computing the gradient ∇v instead of the Laplacian ∆v. Using backward-mode automatic differentiation, this reduces the number of backward passes from d + 1 to one, see also the reduced cost in Figure 3. Moreover, we note that the loss in (5) allows for weak solutions that are not twice differentiable. We refer to Chen et al. (2020) for an extensive comparison of the Deep Ritz method to PINNs for elliptic PDEs with different boundary conditions.

Finally, we mention that both methods suffer from the fact that the interior losses only consider local, pointwise information at samples x ∈ Ω. At the beginning of training, the interior loss might thus not be meaningful. Specifically, the boundary condition g first needs to be learned via the boundary loss L_bnd and then propagated from the boundary ∂Ω to interior points x via the local interior loss. There exist some heuristics to mitigate this issue by, e.g., progressively learning the solution, see Penwarden et al. (2023) for an overview.
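To make the cost difference between the two objectives concrete, the following is a minimal PyTorch sketch of the interior losses (3) and (5) and the boundary loss (4) for the Poisson equation (2); the model v, the samplers for ξ and ζ, and the penalty β are generic placeholders rather than the implementation used in the experiments.

```python
import torch

def laplacian(v, x):
    # Sum of second derivatives of v w.r.t. each coordinate (needed by the PINN loss).
    x = x.requires_grad_(True)
    grad = torch.autograd.grad(v(x).sum(), x, create_graph=True)[0]
    lap = torch.zeros(x.shape[0], device=x.device)
    for i in range(x.shape[1]):  # one extra backward pass per dimension
        lap = lap + torch.autograd.grad(grad[:, i].sum(), x, create_graph=True)[0][:, i]
    return lap

def pinn_loss(v, f, g, xi, zeta, beta):
    # L_PINN = E[(Δv(ξ) - f(ξ))²] + β E[(v(ζ) - g(ζ))²], cf. (3) and (4).
    interior = ((laplacian(v, xi) - f(xi)) ** 2).mean()
    boundary = ((v(zeta).squeeze(-1) - g(zeta)) ** 2).mean()
    return interior + beta * boundary

def deep_ritz_loss(v, f, g, xi, zeta, beta):
    # L_Ritz = E[0.5 ||∇v(ξ)||² + f(ξ) v(ξ)] + β E[(v(ζ) - g(ζ))²], cf. (5);
    # only first-order derivatives of v are required.
    xi = xi.requires_grad_(True)
    vx = v(xi).squeeze(-1)
    grad = torch.autograd.grad(vx.sum(), xi, create_graph=True)[0]
    interior = (0.5 * (grad ** 2).sum(dim=1) + f(xi) * vx).mean()
    boundary = ((v(zeta).squeeze(-1) - g(zeta)) ** 2).mean()
    return interior + beta * boundary
```

Note the d additional backward passes hidden in laplacian, which is exactly the cost difference between (3) and (5) discussed above.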
The next section describes more principled ways of including boundary information in the loss and directly informing the interior points of the boundary condition.

3.2. Stochastic formulations of elliptic PDEs

From weak solutions, we will now proceed to stochastic representations of the elliptic PDEs in (1). To this end, consider the³ solution X^ξ to the stochastic differential equation

dX^ξ_t = µ(X^ξ_t) dt + σ(X^ξ_t) dW_t,    X^ξ_0 = ξ,    (6)

where W is a standard d-dimensional Brownian motion. Moreover, we define the stopping time τ as the first exit time of the stochastic process X^ξ from the domain Ω, i.e.,

τ = τ(Ω, ξ) := inf{t ∈ [0, ∞): X^ξ_t ∉ Ω}.    (7)

³We assume that there exists a unique solution, see, e.g., Le Gall (2016) for corresponding conditions.

An application of Itô's lemma to the stopped process u(X^ξ_{t∧τ}) shows that we almost surely have

u(X^ξ_τ) = u(X^ξ_0) + ∫_0^τ P[u](X^ξ_t) dt + S^u_τ,

where S^u_τ is the stochastic integral

S^u_τ := ∫_0^τ (σ^⊤ ∇u)(X^ξ_t) · dW_t.

Using the fact that X^ξ_0 = ξ and assuming that u solves the elliptic PDE in (1), we arrive at the formula

g(X^ξ_τ) = u(ξ) + F^ξ_τ + S^u_τ,    (8)

where we used the abbreviation F^ξ_τ := ∫_0^τ f(X^ξ_t) dt. Since the stochastic integral S^u_τ has zero expectation, see, e.g., Baldi (2017, Theorem 10.2), we can rewrite (8) as a stochastic representation, i.e.,

u(x) = E[g(X^ξ_τ) − F^ξ_τ | ξ = x],    (9)

which goes back to Kakutani's theorem (Kakutani, 1944) and is a special case of the Feynman-Kac formula. While the representation in (9) leads to MC methods for the pointwise approximation of u at a given point x ∈ Ω, it also allows us to derive variational formulations for learning u on the whole domain Ω. Based on the above results, we can derive the following three losses.

Feynman-Kac loss: The Feynman-Kac loss is given by

L_FK[v] := E[(v(ξ) − g(X^ξ_τ) + F^ξ_τ)²]    (10)

and follows from the fact that the solution to a quadratic regression problem as in (10) is given by the conditional expectation in (9). Notably, this variational formulation requires neither a derivative of the function v nor an extra boundary loss L_bnd.

BSDE loss: Since the formula in (8) holds if and only if u solves the PDE in (1), we can derive the BSDE loss

L_BSDE[v] := E[(v(ξ) − g(X^ξ_τ) + F^ξ_τ + S^v_τ)²].    (11)

Compared to the Feynman-Kac loss in (10), the BSDE loss requires computing the gradient of v at every time-discretization step of the SDE X^ξ in order to compute S^v_τ. However, due to (8), S^v_τ acts as a control variate and causes the variance of the MC estimator of (11) to vanish at the optimum, see Richter & Berner (2022) for details.

For the previous two losses, boundary information is directly propagated along the trajectory of the SDE X^ξ to the interior. However, simulating a batch of realizations of the SDE until they reach the boundary ∂Ω, i.e., until the stopping time τ, can incur prohibitively high costs.

Diffusion loss: The diffusion loss (Nüsken & Richter, 2021a) circumvents long simulation times by stopping the SDE at s = τ ∧ T, i.e., at the minimum of a prescribed time T ∈ (0, ∞) and the stopping time τ. Since the trajectories might not reach the boundary, the loss is supplemented with a boundary loss. This yields the variational formulation

L_Diff[v] := E[(v(ξ) − v(X^ξ_s) + F^ξ_s + S^v_s)²] + β L_bnd[v].

Note that this can be viewed as an interpolation between the BSDE loss (for s → ∞) and the PINN loss (for s → 0 and rescaling by s^{-2}). In the same way, it also balances the advantages and disadvantages of both losses, see also Table 1.
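To illustrate why the Feynman-Kac and BSDE losses are costly for elliptic problems, the following sketch estimates the regression target g(X^ξ_τ) − F^ξ_τ for the Poisson case (µ = 0, σ = √2 I) with a plain Euler-Maruyama discretization; the step size, step budget, crude boundary projection, and unit-cube domain are illustrative assumptions rather than the setup used in the paper.

```python
import torch

def fk_targets(xi, f, g, dt=1e-4, max_steps=200_000):
    # Simulate X_t = xi + sqrt(2) W_t until it leaves the unit cube and accumulate
    # F_t = ∫ f(X_s) ds along the way; returns the target g(X_τ) - F_τ of (9)/(10).
    x = xi.clone()
    F = torch.zeros(x.shape[0], device=x.device)
    alive = torch.ones(x.shape[0], dtype=torch.bool, device=x.device)
    for _ in range(max_steps):
        F[alive] += f(x[alive]) * dt
        x[alive] += (2 * dt) ** 0.5 * torch.randn_like(x[alive])
        alive &= ((x > 0) & (x < 1)).all(dim=1)  # still inside Ω = (0,1)^d?
        if not alive.any():
            break
    x_bdry = x.clamp(0.0, 1.0)  # crude projection of exited points onto ∂Ω
    return g(x_bdry) - F
```

Each target requires simulating every trajectory until it exits the domain, and reducing the discretization bias requires shrinking dt, which is exactly the bottleneck that the construction in the next section removes.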
4. Neural Walk-on-Spheres (NWoS) Method

In this section, we will present a more efficient way of simulating the SDE trajectories for the case of Poisson-type PDEs as in (2). Our loss is based on the FK loss in (10), which does not require the computation of any spatial derivatives of the neural network v. However, we reduce the number of steps for simulating the process X^ξ while still reaching the boundary (different from the diffusion loss).

4.1. Recursion of elliptic PDEs on sub-domains

First, we outline how to cast the solution of the PDE in (1) into nested subproblems of solving elliptic PDEs on sub-domains. Specifically, let Ω₀ ⊆ Ω be an open sub-domain containing⁴ ξ₀ := ξ and let τ₀ := τ(Ω₀, ξ) be the corresponding stopping time, defined as in (7). Analogously to (9), we obtain that

u(ξ) = E[u(X^ξ_{τ₀}) − F^ξ_{τ₀} | ξ].    (12)

Note that this is a recursive definition since the solution u to the PDE in (1) appears again in the expectation. To resolve the recurrence, we define the random variable ξ₁ ∼ X^ξ_{τ₀} and choose another open sub-domain Ω₁ ⊆ Ω containing ξ₁. Considering the stopping time τ₁ := τ(Ω₁, ξ₁), we can calculate the value of u appearing in the inner expectation as

u(X^ξ_{τ₀}) ∼ u(ξ₁) = E[u(X^{ξ₁}_{τ₁}) − F^{ξ₁}_{τ₁} | ξ₁].

We can now iterate this process for k ∈ N and combine the result with (12) to obtain

u(ξ) = E[u(ξ_{k+1}) − ∑_{j=0}^{k} F^{ξ_j}_{τ_j}].    (13)

In the above, we used the strong Markov property of the SDE solution and the tower property of the conditional expectation, see also Grohs & Herrmann (2022). This nested stochastic representation can be compared to the one in (9). The next section shows how this provides a practical algorithm that terminates in finitely many steps.

⁴Since ξ is a random variable, the sub-domain Ω₀ is random, and the statement is to be understood for each realization.

4.2. Walk-on-Spheres

We tackle the problem of solving the Poisson equation in (2), i.e., µ = 0 and σ = √2 I. Then, the SDE in (6) is just a scaled Brownian motion starting at ξ. Picking Ω_k := B_{r_k}(ξ_k) to be a ball of radius r_k ∈ (0, ∞) around ξ_k in the k-th step, the isotropy of Brownian motion ensures that ξ_{k+1} ∼ X^{ξ_k}_{τ_k} ∼ U(∂B_{r_k}(ξ_k)). In other words, we can just sample ξ_{k+1} uniformly from a sphere of radius r_k around the previous value ξ_k. To terminate after finitely many steps, we pick the maximal radius in each step, i.e., r_k := dist(ξ_k, ∂Ω), and stop at step κ when reaching an ε-shell, i.e., when r_κ < ε for a prescribed ε ∈ (0, ∞). This allows us to walk from sphere to sphere until (approximately) reaching the boundary, such that we can estimate the first term in (13). Specifically, the value u(X^ξ_τ) in (13) is approximated by the boundary value g(ξ̄_κ), where ξ̄_κ := argmin_{x ∈ ∂Ω} ‖x − ξ_κ‖ is the projection to the boundary. We note that the bias from introducing the stopping tolerance ε can be estimated as O(ε) (Mascagni & Hwang, 2003). Moreover, for well-behaved, e.g., convex, domains Ω, the average number of steps κ behaves like O(log(ε^{-1})) (Motoo, 1959; Binder & Braverman, 2012). This shows that ε can be chosen sufficiently small without incurring too much additional computational cost. We note that this leads to much faster convergence than time-discretizations of the Brownian motion: in order to have a comparable bias, we would need to take steps of size O(ε), requiring Ω(ε^{-2}) steps to converge.
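As a concrete illustration of the sphere-jump recursion, the following is a minimal batched PyTorch sketch of the walk described above for the pure Laplace case (f = 0); the unit-cube domain, its distance function, and the helper names are illustrative assumptions, and the source-term contribution is added in Section 4.3.

```python
import torch

def dist_to_boundary(x):
    # Distance to ∂Ω for the unit cube Ω = (0,1)^d (illustrative choice of domain).
    return torch.minimum(x, 1.0 - x).amin(dim=1)

def sample_sphere(center, radius):
    # One uniform sample on the sphere of the given radius around each point.
    u = torch.randn_like(center)
    u = u / u.norm(dim=1, keepdim=True)
    return center + radius.unsqueeze(1) * u

def snap_to_face(x):
    # Project points that stopped in the ε-shell onto the nearest face of the unit cube.
    x = x.clone()
    j = torch.minimum(x, 1.0 - x).argmin(dim=1)            # coordinate of the nearest face
    rows = torch.arange(x.shape[0], device=x.device)
    x[rows, j] = (x[rows, j] > 0.5).float()                 # snap to the closer of the two faces
    return x

def wos_laplace(xi, g, eps=1e-4, max_steps=1000):
    # Single-trajectory WoS estimate of the harmonic function with boundary data g:
    # repeatedly jump to a uniform point on the largest inscribed sphere around the
    # current position until an ε-shell of ∂Ω is reached (Section 4.2).
    x = xi.clone()
    for _ in range(max_steps):
        r = dist_to_boundary(x)
        active = r > eps
        if not active.any():
            break
        x[active] = sample_sphere(x[active], r[active])
    return g(snap_to_face(x))
```

In contrast to the time-discretized simulation sketched earlier, each iteration of the loop covers the largest possible sphere, so the number of iterations corresponds to the quantity κ above and, for convex domains, grows only like O(log(1/ε)).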
4.3. Source term

To compute the second term in (13), we need to accumulate values of the form

v(z) := E[F^z_{τ(B,z)}]    (14)

for a given ball B = B_r(z). By (9), we observe that v is the solution of a Poisson equation on the ball B with zero Dirichlet boundary condition, evaluated at z ∈ Ω. We can thus use classical results by Boggio (1905), see also Gazzola et al. (2010), to write the solution in terms of Green's functions. Specifically, we have that

v(z) = |B_r(z)| E[f(γ) G_r(γ, z)],    (15)

where γ ∼ U(B_r(z)) and

G_r(y, z) := (1/(2π)) log(r / ‖y − z‖) for d = 2,
G_r(y, z) := Γ(d/2 − 1)/(4π^{d/2}) · (‖y − z‖^{2−d} − r^{2−d}) for d > 2,

see Appendix A for further details. In practice, we can now approximate the expectation in (15) using an MC estimate.

4.4. Learning Problem

Based on the previous derivations, we can establish a variational formulation whose minimizer is guaranteed to approximate the solution of the Poisson equation in (2) on the whole domain Ω. Specifically, we define

L_NWoS[v] := E[(v(ξ) − WoS(ξ))²],    (16)

where the single-trajectory WoS estimator WoS(ξ) with random initial point ξ is given by

WoS(ξ) := g(ξ̄_κ) − ∑_{k=0}^{κ−1} |B_{r_k}(ξ_k)| f(γ_k) G_{r_k}(γ_k, ξ_k).

In the above, γ_k ∼ U(B_{r_k}(ξ_k)), and the random variables κ, ξ_k, ξ̄_κ, and r_k are defined as in Section 4.2.

Figure 4. Neural Walk-on-Spheres (NWoS): Our algorithm for learning the solution to Poisson equations ∆u = f on Ω ⊆ R^d with u|_{∂Ω} = g. (1) In each gradient descent step, we sample a batch of random points (ξ^i_0)_{i=1}^m in the domain Ω and simulate Brownian motions by iteratively sampling ξ^i_k from spheres B_{r^i_k} inscribed in the domain. To account for the source term f, we sample γ^i_k ∼ U(B_{r^i_k}) to compute an MC approximation |B_{r^i_k}| f(γ^i_k) G_{r^i_k}(ξ^i_k, γ^i_k) to the solution of the Poisson equation on the sphere B_{r^i_k} using the Green's function G_{r^i_k} in Section 4.3. (2) We stop after a fixed maximum number of steps K and either evaluate our neural network v_θ or the boundary condition g if we reach an ε-shell of ∂Ω. (3) If v_θ satisfies the PDE, the mean-value property implies that v_θ(ξ^i_0) is approximated by the expected value of y^i minus the accumulated source-term contributions. We thus minimize the corresponding mean squared error over the parameters θ using gradient descent.

From the stochastic formulation of the solution in (13) and Proposition 3.5 in Grohs & Herrmann (2022), it follows that the minimizer of (16), i.e., x ↦ E[WoS(ξ) | ξ = x], approximates the solution u in (2) in the uniform norm up to an error of O(ε), where ε is the stopping tolerance, see Section 4.2. We also remark that, in theory, the loss requires only a single WoS trajectory per sample of ξ since the minimizer of the regression problem in (16) averages out the noise.

Having established a learning problem, we can analyze both approximation and generalization errors. For the former, Grohs & Herrmann (2022) and Beznea et al. (2022) bounded the size of neural networks v_θ needed to approximate the solution u up to a given accuracy. In particular, the number of required parameters θ only scales polynomially in the dimension d and the reciprocal accuracy, as long as the functions f, g, and dist(·, ∂Ω) can be efficiently approximated by neural networks. One can then leverage results by Berner et al. (2020b) to show that also the generalization error does not underlie the curse of dimensionality when minimizing the empirical risk, i.e., an MC approximation of (16), over a suitable set of neural networks v_θ. Specifically, the number of samples of ξ required to guarantee that the empirical minimizer approximates the solution u up to a given accuracy also scales only polynomially with dimension and accuracy.
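Putting Sections 4.2-4.4 together, the following sketch extends the walk from the previous code block by the per-step source contribution |B_r| f(γ) G_r(γ, x) and performs one stochastic gradient step on the regression loss (16). It reuses dist_to_boundary, sample_sphere, and snap_to_face defined above; the function names, the uniform ball sampling via the radius r·U^{1/d}, and the optimizer setup are illustrative assumptions (shown for d > 2).

```python
import torch

def greens_weight(center, gamma, r, d):
    # |B_r| * G_r(γ, x) for d > 2 in the stable form r^d/(d(d-2)) (|γ-x|^{2-d} - r^{2-d}),
    # cf. (15); the case d = 2 would use (r²/2) log(r/|γ-x|) instead.
    dist = (gamma - center).norm(dim=1)
    return r ** 2 / (d * (d - 2)) * ((r / dist) ** (d - 2) - 1.0)

def wos_targets(xi, f, g, eps=1e-4, max_steps=1000):
    # Single-trajectory WoS(ξ) from (16): the sphere walk of Section 4.2 plus the
    # per-step Monte Carlo source contribution of Section 4.3.
    x, d = xi.clone(), xi.shape[1]
    src = torch.zeros(x.shape[0], device=x.device)
    for _ in range(max_steps):
        r = dist_to_boundary(x)
        active = r > eps
        if not active.any():
            break
        xa, ra = x[active], r[active]
        rho = ra * torch.rand_like(ra) ** (1.0 / d)        # radius of γ ~ U(B_r(x))
        gamma = sample_sphere(xa, rho)
        src[active] += f(gamma) * greens_weight(xa, gamma, ra, d)
        x[active] = sample_sphere(xa, ra)                  # next center ξ_{k+1} ~ U(∂B_r(ξ_k))
    return g(snap_to_face(x)) - src                        # ≈ WoS(ξ) in (16)

def nwos_step(v_theta, optimizer, xi, f, g):
    # One stochastic gradient step on the regression loss (16); no gradients are
    # propagated through the walk itself.
    with torch.no_grad():
        y = wos_targets(xi, f, g)
    loss = ((v_theta(xi).squeeze(-1) - y) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the targets are computed without tracking gradients, one training step costs a single forward and backward pass through v_θ, independently of the number of walk steps.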
4.5. Implementation

In this section, we discuss implementations of the loss L_NWoS in (16) described in the previous section. We summarize our algorithm in Figure 4 and provide pseudocode for the vanilla version in Algorithm 1. In the following, we present strategies to trade off accuracy and computational cost and to reduce the variance of the MC estimators. We provide pseudocode for NWoS with these improvements in Algorithm 2 and Algorithm 3 in the appendix.

WoS with maximum number of steps: For sufficiently regular geometries, the probability of a walk taking more than k steps decays exponentially in k (Binder & Braverman, 2012). However, if a single walk in our batch needs significantly more steps, it slows down the overall training. We thus introduce a deterministic maximum number of steps K ∈ N; see Beznea et al. (2022) for a corresponding error analysis. However, we do not want to introduce non-negligible bias by, e.g., just projecting to the closest point on the boundary. Instead, we want to enforce the mean-value property on sub-domains of Ω based on our recursion in Section 4.1. We thus propose to use the model v instead of the boundary condition g if the walk does not converge after K steps, i.e., we define⁵

y^{ξ,v} := v(ξ_K) if dist(ξ_K, ∂Ω) > ε,    y^{ξ,v} := g(ξ̄_K) else.

We can then replace the second term in (16) by

WoS(ξ, v) := y^{ξ,v} − ∑_{k=0}^{κ∧K−1} |B_{r_k}(ξ_k)| f(γ_k) G_{r_k}(γ_k, ξ_k).

This helps to reduce the bias when dist(ξ_K, ∂Ω) is non-negligible and leads to faster convergence, assuming that we obtain increasingly good approximations v_θ ≈ u during training of a neural network v_θ. Our approach bears similarity to the diffusion loss, see Section 3; however, we do not need to use a time-discretization of the SDE.

⁵Since we stop the walk when reaching an ε-shell, the first condition can also be written as K < κ.

Table 2. Relative L2-error (and standard deviations over 5 independent runs) of our considered methods, estimated using MC integration on 10⁶ uniformly distributed (unseen) points in Ω.

Method          Laplace (10d)       Committor (10d)     Poisson Rect. (10d)  Poisson (50d)
PINN            7.42e-4 ± 1.84e-4   4.10e-3 ± 1.11e-3   1.35e-2 ± 1.57e-3    7.70e-3 ± 2.25e-3
Deep Ritz       8.43e-4 ± 6.29e-5   6.15e-3 ± 5.30e-4   1.06e-2 ± 6.20e-4    1.05e-3 ± 1.70e-4
Diffusion loss  1.57e-4 ± 7.74e-6   4.48e-2 ± 6.93e-3   9.69e-2 ± 1.03e-2    5.96e-4 ± 1.06e-5
Neural Cache    3.99e-4 ± 4.08e-5   1.26e-3 ± 5.82e-5   4.98e-2 ± 1.80e-2    1.63e-2 ± 1.42e-2
WoS             1.08e-3 ± 1.34e-6   1.99e-3 ± 9.79e-6   2.32e-1 ± 2.09e-1    4.50e-3 ± 7.38e-4
NWoS (ours)     4.29e-5 ± 2.02e-6   6.56e-4 ± 2.42e-5   2.60e-3 ± 9.99e-5    4.82e-4 ± 1.32e-5

Boundary loss: We find empirically that an additional boundary loss can improve the performance of our method. While theoretically not required, it can especially help for a smaller maximum number of steps K (see the previous paragraph). In general, we thus sample a fraction of the points on the boundary ∂Ω and optimize L_NWoS[v] + β L_bnd[v], where L_bnd is defined⁶ as in (4).

⁶Note that L_bnd can be interpreted as a special case of L_NWoS where the WoS method terminates immediately since the initial points are sampled on the boundary.

Variance reduction: While not necessarily needed for the objective in (16), we can still average multiple WoS trajectories N ∈ N per sample of ξ to reduce the variance. This leads to the estimator

L̂_NWoS[v] := (1/m) ∑_{i=1}^{m} (v(ξ^i) − (1/N) ∑_{n=1}^{N} WoS_n(ξ^i))²,    (17)

where the ξ^i are i.i.d. samples of ξ and the WoS_n(ξ^i) are i.i.d. samples of WoS(ξ^i), i.e., N trajectories with the same initial point ξ^i, see (16). Note that we vectorize the WoS simulations across both the initial points and the trajectories, making our NWoS method highly parallelizable and scalable to large batch sizes.

We further introduce control variates to reduce the variance of estimating WoS(x), where we focus on a fixed x ∈ Ω for ease of presentation. Control variates seek to reduce the variance by using an estimator of the form

E[WoS(x)] ≈ E[δ] + (1/N) ∑_{n=1}^{N} (WoS_n(x) − δ_n),

where the δ_n are i.i.d. samples of a random variable δ with known expectation. Motivated by Sawhney & Crane (2020), we use an approximation of the first-order term of a Taylor expansion of u in the direction of the first WoS step. We assume that ∇v_θ provides an increasingly accurate approximation of the gradient ∇u during training and propose to use δ_n := ∇v_θ(x) · (ξ^n_1 − x), where ξ^n_1 is the first step of WoS_n(x). In particular, ξ^n_1 ∼ U(∂B_{r_1}(x)) and thus E[δ] = 0 holds for any function v_θ. While we need to compute the gradient ∇v_θ(x) for the control variate, we mention that this operation can be detached from the computational graph. In particular, we do not need to compute the derivative of ∇v_θ(x) w.r.t. the parameters θ, as is necessary for PINNs, the Deep Ritz method, the diffusion loss, and the BSDE loss. In Appendix C, we empirically show that the overhead of using the control variates is insignificant.
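A sketch of the two variance-reduction devices just described is given below; it assumes a walk routine walk_fn that behaves like the single-trajectory estimator sketched in Section 4.4 but additionally returns the first sphere point ξ_1 of each trajectory, which is not part of the code shown earlier.

```python
import torch

def nwos_targets_vr(v_theta, walk_fn, xi, f, g, n_traj=8):
    # Average N trajectories per point and subtract the first-step control variate
    # δ_n = ∇v_θ(x) · (ξ_1^n − x), which has zero mean since ξ_1^n is uniformly
    # distributed on a sphere centered at x.
    x = xi.clone().requires_grad_(True)
    grad = torch.autograd.grad(v_theta(x).sum(), x)[0].detach()  # detached: no 2nd-order terms
    estimates = []
    for _ in range(n_traj):
        # walk_fn is assumed to return (target, first sphere point) per initial point.
        y, xi_1 = walk_fn(xi, f, g)
        delta = (grad * (xi_1 - xi)).sum(dim=1)                  # control variate sample
        estimates.append(y - delta)
    return torch.stack(estimates).mean(dim=0)
```

Since grad is detached, the control variate adds one forward and backward pass through v_θ per batch but never differentiates the gradient w.r.t. θ, in contrast to the PINN, Deep Ritz, diffusion, and BSDE losses.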
Buffer: Motivated by Li et al. (2023), we can use a buffer to cache training points

(ξ^{(i)}, (1/N) ∑_{n=1}^{N} WoS_n(ξ^{(i)}))_{i=1}^{B}.

Since we only update the buffer after a given number of training steps L ∈ N, this accelerates the training. Note that this is not possible for the other methods since they require evaluation of the current model or its gradients. In every buffer update, we average over additional trajectories, i.e., increase N in (17), for a fraction of points to improve their accuracy. However, different from Li et al. (2023), we also evict a fraction of points from the buffer and replace them with WoS estimates at newly sampled points ξ^{(i)} in the domain Ω to balance the diversity and accuracy of the training data in the buffer.

Remark 4.1. The Neural Cache method by Li et al. (2023) uses a related approach to accelerate WoS methods for applications in computer graphics. However, their method never replaces any point ξ^{(i)} in the buffer, i.e., it only updates the estimates in the buffer. We observed that the model is thus prone to overfitting on the points in the buffer, especially in high dimensions, preventing it from achieving high accuracies across the domain Ω.

Algorithm 1: Training of the vanilla NWoS method
Input: neural network v_θ with initial parameters θ, optimizer method step for updating the parameters, number of iterations T, batch size m, source term f, boundary term g, stopping tolerance ε
Output: optimized parameters θ
for k = 0, ..., T do
    x_Ω ← sample m points from ξ              # sample points in Ω
    x ← x_Ω;  s ← 0
    r ← dist(x, ∂Ω)                           # compute distances to ∂Ω
    while r > ε do
        γ ← sample from U(B_r(x))             # estimate source term
        s ← s − |B_r(x)| f(γ) G_r(x, γ)
        u ← sample from U(∂B_r(0))            # uniform step of length r
        x ← x + u                             # walk to next points
        r ← dist(x, ∂Ω)                       # compute distances to ∂Ω
    end while
    x ← project x onto ∂Ω                     # find closest points on ∂Ω
    y_Ω ← s + g(x)                            # estimate boundary term
    L̂_NWoS ← MSE(v_θ(x_Ω), y_Ω)              # NWoS loss
    θ ← step(θ, ∇_θ L̂_NWoS)                  # SGD step
end for

5. Experiments

In this section, we compare the performance of NWoS, PINN, Deep Ritz, the diffusion loss, and Neural Cache on various problems across dimensions from 10d to 50d.
We do not consider the FK and BSDE losses since they incur prohibitively long runtimes for simulating the SDEs with sufficient precision. To compare against the baselines, we consider benchmarks from the works proposing the Deep Ritz and diffusion losses (Jin et al., 2017; Nüsken & Richter, 2021a). For a fair comparison, we set a fixed runtime of 25d + 750 seconds and a GPU memory budget of 2 GiB for training and ran a grid search over a series of hyperparameter configurations for each method. Then, we performed 5 independent runs for the best configurations w.r.t. the relative L2-error. More details on the hyperparameters and our implementation⁷ can be found in Appendix B.

⁷Our PyTorch code can be found at https://github.com/bizoffermark/neural_wos.

Laplace equation: The first PDE is a Laplace equation on a square domain given by f(x) = 0 and g(x) = ∑_{i=1}^{d/2} x_{2i−1} x_{2i} on Ω = (0, 1)^d. To test our models, we compare against the analytic solution u(x) = ∑_{i=1}^{d/2} x_{2i−1} x_{2i}. Following Jin et al. (2017), we consider the case d = 10.

Poisson equation: Next, we consider the Poisson equation presented in Jin et al. (2017), i.e., f(x) = 2d and g(x) = ∑_{i=1}^{d} x_i² on Ω = (0, 1)^d, with analytic solution u(x) = ∑_{i=1}^{d} x_i². We choose d = 50 and present results with⁸ d ∈ {100, 500} in Appendix D.

⁸While d = 100 is considered by Jin et al. (2017), we find that a simple projection outperforms all models in sufficiently high dimensions for this benchmark, see Appendix D.

Poisson equation with rectangular annulus: We also consider a Poisson equation on a rectangular annulus Ω = (−1, 1)^d \ [−c, c]^d with sinusoidal boundary condition and source term,

g(x) = (1/d) ∑_{i=1}^{d} sin(2πx_i),    f(x) = −(4π²/d) ∑_{i=1}^{d} sin(2πx_i).

We choose c = 0.25^{1/d} and d = 10, and note that the analytic solution is given by u(x) = (1/d) ∑_{i=1}^{d} sin(2πx_i).

Committor function: The fourth equation deals with committor functions from molecular dynamics. These functions specify likely transition pathways and transition rates between (potentially metastable) regions or conformations of interest (Vanden-Eijnden et al., 2006; Lu & Nolen, 2015). They are typically high-dimensional and known to be challenging to compute. To compare NWoS, we consider the setting in Nüsken & Richter (2021a). The task is to estimate the probability of a particle hitting the outer surface of an annulus Ω = {x ∈ R^d : a < ‖x‖ < b}, with a, b ∈ (0, ∞), before the inner surface. The problem can then be formulated as solving the Laplace equation given by f(x) = 0 and g(x) = 1_{{‖x‖=b}} on ∂Ω. For this specific Ω, a reference solution can be computed as

u(x) = (a^{2−d} − ‖x‖^{2−d}) / (a^{2−d} − b^{2−d}).

We further use the setting by Nüsken & Richter (2021a) and choose a = 1, b = 2, and d = 10.
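For the walks in Sections 4.2 and 4.5, the only geometric information these benchmarks require is the distance to ∂Ω and a projection onto it. A hedged sketch for the unit cube and for the annulus of the committor example is given below (with the defaults a = 1, b = 2 from above); for the rectangular annulus, the distance is the minimum of the distances to the outer and inner cubes and can be written analogously.

```python
import torch

def dist_cube(x):
    # Distance to ∂Ω for the unit cube Ω = (0,1)^d (Laplace and Poisson benchmarks).
    return torch.minimum(x, 1.0 - x).amin(dim=1)

def dist_annulus(x, a=1.0, b=2.0):
    # Distance to ∂Ω for the annulus Ω = {a < ||x|| < b} of the committor example.
    nrm = x.norm(dim=1)
    return torch.minimum(nrm - a, b - nrm)

def project_annulus(x, a=1.0, b=2.0):
    # Project onto the nearer of the two spheres; the boundary condition
    # g = 1_{||x|| = b} is then evaluated at the projected point.
    nrm = x.norm(dim=1, keepdim=True)
    target = torch.where(nrm - a < b - nrm, torch.full_like(nrm, a), torch.full_like(nrm, b))
    return x / nrm * target
```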
PDE-Constrained Optimization: Finally, we want to solve the optimization problem

min_{u ∈ H¹₀(Ω), m ∈ L²(Ω)}  ½ ∫_Ω (u − u_d)² dx + (α/2) ∫_Ω m² dx,

subject to u being a solution to the Poisson equation with g(x) = 0 and f(x) = m(x) for x ∈ Ω = (0, 1)². The goal of the optimization problem is to balance the energy of the input control m with the proximity of the state u to the target state u_d while satisfying the PDE constraint. Following Hwang et al. (2022), we choose u_d = (1/(2π²)) sin(πx₁) sin(πx₂) as the target state. To tackle this problem and showcase the capabilities of NWoS, we first solve a parametric Poisson equation, where we parametrize the control as m_c = c₁ sin(c₂x₁) sin(c₃x₂) with c ∈ D := [0.5, 1.0] × [2.5, 3.5]². Similar to Berner et al. (2020a), we can sample random c ∈ D in every gradient descent step to use NWoS for solving a whole family of Poisson equations. Freezing the trained neural network parameters afterward, we can reduce the PDE-constrained optimization problem to a problem over c ∈ D. In this illustrative example, we can compute the ground-truth parameters as c* = (1/(1 + 4απ⁴), π, π) and choose α = 10⁻³.

Figure 5. Qualitative assessment of the solution to the PDE-constrained optimization problem. (Left) Initial function u_c for random parameters c ∈ D. (Middle) Predicted function u_ĉ for the parameters ĉ obtained after a few gradient descent steps using the approximation of the solution to the parametric Poisson equation obtained with NWoS. (Right) The ground-truth solution u_{c*}.

5.1. Results

We present our results in Table 2. We first note that we improve the Deep Ritz method and the diffusion loss by almost an order of magnitude compared to the results reported by Jin et al. (2017) and Nüsken & Richter (2021a). Still, our NWoS approach outperforms all other methods on our considered benchmarks. In addition to these results, we highlight that the efficient objective of NWoS also leads to faster convergence, see Figure 1. We provide ablation studies in Appendix C and additional numerical evidence in Appendix D.

The PDE-constrained optimization problem shows that NWoS can be extended to parametric problems, where a whole family of Poisson equations is solved simultaneously. We observe that for this 5-dimensional problem (two spatial dimensions and three parameter dimensions), NWoS converges within 20 minutes to a relative L2-error of 0.79% (averaged over D × Ω). The trained network can then be used to solve the optimization problem directly (where we use L-BFGS) without requiring an inner loop for the PDE solver. The results show a promising relative L2-error of 1.30% for estimating the parameters c*, leading to an accurate prediction of the minimizer, see Figure 5 and Appendix C for an ablation study.

6. Conclusion

We have developed Neural Walk-on-Spheres, a novel way of solving high-dimensional Poisson equations using neural networks. Specifically, we provide a variational formulation with theoretical guarantees that amortizes the cost of the standard Walk-on-Spheres algorithm to learn solutions on the full underlying domain. The resulting estimator is more efficient than competing methods (PINNs, the Deep Ritz method, and the diffusion loss), achieving better performance at lower computational costs and with faster convergence. We show that NWoS also performs better on a series of challenging, high-dimensional problems and parametric PDEs. This also highlights its potential for applications where such problems are prominent, e.g., in molecular dynamics and PDE-constrained optimization.

Extensions and limitations: NWoS is currently only applicable to Poisson equations with Dirichlet boundary conditions. While this PDE appears frequently in applications, we also believe that future work can extend our method.
For instance, one can try to leverage adaptations of Wo S to spatially varying coefficients (Sawhney et al., 2022), driftdiffusion problems (Sabelfeld, 2017), Neumann boundary conditions (Sawhney et al., 2023; Simonov, 2007), fractional Laplacians (Kyprianou et al., 2018), the screened Poisson or Helmholtz equation (Sawhney & Crane, 2020; Cheshkova, 1993), as well as linearized Poisson-Bolzmann equations (Hwang & Mascagni, 2001; Bossy et al., 2010). Moreover, one can also take other elementary shapes in each step, e.g., rectangles or stars (Deaconu & Lejay, 2006; Sawhney et al., 2023), and omit the need for ε-shells for certain geometries using the Green s function first-passage algorithm (Given et al., 1997). Finally, while NWo S can tackle parametric PDEs, we need to have a fixed parametrization of the source or boundary functions. It would be promising to extend the ideas to neural operators, which currently only use losses based on PINNs (Goswami et al., 2022; Li et al., 2021) or diffusion losses for parabolic PDEs (Zhang et al., 2023a). Solving Poisson Equations using Neural Walk-on-Spheres Acknowledgements The authors thank Rohan Sawhney for helpful discussions. J. Berner acknowledges support from the Wally Baer and Jeri Weiss Postdoctoral Fellowship. A. Anandkumar is supported in part by Bren endowed chair and by the AI2050 senior fellow program at Schmidt Sciences. Impact Statement The aim of this work is to advance the field of machine learning and scientific computing. While there are many potential societal consequences of our work, none of them are immediate to require being specifically highlighted here. Azzizadenesheli, K., Kovachki, N., Li, Z., Liu-Schiaffini, M., Kossaifi, J., and Anandkumar, A. Neural operators for accelerating scientific simulations and design. ar Xiv preprint ar Xiv:2309.15325, 2023. Bahrami, M., Großardt, A., Donadi, S., and Bassi, A. The Schr odinger Newton equation and its foundations. New Journal of Physics, 16(11):115007, 2014. Baldi, P. Stochastic Calculus: An Introduction Through Theory and Exercises. Universitext. Springer International Publishing, 2017. Beck, C., Becker, S., Grohs, P., Jaafari, N., and Jentzen, A. Solving stochastic differential equations and Kolmogorov equations by means of deep learning. ar Xiv preprint ar Xiv:1806.00421, 2018. Beck, C., E, W., and Jentzen, A. Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations. Journal of Nonlinear Science, 29(4):1563 1619, 2019. Berner, J., Dablander, M., and Grohs, P. Numerically solving parametric families of high-dimensional Kolmogorov partial differential equations via deep learning. In Advances in Neural Information Processing Systems, pp. 16615 16627, 2020a. Berner, J., Grohs, P., and Jentzen, A. Analysis of the generalization error: Empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black Scholes partial differential equations. SIAM Journal on Mathematics of Data Science, 2(3):631 657, 2020b. doi: 10.1109/IWOBI.2017.7985525. Beznea, L., Cimpean, I., Lupascu-Stamate, O., Popescu, I., and Zarnescu, A. From Monte Carlo to neural net- works approximations of boundary value problems. ar Xiv preprint ar Xiv:2209.01432, 2022. Binder, I. and Braverman, M. The rate of convergence of the walk on spheres algorithm. Geometric and Functional Analysis, 22(3):558 587, 2012. Boggio, T. 
Sulle funzioni di green d ordine m. Rendiconti del Circolo Matematico di Palermo (1884-1940), 20:97 135, 1905. Bossy, M., Champagnat, N., Maire, S., and Talay, D. Probabilistic interpretation and random walk on spheres algorithms for the Poisson-Boltzmann equation in molecular dynamics. ESAIM: Mathematical Modelling and Numerical Analysis, 44(5):997 1048, 2010. Chen, J., Du, R., and Wu, K. A comparison study of deep Galerkin method and deep Ritz method for elliptic problems with different boundary conditions. ar Xiv preprint ar Xiv:2005.04554, 2020. Chen, X., Cen, J., and Zou, Q. Adaptive trajectories sampling for solving pdes with deep learning methods. ar Xiv preprint ar Xiv:2303.15704, 2023. Cheshkova, A. walk on spheres algorithms for solving helmholtz equation. Bulletin of the Novosibirsk Computing Center: Numerical analysis, (4):7, 1993. Cuomo, S., Di Cola, V. S., Giampaolo, F., Rozza, G., Raissi, M., and Piccialli, F. Scientific machine learning through physics informed neural networks: Where we are and what s next. Journal of Scientific Computing, 92(3):88, 2022. De Ryck, T. and Mishra, S. Error analysis for physicsinformed neural networks (PINNs) approximating kolmogorov PDEs. Advances in Computational Mathematics, 48(6):1 40, 2022. Deaconu, M. and Lejay, A. A random walk on rectangles algorithm. Methodology and Computing in Applied Probability, 8:135 151, 2006. Duan, C., Jiao, Y., Lai, Y., Lu, X., and Yang, Z. Convergence rate analysis for deep ritz method. ar Xiv preprint ar Xiv:2103.13330, 2021. E, W. and Yu, B. The deep ritz method: a deep learningbased numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6 (1):1 12, 2018. E, W., Han, J., and Jentzen, A. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics, 5(4):349 380, 2017. Solving Poisson Equations using Neural Walk-on-Spheres Evans, L. C. Partial Differential Equations, volume 19. American Mathematical Soc., 2010. Gazzola, F., Grunau, H.-C., and Sweers, G. Polyharmonic boundary value problems: positivity preserving and nonlinear higher order elliptic equations in bounded domains. Springer Science & Business Media, 2010. Given, J. A., Hubbard, J. B., and Douglas, J. F. A firstpassage algorithm for the hydrodynamic friction and diffusion-limited reaction rate of macromolecules. The Journal of chemical physics, 106(9):3761 3771, 1997. Goswami, S., Bora, A., Yu, Y., and Karniadakis, G. E. Physics-informed neural operators. ar Xiv preprint ar Xiv:2207.05748, 2022. Grohs, P. and Herrmann, L. Deep neural network approximation for high-dimensional elliptic PDEs with boundary conditions. IMA Journal of Numerical Analysis, 42(3): 2055 2082, 2022. Han, J., Jentzen, A., et al. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in mathematics and statistics, 5 (4):349 380, 2017. Han, J., Jentzen, A., and E, W. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34): 8505 8510, 2018. Han, J., Nica, M., and Stinchcombe, A. R. A derivative-free method for solving elliptic partial differential equations with deep neural networks. Journal of Computational Physics, 419:109672, 2020. Hwang, C.-O. and Mascagni, M. 
Efficient modified walk on spheres algorithm for the linearized Poisson Bolzmann equation. Applied Physics Letters, 78(6):787 789, 2001. Hwang, R., Lee, J. Y., Shin, J. Y., and Hwang, H. J. Solving PDE-constrained control problems using operator learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 4504 4512, 2022. Jin, K. H., Mc Cann, M. T., Froustey, E., and Unser, M. Deep convolutional neural network for inverse problems in imaging. IEEE Transactions on Image Processing, 26 (9):4509 4522, 2017. Juba, D., Keyrouz, W., Mascagni, M., and Brady, M. Acceleration and parallelization of zeno/walk-on-spheres. Procedia computer science, 80:269 278, 2016. Kakutani, S. Two-dimensional Brownian motion and harmonic functions. Proceedings of the Imperial Academy, 20(10):706 714, 1944. Karatzas, I. and Shreve, S. Brownian motion and stochastic calculus, volume 113. springer, 2014. Krishnapriyan, A., Gholami, A., Zhe, S., Kirby, R., and Mahoney, M. W. Characterizing possible failure modes in physics-informed neural networks. Advances in Neural Information Processing Systems, 34, 2021. Kyprianou, A. E., Osojnik, A., and Shardlow, T. Unbiased walk-on-spheres Monte Carlo methods for the fractional Laplacian. IMA Journal of Numerical Analysis, 38(3): 1550 1578, 2018. Le Gall, J.-F. Brownian motion, martingales, and stochastic calculus. Springer, 2016. Li, Z., Zheng, H., Kovachki, N., Jin, D., Chen, H., Liu, B., Azizzadenesheli, K., and Anandkumar, A. Physicsinformed neural operator for learning partial differential equations. ar Xiv preprint ar Xiv:2111.03794, 2021. Li, Z., Yang, G., Deng, X., De Sa, C., Hariharan, B., and Marschner, S. Neural caches for Monte Carlo partial differential equation solvers. In SIGGRAPH Asia 2023 Conference Papers, pp. 1 10, 2023. Lu, J. and Nolen, J. Reactive trajectories and the transition path process. Probability Theory and Related Fields, 161 (1-2):195 244, 2015. Mascagni, M. and Hwang, C.-O. ϵ-shell error analysis for walk on spheres algorithms. Mathematics and computers in simulation, 63(2):93 104, 2003. Miller, B., Sawhney, R., Crane, K., and Gkioulekas, I. Boundary value caching for walk on spheres. ar Xiv preprint ar Xiv:2302.11825, 2023. Motoo, M. Some evaluations for continuous Monte Carlo method by using brownian hitting process. Annals of the Institute of Statistical Mathematics, 11:49 54, 1959. Muller, M. E. Some continuous Monte Carlo methods for the dirichlet problem. The Annals of Mathematical Statistics, pp. 569 589, 1956. N usken, N. and Richter, L. Interpolating between BSDEs and PINNs: deep learning for elliptic and parabolic boundary value problems. ar Xiv preprint ar Xiv:2112.03749, 2021a. N usken, N. and Richter, L. Solving high-dimensional Hamilton Jacobi Bellman PDEs using neural networks: perspectives from the theory of controlled diffusions and measures on path space. Partial Differential Equations and Applications, 2(4):1 48, 2021b. Solving Poisson Equations using Neural Walk-on-Spheres Penwarden, M., Jagtap, A. D., Zhe, S., Karniadakis, G. E., and Kirby, R. M. A unified scalable framework for causal sweeping strategies for physics-informed neural networks (PINNs) and their temporal decompositions. ar Xiv preprint ar Xiv:2302.14227, 2023. Qi, Y., Seyb, D., Bitterli, B., and Jarosz, W. A bidirectional formulation for walk on spheres. In Computer Graphics Forum, volume 41, pp. 51 62. Wiley Online Library, 2022. Raissi, M., Perdikaris, P., and Karniadakis, G. E. 
Physicsinformed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686 707, 2019. Richter, L. and Berner, J. Robust SDE-based variational formulations for solving linear PDEs via deep learning. In Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pp. 18649 18666. PMLR, 2022. Sabelfeld, K. K. Random walk on spheres algorithm for solving transient drift-diffusion-reaction problems. Monte Carlo Methods and Applications, 23(3):189 212, 2017. Sawhney, R. and Crane, K. Monte Carlo geometry processing: A grid-free approach to PDE-based methods on volumetric domains. ACM Transactions on Graphics, 39 (4), 2020. Sawhney, R., Seyb, D., Jarosz, W., and Crane, K. Grid-free Monte Carlo for PDEs with spatially varying coefficients. ACM Transactions on Graphics (TOG), 41(4):1 17, 2022. Sawhney, R., Miller, B., Gkioulekas, I., and Crane, K. Walk on stars: A grid-free Monte Carlo method for PDEs with Neumann boundary conditions. ar Xiv preprint ar Xiv:2302.11815, 2023. Scherbela, M., Reisenhofer, R., Gerard, L., Marquetand, P., and Grohs, P. Solving the electronic Schr odinger equation for multiple nuclear geometries with weight-sharing deep neural networks. Nature Computational Science, 2(5): 331 341, 2022. Schilling, R. L. and Partzsch, L. Brownian motion: an introduction to stochastic processes. Walter de Gruyter Gmb H & Co KG, 2014. Simonov, N. Random walk-on-spheres algorithms for solving mixed and Neumann boundary-value problems. Sibirskii Zhurnal Vychislitel noi Matematiki, 10(2):209 220, 2007. Sirignano, J. and Spiliopoulos, K. DGM: A deep learning algorithm for solving partial differential equations. Journal of computational physics, 375:1339 1364, 2018. Tang, K., Wan, X., and Yang, C. DAS-PINNs: A deep adaptive sampling method for solving high-dimensional partial differential equations. Journal of Computational Physics, 476:111868, 2023. Vanden-Eijnden, E. et al. Towards a theory of transition paths. Journal of statistical physics, 123(3):503 523, 2006. Wang, S., Teng, Y., and Perdikaris, P. Understanding and mitigating gradient flow pathologies in physics-informed neural networks. SIAM Journal on Scientific Computing, 43(5):A3055 A3081, 2021. Zhang, R., Meng, Q., Zhu, R., Wang, Y., Shi, W., Zhang, S., Ma, Z.-M., and Liu, T.-Y. Monte Carlo neural operator for learning pdes via probabilistic representation. ar Xiv preprint ar Xiv:2302.05104, 2023a. Zhang, X., Wang, L., Helwig, J., Luo, Y., Fu, C., Xie, Y., Liu, M., Lin, Y., Xu, Z., Yan, K., et al. Artificial intelligence for science in quantum, atomistic, and continuum systems. ar Xiv preprint ar Xiv:2307.08423, 2023b. Solving Poisson Equations using Neural Walk-on-Spheres A. Green s function for the Ball For the sake of completeness, this section provides details on the derivation in Section 4.3. To compute integrals of the form (14), we look at a special case of a Poisson equation on a ball B = Br(z) with zero Dirichlet boundary condition, i.e., ( v = f, on B, v = 0, on B. Analogously to (9), we obtain that 0 f(Xz t ) dt where τ(B, z) is the corresponding stopping time, see (7). However, since we simplified the domain to a simple ball, we can write the solution in terms of Green s functions. Specifically, we have that B f(y)Gr(y, z) dy (19) Gr(y, z) := 2π log r y z , d = 2, 4πd/2 y z 2 d r2 d , d > 2. We note that (19) is equivalent to (15). 
While this is a classical result by Boggio (1905), see also Gazzola et al. (2010), we will sketch a proof in the following. We consider the Laplace equation Φx = δx for given x Rd in the distributional sense. It is well known that the fundamental solution Φx is given by 2π log y x , d = 2, (d 2)ωd , d > 2, ωd = | B1(0)| = 2πd/2 Γ(d/2) = 4πd/2 (d 2)Γ(d/2 1) is the surface measure of the d-dimensional unit ball B1(0). Under suitable conditions, it further holds that the solution to (18) is given by B f(y) (Φx(y) ϕx(y)) dy (20) for every x B, where the corrector function ϕx satisfies the Laplace equation ( ϕx = 0, on B, ϕx = Φx, on B, see Evans (2010, Chapter 2.2). Based on (9) and the fact that Φz is constant at the boundary of B = z + Br(0), we can compute the value of the corrector function ϕz, i.e., ϕz(y) = E[Φz(Xy τ(B,y))] = 1 2π log r, d = 2, rd 2 (2 d)ωd , d > 2. This shows that the value of the Green s function at the center z of the ball B is given by Φz(y) ϕz(y) = Gr(y, z), which, together with (20), establishes the claim. A.1. Stable Implementation For numerical stability, we directly compute the quantity Gr(γ, z) := |Br(z)|Gr(γ, z) in practice, as needed in (15). The volume of the hyper-sphere |Br(z)| is given by |Br(z)| = π d 2 such that we obtain Gr(γ, z) :=:= 2 log r γ z , d = 2, rd d(d 2) γ z 2 d r2 d , d > 2. B. Implementation Details We implemented all methods in Py Torch and provide pseudocode in Algorithms 2 and 3. The experiments have been conducted on A100 GPUs. For all our training, we use the Adam optimizer and limit the runtime to 25d + 750 seconds for a fair comparison. In every step, we sample uniformly distributed samples (ξ, ζ) in the domain Ωand on the boundary Ωto approximate the expectations of the loss and boundary terms. Moreover, we employ an exponentially decaying learning rate, which reduces the initial learning rate by two orders of magnitude throughout training. We choose a feedforward neural network with residual connections, 6 layers, a width of 256, and a GELU activation function. We also perform the grid search for the boundary loss penalty term, i.e., β {0.5, 1, 5, 50, 100, 500, 1000, 5000}. We further include the batch size m {2i}17 i=7 in our grid search. For a fair comparison, we set a fixed GPU memory budget of 2Gi B for training, leading to different maximal batch-sizes depending on the method; see also Figure 3. Unless otherwise specified, 10% of the batch size is used for boundary points. Moreover, we set ε = 10 4 for all methods using an ε-shell. Let us detail the hyperparameter choices specific to each method in the following. Solving Poisson Equations using Neural Walk-on-Spheres Algorithm 2 Training of our NWo S method Input: neural network vθ with initial parameters θ, optimizer method step for updating the parameters, Wo S method Wo S in Algorithm 3, number of iterations T, batch sizes md and mb for domain and boundary points, buffer B of size B, boundary function g, buffer update interval L, boundary penalty parameter β Output: optimized parameters θ x Ω sample from ζ B Sample points in Ω B initialize with (x Ω, g(x Ω)) Initialize buffer for k 0, . . . 
, T do if k mod L = 0 then xΩ sample from ξ md Sample points in Ω x B sample from B Sample points in B x [xΩ, x B] Concatenate points [yΩ, y B] vmap[Wo S(x, vθ)] Wo S B update with (x B, y B) Update estimates B replace with (xΩ, yΩ) Replace points end if x Ω sample from ζ mb Sample points in Ω (x B, y B) sample from B Sample points in B b LNWo S MSE(vθ(x B), y B) Domain loss b Lbnd MSE(vθ(x Ω), g(x Ω)) Boundary loss b L = b LNWo S + β b Lbnd θ step γ, θ b L SGD step end for Walk-on-Spheres (Wo S): For Wo S (Muller, 1956), we directly approximate the solution at the evaluation points. We batch trajectories to saturate the memory budget and present the best result for different configurations within the given runtime. Specifically, we pick the number of trajectories N in the grid {1, 10, 100, 1000, 10000, 100000} and the maximum number of steps K in {0, 1, 10, 100, 1000}. Neural Walk-on-Spheres (NWo S): For NWo S, we try the different extensions in Section 4.5. Specifically, we fix the buffer size B to 10 times that of the batch size m, and sweep the number of gradient steps between buffer updates L {10, 100, 1000}. We also include the maximum number of Wo S steps K {0, 1, 5, 10, 50, 100} and the number of trajectories per update N {1, 10, 100, 200, 300, 400, 500, 1000} in our grid search. If using a boundary loss, we sweep over {0.1, 0.2, 0.3, 0.4, 0.5} in the grid search to find the optimal proportion of the batch size for the boundary loss. Neural Cache: For the neural cache method (Li et al., 2023), we use the best configuration for different buffer sizes, update intervals, and number of trajectories within the given time and memory constraints. Specifically, we try buffer sizes B {10000, 20000, 100000, 1000000}, intervals L {1, 10, 100, 1000, 5000, 10000} to update the buffer, and N {1, 10, 20, 30, 40, 50, 100, 500, 1000} Algorithm 3 Walk-on-Spheres (Wo S) Input: neural network vθ, source term f, boundary term g, point for evaluation x, maximum number of steps K, stopping tolerance ε, number of trajectories N Output: estimator bv of solution v to PDE in (2) at x ˆv 0 for i 1, . . . , N do Batched in implementation s 0 for t 1, . . . , K do r dist(x, Ω) Compute distance to Ω if r < ε then Break Reach boundary end if γ sample from U(Br(x)) Estimate source s s |Br(x)|f(γ)Gr(x, γ) u sample from U( Br(x)) if t = 0 & use control variate then s s xvθ(x) u Control variate end if x x + u Walk to next point end for if r < ε then Estimate solution at x x project x to Ω Find closest point in Ω bv s + g(x) else bv s + vθ(x) end if end for ˆv 1 N ˆv Compute MC estimate number of trajectories for each update. Diffusion loss: For the diffusion loss (N usken & Richter, 2021a), we perform a grid search over the time-steps t {10 3, 10 4, 10 5} of the Euler-Maruyama scheme and the maximum number of steps in {1, 5, 10, 50}. PINNs: For PINNs (Raissi et al., 2019; Sirignano & Spiliopoulos, 2018), we use automatic differentiation to compute the Laplacian vθ. Deep Ritz: For the Deep Ritz method (E et al., 2017), we experiment with the original network architecture proposed in their paper. We sweep the number of blocks in {4, 6, 8}, the number of layers in {2, 4}, and the hidden dimension in {64, 128, 256}. Moreover, we replace the activation function with GELU. C. 
Ablation Studies

In this section, we provide additional ablation studies on the contribution of our additional improvements in Section 4.5, namely the control variates as well as the neural network evaluation for trajectories that did not reach the boundary. We also analyze the speed of NWoS and perform comparisons with PINNs, Deep Ritz, and the diffusion loss.

Table 3. Ablation study of the contribution of control variates and the neural network evaluation for trajectories that did not converge after a given maximum number of steps K. We report the relative L2-error for the parameter estimation in our PDE-constrained optimization problem.

Method             Relative L2-error
Base NWoS          2.89%
+ Control Variate  1.72%
+ Terminal Eval    1.70%
+ Both             1.30%

In Table 3, we perform an ablation study on the contribution of our improvements in our PDE-constrained optimization problem. We observe that they can decrease the relative L2-error from 2.89% to 1.72% and 1.70%, respectively. If both the control variate and the neural network evaluation are used, we obtain the best relative error of 1.30%, indicating that they can efficiently decrease bias and variance.

Figure 6. Decomposition of the training time (in seconds) for one iteration of NWoS in the plain version (Section 4.4), as well as using our improvements from Section 4.5, i.e., the control variates and a neural network evaluation for trajectories that did not converge after a given maximum number of steps K ∈ {1, 5, 10}.

Figure 7. Training time (in seconds) of our considered methods (Deep Ritz, diffusion loss, PINN, and NWoS) for one gradient step. For NWoS, we present the comparison for different maximum numbers of steps K.

Figure 6 decomposes the training time per iteration into the time for the base NWoS algorithm and the time for the additional extensions from Section 4.5. We fix the batch size to m = 512 and test on the Poisson equation of Section 5 in 100d. We observe that our proposed extensions incur comparably small overheads. Figure 7 further compares NWoS with Deep Ritz, NSDE, and PINN for different maximum numbers of steps K, see Section 4.5. Considering the logarithmic scaling of the plot, each iteration of NWoS is significantly faster than PINN and NSDE but slower than Deep Ritz for a larger maximum number of steps K. By choosing K, we can balance high accuracy and fast training.

D. Further Evaluations

Table 4. Relative L2-error (and standard deviations over 5 independent runs) of our considered methods, estimated using MC integration on 10⁶ uniformly distributed (unseen) points in Ω.

Method          Poisson (100d)       Poisson (500d)
PINN            1.49e-3 ± 3.21e-5    2.42e-2 ± 6.06e-4
Deep Ritz       1.77e-2 ± 1.94e-4    9.92e-3 ± 2.56e-5
Diffusion loss  6.71e-4 ± 1.31e-5    9.47e-3 ± 3.81e-5
Projection      2.92e-4 ± 5.17e-7    1.19e-5 ± 1.67e-8
NWoS (ours)     6.22e-4 ± 1.18e-5    9.14e-3 ± 6.31e-5

In this section, we provide further numerical evidence. We report the convergence of the relative L2-error for the other PDEs and evaluate our method on the Poisson equation in 100d and 500d. Figures 8 to 10 demonstrate that NWoS achieves the fastest convergence in comparison to all baseline methods within the provided time and memory constraints. Table 4 provides results for our considered methods on the Poisson equation in 100d and 500d as proposed by E & Yu (2018).
We demonstrate that our NWoS method achieves lower relative L2-errors than the baselines. However, we discover empirically that, for this benchmark, a simple projection to the boundary achieves the highest accuracy. This can be motivated by the smoothness of the solution and the fact that uniformly distributed evaluation samples concentrate at the boundary in high dimensions.

Figure 8. Convergence of the relative L2-error over wall-clock time (in seconds) when solving the committor function in 10d using our considered methods (NWoS (ours), Deep Ritz, diffusion loss, PINN, and Neural Cache).

Figure 9. Convergence of the relative L2-error over wall-clock time (in seconds) when solving the Poisson equation in 10d with rectangular annulus using our considered methods.

Figure 10. Convergence of the relative L2-error over wall-clock time (in seconds) when solving the Poisson equation in 50d using our considered methods.