# Physics-informed Kernel Learning

Journal of Machine Learning Research 26 (2025) 1-39. Submitted 9/24; Revised 3/25; Published 8/25.

**Nathan Doumèche** (NATHAN.DOUMECHE@SORBONNE-UNIVERSITE.FR), Sorbonne University, EDF R&D
**Francis Bach** (FRANCIS.BACH@INRIA.FR), INRIA, École Normale Supérieure, PSL University
**Gérard Biau** (GERARD.BIAU@SORBONNE-UNIVERSITE.FR), Sorbonne University, Institut universitaire de France
**Claire Boyer** (CLAIRE.BOYER@UNIVERSITE-PARIS-SACLAY.FR), Université Paris-Saclay, Institut universitaire de France

**Editor:** Animashree Anandkumar

©2025 Nathan Doumèche, Francis Bach, Gérard Biau, and Claire Boyer. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v26/24-1536.html.

**Abstract**

Physics-informed machine learning typically integrates physical priors into the learning process by minimizing a loss function that includes both a data-driven term and a partial differential equation (PDE) regularization. Building on the formulation of the problem as a kernel regression task, we use Fourier methods to approximate the associated kernel, and propose a tractable estimator that minimizes the physics-informed risk function. We refer to this approach as physics-informed kernel learning (PIKL). This framework provides theoretical guarantees, enabling the quantification of the physical prior's impact on convergence speed. We demonstrate the numerical performance of the PIKL estimator through simulations, both in the context of hybrid modeling and in solving PDEs. In particular, we show that PIKL can outperform physics-informed neural networks in terms of both accuracy and computation time. Additionally, we identify cases where PIKL surpasses traditional PDE solvers, particularly in scenarios with noisy boundary conditions.

**Keywords:** physics-informed machine learning, kernel methods, physics-informed neural networks, rates of convergence, physical regularization

## 1. Introduction

**Physics-informed machine learning.** Physics-informed machine learning (PIML), as described by Raissi et al. (2019), is a promising framework that combines statistical and physical principles to leverage the strengths of both fields. PIML can be applied to a variety of problems, such as solving partial differential equations (PDEs) using machine learning techniques, leveraging PDEs to accelerate the learning of unknown functions (hybrid modeling), and learning PDEs directly from data (inverse problems). For an introduction to the field and a literature review, we refer to Karniadakis et al. (2021) and Cuomo et al. (2022).

**Hybrid modeling setting.** We consider in this paper the classical regression model, which aims at learning the unknown function $f^\star:\mathbb{R}^d\to\mathbb{R}$ such that $Y = f^\star(X) + \varepsilon$, where $Y\in\mathbb{R}$ is the output, $X\in\Omega$ are the features with $\Omega\subseteq[-L,L]^d$ the input domain, and $\varepsilon$ is a random noise. Using $n$ observations $(X_1,Y_1),\dots,(X_n,Y_n)$, independent copies of $(X,Y)$, the goal is to construct an estimator $\hat f_n$ of $f^\star$. What makes PIML special compared to other regression settings is the prior knowledge that $f^\star$ approximately follows a PDE. Therefore, we assume that $f^\star$ is weakly differentiable up to the order $s > d/2$ and that there exists a known differential operator $D$ such that $D(f^\star)\simeq 0$. This framework typically accounts for modeling error by recognizing that $D(f^\star)$ may not be exactly zero, since most PDEs in physics are derived under ideal conditions and may not hold exactly in practice.
For example, if $f^\star$ is expected to satisfy the wave equation $\partial_t^2 f(x,t) = \partial_x^2 f(x,t)$, we define the operator $D(f)(x,t) = \partial_t^2 f(x,t) - \partial_x^2 f(x,t)$ for $(x,t)\in\Omega$. To estimate $f^\star$, we consider the minimizer of the physics-informed empirical risk

$$R_n(f) = \frac{1}{n}\sum_{i=1}^n|f(X_i)-Y_i|^2 + \lambda_n\|f\|^2_{H^s(\Omega)} + \mu_n\|D(f)\|^2_{L^2(\Omega)} \tag{1}$$

over the class $\mathcal{F} = H^s(\Omega)$ of candidate functions, where $\lambda_n > 0$ and $\mu_n \ge 0$ are hyperparameters that weight the relative importance of each term. Here, $H^s(\Omega)$ denotes the Sobolev space of functions with weak derivatives up to order $s$. The empirical risk function $R_n(f)$ is characteristic of hybrid modeling, as it is composed of:

- a data fidelity term $\frac{1}{n}\sum_{i=1}^n|f(X_i)-Y_i|^2$, which is standard in supervised learning and measures the discrepancy between the predicted values $f(X_i)$ and the observed targets $Y_i$;
- a regularization term $\lambda_n\|f\|^2_{H^s(\Omega)}$, which penalizes the regularity of the estimator;
- a model error term $\mu_n\|D(f)\|^2_{L^2(\Omega)}$, which measures the deviation of $f$ from the physical prior encoded in the differential operator $D$. To put it simply, the lower this term, the more closely the estimator aligns with the underlying physical principles.

Throughout the paper, we refer to $\hat f_n$ as the unique minimizer of the empirical risk function, i.e.,

$$\hat f_n = \operatorname*{argmin}_{f\in H^s(\Omega)} R_n(f). \tag{2}$$

**Algorithms to solve the PIML problem.** Various algorithms have been proposed to compute the estimator $\hat f_n$, and physics-informed neural networks (PINNs) have emerged as a leading approach (e.g., Raissi et al., 2019; Arzani et al., 2021; Karniadakis et al., 2021; Kurz et al., 2022; Agharafeie et al., 2023; Wang et al., 2023). PINNs are usually trained by minimizing a discretized version of the risk over a class of neural networks using gradient descent strategies. Leveraging the good approximation properties of neural networks, as the size of the PINN grows, this type of estimator typically converges to the unique minimizer over the entire space $H^s(\Omega)$ (Shin et al., 2020; Doumèche et al., 2025; Mishra and Molinaro, 2023; Shin et al., 2023; Bonito et al., 2025). However, apart from the fact that optimizing PINNs by gradient descent is an art in itself, the theoretical understanding of the estimators derived through this approach is far from complete (Bonfanti et al., 2024; Rathore et al., 2024), and only a few initial studies have begun to outline their theoretical contours (Krishnapriyan et al., 2021; Wang et al., 2022a; Doumèche et al., 2025). Alternative algorithms for physics-informed learning have since been developed, primarily based on kernel methods, and are seen as promising candidates for bridging the gap between machine learning and PDEs. The connections between PDEs and kernel methods are now well established (e.g., Schaback and Wendland, 2006; Chen et al., 2021; Batlle et al., 2025). Recently, a kernel method has been adapted to perform operator learning (Nelsen and Stuart, 2024). It consists of solving a PDE using samples of the initial condition (with a purely data-driven empirical risk).

**Quantifying the impact of physics.** Understanding how physics can enhance learning is of critical importance to the PIML community. Arnone et al. (2022) show that for second-order elliptic PDEs in dimension $d = 2$, the PIML estimator converges at a rate of $n^{-4/5}$, outperforming the Sobolev minimax rate of $n^{-2/3}$.
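To make the structure of the risk (1) concrete, here is a minimal Python sketch (ours, not part of the paper's package) that evaluates the data-fidelity and PDE-penalty terms by Monte Carlo quadrature for $d = 1$ and the illustrative prior $D(f) = f' - f$; the Sobolev term is deliberately omitted, since it is computed exactly from Fourier coefficients in Section 2.

```python
import numpy as np

def empirical_risk(f, df, X, Y, mu, omega=(-1.0, 1.0), n_mc=10_000, seed=0):
    """Monte Carlo sketch of the risk (1) for d = 1 and D(f) = f' - f.

    f, df : vectorized callables returning f(x) and f'(x).
    Returns the data term plus mu * ||D(f)||^2_{L^2(Omega)}; the Sobolev
    penalty lambda_n * ||f||^2_{H^s} is handled in Fourier space (Section 2).
    """
    rng = np.random.default_rng(seed)
    data_term = np.mean((f(X) - Y) ** 2)
    # ||D(f)||^2_{L^2(Omega)} = |Omega| * E[(f'(U) - f(U))^2], U ~ Uniform(Omega)
    u = rng.uniform(omega[0], omega[1], size=n_mc)
    pde_term = (omega[1] - omega[0]) * np.mean((df(u) - f(u)) ** 2)
    return data_term + mu * pde_term
```

For instance, for $f(x) = e^x$, a solution of $D(f) = 0$, the penalty term vanishes up to Monte Carlo error and the risk reduces to the data-fidelity term.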
**Kernel formulation in the Fourier space.** This work builds on the results of Doumèche et al. (2024), which show that, in the case of hybrid modeling (potentially including a noise $\varepsilon\neq 0$ and a modeling error, i.e., $D(f^\star)\neq 0$), the PIML problem (2) can be reformulated as a kernel regression task. Provided the associated kernel $K$ is made explicit, this reformulation allows one to obtain a closed-form estimator that converges at least at the Sobolev minimax rate. However, the kernel $K$ is highly dependent on the underlying PDE, and its computation can be tedious even for simple priors, such as $D = \frac{d}{dx}$ in one dimension. Thus, one of the goals of the present paper is to propose an approximation of $K$, making it possible to implement this kernel method in practice. For general linear PDEs in dimension $d$, Doumèche et al. (2024) have adapted to PIML the notion of effective dimension, a central idea in kernel methods that quantifies their convergence rate. As a result, for $d = 1$, $s = 1$, $\Omega = [-L,L]$, and $D = \frac{d}{dx}$, the authors show that the $L^2$-error of the physics-informed kernel method is of the order of $\log(n)^2/n$ when $D(f^\star) = 0$, and achieves the Sobolev minimax rate $n^{-2/3}$ otherwise. However, extending this type of result to more complex differential operators $D$ remains a challenge. In this context, we show how to approximate the effective dimension, making it possible to experimentally estimate the convergence rate of a given PIML problem.

**Contributions.** Building on the characterization of the PIML problem as a kernel regression task, we use Fourier methods to approximate the associated kernel $K$ and, in turn, propose a tractable estimator minimizing the physics-informed risk function. The approach involves developing the kernel $K$ along the Fourier modes with frequencies bounded by $m$, and then taking $m$ as large as possible. We refer to this approach as the physics-informed kernel learning (PIKL) method. Subsequently, for general linear operators $D$, a numerical strategy is developed to estimate the effective dimension of the kernel problem, allowing for the quantification of the expected statistical convergence rate when incorporating the physics prior into the learning process. Finally, we demonstrate the numerical performance of the PIKL estimator through simulations, both in the context of hybrid modeling and in solving partial differential equations. In short, the PIKL algorithm consistently outperforms specialized PINNs from the literature, which were specifically designed for the applications under consideration.

## 2. The PIKL Estimator

In this section, we detail the construction of the PIKL estimator, our approximate kernel method for physics-informed learning. Throughout this paper, we assume that the differential operator $D$ is linear with constant coefficients, as stated in the following assumption.

**Assumption 2.1 (Linear differential operator with constant coefficients)** The differential operator $D: H^s(\Omega)\to L^2(\Omega)$ is linear with constant coefficients, i.e., $D(f) = \sum_{|\alpha|\le s}a_\alpha\partial^\alpha f$ for some $s\in\mathbb{N}$ and $a_\alpha\in\mathbb{R}$.

We begin by observing that solving the PIML problem (2) is equivalent to performing a kernel regression task, as shown by Doumèche et al. (2024, Theorem 3.3). Thus, leveraging the extensive literature on kernel methods, it follows that the estimator $\hat f_n$ has the closed-form expression

$$\hat f_n = x\mapsto (K(x,X_1),\dots,K(x,X_n))\,(\mathbf{K} + nI_n)^{-1}\mathbf{Y},$$

where $K:\Omega^2\to\mathbb{R}$ is the kernel associated with the squared norm $\lambda_n\|f\|^2_{H^s(\Omega)} + \mu_n\|D(f)\|^2_{L^2(\Omega)}$ of (1), $\mathbf{Y} = (Y_1,\dots,Y_n)$, and $\mathbf{K}$ is the $n\times n$ kernel matrix defined by $\mathbf{K}_{i,j} = K(X_i,X_j)$.
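This closed form is the standard kernel ridge regression formula, with the weights $\lambda_n$ and $\mu_n$ absorbed into the kernel itself (which is why the ridge term is $nI_n$ rather than $n\lambda_nI_n$). The following sketch is our own generic illustration, with the kernel passed as a callable since the PIML kernel $K$ is only implicit at this stage.

```python
import numpy as np

def krr_fit_predict(kernel, X, Y, x_query):
    """Closed-form kernel estimator f_hat(x) = k(x)^T (K + n I)^(-1) Y.

    kernel : callable (a, b) -> K(a, b); X : (n, d) training inputs;
    x_query : (q, d) evaluation points. The PDE-dependent PIML kernel K
    is approximated explicitly in Section 2.
    """
    n = len(X)
    K = np.array([[kernel(a, b) for b in X] for a in X])   # n x n Gram matrix
    alpha = np.linalg.solve(K + n * np.eye(n), Y)          # (K + n I)^(-1) Y
    k_query = np.array([[kernel(x, b) for b in X] for x in x_query])
    return k_query @ alpha
```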
**A finite-element-method approach.** The analysis of Doumèche et al. (2024, Proposition 3.4) reveals that the kernel related to the PIML problem is uniquely characterized as the solution to a weak PDE. Indeed, for all $x\in\Omega$, the function $y\mapsto K(x,y)$ is the unique solution in $H^s(\Omega)$ to the weak formulation

$$\forall\varphi\in H^s(\Omega),\quad \lambda_n\int_\Omega\Big(K(x,\cdot)\,\varphi + \sum_{|\alpha|=s}\partial^\alpha K(x,\cdot)\,\partial^\alpha\varphi\Big) + \mu_n\int_\Omega D(K(x,\cdot))\,D(\varphi) = \varphi(x). \tag{3}$$

(This is a consequence of Doumèche et al., 2024, Proposition 3.4, applied to the risk $R_n$.) A spontaneous idea is to approximate the kernel $K$ using finite element methods (FEM). For illustrative purposes, we have applied this approach in numerical experiments with $d = 1$, $\Omega = [0,1]$, and $D(f) = \frac{d}{dx}f - f$.

[Figure 1. Left: kernel function $K(0.4,\cdot)$ estimated by the FEM. Right: kernel method $\hat f_n$ combined with the FEM, together with the data and the target $f^\star$.]

Figure 1 (Left) depicts the associated kernel function $K(0.4,\cdot)$ with $\lambda_n = 10^{-2}$, $\mu_n = 0$, and 100 nodes. Figure 1 (Right) shows that the PIML method (2) successfully reconstructs $f^\star(x) = \exp(x)$ using $n = 10$ data points, $\varepsilon\sim\mathcal{N}(0,10^{-2})$, $\lambda_n = 10^{-10}$, and $\mu_n = 1000$. However, solving the weak formulation (3) in full generality is quite challenging, particularly when dealing with arbitrary domains $\Omega$ in dimension $d > 1$. In fact, FEM strategies need to be specifically tailored to the PDE and the domain in question. Additionally, standard kernel methods combined with FEM approaches come at a high computational cost, since storing the matrix $\mathbf{K}$ requires $O(n^2)$ memory. This becomes prohibitive for large amounts of data, as $n = 10^4$ already requires several gigabytes of RAM.

**Fourier approximation.** Our primary objective in this article is to develop a more agile, flexible, and efficient method capable of handling arbitrary domains $\Omega$. Following Doumèche et al. (2024), our methodology first requires extending the learning problem from $\Omega\subseteq[-L,L]^d$ to the torus $[-2L,2L]^d$. Indeed, any function of $H^s(\Omega)$ can be periodically extended into a function of $H^s_{\mathrm{per}}([-2L,2L]^d)$ (see Doumèche et al., 2024, Proposition A.6 and Figure 1). The choice of $[-2L,2L]^d$ is commented on in Appendix A.2. This initial technical step allows us to use approximations with the standard Fourier basis, given for $k\in\mathbb{Z}^d$ and $x\in[-2L,2L]^d$ by $\phi_k(x) = (4L)^{-d/2}e^{i\frac{\pi}{2L}\langle k,x\rangle}$, particularly adapted to periodic functions on $[-2L,2L]^d$. Therefore, the minimization of the risk $R_n$ defined in (1) over $H^s(\Omega)$ can then be transferred into the minimization of the PIML risk

$$\overline{R}_n(f) = \frac{1}{n}\sum_{i=1}^n|f(X_i)-Y_i|^2 + \lambda_n\|f\|^2_{H^s_{\mathrm{per}}([-2L,2L]^d)} + \mu_n\|D(f)\|^2_{L^2(\Omega)}$$

over the periodic Sobolev space $H^s_{\mathrm{per}}([-2L,2L]^d)$ (see Appendix B for an introduction to periodic Sobolev spaces). This results in a slightly modified kernel, determined by the RKHS norm $\|f\|^2_{\mathrm{RKHS}} = \lambda_n\|f\|^2_{H^s_{\mathrm{per}}([-2L,2L]^d)} + \mu_n\|D(f)\|^2_{L^2(\Omega)}$. It is important to note that the estimators derived from the minimization of either $R_n$ or $\overline{R}_n$ share the same statistical guarantees, as both kernel methods have been shown to converge to $f^\star$ at the same rate (Doumèche et al., 2024, Theorem 4.6). To implement this kernel method, a natural approach is to expand the kernel using a truncated Fourier series, i.e., $K_m(x,y) = \sum_{\|k\|_\infty\le m}a_k\phi_k(x)\overline{\phi_k(y)}$, with $(\phi_k)_{\|k\|_\infty\le m}$ the Fourier basis, $(a_k)_{\|k\|_\infty\le m}$ the kernel coefficients in this basis, and $m$ the order of approximation. This idea is at the core of techniques such as random Fourier features (RFF) (e.g., Rahimi and Recht, 2007; Yang et al., 2012).
However, unlike RFF, the Fourier features in our problem are not random quantities, as they systematically correspond to the low-frequency modes. This low-frequency approximation is particularly well suited to the Sobolev penalty, which more strongly regularizes high frequencies (the analogous RFF algorithm would involve sampling random frequencies $k$ according to a density that is proportional to the Sobolev decay). In addition, and more importantly, the use of such approximations bypasses the need to discretize the domain into finite elements and requires only the knowledge of the (partial) Fourier transform of $\mathbf{1}_\Omega$, as will be explained later.

A key milestone in the development of our method is to minimize $\overline{R}_n$ not over the entire space $H^s_{\mathrm{per}}([-2L,2L]^d)$, but rather on the finite-dimensional subspace $\mathcal{H}_m = \mathrm{Span}((\phi_k)_{\|k\|_\infty\le m})$. This leads to the PIKL estimator, defined by

$$\hat f^{\mathrm{PIKL}} = \operatorname*{argmin}_{f\in\mathcal{H}_m}\overline{R}_n(f). \tag{4}$$

This naturally transforms the PIML problem into a finite-dimensional kernel regression task, where the associated kernel $K_m$ corresponds to a Fourier expansion of $K$, as will be clarified in the following paragraph. Of course, $\mathcal{H}_m$ provides better approximations of $H^s_{\mathrm{per}}([-2L,2L]^d)$ as $m$ increases, since for any function $f\in H^s_{\mathrm{per}}([-2L,2L]^d)$, $\lim_{m\to\infty}\min_{g\in\mathcal{H}_m}\|f-g\|_{H^s_{\mathrm{per}}([-2L,2L]^d)} = 0$. Remarkably, the key advantage of using Fourier approximations in our PIKL algorithm lies in the fact that both the squared Sobolev norm $\|f\|^2_{H^s_{\mathrm{per}}([-2L,2L]^d)}$ and the PDE penalty $\|D(f)\|^2_{L^2(\Omega)}$ are bilinear functions of the Fourier coefficients of $f$. As shown below, these bilinear forms can be represented as closed-form matrices, easing the computation of the estimator.

**RKHS norm in Fourier space.** Suppose that the differential operator $D$ is linear with constant coefficients, i.e., it can be expressed as $D(f) = \sum_{|\alpha|\le s}a_\alpha\partial^\alpha f$ for some $s\in\mathbb{N}$ and $a_\alpha\in\mathbb{R}$. If $f\in\mathcal{H}_m$, then $f$ can be rewritten in terms of its Fourier coefficients as $f(x) = \langle z,\Phi_m(x)\rangle_{\mathbb{C}^{(2m+1)^d}}$, where $\langle\cdot,\cdot\rangle_{\mathbb{C}^{(2m+1)^d}}$ denotes the canonical inner product on $\mathbb{C}^{(2m+1)^d}$, $z$ is the vector of Fourier coefficients of $f$, and $\Phi_m(x) = \big((4L)^{-d/2}e^{i\frac{\pi}{2L}\langle k,x\rangle}\big)_{\|k\|_\infty\le m}$. According to Parseval's theorem, the $L^2$-norm of the derivatives of $f\in H^s_{\mathrm{per}}([-2L,2L]^d)$ can be expressed using the Fourier coefficients of $f$ as follows: for $r\le s$ and $1\le i_1,\dots,i_r\le d$,

$$\|\partial^r_{i_1,\dots,i_r}f\|^2_{L^2([-2L,2L]^d)} = \Big(\frac{\pi}{2L}\Big)^{2r}\sum_{\|j\|_\infty\le m}|z_j|^2\prod_{\ell=1}^r j_{i_\ell}^2.$$

With this notation, the Sobolev norm reads

$$\|f\|^2_{H^s_{\mathrm{per}}([-2L,2L]^d)} = \sum_{\|j\|_\infty\le m,\ \|k\|_\infty\le m}\bar z_j z_k\Big(1 + \frac{\pi^2\|k\|_2^2}{(2L)^2}\Big)^s\delta_{j,k}$$

and, similarly,

$$\|D(f)\|^2_{L^2(\Omega)} = \sum_{\|j\|_\infty\le m,\ \|k\|_\infty\le m}\bar z_j z_k\,\overline{P(j)}P(k)\,\frac{1}{(4L)^d}\int_\Omega e^{i\frac{\pi}{2L}\langle k-j,x\rangle}dx,$$

where $P(k) = \sum_{|\alpha|\le s}a_\alpha\big(\frac{i\pi}{2L}\big)^{|\alpha|}\prod_{\ell=1}^d(k_\ell)^{\alpha_\ell}$. Therefore, introducing $M_m$ the $(2m+1)^d\times(2m+1)^d$ matrix with coefficients indexed by $j,k\in\{-m,\dots,m\}^d$,

$$(M_m)_{j,k} = \lambda_n\Big(1 + \frac{\pi^2\|k\|_2^2}{(2L)^2}\Big)^s\delta_{j,k} + \mu_n\,\overline{P(j)}P(k)\,\frac{1}{(4L)^d}\int_\Omega e^{i\frac{\pi}{2L}\langle k-j,x\rangle}dx, \tag{5}$$

we obtain that the RKHS norm of $f$ is expressed as a bilinear form of its Fourier coefficients $z$, i.e., $\|f\|^2_{\mathrm{RKHS}} = \langle z, M_mz\rangle_{\mathbb{C}^{(2m+1)^d}}$. It is important to note that $M_m$ is Hermitian (since $M_m^* = M_m$), positive (since $\langle z, M_mz\rangle_{\mathbb{C}^{(2m+1)^d}} = \|f\|^2_{\mathrm{RKHS}}\ge 0$), and definite (since $\langle z, M_mz\rangle_{\mathbb{C}^{(2m+1)^d}} = 0$ implies $\|f\|_{H^s([-2L,2L]^d)} = 0$, i.e., $f = 0$). Therefore, the spectral theorem (see Theorem B.6) ensures that $M_m$ is invertible, and that its positive inverse square root $M_m^{-1/2}$ is unique and well-defined. We now have all the ingredients to define the PIKL algorithm.

**Remark 2.2 (Linear PDEs with non-constant coefficients)** This framework could be adapted to PDEs with non-constant coefficients, i.e., to operators $D(f) = \sum_{|\alpha|\le s}a_\alpha\partial^\alpha f$ for some $s\in\mathbb{N}$ and $a_\alpha\in C^0(\mathbb{R})$. In this case, the polynomial $P$ in (5) should be replaced by convolutions involving the Fourier coefficients of the functions $a_\alpha$.
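As a concrete illustration of (5), the sketch below (our code, not the paper's package) assembles $M_m$ for $d = 1$, $\Omega = [-L,L]$, and a first-order prior $D(f) = a_0f + a_1f'$, using the closed-form domain integral $\frac{1}{4L}\int_{-L}^{L}e^{i\frac{\pi}{2L}(k-j)x}dx = \frac{\sin(\pi(k-j)/2)}{\pi(k-j)}$ (see Proposition 2.4 below); the function name and defaults are ours.

```python
import numpy as np

def build_Mm(m, L, s, lam, mu, a=(-1.0, 1.0)):
    """Penalty matrix (5) for d = 1, Omega = [-L, L], D(f) = a[0] f + a[1] f'.

    Default a = (-1, 1) encodes the running example D(f) = f' - f. Fourier
    modes k = -m..m on the torus [-2L, 2L]; the normalized domain integral
    equals sin(pi (k - j) / 2) / (pi (k - j)), with value 1/2 at k = j.
    """
    k = np.arange(-m, m + 1)
    P = a[0] + a[1] * (1j * np.pi / (2 * L)) * k        # Fourier symbol of D
    diff = k[None, :] - k[:, None]                      # entry (j, k) holds k - j
    safe = np.where(diff == 0, 1, diff)                 # avoid division by zero
    F = np.where(diff == 0, 0.5, np.sin(np.pi * diff / 2) / (np.pi * safe))
    sob = lam * (1 + np.pi**2 * k**2 / (2 * L) ** 2) ** s   # diagonal Sobolev weights
    return np.diag(sob) + mu * np.conj(P)[:, None] * P[None, :] * F
```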
**Computing the PIKL estimator.** For a function $f\in\mathcal{H}_m$, one can evaluate $f$ at $x$ by $f(x) = \langle M_m^{1/2}z, M_m^{-1/2}\Phi_m(x)\rangle_{\mathbb{C}^{(2m+1)^d}}$. This reproducing property indicates that minimizing the risk $\overline{R}_n$ on $\mathcal{H}_m$ is a kernel method governed by the kernel $K_m(x,y) = \langle M_m^{-1/2}\Phi_m(x), M_m^{-1/2}\Phi_m(y)\rangle_{\mathbb{C}^{(2m+1)^d}}$. Define $\mathbf{Y} = (Y_1,\dots,Y_n)$ and $\mathbf{K}_m\in\mathcal{M}_n(\mathbb{C})$ to be the matrix such that $(\mathbf{K}_m)_{i,j} = K_m(X_i,X_j)$ for all $1\le i,j\le n$. The PIKL estimator (4), minimizer of $\overline{R}_n$ restricted to $\mathcal{H}_m$, is therefore given by

$$\hat f^{\mathrm{PIKL}}(x) = (K_m(x,X_1),\dots,K_m(x,X_n))(\mathbf{K}_m + nI_n)^{-1}\mathbf{Y} = \Phi_m(x)^*(\Phi^*\Phi + nM_m)^{-1}\Phi^*\mathbf{Y}, \tag{6}$$

where $\Phi$ is the $n\times(2m+1)^d$ matrix whose rows are $\Phi_m(X_1)^*,\dots,\Phi_m(X_n)^*$. The second expression in (6) is provided by the so-called kernel trick. This step offers a significant advantage to the PIKL estimator as it reduces the computational burden in large sample regimes: instead of storing and inverting the $n\times n$ matrix $\mathbf{K}_m + nI_n$, we only need to store and invert the $(2m+1)^d\times(2m+1)^d$ matrix $\Phi^*\Phi + nM_m$. Moreover, the computation of $\Phi^*\Phi$ and $\Phi^*\mathbf{Y}$ can be performed online and in parallel as $n$ grows. Of course, this approach is subject to the curse of dimensionality. However, it is unreasonable to try to learn more parameters than the sample complexity $n$. Therefore, in practice, $(2m+1)^d\le n$, which justifies the preference for the $(2m+1)^{2d}$ storage complexity over the $n^2$ storage complexity of the FEM-based algorithm. In addition, similar to PINNs, the PIKL estimator concentrates the computational effort in the training phase, so that evaluation at new points is cheap. In fact, once the $(2m+1)^d$ Fourier modes of $\hat f^{\mathrm{PIKL}}$ (given by $(\Phi^*\Phi + nM_m)^{-1}\Phi^*\mathbf{Y}$) are computed, the evaluation of $\Phi_m(x)^*$ is straightforward. This is in sharp contrast to the FEM-based strategy, which requires approximating the kernel vector $(K(x,X_1),\dots,K(x,X_n))$ at each query point $x$.
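In code, the right-hand side of (6) amounts to assembling the feature matrix and solving one linear system. The following $d = 1$ sketch is ours and composes with the hypothetical `build_Mm` helper above.

```python
import numpy as np

def pikl_fit(X, Y, m, L, Mm):
    """Fourier-feature form of (6): coefficients c = (Phi* Phi + n Mm)^(-1) Phi* Y.

    Phi[i, k] = (4L)^(-1/2) exp(i pi k X_i / (2L)), so that f(X_i) = (Phi c)_i;
    Mm is the penalty matrix of (5), e.g. from build_Mm.
    """
    n = len(X)
    k = np.arange(-m, m + 1)
    Phi = (4 * L) ** -0.5 * np.exp(1j * np.pi * np.outer(X, k) / (2 * L))
    return np.linalg.solve(Phi.conj().T @ Phi + n * Mm, Phi.conj().T @ Y)

def pikl_predict(x, coef, m, L):
    """Evaluate the fitted Fourier expansion at new points x."""
    k = np.arange(-m, m + 1)
    Phi_x = (4 * L) ** -0.5 * np.exp(1j * np.pi * np.outer(x, k) / (2 * L))
    return np.real(Phi_x @ coef)   # the target is real-valued
```

Note that, as announced, only the $(2m+1)\times(2m+1)$ system is ever formed, never the $n\times n$ Gram matrix.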
We also emphasize that the PIKL predictor is characterized by low-frequency Fourier coefficients, which, in turn, enhances its interpretability. This methodology differs significantly from PINNs, which are less interpretable and rely on gradient descent for optimization (see, e.g., Wang et al., 2022a).

**Remark 2.3 (PIKL vs. spectral methods)** In the PIML context, the RFF approach resembles a well-known class of powerful tools for solving PDEs, known as spectral and pseudo-spectral methods (e.g., Canuto et al., 2007). These methods solve PDEs by selecting a basis of orthogonal functions and computing the coefficients of the solution on that basis to satisfy both the boundary conditions and the PDE itself. For example, the Fourier basis $(x\mapsto\exp(i\frac{\pi}{2L}\langle k,x\rangle))_{k\in\mathbb{Z}^d}$ already used in this paper is particularly well suited for solving linear PDEs on the square domain $[-2L,2L]^d$ with periodic boundary conditions. Spectral methods such as these have already been used in the PIML community to integrate PDEs with machine learning techniques (e.g., Meuris et al., 2023). However, the basis functions used in spectral and pseudo-spectral methods must be specifically tailored to the domain $\Omega$, the differential operator $D$, and the boundary conditions. For more information on this topic, please refer to Appendix A.1.

**Computing $M_m$ for specific domains.** Computing the matrix $M_m$ requires the evaluation of the integrals $(j,k)\mapsto\int_\Omega e^{i\frac{\pi}{2L}\langle k-j,x\rangle}dx$. In general, these integrals can be approximated using numerical integration schemes or Monte Carlo methods. However, it is possible to provide closed-form expressions for specific domains $\Omega$. To do so, for $d\in\mathbb{N}^*$, $L > 0$, and $\Omega\subseteq[-L,L]^d$, we define the characteristic function $F_\Omega$ of $\Omega$ by

$$F_\Omega(k) = \frac{1}{(4L)^d}\int_\Omega e^{-i\frac{\pi}{2L}\langle k,x\rangle}dx.$$

**Proposition 2.4 (Closed-form characteristic functions)** The characteristic functions associated with the cube and the Euclidean ball can be analytically obtained as follows.

- (Cube) Let $\Omega = [-L,L]^d$. Then, for $k\in\mathbb{Z}^d$, $F_\Omega(k) = \prod_{j=1}^d\frac{\sin(\pi k_j/2)}{\pi k_j}$, with the convention that the $j$-th factor equals $1/2$ when $k_j = 0$.
- (Euclidean ball) Let $d = 2$ and $\Omega = \{x\in[-L,L]^2,\ \|x\|_2\le L\}$. Then, for $k\in\mathbb{Z}^2$, $F_\Omega(k) = \frac{J_1(\pi\|k\|_2/2)}{4\|k\|_2}$, where $J_1$ is the Bessel function of the first kind of parameter 1.

This proposition, along with similar analytical results for other domains, can be found in Bracewell (2000, Table 13.4), noting that $F_\Omega$ is, up to normalization, the Fourier transform of the indicator function $\mathbf{1}_\Omega$, and is also proportional to the characteristic function of the uniform distribution on $\Omega$ evaluated at $-\frac{\pi k}{2L}$. We can extend these computations further since, given the characteristic functions of elementary domains $\Omega$, it is easy to compute the characteristic functions of translations, dilations, disjoint unions, and Cartesian products of such domains (see Proposition C.1 in Appendix C). For instance, it is straightforward to obtain the characteristic function of the three-dimensional cylinder $\Omega = \{x\in[-L,L]^2,\ \|x\|_2\le L\}\times[-L,L]$ as

$$F_\Omega(k_1,k_2,k_3) = \frac{J_1\big(\pi(k_1^2+k_2^2)^{1/2}/2\big)}{4(k_1^2+k_2^2)^{1/2}}\cdot\frac{\sin(\pi k_3/2)}{\pi k_3}.$$
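These closed forms are easy to code; the sketch below (ours) implements the cube and disk formulas of Proposition 2.4 and the cylinder as their product, with the $k = 0$ limits made explicit. Only `scipy.special.j1` is needed for the Bessel factor, and note that the expressions are independent of $L$.

```python
import numpy as np
from scipy.special import j1

def F_cube(k):
    """F_Omega for the cube [-L, L]^d; k is an integer array of shape (d,)."""
    k = np.asarray(k, dtype=float)
    safe = np.where(k == 0, 1.0, k)
    return float(np.prod(np.where(k == 0, 0.5, np.sin(np.pi * k / 2) / (np.pi * safe))))

def F_disk(k1, k2):
    """F_Omega for the disk of radius L in d = 2; the limit at k = 0 is pi/16."""
    r = np.hypot(k1, k2)
    return np.pi / 16 if r == 0 else float(j1(np.pi * r / 2) / (4 * r))

def F_cylinder(k1, k2, k3):
    """Cartesian-product rule (Proposition C.1): cylinder = disk x segment."""
    return F_disk(k1, k2) * F_cube([k3])
```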
## 3. The PIKL Algorithm in Practice

To enhance the reproducibility of our work, we provide a Python package that implements the PIKL estimator, designed to handle any linear PDE prior with constant coefficients in dimensions $d = 1$ and $d = 2$. This package is available at https://github.com/NathanDoumeche/numerical_PIML_kernel. Note that this package implements the matrix inversion of the PIKL formula (6) by solving a linear system using the LU decomposition. Of course, any other efficient method to avoid direct matrix inversion could be used instead, such as solving a linear system with the conjugate gradient method. Through numerical experiments, we demonstrate the performance of our approach in simulations for hybrid modeling (Subsection 3.1), and derive experimental convergence rates that quantify the benefits of incorporating PDE knowledge into a learning regression task (Subsection 3.2).

### 3.1 Hybrid Modeling

[Figure 2: OLS and PIKL estimators for the harmonic oscillator with $d = 1$ and sample size $n = 10$, together with the data and the target $f^\star$.]

**Perfect modeling with closed-form PDE solutions.** We start by assessing the performance of the PIKL estimator in a perfect modeling situation (i.e., $D(f^\star) = 0$), where the solutions of the PDE $D(f) = 0$ can be decomposed on a basis $(f_k)_{k\in\mathbb{N}}$ of closed-form solution functions. In this ideal case, the spectral method suggests an alternative estimator, which involves learning the coefficients $a_k\in\mathbb{R}$ of $f = \sum_{k\in\mathbb{N}}a_kf_k$ in this basis. For example, consider the one-dimensional case ($d = 1$) with domain $\Omega = [-\pi,\pi]$, and the harmonic oscillator differential prior $D(f) = \frac{d^2f}{dx^2} + \frac{df}{dx} + f$. In this case, the solutions of $D(f) = 0$ are the linear combinations $f = a_1f_1 + a_2f_2$, where $(a_1,a_2)\in\mathbb{R}^2$, $f_1(x) = \exp(-x/2)\cos(\sqrt{3}x/2)$, and $f_2(x) = \exp(-x/2)\sin(\sqrt{3}x/2)$. Thus, the spectral method focuses on learning the vector $(a_1,a_2)\in\mathbb{R}^2$, instead of learning the Fourier coefficients of $f^\star$, which is the approach taken by the PIKL algorithm. A baseline that exactly leverages the particular structure of this problem, referred to as the ordinary least squares (OLS) estimator, is therefore $\hat g_n = \hat a_1f_1 + \hat a_2f_2$, where

$$(\hat a_1,\hat a_2) = \operatorname*{argmin}_{(a_1,a_2)\in\mathbb{R}^2}\frac{1}{n}\sum_{i=1}^n|a_1f_1(X_i) + a_2f_2(X_i) - Y_i|^2.$$

[Figure 3: $L^2$-error (mean ± std over 5 runs) of the OLS and PIKL estimators for the harmonic oscillator with $d = 1$, w.r.t. $n$ in log10-log10 scale. The dashed lines represent adjusted linear models w.r.t. $n$, for both $L^2$-errors.]

To compare the PIKL and OLS estimators, we generate data such that $Y = f^\star(X) + \varepsilon$, where $X\sim\mathcal{U}(\Omega)$, $\varepsilon\sim\mathcal{N}(0,\sigma^2)$ with $\sigma = 0.5$, and the target function is $f^\star = f_1$ (corresponding to $(a_1,a_2) = (1,0)$). We implement the PIKL algorithm with 601 Fourier modes ($m = 300$) and $s = 2$. Figure 2 shows that even with very few data points ($n = 10$) and high noise levels, both the OLS and PIKL methods effectively reconstruct $f^\star$, both incorporating physical knowledge in their own way. In Figure 3, we display the $L^2$-error of both estimators for different sample sizes $n$. The two methods have an experimental convergence rate of $n^{-1.1}$, which is consistent with the expected parametric rate of $n^{-1}$. This sanity check shows that under perfect modeling conditions, the PIKL estimator with $m = 300$ performs as well as the OLS estimator specifically designed to explore the space of PDE solutions.
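For completeness, the OLS baseline is two lines of linear algebra. The sketch below (ours) reproduces the data-generating setup of Figures 2 and 3 with $n = 10$ and $\sigma = 0.5$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Harmonic-oscillator basis: solutions of f'' + f' + f = 0 on [-pi, pi]
f1 = lambda x: np.exp(-x / 2) * np.cos(np.sqrt(3) * x / 2)
f2 = lambda x: np.exp(-x / 2) * np.sin(np.sqrt(3) * x / 2)

n, sigma = 10, 0.5
X = rng.uniform(-np.pi, np.pi, n)
Y = f1(X) + sigma * rng.standard_normal(n)   # target f* = f1, i.e. (a1, a2) = (1, 0)

# OLS baseline: least squares on the 2-column design [f1(X) f2(X)]
A = np.column_stack([f1(X), f2(X)])
(a1_hat, a2_hat), *_ = np.linalg.lstsq(A, Y, rcond=None)
g_hat = lambda x: a1_hat * f1(x) + a2_hat * f2(x)
```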
**Combining the best of physics and data in imperfect modeling.** In this paragraph, we deal with an imperfect modeling scenario using the heat differential operator $D(f) = \partial_1f - \partial^2_{2,2}f$ in dimension $d = 2$ over the domain $\Omega = [-\pi,\pi]^2$. The data are generated according to the model $Y = f^\star(X) + \varepsilon$, where $\|D(f^\star)\|_{L^2(\Omega)}\neq 0$. We assume, however, that the PDE serves as a good physical prior, meaning that $\|f^\star\|^2_{L^2(\Omega)}$ is significantly larger than the modeling error $\|D(f^\star)\|^2_{L^2(\Omega)}$. The hybrid model is implemented using the PIKL estimator with parameters $s = 2$, $\lambda_n = n^{-2/3}/10$, and $\mu_n = 100/n$. These hyperparameters are selected to ensure that, when only a small amount of data is available, the model relies heavily on the PDE. Yet, as more data become available, the model can use the data to correct the modeling error. The performance of the PIKL estimator is compared with that of a purely data-driven estimator, referred to as the Sobolev estimator, and a strongly PDE-penalized estimator, referred to as the PDE estimator. The Sobolev estimator uses the same parameter $s = 2$ and $\lambda_n = n^{-2/3}/10$, but sets $\mu_n = 0$. This configuration ensures that the estimator relies entirely on the data without considering the PDE as a prior. On the other hand, the PDE estimator is configured with parameters $s = 2$, $\lambda_n = 10^{-10}$, and $\mu_n = 10^{10}$. These hyperparameters are set to ensure that the resulting PDE estimator effectively satisfies the heat equation, making it highly dependent on the physical model. We perform an experiment where $\varepsilon\sim\mathcal{N}(0,\sigma^2)$ with $\sigma = 0.5$, and $f^\star(t,x) = \exp(-t)\cos(x) + 0.5\sin(2x)$. This scenario is an example of imperfect modeling, since $\|D(f^\star)\|^2_{L^2(\Omega)} = \pi > 0$. However, the heat equation serves as a strong physical prior, since $\|D(f^\star)\|^2_{L^2(\Omega)}/\|f^\star\|^2_{L^2(\Omega)}\simeq 4\times 10^{-3}$. Figure 4 illustrates the performance of the different estimators.

[Figure 4: $L^2$-error (mean ± std over 5 runs) of the PDE, PIKL, and Sobolev estimators for imperfect modeling with the heat equation, as a function of $n$ in log10-log10 scale. The PDE error is the $L^2$-norm between $f^\star$ and the PDE solution that is closest to $f^\star$; the norm of $f^\star$ is displayed for reference.]

Clearly, the PDE estimator outperforms the Sobolev estimator when the data set is small ($n\le 10^2$). As expected, the performance of the Sobolev estimator improves as the sample size increases ($n\ge 10^3$), but it remains consistently inferior to that of the PIKL. When only a small amount of data is available, the PDE provides significant benefits, and the $L^2$-error decreases at the super-parametric rate of $n^{-2}$ for both the PIKL and the PDE estimators. However, in the context of imperfect modeling, the PDE estimator cannot overcome the PDE error, resulting in no further improvement beyond $n\simeq 100$. In addition, when a large amount of data is available, the data become more reliable than the PDE. In this case, the errors for both the PIKL and the Sobolev estimators decrease at the Sobolev minimax rate of $n^{-2/3}$. Overall, the PIKL estimator successfully combines the strengths of both approaches, using the PDE when data is scarce and relying more on data when it becomes abundant.

### 3.2 Measuring the Impact of Physics with the Effective Dimension

The important question of measuring the impact of the differential operator $D$ on the convergence rate of the PIML estimator has not yet found a clear answer in the literature. In this subsection, we propose an approach to experimentally compare the PIKL convergence rate to the Sobolev minimax rate in $H^s(\Omega)$, which is $n^{-2s/(2s+d)}$ (e.g., Tsybakov, 2009, Theorem 2.1).

**Theoretical backbone.** According to Doumèche et al. (2024, Theorem 4.3), if $X$ has a bounded density and the noise $\varepsilon$ is sub-Gamma with parameters $(\sigma, M)$, the $L^2$-error of both estimators (2) and (4) satisfies

$$\int_\Omega|\hat f_n - f^\star|^2\,d\mathbb{P}_X \le C_4\log^2(n)\Big(\lambda_n\|f^\star\|^2_{H^s(\Omega)} + \mu_n\|D(f^\star)\|^2_{L^2(\Omega)} + \frac{M^2}{n^2\lambda_n} + \frac{\sigma^2\,\mathcal{N}(\lambda_n,\mu_n)}{n}\Big), \tag{7}$$

where $\mathbb{P}_X$ is the distribution of $X$ and $C_4 > 0$ is a constant. The quantity $\mathcal{N}(\lambda_n,\mu_n)$ on the right-hand side of (7) is referred to as the effective dimension. Given the integral kernel operator $L_K$, defined by $L_K: f\in L^2(\mathbb{P}_X)\mapsto(x\mapsto\int_\Omega K(x,y)f(y)\,d\mathbb{P}_X(y))\in L^2(\mathbb{P}_X)$, the effective dimension is the trace of the operator $(L_K + \mathrm{Id})^{-1}L_K$ (see, e.g., Caponnetto and De Vito, 2007). Since $\lambda_n$ and $\mu_n$ can be freely chosen by the practitioner, the effective dimension $\mathcal{N}(\lambda_n,\mu_n)$ becomes a key consideration that helps quantify the impact of the physics on the learning problem. Unfortunately, bounding $\mathcal{N}(\lambda_n,\mu_n)$ is not trivial. Doumèche et al. (2024) have shown that, whenever $\|\frac{d\mathbb{P}_X}{dx}\|_\infty\le\kappa$ with $\kappa\ge 1$,

$$\mathcal{N}(\lambda_n,\mu_n)\ \le\ \sum_{\lambda\in\sigma(CO_nC)}\frac{1}{1 + (\kappa\lambda)^{-1}}\ \le\ \kappa\sum_{\lambda\in\sigma(CO_nC)}\frac{1}{1 + \lambda^{-1}},$$

where $O_n$ is the operator $O_n = \lim_{m\to\infty}M_m^{-1}$ (the limit being taken in the sense of the operator norm, see Definition B.3) and $C$ is the operator $C(f) = \mathbf{1}_\Omega f$ (that is, $C(f)(x) = f(x)$ if $x\in\Omega$, and $C(f)(x) = 0$ otherwise). Therefore, a natural idea to assess the effective dimension is to replace $CO_nC$ by $C_mM_m^{-1}C_m$, where $C_m:\mathcal{H}_m\to\mathcal{H}_m$ is defined by

$$\forall j,k\in\{-m,\dots,m\}^d,\quad (C_m)_{j,k} = \frac{1}{(4L)^d}\int_\Omega e^{i\frac{\pi}{2L}\langle k-j,x\rangle}dx.$$

The following theorem shows that this is a sound strategy, in the sense that computing the effective dimension using the eigenvalues of $C_mM_m^{-1}C_m$ becomes increasingly accurate as $m$ grows.

**Theorem 3.1 (Convergence of the effective dimension)** (i) One has

$$\lim_{m\to\infty}\sum_{\lambda\in\sigma(C_mM_m^{-1}C_m)}\frac{1}{1+\lambda^{-1}} = \sum_{\lambda\in\sigma(CO_nC)}\frac{1}{1+\lambda^{-1}}.$$

(ii) Let $\sigma^{\downarrow}_k(C_mM_m^{-1}C_m)$ be the $k$-th highest eigenvalue of $C_mM_m^{-1}C_m$. The spectrum of the matrix $C_mM_m^{-1}C_m$ converges to the spectrum of $CO_nC$ in the following sense:

$$\forall k\in\mathbb{N}^*,\quad\lim_{m\to\infty}\sigma^{\downarrow}_k(C_mM_m^{-1}C_m) = \sigma^{\downarrow}_k(CO_nC).$$
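Since $1/(1+\lambda^{-1}) = \lambda/(1+\lambda)$, Theorem 3.1 (i) suggests a simple plug-in estimate of the effective dimension from the eigenvalues of $C_mM_m^{-1}C_m$. The NumPy sketch below is our illustration; the paper's package provides the actual implementation.

```python
import numpy as np

def effective_dimension(Mm, Cm):
    """Plug-in estimate of N(lambda_n, mu_n) via Theorem 3.1 (i).

    Mm, Cm : the (2m+1)^d square matrices of (5) and of the normalized
    domain integrals, produced elsewhere (e.g., build_Mm and its C_m analogue).
    """
    A = Cm @ np.linalg.inv(Mm) @ Cm                 # C_m M_m^{-1} C_m
    eig = np.linalg.eigvalsh((A + A.conj().T) / 2)  # Hermitian part guards round-off
    eig = np.clip(eig, 0.0, None)
    return float(np.sum(eig / (1.0 + eig)))         # sum of 1 / (1 + lambda^{-1})
```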
The provided Python package (https://github.com/NathanDoumeche/numerical_PIML_kernel, also available on pip at https://pypi.org/project/pikernel) includes numerical approximations of the effective dimension in dimensions $d = 1$ and $d = 2$ for any linear operator $D$ with constant coefficients, when $\Omega$ is either a cube or a Euclidean ball. The code is designed to run on both CPU and GPU. The convergence of the effective dimension as $m$ grows is studied in greater detail in Appendix D.2.

**Comparison to the closed-form case.** We start by assessing the quality of the approximation encapsulated in Theorem 3.1 in a scenario where the eigenvalues can be theoretically bounded. When $d = 1$, $s = 1$, $D = \frac{d}{dx}$, and $\Omega = [-\pi,\pi]$, one has (Doumèche et al., 2024, Proposition 5.2)

$$\frac{4}{(\lambda_n+\mu_n)(k+4)^2}\ \le\ \sigma^{\downarrow}_k(CO_nC)\ \le\ \frac{4}{(\lambda_n+\mu_n)(k-2)^2}.$$

This shows that $\log\sigma^{\downarrow}_k(CO_nC)\simeq -2\log(k)$. Figure 5 (Left) represents the eigenvalues of $C_mM_m^{-1}C_m$ in decreasing order, for increasing values of $m$, with $\lambda_n = 0.01$ and $\mu_n = 1$. For any fixed $m$, two distinct regimes can be clearly distinguished: initially, the eigenvalues decrease linearly on a log-log scale and align with the theoretical slope of $-2\log(k)$. Afterward, the eigenvalues suddenly drop to zero. As $m$ increases, the spectrum progressively approaches the theoretical bound. In Appendix D.2, we show that $m = 10^2$ Fourier modes are sufficient to accurately approximate the effective dimension when $n\le 10^4$. It is evident from Figure 5 (Right) that the effective dimension exhibits a sub-linear behavior in the log-log scale, experimentally confirming the findings of Doumèche et al. (2024), which show that $\mathcal{N}(\frac{\log(n)}{n},\frac{1}{\log(n)}) = o_n(n^\gamma)$ for all $\gamma > 0$. So, plugging this into (7) with $\lambda_n = n^{-1}\log(n)$ and $\mu_n = \log(n)^{-1}$ leads to

$$\int_{[-L,L]}|\hat f_n - f^\star|^2\,d\mathbb{P}_X = \big(\|f^\star\|^2_{H^1(\Omega)} + \sigma^2 + M^2\big)\,O\Big(\frac{\log^3(n)}{n}\Big)$$

when $D(f^\star) = 0$, i.e., when the modeling is perfect. The Sobolev minimax rate on $H^1(\Omega)$ is $n^{-2/3}$, whereas the experimental bound in this context gives a rate of $n^{-1}$. This indicates that when the target $f^\star$ satisfies the underlying PDE, the gain in terms of speed from incorporating the physics into the learning problem is $n^{-1/3}$.

**Harmonic oscillator equation.** Here, we follow up on the example of Subsection 3.1, as presented in Figures 2 and 3. Thus, we set $d = 1$, $s = 2$, $D(u) = \frac{d^2u}{dx^2} + \frac{du}{dx} + u$, and $\Omega = [-\pi,\pi]$. Recall that in this perfect modeling experiment, we observed a parametric convergence rate of $n^{-1}$, which is not surprising since the regression problem essentially involves learning the two parameters $a_1$ and $a_2$. Figure 6 (Left) shows the eigenvalues of $C_mM_m^{-1}C_m$, while Figure 6 (Right) shows the effective dimension as a function of $n$. Similarly to the previous closed-form case, we observe that $\mathcal{N}(\frac{\log(n)}{n},\frac{1}{\log(n)}) = o_n(n^\gamma)$ for all $\gamma > 0$. The same argument as in the paragraph above shows that this results in a parametric convergence rate, provided $D(f^\star) = 0$.

**Heat equation on the disk.** Let us now consider the one-dimensional heat equation $D = \partial_x - \partial^2_{y,y}$, with $d = 2$, $s = 2$, and the disk $\Omega = \{x\in\mathbb{R}^2,\ \|x\|_2\le\pi\}$. Since the heat equation is known to have $C^\infty$ solutions with bounded energy (see, e.g., Evans, 2010, Chapter 2.3, Theorem 8), we expect the convergence rate to match that of $H^\infty(\Omega)$, which corresponds to the parametric rate of $n^{-1}$. Once again, we observe $\mathcal{N}(\frac{\log(n)}{n},\frac{1}{\log(n)}) = o_n(n^\gamma)$ for all $\gamma > 0$, and thus an improvement over the $n^{-2/3}$ Sobolev minimax rate on $H^2(\Omega)$ when $D(f^\star) = 0$.
**Quantifying the impact of physics.** The three examples above show how incorporating physics can enhance the learning process by reducing the effective dimension, leading to a faster convergence rate.

[Figure 5: The case of $D = \frac{d}{dx}$. Left: spectrum of $C_mM_m^{-1}C_m$ for $2m+1$ ranging from $10^{0.5}$ to $10^4$, together with the theoretical upper bound. Right: estimation of the effective dimension $n\mapsto\mathcal{N}(\frac{\log(n)}{n},\frac{1}{\log(n)})$ with $2m+1 = 10^4$.]

[Figure 6: Harmonic oscillator. Left: spectrum of $C_mM_m^{-1}C_m$ for $2m+1$ ranging from $10^{0.5}$ to $10^4$. Right: estimation of the effective dimension $n\mapsto\mathcal{N}(\frac{\log(n)}{n},\frac{1}{\log(n)})$ with $2m+1 = 10^4$.]

[Figure 7: Heat equation. Left: spectrum of $C_mM_m^{-1}C_m$ for $(2m+1)^2$ ranging from $10^0$ to $10^4$. Right: estimation of the effective dimension $n\mapsto\mathcal{N}(\frac{\log(n)}{n},\frac{1}{\log(n)})$ with $(2m+1)^2 = 10^4$.]

In all cases, the rate becomes parametric due to the PDE, achieving the fastest possible speed, as predicted by the central limit theorem. Our package can be directly applied to any linear PDE with constant coefficients to compute the effective convergence rate given a scaling of $\lambda_n$ and $\mu_n$. By identifying the optimal convergence rate, this approach can assist in determining the best parameters $\lambda_n$ and $\mu_n$ for use in other PIML techniques, such as PINNs.

## 4. PDE Solving: Mitigating the Difficulties of PINNs with PIKL

It turns out that our PIKL algorithm can be effectively used as a PDE solver. In this scenario, there is no noise (i.e., $\varepsilon = 0$), no modeling error (i.e., $D(f^\star) = 0$), and the data consist of samples of boundary and initial conditions, as is typical for PINNs. Assume for example that the objective is to solve the Laplace equation $\Delta f = 0$ on a domain $\Omega\subseteq[-1,1]^2$ with the Dirichlet boundary condition $f|_{\partial\Omega} = g$, where $g$ is a known function. Then this problem can be addressed by implementing the PIKL estimator, which minimizes the risk $\overline{R}_n(f) = \frac{1}{n}\sum_{i=1}^n|f(X_i)-Y_i|^2 + \lambda_n\|f\|^2_{H^2_{\mathrm{per}}([-1,1]^2)} + \mu_n\|\Delta f\|^2_{L^2(\Omega)}$, where the $X_i$ are uniformly sampled on $\partial\Omega$ and $Y_i = g(X_i)$. Of course, this example focuses on Dirichlet boundary conditions, but PIKL is a highly flexible framework that can incorporate a wide variety of boundary conditions, such as periodic and Neumann boundary conditions, as the next two examples will illustrate.
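As an illustration of this boundary-data formulation, the sketch below (ours) draws uniform samples of a Dirichlet condition on the boundary of the square $[-1,1]^2$; the resulting pairs $(X_i, Y_i)$ then play the role of the training data in the PIKL risk, with the Laplacian encoded through the symbol $P(k)$ of Section 2.

```python
import numpy as np

def sample_dirichlet_data(g, n, seed=0):
    """Uniform samples of a Dirichlet condition on the boundary of [-1, 1]^2.

    g : callable mapping an (n, 2) array of boundary points to g values.
    Feeding (X, Y) to the PIKL solver with the penalty mu_n * ||Laplacian f||^2
    addresses Delta f = 0 with f = g on the boundary.
    """
    rng = np.random.default_rng(seed)
    t = rng.uniform(-1.0, 1.0, n)          # coordinate along the chosen edge
    edge = rng.integers(0, 4, n)           # 0: bottom, 1: top, 2: left, 3: right
    x = np.where(edge < 2, t, np.where(edge == 2, -1.0, 1.0))
    y = np.where(edge < 2, np.where(edge == 0, -1.0, 1.0), t)
    X = np.column_stack([x, y])
    return X, g(X)

# Example boundary condition: g(x, y) = x^2 - y^2, the trace of a harmonic function
X, Y = sample_dirichlet_data(lambda X: X[:, 0] ** 2 - X[:, 1] ** 2, n=100)
```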
**Comparison with PINNs for the convection equation.** To begin, we compare the performance of our PIKL algorithm with the PINN approach developed by Krishnapriyan et al. (2021) for solving the one-dimensional convection equation $D(f) = \partial_tf + \beta\partial_xf$ on the domain $\Omega = [0,1]\times[0,2\pi]$. The problem is subject to the following periodic boundary conditions:

$$\forall x\in[0,2\pi],\ f(0,x) = \sin(x),\qquad\forall t\in[0,1],\ f(t,0) = f(t,2\pi).$$

The solution of this PDE is given by $f^\star(t,x) = \sin(x - \beta t)$. Krishnapriyan et al. (2021) show that for high values of $\beta$, PINNs struggle to solve the PDE effectively. To address this challenge, we train our PIML kernel method using $n = 100$ data points and 1681 Fourier modes (i.e., $m = 20$). The training data set $(X_i,Y_i)_{1\le i\le n}$ is constructed such that $X_i = (0, 2\pi U_i)$ and $Y_i = \sin(2\pi U_i)$, where $(U_i)_{1\le i\le n}$ are i.i.d. uniform random variables on $[0,1]$. To enforce the periodic boundary conditions, we center $\Omega$ by setting $\tilde\Omega = \Omega - (0.5,\pi)$, extend it to $[-1,1]\times[-\pi,\pi]$, and consider $\mathcal{H}_m = \mathrm{Span}\big((t,x)\mapsto e^{i(\frac{\pi}{2}k_1t + k_2x)}\big)_{\|k\|_\infty\le m}$. Noting that for all $(j_1,k_1),(j_2,k_2)\in\mathbb{Z}^2$,

$$\int_{[-1,1]\times[-\pi,\pi]}e^{i(\frac{\pi}{2}(k_1-j_1)t + (k_2-j_2)x)}\,dt\,dx = \frac{4\sin(\pi(k_1-j_1)/2)}{\pi(k_1-j_1)}\cdot 2\pi\,\delta_{j_2,k_2}$$

(with the convention that the first factor equals 2 when $k_1 = j_1$), we let the matrix $(M_m)_{j,k}$ be the analogue of (5) on this torus, i.e.,

$$(M_m)_{j,k} = \lambda_n\Big(1 + \frac{\pi^2\|k\|_2^2}{(2L)^2}\Big)^s\delta_{j,k} + \frac{\mu_n\,\overline{P(j)}P(k)}{(4L)^2}\cdot\frac{4\sin(\pi(k_1-j_1)/2)}{\pi(k_1-j_1)}\cdot 2\pi\,\delta_{j_2,k_2},$$

where $P$ is the polynomial associated with the operator $D$. Notice that, although $f^\star$ is a sinusoidal function, the frequency vector of $f^\star$ is $(-\beta, 1)$, which does not belong to $\frac{\pi}{2}\mathbb{Z}\times\mathbb{Z}$. As a result, $f^\star$ does not lie in $\mathcal{H}_m$ for any $m$.

Table 1 compares the performance of various PIML methods using a sample of $n = 100$ initial condition points. The performance of an estimator $\hat f_n$ on a test set (Test) is evaluated based on the $L^2$ relative error $\big(\sum_{x\in\mathrm{Test}}\|\hat f_n(x) - f^\star(x)\|_2^2/\sum_{y\in\mathrm{Test}}\|f^\star(y)\|_2^2\big)^{1/2}$. Standard deviations are computed across 10 trials. The results show that the PIKL estimator clearly outperforms PINNs in terms of accuracy.

Table 1: $L^2$ relative error of the kernel method in solving the convection equation. The PINN results are from Krishnapriyan et al. (2021, Table 1).

| | Vanilla PINNs | Curriculum-trained PINNs | PIKL estimator |
|---|---|---|---|
| $\beta = 20$ | $7.50\times 10^{-1}$ | $9.84\times 10^{-3}$ | $(1.56\pm 3.46)\times 10^{-8}$ |
| $\beta = 30$ | $8.97\times 10^{-1}$ | $2.02\times 10^{-2}$ | $(0.91\pm 2.20)\times 10^{-7}$ |
| $\beta = 40$ | $9.61\times 10^{-1}$ | $5.33\times 10^{-2}$ | $(7.31\pm 6.44)\times 10^{-9}$ |
**Comparison with PINNs for the 1d wave equation.** The performance of the PIKL algorithm is compared to the PINN methodology of Wang et al. (2022a, Section 7.3) for solving the one-dimensional wave equation $D(f) = \partial^2_{t,t}f - 4\partial^2_{x,x}f$ on the square domain $[0,1]^2$, with the following boundary conditions:

$$\forall x\in[0,1],\ f(0,x) = \sin(\pi x) + \tfrac{1}{2}\sin(4\pi x),\qquad\forall x\in[0,1],\ \partial_tf(0,x) = 0,\qquad\forall t\in[0,1],\ f(t,0) = f(t,1) = 0.$$

The solution of the PDE is $f^\star(t,x) = \sin(\pi x)\cos(2\pi t) + \tfrac{1}{2}\sin(4\pi x)\cos(8\pi t)$. This solution serves as an interesting benchmark since $f^\star$ exhibits significant variations, with $\|\partial_tf^\star\|_2^2/\|f^\star\|_2^2 = 16\pi^2$ (Figure 8, Left). Meanwhile, PINNs are known to have a spectral bias toward low frequencies (e.g., Deshpande et al., 2022; Wang et al., 2022b). The optimization of the PINNs in Wang et al. (2022a) is carried out using stochastic gradient descent with 80,000 steps, each drawing 300 points at random, resulting in a sample size of $n = 2.4\times 10^6$. The architecture these authors employ is a dense neural network with tanh activation functions and layers of sizes (2, 500, 500, 500, 1), resulting in $m = (2\times 500 + 500) + 2\times(500\times 500 + 500) + (500\times 1 + 1) = 503{,}001$ parameters. The training time for the Vanilla PINNs is 7 minutes on an Nvidia L4 GPU (24 GB of RAM, 30.3 teraFLOPS for Float32). We obtain an $L^2$ relative error of $4.21\times 10^{-1}$, which is consistent with the results of Wang et al. (2022a), who report an $L^2$ relative error of $4.52\times 10^{-1}$. Figure 8 (Middle) shows the Vanilla PINN solution.

We train our PIKL method using $n = 10^5$ data points and 1681 Fourier modes (i.e., $m = 20$). Let $(U_i)_{1\le i\le n}$ be i.i.d. random variables uniformly distributed on $[0,1]$. The training data set $(X_i,Y_i)_{1\le i\le n}$ is constructed such that:

- if $1\le i\le\lfloor n/4\rfloor$, then $X_i = (0,U_i)$ and $Y_i = \sin(\pi U_i) + \sin(4\pi U_i)/2$;
- if $\lfloor n/4\rfloor + 1\le i\le 2\lfloor n/4\rfloor$, then $X_i = (U_i,0)$ and $Y_i = 0$;
- if $2\lfloor n/4\rfloor + 1\le i\le 3\lfloor n/4\rfloor$, then $X_i = (U_i,1)$ and $Y_i = 0$;
- if $3\lfloor n/4\rfloor + 1\le i\le n$, then $X_i = (1/n,U_i)$ and
$$Y_i = f^\star(0,U_i) + \frac{1}{2n^2}\partial^2_{t,t}f^\star(0,U_i) = f^\star(0,U_i) + \frac{2}{n^2}\partial^2_{x,x}f^\star(0,U_i) = \Big(1 - \frac{2\pi^2}{n^2}\Big)\sin(\pi U_i) + \Big(\frac{1}{2} - \frac{16\pi^2}{n^2}\Big)\sin(4\pi U_i).$$

The final requirement enforces the initial condition $\partial_tf = 0$ in a manner similar to that of a second-order numerical scheme. Table 2 compares the performance of the PINN approach from Wang et al. (2022a) with the PIKL estimator. Across 10 trials, the PIKL method achieves an $L^2$ relative error of $(8.70\pm 0.08)\times 10^{-4}$, which is 50% better than the performance of the PINNs. This demonstrates that the kernel approach is more accurate, while requiring fewer data points and parameters than the PINNs. The training time for the PIKL estimator is 6 seconds on an Nvidia L4 GPU. Thus, the PIKL estimator can be computed 70 times faster than the Vanilla PINNs. Figure 8 (Right) shows the PIKL estimator. Note that in this case, the solution $f^\star$ can be represented by a sum of complex exponential functions ($f^\star\in\mathcal{H}_{16}$), which could have biased the result in favor of the PIKL estimator by canceling its approximation error. However, the results remain unchanged when altering the frequencies in $\mathcal{H}_m$ (e.g., taking $L = 0.55$ in Equation (5) instead of $L = 0.5$ yields an $L^2$ relative error of $(9.6\pm 0.3)\times 10^{-4}$).

Table 2: Performance of PINN/PIKL methods for solving the wave equation on $\Omega = [0,1]^2$. The PINN results are from Wang et al. (2022a, Figure 6).

| | Vanilla PINNs | NTK-optimized PINNs | PIKL estimator |
|---|---|---|---|
| $L^2$ relative error | $4.52\times 10^{-1}$ | $1.73\times 10^{-3}$ | $(8.70\pm 0.08)\times 10^{-4}$ |
| Training data ($n$) | $2.4\times 10^6$ | $2.4\times 10^6$ | $10^5$ |
| Number of parameters | $5.03\times 10^5$ | $5.03\times 10^5$ | $1.68\times 10^3$ |

[Figure 8: Left: ground truth solution $f^\star$ to the wave equation (taken from Wang et al., 2022a, Figure 6). Middle: Vanilla PINN from Wang et al. (2022a). Right: PIKL estimator.]

## 5. PDE Solving with Noisy Boundary Conditions

### 5.1 Wave Equation in Dimension 2

**Comparison with traditional PDE solvers.** PIML is a promising framework for solving PDEs, particularly due to its adaptability to domains $\Omega$ with complex geometries, where most traditional PDE solvers tend to be highly domain-dependent. However, its comparative performance against traditional PDE solvers remains unclear in scenarios where both approaches can be easily implemented. The meta-analysis by McGreivy and Hakim (2024) indicates that, in some cases, PINNs may be faster than traditional PDE solvers, although they are often less accurate. In our study, solving the wave equation on a simple square domain represents a setting where traditional numerical methods are straightforward to implement and are known to perform well. Table 3 summarizes the performance of classical techniques, including the explicit Euler, Runge-Kutta 4 (RK4), and Crank-Nicolson (CN) schemes (see Appendix D.3 for a brief presentation of these methods). These methods clearly outperform both PINNs and the PIKL algorithm, even with fewer data points.

Table 3: Performance of traditional PDE solvers for the wave equation on $\Omega = [0,1]^2$.

| | Euler explicit | RK4 | CN |
|---|---|---|---|
| $L^2$ relative error | $3.8\times 10^{-6}$ | $6.8\times 10^{-6}$ | $5.6\times 10^{-3}$ |
| Training data ($n$) | $10^4$ | $10^4$ | $10^4$ |
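To give a flavor of the finite-difference baselines in Table 3, here is a minimal explicit (leapfrog) scheme for $\partial_t^2f = 4\partial_x^2f$ on $[0,1]^2$; this is our simplified illustration rather than the exact schemes of Appendix D.3, and the grid sizes are chosen so that the CFL ratio is 0.25.

```python
import numpy as np

def wave_leapfrog(nx=101, nt=401):
    """Explicit second-order scheme for d2f/dt2 = 4 d2f/dx2 on [0, 1]^2.

    Initial condition f(0, x) = sin(pi x) + sin(4 pi x)/2, df/dt(0, x) = 0,
    and f = 0 on the spatial boundary; stability requires r <= 1.
    """
    x = np.linspace(0.0, 1.0, nx)
    dx, dt = x[1] - x[0], 1.0 / (nt - 1)
    r = 4.0 * dt**2 / dx**2                     # CFL ratio (0.25 with defaults)
    f = np.zeros((nt, nx))
    f[0] = np.sin(np.pi * x) + 0.5 * np.sin(4 * np.pi * x)
    # first step uses df/dt(0, x) = 0: f[1] = f[0] + (r/2) * discrete Laplacian
    f[1, 1:-1] = f[0, 1:-1] + 0.5 * r * (f[0, 2:] - 2 * f[0, 1:-1] + f[0, :-2])
    for k in range(1, nt - 1):                  # leapfrog time stepping
        f[k + 1, 1:-1] = (2 * f[k, 1:-1] - f[k - 1, 1:-1]
                          + r * (f[k, 2:] - 2 * f[k, 1:-1] + f[k, :-2]))
    return x, f
```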
**Noisy boundary conditions.** However, a more relevant setting for comparing the performance of these methods arises when noise is introduced into the boundary conditions. This situation is common, for instance, when the initial condition of the wave is measured by a noisy sensor. Such a setting aligns with hybrid modeling, where $\varepsilon\neq 0$ but there is no modeling error (i.e., $D(f^\star) = 0$). Table 4 compares the performance of all methods with Gaussian noise of variance $10^{-2}$. In this case, the PIKL estimator outperforms all other approaches. Such PDEs with noisy boundary conditions are special cases of the hybrid modeling framework, where the data are located on the boundary of the domain. This situation arises, for example, in Cai et al. (2021), which models the temperature in the core of a nuclear reactor.

Table 4: Performance for the wave equation with noisy boundary conditions.

| | PINNs | Euler explicit | RK4 | CN | PIKL estimator |
|---|---|---|---|---|---|
| $L^2$ relative error | $4.61\times 10^{-1}$ | $1.25\times 10^{-1}$ | $6.05\times 10^{-2}$ | $2.01\times 10^{-2}$ | $1.87\times 10^{-2}$ |
| Training data ($n$) | $2.4\times 10^6$ | $4\times 10^4$ | $4\times 10^4$ | $4\times 10^4$ | $4\times 10^4$ |

### 5.2 Heat Equation in Dimension 4

[Figure 9: $L^2$ relative error of the models for the 4-dimensional heat equation with noisy boundary conditions, as a function of the number of training points. Standard deviations are estimated over 5 runs. The constant baseline is shown as a dotted line.]

Still in the context of noisy boundary conditions, we study the feasibility and limitations of the PIKL estimator in higher dimensions. To this aim, we consider the task of learning a solution to the heat equation in dimension 4 with noisy boundary conditions. In this setting, the goal is to learn $f^\star(x_1,x_2,x_3,x_4) = \exp(-3x_1/\pi^2)\cos(x_2/\pi)\cos(x_3/\pi)\cos(x_4/\pi)$ on $\Omega = [-0.5,0.5]^4$, given $n$ i.i.d. observations $(X_1,Y_1),\dots,(X_n,Y_n)$ such that $Y = f^\star(X) + \mathcal{N}(0,10^{-2})$, and $X$ is sampled on $(\{0\}\times[-0.5,0.5]^3)\cup([-0.5,0.5]\times\partial([-0.5,0.5]^3))$ to respectively encode the initial and boundary conditions. The function $f^\star$ is the unique solution to the heat equation $(\partial_1 - \partial^2_{2,2} - \partial^2_{3,3} - \partial^2_{4,4})f = 0$ satisfying this set of initial and boundary conditions (see, e.g., Evans, 2010, Chapter 2.3, Theorem 5). The function $f^\star$ is flat, and close to the constant function equal to 1. We compare the performance of our PIKL estimator with that of a PINN. Here, the PIKL estimator is computed with $m = 3$, leading to 2401 Fourier modes. The PINN is a fully-connected neural network with three hidden layers of size 10, using tanh as activation function, and optimized on $2\times 10^5$ collocation points by 2000 gradient descent steps. In this high-dimensional setting, Figure 9 shows that the PIKL estimator clearly outperforms the PINN. Note that both methods outperform the constant model equal to $\int_\Omega f^\star$ (dotted line). Moreover, the PIKL estimator is more than 100 times faster than the PINN. The experimental convergence rate of the PIKL estimator is $n^{-0.53}$, which matches the parametric rate of $n^{-1/2}$.
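The sketch below (ours) generates the kind of noisy initial/boundary sample used in this experiment; the 50/50 split between the initial slice and the lateral boundary is an illustrative choice, not specified above.

```python
import numpy as np

def sample_heat4d_data(n, sigma=0.1, seed=0):
    """Noisy initial/boundary samples for the 4-d heat problem of Section 5.2.

    Half the points sit on the initial slice {0} x [-0.5, 0.5]^3, half on the
    lateral boundary [-0.5, 0.5] x boundary([-0.5, 0.5]^3); sigma^2 = 1e-2.
    """
    rng = np.random.default_rng(seed)
    f_star = lambda X: (np.exp(-3 * X[:, 0] / np.pi**2)
                        * np.cos(X[:, 1] / np.pi) * np.cos(X[:, 2] / np.pi)
                        * np.cos(X[:, 3] / np.pi))
    X = rng.uniform(-0.5, 0.5, size=(n, 4))
    init = rng.random(n) < 0.5
    X[init, 0] = 0.0                           # initial-condition slice x1 = 0
    # project the remaining points onto a random face of the spatial cube
    face = rng.integers(1, 4, size=n)          # which spatial axis (1, 2, or 3)
    sign = rng.choice([-0.5, 0.5], size=n)
    rows = np.where(~init)[0]
    X[rows, face[rows]] = sign[rows]
    Y = f_star(X) + sigma * rng.standard_normal(n)
    return X, Y
```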
## 6. Conclusion and Future Directions

In this article, we developed an efficient algorithm to solve the PIML hybrid problem (2). The PIKL estimator can be computed exactly through matrix inversion and possesses strong theoretical properties. Specifically, we demonstrated how to estimate its convergence rate based on the PDE prior $D$. Moreover, through various examples, we showed that it outperforms PINNs in terms of performance, stability, and training time in certain PDE-solving tasks where PINNs struggle to escape local minima during optimization. Future work could focus on comparing PIKL with implementations of RFF and exploring its performance against PINNs in the case of PDEs with non-constant coefficients. Another avenue for future research is to assess the effectiveness of the kernel approach compared to traditional PDE solvers, as discussed in Section 5.

**Extension to nonlinear PDEs.** Extending the PIKL framework to accommodate nonlinear PDEs is an important and interesting direction for future research. From a PDE theory point of view, nonlinear PDEs are expected to be harder to deal with, since nonlinear differential operators are challenging compared to linear operators. Even in dimension $d = 1$, the solution of the ODE $y' = y^2$ with initial condition $y(0) = y_0 > 0$ is $y(t) = (1/y_0 - t)^{-1}$, which explodes at $t = 1/y_0$. Note that the domain of the solution $y$ is given by the set $\{t < 1/y_0\}$, which intricately depends on the initial condition $y_0$. This prevents us from using a systematic methodology to solve this problem, and would require designing specific algorithms tailored to the condition-dependent geometry of the domain $\Omega$.

**Comparison with neural networks.** In our experiments, physics-informed kernel methods systematically outperformed neural networks, both in terms of accuracy and running time. However, neural networks have the remarkable ability to adapt to low-dimensional structures, resulting in faster convergence rates (under the hypothesis of an exact optimization), contrary to kernel methods (see, e.g., Bach, 2024, Section 2.6). Thus, there are two scenarios in which PINNs could be preferable to PIKL. First, implementing PIKL for nonlinear PDEs is not straightforward, whereas PINNs handle them with ease. Second, for high-dimensional problems with an intrinsically low-dimensional structure (such as coupled differential equations commonly found in earth sciences or biology), the computational cost of PIKL becomes prohibitively high.

## Acknowledgments

We greatly thank the Action Editor and two referees for their valuable comments and insightful suggestions, which led to a substantial improvement of the paper.

## Appendix A. Comments on the PIKL Estimator

### A.1 Spectral Methods and PIKL

The Fourier approximation on which the PIKL algorithm relies resembles usual spectral methods. Spectral methods are a class of numerical techniques used to solve PDEs by representing the solution as a sum of basis functions, typically trigonometric (Fourier series) or polynomial (Chebyshev or Legendre polynomials). These methods are particularly powerful for problems with smooth solutions and periodic or well-behaved boundary conditions (e.g., Canuto et al., 2007). However, the basis functions used in spectral and pseudo-spectral methods must be specifically tailored to the domain $\Omega$, the differential operator $D$, and the boundary conditions. This customization ensures that the method effectively captures the characteristics of the problem being solved. For example, the Fourier basis is unable to accurately reconstruct non-periodic functions on a square domain, leading to the Gibbs phenomenon at points of periodic discontinuity. A natural solution to this problem is to extend the solution of the PDE from the domain $\Omega$ to a simpler domain that admits a known spectral basis (e.g., Matthysen and Huybrechs, 2016, for Fourier basis extension). If the solution of the PDE on $\Omega$ can be extended to a solution of the same PDE on the extended domain, it becomes possible to apply a spectral method directly to the extended domain (e.g., Badea and Daripa, 2001; Lui, 2009).
However, the PDE must satisfy certain regularity conditions (e.g., ellipticity), and there must be a method to implement the boundary conditions on $\Omega$ instead of on the boundary of the extended domain. In this article, we take a slightly different approach. Although we extend $\Omega\subseteq[-L,L]^d$ to $[-2L,2L]^d$, we impose the PDE only on $\Omega$ and not on the entire extended domain $[-2L,2L]^d$. Also, unlike spectral methods, we do not require that $D(\hat f^{\mathrm{PIKL}}) = 0$. Instead, to ensure that the problem is well-posed, we regularize the PIML problem using the Sobolev norm of the periodic extension. This Tikhonov regularization is a conventional approach in kernel learning and is known to resemble spectral methods because it acts as a low-pass filter (see, e.g., Caponnetto and De Vito, 2007). However, given a kernel, it is non-trivial to identify the basis of orthogonal functions that diagonalizes it. The main contribution of this article is to establish an explicit connection between the Fourier basis and the PIML kernel, leading to surprisingly simple formulas for the kernel matrix $M_m$.

### A.2 Choice of the Extended Domain

Embedding $\Omega$ in a toroidal structure is necessary to consider periodic functions, which is a requirement for our Fourier expansion. Let $L_T$ define the length of the torus, i.e., such that an extension to $[-L_T,L_T]$ is considered. Since we assume that $\Omega\subseteq[-L,L]^d$, we necessarily have $L_T > L$. This condition being satisfied, any value of $L_T > L$ would be admissible for the PIKL framework. However, the choice of $L_T$ may impact the algorithmic performance. In particular, the basis functions $\Phi_k(x) = \exp(i\frac{\pi}{L_T}\langle k,x\rangle)$ depend on $L_T$. With a fixed number $m$ of Fourier modes, we see that for all $k$ such that $\|k\|_\infty\le m$, and for all $x\in\Omega$, $\lim_{L_T\to\infty}\Phi_k(x) = 1$. This means that, in the case of an excessively large torus (when $L_T$ grows), our PIKL estimator will only be able to learn constant functions. Therefore, $L_T$ should not be too large relative to $L$.

Letting $L_T$ be close to $L$, i.e., $L_T = (1+\varepsilon)L$, is not a good idea either. To see this, consider the case where $d = 1$ and $\Omega = [-1,1]$. Then, if $\varepsilon$ is small, the periodic extension of $f^\star$ to $[-1-\varepsilon,1+\varepsilon]$ will admit an exploding Sobolev norm. Indeed, for any periodic function $f\in H^1_{\mathrm{per}}([-1-\varepsilon,1+\varepsilon])$,

$$\|f\|^2_{H^1_{\mathrm{per}}([-1-\varepsilon,1+\varepsilon])} = \int_{[-1-\varepsilon,1+\varepsilon]}f^2 + (f')^2\ \ge\ \int_{[-1-\varepsilon,1+\varepsilon]}(f')^2\ \ge\ \int_{[-\varepsilon,\varepsilon]}(f'(x+1+\varepsilon))^2dx.$$

Then, the Cauchy-Schwarz inequality states that

$$\int_{[-\varepsilon,\varepsilon]}(f'(x+1+\varepsilon))^2dx\cdot\int_{[-\varepsilon,\varepsilon]}1\,dx\ \ge\ \Big(\int_{[-\varepsilon,\varepsilon]}f'(x+1+\varepsilon)\,dx\Big)^2,$$

meaning that

$$\int_{[-\varepsilon,\varepsilon]}(f'(x+1+\varepsilon))^2dx\ \ge\ \frac{1}{2\varepsilon}\,[f(1)-f(-1)]^2.$$

All in all, denoting by $E(f^\star)$ the extension of $f^\star$ to $[-1-\varepsilon,1+\varepsilon]$, we deduce that $\|E(f^\star)\|^2_{H^1_{\mathrm{per}}([-1-\varepsilon,1+\varepsilon])}\ge(2\varepsilon)^{-1}(f^\star(1)-f^\star(-1))^2$. This computation shows that taking the extension torus to be $[-1-\varepsilon,1+\varepsilon]$ results in the Sobolev norm $\|f\|^2_{H^1_{\mathrm{per}}([-1-\varepsilon,1+\varepsilon])}$ introducing a bias towards functions satisfying $f^\star(1) = f^\star(-1)$, i.e., favoring periodic functions on $\Omega$. The choice of an optimal constant $L_T$ depending on $L$ is non-trivial, and closely relates to computing the constant in the Sobolev embedding $H^s(\Omega)\hookrightarrow H^s_{\mathrm{per}}([-L_T,L_T])$, which is known to be a difficult problem in functional analysis. Given that $L_T = 2L$ strikes a balance by avoiding both pathological behaviors discussed earlier, we have adopted this choice throughout the paper.

### A.3 Reproducing Property

Here, we formally prove that the two properties (i) $f(x) = \langle M_m^{1/2}z, M_m^{-1/2}\Phi_m(x)\rangle_{\mathbb{C}^{(2m+1)^d}}$ and (ii) $\|f\|^2_{\mathrm{RKHS}} = \langle z, M_mz\rangle_{\mathbb{C}^{(2m+1)^d}} = \|M_m^{1/2}z\|_2^2$ are sufficient to show that minimizing $\overline{R}_n$ is a kernel method.
From (i) and (ii), we deduce that the feature map is $x\mapsto M_m^{-1/2}\Phi_m(x)$. The kernel is thus necessarily given by $K(x,y) = \langle M_m^{-1/2}\Phi_m(x), M_m^{-1/2}\Phi_m(y)\rangle_{\mathbb{C}^{(2m+1)^d}}$. From (ii), we deduce that the RKHS inner product is given by $\langle z, z'\rangle_{\mathrm{RKHS}} = \langle M_m^{1/2}z, M_m^{1/2}z'\rangle_{\mathbb{C}^{(2m+1)^d}}$, so that $\|z\|^2_{\mathrm{RKHS}} = \|M_m^{1/2}z\|_2^2$. The reproducing property is then a consequence of (i) and (ii). Indeed, let $x\in\Omega$. We know that $\langle f, K(x,\cdot)\rangle_{\mathrm{RKHS}} = \langle M_m^{1/2}z, M_m^{1/2}z'\rangle_{\mathbb{C}^{(2m+1)^d}}$, where $z$ is the Fourier vector of $f$ and $z'$ the Fourier vector of $K(x,\cdot)$. Since the kernel $K$ results from $K(x,y) = \langle M_m^{-1/2}\Phi_m(x), M_m^{-1/2}\Phi_m(y)\rangle_{\mathbb{C}^{(2m+1)^d}} = \langle M_m^{-1}\Phi_m(x), \Phi_m(y)\rangle_{\mathbb{C}^{(2m+1)^d}}$, we know that $z' = M_m^{-1}\Phi_m(x)$. This means that

$$\langle f, K(x,\cdot)\rangle_{\mathrm{RKHS}} = \langle M_m^{1/2}z, M_m^{1/2}M_m^{-1}\Phi_m(x)\rangle_{\mathbb{C}^{(2m+1)^d}} = \langle z, \Phi_m(x)\rangle_{\mathbb{C}^{(2m+1)^d}} = f(x),$$

which is the reproducing property.

## Appendix B. Fundamentals of Functional Analysis on Complex Hilbert Spaces

Let $L > 0$ and $d\in\mathbb{N}^*$. We define $L^2([-2L,2L]^d,\mathbb{C})$ as the space of complex-valued functions $f$ on the hypercube $[-2L,2L]^d$ such that $\int_{[-2L,2L]^d}|f|^2 < \infty$. The real part of $f$ is denoted by $\Re(f)$, and the imaginary part by $\Im(f)$, such that $f = \Re(f) + i\Im(f)$. Throughout the appendix, for the sake of clarity, we use the dot symbol $\cdot$ to represent functions. For example, $\|\cdot\|$ denotes the function $x\mapsto\|x\|$, and $\langle\cdot,\cdot\rangle$ stands for the function $(x,y)\mapsto\langle x,y\rangle$.

**Definition B.1 ($L^2$-space and 2-norm)** The separable Hilbert space $L^2([-2L,2L]^d,\mathbb{C})$ is associated with the inner product $\langle f,g\rangle = \int_{[-2L,2L]^d}f\bar g$ and the norm $\|f\|_2^2 = \int_{[-2L,2L]^d}|f|^2$.

**Definition B.2 (Periodic Sobolev spaces)** The periodic Sobolev space $H^s_{\mathrm{per}}([-2L,2L]^d,\mathbb{R})$ is the space of real functions $f(x) = \sum_{k\in\mathbb{Z}^d}z_k\exp(i\frac{\pi}{2L}\langle k,x\rangle)$ such that the Fourier coefficients $z_k$ satisfy $\sum_k|z_k|^2(1 + \|k\|^{2s}) < \infty$. The corresponding complex periodic Sobolev space is defined by $H^s_{\mathrm{per}}([-2L,2L]^d,\mathbb{C}) = H^s_{\mathrm{per}}([-2L,2L]^d,\mathbb{R})\oplus iH^s_{\mathrm{per}}([-2L,2L]^d,\mathbb{R})$. It is the space of complex-valued functions $f(x) = \sum_{k\in\mathbb{Z}^d}z_k\exp(i\frac{\pi}{2L}\langle k,x\rangle)$ such that $\sum_k|z_k|^2(1 + \|k\|^{2s}) < \infty$.

We recall that, given two Hilbert spaces $H_1$ and $H_2$, an operator is a linear function from $H_1$ to $H_2$.

**Definition B.3 (Operator norm)** (e.g., Brezis, 2010, Section 2.6) Let $O: L^2([-2L,2L]^d,\mathbb{C})\to L^2([-2L,2L]^d,\mathbb{C})$ be an operator. Its operator norm $|||O|||_2$ is defined by

$$|||O|||_2 = \sup_{\substack{g\in L^2([-2L,2L]^d,\mathbb{C})\\ \|g\|_2 = 1}}\|Og\|_2 = \sup_{\substack{g\in L^2([-2L,2L]^d,\mathbb{C})\\ g\neq 0}}\|g\|_2^{-1}\|Og\|_2.$$

The operator norm is sub-multiplicative, i.e., $|||O_1O_2|||_2\le|||O_1|||_2\,|||O_2|||_2$.

**Definition B.4 (Adjoint)** Let $(H,\langle\cdot,\cdot\rangle_H)$ be a Hilbert space and $O: H\to H$ be an operator. The adjoint $O^\star$ of $O$ is the unique operator such that $\forall f,g\in H$, $\langle f, Og\rangle_H = \langle O^\star f, g\rangle_H$. If $H = \mathbb{R}^d$ with the canonical scalar product, then $O^\star$ is the $d\times d$ matrix $O^\star = O^T$. If $H = \mathbb{C}^d$ with the canonical sesquilinear inner product, then $O^\star$ is the $d\times d$ matrix $O^\star = \bar O^T$.

**Definition B.5 (Hermitian operator)** Let $H$ be a Hilbert space and $O: H\to H$ be an operator. The operator $O$ is said to be Hermitian if $O = O^\star$.

**Theorem B.6 (Spectral theorem)** (e.g., Rudin, 1991, Theorems 12.29 and 12.30) Let $O$ be a positive Hermitian compact operator. Then $O$ is diagonalizable on a Hilbert basis with positive eigenvalues that tend to zero. We denote its eigenvalues, ordered in decreasing order, by $\sigma(O) = (\sigma^{\downarrow}_k(O))_{k\in\mathbb{N}^*}$.

We emphasize that, given an invertible positive self-adjoint compact operator $O$ and its inverse $O^{-1}$, the eigenvalues of $O^{-1}$ can also be ordered in increasing order, i.e.,

$$\sigma(O^{-1}) = (\sigma^{\uparrow}_k(O^{-1}))_{k\in\mathbb{N}^*} = (\sigma^{\downarrow}_k(O)^{-1})_{k\in\mathbb{N}^*}. \tag{8}$$

**Theorem B.7 (Courant-Fischer min-max theorem)** (Brezis, 2010, Problem 37) Let $O: H\to H$ be a positive Hermitian compact operator.
Theorem B.7 (Courant-Fischer min-max theorem) (Brezis, 2010, Problem 37) Let $O : H \to H$ be a positive Hermitian compact operator. Then
\[
\sigma_k(O) = \max_{\substack{H' \subseteq H \\ \dim H' = k}}\ \min_{\substack{g \in H' \\ \|g\|_2 = 1}} \langle g, Og\rangle.
\]
If $O$ is injective, then
\[
\sigma_k(O^{-1}) = \min_{\substack{H' \subseteq O(H) \\ \dim H' = k}}\ \max_{\substack{g \in H' \\ \|g\|_2 = 1}} \langle g, O^{-1}g\rangle.
\]
Interestingly, if $O$ is a positive Hermitian compact operator, then Theorem B.7 shows that $|||O|||_2$ equals its largest eigenvalue.

Definition B.8 (Orthogonal projection on $H_m$) We let $\Pi_m : H^s_{\mathrm{per}}([-2L,2L]^d,\mathbb{C}) \to H_m$ be the orthogonal projection with respect to $\langle\cdot,\cdot\rangle$, i.e., for all $f \in H^s_{\mathrm{per}}([-2L,2L]^d,\mathbb{C})$,
\[
\Pi_m f = \sum_{\|k\|_\infty \le m} \Big(\frac{1}{(4L)^d}\int_{[-2L,2L]^d} \exp\Big(-\frac{i\pi}{2L}\langle k, x\rangle\Big) f(x)\,dx\Big)\, \exp\Big(\frac{i\pi}{2L}\langle k, \cdot\rangle\Big).
\]
Note that $\Pi_m$ is Hermitian and that, for all $f \in L^2([-2L,2L]^d,\mathbb{C})$, $\lim_{m\to\infty} \|f - \Pi_m f\|_2 = 0$.

Appendix C. Theoretical Results for PIKL

C.1 Detailed Computation of the Fourier Expansion of the Differential Penalty

The formula relating $\|D(f)\|_{L^2(\Omega)}$ to the Fourier coefficients of $f$ follows from (i) expanding $f$ in the Fourier basis, (ii) leveraging the linearity of $D$, (iii) applying the property $D(x \mapsto e^{\frac{i\pi}{2L}\langle k,x\rangle}) = (x \mapsto P(k)e^{\frac{i\pi}{2L}\langle k,x\rangle})$, and (iv) recognizing that $|z|^2 = z\bar{z}$. From these steps, we deduce that
\[
\begin{aligned}
\|D(f)\|^2_{L^2(\Omega)} &= \int_\Omega |D(f)(x)|^2\,dx = \int_\Omega \Big|\sum_{\|j\|_\infty\le m} \frac{z_j P(j)}{(4L)^{d/2}}\, e^{\frac{i\pi}{2L}\langle j,x\rangle}\Big|^2 dx\\
&= \int_\Omega \Big(\sum_{\|j\|_\infty\le m} \frac{z_j P(j)}{(4L)^{d/2}}\, e^{\frac{i\pi}{2L}\langle j,x\rangle}\Big)\Big(\sum_{\|k\|_\infty\le m} \frac{\bar z_k \bar P(k)}{(4L)^{d/2}}\, e^{-\frac{i\pi}{2L}\langle k,x\rangle}\Big)\,dx\\
&= \frac{1}{(4L)^d}\sum_{\|j\|_\infty\le m}\ \sum_{\|k\|_\infty\le m} z_j \bar z_k\, P(j)\bar P(k) \int_\Omega e^{-\frac{i\pi}{2L}\langle k-j,\,x\rangle}\,dx.
\end{aligned}
\]

C.2 Proof of Proposition 2.4

For the hypercube $\Omega = [-L,L]^d$,
\[
\frac{1}{(4L)^d}\int_{[-L,L]^d} e^{-\frac{i\pi}{2L}\langle k,x\rangle}\,dx = \prod_{j=1}^d \frac{1}{4L}\int_{-L}^{L} e^{-\frac{i\pi}{2L}k_j x}\,dx = \prod_{j=1}^d \frac{1}{4L}\Big[\frac{2iL}{\pi k_j}\, e^{-\frac{i\pi}{2L}k_j x}\Big]_{-L}^{L} = \prod_{j=1}^d \frac{\sin(\pi k_j/2)}{\pi k_j},
\]
with the convention $\sin(\pi k_j/2)/(\pi k_j) = 1/2$ when $k_j = 0$. The characteristic function of the Euclidean ball is computed in Bracewell (2000, Table 13.4).

C.3 Operations on Characteristic Functions

Proposition C.1 (Operations on characteristic functions) Consider $d\in\mathbb{N}^*$, $L > 0$, and $\Omega \subseteq [-L,L]^d$.
- Let $a \in [-1,1]$. Then $a\Omega \subseteq [-L,L]^d$ and $F_{a\Omega}(k) = |a|^d F_\Omega(ak)$.
- Let $\tilde\Omega \subseteq [-L,L]^d$ be a domain such that $\Omega\cap\tilde\Omega = \emptyset$. Then $\Omega\cup\tilde\Omega \subseteq [-L,L]^d$ and $F_{\Omega\cup\tilde\Omega}(k) = F_\Omega(k) + F_{\tilde\Omega}(k)$.
- Assume that $\Omega \subseteq [-L/2,L/2]^d$, and let $z\in\mathbb{R}^d$ be such that $\|z\|_\infty < L/2$. Then $\Omega + z \subseteq [-L,L]^d$ and $F_{\Omega+z}(k) = F_\Omega(k)\exp(-\frac{i\pi}{2L}\langle k, z\rangle)$.
- Assume that $\Omega = \Omega_1\times\Omega_2$, where $\Omega_1 \subseteq [-L,L]^{d_1}$, $\Omega_2\subseteq[-L,L]^{d_2}$, and $d_1+d_2 = d$. Then $F_\Omega(k) = F_{\Omega_1}(k_1,\dots,k_{d_1})\,F_{\Omega_2}(k_{d_1+1},\dots,k_d)$.

C.4 Operator Extensions

Definition C.2 (Projection on $\Omega$) The projection $C : L^2([-2L,2L]^d,\mathbb{C}) \to L^2([-2L,2L]^d,\mathbb{C})$ on $\Omega$ is defined by $C(f) = \mathbf{1}_\Omega f$.

Definition C.3 (Operator extensions) The operators $C_m : H_m \to H_m$, $M_m : H_m\to H_m$, and $M_m^{-1} : H_m\to H_m$ can be extended to $L^2([-2L,2L]^d,\mathbb{C})$ by $\bar{C}_m = \Pi_m C_m \Pi_m$, $\bar{M}_m = \Pi_m M_m \Pi_m$, and $\overline{M_m^{-1}} = \Pi_m M_m^{-1}\Pi_m$. From now on, we consider the extensions of these operators (keeping the same symbols), allowing us to express equivalently
\[
|||M_m^{-1}|||_2 = \sup_{\substack{g\in H_m\\ \|g\|_2 = 1}} \|M_m^{-1}g\|_2 = \sup_{\substack{g\in L^2([-2L,2L]^d)\\ \|g\|_2=1}} \|M_m^{-1}g\|_2.
\]
It is important to note that the extended operator $M_m^{-1}$ is no longer the inverse of the extended operator $M_m$.

Proposition C.4 (Compact operator extension) Let $O$ be a positive Hermitian compact operator on $L^2([-2L,2L]^d,\mathbb{R})$. Then its unique extension $\bar{O}$ to $L^2([-2L,2L]^d,\mathbb{C})$ is a positive Hermitian compact operator with the same real eigenfunctions and positive eigenvalues.

Proof Since $\bar{O}$ is $\mathbb{C}$-linear, we necessarily have $\bar{O}(f) = O(\Re(f)) + iO(\Im(f))$. Therefore, the extension is unique. Since $O$ is compact, $\bar{O}$ is also compact. According to Theorem B.6, the operator $O$ is diagonalizable in a Hilbert basis $(f_k)_{k\in\mathbb{N}^*}$. Thus, for all $f\in L^2([-2L,2L]^d,\mathbb{R})$,
\[
O(f) = \sum_{k\in\mathbb{N}^*}\sigma_k(O)\,\langle f, f_k\rangle_{L^2([-2L,2L]^d,\mathbb{R})}\, f_k.
\]
Thus, for all $f\in L^2([-2L,2L]^d,\mathbb{C})$,
\[
\bar{O}(f) = \sum_{k\in\mathbb{N}^*}\sigma_k(O)\big(\langle \Re(f), f_k\rangle_{L^2([-2L,2L]^d,\mathbb{R})} + i\langle\Im(f), f_k\rangle_{L^2([-2L,2L]^d,\mathbb{R})}\big) f_k = \sum_{k\in\mathbb{N}^*}\sigma_k(O)\,\langle f, f_k\rangle_{L^2([-2L,2L]^d,\mathbb{C})}\, f_k.
\]
This formula shows that $\bar{O}$ is Hermitian and diagonalizable with the same real eigenfunctions and positive eigenvalues as $O$.

Recall that $O_n$ is the operator $O_n = \lim_{m\to\infty} M_m^{-1}$, where the limit is taken in the sense of the operator norm (see Proposition B.2 of Doumèche et al., 2024).
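To make the Fourier-domain objects of Sections C.1 and C.2 concrete, the following sketch assembles, in dimension $d = 1$ and for the operator $D = \frac{d}{dx}$ on $\Omega = [-L,L]$, the matrix $M_m$ (Sobolev penalty plus differential penalty, using the characteristic function computed in C.2) and evaluates the resulting kernel. This is an illustration only: the basis normalization and the exact constants are assumptions of this sketch, not a verbatim transcription of the paper's implementation.

```python
import numpy as np

# Illustrative Fourier-domain construction of the PIKL matrices (d = 1,
# D = d/dx, Omega = [-L, L]); constants and normalizations are assumptions.
L, s, m = 1.0, 2, 30
lam, mu = 1e-2, 1.0          # lambda_n and mu_n
ks = np.arange(-m, m + 1)    # Fourier modes k = -m, ..., m

def F_box(k):
    """Characteristic function of [-L, L]: sin(pi k / 2) / (pi k), 1/2 at k = 0."""
    k = np.asarray(k)
    return np.where(k == 0, 0.5,
                    np.sin(np.pi * k / 2) / (np.pi * np.where(k == 0, 1, k)))

P = 1j * np.pi * ks / (2 * L)                         # Fourier symbol of D = d/dx
sobolev = lam * np.diag(1.0 + np.abs(ks) ** (2 * s))  # lambda_n * Sobolev penalty
penalty = mu * np.outer(P.conj(), P) * F_box(ks[:, None] - ks[None, :])
M_m = sobolev + penalty                               # Hermitian positive definite

def kernel(x, y):
    """K(x, y) = <M_m^{-1} Phi_m(x), Phi_m(y)> with Phi_m(x)_k = e^{i pi k x / (2L)}."""
    phi_x = np.exp(1j * np.pi * ks * x / (2 * L))
    phi_y = np.exp(1j * np.pi * ks * y / (2 * L))
    return (phi_y.conj() @ np.linalg.solve(M_m, phi_x)).real

print(kernel(0.3, 0.3), kernel(0.3, -0.2))
```

Note that the differential penalty factorizes as $\mathrm{diag}(P)^\star\, T\, \mathrm{diag}(P)$, with $T$ the Toeplitz matrix $(F_\Omega(k-j))_{k,j}$, which is what makes the assembly of $M_m$ inexpensive.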
Definition C.5 (Operator $M$) Proposition C.4 shows that the operator $O_n$ can be extended to $L^2([-2L,2L]^d,\mathbb{C})$. We denote the extension of $O_n$ by $M^{-1}$. The uniqueness of the extension in Proposition C.4 implies that the extension of the operator $CO_nC : L^2([-2L,2L]^d) \to L^2([-2L,2L]^d)$ to $\mathbb{C}$ is indeed $CM^{-1}C : L^2([-2L,2L]^d,\mathbb{C}) \to L^2([-2L,2L]^d,\mathbb{C})$. Proposition C.4 shows that $CO_nC$ has the same eigenvalues as $CM^{-1}C$.

C.5 Convergence of $M_m^{-1}$

Lemma C.6 (Bounding the spectrum of $M_m^{-1}$) Let $m\in\mathbb{N}^*$. Then, for all $k\in\mathbb{N}^*$, $\sigma_k(M_m^{-1}) \le \sigma_k(M^{-1})$.

Proof Let $f\in H_m$. Then
\[
\langle f, M_m f\rangle = \langle f, Mf\rangle = \lambda_n\|f\|^2_{H^s_{\mathrm{per}}([-2L,2L]^d)} + \mu_n\|D(f)\|^2_{L^2(\Omega)}.
\]
Thus, using Theorem B.7, we deduce that
\[
\sigma_k(M_m) = \min_{\substack{H\subseteq H_m\\ \dim H = k}}\ \max_{\substack{g\in H\\ \|g\|_2=1}} \langle g, M_m g\rangle = \min_{\substack{H\subseteq H_m\\ \dim H = k}}\ \max_{\substack{g\in H\\ \|g\|_2=1}} \langle g, M g\rangle \ge \min_{\substack{H\subseteq H^s_{\mathrm{per}}([-2L,2L]^d)\\ \dim H = k}}\ \max_{\substack{g\in H\\ \|g\|_2=1}} \langle g, M g\rangle = \sigma_k(M).
\]
From (8), we deduce that $\sigma_k(M_m^{-1}) \le \sigma_k(M^{-1})$.

Lemma C.7 (Spectral convergence of $M_m$) For all $k\in\mathbb{N}^*$, one has $\lim_{m\to\infty}\sigma_k(M_m^{-1}) = \sigma_k(M^{-1})$.

Proof By continuity of the RKHS norm $f\mapsto\langle f, Mf\rangle$ on $H^s_{\mathrm{per}}([-2L,2L]^d,\mathbb{C})$ (Doumèche et al., 2024, Proposition B.1), we know that, for all functions $f\in H^s_{\mathrm{per}}([-2L,2L]^d,\mathbb{C})$, the quantity $\lambda_n\|\Pi_m(f)\|^2_{H^s_{\mathrm{per}}([-2L,2L]^d,\mathbb{C})} + \mu_n\|D(\Pi_m(f))\|^2_{L^2(\Omega,\mathbb{C})}$ converges to $\lambda_n\|f\|^2_{H^s_{\mathrm{per}}([-2L,2L]^d,\mathbb{C})} + \mu_n\|D(f)\|^2_{L^2(\Omega,\mathbb{C})}$ as $m$ goes to infinity. Thus,
\[
\forall f\in H^s_{\mathrm{per}}([-2L,2L]^d,\mathbb{C}),\quad \lim_{m\to\infty}\langle f, (M - M_m)f\rangle = 0.
\]
Next, consider $f_1,\dots,f_k$ the eigenfunctions of $M$ associated with the ordered eigenvalues $\sigma_1(M),\dots,\sigma_k(M)$. Since, for any $1\le j,\ell\le k$, we have $\lim_m \langle f_j+f_\ell, M_m(f_j+f_\ell)\rangle = \langle f_j+f_\ell, M(f_j+f_\ell)\rangle$ and $\lim_m\langle f_j+f_\ell, M_m(f_j+f_\ell)\rangle = \lim_m\langle f_j, M_m f_j\rangle + \lim_m\langle f_\ell, M_m f_\ell\rangle + 2\lim_m\Re(\langle f_j, M_m f_\ell\rangle)$, we deduce that $\lim_m\Re(\langle f_j, M_m f_\ell\rangle) = \Re(\langle f_j, Mf_\ell\rangle)$. Using the same argument by developing $\langle f_j + if_\ell, M_m(f_j+if_\ell)\rangle$ shows that $\lim_m\Im(\langle f_j, M_m f_\ell\rangle) = \Im(\langle f_j, Mf_\ell\rangle)$. Overall,
\[
\forall 1\le j,\ell\le k,\quad \lim_{m\to\infty}\langle f_j, M_m f_\ell\rangle = \langle f_j, Mf_\ell\rangle. \tag{9}
\]
Now, observe that ($g\in\mathrm{Span}(f_1,\dots,f_k)$ and $\|g\|_2 = 1$) $\iff$ ($\exists (a_1,\dots,a_k)\in\mathbb{C}^k$, $g = \sum_{j=1}^k a_j f_j$ and $\sum_{j=1}^k|a_j|^2 = 1$). Therefore,
\[
\max_{\substack{g\in\mathrm{Span}(f_1,\dots,f_k)\\ \|g\|_2=1}} |\langle g, M_m g\rangle - \langle g, Mg\rangle| \le \max_{\|a\|_2=1}\sum_{i,j=1}^k |a_i a_j|\,|\langle f_i, (M_m - M)f_j\rangle| \le k\max_{1\le i,j\le k}|\langle f_i, (M_m-M)f_j\rangle| \xrightarrow[m\to\infty]{} 0
\]
according to (9). It follows that
\[
\lim_{m\to\infty}\ \max_{\substack{g\in\mathrm{Span}(\Pi_m(f_1),\dots,\Pi_m(f_k))\\ \|g\|_2=1}} \langle g, M_m g\rangle = \lim_{m\to\infty}\ \max_{\substack{g\in\mathrm{Span}(f_1,\dots,f_k)\\ \|g\|_2=1}} \langle g, M_m g\rangle = \max_{\substack{g\in\mathrm{Span}(f_1,\dots,f_k)\\ \|g\|_2=1}} \langle g, Mg\rangle = \sigma_k(M). \tag{10}
\]
Note that $\mathrm{Span}(\Pi_m(f_1),\dots,\Pi_m(f_k)) \subseteq H_m$. Moreover, for $m$ large enough, we have that $\dim\mathrm{Span}(\Pi_m(f_1),\dots,\Pi_m(f_k)) = k$. Therefore, according to Theorem B.7,
\[
\sigma_k(M_m) = \min_{\substack{H\subseteq H_m\\ \dim H=k}}\ \max_{\substack{g\in H\\ \|g\|_2=1}} \langle g, M_m g\rangle \le \max_{\substack{g\in\mathrm{Span}(\Pi_m(f_1),\dots,\Pi_m(f_k))\\ \|g\|_2=1}} \langle g, M_m g\rangle.
\]
Combining this inequality with identity (10) shows that $\limsup_m \sigma_k(M_m)\le\sigma_k(M)$. Equivalently, $\liminf_m \sigma_k(M_m^{-1})\ge\sigma_k(M^{-1})$. Finally, by Lemma C.6, we have $\sigma_k(M_m^{-1})\le\sigma_k(M^{-1})$. We conclude that $\lim_m\sigma_k(M_m^{-1}) = \sigma_k(M^{-1})$.
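Lemmas C.6 and C.7 have a transparent finite-dimensional analogue: compressing a Hermitian positive definite matrix to a principal subspace can only increase its $k$-th smallest eigenvalue (Cauchy interlacing, the matrix counterpart of Lemma C.6), and the compressed spectrum converges to the full spectrum as the subspace exhausts the space (the counterpart of Lemma C.7). A minimal NumPy check, with a generic random matrix standing in for $M$:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200
A = rng.standard_normal((N, N))
M = A @ A.T / N + np.eye(N)                 # positive definite stand-in for M

full = np.sort(np.linalg.eigvalsh(M))       # eigenvalues in increasing order
k = 5                                       # compare the k smallest eigenvalues
for m in (10, 50, 100, 200):
    compressed = np.sort(np.linalg.eigvalsh(M[:m, :m]))  # compression to m dims
    # Analogue of Lemma C.6: the compressed sigma_k dominates sigma_k(M).
    assert np.all(compressed[:k] >= full[:k] - 1e-12)
    # Analogue of Lemma C.7: the gap vanishes as m grows to N.
    print(m, np.abs(compressed[:k] - full[:k]).max())
```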
Lemma C.8 (Eigenfunctions convergence) Let $(f_{j,m})_{j\in\mathbb{N}^*}$ be the eigenvectors of $M_m$ associated with the eigenvalues $(\sigma_j(M_m))_{j\in\mathbb{N}^*}$. Let $E_j = \ker(M - \sigma_j(M)\,\mathrm{Id})$. Then
\[
\forall j\in\mathbb{N}^*,\quad \lim_{m\to\infty}\ \min_{y\in E_j}\|f_{j,m} - y\|_2 = 0.
\]
Proof Let $(f_j)_{j\in\mathbb{N}^*}$ be the eigenvectors of $M$ associated with the eigenvalues $(\sigma_j(M))_{j\in\mathbb{N}^*}$. The proof proceeds by contradiction. Assume that the lemma is false, and consider the minimum integer $p\in\mathbb{N}^*$ such that $\limsup_m\min_{y\in E_p}\|f_{p,m} - y\|_2 > 0$. Let $k_1 < p$ be the largest integer such that $\sigma_{k_1}(M) < \sigma_p(M)$, and let $k_2 > p$ be the smallest integer such that $\sigma_{k_2}(M) > \sigma_p(M)$. Observe that $E_p = \mathrm{Span}(f_{k_1+1},\dots,f_{k_2-1})$. Let $k_1 < j < k_2$. We know that $\langle f_j, M_m f_j\rangle = \sum_{\ell\in\mathbb{N}^*}\sigma_\ell(M_m)|\langle f_{\ell,m}, f_j\rangle|^2$ from diagonalizing $M_m$. The minimality of $p$ ensures that, for all $\ell\le k_1$, $\lim_m\min_{y\in E_\ell}\|f_{\ell,m} - y\|_2 = 0$. Since $E_\ell\subseteq\mathrm{Span}(f_1,\dots,f_{k_1})$ for $\ell\le k_1$, and since $f_j$ is orthogonal to $\mathrm{Span}(f_1,\dots,f_{k_1})$, we deduce that $\forall\ell\le k_1$, $\lim_m\langle f_{\ell,m}, f_j\rangle = 0$. Thus,
\[
\sum_{\ell > k_1}|\langle f_{\ell,m}, f_j\rangle|^2 = 1 + o_m(1). \tag{11}
\]
Moreover, by Lemma C.7, there exists $\varepsilon > 0$ such that, for $m$ large enough and for all $\ell\ge k_2$, $\sigma_\ell(M_m)\ge\sigma_j(M)+\varepsilon$. Since $\sigma_\ell(M_m)\ge\sigma_{k_1+1}(M_m) = \sigma_j(M) + o_m(1)$ for all $\ell > k_1$, it follows that
\[
\langle f_j, M_m f_j\rangle \ge \big(\sigma_j(M)+o_m(1)\big)\Big(1 - \sum_{\ell\ge k_2}|\langle f_{\ell,m},f_j\rangle|^2\Big) + \big(\sigma_j(M)+\varepsilon\big)\sum_{\ell\ge k_2}|\langle f_{\ell,m},f_j\rangle|^2 + o_m(1).
\]
Since, by (9), $\lim_m\langle f_j, M_m f_j\rangle = \langle f_j, Mf_j\rangle = \sigma_j(M)$, we deduce that $\lim_m\sum_{\ell\ge k_2}|\langle f_{\ell,m}, f_j\rangle|^2 = 0$. Combined with (11), this shows that, for all $k_1 < j < k_2$, $\lim_m\sum_{k_1<\ell<k_2}|\langle f_{\ell,m}, f_j\rangle|^2 = 1$, i.e., the orthogonal projection of $f_j$ onto $\mathrm{Span}(f_{\ell,m},\ k_1<\ell<k_2)$ converges to $f_j$. Since this span has dimension $k_2-k_1-1 = \dim E_p$, it converges to $E_p$, and in particular $\lim_m\min_{y\in E_p}\|f_{p,m} - y\|_2 = 0$. This contradicts $\limsup_m\min_{y\in E_p}\|f_{p,m}-y\|_2 > 0$.

Lemma C.9 (Convergence of $M_m^{-1}$) One has $\lim_{m\to\infty}|||M^{-1} - M_m^{-1}|||_2 = 0$.

Proof Let $(f_{j,m})_{j\in\mathbb{N}^*}$ be the eigenvectors of $M_m$, each associated with the corresponding eigenvalue in $(\sigma_j(M_m))_{j\in\mathbb{N}^*}$. Let $(f_j)_{j\in\mathbb{N}^*}$ be the eigenvectors of $M$, each associated with the eigenvalues $(\sigma_j(M))_{j\in\mathbb{N}^*}$. By Lemma C.6, $\sigma_j(M_m^{-1})\le\sigma_j(M^{-1})$; by Lemma C.7, $\lim_m\sigma_j(M_m^{-1}) = \sigma_j(M^{-1})$; and by Lemma C.8, $\lim_m\min_{y\in E_j}\|f_{j,m}-y\|_2 = 0$. Notice that $M_m^{-1} = \sum_{j\in\mathbb{N}^*}\sigma_j(M_m^{-1})\langle f_{j,m},\cdot\rangle f_{j,m}$ and that $M^{-1} = \sum_{j\in\mathbb{N}^*}\sigma_j(M^{-1})\langle f_j,\cdot\rangle f_j$. Let $g\in L^2([-2L,2L]^d,\mathbb{C})$ be such that $\|g\|_2=1$. Then
\[
\begin{aligned}
\|(M^{-1} - M_m^{-1})g\|_2 &\le \sum_{j\in\mathbb{N}^*}\big\|\sigma_j(M_m^{-1})\langle f_{j,m}, g\rangle f_{j,m} - \sigma_j(M^{-1})\langle f_j, g\rangle f_j\big\|_2\\
&\le \sum_{j\in\mathbb{N}^*}\big\|(\sigma_j(M_m^{-1})-\sigma_j(M^{-1}))\langle f_{j,m},g\rangle f_{j,m}\big\|_2 + \sum_{j\in\mathbb{N}^*}\big\|\sigma_j(M^{-1})\langle f_{j,m}-f_j, g\rangle f_{j,m}\big\|_2 + \sum_{j\in\mathbb{N}^*}\big\|\sigma_j(M^{-1})\langle f_j, g\rangle (f_{j,m}-f_j)\big\|_2.
\end{aligned}
\]
Since $\|f_{j,m}\|_2 = \|f_j\|_2 = 1$, it follows that $|\langle f_{j,m}, g\rangle|\le 1$. Additionally, by the Cauchy-Schwarz inequality, $|\langle f_j, g\rangle|\le 1$ and $|\langle f_{j,m}-f_j, g\rangle|\le\|f_{j,m}-f_j\|_2$. Thus, the above inequality can be simplified as
\[
\|(M^{-1}-M_m^{-1})g\|_2 \le \sum_{j\in\mathbb{N}^*}\big(|\sigma_j(M_m^{-1})-\sigma_j(M^{-1})| + 2\sigma_j(M^{-1})\|f_{j,m}-f_j\|_2\big).
\]
Hence,
\[
|||M^{-1}-M_m^{-1}|||_2 \le \sum_{j\in\mathbb{N}^*}\big(|\sigma_j(M_m^{-1})-\sigma_j(M^{-1})| + 2\sigma_j(M^{-1})\|f_{j,m}-f_j\|_2\big).
\]
Clearly, since $|\sigma_j(M_m^{-1})-\sigma_j(M^{-1})|\le 2\sigma_j(M^{-1})$ and $\|f_{j,m}-f_j\|_2\le 2$,
\[
|\sigma_j(M_m^{-1})-\sigma_j(M^{-1})| + 2\sigma_j(M^{-1})\|f_{j,m}-f_j\|_2 \le 6\sigma_j(M^{-1}).
\]
Moreover, $\sum_{j\in\mathbb{N}^*}\sigma_j(M^{-1}) < \infty$ (Doumèche et al., 2024, Proposition B.6). Thus, since we have that $\lim_m|\sigma_j(M_m^{-1})-\sigma_j(M^{-1})| = \lim_m\|f_{j,m}-f_j\|_2 = 0$, we conclude with the dominated convergence theorem that $\lim_m|||M^{-1}-M_m^{-1}|||_2 = 0$, as desired.

C.6 Operator Norms of $C_m$ and $C$

Lemma C.10 One has $|||C|||_2\le 1$ and $|||C_m|||_2\le 1$, for all $m\in\mathbb{N}^*$.

Proof Let $g\in L^2([-2L,2L]^d,\mathbb{C})$. Then, by definition, $Cg = \mathbf{1}_\Omega g$, and $\|Cg\|_2 = \|\mathbf{1}_\Omega g\|_2 \le \|g\|_2$. Therefore, $|||C|||_2\le 1$. Let $m\in\mathbb{N}^*$. Then, since $C_m : L^2([-2L,2L]^d,\mathbb{C})\to L^2([-2L,2L]^d,\mathbb{C})$ is a positive Hermitian compact operator, Theorem B.7 states that
\[
\sigma_1(C_m) = \max_{\substack{h\in L^2([-2L,2L]^d,\mathbb{C})\\ \|h\|_2=1}}\langle h, C_m h\rangle = \max_{\substack{h\in L^2([-2L,2L]^d,\mathbb{C})\\ \|h\|_2=1}}\|\mathbf{1}_\Omega\Pi_m h\|_2^2 \le \max_{\substack{h\in L^2([-2L,2L]^d,\mathbb{C})\\ \|h\|_2=1}}\|\Pi_m h\|_2^2 \le 1.
\]
Since $\sigma_1(C_m^2) = \sigma_1(C_m)^2\le 1$, we deduce that
\[
1\ge\sigma_1(C_m)^2 = \max_{\substack{h\in L^2([-2L,2L]^d,\mathbb{C})\\ \|h\|_2=1}}\langle h, C_m^2 h\rangle = \max_{\substack{h\in L^2([-2L,2L]^d,\mathbb{C})\\ \|h\|_2=1}}\|C_m h\|_2^2.
\]
This shows that $|||C_m|||_2\le 1$.

C.7 Proof of Theorem 3.1

Note that if $H$ is a linear subspace of $H_m$, then $C_m H$ is also a subspace of $H_m$, and $\dim H\ge\dim C_m H$. Therefore,
\[
\sigma_k(C_m M_m^{-1} C_m) = \max_{\substack{H\subseteq H_m\\ \dim H=k}}\ \min_{\substack{g\in H\\ \|g\|_2=1}}\langle g, C_m M_m^{-1}C_m g\rangle = \max_{\substack{H\subseteq H_m\\ \dim H=k}}\ \min_{\substack{g\in H\\ \|g\|_2=1}}\langle C_m g, M_m^{-1}(C_m g)\rangle \le \max_{\substack{H\subseteq H_m\\ \dim H=k}}\ \min_{\substack{g\in H\\ \|g\|_2=1}}\langle g, M_m^{-1} g\rangle = \sigma_k(M_m^{-1}).
\]
Moreover, according to Lemma C.9, one has $|||M_m^{-1}-M^{-1}|||_2\to 0$. Thus, $\sup_{\|g\|_2=1}\|(M_m^{-1}-M^{-1})g\|_2\to 0$. Using Lemma C.10, we see that
\[
\begin{aligned}
\|C_m M_m^{-1}C_m g - CM^{-1}Cg\|_2 &\le \|(C_m - C)M^{-1}Cg\|_2 + \|C_m(M_m^{-1}C_m - M^{-1}C)g\|_2\\
&\le |||(C_m-C)M^{-1}|||_2 + \|(M_m^{-1}C_m - M^{-1}C)g\|_2\\
&\le |||(C_m-C)M^{-1}|||_2 + \|(M_m^{-1}-M^{-1})C_m g\|_2 + \|M^{-1}(C_m-C)g\|_2\\
&\le |||(C_m-C)M^{-1}|||_2 + |||M_m^{-1}-M^{-1}|||_2 + |||M^{-1}(C_m-C)|||_2.
\end{aligned}
\]
Hence,
\[
|||C_m M_m^{-1}C_m - CM^{-1}C|||_2 \le |||(C_m-C)M^{-1}|||_2 + |||M_m^{-1}-M^{-1}|||_2 + |||M^{-1}(C_m-C)|||_2.
\]
By diagonalizing $M^{-1}$ and using the facts that $\sum_{\ell\in\mathbb{N}^*}\sigma_\ell(M^{-1}) < \infty$ and that $\lim_m\|(C_m-C)f\|_2 = 0$ for all $f\in L^2([-2L,2L]^d)$, it is easy to see that
\[
\lim_{m\to\infty}|||(C_m-C)M^{-1}|||_2 = \lim_{m\to\infty}|||M^{-1}(C_m-C)|||_2 = 0.
\]
Applying Lemma C.9, we deduce that $\lim_m|||C_m M_m^{-1}C_m - CM^{-1}C|||_2 = 0$. But, by Theorem B.7,
\[
\sigma_k(C_m M_m^{-1}C_m) = \max_{\substack{H\subseteq L^2([-2L,2L]^d,\mathbb{C})\\ \dim H=k}}\ \min_{\substack{g\in H\\ \|g\|_2=1}}\langle g, C_m M_m^{-1}C_m g\rangle \quad\text{and}\quad \sigma_k(CM^{-1}C) = \max_{\substack{H\subseteq L^2([-2L,2L]^d,\mathbb{C})\\ \dim H=k}}\ \min_{\substack{g\in H\\ \|g\|_2=1}}\langle g, CM^{-1}Cg\rangle.
\]
Clearly, for all $g\in L^2([-2L,2L]^d,\mathbb{C})$ with $\|g\|_2 = 1$,
\[
|\langle g, CM^{-1}Cg\rangle - \langle g, C_m M_m^{-1}C_m g\rangle| = |\langle g, (CM^{-1}C - C_m M_m^{-1}C_m)g\rangle| \le |||C_m M_m^{-1}C_m - CM^{-1}C|||_2.
\]
Hence,
\[
|\sigma_k(C_m M_m^{-1}C_m) - \sigma_k(CM^{-1}C)| \le |||C_m M_m^{-1}C_m - CM^{-1}C|||_2,
\]
and, in turn, $\lim_m\sigma_k(C_m M_m^{-1}C_m) = \sigma_k(CM^{-1}C)$. To conclude the proof, observe that, on the one hand,
\[
\big(1 + \sigma_k(C_m M_m^{-1}C_m)^{-1}\big)^{-1} = \frac{\sigma_k(C_m M_m^{-1}C_m)}{1 + \sigma_k(C_m M_m^{-1}C_m)} \le \sigma_k(C_m M_m^{-1}C_m) \le \sigma_k(M^{-1}),
\]
with $\sum_{k\in\mathbb{N}^*}\sigma_k(M^{-1}) < \infty$. On the other hand,
\[
\lim_{m\to\infty}\big(1 + \sigma_k(C_m M_m^{-1}C_m)^{-1}\big)^{-1} = \big(1 + \sigma_k(CM^{-1}C)^{-1}\big)^{-1},
\]
by continuity on $\mathbb{R}_+$ of the function $x\mapsto\frac{x}{1+x}$. Thus, applying the dominated convergence theorem, we are led to
\[
\lim_{m\to\infty}\sum_{\lambda\in\sigma(C_m M_m^{-1}C_m)}\frac{1}{1+\lambda^{-1}} = \sum_{\lambda\in\sigma(CM^{-1}C)}\frac{1}{1+\lambda^{-1}}.
\]

Appendix D. Experiments

D.1 Numerical Precision

Enabling high numerical precision is crucial for efficient kernel inversion. Setting the default precision to Float32 and Complex64 can lead to significant numerical errors when approximating the kernel. For example, consider the harmonic oscillator case with $d = 1$, $s = 2$, and the operator $D(u) = \frac{d^2 u}{dx^2} + \frac{du}{dx} + u$, with $\lambda_n = 0.01$ and $\mu_n = 1$. Figure 10 (left) shows the spectrum of $C_m M_m^{-1}C_m$ using Float32 precision, while Figure 10 (right) shows the same spectrum with Float64 precision. It is evident that with Float32, the diagonalization results in lower eigenvalues compared to Float64. In more physical terms, some of the energy of the matrix is lost when the last digits of the matrix coefficients are ignored. This leads to a problematic interpretation of the situation, as the Float64 estimation of the eigenvalues shows a clear convergence of the spectrum of $C_m M_m^{-1}C_m$, whereas the Float32 estimation appears to indicate divergence.

[Figure 10: Spectrum of $C_m M_m^{-1}C_m$ (log-log plot of eigenvalue against index, for $2m+1$ ranging from $10^{0.5}$ to $10^{4}$). Left: Float32 precision. Right: Float64 precision.]

D.2 Convergence of the Effective Dimension Approximation

The PIKL algorithm relies on a Fourier approximation of the PIML kernel, as developed in Section 2. The precision of this approximation is determined by the number $m$ of Fourier modes used to compute the kernel. However, determining an optimal value of $m$ for a specific regression problem is challenging: there is a trade-off between the accuracy of the kernel approximation, which improves with higher values of $m$, and the computational complexity of the algorithm, which also increases with $m$. An interesting tool to leverage here is the effective dimension, as it captures the underlying degrees of freedom of the PIML problem and, consequently, the precision of the method. Theorem 3.1 states that the estimation of the effective dimension on $H_m$ converges to the effective dimension on $H^s(\Omega)$ as $m$ increases to infinity. Therefore, the smallest value of $m$ at which the effective dimension stabilizes is a strong candidate for balancing accuracy and computational complexity, as illustrated by the sketch below.
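The stabilization heuristic can be implemented in a few lines. The following sketch is illustrative only: it reuses the $d = 1$, $D = d/dx$ construction and the normalization assumptions of the sketch in Appendix C, computes the effective dimension $\sum_{\lambda\in\sigma(C_m M_m^{-1}C_m)} 1/(1+\lambda^{-1})$ for increasing $m$, and reports when it stops moving.

```python
import numpy as np

# Sketch of the m-selection heuristic of Section D.2 (illustrative only;
# same d = 1 conventions and assumed normalizations as the sketch above).
def effective_dimension(m, L=1.0, s=2, lam=1e-2, mu=1.0):
    ks = np.arange(-m, m + 1)
    diff = ks[:, None] - ks[None, :]
    # F_Omega(k - j) for Omega = [-L, L]: sin(pi k / 2) / (pi k), 1/2 at k = 0.
    F = np.where(diff == 0, 0.5,
                 np.sin(np.pi * diff / 2) / (np.pi * np.where(diff == 0, 1, diff)))
    P = 1j * np.pi * ks / (2 * L)                      # symbol of D = d/dx
    M_m = lam * np.diag(1.0 + np.abs(ks) ** (2 * s)) + mu * np.outer(P.conj(), P) * F
    C_m = F                                            # compression of g -> 1_Omega g
    S = C_m @ np.linalg.solve(M_m, C_m)                # C_m M_m^{-1} C_m
    eigs = np.linalg.eigvalsh((S + S.conj().T) / 2).clip(min=0.0)
    return np.sum(eigs / (1.0 + eigs))                 # sum of 1 / (1 + lambda^{-1})

for m in (8, 16, 32, 64, 128):
    print(m, round(float(effective_dimension(m)), 4))  # smallest stable m wins
```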
Figures 11, 12, and 13 illustrate the convergence of the effective dimension estimation, using the eigenvalues of $C_m M_m^{-1}C_m$, as $m$ increases, for different values of $n$. These figures provide insights into the PIKL algorithm. As expected, the Fourier approximations converge more slowly as the dimension $d$ increases. Specifically, Figures 11 and 12 show that, in dimension $d = 1$, taking $m$ of the order of $10^2$ is sufficient for $n \le 10^4$, while Figure 13 indicates that, in dimension $d = 2$, $m$ of the order of $10^2$ is sufficient for $n \le 10^3$.

[Figure 11: Convergence of the effective dimension as $m$ grows, for $D = \frac{d}{dx}$ ($\log_{10}$ effective dimension against $\log_{10}(2m+1)$).]

[Figure 12: Convergence of the effective dimension as $m$ grows, for the harmonic oscillator ($\log_{10}$ effective dimension against $\log_{10}(2m+1)$).]

[Figure 13: Convergence of the effective dimension as $m$ grows, for the heat equation on the disk ($\log_{10}$ effective dimension against $2\log_{10}(2m+1)$).]

D.3 Numerical Schemes

We detail below the numerical schemes used as benchmarks in Section 4 for solving the wave equation. All these numerical schemes are constructed by discretizing the domain $\Omega = [0,1]^2$ into the grid $(\ell_1^{-1}\mathbb{Z}/\ell_1\mathbb{Z}) \times (\ell_2^{-1}\mathbb{Z}/\ell_2\mathbb{Z})$. The initial and boundary conditions are then enforced on $n = 2\ell_1 + \ell_2$ points, with the approximation $\hat f_n$ defined accordingly as
\[
\begin{aligned}
&\text{for all } 0\le\ell\le\ell_2,\quad \hat f_n(0, \ell/\ell_2) = \sin(\pi\ell/\ell_2) + \sin(4\pi\ell/\ell_2)/2,\\
&\text{for all } 0\le\ell\le\ell_1,\quad \hat f_n(\ell/\ell_1, 0) = 0,\\
&\text{for all } 0\le\ell\le\ell_1,\quad \hat f_n(\ell/\ell_1, 1) = 0.
\end{aligned}
\]
Let the discrete Laplacian $\Delta_{(\ell_1,\ell_2)}$ be defined for all $(a,b)\in\mathbb{Z}/\ell_1\mathbb{Z}\times\mathbb{Z}/\ell_2\mathbb{Z}$ by
\[
(\Delta_{(\ell_1,\ell_2)}\hat f_n)(a/\ell_1, b/\ell_2) = \ell_2^2\big(\hat f_n(a/\ell_1,(b+1)/\ell_2) - 2\hat f_n(a/\ell_1, b/\ell_2) + \hat f_n(a/\ell_1,(b-1)/\ell_2)\big).
\]
If $f\in C^2([0,1]^2)$, its Taylor expansion leads to $(\Delta_{(\ell_1,\ell_2)}f)(a/\ell_1,b/\ell_2) = \partial^2_{x,x}f(a/\ell_1,b/\ell_2) + o_{\ell_2\to\infty}(1)$. Similarly, let the second-order time derivative operator $\partial^2_{t,t,(\ell_1,\ell_2)}$ be defined for all $(a,b)\in\mathbb{Z}/\ell_1\mathbb{Z}\times\mathbb{Z}/\ell_2\mathbb{Z}$ by
\[
(\partial^2_{t,t,(\ell_1,\ell_2)}\hat f_n)(a/\ell_1, b/\ell_2) = \ell_1^2\big(\hat f_n((a+1)/\ell_1, b/\ell_2) - 2\hat f_n(a/\ell_1, b/\ell_2) + \hat f_n((a-1)/\ell_1, b/\ell_2)\big).
\]

Euler explicit. The explicit Euler scheme is initialized using the Taylor expansion $f(t,x) = f(0,x) + t\,\partial_t f(0,x) + t^2\,\partial^2_{t,t}f(0,x)/2 + o_{t\to 0}(t^2)$. With the initial condition $\partial_t f(0,x) = 0$ and the wave equation $\partial^2_{t,t}f(0,x) = 4\partial^2_{x,x}f(0,x)$, this simplifies to $f(t,x) = f(0,x) + 2t^2\,\partial^2_{x,x}f(0,x) + o_{t\to 0}(t^2)$. This leads to the initialization
\[
\forall 0\le b\le\ell_2,\quad \hat f_n(1/\ell_1, b/\ell_2) = \hat f_n(0, b/\ell_2) + 2\ell_1^{-2}(\Delta_{(\ell_1,\ell_2)}\hat f_n)(0, b/\ell_2).
\]
The wave equation $\partial^2_{t,t}f = 4\partial^2_{x,x}f$ can then be discretized as $\partial^2_{t,t,(\ell_1,\ell_2)}\hat f_n = 4\Delta_{(\ell_1,\ell_2)}\hat f_n$. This leads to the explicit Euler recursion
\[
\hat f_n((a+1)/\ell_1, b/\ell_2) = 2\hat f_n(a/\ell_1, b/\ell_2) - \hat f_n((a-1)/\ell_1, b/\ell_2) + 4\ell_1^{-2}(\Delta_{(\ell_1,\ell_2)}\hat f_n)(a/\ell_1, b/\ell_2).
\]
This formula allows one to compute $\hat f_n((a+1)/\ell_1,\cdot)$ given the values of $\hat f_n(0,\cdot),\dots,\hat f_n(a/\ell_1,\cdot)$; a transcription in code is given below.

Runge-Kutta 4. The RK4 scheme is a numerical scheme applied to both $f$ and its time derivative $\partial_t f$. Here, $\hat g_n$ represents the approximation of $\partial_t f$. The initial condition $\partial_t f(0,\cdot) = 0$ translates into $\forall 0\le b\le\ell_2$, $\hat g_n(0, b/\ell_2) = 0$. To infer $\hat f_n((a+1)/\ell_1,\cdot)$ and $\hat g_n((a+1)/\ell_1,\cdot)$ given the values of $\hat f_n(0,\cdot),\dots,\hat f_n(a/\ell_1,\cdot)$ and $\hat g_n(0,\cdot),\dots,\hat g_n(a/\ell_1,\cdot)$, the RK4 scheme introduces intermediate estimates as follows:
\[
\begin{aligned}
f_1 &= \hat g_n(a/\ell_1,\cdot)/\ell_1, & g_1 &= 4(\Delta_{(\ell_1,\ell_2)}\hat f_n)(a/\ell_1,\cdot)/\ell_1,\\
f_2 &= (\hat g_n(a/\ell_1,\cdot) + 0.5\,g_1)/\ell_1, & g_2 &= 4\big(\Delta_{(\ell_1,\ell_2)}(\hat f_n(a/\ell_1,\cdot) + 0.5\,f_1)\big)/\ell_1,\\
f_3 &= (\hat g_n(a/\ell_1,\cdot) + 0.5\,g_2)/\ell_1, & g_3 &= 4\big(\Delta_{(\ell_1,\ell_2)}(\hat f_n(a/\ell_1,\cdot) + 0.5\,f_2)\big)/\ell_1,\\
f_4 &= (\hat g_n(a/\ell_1,\cdot) + g_3)/\ell_1, & g_4 &= 4\big(\Delta_{(\ell_1,\ell_2)}(\hat f_n(a/\ell_1,\cdot) + f_3)\big)/\ell_1,\\
\hat f_n((a+1)/\ell_1,\cdot) &= \hat f_n(a/\ell_1,\cdot) + (f_1 + 2f_2 + 2f_3 + f_4)/6, & \hat g_n((a+1)/\ell_1,\cdot) &= \hat g_n(a/\ell_1,\cdot) + (g_1 + 2g_2 + 2g_3 + g_4)/6.
\end{aligned}
\]
Similarly to the explicit Euler scheme, RK4 relies on recursive formulas to compute $\hat f_n$.
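For concreteness, here is a direct transcription of the explicit Euler scheme above (illustrative; the grid sizes $\ell_1$, $\ell_2$ are arbitrary choices satisfying the usual CFL-type stability condition $2\ell_2 \le \ell_1$ for a wave speed of $2$).

```python
import numpy as np

# Explicit Euler scheme for the wave equation d^2_t f = 4 d^2_x f on [0, 1]^2,
# with f(0, x) = sin(pi x) + sin(4 pi x)/2, d_t f(0, .) = 0, f(., 0) = f(., 1) = 0.
l1, l2 = 400, 100                       # time and space resolutions (2 * l2 <= l1)
x = np.arange(l2 + 1) / l2
f = np.zeros((l1 + 1, l2 + 1))
f[0] = np.sin(np.pi * x) + np.sin(4 * np.pi * x) / 2

def lap(row):
    """Discrete Laplacian l2^2 (f(b+1) - 2 f(b) + f(b-1)) at interior points."""
    out = np.zeros_like(row)
    out[1:-1] = l2 ** 2 * (row[2:] - 2 * row[1:-1] + row[:-2])
    return out

f[1] = f[0] + 2 * lap(f[0]) / l1 ** 2   # initialization from the Taylor expansion
for a in range(1, l1):
    f[a + 1] = 2 * f[a] - f[a - 1] + 4 * lap(f[a]) / l1 ** 2
    f[a + 1, 0] = f[a + 1, -1] = 0.0    # enforce the boundary conditions

print(float(np.abs(f).max()))           # remains O(1) when the CFL condition holds
```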
Crank-Nicolson. The CN scheme is an implicit scheme defined as follows. Similar to the explicit Euler scheme, the $\partial_t f(0,\cdot) = 0$ initial condition is implemented as
\[
\forall 0\le\ell\le\ell_2,\quad \hat f_n(1/\ell_1, \ell/\ell_2) = \hat f_n(0,\ell/\ell_2) + 2\ell_1^{-2}(\Delta_{(\ell_1,\ell_2)}\hat f_n)(0,\ell/\ell_2).
\]
Then, the recursive formula of this scheme takes the form
\[
(\partial^2_{t,t,(\ell_1,\ell_2)}\hat f_n)(a/\ell_1,\cdot) = 2\big((\Delta_{(\ell_1,\ell_2)}\hat f_n)(a/\ell_1,\cdot) + (\Delta_{(\ell_1,\ell_2)}\hat f_n)((a+1)/\ell_1,\cdot)\big).
\]
This leads to the recursion
\[
\hat f_n((a+1)/\ell_1,\cdot) = \big(\mathrm{Id} + 2\ell_2^2\ell_1^{-2}K\big)^{-1}\big(3\hat f_n(a/\ell_1,\cdot) - \hat f_n((a-1)/\ell_1,\cdot)\big) - \hat f_n(a/\ell_1,\cdot),
\]
where
\[
K = \begin{pmatrix}
2 & -1 & 0 & \cdots & 0\\
-1 & 2 & -1 & \ddots & \vdots\\
0 & -1 & 2 & \ddots & 0\\
\vdots & \ddots & \ddots & \ddots & -1\\
0 & \cdots & 0 & -1 & 2
\end{pmatrix}
\]
is the discrete Laplacian matrix, acting on the space variable with the sign convention $\Delta_{(\ell_1,\ell_2)} = -\ell_2^2 K$.

D.4 PINN Training

Figures 14 and 15 illustrate the performance of the PINNs during training while solving the 1d wave equation with noisy boundary conditions.

[Figure 14: panels comparing the exact and predicted $u(t,x)$ (top row) and the exact and predicted $r(t,x)$ (bottom row). Left: Ground truth. Middle: PINN estimator. Right: Error = PINN minus ground truth.]

[Figure 15: PINN training with noisy boundary conditions (training curve over 800 iterations).]

References

R. Agharafeie, J. R. C. Ramos, J. M. Mendes, and R. Oliveira. From shallow to deep bioprocess hybrid modeling: Advances and future perspectives. Fermentation, 9, 2023.

E. Arnone, A. Kneip, F. Nobile, and L. M. Sangalli. Some first results on the consistency of spatial regression with partial differential equation regularization. Statistica Sinica, 32:209–238, 2022.

A. Arzani, J.-X. Wang, and R. M. D'Souza. Uncovering near-wall blood flow from sparse data with physics-informed neural networks. Physics of Fluids, 33:071905, 2021.

F. Bach. Learning Theory from First Principles. MIT Press, 2024.

L. Badea and P. Daripa. On a boundary control approach to domain embedding methods. SIAM Journal on Control and Optimization, 40:421–449, 2001.

P. Batlle, Y. Chen, B. Hosseini, H. Owhadi, and A. M. Stuart. Error analysis of kernel/GP methods for nonlinear and parametric PDEs. Journal of Computational Physics, 520:113488, 2025.

A. Bonfanti, G. Bruno, and C. Cipriani. The challenges of the nonlinear regime for physics-informed neural networks. In A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak, and C. Zhang, editors, Advances in Neural Information Processing Systems, volume 37, pages 41852–41881. Curran Associates, Inc., 2024.

A. Bonito, R. DeVore, G. Petrova, and J. W. Siegel. Convergence and error control of consistent PINNs for elliptic PDEs. IMA Journal of Numerical Analysis, draf008, 2025.

R. Bracewell. The Fourier Transform and its Applications. McGraw-Hill, Boston, 3rd edition, 2000.

H. Brezis. Functional Analysis, Sobolev Spaces and Partial Differential Equations. Springer, New York, 2010.

S. Cai, Z. Wang, S. Wang, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks for heat transfer problems. Journal of Heat Transfer, 143:060801, 2021.

C. Canuto, A. Quarteroni, M. Y. Hussaini, and T. A. Zang. Spectral Methods. Scientific Computation. Springer, Berlin, 1st edition, 2007.

A. Caponnetto and E. De Vito. Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics, 7:331–368, 2007.

Y. Chen, B. Hosseini, H. Owhadi, and A. M. Stuart. Solving and learning nonlinear PDEs with Gaussian processes. Journal of Computational Physics, 447:110668, 2021.
S. Cuomo, V. S. Di Cola, F. Giampaolo, G. Rozza, M. Raissi, and F. Piccialli. Scientific machine learning through physics-informed neural networks: Where we are and what's next. Journal of Scientific Computing, 92:88, 2022.

M. Deshpande, S. Agarwal, and A. K. Bhattacharya. Investigations on convergence behaviour of physics informed neural networks across spectral ranges and derivative orders. In 2022 IEEE Symposium Series on Computational Intelligence (SSCI), pages 1172–1179, 2022.

N. Doumèche, F. Bach, G. Biau, and C. Boyer. Physics-informed machine learning as a kernel method. In S. Agrawal and A. Roth, editors, Proceedings of Thirty Seventh Conference on Learning Theory, volume 247 of Proceedings of Machine Learning Research, pages 1399–1450. PMLR, 2024.

N. Doumèche, G. Biau, and C. Boyer. Convergence and error analysis of PINNs. Bernoulli, 31:2127–2151, 2025.

L. Evans. Partial Differential Equations, volume 19 of Graduate Studies in Mathematics. American Mathematical Society, Providence, 2nd edition, 2010.

G. E. Karniadakis, I. G. Kevrekidis, L. Lu, S. Wang, P. Perdikaris, and L. Yang. Physics-informed machine learning. Nature Reviews Physics, 3:422–440, 2021.

A. S. Krishnapriyan, A. Gholami, S. Zhe, R. M. Kirby, and M. W. Mahoney. Characterizing possible failure modes in physics-informed neural networks. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 26548–26560. Curran Associates, Inc., 2021.

S. Kurz, H. De Gersem, A. Galetzka, A. Klaedtke, M. Liebsch, D. Loukrezis, S. Russenschuck, and M. Schmidt. Hybrid modeling: Towards the next level of scientific computing in engineering. Journal of Mathematics in Industry, 12, 2022.

S. Lui. Spectral domain embedding for elliptic PDEs in complex domains. Journal of Computational and Applied Mathematics, 225:541–557, 2009.

R. Matthysen and D. Huybrechs. Fast algorithms for the computation of Fourier extensions of arbitrary length. SIAM Journal on Scientific Computing, 38:A899–A922, 2016.

N. McGreivy and A. Hakim. Weak baselines and reporting biases lead to overoptimism in machine learning for fluid-related partial differential equations. Nature Machine Intelligence, 6:1256–1269, 2024.

B. Meuris, S. Qadeer, and P. Stinis. Machine-learning-based spectral methods for partial differential equations. Scientific Reports, 13:1739, 2023.

S. Mishra and R. Molinaro. Estimates on the generalization error of physics-informed neural networks for approximating PDEs. IMA Journal of Numerical Analysis, 43:1–43, 2023.

N. H. Nelsen and A. M. Stuart. Operator learning using random features: A tool for scientific computing. SIAM Review, 66:535–571, 2024.

A. Rahimi and B. Recht. Random features for large-scale kernel machines. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems, volume 20. Curran Associates, Inc., 2007.

M. Raissi, P. Perdikaris, and G. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.

P. Rathore, W. Lei, Z. Frangella, L. Lu, and M. Udell. Challenges in training PINNs: A loss landscape perspective. In Proceedings of the 41st International Conference on Machine Learning, volume 235, pages 42159–42191. PMLR, 2024.
W. Rudin. Functional Analysis. International Series in Pure and Applied Mathematics. McGraw-Hill, New York, 2nd edition, 1991.

R. Schaback and H. Wendland. Kernel techniques: From machine learning to meshless methods. Acta Numerica, 15:543–639, 2006.

Y. Shin, J. Darbon, and G. E. Karniadakis. On the convergence of physics informed neural networks for linear second-order elliptic and parabolic type PDEs. Communications in Computational Physics, 28:2042–2074, 2020.

Y. Shin, Z. Zhang, and G. E. Karniadakis. Error estimates of residual minimization using neural networks for linear PDEs. Journal of Machine Learning for Modeling and Computing, 4:73–101, 2023.

A. Tsybakov. Introduction to Nonparametric Estimation. Springer, New York, 2009.

S. Wang, X. Yu, and P. Perdikaris. When and why PINNs fail to train: A neural tangent kernel perspective. Journal of Computational Physics, 449:110768, 2022a.

Y. Wang, C.-Y. Lai, J. Gómez-Serrano, and T. Buckmaster. Asymptotic self-similar blow-up profile for three-dimensional axisymmetric Euler equations using neural networks. Physical Review Letters, 130:244002, 2023.

Z. Wang, W. Xing, R. Kirby, and S. Zhe. Physics informed deep kernel learning. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, volume 151, pages 1206–1218. PMLR, 2022b.

T. Yang, Y.-F. Li, M. Mahdavi, R. Jin, and Z.-H. Zhou. Nyström method vs random Fourier features: A theoretical and empirical comparison. In F. Pereira, C. Burges, L. Bottou, and K. Weinberger, editors, Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012.