# An analysis of Ermakov-Zolotukhin quadrature using kernels

Ayoub Belhadji
Univ Lyon, ENS de Lyon, Inria, CNRS, UCBL, LIP UMR 5668, Lyon, France
ayoub.belhadji@ens-lyon.fr

35th Conference on Neural Information Processing Systems (NeurIPS 2021).

Abstract

We study a quadrature, proposed by Ermakov and Zolotukhin in the sixties, through the lens of kernel methods. The nodes of this quadrature rule follow the distribution of a determinantal point process, while the weights are defined through a linear system, similarly to the optimal kernel quadrature. In this work, we show how these two classes of quadrature are related, and we prove a tractable formula for the expected value of the squared worst-case integration error, on the unit ball of an RKHS, of the former quadrature. In particular, this formula involves the eigenvalues of the corresponding kernel, and it leads to an improvement on the existing theoretical guarantees for the optimal kernel quadrature with determinantal point processes.

1 Introduction

Integrals appear in many scientific fields as quantities of interest per se. For example, in statistics they represent expectations [27], while in mathematical finance they represent the prices of financial products [17]. Unfortunately, integrals that can be written in closed form are exceptional: in general, their values are only known through approximations. For this reason, numerical integration is at the heart of many tasks in applied mathematics and statistics. Among all possible approximation schemes, quadratures are the most practical, since they approximate the integral of a function by a finite mixture of its evaluations. In this work, we focus on quadrature rules of the form

$$\int_{\mathcal{X}} f(x)\, g(x)\, \mathrm{d}\omega(x) \;\approx\; \sum_{i \in [N]} w_i f(x_i), \qquad (1)$$

where the nodes $x_i$ are independent of $f$ and $g$, while the weights $w_i$ depend only on $g$. The nodes and the weights of a quadrature may be seen as degrees of freedom that the practitioner may tune in order to achieve a given level of approximation error.
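To fix ideas, here is a minimal numerical sketch of a rule of the form (1), using plain Monte Carlo nodes and weights as a baseline. The functions `f` and `g` below are arbitrary choices of ours, and $\omega$ is taken to be the uniform measure on $[0, 1]$; none of these specifics come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Integrand f and weight function g on X = [0, 1], omega the uniform measure.
f = lambda x: np.exp(np.sin(2 * np.pi * x))
g = lambda x: np.cos(2 * np.pi * x) + 1.0

# A rule of the form (1): here plain Monte Carlo, with nodes x_i drawn i.i.d.
# from omega and weights w_i = g(x_i) / N. The estimator is unbiased for the
# integral of f * g but converges only at the rate O(N^{-1/2}).
N = 10_000
x = rng.uniform(size=N)
w = g(x) / N
estimate = np.sum(w * f(x))

# Reference value by a fine midpoint Riemann sum.
t = (np.arange(200_000) + 0.5) / 200_000
reference = np.mean(f(t) * g(t))
print(estimate, reference)
```

The quadratures studied in this paper replace these naive degrees of freedom with carefully designed nodes and weights.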
The design of quadratures gave birth to a rich literature, from Gaussian quadrature [15] to Monte Carlo methods [25] to quadratures based on determinantal point processes (DPPs) [1]. The latter form a large class of probabilistic models of repulsive random subsets that make numerical integration possible in a variety of domains, with strong theoretical guarantees. In particular, central limit theorems with asymptotic convergence rates that scale better than the typical Monte Carlo rate $O(N^{-1/2})$ were proven for several DPP-based quadratures: when the integrand is a $\mathcal{C}^1$ function [1], or even when the integrand is non-differentiable [10]. Moreover, it is possible to design quadrature rules based on DPPs with non-asymptotic guarantees and rates of convergence that adapt to the smoothness of the integrand. This is the case of the quadrature proposed by Ermakov and Zolotukhin in [14] and recently revisited in [16], and of the optimal kernel quadrature [3].

In this work, we study the quadrature rule proposed by Ermakov and Zolotukhin (EZQ) through the lens of kernel methods. We start by comparing the weights of EZQ to those of the optimal kernel quadrature (OKQ), and we prove that both belong to a broader class of quadrature rules that we call kernel-based interpolation quadrature. Then, we study the approximation quality of EZQ in reproducing kernel Hilbert spaces (RKHSs). This is done by proving a general tractable formula for the expected value of the squared worst-case integration error over the unit ball of an RKHS when the nodes follow the distribution of a determinantal point process. This formula principally involves the eigenvalues of the integral operator, and it converges to 0 at a slightly slower rate than the optimal rate. Interestingly, this analysis yields a better upper bound for the optimal kernel quadrature with DPPs initially proposed in [3]. Compared to the theoretical guarantees given in [14, 16], ours are independent of the choice of the test function. This facilitates the comparison of EZQ with other quadratures such as OKQ.

The rest of the article is organized as follows. Section 2 reviews the work of [14] and recalls key concepts of kernel-based quadrature. In Section 3, we present the main results of this work and their consequences. A sketch of the proof of the main theorem is given in Section 4. We illustrate the theoretical results with numerical experiments in Section 5. Finally, Section 6 concludes.

Notation and assumptions. We use the notation $\mathbb{N}^* = \mathbb{N} \setminus \{0\}$. We denote by $\omega$ a Borel measure supported on $\mathcal{X}$, and by $L^2(\omega)$ the Hilbert space of square-integrable real-valued functions on $\mathcal{X}$ with respect to $\omega$, equipped with the inner product $\langle \cdot, \cdot \rangle_\omega$ and the associated norm $\|\cdot\|_\omega$. For $N \in \mathbb{N}^*$, we denote by $\omega^{\otimes N}$ the tensor product of $\omega$ defined on $\mathcal{X}^N$. Moreover, we denote by $\mathcal{F}$ the RKHS associated to the kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, which we assume to be continuous and to satisfy $\int_{\mathcal{X}} k(x, x)\, \mathrm{d}\omega(x) < +\infty$. In particular, we assume the Mercer decomposition

$$k(x, y) = \sum_{m \in \mathbb{N}^*} \sigma_m \varphi_m(x) \varphi_m(y) \qquad (2)$$

to hold, where the convergence is pointwise, and the $\sigma_m$ and $\varphi_m$ are the eigenvalues and eigenfunctions of the integral operator $\Sigma$ defined for $f \in L^2(\omega)$ by

$$\Sigma f = \int_{\mathcal{X}} k(\cdot, y) f(y)\, \mathrm{d}\omega(y). \qquad (3)$$

We assume that the sequence $\sigma = (\sigma_m)_{m \in \mathbb{N}^*}$ is non-increasing with non-vanishing elements, and that the corresponding eigenfunctions $\varphi_m$ are continuous. Note that the $\varphi_m$ are normalized: $\|\varphi_m\|_\omega = 1$ for $m \in \mathbb{N}^*$. In particular, $(\varphi_m)_{m \in \mathbb{N}^*}$ is an orthonormal basis of $L^2(\omega)$, and every element $f \in \mathcal{F}$ satisfies

$$\|f\|_{\mathcal{F}}^2 = \sum_{m \in \mathbb{N}^*} \frac{\langle f, \varphi_m \rangle_\omega^2}{\sigma_m} < +\infty. \qquad (4)$$

Moreover, for every $N \in \mathbb{N}^*$, we denote by $E_N$ the eigen-subspace of $L^2(\omega)$ spanned by $\varphi_1, \dots, \varphi_N$. For any kernel $\kappa : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ and $x \in \mathcal{X}^N$, we define the kernel matrix $\kappa(x) := (\kappa(x_i, x_j))_{i,j \in [N]} \in \mathbb{R}^{N \times N}$. We denote kernel matrices in bold fonts: $\mathbf{K}(x)$ for the kernel $k$, $\mathbf{K}_N(x)$ for the kernel $k_N$, $\overline{\mathbf{K}}_N(x)$ for the kernel $\overline{k}_N$, $\boldsymbol{\kappa}(x)$ for the kernel $\kappa$, and so on. Similarly, for any function $\mu : \mathcal{X} \to \mathbb{R}$ and $x \in \mathcal{X}^N$, we define the vector of evaluations $\mu(x) := (\mu(x_i))_{i \in [N]} \in \mathbb{R}^N$.

2 Related work

In this section, we review some results that are relevant to our contribution.

2.1 Ermakov-Zolotukhin quadrature

The quadrature rule proposed by Ermakov and Zolotukhin in [14] deals with integrals of the form

$$\int_{\mathcal{X}} f(x) \varphi_m(x)\, \mathrm{d}\omega(x), \qquad (5)$$

where $f \in L^2(\omega)$ and $(\varphi_m)_{m \in \mathbb{N}^*}$ is an orthonormal family with respect to the measure $\omega$. Its construction goes as follows. Let $N \in \mathbb{N}^*$ and let $x \in \mathcal{X}^N$ be such that the matrix $\Phi_N(x) := (\varphi_n(x_i))_{(n,i) \in [N] \times [N]}$ is non-singular. For $n \in [N]$, define

$$I_{\mathrm{EZ},n}(f) = \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},n} f(x_i), \qquad (6)$$

where $\hat{w}^{\mathrm{EZ},n} := (\hat{w}_i^{\mathrm{EZ},n})_{i \in [N]} \in \mathbb{R}^N$ is given by $\hat{w}^{\mathrm{EZ},n} = \Phi_N(x)^{-1} e_n$, with $e_n$ the vector of $\mathbb{R}^N$ whose $n$-th coordinate is 1 and whose other coordinates are 0.¹ One easily proves that this quadrature is exact on $\mathrm{Span}(\varphi_n)_{n \in [N]}$:

$$\forall f \in \mathrm{Span}(\varphi_n)_{n \in [N]}, \quad \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},n} f(x_i) = \int_{\mathcal{X}} f(x) \varphi_n(x)\, \mathrm{d}\omega(x). \qquad (7)$$

A quadrature rule that satisfies (7) is typically called an interpolatory quadrature rule.

¹ The dependency of the vector $\hat{w}^{\mathrm{EZ},n}$ on $x$ is dropped for simplicity.
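Computationally, the weights (6) amount to solving one $N \times N$ linear system. Below is a minimal sketch, under assumptions of ours: $\mathcal{X} = [0, 1]$, $\omega$ uniform, and the orthonormal Fourier family as $(\varphi_m)_m$; all function and variable names are hypothetical.

```python
import numpy as np

N = 5                                  # odd, to pair each cosine with a sine

def phi(x):
    """Stack (phi_1(x), ..., phi_N(x)): 1, sqrt(2)cos(2 pi x), sqrt(2)sin(2 pi x), ..."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    cols = [np.ones_like(x)]
    for m in range(1, (N - 1) // 2 + 1):
        cols += [np.sqrt(2) * np.cos(2 * np.pi * m * x),
                 np.sqrt(2) * np.sin(2 * np.pi * m * x)]
    return np.stack(cols)              # Phi[m, i] = phi_{m+1}(x_i)

rng = np.random.default_rng(0)
x = rng.uniform(size=N)                # any nodes with Phi_N(x) non-singular
Phi = phi(x)

n = 2                                  # target index in (5)-(6)
w_ez = np.linalg.solve(Phi, np.eye(N)[n - 1])   # solves Phi_N(x) w = e_n

# Exactness (7): applying the rule to f = phi_m returns delta_{mn}, which is
# exactly the integral of phi_m * phi_n by orthonormality.
print(np.round(Phi @ w_ez, 12))        # ~ e_n up to round-off
```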
Now, if $f \notin \mathrm{Span}(\varphi_n)_{n \in [N]}$, the authors studied the expected value and the variance of $I_{\mathrm{EZ},n}(f)$ when $x = (x_1, \dots, x_N)$ is taken to be a random variable in $\mathcal{X}^N$ that follows the distribution of density

$$p_{\mathrm{DPP}}(x_1, \dots, x_N) := \frac{1}{N!} \mathrm{Det}^2\, \Phi_N(x) \qquad (8)$$

with respect to the product measure $\omega^{\otimes N}$ defined on $\mathcal{X}^N$. As observed in [16], the nodes of the quadrature then follow the distribution of the determinantal point process of reference measure $\omega$ and marginal kernel $\kappa_N$ defined by $\kappa_N(x, y) = \sum_{n \in [N]} \varphi_n(x) \varphi_n(y)$. We refer to [20] for further details on determinantal point processes. We now recall the main result of [14].

Theorem 1. Let $x$ be a random subset of $\mathcal{X}$ that follows the distribution of the DPP of kernel $\kappa_N$ and reference measure $\omega$. Let $f \in L^2(\omega)$ and $n \in [N]$. Then

$$\mathbb{E}_{\mathrm{DPP}}\, I_{\mathrm{EZ},n}(f) = \int_{\mathcal{X}} f(x) \varphi_n(x)\, \mathrm{d}\omega(x), \qquad (9)$$

and

$$\mathbb{V}_{\mathrm{DPP}}\, I_{\mathrm{EZ},n}(f) = \sum_{m \geq N+1} \langle f, \varphi_m \rangle_\omega^2. \qquad (10)$$

Theorem 1 shows that $I_{\mathrm{EZ},n}(f)$ is an unbiased estimator of $\int_{\mathcal{X}} f(x) \varphi_n(x)\, \mathrm{d}\omega(x)$, with a variance that depends on the coefficients $\langle f, \varphi_m \rangle_\omega$ for $m \geq N+1$. Consequently, the expected squared error of the quadrature is equal to the variance of $I_{\mathrm{EZ},n}(f)$:

$$\mathbb{E}_{\mathrm{DPP}} \Big( \int_{\mathcal{X}} f(x) \varphi_n(x)\, \mathrm{d}\omega(x) - \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},n} f(x_i) \Big)^2 = \sum_{m \geq N+1} \langle f, \varphi_m \rangle_\omega^2. \qquad (11)$$

The identity (11) gives a theoretical guarantee for an interpolatory quadrature rule when the nodes follow the distribution of the DPP defined by (8). Compared to existing work in the literature [11, 22, 23, 24], (11) is generic and applies to any orthonormal family. However, the expected squared error in (11) depends strongly on the function $f$. This makes the comparison between EZQ and other quadratures, based on some test function $f$, tricky: the choice of $f$ may favor (or disfavor) EZQ. To circumvent this difficulty, we suggest studying a figure of merit that is independent of the choice of $f$. This is possible using kernels, through the worst-case integration error on the unit ball of an RKHS, whose definition we recall in the following section.

2.2 The worst-case integration error in kernel quadrature

The use of the kernel framework in numerical integration can be traced back to the work of Hickernell [18, 19], who introduced kernels to the quasi-Monte Carlo community. Their use was popularized in the machine learning community by [28, 9]. In this framework, the quality of a quadrature is assessed by the worst-case integration error on the unit ball of an RKHS $\mathcal{F}$ associated to some kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$. This quantity is defined as

$$\sup_{f \in \mathcal{F},\, \|f\|_{\mathcal{F}} \leq 1} \Big| \int_{\mathcal{X}} f(x) g(x)\, \mathrm{d}\omega(x) - \sum_{i \in [N]} w_i f(x_i) \Big|. \qquad (12)$$

It reflects how good the quadrature is uniformly over the unit ball of $\mathcal{F}$. Interestingly, this quantity has the closed form

$$\Big\| \mu_g - \sum_{i \in [N]} w_i k(x_i, \cdot) \Big\|_{\mathcal{F}}, \qquad (13)$$

where $\mu_g = \Sigma g$ is the so-called embedding of $g$ in the RKHS $\mathcal{F}$. We shall use in Section 3.2 the equivalent expression (13) of the worst-case integration error to derive a closed formula for

$$\mathbb{E}_{\mathrm{DPP}} \sup_{f \in \mathcal{F},\, \|f\|_{\mathcal{F}} \leq 1} \Big( \int_{\mathcal{X}} f(x) g(x)\, \mathrm{d}\omega(x) - \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},n} f(x_i) \Big)^2. \qquad (14)$$

At this point, we observe that the weights $\hat{w}_i^{\mathrm{EZ},n}$ of EZQ are non-optimal, in the sense that they do not minimize (13). By definition, the optimal kernel quadrature for a given configuration $x$ such that the kernel matrix $\mathbf{K}(x)$ is non-singular is the quadrature whose nodes are the $x_i$ and whose weights $\hat{w}_i$ minimize (13).
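Given a truncated Mercer expansion (2), the square of (13) is a quadratic form in the weights, namely $\|\mu_g\|_{\mathcal{F}}^2 - 2\, w^\top \mu_g(x) + w^\top \mathbf{K}(x) w$, and can be evaluated numerically. Here is a sketch under assumptions of ours: $\mathcal{X} = [0, 1]$, $\omega$ uniform, Fourier eigenfunctions, and a stand-in spectrum $\sigma_m = m^{-2s}$; names are hypothetical.

```python
import numpy as np

s, N, M = 2, 10, 2000                      # M: truncation order of (2)

def phi(x, M):
    """First M orthonormal Fourier functions on [0, 1], stacked in rows."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    cols, m = [np.ones_like(x)], 1
    while len(cols) < M:
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * m * x))
        if len(cols) < M:
            cols.append(np.sqrt(2) * np.sin(2 * np.pi * m * x))
        m += 1
    return np.stack(cols)                  # P[m, i] = phi_{m+1}(x_i)

sigma = np.arange(1, M + 1, dtype=float) ** (-2.0 * s)
g_coef = np.eye(M)[0]                      # g = phi_1, i.e. <g, phi_m> = delta_{1m}

rng = np.random.default_rng(1)
x = rng.uniform(size=N)                    # i.i.d. nodes, for illustration only
P = phi(x, M)
K = P.T @ (sigma[:, None] * P)             # kernel matrix K(x) via (2)
mu_x = P.T @ (sigma * g_coef)              # embedding mu_g evaluated at the nodes
w = np.full(N, 1.0 / N)                    # Monte Carlo weights g(x_i)/N (g = 1 here)

# ||mu_g - sum_i w_i k(x_i, .)||_F^2 = ||mu_g||_F^2 - 2 w.mu_g(x) + w.K(x).w
wce2 = np.sum(sigma * g_coef**2) - 2 * w @ mu_x + w @ K @ w
print(wce2)
```

Minimizing this quadratic form in $w$ yields the optimal kernel quadrature, whose constructions we review next.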
We specify in Section 3.1 the subtle difference between the quadrature of Ermakov and Zolotukhin and the optimal kernel quadrature. Before that, we review the existing constructions of the optimal kernel quadrature in the following section.

2.3 The design of the optimal kernel quadrature

The optimal kernel quadrature may be calculated numerically under the assumption that the matrix $\mathbf{K}(x)$ is non-singular. Indeed, for a given configuration of nodes $x \in \mathcal{X}^N$, the square of (13) is quadratic in $w$ and has a unique minimizer given by $\hat{w}^{\mathrm{OKQ},g} = \mathbf{K}(x)^{-1} \mu_g(x)$.² In particular, the optimal mixture $\sum_{i \in [N]} \hat{w}_i^{\mathrm{OKQ},g} k(x_i, \cdot)$ takes the same values as $\mu_g$ on the nodes $x_i$: the optimal mixture interpolates the function $\mu_g$ on the configuration of nodes $x$.

At this level, $x$ is still a degree of freedom and needs to be designed. This task has been tackled by different approaches. One approach consists in using ad hoc designs for which a theoretical analysis of the convergence rate is possible. This is the case of, inter alia, the uniform grid in the periodic Sobolev space [6, 26], higher-order digital net sequences in tensor products of Sobolev spaces [8], or tensor products of scaled Hermite roots in the RKHS defined by the Gaussian kernel [22]. Another approach consists in using a sequential algorithm to build up the configuration $x$ [12, 13, 21, 7]. In general, each step of these greedy algorithms requires solving a non-convex problem, and costly approximations must be employed. Alternatively, random designs based on determinantal point processes and their mixtures [3, 4] were shown to have strong theoretical guarantees and competitive empirical performance. More precisely, it was shown that if $x$ follows the distribution of the DPP of reference measure $\omega$ and marginal kernel $\kappa_N$, and if $g \in L^2(\omega)$ satisfies $\|g\|_\omega \leq 1$, then

$$\mathbb{E}_{\mathrm{DPP}} \Big\| \mu_g - \sum_{i \in [N]} \hat{w}_i^{\mathrm{OKQ},g} k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 \leq 2 \sigma_{N+1} + 2 \Big( \sum_{n \in [N]} |\langle g, \varphi_n \rangle_\omega| \Big)^2 N r_N, \qquad (15)$$

where $r_N = \sum_{m \geq N+1} \sigma_m$ [2, Theorem 4.8]. However, numerical simulations suggest that the l.h.s. of (15) converges to 0 at the faster rate $O(\sigma_{N+1})$, which corresponds to the best achievable rate according to [4]. This optimal rate was proved to be achieved, under some mild conditions on the eigenvalues $\sigma_n$, using the distribution of continuous volume sampling (CVS) [4]. This distribution is a mixture of determinantal point processes, closely tied to the projection DPP used in [3], and it comes with the following guarantee:

$$\forall g \in L^2(\omega), \quad \mathbb{E}_{\mathrm{CVS}} \Big\| \mu_g - \sum_{i \in [N]} \hat{w}_i^{\mathrm{OKQ},g} k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 \leq \sum_{m \in \mathbb{N}^*} \langle g, \varphi_m \rangle_\omega^2\, \epsilon_m(N), \qquad (16)$$

where $\epsilon_m(N) = O(\sigma_{N+1})$ for every $m \in \mathbb{N}^*$, so that the expected squared worst-case integration error of OKQ under the continuous volume sampling distribution scales as $O(\sigma_{N+1})$ for every $g \in L^2(\omega)$.

² The dependency of the vector $\hat{w}^{\mathrm{OKQ},g}$ on $x$ is dropped for simplicity.
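The projection DPP with marginal kernel $\kappa_N$, which underlies both (8) and (15), can be sampled exactly with the classical sequential algorithm for projection DPPs (see [20]). Below is a minimal rejection-based sketch of that algorithm, under assumptions of ours: $\mathcal{X} = [0, 1]$, $\omega$ uniform, and the Fourier basis, for which $\kappa_N(x, x) = N$ for every $x$ when $N$ is odd; names are hypothetical.

```python
import numpy as np

def phi(x, N):
    """First N orthonormal Fourier functions on [0, 1] (N odd), stacked in rows."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    cols, m = [np.ones_like(x)], 1
    while len(cols) < N:
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * m * x))
        if len(cols) < N:
            cols.append(np.sqrt(2) * np.sin(2 * np.pi * m * x))
        m += 1
    return np.stack(cols)

def sample_projection_dpp(N, rng):
    """Chain-rule sampler for the projection DPP with marginal kernel
    kappa_N(x, y) = sum_n phi_n(x) phi_n(y), reference measure Uniform[0, 1].
    Rejection uses the envelope kappa_N(x, x) = N (Fourier basis, N odd)."""
    nodes = []
    B = np.zeros((0, N))                      # orthonormal basis of the feature span
    while len(nodes) < N:
        x = rng.uniform()
        v = phi(x, N)[:, 0]                   # feature vector, ||v||^2 = N
        residual = N - np.sum((B @ v) ** 2)   # mass not captured by x_1, ..., x_k
        if rng.uniform() * N < residual:      # accept with probability residual / N
            nodes.append(x)
            r = v - B.T @ (B @ v)             # Gram-Schmidt against accepted points
            B = np.vstack([B, r / np.linalg.norm(r)])
    return np.array(nodes)

rng = np.random.default_rng(0)
print(sample_projection_dpp(11, rng))
```

At step $k$, the accepted point has density proportional to $\kappa_N(x, x)$ minus the squared norm of the projection of the feature vector onto the span of the already accepted points, which is exactly the conditional density prescribed by the chain rule for projection DPPs.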
3 Main results

This section gathers the main contributions of this article. In Section 3.1, we prove that both EZQ and OKQ belong to a larger class of quadrature rules, called kernel-based interpolation quadrature (KBIQ). In Section 3.2, we prove a closed formula for the expected squared worst-case integration error of EZQ. In Section 3.3, we use Theorem 3 to improve on the existing theoretical guarantees of OKQ with DPPs.

3.1 Kernel-based interpolation quadrature

In this section, we define a new class of quadrature rules that extends both Ermakov-Zolotukhin quadrature and the optimal kernel quadrature. We start with the following observation: the weights $\hat{w}_i^{\mathrm{EZ},n}(x)$ of EZQ, defined in (6), write as

$$\hat{w}^{\mathrm{EZ},n}(x) = \Phi_N(x)^{-1} e_n. \qquad (17)$$

By observing that $\varphi_n(x) = \Phi_N(x)^\top e_n$ and $\boldsymbol{\kappa}_N(x) = \Phi_N(x)^\top \Phi_N(x)$, we obtain

$$\hat{w}^{\mathrm{EZ},n}(x) = \boldsymbol{\kappa}_N(x)^{-1} \varphi_n(x). \qquad (18)$$

Equivalently, $\varphi_n(x) = \boldsymbol{\kappa}_N(x)\, \hat{w}^{\mathrm{EZ},n}(x)$. In other words, $\sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},n}(x)\, \kappa_N(x_i, \cdot)$ takes the same values as $\varphi_n$ on the nodes $x_i$: $\hat{w}^{\mathrm{EZ},n}(x)$ is the vector resulting from the interpolation of $\varphi_n$ by the kernel $\kappa_N$. From this observation, we define kernel-based interpolation quadrature as an extension of EZQ as follows. Let $\gamma := (\gamma_m)_{m \in \mathbb{N}^*}$ be a sequence of positive real numbers, and let $M \in \mathbb{N}^* \cup \{+\infty\}$. Define the kernel $\kappa_{\gamma,M}$ on $\mathcal{X} \times \mathcal{X}$ by

$$\forall x, y \in \mathcal{X}, \quad \kappa_{\gamma,M}(x, y) = \sum_{m=1}^{M} \gamma_m \varphi_m(x) \varphi_m(y). \qquad (19)$$

Now, starting from a configuration $x \in \mathcal{X}^N$ such that $\mathrm{Det}\, \boldsymbol{\kappa}_N(x) > 0$, we have $\mathrm{Det}\, \boldsymbol{\kappa}_{\gamma,M}(x) > 0$,³ and for a given $g \in L^2(\omega)$, we define the vector of weights $\hat{w}^{\gamma,M,g}(x) \in \mathbb{R}^N$ by

$$\hat{w}^{\gamma,M,g}(x) = \boldsymbol{\kappa}_{\gamma,M}(x)^{-1} \mu_g^{\gamma,M}(x), \qquad (20)$$

where

$$\mu_g^{\gamma,M} = \sum_{m=1}^{M} \gamma_m \langle g, \varphi_m \rangle_\omega\, \varphi_m. \qquad (21)$$

We check again that $\sum_{i \in [N]} \hat{w}_i^{\gamma,M,g}\, \kappa_{\gamma,M}(x_i, \cdot)$ takes the same values as $\mu_g^{\gamma,M}$ on the nodes $x_i$: the mixture interpolates $\mu_g^{\gamma,M}$ on the nodes $x_i$. Now, for a given $g \in L^2(\omega)$, the vector of weights $\hat{w}^{\gamma,M,g}(x)$ has two degrees of freedom: the sequence $\gamma$ and the rank $M$ of the kernel. These degrees of freedom may be combined in a variety of ways to cover a large class of quadrature rules. In particular, we show in Section 3.1.1 that, for any sequence $\gamma$, KBIQ is equivalent to EZQ when $M = N$, and we show in Section 3.1.2 that KBIQ is equivalent to OKQ when $M = +\infty$ and $\gamma = \sigma$, as summarized in Table 1. We may also consider $M$ finite but strictly larger than $N$; the theoretical analysis of these intermediate quadrature rules is, however, beyond the scope of this work.

| Quadrature | $M$ | $\gamma$ | $\mu_g^{\gamma,M}$ | $\kappa_{\gamma,M}$ |
|---|---|---|---|---|
| EZQ | $N$ | any | $\sum_{n \in [N]} \gamma_n \langle g, \varphi_n \rangle_\omega \varphi_n$ | $\kappa_{\gamma,N}$ |
| EZQ | $N$ | $\gamma_m = 1$ | $g_N := \sum_{n \in [N]} \langle g, \varphi_n \rangle_\omega \varphi_n$ | $\kappa_N$ |
| ... | ... | ... | ... | ... |
| OKQ | $+\infty$ | $\sigma$ | $\mu_g$ | $k$ |

Table 1: An overview of some examples of KBIQ with the corresponding pairs $(\kappa_{\gamma,M}, \mu_g^{\gamma,M})$.

³ See Appendix D.1 in [3] for a proof.

3.1.1 EZQ is a special case of KBIQ

We recover EZQ, as defined in [14, 16], by taking $M = N$, $\gamma$ defined by $\gamma_m = 1$ for every $m \in \mathbb{N}^*$, and $g = \varphi_n$ for some $n \in [N]$. The equivalent definition (20) extends EZQ to the situation where $g \notin E_N$. Even better, we show in the following that $\hat{w}^{\gamma,N,g}(x)$ is independent of $\gamma$ when $M = N$. In particular, for any sequence of positive numbers $\gamma$, we have

$$\forall n \in [N], \quad \hat{w}^{\gamma,N,\varphi_n}(x) = \hat{w}^{\mathrm{EZ},n}(x). \qquad (22)$$

Proposition 2. Let $g \in E_N$, and let $x \in \mathcal{X}^N$ be such that $\mathrm{Det}\, \boldsymbol{\kappa}_N(x) > 0$. Let $\gamma = (\gamma_m)_{m \in \mathbb{N}^*}$ and $\tilde{\gamma} = (\tilde{\gamma}_m)_{m \in \mathbb{N}^*}$ be two sequences of positive numbers. Then

$$\hat{w}^{\gamma,N,g}(x) = \hat{w}^{\tilde{\gamma},N,g}(x). \qquad (23)$$

Thanks to the invariance of $\hat{w}^{\gamma,N,g}$ with respect to $\gamma$, we simplify the notation and write $\hat{w}^{\mathrm{EZ},g}$ instead.⁴ Moreover, using this invariance, EZQ may be seen as an approximation of OKQ when $g \in E_N$. Indeed, by approximating the kernel matrix $\mathbf{K}(x) \approx \mathbf{K}_N(x)$, where $k_N(x, y) = \sum_{n \in [N]} \sigma_n \varphi_n(x) \varphi_n(y)$, we have

$$\mathbf{K}(x)^{-1} \mu_g(x) \approx \hat{w}^{\mathrm{EZ},g}, \qquad (24)$$

since $\mathbf{K}_N(x)^{-1} \mu_g(x) = \hat{w}^{\mathrm{EZ},g}$ by Proposition 2. Interestingly, this approximation is reminiscent of the one used in [23] in the case of the Gaussian kernel.

⁴ In the case $g = \varphi_n$ for some $n \in [N]$, we use $\hat{w}^{\mathrm{EZ},\varphi_n}$ or $\hat{w}^{\mathrm{EZ},n}$ interchangeably.

3.1.2 OKQ is a special case of KBIQ

The optimal kernel quadrature is a special case of KBIQ when $M = +\infty$ and $\gamma = \sigma$. Indeed, in this case, we have $\kappa_{\gamma,M} = k$ and $\mu_g^{\gamma,M} = \mu_g$, so that

$$\hat{w}^{\sigma,\infty,g}(x) = \mathbf{K}(x)^{-1} \mu_g(x) = \hat{w}^{\mathrm{OKQ},g}. \qquad (25)$$

In other words, Ermakov-Zolotukhin quadrature and the optimal kernel quadrature are extreme instances of kernel-based interpolation quadrature, corresponding to the regimes $M = N$ and $M = +\infty$. As shown in Proposition 2, the weights of EZQ depend only on the eigenfunctions $\varphi_m$ and do not depend on the eigenvalues $\sigma_m$. This is to be compared with the weights of OKQ, which depend simultaneously on the eigenvalues and the eigenfunctions.
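Once the expansion is truncated, the whole KBIQ family (20)-(21) reduces to a single linear solve. Here is a sketch of ours, under the same assumptions as before (Fourier basis on $[0, 1]$, stand-in spectrum $\sigma_m = m^{-2s}$; names hypothetical), illustrating both the invariance of Proposition 2 and the approximation (24) of the OKQ weights:

```python
import numpy as np

def phi(x, M):
    x = np.atleast_1d(np.asarray(x, dtype=float))
    cols, m = [np.ones_like(x)], 1
    while len(cols) < M:
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * m * x))
        if len(cols) < M:
            cols.append(np.sqrt(2) * np.sin(2 * np.pi * m * x))
        m += 1
    return np.stack(cols)                     # P[m, i] = phi_{m+1}(x_i)

def kbiq_weights(nodes, g_coef, gamma):
    """KBIQ weights (20): kappa_{gamma,M}(x)^{-1} mu_g^{gamma,M}(x),
    with g_coef[m-1] = <g, phi_m> and M = len(gamma)."""
    M = len(gamma)
    P = phi(nodes, M)
    kappa = P.T @ (gamma[:, None] * P)        # kappa_{gamma,M}(x)
    mu = P.T @ (gamma * g_coef[:M])           # mu_g^{gamma,M}(x)
    return np.linalg.solve(kappa, mu)

N, s, M_big = 7, 2, 501
rng = np.random.default_rng(3)
x = rng.uniform(size=N)
g_coef = np.zeros(M_big); g_coef[1] = 1.0     # g = phi_2, an element of E_N

# M = N: any positive gamma gives the same (EZ) weights, cf. Proposition 2 ...
w_a = kbiq_weights(x, g_coef, np.ones(N))
w_b = kbiq_weights(x, g_coef, rng.uniform(0.5, 2.0, size=N))
# ... while large M with gamma = sigma approaches the OKQ weights (25), cf. (24).
sigma = np.arange(1, M_big + 1, dtype=float) ** (-2.0 * s)
w_okq = kbiq_weights(x, g_coef, sigma)
print(np.allclose(w_a, w_b), np.linalg.norm(w_a - w_okq))
```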
3.2 Main theorem

We give in this section the theoretical analysis of the worst-case integration error of EZQ under the distribution of the projection DPP.

Theorem 3. Let $N \in \mathbb{N}^*$. We have

$$\forall g \in E_N, \quad \mathbb{E}_{\mathrm{DPP}} \Big\| \mu_g - \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},g} k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 = \Big( \sum_{n \in [N]} \langle g, \varphi_n \rangle_\omega^2 \Big)\, r_N, \qquad (26)$$

where $r_N = \sum_{m \geq N+1} \sigma_m$. Moreover,

$$\forall g \in L^2(\omega), \quad \mathbb{E}_{\mathrm{DPP}} \Big\| \mu_g - \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},g} k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 \leq 4 \|g\|_\omega^2\, r_N. \qquad (27)$$

As an immediate consequence of Theorem 3, we have

$$\forall g \in L^2(\omega), \quad \mathbb{E}_{\mathrm{DPP}} \sup_{f \in \mathcal{F},\, \|f\|_{\mathcal{F}} = 1} \Big( \int_{\mathcal{X}} f(x) g(x)\, \mathrm{d}\omega(x) - \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},g} f(x_i) \Big)^2 = O(r_N). \qquad (28)$$

In other words, the squared worst-case integration error of Ermakov-Zolotukhin quadrature with DPP nodes converges to 0 at the rate $O(r_N)$. This rate is slower than the rate of convergence of the expected squared error (11) given by Theorem 1. Indeed, if $f \in \mathcal{F}$ then $\|f\|_{\mathcal{F}}^2 = \sum_{m \in \mathbb{N}^*} \langle f, \varphi_m \rangle_\omega^2 / \sigma_m < +\infty$, and Theorem 1 yields

$$\mathbb{V}_{\mathrm{DPP}}\, I_{\mathrm{EZ},n}(f) \leq \sigma_{N+1} \|f\|_{\mathcal{F}}^2, \qquad (29)$$

so that

$$\mathbb{E}_{\mathrm{DPP}} \Big( \int_{\mathcal{X}} f(x) \varphi_n(x)\, \mathrm{d}\omega(x) - \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},n} f(x_i) \Big)^2 = \mathbb{V}_{\mathrm{DPP}}\, I_{\mathrm{EZ},n}(f) = O(\sigma_{N+1}). \qquad (30)$$

Now, observe that for some sequences we have $\sigma_{N+1} = o(r_N)$. For instance, if $\sigma_m = m^{-2s}$ for some $s > 1/2$, then $\sigma_{N+1} = O(N^{-2s})$ while $r_N = O(N^{1-2s})$. We conclude that, if we take the worst-case integration error as figure of merit, the convergence of EZQ under the projection DPP is slower than the optimal rate $O(\sigma_{N+1})$ that was observed empirically for OKQ under the DPP in [3] and proved theoretically for OKQ under CVS in [4]. This is to be contrasted with the theoretical result of [14], which cannot predict the difference in the rate of convergence between EZQ and OKQ: our analysis highlights the interest of using kernels when comparing quadratures.

3.3 Improved theoretical guarantees for the optimal kernel quadrature with DPPs

Theorem 3 improves on the existing theoretical guarantees of the optimal kernel quadrature with determinantal point processes initially proposed in [3]. This is the purpose of the following result.

Theorem 4. Let $N \in \mathbb{N}^*$. We have

$$\forall g \in L^2(\omega), \quad \mathbb{E}_{\mathrm{DPP}} \Big\| \mu_g - \sum_{i \in [N]} \hat{w}_i^{\mathrm{OKQ},g} k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 \leq 4 \|g\|_\omega^2\, r_N. \qquad (31)$$

Compared to the analysis conducted in [3], Theorem 4 offers a sharper upper bound on

$$\mathbb{E}_{\mathrm{DPP}} \Big\| \mu_g - \sum_{i \in [N]} \hat{w}_i^{\mathrm{OKQ},g} k(x_i, \cdot) \Big\|_{\mathcal{F}}^2. \qquad (32)$$

Indeed, the upper bound (31) is dominated by a term of order $\big( \sum_{n=1}^N \langle g, \varphi_n \rangle_\omega^2 \big) r_N$, while the upper bound (15) proved in [3] is dominated by $\big( \sum_{n=1}^N |\langle g, \varphi_n \rangle_\omega| \big)^2 N r_N$: our bound improves upon (15) by a factor of $N^2$, since

$$\Big( \sum_{n=1}^N |\langle g, \varphi_n \rangle_\omega| \Big)^2 \leq N \sum_{n=1}^N \langle g, \varphi_n \rangle_\omega^2 \leq N \|g\|_\omega^2. \qquad (33)$$

Theorem 4 follows immediately from Theorem 3, by observing that the weights $\hat{w}^{\mathrm{OKQ},g}$ minimize (13):

$$\Big\| \mu_g - \sum_{i \in [N]} \hat{w}_i^{\mathrm{OKQ},g} k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 \leq \Big\| \mu_g - \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},g} k(x_i, \cdot) \Big\|_{\mathcal{F}}^2. \qquad (34)$$

Table 2 summarizes the theoretical contributions of this work compared to the existing literature.

| Quadrature | Distribution | Theoretical rate | Empirical rate | Reference |
|---|---|---|---|---|
| EZQ | DPP | $O(r_N)$ | $O(r_N)$ | Theorem 3 |
| OKQ | DPP | $N^2 O(r_N)$ | $O(\sigma_{N+1})$ | [3] |
| OKQ | DPP | $O(r_N)$ | $O(\sigma_{N+1})$ | Theorem 4 |
| OKQ | CVS | $O(\sigma_{N+1})$ | $O(\sigma_{N+1})$ | [4] |

Table 2: A comparison of the rates given by Theorems 3 and 4 with the existing guarantees in the literature.

We give in the following section a sketch of the main ideas behind the proof of Theorem 3.
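To make the gap between the rates $O(\sigma_{N+1})$ and $O(r_N)$ in Table 2 concrete, here is a small numerical check of ours for the polynomial spectrum $\sigma_m = m^{-2s}$: the ratio $r_N / \sigma_{N+1}$ grows roughly like $N / (2s - 1)$.

```python
import numpy as np

# Spectrum sigma_m = m^{-2s}: compare sigma_{N+1} (optimal rate) with the
# tail r_N = sum_{m >= N+1} sigma_m (Theorems 3 and 4). Truncation at M terms.
s, M = 2, 10**6
sigma = np.arange(1, M + 1, dtype=float) ** (-2.0 * s)
tail = np.cumsum(sigma[::-1])[::-1]          # tail[j] = sum over sigma_m, m >= j+1
for N in (10, 20, 50, 100):
    # sigma[N] is sigma_{N+1} (0-indexed); the ratio grows roughly like N / (2s-1)
    print(N, sigma[N], tail[N], tail[N] / sigma[N])
```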
4 Sketch of the proof

The proof of Theorem 3 decomposes into two steps. First, in Section 4.1, we give a decomposition of the squared approximation error $\| \mu_g - \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},g} k(x_i, \cdot) \|_{\mathcal{F}}^2$; then, in Section 4.2, we use this decomposition to prove a closed formula for its expected value under the projection DPP.

4.1 A decomposition of the approximation error

Let $g \in E_N$ and let $x \in \mathcal{X}^N$ be such that $\mathrm{Det}\, \boldsymbol{\kappa}_N(x) > 0$. We have

$$\Big\| \mu_g - \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},g} k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 = \|\mu_g\|_{\mathcal{F}}^2 - 2\, \mu_g(x)^\top \hat{w}^{\mathrm{EZ},g} + (\hat{w}^{\mathrm{EZ},g})^\top \mathbf{K}(x)\, \hat{w}^{\mathrm{EZ},g}. \qquad (35)$$

The last two terms of the r.h.s. of (35) decompose as follows.

Proposition 5. Let $g \in E_N$ and let $x \in \mathcal{X}^N$ be such that $\mathrm{Det}\, \boldsymbol{\kappa}_N(x) > 0$. We have

$$\mu_g(x)^\top \hat{w}^{\mathrm{EZ},g} = \|\mu_g\|_{\mathcal{F}}^2, \qquad (36)$$

and

$$(\hat{w}^{\mathrm{EZ},g})^\top \mathbf{K}(x)\, \hat{w}^{\mathrm{EZ},g} = \|\mu_g\|_{\mathcal{F}}^2 + \epsilon^\top \Phi_N(x)^{-\top}\, \overline{\mathbf{K}}_N(x)\, \Phi_N(x)^{-1} \epsilon, \qquad (37)$$

where $\epsilon = \sum_{n \in [N]} \langle g, \varphi_n \rangle_\omega\, e_n$, and $\overline{k}_N$ is the kernel defined by

$$\overline{k}_N(x, y) = \sum_{m \geq N+1} \sigma_m \varphi_m(x) \varphi_m(y). \qquad (38)$$

The proof of Proposition 5 is detailed in Appendix A.3. Now, by combining (35), (36) and (37), we get

$$\Big\| \mu_g - \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},g} k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 = \|\mu_g\|_{\mathcal{F}}^2 - 2 \|\mu_g\|_{\mathcal{F}}^2 + \|\mu_g\|_{\mathcal{F}}^2 + \epsilon^\top \Phi_N(x)^{-\top}\, \overline{\mathbf{K}}_N(x)\, \Phi_N(x)^{-1} \epsilon.$$

This proves the following result.

Theorem 6. Let $g \in E_N$ and let $x \in \mathcal{X}^N$ be such that $\mathrm{Det}\, \boldsymbol{\kappa}_N(x) > 0$. We have

$$\Big\| \mu_g - \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},g} k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 = \epsilon^\top \Phi_N(x)^{-\top}\, \overline{\mathbf{K}}_N(x)\, \Phi_N(x)^{-1} \epsilon, \qquad (39)$$

where $\epsilon = \sum_{n \in [N]} \langle g, \varphi_n \rangle_\omega\, e_n$.

4.2 A tractable formula for the expected approximation error

In the following, we prove a closed formula for $\mathbb{E}_{\mathrm{DPP}} \| \mu_g - \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},g} k(x_i, \cdot) \|_{\mathcal{F}}^2$. By Theorem 6, it is enough to calculate

$$\mathbb{E}_{\mathrm{DPP}}\, \epsilon^\top \Phi_N(x)^{-\top}\, \overline{\mathbf{K}}_N(x)\, \Phi_N(x)^{-1} \epsilon \qquad (40)$$

for $\epsilon \in \mathbb{R}^N$. For this purpose, observe that $\overline{\mathbf{K}}_N(x) = \sum_{m \geq N+1} \sigma_m \varphi_m(x) \varphi_m(x)^\top$, so that

$$\epsilon^\top \Phi_N(x)^{-\top}\, \overline{\mathbf{K}}_N(x)\, \Phi_N(x)^{-1} \epsilon = \sum_{m \geq N+1} \sigma_m\, \epsilon^\top \Phi_N(x)^{-\top} \varphi_m(x) \varphi_m(x)^\top \Phi_N(x)^{-1} \epsilon. \qquad (41)$$

Therefore, the calculation of (40) boils down to the calculation of

$$\mathbb{E}_{\mathrm{DPP}}\, \epsilon^\top \Phi_N(x)^{-\top} \varphi_m(x) \varphi_m(x)^\top \Phi_N(x)^{-1} \epsilon \qquad (42)$$

for $m \geq N+1$. This is the purpose of the following result.

Theorem 7. Let $\epsilon = \sum_{n \in [N]} \epsilon_n e_n$ and $\tilde{\epsilon} = \sum_{n \in [N]} \tilde{\epsilon}_n e_n$ be vectors of $\mathbb{R}^N$, and let $m \geq N+1$. Then

$$\mathbb{E}_{\mathrm{DPP}}\, \epsilon^\top \Phi_N(x)^{-\top} \varphi_m(x) \varphi_m(x)^\top \Phi_N(x)^{-1} \tilde{\epsilon} = \sum_{n \in [N]} \epsilon_n \tilde{\epsilon}_n. \qquad (43)$$

In particular, by (41),

$$\mathbb{E}_{\mathrm{DPP}}\, \epsilon^\top \Phi_N(x)^{-\top}\, \overline{\mathbf{K}}_N(x)\, \Phi_N(x)^{-1} \tilde{\epsilon} = r_N \sum_{n \in [N]} \epsilon_n \tilde{\epsilon}_n. \qquad (44)$$

We give the proof of Theorem 7 in Appendix A.4. By taking $\epsilon = \tilde{\epsilon} = \sum_{n \in [N]} \langle g, \varphi_n \rangle_\omega\, e_n$ in Theorem 7, we obtain (26). As for (27), it is sufficient to observe that

$$\Big\| \mu_g - \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},g} k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 \leq 2 \Big( \| \mu_g - \mu_{g_N} \|_{\mathcal{F}}^2 + \Big\| \mu_{g_N} - \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},g} k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 \Big), \qquad (45)$$

where $g_N = \sum_{n \in [N]} \langle g, \varphi_n \rangle_\omega\, \varphi_n \in E_N$, so that we can apply (26) to $g_N$ and obtain

$$\mathbb{E}_{\mathrm{DPP}} \Big\| \mu_{g_N} - \sum_{i \in [N]} \hat{w}_i^{\mathrm{EZ},g} k(x_i, \cdot) \Big\|_{\mathcal{F}}^2 = \Big( \sum_{n \in [N]} \langle g, \varphi_n \rangle_\omega^2 \Big) \sum_{m \geq N+1} \sigma_m \leq \|g\|_\omega^2 \sum_{m \geq N+1} \sigma_m. \qquad (46)$$

The term $\| \mu_g - \mu_{g_N} \|_{\mathcal{F}}^2$ is upper bounded by $\sigma_{N+1} \|g\|_\omega^2$. We give the details in Appendix A. This concludes the proof of Theorem 3. In the following section, we give numerical experiments illustrating this result.

5 Numerical experiments

In this section, we illustrate the theoretical results presented in Section 3 in the case of the RKHS associated to the kernel

$$k_s(x, y) = 1 + \sum_{m \in \mathbb{N}^*} \frac{1}{m^{2s}} \cos(2\pi m (x - y)), \qquad (47)$$

which corresponds to the periodic Sobolev space of order $s$ on $[0, 1]$ [5], and we take $\omega$ to be the uniform measure on $\mathcal{X} = [0, 1]$. We compare the squared worst-case integration errors of EZQ, OKQ and KBIQ (with $M = 2N$ and $\gamma = \sigma$), for $x$ following the distribution of the projection DPP and for $g \in \{e_1, e_{10}, e_{20}\}$. We take $N \in [5, 100]$. Figure 1 shows log-log plots of the squared error w.r.t. $N$, averaged over 1000 samples for each point, for $s \in \{2, 3\}$. A condensed sketch of this experiment is given below.

[Figure 1: Squared worst-case integration error vs. number of nodes $N$ for EZQ, KBIQ and OKQ in the Sobolev space of periodic functions of order $s \in \{2, 3\}$. Panels: (a) EZQ ($s=2$), (b) KBIQ ($s=2$), (c) OKQ ($s=2$), (d) EZQ ($s=3$), (e) KBIQ ($s=3$), (f) OKQ ($s=3$); axes $\log_{10}(N)$ vs. $\log_{10}(\text{squared error})$.]
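The sketch combines the pieces introduced above: the projection DPP sampler, the EZ and OKQ weights, and the closed-form squared worst-case error. It is ours and makes several simplifying assumptions: the Mercer expansion is truncated at $M$ terms, the spectrum is taken proportional to $m^{-2s}$ for each cosine/sine pair (constant factors do not affect the rates), fewer repetitions are averaged than in Figure 1, and $N$ is kept odd; all names are hypothetical.

```python
import numpy as np

def phi(x, M):
    x = np.atleast_1d(np.asarray(x, dtype=float))
    cols, m = [np.ones_like(x)], 1
    while len(cols) < M:
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * m * x))
        if len(cols) < M:
            cols.append(np.sqrt(2) * np.sin(2 * np.pi * m * x))
        m += 1
    return np.stack(cols)                        # (M, len(x))

def sample_projection_dpp(N, rng):               # N odd, cf. the sketch in Sec. 2.3
    nodes, B = [], np.zeros((0, N))
    while len(nodes) < N:
        x = rng.uniform()
        v = phi(x, N)[:, 0]
        if rng.uniform() * N < N - np.sum((B @ v) ** 2):
            nodes.append(x)
            r = v - B.T @ (B @ v)
            B = np.vstack([B, r / np.linalg.norm(r)])
    return np.array(nodes)

def sq_wce(w, x, sigma, g_coef, M):
    """||mu_g - sum_i w_i k(x_i, .)||_F^2 from the truncated expansion."""
    P = phi(x, M)
    K, mu = P.T @ (sigma[:, None] * P), P.T @ (sigma * g_coef)
    return np.sum(sigma * g_coef**2) - 2 * w @ mu + w @ K @ w

s, M, n_rep = 2, 4001, 50
# Eigenvalues: 1 for the constant, then m^{-2s} for each cosine/sine pair.
sigma = np.concatenate(([1.0], np.repeat(np.arange(1, (M - 1) // 2 + 1.0) ** (-2 * s), 2)))
g_coef = np.zeros(M); g_coef[0] = 1.0            # g = phi_1
rng = np.random.default_rng(0)
for N in (5, 11, 21):
    errs = np.zeros((n_rep, 2))
    for t in range(n_rep):
        x = sample_projection_dpp(N, rng)
        P = phi(x, M)
        w_ez = np.linalg.solve(P[:N], g_coef[:N])        # Phi_N(x) w = eps, cf. (17)
        w_okq = np.linalg.solve(P.T @ (sigma[:, None] * P), P.T @ (sigma * g_coef))
        errs[t] = [sq_wce(w_ez, x, sigma, g_coef, M), sq_wce(w_okq, x, sigma, g_coef, M)]
    print(N, errs.mean(axis=0))                  # EZQ tracks r_N, OKQ decays faster
```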
We observe that the squared error of EZQ converges to 0 at the exact rate $O(r_N)$ predicted by Theorem 3, while the squared error of OKQ converges to 0 at the rate $O(\sigma_{N+1})$, as already observed in [3], which is still better than the rate $O(r_N)$ proved in Theorem 4. Finally, KBIQ ($M = 2N$ and $\gamma = \sigma$) converges to 0 at the rate $O(\sigma_{N+1})$. We conclude that, by taking $M = \alpha N$ with $\alpha > 1$, KBIQ has practically the same averaged error as OKQ ($M = +\infty$). As mentioned before, the theoretical analysis of KBIQ in the regime where $M$ is finite and strictly larger than $N$ is beyond the scope of this work, and we defer it to future work.

6 Conclusion

We studied the quadrature rule proposed by Ermakov and Zolotukhin through the lens of kernel methods. We proved that EZQ and OKQ belong to a larger class of quadrature rules that may be defined through kernel-based interpolation. From this new perspective, EZQ may be seen as an approximation of OKQ. Moreover, we studied the expected value of the squared worst-case integration error of EZQ when the nodes follow the distribution of a DPP. In particular, we proved that it converges to 0 at the rate $O(r_N)$, which is slower than the optimal rate $O(\sigma_{N+1})$ typically observed for OKQ with DPPs. This work shows the importance of the worst-case integration error as a figure of merit when comparing quadrature rules. Interestingly, we used our analysis of EZQ to improve upon the existing theoretical guarantees of OKQ under DPPs. Finally, we illustrated the theoretical results with numerical experiments, which hint that KBIQ in the regime $M > N$ may perform similarly to OKQ. It would be interesting to study this broader class of quadratures in the future.

Broader impact

This article makes contributions to the fundamentals of numerical integration, and due to its theoretical nature, the author sees no ethical or immediate societal consequence of this work.

Acknowledgments and Disclosure of Funding

This project was supported by the Allegro Assai ANR project ANR-19-CHIA-0009. The author would like to thank the reviewers for their thorough and constructive reviews, and Pierre Chainais and Rémi Bardenet for their constructive feedback on an early version of this work.

References

[1] R. Bardenet and A. Hardy. Monte Carlo with determinantal point processes. The Annals of Applied Probability, 30(1):368-417, 2020.
[2] A. Belhadji. Subspace sampling using determinantal point processes. PhD thesis, Ecole Centrale de Lille, 2020.
[3] A. Belhadji, R. Bardenet, and P. Chainais. Kernel quadrature with DPPs. In Advances in Neural Information Processing Systems 32, pages 12907-12917, 2019.
[4] A. Belhadji, R. Bardenet, and P. Chainais. Kernel interpolation with continuous volume sampling. In Proceedings of the 37th International Conference on Machine Learning, 2020.
[5] A. Berlinet and C. Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer Science & Business Media, 2011.
[6] B. Bojanov. Uniqueness of the optimal nodes of quadrature formulae. Mathematics of Computation, 36(154):525-546, 1981.
[7] F.-X. Briol, C. Oates, M. Girolami, and M. Osborne. Frank-Wolfe Bayesian quadrature: Probabilistic integration with theoretical guarantees. In Advances in Neural Information Processing Systems, pages 1162-1170, 2015.
[8] F.-X. Briol, C. J. Oates, M. Girolami, M. A. Osborne, D. Sejdinovic, et al. Probabilistic integration: A role in statistical computation? Statistical Science, 34(1):1-22, 2019.
[9] Y. Chen, M. Welling, and A. Smola. Super-samples from kernel herding. In Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence, UAI'10, pages 109-116, Arlington, Virginia, United States, 2010. AUAI Press.
[10] J.-F. Coeurjolly, A. Mazoyer, and P.-O. Amblard. Monte Carlo integration of non-differentiable functions on [0,1]^ι, ι = 1, ..., d, using a single determinantal point pattern defined on [0,1]^d. arXiv preprint arXiv:2003.10323, 2020.
[11] P. J. Davis and P. Rabinowitz. Methods of Numerical Integration. Computer Science and Applied Mathematics, 1984.
[12] S. De Marchi. On optimal center locations for radial basis function interpolation: computational aspects. Rend. Splines Radial Basis Functions and Applications, 61(3):343-358, 2003.
[13] S. De Marchi, R. Schaback, and H. Wendland. Near-optimal data-independent point locations for radial basis function interpolation. Advances in Computational Mathematics, 23(3):317-330, 2005.
[14] S. M. Ermakov and V. Zolotukhin. Polynomial approximations and the Monte Carlo method. Theory of Probability & Its Applications, 5(4):428-431, 1960.
[15] C. F. Gauss. Methodus nova integralium valores per approximationem inveniendi. Apud Henricum Dieterich, 1815.
[16] G. Gautier, R. Bardenet, and M. Valko. On two ways to use determinantal point processes for Monte Carlo integration. In Advances in Neural Information Processing Systems, volume 32, 2019.
[17] P. Glasserman. Monte Carlo Methods in Financial Engineering, volume 53. Springer Science & Business Media, 2013.
[18] F. J. Hickernell. Quadrature error bounds with applications to lattice rules. SIAM Journal on Numerical Analysis, 33(5):1995-2016, 1996.
[19] F. J. Hickernell. A generalized discrepancy and quadrature error bound. Mathematics of Computation, 67(221):299-322, 1998.
[20] J. B. Hough, M. Krishnapur, Y. Peres, and B. Virág. Determinantal processes and independence. Probability Surveys, 3:206-229, 2006.
[21] F. Huszár and D. Duvenaud. Optimally-weighted herding is Bayesian quadrature. In Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence, UAI'12, pages 377-386. AUAI Press, 2012.
[22] T. Karvonen, C. J. Oates, and M. Girolami. Integration in reproducing kernel Hilbert spaces of Gaussian kernels. arXiv preprint arXiv:2004.12654, 2020.
[23] T. Karvonen and S. Särkkä. Gaussian kernel quadrature at scaled Gauss-Hermite nodes. BIT Numerical Mathematics, pages 1-26, 2019.
[24] J. Ma, V. Rokhlin, and S. Wandzura. Generalized Gaussian quadrature rules for systems of arbitrary functions. SIAM Journal on Numerical Analysis, 33(3):971-996, 1996.
[25] N. Metropolis and S. Ulam. The Monte Carlo method. Journal of the American Statistical Association, 44(247):335-341, 1949.
[26] E. Novak, M. Ullrich, and H. Woźniakowski. Complexity of oscillatory integration for univariate Sobolev spaces. Journal of Complexity, 31(1):15-41, 2015.
[27] C. P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer, 2004.
[28] A. Smola, A. Gretton, L. Song, and B. Schölkopf. A Hilbert space embedding for distributions. In International Conference on Algorithmic Learning Theory, pages 13-31. Springer, 2007.