# Near Optimal Reconstruction of Spherical Harmonic Expansions

Amir Zandieh (Independent Researcher, amir@zed512.gmail.com) · Insu Han (Yale University, insu.han@yale.edu) · Haim Avron (Tel Aviv University, haimav@tauex.tau.ac.il)

We propose an algorithm for the robust recovery of the spherical harmonic expansion of functions defined on the $d$-dimensional unit sphere $\mathbb{S}^{d-1}$ using a near-optimal number of function evaluations. We show that for any $f \in L^2(\mathbb{S}^{d-1})$, the number of evaluations of $f$ needed to recover its degree-$q$ spherical harmonic expansion equals the dimension of the space of spherical harmonics of degree at most $q$, up to a logarithmic factor. Moreover, we develop a simple yet efficient kernel regression-based algorithm to recover the degree-$q$ expansion of $f$ by only evaluating the function at uniformly sampled points on $\mathbb{S}^{d-1}$. Our algorithm is built upon the connections between spherical harmonics and Gegenbauer polynomials. Unlike prior results on the fast spherical harmonic transform, our proposed algorithm works efficiently using a nearly optimal number of samples in any dimension $d$. Furthermore, we illustrate the empirical performance of our algorithm on numerical examples.

## 1 Introduction

We consider the fundamental problem of recovering a function from a finite number of (noisy) observations. To provide accurate and reliable predictions at unobserved points we need to avoid overfitting, which is typically achieved by restricting our estimator or interpolant to a family of smooth or structured functions. In this paper, we focus on interpolating square-integrable functions on the $d$-dimensional unit sphere with low-degree spherical harmonics, a critical task in scenarios where rotational invariance is a fundamental property. Spherical harmonics are essential in various theoretical and practical applications, including the representation of electromagnetic fields [Wei95], gravitational potential [Wer97], cosmic microwave background radiation [KKS97], and medical imaging [CLL15], as well as the modelling of 3D shapes in computer graphics [KFR03] and computer vision [GCL+22]. Further notable real-life applications include molecular/atomic systems, where understanding the underlying functions within a spherical context can significantly enhance predictive modeling and simulation accuracy [EEHM17, LWL+21, ZDK+22].

We begin by observing that any function $f$ in $L^2(\mathbb{S}^{d-1})$, i.e., the family of square-integrable functions on the sphere $\mathbb{S}^{d-1}$, can be uniquely decomposed into orthogonal spherical harmonic components. Specifically, if we denote the space of spherical harmonics of degree $\ell$ in dimension $d$ by $\mathcal{H}_\ell(\mathbb{S}^{d-1})$, any $f \in L^2(\mathbb{S}^{d-1})$ has a unique orthogonal expansion $f = \sum_{\ell=0}^{\infty} f_\ell$ with $f_\ell \in \mathcal{H}_\ell(\mathbb{S}^{d-1})$ (Lemma 2). With this observation, we aim to find the best spherical harmonic approximation of degree $q$ to $f$ using a minimal number of samples (essentially treating higher-order terms in $f$'s expansion as noise).

**Problem 1 (Informal Version of Problem 2).** For an unknown function $f \in L^2(\mathbb{S}^{d-1})$ and an integer $q \ge 1$, efficiently (both in terms of the number of samples from $f$ and computations) learn the first $q+1$ spherical harmonic components $\{f_\ell \in \mathcal{H}_\ell(\mathbb{S}^{d-1})\}_{\ell=0}^{q}$ of $f$ which minimize

$$\Big\|\sum_{\ell=0}^{q} f_\ell - f\Big\|_{\mathbb{S}^{d-1}}^2. \tag{1}$$

The angular power spectrum of $f$ commonly obeys a power-law decay of the form $\|f_\ell\|_{\mathbb{S}^{d-1}}^2 \le O(\ell^{-s})$, for some $s > 0$, depending on the order of differentiability of $f$.
In fact, for any infinitely differentiable $f$, $\|f_\ell\|_{\mathbb{S}^{d-1}}^2$ decays asymptotically faster than any rational function of $\ell$. Furthermore, for any real analytic $f$ on the sphere, $\|f_\ell\|_{\mathbb{S}^{d-1}}^2$ decays exponentially. Thus, the first $q+1$ spherical harmonic components of $f$ typically approximate $f$ well for even modest $q$, and answering Problem 1 is meaningful for a wide range of differentiable functions.

### 1.1 Our Main Results

We reformulate Problem 1 as a least-squares regression and then solve it using randomized numerical linear algebra techniques. We first consider an orthonormal projection operator that maps functions in $L^2(\mathbb{S}^{d-1})$ onto the space of bounded-degree spherical harmonics $\bigoplus_{\ell=0}^{q} \mathcal{H}_\ell(\mathbb{S}^{d-1})$. Specifically, if $K_d^{(q)}$ is an operator that maps any $f$ with expansion $f = \sum_{\ell=0}^{\infty} f_\ell$, where $f_\ell \in \mathcal{H}_\ell(\mathbb{S}^{d-1})$, onto $f$'s first $q+1$ expansion components, i.e., $K_d^{(q)} f = \sum_{\ell=0}^{q} f_\ell$, then Problem 1 can be formulated as

$$\min_{g \in L^2(\mathbb{S}^{d-1})} \left\|K_d^{(q)} g - f\right\|_{\mathbb{S}^{d-1}}^2.$$

However, solving this regression problem with a continuous cost function is challenging. To avoid such continuous optimization, we adopt the approach of [AKM+19], which discretizes the aforementioned regression problem according to the leverage scores of the operator $K_d^{(q)}$. It turns out that if we can draw random samples with probabilities proportional to the leverage scores of $K_d^{(q)}$, then we can recover the degree-$q$ spherical harmonic expansion of $f$, i.e., $\sum_{\ell=0}^{q} f_\ell$, with a finite number of observations. In particular, by exploiting the connections between spherical harmonics and zonal (Gegenbauer) harmonics and the fact that zonal harmonics are the reproducing kernels of $\mathcal{H}_\ell(\mathbb{S}^{d-1})$ (Lemma 3), we prove that the leverage scores of $K_d^{(q)}$ are constant everywhere on the sphere $\mathbb{S}^{d-1}$. Thus, solving a discrete regression problem with uniformly sampled observations yields a near-optimal solution to Problem 1. Informal statements of our results are as follows.

**Theorem 1 (Informal Version of Theorem 5).** Let $\beta_{q,d}$ be the dimension of the space of spherical harmonics of degree at most $q$, i.e., $\beta_{q,d} \equiv \dim\left(\bigoplus_{\ell=0}^{q} \mathcal{H}_\ell(\mathbb{S}^{d-1})\right)$. There exists an algorithm that finds a $(1+\varepsilon)$-approximation to the optimal solution of Problem 1, given $s = O(\beta_{q,d} \log \beta_{q,d} + \varepsilon^{-1}\beta_{q,d})$ observations of the function $f$ at uniformly sampled points on $\mathbb{S}^{d-1}$, in $O(s^2 d + s^\omega)$ runtime, where $\omega < 2.3727$ is the exponent of the fast matrix multiplication algorithm [Wil12].

We also prove that our bound on the number of required samples is optimal up to a logarithmic factor.

**Theorem 2 (Informal Version of Theorem 6).** Any (randomized) algorithm that takes $s < \beta_{q,d}$ samples on any input fails with probability greater than $9/10$, where $\beta_{q,d} \equiv \dim\left(\bigoplus_{\ell=0}^{q} \mathcal{H}_\ell(\mathbb{S}^{d-1})\right)$.

### 1.2 Related Work

Efficient reconstruction of functions as per Problem 1 has been extensively studied in various fields. Many prior papers considered reconstructing 1-dimensional functions from a finite number of samples on a finite interval under smoothness assumptions about the underlying function. Notably, the influential line of work [SP61, LP61, LP62, XRY01] focused on reconstructing Fourier-bandlimited functions, and [CKPS16, KVZ19, EMM20, BKM+23] considered interpolating Fourier-sparse signals. Recently, [AKM+19] unified these reconstruction methods in dimension $d = 1$ and gave a universal sampling framework for reconstructing nearly all classes of functions with Fourier-based smoothness constraints. One can view 1-dimensional functions on a finite interval as functions on the unit circle $\mathbb{S}^1$.
Thus, Problem 1 is indeed a generalization of the aforementioned prior work to high dimensions, under the assumption that the generalized Fourier series (Lemma 2) of the underlying function only contains bounded-degree spherical harmonics. This degree constraint on spherical harmonic expansions can be viewed as the $d$-dimensional analog of Fourier-bandlimitedness on the circle $\mathbb{S}^1$.

Computing the spherical harmonic expansion in dimension $d = 3$ has received considerable attention in the physics and applied mathematics communities. The algorithms for this special case of Problem 1 are known in the literature as the fast spherical harmonic transform [SS00, ST02]. Most notably, [RT06] proposed an algorithm for computing the spherical harmonic expansion of degree $q$ to precision $\varepsilon$ using $O(\beta_{q,3})$ samples and $O(\beta_{q,3} \log \beta_{q,3} \log(1/\varepsilon))$ time. These fast algorithms were developed based on fast Fourier and associated Legendre transforms and make use of a (well-conditioned) orthogonal basis for $\mathcal{H}_\ell(\mathbb{S}^{d-1})$, which happens to be the associated Legendre polynomials when $d = 3$. However, it is in general very difficult to compute an orthogonal basis for spherical harmonics [MNY06], so unlike our Theorem 1, it is inefficient to extend these prior results to higher $d$.

From a techniques point of view, a related work is [GMMM21], which employs harmonic analysis over $\mathbb{S}^{d-1}$ to analyze the generalization of two-layer neural tangent kernels. They show that an unknown function defined on $\mathbb{S}^{d-1}$ can be efficiently recovered using kernel regression with respect to the neural tangent kernel on uniform random samples from the function. However, the number of samples that [GMMM21] requires for recovering bounded-degree spherical harmonics, especially when the degrees are high, is sub-optimal and strictly worse than our result. Additionally, [GMMM21] does not guarantee recovery with relative error, while our Theorem 5 provides relative error guarantees.

Furthermore, recent applications of Gegenbauer polynomials, along with other orthonormal polynomials like Hermite polynomials, have been found in designing efficient random features for approximating various kernel functions. These applications extend to dot-product kernels [HZA22], neural tangent kernels [ZHA+21, HZL+22], and Gaussian kernels [WZ22, AKK+20].

## 2 Mathematical Preliminaries

We denote by $\mathbb{S}^{d-1}$ the unit sphere in $d$ dimensions. We use $|\mathbb{S}^{d-1}| = \frac{2\pi^{d/2}}{\Gamma(d/2)}$ to denote the surface area of the sphere $\mathbb{S}^{d-1}$ and $U(\mathbb{S}^{d-1})$ to denote the uniform probability distribution on $\mathbb{S}^{d-1}$. We denote by $L^2(\mathbb{S}^{d-1})$ the set of all square-integrable real-valued functions on the sphere $\mathbb{S}^{d-1}$. Furthermore, for any $f, g \in L^2(\mathbb{S}^{d-1})$ we use the following definition of the inner product on the unit sphere:

$$\langle f, g \rangle_{\mathbb{S}^{d-1}} := \int_{\mathbb{S}^{d-1}} f(w)\, g(w)\, dw = |\mathbb{S}^{d-1}| \cdot \mathbb{E}_{w \sim U(\mathbb{S}^{d-1})}\left[f(w)\, g(w)\right]. \tag{2}$$

(Formally, $L^2(\mathbb{S}^{d-1})$ is a space of equivalence classes of functions that differ on a set of measure zero. For notational simplicity, here and throughout we use $f$ to denote a specific representative of the equivalence class $f \in L^2(\mathbb{S}^{d-1})$; in this way, we can consider the pointwise value $f(w)$ for every $w \in \mathbb{S}^{d-1}$.)

The function space $L^2(\mathbb{S}^{d-1})$ is complete with respect to the norm induced by the above inner product, i.e., $\|f\|_{\mathbb{S}^{d-1}} := \sqrt{\langle f, f \rangle_{\mathbb{S}^{d-1}}}$, so $L^2(\mathbb{S}^{d-1})$ is a Hilbert space. We often use the term quasi-matrix, which is informally defined as a matrix in which one dimension is finite while the other is infinite. A quasi-matrix can be tall (or wide), meaning that there is a finite number of columns (or rows), each of which is a functional operator. For a more formal definition, see [SA22].

Spherical harmonics are the solutions of Laplace's equation in spherical domains and can be thought of as functions defined on $\mathbb{S}^{d-1}$ employed in solving partial differential equations. Formally:

**Definition 1 (Spherical Harmonics).**
For integers $\ell \ge 0$ and $d \ge 1$, let $\mathcal{P}_\ell(d)$ be the space of degree-$\ell$ homogeneous polynomials in $d$ variables with real coefficients. Let $\mathcal{H}_\ell(d)$ denote the space of degree-$\ell$ harmonic polynomials in dimension $d$, i.e., homogeneous polynomial solutions of Laplace's equation:

$$\mathcal{H}_\ell(d) := \{P \in \mathcal{P}_\ell(d) : \Delta P = 0\},$$

where $\Delta = \sum_{i=1}^{d} \frac{\partial^2}{\partial x_i^2}$ is the Laplace operator on $\mathbb{R}^d$. Finally, let $\mathcal{H}_\ell(\mathbb{S}^{d-1})$ be the space of (real) spherical harmonics of order $\ell$ in dimension $d$, i.e., restrictions of harmonic polynomials in $\mathcal{H}_\ell(d)$ to the sphere $\mathbb{S}^{d-1}$. The dimension of this space, $\alpha_{\ell,d} \equiv \dim \mathcal{H}_\ell(\mathbb{S}^{d-1})$, is

$$\alpha_{0,d} = 1, \qquad \alpha_{1,d} = d, \qquad \alpha_{\ell,d} = \binom{d+\ell-1}{\ell} - \binom{d+\ell-3}{\ell-2} \ \text{ for } \ell \ge 2.$$

### 2.1 Gegenbauer Polynomials

The Gegenbauer (a.k.a. ultraspherical) polynomial of degree $\ell \ge 0$ in dimension $d \ge 2$ is given by

$$P_d^\ell(t) := \sum_{j=0}^{\lfloor \ell/2 \rfloor} c_j\, t^{\ell-2j}\, (1-t^2)^j, \tag{3}$$

where $c_0 = 1$ and $c_{j+1} = -\frac{(\ell-2j)(\ell-2j-1)}{2(j+1)(d-1+2j)}\, c_j$ for $j = 0, 1, \dots, \lfloor \ell/2 \rfloor - 1$. These polynomials are orthogonal on the interval $[-1, 1]$ with respect to the measure $(1-t^2)^{\frac{d-3}{2}}$, i.e.,

$$\int_{-1}^{1} P_d^\ell(t)\, P_d^{\ell'}(t)\, (1-t^2)^{\frac{d-3}{2}}\, dt = \frac{|\mathbb{S}^{d-1}|}{\alpha_{\ell,d}\, |\mathbb{S}^{d-2}|}\, \mathbb{1}_{\{\ell = \ell'\}}. \tag{4}$$

**Zonal Harmonics.** The Gegenbauer polynomials naturally provide positive definite dot-product kernels on $\mathbb{S}^{d-1}$ known as zonal harmonics, which are closely related to the spherical harmonics. The following reproducing property of zonal harmonics plays a crucial role in our analysis.

**Lemma 1 (Reproducing Property of Zonal Harmonics).** Let $P_d^\ell(\cdot)$ be the Gegenbauer polynomial of degree $\ell$ in dimension $d$. For any $x, y \in \mathbb{S}^{d-1}$:

$$P_d^\ell(\langle x, y \rangle) = \alpha_{\ell,d} \cdot \mathbb{E}_{w \sim U(\mathbb{S}^{d-1})}\left[P_d^\ell(\langle x, w \rangle)\, P_d^\ell(\langle y, w \rangle)\right].$$

Furthermore, for any $\ell \ne \ell'$:

$$\mathbb{E}_{w \sim U(\mathbb{S}^{d-1})}\left[P_d^\ell(\langle x, w \rangle)\, P_d^{\ell'}(\langle y, w \rangle)\right] = 0.$$

The proof of this and all subsequent results can be found in the appendix. The following useful fact, known as the addition theorem, connects Gegenbauer polynomials and spherical harmonics.

**Theorem 3 (Addition Theorem).** For every integer $\ell \ge 0$, if $\left\{y_1^\ell, y_2^\ell, \dots, y_{\alpha_{\ell,d}}^\ell\right\}$ is an orthonormal basis for $\mathcal{H}_\ell(\mathbb{S}^{d-1})$, then for any $\sigma, w \in \mathbb{S}^{d-1}$ we have

$$\frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}\, P_d^\ell(\langle \sigma, w \rangle) = \sum_{j=1}^{\alpha_{\ell,d}} y_j^\ell(\sigma)\, y_j^\ell(w).$$

## 3 Reconstructing $L^2(\mathbb{S}^{d-1})$ Functions via Spherical Harmonics

In this section we show how to approximate any function $f \in L^2(\mathbb{S}^{d-1})$ by spherical harmonics using the optimal number of samples. We begin with the fact that spherical harmonics form a complete set of orthonormal functions and thus form an orthonormal basis for the Hilbert space of square-integrable functions on the sphere $\mathbb{S}^{d-1}$. This is analogous to periodic functions, viewed as functions defined on the circle $\mathbb{S}^1$, which can be expressed as linear combinations of circular functions (sines and cosines) via the Fourier series.

**Lemma 2 (Direct Sum Decomposition of $L^2(\mathbb{S}^{d-1})$).** The family of spaces $\mathcal{H}_\ell(\mathbb{S}^{d-1})$ yields a Hilbert space direct sum decomposition $L^2(\mathbb{S}^{d-1}) = \bigoplus_{\ell=0}^{\infty} \mathcal{H}_\ell(\mathbb{S}^{d-1})$: the summands are closed and pairwise orthogonal, and every $f \in L^2(\mathbb{S}^{d-1})$ is the sum of a converging series (in the sense of mean-square convergence with the $L^2$-norm defined in Eq. (2))

$$f = \sum_{\ell=0}^{\infty} f_\ell,$$

where the $f_\ell \in \mathcal{H}_\ell(\mathbb{S}^{d-1})$ are uniquely determined functions. Furthermore, given any orthonormal basis $\left\{y_1^\ell, y_2^\ell, \dots, y_{\alpha_{\ell,d}}^\ell\right\}$ of $\mathcal{H}_\ell(\mathbb{S}^{d-1})$, we have $f_\ell = \sum_{j=1}^{\alpha_{\ell,d}} \langle f, y_j^\ell \rangle_{\mathbb{S}^{d-1}}\, y_j^\ell$.
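The coefficient recurrence in Eq. (3) is simple enough to implement directly. Below is a minimal NumPy sketch (our own helper, not code from the paper) that evaluates $P_d^\ell(t)$; the sign in the recurrence for $c_{j+1}$ is the one consistent with the classical Legendre polynomials, to which Eq. (3) reduces when $d = 3$, and the script checks exactly that.

```python
import numpy as np

def gegenbauer(ell, d, t):
    """Evaluate P_d^ell(t) from Eq. (3) via the coefficient recurrence
    c_{j+1} = -((ell-2j)(ell-2j-1)) / (2(j+1)(d-1+2j)) * c_j, with c_0 = 1."""
    t = np.asarray(t, dtype=float)
    c, out = 1.0, np.zeros_like(t)
    for j in range(ell // 2 + 1):
        out = out + c * t ** (ell - 2 * j) * (1.0 - t ** 2) ** j
        c *= -((ell - 2 * j) * (ell - 2 * j - 1)) / (2.0 * (j + 1) * (d - 1 + 2 * j))
    return out

if __name__ == "__main__":
    t = np.linspace(-1.0, 1.0, 101)
    for ell in range(6):
        # Normalization used throughout the paper: P_d^ell(1) = 1 for every d.
        assert abs(gegenbauer(ell, 5, 1.0) - 1.0) < 1e-12
        # For d = 3, Eq. (3) gives exactly the Legendre polynomials.
        ref = np.polynomial.legendre.Legendre.basis(ell)(t)
        assert np.allclose(gegenbauer(ell, 3, t), ref)
```

The helper is vectorized in $t$, which is convenient later when evaluating zonal kernels on whole Gram matrices at once.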
The series expansion in Lemma 2 is the analog of the Fourier expansion of periodic functions, and is known as the generalized Fourier series [Pen30] with respect to the Hilbert basis $\{y_j^\ell : j \in [\alpha_{\ell,d}],\, \ell \ge 0\}$. We remark that it is in general intractable to compute an orthogonal basis for the space of spherical harmonics [MNY06], which renders the generalized Fourier series expansion in Lemma 2 primarily existential. While finding the generalized Fourier expansion of a function $f \in L^2(\mathbb{S}^{d-1})$ is computationally intractable, our goal is to answer the next fundamental question, which is about finding the projection of a function $f$ onto the space of spherical harmonics, i.e., the $f_\ell$'s in Lemma 2. Concretely, we seek to solve the following problem.

**Problem 2.** For an integer $q \ge 0$ and a given function $f \in L^2(\mathbb{S}^{d-1})$ whose decomposition over the Hilbert sum $\bigoplus_{\ell=0}^{\infty} \mathcal{H}_\ell(\mathbb{S}^{d-1})$ is $f = \sum_{\ell=0}^{\infty} f_\ell$ as per Lemma 2, let us define the low-degree expansion of this function as $f^{(q)} := \sum_{\ell=0}^{q} f_\ell$. How efficiently can we learn $f^{(q)} \in \bigoplus_{\ell=0}^{q} \mathcal{H}_\ell(\mathbb{S}^{d-1})$? More precisely, we want to find a set $\{w_1, w_2, \dots, w_s\} \subset \mathbb{S}^{d-1}$ with minimal cardinality $s$, along with an efficient algorithm that, given samples $\{f(w_i)\}_{i=1}^{s}$, can interpolate $f(\cdot)$ with a function $\widehat{f}^{(q)} \in \bigoplus_{\ell=0}^{q} \mathcal{H}_\ell(\mathbb{S}^{d-1})$ such that:

$$\left\|\widehat{f}^{(q)} - f^{(q)}\right\|_{\mathbb{S}^{d-1}}^2 \le \varepsilon \left\|f^{(q)} - f\right\|_{\mathbb{S}^{d-1}}^2.$$

For ease of notation, we denote the Hilbert space of spherical harmonics of degree at most $q$ by $\mathcal{H}^{(q)}(\mathbb{S}^{d-1}) := \bigoplus_{\ell=0}^{q} \mathcal{H}_\ell(\mathbb{S}^{d-1})$. To answer Problem 2 we exploit the close connection between spherical harmonics and Gegenbauer polynomials, and in particular the fact that zonal harmonics are the reproducing kernels of the Hilbert spaces $\mathcal{H}_\ell(\mathbb{S}^{d-1})$.

**Lemma 3 (A Reproducing Kernel for $\mathcal{H}_\ell(\mathbb{S}^{d-1})$).** For every $f \in L^2(\mathbb{S}^{d-1})$, if $f = \sum_{\ell=0}^{\infty} f_\ell$ is the unique decomposition of $f$ over $\bigoplus_{\ell=0}^{\infty} \mathcal{H}_\ell(\mathbb{S}^{d-1})$ as per Lemma 2, then $f_\ell$ is given by

$$f_\ell(\sigma) = \alpha_{\ell,d} \cdot \mathbb{E}_{w \sim U(\mathbb{S}^{d-1})}\left[f(w)\, P_d^\ell(\langle \sigma, w \rangle)\right] \quad \text{for } \sigma \in \mathbb{S}^{d-1}.$$

Now we define a kernel operator, based on the low-degree Gegenbauer polynomials, which projects functions onto their low-degree spherical harmonic expansion.

**Definition 2 (Projection Operator onto $\mathcal{H}^{(q)}(\mathbb{S}^{d-1})$).** For any integers $q \ge 0$ and $d \ge 2$, define the kernel operator $K_d^{(q)} : L^2(\mathbb{S}^{d-1}) \to L^2(\mathbb{S}^{d-1})$ as follows: for $f \in L^2(\mathbb{S}^{d-1})$ and $\sigma \in \mathbb{S}^{d-1}$,

$$\left[K_d^{(q)} f\right](\sigma) := \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|} \left\langle f,\, P_d^\ell(\langle \sigma, \cdot \rangle) \right\rangle_{\mathbb{S}^{d-1}} = \sum_{\ell=0}^{q} \alpha_{\ell,d} \cdot \mathbb{E}_{w \sim U(\mathbb{S}^{d-1})}\left[f(w)\, P_d^\ell(\langle \sigma, w \rangle)\right]. \tag{5}$$

This is an integral operator with kernel function $k_{q,d}(\sigma, w) := \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}\, P_d^\ell(\langle \sigma, w \rangle)$.

Note that the operator $K_d^{(q)}$ is self-adjoint and positive semi-definite. Moreover, using the reproducing property of this kernel we can establish that $K_d^{(q)}$ is a projection operator.

**Claim 1.** The operator $K_d^{(q)}$ defined in Definition 2 satisfies $\left(K_d^{(q)}\right)^2 = K_d^{(q)}$.

Furthermore, by the addition theorem (Theorem 3), $K_d^{(q)}$ is trace-class (i.e., the trace is finite and independent of the choice of basis) because:

$$\operatorname{trace}\left(K_d^{(q)}\right) = \int_{\mathbb{S}^{d-1}} k_{q,d}(w, w)\, dw = \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|} \int_{\mathbb{S}^{d-1}} P_d^\ell(\langle w, w \rangle)\, dw = \sum_{\ell=0}^{q} \alpha_{\ell,d} = \binom{d+q-1}{q} + \binom{d+q-2}{q-1}.$$

By combining Theorem 3 and Lemma 2, and using the definition of the projection operator $K_d^{(q)}$, it follows that for any function $f \in L^2(\mathbb{S}^{d-1})$ with Hilbert sum decomposition $f = \sum_{\ell=0}^{\infty} f_\ell$, the low-degree component $f^{(q)} = \sum_{\ell=0}^{q} f_\ell \in \mathcal{H}^{(q)}(\mathbb{S}^{d-1})$ can be computed as $f^{(q)} = K_d^{(q)} f$. Equivalently, in order to learn $f^{(q)}$, it suffices to solve the following least-squares regression problem:

$$\min_{g \in L^2(\mathbb{S}^{d-1})} \left\|K_d^{(q)} g - f\right\|_{\mathbb{S}^{d-1}}^2. \tag{7}$$

If $g^*$ is an optimal solution to the above regression problem, then $f^{(q)} = K_d^{(q)} g^*$. In the next claim we show that solving the least-squares problem in Eq. (7), even to a coarse approximation, is sufficient to solve our interpolation problem (i.e., Problem 2):
**Claim 2.** For any $f \in L^2(\mathbb{S}^{d-1})$, any integer $q \ge 0$, and any $C \ge 1$, if $\widetilde{g} \in L^2(\mathbb{S}^{d-1})$ satisfies

$$\left\|K_d^{(q)} \widetilde{g} - f\right\|_{\mathbb{S}^{d-1}}^2 \le C \cdot \min_{g \in L^2(\mathbb{S}^{d-1})} \left\|K_d^{(q)} g - f\right\|_{\mathbb{S}^{d-1}}^2,$$

and if we let $f^{(q)} := K_d^{(q)} f$, where $K_d^{(q)}$ is defined as per Definition 2, then the following holds:

$$\left\|K_d^{(q)} \widetilde{g} - f^{(q)}\right\|_{\mathbb{S}^{d-1}}^2 \le (C-1) \left\|f^{(q)} - f\right\|_{\mathbb{S}^{d-1}}^2.$$

Claim 2 shows that approximately solving the regression problem in Eq. (7) provides a solution to our spherical harmonic interpolation problem (Problem 2). But how can we solve this least-squares problem efficiently? Not only does the problem involve a possibly infinite-dimensional parameter $g$, but the objective function also involves the continuous domain of the surface of $\mathbb{S}^{d-1}$.

### 3.1 Randomized Discretization via Leverage Function Sampling

We solve the continuous regression in Eq. (7) by randomly discretizing the sphere $\mathbb{S}^{d-1}$, thereby reducing our problem to a regression on a finite set of points $w_1, w_2, \dots, w_s \in \mathbb{S}^{d-1}$. In particular, we propose to sample points on $\mathbb{S}^{d-1}$ with probability proportional to the so-called leverage function, a specific distribution that has been widely applied in randomized algorithms for linear algebra problems on discrete matrices [LMP13]. We start with the definition of the leverage function for compact operators such as $K_d^{(q)}$:

**Definition 3 (Leverage Function).** For integers $q \ge 0$ and $d > 0$, we define the leverage function of the operator $K_d^{(q)}$ (see Definition 2) for every $w \in \mathbb{S}^{d-1}$ as follows:

$$\tau_q(w) := \max_{g \in L^2(\mathbb{S}^{d-1})} \frac{\left|\left[K_d^{(q)} g\right](w)\right|^2}{\left\|K_d^{(q)} g\right\|_{\mathbb{S}^{d-1}}^2}. \tag{8}$$

Intuitively, $\tau_q(w)$ bounds how much a function spanned by the eigenfunctions of the operator $K_d^{(q)}$ can blow up at $w$. The larger the leverage function $\tau_q(w)$, the higher the probability with which we should sample $w$. This ensures that our sample points reflect any possibly significant components, or "spikes", of the function. Ultimately, the integral $\int_{\mathbb{S}^{d-1}} \tau_q(w)\, dw$ determines how many samples we require to solve the regression problem in Eq. (7) to a given accuracy. It is a known fact that the leverage function integrates to the rank of the operator $K_d^{(q)}$ (which is equal to the dimensionality of the Hilbert space $\mathcal{H}^{(q)}(\mathbb{S}^{d-1})$). This ultimately allows us to achieve an $\widetilde{O}\left(\sum_{\ell=0}^{q} \alpha_{\ell,d}\right)$ sample complexity bound for solving Problem 2.

To compute the leverage function, we make use of the following useful alternative characterization.

**Lemma 4 (Min Characterization of the Leverage Function).** For any $w \in \mathbb{S}^{d-1}$, let $\tau_q(w)$ be the leverage function (Definition 3) and define $\phi_w \in L^2(\mathbb{S}^{d-1})$ by $\phi_w(\sigma) \equiv \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}\, P_d^\ell(\langle \sigma, w \rangle)$. We have the following minimization characterization of the leverage function:

$$\tau_q(w) = \min_{g \in L^2(\mathbb{S}^{d-1})} \|g\|_{\mathbb{S}^{d-1}}^2 \quad \text{s.t.} \quad K_d^{(q)} g = \phi_w. \tag{9}$$

Using the min and max characterizations of the leverage function we can find upper and lower bounds on this function. Surprisingly, in this case the upper and lower bounds match, so we actually have an exact value for the leverage function.

**Lemma 5 (Leverage Function is Constant).** The leverage function given in Definition 3 is equal to $\tau_q(w) = \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}$ for every $w \in \mathbb{S}^{d-1}$.

**Algorithm 1: Efficient Spherical Harmonic Expansion**

1. **Input:** accuracy parameter $\varepsilon > 0$, integer $q \ge 0$
2. Set $s = c\,(\beta_{q,d} \log \beta_{q,d} + \beta_{q,d}/\varepsilon)$ for a sufficiently large fixed constant $c$
3. Sample i.i.d. random points $w_1, w_2, \dots, w_s$ from $U(\mathbb{S}^{d-1})$
4. Compute $K \in \mathbb{R}^{s \times s}$ with $K_{i,j} = \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{s\,|\mathbb{S}^{d-1}|}\, P_d^\ell(\langle w_i, w_j \rangle)$ for $i, j \in [s]$
5. Compute $\bar{f} \in \mathbb{R}^s$ with $\bar{f}_j = \frac{1}{\sqrt{s}}\, f(w_j)$ for $j \in [s]$
6. Solve the regression by computing $\bar{z} = K^{+} \bar{f}$
7. **Return:** $y \in \mathcal{H}^{(q)}(\mathbb{S}^{d-1})$ with $y(\sigma) := \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{\sqrt{s}\,|\mathbb{S}^{d-1}|} \sum_{j=1}^{s} \bar{z}_j\, P_d^\ell(\langle w_j, \sigma \rangle)$ for $\sigma \in \mathbb{S}^{d-1}$

We prove Lemma 5 in Appendix C. The integral of the leverage function, which determines the total number of samples needed to solve our least-squares regression, is therefore equal to the dimensionality of the Hilbert space $\mathcal{H}^{(q)}(\mathbb{S}^{d-1})$.

**Corollary 1.** The leverage function defined in Definition 3 integrates to the dimensionality of the Hilbert space $\mathcal{H}^{(q)}(\mathbb{S}^{d-1})$, which we denote by $\beta_{q,d}$, i.e.,

$$\int_{\mathbb{S}^{d-1}} \tau_q(w)\, dw = \dim \mathcal{H}^{(q)}(\mathbb{S}^{d-1}) = \sum_{\ell=0}^{q} \alpha_{\ell,d} \equiv \beta_{q,d}.$$

We now show that the leverage function can be used to randomly sample points on the unit sphere to discretize the regression problem in Eq. (7) and solve it approximately.

**Theorem 4 (Approximate Regression via Leverage Function Sampling).** For any $\varepsilon > 0$, let $s = c\left(\beta_{q,d} \log \beta_{q,d} + \frac{\beta_{q,d}}{\varepsilon}\right)$ for a sufficiently large fixed constant $c$, and let $x_1, x_2, \dots, x_s$ be i.i.d. uniform samples on $\mathbb{S}^{d-1}$. Define the quasi-matrix $P : \mathbb{R}^s \to L^2(\mathbb{S}^{d-1})$ as follows: for every $v \in \mathbb{R}^s$,

$$[P v](\sigma) := \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{\sqrt{s}\,|\mathbb{S}^{d-1}|} \sum_{j=1}^{s} v_j\, P_d^\ell(\langle x_j, \sigma \rangle) \quad \text{for } \sigma \in \mathbb{S}^{d-1}.$$

Also let $\bar{f} \in \mathbb{R}^s$ be a vector with $\bar{f}_j := \frac{1}{\sqrt{s}}\, f(x_j)$ for $j = 1, 2, \dots, s$, and let $P^*$ be the adjoint of $P$. If $\bar{g}$ is an optimal solution to the least-squares problem $\bar{g} \in \arg\min_{g \in L^2(\mathbb{S}^{d-1})} \|P^* g - \bar{f}\|_2^2$, then with probability at least $1 - 10^{-4}$ the following holds:

$$\left\|K_d^{(q)} \bar{g} - f\right\|_{\mathbb{S}^{d-1}}^2 \le (1+\varepsilon) \min_{g \in L^2(\mathbb{S}^{d-1})} \left\|K_d^{(q)} g - f\right\|_{\mathbb{S}^{d-1}}^2.$$

We prove Theorem 4 in Appendix C. This theorem shows that the function $\bar{g}$ obtained from solving the discretized regression problem provides an approximate solution to Eq. (7).

### 3.2 Efficient Solution for the Discretized Least-Squares Problem

In this section, we demonstrate how to apply Theorem 4 algorithmically to approximately solve the regression problem of Eq. (7). To achieve this, we leverage the kernel trick, following an approach similar to previous works such as [AKM+19, CP19], which allows us to efficiently solve the randomly discretized least-squares problem, as detailed in Algorithm 1. The associated guarantee for this approach is provided in Theorem 5.

**Theorem 5 (Efficient Spherical Harmonic Interpolation).** Algorithm 1 returns a function $y \in \mathcal{H}^{(q)}(\mathbb{S}^{d-1})$ such that, with probability at least $1 - 10^{-4}$:

$$\left\|y - f^{(q)}\right\|_{\mathbb{S}^{d-1}}^2 \le \varepsilon \left\|f^{(q)} - f\right\|_{\mathbb{S}^{d-1}}^2, \quad \text{where } f^{(q)} := K_d^{(q)} f.$$

Suppose we can compute the Gegenbauer polynomial $P_d^\ell(t)$ at every point $t \in [-1, 1]$ in constant time. Then Algorithm 1 queries the function $f$ at $s = O\left(\beta_{q,d} \log \beta_{q,d} + \frac{\beta_{q,d}}{\varepsilon}\right)$ points on the sphere $\mathbb{S}^{d-1}$ and runs in $O(s^2 d + s^\omega)$ time. The algorithm evaluates $y(\sigma)$ in $O(d\, s)$ time for any $\sigma \in \mathbb{S}^{d-1}$.

We provide the proof of this theorem in Appendix D.
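Algorithm 1 is short enough to state directly in code. The following is a sketch of our reading of it in NumPy (the names `sphere_area`, `alpha_dim`, `kernel`, and `fit_expansion` are our own, not from the paper; `gegenbauer` is the helper from the sketch in Section 2.1), using the pseudoinverse for line 6:

```python
import numpy as np
from math import comb
from scipy.special import gamma

def sphere_area(d):
    """Surface area |S^{d-1}| = 2 pi^{d/2} / Gamma(d/2)."""
    return 2.0 * np.pi ** (d / 2.0) / gamma(d / 2.0)

def alpha_dim(ell, d):
    """alpha_{ell,d} = dim H_ell(S^{d-1}), per Definition 1."""
    if ell == 0:
        return 1
    if ell == 1:
        return d
    return comb(d + ell - 1, ell) - comb(d + ell - 3, ell - 2)

def kernel(q, d, G, scale):
    """Entries sum_{ell<=q} alpha_{ell,d} * P_d^ell(G) / scale, applied elementwise."""
    return sum(alpha_dim(l, d) * gegenbauer(l, d, G) for l in range(q + 1)) / scale

def fit_expansion(f, q, d, eps=0.5, c=4.0, rng=None):
    """Algorithm 1: recover the degree-q spherical harmonic expansion of f.
    f must map an (n, d) array of points on S^{d-1} to their n function values."""
    if rng is None:
        rng = np.random.default_rng(0)
    beta = sum(alpha_dim(l, d) for l in range(q + 1))
    s = int(c * (beta * np.log(beta + 1) + beta / eps))            # line 2
    W = rng.standard_normal((s, d))
    W /= np.linalg.norm(W, axis=1, keepdims=True)                  # line 3: uniform points
    K = kernel(q, d, np.clip(W @ W.T, -1, 1), s * sphere_area(d))  # line 4
    fbar = f(W) / np.sqrt(s)                                       # line 5
    z = np.linalg.pinv(K) @ fbar                                   # line 6: z = K^+ fbar
    def y(S):                                                      # line 7: the interpolant
        Ks = kernel(q, d, np.clip(S @ W.T, -1, 1), np.sqrt(s) * sphere_area(d))
        return Ks @ z
    return y
```

The $O(s^2 d + s^\omega)$ runtime of Theorem 5 shows up here as the `W @ W.T` Gram product and the pseudoinverse solve; the constant $c = 4$ is an arbitrary illustrative choice, not a value prescribed by the paper.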
**Remark 1 (Noise Robustness).** In Theorem 5, our method's robustness is demonstrated under a noise model where the function $f$ is not strictly a low-degree spherical harmonic and may include high-degree components. In this scenario, the higher-degree components are treated as noise added to the input function. However, our algorithm is also robust against alternative noise models, in particular additive i.i.d. Gaussian noise that corrupts the function evaluations $\bar{f}$ in Algorithm 1. More precisely, suppose that we observe samples from the function $f^{(q)}$ contaminated by Gaussian noise, i.e., $f(w_j) = f^{(q)}(w_j) + n_j$ in Algorithm 1 for i.i.d. $n_1, n_2, \dots, n_s \sim N(0, 1)$. The expected squared norm of the perturbation in the output $y$ of our algorithm caused by this noise is

$$\mathbb{E}\left\|y - f^{(q)}\right\|_{\mathbb{S}^{d-1}}^2 = \frac{1}{s}\, \operatorname{trace}\left(K^{+} K K^{+}\right).$$

By Markov's inequality, with probability $0.99$, $\|y - f^{(q)}\|_{\mathbb{S}^{d-1}}^2 = O(1/s) \cdot \operatorname{trace}(K^{+} K K^{+})$. If we let $P$ be the operator defined in Theorem 4, then we can see that $K = P^* P$. Using the properties of the trace, one can see that $\operatorname{trace}(K^{+} K K^{+}) = 1/\lambda_1 + 1/\lambda_2 + \cdots$, where the $\lambda_i$'s are the singular values of the operator $P^* P$. By matrix Chernoff inequalities one can show that all singular values of the operator $P^* P$ closely approximate the singular values of the projection operator $K_d^{(q)}$ up to a constant factor. So we have $\operatorname{trace}(K^{+} K K^{+}) = O\left(\operatorname{rank}\left(K_d^{(q)}\right)\right) = O(\beta_{q,d})$. Thus, because $s \ge \Omega(\beta_{q,d}/\varepsilon)$, and using a union bound, with probability $0.98$: $\|y - f^{(q)}\|_{\mathbb{S}^{d-1}}^2 = O(\varepsilon)$.

## 4 Lower Bound on the Number of Required Observations

We conclude by showing that the dimensionality of the Hilbert space $\mathcal{H}^{(q)}(\mathbb{S}^{d-1})$ tightly characterizes the sample complexity of Problem 2. Thus, our Theorem 5 is optimal up to a logarithmic factor. Intuitively, there are $\beta_{q,d}$ degrees of freedom for specifying a spherical harmonic of degree at most $q$. Consequently, any deterministic algorithm attempting to reconstruct such polynomials would need at least $\beta_{q,d}$ samples. We aim to demonstrate that even a randomized algorithm, which succeeds with only a constant probability, must still gather $\beta_{q,d}$ samples. This complements our upper bound, which is established using a randomized algorithm.

The crucial fact we use for proving the lower bound is that all (non-zero) eigenvalues of the operator $K_d^{(q)}$ are equal to one. This fact follows from the addition theorem (Theorem 3): if $\left\{y_1^\ell, y_2^\ell, \dots, y_{\alpha_{\ell,d}}^\ell\right\}$ is an orthonormal basis of $\mathcal{H}_\ell(\mathbb{S}^{d-1})$, then for any function $f \in L^2(\mathbb{S}^{d-1})$,

$$\left[K_d^{(q)} f\right](\sigma) = \sum_{\ell=0}^{q} \alpha_{\ell,d} \cdot \mathbb{E}_{w \sim U(\mathbb{S}^{d-1})}\left[P_d^\ell(\langle \sigma, w \rangle)\, f(w)\right] = \sum_{\ell=0}^{q} \sum_{j=1}^{\alpha_{\ell,d}} \left\langle y_j^\ell, f \right\rangle_{\mathbb{S}^{d-1}}\, y_j^\ell(\sigma). \tag{10}$$

**Theorem 6 (Lower Bound).** Consider an error parameter $\varepsilon > 0$, and any (possibly randomized) algorithm that solves Problem 2 with probability greater than $1/10$ for any input function $f$ and makes at most $r$ (possibly adaptive) queries on any input. Then $r \ge \beta_{q,d}$.

We describe a distribution on input functions $f$ on which any deterministic algorithm that takes $r < \beta_{q,d}$ samples fails with probability $9/10$. The theorem then follows by Yao's principle.

**Hard Input Distribution.** For integer $\ell \le q$, consider an orthonormal basis of $\mathcal{H}_\ell(\mathbb{S}^{d-1})$ and denote it by $\left\{y_1^\ell, y_2^\ell, \dots, y_{\alpha_{\ell,d}}^\ell\right\}$. Let $Y_\ell : \mathbb{R}^{\alpha_{\ell,d}} \to \mathcal{H}_\ell(\mathbb{S}^{d-1})$ be the quasi-matrix with $y_j^\ell$ as its $j$-th column, i.e., $[Y_\ell u](\sigma) := \sum_{j=1}^{\alpha_{\ell,d}} u_j\, y_j^\ell(\sigma)$ for any $u \in \mathbb{R}^{\alpha_{\ell,d}}$ and $\sigma \in \mathbb{S}^{d-1}$. Let $v^{(0)} \in \mathbb{R}^{\alpha_{0,d}}, v^{(1)} \in \mathbb{R}^{\alpha_{1,d}}, \dots, v^{(q)} \in \mathbb{R}^{\alpha_{q,d}}$ be independent random vectors with i.i.d. Gaussian entries $v_j^{(\ell)} \sim N(0, 1)$. The random input is defined to be $f := \sum_{\ell=0}^{q} Y_\ell\, v^{(\ell)}$. In other words, $f = \sum_{\ell=0}^{q} Y_\ell\, v^{(\ell)}$ is a random linear combination of the eigenfunctions of $K_d^{(q)}$.

We prove that accurate reconstruction of $f$ drawn from the aforementioned distribution yields accurate reconstruction of the random vectors $v^{(0)}, v^{(1)}, \dots, v^{(q)}$. Since each $v^{(\ell)}$ is $\alpha_{\ell,d}$-dimensional, this reconstruction requires $\Omega\left(\sum_{\ell=0}^{q} \alpha_{\ell,d}\right) = \Omega(\beta_{q,d})$ samples, giving us a lower bound for accurate reconstruction of $f$.
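For general $d$ no explicit orthonormal basis is available (as discussed in Section 3), but for $d = 3$ the real spherical harmonics provide one, so the hard distribution can be sampled concretely. Below is a sketch under that restriction; it assumes SciPy's `sph_harm` convention (first argument $m$, then $\ell$, with `theta` the azimuthal and `phi` the polar angle), and `real_sph_harm`/`hard_instance` are hypothetical helper names of ours.

```python
import numpy as np
from scipy.special import sph_harm

def real_sph_harm(ell, m, theta, phi):
    """Real orthonormal spherical harmonics on S^2, built from the complex Y_ell^m."""
    if m > 0:
        return np.sqrt(2.0) * (-1) ** m * sph_harm(m, ell, theta, phi).real
    if m < 0:
        return np.sqrt(2.0) * (-1) ** m * sph_harm(-m, ell, theta, phi).imag
    return sph_harm(0, ell, theta, phi).real

def hard_instance(q, rng):
    """Draw f = sum_ell Y_ell v^(ell) with i.i.d. N(0,1) coefficients (d = 3 only)."""
    v = {(l, m): rng.standard_normal() for l in range(q + 1) for m in range(-l, l + 1)}
    def f(W):  # W: (n, 3) array of points on S^2
        theta = np.arctan2(W[:, 1], W[:, 0])            # azimuthal angle
        phi = np.arccos(np.clip(W[:, 2], -1.0, 1.0))    # polar angle
        return sum(c * real_sph_harm(l, m, theta, phi) for (l, m), c in v.items())
    return f, v
```

Feeding such an $f$ to a candidate algorithm and checking whether all $\beta_{q,3}$ coefficients in $v$ are recovered is exactly the test that the argument below formalizes.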
First we show that finding an $\widehat{f}^{(q)}$ satisfying the condition of Problem 2 is at least as hard as accurately finding all vectors $v^{(0)}, v^{(1)}, \dots, v^{(q)}$. The following lemma is proved in Appendix E.

**Lemma 6.** If a deterministic algorithm solves Problem 2 with probability at least $1/10$ over our random input distribution $f = \sum_{\ell=0}^{q} Y_\ell\, v^{(\ell)}$, then with probability at least $1/10$, the output of the algorithm $\widehat{f}^{(q)}$ satisfies $Y_\ell^* \widehat{f}^{(q)} = v^{(\ell)}$ for all integers $\ell \le q$.

Finally, we complete the proof of Theorem 6 by arguing that if $\widehat{f}^{(q)}$ is formed using fewer than $\beta_{q,d}$ queries to $f$, then $\sum_{\ell=0}^{q} \|Y_\ell^* \widehat{f}^{(q)} - v^{(\ell)}\|_2^2 > 0$ with good probability; thus the bound of Lemma 6 cannot hold and $\widehat{f}^{(q)}$ cannot be a solution to Problem 2.

Assume for the sake of contradiction that there is a deterministic algorithm which solves Problem 2 with probability $1/10$ over the random input $f = \sum_{\ell=0}^{q} Y_\ell\, v^{(\ell)}$ and makes $r = \beta_{q,d} - 1$ queries on any input (we can modify an algorithm that makes fewer queries to make exactly $\beta_{q,d} - 1$ queries). For every $\sigma \in \mathbb{S}^{d-1}$ and integer $\ell \le q$ define $u_\sigma^\ell := \left[y_1^\ell(\sigma), y_2^\ell(\sigma), \dots, y_{\alpha_{\ell,d}}^\ell(\sigma)\right]$. Also define $u_\sigma := \left[u_\sigma^0, u_\sigma^1, \dots, u_\sigma^q\right] \in \mathbb{R}^{\beta_{q,d}}$ and $v \in \mathbb{R}^{\beta_{q,d}}$ as $v := \left[v^{(0)}, v^{(1)}, \dots, v^{(q)}\right]$. Additionally, define the quasi-matrix $Y := [Y_0, \dots, Y_q]$. Using the above notation and the definition of the hard input instance $f$, each query to $f$ is in fact a query to the random vector $v$ in the form $f(\sigma) = \langle u_\sigma, v \rangle$.

Now consider a deterministic function $Q$ that is given an input $V \in \mathbb{R}^{i \times \beta_{q,d}}$ (for any positive integer $i$) and outputs $Q(V) \in \mathbb{R}^{\beta_{q,d} \times \beta_{q,d}}$ such that $Q(V)$ has orthonormal rows, with the first $i$ rows spanning the $i$ rows of $V$. If $\sigma_1, \sigma_2, \dots, \sigma_r \in \mathbb{S}^{d-1}$ denote the points where our algorithm queries the input $f$, for any integer $i \in [r]$, let $Q_i$ be an orthonormal matrix whose first $i$ rows span the first $i$ queries of the algorithm, i.e., $Q_i := Q\left([u_{\sigma_1}, u_{\sigma_2}, \dots, u_{\sigma_i}]\right)$. Since the algorithm is deterministic, $Q_i$ is a deterministic function of the input $v$. The following claim is proved in [AKM+19]:

**Claim 3 (Claim 23 of [AKM+19]).** Conditioned on the queries $f(\sigma_1), f(\sigma_2), \dots, f(\sigma_r)$ for $r < \beta_{q,d}$, the variable $[Q_r v]_{\beta_{q,d}}$ is distributed as $N(0, 1)$.

Now using Claim 3 we can write

$$\Pr\left[\sum_{\ell=0}^{q} \left\|v^{(\ell)} - Y_\ell^* \widehat{f}^{(q)}\right\|_2^2 = 0\right] = \Pr\left[Q_r v = Q_r Y^* \widehat{f}^{(q)}\right] \le \Pr_v\left[[Q_r v]_{\beta_{q,d}} = \left[Q_r Y^* \widehat{f}^{(q)}\right]_{\beta_{q,d}}\right] = \mathbb{E}\left[\Pr\left[[Q_r v]_{\beta_{q,d}} = \left[Q_r Y^* \widehat{f}^{(q)}\right]_{\beta_{q,d}} \,\middle|\, f(\sigma_1), \dots, f(\sigma_r)\right]\right],$$

where the expectation in the last line is taken over the randomness of $f(\sigma_1), \dots, f(\sigma_r)$. Conditioned on $f(\sigma_1), \dots, f(\sigma_r)$, the quantity $\left[Q_r Y^* \widehat{f}^{(q)}\right]_{\beta_{q,d}}$ is fixed, because the algorithm determines $\widehat{f}^{(q)}$ given knowledge of the queries $f(\sigma_1), \dots, f(\sigma_r)$. Furthermore, by Claim 3, $[Q_r v]_{\beta_{q,d}}$ is a random variable distributed as $N(0, 1)$ conditioned on $f(\sigma_1), \dots, f(\sigma_r)$. This implies that

$$\Pr\left[[Q_r v]_{\beta_{q,d}} = \left[Q_r Y^* \widehat{f}^{(q)}\right]_{\beta_{q,d}} \,\middle|\, f(\sigma_1), \dots, f(\sigma_r)\right] = 0.$$

Thus, $\Pr\left[\sum_{\ell=0}^{q} \|v^{(\ell)} - Y_\ell^* \widehat{f}^{(q)}\|_2^2 = 0\right] = \mathbb{E}_{f(\sigma_1), \dots, f(\sigma_r)}[0] = 0$. However, we have assumed that this algorithm solves Problem 2 with probability at least $1/10$, and hence, by Lemma 6, $\Pr\left[\sum_{\ell=0}^{q} \|v^{(\ell)} - Y_\ell^* \widehat{f}^{(q)}\|_2^2 = 0\right] \ge 1/10$. This is a contradiction, yielding Theorem 6.

## 5 Numerical Evaluation

**Noise-free Setting.** For a fixed $q$, we generate a random function $f(\sigma) = \sum_{\ell=0}^{q} c_\ell\, P_d^\ell(\langle \sigma, v \rangle)$, where $v \sim U(\mathbb{S}^{d-1})$ and the $c_\ell$'s are i.i.d. samples from $N(0, 1)$. Then, $f$ is recovered by running Algorithm 1 with $s$ random evaluations of $f$ on $\mathbb{S}^{d-1}$. Note that $\|K_d^{(q)} f - f\|_{\mathbb{S}^{d-1}} = 0$ since $f \in \mathcal{H}^{(q)}(\mathbb{S}^{d-1})$; thus, as shown in Theorem 4, Algorithm 1 can recover $f$ exactly using $s = O(\beta_{q,d} \log \beta_{q,d})$ evaluations, where $\beta_{q,d}$ is the dimension of the Hilbert space $\mathcal{H}^{(q)}(\mathbb{S}^{d-1})$.

*Figure 1: (Left) Empirical success probabilities of Algorithm 1, varying the number of samples $s$ and the degree $q$ of the spherical harmonic expansion. (Right) The dimension $\beta_{q,d}$ of the Hilbert space $\mathcal{H}^{(q)}(\mathbb{S}^{d-1})$ as a function of $q$, for (a) $d = 3$ and (b) $d = 4$.*
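A single noise-free trial of this experiment can be scripted directly on top of the earlier sketches (hypothetical driver code; `fit_expansion` and `gegenbauer` are the helpers introduced above, and the test-set size is an arbitrary choice of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
d, q = 3, 5
v = rng.standard_normal(d); v /= np.linalg.norm(v)   # random axis v ~ U(S^{d-1})
coef = rng.standard_normal(q + 1)                    # c_ell ~ N(0, 1)

def f(W):  # f(sigma) = sum_ell c_ell * P_d^ell(<sigma, v>)
    return sum(c * gegenbauer(l, d, np.clip(W @ v, -1, 1)) for l, c in enumerate(coef))

y = fit_expansion(f, q, d, eps=0.5, rng=rng)
T = rng.standard_normal((500, d)); T /= np.linalg.norm(T, axis=1, keepdims=True)
print(np.max(np.abs(y(T) - f(T))))  # counted a success if below 1e-12
```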
We predict $f$'s value on a random test set on $\mathbb{S}^{d-1}$ and consider the algorithm to have failed if the test error is greater than $10^{-12}$. We count the number of failures among 100 independent random trials with different choices of $d \in \{3, 4\}$, $q \in \{5, \dots, 22\}$, and $s \in \{40, \dots, 2400\}$. The empirical success probabilities for $d = 3$ and $4$ are reported in Fig. 1(a) and Fig. 1(b), respectively. Fig. 1 illustrates that the success probability of our algorithm sharply transitions to 1 as soon as the number of samples approaches $s \approx \beta_{q,d}$, for a wide range of $q$ and both $d = 3, 4$. These experimental results complement our Theorem 4 along with the lower bound analysis in Section 4 and empirically verify the performance of our algorithm.

**Noisy Setting.** We repeated our experiments in the presence of additive noise which is a linear combination of random spherical harmonics of degrees $q+1$ to $2q$. More precisely, we let the noise be $n(\sigma) = \sum_{\ell=q+1}^{2q} c_\ell\, P_d^\ell(\langle v, \sigma \rangle) \in \mathcal{H}^{(2q)}(\mathbb{S}^{d-1}) \setminus \mathcal{H}^{(q)}(\mathbb{S}^{d-1})$ for $c_\ell$'s that are i.i.d. samples from $N(0, 1)$. We then re-scale the noise to have norm $\|n\|_{\mathbb{S}^{d-1}} = 10^{-6}$. The function $f$ is defined as before, and $f$ is recovered by Algorithm 1 with $s$ random evaluations of $f + n$ on $\mathbb{S}^{d-1}$. The heat maps in Fig. 2 are generated by counting an instance of our algorithm as a success if the error's energy is below the noise level, $\|\widehat{f}^{(q)} - f\|_{\mathbb{S}^{d-1}} \le \|n\|_{\mathbb{S}^{d-1}} = 10^{-6}$. The success probability transitions less sharply than in the noiseless setting, but the shift in probabilities starts at $\beta_{q,d}$ samples.

*Figure 2: (Left) Empirical success probabilities of Algorithm 1 in the presence of additive noise, where success means the error's energy is below the noise level, $\|\widehat{f}^{(q)} - f\|_{\mathbb{S}^{d-1}} \le \|n\|_{\mathbb{S}^{d-1}}$. (Right) The error norm $\|\widehat{f}^{(q)} - f\|_{\mathbb{S}^{d-1}}$ as a function of $q$, for (a) $d = 3$ and (b) $d = 4$.*

## 6 Conclusion

We studied the problem of robustly recovering the spherical harmonic expansion of a function defined on the sphere. The number of function evaluations needed to recover its degree-$q$ expansion is the dimension of the space of spherical harmonics of degree at most $q$, up to a logarithmic factor. We developed a simple yet efficient kernel regression-based algorithm that recovers the degree-$q$ expansion of the function by only evaluating the function at uniformly sampled points on the sphere. Unlike prior results on the fast spherical harmonic transform, our algorithm works efficiently using a nearly optimal number of samples in any dimension. We believe our findings will appeal to the readership of the community.

## References

[AH12] Kendall Atkinson and Weimin Han. Spherical harmonics and approximations on the unit sphere: an introduction. Springer Science & Business Media, 2012.

[AKK+20] Thomas D. Ahle, Michael Kapralov, Jakob B. T. Knudsen, Rasmus Pagh, Ameya Velingker, David P. Woodruff, and Amir Zandieh. Oblivious sketching of high-degree polynomial kernels. In Symposium on Discrete Algorithms (SODA), 2020.

[AKM+17] Haim Avron, Michael Kapralov, Cameron Musco, Christopher Musco, Ameya Velingker, and Amir Zandieh. Random Fourier features for kernel ridge regression: Approximation bounds and statistical guarantees. In International Conference on Machine Learning (ICML), 2017.

[AKM+19] Haim Avron, Michael Kapralov, Cameron Musco, Christopher Musco, Ameya Velingker, and Amir Zandieh. A universal sampling method for reconstructing signals with simple Fourier transforms. In Symposium on the Theory of Computing (STOC), 2019.
[BKM+23] Karl Bringmann, Michael Kapralov, Mikhail Makarov, Vasileios Nakos, Amir Yagudin, and Amir Zandieh. Traversing the FFT computation tree for dimension-independent sparse Fourier transforms. In Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 4768–4845. SIAM, 2023.

[CKPS16] Xue Chen, Daniel M. Kane, Eric Price, and Zhao Song. Fourier-sparse interpolation without a frequency gap. In Foundations of Computer Science (FOCS), 2016.

[CLL15] Pui Tung Choi, Ka Chun Lam, and Lok Ming Lui. FLASH: Fast landmark aligned spherical harmonic parameterization for genus-0 closed brain surfaces. SIAM Journal on Imaging Sciences, 2015.

[CP19] Xue Chen and Eric Price. Active regression via linear-sample sparsification. In Conference on Learning Theory (COLT), 2019.

[EEHM17] Michael Eickenberg, Georgios Exarchakis, Matthew Hirn, and Stéphane Mallat. Solid harmonic wavelet scattering: Predicting quantum molecular energy from invariant descriptors of 3D electronic densities. Advances in Neural Information Processing Systems, 30, 2017.

[EMM20] Tamás Erdélyi, Cameron Musco, and Christopher Musco. Fourier sparse leverage scores and approximate kernel learning. In Neural Information Processing Systems (NeurIPS), 2020.

[GCL+22] Jan Gerken, Oscar Carlsson, Hampus Linander, Fredrik Ohlsson, Christoffer Petersson, and Daniel Persson. Equivariance versus augmentation for spherical images. In International Conference on Machine Learning, pages 7404–7421. PMLR, 2022.

[GMMM21] B. Ghorbani, S. Mei, T. Misiakiewicz, and A. Montanari. Linearized two-layers neural networks in high dimension. The Annals of Statistics, 2021.

[HZA22] Insu Han, Amir Zandieh, and Haim Avron. Random Gegenbauer features for scalable kernel methods. In International Conference on Machine Learning, pages 8330–8358. PMLR, 2022.

[HZL+22] Insu Han, Amir Zandieh, Jaehoon Lee, Roman Novak, Lechao Xiao, and Amin Karbasi. Fast neural kernel embeddings for general activations. Advances in Neural Information Processing Systems, 35:35657–35671, 2022.

[KFR03] Michael Kazhdan, Thomas Funkhouser, and Szymon Rusinkiewicz. Rotation invariant spherical harmonic representation of 3D shape descriptors. In Symposium on Geometry Processing, 2003.

[KKS97] Marc Kamionkowski, Arthur Kosowsky, and Albert Stebbins. Statistics of cosmic microwave background polarization. Physical Review D, 1997.

[KVZ19] Michael Kapralov, Ameya Velingker, and Amir Zandieh. Dimension-independent sparse Fourier transform. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 2709–2728. SIAM, 2019.

[Lan12] Serge Lang. Real and functional analysis. Springer Science & Business Media, 2012.

[LMP13] Mu Li, Gary L. Miller, and Richard Peng. Iterative row sampling. In Foundations of Computer Science (FOCS), 2013.

[LP61] Henry J. Landau and Henry O. Pollak. Prolate spheroidal wave functions, Fourier analysis and uncertainty II. Bell System Technical Journal, 1961.

[LP62] Henry J. Landau and Henry O. Pollak. Prolate spheroidal wave functions, Fourier analysis and uncertainty III: The dimension of the space of essentially time- and band-limited signals. Bell System Technical Journal, 1962.

[LWL+21] Yi Liu, Limei Wang, Meng Liu, Yuchao Lin, Xuan Zhang, Bora Oztekin, and Shuiwang Ji. Spherical message passing for 3D molecular graphs. In International Conference on Learning Representations, 2021.

[MNY06] Ha Quang Minh, Partha Niyogi, and Yuan Yao. Mercer's theorem, feature maps, and smoothing. In Conference on Learning Theory (COLT), 2006.
[Mor98] Mitsuo Morimoto. Analytic functionals on the sphere. American Mathematical Society, 1998.

[Pen30] W. O. Pennell. A generalized Fourier series representation of a function. The American Mathematical Monthly, 1930.

[RT06] Vladimir Rokhlin and Mark Tygert. Fast algorithms for spherical harmonic expansions. SIAM Journal on Scientific Computing (SISC), 2006.

[SA22] Paz Fink Shustin and Haim Avron. Semi-infinite linear regression and its applications. SIAM Journal on Matrix Analysis and Applications (SIMAX), 2022.

[SP61] David Slepian and Henry O. Pollak. Prolate spheroidal wave functions, Fourier analysis and uncertainty I. Bell System Technical Journal, 1961.

[SS00] Paul N. Swarztrauber and William F. Spotz. Generalized discrete spherical harmonic transforms. Journal of Computational Physics, 2000.

[ST02] Reiji Suda and Masayasu Takami. A fast spherical harmonics transform algorithm. Mathematics of Computation, 2002.

[Wei95] D. R. Weimer. Models of high-latitude electric potentials derived with a least error fit of spherical harmonic coefficients. Journal of Geophysical Research: Space Physics, 1995.

[Wer97] Robert A. Werner. Spherical harmonic coefficients for the potential of a constant-density polyhedron. Computers & Geosciences, 1997.

[Wil12] Virginia Vassilevska Williams. Multiplying matrices faster than Coppersmith-Winograd. In Symposium on the Theory of Computing (STOC), 2012.

[WZ22] David Woodruff and Amir Zandieh. Leverage score sampling for tensor product matrices in input sparsity time. In International Conference on Machine Learning, pages 23933–23964. PMLR, 2022.

[XRY01] Hong Xiao, Vladimir Rokhlin, and Norman Yarvin. Prolate spheroidal wavefunctions, quadrature and interpolation. Inverse Problems, 2001.

[ZDK+22] Larry Zitnick, Abhishek Das, Adeesh Kolluru, Janice Lan, Muhammed Shuaibi, Anuroop Sriram, Zachary Ulissi, and Brandon Wood. Spherical channels for modeling atomic interactions. Advances in Neural Information Processing Systems, 35:8054–8067, 2022.

[ZHA+21] Amir Zandieh, Insu Han, Haim Avron, Neta Shoham, Chaewon Kim, and Jinwoo Shin. Scaling neural tangent kernels via sketching and random features. In Neural Information Processing Systems (NeurIPS), 2021.

## A Properties of Gegenbauer Polynomials and Spherical Harmonics

In this section we prove the basic properties of the Gegenbauer polynomials as well as the spherical harmonics, and establish the connection between the two. We start with the direct sum decomposition of the Hilbert space $L^2(\mathbb{S}^{d-1})$ in terms of the spherical harmonics.

**Lemma 2 (Direct Sum Decomposition of $L^2(\mathbb{S}^{d-1})$).** The family of spaces $\mathcal{H}_\ell(\mathbb{S}^{d-1})$ yields a Hilbert space direct sum decomposition $L^2(\mathbb{S}^{d-1}) = \bigoplus_{\ell=0}^{\infty} \mathcal{H}_\ell(\mathbb{S}^{d-1})$: the summands are closed and pairwise orthogonal, and every $f \in L^2(\mathbb{S}^{d-1})$ is the sum of a converging series (in the sense of mean-square convergence with the $L^2$-norm defined in Eq. (2)) $f = \sum_{\ell=0}^{\infty} f_\ell$, where the $f_\ell \in \mathcal{H}_\ell(\mathbb{S}^{d-1})$ are uniquely determined functions. Furthermore, given any orthonormal basis $\left\{y_1^\ell, y_2^\ell, \dots, y_{\alpha_{\ell,d}}^\ell\right\}$ of $\mathcal{H}_\ell(\mathbb{S}^{d-1})$, we have $f_\ell = \sum_{j=1}^{\alpha_{\ell,d}} \langle f, y_j^\ell \rangle_{\mathbb{S}^{d-1}}\, y_j^\ell$.

*Proof.* This is in fact a standard result; see, for example, [Lan12] for a proof. ∎

Now we show that the Gegenbauer polynomials and spherical harmonics are related through the so-called addition theorem.

**Theorem 3 (Addition Theorem).** For every integer $\ell \ge 0$, if $\left\{y_1^\ell, y_2^\ell, \dots, y_{\alpha_{\ell,d}}^\ell\right\}$ is an orthonormal basis for $\mathcal{H}_\ell(\mathbb{S}^{d-1})$, then for any $\sigma, w \in \mathbb{S}^{d-1}$ we have

$$\frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}\, P_d^\ell(\langle \sigma, w \rangle) = \sum_{j=1}^{\alpha_{\ell,d}} y_j^\ell(\sigma)\, y_j^\ell(w).$$
*Proof.* The result can be proven analytically using the properties of the Poisson kernel in the unit ball. This is classic and the proof can be found in [AH12, Theorem 2.9]. ∎

Next we show that the Gegenbauer kernels project any function onto the space of their corresponding spherical harmonics.

**Lemma 3 (A Reproducing Kernel for $\mathcal{H}_\ell(\mathbb{S}^{d-1})$).** For every $f \in L^2(\mathbb{S}^{d-1})$, if $f = \sum_{\ell=0}^{\infty} f_\ell$ is the unique decomposition of $f$ over $\bigoplus_{\ell=0}^{\infty} \mathcal{H}_\ell(\mathbb{S}^{d-1})$ as per Lemma 2, then $f_\ell$ is given by $f_\ell(\sigma) = \alpha_{\ell,d}\, \mathbb{E}_{w \sim U(\mathbb{S}^{d-1})}\left[f(w)\, P_d^\ell(\langle \sigma, w \rangle)\right]$ for $\sigma \in \mathbb{S}^{d-1}$.

*Proof.* This is a classic textbook result; see [Mor98]. ∎

Now we prove that the Gegenbauer kernels satisfy the reproducing property for the Hilbert space $\mathcal{H}_\ell(\mathbb{S}^{d-1})$.

**Lemma 1 (Reproducing Property of Zonal Harmonics).** Let $P_d^\ell(\cdot)$ be the Gegenbauer polynomial of degree $\ell$ in dimension $d$. For any $x, y \in \mathbb{S}^{d-1}$:

$$P_d^\ell(\langle x, y \rangle) = \alpha_{\ell,d} \cdot \mathbb{E}_{w \sim U(\mathbb{S}^{d-1})}\left[P_d^\ell(\langle x, w \rangle)\, P_d^\ell(\langle y, w \rangle)\right].$$

Furthermore, for any $\ell \ne \ell'$: $\mathbb{E}_{w \sim U(\mathbb{S}^{d-1})}\left[P_d^\ell(\langle x, w \rangle)\, P_d^{\ell'}(\langle y, w \rangle)\right] = 0$.

*Proof.* This result follows directly from the Funk–Hecke formula (see [AH12]). However, we provide another proof here. First note that for every $x \in \mathbb{S}^{d-1}$ the function $P_d^\ell(\langle x, \cdot \rangle) \in \mathcal{H}_\ell(\mathbb{S}^{d-1})$. Therefore the first claim follows by applying Lemma 3 to the function $f(\sigma) = P_d^\ell(\langle x, \sigma \rangle)$, which satisfies $f_\ell = f$. On the other hand, $P_d^{\ell'}(\langle y, \cdot \rangle) \in \mathcal{H}_{\ell'}(\mathbb{S}^{d-1})$ for every $y \in \mathbb{S}^{d-1}$. Thus, for $\ell \ne \ell'$, using the fact that distinct spherical harmonic spaces are orthogonal, $P_d^{\ell'}(\langle y, \cdot \rangle) \perp \mathcal{H}_\ell(\mathbb{S}^{d-1})$, which gives the second claim. ∎

Next we prove that the kernel operator defined in Definition 2 is in fact a projection operator.

**Claim 1.** The operator $K_d^{(q)}$ defined in Definition 2 satisfies $\left(K_d^{(q)}\right)^2 = K_d^{(q)}$.

*Proof.* For every $f \in L^2(\mathbb{S}^{d-1})$ and every $\sigma \in \mathbb{S}^{d-1}$, using Definition 2 we have

$$\left[\left(K_d^{(q)}\right)^2 f\right](\sigma) = \sum_{\ell'=0}^{q} \frac{\alpha_{\ell',d}}{|\mathbb{S}^{d-1}|}\left\langle K_d^{(q)} f,\, P_d^{\ell'}(\langle \sigma, \cdot \rangle)\right\rangle_{\mathbb{S}^{d-1}} = \sum_{\ell,\ell'=0}^{q} \alpha_{\ell,d}\, \alpha_{\ell',d}\, \mathbb{E}_{w \sim U(\mathbb{S}^{d-1})}\left[P_d^{\ell'}(\langle \sigma, w \rangle)\, \mathbb{E}_{\tau \sim U(\mathbb{S}^{d-1})}\left[P_d^{\ell}(\langle \tau, w \rangle)\, f(\tau)\right]\right]$$
$$= \sum_{\ell,\ell'=0}^{q} \alpha_{\ell,d}\, \alpha_{\ell',d}\, \mathbb{E}_{\tau \sim U(\mathbb{S}^{d-1})}\left[f(\tau)\, \mathbb{E}_{w \sim U(\mathbb{S}^{d-1})}\left[P_d^{\ell'}(\langle \sigma, w \rangle)\, P_d^{\ell}(\langle \tau, w \rangle)\right]\right] = \sum_{\ell=0}^{q} \alpha_{\ell,d}\, \mathbb{E}_{\tau \sim U(\mathbb{S}^{d-1})}\left[f(\tau)\, P_d^\ell(\langle \sigma, \tau \rangle)\right] = \left[K_d^{(q)} f\right](\sigma),$$

where the penultimate equality follows from Lemma 1. This proves the claim. ∎

## B Reducing the Interpolation Problem to a Least-Squares Regression

In this section we show that our spherical harmonic interpolation problem, Problem 2, can be solved by approximately solving a least-squares problem, as stated in Claim 2. We start by showing that for any function $f \in L^2(\mathbb{S}^{d-1})$, $K_d^{(q)} f$ gives its low-degree component. More precisely, let $f = \sum_{\ell=0}^{\infty} f_\ell$ be the decomposition of $f$ over the Hilbert sum $\bigoplus_{\ell=0}^{\infty} \mathcal{H}_\ell(\mathbb{S}^{d-1})$ as per Lemma 2. Now, if we let $K_d^{(q)}$ be the kernel operator from Definition 2 and $\left\{y_1^\ell, \dots, y_{\alpha_{\ell,d}}^\ell\right\}$ an orthonormal basis for $\mathcal{H}_\ell(\mathbb{S}^{d-1})$, then by Theorem 3 we have

$$\left[K_d^{(q)} f\right](\sigma) = \sum_{\ell=0}^{q} \alpha_{\ell,d}\, \mathbb{E}_{w \sim U(\mathbb{S}^{d-1})}\left[f(w)\, P_d^\ell(\langle \sigma, w \rangle)\right] = \sum_{\ell=0}^{q} |\mathbb{S}^{d-1}|\, \mathbb{E}_{w}\left[f(w) \sum_{j=1}^{\alpha_{\ell,d}} y_j^\ell(\sigma)\, y_j^\ell(w)\right] = \sum_{\ell=0}^{q} \sum_{j=1}^{\alpha_{\ell,d}} y_j^\ell(\sigma)\, |\mathbb{S}^{d-1}|\, \mathbb{E}_{w}\left[f(w)\, y_j^\ell(w)\right] = \sum_{\ell=0}^{q} \sum_{j=1}^{\alpha_{\ell,d}} \left\langle f, y_j^\ell \right\rangle_{\mathbb{S}^{d-1}} y_j^\ell(\sigma) = \sum_{\ell=0}^{q} f_\ell(\sigma) = f^{(q)}(\sigma),$$

where the second equality follows from Theorem 3, the fourth from Eq. (2), and the last from Lemma 2. This proves that the low-degree component satisfies $f^{(q)} = K_d^{(q)} f$.

**Claim 2.** For any $f \in L^2(\mathbb{S}^{d-1})$, any integer $q \ge 0$, and any $C \ge 1$, if $\widetilde{g} \in L^2(\mathbb{S}^{d-1})$ satisfies $\left\|K_d^{(q)} \widetilde{g} - f\right\|_{\mathbb{S}^{d-1}}^2 \le C \cdot \min_{g \in L^2(\mathbb{S}^{d-1})} \left\|K_d^{(q)} g - f\right\|_{\mathbb{S}^{d-1}}^2$, and if we let $f^{(q)} := K_d^{(q)} f$, where $K_d^{(q)}$ is defined as per Definition 2, then $\left\|K_d^{(q)} \widetilde{g} - f^{(q)}\right\|_{\mathbb{S}^{d-1}}^2 \le (C-1) \left\|f^{(q)} - f\right\|_{\mathbb{S}^{d-1}}^2$.

*Proof.* First, note that $g = f$ is an optimal solution to the least-squares problem in Eq. (7).
Thus we have

$$\min_{g \in L^2(\mathbb{S}^{d-1})} \left\|K_d^{(q)} g - f\right\|_{\mathbb{S}^{d-1}}^2 = \left\|K_d^{(q)} f - f\right\|_{\mathbb{S}^{d-1}}^2 = \left\|f^{(q)} - f\right\|_{\mathbb{S}^{d-1}}^2.$$

Next, we can write

$$\left\|K_d^{(q)} \widetilde{g} - f\right\|_{\mathbb{S}^{d-1}}^2 = \left\|K_d^{(q)} \widetilde{g} - K_d^{(q)} f + K_d^{(q)} f - f\right\|_{\mathbb{S}^{d-1}}^2 = \left\|K_d^{(q)}(\widetilde{g} - f) + K_d^{(q)} f - f\right\|_{\mathbb{S}^{d-1}}^2 = \left\|K_d^{(q)}(\widetilde{g} - f)\right\|_{\mathbb{S}^{d-1}}^2 + \left\|K_d^{(q)} f - f\right\|_{\mathbb{S}^{d-1}}^2 = \left\|K_d^{(q)} \widetilde{g} - f^{(q)}\right\|_{\mathbb{S}^{d-1}}^2 + \left\|f^{(q)} - f\right\|_{\mathbb{S}^{d-1}}^2,$$

where the third equality follows from the Pythagorean theorem, because $K_d^{(q)}(\widetilde{g} - f) \in \mathcal{H}^{(q)}(\mathbb{S}^{d-1})$ while $K_d^{(q)} f - f = -\sum_{\ell > q} f_\ell$, so $K_d^{(q)} f - f \perp \mathcal{H}^{(q)}(\mathbb{S}^{d-1})$. Combining the two equalities above with the inequality $\|K_d^{(q)} \widetilde{g} - f\|_{\mathbb{S}^{d-1}}^2 \le C \min_{g \in L^2(\mathbb{S}^{d-1})} \|K_d^{(q)} g - f\|_{\mathbb{S}^{d-1}}^2$ given in the statement of the claim proves Claim 2. ∎

## C Approximate Regression via Leverage Score Sampling

In this section we ultimately prove our main result, Theorem 4. We start by proving useful properties of the leverage function given in Definition 3. First, we show that the leverage function can be characterized in terms of a least-squares minimization problem, which is crucial for computing the leverage score distribution. This fact was previously exploited in [AKM+17] and [AKM+19] in the context of Fourier operators.

**Lemma 4 (Min Characterization of the Leverage Function).** For any $w \in \mathbb{S}^{d-1}$, let $\tau_q(w)$ be the leverage function (Definition 3) and define $\phi_w \in L^2(\mathbb{S}^{d-1})$ by $\phi_w(\sigma) \equiv \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}\, P_d^\ell(\langle \sigma, w \rangle)$. We have the following minimization characterization of the leverage function:

$$\tau_q(w) = \min_{g \in L^2(\mathbb{S}^{d-1})} \|g\|_{\mathbb{S}^{d-1}}^2 \quad \text{s.t.} \quad K_d^{(q)} g = \phi_w.$$

We remark that this lemma is in fact an adaptation and generalization of Theorem 5 of [AKM+19]. We prove it here for the sake of completeness.

*Proof.* First we show that the right-hand side of Eq. (9) is never smaller than the leverage function in Definition 3. Let $g_w^* \in L^2(\mathbb{S}^{d-1})$ be the optimal solution of Eq. (9) for any $w \in \mathbb{S}^{d-1}$. Note that the optimal solution satisfies $K_d^{(q)} g_w^* = \phi_w$. Thus, for any function $f \in L^2(\mathbb{S}^{d-1})$, using Definition 2, we can write

$$\left|\left[K_d^{(q)} f\right](w)\right|^2 = \left|\sum_{\ell=0}^{q} \alpha_{\ell,d}\, \mathbb{E}_{\sigma \sim U(\mathbb{S}^{d-1})}\left[P_d^\ell(\langle \sigma, w \rangle)\, f(\sigma)\right]\right|^2 = \left|\langle \phi_w, f \rangle_{\mathbb{S}^{d-1}}\right|^2 = \left|\left\langle K_d^{(q)} g_w^*,\, f \right\rangle_{\mathbb{S}^{d-1}}\right|^2 = \left|\left\langle g_w^*,\, K_d^{(q)} f \right\rangle_{\mathbb{S}^{d-1}}\right|^2 \quad \text{(because $K_d^{(q)}$ is self-adjoint)}$$
$$\le \|g_w^*\|_{\mathbb{S}^{d-1}}^2 \left\|K_d^{(q)} f\right\|_{\mathbb{S}^{d-1}}^2 \quad \text{(by the Cauchy–Schwarz inequality)}.$$

Therefore, for any $f \in L^2(\mathbb{S}^{d-1})$ with $\|K_d^{(q)} f\|_{\mathbb{S}^{d-1}} > 0$, we have

$$\frac{\left|\left[K_d^{(q)} f\right](w)\right|^2}{\left\|K_d^{(q)} f\right\|_{\mathbb{S}^{d-1}}^2} \le \|g_w^*\|_{\mathbb{S}^{d-1}}^2. \tag{11}$$

We conclude the proof by showing that this maximum value is attained. First, we show that the optimal solution $g_w^*$ of Eq. (9) satisfies $K_d^{(q)} g_w^* = g_w^*$. Suppose for the sake of contradiction that $K_d^{(q)} g_w^* \ne g_w^*$. In this case, Claim 1 implies that

$$K_d^{(q)}\left(K_d^{(q)} g_w^* - g_w^*\right) = \left(K_d^{(q)}\right)^2 g_w^* - K_d^{(q)} g_w^* = K_d^{(q)} g_w^* - K_d^{(q)} g_w^* = 0.$$

Thus, the function $\widetilde{g} = K_d^{(q)} g_w^*$ satisfies the constraint of the minimization problem in Eq. (9). Now, using the above and the fact that $K_d^{(q)}$ is self-adjoint, we can write

$$\left\langle K_d^{(q)} g_w^*,\; K_d^{(q)} g_w^* - g_w^* \right\rangle_{\mathbb{S}^{d-1}} = \left\langle g_w^*,\; K_d^{(q)}\left(K_d^{(q)} g_w^* - g_w^*\right) \right\rangle_{\mathbb{S}^{d-1}} = 0.$$

This shows that $K_d^{(q)} g_w^* \perp \left(K_d^{(q)} g_w^* - g_w^*\right)$; hence, by the Pythagorean theorem, we have

$$\|g_w^*\|_{\mathbb{S}^{d-1}}^2 = \left\|K_d^{(q)} g_w^*\right\|_{\mathbb{S}^{d-1}}^2 + \left\|K_d^{(q)} g_w^* - g_w^*\right\|_{\mathbb{S}^{d-1}}^2 > \left\|K_d^{(q)} g_w^*\right\|_{\mathbb{S}^{d-1}}^2 = \|\widetilde{g}\|_{\mathbb{S}^{d-1}}^2,$$

which contradicts the assumption that $g_w^*$ is the optimal solution of Eq. (9). Therefore, our claim that $K_d^{(q)} g_w^* = g_w^*$ holds.

Now, we show that for $f = g_w^*$, the maximum value in inequality Eq. (11) is attained. For any $w \in \mathbb{S}^{d-1}$ we have

$$\left[K_d^{(q)} f\right](w) = \left\langle K_d^{(q)} g_w^*,\, f \right\rangle_{\mathbb{S}^{d-1}} = \langle g_w^*, g_w^* \rangle_{\mathbb{S}^{d-1}} = \|g_w^*\|_{\mathbb{S}^{d-1}}^2.$$

On the other hand, we have $\|K_d^{(q)} f\|_{\mathbb{S}^{d-1}}^2 = \|g_w^*\|_{\mathbb{S}^{d-1}}^2$. Thus,

$$\frac{\left|\left[K_d^{(q)} f\right](w)\right|^2}{\left\|K_d^{(q)} f\right\|_{\mathbb{S}^{d-1}}^2} = \|g_w^*\|_{\mathbb{S}^{d-1}}^2,$$

which implies that $\tau_q(w) = \|g_w^*\|_{\mathbb{S}^{d-1}}^2$ and thus proves the lemma. ∎

Next we prove that the leverage function is constant.

**Lemma 5 (Leverage Function is Constant).**
The leverage function given in Definition 3 is equal to $\tau_q(w) = \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}$ for every $w \in \mathbb{S}^{d-1}$.

*Proof.* First we prove that $\tau_q(w) \le \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}$ using the min characterization. If we let $\phi_w \in L^2(\mathbb{S}^{d-1})$ be defined as $\phi_w(\sigma) := \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}\, P_d^\ell(\langle \sigma, w \rangle)$, then by Definition 2, for every $\sigma \in \mathbb{S}^{d-1}$ we can write

$$\left[K_d^{(q)} \phi_w\right](\sigma) = \sum_{\ell=0}^{q} \alpha_{\ell,d}\, \mathbb{E}_{v \sim U(\mathbb{S}^{d-1})}\left[P_d^\ell(\langle \sigma, v \rangle)\, \phi_w(v)\right] = \sum_{\ell,\ell'=0}^{q} \frac{\alpha_{\ell,d}\, \alpha_{\ell',d}}{|\mathbb{S}^{d-1}|}\, \mathbb{E}_{v \sim U(\mathbb{S}^{d-1})}\left[P_d^\ell(\langle \sigma, v \rangle)\, P_d^{\ell'}(\langle v, w \rangle)\right] = \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}\, P_d^\ell(\langle \sigma, w \rangle) = \phi_w(\sigma), \tag{12}$$

where the third equality follows from Lemma 1. Therefore, the test function $g := \phi_w$ satisfies the constraint of the minimization in Eq. (9), i.e., $K_d^{(q)} g = \phi_w$. Thus, Lemma 4 implies that

$$\tau_q(w) \le \|g\|_{\mathbb{S}^{d-1}}^2 = \|\phi_w\|_{\mathbb{S}^{d-1}}^2 = \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|},$$

where the last equality follows from Lemma 1 along with Eq. (2). This establishes the upper bound on the leverage function that we sought to prove.

Now, using the maximization characterization of the leverage function in Definition 3, we prove that $\tau_q(w) \ge \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}$. Again, we consider the same test function $g = \phi_w$ and write

$$\frac{\left|\left[K_d^{(q)} \phi_w\right](w)\right|^2}{\left\|K_d^{(q)} \phi_w\right\|_{\mathbb{S}^{d-1}}^2} = \frac{|\phi_w(w)|^2}{\|\phi_w\|_{\mathbb{S}^{d-1}}^2} = \frac{\left(\sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}\, P_d^\ell(\langle w, w \rangle)\right)^2}{\sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}} = \frac{\left(\sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}\, P_d^\ell(1)\right)^2}{\sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}} = \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|},$$

where the first and second equalities follow from Eq. (12) and Lemma 1, respectively, and we used $P_d^\ell(1) = 1$. Therefore, the max characterization of the leverage function in Definition 3 implies that

$$\tau_q(w) \ge \frac{\left|\left[K_d^{(q)} \phi_w\right](w)\right|^2}{\left\|K_d^{(q)} \phi_w\right\|_{\mathbb{S}^{d-1}}^2} = \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}.$$

This completes the proof of Lemma 5 and establishes that $\tau_q(w)$ is uniformly equal to $\sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{|\mathbb{S}^{d-1}|}$. ∎

To prove Theorem 4, we use prior results on solving linear systems in the continuous setting via leverage score sampling. In particular, we use Theorem 6.3 from [CP19], restated below.

**Theorem 7 (Theorem 6.3 of [CP19]).** Consider any dimension-$n$ linear space $\mathcal{F}$ of functions from a domain $G$ to $\mathbb{R}$. Let $D$ be a distribution over $G$ and $f$ some function from $G$ to $\mathbb{R}$. Also, define the norm with respect to $D$ of any function $h : G \to \mathbb{R}$ as $\|h\|_D^2 := \mathbb{E}_{x \sim D}[|h(x)|^2]$, and let $y = \arg\min_{h \in \mathcal{F}} \|h - f\|_D^2$. Fix any distribution $D'$ over $G$ and let

$$K_{D'} := \sup_{x \in G} \sup_{h \in \mathcal{F}} \left\{\frac{D(x)}{D'(x)} \cdot \frac{|h(x)|^2}{\|h\|_D^2}\right\}.$$

For i.i.d. sample query points $x_1, x_2, \dots, x_s \sim D'$ and weights $w_i = \frac{D(x_i)}{s\, D'(x_i)}$ for $i \in [s]$, let the weighted ERM estimator $\widetilde{f}_s$ be defined as $\widetilde{f}_s := \arg\min_{h \in \mathcal{F}} \sum_{i=1}^{s} w_i\, |h(x_i) - f(x_i)|^2$. For any $\varepsilon > 0$ and a sufficiently large fixed constant $c$, if the number of queries satisfies $s \ge c\left(K_{D'} \log n + \frac{K_{D'}}{\varepsilon}\right)$, then, with probability at least $1 - 10^{-4}$, the weighted ERM estimator $\widetilde{f}_s$ satisfies

$$\left\|\widetilde{f}_s - y\right\|_D^2 \le \varepsilon\, \|f - y\|_D^2.$$

Now we are ready to prove Theorem 4. Our approach is to apply Theorem 7 to the space of degree-$q$ spherical harmonics $\mathcal{H}^{(q)}(\mathbb{S}^{d-1})$ and use the fact that the leverage score distribution of the operator $K_d^{(q)}$ is uniform on the unit sphere $\mathbb{S}^{d-1}$.

**Theorem 4 (Approximate Regression via Leverage Function Sampling).** For any $\varepsilon > 0$, let $s = c\left(\beta_{q,d} \log \beta_{q,d} + \frac{\beta_{q,d}}{\varepsilon}\right)$ for a sufficiently large fixed constant $c$, and let $x_1, x_2, \dots, x_s$ be i.i.d. uniform samples on $\mathbb{S}^{d-1}$. Define the quasi-matrix $P : \mathbb{R}^s \to L^2(\mathbb{S}^{d-1})$ as follows: for every $v \in \mathbb{R}^s$,

$$[P v](\sigma) := \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{\sqrt{s}\,|\mathbb{S}^{d-1}|} \sum_{j=1}^{s} v_j\, P_d^\ell(\langle x_j, \sigma \rangle) \quad \text{for } \sigma \in \mathbb{S}^{d-1}.$$

Also let $\bar{f} \in \mathbb{R}^s$ be a vector with $\bar{f}_j := \frac{1}{\sqrt{s}}\, f(x_j)$ for $j = 1, 2, \dots, s$, and let $P^*$ be the adjoint of $P$. If $\bar{g}$ is an optimal solution to the least-squares problem $\bar{g} \in \arg\min_{g \in L^2(\mathbb{S}^{d-1})} \|P^* g - \bar{f}\|_2^2$, then with probability at least $1 - 10^{-4}$ the following holds:

$$\left\|K_d^{(q)} \bar{g} - f\right\|_{\mathbb{S}^{d-1}}^2 \le (1+\varepsilon) \min_{g \in L^2(\mathbb{S}^{d-1})} \left\|K_d^{(q)} g - f\right\|_{\mathbb{S}^{d-1}}^2.$$

*Proof.* We prove this theorem by invoking Theorem 7. To do so, we first let the space $\mathcal{F}$ of functions from $\mathbb{S}^{d-1}$ to $\mathbb{R}$ be $\mathcal{F} := \mathcal{H}^{(q)}(\mathbb{S}^{d-1})$.
The space of spherical harmonics of degree at most $q$ is a linear space of functions, and the kernel operator $K_d^{(q)}$ is a projection operator onto $\mathcal{H}^{(q)}(\mathbb{S}^{d-1})$, so $\mathcal{F}$ satisfies the first precondition of Theorem 7. Additionally, the dimension of this space of functions is $n = \beta_{q,d}$. Also, let $D$ be the uniform distribution over the unit sphere $\mathbb{S}^{d-1}$. For this distribution, the norm defined in Theorem 7 satisfies $\|h\|_{\mathbb{S}^{d-1}}^2 = |\mathbb{S}^{d-1}| \cdot \|h\|_D^2$ for any $h \in L^2(\mathbb{S}^{d-1})$. Moreover, let $D'$ be the uniform distribution over $\mathbb{S}^{d-1}$ as well. Now we show that for these choices of $\mathcal{F}, D, D'$, the condition number $K_{D'}$ defined as per Theorem 7 is equal to $\beta_{q,d}$. We can write

$$K_{D'} := \sup_{x \in \mathbb{S}^{d-1}} \sup_{h \in \mathcal{F}} \left\{\frac{D(x)}{D'(x)} \cdot \frac{|h(x)|^2}{\|h\|_D^2}\right\} = |\mathbb{S}^{d-1}| \sup_{x \in \mathbb{S}^{d-1}} \sup_{h \in \mathcal{F}} \frac{|h(x)|^2}{\|h\|_{\mathbb{S}^{d-1}}^2} = |\mathbb{S}^{d-1}| \sup_{x \in \mathbb{S}^{d-1}} \sup_{g \in L^2(\mathbb{S}^{d-1})} \frac{\left|\left[K_d^{(q)} g\right](x)\right|^2}{\left\|K_d^{(q)} g\right\|_{\mathbb{S}^{d-1}}^2} = |\mathbb{S}^{d-1}| \sup_{x \in \mathbb{S}^{d-1}} \tau_q(x) = \beta_{q,d},$$

where the second equality follows from the fact that $D$ and $D'$ are both the uniform distribution over $\mathbb{S}^{d-1}$ together with $\|h\|_{\mathbb{S}^{d-1}}^2 = |\mathbb{S}^{d-1}| \cdot \|h\|_D^2$; the third equality follows from the fact that any function $h \in \mathcal{H}^{(q)}(\mathbb{S}^{d-1})$ can be expressed as $h = K_d^{(q)} g$ for some $g \in L^2(\mathbb{S}^{d-1})$, because $K_d^{(q)}$ is the projection operator onto $\mathcal{H}^{(q)}(\mathbb{S}^{d-1})$; the fourth equality follows from Definition 3; and the last follows from Lemma 5 together with Corollary 1.

Finally, in order to invoke Theorem 7, we can write the weighted ERM estimator $\widetilde{f}_s$ as

$$\widetilde{f}_s := \arg\min_{h \in \mathcal{F}} \sum_{i=1}^{s} w_i\, |h(x_i) - f(x_i)|^2 = \arg\min_{g \in L^2(\mathbb{S}^{d-1})} \sum_{i=1}^{s} w_i \left|\left[K_d^{(q)} g\right](x_i) - f(x_i)\right|^2 = \arg\min_{g \in L^2(\mathbb{S}^{d-1})} \sum_{i=1}^{s} \left|\frac{1}{\sqrt{s}}\left[K_d^{(q)} g\right](x_i) - \bar{f}_i\right|^2 = \arg\min_{g \in L^2(\mathbb{S}^{d-1})} \left\|P^* g - \bar{f}\right\|_2^2,$$

where the third equality uses $w_i = 1/s$ together with the definition $\bar{f}_i = \frac{1}{\sqrt{s}}\, f(x_i)$, and the last equality follows from the definition of the adjoint of the quasi-matrix $P$. Therefore, the theorem follows by invoking Theorem 7. ∎

## D Efficient Algorithm for Spherical Harmonic Interpolation

In this section we prove our main theorem about our spherical harmonic interpolation algorithm.

**Theorem 5 (Efficient Spherical Harmonic Interpolation).** Algorithm 1 returns a function $y \in \mathcal{H}^{(q)}(\mathbb{S}^{d-1})$ such that, with probability at least $1 - 10^{-4}$: $\left\|y - f^{(q)}\right\|_{\mathbb{S}^{d-1}}^2 \le \varepsilon \left\|f^{(q)} - f\right\|_{\mathbb{S}^{d-1}}^2$, where $f^{(q)} := K_d^{(q)} f$. Suppose we can compute the Gegenbauer polynomial $P_d^\ell(t)$ at every point $t \in [-1, 1]$ in constant time. Then Algorithm 1 queries the function $f$ at $s = O\left(\beta_{q,d} \log \beta_{q,d} + \frac{\beta_{q,d}}{\varepsilon}\right)$ points on the sphere $\mathbb{S}^{d-1}$ and runs in $O(s^2 d + s^\omega)$ time. The algorithm evaluates $y(\sigma)$ in $O(d\, s)$ time for any $\sigma \in \mathbb{S}^{d-1}$.

*Proof.* First note that the random points $w_1, w_2, \dots, w_s$ in line 3 of Algorithm 1 are i.i.d. samples with uniform distribution on the surface of $\mathbb{S}^{d-1}$. Therefore, we can invoke Theorem 4. More specifically, if we let $P$ be the quasi-matrix defined in Theorem 4 corresponding to the random points $w_1, w_2, \dots, w_s$ sampled in line 3, and if we let $\bar{f}$ be the vector of function samples defined in line 5 of the algorithm, then with probability at least $1 - 10^{-4}$, any optimal solution to the least-squares problem

$$\bar{g} \in \arg\min_{g \in L^2(\mathbb{S}^{d-1})} \left\|P^* g - \bar{f}\right\|_2^2 \tag{13}$$

satisfies

$$\left\|K_d^{(q)} \bar{g} - f\right\|_{\mathbb{S}^{d-1}}^2 \le (1+\varepsilon) \min_{g \in L^2(\mathbb{S}^{d-1})} \left\|K_d^{(q)} g - f\right\|_{\mathbb{S}^{d-1}}^2. \tag{14}$$

Now note that the least-squares problem in Eq. (13) has at least one optimal solution $\bar{g}$ which lies in the range of the operator $P$. More specifically, there exists a vector $\bar{z} \in \mathbb{R}^s$ such that $\bar{g} = P \bar{z}$ is an optimal solution of Eq. (13). Therefore, we can focus on finding this optimal solution by solving the least-squares problem

$$\bar{z} \in \arg\min_{x \in \mathbb{R}^s} \left\|P^* P x - \bar{f}\right\|_2^2,$$

and then letting $\bar{g} = P \bar{z}$. This $\bar{g}$ is guaranteed to be an optimal solution of Eq. (13), and thus it satisfies Eq. (14).
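The kernel-trick identity $P^* P = K$ derived next rests entirely on Lemma 1's reproducing property. As a quick Monte Carlo sanity check of that property (a hypothetical snippet of ours, reusing `gegenbauer` and `alpha_dim` from the earlier sketches):

```python
import numpy as np

rng = np.random.default_rng(2)
d, ell = 4, 3
x, y_ = rng.standard_normal(d), rng.standard_normal(d)
x /= np.linalg.norm(x); y_ /= np.linalg.norm(y_)
V = rng.standard_normal((200_000, d)); V /= np.linalg.norm(V, axis=1, keepdims=True)
# Lemma 1: P_d^ell(<x,y>) = alpha_{ell,d} * E_w[P_d^ell(<x,w>) P_d^ell(<y,w>)]
lhs = gegenbauer(ell, d, x @ y_)
rhs = alpha_dim(ell, d) * np.mean(gegenbauer(ell, d, V @ x) * gegenbauer(ell, d, V @ y_))
print(lhs, rhs)  # should agree up to Monte Carlo error ~ 1/sqrt(200000)
```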
We solve the above least-squares problem using the kernel trick. In fact, we show that $P^* P$ is equal to the kernel matrix $K$ computed in line 4 of Algorithm 1. To see why, note that for any $i, j \in [s]$ we have

$$[P^* P]_{i,j} = \left\langle \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{\sqrt{s}\,|\mathbb{S}^{d-1}|}\, P_d^\ell(\langle w_i, \cdot \rangle),\; \sum_{\ell'=0}^{q} \frac{\alpha_{\ell',d}}{\sqrt{s}\,|\mathbb{S}^{d-1}|}\, P_d^{\ell'}(\langle w_j, \cdot \rangle) \right\rangle_{\mathbb{S}^{d-1}} = \sum_{\ell,\ell'=0}^{q} \frac{\alpha_{\ell,d}\,\alpha_{\ell',d}}{s\,|\mathbb{S}^{d-1}|^2} \left\langle P_d^\ell(\langle w_i, \cdot \rangle),\, P_d^{\ell'}(\langle w_j, \cdot \rangle) \right\rangle_{\mathbb{S}^{d-1}}$$
$$= \sum_{\ell,\ell'=0}^{q} \frac{\alpha_{\ell,d}\,\alpha_{\ell',d}}{s\,|\mathbb{S}^{d-1}|}\, \mathbb{E}_{v \sim U(\mathbb{S}^{d-1})}\left[P_d^\ell(\langle w_i, v \rangle)\, P_d^{\ell'}(\langle w_j, v \rangle)\right] = \sum_{\ell=0}^{q} \frac{\alpha_{\ell,d}}{s\,|\mathbb{S}^{d-1}|}\, P_d^\ell(\langle w_i, w_j \rangle) = K_{i,j},$$

where the penultimate equality follows from Lemma 1. Therefore, we are interested in the optimal solution of the least-squares problem

$$\bar{z} \in \arg\min_{x \in \mathbb{R}^s} \left\|K x - \bar{f}\right\|_2^2.$$

The least-squares solution to the above problem is $\bar{z} = K^{+} \bar{f}$, which is exactly what is computed in line 6 of the algorithm. Now note that the function $\bar{g} = P \bar{z}$ satisfies Eq. (14). Because $\bar{g} = P \bar{z} \in \mathcal{H}^{(q)}(\mathbb{S}^{d-1})$ and because $K_d^{(q)}$ is an orthonormal projection operator onto $\mathcal{H}^{(q)}(\mathbb{S}^{d-1})$, we have $K_d^{(q)} \bar{g} = \bar{g} = P \bar{z}$. Together with Eq. (14), this implies that

$$\left\|P \bar{z} - f\right\|_{\mathbb{S}^{d-1}}^2 \le (1+\varepsilon) \min_{g \in L^2(\mathbb{S}^{d-1})} \left\|K_d^{(q)} g - f\right\|_{\mathbb{S}^{d-1}}^2.$$

Now if we invoke Claim 2 with $C = 1 + \varepsilon$ on the above inequality, we find that

$$\left\|P \bar{z} - f^{(q)}\right\|_{\mathbb{S}^{d-1}}^2 \le \varepsilon \left\|f^{(q)} - f\right\|_{\mathbb{S}^{d-1}}^2.$$

Finally, one can easily see that the function $y \in \mathcal{H}^{(q)}(\mathbb{S}^{d-1})$ that Algorithm 1 outputs in line 7 is exactly $y = P \bar{z}$. This completes the accuracy bound of the theorem.

**Runtime and Sample Complexity.** These bounds follow from observing that:

- $O(s\, d)$ time is needed to generate $w_1, w_2, \dots, w_s$ in line 3 of the algorithm. To do this, we first generate random Gaussian points in $\mathbb{R}^d$ and then project them onto $\mathbb{S}^{d-1}$ by normalizing them.
- $O(s^2 d)$ operations are needed to form the kernel matrix $K$ in line 4 of the algorithm.
- $s$ queries to the function $f$ are needed to form the sample vector $\bar{f}$ in line 5 of the algorithm.
- $O(s^\omega)$ time is needed to compute the least-squares solution $\bar{z} = K^{+} \bar{f}$ in line 6 of the algorithm.
- $O(s\, d)$ operations are needed to evaluate the output function $y(\sigma)$ in line 7 of the algorithm.

This completes the proof of Theorem 5. ∎

## E Lower Bound: Claims and Lemmas

In this section we prove the claims and lemmas used in our lower bound analysis for proving Theorem 6.

**Claim 4.** Given the random input $f = \sum_{\ell=0}^{q} Y_\ell\, v^{(\ell)}$ generated as described in Section 4, to solve Problem 2 an algorithm must return a function $\widehat{f}^{(q)} \in \mathcal{H}^{(q)}(\mathbb{S}^{d-1})$ such that $\|\widehat{f}^{(q)} - f\|_{\mathbb{S}^{d-1}}^2 = 0$.

*Proof.* Note that Problem 2 requires recovering a function $\widehat{f}^{(q)} \in \mathcal{H}^{(q)}(\mathbb{S}^{d-1})$ such that

$$\left\|\widehat{f}^{(q)} - f^{(q)}\right\|_{\mathbb{S}^{d-1}}^2 \le \varepsilon \left\|f^{(q)} - f\right\|_{\mathbb{S}^{d-1}}^2, \tag{15}$$

where $f^{(q)} = K_d^{(q)} f$. Using the definition of the input function $f = \sum_{\ell=0}^{q} Y_\ell\, v^{(\ell)}$, we can write

$$f^{(q)} = K_d^{(q)} f = \sum_{\ell=0}^{q} K_d^{(q)} Y_\ell\, v^{(\ell)} = \sum_{\ell'=0}^{q} Y_{\ell'}\, Y_{\ell'}^* \sum_{\ell=0}^{q} Y_\ell\, v^{(\ell)} = \sum_{\ell=0}^{q} Y_\ell\, v^{(\ell)} = f,$$

where the second equality follows from Eq. (10) and the addition theorem (Theorem 3), and the third follows because the operator $Y_\ell$ has orthonormal columns and thus $Y_{\ell'}^* Y_\ell = I_{\alpha_{\ell,d}}\, \mathbb{1}_{\{\ell = \ell'\}}$. Therefore, plugging this into Eq. (15) gives

$$\left\|\widehat{f}^{(q)} - f\right\|_{\mathbb{S}^{d-1}}^2 = \left\|\widehat{f}^{(q)} - f^{(q)}\right\|_{\mathbb{S}^{d-1}}^2 \le \varepsilon \left\|f^{(q)} - f\right\|_{\mathbb{S}^{d-1}}^2 = \varepsilon\, \|f - f\|_{\mathbb{S}^{d-1}}^2 = 0. \qquad \blacksquare$$

**Lemma 6.** If a deterministic algorithm solves Problem 2 with probability at least $1/10$ over our random input distribution $f = \sum_{\ell=0}^{q} Y_\ell\, v^{(\ell)}$, then with probability at least $1/10$, the output of the algorithm $\widehat{f}^{(q)}$ satisfies $Y_\ell^* \widehat{f}^{(q)} = v^{(\ell)}$ for all integers $\ell \le q$.

*Proof.* By Claim 4, the output of an algorithm that solves Problem 2 satisfies $\|\widehat{f}^{(q)} - f\|_{\mathbb{S}^{d-1}}^2 = 0$. Therefore, by the orthonormality of the columns of the operator $Y_\ell$, we can write

$$Y_\ell^* \widehat{f}^{(q)} = Y_\ell^* f + Y_\ell^*\left(\widehat{f}^{(q)} - f\right) = \sum_{\ell'=0}^{q} Y_\ell^* Y_{\ell'}\, v^{(\ell')} = v^{(\ell)}. \qquad \blacksquare$$