Journal of Machine Learning Research 24 (2023) 1-80. Submitted 6/21; Revised 2/23; Published 3/23

When Locally Linear Embedding Hits Boundary

Hau-Tieng Wu (HAUWU@MATH.DUKE.EDU), Department of Mathematics and Department of Statistical Science, Duke University, Durham, NC 27708, United States

Nan Wu (NAN.WU@UTDALLAS.EDU), Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, TX 75080, United States

Editor: Aapo Hyvarinen

Abstract

Based on the Riemannian manifold model, we study the asymptotic behavior of a widely applied unsupervised learning algorithm, locally linear embedding (LLE), when the point cloud is sampled from a compact, smooth manifold with boundary. We show several peculiar behaviors of LLE near the boundary that are different from those of diffusion-based algorithms. In particular, we show that LLE converges pointwise to a mixed-type differential operator with degeneracy, and we calculate the convergence rate. The impact of the hyperbolic part of the operator is discussed, and we propose a clipped LLE algorithm as a potential approach to recovering the Dirichlet Laplace-Beltrami operator.

Keywords: locally linear embedding; manifold learning; manifold with boundary; mixed-type differential operator; Dirichlet Laplace-Beltrami operator.

1. Introduction

Arguably, unsupervised learning is the holy grail of artificial intelligence. While challenges remain on many fronts, numerous approaches have been explored, including ISOMAP (Tenenbaum et al., 2000), locally linear embedding (LLE) (Roweis and Saul, 2000), Hessian LLE (Donoho and Grimes, 2003), eigenmap (Belkin and Niyogi, 2003), diffusion map (DM) (Coifman and Lafon, 2006), vector diffusion map (VDM) (Singer and Wu, 2012), t-distributed stochastic neighbor embedding (van der Maaten and Hinton, 2008), and maximum variance unfolding (Weinberger and Saul, 2006), to name but a few. In this paper, based on the Riemannian manifold model, we study the asymptotic behavior of LLE when the point cloud is sampled from a compact, smooth manifold with boundary.

LLE is based on a rudimentary idea: by parametrizing the dataset well locally, we can patch all the local information together to recover the global structure. It has been widely applied in different fields and has been cited more than 15,800 times according to Google Scholar. However, its theoretical justification for data points sampled on compact manifolds without boundary was only made available at the end of 2017 (Wu and Wu, 2018; Malik et al., 2019). Essentially, the established theory says that under the manifold-without-boundary setup, LLE has several peculiar behaviors that are different from those of diffusion-based algorithms, including eigenmap and DM. First, unlike DM, LLE may not behave like a diffusion process, since the associated kernel function is not always positive. Second, it is sensitive to the regularization, and different regularizations lead to different differential operators. If the regularization is chosen properly, LLE asymptotically converges to the Laplace-Beltrami operator without extra probability density function (p.d.f.) estimation, even if the p.d.f. is not uniform.
However, when the regularization is not chosen properly, LLE converges to a fourth order differential operator in cases like the sphere. Third, when the regularization is chosen properly, the convergence of LLE to the Laplace-Beltrami operator is comparable to that of DM with a proper normalization (Coifman and Lafon, 2006; Singer and Wu, 2017; Cheng and Wu, 2022). Fourth, the kernel associated with LLE is in general not symmetric, and this asymmetric kernel depends on the curvature and the p.d.f. Fifth, the kernel depends on the local covariance matrix analysis and the Mahalanobis distance, since it is a mix-up of an ordinary kernel and a special kernel depending on the Mahalanobis distance (Malik et al., 2019). While several theoretical properties have been discussed in Wu and Wu (2018) and Malik et al. (2019), many open problems about LLE remain.

In this paper, we are interested in exploring the asymptotic behavior of LLE when the manifold has a boundary. First, we show that asymptotically LLE converges pointwise to a mixed-type differential operator with degeneracy, and we calculate the convergence rate. Second, after showing that the asymptotic operator near the boundary involves singular coefficients, we study the 1-dim manifold case and relate the eigenvalue problem of LLE to a Sturm-Liouville equation. Third, through a series of numerical simulations, we explore the impact of the hyperbolic part of the operator. In these simulations, we modify LLE by clipping certain points that are close to the boundary, which is asymptotically equivalent to eliminating the hyperbolic part of the operator; we thereby obtain an algorithm that can potentially recover the Laplace-Beltrami operator with the Dirichlet boundary condition. This suggests a new approach to recovering the Dirichlet Laplace-Beltrami operator. Fourth, we compare LLE with DM to explain the differences between their behaviors near the boundary.

The paper is organized in the following way. In Section 2, we review the LLE algorithm and provide some spectral properties of LLE on the linear algebra level. In Section 3, we provide the manifold model when the boundary is not empty. In Section 4, we develop the asymptotic theory for the LLE matrix, particularly the associated kernel behavior and its relationship with the geometric structure of the manifold. In Section 5, we explore LLE on 1-dim manifolds with boundary, and in Section 6 we compare LLE with DM. In Section 7, we discuss the clipped LLE, which potentially leads to the Laplace-Beltrami operator with the Dirichlet boundary condition; numerical simulations of the clipped LLE are provided. The paper closes with a discussion in Section 8. Technical proofs are postponed to the Appendices. For reproducibility purposes, the Matlab code to reproduce the figures in this paper can be downloaded from http://hautiengwu.wordpress.com/code/.

2. Review of locally linear embedding

We start with some matrix notation. For $p, r \in \mathbb{N}$ so that $r \le p$, let $I_r \in \mathbb{R}^{r\times r}$ be the identity matrix. Denote $J_{p,r} = \begin{bmatrix} I_r \\ 0 \end{bmatrix} \in \mathbb{R}^{p\times r}$; that is, the $(i,i)$-th entry of $J_{p,r}$ is 1 for $i = 1,\ldots,r$, and the other entries are zero. Denote $\bar{J}_{p,r} = \begin{bmatrix} 0 \\ I_r \end{bmatrix} \in \mathbb{R}^{p\times r}$; that is, the $(p-r+i,i)$-th entry of $\bar{J}_{p,r}$ is 1 for $i = 1,\ldots,r$, and the other entries are zero. Denote

$$I_{p,r} := J_{p,r}J_{p,r}^\top = \begin{bmatrix} I_r & 0\\ 0 & 0\end{bmatrix}\in\mathbb{R}^{p\times p}\quad\text{and}\quad \bar{I}_{p,r} := \bar{J}_{p,r}\bar{J}_{p,r}^\top = \begin{bmatrix} 0 & 0\\ 0 & I_r\end{bmatrix}\in\mathbb{R}^{p\times p}.$$

Finally, for $d\le r\le p$, define $\bar{J}_{p,r-d} := \bar{J}_{p,p-d}\,\bar{J}_{p-d,r-d}\in\mathbb{R}^{p\times(r-d)}$.
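These index matrices are straightforward to realize numerically. The following minimal numpy sketch (the helper names `J` and `J_bar` are our own, not from the paper) constructs $J_{p,r}$, $\bar{J}_{p,r}$ and the projections $I_{p,r}$ and $\bar{I}_{p,r}$:

```python
import numpy as np

def J(p, r):
    """J_{p,r} in R^{p x r}: the (i,i)-th entry is 1 for i = 1,...,r."""
    M = np.zeros((p, r))
    M[:r, :] = np.eye(r)
    return M

def J_bar(p, r):
    """bar{J}_{p,r} in R^{p x r}: the (p-r+i,i)-th entry is 1 for i = 1,...,r."""
    M = np.zeros((p, r))
    M[p - r:, :] = np.eye(r)
    return M

p, r = 5, 2
I_pr = J(p, r) @ J(p, r).T              # I_{p,r}: r leading ones on the diagonal
I_pr_bar = J_bar(p, r) @ J_bar(p, r).T  # bar{I}_{p,r}: r trailing ones on the diagonal
```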
We quickly recall the necessary information about LLE and refer readers interested in more discussion to (Roweis and Saul, 2000; Wu and Wu, 2018). The key ingredient of LLE is the barycentric coordinate, a quantity shown in Wu and Wu (2018) to be parallel to the kernel chosen in the graph Laplacian. Suppose we have the point cloud $\mathcal{X} = \{z_i\}_{i=1}^n\subset\mathbb{R}^p$. There are two nearest neighbor search schemes to proceed. The first is the $\varepsilon$-radius ball scheme: fix $\varepsilon > 0$; for $z_k\in\mathcal{X}$, assume there are $N_k$ data points, excluding $z_k$, in the $\varepsilon$-radius ball centered at $z_k$. The second is the $K$-nearest neighbor (KNN) scheme used in the original LLE algorithm (Roweis and Saul, 2000); that is, for a fixed $K\in\mathbb{N}$, find the $K$ neighboring points. Fix one nearest neighbor search scheme, and denote the nearest neighbors of $z_k\in\mathcal{X}$ as $\mathcal{N}_k = \{z_{k,i}\}_{i=1}^{N_k}$. Then the barycentric coordinate of $z_k$ associated with $\mathcal{N}_k$, denoted $w_k$, is defined as the solution of the following optimization problem:

$$w_k = \operatorname*{argmin}_{w\in\mathbb{R}^{N_k},\, w^\top \mathbf{1}_{N_k}=1}\Big\| z_k - \sum_{j=1}^{N_k}w(j)z_{k,j}\Big\|^2 = \operatorname*{argmin}_{w\in\mathbb{R}^{N_k},\, w^\top \mathbf{1}_{N_k}=1} w^\top G_{n,k}^\top G_{n,k}\, w \in\mathbb{R}^{N_k},\tag{2.1}$$

where $\mathbf{1}_{N_k}$ is a vector in $\mathbb{R}^{N_k}$ with all entries 1 and

$$G_{n,k} := \begin{bmatrix} | & & |\\ z_{k,1}-z_k & \cdots & z_{k,N_k}-z_k\\ | & & |\end{bmatrix}\in\mathbb{R}^{p\times N_k}\tag{2.2}$$

is called the local data matrix. In general, $G_{n,k}^\top G_{n,k}$ might be singular, and it is suggested in Roweis and Saul (2000) to stabilize the algorithm by regularizing the equation and solving

$$(G_{n,k}^\top G_{n,k} + c\, I_{N_k\times N_k})\,y_k = \mathbf{1}_{N_k},\qquad w_k = \frac{y_k}{y_k^\top \mathbf{1}_{N_k}},\tag{2.3}$$

where $c > 0$ is the regularizer chosen by the user. As is shown in Wu and Wu (2018), the regularizer plays a critical role in LLE. With the barycentric coordinates of $z_k$ for $k = 1,\ldots,n$, the LLE matrix, an $n\times n$ matrix denoted $W$, is defined as

$$W_{ki} = \begin{cases} w_k(j) & \text{if } z_i = z_{k,j}\in\mathcal{N}_k;\\ 0 & \text{otherwise.}\end{cases}\tag{2.4}$$

The barycentric coordinates are invariant under rotation and translation, because the matrix $G_{n,k}$ is invariant under translation and $G_{n,k}^\top G_{n,k}$ is invariant under rotation. As discussed in Wu and Wu (2018), the barycentric coordinates can be understood as the projection of $\mathbf{1}_{N_k}$ onto the null space of $G_{n,k}^\top G_{n,k}$. Suppose $r_n = \operatorname{rank}(G_{n,k}^\top G_{n,k})$. Note that $r_n = \operatorname{rank}(G_{n,k}) = \operatorname{rank}(G_{n,k}G_{n,k}^\top)\le\min(N_k,p)\le p$ and $G_{n,k}G_{n,k}^\top$ is positive semi-definite. Denote the eigen-decomposition of the matrix $G_{n,k}G_{n,k}^\top$ as $U_n\Lambda_n U_n^\top$, where $\Lambda_n = \operatorname{diag}(\lambda_{n,1},\lambda_{n,2},\ldots,\lambda_{n,p})$, $\lambda_{n,1}\ge\lambda_{n,2}\ge\cdots\ge\lambda_{n,r_n} > \lambda_{n,r_n+1} = \cdots = \lambda_{n,p} = 0$, and $U_n\in O(p)$. Denote

$$\mathcal{I}_c(G_{n,k}G_{n,k}^\top) := U_n I_{p,r_n}(\Lambda_n + c\,I_{p\times p})^{-1}U_n^\top,\tag{2.5}$$
$$T_{n,x_k} := \mathcal{I}_c(G_{n,k}G_{n,k}^\top)\,G_{n,k}\mathbf{1}_{N_k}.\tag{2.6}$$

Then, it is shown in (Wu and Wu, 2018, Section 2) that the solution to (2.3) is

$$y_k^\top = c^{-1}\mathbf{1}_{N_k}^\top - c^{-1}T_{n,x_k}^\top G_{n,k},\tag{2.7}$$
$$w_k^\top = \frac{\mathbf{1}_{N_k}^\top - T_{n,x_k}^\top G_{n,k}}{N_k - T_{n,x_k}^\top G_{n,k}\mathbf{1}_{N_k}}.\tag{2.8}$$

Note that $N_k - T_{n,x_k}^\top G_{n,k}\mathbf{1}_{N_k}$ in the denominator of (2.8) is the sum of the entries of $\mathbf{1}_{N_k}^\top - T_{n,x_k}^\top G_{n,k}$ in the numerator, so we can view $y_k^\top$ as the kernel function associated with LLE and $w_k^\top$ as the normalized kernel. To reduce the dimension of $\mathcal{X}$, it is suggested in Roweis and Saul (2000) to embed $\mathcal{X}$ into a low dimensional Euclidean space via

$$z_k\mapsto Y_k = [v_1(k),\cdots,v_\ell(k)]^\top\in\mathbb{R}^\ell\tag{2.9}$$

for each $z_k\in\mathcal{X}$, where $\ell\in\mathbb{N}$ is the dimension of the embedded points chosen by the user and $v_1,\cdots,v_\ell\in\mathbb{R}^n$ are the eigenvectors of $(I-W)^\top(I-W)$ corresponding to the $\ell$ smallest eigenvalues.
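To make the construction concrete, here is a minimal numpy sketch of (2.2)-(2.4) with the $\varepsilon$-radius ball scheme; the function name `lle_matrix` and the use of `scipy.spatial.cKDTree` for the neighbor search are our own choices, not part of the original algorithm specification:

```python
import numpy as np
from scipy.spatial import cKDTree

def lle_matrix(X, eps, c):
    """LLE matrix W of (2.4) from an (n, p) point cloud X, with the eps-radius
    ball scheme and regularizer c > 0 (e.g., c = n * eps**(d + 3); see Section 3)."""
    n = X.shape[0]
    tree = cKDTree(X)
    W = np.zeros((n, n))
    for k in range(n):
        nbr = [i for i in tree.query_ball_point(X[k], eps) if i != k]
        G = (X[nbr] - X[k]).T                 # local data matrix G_{n,k} of (2.2)
        y = np.linalg.solve(G.T @ G + c * np.eye(len(nbr)), np.ones(len(nbr)))  # (2.3)
        W[k, nbr] = y / y.sum()               # barycentric coordinates w_k
    return W
```

Each row of the returned $W$ sums to 1 by construction, which is the fact $W\mathbf{1} = \mathbf{1}$ used in Proposition 1 below.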
2.1 Spectral properties of the LLE matrix

We provide some spectral properties of the LLE matrix. Unlike the graph Laplacian (GL), in general $W$ is not a symmetric matrix or a Markov transition matrix, according to the analysis shown in Wu and Wu (2018). For $A\in\mathbb{R}^{n\times n}$, let $\sigma(A)\subset\mathbb{C}$ be the spectrum of $A$ and define $\rho(A)$ to be the spectral radius of $A$.

Proposition 1 The LLE matrix $W\in\mathbb{R}^{n\times n}$ satisfies $\rho(W)\ge 1$.

The proof of the proposition is straightforward. Since $W\mathbf{1} = \mathbf{1}$, where $\mathbf{1}$ is an $n$-dim vector with all entries 1, we have $1\in\sigma(W)$ and thus $\rho(W)\ge 1$. In Appendix A, we construct an example to show that it is possible for $\rho(W) > 1$.

Since in general the LLE matrix $W$ may not be symmetric, its eigenvalues might be complex and can be complicated. For example, in the null case where 400 points are sampled independently and identically from a 200-dim Gaussian random vector with mean 0 and identity covariance, the eigenvalue distribution of $W$ spreads over the complex plane. See Figure 1 for the distribution of such a dataset. However, in some special cases, we can well control the imaginary part of the distribution. Consider the symmetric and anti-symmetric parts of $W$, $W_+ = (W+W^\top)/2$ and $W_- = (W-W^\top)/2$, so that $W = W_+ + W_-$. By applying the Bauer-Fike theorem with the $L^2$ norm and Holder's inequality, for any eigenvalue $\lambda$ of $W$, there is a real eigenvalue $\mu$ of $W_+$ such that $|\lambda-\mu|\le\|W_-\|_2\le\sqrt{\|W_-\|_1\|W_-\|_\infty}$. Below we show that the imaginary part of the eigenvalues of the LLE matrix $W$ is well controlled under some conditions.

Proposition 2 Denote $N = \max_k N_k$, where $N_k = |\mathcal{N}_k|$. If $\max_{i,j}|W_{ij}-W_{ji}|\le\frac{C\varepsilon}{N}$ for some $C\ge 0$, the imaginary part of the eigenvalues of the LLE matrix $W$ is of order $\varepsilon$.

Figure 1: The distribution of eigenvalues of the LLE matrix on the complex plane, where $W$ is constructed with 50 nearest neighbors. In this example, the top eigenvalue is 1.

Proof Note that $W_{-,ij} = 0$ if $\|z_i-z_j\|_2\ge\varepsilon$, and $W_{-,ij}$ might be nonzero if $\|z_i-z_j\|_2 < \varepsilon$. Since $\sqrt{\|W_-\|_1\|W_-\|_\infty}\le N\max_{i,j}|W_{ij}-W_{ji}|$, based on the assumption, the imaginary part of the eigenvalues of $W$ is bounded by $O(\varepsilon)$.

Note that $\max_{i,j}|W_{ij}-W_{ji}|$ measures the similarity of different $\varepsilon$-neighborhoods $\mathcal{N}_k$. Thus, the assumption that $\max_{i,j}|W_{ij}-W_{ji}|\le\frac{C\varepsilon}{N}$ for some $C\ge 0$ means that the affinity graph is not too "imbalanced". This assumption holds asymptotically under the manifold setup. Since the KNN scheme and the $\varepsilon$-radius ball scheme are directly related under a suitable manipulation, as is shown in Wu and Wu (2018, Section 5), from now on we fix the $\varepsilon$-radius ball scheme in the rest of the paper for the sake of theoretical analysis.

3. Preliminaries for LLE under the manifold with boundary setup

3.1 Main assumptions

In this subsection, we summarize the major assumptions that we need in this paper. First, we have the following assumption about the manifold $M$.

Assumption 3.1 Let $(M,g)$ be a $d$-dimensional compact, smooth Riemannian manifold with boundary isometrically embedded in $\mathbb{R}^p$ via $\iota: M\hookrightarrow\mathbb{R}^p$. We assume the boundary $\partial M$ is smooth.

Next, we make the following assumption about the sample points on the manifold $M$.

Assumption 3.2 Suppose $(\Omega,\mathcal{F},\mathbb{P})$ is a probability space, where $\mathbb{P}$ is a probability measure defined on the Borel sigma algebra $\mathcal{F}$ on $\Omega$. Let $X$ be a random variable on $(\Omega,\mathcal{F},\mathbb{P})$ with range on $(M,g)$. We assume $\tilde{\mathbb{P}} := X_*\mathbb{P}$ is absolutely continuous with respect to the volume measure on $M$ associated with $g$, so that $d\tilde{\mathbb{P}} = P\,dV$ by the Radon-Nikodym theorem, where $dV$ is the volume form of $M$ and $P$ is a non-negative function defined on $M$. We call $P$ the probability density function (p.d.f.) associated with $X$. We further assume $P\in C^2(M)$ and $0 < P_m\le P(x)\le P_M$ for all $x\in M$. We assume $\{x_1,\ldots,x_n\}\subset M$ are i.i.d. sampled from $P$.

Remark 3 Under the regularity assumptions on the boundary and the density function in this model, in general the chance of sampling a point on the boundary is zero, unless we further assume knowledge of the boundary and sample on the boundary.
Without knowledge of the boundary, an estimate of the boundary is therefore needed. Such an estimate has wide applications, including estimating the distance to the boundary and kernel density estimation on a manifold with boundary. We refer the readers to Berry and Sauer (2017) for a discussion.

Remark 4 We refer the readers to Lee (2012) for a discussion of the differentiability of a function on the boundary of a manifold. Compared with the $P\in C^5(M)$ requirement imposed in Wu and Wu (2018), in this work we only assume $P\in C^2(M)$. In Wu and Wu (2018), we need $P\in C^5(M)$ to explore the regularization effect on the whole algorithm. In this work, since we will fix the regularization and focus on the boundary, $P\in C^2(M)$ is sufficient.

We adopt the notation in Section 2. Let $\mathcal{X} = \{z_i = \iota(x_i)\}_{i=1}^n$. Fix $\varepsilon > 0$. We propose the choice $c = n\varepsilon^{d+3}$ for the regularization parameter. Then we construct the LLE matrix $W$ by using $\mathcal{X}$ and $c = n\varepsilon^{d+3}$ as shown in (2.3) and (2.4).

3.2 Manifold with boundary setup

The manifold setup is nowadays standard and has been considered in the study of several algorithms, including eigenmap (Belkin and Niyogi, 2007), DM (Coifman and Lafon, 2006; Trillos et al., 2020), VDM (Singer and Wu, 2012, 2017), LLE (Wu and Wu, 2018) and several others, like gradient estimation (Mukherjee et al., 2010), diffusion on fiber structures (Lin et al., 2018; Gao, 2021), Bayesian regression (Yang and Dunson, 2016), extrinsic local regression (Lin et al., 2017), image processing models (Osher et al., 2017), and sensor fusion algorithms (Shnitzer et al., 2018), to name but a few. Although the manifold model is standard, the case of a non-empty boundary is less discussed in the literature. We introduce the following setup for a manifold with boundary. See Vaughn et al. (2019) for a different treatment.

Denote $d_g(\cdot,\cdot)$ to be the geodesic distance associated with $g$. For $\varepsilon > 0$, define the $\varepsilon$-neighborhood of $\partial M$ as

$$M_\varepsilon = \{x\in M \mid d_g(x,\partial M) < \varepsilon\}.\tag{3.1}$$

For the tangent space $T_xM$ at $x\in M$, denote $\iota_* T_xM$ to be the embedded tangent space in $\mathbb{R}^p$ and $(\iota_* T_xM)^\perp$ the normal space at $\iota(x)$. Let $\mathrm{II}_x$ be the second fundamental form of $\iota(M)$ at $\iota(x)$. Denote $S^{d-1}$ to be the $(d-1)$-dim unit sphere, and $|S^{d-1}|$ its volume. Denote $\{e_i\}_{i=1}^p$ to be the canonical basis of $\mathbb{R}^p$, where $e_i$ is the unit vector with 1 in the $i$-th entry. Since the barycentric coordinate is rotationally and translationally invariant, without loss of generality, when we analyze local behaviors around $x\in M$ in this paper, we implicitly assume that the manifold has been properly translated and rotated so that $\iota_* T_xM$ is spanned by $e_1,\ldots,e_d$.

We consider the following extension of $M$ and $\iota(M)$ (Wong, 2008). By the Whitney extension theorem (Whitney, 1992), there is a compact manifold with boundary $\tilde{M}$ and $\delta > 0$ satisfying the following properties:

1. $\tilde{M}$ is an isometric extension of $M$.
2. $d_{\tilde{M}}(\partial\tilde{M},\partial M)\ge\delta$, where $d_{\tilde{M}}$ is the geodesic distance measured in $\tilde{M}$.
3. $\tilde{M}$ is isometrically embedded in $\mathbb{R}^p$ via $\tilde\iota$ such that $\tilde\iota|_M = \iota$. Thus, $\iota(M)\subset\tilde\iota(\tilde{M})$.

Due to the above extension, we abuse notation and use $\exp_x$ to denote both the exponential map of $M$ at $x\in M$ and the exponential map of $\tilde{M}$ at $x\in\tilde{M}$. Note that when $x\in M_\varepsilon$, the exponential map of $M$ at $x$ may not be well defined on $\iota^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\iota(M))$; for example, consider an annulus in the plane with $\iota(x)$ a point on the inner circle. However, when $\varepsilon < \delta/2$ and $\varepsilon$ is small enough, we can make sure that $B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\tilde\iota(\tilde{M})$ is contained in the interior of $\tilde\iota(\tilde{M})$ and $\exp_x$ is well defined over $\tilde\iota^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\tilde\iota(\tilde{M}))$.
Since $B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\iota(M)\subset B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\tilde\iota(\tilde{M})$, $\exp_x$ is well defined over $\iota^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\iota(M))$. Hence, we conclude that $\exp_x$ is well defined over $\iota^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\iota(M))$ for any $x$ in both $M_\varepsilon$ and $M\setminus M_\varepsilon$.

Now we can handle the $\varepsilon$-ball near the boundary. For $x\in M_\varepsilon$, define $D_\varepsilon(x) = (\iota\circ\exp_x)^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\iota(M))\subset T_xM$, where $T_xM$ is identified with $\mathbb{R}^d$. Denote

$$x_* := \operatorname*{argmin}_{y\in\partial M}d(y,x)\quad\text{and}\quad \epsilon_x := \min_{y\in\partial M}d(y,x).\tag{3.2}$$

Due to the smoothness assumption on the boundary, if $\varepsilon$ is sufficiently small, such $x_*$ is unique. Clearly, we have $0\le\epsilon_x\le\varepsilon$ when $x\in M_\varepsilon$. Choose the normal coordinates $\{\partial_i\}_{i=1}^d$ around $x$ so that $x_* = \iota\circ\exp_x(-\epsilon_x\partial_d)$. Denote $\gamma_x(t)$ to be the unique geodesic with $\gamma_x(0) = x$ and $\gamma_x(\epsilon_x) = x_*$. We further rotate $\iota(M)$ so that $e_d = -\iota_*\frac{d}{dt}\gamma_x|_{\epsilon_x}$. Hence, when $x = x_*\in\partial M$, $e_d$ is the inward normal direction of $\iota(\partial M)$ at $\iota(x)$. Since $\tilde{M}$ is an isometric extension of $M$ and $\tilde\iota$ is an extension of $\iota$, for $x\in M$ we use $\mathrm{II}_x$ to denote both the second fundamental form of $\iota(M)$ at $\iota(x)$ and that of $\tilde\iota(\tilde{M})$ at $\tilde\iota(x)$. Recall that the second fundamental form at $x$ is a symmetric bilinear map from $T_xM\times T_xM$ to $(\iota_* T_xM)^\perp$. We define $\mathrm{II}_{ij}(x) = \mathrm{II}_x(\partial_i,\partial_j)$ for $i,j = 1,\cdots,d$.

When $x$ is close to the boundary, $(\iota\circ\exp_x)^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\iota(\partial M))$ is not empty and can be regarded as the graph of a function. Denote $a_{ij}(x_*)$, $i,j = 1,\ldots,d-1$, to be the second fundamental form of the embedding of $\partial M$ into $M$ at $x_*$. Then there is a domain $K\subset\mathbb{R}^{d-1}$ and a smooth function $q$ defined on $K$ such that

$$(\iota\circ\exp_x)^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\iota(\partial M)) = \Big\{\sum_{l=1}^d u_l\partial_l\in T_xM\ \Big|\ (u_1,\cdots,u_{d-1})\in K,\ u_d = q(u_1,\cdots,u_{d-1})\Big\},$$

where $q(u_1,\cdots,u_{d-1})$ can be approximated by $-\epsilon_x + \sum_{i,j=1}^{d-1}a_{ij}(x_*)u_iu_j$ up to an error depending on a cubic function of $u_1,\ldots,u_{d-1}$. For the sake of self-containedness, we provide a proof of this fact in Lemma 24. Note that in general the region $D_\varepsilon(x)$ may not be symmetric with respect to $x$. We define the symmetrized region associated with $D_\varepsilon(x)$.

Definition 5 For $x\in M_\varepsilon$ and $\varepsilon > 0$ sufficiently small, the symmetrized region associated with $D_\varepsilon(x)$ is defined as

$$\tilde{D}_\varepsilon(x) = \Big\{(u_1,\cdots,u_d)\in T_xM\ \Big|\ \sum_{i=1}^d u_i^2\le\varepsilon^2\ \text{ and }\ u_d\ge -\epsilon_x + \sum_{i,j=1}^{d-1}a_{ij}(x_*)u_iu_j\Big\}.$$

When $x\in M_\varepsilon$, $\tilde{D}_\varepsilon(x)$ is symmetric across $\partial_1,\ldots,\partial_{d-1}$, since if $(u_1,\cdots,u_i,\cdots,u_d)\in\tilde{D}_\varepsilon(x)$, then $(u_1,\cdots,-u_i,\cdots,u_d)\in\tilde{D}_\varepsilon(x)$ for $i = 1,\cdots,d-1$ by definition. Clearly, the volume of $\tilde{D}_\varepsilon(x)$ approximates that of $D_\varepsilon(x)$ up to a third order error term. See Corollary 25 for details. For $x\notin M_\varepsilon$ and $\varepsilon$ sufficiently small, define the symmetrized region associated with $D_\varepsilon(x)$ as

$$\tilde{D}_\varepsilon(x) = \Big\{(u_1,\cdots,u_d)\in T_xM\ \Big|\ \sum_{i=1}^d u_i^2\le\varepsilon^2\Big\}\subset T_xM.\tag{3.3}$$

3.3 The augmented vector and the kernel associated with LLE

Before we define the augmented vector and the kernel associated with LLE, we recall the definition of the local covariance matrix. For $x\in M$, we call

$$C_x := E[(\iota(X)-\iota(x))(\iota(X)-\iota(x))^\top\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]\in\mathbb{R}^{p\times p}\tag{3.4}$$

the local covariance matrix at $\iota(x)\in\iota(M)$, which is the covariance matrix considered in local principal component analysis (PCA) (Singer and Wu, 2012; Cheng and Wu, 2013). In this paper, we use the following symbols for the local covariance matrix. For $x\in M$, suppose $\operatorname{rank}(C_x) = r\le p$. Clearly, $r$ depends on $x$, but we suppress $x$ for simplicity. Denote the eigen-decomposition of $C_x$ as $C_x = U_x\Lambda_x U_x^\top$, where $U_x\in O(p)$ is composed of eigenvectors and $\Lambda_x$ is a diagonal matrix with the associated eigenvalues $\lambda_1\ge\lambda_2\ge\cdots\ge\lambda_r > \lambda_{r+1} = \cdots = \lambda_p = 0$. The theoretical properties of local PCA on a manifold without boundary have been studied in a sequence of works, like Nadler (2008); Singer and Wu (2012); Tyagi et al. (2013); Cheng and Wu (2013); Kaslovsky and Meyer (2014); Little et al. (2017); Wu and Wu (2018); Dunson and Wu (2021), and the companion properties near the boundary will be discussed in Section D.
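For readers who want to experiment, the empirical counterpart of (3.4) is immediate; the normalization by $n$ (rather than by the number of neighbors) matches the expectation with the indicator inside. The helper name below is our own:

```python
import numpy as np

def local_covariance(X, x, eps):
    """Empirical version of the local covariance matrix C_x in (3.4): the average
    of (z - x)(z - x)^T over the whole sample, with the indicator of the eps-ball."""
    diff = X - x
    D = diff[np.linalg.norm(diff, axis=1) < eps]
    return (D.T @ D) / len(X)
```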
In this work, the regularizer in (2.3) that we are interested in is $c = n\varepsilon^{d+3}$. For the sake of self-containedness, we provide an intuitive explanation for this choice of regularizer when $M$ has no boundary, based on the results in Wu and Wu (2018). Under Assumptions 3.1 and 3.2, let $\mathcal{X} = \{z_i = \iota(x_i)\}_{i=1}^n$. Let $G_{n,k}$ be the local data matrix constructed from $\mathcal{X}$ as defined in (2.2). Then, as $n\to\infty$, we expect $\frac1n G_{n,k}G_{n,k}^\top\to C_{x_k}$, where $C_{x_k}$ is the local covariance matrix at $\iota(x_k)$. In Wu and Wu (2018), the authors show that the first $d$ eigenvalues of $C_{x_k}$ are of order $\varepsilon^{d+2}$, while the remaining $p-d$ eigenvalues are bounded by $\varepsilon^{d+4}$ when $\varepsilon$ is sufficiently small. Those $p-d$ eigenvalues encode the extrinsic geometric information, e.g., the second fundamental form of $\iota(M)$. Hence, if $\varepsilon$ is chosen properly based on $n$, we expect that the first $d$ eigenvalues of $G_{n,k}G_{n,k}^\top$ are of order $n\varepsilon^{d+2}$, while the remaining $p-d$ eigenvalues containing the extrinsic geometric information are bounded by $n\varepsilon^{d+4}$. Suppose we choose $c = n\varepsilon^{d+3}$ in (2.5). Then the last $p-d$ diagonal terms of $\Lambda_n + c\,I_{p\times p}$ are dominated by $c$, which is still less than the first $d$ diagonal terms. Hence, we can eliminate the impact of the extrinsic geometry of $\iota(M)$ in (2.5). Since the Laplace-Beltrami operator only depends on the intrinsic geometry of $M$, we choose $c = n\varepsilon^{d+3}$ to construct the LLE matrix in (2.4) and (2.8) in order to recover the operator. When the boundary is non-empty, since our focus is the boundary effect, we fix this regularizer so that points away from the boundary are well controlled.

Definition 6 Define the augmented vector at $x\in M$ as

$$T(x) = U_x I_{p,r}(\Lambda_x + \varepsilon^{d+3}I_{p\times p})^{-1}U_x^\top\, E[(\iota(X)-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]\in\mathbb{R}^p,\tag{3.5}$$

which is an $\mathbb{R}^p$-valued vector field on $M$.

The name of $T(x)$ comes from analyzing the kernel associated with LLE. It has been shown in Wu and Wu (2018, Corollary 3.1) that the kernel associated with LLE is not symmetric and is defined as

$$K_\varepsilon(x,y) := \chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(y)) - \big[(\iota(y)-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(y))\big]^\top T(x).\tag{3.6}$$

We call $T(x)$ the augmented vector since it augments the symmetric 0-1 kernel $K(x,y) = \chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(y))$ by the inner product of $T(x)$ and $(\iota(y)-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(y))$. Notice that the vector $T_{n,x_k}$ defined in (2.6) is a discretization of $T(x_k)$; the theoretical justification is provided in Appendix G.

3.4 Empirical estimation of the regularizer

In Wu and Wu (2018), the authors propose to choose the regularizer in LLE as $c = n\varepsilon^{d+3}$ based on the asymptotic analysis. However, in practice the dimension $d$ of the manifold is unknown. While it is possible to estimate the dimension, it is usually a challenging mission. We thus need a practical way to determine the regularizer without estimating the dimension. In this subsection, given a finite sample $\mathcal{X} = \{z_i = \iota(x_i)\}_{i=1}^n$ satisfying Assumptions 3.1 and 3.2, we provide an empirical estimator of $c$ that does not require estimating the dimension of the underlying manifold and is asymptotically equal to $n\varepsilon^{d+3}$ up to a constant. Suppose $z,z'\in\mathbb{R}^p$. Define $K_\varepsilon(z,z') = \exp\big(-\frac{\|z-z'\|^2_{\mathbb{R}^p}}{\varepsilon^2}\big)$, where $\varepsilon$ is the same bandwidth as in the $\varepsilon$-radius ball scheme of LLE. We suggest considering the following empirical regularizer:

$$\hat{c} = \operatorname*{median}_{i=1,\ldots,n}\hat{c}(z_i),\quad\text{where}\quad \hat{c}(z_i) := \varepsilon^3\sum_{j=1}^n K_\varepsilon(z_i,z_j)$$

and the median is evaluated over all $z_i\in\mathcal{X}$. The construction of the above estimator is motivated by kernel density estimation. Suppose $z = \iota(x)$ for $x\in M$. It is shown in Berry and Sauer (2017) that if $n\varepsilon^d\to\infty$ and $\varepsilon\to 0$ as $n\to\infty$, then $\hat{c}(z)\approx C(x)P(x)n\varepsilon^{d+3}$, where $C(x)$ depends on $d$ and $d_g(x,\partial M)$ if $x$ is sufficiently close to the boundary, and $C(x)$ only depends on $d$ if $x$ is away from the boundary. This estimator is easy to implement and does not require an estimation of $d$. We shall mention that although we choose the squared exponential kernel for the density estimation, more general kernels can be applied to construct the estimator. We refer the readers to Wu and Wu (2022) for a discussion.
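A sketch of this estimator, assuming the squared exponential kernel above; the quadratic-memory pairwise distance computation is for clarity only:

```python
import numpy as np

def empirical_regularizer(Z, eps):
    """hat{c} = median_i of eps^3 * sum_j K_eps(z_i, z_j), with the squared
    exponential kernel, avoiding any estimate of the intrinsic dimension d."""
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)  # O(n^2) memory
    return np.median(eps**3 * np.exp(-sq / eps**2).sum(axis=1))
```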
4. Asymptotic analysis of LLE under the manifold with boundary setup

In this section, we provide the asymptotic analysis of LLE under the manifold with boundary setup. With the $\varepsilon$-radius ball scheme, the asymptotic analysis is achieved in the following four steps, whose main results we summarize as follows.

Step 1: In Subsection 4.1, we study the augmented vector $T(x)$ for $x\in M$. We show that the vectors $T(x)$ form a smooth vector field on $\iota(M)$. The geometry of the vector field can be intuitively described as follows. The vector $T(x)$ almost points toward the normal direction of $\iota(M)$ in the interior region $x\in M\setminus M_\varepsilon$. However, in $M_\varepsilon$, $T(x)$ gradually leans towards the tangent direction of $\iota(M)$. It is worth noting that the restriction of the tangential component of $T(x)$ to $\iota(\partial M)$ forms an inward normal vector field of $\iota(\partial M)$.

Step 2: Since $T(x)$ is the major ingredient in the definition of the kernel function $K_\varepsilon(x,y)$, where $x,y\in M$, in Subsection 4.2 we explore the properties of $K_\varepsilon(x,y)$ by using the properties of $T(x)$ derived in the previous subsection. Obviously, based on the definition, $K_\varepsilon(x,y) = 0$ when $\iota(y)\notin B^{\mathbb{R}^p}_\varepsilon(\iota(x))$. We show that $K_\varepsilon(x,y)$ is approximately equal to 1 when $x\in M\setminus M_\varepsilon$ and $\iota(y)\in B^{\mathbb{R}^p}_\varepsilon(\iota(x))$. However, when $x\in M_\varepsilon$, $K_\varepsilon(x,y)$ is not symmetric and may be negative.

Step 3: In Subsection 4.3, we provide the variance analysis. First, we use $K_\varepsilon(x,y)$ to define an integral operator $Q_\varepsilon$ on $C(M)$. For $f\in C^2(M)$, let $\mathbf{f}$ be the discretization of $f$ over $\{x_1,\cdots,x_n\}\subset M$. Let $W$ be the LLE matrix. Then we show that $[(W-I)\mathbf{f}](k)$ converges to $Q_\varepsilon f(x_k)$ at the rate $\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}$, regardless of whether $x_k$ is in $M\setminus M_\varepsilon$ or $M_\varepsilon$. Hence, the result implies that the pointwise convergence rate of LLE is the same for manifolds with or without boundary.

Step 4: In Subsection 4.4, we provide the bias analysis. We define a second order mixed-type differential operator $D_\varepsilon$. We show that for $f\in C^3(M)$, $Q_\varepsilon f(x)/\varepsilon^2$ can be approximated by $D_\varepsilon f(x)$ for all $x\in M$. The final result connecting $[(W-I)\mathbf{f}](k)$ and $D_\varepsilon f(x_k)$ is achieved by combining the variance and the bias analyses.

4.1 Properties of the augmented vector on a manifold with boundary

The main challenge in analyzing LLE is dealing with the augmented vector. It involves the three main players in the data structure: the p.d.f., the curvature, and the boundary if the boundary is not empty. Clearly, when $x$ is close to the boundary, the term $E[(\iota(X)-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]$ in $T(x)$ includes the geometry of the boundary, and the integration depends on the p.d.f. On the other hand, while a manifold can be locally well approximated by an affine space, the curvature appears in the eigenvalues of the local covariance matrix. Hence, the term $(\Lambda_x + \varepsilon^{d+3}I_{p\times p})^{-1}$ in $T(x)$ involves the curvature. Dealing with these terms requires a careful asymptotic analysis. To alleviate the heavy notation toward this goal, we consider the following functions; their role will become clear along the theory development.
Definition 7 Suppose $\varepsilon$ is sufficiently small. We define the following functions on $[0,\infty)$, where $\frac{|S^{d-2}|}{d-1}$ is defined to be 1 when $d = 1$:

$$\sigma_0(t) = \begin{cases}\dfrac{|S^{d-1}|}{2d} + \dfrac{|S^{d-2}|}{d-1}\displaystyle\int_0^{t/\varepsilon}(1-x^2)^{\frac{d-1}{2}}dx & \text{for } 0\le t\le\varepsilon,\\[4pt] \dfrac{|S^{d-1}|}{d} & \text{for } t > \varepsilon,\end{cases}\tag{4.1}$$

$$\sigma_{1,d}(t) = \begin{cases}\dfrac{|S^{d-2}|}{d^2-1}\Big(1-\big(\tfrac{t}{\varepsilon}\big)^2\Big)^{\frac{d+1}{2}} & \text{for } 0\le t\le\varepsilon,\\ 0 & \text{otherwise,}\end{cases}$$

$$\sigma_2(t) = \begin{cases}\dfrac{|S^{d-1}|}{2d(d+2)} + \dfrac{|S^{d-2}|}{d^2-1}\displaystyle\int_0^{t/\varepsilon}(1-x^2)^{\frac{d+1}{2}}dx & \text{for } 0\le t\le\varepsilon,\\[4pt] \dfrac{|S^{d-1}|}{d(d+2)} & \text{otherwise,}\end{cases}$$

$$\sigma_{2,d}(t) = \begin{cases}\dfrac{|S^{d-1}|}{2d(d+2)} + \dfrac{|S^{d-2}|}{d-1}\displaystyle\int_0^{t/\varepsilon}(1-x^2)^{\frac{d-1}{2}}x^2\,dx & \text{for } 0\le t\le\varepsilon,\\[4pt] \dfrac{|S^{d-1}|}{d(d+2)} & \text{otherwise,}\end{cases}$$

$$\sigma_3(t) = \begin{cases}\dfrac{|S^{d-2}|}{(d^2-1)(d+3)}\Big(1-\big(\tfrac{t}{\varepsilon}\big)^2\Big)^{\frac{d+3}{2}} & \text{for } 0\le t\le\varepsilon,\\ 0 & \text{otherwise,}\end{cases}$$

$$\sigma_{3,d}(t) = \begin{cases}\dfrac{|S^{d-2}|}{(d^2-1)(d+3)}\Big(2+(d+1)\big(\tfrac{t}{\varepsilon}\big)^2\Big)\Big(1-\big(\tfrac{t}{\varepsilon}\big)^2\Big)^{\frac{d+1}{2}} & \text{for } 0\le t\le\varepsilon,\\ 0 & \text{otherwise.}\end{cases}$$

Note that these functions are of order 1 when $t\le\varepsilon$. These seemingly complicated formulas share a simple geometric picture. If $R$ is the region between the unit sphere and the hyperplane $x_d = -t/\varepsilon$ in $\mathbb{R}^d$ with coordinates $\{x_1,\cdots,x_d\}$, where $0\le t\le\varepsilon$, then $\sigma_0(t)$, $\sigma_{1,d}(t)$, $\sigma_2(t)$, $\sigma_{2,d}(t)$, $\sigma_3(t)$ and $\sigma_{3,d}(t)$ are the expansions of the integrals of $1$, $x_d$, $x_1^2$, $x_d^2$, $x_1^2x_d$ and $x_d^3$ over $R$, respectively. All the above functions are differentiable of all orders except at $t = \varepsilon$. The regularity of the functions at $t = \varepsilon$ depends on $d$; for example, $\sigma_0(t)$ is at least $C^0$ at $t = \varepsilon$, and the other functions are at least $C^1$ at $t = \varepsilon$.

With these notations, the behavior of $T(x)$, particularly when $x$ is near the boundary, can be fully described.

Proposition 8 Decompose $T(x) = T_{tan}(x) + T_{per}(x)$, where $T_{tan}(x)\in\iota_* T_xM$ is the tangential component of $T(x)$ and $T_{per}(x)\in(\iota_* T_xM)^\perp$ is the normal component. If $x\in M_\varepsilon$, then

$$T_{tan}(x) = \frac{\sigma_{1,d}(\epsilon_x)}{\sigma_{2,d}(\epsilon_x)}\frac{1}{\varepsilon}\,e_d + O(1),$$
$$T_{per}(x) = P(x)\bigg[\Big(\sigma_2(\epsilon_x) - \frac{\sigma_{1,d}(\epsilon_x)}{\sigma_{2,d}(\epsilon_x)}\sigma_3(\epsilon_x)\Big)\sum_{j=1}^{d-1}\mathrm{II}_{jj}(x) + \Big(\sigma_{2,d}(\epsilon_x) - \frac{\sigma_{1,d}(\epsilon_x)}{\sigma_{2,d}(\epsilon_x)}\sigma_{3,d}(\epsilon_x)\Big)\mathrm{II}_{dd}(x)\bigg]\frac{1}{\varepsilon} + O(1).$$

If $x\in M\setminus M_\varepsilon$, then

$$T_{tan}(x) = J_{p,d}\frac{\nabla P(x)}{P(x)} + O(\varepsilon),\qquad T_{per}(x) = \frac{|S^{d-1}|}{d(d+2)}\,P(x)\sum_{j=1}^d \mathrm{II}_{jj}(x)\,\frac{1}{\varepsilon} + O(1).$$

The proof is postponed to Appendix E. The above proposition says that when $x\in M_\varepsilon$, both the tangential and normal components of $T(x)$ are of order $\frac1\varepsilon$, and the normal component depends on the extrinsic curvature of the manifold at $\iota(x)$. An interesting observation is that a construction of the inward normal vector field on $\iota(\partial M)$ is naturally encoded in the LLE algorithm. In particular, $T_{tan}(x)$ on $\iota(\partial M)$ forms an inward normal vector field on $\iota(\partial M)$ up to an order $O(1)$ perturbation. Moreover, the magnitude of $T_{tan}(x)$ on $M_\varepsilon$ only depends on the distance from $x$ to the boundary and is independent of the p.d.f. On the other hand, when $x\in M\setminus M_\varepsilon$, $T(x)$ is of order $\frac1\varepsilon$ in the normal direction with an order $O(1)$ perturbation in the tangential direction. With the theorem developed in Wu and Wu (2018) for the augmented vector field away from the boundary, we have full knowledge of the augmented vector field.

At last, in Figure 2, we provide a visualization of the augmented vector field on a 2-dim manifold parametrized by $(x,y,x^2-y^3)$, where $x^2+y^2\le 1$. We sample the manifold in the following way. First, uniformly sample 20,000 points independently on $[-1,1]\times[-1,1]$, and keep the points with norm less than or equal to 1. The $i$-th point is then constructed by the parametrization. Clearly, the sampling on the manifold is not uniform. The LLE matrix is constructed with the $\varepsilon$-radius ball nearest neighbor search scheme with $\varepsilon = 0.2$.

Figure 2: The $T$ vector field. The sampled point cloud is plotted in gray. Left: the black points indicate points satisfying $0.98\le x^2+y^2\le 1$, and $T$ at those points is marked in red. Right: $T$ at points with $x^2+y^2 < 0.98$ is marked in red.

Remark 9 Another work that constructs a normal vector field on the boundary of a manifold is Berry and Sauer (2017). Inspired by kernel density estimation, the authors propose an inward normal vector field by using $F(\iota(x)) = E\big[(\iota(X)-\iota(x))\,K\big(\frac{\|\iota(X)-\iota(x)\|_{\mathbb{R}^p}}{h}\big)\big]\in\mathbb{R}^p$, where $K:[0,\infty)\to[0,\infty)$ has an exponential decay and $h$ is the bandwidth. Since the construction originates from kernel density estimation, when $x\in\partial M$, the magnitude of the vector field depends on the p.d.f.
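The empirical augmented vector $T_{n,x_k}$ of (2.6) can be computed directly from the local data matrix; plotting it over a sampled manifold reproduces a Figure-2-style visualization. A minimal sketch (the helper name is ours):

```python
import numpy as np

def augmented_vector(X, k, eps, c):
    """T_{n,x_k} of (2.6), the discretization of the augmented vector T(x) in (3.5)."""
    diff = X - X[k]
    mask = np.linalg.norm(diff, axis=1) < eps
    mask[k] = False
    G = diff[mask].T                              # local data matrix, p x N_k
    lam, U = np.linalg.eigh(G @ G.T)
    lam, U = lam[::-1], U[:, ::-1]                # eigenvalues in decreasing order
    r = np.linalg.matrix_rank(G)
    inv = np.where(np.arange(len(lam)) < r, 1.0 / (lam + c), 0.0)  # I_{p,r_n}(Lambda_n + cI)^{-1}
    return U @ (inv * (U.T @ G.sum(axis=1)))      # (2.5)-(2.6) applied to G 1_{N_k}
```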
4.2 Properties of the kernel on a manifold with boundary

With the above knowledge of the augmented vector field near the boundary, the behavior of the kernel near the boundary can be well quantified.

Proposition 10 Let $K_\varepsilon(x,y)$ be the kernel function defined in (3.6). Fix $x\in M$. We summarize the properties of $K_\varepsilon(x,y)$ as follows.

1. Suppose $x\notin M_\varepsilon$. When $\iota(y)\in B^{\mathbb{R}^p}_\varepsilon(\iota(x))$, $K_\varepsilon(x,y) = 1-O(\varepsilon)$; otherwise $K_\varepsilon(x,y) = 0$. Hence $K_\varepsilon(x,y)\ge 0$ when $\varepsilon$ is sufficiently small. The implied constant in $O(\varepsilon)$ depends on the minimum and the $C^1$ norm of $P$ and the maximum of the second fundamental form of the manifold.

2. If $x\in M_\varepsilon$ and $\iota(y)\in B^{\mathbb{R}^p}_\varepsilon(\iota(x))$, when $\varepsilon$ is sufficiently small,

$$K_\varepsilon(x,y) = 1 - \frac{\sigma_{1,d}(\epsilon_x)\,u_d}{\sigma_{2,d}(\epsilon_x)\,\varepsilon} + O(\varepsilon),\tag{4.2}$$

where the coordinate $u_d$ of $y$ is defined as in Definition 5; otherwise $K_\varepsilon(x,y) = 0$. The implied constant in $O(\varepsilon)$ depends on the minimum and the $C^1$ norm of $P$ and the maximum of the second fundamental form of the manifold. Hence, we have

$$\inf_{x,y}K_\varepsilon(x,y) = 1 - \frac{|S^{d-2}|}{d-1}\frac{2d(d+2)}{(d+1)|S^{d-1}|} + O(\varepsilon) < 0$$

when $\varepsilon$ is sufficiently small, where $\frac{|S^{d-2}|}{d-1}$ is defined to be 1 when $d = 1$.

3. For any $x\in M$, we have

$$E K_\varepsilon(x,X) = C(x)\varepsilon^d + O(\varepsilon^{d+1}),\tag{4.3}$$

where $C(x) > C > 0$ and $C$ is a constant depending only on $d$ and $P$. Hence $EK_\varepsilon(x,X) > 0$ for all $x\in M$ when $\varepsilon$ is sufficiently small. The implied constant in $O(\varepsilon^{d+1})$ depends on the $C^1$ norm of $P$ and the maximum of the second fundamental form of the manifold.

This proposition provides several facts about LLE. First, the assumption of Proposition 2 is satisfied when the manifold is boundary-free, since the higher order error terms depend on various curvatures of $M$, and $M$ is smooth and compact. So the eigenvalues of the LLE matrix in the boundary-free manifold setup have well-controlled imaginary parts. However, when the boundary is not empty, we may lose this control. Second, the kernel function behaves differently when $x$ is near the boundary and away from the boundary. When $x$ is away from the boundary, the kernel is non-negative. However, when $x$ is close to the boundary, it is possible that $K_\varepsilon(x,y)$ is negative. In particular, when $x\in\partial M$, $\iota(y)\in B^{\mathbb{R}^p}_\varepsilon(\iota(x))$, the geodesic distance between $x$ and $y$ is $\varepsilon+O(\varepsilon^2)$, and the minimizing geodesic between $x$ and $y$ is perpendicular to $\partial M$, then $K_\varepsilon(x,y) = 1 - \frac{|S^{d-2}|}{d-1}\frac{2d(d+2)}{(d+1)|S^{d-1}|} + O(\varepsilon) < 0$. Although it is possible that $K_\varepsilon(x,y)$ is negative, $EK_\varepsilon(x,X)$ is always positive if $\varepsilon$ is small enough. See Figure 3 for an illustration of the kernel associated with LLE, where the manifold, the sampling scheme, and the LLE matrix are the same as those in Figure 2, except that $\varepsilon = 0.1$.

Figure 3: The kernel function associated with LLE. The sampled point cloud is plotted in gray. Left: the kernel function $K_\varepsilon$, where $\varepsilon = 0.1$, at two points, one close to the boundary (indicated by the red circle, with a zoomed-in enhanced visualization) and one away from the boundary. It is clear that the kernel close to the boundary changes sign, while the kernel away from the boundary is positive. Right: $EK_\varepsilon(x,X)$. It is clear that the expectation of the kernel is positive at all points.
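A quick numerical illustration of items 2 and 3 for $d = 1$; the sample size, seed, and bandwidth below are arbitrary choices, and $c\,y_k$ recovers the empirical kernel values via (2.7):

```python
import numpy as np

# Empirical kernel row c * y_k (cf. (2.7)) at a boundary and an interior point of
# [0, 1]: the kernel changes sign near the boundary but its mean stays positive.
rng = np.random.default_rng(0)
n, eps = 2000, 0.05
X = np.sort(rng.uniform(0, 1, n))
c = n * eps**4                                  # c = n * eps^(d+3) with d = 1
for k, label in [(0, "boundary"), (n // 2, "interior")]:
    diff = X - X[k]
    nbr = np.nonzero((np.abs(diff) < eps) & (diff != 0))[0]
    G = diff[nbr][None, :]                      # local data matrix, 1 x N_k
    y = np.linalg.solve(G.T @ G + c * np.eye(len(nbr)), np.ones(len(nbr)))
    K = c * y                                   # empirical kernel values K_eps(x_k, x_j)
    print(label, "min:", K.min().round(3), "mean:", K.mean().round(3))
```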
4.3 Variance analysis of LLE on a manifold with boundary

Define the integral operator from $C(M)$ to $C(M)$:

$$Q_\varepsilon f(x) := \frac{E[K_\varepsilon(x,X)f(X)]}{EK_\varepsilon(x,X)} - f(x),\tag{4.4}$$

where $f\in C(M)$. We now show that when the boundary is not empty, the LLE matrix $W$ converges to the integral operator $Q_\varepsilon$ as $n\to\infty$. The proof of the theorem is postponed to Appendix G.

Theorem 11 (Variance analysis) Suppose $f\in C^2(M)$. Suppose $\varepsilon = \varepsilon(n)$ so that $\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+1}}\to 0$ and $\varepsilon\to 0$ as $n\to\infty$. Then with probability greater than $1-n^{-2}$, for all $k = 1,\ldots,n$,

$$\sum_{j=1}^{N_k}y_k(j) = \frac{EK_\varepsilon(x_k,X)}{\varepsilon^{d+3}} + O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+3}}\Big),\tag{4.5}$$

$$\sum_{j=1}^n[W - I_{n\times n}]_{kj}f(x_j) = Q_\varepsilon f(x_k) + O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big),\tag{4.6}$$

where $y_k$ is defined in (2.3). The implied constants in the error terms depend on the $C^2$ norm of $f$, the $C^1$ norm of $P$, and the $L^\infty$ norm of $\max_{i,j=1,\ldots,d}\|\mathrm{II}_{ij}(x)\|$.

Note that the order of the variance does not depend on the location of $x_k$. By combining (4.5) and (4.3) in Proposition 10, we know that if $n$ is sufficiently large, the sum of all components of $y_k$ is positive. This result restates the fact that $w_k$ defined in (2.8) does not blow up.

4.4 Bias analysis of LLE on a manifold with boundary and the main result

In this subsection, we study the integral operator $Q_\varepsilon$ by relating it to the following differential operator.

Definition 12 Fix $\varepsilon > 0$. Define a differential operator on $C^2(M)$ by

$$D_\varepsilon f(x) = \phi_1(\epsilon_x)\sum_{i=1}^{d-1}\partial^2_{ii}f(x) + \phi_2(\epsilon_x)\,\partial^2_{dd}f(x) + V(x)\,\partial_d f(x),\tag{4.7}$$

where $\phi_1$ and $\phi_2$ are functions defined on $[0,\infty)$ by

$$\phi_1(t) = \frac{\frac12\big(\sigma_{2,d}(t)\sigma_2(t) - \sigma_3(t)\sigma_{1,d}(t)\big)}{\sigma_{2,d}(t)\sigma_0(t) - \sigma^2_{1,d}(t)},\tag{4.8}$$

$$\phi_2(t) = \frac{\frac12\big(\sigma^2_{2,d}(t) - \sigma_{3,d}(t)\sigma_{1,d}(t)\big)}{\sigma_{2,d}(t)\sigma_0(t) - \sigma^2_{1,d}(t)},\tag{4.9}$$

and $V$ is a function on $M$ defined by

$$V(x) = \frac{\sigma_{1,d}(\epsilon_x)}{P(x)\big(\sigma_{2,d}(\epsilon_x)\sigma_0(\epsilon_x) - \sigma^2_{1,d}(\epsilon_x)\big)}.\tag{4.10}$$

Next, we take a closer look at the coefficients of $D_\varepsilon$.

Proposition 13 Fix $\varepsilon > 0$. We have the following properties of the coefficients of the differential operator $D_\varepsilon$.

1. $\phi_1(t) > 0$. When $t\ge\varepsilon$,

$$\phi_1(t) = \frac{1}{2(d+2)}.\tag{4.11}$$

Moreover, $\phi_1(t)$ is differentiable of all orders at all $t > 0$ except at $t = \varepsilon$, where it is at least first order differentiable.

2. $\phi_2(0) < 0$. If $t\ge\varepsilon$, then

$$\phi_2(t) = \frac{1}{2(d+2)}.\tag{4.12}$$

Moreover, $\phi_2(t)$ is differentiable of all orders at all $t > 0$ except at $t = \varepsilon$, where it is at least first order differentiable. Hence, there is a set $\mathcal{S}\subset M_\varepsilon$ diffeomorphic to $\partial M$ such that $\phi_2(\epsilon_x)$ vanishes on $\mathcal{S}$. Denote the geodesic distance from $x\in\mathcal{S}$ to $\partial M$ by $t_*(x)$; $t_*(x)$ depends only on $\varepsilon$ and $d$. In fact, $\delta_1\varepsilon < t_*(x) < \delta_2\varepsilon$, where

$$\delta_1 = \frac{1}{1+\frac{(d^2-1)|S^{d-1}|}{2d(d+2)|S^{d-2}|}}\tag{4.13}$$

$$\delta_2 = \Big(\frac{(d^2-1)|S^{d-1}|}{4d(d+2)|S^{d-2}|} + \frac{d+3}{2}\Big)^{-1/2} < 1\tag{4.14}$$

and $\delta_2\to 0$ as $d\to\infty$.

3. $V(x)\ge 0$. Moreover, $V(x)$ is differentiable of all orders at all $x$ except when $\epsilon_x = \varepsilon$, where it is at least first order differentiable. If $x$ satisfies $\epsilon_x\ge\varepsilon$, in other words $x\in M\setminus M_\varepsilon$, then $V(x) = 0$. In particular, if $P(x)$ is constant, then $V(x)$ is a decreasing function of $\epsilon_x$.

Denote $M_w$ to be the interior of the region between $\mathcal{S}$ and $\partial M$ on $M$. Denote $M_e$ to be the interior of $M\setminus M_w$ on $M$. Clearly, $M_w$ is a strict subset of $M_\varepsilon$. According to Proposition 13, $D_\varepsilon$ is of hyperbolic type over $M_w$, of elliptic type over $M_e$, and degenerate over $\mathcal{S}$. We thus call $M_w$ the wave region, $M_e$ the elliptic region, and $\mathcal{S}$ the degenerate region. We conclude that the operator $D_\varepsilon$ is a mixed-type differential operator with degeneracy.

Remark 14 In fact, $t_*$ is the solution of the following nonlinear equation in $t$:

$$\Big[\frac{|S^{d-1}|}{2d(d+2)} + \frac{|S^{d-2}|}{d-1}\int_0^{t/\varepsilon}(1-x^2)^{\frac{d-1}{2}}x^2\,dx\Big]^2 = \frac{|S^{d-2}|^2}{(d^2-1)^2(d+3)}\Big[2+(d+1)\Big(\frac{t}{\varepsilon}\Big)^2\Big]\Big(1-\Big(\frac{t}{\varepsilon}\Big)^2\Big)^{d+1},\tag{4.15}$$

where $t > 0$.
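The degenerate distance $t_*$ can be located numerically from (4.9); the sketch below evaluates $\phi_2$ at $t = s\varepsilon$ (the $\sigma$'s depend only on $s = t/\varepsilon$) and finds its zero, recovering $t_* = (2-\sqrt{3})\varepsilon$ for $d = 1$. It assumes the $\sigma$-expressions of Definition 7:

```python
import numpy as np
from math import pi, gamma
from scipy.integrate import quad
from scipy.optimize import brentq

def S(m):
    """|S^m|: volume of the unit m-sphere."""
    return 2 * pi ** ((m + 1) / 2) / gamma((m + 1) / 2)

def phi2(s, d):
    """phi_2 of (4.9) at t = s * eps, using the sigma-functions of Definition 7."""
    B = S(d - 2) / (d - 1) if d > 1 else 1.0   # |S^{d-2}|/(d-1), set to 1 when d = 1
    s0 = S(d - 1) / (2 * d) + B * quad(lambda x: (1 - x**2) ** ((d - 1) / 2), 0, s)[0]
    s1 = B / (d + 1) * (1 - s**2) ** ((d + 1) / 2)
    s2 = S(d - 1) / (2 * d * (d + 2)) + B * quad(lambda x: x**2 * (1 - x**2) ** ((d - 1) / 2), 0, s)[0]
    s3 = B / ((d + 1) * (d + 3)) * (2 + (d + 1) * s**2) * (1 - s**2) ** ((d + 1) / 2)
    return 0.5 * (s2**2 - s3 * s1) / (s2 * s0 - s1**2)

for d in (1, 2, 3):
    print(d, brentq(phi2, 0.0, 1.0, args=(d,)))  # d = 1 prints 2 - sqrt(3) ~ 0.2679
```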
We have the following theorem describing how $Q_\varepsilon$ is related to $D_\varepsilon$ when $\varepsilon$ is sufficiently small. The proof is postponed to Appendix F.

Theorem 15 (Bias analysis) Suppose $f\in C^3(M)$ and $P\in C^2(M)$. We have

$$Q_\varepsilon f(x) = D_\varepsilon f(x)\,\varepsilon^2 + O(\varepsilon^3).\tag{4.16}$$

By combining the bias and variance analyses, we have the following pointwise convergence result.

Theorem 16 Suppose $f\in C^3(M)$ and $P\in C^2(M)$. Suppose $\varepsilon = \varepsilon(n)$ so that $\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2+1}}\to 0$ and $\varepsilon\to 0$ as $n\to\infty$. Then with probability greater than $1-n^{-2}$, for all $k = 1,\ldots,n$,

$$\sum_{j=1}^n[W - I_{n\times n}]_{kj}f(x_j) = D_\varepsilon f(x_k)\,\varepsilon^2 + O(\varepsilon^3) + O\Big(\frac{\sqrt{\log(n)}}{n^{1/2}\varepsilon^{d/2-1}}\Big),\tag{4.17}$$

where the implied constants in the error terms depend on the $C^2$ norm of $f$, the $C^1$ norm of $P$, and the $L^\infty$ norm of $\max_{i,j=1,\ldots,d}\|\mathrm{II}_{ij}(x)\|$.

Note that when $M$ is a manifold without boundary, $D_\varepsilon = \frac{1}{2(d+2)}\Delta$. Hence, the above theorem is consistent with the result in Wu and Wu (2018) when $M$ has no boundary. However, the regularizer in Wu and Wu (2018) is chosen as $c = n\varepsilon^{d+\rho}$, where $\rho$ may be different from 3, and the main result there is presented in different cases capturing the interaction between $\rho$ and different local covariance matrix structures. In contrast, the above theorem is much simpler, as we only focus on the case $\rho = 3$.

4.5 Relationship between $W-I$ and $(W-I)^\top(W-I)$

In this paper and Wu and Wu (2018), we focus on studying the asymptotic behavior of $W-I$. However, in the original LLE algorithm, it is the eigenvectors of the matrix $(W-I)^\top(W-I)$ that are used to reduce the dimension of $\mathcal{X}$. We shall clarify the relationship between $W-I$ and $(W-I)^\top(W-I)$ from two aspects: spectral geometry and linear algebra.

Recall that when a manifold is compact without boundary, based on the spectral embedding results (Berard et al., 1994; Bates, 2014; Portegies, 2016), the eigenfunctions of $\Delta$ can be applied to construct an embedding of the manifold in a Euclidean space, and the embedding is almost isometric if both eigenfunctions and eigenvalues are properly used. Therefore, if we could obtain eigenpairs of $\Delta$ from the database $\mathcal{X}$ sampled from a manifold, we could recover the manifold via this spectral embedding. However, it is not clear what the eigenvectors of $(W-I)^\top(W-I)$ or $W-I$ mean directly from the algorithm. An immediate approach to answering this question is via the pointwise convergence. In Wu and Wu (2018), if $M$ is a compact manifold without boundary, $W-I$ converges pointwise to the operator $\frac{\varepsilon^2}{2(d+2)}\Delta$. In Theorem 16 of the current work, we show that the same result holds over the region $M\setminus M_\varepsilon$ when $\partial M\ne\emptyset$, while near the boundary the asymptotic behavior is complicated by the degeneracy. Hence, one may guess that $(W-I)^\top(W-I)$ converges pointwise to $\frac{\varepsilon^4}{4(d+2)^2}\Delta^2$ over the data points in the region $M\setminus M_\varepsilon$. However, this is in general not true, particularly when the sampling is nonuniform. It is because the nonsymmetry of $W-I$ plays an important role in eliminating the impact of the nonuniform distribution of the point cloud on $M$ (Wu and Wu, 2018), and we lose this property if we consider $(W-I)^\top(W-I)$.

Consider the following analysis of $(W-I)^\top(W-I)$ on a simple manifold for an illustration. In contrast to the operator $Q_\varepsilon$, we define a new integral operator from $C(M)$ to $C(M)$:

$$\tilde{Q}_\varepsilon f(x) := \frac{E[K_\varepsilon(X,x)f(X)]}{EK_\varepsilon(X,x)} - f(x),\tag{4.18}$$

where $f\in C(M)$. By the law of large numbers, we would have $\sum_{j=1}^n[W-I_{n\times n}]^\top_{kj}f(x_j)\to\tilde{Q}_\varepsilon f(x_k)$ and $\sum_{i=1}^n[W-I_{n\times n}]^\top_{ki}\sum_{j=1}^n[W-I_{n\times n}]_{ij}f(x_j)\to(\tilde{Q}_\varepsilon(Q_\varepsilon f))(x_k)$ as $n\to\infty$, where $\mathbf{f} = [f(x_1),\ldots,f(x_n)]^\top$ denotes the discretization of $f$. Now, suppose $\iota(M) = [-1,1]\subset\mathbb{R}$ and let $\mathcal{X} = \{\iota(x_i)\}_{i=1}^n$, where $\{x_1,x_2,\cdots,x_n\}$ are i.i.d. sampled following a p.d.f. $P\in C^2(M)$.
Then, for any $\varepsilon$ small enough and $\iota(x_k)\in[-1+\varepsilon,1-\varepsilon]$, for any $f\in C^5(M)$ we have

$$(\tilde{Q}_\varepsilon(Q_\varepsilon f))(x_k) = \frac{\varepsilon^4}{36}\Big[f''''(x_k) + \frac{f'''(x_k)P'(x_k)}{P(x_k)}\Big] + O(\varepsilon^5).\tag{4.19}$$

The detailed calculation of (4.19) is postponed to Appendix H. In this result, although the manifold is flat and there is no extrinsic geometric information involved, asymptotically $(W-I)^\top(W-I)$ involves not only the desired $\partial^4$ but also the sampling distribution. This says that even if we do not consider the boundary, the asymptotic differential operator is more complicated than the bi-Laplacian $\Delta^2$. A similar result can be derived when $M$ is a general manifold without boundary, but we omit the details here.

Based on the above discussion, we can reasonably conjecture that the eigenvectors of $W-I$ approximate the eigenfunctions of $\Delta$ when $\partial M = \emptyset$ and the eigenfunctions of a more complicated second order differential operator with degeneracy when $\partial M\ne\emptyset$, while the eigenvectors of $(W-I)^\top(W-I)$ approximate the eigenfunctions of a more complicated fourth order differential operator. To prove these conjectures, we need to establish spectral convergence results, which is out of the scope of this paper. Note that previous work on the spectral convergence of the graph Laplacian (Dunson et al., 2021; Calder et al., 2022; Wormell and Reich, 2021) mainly focuses on symmetric kernel matrices, except the work discussing the $k$-NN kernel construction, Calder et al. (2022). However, these kernels are a priori assigned, so their approaches cannot be directly applied to study $W-I$. Specifically, the LLE matrix is not only nonsymmetric but also determined by the dataset. We need different analysis tools to establish the spectral convergence. On the other hand, a complete understanding of the original LLE certainly requires understanding $(W-I)^\top(W-I)$. We shall mention that even for the bi-Laplacian, to which $(W-I)^\top(W-I)$ converges pointwise under special conditions, it is still challenging to derive the spectral convergence. Note that it is still an active research field to study the bi-Laplacian and its spectral behavior (Chang et al., 1999; Cuccu and Porru, 2009). To sum up, with the help of the pointwise convergence results, we can conjecture the spectral behaviors of $W-I$ and $(W-I)^\top(W-I)$, and their behaviors are different in general.

From the linear algebra perspective, there are several interesting relationships between $(W-I)^\top(W-I)$ and $W-I$. First, $W-I$ is a sparse matrix and in general sparser than $(W-I)^\top(W-I)$. Since $(W-I)^\top(W-I)$ is symmetric, its eigendecomposition always exists, while $W-I$ is not always diagonalizable. Second, the eigenvalues of $(W-I)^\top(W-I)$ are the squares of the singular values of $W-I$, and the eigenvectors of $(W-I)^\top(W-I)$ are the right singular vectors of $W-I$. While $W-I$ is in general not diagonalizable, based on Proposition 10, under the manifold setup the conditions of Proposition 2 are satisfied, which says that when the sample size is large enough, $W-I$ is close to the symmetric matrix $\frac{W+W^\top}{2}-I$ and the imaginary parts of the eigenvalues of $W-I$ are small. Note that even if $W-I$ is diagonalizable, the right singular vectors of $W-I$ are different from the right eigenvectors of $W-I$. This echoes what we discussed above: under the boundary-free manifold setup, we conjecture that the eigenvectors of $W-I$ approximate the eigenfunctions of $\Delta$, while the eigenvectors of $(W-I)^\top(W-I)$, and hence the singular vectors of $W-I$, approximate the eigenfunctions of a fourth order differential operator that involves the nonuniform sampling information.
Third, we shall mention that numerically we consistently found that under the manifold setup, when $n$ is sufficiently large, the leading eigenvectors of $W-I$ recover the corresponding eigenfunctions of $\Delta$. Note that a theoretical justification of this numerical finding is part of the spectral convergence conjecture listed above. Therefore, if we consider $W-I$ for the dimension reduction purpose, we propose to use the real parts of the top eigenvectors of $W-I$, corresponding to the leading eigenvalues listed in decreasing order of their real parts, to define the embedding. We provide a numerical comparison of the embeddings by $W-I$ and $(W-I)^\top(W-I)$ in Figure 4, where we nonuniformly sample 3513 points from the unit disk in $\mathbb{R}^2$ (shown in the top left subfigure, superimposed with the radius as the color), construct $W$ with the $\varepsilon = 0.04$ radius ball and the regularizer $n\varepsilon^5$, and embed the data using the top 2 non-trivial eigenvectors of $W-I$ (shown in the top middle subfigure) and of $(W-I)^\top(W-I)$ (shown in the top right subfigure). Note that in the top middle and right subfigures, the original radius of each point is superimposed as the color, which indicates how the embedding behaves. The top three nontrivial eigenvectors of $W-I$ are shown in the middle panel, and the top three nontrivial eigenvectors of $(W-I)^\top(W-I)$ are shown in the bottom panel. In this example, we see that the embeddings by $W-I$ and $(W-I)^\top(W-I)$ are different. Based on the analysis in the current work, it is not surprising that the embedding by $W-I$ is impacted by the boundary, but it is interesting to see that $(W-I)^\top(W-I)$ is less impacted by the boundary. The nonuniform sampling also plays a role here. While we do not show it here, we found that when the sampling from the unit disk is uniform, the embeddings by $W-I$ and $(W-I)^\top(W-I)$ are similar, which suggests an interaction between the sampling and the boundary. These interesting findings warrant further study of the behavior of $(W-I)^\top(W-I)$ from various aspects to fully understand how LLE functions.

Figure 4: Top left subfigure: the original dataset nonuniformly sampled from the unit disk, superimposed with the radius as the color. Top middle (right, respectively) subfigure: the embedding by the top two nontrivial eigenvectors of $W-I$ ($(W-I)^\top(W-I)$, respectively), where the original radius of each point is superimposed as the color. The top three nontrivial eigenvectors of $W-I$ are shown in the middle panel, and the top three nontrivial eigenvectors of $(W-I)^\top(W-I)$ are shown in the bottom panel.

5. Exploration of LLE on a 1-dim manifold with boundary

In this section, we further explore the LLE matrix $W$. In Theorem 16, we showed that the matrix $W-I$ converges pointwise to the differential operator $D_\varepsilon$. Here we illustrate what the differential operator $D_\varepsilon$ looks like in the 1-dimensional case.

Corollary 17 Let $M$ be a regular smooth curve in $\mathbb{R}^p$. Let $\gamma(t): [0,a]\to\mathbb{R}^p$ be the arclength parametrization. Let $P(t)$ be the probability density function. Then, for $f\in C^2(M)$,

$$D_\varepsilon f(t) = \begin{cases}\dfrac{1}{12}\Big({-1}+\dfrac{4t}{\varepsilon}-\Big(\dfrac{t}{\varepsilon}\Big)^2\Big)f''(t) + \dfrac{6\varepsilon^2(\varepsilon-t)}{P(t)(\varepsilon+t)^3}f'(t) & \text{if } t\in[0,\varepsilon];\\[6pt] \dfrac16 f''(t) & \text{if } t\in[\varepsilon,a-\varepsilon];\\[6pt] \dfrac{1}{12}\Big({-1}+\dfrac{4(a-t)}{\varepsilon}-\Big(\dfrac{a-t}{\varepsilon}\Big)^2\Big)f''(t) - \dfrac{6\varepsilon^2(\varepsilon+t-a)}{P(t)(\varepsilon+a-t)^3}f'(t) & \text{if } t\in[a-\varepsilon,a].\end{cases}$$

Specifically, $D_\varepsilon f(t)$ degenerates to $\frac{3+2\sqrt{3}}{3P(t)}f'(t)$ at $t = (2-\sqrt{3})\varepsilon$ and to $-\frac{3+2\sqrt{3}}{3P(t)}f'(t)$ at $t = a-(2-\sqrt{3})\varepsilon$. This corollary comes from a direct expansion of the formulas in Definition 12. Note that $e_d$ is the inward normal direction at the boundary by definition; therefore, $\partial_d f(t) = f'(t)$ when $t\in[0,\varepsilon]$, and $\partial_d f(t) = -f'(t)$ when $t\in[a-\varepsilon,a]$.
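The two embeddings compared in Figure 4 can be generated as follows; the dense eigendecompositions are a simplification for moderate $n$, and the single "trivial" direction skipped below may need to be widened when the eigenvalue 1 has higher multiplicity, as for the disk:

```python
import numpy as np

def two_embeddings(W, ell=2):
    """Embedding by the real parts of the top nontrivial eigenvectors of W - I,
    versus the eigenvectors of (W - I)^T (W - I) for its smallest nontrivial
    eigenvalues (the right singular vectors of W - I)."""
    A = W - np.eye(W.shape[0])
    vals, vecs = np.linalg.eig(A)
    order = np.argsort(-vals.real)                # decreasing real part
    emb_eig = vecs[:, order[1:ell + 1]].real      # skip the trivial null direction
    _, vecs2 = np.linalg.eigh(A.T @ A)            # ascending eigenvalues
    emb_svd = vecs2[:, 1:ell + 1]                 # skip the trivial null direction
    return emb_eig, emb_svd
```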
To study the spectral properties of $D_\varepsilon$, it is natural to convert $D_\varepsilon$ into the Sturm-Liouville (SL) form by an integrating factor. However, due to the degeneracy of $A_\varepsilon$, several technical details need to be taken care of. Here we provide a summary of known facts about the SL theory (Naimark, 1967, Chapter V). Fix $a > 0$. The SL problem on $(0,a)$ is finding a complex function $f(x)$ defined on $(0,a)$ that solves

$$(p(x)f'(x))' + q(x)f(x) = \lambda w(x)f(x),\tag{5.1}$$

where $p(x)$, $q(x)$ and $w(x)$ are measurable real functions on $(0,a)$ and $\lambda$ is a complex number. The SL problem is called regular if $1/p$, $q$, and $w$ are all in $L^1(0,a)$; otherwise, it is called singular. A complex function $f(x)$ is a solution of the SL problem (5.1) if $f^{[0]}(x)$ and $f^{[1]}(x)$ exist, where $f^{[0]}(x) := f(x)$ (respectively $f^{[1]}(x) := p(x)f'(x)$) is the zeroth (respectively first) order quasi-derivative of $f$, and are absolutely continuous on any compact subinterval of $(0,a)$. It is worth noting that in general $f'(x)$ may not be absolutely continuous. It is stated in (Naimark, 1967, Chapter V) that for $x_0\in(0,a)$ and complex numbers $c_0$ and $c_1$, there exists a unique solution to the regular SL problem with $f^{[0]}(x_0) = c_0$ and $f^{[1]}(x_0) = c_1$. As an eigenvalue problem, by Atkinson (1964), given boundary conditions $A_1 f^{[0]}(0) + A_2 f^{[1]}(0) = 0$ and $B_1 f^{[0]}(a) + B_2 f^{[1]}(a) = 0$ with $A_1^2 + A_2^2 > 0$ and $B_1^2 + B_2^2 > 0$, if the SL problem is regular and $p > 0$ and $w > 0$ on $(0,a)$, then the eigenvalues are discrete and bounded from below; that is, we have eigenvalues $-\infty < \lambda_0 < \lambda_1 < \lambda_2 < \ldots$ so that $\lambda_n\to\infty$ as $n\to\infty$. Moreover, if $f_n$ is the eigenfunction corresponding to $\lambda_n$, then $f_n$ has exactly $n$ zeros in $(0,a)$.

Now we come back to the challenge. Suppose $P(t) = 1/a$; that is, the sampling is uniform. By Corollary 17, when $\varepsilon$ is sufficiently small, the second order ordinary differential operator

$$D_\varepsilon f(t) = A_\varepsilon(t)f''(t) + B_\varepsilon(t)f'(t)\tag{5.2}$$

dominates. Note that $A_\varepsilon(t) > 0$ on the elliptic region $((2-\sqrt{3})\varepsilon,\ a-(2-\sqrt{3})\varepsilon)$ and $A_\varepsilon(t) < 0$ on the wave region $[0,(2-\sqrt{3})\varepsilon)\cup(a-(2-\sqrt{3})\varepsilon,a]$, while $B_\varepsilon(t)\ge 0$ on $[0,\varepsilon]$, $B_\varepsilon(t) = 0$ on $[\varepsilon,a-\varepsilon]$, and $B_\varepsilon(t)\le 0$ on $[a-\varepsilon,a]$. To convert $D_\varepsilon$ into the SL form, we define two more functions over $t\in[0,\varepsilon]$. First,

$$g(t) := \big|t-(2-\sqrt{3})\varepsilon\big|^{(4+2\sqrt{3})a\varepsilon}\,\big|t-(2+\sqrt{3})\varepsilon\big|^{(4-2\sqrt{3})a\varepsilon}\,(t+\varepsilon)^{-8a\varepsilon}\,e^{\frac{12a\varepsilon^3}{(\varepsilon+t)^2}+\frac{12a\varepsilon^2}{\varepsilon+t}}$$

over $t\in[0,\varepsilon]$. Clearly, $g(t) > 0$. Moreover, $g$ is continuous and is smooth except at $t = (2-\sqrt{3})\varepsilon$. Second,

$$h(t) := \big|t-(2-\sqrt{3})\varepsilon\big|^{(4+2\sqrt{3})a\varepsilon-1}\,\big|t-(2+\sqrt{3})\varepsilon\big|^{(4-2\sqrt{3})a\varepsilon-1}\,(t+\varepsilon)^{-8a\varepsilon}\,e^{\frac{12a\varepsilon^3}{(\varepsilon+t)^2}+\frac{12a\varepsilon^2}{\varepsilon+t}}$$

over $t\in[0,\varepsilon]$. By a direct check, we know that $h(t) > 0$ on $[0,\varepsilon]\setminus\{(2-\sqrt{3})\varepsilon\}$ and $h(t)\to\infty$ when $t\to(2-\sqrt{3})\varepsilon$, since $(4+2\sqrt{3})a\varepsilon - 1 < 0$ when $\varepsilon$ is small. With $g$ and $h$, define

$$p(t) := \begin{cases} -g(t) & \text{if } t\in[0,(2-\sqrt{3})\varepsilon];\\ g(t) & \text{if } t\in[(2-\sqrt{3})\varepsilon,\varepsilon];\\ g(\varepsilon) & \text{if } t\in[\varepsilon,a-\varepsilon];\\ g(a-t) & \text{if } t\in[a-\varepsilon,a-(2-\sqrt{3})\varepsilon];\\ -g(a-t) & \text{if } t\in[a-(2-\sqrt{3})\varepsilon,a]\end{cases}\tag{5.3}$$

on $[0,a]$ and

$$w(t) := \begin{cases} h(t) & \text{if } t\in[0,\varepsilon];\\ h(\varepsilon) & \text{if } t\in[\varepsilon,a-\varepsilon];\\ h(a-t) & \text{if } t\in[a-\varepsilon,a]\end{cases}\tag{5.4}$$

on $[0,a]$. We have the following proposition summarizing the behavior of $p$ and $w$.

Proposition 18 Suppose $\varepsilon$ is sufficiently small. The function $p$ satisfies the following properties:

1. $p(t) > 0$ on the elliptic region $((2-\sqrt{3})\varepsilon,\ a-(2-\sqrt{3})\varepsilon)$, $p(t) < 0$ on the wave region $[0,(2-\sqrt{3})\varepsilon)\cup(a-(2-\sqrt{3})\varepsilon,a]$, and $p(t) = 0$ when $t = (2-\sqrt{3})\varepsilon$ or $t = a-(2-\sqrt{3})\varepsilon$.

2. $p(t)$ is $C^1$ on $[0,a]$ except at $t = (2-\sqrt{3})\varepsilon$ and $t = a-(2-\sqrt{3})\varepsilon$. In particular, $p'(t)\to\infty$ as $t\to(2-\sqrt{3})\varepsilon$ from the right or from the left, and $p'(t)\to-\infty$ as $t\to a-(2-\sqrt{3})\varepsilon$ from the right or from the left.

3. $p(t)$ is absolutely continuous.

4. $1/p\in L^1([0,a])$.

The function $w$ satisfies the following properties:

1. $w(t)$ is $C^1$ on $[0,a]$ except at $t = (2-\sqrt{3})\varepsilon$ and $t = a-(2-\sqrt{3})\varepsilon$.

2. $w(t)\to\infty$ as $t\to(2-\sqrt{3})\varepsilon$ or $t\to a-(2-\sqrt{3})\varepsilon$.

3. $w\in L^1([0,a])$.

The functions $p$ and $w$ are related to $A_\varepsilon$ and $B_\varepsilon$ as follows:
1. $\frac{p(t)}{w(t)} = A_\varepsilon(t)$ and $\frac{p'(t)}{w(t)} = B_\varepsilon(t)$ when $t\ne(2-\sqrt{3})\varepsilon$ and $t\ne a-(2-\sqrt{3})\varepsilon$.

2. $\frac{p(t)}{w(t)}\to A_\varepsilon(t)$ and $\frac{p'(t)}{w(t)}\to B_\varepsilon(t)$ as $t\to(2-\sqrt{3})\varepsilon$ or $t\to a-(2-\sqrt{3})\varepsilon$.

With this proposition, we conclude that except at $t = (2-\sqrt{3})\varepsilon$ and $t = a-(2-\sqrt{3})\varepsilon$, the following relationship holds:

$$A_\varepsilon(t)f''(t) + B_\varepsilon(t)f'(t) = \frac{p(t)}{w(t)}f''(t) + \frac{p'(t)}{w(t)}f'(t) = \frac{(p(t)f'(t))'}{w(t)}.\tag{5.5}$$

Also, the corresponding eigenvalue problem

$$(p(t)f'(t))' = \lambda w(t)f(t)\tag{5.6}$$

is regular when $\varepsilon$ is small enough. If $f$ is a solution to the above problem, then $f$ is absolutely continuous, and hence differentiable almost everywhere. Moreover, $p(t)f'(t)$ is absolutely continuous, and $p(t)$ is differentiable and nonzero except at $t = (2-\sqrt{3})\varepsilon$ and $t = a-(2-\sqrt{3})\varepsilon$. By the quotient rule, $f$ is twice differentiable almost everywhere. With $p(t)$ defined in (5.3), define $H_p := \{f \mid f^{[0]}(t)\text{ and }f^{[1]}(t)\text{ are absolutely continuous on }[0,a]\}$. With the above discussion, we have the following corollary, based on Atkinson (1964), that is related to understanding the spectrum of the LLE matrix.

Corollary 19 Suppose we impose the Dirichlet boundary condition on the eigenvalue problem $D_\varepsilon f(t) = \lambda f(t)$ over the elliptic region $[(2-\sqrt{3})\varepsilon,\ a-(2-\sqrt{3})\varepsilon]$; that is,

$$f((2-\sqrt{3})\varepsilon) = f(a-(2-\sqrt{3})\varepsilon) = 0.$$

Then the eigenvalues are discrete and bounded from above; that is, the eigenvalues satisfy $\lambda_0 > \lambda_1 > \lambda_2 > \ldots$ with $\lambda_n\to-\infty$ as $n\to\infty$. If $f_n\in H_p$ is the eigenfunction corresponding to $\lambda_n$, then $f_n$ has exactly $n$ zeros in $((2-\sqrt{3})\varepsilon,\ a-(2-\sqrt{3})\varepsilon)$.

We mention that this corollary is only of theoretical interest rather than practical interest, since we need extra steps to clip the wave region and impose the Dirichlet boundary condition when we only have a point cloud. Also, the above conclusion may not hold when $P(t)$ is not uniform. In fact, it is not hard to show that if $P(t)$ behaves like $t+\varepsilon$ on $[0,\varepsilon]$, then the corresponding SL problem is singular.

6. A comparison of LLE and DM

We provide a comparison of LLE and DM (Coifman and Lafon, 2006) on a manifold with smooth boundary. Recall that, unlike LLE, when we run DM, the affinity matrix is defined by composing a fixed kernel function chosen by the user with the distance between pairs of sampled points. Below we summarize the bias analysis of DM using our notation, for a further comparison when the manifold has a non-empty boundary. A full calculation can be found in Coifman and Lafon (2006); Singer and Wu (2017). To simplify the comparison, we consider the Gaussian kernel $H(t) = e^{-t^2}$. More general kernels can be considered, and we refer the reader with interest to, e.g., (Coifman and Lafon, 2006; Singer and Wu, 2017). Also, see Vaughn (2020) for more relevant results.
If α = 1, we have Hε,α f(x) =σ1,d( εx) σ0( εx) d f(x)ε + h ψ1( εx) d 1 i=1 2 ii f(x)+ψ2( εx) 2 dd f(x) i ε2 (6.2) +U( εx) d f(x)ε2 +O(ε3), where ψ1,ψ2 and U are scalar value functions defined on [0, ) so that 2 σ2(t) σ0(t), ψ2(t) := 1 U(t) = 0 if t ε, U( εx) depends on the second fundamental form of M in M at x, and U( εx) is independent of P. In fact, R Dε(x) uddu R Dε(x) 1du σ1,d( εx) σ0( εx) ε , (6.3) R Dε (x) uddu R Dε (x) 1du is a function depending on εx and the second fundamental form of M as a codimension 1 submanifold embedded in M at x by Definition 5. Compared with the differential operator Dε associated with LLE, the differential operator associated with DM has a very different behavior. First, in DM, when ε > 0 is finite, asymptotically the first order differential operator exists in the ε order (Coifman and Lafon, 2006), which suggests that the boundary condition is Neumann. In Vaughn (2020), it is shown that the graph Laplacian converges in the weak sense to the Laplace-Beltrami operator with the Neumann boundary condition. With this boundary condition, the spectral convergence of DM when the boundary is non-empty without a convergence rate was provided in Singer and Wu (2017) as a special case when the considered group action is SO(1). Another spectral convergence result when the boundary exists can be found in Peoples and Harlim (2021). Note that when the boundary is empty, more spectral convergence results with a convergence rate can be found in Von Luxburg et al. (2008); Trillos et al. (2020); Dunson et al. (2021), while none provide the rate is optimal to our knowledge. Second, the coefficients of the second order differential operator, ψ1 and ψ2, do not change sign and do not degenerate over the whole manifold. Third, in DM, there is an extra first order differential operator in the ε2 order. As a result, in addition to what has been explored in Wu and Wu (2018), we see more differences between LLE and DM. While LLE and DM are different, they are intimately related. Here we provide a brief exploration for this relationship from the kernel perspective. Observe that we can rewrite the kernel as E[K(x,X)f(X)] EK(x,X) = E[χBRp ε (ι(x))(ι(X)) (ι(X) ι(x)) T(x)χBRp ε (ι(x))(ι(X))]f(X) E[χBRp ε (ι(x))(ι(X)) (ι(X) ι(x)) T(x)χBRp ε (ι(x))(ι(X))] 2χBRp ε (ι(x))(ι(X)) 1 2(ι(X) ι(x)) T(x)χBRp ε (ι(x))(ι(X))]f(X) 2χBRp ε (ι(x))(ι(X)) 1 2(ι(X) ι(x)) T(x)χBRp ε (ι(x))(ι(X))] , which means that the kernel function is an average of two functions, K1(x,y) := χBRp ε (ι(x))(ι(y)) and K2(x,y) := (ι(y) ι(x)) T(x)χBRp ε (ι(x))(ι(y)). We can thus consider the following kernel generalizing the LLE kernel: K(α)(x,y) = αK1(x,y)+(1 α)K2(x,y), where α [0,1]. Note that K1 can be viewed as a 0 1 kernel that is commonly used in DM, so when α = 1, we recover the DM. When α = 1/2, it is clear that K(1/2) is the LLE kernel. When α = 0, we get a different kernel with different behavior. Recall Definition 6. We have K2(x,y) = E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] Ip,r(Cx +c Ip p) 1Ip,r(ι(y) ι(x)) BRp ε (ι(x))(ι(z) ι(x)) Ip,r(Cx +c Ip p) 1Ip,r(ι(y) ι(x))dz. (6.4) As discussed in Malik et al. (2019), since Ip,r(Cx + c Ip p) 1Ip,r can be viewed as the regularized precision matrix , we can view (ι(z) ι(x)) Ip,r(Cx + c Ip p) 1Ip,r(ι(y) ι(x)) as the local Mahalanobis distance between z and y, or the distance between the latent variables related to z and y. Thus, when α = 0, the kernel comes from averaging out the pairwise local Mahalanobis distance, and hence depends on the local geometric structure. 
The relationship between the second fundamental form of $\partial M$ at $x$ and the latent space will be explored in future work.

It is natural to ask if we can alleviate the impact of the wave region by choosing a different $\alpha$. To answer this question, we briefly discuss the behavior of $\mathbb{E}[K^{(\alpha)}(x,X)f(X)]$ with different choices of $\alpha$, particularly when $\alpha < 1$. Let us take a more careful look at the case $\alpha = 1/2$, that is, the kernel for LLE; in particular, we look into the reason why LLE does not have the Neumann boundary condition, and why LLE is independent of the nonuniform density function, from the kernel perspective. We have
$$\mathbb{E}[K^{(1/2)}(x,X)(f(X)-f(x))] = \frac{1}{2}\mathbb{E}[(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]-\frac{1}{2}\mathbb{E}[(\iota(X)-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]^\top T(x). \qquad (6.5)$$
Recall the behavior of the two terms on the right hand side. By a direct calculation, we know that $\mathbb{E}[(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]$ becomes $P(x)\sigma_{1,d}(\tilde{\varepsilon}_x)\,\partial_df(x)\,\varepsilon^{d+1}+O(\varepsilon^{d+2})$ when $x \in M_\varepsilon$, and $[\frac{1}{2}P(x)\Delta f(x)+\nabla f(x)\cdot\nabla P(x)]\varepsilon^{d+2}+O(\varepsilon^{d+3})$ when $x \notin M_\varepsilon$. Note that this is the behavior of the kernel $K^{(1)}$. On the other hand, $\mathbb{E}[(\iota(X)-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]^\top T(x)$ becomes $P(x)\sigma_{1,d}(\tilde{\varepsilon}_x)\,\partial_df(x)\,\varepsilon^{d+1}+O(\varepsilon^{d+2})$ when $x \in M_\varepsilon$, and $\nabla f(x)\cdot\nabla P(x)\,\varepsilon^{d+2}+O(\varepsilon^{d+3})$ when $x \notin M_\varepsilon$. Note that this is the behavior of the kernel $K^{(0)}$. As a result, when $x \in M_\varepsilon$, since the common term $P(x)\sigma_{1,d}(\tilde{\varepsilon}_x)\,\partial_df(x)\,\varepsilon^{d+1}$ cancels, there is no such Neumann boundary behavior when $\alpha = 1/2$ as there is in DM. When $x \notin M_\varepsilon$, the common term $\nabla f(x)\cdot\nabla P(x)\,\varepsilon^{d+2}$ cancels. Hence, the behavior of LLE in the interior of the manifold is independent of the density function. However, it is worth noting that, by the above analysis, one cannot remove the wave region $M_w$ through adjusting $\alpha$: both $\mathbb{E}[(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]$ and $\mathbb{E}[(\iota(X)-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]^\top T(x)$ are dominated by $P(x)\sigma_{1,d}(\tilde{\varepsilon}_x)\,\partial_df(x)\,\varepsilon^{d+1}$ when $x$ is near the boundary, so if $\alpha \neq 1/2$, the first order term remains.

7. Clipped LLE matrix

Based on the above theoretical results, we provide an immediate application. The Laplace-Beltrami operator with the Dirichlet boundary condition is widely used in various fields, like the analysis of stochastic dynamics. For example, in Georgiou et al. (2017) the eigenfunctions of the Laplace-Beltrami operators with the Neumann and Dirichlet boundary conditions are used together to reconstruct the conformational space of a stochastic gradient system. According to the theory developed in Theorem 16, the asymptotic operator in general behaves well away from the boundary. We thus consider the following modification of LLE, which echoes the theoretical development in Section 3; this algorithm is a potential candidate to recover the Laplace-Beltrami operator with the Dirichlet boundary condition on a manifold with boundary. For a given sampling set $X = \{x_i\}_{i=1}^n$, due to the $\varepsilon$-radius nearest neighbor scheme, assume we can divide the LLE matrix $W \in \mathbb{R}^{n\times n}$ into blocks according to four portions: the interior, transition, wave-boundary and non-wave-boundary portions. The interior portion $X_I := \{x_i \in X : d(x_i,\partial M) > 2\varepsilon\}$ includes points far away from the boundary; the wave-boundary portion $X_w := \{x_i \in X : x_i \in M_w \cup \partial M\}$ includes points in the wave region; $X_B := \{x_i \in X : x_i \in M_\varepsilon\setminus(M_w \cup \partial M)\}$ is the non-wave-boundary portion; and the transition portion $X_T := \{x_i \in X : \varepsilon \le d(x_i,\partial M) \le 2\varepsilon\}$ includes the remaining points touching the other three portions. A schematic construction of these four index sets is sketched below.
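As a schematic illustration, the four index sets can be formed as follows, assuming for simplicity that the distance $d(x_i,\partial M)$ to the boundary is known; on a generic point cloud it would itself have to be estimated, which is exactly the practical issue raised below. Taking the wave region to be the $(2-\sqrt{3})\varepsilon$-collar of the boundary is an assumption carried over from the one-dimensional analysis in Section 5, and the function name `partition_indices` is ours.

```python
import numpy as np

def partition_indices(dist_to_bdy, eps):
    """Split sample indices into the wave-boundary, non-wave-boundary,
    transition and interior portions, modeling the wave region M_w as the
    (2 - sqrt(3)) * eps collar of the boundary."""
    d = np.asarray(dist_to_bdy)
    wave = (2.0 - np.sqrt(3.0)) * eps
    Xw = np.where(d <= wave)[0]                    # wave-boundary portion
    XB = np.where((d > wave) & (d < eps))[0]       # non-wave-boundary portion
    XT = np.where((d >= eps) & (d <= 2 * eps))[0]  # transition portion
    XI = np.where(d > 2 * eps)[0]                  # interior portion
    return Xw, XB, XT, XI

# Example on M_1 = [0, 1], where d(x, boundary) = min(x, 1 - x).
x = np.random.default_rng(0).uniform(0, 1, 4000)
Xw, XB, XT, XI = partition_indices(np.minimum(x, 1 - x), eps=0.01)
print(len(Xw), len(XB), len(XT), len(XI))
```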
The $W$ matrix is thus divided into
$$W = \begin{bmatrix} W_{ww} & W_{wB} & W_{wT} & 0 \\ W_{Bw} & W_{BB} & W_{BT} & 0 \\ W_{Tw} & W_{TB} & W_{TT} & W_{TI} \\ 0 & 0 & W_{IT} & W_{II} \end{bmatrix},$$
where $W_{ww} \in \mathbb{R}^{|X_w|\times|X_w|}$ represents the wave-boundary portion of the LLE matrix, $W_{BB} \in \mathbb{R}^{|X_B|\times|X_B|}$ represents the non-wave-boundary portion, $W_{TT} \in \mathbb{R}^{|X_T|\times|X_T|}$ represents the transition portion, $W_{II} \in \mathbb{R}^{|X_I|\times|X_I|}$ represents the interior portion, and the other submatrices represent the interactions among the four portions. Construct a new matrix $W_r \in \mathbb{R}^{n'\times n'}$, where $n' = n-|X_w|$, by restricting $W$ to $Y := X\setminus X_w$; that is,
$$W_r = \begin{bmatrix} W_{BB} & W_{BT} & 0 \\ W_{TB} & W_{TT} & W_{TI} \\ 0 & W_{IT} & W_{II} \end{bmatrix}.$$
We call $W_r$ the clipped LLE matrix for simplicity. Note that in general the matrix $W_r$ is not symmetric, is not a transition matrix, and its rows may not sum to 1. We shall emphasize that this algorithm depends on knowledge of the wave region; in general, we need an algorithm to detect the wave region from a point cloud.

Below we show some numerical results. We uniformly and independently sample points from $M_1 := [0,1] \subset \mathbb{R}^1$. The LLE matrix is constructed with the $\varepsilon$-radius scheme, where $\varepsilon = 0.01$. The first 5 eigenfunctions are shown in the top panel of Figure 5. Note that the first eigenfunction is constant and the second eigenfunction is linear, and both are associated with eigenvalue 1; that is, these two eigenfunctions form the null space of $I-W$. The other eigenfunctions look like the eigenfunctions of the Laplace-Beltrami operator with the Dirichlet boundary condition, but higher eigenfunctions become irregular when getting closer to the boundary. Next, consider another 1-dim curve $M_2$ embedded in $\mathbb{R}^3$ that is parametrized by $t \mapsto (t,\,\log(0.5+t),\,\cos(\pi t)) \in \mathbb{R}^3$, where $t \in [0,1]$. We uniformly and independently sample 8,000 points from $[0,1]$ and map them to $M_2$. Denote the sampled points $X := \{x_i\}_{i=1}^{8{,}000} \subset \mathbb{R}^3$. Note that the sample on $M_2$ is not uniform. The LLE matrix $W \in \mathbb{R}^{8{,}000\times 8{,}000}$ is constructed with the $\varepsilon$-radius scheme, where $\varepsilon = 0.01$, and hence so is the clipped LLE matrix $W_r$. The first 10 eigenfunctions of $W_r$ constructed from $M_1$ and $M_2$ are shown in the middle and bottom panels of Figure 5. According to Corollary 17 and the discussion in Section 5, the asymptotic operator is well behaved in $[(2-\sqrt{3})\varepsilon,\ 1-(2-\sqrt{3})\varepsilon]$. This theoretical finding fits the numerical results: the eigenfunctions are all 0 at the boundary points $(2-\sqrt{3})\varepsilon$ and $1-(2-\sqrt{3})\varepsilon$. An end-to-end sketch of this pipeline on $M_1$ is given below.
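For reproducibility of the qualitative behavior, here is a minimal end-to-end sketch of the clipped LLE pipeline on $M_1$ (Python; a smaller sample than in the figures is used to keep the dense eigen-decomposition cheap). The weights follow the regularized least-squares step of the $\varepsilon$-radius scheme, with the regularizer $c = n\varepsilon^{d+3}$ used in the variance analysis of Appendix G; the helper name `lle_matrix` is ours, and the clipping threshold again assumes the wave region is the $(2-\sqrt{3})\varepsilon$ collar.

```python
import numpy as np

def lle_matrix(X, eps, c):
    """epsilon-radius LLE matrix W: row k holds the regularized barycentric
    weights of x_k's neighbors; a straightforward, unoptimized sketch."""
    n = len(X)
    W = np.zeros((n, n))
    for k in range(n):
        idx = np.where(np.linalg.norm(X - X[k], axis=1) <= eps)[0]
        idx = idx[idx != k]
        G = (X[idx] - X[k]).T                      # p x N local data matrix
        y = np.linalg.solve(G.T @ G + c * np.eye(len(idx)), np.ones(len(idx)))
        W[k, idx] = y / y.sum()                    # weights sum to 1
    return W

rng = np.random.default_rng(0)
n, d, eps = 2000, 1, 0.02
X = rng.uniform(0, 1, size=(n, 1))                 # sample M_1 = [0, 1]
W = lle_matrix(X, eps, c=n * eps ** (d + 3))       # regularizer as in Appendix G
dist = np.minimum(X[:, 0], 1 - X[:, 0])
keep = dist > (2 - np.sqrt(3)) * eps               # clip the wave region
Wr = W[np.ix_(keep, keep)]                         # clipped LLE matrix W_r
vals, vecs = np.linalg.eig(np.eye(keep.sum()) - Wr)
order = np.argsort(vals.real)                      # low-lying eigenfunctions of I - W_r
```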
Second, we uniformly sample points from a unit disk, $M_3 \subset \mathbb{R}^2$, by keeping the points with norm less than or equal to 1 among 20,000 points sampled uniformly and independently from $[-1,1]\times[-1,1]$. The LLE matrix is constructed with the $\varepsilon$-radius ball nearest neighbor search scheme, where $\varepsilon = 0.1$. The first 20 eigenfunctions are shown in Figure 6. The first eigenfunction is constant, the second and third eigenfunctions are linear, and these three eigenfunctions are associated with eigenvalue 1; they form the null space of $I-W$. We can see three types of eigenfunctions: those of the first type look like the eigenfunctions of the Laplace-Beltrami operator with the Dirichlet boundary condition; those of the second type look like the eigenfunctions of the Laplace-Beltrami operator restricted to the rim near the boundary, which is topologically a closed manifold $S^1$; and those of the third type look like a mix-up of the first two types. Next, we explore the clipped LLE matrix on the unit disk $M_3 \subset \mathbb{R}^2$ with the same uniform sampling scheme. For $M_3$, we remove the rows and columns associated with points with norm greater than $1-(2-\sqrt{3})\varepsilon$, where $\varepsilon = 0.1$. The first 20 eigenfunctions are shown in Figure 7. It is clear that, compared with those shown in Figure 6, all eigenfunctions are 0 at the boundary. Next, the first 20 eigenfunctions of the LLE matrix and the clipped LLE matrix of the surface shown in Figures 2 and 3, with the same sampling scheme, are shown in Figures 8 and 9. It is clear that while the first 3 eigenfunctions of the LLE matrix behave like constant or linear functions, the other eigenfunctions are not easy to describe. However, all eigenfunctions of the clipped LLE matrix are zero on the boundary, and they behave like the eigenfunctions of the Laplace-Beltrami operator with the Dirichlet boundary condition.

The above examples are all manifolds without interesting topological structure, since they can all be parametrized by one chart. In the final example we show a two-dimensional manifold with non-trivial topology. Consider a torus embedded in $\mathbb{R}^3$, which is parametrized by $\Phi : (\theta,\phi) \mapsto ((3+1.2\cos(\theta))\cos(\phi),\,(3+1.2\cos(\theta))\sin(\phi),\,1.2\sin(\theta)) \in \mathbb{R}^3$, where $\theta,\phi \in [0,2\pi)$. The manifold $M_4$ is defined as $M_4 = \{\Phi(\theta,\phi) : \theta,\phi \in [0,2\pi) \text{ and } (3+1.2\cos(\theta))\cos(\phi) \le 3.4\}$; that is, $M_4$ is a truncated torus with the boundary diffeomorphic to $S^1$. We uniformly sample 25,000 points on $[0,2\pi]\times[0,2\pi]$, and remove the points associated with $(3+1.2\cos(\theta))\cos(\phi) > 3.4$. Note that this is a nonuniform sampling scheme on $M_4$. Then we construct the LLE matrix and the clipped LLE matrix with $\varepsilon = 0.3$. The results are shown in Figures 10 and 11. Again, it is clear that the eigenfunctions of the clipped LLE matrix are zero on the boundary. With these numerical results, we conjecture that if we clip the wave region, the operator $D_\varepsilon$ over $M\setminus(M_w \cup \partial M)$ asymptotically converges to the Laplace-Beltrami operator with the Dirichlet boundary condition over $M\setminus(M_w \cup \partial M)$ in the spectral sense when $\varepsilon \to 0$. (In the 1-dim case, this is related to a different differential equation, the Kimura equation (Epstein and Mazzeo, 2013), which shares the same degeneracy on the boundary. In the Kimura equation, the boundary condition is adaptively encoded in the functional space in which we search for the eigenfunctions.) We will explore this problem in our future work.

8. Discussion and Conclusion

In this paper, we provide an exploration of LLE when the manifold has a boundary. We mention several interesting problems that we will explore in our future work. First, the distribution of the eigenvalues of the LLE matrix under the null case has an interesting behavior, which hints at an interaction between kernel random matrices and random matrix theory. Recently, the spectral behavior of the graph Laplacian has been studied from the random matrix perspective in Ding and Wu (2022). However, due to the non-symmetric nature of the LLE matrix, such an approach cannot be directly applied. Understanding this behavior of LLE will pave the road toward statistical inference for unsupervised manifold learning. Second, the potential Dirichlet boundary condition associated with the clipped LLE matrix and its relationship with the Neumann boundary condition associated with the graph Laplacian suggest exploring the Dirichlet-to-Neumann map and the Schur complement as a future direction. Moreover, from the practical perspective, when we do not know where the wave region is, we shall design an effective algorithm to determine all points from the wave region, so that we can clip the LLE matrix. Third, as is shown in Theorem 16, LLE converges pointwisely to a mixed-type differential operator with degeneracy, which is an SL equation with a peculiar structure in the one-dimensional case. In other words, we have a degenerate mixed-type differential equation, and the boundary condition is not known a priori.
Understanding the spectral behavior of this operator is of interest in its own right from the theoretical perspective, and it might be necessary in order to explore the spectral behavior of LLE when there is a boundary. Fourth, the spectral convergence of LLE is so far an open problem to our knowledge. To attack this problem, we shall again compare the originally proposed $(W-I)^\top(W-I)$ and the $W-I$ considered in this paper and in (Wu and Wu, 2018). Assume first that $\partial M = \emptyset$. We may learn from what has been done in the literature. To study the spectral convergence of DM (Trillos et al., 2020; Dunson et al., 2021), we need at least two pieces of information. The first one is knowledge of the spectral behavior of the asymptotic differential operator, and the second one is the pointwise convergence result. The pointwise convergence result of $W-I$ was studied in Wu and Wu (2018) when $\partial M = \emptyset$, and is generalized to the case $\partial M \neq \emptyset$ in this paper. It might be intuitive to conclude that, since the spectral behavior of the Laplace-Beltrami operator is well known, when $\partial M = \emptyset$ we may easily obtain the spectral convergence of $W-I$. However, since the matrix $W-I$ is not symmetric, the associated integral operator of LLE is not self-adjoint. Thus, we cannot directly apply the same method as in Trillos et al. (2020); Dunson et al. (2021) to prove the spectral convergence of $W-I$, and new tools are needed. Moreover, since a fourth order differential operator is involved in $(W-I)^\top(W-I)$ via a pointwise convergence analysis, and its spectral behavior is less well known, it is more challenging to study the spectral convergence of $(W-I)^\top(W-I)$. The situation is certainly more complicated when $\partial M \neq \emptyset$, since the associated kernel, the behavior of the integral operator, and the asymptotic differential operator are all different. Even for $W-I$, it is still an open problem, as mentioned above as the third point, not to mention $(W-I)^\top(W-I)$, and the problem is further complicated if the nearest neighbor scheme is taken into consideration. A further systematic pointwise and spectral convergence study of $(W-I)^\top(W-I)$ is thus needed to advance the field.

Acknowledgments

The authors acknowledge the fruitful discussion with Professor Jun Kitagawa about the boundary condition. They also thank the anonymous reviewers for their valuable comments that helped strengthen this paper.

Figure 5: Top: the first 5 eigenfunctions of the LLE matrix for a point cloud sampled from the $[0,1]$ interval are plotted with different colors. The dashed vertical gray lines indicate $\varepsilon$ and $1-\varepsilon$. It is clear that the third, fourth, and fifth eigenfunctions look like the eigenfunctions of the Laplace-Beltrami operator with the Dirichlet boundary condition, but higher eigenfunctions become irregular when getting closer to the boundary. Middle: the first 10 eigenfunctions of the clipped LLE matrix $W_r$ for a point cloud sampled from $M_1 = [0,1]$ are plotted with different colors. The dashed vertical gray lines indicate $\varepsilon$ and $1-\varepsilon$.
Bottom: the first 10 eigenfunctions of the clipped LLE matrix $W_r$ for a point cloud sampled from the curve $M_2 \subset \mathbb{R}^3$ are plotted with different colors. The dashed vertical gray lines indicate $\varepsilon$ and $1-\varepsilon$. Compared with the eigenfunctions of $M_1$, the amplitude of the higher eigenfunctions of $M_2$ becomes less constant, which is expected due to the nonuniform sampling effect. It is clear that these eigenfunctions look like the eigenfunctions of the Laplace-Beltrami operator with the Dirichlet boundary condition, without the irregular behavior close to the boundary observed in the top panel.

Figure 6: The first 20 eigenfunctions of the LLE matrix for a point cloud sampled from the unit disk are plotted from top left to bottom right. It is clear that some eigenfunctions (indicated by red arrows) look like the eigenfunctions of the Laplace-Beltrami operator with the Dirichlet boundary condition combined with the eigenfunctions of the Laplace-Beltrami operator of the rim near the boundary. Note that the rim near the boundary is close to $S^1$ in the Gromov-Hausdorff sense.

Figure 7: The first 20 eigenfunctions of the clipped LLE matrix for a point cloud sampled from the unit disk are plotted from top left to bottom right. It is clear that all eigenfunctions look like the eigenfunctions of the Laplace-Beltrami operator with the Dirichlet boundary condition.

Figure 8: The first 9 eigenfunctions of the LLE matrix for a point cloud sampled from the surface in Figures 2 and 3 are plotted from top left to bottom right. To enhance the visualization, the boundary of the surface is colored red. It is clear that the first three eigenfunctions are either constant or linear, while the behavior of the other eigenfunctions is not easy to describe when compared with those shown in Figure 6.

Figure 9: The first 9 eigenfunctions of the clipped LLE matrix for a point cloud sampled from the surface in Figures 2 and 3 are plotted from top left to bottom right. To enhance the visualization, the boundary of the surface is colored red. It is clear that all eigenfunctions are zero on the boundary, and their behavior looks like that of the eigenfunctions of the Laplace-Beltrami operator with the Dirichlet boundary condition.

Figure 10: The first 9 eigenfunctions of the LLE matrix for a point cloud sampled from the truncated torus $M_4$ are plotted from top left to bottom right. To enhance the visualization, the boundary of the truncated torus is colored red.

Figure 11: The first 9 eigenfunctions of the clipped LLE matrix for a point cloud sampled from the truncated torus $M_4$ are plotted from top left to bottom right. To enhance the visualization, the boundary of the truncated torus is colored red. It is clear that all eigenfunctions are zero on the boundary.

Appendix A. Examples for Proposition 1

To show that it is possible that $\rho(W) = 1$, consider the following example. Let $n = 2m$, where $m \ge 2$ is an integer. Suppose $X = \{z_1, z_2, \ldots, z_n\}$ is a uniform grid on $S^1 \subset \mathbb{R}^2$, so that $z_i = (\cos(2\pi(i-1)/n),\,\sin(2\pi(i-1)/n))$, where $i = 1,\ldots,n$. We choose $\varepsilon$ so that each $N_i$ contains only the two data points $(\cos(2\pi(i-2)/n),\,\sin(2\pi(i-2)/n))$ and $(\cos(2\pi i/n),\,\sin(2\pi i/n))$. Fix $z_k$, and let $z_{k,1}$ and $z_{k,2}$ be the two data points in $N_k$. Without loss of generality (after a rotation and a translation), we assume that $z_k = (0,0)$, $z_{k,1} = (a,b)$ and $z_{k,2} = (-a,b)$.
Hence, $G_{n,k}$ at $z_k$ is
$$G_{n,k} = \begin{bmatrix} a & -a \\ b & b \end{bmatrix},$$
and the solution $y_k = [y_{k,1},\,y_{k,2}]^\top$ of the regularized equation (2.3) with the regularizer $c > 0$ satisfies
$$\begin{bmatrix} a^2+b^2+c & b^2-a^2 \\ b^2-a^2 & a^2+b^2+c \end{bmatrix}\begin{bmatrix} y_{k,1} \\ y_{k,2} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
Therefore, we have $y_{k,1} = y_{k,2}$, $w_k = [1/2,\,1/2]^\top$, and
$$W_{ki} = \begin{cases} 1/2 & \text{if } z_i = z_{k,j} \in N_k, \\ 0 & \text{otherwise.} \end{cases} \qquad \text{(A.3)}$$
Suppose $\lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$ are the eigenvalues of $W$. Then $\lambda_0 = -1$, $\lambda_{n-1} = 1$, and $\lambda_{2i-1} = \lambda_{2i} = \cos(\pi(m-i)/m)$ for $i = 1,\ldots,m-1$; in particular, $\rho(W) = 1$. We provide another example to show that in general it is possible that $\rho(W) > 1$. Consider a point cloud of ten points in $\mathbb{R}^3$: $(-0.56,-0.34,1.03)$, $(-0.51,0.32,-0.02)$, $(-0.53,-1.47,-0.57)$, $(1.34,0.47,-0.15)$, $(1.01,-1.56,1.22)$, $(-0.55,-1,-0.07)$, $(0.09,-1.04,-0.2)$, $(-1.27,2.07,-0.9)$, $(1.26,-0.71,-1.2)$, and $(1.46,0,0.61)$. The LLE matrix of this point cloud with 5 nearest neighbors and the regularizer $c = 10^{-3}$ has an eigenvalue 2.4233.
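The first example is easy to verify numerically: by the computation above, every row of $W$ puts weight $1/2$ on the two neighbors, so $W$ is circulant and its spectrum can be compared directly with the predicted cosines.

```python
import numpy as np

# Uniform grid on S^1 with two neighbors per point: every row of W is 1/2 on
# the two adjacent points, so W is circulant with eigenvalues cos(2*pi*k/n).
m = 8
n = 2 * m
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = 0.5
    W[i, (i + 1) % n] = 0.5
eigs = np.sort(np.linalg.eigvals(W).real)
expected = np.sort(np.cos(2 * np.pi * np.arange(n) / n))
print(np.allclose(eigs, expected))   # True
print(np.max(np.abs(eigs)))          # spectral radius rho(W) = 1
```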
Appendix B. Technical lemmas for some geometric quantities

In this section we collect several technical lemmas for geometric quantities that we will encounter in the proofs. They might also be useful for other works in which the manifold-with-boundary setup is considered. For the manifold with boundary, denoted as $M$, with the isometric embedding $\iota$ into $\mathbb{R}^p$, we consider the extensions $\tilde{M}$ and $\tilde{\iota}$ introduced in Section 3.2. Let $\widetilde{\exp}_x$ be the exponential map of $\tilde{M}$ at $x \in M$. Recall that $\widetilde{\exp}_x$ is well defined over $\tilde{\iota}^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x)) \cap \tilde{\iota}(\tilde{M}))$ for any $x \in M$. For $x \in M$, we use $\mathrm{II}_x$ to denote both the second fundamental form of $\iota(M)$ at $\iota(x)$ and the second fundamental form of $\tilde{\iota}(\tilde{M})$ at $\iota(x)$. The first three lemmas are basic facts about $\widetilde{\exp}_x$, the normal coordinates, and the volume form. The proofs of these three lemmas can be found in Singer and Wu (2012).

Lemma 21 Fix $x \in M$ and consider the extension $\tilde{M}$. If we use the Cartesian coordinates to parametrize $T_x\tilde{M}$, the volume form has the expansion
$$\Big(1-\frac{1}{6}\sum_{i,j=1}^{d}\mathrm{Ric}_x(i,j)\,u_iu_j+O(\|u\|^3)\Big)\,du, \qquad \text{(B.1)}$$
where $u = \sum_{i=1}^{d}u_ie_i \in T_x\tilde{M}$ and $\mathrm{Ric}_x(i,j) = \mathrm{Ric}_x(e_i,e_j)$.

Lemma 22 Fix $x \in M$ and consider the extensions $\tilde{M}$ and $\tilde{\iota}$. For $u \in T_x\tilde{M}$ with $\|u\|$ sufficiently small, we have the Taylor expansion
$$\tilde{\iota}\circ\widetilde{\exp}_x(u)-\iota(x) = \iota_*u+\frac{1}{2}\mathrm{II}_x(u,u)+O(\|u\|^3). \qquad \text{(B.2)}$$

In the next lemma, we compare the geodesic distance and the Euclidean distance.

Lemma 23 Fix $x \in M$ and consider the extensions $\tilde{M}$ and $\tilde{\iota}$. If we use the polar coordinates $(t,\theta) \in [0,\infty)\times S^{d-1}$ to parametrize $T_x\tilde{M}$, then when $t > 0$ is sufficiently small and $\tilde{t} = \|\tilde{\iota}\circ\widetilde{\exp}_x(\theta t)-\iota(x)\|_{\mathbb{R}^p}$, we have
$$\tilde{t} = t-\frac{1}{24}\|\mathrm{II}_x(\theta,\theta)\|^2t^3+O(t^4) \quad\text{and}\quad t = \tilde{t}+\frac{1}{24}\|\mathrm{II}_x(\theta,\theta)\|^2\tilde{t}^3+O(\tilde{t}^4), \qquad \text{(B.3)}$$
where $\theta \in S^{d-1}\subset T_x\tilde{M}$.

The following lemma describes a parametrization of the boundary set. This parametrization is needed when we analyze the LLE matrix near the boundary.

Lemma 24 Fix $x \in M_\varepsilon$ and consider the extensions $\tilde{M}$ and $\tilde{\iota}$. Then
$$(\tilde{\iota}\circ\widetilde{\exp}_x)^{-1}\big(B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\tilde{\iota}(\partial M)\big) = \Big\{\sum_{l=1}^{d}u_l\partial_l\in T_x\tilde{M}\ \Big|\ (u_1,\ldots,u_{d-1})\in K,\ u_d = q(u_1,\ldots,u_{d-1})\Big\}, \qquad \text{(B.4)}$$
where
$$q(u_1,\ldots,u_{d-1}) = -\tilde{\varepsilon}_x+\sum_{i,j=1}^{d-1}a_{ij}(x')\,u_iu_j+O(\|u\|^3), \qquad \text{(B.5)}$$
and $a_{ij}(x')$ is the second fundamental form of the embedding of $\partial M$ in $M$ at $x'$.

Proof Note that $(\tilde{\iota}\circ\widetilde{\exp}_x)^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\tilde{\iota}(\partial M))$ is a hypersurface with boundary in $T_x\tilde{M}$. Since $\partial M$ is smooth, by the implicit function theorem, if $\varepsilon$ is small enough,
$$(\tilde{\iota}\circ\widetilde{\exp}_x)^{-1}\big(B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\tilde{\iota}(\partial M)\big) = \Big\{\sum_{l=1}^{d}u_l\partial_l\in T_x\tilde{M}\ \Big|\ (u_1,\ldots,u_{d-1})\in K,\ u_d = q(u_1,\ldots,u_{d-1})\Big\} \qquad \text{(B.6)}$$
for a smooth function $q$ of $u_1,\ldots,u_{d-1}$. By Taylor's expansion, we have
$$q(u_1,\ldots,u_{d-1}) = -\tilde{\varepsilon}_x+\sum_{i,j=1}^{d-1}a_{ij}(x)\,u_iu_j+O(\|u\|^3), \qquad \text{(B.7)}$$
where the first order term disappears since the tangent space of $(\tilde{\iota}\circ\widetilde{\exp}_x)^{-1}(B^{\mathbb{R}^p}_\varepsilon(\iota(x))\cap\tilde{\iota}(\partial M))$ in $T_x\tilde{M}$ at $\exp_x^{-1}(x')$ is perpendicular to the $u_d$ direction by Gauss's lemma, and $a_{ij}(x)$ is the coefficient of the second order expansion. Due to the smoothness of the manifold, $a_{ij}$ is smooth along the minimizing geodesic from $x'$ to $x$; in particular, at $x'$, $a_{ij}(x')$ is the second fundamental form of the embedding of $\partial M$ in $M$ at $x'$. Therefore, by another Taylor expansion, $a_{ij}(x) = a_{ij}(x')+O(u_d)$, and the conclusion follows.

The next corollary describes the discrepancy between $\int_{D_\varepsilon(x)}f(u)\,du$ and $\int_{\tilde{D}_\varepsilon(x)}f(u)\,du$. Note that the order of the discrepancy does not depend on the location of $x$.

Corollary 25 Fix $x \in M$. When $\varepsilon > 0$ is sufficiently small, we have
$$\int_{D_\varepsilon(x)}du-\int_{\tilde{D}_\varepsilon(x)}du = O(\varepsilon^{d+2}). \qquad \text{(B.8)}$$

Proof Based on Lemma 23 and the definitions of $D_\varepsilon(x)$ and $\tilde{D}_\varepsilon(x)$, the distance between the boundary of $D_\varepsilon(x)$ and the boundary of $\tilde{D}_\varepsilon(x)$ is of order $\varepsilon^3$. The volume of the boundary of $D_\varepsilon(x)$ is of order $\varepsilon^{d-1}$. Hence the volume difference between $D_\varepsilon(x)$ and $\tilde{D}_\varepsilon(x)$ is of order $\varepsilon^{d-1}\cdot\varepsilon^3 = \varepsilon^{d+2}$. The conclusion follows.

Appendix C. Technical lemmas for the kernel analysis

To have a closer look at the kernel, we need the following quantities. First, we introduce some notation. For $v \in \mathbb{R}^p$, denote
$$v = [[v_1,\,v_2]] \in \mathbb{R}^p, \qquad \text{(C.1)}$$
where $v_1 \in \mathbb{R}^d$ forms the first $d$ coordinates of $v$ and $v_2 \in \mathbb{R}^{p-d}$ forms the last $p-d$ coordinates of $v$. Thus, for $v = [[v_1,\,v_2]] \in T_{\iota(x)}\mathbb{R}^p$, $v_1 = J_{p,d}^\top v$ is the coordinate of the tangential component of $v$ on $\iota_*T_xM$, and $v_2 = \bar{J}_{p,p-d}^\top v$ is the coordinate of the normal component of $v$ associated with a chosen basis of the normal bundle. Define $N_{ij}(x) := \bar{J}_{p,p-d}^\top\,\mathrm{II}_x(e_i,e_j)$. Note that $N_{ij}(x) = N_{ji}(x)$.

Definition 26 (Moments) For $x \in M$, consider the following moments that capture the geometric asymmetry:
$$\mu_v(x,\varepsilon) := \int_{D_\varepsilon(x)}\prod_{i=1}^{d}u_i^{v_i}\,du,$$
where $v = [v_1,\ldots,v_d]$ describes the moment order.

In the next lemma, we quantitatively describe all the moments up to the third order. The lemma tells us that when $x \notin M_\varepsilon$ (when $x$ is far away from the boundary), all odd order moments vanish due to the symmetry of the integration domain. However, when $x \in M_\varepsilon$, this no longer holds: the integration domain becomes asymmetric, and the odd moments no longer vanish. Moreover, we can show that $\mu_0(x,\varepsilon)$, $\mu_{e_d}(x,\varepsilon)$, $\mu_{2e_i}(x,\varepsilon)$ and $\mu_{2e_i+e_d}(x,\varepsilon)$ are all the non-trivial moments needed in analyzing LLE. The proof follows from the symmetry argument and a straightforward integration, so we omit it here. A Monte Carlo sanity check of the boundary values of these moments is sketched below.
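As a quick sanity check of Definition 26, one can estimate the boundary values of the first two non-trivial moments by Monte Carlo, idealizing the integration domain as the flat half-ball $B_\varepsilon \cap \{u_d \ge 0\}$ (that is, $\tilde{\varepsilon}_x = 0$, ignoring the curvature correction of Lemma 24). For $d = 2$, the lemma below predicts $\mu_0 = \frac{|S^1|}{4}\varepsilon^2 = \frac{\pi}{2}\varepsilon^2$ and $\mu_{e_d} = \frac{|S^0|}{3}\varepsilon^3 = \frac{2}{3}\varepsilon^3$.

```python
import numpy as np

# Monte Carlo check of mu_0 and mu_{e_d} for d = 2 at a boundary point,
# with the domain idealized as the half-disk {|u| <= eps, u_2 >= 0}.
rng = np.random.default_rng(0)
eps, N = 0.1, 2_000_000
u = rng.uniform(-eps, eps, size=(N, 2))             # sample the bounding square
inside = (np.linalg.norm(u, axis=1) <= eps) & (u[:, 1] >= 0)
cell = (2 * eps) ** 2                               # area of the square
mu0 = cell * inside.mean()                          # ~ (pi/2) * eps^2
mued = cell * np.mean(np.where(inside, u[:, 1], 0)) # ~ (2/3) * eps^3
print(mu0, np.pi / 2 * eps ** 2)
print(mued, 2 / 3 * eps ** 3)
```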
Lemma 27 (Symmetry) Suppose $\varepsilon$ is sufficiently small. Then the moments up to order three can be quantitatively described as follows. In fact, $\mu_0(x,\varepsilon)$, $\mu_{e_d}(x,\varepsilon)$, $\mu_{2e_i}(x,\varepsilon)$ and $\mu_{2e_i+e_d}(x,\varepsilon)$, for $i = 1,\ldots,d$, are the only non-trivial moments, and they are continuous functions of $x$ on $M$. With the convention $\frac{|S^{d-2}|}{d-1} = 1$ when $d = 1$, we have:

1. Zero order moment, $\mu_0$. If $x \in M_\varepsilon$, $\mu_0$ is an increasing function of $\tilde{\varepsilon}_x$ and
$$\mu_0(x,\varepsilon) = \frac{|S^{d-1}|}{2d}\varepsilon^d+\int_{0}^{\tilde{\varepsilon}_x}\frac{|S^{d-2}|}{d-1}(\varepsilon^2-h^2)^{\frac{d-1}{2}}dh+O(\varepsilon^{d+1}).$$
If $x \notin M_\varepsilon$, then $\mu_0(x,\varepsilon) = \frac{|S^{d-1}|}{d}\varepsilon^d$. In general, the following bounds hold for $\mu_0(x,\varepsilon)$:
$$\frac{|S^{d-1}|}{2d}\varepsilon^d+O(\varepsilon^{d+1}) \le \mu_0(x,\varepsilon) \le \frac{|S^{d-1}|}{d}\varepsilon^d.$$

2. First order moments, $\mu_{e_i}$. If $x \in M_\varepsilon$, then
$$\mu_{e_d}(x,\varepsilon) = \frac{|S^{d-2}|}{d^2-1}(\varepsilon^2-\tilde{\varepsilon}_x^2)^{\frac{d+1}{2}}+O(\varepsilon^{d+2}),$$
which is a decreasing function of $\tilde{\varepsilon}_x$. If $x \notin M_\varepsilon$, then $\mu_{e_d}(x,\varepsilon) = 0$. In general, $\mu_{e_d}(x,\varepsilon)$ is of order $\varepsilon^{d+1}$. For the rest of the first order moments, $\mu_{e_i} = 0$ for $i = 1,\ldots,d-1$.

3. Second order moments, $\mu_{e_i+e_j}$. If $x \in M_\varepsilon$, $\mu_{2e_i}$ is an increasing function of $\tilde{\varepsilon}_x$ for $i = 1,\ldots,d$. We have
$$\mu_{2e_i}(x,\varepsilon) = \frac{|S^{d-1}|}{2d(d+2)}\varepsilon^{d+2}+\int_{0}^{\tilde{\varepsilon}_x}\frac{|S^{d-2}|}{d^2-1}(\varepsilon^2-h^2)^{\frac{d+1}{2}}dh+O(\varepsilon^{d+3})$$
for $i = 1,\ldots,d-1$, and
$$\mu_{2e_d}(x,\varepsilon) = \frac{|S^{d-1}|}{2d(d+2)}\varepsilon^{d+2}+\int_{0}^{\tilde{\varepsilon}_x}\frac{|S^{d-2}|}{d-1}(\varepsilon^2-h^2)^{\frac{d-1}{2}}h^2\,dh+O(\varepsilon^{d+3}).$$
If $x \notin M_\varepsilon$, then $\mu_{2e_i}(x,\varepsilon) = \frac{|S^{d-1}|}{d(d+2)}\varepsilon^{d+2}$. In general, the following bounds hold for $\mu_{2e_i}(x,\varepsilon)$, where $i = 1,\ldots,d$:
$$\frac{|S^{d-1}|}{2d(d+2)}\varepsilon^{d+2}+O(\varepsilon^{d+3}) \le \mu_{2e_i}(x,\varepsilon) \le \frac{|S^{d-1}|}{d(d+2)}\varepsilon^{d+2}.$$
For the rest of the second order moments, $\mu_{e_i+e_j} = 0$ whenever $i \neq j$.

4. Third order moments, $\mu_{e_i+e_j+e_k}$. If $x \in M_\varepsilon$, then
$$\mu_{2e_i+e_d}(x,\varepsilon) = \frac{|S^{d-2}|}{(d^2-1)(d+3)}(\varepsilon^2-\tilde{\varepsilon}_x^2)^{\frac{d+3}{2}}+O(\varepsilon^{d+4}),$$
which is a decreasing function of $\tilde{\varepsilon}_x$, for $i = 1,\ldots,d-1$, and
$$\mu_{3e_d}(x,\varepsilon) = \frac{|S^{d-2}|}{(d^2-1)(d+3)}(\varepsilon^2-\tilde{\varepsilon}_x^2)^{\frac{d+1}{2}}\big(2\varepsilon^2+(d+1)\tilde{\varepsilon}_x^2\big)+O(\varepsilon^{d+4}).$$
If $x \notin M_\varepsilon$, then $\mu_{3e_d}(x,\varepsilon) = 0$. In general, $\mu_{2e_i+e_d}(x,\varepsilon)$ is of order $\varepsilon^{d+3}$, and $\mu_{e_i+e_j+e_k} = 0$ for the rest of the cases.

Below, we relate the non-trivial moments described in the previous lemma to the $\sigma$ functions in Definition 7. The proof follows from a straightforward change of variables, so we omit the details.

Corollary 28 The relationship between the non-trivial moments in Lemma 27 and the functions $\sigma$ defined in Definition 7 is:
$$\mu_0(x,\varepsilon) = \sigma_0(\tilde{\varepsilon}_x)\varepsilon^d+O(\varepsilon^{d+1}), \qquad \mu_{e_d}(x,\varepsilon) = \sigma_{1,d}(\tilde{\varepsilon}_x)\varepsilon^{d+1}+O(\varepsilon^{d+2}),$$
$$\mu_{2e_i}(x,\varepsilon) = \sigma_2(\tilde{\varepsilon}_x)\varepsilon^{d+2}+O(\varepsilon^{d+3}) \text{ for } i = 1,\ldots,d-1, \qquad \mu_{2e_d}(x,\varepsilon) = \sigma_{2,d}(\tilde{\varepsilon}_x)\varepsilon^{d+2}+O(\varepsilon^{d+3}),$$
$$\mu_{2e_i+e_d}(x,\varepsilon) = \sigma_3(\tilde{\varepsilon}_x)\varepsilon^{d+3}+O(\varepsilon^{d+4}) \text{ for } i = 1,\ldots,d-1, \qquad \mu_{3e_d}(x,\varepsilon) = \sigma_{3,d}(\tilde{\varepsilon}_x)\varepsilon^{d+3}+O(\varepsilon^{d+4}).$$

Next, we prove the following lemma about the ratio between the volumes of the $(d-1)$-sphere and the $(d-2)$-sphere. We need it later to study the relations between different $\sigma$ functions.

Lemma 29 For $d \in \mathbb{N}$, we have
$$\frac{(d+1)^2(d+3)}{8d^2(d+2)^2} < \frac{|S^{d-2}|^2}{(d-1)^2|S^{d-1}|^2} < \frac{(d+1)^2}{4d^2(d+2)}, \qquad \text{(C.2)}$$
where $\frac{|S^{d-2}|}{d-1}$ is defined as 1 when $d = 1$.

Proof The inequality can be verified by a straightforward calculation for $d \le 6$. Next, we show that $\frac{|S^{d-2}|^2}{|S^{d-1}|^2} < \frac{(d^2-1)^2}{4d^2(d+2)}$ for $d > 6$. Note that $\frac{|S^{d-2}|^2}{|S^{d-1}|^2} = \frac{\Gamma(\frac{d}{2})^2}{\pi\,\Gamma(\frac{d-1}{2})^2}$. Hence, it suffices to prove $\frac{\Gamma(\frac{d}{2})^2}{\Gamma(\frac{d-1}{2})^2} < \frac{\pi(d^2-1)^2}{4d^2(d+2)}$. In Kershaw (1983), it is proved that for all $x > 0$ and $0 < s < 1$,
$$\Big(x+\frac{s}{2}\Big)^{1-s} < \frac{\Gamma(x+1)}{\Gamma(x+s)} < e^{(1-s)\psi(x+\frac{1+s}{2})}, \qquad \text{(C.3)}$$
where $\psi(y) = \frac{\Gamma'(y)}{\Gamma(y)}$. Choosing $x = \frac{d}{2}-1$ and $s = \frac{1}{2}$, we obtain $\frac{\Gamma(\frac{d}{2})^2}{\Gamma(\frac{d-1}{2})^2} < e^{\psi(\frac{d}{2}-\frac{1}{4})}$. Hence, it suffices to show
$$e^{\psi(\frac{d}{2}-\frac{1}{4})} < \frac{\pi(d^2-1)^2}{4d^2(d+2)}. \qquad \text{(C.4)}$$
Actually, we have $e^{\psi(y)} < y$ for any positive $y$. The conclusion follows by verifying $\frac{d}{2}-\frac{1}{4} < \frac{\pi(d^2-1)^2}{4d^2(d+2)}$ for $d > 6$. At last, we show that $\frac{(d^2-1)^2(d+3)}{8d^2(d+2)^2} < \frac{|S^{d-2}|^2}{|S^{d-1}|^2}$, which is equivalent to $\frac{\pi(d^2-1)^2(d+3)}{8d^2(d+2)^2} < \frac{\Gamma(\frac{d}{2})^2}{\Gamma(\frac{d-1}{2})^2}$. By the left-hand side of (C.3), with $x = \frac{d}{2}-1$ and $s = \frac{1}{2}$, we have $\frac{\Gamma(\frac{d}{2})^2}{\Gamma(\frac{d-1}{2})^2} > \frac{d}{2}-\frac{3}{4}$. The conclusion follows by verifying $\frac{\pi(d^2-1)^2(d+3)}{8d^2(d+2)^2} < \frac{d}{2}-\frac{3}{4}$ for $d$ large.

We now calculate some major ingredients that will be used in the proof of the main theorem. Specifically, we calculate the first two order terms in $\mathbb{E}[\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]$ and $\mathbb{E}[(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]$, and the first two order terms in the tangential components of $\mathbb{E}[(\iota(X)-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]$ and $\mathbb{E}[(\iota(X)-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]$. This long lemma is the generalization of (Wu and Wu, 2018, Lemma B.5) to the boundary case. In particular, when $x \notin M_\varepsilon$, we recover (Wu and Wu, 2018, Lemma B.5).

Lemma 30 Fix $x \in M$ and $f \in C^3(M)$. When $\varepsilon > 0$ is sufficiently small, the following expansions hold.

1. $\mathbb{E}[\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]$ satisfies
$$\mathbb{E}[\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))] = P(x)\mu_0(x,\varepsilon)+\partial_dP(x)\,\mu_{e_d}(x,\varepsilon)+O(\varepsilon^{d+2}).$$

2. $\mathbb{E}[(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]$ satisfies
$$\mathbb{E}[(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))] = P(x)\,\partial_df(x)\,\mu_{e_d}(x,\varepsilon)+\sum_{i=1}^{d}\Big(\frac{P(x)}{2}\,\partial^2_{ii}f(x)+\partial_if(x)\,\partial_iP(x)\Big)\mu_{2e_i}(x,\varepsilon)+O(\varepsilon^{d+3}).$$

3. The vector $\mathbb{E}[(\iota(X)-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]$ satisfies $\mathbb{E}[(\iota(X)-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))] = [[v_1,\,v_2]]$, where
$$v_1 = P(x)\,\mu_{e_d}(x,\varepsilon)\,J_{p,d}^\top e_d+\sum_{i=1}^{d}\partial_iP(x)\,\mu_{2e_i}(x,\varepsilon)\,J_{p,d}^\top e_i+O(\varepsilon^{d+3}), \qquad v_2 = \frac{P(x)}{2}\sum_{i=1}^{d}N_{ii}(x)\,\mu_{2e_i}(x,\varepsilon)+O(\varepsilon^{d+3}).$$

4.
The vector E[(ι(X) ι(x))(f(X) f(x))χBRp ε (ι(x))(ι(X))] satisfies E[(ι(X) ι(x))(f(X) f(x))χBRp ε (ι(x))(ι(X))] = [[v1,v2]], v1 =P(x) d i=1 i f(x)µ2ei(x,ε) J p,dei i f(x) d P(x)+ d f(x) i P(x)+P(x) 2 id f(x) µ2ei+ed(x,ε)J p,dei i f(x) i P(x)+ P(x) 2 2 ii f(x) µ2ei+ed(x,ε) J p,ded +O(εd+4), v2 =P(x) d 1 i=1 i f(x)Nid(x)µ2ei+ed(x,ε) 2 d f(x) d i=1 Nii(x)µ2ei+ed(x,ε)+O(εd+4). Proof We use the extensions M and ι introduced in Section 3.2. For any x M, let expx be the exponential map of M. First, we calculate E[χBRp ε (ι(x))(ι(X))]. E[χBRp ε (ι(x))(ι(X))] (C.5) P(x)+ d i=1 i P(x)ui +O(u2) 1 d i,j=1 1 6Ricx(i, j)uiuj +O(u3) du Dε(x) du+ Z d i=1 i P(x)uidu+O(εd+2) =P(x)µ0(x,ε)+ d P(x)µed(x,ε)+O(εd+2), where the second equality holds by applying Corollary 25 that the error of changing domain from Dε(x) to Dε(x) is of order εd+2. We use Lemma 27 in the last step. Note that P(x) is bounded away from 0. Second, we calculate E[(f(X) f(x))χBRp ε (ι(x))(ι(X))]. Note that when ε is sufficiently small, we have f expx(u) f(x) = d i=1 i f(x)ui + 1 d i,j=1 2 i j f(x)uiuj +O(u3), (C.6) WHEN LOCALLY LINEAR EMBEDDING HITS BOUNDARY which is of order ε for u Dε(x). By a direct expansion, we have E[(f(X) f(x))χBRp ε (ι(x))(ι(X))] (C.7) Dε(x)( d i=1 i f(x)ui + 1 d i,j=1 2 i j f(x)uiuj +O(u3))(P(x)+ d i=1 i P(x)ui +O(u2)) 1 6Ricx(i, j)uiuj +O(u3))du, which by Corollary 25 and Lemma 27 becomes P(x) d i=1 i f(x)ui + P(x) d i,j=1 2 i j f(x)uiuj + d i=1 i f(x)ui d j=1 j P(x)uj +O(u3) du =P(x) d f(x) Z Dε(x) uddu+ d i=1 (P(x) 2 2 ii f(x)+ i f(x) i P(x)) Z Dε(x) u2 i du+O(εd+3) =P(x) d f(x)µed(x,ε)+ d i=1 (P(x) 2 2 ii f(x)+ i f(x) i P(x))µ2ei(x,ε)+O(εd+3). Note that the leading term in the integral is of order ε, so the error of changing the domain from Dε(x) to Dε(x) is of order εd+3. Third, by a direct expansion, we have E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] (C.8) Dε(x)( ι u+ 1 2IIx(u,u)+O(u3))(P(x)+ d i=1 i P(x)ui +O(u2)) 1 6Ricx(i, j)uiuj +O(u3))du, which is a vector in Rp. We then find the tangential part and the normal part of E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] respectively. The tangential part is Dε(x)( ι u+O(u3))(P(x)+ d i=1 i P(x)ui +O(u2)) (C.9) 1 6Ricx(i, j)uiuj +O(u3))du Dε(x)( ι u+O(u3))(P(x)+ d i=1 i P(x)ui +O(u2)) 1 6Ricx(i, j)uiuj +O(u3))du+O(εd+3), where the equality holds by Corollary 25. Similarly, by Corollary 25, the normal part is 2IIx(u,u)+O(u3))(P(x)+ d i=1 i P(x)ui +O(u2)) (C.10) 1 6Ricx(i, j)uiuj +O(u3))du 2IIx(u,u)+O(u3))(P(x)+ d i=1 i P(x)ui +O(u2)) 1 6Ricx(i, j)uiuj +O(u3))du+O(εd+4) since the leading term P(x)IIx(u,u) is of order ε2 on Dε(x). As a result, by putting the tangent part and normal part together, E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] = [[v1,v2]], where v1 =J p,d h P(x) Z Dε(x) ι udu+ Z Dε(x) ι u d i=1 i P(x)uidu+O(εd+3) i (C.11) Dε(x) uddu J p,ded + d i=1 Dε(x) u2 i du J p,dei +O(εd+3) =P(x)µed(x,ε)J p,ded + d i=1 i P(x)µ2ei(x,ε)J p,dei +O(εd+3) Dε(x) IIx(u,u)du+O(εd+3) = P(x) d i=1 Nii(x)µ2ei(x,ε)+O(εd+3). Finally, we evaluate E[(ι(X) ι(x))(f(X) f(x))χBRp ε (ι(x))(ι(X))] and then find the tangential part and the normal part. By a direct expansion, E[(ι(X) ι(x))(f(X) f(x))χBRp ε (ι(x))(ι(X))] (C.12) Dε(x)( ι u+ 1 2IIx(u,u)+O(u3))( d i=1 i f(x)ui + 1 d i,j=1 2 i j f(x)uiuj +O(u3)) P(x)+ d i=1 i P(x)ui +O(u2) 1 d i,j=1 1 6Ricx(i, j)uiuj +O(u3) du. The tangential part is Dε(x)( ι u+O(u3)) d i=1 i f(x)ui + 1 d i,j=1 2 i j f(x)uiuj +O(u3) (C.13) P(x)+ d i=1 i P(x)ui +O(u2) 1 d i,j=1 1 6Ricx(i, j)uiuj +O(u3) du. 
WHEN LOCALLY LINEAR EMBEDDING HITS BOUNDARY The leading term P(x) ι u d i=1 i f(x)ui is of order ε2 on Dε(x), therefore the error of changing domain from Dε(x) to Dε(x) is of order εd+4. The normal part is 2IIx(u,u)+O(u3)) d i=1 i f(x)ui + 1 d i,j=1 2 i j f(x)uiuj +O(u3) (C.14) P(x)+ d i=1 i P(x)ui +O(u2) 1 d i,j=1 1 6Ricx(i, j)uiuj +O(u3) du. The leading term P(x)IIx(u,u) d i=1 i f(x)ui is of order ε3 on Dε(x). Therefore, the error of changing domain from Dε(x) to Dε(x) is of order εd+5. Putting the above together, E[(ι(X) ι(x))(f(X) f(x))χBRp ε (ι(x))(ι(X))] = [[v1,v2]], where by the symmetry of Dε(x) we have Dε(x) ι u d i=1 i f(x)uidu+ Z Dε(x) ι u d i=1 i f(x)ui d j=1 j P(x)ujdu (C.15) Dε(x) ι u d i,j=1 2 ij f(x)uiujdu+O(εd+4) =P(x) d i=1 Dε(x) u2 i du J p,dei i f(x) d P(x)+ d f(x) i P(x)+P(x) 2 id f(x) Z Dε(x) u2 i uddu J p,dei i f(x) i P(x)+ P(x) 2 2 ii f(x) Z Dε(x) u2 i uddu J p,ded +O(εd+4) =P(x) d i=1 i f(x)µ2ei(x,ε)J p,dei i f(x) d P(x)+ d f(x) i P(x)+P(x) 2 id f(x) µ2ei+ed(x,ε)J p,dei i f(x) i P(x)+ P(x) 2 2 ii f(x) µ2ei+ed(x,ε)J p,ded +O(εd+4), d i=1 i f(x) Z Dε(x) IIx(u,u)uidu+O(εd+4) =P(x) d 1 i=1 i f(x)Nid(x)µ2ei+ed(x,ε)+ P(x) 2 d f(x) d i=1 Nii(x)µ2ei+ed(x,ε)+O(εd+4). Appendix D. Structure of the local covariance matrix under the manifold setup In this section we provide detailed analysis for the local covariance matrix Cx = E[(ι(X) ι(x))(ι(X) ι(x)) χBRp ε (ι(x))(ι(X))]. This Lemma could be viewed as the generalization of (Wu and Wu, 2018, Proposition 3.2) in the sense that when x / Mε, the result is reduced to that of (Wu and Wu, 2018, Proposition 3.2). To handle the boundary effect, we only need to calculate the first two order terms in eigenvalues and orthonormal eigenvectors of Cx. Lemma 31 Fix x M. Suppose that rank(Cx) = r, there is a choice of ed+1, ,ep so that we have e i IIx(ej,ej) = 0 for all i = r +1, , p and j = 1, d. We have M(0)(x,ε) 0 0 0 0 0 0 0 0 M(11)(x,ε) M(12)(x,ε) 0 M(21)(x,ε) 0 0 0 0 0 + O(εd+4) O(εd+4) O(εd+4) M(3)(x,ε)+O(εd+5) where M(0) is a d d diagonal matrix with the m-th diagonal entry µ2em(x,ε). M(11) is a symmetric d d matrix. M(12) Rd (r d). M(21) = M(12) . In particular, when x Mε, M(11)(x,ε) M(12)(x,ε) 0 M(21)(x,ε) 0 0 0 0 0 0. M(3)(x,ε) is diagonal (p d) (p d) matrix and is of order εd+4. The first d eigenvalues of Cx are λi = P(x)µ2ei(x,ε)+λ (1) i (x,ε)+O(εd+4), (D.2) where i = 1,...,d. And λ (1) i (x,ε) = O(εd+3). If x Mε, λ (1) i (x,ε) = 0. The last p d eigenvalues of Cx are λi = O(εd+4), where i = d +1,..., p. The corresponding orthonormal eigenvector matrix is X(x,ε) = X(x,0)+X(x,0)S(x,ε)+O(ε2), (D.3) X1(x) 0 0 0 X2(x) 0 0 0 X3(x) S11(x,ε) S12(x,ε) S13(x,ε) S21(x,ε) S22(x,ε) S23(x,ε) S31(x,ε) S32(x,ε) S33(x,ε) X1 O(d), X2 O(r d) and X3 O(p r). The matrix S(x,ε) is divided into blocks the same as X(x,0). Moreover, S(x,ε) is an antisymmetric matrix with 0 on the diagonal entries. In particular, if x Mε, S(x,ε) = 0. The proof is essentially the same as that of (Wu and Wu, 2018, Proposition 3.2), except that when x is close to the boundary, the integral domain is no longer symmetric. Proof By definition, the (m,n)-th entry of Cx is e m Cxen = Z Dε(x)(ι(y) ι(x)) em(ι(y) ι(x)) en P(y)d V(y). (D.5) WHEN LOCALLY LINEAR EMBEDDING HITS BOUNDARY By the expression ι expx(u) ι(x) = ι u+ 1 2IIx(u,u)+O(u3), (D.6) (ι(y) ι(x)) em(ι(y) ι(x)) en = (e m ι u)(e n ι u)+ 1 2(e m ι u)(e n IIx(u,u))+ 1 2(e m IIx(u,u))(e n ι u)+O(u4). 
Thus, (D.5) is reduced to e m Cxen = Z (e m ι u)(e n ι u)+ 1 2(e m ι u)(e n IIx(u,u))+ 1 2(e m IIx(u,u))(e n ι u) 4[e m IIx(u,u)e n IIx(u,u)]+O(u5) (D.7) P(x)+ u P(x)+O(u2) 1 d i,j=1 1 6Ricx(i, j)uiuj +O(u3) du. For 1 m,n d, (e m ι u)(e n ι u) = umun. Moreover, e n IIx(u,u) and e m IIx(u,u) are zero, so e m Cxen (D.8) Dε(x)(umun +O(u4)) P(x)+ u P(x)+O(u2) 1 6Ricx(i, j)uiuj +O(u3) du Dε(x) umundu+ Z d k=1 uk k P(x)du+O(εd+4). where we use Lemma 25 to handle the error of changing domain from Dε(x) to Dε(x), which is O(εd+4). By the symmetry of domain Dε(x), if 1 m = n d, M(0) m,n = Z Dε(x) u2 mdu = µ2em(x,ε) (D.9) and M(0) m,n is 0 otherwise. Next, M(11) m,n = Z d k=1 uk k P(x)du (D.10) So, by the symmetry of domain Dε(x), we have M(11) m,n = d P(x)µ2em+ed(x,ε) 1 m = n d, n P(x)µ2en+ed(x,ε) m = d, 1 n d, m P(x)µ2em+ed(x,ε) n = d, 1 m d, 0 otherwise. For d +1 m p and d +1 n p, we have e m Cxen = Z 4[e m IIx(u,u)e n IIx(u,u)]+O(u5) (P(x)+O(u))(1+O(u))du (D.12) Dε(x) e m IIx(u,u)e n IIx(u,u)du+O(εd+5). Hence, we have M(3) m d,n d(x,ε) = P(x) Dε(x) e m IIx(u,u)e n IIx(u,u)du. (D.13) Since M(3) m d,m d(x,ε) is symmetric, we can choose ed+1, ,ep so that it is diagonal. Then M(3) m d,m d(x,ε) = 0 implies Z Dε(x)(e m IIx(u,u))2du = 0. (D.14) Note that since e m IIx(u,u) is a quadratic form of u, we have e m IIx(u,u) = 0. Since Cx has rank r, M(3) m d,m d(x,ε) = 0 for m = r+1, , p, and e m IIx(ei,e j) = 0 for m = r+1, , p and i, j = 1, ,d. For 1 m d and n d, e m Cxen = Z 2(e m ι u)(e n IIx(u,u))+O(u4) P(x)+ u P(x)+O(u2) (D.15) 1 6Ricx(i, j)uiuj +O(u3) du Dε(x) um(e n IIx(u,u))du+O(εd+4). We use Lemma 25 to handle the error of changing domain from Dε(x) to Dε(x), which is O(εd+5). Hence, for 1 m d and d +1 n r, M(12)(x)m,n d = P(x) Dε(x) um(e n IIx(u,u))du. (D.16) By symmetry of Cx, we have M(21) = M(12) . For 1 m d and r +1 n p, e m Cxen = P(x) Dε(x) um(e n IIx(u,u))du+O(εd+4) = O(εd+4). (D.17) For 1 n d and r +1 m p, e m Cxen = O(εd+4) by symmetry. Based on Lemma 27, M(0)(x,ε) 0 0 0 0 0 0 0 0 is of order εd+2 and M(11)(x,ε) M(12)(x,ε) 0 M(21)(x,ε) 0 0 0 0 0 is of order εd+3. Note that the entries of M(11)(x,ε) M(12)(x,ε) 0 M(21)(x,ε) 0 0 0 0 0 are integrals of odd-order WHEN LOCALLY LINEAR EMBEDDING HITS BOUNDARY polynomials over Dε(x). Hence, the matrix is 0 when x Mε. By applying the perturbation theory (see, for example, (Wu and Wu, 2018, Appendix A)), the first d eigenvalues of Cx are λi = P(x)µ2ei(x,ε)+λ (1) i (x,ε)+O(εd+4), (D.18) for i = 1,...,d and any x M, where {λ (1) i (x,ε)} are of order εd+3. The calculation of {λ (1) i (x,ε)} depends on M(11)(x,ε) and whether µ2ei(x,ε) are the same. Moreover, λi = O(εd+4) for i = d + 1,..., p. Suppose that rank(Cx) = r, based on the perturbation theory (see, for example, (Wu and Wu, 2018, Appendix A)), the orthonormal eigenvector matrix of Cx is in the form X1(x) 0 0 0 X2(x) 0 0 0 X3(x) X1(x) 0 0 0 X2(x) 0 0 0 X3(x) S(x,ε)+O(ε2), (D.19) where X1(x) O(d), X2(x) O(r d) and X3(x) O(p r). And S11(x,ε) S12(x,ε) S13(x,ε) S21(x,ε) S22(x,ε) S23(x,ε) S31(x,ε) S32(x,ε) S33(x,ε) S(x,ε) is an antisymmetric matrix with 0 on the diagonal entries. It is of order ε and depends on those terms of Cx of order εd+2, order εd+3 and higher orders. In particular, X1(x)S12(x,ε) = [P(x)M(0)(x,ε)] 1M(12)(x,ε)X2. And a straightforward calculation shows that e i Jp,d X1(x)S12(x,ε) = µ2ei+ed(x,ε) µ2ei(x,ε) N id(x)Jp d,r d X2(x), (D.20) for i = 1, ,d 1, and e d Jp,d X1(x)S12(x,ε) = 1 µ2ej+ed(x,ε) µ2ed(x,ε) N j j(x)Jp d,r d X2(x). (D.21) If x Mε, S(x,ε) = 0. 
Moreover, if among first µ2e1, ,µ2ed, there are 1 k d distinct ones, then there is a choice of the basis in the tangent space of M so that X(1) 1 (x) 0 0 0 X(2) 1 (x) 0 0 0 ... 0 0 0 X(k) 1 (x) where each X(i) 1 (x) is an orthogonal matrix corresponding to the same of µ2ei. The conclusion follows. Appendix E. Analysis on the augmented vector T(x) We now calculate T(x). For our purpose, we need an asymptotic expansion up to the first two orders for the tangent component of T(x) and to the first order for the normal component when ε is sufficiently small. Lemma 32 T(x) = [[v( 1) 1 +v(0) 1,1 +v(0) 1,2 +v(0) 1,3 +v(0) 1,4,v( 1) 2 ]]+[[O(ε),O(1)]], where v( 1) 1 = µed(x,ε) µ2ed(x,ε)J p,ded , v(0) 1,1 = P(x) v(0) 1,2 = εd+3µed(x,ε) P(x)(µ2ed(x,ε))2 J p,ded , (E.1) v(0) 1,3 = d i=1 i P(x)µed(x,ε)µ2ei+ed(x,ε) P(x)µ2ei(x,ε)µ2ed(x,ε) J p,dei , v(0) 1,4 = P(x) µed(x,ε)µ2ej+ed(x,ε) µ2ed(x,ε) µ2e j(x,ε) µ2ei+ed(x,ε) µ2ei(x,ε) N j j(x) Nid(x)J p,dei µed(x,ε)µ2ej+ed(x,ε) µ2ed(x,ε) µ2e j(x,ε) µ2ei+ed(x,ε) µ2ed(x,ε) N j j(x) Nii(x)J p,ded, v( 1) 2 = P(x) µ2e j(x,ε) µed(x,ε)µ2ej+ed(x,ε) µ2ed(x,ε) N j j. (E.2) Note that by Lemma 27, v( 1) 1 is of order ε 1 when x Mε and 0 when x / Mε; v(0) 1,2 is of order 1 since µed(x,ε) is of order εd+1 and µ2ei(x,ε) is of order εd+2 for i = 1,...,d. Moreover, when x / Mε, we have µed(x,ε) = 0 and µ2ei+ed(x,ε) = 0. Hence, v(0) 1,2 = 0,v(0) 1,3 = 0 and v(0) 1,4 = 0. Similarly, v( 1) 2 is of order ε 1. Proof Recall that T(x) = r i=1 E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βiβ i λi +εd+3 . (E.3) To show the proof, we evaluate the terms in T(x) one by one. Based on Lemma 31, the first d eigenvalues are λi = P(x)µ2ei(x,ε) + λ (1) i (x,ε) + O(εd+4), where i = 1,...,d, and the corresponding eigenvectors are βi = X1(x)J p,dei 0(p d) 1 + X1(x)S11(x,ε)J p,dei +O(ε2) O(ε) where X1(x) O(d). For i = d +1,...,r, λi = O(εd+4), and the corresponding eigenvectors are βi = 0d 1 Jp d,r d X2(x)J p,r dei + X1(x)S12(x,ε)J p,r dei +O(ε2) O(ε) WHEN LOCALLY LINEAR EMBEDDING HITS BOUNDARY where X2(x) O(r d). By Lemma 30, we have E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] = [[v1,v2]], (E.6) v1 =P(x)µed(x,ε)J p,ded + d i=1 i P(x)µ2ei(x,ε)J p,dei +O(εd+3), (E.7) d i=1 Nii(x)µ2ei(x,ε)+O(εd+3). Next, we calculate E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βi , for i = 1,...,d. Note that the normal component of βi is of order ε and the normal component of E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] is of order εd+2, so they will only contribute in the O(εd+3) term. Therefore, for i = 1,...,d, the first two order terms of E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βi are E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βi = P(x)µed(x,ε) e d Jp,d X1(x)J p,dei + P(x)µed(x,ε) e d Jp,d X1(x)S11(x,ε)J p,dei Pj(x)µ2ej(x,ε) e j Jp,d X1(x)J p,dei +O(εd+3). By putting the above expressions together, a direct calculation shows that the normal component E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βiβ i λi+εd+3 is of order 1 and the tangent component of E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βiβ i λi+εd+3 is of order ε 1: P(x)µed(x,ε) d i=1 (e d Jp,d X1(x)J p,dei)X1(x)J p,dei λi +εd+3 (E.8) d j=1( Pj(x)µ2ej(x,ε))(e j Jp,d X1(x)J p,dei)X1(x)J p,dei λi +εd+3 +P(x)µed(x,ε) d i=1 (e d Jp,d X1(x)S11(x,ε)J p,dei)X1(x)J p,dei λi +εd+3 +P(x)µed(x,ε) d i=1 (e d Jp,d X1(x)J p,dei)X1(x)S11(x,ε)J p,dei λi +εd+3 +O(ε), where the first term is of order ε 1, the second to the fourth terms are of order 1 since µed(x,ε) is of order εd+1, µ2ei(x,ε) is of order εd+2 for i = 1,...,d, S11(x,ε) is of order ε and λi is of order εd+2 for i = 1,...,d. Note that above formula involves λi and S11(x,ε). 
We are going to express those terms by µ2ei and µ2ei+ed. In the following paragraph, we prepare some necessary ingredients to simplify the formula of tangent component of d i=1 E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βiβ i λi+εd+3 . Recall that from X(1) 1 (x) 0 0 0 X(2) 1 (x) 0 0 0 ... 0 0 0 X(k) 1 (x) 1 k d. Here different X(i) 1 corresponds to different µ2ei. Each X(i) 1 is an orthogonal matrix. By reordering the basis {e1, ,ed} of the tangent space Tx M, we suppose that among µ2e1(x,ε), ,µ2ed 1(x,ε), the first t terms of them are different from µ2ed. Define X(1) 1 (x) 0 0 0 X(2) 1 (x) 0 0 0 ... 0 0 0 X(k 1) 1 (x) Hence, we have X1(x) = X1,1(x) 0 0 X1,2(x) where X1,1(x) O(t), X1,2 O(d t), and 0 t d 1. Divide M(11)(x,ε) in (D.1) and S11(x,ε) in (D.4) corresponding to X1,1(x) 0 0 X1,2(x) M(11)(x,ε) = " M(11) 1 (x,ε) M(11) 2 (x,ε) M(11) 3 (x,ε) M(11) 4 (x,ε) , S11(x,ε) = S11,1(x,ε) S11,2(x,ε) S11,3(x,ε) S11,4(x,ε) Recall that M(11) m,n = d P(x)µ2em+ed(x,ε) 1 m = n d, n P(x)µ2en+ed(x,ε) m = d,1 n d, m P(x)µ2em+ed(x,ε) n = d,1 m d, 0 otherwise. By the perturbation theory (see, e.g., (Wu and Wu, 2018, Appendix A)), λ (1) t+1(x,ε), ,λ (1) d (x,ε) are the eigenvalues of M(11) 4 (x,ε) and X1,2(x) is the orthonormal eigenvector matrix of M(11) 4 (x,ε). We have S11,2(x,ε) = X 1,1(x)[P(x)µ2ed(x,ε)It t Λ] 1M(11) 2 (x,ε)X1,2(x), (E.11) P(x)µ2e1(x,ε) 0 0 ... 0 0 P(x)µ2et(x,ε) WHEN LOCALLY LINEAR EMBEDDING HITS BOUNDARY Next, we simplify the terms in equation (E.8) one by one. We start from the first one. Recall that based on the structure of X1, we have e d Jp,d X1(x)J p,dei = 0 for i = 1, ,t. Hence, P(x)µed(x,ε) d i=1 (e d Jp,d X1(x)J p,dei)X1(x)J p,dei λi +εd+3 (E.12) =P(x)µed(x,ε) d i=t+1 (e d Jp,d X1(x)J p,dei)X1(x)J p,dei P(x)µ2ed(x,ε)+λ (1) i (x,ε)+εd+3 +O(εd+4) =P(x)µed(x,ε) d i=t+1 1 P(x)µ2ed(x,ε) λ (1) i (x,ε)+εd+3 (P(x)µ2ed(x,ε))2 +O(ε d) (e d Jp,d X1(x)J p,dei)X1(x)J p,dei d i=t+1 (e d Jp,d X1(x)J p,dei)X1(x)J p,dei µed(x,ε) P(x)(µ2ed(x,ε))2 d i=t+1 λ (1) i (x,ε)(e d Jp,d X1(x)J p,dei)X1(x)J p,dei εd+3µed(x,ε) P(x)(µ2ed(x,ε))2 d i=t+1 (e d Jp,d X1(x)J p,dei)X1(x)J p,dei +O(ε). Note that we use (D.2) in the first step. Moreover, we have µed(x,ε) µ2ed(x,ε) d i=t+1 (e d Jp,d X1(x)J p,dei)X1(x)J p,dei = µed(x,ε) µ2ed(x,ε)J p,ded (E.13) εd+3µed(x,ε) P(x)(µ2ed(x,ε))2 d i=t+1 (e d Jp,d X1(x)J p,dei)X1(x)J p,dei = εd+3µed(x,ε) P(x)(µ2ed(x,ε))2 J p,ded. (E.14) By using the eigen-decomposition of M(11) 4 (x,ε), we have d i=t+1 λ (1) i (x,ε)(e d Jp,d X1(x)J p,dei)X1(x)J p,dei (E.15) = d i=t+1 λ (1) i (x,ε)(e d Jp,d X1(x)J p,dei)(e k Jp,d X1(x)J p,dei) = d i=t+1 λ (1) i (x,ε)(e d Jp,d X1(x)J p,dei)(e i Jp,d X 1 (x)J p,dek) = k P(x)µ2ek+ed(x,ε) for t +1 k d and this quantity is 0 if 1 k d. Thus, if we sum up the above terms, we have P(x)µed(x,ε) d i=1 (e d Jp,d X1(x)J p,dei)X1(x)J p,dei λi +εd+3 (E.16) µ2ed(x,ε)J p,ded εd+3µed(x,ε) P(x)(µ2ed(x,ε))2 J p,ded j P(x)µed(x,ε)µ2ej+ed(x,ε) P(x)(µ2ed(x,ε))2 J p,dej +O(ε). Next, we simplify the second term in (E.8). Recall the description of X1 (e.g. (D.22)). We have e j Jp,d X1(x)J p,dei X1(x)J p,dei µ2ei(x,ε) = 1 µ2ej(x,ε)J p,dej. (E.17) d j=1 Pj(x)µ2ej(x,ε) e j Jp,d X1(x)J p,dei X1(x)J p,dei λi +εd+3 (E.18) d j=1 Pj(x)µ2ej(x,ε) e j Jp,d X1(x)J p,dei X1(x)J p,dei P(x)µ2ei(x,ε)+λ (1) i (x,ε)+εd+3 +O(εd+4) Pj(x)µ2ej(x,ε) e j Jp,d X1(x)J p,dei X1(x)J p,dei µ2ei(x,ε) +O(ε) P(x) +O(ε). At last, we simplify the third and the last terms in (E.8) together, because we need to use the antisymmetric property of S11(x,ε). 
P(x)µed(x,ε) d i=1 (e d Jp,d X1(x)S11(x,ε)J p,dei)X1(x)J p,dei λi (E.19) +P(x)µed(x,ε) d i=1 (e d Jp,d X1(x)J p,dei)X1(x)S11(x,ε)J p,dei λi =µed(x,ε) d i=1 (e d Jp,d X1(x)S11(x,ε)J p,dei)X1(x)J p,dei µ2ei(x,ε)+λ (1) i (x,ε)/P(x)+εd+3 +O(εd+4) + (e d Jp,d X1(x)J p,dei)X1(x)S11(x,ε)J p,dei µ2ei(x,ε)+λ (1) i (x,ε)/P(x)+εd+3 +O(εd+4) where we denote v :=µed(x,ε) d i=1 (e d Jp,d X1(x)S11(x,ε)J p,dei)X1(x)J p,dei µ2ei(x,ε) (E.20) + (e d Jp,d X1(x)J p,dei)X1(x)S11(x,ε)J p,dei µ2ei(x,ε) WHEN LOCALLY LINEAR EMBEDDING HITS BOUNDARY We now simplify v. Note that, for 1 k d, =µed(x,ε) d i=1 d j=1 e d Jp,d X1(x)J p,de j e j Jp,d S11(x,ε)J p,dei µ2ei(x,ε) e k Jp,d X1(x)J p,dei + µed(x,ε) d i=1 e d Jp,d X1(x)J p,dei d j=1 e k Jp,d X1(x)J p,dej e j Jp,d S11(x,ε)J p,dei µ2ei(x,ε) =µed(x,ε) d j=1 e d Jp,d X1(x)J p,dej d i=1 e k Jp,d X1(x)J p,dei e j Jp,d S11(x,ε)J p,dei µ2ei(x,ε) + µed(x,ε) d j=1 e d Jp,d X1(x)J p,dej d i=1 e k Jp,d X1(x)J p,dei e i Jp,d S11(x,ε)J p,dej µ2ej(x,ε) =µed(x,ε) d j=1 e d Jp,d X1(x)J p,dej h d i=1 e k Jp,d X1(x)J p,dei 1 µ2ej(x,ε) 1 µ2ei(x,ε) e i Jp,d S11(x,ε)J p,dej i . In the last step, we use the fact that S11(x,ε) is antisymmetric. Based on the structure of X1(x), e d Jp,d X1(x)J p,dej = 0 for j = 1, ,t, e k Jp,d X1(x)J p,dei = 0, for k = 1, ,t and i = t + 1, ,d and e k Jp,d X1(x)J p,dei = 0 for k = t +1, ,d and i = 1, ,t. We can further simplify v as e k Jp,d v = µed(x,ε) d j=t+1 e d Jp,d X1(x)J p,dej (E.21) h t i=1 e k Jp,d X1(x)J p,dei 1 µ2ed(x,ε) 1 µ2ei(x,ε) e i Jp,d S11(x,ε)J p,dej i , for k = 1, ,t, and =µed(x,ε) d j=t+1 e d Jp,d X1(x)J p,de j (E.22) h d i=t+1 e k Jp,d X1(x)J p,dei 1 µ2ed(x,ε) 1 µ2ei(x,ε) e i Jp,d S11(x,ε)J p,de j i for k = t +1, ,d, where we use the fact that µ2et+1(x,ε) = = µ2ed(x,ε). Next, we focus on the case when 1 k t. By (E.11), for 1 i t and t +1 j d, we have e i Jp,d S11(x,ε)J p,dej = t l=1 e i Jp,d X 1 (x)J p,del (e l Jp,d M(11)(x,ε)J p,dem)(e m Jp,d X1(x)J p,dej) P(x)µ2ed(x,ε) P(x)µ2el(x,ε) . (E.23) Note by Lemma 31, if 1 l t, and t +1 m < d, then e l Jp,d M(11)(x,ε)J p,dem = 0. And e l Jp,d M(11)(x,ε)J p,ded = l P(x)µ2el+ed(x,ε). (E.24) e i Jp,d S11(x,ε)J p,dej = e j Jp,d X 1 (x)J p,ded t l=1 e i Jp,d X 1 (x)J p,del l P(x)µ2el+ed(x,ε) P(x)µ2ed(x,ε) P(x)µ2el(x,ε). (E.25) We substitute above equation into (E.21), =µed(x,ε) d j=t+1 (e d Jp,d X1(x)J p,dej)(e j Jp,d X 1 (x)J p,ded) t i=1 e k Jp,d X1(x)J p,dei 1 µ2ed(x,ε) 1 µ2ei(x,ε) t l=1 e i Jp,d X 1 (x)J p,del l P(x)µ2el+ed(x,ε) P(x)µ2ed(x,ε) P(x)µ2el(x,ε) =µed(x,ε) t i=1 e k Jp,d X1(x)J p,dei 1 µ2ed(x,ε) 1 µ2ei(x,ε) t l=1 e i Jp,d X 1 (x)J p,del l P(x)µ2el+ed(x,ε) P(x)µ2ed(x,ε) P(x)µ2el(x,ε) =µed(x,ε) t l=1 t i=1 (e k Jp,d X1(x)J p,dei)(e i Jp,d X 1 (x)J p,del) 1 µ2ed(x,ε) 1 µ2ei(x,ε) l P(x)µ2el+ed(x,ε) P(x)µ2ed(x,ε) P(x)µ2el(x,ε) =µed(x,ε) ( 1 µ2ed(x,ε) 1 µ2ek(x,ε)) t l=1 t i=1 (e k Jp,d X1(x)J p,dei)(e i Jp,d X 1 (x)J p,del) l P(x)µ2el+ed(x,ε) P(x)µ2ed(x,ε) P(x)µ2el(x,ε) where we use the fact that d j=t+1 (e d Jp,d X1(x)J p,dej)(e j Jp,d X 1 (x)J p,ded) = 1 (E.26) in the second step and the fact that (e k Jp,d X1(x)J p,dei)(e i Jp,d X 1 (x)J p,del) = 0 (E.27) only if e k Jp,d X1(x)J p,dei and e i Jp,d X 1 (x)J p,del are entries in the block X(m) 1 in (E.9) corresponding to µ2ek in the fourth step. Note that t i=1 (e k Jp,d X1(x)J p,dei)(e i Jp,d X 1 (x)J p,del) = 1, (E.28) WHEN LOCALLY LINEAR EMBEDDING HITS BOUNDARY if 1 k = l t and is 0 otherwise. Hence, we have e k Jp,d v = k P(x)µed(x,ε)µ2ek+ed(x,ε) P(x)µ2ek(x,ε)µ2ed(x,ε) , (E.29) for 1 k t, and e k Jp,d v = 0, for t +1 k d. 
If we sum up equations (E.16) (E.18) and (E.29), the tangent component of d i=1 E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βiβ i λi+εd+3 becomes µed(x,ε) µ2ed(x,ε)J p,ded + P(x) P(x) εd+3µed(x,ε) P(x)(µ2ed(x,ε))2 J p,ded (E.30) i P(x)µed(x,ε)µ2ei+ed(x,ε) P(x)(µ2ed(x,ε))2 J p,dei i P(x)µed(x,ε)µ2ei+ed(x,ε) P(x)µ2ei(x,ε)µ2ed(x,ε) J p,dei +O(ε) µ2ed(x,ε)J p,ded + P(x) P(x) εd+3µed(x,ε) P(x)(µ2ed(x,ε))2 J p,ded i P(x)µed(x,ε)µ2ei+ed(x,ε) P(x)µ2ei(x,ε)µ2ed(x,ε) J p,dei +O(ε), where we use µ2et+1(x,ε) = = µ2ed(x,ε) in the last step. We now finish calculating the tangent component of d i=1 E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βiβ i λi+εd+3 . Next, we need to calculate both the tangent and the normal component of r i=d+1 E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βiβ i λi+εd+3 Note that for i = d +1,...,r, E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βi (E.31) =P(x)µed(x,ε) e d Jp,d X1(x)S12(x,ε)J p,r dei d j=1 µ2ej(x,ε)N j j(x)Jp d,r d X2(x)J p,r dei +O(εd+3), where both terms are of order εd+2. Since λi = O(εd+4), εd+3 dominates the eigenvalues. For i = d +1,...,r, we have E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βi λi +εd+3 = E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βi εd+3 +O(εd+4) (E.32) =P(x)µed(x,ε) εd+3 e d Jp,d X1(x)S12(x,ε)J p,r dei εd+3 N j j(x)Jp d,r d X2(x)J p,r dei +O(1). Similarly, we need to express the above formula in terms of µ2ei(x,ε) and µ2ei+ed(x,ε). The simplification here mainly relies on the perturbation formula equations (D.20) and (D.21) which relates S12(x,ε) with the second fundamental form of the manifold at x. First of all, a direct calculation shows that E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βiβ i λi +εd+3 E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βiβ i εd+3 +O(εd+4) = hh r i=d+1 P(x)µed(x,ε) εd+3 e d Jp,d X1(x)S12(x,ε)J p,r dei X1(x)S12(x,ε)J p,r dei εd+3 N j j(x)Jp d,r d X2(x)J p,r dei X1(x)S12(x,ε)J p,r dei +O(ε), P(x)µed(x,ε) εd+3 e d Jp,d X1(x)S12(x,ε)J p,r dei Jp d,r d X2(x)J p,r dei εd+3 N j j(x)Jp d,r d X2(x)J p,r dei Jp d,r d X2(x)J p,r dei +O(1) ii = hh r i=d+1 h µ2e j(x,ε) µed(x,ε)µ2e j+ed(x,ε) N j j(x) i Jp d,r d X2(x)J p,r dei X1(x)S12(x,ε)J p,r dei +O(ε), h µ2ej(x,ε) µed(x,ε)µ2e j+ed(x,ε) Jp d,r d X2(x)J p,r dei Jp d,r d X2(x)J p,r dei +O(1) ii , where we use (D.21) in the last step. To simplify the tangent and normal components of E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βiβ i λi+εd+3 , we need the following formula. Suppose v Rr d, G Rd (r d) with e i Jp,d G = w i for i = 1, ,d. By Lemma 31, X2(x) O(r d). We can represent the inner product between v and wi in orthonormal basis formed by the column vectors of X2(x). v X2(x)J p,r dei GX2(x)J p,r dei = d i=1 v wi J p,dei. (E.33) WHEN LOCALLY LINEAR EMBEDDING HITS BOUNDARY By (D.20), the tangent component of r i=d+1 E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] βiβ i λi+εd+3 is h µ2e j(x,ε) µed(x,ε)µ2e j+ed(x,ε) N j j(x) i Jp d,r d X2(x)J p,r dei X1(x)S12(x,ε)J p,r dei h µed(x,ε)µ2ej+ed(x,ε) µ2ed(x,ε) µ2ej(x,ε) µ2ei+ed(x,ε) µ2ei(x,ε) N j j(x) i Jp d,r d J p d,r d Nid(x)J p,dei µed(x,ε)µ2e j+ed(x,ε) µ2ed(x,ε) µ2ej(x,ε) µ2ei+ed(x,ε) µ2ed(x,ε) N j j(x) Jp d,r d J p d,r d Nii(x)J p,ded µed(x,ε)µ2ej+ed(x,ε) µ2ed(x,ε) µ2e j(x,ε) µ2ei+ed(x,ε) µ2ei(x,ε) N j j(x) Nid(x)J p,dei µed(x,ε)µ2e j+ed(x,ε) µ2ed(x,ε) µ2ej(x,ε) µ2ei+ed(x,ε) µ2ed(x,ε) N j j(x) Nii(x)J p,ded, where in the first step we apply equations (D.20), (D.21) and (E.33). In the last step, e m II(ei,ej) = 0 for m = r +1, , p, and i, j = 1, ,d. Hence, N j j(x)Jp d,r d J p d,r d Nii(x) = N j j(x)Nii(x). (E.34) By Lemma 31, we have X2(x) O(r d), and e m II(ej,ej) = 0 for m = r + 1, , p, and j = 1, ,d. 
Hence, (E.33) implies that
$$\sum_{i=d+1}^{r}\big[N_{jj}(x)^\top J_{p-d,r-d}X_2(x)J_{p,r-d}^\top e_i\big]\,J_{p-d,r-d}X_2(x)J_{p,r-d}^\top e_i = N_{jj}. \qquad \text{(E.35)}$$
We use it to simplify the normal component of $\sum_{i=d+1}^{r}\mathbb{E}[(\iota(X)-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]^\top\frac{\beta_i\beta_i^\top}{\lambda_i+\varepsilon^{d+3}}$. Collecting the tangential and normal components computed above, we obtain
$$\sum_{i=d+1}^{r}\mathbb{E}[(\iota(X)-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]^\top\frac{\beta_i\beta_i^\top}{\lambda_i+\varepsilon^{d+3}} = \big[\big[\,v^{(0)}_{1,4}+O(\varepsilon),\ v^{(-1)}_2+O(1)\,\big]\big],$$
with $v^{(0)}_{1,4}$ and $v^{(-1)}_2$ as defined in the statement of Lemma 32. By summing this up with $\sum_{i=1}^{d}\mathbb{E}[(\iota(X)-\iota(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]^\top\frac{\beta_i\beta_i^\top}{\lambda_i+\varepsilon^{d+3}}$, we have the conclusion.

Appendix F. Bias analysis on the kernel of LLE and the associated integral operator

F.1 Proof of Proposition 10

1. When $x \in M\setminus M_\varepsilon$, $\mu_{e_d} = 0$. By Lemma 32, $T(x) = [[O(1),\,O(\varepsilon^{-1})]]$. If $\iota(y) \in B^{\mathbb{R}^p}_\varepsilon(\iota(x))$, then $\iota(y)-\iota(x) = [[O(\varepsilon),\,O(\varepsilon^2)]]$. So $(\iota(y)-\iota(x))^\top T(x) = O(\varepsilon)$ and $K_\varepsilon(x,y) = 1-O(\varepsilon) > 0$ when $\varepsilon$ is small enough.

2. When $x \in M_\varepsilon$, $T(x) = [[\frac{\mu_{e_d}(x,\varepsilon)}{\mu_{2e_d}(x,\varepsilon)}J_{p,d}^\top e_d+O(1),\,O(\varepsilon^{-1})]]$ and
$$\iota(y)-\iota(x) = \Big[\Big[\sum_{i=1}^{d}u_ie_i+O(\|u\|^3),\ O(\|u\|^2)\Big]\Big] = \Big[\Big[\sum_{i=1}^{d}u_ie_i+O(\varepsilon^3),\ O(\varepsilon^2)\Big]\Big]. \qquad \text{(F.1)}$$
Therefore, by Corollary 28,
$$K_\varepsilon(x,y) = 1-\frac{\sigma_{1,d}(\tilde{\varepsilon}_x)\,u_d}{\sigma_{2,d}(\tilde{\varepsilon}_x)\,\varepsilon}+O(\varepsilon). \qquad \text{(F.2)}$$
By definition, $\frac{\sigma_{1,d}(\tilde{\varepsilon}_x)}{\sigma_{2,d}(\tilde{\varepsilon}_x)} > 0$ and it is a decreasing function of $\tilde{\varepsilon}_x$. Therefore, to discuss the infimum of $K_\varepsilon(x,y)$, it is sufficient to consider the case when $x \in \partial M$, i.e., when $\tilde{\varepsilon}_x = 0$. If $\tilde{\varepsilon}_x = 0$, then
$$K_\varepsilon(x,y) = 1-\Big[\frac{2d(d+2)|S^{d-2}|}{(d^2-1)|S^{d-1}|\,\varepsilon}+O(1)\Big]u_d+O(\varepsilon). \qquad \text{(F.3)}$$
Hence, let $u_d^* = \sup u_d$, where the supremum is taken over $x \in \partial M$ and $\iota(y) \in B^{\mathbb{R}^p}_\varepsilon(\iota(x))$; then, if $\varepsilon$ is small enough,
$$\inf_{x,y}K_\varepsilon(x,y) = 1-\Big[\frac{2d(d+2)|S^{d-2}|}{(d^2-1)|S^{d-1}|\,\varepsilon}+O(1)\Big]u_d^*+O(\varepsilon). \qquad \text{(F.4)}$$
Obviously, $u_d^* = \varepsilon+O(\varepsilon^2)$. Therefore, $\inf_{x,y}K_\varepsilon(x,y) = 1-\frac{2d(d+2)|S^{d-2}|}{(d^2-1)|S^{d-1}|}+O(\varepsilon)$. It is worth noting that $\frac{2d(d+2)|S^{d-2}|}{(d^2-1)|S^{d-1}|} > 1$ by Lemma 29.

3. By Corollary 25 and part (1),
$$\int_{\tilde{D}_\varepsilon(x)}\Big(1-\frac{\mu_{e_d}(x,\varepsilon)\,u_d}{\mu_{2e_d}(x,\varepsilon)}+O(\varepsilon)\Big)\big(P(x)+O(\|u\|)\big)\big(1+O(\|u\|^2)\big)du = \int_{D_\varepsilon(x)}\Big(1-\frac{\mu_{e_d}(x,\varepsilon)\,u_d}{\mu_{2e_d}(x,\varepsilon)}+O(\varepsilon)\Big)\big(P(x)+O(\|u\|)\big)\big(1+O(\|u\|^2)\big)du+O(\varepsilon^{d+2}) = P(x)\int_{D_\varepsilon(x)}\Big(1-\frac{\sigma_{1,d}(\tilde{\varepsilon}_x)\,u_d}{\sigma_{2,d}(\tilde{\varepsilon}_x)\,\varepsilon}\Big)du+O(\varepsilon^{d+1}).$$
Since $\frac{\sigma_{1,d}(\tilde{\varepsilon}_x)}{\sigma_{2,d}(\tilde{\varepsilon}_x)} > 0$ and it is a decreasing function of $\tilde{\varepsilon}_x$, it suffices to show that if $x \in \partial M$, then $\int_{D_\varepsilon(x)}\big(1-\frac{\mu_{e_d}(x,\varepsilon)\,u_d}{\mu_{2e_d}(x,\varepsilon)}\big)du \ge C(d)\,\varepsilon^d$ for some constant $C(d) > 0$. If $x \in \partial M$, then $1-\frac{\sigma_{1,d}(\tilde{\varepsilon}_x)\,u_d}{\sigma_{2,d}(\tilde{\varepsilon}_x)\,\varepsilon} = 1-\frac{2d(d+2)|S^{d-2}|\,u_d}{(d^2-1)|S^{d-1}|\,\varepsilon}$, and
$$\int_{D_\varepsilon(x)}\Big(1-\frac{\sigma_{1,d}(\tilde{\varepsilon}_x)\,u_d}{\sigma_{2,d}(\tilde{\varepsilon}_x)\,\varepsilon}\Big)du = \frac{|S^{d-2}|}{d-1}\,\varepsilon^d\int_{0}^{1}\Big[1-\frac{2d(d+2)|S^{d-2}|\,a}{(d^2-1)|S^{d-1}|}\Big](1-a^2)^{\frac{d-1}{2}}da = \varepsilon^d\Big[\frac{|S^{d-1}|}{2d}-\frac{2d(d+2)|S^{d-2}|^2}{(d^2-1)^2|S^{d-1}|}\Big].$$
We have thus finished the proof, since $\frac{|S^{d-1}|}{2d}-\frac{2d(d+2)|S^{d-2}|^2}{(d^2-1)^2|S^{d-1}|} > 0$ for any $d$, which follows from Lemma 29.

F.2 Proof of Proposition 13

When $d = 1$, the differentiability follows from a direct calculation. For $d > 1$, the differentiability follows from the fundamental theorem of calculus. The rest of the statements follow directly from the definition of the $\sigma$ functions, except $\phi_1(\tilde{\varepsilon}_x) > 0$, and $\phi_2(\tilde{\varepsilon}_x) < 0$ when $\tilde{\varepsilon}_x = 0$. We now prove $\phi_1(\tilde{\varepsilon}_x) > 0$. When $\tilde{\varepsilon}_x = 0$,
$$\sigma_{2,d}(0)\sigma_2(0)-\sigma_3(0)\sigma_{1,d}(0) = \frac{|S^{d-1}|^2}{4d^2(d+2)^2}-\frac{|S^{d-2}|^2}{(d^2-1)^2(d+3)}, \qquad \text{(F.5)}$$
which is positive since we have proved $\frac{|S^{d-2}|^2}{|S^{d-1}|^2} < \frac{(d^2-1)^2}{4d^2(d+2)}$ in Lemma 29. Note that $\sigma_{2,d}(\tilde{\varepsilon}_x)\sigma_2(\tilde{\varepsilon}_x)-\sigma_3(\tilde{\varepsilon}_x)\sigma_{1,d}(\tilde{\varepsilon}_x)$ is increasing when $0 \le \tilde{\varepsilon}_x \le \varepsilon$. Hence, $\sigma_{2,d}(\tilde{\varepsilon}_x)\sigma_2(\tilde{\varepsilon}_x)-\sigma_3(\tilde{\varepsilon}_x)\sigma_{1,d}(\tilde{\varepsilon}_x) > 0$. Similarly, we can show that $\sigma_{2,d}(\tilde{\varepsilon}_x)\sigma_0(\tilde{\varepsilon}_x)-\sigma_{1,d}^2(\tilde{\varepsilon}_x) > 0$. Therefore, we conclude that $\phi_1(\tilde{\varepsilon}_x) > 0$. Next, we study $\phi_2$. To prove $\phi_2(\tilde{\varepsilon}_x) < 0$ when $\tilde{\varepsilon}_x = 0$, it suffices to show $\sigma_{2,d}^2(0)-\sigma_{3,d}(0)\sigma_{1,d}(0) < 0$, since we have shown $\sigma_{2,d}(\tilde{\varepsilon}_x)\sigma_0(\tilde{\varepsilon}_x)-\sigma_{1,d}^2(\tilde{\varepsilon}_x) > 0$ above.
When εx = 0, we have σ2 2,d(0) σ3,d(0)σ1,d(0) = |Sd 1|2 4d2(d +2)2 2|Sd 2|2 (d2 1)2(d +3), (F.6) which is negative due to |Sd 2|2 |Sd 1|2 > (d2 1)2(d+3) 8d2(d+2)2 proved in Lemma 29. We now check that σ2 2,d( εx) σ3,d( εx)σ1,d( εx) > 0 when εx = ε. Since σ2 2,d( εx) σ3,d( εx)σ1,d( εx) is an increasing continuous function of εx, there is a unique t = t (x) (0, εx) such that σ2 2,d( εx) σ3,d( εx)σ1,d( εx) = 0, and hence φ2(t ) = 0. We thus have h |Sd 1| 2d(d +2) + |Sd 2| 0 (1 z2) d 1 2 z2dz i2 (F.7) (d2 1)2(d +3) Since t does not depend on x, the set S is diffeomorphic to M when ε is sufficiently small. Since 0 < t ε < 1, (F.7) becomes h |Sd 1| 2d(d +2) + |Sd 2| 0 (1 z2) d 1 2 zdz i2 > 2|Sd 2|2 (d2 1)2(d +3) ε 2 d+1 , (F.8) which is equivalent to |Sd 1| 2d(d +2) + |Sd 2| 2 d +3 |Sd 2| d2 1 If we isolate t in the above equation, we have the lower bound for t : 1 1+ (d2 1)|Sd 1| 2d(d+2)|Sd 2| 2 ε < t . (F.10) Note that by Lemma 29, (d2 1)|Sd 1| 2d(d+2)|Sd 2| < q 2 d+3, so 1 1+ (d2 1)|Sd 1| 2d(d+2)|Sd 2| Next, we find the upper bound of t . Since t ε < 1, by (F.7), we have, h |Sd 1| 2d(d +2) + |Sd 2| 0 (1 z2) d 1 2 z3dx i2 (F.11) (d2 1)2(d +3) which is equivalent to |Sd 1| 2d(d +2) + |Sd 2| (d2 1)(d +3) 2 2+(d +1) t < |Sd 2| (d2 1) If we isolate t in the above equation, we have the lower bound, 1 (d2 1)|Sd 1| 4d(d +2)|Sd 2| + 1 d +3 2 ε. (F.13) By Lemma 29 and the upper bound, t 0 as d . WHEN LOCALLY LINEAR EMBEDDING HITS BOUNDARY F.3 Proof of Theorem 15 In this proof, we caluclate the first two order terms in Rε f(x). First, we are going to calculate E[χBRp ε (ι(x))(ι(X))] E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] T(x) and show that it is dominated by the order εd terms. Then we are going to calculate E[(f(X) f(x))χBRp ε (ι(x))(ι(X))] E[(ι(X) ι(x))(f(X) f(x))χBRp ε (ι(x))(ι(X))] T(x) and show that it is dominated by the order εd+2 terms. Hence their ratio is dominated by the order ε2 terms. By Lemma 30 and Lemma 32, we have E[χBRp ε (ι(x))(ι(X))] = P(x)µ0(x,ε)+O(εd+1), (F.14) E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] = [[P(x)µed(x,ε)J p,ded +O(εd+2),O(εd+2)]], and T(x) = [[v( 1) 1 +v(0) 1,1 +v(0) 1,2 +v(0) 1,3 +v(0) 1,4,v( 1) 2 ]]+[[O(ε),O(1)]], where v( 1) 1 = µed(x,ε) µ2ed(x,ε)J p,ded, v(0) 1,1 = P(x) P(x) , (F.15) and v(0) 1,2, v(0) 1,3, v(0) 1,4 and v( 1) 2 are defined in Lemma 32. Moreover, v(0) 1,2, v(0) 1,3 and v(0) 1,4 are of order 1 and v( 1) 2 is of order ε 1. Hence, E[χBRp ε (ι(x))(ι(X))] E[(ι(X) ι(x))χBRp ε (ι(x))(ι(X))] T(x) (F.16) =P(x) h µ0(x,ε) µed(x,ε)2 i +O(εd+1), =P(x) hµ0(x,ε)µ2ed(x,ε) µed(x,ε)2 i +O(εd+1), where the leading term in above expression is of order εd by Lemma 27. Based on Lemma 30, we have E[(f(X) f(x))χBRp ε (ι(x))(ι(X))] (F.17) =P(x) d f(x)µed(x,ε)+ d i=1 2 2 ii f(x)+ i f(x) i P(x) µ2ei(x,ε)+O(εd+3), and E[(ι(X) ι(x))(f(X) f(x))χBRp ε (ι(x))(ι(X))] = [[v1,v2]], (F.18) v1 =P(x) d i=1 i f(x)µ2ei(x,ε) J p,dei i f(x) d P(x)+ d f(x) i P(x)+P(x) 2 id f(x) µ2ei+ed(x,ε)J p,dei i f(x) i P(x)+ P(x) 2 2 ii f(x) µ2ei+ed(x,ε) J p,ded +O(εd+4), v2 =P(x) d 1 i=1 i f(x)Nid(x)µ2ei+ed(x,ε)+ P(x) 2 d f(x) d i=1 Nii(x)µ2ei+ed(x,ε)+O(εd+4). Therefore, we have E[(ι(X) ι(x))(f(X) f(x))χBRp ε (ι(x))(ι(X))] T(x) (F.19) =P(x) d i=1 i f(x)µ2ei(x,ε) v( 1) 1 J p,dei +P(x) d i=1 i f(x)µ2ei(x,ε) v(0) 1,1 J p,dei +P(x) d i=1 i f(x)µ2ei(x,ε) [v(0) 1,2 +v(0) 1,3 +v(0) 1,4] J p,dei i f(x) d P(x)+ d f(x) i P(x)+P(x) 2 id f(x) µ2ei+ed(x,ε)v( 1) 1 J p,dei i f(x) i P(x)+ P(x) 2 2 ii f(x) µ2ei+ed(x,ε)v( 1) 1 J p,ded +P(x) d 1 i=1 i f(x)µ2ei+ed(x,ε)v( 1) 2 Nid(x) 2 d f(x) d i=1 µ2ei+ed(x,ε)v( 1) 2 Nii(x)+O(εd+3). 
Note that by Lemma 27, the first term is of order $\varepsilon^{d+1}$ and the second to seventh terms are of order $\varepsilon^{d+2}$. Furthermore, we can simplify the first and the second terms as
\[
P(x)\sum_{i=1}^d \partial_i f(x)\mu_{2e_i}(x,\varepsilon)\,(v^{(-1)}_1)^\top J_{p,d}e_i = P(x)\partial_d f(x)\mu_{e_d}(x,\varepsilon), \tag{F.20}
\]
\[
P(x)\sum_{i=1}^d \partial_i f(x)\mu_{2e_i}(x,\varepsilon)\,(v^{(0)}_{1,1})^\top J_{p,d}e_i = \sum_{i=1}^d \partial_i f(x)\partial_i P(x)\mu_{2e_i}(x,\varepsilon).
\]
Next we calculate $\mathbb{E}[(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]-\mathbb{E}[(\iota(X)-\iota(x))(f(X)-f(x))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x))}(\iota(X))]^\top T(x)$. Clearly, the common terms $P(x)\partial_d f(x)\mu_{e_d}(x,\varepsilon)$ and $\sum_{i=1}^d \partial_i f(x)\partial_i P(x)\mu_{2e_i}(x,\varepsilon)$ are canceled, and hence only terms of order $\varepsilon^{d+2}$ are left in the difference; that is, we have
\[
\mathbb{E}[(f(X)-f(x))\chi]-\mathbb{E}[(\iota(X)-\iota(x))(f(X)-f(x))\chi]^\top T(x)
\]
\[
= \frac{P(x)}{2}\sum_{i=1}^d \partial^2_{ii}f(x)\mu_{2e_i}(x,\varepsilon)-P(x)\sum_{i=1}^d \partial_i f(x)\mu_{2e_i}(x,\varepsilon)\,[v^{(0)}_{1,2}+v^{(0)}_{1,3}+v^{(0)}_{1,4}]^\top J_{p,d}e_i
\]
\[
\quad-\sum_{i=1}^{d-1}\Big(\partial_i f(x)\partial_d P(x)+\partial_d f(x)\partial_i P(x)+P(x)\partial^2_{id}f(x)\Big)\mu_{2e_i+e_d}(x,\varepsilon)\,(v^{(-1)}_1)^\top J_{p,d}e_i
\]
\[
\quad-\sum_{i=1}^{d}\Big(\partial_i f(x)\partial_i P(x)+\frac{P(x)}{2}\partial^2_{ii}f(x)\Big)\mu_{2e_i+e_d}(x,\varepsilon)\,(v^{(-1)}_1)^\top J_{p,d}e_d
\]
\[
\quad-P(x)\sum_{i=1}^{d-1}\partial_i f(x)\mu_{2e_i+e_d}(x,\varepsilon)\,(v^{(-1)}_2)^\top N_{id}(x)-\frac{P(x)}{2}\partial_d f(x)\sum_{i=1}^{d}\mu_{2e_i+e_d}(x,\varepsilon)\,(v^{(-1)}_2)^\top N_{ii}(x)+O(\varepsilon^{d+3}).
\]
Next, we simplify the above expression. Note that $(v^{(-1)}_1)^\top J_{p,d}e_i = \frac{\mu_{e_d}(x,\varepsilon)}{\mu_{2e_d}(x,\varepsilon)}$ if $i = d$, and it is $0$ otherwise. Hence
\[
\sum_{i=1}^{d-1}\Big(\partial_i f(x)\partial_d P(x)+\partial_d f(x)\partial_i P(x)+P(x)\partial^2_{id}f(x)\Big)\mu_{2e_i+e_d}(x,\varepsilon)\,(v^{(-1)}_1)^\top J_{p,d}e_i = 0,
\]
and by the definition of $v^{(0)}_{1,3}$ and $v^{(-1)}_1$, we have
\[
P(x)\mu_{2e_i}(x,\varepsilon)\,(v^{(0)}_{1,3})^\top J_{p,d}e_i+\partial_i P(x)\mu_{2e_i+e_d}(x,\varepsilon)\,(v^{(-1)}_1)^\top J_{p,d}e_d = 0. \tag{F.21}
\]
For $i = 1,\ldots,d-1$, by the definition of $v^{(0)}_{1,4}$ and $v^{(-1)}_2$, we have
\[
P(x)\mu_{2e_i}(x,\varepsilon)\,(v^{(0)}_{1,4})^\top J_{p,d}e_i+P(x)\mu_{2e_i+e_d}(x,\varepsilon)\,(v^{(-1)}_2)^\top N_{id}(x) = 0 \tag{F.22}
\]
and
\[
P(x)\mu_{2e_d}(x,\varepsilon)\,(v^{(0)}_{1,4})^\top J_{p,d}e_d+\frac{P(x)}{2}\sum_{i=1}^{d}\mu_{2e_i+e_d}(x,\varepsilon)\,(v^{(-1)}_2)^\top N_{ii}(x) = 0. \tag{F.23}
\]
Moreover, $(v^{(0)}_{1,2})^\top J_{p,d}e_i = -\frac{\mu_{e_d}(x,\varepsilon)\varepsilon^{d+3}}{P(x)(\mu_{2e_d}(x,\varepsilon))^2}$ if $i = d$, and it is $0$ otherwise. Therefore,
\[
\mathbb{E}[(f(X)-f(x))\chi]-\mathbb{E}[(\iota(X)-\iota(x))(f(X)-f(x))\chi]^\top T(x)
\]
\[
= \frac{P(x)}{2}\sum_{i=1}^d \partial^2_{ii}f(x)\Big(\mu_{2e_i}(x,\varepsilon)-\mu_{2e_i+e_d}(x,\varepsilon)\frac{\mu_{e_d}(x,\varepsilon)}{\mu_{2e_d}(x,\varepsilon)}\Big)+\partial_d f(x)\frac{\mu_{e_d}(x,\varepsilon)\varepsilon^{d+3}}{\mu_{2e_d}(x,\varepsilon)}.
\]
Therefore, the ratio satisfies
\[
\frac{\mathbb{E}[(f(X)-f(x))\chi]-\mathbb{E}[(\iota(X)-\iota(x))(f(X)-f(x))\chi]^\top T(x)}{\mathbb{E}[\chi]-\mathbb{E}[(\iota(X)-\iota(x))\chi]^\top T(x)} \tag{F.24}
\]
\[
= \sum_{i=1}^d \partial^2_{ii}f(x)\,\frac{\mu_{2e_i}(x,\varepsilon)\mu_{2e_d}(x,\varepsilon)-\mu_{2e_i+e_d}(x,\varepsilon)\mu_{e_d}(x,\varepsilon)}{2\big(\mu_0(x,\varepsilon)\mu_{2e_d}(x,\varepsilon)-\mu_{e_d}(x,\varepsilon)^2\big)}+\partial_d f(x)\,\frac{\mu_{e_d}(x,\varepsilon)\varepsilon^{d+3}}{P(x)\big(\mu_0(x,\varepsilon)\mu_{2e_d}(x,\varepsilon)-\mu_{e_d}(x,\varepsilon)^2\big)}+O(\varepsilon^3).
\]
The conclusion follows by substituting the moment expansions from Corollary 28.

Appendix G. Variance analysis on LLE

For simplicity of notation, for each $x_k$, denote $\mathbf{f} := (f(x_{k,1}), f(x_{k,2}),\ldots, f(x_{k,N}))^\top \in \mathbb{R}^N$. By a direct expansion of equations (2.5), (2.6), (2.8) and $c = n\varepsilon^{d+3}$, we have
\[
\sum_{j=1}^n [W-I_{n\times n}]_{kj}\, f(x_j) = \frac{\frac{1}{N}\mathbf{1}_N^\top\mathbf{f}-\big[\frac{1}{N}G_{n,k}\mathbf{1}_N\big]^\top U_n\bar I_{p,r_n}(\Lambda_n+n\varepsilon^{d+3}I_{p\times p})^{-1}U_n^\top G_{n,k}\mathbf{f}}{1-\big[\frac{1}{N}G_{n,k}\mathbf{1}_N\big]^\top U_n\bar I_{p,r_n}(\Lambda_n+n\varepsilon^{d+3}I_{p\times p})^{-1}U_n^\top G_{n,k}\mathbf{1}_N}-f(x_k), \tag{G.1}
\]
which can be rewritten as $g_{n,1}/g_{n,2}$, where
\[
g_{n,1} := \frac{1}{N}\sum_{j=1}^N\big(f(x_{k,j})-f(x_k)\big)-\Big[\frac{1}{N}\sum_{j=1}^N(x_{k,j}-x_k)\Big]^\top U_n\bar I_{p,r_n}\Big(\frac{\Lambda_n}{n\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U_n^\top\Big[\frac{1}{N}\sum_{j=1}^N(x_{k,j}-x_k)\big(f(x_{k,j})-f(x_k)\big)\Big],
\]
\[
g_{n,2} := 1-\Big[\frac{1}{N}\sum_{j=1}^N(x_{k,j}-x_k)\Big]^\top U_n\bar I_{p,r_n}\Big(\frac{\Lambda_n}{n\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U_n^\top\Big[\frac{1}{N}\sum_{j=1}^N(x_{k,j}-x_k)\Big].
\]
The goal is to relate the finite-sum quantity $g_{n,1}/g_{n,2}$ to
\[
Q_\varepsilon f(x_k) := \frac{g_1}{g_2}, \tag{G.2}
\]
where
\[
g_1 := \mathbb{E}\Big[\tfrac{1}{\varepsilon^d}\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x_k))}(\iota(X))\big(f(X)-f(x_k)\big)\Big]-\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x_k))}(\iota(X))\Big]^\top U\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U^\top\,\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x_k))}(\iota(X))\big(f(X)-f(x_k)\big)\Big],
\]
\[
g_2 := \mathbb{E}\Big[\tfrac{1}{\varepsilon^d}\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x_k))}(\iota(X))\Big]-\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x_k))}(\iota(X))\Big]^\top U\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U^\top\,\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(\iota(x_k))}(\iota(X))\Big]. \tag{G.3}
\]
We now control the size of the fluctuation of the following four terms
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^N 1, \tag{G.4}
\]
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^N\big(f(x_{k,j})-f(x_k)\big), \tag{G.5}
\]
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^N(x_{k,j}-x_k), \tag{G.6}
\]
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^N(x_{k,j}-x_k)\big(f(x_{k,j})-f(x_k)\big) \tag{G.7}
\]
as functions of $n$ and $\varepsilon$ by a Bernstein-type inequality.
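Before stating the concentration lemmas, the following minimal sketch, which is our own illustration and not part of the analysis, shows what the four empirical moments (G.4)-(G.7) look like in the simplest setting $d = p = 1$, $M = [-1,1]$ with uniform $P$; the function names are hypothetical. It also exhibits the boundary effect: (G.5) and (G.6) are $O(\varepsilon^2)$ at an interior point but $O(\varepsilon)$ within distance $\varepsilon$ of the boundary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps, d = 100_000, 0.05, 1            # d: intrinsic dimension; M = [-1, 1] in R^1
x = rng.uniform(-1.0, 1.0, n)           # i.i.d. uniform samples on the manifold
f = lambda t: np.sin(np.pi * t)

def local_moments(xk):
    """Empirical versions of (G.4)-(G.7) at a base point xk."""
    nb = x[np.abs(x - xk) < eps]        # neighbors in the eps-ball around xk
    scale = 1.0 / (n * eps**d)          # the eps^{-d} kernel normalization
    m0 = scale * nb.size                               # (G.4)
    m1 = scale * np.sum(f(nb) - f(xk))                 # (G.5)
    m2 = scale * np.sum(nb - xk)                       # (G.6)
    m3 = scale * np.sum((nb - xk) * (f(nb) - f(xk)))   # (G.7)
    return m0, m1, m2, m3

print("interior:", local_moments(0.0))              # (G.5), (G.6) are O(eps^2)
print("boundary:", local_moments(-1.0 + eps / 2))   # (G.5), (G.6) are O(eps)
```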
In (G.4)-(G.7), we put $\varepsilon^{-d}$ in front of each term to normalize the kernel so that the computation is consistent with the existing literature, like (Cheng and Wu, 2013; Singer and Wu, 2017). The sizes of the fluctuations of these terms are controlled in the following lemmas. The term (G.4) is the usual kernel density estimation, so we have the following lemma.

Lemma 33 Suppose $\varepsilon = \varepsilon(n)$ so that $\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}} \to 0$ and $\varepsilon \to 0$ as $n \to \infty$. We have with probability greater than $1-n^{-2}$ that for all $k = 1,\ldots,n$,
\[
\Big|\frac{1}{n\varepsilon^d}\sum_{j=1}^N 1-\mathbb{E}\Big[\frac{1}{\varepsilon^d}\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(\iota(X))\Big]\Big| = O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2}}\Big).
\]
Denote $\Omega_0$ to be the event space on which the above lemma is satisfied. The behavior of (G.5) is summarized in the following lemma.

Lemma 34 Suppose $\varepsilon = \varepsilon(n)$ so that $\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}} \to 0$ and $\varepsilon \to 0$ as $n \to \infty$. We have with probability greater than $1-n^{-2}$ that for all $k = 1,\ldots,n$,
\[
\Big|\frac{1}{n\varepsilon^d}\sum_{j=1}^N\big(f(x_{k,j})-f(x_k)\big)-\mathbb{E}\Big[\frac{1}{\varepsilon^d}(f(X)-f(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(\iota(X))\Big]\Big| = O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big).
\]
Proof By denoting
\[
F_{1,j} := \frac{1}{\varepsilon^d}\big(f(x_j)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(x_j), \tag{G.8}
\]
we have
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^N\big(f(x_{k,j})-f(x_k)\big) = \frac{1}{n}\sum_{j\neq k,\,j=1}^n F_{1,j}. \tag{G.9}
\]
Define a random variable
\[
F_1 := \frac{1}{\varepsilon^d}\big(f(X)-f(x_k)\big)\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(\iota(X)). \tag{G.10}
\]
Clearly, when $j \neq k$, $F_{1,j}$ can be viewed as randomly sampled i.i.d. from $F_1$. Note that we have $\frac{1}{n}\sum_{j\neq k} F_{1,j} = \frac{n-1}{n}\cdot\frac{1}{n-1}\sum_{j\neq k} F_{1,j}$; since $\frac{n-1}{n}\to 1$ as $n\to\infty$, the error incurred by replacing $\frac{1}{n}$ by $\frac{1}{n-1}$ is of order $\frac{1}{n}$, which is negligible asymptotically, so we can simply focus on analyzing $\frac{1}{n-1}\sum_{j\neq k}F_{1,j}$. We have by Lemma 27 and Lemma 30,
\[
\mathbb{E}[F_1] = O(\varepsilon)\ \text{if } x_k\in M_\varepsilon,\qquad \mathbb{E}[F_1] = O(\varepsilon^2)\ \text{if } x_k\notin M_\varepsilon, \tag{G.12}
\]
\[
\mathbb{E}[F_1^2] = \sum_{i=1}^d P(x_k)\big(\partial_i f(x_k)\big)^2\mu_{2e_i}(x_k,\varepsilon)\varepsilon^{-2d}+O(\varepsilon^{-d+3}). \tag{G.13}
\]
By Lemma 27, $\frac{|S^{d-1}|}{2d(d+2)}\varepsilon^{-d+2}+O(\varepsilon^{-d+3}) \le \mu_{2e_i}(x_k,\varepsilon)\varepsilon^{-2d} \le \frac{|S^{d-1}|}{d(d+2)}\varepsilon^{-d+2}$; therefore, in any case,
\[
\sigma_1^2 := \mathrm{Var}(F_1) \le \frac{|S^{d-1}|\,\|P\|_{L^\infty}\|\nabla f\|_{L^\infty}^2}{d(d+2)}\,\varepsilon^{-d+2}+O(\varepsilon^{-d+3}). \tag{G.14}
\]
With the above bounds, we can apply the large deviation theory. First, note that the random variable $F_1$ is uniformly bounded by
\[
c_1 = 2\|f\|_{L^\infty}\varepsilon^{-d}, \tag{G.15}
\]
so we apply Bernstein's inequality to provide a large deviation bound. Recall Bernstein's inequality:
\[
\mathbb{P}\Big\{\Big|\frac{1}{n-1}\sum_{j\neq k}\big(F_{1,j}-\mathbb{E}[F_1]\big)\Big| > \eta_1\Big\} \le \exp\Big(-\frac{n\eta_1^2}{2\sigma_1^2+\frac23 c_1\eta_1}\Big), \tag{G.16}
\]
where $\eta_1 > 0$. Note that $\mathbb{E}[F_1] = O(\varepsilon)$ if $x_k\in M_\varepsilon$ and $\mathbb{E}[F_1] = O(\varepsilon^2)$ if $x_k\notin M_\varepsilon$. Hence, we assume $\eta_1 = O(\varepsilon^{2+s})$, where $s > 0$. Then $c_1\eta_1 = O(\varepsilon^{-d+2+s})$. If $\varepsilon$ is small enough, $2\sigma_1^2+\frac23 c_1\eta_1 \le C\varepsilon^{-d+2}$ for some constant $C$ which depends on $f$ and $P$. We have
\[
\frac{n\eta_1^2}{2\sigma_1^2+\frac23 c_1\eta_1} \ge \frac{n\eta_1^2\varepsilon^{d-2}}{C}. \tag{G.17}
\]
Suppose $n$ is chosen large enough so that
\[
\frac{n\eta_1^2\varepsilon^{d-2}}{C} = 3\log n; \tag{G.18}
\]
that is, the deviation from the mean is set to
\[
\eta_1 = \sqrt{3C}\,\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}. \tag{G.19}
\]
Note that by the assumption that $\eta_1 = O(\varepsilon^{2+s})$, we know that $\eta_1/\varepsilon^2 = \sqrt{3C}\,\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}} \to 0$. It implies that a deviation greater than $\eta_1$ happens with probability less than
\[
\exp\Big(-\frac{n\eta_1^2}{2\sigma_1^2+\frac23 c_1\eta_1}\Big) \le \exp\Big(-\frac{n\eta_1^2\varepsilon^{d-2}}{C}\Big) = \exp(-3\log n) = 1/n^3. \tag{G.20}
\]
As a result, by a simple union bound, we have
\[
\mathbb{P}\Big\{\Big|\frac{1}{n-1}\sum_{j\neq k}\big(F_{1,j}-\mathbb{E}[F_1]\big)\Big| > \eta_1\ \text{for some } k = 1,\ldots,n\Big\} \le n\,e^{-\frac{n\eta_1^2}{2\sigma_1^2+\frac23 c_1\eta_1}} \le 1/n^2. \tag{G.21}
\]
Denote $\Omega_1$ to be the event space on which $\big|\frac{1}{n-1}\sum_{j\neq k}(F_{1,j}-\mathbb{E}[F_1])\big| \le \eta_1$ for all $k = 1,\ldots,n$, where $\eta_1$ is chosen in (G.19).
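The scale $\eta_1 \asymp \sqrt{\log n}/(n^{1/2}\varepsilon^{d/2-1})$ chosen in (G.19) can be eyeballed numerically. The sketch below is our own illustration under the same toy model as above (uniform samples on $[-1,1]$, $d = 1$); it repeatedly draws the (G.5)-type average at a fixed interior point and compares its empirical standard deviation with the Bernstein scale.

```python
import numpy as np

rng = np.random.default_rng(1)
d, eps, xk = 1, 0.05, 0.3               # fixed interior base point
f = lambda t: np.sin(np.pi * t)

def g5(n):
    """One realization of (G.5): (1/(n eps^d)) sum_j (f(x_j) - f(xk)) over the eps-ball."""
    x = rng.uniform(-1.0, 1.0, n)
    nb = x[np.abs(x - xk) < eps]
    return np.sum(f(nb) - f(xk)) / (n * eps**d)

for n in [2_000, 20_000, 200_000]:
    vals = np.array([g5(n) for _ in range(200)])
    scale = np.sqrt(np.log(n)) / (np.sqrt(n) * eps**(d / 2 - 1))
    print(f"n={n:>7}: empirical sd = {vals.std():.2e}, Bernstein scale = {scale:.2e}")
```

Up to the constant $\sqrt{3C}$, the two columns shrink at the same rate as $n$ grows.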
Lemma 35 Suppose $\varepsilon = \varepsilon(n)$ so that $\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}} \to 0$ and $\varepsilon \to 0$ as $n \to \infty$. We have with probability greater than $1-n^{-2}$ that for all $k = 1,\ldots,n$,
\[
\Big|e_i^\top\Big(\frac{1}{n\varepsilon^d}\sum_{j=1}^N(x_{k,j}-x_k)-\mathbb{E}\Big[\frac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(\iota(X))\Big]\Big)\Big| = O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big),
\]
where $i = 1,\ldots,d$, and
\[
\Big|e_i^\top\Big(\frac{1}{n\varepsilon^d}\sum_{j=1}^N(x_{k,j}-x_k)-\mathbb{E}\Big[\frac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(\iota(X))\Big]\Big)\Big| = O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\Big),
\]
where $i = d+1,\ldots,p$.

Proof Fix $x_k$. Denote
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^N(x_{k,j}-x_k) = \frac{1}{n}\sum_{j\neq k,\,j=1}^n\sum_{\ell=1}^p F_{2,\ell,j}\,e_\ell, \tag{G.24}
\]
where
\[
F_{2,\ell,j} := \frac{1}{\varepsilon^d}\,e_\ell^\top(x_j-x_k)\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(x_j), \tag{G.25}
\]
and we know that when $j \neq k$, $F_{2,\ell,j}$ is randomly sampled i.i.d. from
\[
F_{2,\ell} := \frac{1}{\varepsilon^d}\,e_\ell^\top(\iota(X)-\iota(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(\iota(X)). \tag{G.26}
\]
Similarly, we can focus on analyzing $\frac{1}{n-1}\sum_{j\neq k}F_{2,\ell,j}$ since $\frac{n-1}{n}\to 1$ as $n\to\infty$. By Lemma 30 we have
\[
\mathbb{E}[F_{2,\ell}] = \varepsilon^{-d}\Big(P(x)\mu_{e_d}(x,\varepsilon)\,e_\ell^\top e_d+\sum_{i=1}^d \partial_i P(x)\mu_{2e_i}(x,\varepsilon)\,e_\ell^\top e_i+O(\varepsilon^{d+3})\Big) \quad\text{when } \ell = 1,\ldots,d,
\]
\[
\mathbb{E}[F_{2,\ell}] = \varepsilon^{-d}\Big(\sum_{i=1}^d e_\ell^\top N_{ii}(x)\,\mu_{2e_i}(x,\varepsilon)+O(\varepsilon^{d+3})\Big) \quad\text{when } \ell = d+1,\ldots,p.
\]
In other words, by Lemma 27, for $\ell = 1,\ldots,d$ we have $\mathbb{E}[F_{2,\ell}] = O(\varepsilon)$ if $x_k \in M_\varepsilon$, and $\mathbb{E}[F_{2,\ell}] = O(\varepsilon^2)$ if $x_k \notin M_\varepsilon$. Moreover, $\mathbb{E}[F_{2,\ell}] = O(\varepsilon^2)$ for $\ell = d+1,\ldots,p$. By (D.8) we have, for $\ell = 1,\ldots,d$,
\[
\mathbb{E}[F_{2,\ell}^2] \le C_\ell\,\varepsilon^{-d+2}+O(\varepsilon^{-d+3}), \tag{G.27}
\]
where $C_\ell$ depends on $\|P\|_{L^\infty}$. For $\ell = d+1,\ldots,p$,
\[
\mathbb{E}[F_{2,\ell}^2] \le C_\ell\,\varepsilon^{-d+4}+O(\varepsilon^{-d+5}), \tag{G.28}
\]
where $C_\ell$ depends on $\|P\|_{L^\infty}$ and the second fundamental form of $M$. Thus, we conclude that
\[
\sigma_{2,\ell}^2 \le C_\ell\,\varepsilon^{-d+2}+O(\varepsilon^{-d+3}) \quad\text{when } \ell = 1,\ldots,d, \tag{G.29}
\]
\[
\sigma_{2,\ell}^2 \le C_\ell\,\varepsilon^{-d+4}+O(\varepsilon^{-d+5}) \quad\text{when } \ell = d+1,\ldots,p.
\]
Note that for $\ell = d+1,\ldots,p$, the variance is of higher order than that for $\ell = 1,\ldots,d$. With the above bounds, we can apply the large deviation theory. For $\ell = 1,\ldots,d$, the random variable $F_{2,\ell}$ is uniformly bounded by $c_{2,\ell} = 2\varepsilon^{-d+1}$. Since $\mathbb{E}[F_{2,\ell}] = O(\varepsilon)$ if $x_k \in M_\varepsilon$ and $\mathbb{E}[F_{2,\ell}] = O(\varepsilon^2)$ if $x_k \notin M_\varepsilon$, we assume $\eta_{2,\ell} = O(\varepsilon^{2+s})$, where $s > 0$. Then $c_{2,\ell}\eta_{2,\ell} = O(\varepsilon^{-d+3+s})$. If $\varepsilon$ is small enough, $2\sigma_{2,\ell}^2+\frac23 c_{2,\ell}\eta_{2,\ell} \le C\varepsilon^{-d+2}$ for some constant $C$ which depends on $P$ and the manifold $M$. We have
\[
\frac{n\eta_{2,\ell}^2}{2\sigma_{2,\ell}^2+\frac23 c_{2,\ell}\eta_{2,\ell}} \ge \frac{n\eta_{2,\ell}^2\varepsilon^{d-2}}{C}. \tag{G.30}
\]
Suppose $n$ is chosen large enough so that
\[
\frac{n\eta_{2,\ell}^2\varepsilon^{d-2}}{C} = 3\log n; \tag{G.31}
\]
that is, the deviation from the mean is set to
\[
\eta_{2,\ell} = \sqrt{3C}\,\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}. \tag{G.32}
\]
Note that by the assumption that $\eta_{2,\ell} = O(\varepsilon^{2+s})$, we know that $\eta_{2,\ell}/\varepsilon^2 = \sqrt{3C}\,\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}} \to 0$. Thus, when $\varepsilon$ is sufficiently small and $n$ is sufficiently large, the exponent in Bernstein's inequality gives
\[
\mathbb{P}\Big\{\Big|\frac{1}{n-1}\sum_{j\neq k}\big(F_{2,\ell,j}-\mathbb{E}[F_{2,\ell}]\big)\Big| > \eta_{2,\ell}\Big\} \le \exp\Big(-\frac{n\eta_{2,\ell}^2}{2\sigma_{2,\ell}^2+\frac23 c_{2,\ell}\eta_{2,\ell}}\Big) \le \frac{1}{n^3}. \tag{G.33}
\]
By a simple union bound, for $\ell = 1,\ldots,d$, we have
\[
\mathbb{P}\Big\{\Big|\frac{1}{n-1}\sum_{j\neq k}\big(F_{2,\ell,j}-\mathbb{E}[F_{2,\ell}]\big)\Big| > \eta_{2,\ell}\ \text{for some } k = 1,\ldots,n\Big\} \le 1/n^2. \tag{G.34}
\]
For $\ell = d+1,\ldots,p$, the random variable $F_{2,\ell}$ is uniformly bounded by $c_{2,\ell} = 2\varepsilon^{-d+1}$. Since $\mathbb{E}[F_{2,\ell}] = O(\varepsilon^2)$ for $\ell = d+1,\ldots,p$, we assume $\eta_{2,\ell} = O(\varepsilon^{3+s})$, where $s > 0$. Then $c_{2,\ell}\eta_{2,\ell} = O(\varepsilon^{-d+4+s})$. If $\varepsilon$ is small enough, $2\sigma_{2,\ell}^2+\frac23 c_{2,\ell}\eta_{2,\ell} \le C\varepsilon^{-d+4}$ for some constant $C$ which depends on $M$ and $P$. We have
\[
\frac{n\eta_{2,\ell}^2}{2\sigma_{2,\ell}^2+\frac23 c_{2,\ell}\eta_{2,\ell}} \ge \frac{n\eta_{2,\ell}^2\varepsilon^{d-4}}{C}. \tag{G.35}
\]
Suppose $n$ is chosen large enough so that
\[
\frac{n\eta_{2,\ell}^2\varepsilon^{d-4}}{C} = 3\log n; \tag{G.36}
\]
that is, the deviation from the mean is set to
\[
\eta_{2,\ell} = \sqrt{3C}\,\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}. \tag{G.37}
\]
Note that by the assumption that $\eta_{2,\ell} = O(\varepsilon^{3+s})$, we know that $\eta_{2,\ell}/\varepsilon^3 = \sqrt{3C}\,\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}} \to 0$. By a similar argument, for $\ell = d+1,\ldots,p$, we have
\[
\mathbb{P}\Big\{\Big|\frac{1}{n-1}\sum_{j\neq k}\big(F_{2,\ell,j}-\mathbb{E}[F_{2,\ell}]\big)\Big| > \eta_{2,\ell}\ \text{for some } k = 1,\ldots,n\Big\} \le 1/n^2. \tag{G.38}
\]
Denote $\Omega_2$ to be the event space on which $\big|\frac{1}{n-1}\sum_{j\neq k}(F_{2,\ell,j}-\mathbb{E}[F_{2,\ell}])\big| \le \eta_{2,\ell}$ for all $\ell = 1,\ldots,p$ and $k = 1,\ldots,n$, where $\eta_{2,\ell}$ are chosen in (G.32) and (G.37).

The next lemma summarizes the behavior of (G.7) and can be proved similarly to Lemma 35.

Lemma 36 Suppose $\varepsilon = \varepsilon(n)$ so that $\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}} \to 0$ and $\varepsilon \to 0$ as $n \to \infty$. We have with probability greater than $1-n^{-2}$ that for all $k = 1,\ldots,n$,
\[
\Big|e_i^\top\Big(\frac{1}{n\varepsilon^d}\sum_{j=1}^N(x_{k,j}-x_k)\big(f(x_{k,j})-f(x_k)\big)-\mathbb{E}\Big[\frac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))(f(X)-f(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(\iota(X))\Big]\Big)\Big| = O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\Big),
\]
where $i = 1,\ldots,d$, and
\[
\Big|e_i^\top\Big(\frac{1}{n\varepsilon^d}\sum_{j=1}^N(x_{k,j}-x_k)\big(f(x_{k,j})-f(x_k)\big)-\mathbb{E}\Big[\frac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))(f(X)-f(x_k))\chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(\iota(X))\Big]\Big)\Big| = O\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-3}}\Big),
\]
where $i = d+1,\ldots,p$.

Denote $\Omega_3$ to be the event space on which Lemma 36 is satisfied. In the next two lemmas, we describe the behavior of $\frac{1}{n\varepsilon^d}G_{n,k}G_{n,k}^\top$. The proofs are the same as Lemma E.4 in Wu and Wu (2018) with $\rho = 3$.

Lemma 37 Suppose $\varepsilon = \varepsilon(n)$ so that $\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}} \to 0$ and $\varepsilon \to 0$ as $n \to \infty$. We have with probability greater than $1-n^{-2}$ that for all $k = 1,\ldots,n$,
\[
e_i^\top\Big(\frac{1}{n\varepsilon^d}G_{n,k}G_{n,k}^\top-\frac{1}{\varepsilon^d}C_{x_k}\Big)e_j = O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\Big)
\]
where $i,j = 1,\ldots,d$;
\[
e_i^\top\Big(\frac{1}{n\varepsilon^d}G_{n,k}G_{n,k}^\top-\frac{1}{\varepsilon^d}C_{x_k}\Big)e_j = O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-4}}\Big)
\]
where $i,j = d+1,\ldots,p$; and
\[
e_i^\top\Big(\frac{1}{n\varepsilon^d}G_{n,k}G_{n,k}^\top-\frac{1}{\varepsilon^d}C_{x_k}\Big)e_j = O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-3}}\Big)
\]
for the remaining pairs of indices.
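Lemmas 37 and 38 control the local Gram matrix through the local covariance $C_{x_k}$, whose first $d$ eigenvalues are of order $\varepsilon^{d+2}$ and whose remaining eigenvalues are of order $\varepsilon^{d+4}$. The following sketch, our own illustration on the unit circle embedded in $\mathbb{R}^2$ (so $d = 1$, $p = 2$), shows the two spectral scales of $\frac{1}{n\varepsilon^d}G_{n,k}G_{n,k}^\top$ after the $\varepsilon^{-d}$ normalization, namely $\varepsilon^2$ along the tangent direction and $\varepsilon^4$ along the normal direction:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
theta = rng.uniform(0.0, 2.0 * np.pi, n)
pts = np.column_stack([np.cos(theta), np.sin(theta)])   # unit circle: d = 1, p = 2

xk = np.array([1.0, 0.0])                               # base point on the circle
for eps in [0.2, 0.1, 0.05]:
    nb = pts[np.linalg.norm(pts - xk, axis=1) < eps] - xk
    gram = nb.T @ nb / (n * eps)                        # (1/(n eps^d)) G_{n,k} G_{n,k}^T, d = 1
    lam = np.sort(np.linalg.eigvalsh(gram))[::-1]
    # The ratios below stabilize as eps decreases: tangent ~ eps^2, normal ~ eps^4.
    print(f"eps={eps}: lam1/eps^2 = {lam[0] / eps**2:.3f}, lam2/eps^4 = {lam[1] / eps**4:.4f}")
```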
Lemma 38 Suppose $r_n \to r$ and $r_n$ is a non-decreasing function of $n$, so that if $n$ is large enough, $r_n = r$. Suppose $\varepsilon = \varepsilon(n)$ so that $\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}} \to 0$ and $\varepsilon \to 0$ as $n \to \infty$. We have with probability greater than $1-n^{-2}$ that for all $k = 1,\ldots,n$,
\[
e_i^\top\Big[\bar I_{p,r_n}\Big(\frac{\Lambda_n}{n\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}-\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}\Big]e_i = O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+2}}\Big)
\]
for $i = 1,\ldots,r$, and
\[
U_n = U\Theta+\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\,U\Theta S+O\Big(\frac{\log n}{n\varepsilon^{d-4}}\Big),
\]
where $S \in o(p)$ and $\Theta \in O(p)$, and $\Theta$ commutes with $\bar I_{p,r}\big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\big)^{-1}$. Denote $\Omega_4$ to be the event space on which Lemma 38 is satisfied.

In the proofs of Lemma 32 and Theorem 15, we need the order-$\varepsilon^{d+3}$ terms of the eigenvalues $\{\lambda_i\}$ of $C_x$ for $i = 1,\ldots,d$, and we need the order-$\varepsilon$ term of the eigenvectors $\{\beta_i\}$ of $C_x$ for $i = 1,\ldots,p$. We also use the fact that the eigenvalues $\{\lambda_i\}$ of $C_x$ for $i = d+1,\ldots,p$ are of order $\varepsilon^{d+4}$, so that we can calculate the leading terms (order $\varepsilon^2$) of $Q_\varepsilon f(x)$ for all $x \in M$. Since $\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+1}} \to 0$, the above two lemmas imply that the differences between the first $d$ eigenvalues of $\frac{1}{n\varepsilon^d}G_{n,k}G_{n,k}^\top$ and those of $\frac{1}{\varepsilon^d}C_{x_k}$ are less than $O(\varepsilon^3)$, and the differences between the rest of the eigenvalues of $\frac{1}{n\varepsilon^d}G_{n,k}G_{n,k}^\top$ and $\frac{1}{\varepsilon^d}C_{x_k}$ are less than $O(\varepsilon^4)$. In other words, we can make sure that the rest of the eigenvalues of $\frac{1}{n\varepsilon^d}G_{n,k}G_{n,k}^\top$ are of order $\varepsilon^4$. Moreover, $U_n$ and $U\Theta$ differ by a matrix of order $\varepsilon^3$. Consequently, in the following proof, we can show that the deviation between $\sum_{j=1}^n [W-I_{n\times n}]_{kj}\,f(x_{k,j})$ and $Q_\varepsilon f(x_k)$ is less than $\varepsilon^2$ for all $x_k$.

Proof [Proof of Theorem 11] Denote $\Omega := \cap_{i=0,\ldots,4}\,\Omega_i$. By a direct union bound, the probability of the event space $\Omega$ is greater than $1-n^{-2}$. Below, all arguments are conditional on $\Omega$, and we abbreviate $\chi := \chi_{B^{\mathbb{R}^p}_\varepsilon(x_k)}(\iota(X))$. Based on the previous lemmas, we have, for $k = 1,\ldots,n$,
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^N 1 = \mathbb{E}\Big[\frac{1}{\varepsilon^d}\chi\Big]+O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2}}\Big),
\]
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^N\big(f(x_{k,j})-f(x_k)\big) = \mathbb{E}\Big[\frac{1}{\varepsilon^d}(f(X)-f(x_k))\chi\Big]+O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big),
\]
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^N(x_{k,j}-x_k) = \mathbb{E}\Big[\frac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi\Big]+E_1, \tag{G.46}
\]
where $E_1 \in \mathbb{R}^p$ with $e_i^\top E_1 = O\big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\big)$ for $i = 1,\ldots,d$ and $e_i^\top E_1 = O\big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\big)$ for $i = d+1,\ldots,p$. Moreover, we have
\[
\frac{1}{n\varepsilon^d}\sum_{j=1}^N(x_{k,j}-x_k)\big(f(x_{k,j})-f(x_k)\big) = \mathbb{E}\Big[\frac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))(f(X)-f(x_k))\chi\Big]+E_2, \tag{G.47}
\]
where $E_2 \in \mathbb{R}^p$ with $e_i^\top E_2 = O\big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\big)$ for $i = 1,\ldots,d$ and $e_i^\top E_2 = O\big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-3}}\big)$ for $i = d+1,\ldots,p$. Therefore, we have
\[
U_n\bar I_{p,r_n}\Big(\frac{\Lambda_n}{n\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U_n^\top
\]
\[
= \Big[U\Theta+\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}U\Theta S+O\Big(\frac{\log n}{n\varepsilon^{d-4}}\Big)\Big]\Big[\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}+O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+2}}\Big)\Big]\Big[U\Theta+\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}U\Theta S+O\Big(\frac{\log n}{n\varepsilon^{d-4}}\Big)\Big]^\top
\]
\[
= U\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U^\top+\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\,U\Theta\Big[S\,\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}+\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}S^\top\Big]\Theta^\top U^\top
\]
\[
\quad+O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+2}}\Big)I_{p\times p}+\text{higher order terms}.
\]
Define the $p\times p$ matrix
\[
E_3 := \frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\,U\Theta\Big[S\,\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}+\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}S^\top\Big]\Theta^\top U^\top+O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+2}}\Big)I_{p\times p}. \tag{G.48}
\]
Then
\[
\Big[\frac{1}{N}\sum_{j=1}^N(x_{k,j}-x_k)\Big]^\top U_n\bar I_{p,r_n}\Big(\frac{\Lambda_n}{n\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U_n^\top\Big[\frac{1}{N}\sum_{j=1}^N(x_{k,j}-x_k)\big(f(x_{k,j})-f(x_k)\big)\Big]
\]
\[
= \Big[\mathbb{E}\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi+E_1\Big]^\top\Big[U\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U^\top+E_3+\text{higher order terms}\Big]\Big[\mathbb{E}\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))(f(X)-f(x_k))\chi+E_2\Big]
\]
\[
= \mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi\Big]^\top\Big[U\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U^\top\Big]\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))(f(X)-f(x_k))\chi\Big]
\]
\[
\quad+E_1^\top\,U\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U^\top\,\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))(f(X)-f(x_k))\chi\Big]
\]
\[
\quad+\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi\Big]^\top E_3\,\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))(f(X)-f(x_k))\chi\Big]
\]
\[
\quad+\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi\Big]^\top U\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U^\top E_2+\text{higher order terms}.
\]
Recall that
\[
\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi\Big]^\top U\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U^\top E_2 = T_{\iota(x_k)}^\top E_2.
\]
When $x_k \in M_\varepsilon$,
\[
T_{\iota(x_k)}^\top E_2 = [[\,O(\varepsilon^{-1}),\ O(\varepsilon^{-1})\,]]\cdot\Big[\Big[\,O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\Big),\ O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-3}}\Big)\,\Big]\Big] = O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big), \tag{G.49}
\]
and when $x_k \notin M_\varepsilon$,
\[
T_{\iota(x_k)}^\top E_2 = [[\,O(1),\ O(\varepsilon^{-1})\,]]\cdot\Big[\Big[\,O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\Big),\ O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-3}}\Big)\,\Big]\Big] = O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\Big).
\]
Moreover, when $x_k \in M_\varepsilon$ or $x_k \in M\setminus M_\varepsilon$, by a similar calculation as in Lemma 32,
\[
U\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U^\top\,\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))(f(X)-f(x_k))\chi\Big] = [[\,O(1),\ O(1)\,]].
\]
Hence,
\[
E_1^\top\,U\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U^\top\,\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))(f(X)-f(x_k))\chi\Big] = O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big).
\]
Next, we calculate $\mathbb{E}[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi]^\top E_3\,\mathbb{E}[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))(f(X)-f(x_k))\chi]$. By a straightforward calculation, we can show that it is dominated by
\[
\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2+2}}\,\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi\Big]^\top\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))(f(X)-f(x_k))\chi\Big].
\]
Hence, when $x_k \in M_\varepsilon$,
\[
\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi\Big]^\top E_3\,\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))(f(X)-f(x_k))\chi\Big] = O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big),
\]
and when $x_k \notin M_\varepsilon$,
\[
\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi\Big]^\top E_3\,\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))(f(X)-f(x_k))\chi\Big] = O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-2}}\Big).
\]
In conclusion, for $k = 1,\ldots,n$, we have
\[
\Big[\frac{1}{N}\sum_{j=1}^N(x_{k,j}-x_k)\Big]^\top U_n\bar I_{p,r_n}\Big(\frac{\Lambda_n}{n\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U_n^\top\Big[\frac{1}{N}\sum_{j=1}^N(x_{k,j}-x_k)\big(f(x_{k,j})-f(x_k)\big)\Big]
\]
\[
= \mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi\Big]^\top\Big[U\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U^\top\Big]\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))(f(X)-f(x_k))\chi\Big]+O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big).
\]
A similar argument shows that for $k = 1,\ldots,n$,
\[
\Big[\frac{1}{N}\sum_{j=1}^N(x_{k,j}-x_k)\Big]^\top U_n\bar I_{p,r_n}\Big(\frac{\Lambda_n}{n\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U_n^\top\Big[\frac{1}{N}\sum_{j=1}^N(x_{k,j}-x_k)\Big] \tag{G.50}
\]
\[
= \mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi\Big]^\top\Big[U\bar I_{p,r}\Big(\frac{\bar\Lambda}{\varepsilon^d}+\varepsilon^3 I_{p\times p}\Big)^{-1}U^\top\Big]\mathbb{E}\Big[\tfrac{1}{\varepsilon^d}(\iota(X)-\iota(x_k))\chi\Big]+O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2}}\Big).
\]
By Theorem 15, $g_1$ has order $O(\varepsilon^2)$ and $g_2$ has order $1$. Hence, we have
\[
\sum_{j=1}^n [W-I_{n\times n}]_{kj}\,f(x_{k,j}) = \frac{g_1+O_p\big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\big)}{g_2+O_p\big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2}}\big)} = Q_\varepsilon f(x_k)+O_p\Big(\frac{\sqrt{\log n}}{n^{1/2}\varepsilon^{d/2-1}}\Big).
\]

Appendix H. Detailed calculation of (4.19)

We show that the operator $Q_\varepsilon f(x)$ may not approximate the Laplace-Beltrami operator in the interior of $\iota(M) = [-1,1] \subset \mathbb{R}$. Let $x,y \in M$ be such that $\iota(x) \in [-1+\varepsilon, 1-\varepsilon]$. Since $M$ is 1-dimensional and is embedded in $\mathbb{R}$, the local covariance matrix $C_x$ has rank 1. Suppose $P \in C^2(M)$ and $\varepsilon > 0$ is small enough. By (Wu and Wu, 2018, Lemma SI.6, Case 0) and the fact that $K(x,y) = 0$ when $|\iota(x)-\iota(y)| \ge \varepsilon$, we have
\[
K(x,y) = \Big[1-\frac{P'(x)}{P(x)}\big(\iota(y)-\iota(x)\big)+O(\varepsilon^2)\Big]\chi_{B^{\mathbb{R}}_\varepsilon(\iota(x))}(\iota(y)), \tag{H.1}
\]
\[
K(y,x) = \Big[1-\frac{P'(y)}{P(y)}\big(\iota(x)-\iota(y)\big)+O(\varepsilon^2)\Big]\chi_{B^{\mathbb{R}}_\varepsilon(\iota(x))}(\iota(y)). \tag{H.2}
\]
Without loss of generality, consider $\iota(x) = 0$. Let $\gamma: [-1,1] \to M$ be the arclength parametrization of $M$ with $\gamma(0) = x$. Suppose $f \in C^3(M)$. Then, by a straightforward expansion in the parametrization $\gamma(t)$, we have
\[
Q_\varepsilon f(x) := \frac{\mathbb{E}\big[K_\varepsilon(X,x)[f(X)-f(x)]\big]}{\mathbb{E}\big[K_\varepsilon(X,x)\big]} = \frac{\int_{-\varepsilon}^{\varepsilon}\Big(1+\frac{(P\circ\gamma)'(t)}{(P\circ\gamma)(t)}t+O(\varepsilon^2)\Big)\Big((f\circ\gamma)'(0)t+\frac{(f\circ\gamma)''(0)}{2}t^2+O(t^3)\Big)(P\circ\gamma)(t)\,dt}{\int_{-\varepsilon}^{\varepsilon}\Big(1+\frac{(P\circ\gamma)'(t)}{(P\circ\gamma)(t)}t+O(\varepsilon^2)\Big)(P\circ\gamma)(t)\,dt}. \tag{H.3}
\]
The numerator of (H.3) can be expanded as
\[
\int_{-\varepsilon}^{\varepsilon}\Big(1+\frac{(P\circ\gamma)'(t)}{(P\circ\gamma)(t)}t+O(\varepsilon^2)\Big)\Big((f\circ\gamma)'(0)t+\frac{(f\circ\gamma)''(0)}{2}t^2+O(t^3)\Big)(P\circ\gamma)(t)\,dt
\]
\[
= (f\circ\gamma)'(0)\int_{-\varepsilon}^{\varepsilon}(P\circ\gamma)(t)t\,dt+(f\circ\gamma)'(0)\int_{-\varepsilon}^{\varepsilon}\frac{(P\circ\gamma)'(t)}{(P\circ\gamma)(t)}(P\circ\gamma)(t)t^2\,dt+\frac{(f\circ\gamma)''(0)}{2}\int_{-\varepsilon}^{\varepsilon}(P\circ\gamma)(t)t^2\,dt+O(\varepsilon^4)
\]
\[
= (f\circ\gamma)'(0)\int_{-\varepsilon}^{\varepsilon}(P\circ\gamma)(t)t\,dt+(f\circ\gamma)'(0)\int_{-\varepsilon}^{\varepsilon}(P\circ\gamma)'(t)t^2\,dt+\frac{(f\circ\gamma)''(0)}{2}\int_{-\varepsilon}^{\varepsilon}(P\circ\gamma)(t)t^2\,dt+O(\varepsilon^4)
\]
\[
= (f\circ\gamma)'(0)\int_{-\varepsilon}^{\varepsilon}\big((P\circ\gamma)(0)t+(P\circ\gamma)'(0)t^2+O(t^3)\big)dt+(f\circ\gamma)'(0)\int_{-\varepsilon}^{\varepsilon}\big((P\circ\gamma)'(0)t^2+O(t^3)\big)dt+\frac{(f\circ\gamma)''(0)}{2}\int_{-\varepsilon}^{\varepsilon}\big((P\circ\gamma)(0)t^2+O(t^3)\big)dt+O(\varepsilon^4)
\]
\[
= \frac{2}{3}(f\circ\gamma)'(0)(P\circ\gamma)'(0)\varepsilon^3+\frac{2}{3}(f\circ\gamma)'(0)(P\circ\gamma)'(0)\varepsilon^3+\frac{1}{3}(f\circ\gamma)''(0)(P\circ\gamma)(0)\varepsilon^3+O(\varepsilon^4)
\]
\[
= \Big[\frac{4}{3}(f\circ\gamma)'(0)(P\circ\gamma)'(0)+\frac{1}{3}(f\circ\gamma)''(0)(P\circ\gamma)(0)\Big]\varepsilon^3+O(\varepsilon^4).
\]
The denominator of (H.3) can be expanded as
\[
\int_{-\varepsilon}^{\varepsilon}\Big(1+\frac{(P\circ\gamma)'(t)}{(P\circ\gamma)(t)}t+O(\varepsilon^2)\Big)(P\circ\gamma)(t)\,dt = 2(P\circ\gamma)(0)\varepsilon+O(\varepsilon^2).
\]
Hence, we have
\[
Q_\varepsilon f(x) = \frac{\big[\frac{4}{3}(f\circ\gamma)'(0)(P\circ\gamma)'(0)+\frac{1}{3}(f\circ\gamma)''(0)(P\circ\gamma)(0)\big]\varepsilon^3+O(\varepsilon^4)}{2(P\circ\gamma)(0)\varepsilon+O(\varepsilon^2)} \tag{H.4}
\]
\[
= \Big[\frac{2}{3}\frac{(f\circ\gamma)'(0)(P\circ\gamma)'(0)}{(P\circ\gamma)(0)}+\frac{1}{6}(f\circ\gamma)''(0)\Big]\varepsilon^2+O(\varepsilon^3) = \Big[\frac{2}{3}\frac{f'(x)P'(x)}{P(x)}+\frac{1}{6}f''(x)\Big]\varepsilon^2+O(\varepsilon^3).
\]
In particular, when $P$ is constant, we recover $Q_\varepsilon f(x) = \frac{1}{6}f''(x)\varepsilon^2+O(\varepsilon^3)$.
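The limit in (H.4) is easy to check numerically. The sketch below is our own illustration: it evaluates the population ratio (H.3) by a Riemann sum for a non-uniform density and confirms that $Q_\varepsilon f(x)/\varepsilon^2$ approaches $\frac{2}{3}f'(x)P'(x)/P(x)+\frac{1}{6}f''(x)$ at an interior point.

```python
import numpy as np

# Interior point of M = [-1, 1]; non-uniform density P and a test function f.
P   = lambda t: (2.0 + np.cos(np.pi * t)) / 4.0     # integrates to 1 on [-1, 1]
dP  = lambda t: -np.pi * np.sin(np.pi * t) / 4.0
f   = lambda t: np.sin(t) + t**2
df  = lambda t: np.cos(t) + 2.0 * t
d2f = lambda t: -np.sin(t) + 2.0

x = 0.3
for eps in [0.2, 0.1, 0.05]:
    t = np.linspace(x - eps, x + eps, 20001)
    dt = t[1] - t[0]
    kern = 1.0 + (dP(t) / P(t)) * (t - x)           # first-order kernel K(y, x) as in (H.3)
    num = np.sum(kern * (f(t) - f(x)) * P(t)) * dt  # numerator of (H.3)
    den = np.sum(kern * P(t)) * dt                  # denominator of (H.3)
    limit = 2 * df(x) * dP(x) / (3 * P(x)) + d2f(x) / 6
    print(f"eps={eps}: Q_eps f(x)/eps^2 = {num / den / eps**2:.4f}, limit = {limit:.4f}")
```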
However, since the kernel is not symmetric, $Q_\varepsilon$ does not approximate the Laplace-Beltrami operator when $P$ is non-constant: the first order term $\frac{2}{3}\frac{f'(x)P'(x)}{P(x)}\varepsilon^2$ does not vanish. Moreover, when $f \in C^5(M)$, a direct iteration of (H.4) shows that the leading term of $Q_\varepsilon Q_\varepsilon f(x)$ again mixes $f''(x)$ with first order terms of the form $f'(x)P'(x)$, so composing $Q_\varepsilon$ with itself does not recover the Laplace-Beltrami operator either.

References

F. V. Atkinson. Discrete and continuous boundary value problems. Academic Press, New York, 1964.

Jonathan Bates. The embedding dimension of Laplacian eigenfunction maps. Applied and Computational Harmonic Analysis, 37(3):516–530, 2014.

M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.

M. Belkin and P. Niyogi. Convergence of Laplacian eigenmaps. In Advances in Neural Information Processing Systems: Proceedings of the 2006 Conference, volume 19, page 129. The MIT Press, 2007.

Pierre Bérard, Gérard Besson, and Sylvain Gallot. Embedding Riemannian manifolds by their heat kernel. Geometric & Functional Analysis GAFA, 4(4):373–398, 1994.

T. Berry and T. Sauer. Density estimation on manifolds with boundary. Computational Statistics and Data Analysis, 107:1–17, 2017.

Jeff Calder, Nicolas Garcia Trillos, and Marta Lewicka. Lipschitz regularity of graph Laplacians on random data clouds. SIAM Journal on Mathematical Analysis, 54(1):1169–1222, 2022.

Sun-Yung A. Chang, Lihe Wang, and Paul C. Yang. A regularity theory of biharmonic maps. Communications on Pure and Applied Mathematics, 52(9):1113–1137, 1999.

M.-Y. Cheng and H.-T. Wu. Local linear regression on manifolds and its geometric interpretation. Journal of the American Statistical Association, 108:1421–1434, 2013.

Xiuyuan Cheng and Nan Wu. Eigen-convergence of Gaussian kernelized graph Laplacian by manifold heat interpolation. Applied and Computational Harmonic Analysis, 61:132–190, 2022.

R. R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.

Fabrizio Cuccu and Giovanni Porru. Maximization of the first eigenvalue in problems involving the bi-Laplacian. Nonlinear Analysis: Theory, Methods & Applications, 71(12):e800–e809, 2009.

Xiucai Ding and Hau-Tieng Wu. Impact of signal-to-noise ratio and bandwidth on graph Laplacian spectrum from high-dimensional noisy point cloud. IEEE Transactions on Information Theory, 2022.

D. L. Donoho and C. Grimes. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences USA, 100(10):5591–5596, 2003.

David B. Dunson and Nan Wu. Inferring manifolds from noisy data using Gaussian processes. arXiv preprint arXiv:2110.07478, 2021.

David B. Dunson, Hau-Tieng Wu, and Nan Wu. Spectral convergence of graph Laplacian and heat kernel reconstruction in $L^\infty$ from random samples. Applied and Computational Harmonic Analysis, 55:282–336, 2021.

C. L. Epstein and R. Mazzeo. Degenerate diffusion operators arising in population biology. Number 185. Princeton University Press, 2013.

Tingran Gao. The diffusion geometry of fibre bundles: Horizontal diffusion maps. Applied and Computational Harmonic Analysis, 50:147–215, 2021.

A. S. Georgiou, J. M. Bello-Rivas, C. W. Gear, H.-T. Wu, E. Chiavazzo, and I. G. Kevrekidis. An exploration algorithm for stochastic simulators driven by energy gradients. Entropy, 19(7):294, 2017.

D. N. Kaslovsky and F. G. Meyer. Non-asymptotic analysis of tangent space perturbation. Information and Inference: A Journal of the IMA, 3(2):134–187, 2014.

D. Kershaw. Some extensions of W. Gautschi's inequalities for the gamma function. Mathematics of Computation, pages 607–611, 1983.

John M. Lee. Introduction to Smooth Manifolds. Springer, 2012.

Chen-Yun Lin, Arin Minasian, Xin Jessica Qi, and Hau-Tieng Wu. Manifold learning via the principle bundle approach. Frontiers in Applied Mathematics and Statistics, 4:21, 2018.
L. Lin, B. St. Thomas, H. Zhu, and D. B. Dunson. Extrinsic local regression on manifold-valued data. Journal of the American Statistical Association, 112(519):1261–1273, 2017.

A. V. Little, M. Maggioni, and L. Rosasco. Multiscale geometric methods for data sets I: Multiscale SVD, noise and curvature. Applied and Computational Harmonic Analysis, 43(3):504–567, 2017.

J. Malik, C. Shen, H.-T. Wu, and N. Wu. Connecting dots: from local covariance to empirical intrinsic geometry and locally linear embedding. Pure and Applied Analysis, 1(4):515–542, 2019.

S. Mukherjee, Q. Wu, and D.-X. Zhou. Learning gradients on manifolds. Bernoulli, 16(1):181–207, 2010.

B. Nadler. Finite sample approximation results for principal component analysis: A matrix perturbation approach. The Annals of Statistics, 36(6):2791–2817, 2008.

M. A. Naimark. Linear differential operators. F. Ungar Publishing Company, 1967.

S. Osher, Z. Shi, and W. Zhu. Low dimensional manifold model for image processing. SIAM Journal on Imaging Sciences, 10(4):1669–1690, 2017.

J. Wilson Peoples and John Harlim. Spectral convergence of symmetrized graph Laplacian on manifolds with boundary. arXiv preprint arXiv:2110.06988, 2021.

Jacobus W. Portegies. Embeddings of Riemannian manifolds with heat kernels and eigenfunctions. Communications on Pure and Applied Mathematics, 69(3):478–518, 2016.

S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000. doi: 10.1126/science.290.5500.2323.

T. Shnitzer, M. Ben-Chen, L. Guibas, R. Talmon, and H.-T. Wu. Recovering hidden components in multimodal data with composite diffusion operators. arXiv preprint arXiv:1808.07312, 2018.

A. Singer and H.-T. Wu. Vector diffusion maps and the connection Laplacian. Communications on Pure and Applied Mathematics, 65(8):1067–1144, 2012.

A. Singer and H.-T. Wu. Spectral convergence of the connection Laplacian from random samples. Information and Inference: A Journal of the IMA, 6(1):58–123, 2017.

J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.

N. G. Trillos, M. Gerlach, M. Hein, and D. Slepcev. Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace-Beltrami operator. Foundations of Computational Mathematics, 20(4):827–887, 2020.

H. Tyagi, E. Vural, and P. Frossard. Tangent space estimation for smooth embeddings of Riemannian manifolds. Information and Inference: A Journal of the IMA, 2(1):69–114, 2013.

L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.

R. Vaughn, T. Berry, and H. Antil. Diffusion maps for embedded manifolds with boundary with applications to PDEs. arXiv preprint arXiv:1912.01391, 2019.

Ryan Vaughn. Diffusion maps for manifolds with boundary. PhD thesis, George Mason University, 2020.

U. von Luxburg, M. Belkin, and O. Bousquet. Consistency of spectral clustering. The Annals of Statistics, pages 555–586, 2008.

K. Q. Weinberger and L. K. Saul. An introduction to nonlinear dimensionality reduction by maximum variance unfolding. In AAAI, pages 1683–1686, 2006.

Hassler Whitney. Analytic extensions of differentiable functions defined in closed sets. In Hassler Whitney Collected Papers, pages 228–254. Springer, 1992.
J. Wong. An extension procedure for manifolds with boundary. Pacific Journal of Mathematics, 235(1):173–199, 2008.

Caroline L. Wormell and Sebastian Reich. Spectral convergence of diffusion maps: Improved error bounds and an alternative normalization. SIAM Journal on Numerical Analysis, 59(3):1687–1734, 2021.

H.-T. Wu and N. Wu. Think globally, fit locally under the manifold setup: Asymptotic analysis of locally linear embedding. Annals of Statistics, 46(6B):3805–3837, 2018.

Hau-Tieng Wu and Nan Wu. Strong uniform consistency with rates for kernel density estimators with general kernels on manifolds. Information and Inference: A Journal of the IMA, 11(2):781–799, 2022.

Y. Yang and D. B. Dunson. Bayesian manifold regression. The Annals of Statistics, 44(2):876–905, 2016.