
Eigen-convergence of Gaussian kernelized graph Laplacian by manifold heat interpolation

Xiuyuan Cheng (Department of Mathematics, Duke University. Email: xiuyuan.cheng@duke.edu) and Nan Wu (Department of Mathematics and Department of Statistical Science, Duke University. Email: nan.wu@duke.edu)
Abstract

This work studies the spectral convergence of graph Laplacian to the Laplace-Beltrami operator when the graph affinity matrix is constructed from $N$ random samples on a $d$-dimensional manifold embedded in a possibly high dimensional space. By analyzing Dirichlet form convergence and constructing candidate approximate eigenfunctions via convolution with the manifold heat kernel, we prove that, with Gaussian kernel, one can set the kernel bandwidth parameter $\epsilon \sim (\log N/N)^{1/(d/2+2)}$ such that the eigenvalue convergence rate is $N^{-1/(d/2+2)}$ and the eigenvector convergence in 2-norm has rate $N^{-1/(d+4)}$; when $\epsilon \sim (\log N/N)^{1/(d/2+3)}$, both eigenvalue and eigenvector rates are $N^{-1/(d/2+3)}$. These rates are up to a $\log N$ factor and proved for finitely many low-lying eigenvalues. The result holds for un-normalized and random-walk graph Laplacians when data are uniformly sampled on the manifold, as well as the density-corrected graph Laplacian (where the affinity matrix is normalized by the degree matrix from both sides) with non-uniformly sampled data. As an intermediate result, we prove new point-wise and Dirichlet form convergence rates for the density-corrected graph Laplacian. Numerical results are provided to verify the theory.

Keywords: Graph Laplacian, heat kernel, Laplace-Beltrami operator, manifold learning, Gaussian kernel, spectral convergence

1 Introduction

Table 1: List of default notations
$\mathcal{M}$ : $d$-dimensional manifold in $\mathbb{R}^D$
$p$ : data sampling density on $\mathcal{M}$
$\Delta_{\mathcal{M}}$ : Laplace-Beltrami operator, also written $\Delta$
$\mu_k$ : population eigenvalues of $-\Delta$
$\psi_k$ : population eigenfunctions of $-\Delta$
$\lambda_k$ : empirical eigenvalues of the graph Laplacian
$v_k$ : empirical eigenvectors of the graph Laplacian
$\nabla_{\mathcal{M}}$ : manifold gradient, also written $\nabla$
$H_t$ : manifold heat kernel
$Q_t$ : semi-group operator of manifold diffusion, $Q_t = e^{t\Delta}$
$X$ : dataset points used for computing $W$
$N$ : number of samples in $X$
$\epsilon$ : kernel bandwidth parameter
$K_\epsilon$ : graph affinity kernel, $W_{ij} = K_\epsilon(x_i, x_j)$, $K_\epsilon(x,y) = \epsilon^{-d/2} h(\frac{\|x-y\|^2}{\epsilon})$
$h$ : a function $[0,\infty) \to \mathbb{R}$
$m_0$ : $m_0[h] := \int_{\mathbb{R}^d} h(|u|^2)\, du$
$m_2$ : $m_2[h] := \frac{1}{d} \int_{\mathbb{R}^d} |u|^2 h(|u|^2)\, du$
$W$ : kernelized graph affinity matrix
$D$ : degree matrix of $W$, $D_{ii} = \sum_{j=1}^N W_{ij}$
$L_{un}$ : un-normalized graph Laplacian
$L_{rw}$ : random-walk graph Laplacian
$E_N$ : graph Dirichlet form
$\rho_X$ : function evaluation operator, $\rho_X f = \{f(x_i)\}_{i=1}^N$
$\tilde{W}$ : density-corrected affinity matrix, $\tilde{W} = D^{-1} W D^{-1}$
$\tilde{D}$ : degree matrix of $\tilde{W}$

Asymptotic Notations
$O(\cdot)$ : $f = O(g)$: $|f| \le C |g|$ in the limit, $C > 0$; $O_a(\cdot)$ declares the constant's dependence on $a$
$\Theta(\cdot)$ : $f = \Theta(g)$: for $f, g \ge 0$, $C_1 g \le f \le C_2 g$ in the limit, $C_1, C_2 > 0$
$\sim$ : $f \sim g$: same as $f = \Theta(g)$
$o(\cdot)$ : $f = o(g)$: for $g > 0$, $|f|/g \to 0$ in the limit
$\Omega(\cdot)$ : $f = \Omega(g)$: for $f, g > 0$, $f/g \to \infty$ in the limit
$\tilde{O}(\cdot)$ : $O(\cdot)$ multiplied by another factor involving a log, defined each time it is used in the text
When the subscript $a$ is omitted, the constants are absolute ones. $f = O(g_1, g_2)$ means that $f = O(|g_1| + |g_2|)$.

Graph Laplacian matrices built from data samples are widely used in data analysis and machine learning. Earlier works include Isomap [2], Laplacian Eigenmap [3], and Diffusion Map [10, 30], among others. Apart from being a widely-used unsupervised learning method for clustering and dimension reduction (see, e.g., the review papers [33, 30]), graph Laplacian methods have also drawn attention through applications in semi-supervised learning [24, 12, 29, 15]. Under the manifold setting, data samples are assumed to lie on low-dimensional manifolds embedded in a possibly high-dimensional ambient space. A fundamental problem is the convergence of the graph Laplacian matrix to the manifold Laplacian operator in the large sample limit. The operator point-wise convergence has been intensively studied and established in a series of works [19, 18, 4, 10, 27], and extended to variant settings, such as different kernel normalizations [23, 36] and general classes of kernels [31, 5, 9]. The eigen-convergence, namely how the empirical eigenvalues and eigenvectors converge to the population eigenvalues and eigenfunctions of the manifold Laplacian, is a more subtle issue and has been studied in [4, 34, 6, 35, 28, 14] (among others) and recently in [32, 7, 11, 8].

The current work proves the eigen-convergence, specifically the consistency of eigenvalues and eigenvectors in 2-norm, for finitely many low-lying eigenvalues of the graph Laplacian constructed using Gaussian kernel from i.i.d. sampled manifold data. The result covers the un-normalized and random-walk graph Laplacian when data density is uniform, and the density-corrected graph Laplacian (defined below) with non-uniformly sampled data. For the latter, we also prove new point-wise and Dirichlet form convergence rates as an intermediate result. We overview the main results in Section 1.1 in the context of literature, which are also summarized in Table 2.

The framework of our work follows the variational principle formulation of eigenvalues using the graph and manifold Dirichlet forms. The Dirichlet form-based approach to proving graph Laplacian eigen-convergence was first carried out in [6] under a non-probabilistic setting. [32, 7] extended the approach to the probabilistic setting, where the $x_i$ are i.i.d. samples, using optimal transport techniques. Our analysis follows the same form-based approach and differs from previous works in the following aspects. Let $\epsilon$ be the (squared) kernel bandwidth parameter corresponding to diffusion time, $N$ the number of samples, and $d$ the manifold intrinsic dimensionality:

• Leveraging the observation in [10, 27] that the bias error in the point-wise rate of the graph Laplacian can be improved from $O(\sqrt{\epsilon})$ to $O(\epsilon)$ using a $C^2$ kernel function, we show that the improved point-wise rate ${\rm Err}_{pt} = O\left(\epsilon, \sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}\right)$ of the Gaussian kernelized graph Laplacian translates into a better eigen-convergence rate than with compactly supported kernels. Specifically, the eigenvector (2-norm) convergence rate is $O((\log N/N)^{1/(d/2+3)})$, achieved at the optimal choice $\epsilon \sim (\log N/N)^{1/(d/2+3)}$.

• We show that the eigenvalue convergence rate matches the Dirichlet form convergence rate ${\rm Err}_{form} = O\left(\epsilon, \sqrt{\frac{\log N}{N\epsilon^{d/2}}}\right)$ in [9], which is better than the point-wise rate ${\rm Err}_{pt}$. This leads to an eigenvalue convergence rate of $O((\log N/N)^{1/(d/2+2)})$, achieved at the optimal choice $\epsilon \sim (\log N/N)^{1/(d/2+2)}$. The optimal $\epsilon$ for eigenvalue and eigenvector estimation thus differ in their order in $N$.

• In obtaining the initial crude eigenvalue lower bound (LB), called Step 1 below, we develop a short proof using the manifold heat kernel to define the "interpolation mapping", which constructs from a vector $v$ a smooth function $f$ on $\mathcal{M}$. The manifold variational form of $f$, defined via the heat kernel, naturally relates to the graph Dirichlet form of $v$ when the graph affinity matrix is constructed using a Gaussian kernel. The analysis makes use of special properties of the manifold heat kernel and only holds when the graph affinity kernel locally approximates the heat kernel, like the Gaussian. This specialty of the heat kernel has not been exploited in previous graph Laplacian analyses to obtain eigen-convergence rates.

Towards the eigen-convergence, our work also recaps and develops several intermediate results under weaker assumptions on the kernel function (i.e., non-Gaussian), including an improved point-wise convergence rate of the density-corrected graph Laplacian. The density-corrected graph Laplacian, originally proposed in [10], is an important variant of the kernelized graph Laplacian where the affinity matrix is $\tilde{W} = D^{-1} W D^{-1}$. In applications, the data distribution $p$ is often not uniform on the manifold, and then the standard graph Laplacian with $W$ recovers the Fokker-Planck operator (weighted Laplacian) with measure $p^2$, which involves a drift term depending on $\nabla_{\mathcal{M}} \log p$. The density-corrected graph Laplacian, in contrast, recovers the Laplace-Beltrami operator consistently when $p$ satisfies certain regularity conditions, and thus is useful in many applications. In this work, we first prove the point-wise convergence and Dirichlet form convergence of the density-corrected graph Laplacian with $\tilde{W}$, both matching those of the standard graph Laplacian; this can be of independent interest. Then the eigen-consistency result extends to such graph Laplacians (with Gaussian kernel function), also achieving the same rate as the standard graph Laplacian when $p$ is uniform.
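To make the construction concrete, here is a minimal sketch of the density-corrected graph Laplacian (our illustration, not code from the paper): the $1/\epsilon$ scaling below mirrors the random-walk normalization in Eq. (5) later in the text, and the exact normalizing constant for the density-corrected case is an assumption here (it is specified in Section 6).

```python
import numpy as np

# Minimal sketch: density-corrected graph Laplacian. W~ = D^{-1} W D^{-1},
# then a random-walk Laplacian built from W~ and its own degrees D~.
def density_corrected_laplacian(X, eps, d):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = eps ** (-d / 2) * (4 * np.pi) ** (-d / 2) * np.exp(-sq / (4 * eps))
    deg = W.sum(axis=1)                           # D_ii
    W_tilde = W / np.outer(deg, deg)              # W~ = D^{-1} W D^{-1}
    deg_tilde = W_tilde.sum(axis=1)               # D~_ii
    # assumed 1/eps scaling, mirroring the standard random-walk case:
    return (np.eye(len(X)) - W_tilde / deg_tilde[:, None]) / eps

# usage: non-uniformly sampled points on the unit circle in R^2
theta = 2 * np.pi * np.random.rand(500) ** 2
X = np.stack([np.cos(theta), np.sin(theta)], 1)
L = density_corrected_laplacian(X, eps=0.05, d=1)
```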

Below, we give an overview of the theoretical results starting from the assumptions, and end the introduction with further literature review. In the rest of the paper, Section 2 gives preliminaries needed in the analysis. Sections 3-5 develop the eigen-convergence of standard graph Laplacians, both the un-normalized and the normalized (random-walk) ones. Section 6 extends to the density-corrected graph Laplacian, and Section 7 gives numerical results. We discuss possible extensions in the last section.

Notations. Default and asymptotic notations like $O(\cdot)$, $\Omega(\cdot)$, $\Theta(\cdot)$ are listed in Table 1. In this paper, we treat constants determined by $h$, $\mathcal{M}$, $p$ as absolute ones, including the intrinsic dimension $d$. We mainly track the number of samples $N$ and the kernel diffusion time parameter $\epsilon$, and we may emphasize the constant dependence on $p$ or $\mathcal{M}$ in certain circumstances, using subscript notation like $O_{\mathcal{M}}(\cdot)$. All constant dependence can be tracked in the proof.

Table 2: Summary of theoretical results.

| Result | $L_{un}$ with $W$ ($p$ uniform) | $L_{rw}$ with $W$ ($p$ uniform) | $\tilde{L}_{rw}$ with $\tilde{W}$ ($p$ non-uniform) | Assumption on $h$ | Assumption on $\epsilon$ ($\epsilon \to 0+$) | Error bound |
|---|---|---|---|---|---|---|
| Eigenvalue UB | Prop. 3.1 | Prop. 3.6 | Prop. 6.5 | Assump. 2 | $\epsilon^{d/2} = \Omega(\frac{\log N}{N})$ | form rate |
| Crude eigenvalue LB | Prop. 4.1 | Prop. 4.4 | Prop. 6.6 | Gaussian | $\epsilon^{d/2+2} > c_K \frac{\log N}{N}$ | $O(1)$ |
| Eigenvector convergence | Prop. 5.2 | - | - | Gaussian | $\epsilon^{d/2+2} > c_K \frac{\log N}{N}$ | point-wise rate |
| Eigenvalue convergence | Prop. 5.3 | - | - | Gaussian | $\epsilon^{d/2+2} > c_K \frac{\log N}{N}$ | form rate |
| Eigenvalue/vector combined convergence | Thm. 5.4 | Thm. 5.5 | Thm. 6.7 | Gaussian | $\epsilon^{d/2+3} \sim \frac{\log N}{N}$ (optimal order of $\epsilon$ to minimize ${\rm Err}_{pt}$) | both $\lambda_k$ and $v_k$: $\tilde{O}(N^{-1/(d/2+3)})$ |
|  |  |  |  |  | $\epsilon^{d/2+2} \sim \frac{\log N}{N}$ (optimal order of $\epsilon$ to minimize ${\rm Err}_{form}$) | $\lambda_k$: $\tilde{O}(N^{-1/(d/2+2)})$, $v_k$: $\tilde{O}(N^{-1/(d+4)})$ |
| Point-wise convergence | Thm. 5.1 | [27, 9] | Thm. 6.2 | Assump. 2 | $\epsilon^{d/2+1} = \Omega(\frac{\log N}{N})$ | point-wise rate |
| Dirichlet form convergence | Thm. 3.2 | [9] | Thm. 6.3 | Assump. 2 | $\epsilon^{d/2} = \Omega(\frac{\log N}{N})$ | form rate |

"form rate" is ${\rm Err}_{form} = O\left(\epsilon, \sqrt{\frac{\log N}{N\epsilon^{d/2}}}\right)$;  "point-wise rate" is ${\rm Err}_{pt} = O\left(\epsilon, \sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}\right)$.

In the table, convergence of the first $k_{max}$ eigenvalues and eigenvectors is concerned, where $k_{max}$ is fixed. In the rightmost column, "$\lambda_k$" means the error of eigenvalue convergence, and "$v_k$" means the error of eigenvector convergence (in 2-norm). $\tilde{O}(\cdot)$ stands for the possible involvement of a factor $(\log N)^\alpha$ for some $\alpha > 0$. In the 2nd (3rd) column, the eigenvector and eigenvalue convergences are proved in Thm. 5.5 (Thm. 6.7) and are not written as separate propositions. The point-wise convergence and Dirichlet form convergence results of the graph Laplacian with $W$ hold when $p$ satisfies Assump. 1(A2), i.e., when $p$ may be non-uniform. The Dirichlet form convergence with rate may hold when $h$ is not differentiable, e.g., when $h = {\bf 1}_{[0,1)}$, c.f. Remark 2.

1.1 Overview of main results

We first introduce needed assumptions, and then provide a technical overview of our analysis in Section 1.1.2 (Steps 0-1) and Section 1.1.3 (Steps 2-3), summarized as a roadmap at the end of the section.

1.1.1 Set-up and assumptions

The current paper inherits the probabilistic manifold data setting, namely, the dataset $\{x_i\}_{i=1}^N$ consists of i.i.d. samples drawn from a distribution on $\mathcal{M}$ with density $p$ satisfying the following assumption:

Assumption 1 (Smooth $\mathcal{M}$ and $p$).

(A1) $\mathcal{M}$ is a $d$-dimensional compact connected $C^\infty$ manifold (without boundary) isometrically embedded in $\mathbb{R}^D$.

(A2) $p \in C^\infty(\mathcal{M})$ and uniformly bounded both from below and above, that is, $\exists p_{min}, p_{max} > 0$ s.t.

$$0 < p_{min} \le p(x) \le p_{max} < \infty, \quad \forall x \in \mathcal{M}.$$

Suppose $\mathcal{M}$ is embedded via $\iota$; when there is no danger of confusion, we use the same notation $x$ to denote $x \in \mathcal{M}$ and $\iota(x) \in \mathbb{R}^D$. We have the measure space $(\mathcal{M}, dV)$: when $\mathcal{M}$ is orientable, $dV$ is the Riemannian volume form; otherwise, $dV$ is the measure associated with the local volume form. The smoothness of $p$ and $\mathcal{M}$ fulfills many application scenarios, and possible extensions to less regular $\mathcal{M}$ or $p$ are postponed. Our analysis first addresses the basic case where $p$ is uniform on $\mathcal{M}$, i.e., $p = \frac{1}{\mathrm{Vol}(\mathcal{M})}$ is a positive constant. For non-uniform $p$ as in (A2), we adopt and analyze the density-corrected graph Laplacian in Section 6. In both cases, the graph Laplacian recovers the Laplace-Beltrami operator $\Delta_{\mathcal{M}}$. Below, we write $\Delta_{\mathcal{M}}$ as $\Delta$ and $\nabla_{\mathcal{M}}$ as $\nabla$.

Given $N$ data samples, the graph affinity matrix $W$ and the degree matrix $D$ are defined as

$$W_{ij} = K_\epsilon(x_i, x_j), \quad D_{ii} = \sum_{j=1}^N W_{ij}.$$

$W$ is real symmetric, typically with $W_{ij} \ge 0$, and for the kernelized affinity matrix, $W_{ij} = K_\epsilon(x_i, x_j)$ where

$$K_\epsilon(x,y) := \epsilon^{-d/2} h\left(\frac{\|x-y\|^2}{\epsilon}\right), \qquad (1)$$

for a function $h : [0,\infty) \to \mathbb{R}$. The parameter $\epsilon > 0$ can be viewed as the "time" of the diffusion process. Some results in the literature are written in terms of the parameter $\sqrt{\epsilon} > 0$, which corresponds to the scale of the local distance $\|x - y\|$ such that $h(\frac{\|x-y\|^2}{\epsilon})$ is of $O(1)$ magnitude. Our results are written with respect to the time parameter $\epsilon$, which corresponds to the squared local distance length scale.

Our main result of graph Laplacian eigen-convergence considers the kernelized graph affinity computed with

$$h(\xi) = \frac{1}{(4\pi)^{d/2}} e^{-\xi/4}, \quad \xi \in [0,\infty), \qquad (2)$$

and we call such $h$ the Gaussian kernel function. (The constant factor $(4\pi)^{-d/2}$ is included in the definition of $h$ for theoretical convenience and may not be needed in algorithms; e.g., in the normalized graph Laplacian the constant factor cancels.)
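As a minimal illustration (our sketch, assuming data on the unit circle in $\mathbb{R}^2$, so $d = 1$; in general the intrinsic dimension $d$ must be known or estimated separately), the affinity matrix of (1)-(2) can be formed as follows:

```python
import numpy as np

# Gaussian affinity of Eqs. (1)-(2): W_ij = eps^{-d/2} h(||x_i - x_j||^2 / eps)
# with h(xi) = (4 pi)^{-d/2} exp(-xi / 4).
def gaussian_affinity(X, eps, d):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return eps ** (-d / 2) * (4 * np.pi) ** (-d / 2) * np.exp(-sq / (4 * eps))

N, eps = 2000, 0.05
theta = 2 * np.pi * np.random.rand(N)             # uniform p = 1/(2 pi) on S^1
X = np.stack([np.cos(theta), np.sin(theta)], 1)   # embedded in R^D with D = 2
W = gaussian_affinity(X, eps, d=1)
deg = W.sum(axis=1)                               # degrees D_ii = sum_j W_ij
```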

The Gaussian $h$ belongs to a larger family of differentiable functions:

Assumption 2 (Differentiable $h$).

(C1) Regularity. $h$ is continuous on $[0,\infty)$ and $C^2$ on $(0,\infty)$.
(C2) Decay condition. $\exists a, a_k > 0$ s.t. $|h^{(k)}(\xi)| \le a_k e^{-a\xi}$ for all $\xi > 0$, $k = 0, 1, 2$.
(C3) Non-negativity. $h \ge 0$ on $[0,\infty)$. To exclude the case $h \equiv 0$, assume $\|h\|_\infty > 0$.

A summary of results with the needed assumptions is provided in Table 2, from which we can see that several important intermediate results, which can be of independent interest, only require $h$ to satisfy Assumption 2 or weaker, including

- Point-wise convergence of graph Laplacians.

- Convergence of the graph Dirichlet form.

- The eigenvalue upper bound (UB), which matches the Dirichlet form convergence rate.

The point-wise convergence and Dirichlet form convergence of the standard graph Laplacian only require differentiability and a decay condition on $h$, as originally assumed in [10], even without the non-negativity in Assumption 2(C3). Our analysis of the density-corrected graph Laplacian assumes $W_{ij} \ge 0$, and our main result of eigen-convergence needs $h$ to be Gaussian, so we include (C3) in Assumption 2 to simplify exposition. The need for Gaussian $h$ arises in proving the (initial crude) eigenvalue lower bound (LB), explained below, and is due to the fundamental connection between the Gaussian kernel and the manifold heat kernel.

1.1.2 Eigenvalue UB/LB and the interpolation mapping

To explain these results and the difference in proving eigenvalue UB and LB, we start by introducing the notion of point-wise rate and form rate. In the current paper,

• Point-wise convergence of graph Laplacians is shown to have the rate $O\left(\epsilon, \sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}\right)$. We call this the "point-wise rate" and denote it by ${\rm Err}_{pt}$.

• Convergence of the graph Dirichlet form $\frac{1}{\epsilon N^2} u^T (D - W) u$ applied to smooth manifold functions, i.e., $u = \{f(x_i)\}_{i=1}^N$ for $f$ smooth on $\mathcal{M}$, is shown to have the rate $O\left(\epsilon, \sqrt{\frac{\log N}{N\epsilon^{d/2}}}\right)$. We call this the "form rate" and denote it by ${\rm Err}_{form}$.

In the literature, the point-wise convergence of the random-walk graph Laplacian $(I - D^{-1}W)$ with differentiable and decaying $h$ was first shown to have rate $O(\epsilon, \sqrt{\frac{\log N}{N\epsilon^{d/2+1}}})$ in [27]. The exposition in [27] was for Gaussian $h$, but the analysis therein extends directly to general $h$. The Dirichlet form convergence with differentiable $h$ was shown to have rate $O(\epsilon, \sqrt{\frac{\log N}{N\epsilon^{d/2}}})$ in [9] via a V-statistic analysis. [9] also derived the point-wise rate for both the random-walk and the un-normalized graph Laplacian $(D - W)$. The analysis in [9] was mainly developed for kernels with adaptive bandwidth, and higher order regularity of $h$ ($C^4$ instead of $C^2$) was assumed to handle the complication due to the variable kernel bandwidth. For the fixed-bandwidth kernel as in (1), the analysis in [9] can be simplified to proceed under less restrictive conditions on $h$. We include more details below when quoting these previous results, which pave the way towards proving eigen-convergence.

Table 2 illustrates a difference between the eigenvalue UB and LB analysis. Specifically, the eigenvalue UB holds for general differentiable $h$, while the initial crude eigenvalue LB, and consequently the final eigenvalue and eigenvector convergence rates, need $h$ to be Gaussian. This difference between the eigenvalue UB and LB analysis is due to the subtlety of the variational principle approach in analyzing empirical eigenvalues. To be more specific, by "projecting" the population eigenfunctions to vectors in $\mathbb{R}^N$ and using them as "candidate" eigenvectors in the variational form, the Dirichlet form convergence rate directly translates into a rate for the eigenvalue UB (for fixed finitely many low-lying eigenvalues). This is why the eigenvalue UB matches the form rate before any LB is derived; we call this "Step 0" of our analysis.

The eigenvalue LB, however, is more difficult, as has been pointed out in [6]. In [6] and following works taking the variational principle approach, the LB analysis is by "interpolating" the empirical eigenvectors into functions on $\mathcal{M}$. Unlike the population eigenfunctions, which are known to be smooth, there are fewer properties of the empirical eigenvectors that one can use, and any regularity property of these discrete objects is usually non-trivial to obtain [8]. The interpolation mapping in [6] first assigns a point $x_i$ to a Voronoi cell $V_i$, assuming that $\{x_i\}_i$ forms an $\varepsilon$-net of $\mathcal{M}$ to begin with (a non-probabilistic setting), and this maps a vector $u$ to a piece-wise constant function $P^* u$ on $\mathcal{M}$; next, $P^* u$ is convolved with a kernel function which is compactly supported on a small geodesic ball, and this produces "candidate" eigenfunctions, whose manifold differential Dirichlet form is upper bounded by the graph Dirichlet form of $u$, up to an error, through differential geometry calculations. Under the probabilistic setting of i.i.d. samples, [32] constructed the mapping $P^*$ using a Wasserstein-$\infty$ optimal transport (OT) map, where the $\infty$-OT distance between the empirical measure $\frac{1}{N}\sum_i \delta_{x_i}$ and the population measure $p\, dV$ is bounded by constructing a Voronoi tessellation of $\mathcal{M}$ when $d \ge 2$. This led to an overall eigen-convergence rate of $\tilde{O}(N^{-1/2d})$ in [32] when $h$ is compactly supported and satisfies certain regularity conditions and $d \ge 2$, the $\tilde{O}(\cdot)$ indicating a possible factor of a certain power of $\log N$. A typical example is when $h$ is an indicator function $h = {\bf 1}_{[0,1)}$, called an "$\varepsilon$-graph" in the computer science literature ($\varepsilon$ corresponds to $\sqrt{\epsilon}$ in our notation). The approach was extended to $k$NN graphs in [7], where the rate of eigenvalue and 2-norm eigenvector convergence was also improved to match the point-wise rate of the $\varepsilon$-graph or $k$NN graph Laplacians, leading to a rate of $\tilde{O}(N^{-1/(d+4)})$ when $\epsilon^{d/2+2} = \Omega(\frac{\log N}{N})$. The same rate was shown for $\infty$-norm consistency of eigenvectors in [8], combined with Lipschitz regularity analysis of empirical eigenvectors using advanced PDE tools. Eigenvalue consistency with a degraded rate was obtained in the regime $\epsilon^{d/2} = \Omega(\frac{\log N}{N})$, which is a very sparse graph regime just beyond the connectivity threshold [7].

In the current work, we take a different approach for the interpolation mapping in the eigenvalue LB analysis. Our method is based on manifold heat kernels, and the analysis makes use of the fact that at short time and on small local neighborhoods, the heat kernel $H_t(x,y)$ can be approximated by

$$G_t(x,y) := \frac{1}{(4\pi t)^{d/2}} e^{-\frac{d_{\mathcal{M}}(x,y)^2}{4t}}, \qquad (3)$$

and consequently by $K_t(x,y)$ when $h$ is Gaussian as in (2). The first approximation $H_t \approx G_t$ is by classical results on elliptic operators on Riemannian manifolds, c.f. Theorem 2.1. Next, we show that $G_t \approx K_t$ because $K_t$ replaces the geodesic distance $d_{\mathcal{M}}(x,y)$ in $G_t$ with the Euclidean distance $\|x-y\|$, and the two locally match by $d_{\mathcal{M}}(x,y) = \|x-y\| + O(\|x-y\|^3)$. (The constant in the big-$O$ here depends on the second fundamental form, and by compactness of $\mathcal{M}$ is universal in $x$. Similar universal constants in big-$O$ hold throughout the paper.) These estimates allow us to construct interpolated $C^\infty(\mathcal{M})$ functions $I_r[v]$ from a discrete vector $v \in \mathbb{R}^N$ by convolving with the heat kernel at time $r = \frac{\epsilon\delta}{2}$, where $0 < \delta < 1$ is a fixed constant determined by the first $K = k_{max} + 1$ low-lying population eigenvalues $\mu_k$ of $-\Delta$. Specifically, $\delta$ is inversely proportional to the smallest eigen-gap among the $\mu_k$ for $k \le K$ (assuming first that each $\mu_k$ has single multiplicity; the result then generalizes to multiplicity greater than one), which is an $O(1)$ constant determined by $-\Delta$ and $K$. Applying the variational principle to the operator $I - Q_t$, where $Q_t$ is the diffusion semi-group operator and $Q_t$'s spectrum is determined by that of $-\Delta$, allows us to prove an initial eigenvalue LB with error up to half of the minimum first-$K$ eigen-gap.

The step deriving the $O(1)$ initial crude eigenvalue LB using the manifold heat kernel interpolation mapping is called "Step 1" in our analysis. While interpolation mappings obtained by convolving with a smooth kernel have been used in previous works [6, 32, 7], the manifold heat kernel plays a special role in the eigenvalue LB analysis, and this cannot be equivalently achieved by other choices of kernels (unless the kernel locally approximates the heat kernel, like the Gaussian kernel here). Specifically, Lemma 4.3 is proved using heat kernel properties (without using concentration of the i.i.d. data samples), and the lemma connects the continuous integral form of interpolated candidate eigenfunctions with the graph Dirichlet form.

1.1.3 Road-map of analysis

The previous subsection has explained Steps 0 and 1 of our analysis. Here we summarize the rest of the analysis and provide a road-map.

After the $O(1)$ initial crude eigenvalue LB is obtained in Step 1, we adopt the "bootstrap strategy" from [7], so named therein, to obtain a refined (2-norm) eigenvector consistency rate that matches the graph Laplacian point-wise convergence rate. We call this "Step 2". Note that using a smooth kernel (like the Gaussian) gives an improved bias error in the point-wise rate compared to compactly supported kernel functions, which consequently improves the eigen-convergence rate; see more in Remark 4.

Next, leveraging the eigenvector consistency proved in Step 2, we further improve the eigenvalue convergence to match the form rate, which is better than the point-wise rate. We call this "Step 3". Then the refined eigenvalue LB matches the eigenvalue UB in rate. In the process, the first $K$ empirical eigenvalues are upper bounded by $O(1)$, which follows from the eigenvalue UB proved in the beginning.

In summary, our eigen-convergence analysis consists of the following four steps:

• Step 0. Eigenvalue UB by the Dirichlet form convergence, matching the form rate.

• Step 1. Initial crude eigenvalue LB, providing eigenvalue error up to the smallest first-$K$ eigen-gap.

• Step 2. 2-norm consistency of eigenvectors, up to the point-wise rate.

• Step 3. Refined eigenvalue consistency, up to the form rate.

Step 1 requires $h$ to be non-negative and currently only covers the Gaussian case. This may be relaxed, since the proof only uses the approximation property of $h$, namely that $K_\epsilon \approx H_\epsilon$. In this work, we restrict to the Gaussian case for simplicity and due to the wide use of Gaussian kernels in applications.

1.2 More related works

As we adopt a Dirichlet form-based analysis, the eigen-convergence result in the current paper is of the same type as in previous works using the variational principle [6, 32, 7]. In particular, the rate concerns the convergence of the first $k_{max}$ low-lying eigenvalues of the Laplacian, where $k_{max}$ is a fixed finite integer. The constants in the big-$O$ notations in the bounds are treated as $O(1)$, and they depend on $k_{max}$ and these leading eigenvalues and eigenfunctions of the manifold Laplacian. Such results are useful for applications where leading eigenvectors are the primary focus, e.g., spectral clustering and dimension-reduced spectral embedding. An alternative approach is to analyze functional operator consistency [4, 34, 28, 26], which may provide different eigen-consistency bounds, e.g., $\infty$-norm consistency of eigenvectors using compact embeddings of Glivenko-Cantelli function classes [11].

The current work considers noiseless data on $\mathcal{M}$, while the robustness of the graph Laplacian against noise in data is important for applications. When manifold data vectors are perturbed by noise in the ambient space, [13] showed that the Gaussian kernel function $h$ has a special property that makes the kernelized graph Laplacian robust to noise (by a modification of diagonal entries). More recently, [20] showed that bi-stochastic normalization can make the Gaussian kernelized graph affinity matrix robust to high dimensional heteroskedastic noise in data. These results suggest that Gaussian $h$ is a special and useful choice of kernel function for graph Laplacian methods.

Meanwhile, the bi-stochastically normalized graph Laplacian has been studied in [23], where the point-wise convergence of the kernel integral operator to the manifold operator was proved. The spectral convergence of the bi-stochastically normalized graph Laplacian for data on the hyper-torus was recently proved to be $O(N^{-1/(d/2+4)+o(1)})$ in [36]. The density-corrected affinity kernel matrix $\tilde{W} = D^{-1} W D^{-1}$, which is analyzed in the current work, provides another normalization of the graph Laplacian which recovers the Laplace-Beltrami operator. It would be interesting to explore the connections to these works and extend our analysis to bi-stochastically normalized graph Laplacians, which may have better spectral convergence and noise-robustness properties.

2 Preliminaries

2.1 Graph and manifold Laplacians

We define the following moment constants of a function $h$ satisfying Assumption 2:

$$m_0[h] := \int_{\mathbb{R}^d} h(\|u\|^2)\, du, \quad m_2[h] := \frac{1}{d} \int_{\mathbb{R}^d} \|u\|^2 h(\|u\|^2)\, du, \quad \tilde{m}[h] := \frac{m_2[h]}{2 m_0[h]}.$$

By (C3), $h \ge 0$ and the case $h \equiv 0$ is excluded, thus $m_0[h], m_2[h] > 0$. With Gaussian $h$ as in (2), $m_0 = 1$, $m_2 = 2$, and $\tilde{m} = 1$. Denote $m_2[h]$ and $m_0[h]$ by $m_2$ and $m_0$ for shorthand.
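As a quick check of these Gaussian values (our verification, viewing $(4\pi)^{-d/2} e^{-\|u\|^2/4}$ as the density of $N(0, 2I_d)$, whose expected squared norm is $2d$):

$$m_0[h] = \int_{\mathbb{R}^d} (4\pi)^{-d/2} e^{-\|u\|^2/4}\, du = 1, \qquad m_2[h] = \frac{1}{d}\, \mathbb{E}_{u \sim N(0,\, 2I_d)} \|u\|^2 = \frac{2d}{d} = 2,$$

so $\tilde{m}[h] = m_2/(2 m_0) = 1$ as stated. With these constants, the two graph Laplacians are defined as follows: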

  • The un-normalized graph Laplacian $L_{un}$ is defined as

$$L_{un} := \frac{1}{\frac{m_2}{2} p \epsilon N} (D - W). \qquad (4)$$

    Note that the standard un-normalized graph Laplacian is usually $D - W$; we divide by the constant $\frac{m_2}{2} p \epsilon N$ for the convergence of $L_{un}$ to $-\Delta$.

  • The random-walk graph Laplacian $L_{rw}$ is defined as

$$L_{rw} := \frac{1}{\frac{m_2}{2 m_0} \epsilon} (I - D^{-1} W), \qquad (5)$$

    with the constant normalization ensuring convergence to $-\Delta$. A numerical sketch of both constructions is given below.
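The following minimal sketch (our illustration with assumed parameter values, not code from the paper) builds both Laplacians for uniform samples on the unit circle, where $-\Delta$ has eigenvalues $0, 1, 1, 4, 4, \ldots$; for Gaussian $h$, both normalizing constants $\frac{m_2}{2}$ and $\frac{m_2}{2m_0}$ equal 1.

```python
import numpy as np
from scipy.linalg import eigh

# L_un of Eq. (4) and L_rw of Eq. (5) on the unit circle (d = 1, uniform
# p = 1/(2 pi), Gaussian h). Low-lying eigenvalues approximate those of
# -Delta on S^1, i.e. 0, 1, 1, 4, 4, up to O(eps) bias + sampling noise.
N, eps, d = 2000, 0.05, 1
theta = 2 * np.pi * np.random.rand(N)
X = np.stack([np.cos(theta), np.sin(theta)], 1)
sq = np.sum((X[:, None] - X[None, :]) ** 2, -1)
W = eps ** (-d/2) * (4*np.pi) ** (-d/2) * np.exp(-sq / (4*eps))
deg = W.sum(1)
p = 1 / (2 * np.pi)
L_un = (np.diag(deg) - W) / (p * eps * N)              # Eq. (4), m2/2 = 1
lam_un = eigh(L_un, eigvals_only=True)
# L_rw via the similar symmetric matrix D^{-1/2} W D^{-1/2} (same spectrum):
Ms = W / np.sqrt(np.outer(deg, deg))
lam_rw = np.sort(1 - eigh(Ms, eigvals_only=True)) / eps  # Eq. (5)
print(lam_un[:5], lam_rw[:5])   # both roughly [0, 1, 1, 4, 4]
```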

The matrix $L_{un}$ is real-symmetric, positive semi-definite (PSD), and its smallest eigenvalue is zero. Suppose the eigenvalues of $L_{un}$ are $\lambda_k$, $k = 1, 2, \cdots$, sorted in ascending order, that is,

$$0 = \lambda_1(L_{un}) \le \lambda_2(L_{un}) \le \cdots \le \lambda_N(L_{un}).$$

The matrix $L_{rw}$ is well-defined when $D_i > 0$ for all $i$, which holds w.h.p. under the regime $\epsilon^{d/2} = \Omega(\frac{\log N}{N})$, c.f. Lemma 3.5. We always work in the $\epsilon^{d/2} = \Omega(\frac{\log N}{N})$ regime, namely the connectivity regime. Since $D^{-1} W$ is similar to $D^{-1/2} W D^{-1/2}$, which is PSD, $L_{rw}$ is also real-diagonalizable and has $N$ non-negative real eigenvalues, sorted and denoted as $0 = \lambda_1(L_{rw}) \le \lambda_2(L_{rw}) \le \cdots \le \lambda_N(L_{rw})$. We also have, by the min-max variational formula for real-symmetric matrices,

$$\lambda_k(L_{un}) = \min_{L \subset \mathbb{R}^N,\, \dim(L) = k}\, \sup_{v \in L,\, v \ne 0} \frac{v^T L_{un} v}{v^T v}, \quad k = 1, \cdots, N.$$

We define the graph Dirichlet form $E_N(u)$ for $u \in \mathbb{R}^N$ as

$$E_N(u) = \frac{1}{\frac{m_2}{2}} \frac{1}{\epsilon N^2} u^T (D - W) u = \frac{1}{\frac{m_2}{2}} \frac{1}{2\epsilon N^2} \sum_{i,j=1}^N W_{i,j} (u_i - u_j)^2. \qquad (6)$$

By (4), $E_N(u) = p \frac{1}{N} u^T L_{un} u$, and thus

$$\lambda_k(L_{un}) = \min_{L \subset \mathbb{R}^N,\, \dim(L) = k}\, \sup_{v \in L,\, v \ne 0} \frac{E_N(v)}{p \frac{1}{N} \|v\|^2}, \quad k = 1, \cdots, N. \qquad (7)$$

Similarly, we have

$$\lambda_k(L_{rw}) = \min_{L \subset \mathbb{R}^N,\, \dim(L) = k}\, \sup_{v \in L,\, v \ne 0} \frac{E_N(v)}{\frac{1}{m_0} \frac{1}{N^2} v^T D v}, \quad k = 1, \cdots, N. \qquad (8)$$
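As a sanity check of these identities (our sketch in the same circle setup as above; the algebra holds for any data), the two expressions for $E_N(u)$ in (6) agree, and $E_N(u) = p\frac{1}{N} u^T L_{un} u$ as used in (7):

```python
import numpy as np

# Verify Eq. (6) and E_N(u) = p * u^T L_un u / N (Gaussian h, so m2/2 = 1).
N, eps, d, p = 500, 0.05, 1, 1 / (2 * np.pi)
theta = 2 * np.pi * np.random.rand(N)
X = np.stack([np.cos(theta), np.sin(theta)], 1)
sq = np.sum((X[:, None] - X[None, :]) ** 2, -1)
W = eps ** (-d/2) * (4*np.pi) ** (-d/2) * np.exp(-sq / (4*eps))
D = np.diag(W.sum(1))
u = np.random.randn(N)
E_quad = u @ (D - W) @ u / (eps * N**2)                   # first form in (6)
E_pair = 0.5 * np.sum(W * (u[:, None] - u[None, :])**2) / (eps * N**2)
L_un = (D - W) / (p * eps * N)                            # Eq. (4)
assert np.allclose(E_quad, E_pair)
assert np.allclose(E_quad, p * (u @ L_un @ u) / N)        # identity below (7)
```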

To introduce notation for the manifold Laplacian, we define the inner-product in $H := L^2(\mathcal{M}, dV)$ as $\langle f, g \rangle := \int_{\mathcal{M}} f(x) g(x)\, dV(x)$ for $f, g \in L^2(\mathcal{M}, dV)$. We also use $\langle \cdot, \cdot \rangle_q$ to denote the inner-product in $L^2(\mathcal{M}, q dV)$, $q dV$ being a general measure on $\mathcal{M}$ (not necessarily a probability measure), that is, $\langle f, g \rangle_q := \int_{\mathcal{M}} f(x) g(x) q(x)\, dV(x)$ for $f, g \in L^2(\mathcal{M}, q dV)$. For a smooth connected compact manifold $\mathcal{M}$, the (minus) Laplace-Beltrami operator $-\Delta$ has eigen-pairs $\{\mu_k, \psi_k\}_{k=1}^\infty$,

$$0 = \mu_1 < \mu_2 \le \cdots \le \mu_k \le \cdots,$$
$$-\Delta \psi_k = \mu_k \psi_k, \quad \langle \psi_k, \psi_l \rangle = \delta_{k,l}, \quad \psi_k \in C^\infty(\mathcal{M}), \quad k, l = 1, 2, \cdots.$$

The second eigenvalue $\mu_2 > 0$ due to the connectivity of $\mathcal{M}$. When $\mu_i = \cdots = \mu_{i+l-1} = \mu$ for some eigenvalue $\mu$ of $-\Delta$ having multiplicity $l$, the eigenfunctions $\psi_i, \cdots, \psi_{i+l-1}$ can be set to be an orthonormal basis of the $l$-dimensional eigenspace associated with $\mu$. Note that $\psi_k \in C^\infty(\mathcal{M})$ for generic smooth $\mathcal{M}$.

2.2 Heat kernel on $\mathcal{M}$

We leverage the special property of the Gaussian kernel in the ambient space $\mathbb{R}^D$ that it locally approximates the manifold heat kernel on $\mathcal{M}$. We start with notation for the manifold heat kernel. Since $\mathcal{M}$ is smooth and compact (without boundary), the Green's function of the heat equation on $\mathcal{M}$ exists, namely the heat kernel $H_t(x,y)$ of $\mathcal{M}$. We denote the heat diffusion semi-group operator by $Q_t$, which can be formally written as $Q_t = e^{t\Delta}$, and

$$Q_t f(x) = \int_{\mathcal{M}} H_t(x,y) f(y)\, dV(y), \quad \forall f \in L^2(\mathcal{M}, dV).$$

Since $Q_t$ is a semi-group, we have the reproducing property

$$\int_{\mathcal{M}} H_t(x,y) H_t(y,z)\, dV(y) = H_{2t}(x,z), \quad \forall x, z \in \mathcal{M}, \quad \forall t > 0.$$

Meanwhile, by the probability interpretation,

$$\int_{\mathcal{M}} H_t(x,y)\, dV(y) = 1, \quad \forall x \in \mathcal{M}, \quad \forall t > 0.$$

Using the eigenvalues and eigenfunctions $\{\mu_k, \psi_k\}_k$ of $-\Delta$, the heat kernel has the expansion representation $H_t(x,y) = \sum_{k=1}^\infty e^{-t\mu_k} \psi_k(x) \psi_k(y)$. We will not use the spectral expansion of $H_t$ in our analysis, but only that the $\psi_k$ are also eigenfunctions of $Q_t$, that is,

$$Q_t \psi_k = e^{-t\mu_k} \psi_k, \quad k = 1, 2, \cdots \qquad (9)$$
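To make these properties concrete, here is a small numerical sketch (our example, not from the paper) on the unit circle, where $-\Delta$ has eigenvalues $k^2$ with eigenfunctions $\cos(k\theta)/\sqrt{\pi}$, $\sin(k\theta)/\sqrt{\pi}$ (and the constant $1/\sqrt{2\pi}$), so that $H_t(x,y) = \frac{1}{2\pi} + \frac{1}{\pi}\sum_{k \ge 1} e^{-t k^2} \cos(k(\theta_x - \theta_y))$:

```python
import numpy as np

# Truncated spectral sum for the heat kernel on S^1, then numerical checks
# of unit total mass and of the reproducing (semi-group) property.
def H(t, x, y, kmax=200):
    k = np.arange(1, kmax + 1)
    diff = np.subtract.outer(np.atleast_1d(x), np.atleast_1d(y))
    return 1/(2*np.pi) + (np.exp(-t*k**2) * np.cos(diff[..., None]*k)).sum(-1)/np.pi

t, x0, z0 = 0.05, 0.3, 1.2
grid = np.linspace(0, 2*np.pi, 2001)[:-1]
dV = 2*np.pi / len(grid)
print(H(t, x0, grid).sum() * dV)                     # ~1: total mass
lhs = (H(t, x0, grid) * H(t, grid, z0).ravel()).sum() * dV
print(lhs, H(2*t, x0, z0).item())                    # ~equal: H_t * H_t = H_2t
```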

Next, we derive Lemma 2.2, which characterizes two properties of the heat kernel $H_t$ at sufficiently short time: first, on a local neighborhood on $\mathcal{M}$, $H_t(x,y)$ can be approximated by $K_t(x,y)$ to leading order, where $K_t$ is defined as in (1) with Gaussian $h$; second, globally on the manifold, the heat kernel $H_t(x,y)$ has sub-Gaussian decay. These are based on classical results about heat kernels on Riemannian manifolds [21, 16, 25, 17], summarized in the following theorem.

Theorem 2.1 (Heat kernel parametrix and decay [25, 16]).

Suppose $\mathcal{M}$ is as in Assumption 1 (A1), and $m > d/2 + 2$ is a positive integer. Then there are positive constants $t_0 < 1$ and $\delta_0 < \mathrm{inj}(\mathcal{M})$, the injectivity radius of $\mathcal{M}$, both depending on $\mathcal{M}$, such that:

1) Local approximation: there are positive constants $C_1$, $C_2$ depending on $\mathcal{M}$, and $u_0, \cdots, u_m \in C^\infty(\mathcal{M})$, where $u_0$ satisfies

$$|u_0(x,y) - 1| \le C_1 d_{\mathcal{M}}(x,y)^2, \quad \forall y \in \mathcal{M},\, d_{\mathcal{M}}(y,x) < \delta_0,$$

and $G_t$ is defined as in (3), such that, when $t < t_0$, for any $x \in \mathcal{M}$,

$$\left| H_t(x,y) - G_t(x,y) \left( \sum_{l=0}^m t^l u_l(x,y) \right) \right| \le C_2 t^{m - d/2 + 1}, \quad \forall y \in \mathcal{M},\, d_{\mathcal{M}}(y,x) < \delta_0. \qquad (10)$$

2) Global decay: there is a positive constant $C_3$ depending on $\mathcal{M}$ such that, when $t < t_0$,

$$H_t(x,y) \le C_3 t^{-d/2} e^{-\frac{d_{\mathcal{M}}(x,y)^2}{5t}}, \quad \forall x, y \in \mathcal{M}. \qquad (11)$$

Part 1) is by the classical parametrix construction of the heat kernel on $\mathcal{M}$, see e.g. Chapter 3 of [25], and Part 2) follows from the classical Gaussian upper bound of the heat kernel dating back to the 1960s [1, 17]. We include a proof of the theorem in Appendix B for completeness.

The theorem directly gives the following lemma (proof in Appendix B), which is useful for our construction of the interpolation mapping using the heat kernel. We denote by $B_\delta(x)$ the Euclidean ball in $\mathbb{R}^D$ centered at the point $x$ of radius $\delta$.

Lemma 2.2.

Suppose $\mathcal{M}$ is as in Assumption 1 (A1), and $t \to 0+$. Let $\delta_t := \sqrt{6(10 + \frac{d}{2}) t \log\frac{1}{t}}$, and let $K_t(x,y)$ be with the Gaussian kernel $h$, i.e., $K_t(x,y) = (4\pi t)^{-d/2} e^{-\|x-y\|^2/4t}$. Then there is a positive constant $\epsilon_0$ depending on $\mathcal{M}$ such that, when $t < \epsilon_0$, for any $x \in \mathcal{M}$,

$$H_t(x,y) = K_t(x,y)(1 + O(t (\log t^{-1})^2)) + O(t^3), \quad \forall y \in B_{\delta_t}(x) \cap \mathcal{M}, \qquad (12)$$
$$H_t(x,y) = O(t^{10}), \quad \forall y \notin B_{\delta_t}(x) \cap \mathcal{M}, \qquad (13)$$
$$H_t(x,y) = O(t^{-d/2}), \quad \forall x, y \in \mathcal{M}. \qquad (14)$$

The constants in big-$O$ in all the equations depend only on $\mathcal{M}$ and are uniform in $x$.
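The following sketch (our numerical check on the unit circle, $d = 1$, where the circle is flat so the discrepancy in (12) comes only from the chordal-versus-geodesic distance) compares $H_t$ with the Gaussian $K_t$ at points within roughly $\sqrt{t}$ of $x$; the printed relative error shrinks with $t$, consistent with the $O(t (\log t^{-1})^2)$ factor:

```python
import numpy as np

# Compare the spectral heat kernel H_t on S^1 with K_t(x,y) =
# (4 pi t)^{-1/2} exp(-||x-y||^2 / 4t), where ||x-y|| = 2 sin(dtheta/2)
# is the Euclidean (chordal) distance in R^2.
def H(t, dtheta, kmax=400):
    k = np.arange(1, kmax + 1)
    return 1/(2*np.pi) + (np.exp(-t*k**2) * np.cos(np.outer(dtheta, k))).sum(1)/np.pi

for t in [0.02, 0.01, 0.005]:
    dtheta = np.linspace(0.0, np.sqrt(t), 50)     # nearby points only
    chord2 = (2 * np.sin(dtheta / 2)) ** 2
    K = (4*np.pi*t) ** (-0.5) * np.exp(-chord2 / (4*t))
    print(t, np.abs(H(t, dtheta) / K - 1).max())  # shrinks as t decreases
```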

3 Eigenvalue upper bound

In this section, we consider uniform $p$ on $\mathcal{M}$, and the standard graph Laplacians $L_{un}$ and $L_{rw}$ with the kernelized affinity matrix $W$, $W_{ij} = K_\epsilon(x_i, x_j)$ defined as in (1). We show the eigenvalue UB for general differentiable $h$ satisfying Assumption 2, not necessarily Gaussian.

3.1 Un-normalized graph Laplacian eigenvalue UB

We now derive Step 0 for $L_{un}$, with the result summarized in the following proposition.

Proposition 3.1 (Eigenvalue UB of $L_{un}$).

Under Assumption 1(A1), with $p$ uniform on $\mathcal{M}$, and Assumption 2. For fixed $K \in \mathbb{N}$, if as $N \to \infty$, $\epsilon \to 0+$ and $\epsilon^{d/2} = \Omega(\frac{\log N}{N})$, then for sufficiently large $N$, w.p. $> 1 - 4K^2 N^{-10}$,

$$\lambda_k(L_{un}) \le \mu_k + O\left(\epsilon, \sqrt{\frac{\log N}{N \epsilon^{d/2}}}\right), \quad k = 1, \cdots, K.$$

The proposition holds when the population eigenvalues $\mu_k$ have multiplicity greater than one, as long as they are sorted in ascending order. The proof is by constructing a $k$-dimensional subspace $L$ in (7) spanned by vectors in $\mathbb{R}^N$ produced by evaluating the population eigenfunctions $\psi_k$ at the $N$ data points. The proof is given at the end of this subsection, after we introduce a few needed intermediate results.

Given $X = \{x_i\}_{i=1}^N$, define the function evaluation operator $\rho_X$ applied to $f : \mathcal{M} \to \mathbb{R}$ as

$$\rho_X : C(\mathcal{M}) \to \mathbb{R}^N, \quad \rho_X f = (f(x_1), \cdots, f(x_N)).$$

We will use $u_k = \frac{1}{\sqrt{p}} \rho_X \psi_k$ as "candidate" approximate eigenvectors. To analyze $E_N(\frac{1}{\sqrt{p}} \rho_X \psi_k)$, the following result from [9] shows that it converges to the differential Dirichlet form

$$p^{-1} \langle \psi_k, (-\Delta) \psi_k \rangle_{p^2} = p \mu_k$$

with the form rate. The result is for general smooth $p$ and the weighted Laplacian $\Delta_q$, defined as $\Delta_q := \Delta + \frac{\nabla q}{q} \cdot \nabla$ for a measure $q dV$ on $\mathcal{M}$. $\Delta_q$ reduces to $\Delta$ when $q$ is uniform.

Theorem 3.2 (Theorem 3.4 in [9]).

Under Assumptions 1 and 2, as $N \to \infty$, $\epsilon \to 0+$, $\epsilon^{d/2} = \Omega(\frac{\log N}{N})$, then for any $f \in C^\infty(\mathcal{M})$, when $N$ is sufficiently large, w.p. $> 1 - 2N^{-10}$,

$$E_N(\rho_X f) = \langle f, -\Delta_{p^2} f \rangle_{p^2} + O_{p,f}(\epsilon) + O\left( \sqrt{\frac{\log N}{N \epsilon^{d/2}} \int_{\mathcal{M}} |\nabla f|^4 p^2} \right).$$

The constant in $O_{p,f}(\cdot)$ depends on the $C^4$ norms of $p$ and $f$ on $\mathcal{M}$, and that in $O(\cdot)$ is an absolute one.

Proof of Theorem 3.2.

The proof is by going through the proof of Theorem 3.4 of [9] in the simplified situation where $\beta = 0$ (no normalization of the estimated density is involved). Specifically, the proof uses the concentration of the V-statistic with $V_{ij} := \frac{1}{\epsilon} K_\epsilon(x_i, x_j) (f(x_i) - f(x_j))^2$. The expectation $\mathbb{E} V_{ij}$, $i \ne j$, equals $\frac{1}{\epsilon} \int_{\mathcal{M}} \int_{\mathcal{M}} K_\epsilon(x,y) (f(x) - f(y))^2 p(x) p(y)\, dV(x) dV(y) = m_2[h] \langle f, -\Delta_{p^2} f \rangle_{p^2} + O_{p,f}(\epsilon)$. Meanwhile, $|V_{ij}|$ is bounded by $O(\epsilon^{-d/2})$, and the variance of $V_{ij}$ can also be bounded by $O(\epsilon^{-d/2})$ with the constant as in the theorem, following the calculation in the proof of Theorem 3.4 in [9]. The concentration of $\frac{1}{N(N-1)} \sum_{i,j=1}^N V_{ij}$ at $\mathbb{E} V_{ij}$ then follows by the decoupling of the V-statistic, and it gives the high probability bound in the theorem.

Note that the results in [9] are proved under the assumption that $h$ is $C^4$ rather than $C^2$, that is, requiring Assumption 2(C1)(C2) to hold up to the 4-th derivative of $h$. This is because the $C^4$ regularity of $h$ is used to handle the complication of the adaptive bandwidth in the other analysis in [9]. With the fixed-bandwidth kernel $K_\epsilon(x,y)$ as defined in (1), $C^2$ regularity suffices, as originally assumed in [10]. ∎
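A quick Monte Carlo illustration of the theorem (our sketch on the unit circle with uniform $p = 1/(2\pi)$ and $f = \cos\theta$, where $\langle f, -\Delta_{p^2} f \rangle_{p^2} = p^2 \int_0^{2\pi} \cos^2\theta\, d\theta = 1/(4\pi) \approx 0.0796$):

```python
import numpy as np

# E_N(rho_X f) for f = cos(theta) should approach 1/(4 pi), up to the
# O(eps) bias and the sampling fluctuation of Theorem 3.2.
N, eps, d = 2000, 0.05, 1
theta = 2 * np.pi * np.random.rand(N)
X = np.stack([np.cos(theta), np.sin(theta)], 1)
sq = np.sum((X[:, None] - X[None, :]) ** 2, -1)
W = eps ** (-d/2) * (4*np.pi) ** (-d/2) * np.exp(-sq / (4*eps))
u = np.cos(theta)                                    # rho_X f
E_N = 0.5 * np.sum(W * (u[:, None] - u[None, :])**2) / (eps * N**2)  # m2/2 = 1
print(E_N, 1 / (4*np.pi))
```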

Remark 1 (Relaxation of Assumption 2).

Since the proof only involves the computation of moments of the V-statistic, it is possible to relax the non-negativity of $h$ in Assumption 2(C3) and replace it with certain non-vanishing conditions on $m_0[h]$ and $m_2[h]$, e.g., as in [10] and Assumption A.3 in [9]. Since the non-negativity of $W_{ij}$ is used in other places in the paper, and our eigenvalue LB needs $h$ to be Gaussian, we adopt the non-negativity of $h$ in Assumption 2 for simplicity. The $C^4$ regularity of $f$ may also be relaxed, and the constant in $O_{p,f}(\cdot)$ may be improved accordingly. These extensions are not further pursued here.

Remark 2 (Dirichlet form convergence with compactly supported $h$).

The "epsilon-graph" corresponds to constructing the graph affinity using the indicator function kernel $h = {\bf 1}_{[0,1)}$. Note that the "epsilon" stands for the scale of the local distance and thus is $\sqrt{\epsilon}$ here, because our $\epsilon$ is "time". When $h = {\bf 1}_{[0,1)}$, using the same method as in the proof of Lemma 8 in [10], one can verify that (proof in Appendix C.1), for $i \ne j$,

$$\mathbb{E} V_{ij} = m_2[h] \langle f, -\Delta_{p^2} f \rangle_{p^2} + O_{p,f}(\epsilon), \quad f \in C^\infty(\mathcal{M}). \qquad (15)$$

The boundedness and variance of $V_{ij}$ are again bounded by $O(\epsilon^{-d/2})$, and thus the Dirichlet form convergence with $h = {\bf 1}_{[0,1)}$ has the same rate $O(\epsilon, \sqrt{\frac{\log N}{N \epsilon^{d/2}}})$ as in Theorem 3.2. This first implies that the eigenvalue UB also has the same rate, following the same proof as Proposition 3.1. The final eigen-convergence rate also depends on the point-wise rate of the graph Laplacian, see more in Remark 4.

In Theorem 3.2 and below, the $\log N$ factor in the variance error bound is due to the concentration argument. Throughout the paper, the classical Bernstein inequality (Lemma B.1) is used intensively.

To proceed, recalling the definition of $E_N(u)$ in (6), we define the bi-linear form for $u, v \in \mathbb{R}^N$ as

$$B_N(u,v) := \frac{1}{4}(E_N(u+v) - E_N(u-v)) = \frac{1}{m_2/2} \frac{1}{\epsilon N^2} u^T (D - W) v,$$

which is symmetric, i.e., $B_N(u,v) = B_N(v,u)$, and $B_N(u,u) = E_N(u)$. The following lemma characterizes the forms $E_N$ and $B_N$ applied to $\rho_X \psi_k$; it is proved in Appendix C.1.

Lemma 3.3.

Under Assumption 1 (A1), $p$ uniform on $\mathcal{M}$, and Assumption 2. As $N \to \infty$, $\epsilon \to 0+$, $\epsilon^{d/2} N = \Omega(\log N)$. For fixed $K$, when $N$ is sufficiently large, w.p. $> 1 - 2K^2 N^{-10}$,

$$\begin{split} E_N(\tfrac{1}{\sqrt{p}} \rho_X \psi_k) &= p \mu_k + O(\epsilon) + O\left(\sqrt{\tfrac{\log N}{N \epsilon^{d/2}}}\right), \quad k = 1, \cdots, K, \\ B_N(\tfrac{1}{\sqrt{p}} \rho_X \psi_k, \tfrac{1}{\sqrt{p}} \rho_X \psi_l) &= O(\epsilon) + O\left(\sqrt{\tfrac{\log N}{N \epsilon^{d/2}}}\right), \quad k \ne l,\ 1 \le k, l \le K. \end{split} \qquad (16)$$

We need the linear independence of the vectors $\rho_X \psi_1, \cdots, \rho_X \psi_K$ so that they span a $K$-dimensional subspace of $\mathbb{R}^N$. This holds w.h.p. at large $N$, by the following lemma showing the near-isometry of the mapping $\rho_X$, proved in Appendix C.1.

Lemma 3.4.

Under Assumption 1 (A1), $p$ uniform on $\mathcal{M}$. For fixed $K$, when $N$ is sufficiently large, w.p. $> 1 - 2K^2 N^{-10}$,

$$\begin{split} \frac{1}{N} \|\tfrac{1}{\sqrt{p}} \rho_X \psi_k\|^2 &= 1 + O\left(\sqrt{\tfrac{\log N}{N}}\right), \quad 1 \le k \le K; \\ \frac{1}{N} (\tfrac{1}{\sqrt{p}} \rho_X \psi_k)^T (\tfrac{1}{\sqrt{p}} \rho_X \psi_l) &= O\left(\sqrt{\tfrac{\log N}{N}}\right), \quad k \ne l,\ 1 \le k, l \le K. \end{split} \qquad (17)$$

Given these estimates, we are ready to prove Proposition 3.1.

Proof of Proposition 3.1.

For fixed $K$, consider the intersection of the good events in Lemmas 3.3 and 3.4, which happens w.p. $> 1 - 4K^2 N^{-10}$ for large enough $N$. Let $u_k = \frac{1}{\sqrt{p}} \rho_X \psi_k$; by (17), the set $\{u_1, \cdots, u_K\}$ is linearly independent.

For any $1 \le k \le K$, let $L = \mathrm{Span}\{u_1, \cdots, u_k\}$; then $\dim(L) = k$. By (7), to show the UB of $\lambda_k$ as in the proposition, it suffices to show that

$$\sup_{v \in L,\, \|v\|^2 = N} \frac{1}{p} E_N(v) \le \mu_k + O(\epsilon) + O\left(\sqrt{\frac{\log N}{N \epsilon^{d/2}}}\right).$$

For any $v \in L$ with $\|v\|^2 = N$, there are $c_j$, $1 \le j \le k$, such that $v = \sum_{j=1}^k c_j u_j$. By (17),

$$1 = \frac{1}{N} \|v\|^2 = \sum_{j=1}^k c_j^2 \left(1 + O\left(\sqrt{\tfrac{\log N}{N}}\right)\right) + \sum_{j \ne l,\, j,l=1}^k |c_j| |c_l| O\left(\sqrt{\tfrac{\log N}{N}}\right) = \|c\|^2 \left(1 + O\left(K \sqrt{\tfrac{\log N}{N}}\right)\right),$$

thus $\|c\|^2 = 1 + O(\sqrt{\frac{\log N}{N}})$. Meanwhile, $E_N(v) = E_N(\sum_{j=1}^k c_j u_j) = \sum_{j,l=1}^k c_j c_l B_N(u_j, u_l)$, and by (16),

$$\begin{split} E_N(v) &= \sum_{j=1}^k c_j^2 \left( p \mu_j + O\left(\epsilon, \sqrt{\tfrac{\log N}{N \epsilon^{d/2}}}\right) \right) + \sum_{j \ne l,\, j,l=1}^k |c_j| |c_l| O\left(\epsilon, \sqrt{\tfrac{\log N}{N \epsilon^{d/2}}}\right) \\ &= p \sum_{j=1}^k \mu_j c_j^2 + K \|c\|^2 O\left(\epsilon, \sqrt{\tfrac{\log N}{N \epsilon^{d/2}}}\right) \le \|c\|^2 \left\{ p \mu_k + O\left(\epsilon, \sqrt{\tfrac{\log N}{N \epsilon^{d/2}}}\right) \right\}, \end{split} \qquad (18)$$

where, since $K$ is a fixed integer, we incorporate it into the big-$O$. Also, $\mu_k \le \mu_K = O(1)$, and then

$$\frac{1}{p} E_N(v) \le \left( 1 + O\left(\sqrt{\tfrac{\log N}{N}}\right) \right) \left\{ \mu_k + O(\epsilon) + O\left(\sqrt{\tfrac{\log N}{N \epsilon^{d/2}}}\right) \right\} = \mu_k + O(\epsilon) + O\left(\sqrt{\tfrac{\log N}{N \epsilon^{d/2}}}\right),$$

which finishes the proof. ∎

3.2 Random-walk graph Laplacian eigenvalue UB

We first establish a concentration result for $D_i$ in the following lemma, which shows that $D_i > 0$ w.h.p., since $\frac{1}{N} D_i$ concentrates at the value $m_0 p > 0$. Consequently, $\frac{1}{N^2} u^T D u$ also concentrates, with deviation uniformly bounded over all $u \in \mathbb{R}^N$; this will be used in analyzing (8).

Lemma 3.5.

Under Assumption 1(A1), $p$ uniform, and Assumption 2. Suppose as $N \to \infty$, $\epsilon \to 0+$ and $\epsilon^{d/2} = \Omega(\frac{\log N}{N})$. Then, when $N$ is large enough, w.p. $> 1 - 2N^{-9}$,

1) The degree $D_i$ concentrates for all $i$, namely,

$$\frac{1}{N} D_i = m_0 p + O\left(\epsilon, \sqrt{\frac{\log N}{N \epsilon^{d/2}}}\right), \quad \forall i = 1, \cdots, N. \qquad (19)$$

2) The form $\frac{1}{N^2} u^T D u$ concentrates for all $u$, namely,

$$\frac{1}{N^2} u^T D u = \frac{1}{N} \|u\|^2 \left( m_0 p + O\left(\epsilon, \sqrt{\frac{\log N}{N \epsilon^{d/2}}}\right) \right), \quad \forall u \in \mathbb{R}^N. \qquad (20)$$

The constants in big-$O$ in (19) and (20) are determined by $(\mathcal{M}, h)$ and are uniform over all $i$ and $u$.

Part 2) follows immediately from Part 1), the latter being proved by a standard concentration argument for independent sums and a union bound over $N$ events. With Lemma 3.5, the proof of the following proposition is similar to that of Proposition 3.1; the difference lies in handling the denominator of the Rayleigh quotient in (8). The proofs of Lemma 3.5 and Proposition 3.6 are in Appendix C.1.
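A numerical check of (19) (our sketch on the unit circle with Gaussian $h$, so $m_0 = 1$ and the target value is $m_0 p = 1/(2\pi) \approx 0.159$):

```python
import numpy as np

# Max deviation of D_i/N from m0 * p over all i: small compared to p, with
# O(eps) bias plus the O(sqrt(log N / (N eps^{d/2}))) fluctuation in (19).
N, eps, d, p = 2000, 0.05, 1, 1 / (2 * np.pi)
theta = 2 * np.pi * np.random.rand(N)
X = np.stack([np.cos(theta), np.sin(theta)], 1)
sq = np.sum((X[:, None] - X[None, :]) ** 2, -1)
W = eps ** (-d/2) * (4*np.pi) ** (-d/2) * np.exp(-sq / (4*eps))
print(np.abs(W.sum(1) / N - p).max())
```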

Proposition 3.6 (Eigenvalue UB of $L_{rw}$).

Suppose $\mathcal{M}$, uniform $p$, $h$, $K$, $\mu_k$, and $\epsilon$ are under the same conditions as in Proposition 3.1. Then for sufficiently large $N$, w.p. $> 1 - 2N^{-9} - 4K^2 N^{-10}$, $D_i > 0$ for all $i$, and

$$\lambda_k(L_{rw}) \le \mu_k + O\left(\epsilon, \sqrt{\frac{\log N}{N \epsilon^{d/2}}}\right), \quad k = 1, \cdots, K.$$

4 Eigenvalue crude lower bound in Step 1

In this section, we prove the $O(1)$ eigenvalue LB of Step 1, first for $L_{un}$; the proof for $L_{rw}$ is similar.

We consider for $t > 0$ the operator $\mathcal{L}_t$ on $H = L^2(\mathcal{M}, dV)$ defined as

$$\mathcal{L}_t := I - Q_t, \quad \mathcal{L}_t f(x) = f(x) - \int_{\mathcal{M}} H_t(x,y) f(y)\, dV(y), \quad f \in H.$$

The semi-group operator $Q_t$ is Hilbert-Schmidt, compact, and has eigenvalues and eigenfunctions as in (9). Thus, the operator $\mathcal{L}_t$ is self-adjoint and PSD, and satisfies

$$\mathcal{L}_t \psi_k = (1 - e^{-t\mu_k}) \psi_k, \quad k = 1, 2, \cdots$$

For any $t > 0$, the eigenvalues $\{1 - e^{-t\mu_k}\}_k$ ascend from 0 and have limit point 1. We denote $\|f\|^2 = \langle f, f \rangle$ for $f \in H$. By the variational principle, we have that when $t > 0$, for any $k$,

$$1 - e^{-t\mu_k} = \inf_{L \subset H,\, \dim(L) = k}\, \sup_{f \in L,\, \|f\|^2 \ne 0} \frac{\langle f, \mathcal{L}_t f \rangle}{\langle f, f \rangle}. \qquad (21)$$

For the first result, we assume for simplicity that all $\mu_k$ have multiplicity 1. When population eigenvalues have multiplicity greater than one, the result extends by considering eigenspaces rather than eigenvectors in the standard way; see Remark 5.

4.1 Un-normalized graph Laplacian eigenvalue crude LB

We now derive Step 1 for $L_{un}$, with the result summarized in the following proposition.

Proposition 4.1 (Initial crude eigenvalue LB of $L_{un}$).

Under Assumption 1 (A1), suppose $p$ is uniform on $\mathcal{M}$ and $h$ is Gaussian. For fixed $k_{max} \in \mathbb{N}$, $K = k_{max} + 1$, suppose $0 = \mu_1 < \cdots < \mu_K < \infty$ are all of single multiplicity, and define

$$\gamma_K := \frac{1}{2} \min_{1 \le k \le k_{max}} (\mu_{k+1} - \mu_k), \qquad (22)$$

which is a fixed positive constant. Then there is an absolute constant $c_K$ determined by $\mathcal{M}$ and $k_{max}$ (specifically, $c_K = c (\frac{\mu_K}{\gamma_K})^{d/2} \gamma_K^{-2}$, where $c$ is a constant depending on $\mathcal{M}$), such that, if as $N \to \infty$, $\epsilon \to 0+$ and $\epsilon^{d/2+2} > c_K \frac{\log N}{N}$, then for sufficiently large $N$, w.p. $> 1 - 4K^2 N^{-10} - 4N^{-9}$,

$$\lambda_k(L_{un}) > \mu_k - \gamma_K, \quad k = 2, \cdots, K.$$

We prove Proposition 4.1 at the end of this subsection, after introducing the heat kernel interpolation and establishing the needed lemmas.

Suppose $\{\lambda_k, v_k\}_{k=1}^K$ are the eigenvalues and eigenvectors of $L_{un}$. To construct a test function $f_k$ on $\mathcal{M}$ from the vector $v_k$, we define the interpolation mapping (the terminology "interpolation" is inherited from [6]) by the heat kernel at diffusion time $r$, $0 < r < \epsilon$, to be determined. Specifically, define

$$I_r[u](x) := \frac{1}{N} \sum_{j=1}^N u_j H_r(x, x_j), \quad I_r : \mathbb{R}^N \to C^\infty(\mathcal{M}),$$

and then for any $t > 0$,

$$\langle I_r[u], Q_t I_r[u] \rangle = \frac{1}{N^2} \sum_{i,j=1}^N u_i u_j H_{2r+t}(x_i, x_j), \quad \langle I_r[u], I_r[u] \rangle = \frac{1}{N^2} \sum_{i,j=1}^N u_i u_j H_{2r}(x_i, x_j). \qquad (23)$$

We define the quadratic form

$$q_s(u) := \frac{1}{N^2} \sum_{i,j=1}^N u_i u_j H_s(x_i, x_j), \quad s > 0, \quad u \in \mathbb{R}^N.$$

We also define $q_s^{(0)}$ and $q_s^{(2)}$ as below; then for any $u \in \mathbb{R}^N$, $q_s(u) = q_s^{(0)}(u) - q_s^{(2)}(u)$, where

$$q_s^{(0)}(u) := \frac{1}{N} \sum_{i=1}^N u_i^2 \left( \frac{1}{N} \sum_{j=1}^N H_s(x_i, x_j) \right), \quad q_s^{(2)}(u) := \frac{1}{2} \frac{1}{N^2} \sum_{i,j=1}^N H_s(x_i, x_j) (u_i - u_j)^2. \qquad (24)$$

We will show that $q_s^{(0)}(u) \approx p \frac{1}{N} \|u\|^2$ by concentration of the independent sum $\frac{1}{N} \sum_{j=1}^N H_s(x_i, x_j)$; $q_s^{(2)}(u) \ge 0$ by definition, and it will be $O(s)$ when $u$ is an eigenvector with $\|u\|^2 = N$. These objects are illustrated in the sketch below.
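The following sketch (our illustration on the unit circle, using the truncated spectral heat kernel) implements $I_r[u]$, checks the algebraic identity $q_s(u) = q_s^{(0)}(u) - q_s^{(2)}(u)$, and verifies the Gram relation $\langle I_r[u], I_r[u]\rangle = q_{2r}(u)$ of (23) by quadrature:

```python
import numpy as np

# Heat-kernel interpolation I_r[u](x) = (1/N) sum_j u_j H_r(x, x_j) on S^1.
def H(t, x, y, kmax=80):
    k = np.arange(1, kmax + 1)
    diff = np.subtract.outer(np.atleast_1d(x), np.atleast_1d(y))
    return 1/(2*np.pi) + (np.exp(-t*k**2) * np.cos(diff[..., None]*k)).sum(-1)/np.pi

N, r = 100, 0.02
theta = 2 * np.pi * np.random.rand(N)
u = np.random.randn(N)
Hs = H(2*r, theta, theta)                        # H_{2r}(x_i, x_j)
q = u @ Hs @ u / N**2                            # q_{2r}(u)
q0 = np.mean(u**2 * Hs.mean(1))                  # q^{(0)}_{2r}(u)
q2 = 0.5 * np.sum(Hs * (u[:, None] - u[None, :])**2) / N**2   # q^{(2)}_{2r}(u)
assert np.allclose(q, q0 - q2)                   # q_s = q_s^(0) - q_s^(2)
grid = np.linspace(0, 2*np.pi, 1001)[:-1]
dV = 2*np.pi / len(grid)
I_r_u = H(r, grid, theta) @ u / N                # I_r[u] on a quadrature grid
print((I_r_u**2).sum() * dV, q)                  # <I_r[u], I_r[u]> ~ q_{2r}(u)
```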

Lemma 4.2.

Under Assumptions 1 (A1), p being uniform on {\mathcal{M}}. Suppose as N\to\infty, s\to 0+ and s^{d/2}=\Omega(\frac{\log N}{N}). Then, when N is large enough, w.p. >1-2N^{-9},

qs(0)(u)=1Nu2(p+O(logNNsd/2)),uN.q^{(0)}_{s}(u)=\frac{1}{N}\|u\|^{2}\left(p+O_{\mathcal{M}}(\sqrt{\frac{\log N}{Ns^{d/2}}})\right),\quad\forall u\in\mathbb{R}^{N}.

The notation O()O_{\mathcal{M}}(\cdot) indicates that the constant depends on {\mathcal{M}} and is uniform for all uu.

Proof of Lemma 4.2.

By definition, qs(0)(u)=1Ni=1Nui2(Ds)iq^{(0)}_{s}(u)=\frac{1}{N}\sum_{i=1}^{N}u_{i}^{2}(D_{s})_{i}, where (Ds)i:=1Nj=1NHs(xi,xj)(D_{s})_{i}:=\frac{1}{N}\sum_{j=1}^{N}{H}_{s}(x_{i},x_{j}), and {(Ds)i}i=1N\{(D_{s})_{i}\}_{i=1}^{N} are NN positive valued random variables. It suffices to show that with large enough NN, w.p. indicated in the lemma,

(Ds)i=p+O(logNNsd/2),i=1,,N.(D_{s})_{i}=p+O_{\mathcal{M}}(\sqrt{\frac{\log N}{Ns^{d/2}}}),\quad\forall i=1,\cdots,N. (25)

This can be proved by a concentration argument, similar to that in the proof of Lemma 3.5 1), where we use the boundedness of the heat kernel (14) in Lemma 2.2. The proof of (25) is given in Appendix C.2. Note that (25) is a property of the r.v. H_s(x_i,x_j) only, independent of the vector u. Thus the threshold of large N in the lemma and the constant in the big-O depend on {\mathcal{M}} and are uniform for all u. ∎

Lemma 4.3.

Under Assumptions 1 (pp can be non-uniform), hh being Gaussian, let 0<α<10<\alpha<1 be a fixed constant. Suppose ϵ0+\epsilon\to 0+ as NN\to\infty, then with sufficiently small ϵ\epsilon, for any realization of XX,

0qϵ(2)(u)=(1+O(ϵ(log1ϵ)2))uT(DW)uN2+u2NO(ϵ3),uN,0\leq q^{(2)}_{\epsilon}(u)=\left(1+{O}(\epsilon(\log\frac{1}{\epsilon})^{2})\right)\frac{u^{T}(D-W)u}{N^{2}}+\frac{\|u\|^{2}}{N}O(\epsilon^{3}),\quad\forall u\in\mathbb{R}^{N}, (26)

and

0qαϵ(2)(u)1.1αd/2uT(DW)uN2+u2NO(ϵ3),uN.0\leq q^{(2)}_{\alpha\epsilon}(u)\leq 1.1\alpha^{-d/2}\frac{u^{T}(D-W)u}{N^{2}}+\frac{\|u\|^{2}}{N}O(\epsilon^{3}),\quad\forall u\in\mathbb{R}^{N}. (27)

The constants in big-OO only depend on {\mathcal{M}} and are uniform for all uu and α\alpha.

Proof of Lemma 4.3.

For any u\in\mathbb{R}^N, q^{(2)}_\epsilon(u)=\frac{1}{2}\frac{1}{N^2}\sum_{i,j=1}^N H_\epsilon(x_i,x_j)(u_i-u_j)^2\geq 0. Since \epsilon=o(1), take t in Lemma 2.2 to be \epsilon; when \epsilon<\epsilon_0, the three equations there hold. By (13), truncating at a Euclidean ball of radius \delta_\epsilon=\sqrt{6(10+\frac{d}{2})\epsilon\log\frac{1}{\epsilon}},

qϵ(2)(u)=121N2i,j=1NHϵ(xi,xj)𝟏{xjBδϵ(xi)}(uiuj)2+O(ϵ10)121N2i,j=1N(uiuj)2.\displaystyle q^{(2)}_{\epsilon}(u)=\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}{H}_{\epsilon}(x_{i},x_{j}){\bf 1}_{\{x_{j}\in B_{\delta_{\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}+O(\epsilon^{10})\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}(u_{i}-u_{j})^{2}.

By the fact that \frac{1}{N^2}\sum_{i,j=1}^N(u_i-u_j)^2\leq\frac{2}{N}\|u\|^2, and applying (12) with the shorthand that \tilde{O}(\epsilon) stands for O(\epsilon(\log\frac{1}{\epsilon})^2),

qϵ(2)(u)\displaystyle q^{(2)}_{\epsilon}(u) =121N2i,j=1N(Kϵ(xi,xj)(1+O~(ϵ))+O(ϵ3))𝟏{xjBδϵ(xi)}(uiuj)2+O(ϵ10)u2N\displaystyle=\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}\left(K_{\epsilon}(x_{i},x_{j})(1+\tilde{O}(\epsilon))+O(\epsilon^{3})\right){\bf 1}_{\{x_{j}\in B_{\delta_{\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}+O(\epsilon^{10})\frac{\|u\|^{2}}{N}
=(1+O~(ϵ))121N2i,j=1NKϵ(xi,xj)𝟏{xjBδϵ(xi)}(uiuj)2+O(ϵ3)u2N.\displaystyle=(1+\tilde{O}(\epsilon))\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}K_{\epsilon}(x_{i},x_{j}){\bf 1}_{\{x_{j}\in B_{\delta_{\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}+O(\epsilon^{3})\frac{\|u\|^{2}}{N}.

By the truncation argument for Kϵ(xi,xj)K_{\epsilon}(x_{i},x_{j}), we have that

121N2i,j=1NKϵ(xi,xj)𝟏{xjBδϵ(xi)}(uiuj)2=uT(DW)uN2+u2NO(ϵ10).\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}K_{\epsilon}(x_{i},x_{j}){\bf 1}_{\{x_{j}\in B_{\delta_{\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}=\frac{u^{T}(D-W)u}{N^{2}}+\frac{\|u\|^{2}}{N}O(\epsilon^{10}). (28)

Putting together, we have

qϵ(2)(u)=(1+O~(ϵ))(uT(DW)uN2+u2NO(ϵ10))+O(ϵ3)u2N,q^{(2)}_{\epsilon}(u)=(1+\tilde{O}(\epsilon))\left(\frac{u^{T}(D-W)u}{N^{2}}+\frac{\|u\|^{2}}{N}O(\epsilon^{10})\right)+O(\epsilon^{3})\frac{\|u\|^{2}}{N},

which proves (26).

To prove (27), since α<1\alpha<1 is a fixed positive constant, 0<αϵ<ϵ<ϵ00<\alpha\epsilon<\epsilon<\epsilon_{0}, we then apply Lemma 2.2 with tt therein being αϵ\alpha\epsilon. With a truncation at δαϵ\delta_{\alpha\epsilon}-Euclidean ball, and by (12),

qαϵ(2)(u)\displaystyle q^{(2)}_{\alpha\epsilon}(u) =121N2i,j=1N(Kαϵ(xi,xj)(1+O~(αϵ))+O(α3ϵ3))𝟏{xjBδαϵ(xi)}(uiuj)2+u2NO(ϵ10)\displaystyle=\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}\left(K_{\alpha\epsilon}(x_{i},x_{j})(1+\tilde{O}(\alpha\epsilon))+O(\alpha^{3}\epsilon^{3})\right){\bf 1}_{\{x_{j}\in B_{\delta_{\alpha\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}+\frac{\|u\|^{2}}{N}O(\epsilon^{10})
=(1+O~(ϵ))121N2i,j=1NKαϵ(xi,xj)𝟏{xjBδαϵ(xi)}(uiuj)2+u2NO(ϵ3).\displaystyle=(1+\tilde{O}(\epsilon))\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}K_{\alpha\epsilon}(x_{i},x_{j}){\bf 1}_{\{x_{j}\in B_{\delta_{\alpha\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}+\frac{\|u\|^{2}}{N}O(\epsilon^{3}).

Suppose ϵ\epsilon is sufficiently small such that 1+O~(ϵ)1+\tilde{O}(\epsilon) is less than 1.1. Note that

Kαϵ(x,y)=1(4παϵ)d/2exy24αϵ1αd/21(4πϵ)d/2exy24ϵ=αd/2Kϵ(x,y),x,y,K_{\alpha\epsilon}(x,y)=\frac{1}{(4\pi\alpha\epsilon)^{d/2}}e^{-\frac{\|x-y\|^{2}}{4\alpha\epsilon}}\leq\frac{1}{\alpha^{d/2}}\frac{1}{(4\pi\epsilon)^{d/2}}e^{-\frac{\|x-y\|^{2}}{4\epsilon}}=\alpha^{-d/2}K_{\epsilon}(x,y),\quad\forall x,y\in{\mathcal{M}}, (29)

then, by that 𝟏{xjBδαϵ(xi)}𝟏{xjBδϵ(xi)}{\bf 1}_{\{x_{j}\in B_{\delta_{\alpha\epsilon}}(x_{i})\}}\leq{\bf 1}_{\{x_{j}\in B_{\delta_{\epsilon}}(x_{i})\}}, and again with (28),

qαϵ(2)(u)\displaystyle q^{(2)}_{\alpha\epsilon}(u) 1.1121N2i,j=1Nαd/2Kϵ(xi,xj)𝟏{xjBδϵ(xi)}(uiuj)2+u2NO(ϵ3)\displaystyle\leq 1.1\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}\alpha^{-d/2}K_{\epsilon}(x_{i},x_{j}){\bf 1}_{\{x_{j}\in B_{\delta_{\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}+\frac{\|u\|^{2}}{N}O(\epsilon^{3})
=1.1αd/2(uT(DW)uN2+u2NO(ϵ10))+u2NO(ϵ3),\displaystyle=1.1\alpha^{-d/2}\left(\frac{u^{T}(D-W)u}{N^{2}}+\frac{\|u\|^{2}}{N}O(\epsilon^{10})\right)+\frac{\|u\|^{2}}{N}O(\epsilon^{3}),

and this proves (27). ∎

We are ready to prove Proposition 4.1.

Proof of Proposition 4.1.

For fixed kmaxk_{max}, since γK<μK\gamma_{K}<\mu_{K}, define

δ:=0.5γKμK<0.5,\delta:=\frac{0.5\gamma_{K}}{\mu_{K}}<0.5, (30)

δ>0\delta>0 and is a fixed constant determined by {\mathcal{M}} and kmaxk_{max}. For ϵ>0\epsilon>0, let

r:=δϵ2,t=ϵ2r=(1δ)ϵ.r:=\frac{\delta\epsilon}{2},\quad t=\epsilon-2r=(1-\delta)\epsilon.

For Lunvk=λkvkL_{un}v_{k}=\lambda_{k}v_{k}, where vkv_{k} are normalized s.t.

1NvkTvl=δkl,1k,lN,\frac{1}{N}v_{k}^{T}v_{l}=\delta_{kl},\quad 1\leq k,l\leq N, (31)

let fk=Ir[vk]f_{k}=I_{r}[v_{k}], k=1,,Kk=1,\cdots,K, then fkC()Hf_{k}\in C^{\infty}({\mathcal{M}})\subset H. Because ϵd/2+2>cKlogNN\epsilon^{d/2+2}>c_{K}\frac{\log N}{N}, and ϵ=o(1)\epsilon=o(1), ϵd/2=Ω(logNN)\epsilon^{d/2}=\Omega(\frac{\log N}{N}). Thus, under the assumption of the current proposition, the condition needed in Proposition 3.1 is satisfied, and then when NN is sufficiently large, there is an event EUBE_{UB} which happens w.p. >14K2N10>1-4K^{2}N^{-10}, under which

λkμk+0.1μK1.1μK,1kK.\lambda_{k}\leq\mu_{k}+0.1\mu_{K}\leq 1.1\mu_{K},\quad 1\leq k\leq K. (32)

We first show that {fj}j=1K\{f_{j}\}_{j=1}^{K} are linearly independent by considering fk,fl\langle f_{k},f_{l}\rangle. By definition, for 1kK1\leq k\leq K,

fk,fk=q2r(vk)=qδϵ(0)(vk)qδϵ(2)(vk),\langle f_{k},f_{k}\rangle=q_{2r}(v_{k})=q^{(0)}_{\delta\epsilon}(v_{k})-q^{(2)}_{\delta\epsilon}(v_{k}),

and for klk\neq l, 1k,lK1\leq k,l\leq K,

(fk±fl),(fk±fl)=q2r(vk±vl)=qδϵ(0)(vk±vl)qδϵ(2)(vk±vl).\langle(f_{k}\pm f_{l}),(f_{k}\pm f_{l})\rangle=q_{2r}(v_{k}\pm v_{l})=q^{(0)}_{\delta\epsilon}(v_{k}\pm v_{l})-q^{(2)}_{\delta\epsilon}(v_{k}\pm v_{l}).

Because s=δϵs=\delta\epsilon, under the condition of the proposition, ss satisfies the condition in Lemma 4.2, and thus, with sufficiently large NN, there is an event E(0)E^{(0)} which happens w.p. >12N9>1-2N^{-9}, under which

qδϵ(0)(vk)=p+O(logNNϵd/2),1kK;qδϵ(0)(vk±vl)=2p+O(logNNϵd/2),kl,1k,lK,q^{(0)}_{\delta\epsilon}(v_{k})=p+O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}}),\quad 1\leq k\leq K;\quad q^{(0)}_{\delta\epsilon}(v_{k}\pm v_{l})=2p+O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}}),\quad k\neq l,1\leq k,l\leq K,

where we used that the factor \delta^{-d/2} is a fixed constant. Meanwhile, applying (27) in Lemma 4.3 with \alpha=\delta, and noting that

vkT(DW)vkN2=pϵλk;(vk±vl)T(DW)(vk±vl)N2=pϵ(λk+λl),kl,1k,lK,\frac{v_{k}^{T}(D-W)v_{k}}{N^{2}}=p\epsilon\lambda_{k};\quad\frac{(v_{k}\pm v_{l})^{T}(D-W)(v_{k}\pm v_{l})}{N^{2}}=p\epsilon(\lambda_{k}+\lambda_{l}),\quad k\neq l,1\leq k,l\leq K,

we have that

qδϵ(2)(vk)=O(δd/2)pϵλk+O(ϵ3),1kK,qδϵ(2)(vk±vl)=O(δd/2)pϵ(λk+λl)+2O(ϵ3),kl,\begin{split}q^{(2)}_{\delta\epsilon}(v_{k})&=O(\delta^{-d/2})p\epsilon\lambda_{k}+O(\epsilon^{3}),\quad 1\leq k\leq K,\\ q^{(2)}_{\delta\epsilon}(v_{k}\pm v_{l})&=O(\delta^{-d/2})p\epsilon(\lambda_{k}+\lambda_{l})+2O(\epsilon^{3}),\quad k\neq l,\end{split}

and since \lambda_k,\lambda_l\leq 1.1\mu_K, which is a fixed constant, and \delta is fixed as well, we have that

qδϵ(2)(vk)=O(ϵ),1kK;qδϵ(2)(vk±vl)=O(ϵ),kl, 1k,lK.q^{(2)}_{\delta\epsilon}(v_{k})=O(\epsilon),\quad 1\leq k\leq K;\quad q^{(2)}_{\delta\epsilon}(v_{k}\pm v_{l})=O(\epsilon),\quad k\neq l,\,1\leq k,l\leq K. (33)

Putting together, we have that

fk,fk=p+O(logNNϵd/2,ϵ),1kK,fk,fl=14(qδϵ(vk+vl)qδϵ(vkvl))=O(logNNϵd/2,ϵ),kl, 1k,lK.\begin{split}\langle f_{k},f_{k}\rangle&=p+O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}},\epsilon),\quad 1\leq k\leq K,\\ \langle f_{k},f_{l}\rangle&=\frac{1}{4}(q_{\delta\epsilon}(v_{k}+v_{l})-q_{\delta\epsilon}(v_{k}-v_{l}))=O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}},\epsilon),\quad k\neq l,\,1\leq k,l\leq K.\end{split} (34)

This proves linear independence of {fj}j=1K\{f_{j}\}_{j=1}^{K} when NN is large enough, since O(logNNϵd/2,ϵ)=o(1)O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}},\epsilon)=o(1).

We consider the first K eigenvalues of {\mathcal{L}}_t, t=(1-\delta)\epsilon. For each 2\leq k\leq K, let L_k=\text{Span}\{f_1,\cdots,f_k\}, a k-dimensional subspace of H; then by (21),

1e(1δ)ϵμksupfLk,f20f,tff,f=f,ff,Qtff,f.1-e^{-(1-\delta)\epsilon\mu_{k}}\leq\sup_{f\in L_{k},\,\|f\|^{2}\neq 0}\frac{\langle f,{\mathcal{L}}_{t}f\rangle}{\langle f,f\rangle}=\frac{\langle f,f\rangle-\langle f,Q_{t}f\rangle}{\langle f,f\rangle}. (35)

For any fLkf\in L_{k}, f20\|f\|^{2}\neq 0, there is ckc\in\mathbb{R}^{k}, c0c\neq 0, such that f=j=1kcjfjf=\sum_{j=1}^{k}c_{j}f_{j}. Thus

f=j=1kcjIr[vj]=Ir[j=1kcjvj]=Ir[v],v:=j=1kcjvj.f=\sum_{j=1}^{k}c_{j}I_{r}[v_{j}]=I_{r}[\sum_{j=1}^{k}c_{j}v_{j}]=I_{r}[v],\quad v:=\sum_{j=1}^{k}c_{j}v_{j}.

Because vjv_{j} are orthogonal, vj2=N\|v_{j}\|^{2}=N, we have that

v2N=c2,vT(DW)vN2=j=1kcj2(pϵλj)λkpϵc2.\frac{\|v\|^{2}}{N}=\|c\|^{2},\quad\frac{v^{T}(D-W)v}{N^{2}}=\sum_{j=1}^{k}c_{j}^{2}(p\epsilon\lambda_{j})\leq\lambda_{k}p\epsilon\|c\|^{2}.

By definition, \langle f,f\rangle=q_{\delta\epsilon}(v), since 2r=\delta\epsilon, and \langle f,Q_tf\rangle=q_\epsilon(v), since 2r+t=\epsilon.

We first upper bound the numerator of the r.h.s. of (35). By that qδϵ(2)(v)0q^{(2)}_{\delta\epsilon}(v)\geq 0,

f,ff,Qtf\displaystyle\langle f,f\rangle-\langle f,Q_{t}f\rangle =qδϵ(v)qϵ(v)=qδϵ(0)(v)qδϵ(2)(v)qϵ(0)(v)+qϵ(2)(v)\displaystyle=q_{\delta\epsilon}(v)-q_{\epsilon}(v)=q^{(0)}_{\delta\epsilon}(v)-q^{(2)}_{\delta\epsilon}(v)-q^{(0)}_{\epsilon}(v)+q^{(2)}_{\epsilon}(v)
(qδϵ(0)(v)qϵ(0)(v))+qϵ(2)(v).\displaystyle\leq(q^{(0)}_{\delta\epsilon}(v)-q^{(0)}_{\epsilon}(v))+q^{(2)}_{\epsilon}(v). (36)

We have already obtained the good event E(0)E^{(0)} when applying Lemma 4.2 with s=δϵs=\delta\epsilon. We apply the lemma again to s=ϵs=\epsilon, which gives that with sufficiently large NN there is an event E(1)E^{(1)} which happens w.p.>12N9w.p.>1-2N^{-9}, and then under E(0)E(1)E^{(0)}\cap E^{(1)},

qδϵ(0)(v)=c2(p+O(δd/2logNNϵd/2)),qϵ(0)(v)=c2(p+O(logNNϵd/2)).q^{(0)}_{\delta\epsilon}(v)=\|c\|^{2}(p+O_{\mathcal{M}}(\sqrt{\delta^{-d/2}\frac{\log N}{N\epsilon^{d/2}}})),\quad q^{(0)}_{\epsilon}(v)=\|c\|^{2}(p+O_{\mathcal{M}}(\sqrt{\frac{\log N}{N\epsilon^{d/2}}})). (37)

We track the constant dependence here: the constant in O_{\mathcal{M}}(\cdot) in Lemma 4.2 depends only on {\mathcal{M}} (and not on K), thus we use the notation O_{\mathcal{M}}(\cdot) in (37) and below to emphasize that the constant is {\mathcal{M}}-dependent only and independent of K. Then (37) gives that

qδϵ(0)(v)qϵ(0)(v)=c2δd/4O(logNNϵd/2).q^{(0)}_{\delta\epsilon}(v)-q^{(0)}_{\epsilon}(v)=\|c\|^{2}\delta^{-d/4}O_{\mathcal{M}}\left(\sqrt{\frac{\log N}{N\epsilon^{d/2}}}\right).

The UB of qϵ(2)(v)q^{(2)}_{\epsilon}(v) follows from (26) in Lemma 4.3, with the shorthand that O~(ϵ)\tilde{O}(\epsilon) stands for O(ϵ(log1ϵ)2){O}(\epsilon(\log\frac{1}{\epsilon})^{2}),

qϵ(2)(v)=vT(DW)vN2(1+O~(ϵ))+c2O(ϵ3)ϵc2(λkp(1+O~(ϵ))+O(ϵ2)).q^{(2)}_{\epsilon}(v)=\frac{v^{T}(D-W)v}{N^{2}}(1+\tilde{O}(\epsilon))+\|c\|^{2}O(\epsilon^{3})\leq\epsilon\|c\|^{2}(\lambda_{k}p(1+\tilde{O}(\epsilon))+O(\epsilon^{2})).

Thus, (36) continues as

f,ff,Qtfϵc2(λkp(1+O~(ϵ))+O(ϵ2)+δd/4O(1ϵlogNNϵd/2)).\langle f,f\rangle-\langle f,Q_{t}f\rangle\leq\epsilon\|c\|^{2}\left(\lambda_{k}p(1+\tilde{O}(\epsilon))+O(\epsilon^{2})+\delta^{-d/4}O_{\mathcal{M}}(\frac{1}{\epsilon}\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right). (38)

Next we lower bound the denominator f,f\langle f,f\rangle. Here we use (27) in Lemma 4.3, which gives that

0qδϵ(2)(v)Θ(δd/2)vT(DW)vN2+c2O(ϵ3)ϵc2(λkpΘ(δd/2)+O(ϵ2)).0\leq q^{(2)}_{\delta\epsilon}(v)\leq\Theta(\delta^{-d/2})\frac{v^{T}(D-W)v}{N^{2}}+\|c\|^{2}O(\epsilon^{3})\leq\epsilon\|c\|^{2}\left(\lambda_{k}p\Theta(\delta^{-d/2})+O(\epsilon^{2})\right).

Note that we work under the event E_{UB}, so that the eigenvalue UB (32) holds; thus \lambda_kp\Theta(\delta^{-d/2})+O(\epsilon^2)=O(1). Together with the fact that \delta is a fixed constant, we have that

qδϵ(2)(v)=c2O(ϵ).q^{(2)}_{\delta\epsilon}(v)=\|c\|^{2}O(\epsilon).

Then, again under E^{(0)},

f,f=qδϵ(0)(v)qδϵ(2)(v)=c2(p+O(δd/2logNNϵd/2)O(ϵ))c2(pO(ϵ,logNNϵd/2)).\langle f,f\rangle=q^{(0)}_{\delta\epsilon}(v)-q^{(2)}_{\delta\epsilon}(v)=\|c\|^{2}\left(p+O(\sqrt{\delta^{-d/2}\frac{\log N}{N\epsilon^{d/2}}})-O(\epsilon)\right)\geq\|c\|^{2}\left(p-O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right).

Putting together, and by that λk1.1μK\lambda_{k}\leq 1.1\mu_{K}, we have that

f,ff,Qtff,fϵ(λkp+O~(ϵ)+δd/4O(1ϵlogNNϵd/2))pO(ϵ,logNNϵd/2)ϵ(λk+O~(ϵ)+CϵlogNNϵd/2),\frac{\langle f,f\rangle-\langle f,Q_{t}f\rangle}{\langle f,f\rangle}\leq\frac{\epsilon\left(\lambda_{k}p+\tilde{O}(\epsilon)+\delta^{-d/4}O_{\mathcal{M}}(\frac{1}{\epsilon}\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right)}{p-O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})}\leq\epsilon\left(\lambda_{k}+\tilde{O}(\epsilon)+\frac{C}{\epsilon}\sqrt{\frac{\log N}{N\epsilon^{d/2}}}\right),

where C=c()δd/4C=c({\mathcal{M}})\delta^{-d/4}, and c()c({\mathcal{M}}) is a constant only depending on {\mathcal{M}}. We set

cK:=(C0.1γK)2=(c()0.1)2δd/2γK2,c_{K}:=(\frac{C}{0.1\gamma_{K}})^{2}=(\frac{c({\mathcal{M}})}{0.1})^{2}\delta^{-d/2}\gamma_{K}^{-2},

and since we assume \epsilon^{d/2+2}>c_K\frac{\log N}{N} in the current proposition, we have \frac{C}{\epsilon}\sqrt{\frac{\log N}{N\epsilon^{d/2}}}<0.1\gamma_K. Then, comparing with the l.h.s. of (35), we have that

1e(1δ)ϵμkf,ff,Qtff,fϵ(λk+O~(ϵ)+0.1γK).1-e^{-(1-\delta)\epsilon\mu_{k}}\leq\frac{\langle f,f\rangle-\langle f,Q_{t}f\rangle}{\langle f,f\rangle}\leq\epsilon\left(\lambda_{k}+\tilde{O}(\epsilon)+0.1\gamma_{K}\right).

By the relation that 1exxx21-e^{-x}\geq x-x^{2} for any x0x\geq 0, 1e(1δ)ϵμkϵ(1δ)(μk(1δ)ϵμk2)1-e^{-(1-\delta)\epsilon\mu_{k}}\geq\epsilon(1-\delta)\left(\mu_{k}-(1-\delta)\epsilon\mu_{k}^{2}\right), and when ϵ\epsilon is sufficiently small s.t. ϵμk2ϵ(1.1μK)2<0.1γK\epsilon\mu_{k}^{2}\leq\epsilon(1.1\mu_{K})^{2}<0.1\gamma_{K},

1e(1δ)ϵμkϵ(1δ)(μk0.1γK)>0.1-e^{-(1-\delta)\epsilon\mu_{k}}\geq\epsilon(1-\delta)\left(\mu_{k}-0.1\gamma_{K}\right)>0.

Note that for k\geq 2, \mu_k\geq\mu_2\geq 2\gamma_K>0, because \mu_1=0. Thus, when \epsilon is sufficiently small so that the \tilde{O}(\epsilon) term is less than 0.1\gamma_K, under the good events E^{(0)}\cap E^{(1)}\cap E_{UB}, which happen jointly w.p. >1-4K^2N^{-10}-4N^{-9}, we have that

0<(1δ)(μk0.1γK)λk+O~(ϵ)+0.1γK<λk+0.2γK.0<(1-\delta)(\mu_{k}-0.1\gamma_{K})\leq\lambda_{k}+\tilde{O}(\epsilon)+0.1\gamma_{K}<\lambda_{k}+0.2\gamma_{K}.

Recall that by definition (30), \delta\mu_K=0.5\gamma_K, so \delta\mu_k\leq\delta\mu_K=0.5\gamma_K, and also 0<\delta<0.5. Re-arranging the terms gives \mu_k<\lambda_k+0.8\gamma_K, i.e. \lambda_k>\mu_k-0.8\gamma_K>\mu_k-\gamma_K. This can be verified for all 2\leq k\leq K; note that the good events E^{(0)}, E^{(1)} are with respect to X, and E_{UB} is constructed for the fixed k_{max}, so none of them is specific to any particular k\leq K. ∎

4.2 Random-walk graph Laplacian eigenvalue crude LB

The counterpart result for the random-walk graph Laplacian is the following proposition. It replaces Proposition 3.1 with Proposition 3.6 in obtaining the eigenvalue UB in the analysis, and consequently the high-probability bound differs slightly.

Proposition 4.4 (Initial crude eigenvalue LB of LrwL_{rw}).

Under the same conditions and setting of {\mathcal{M}}, p being uniform, h being Gaussian, and k_{max}, \mu_k, \epsilon the same as in Proposition 4.1. Then, for sufficiently large N, w.p. >1-4K^2N^{-10}-6N^{-9}, \lambda_k(L_{rw})>\mu_k-\gamma_K, for k=2,\cdots,K.

The proof is similar to that of Proposition 4.1 and left to Appendix C.2. The difference lies in that the empirical eigenvectors vkv_{k} are DD-orthonormal rather than orthonormal, and the degree concentration Lemma 3.5 is used to relate v2N\frac{\|v\|^{2}}{N} with 1N2vTDv\frac{1}{N^{2}}v^{T}Dv for arbitrary vector vv.

5 Steps 2-3 and eigen-convergence

Figure 1: Population eigenvalues \mu_k of -\Delta, and empirical eigenvalues \lambda_k of the graph Laplacian matrix L_N, where L_N can be L_{un} or L_{rw}. The positive integer k_{max} is fixed, and the constant \gamma_K is half of the minimum of the first-K eigen-gaps, defined as in (22). The eigenvalue UB and initial LB are proved for k\leq K, which guarantees (41). The extension to multiplicity greater than one uses \gamma_K defined as in (46).

In this section, we obtain eigen-convergence rate of LunL_{un} and LrwL_{rw} from the initial crude eigenvalue bound in Step 1. We first derive the Steps 2-3 for LunL_{un}, and the proof for LrwL_{rw} is similar.

5.1 Step 2 eigenvector consistency

In Step 1, the crude eigenvalue bound (the UB already matches the form rate; the LB is crude) gives that, for fixed k_{max} and large N, each \lambda_k falls into the interval (\mu_k-\gamma_K,\mu_k+\gamma_K), where \gamma_K is half of the smallest of the eigenvalue gaps (\mu_2-\mu_1), …, (\mu_{k_{max}+1}-\mu_{k_{max}}), as illustrated in Fig. 1. This means that \lambda_k is separated from the neighboring \mu_{k-1} and \mu_{k+1} by an O(1) distance. This O(1) initial separation is enough for proving eigenvector consistency up to the point-wise rate, which is a standard argument, see e.g. the proof of Theorem 2.6 part 2) in [7]. Below we provide an informal explanation and then the formal statement in Proposition 5.2, with a proof for completeness.

We first give an illustrative informal derivation. Take k=2 for example, let L_N=L_{un}, L_Nu_k=\lambda_ku_k, and we want to show that u_2 and \rho_X\psi_2 are aligned. Define the residual vector

r2:=LN(ρXψ2)ρX(Δ)ψ2N,r2(i)=LN(ρXψ2)(xi)(Δ)ψ2(xi),r_{2}:=L_{N}(\rho_{X}\psi_{2})-\rho_{X}(-\Delta)\psi_{2}\in\mathbb{R}^{N},\quad r_{2}(i)=L_{N}(\rho_{X}\psi_{2})(x_{i})-(-\Delta)\psi_{2}(x_{i}),

The point-wise convergence of the graph Laplacian gives an L^\infty bound on the residual vector r_2; suppose \|r_2\|_2\leq\varepsilon\|\rho_X\psi_2\|_2. Meanwhile, the crude bound on the eigenvalue \lambda_3 gives that

λ3>μ2+γK,\lambda_{3}>\mu_{2}+\gamma_{K},

where \gamma_K>0 is an O(1) constant determined by k_{max} and {\mathcal{M}}. Because the empirical eigenvalues are sorted, the \lambda_l for l\geq 3 are also at least \gamma_K away from \mu_2; so is \lambda_1=0, since \mu_2\geq 2\gamma_K. As a result,

|λlμ2|>γK>0,l2,1lN.|\lambda_{l}-\mu_{2}|>\gamma_{K}>0,\quad l\neq 2,\quad 1\leq l\leq N.

Then we use the relation that for each l2l\neq 2, ulTr2=ulT(LN(ρXψ2)μ2ρXψ2)=(λlμ2)ulT(ρXψ2)u_{l}^{T}r_{2}=u_{l}^{T}(L_{N}(\rho_{X}\psi_{2})-\mu_{2}\rho_{X}\psi_{2})=(\lambda_{l}-\mu_{2})u_{l}^{T}(\rho_{X}\psi_{2}), which gives that

|ulT(ρXψ2)|=|ulTr2||λlμ2|εγKul2ρXψ22.|u_{l}^{T}(\rho_{X}\psi_{2})|=\frac{|u_{l}^{T}r_{2}|}{|\lambda_{l}-\mu_{2}|}\leq\frac{\varepsilon}{\gamma_{K}}\|u_{l}\|_{2}\|\rho_{X}\psi_{2}\|_{2}.

This shows that \rho_X\psi_2 has O(\varepsilon) alignment with all the eigenvectors other than u_2, and since \{u_1,\cdots,u_N\} form an orthonormal basis of \mathbb{R}^N, this guarantees 1-O(\varepsilon) alignment between \rho_X\psi_2 and u_2.
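The informal argument above is plain linear algebra and uses nothing about graph Laplacians beyond symmetry. The following sketch checks it on a synthetic symmetric matrix with a prescribed spectral gap; all numbers are hypothetical and for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)
N, mu2, gamma = 200, 1.0, 1.0
# synthetic spectrum: 0, mu2, and the rest at least 2*gamma above mu2
lams = np.concatenate(([0.0, mu2], np.linspace(mu2 + 2 * gamma, 10.0, N - 2)))
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))
L = Q @ np.diag(lams) @ Q.T                      # synthetic symmetric "L_N"

psi = Q[:, 1] + 1e-3 * rng.standard_normal(N)    # noisy proxy for rho_X psi_2
r2 = L @ psi - mu2 * psi                         # residual vector
eps = np.linalg.norm(r2) / np.linalg.norm(psi)

# |u_l^T psi| <= (eps / gamma) * ||psi||_2 for every l != 2
overlaps = np.abs(Q.T @ psi) / np.linalg.norm(psi)
assert np.all(np.delete(overlaps, 1) <= eps / gamma + 1e-12)
print(overlaps[1])                               # 1 - O(eps) alignment with u_2
```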

To proceed, we use the point-wise rate of the graph Laplacian with C^2 kernel h as in the next theorem. The analysis of point-wise convergence was given in [27] and [9]: the original theorem in [27] considers the normalized graph Laplacian (I-D^{-1}W). The analysis is similar for (D-W) and leads to the same rate, which was derived in [9] under the setting of variable kernel bandwidth. These previous works consider a fixed point x_0 on {\mathcal{M}}, and since the concentration result holds with exponentially high probability, it directly gives a uniform error bound at every data point x_i, which is what is needed here.

Theorem 5.1 ([27, 9]).

Under Assumptions 1 and 2, if as NN\to\infty, ϵ0+\epsilon\to 0+, ϵd/2+1=Ω(logNN)\epsilon^{d/2+1}=\Omega(\frac{\log N}{N}), then for any fC4()f\in C^{4}({\mathcal{M}}),

1) When NN is large enough, w.p. >14N9>1-4N^{-9},

1ϵm22m0((ID1W)(ρXf))i=Δp2f(xi)+εi,sup1iN|εi|=O(ϵ)+O(logNNϵd/2+1).\frac{1}{\epsilon\frac{m_{2}}{2m_{0}}}\left((I-D^{-1}W)(\rho_{X}f)\right)_{i}=-\Delta_{p^{2}}f(x_{i})+\varepsilon_{i},\quad\sup_{1\leq i\leq N}|\varepsilon_{i}|=O(\epsilon)+O(\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}).

2) When NN is large enough, w.p. >12N9>1-2N^{-9},

1ϵm22p(xi)N((DW)(ρXf))i=Δp2f(xi)+εi,sup1iN|εi|=O(ϵ)+O(logNNϵd/2+1).\frac{1}{\epsilon\frac{m_{2}}{2}p(x_{i})N}\left((D-W)(\rho_{X}f)\right)_{i}=-\Delta_{p^{2}}f(x_{i})+\varepsilon_{i},\quad\sup_{1\leq i\leq N}|\varepsilon_{i}|=O(\epsilon)+O(\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}).

The constants in the big-O notations depend on {\mathcal{M}}, pp and the C4C^{4} norm of ff.

Note that Theorem 5.1 holds for non-uniform p, while in our eigen-convergence analysis of the graph Laplacian with W below, we only use the result when p is uniform. Meanwhile, similar to Theorem 3.2, Assumption 2(C3) may be relaxed for Theorem 5.1 to hold, c.f. Remark 1.
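As a quick numerical illustration of part 2) (not used in the proofs), consider the unit circle, where d=1 and the uniform density is p=1/(2\pi); for the Gaussian normalization K_\epsilon(x,y)=(4\pi\epsilon)^{-d/2}e^{-\|x-y\|^2/(4\epsilon)} of (29), one computes m_0=1 and m_2=2, and the test function f=\sin\theta satisfies -\Delta f=f:

```python
import numpy as np

rng = np.random.default_rng(3)
N, d = 2000, 1
eps = (np.log(N) / N) ** (1.0 / (d / 2 + 2))      # bandwidth scaling from the theory
th = rng.uniform(0.0, 2.0 * np.pi, N)
X = np.stack([np.cos(th), np.sin(th)], axis=1)    # uniform samples on S^1 in R^2

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = (4 * np.pi * eps) ** (-d / 2) * np.exp(-sq / (4 * eps))   # Gaussian K_eps
D = W.sum(axis=1)

f = np.sin(th)                                    # -Delta f = f on S^1
p, m2 = 1.0 / (2.0 * np.pi), 2.0
Lf = (D * f - W @ f) / (eps * (m2 / 2) * p * N)   # ((D - W) rho_X f)_i, normalized
print(np.abs(Lf - f).max())                       # sup_i |eps_i|
```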

Proof of Theorem 5.1.

Consider the N events such that \varepsilon_i is less than the error bound. For the i-th event, conditioning on x_i, Theorem 3.8 in [9] can be directly used to show that the event holds w.p. >1-4N^{-10} for case 1), the random-walk graph Laplacian. For case 2), the un-normalized graph Laplacian, adopting the same technique as Theorem 3.6 in [9] proves the same rate as for the fixed-bandwidth kernel, and gives that the event holds w.p. >1-2N^{-10}. Specifically, the proof is by showing the concentration of \frac{1}{\epsilon N}\sum_{j=1}^N K_\epsilon(x_i,x_j)(f(x_j)-f(x_i)), which is an independent summation conditioned on x_i. The r.v. H_j:=\frac{1}{\epsilon}K_\epsilon(x_i,x_j)(f(x_j)-f(x_i)), j\neq i, has expectation \mathbb{E}H_j=\frac{m_2}{2}p(x_i)\Delta_{p^2}f(x_i)+O_{f,p}(\epsilon); \mathbb{E}H_j^2 can be shown to be bounded by \Theta(\epsilon^{-d/2-1}), and |H_j| is also bounded by \Theta(\epsilon^{-d/2-1}), following the same calculation as in the proof of Theorem 3.6 in [9]. This shows that the bias error is O(\epsilon) and the variance error is O(\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}), by the classical Bernstein inequality. As in Theorem 3.2, C^2 regularity of h and decay up to the 2nd derivative are enough here.

Strictly speaking, the analysis in [9] is for the “1N1ji,j=1N\frac{1}{N-1}\sum_{j\neq i,j=1}^{N}” summation and not the “1Nji,j=1N\frac{1}{N}\sum_{j\neq i,j=1}^{N}” one here. However, the difference between 1N1\frac{1}{N-1} and 1N\frac{1}{N} only introduces an O(1N)O(\frac{1}{N}) relative error and is of higher order, and the i=ji=j term cancels out in the summation of (DW)ρXf(D-W)\rho_{X}f. In proving this large deviation bound at xix_{i}, the needed threshold for large NN is determined by (,f,p)({\mathcal{M}},f,p) and uniform for xix_{i}. Then, when NN exceeds a threshold uniform for all xix_{i}, by the independence of the xix_{i}’s, the ii-th event holds w.p.>14N10>1-4N^{-10} and >12N10>1-2N^{-10} for cases 1) and 2) respectively. The current theorem, in both 1) and 2), follows by a union bound. ∎

We are ready for Step 2 for the un-normalized graph Laplacian L_{un}=\frac{1}{\epsilon\frac{m_2}{2}pN}(D-W). Here we consider eigenvectors normalized to have unit 2-norm, i.e., L_{un}u_k=\lambda_ku_k, u_k^Tu_l=\delta_{kl}, and we compare u_k to

ϕk:=1pNρXψkN,\phi_{k}:=\frac{1}{\sqrt{pN}}\rho_{X}\psi_{k}\in\mathbb{R}^{N}, (39)

where ψk\psi_{k} are population eigenfunctions which are orthonormal in H=L2(,dV)H=L^{2}({\mathcal{M}},dV), same as above.

Proposition 5.2.

Under Assumptions 1(A1), pp being uniform on {\mathcal{M}}, and hh is Gaussian, for fixed kmaxk_{max}\in\mathbb{N}, K=kmax+1K=k_{max}+1, assume that the eigenvalues μk\mu_{k} for kKk\leq K are all single multiplicity, and γK>0\gamma_{K}>0 as defined in (22), the constant cKc_{K} as in Proposition 4.1. If as NN\to\infty, ϵ0+\epsilon\to 0+, ϵd/2+2>cKlogNN\epsilon^{d/2+2}>c_{K}\frac{\log N}{N}, then for sufficiently large NN, w.p. >14K2N10(2K+4)N9>1-4K^{2}N^{-10}-(2K+4)N^{-9}, there exist scalars αk0\alpha_{k}\neq 0, actually |αk|=1+o(1)|\alpha_{k}|=1+o(1), such that

ukαkϕk2=O(ϵ,logNNϵd/2+1),1kkmax.\|u_{k}-\alpha_{k}\phi_{k}\|_{2}=O\left(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}\right),\quad 1\leq k\leq k_{max}.
Proof of Proposition 5.2.

The proof uses the same approach as that of Theorem 2.6 part 2) in [7], and since our setting is different, we include a proof for completeness.

When k=1k=1, we always have λ1=μ1=0\lambda_{1}=\mu_{1}=0, u1u_{1} is the constant vector u1=1N𝟏Nu_{1}=\frac{1}{\sqrt{N}}\mathbf{1}_{N}, and ψ1\psi_{1} is the constant function, and thus ϕ1=u1\phi_{1}=u_{1} up to a sign. Under the condition of the current proposition, the assumptions of Proposition 4.1 are satisfied, and because ϵd/2+2>cKlogNN\epsilon^{d/2+2}>c_{K}\frac{\log N}{N} implies that ϵd/2+1=Ω(logNN)\epsilon^{d/2+1}=\Omega(\frac{\log N}{N}), the assumptions of Theorem 5.1 2) are also satisfied. We apply Theorem 5.1 2) to the KK functions ψ1,,ψK\psi_{1},\cdots,\psi_{K}. By a union bound, we have that when NN is large enough, w.p. >12KN9>1-2KN^{-9}, Lunϕkμkϕk=1pN(O(ϵ)+O(logNNϵd/2+1))\|L_{un}\phi_{k}-\mu_{k}\phi_{k}\|_{\infty}=\frac{1}{\sqrt{pN}}(O(\epsilon)+O(\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}})) for 2kK2\leq k\leq K. By that v2Nv\|v\|_{2}\leq\sqrt{N}\|v\|_{\infty} for any vNv\in\mathbb{R}^{N}, this gives that there is Errpt>0\text{Err}_{pt}>0,

Lunϕkμkϕk2Errpt,2kK,Errpt=O(ϵ)+O(logNNϵd/2+1).\|L_{un}\phi_{k}-\mu_{k}\phi_{k}\|_{2}\leq\text{Err}_{pt},\quad 2\leq k\leq K,\quad\text{Err}_{pt}=O(\epsilon)+O(\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}). (40)

The constants in the big-O depend on the first K eigenfunctions and are absolute constants because K is fixed. Applying Proposition 4.1, and considering the intersection with the good event therein, we have for each 2\leq k\leq K, |\mu_k-\lambda_k|<\gamma_K. By the definition of \gamma_K as in (22),

min1jN,jk|μkλj|>γK>0,2kkmax.\min_{1\leq j\leq N,\,j\neq k}|\mu_{k}-\lambda_{j}|>\gamma_{K}>0,\quad 2\leq k\leq k_{max}. (41)

For each kkmaxk\leq k_{max}, let Sk=Span{uk}S_{k}=\text{Span}\{u_{k}\} be the 1-dimensional subspace in N\mathbb{R}^{N}, and let SkS_{k}^{\perp} be its orthogonal complement. We will show that PSkϕk2\|P_{S_{k}^{\perp}}\phi_{k}\|_{2} is small. By definition, PSkμkϕk=jk,j=1Nμk(ujTϕk)ujP_{S_{k}^{\perp}}\mu_{k}\phi_{k}=\sum_{j\neq k,j=1}^{N}\mu_{k}(u_{j}^{T}\phi_{k})u_{j}, and meanwhile, PSkLunϕk=jk,j=1N(ujTLunϕk)uj=jk,j=1Nλj(ujTϕk)ujP_{S_{k}^{\perp}}L_{un}\phi_{k}=\sum_{j\neq k,j=1}^{N}(u_{j}^{T}L_{un}\phi_{k})u_{j}=\sum_{j\neq k,j=1}^{N}\lambda_{j}(u_{j}^{T}\phi_{k})u_{j}. Subtracting the two gives that PSk(μkϕkLunϕk)=jk,j=1N(μkλj)(ujTϕk)ujP_{S_{k}^{\perp}}(\mu_{k}\phi_{k}-L_{un}\phi_{k})=\sum_{j\neq k,j=1}^{N}(\mu_{k}-\lambda_{j})(u_{j}^{T}\phi_{k})u_{j}. By that uju_{j} are orthonormal vectors, and (41),

PSk(μkϕkLunϕk)22=jk,j=1N(μkλj)2(ujTϕk)2γK2jk,j=1N(ujTϕk)2=γK2PSkϕk22.\|P_{S_{k}^{\perp}}(\mu_{k}\phi_{k}-L_{un}\phi_{k})\|_{2}^{2}=\sum_{j\neq k,j=1}^{N}(\mu_{k}-\lambda_{j})^{2}(u_{j}^{T}\phi_{k})^{2}\geq\gamma_{K}^{2}\sum_{j\neq k,j=1}^{N}(u_{j}^{T}\phi_{k})^{2}=\gamma_{K}^{2}\|P_{S_{k}^{\perp}}\phi_{k}\|_{2}^{2}.

Then, combined with (40), we have that γKPSkϕk2PSk(μkϕkLunϕk)2μkϕkLunϕk2Errpt\gamma_{K}\|P_{S_{k}^{\perp}}\phi_{k}\|_{2}\leq\|P_{S_{k}^{\perp}}(\mu_{k}\phi_{k}-L_{un}\phi_{k})\|_{2}\leq\|\mu_{k}\phi_{k}-L_{un}\phi_{k}\|_{2}\leq\text{Err}_{pt}, namely, PSkϕk2ErrptγK\|P_{S_{k}^{\perp}}\phi_{k}\|_{2}\leq\frac{\text{Err}_{pt}}{\gamma_{K}}.

By definition, P_{S_k^\perp}\phi_k=\phi_k-(u_k^T\phi_k)u_k, where \|u_k\|_2=1. Note that the \phi_k are unit vectors up to an O(\sqrt{\frac{\log N}{N}}) error: the good event in Proposition 4.1 is contained in that of the eigenvalue UB Proposition 3.1, and specifically in that of Lemma 3.4, so (17) holds, which means that |\|\phi_k\|^2-1|\leq\text{Err}_{norm}, 1\leq k\leq K, where \text{Err}_{norm}=O(\sqrt{\frac{\log N}{N}}). Then, one can verify that

|ukTϕk|=1+O(Errnorm,Errpt2)=1+o(1),|u_{k}^{T}\phi_{k}|=1+O(\text{Err}_{norm},\text{Err}_{pt}^{2})=1+o(1), (42)

and then we set αk=1ukTϕk\alpha_{k}=\frac{1}{u_{k}^{T}\phi_{k}}, and have that

αkϕkuk2=O(Errpt)|ukTϕk|O(Errpt)1O(Errnorm,Errpt2)=O(Errpt)(1+O(Errnorm,Errpt2))=O(Errpt).\|\alpha_{k}\phi_{k}-u_{k}\|_{2}=\frac{O(\text{Err}_{pt})}{|u_{k}^{T}\phi_{k}|}\leq\frac{O(\text{Err}_{pt})}{1-O(\text{Err}_{norm},\text{Err}_{pt}^{2})}=O(\text{Err}_{pt})(1+O(\text{Err}_{norm},\text{Err}_{pt}^{2}))=O(\text{Err}_{pt}).

The bound holds for each kkmaxk\leq k_{max}. ∎

5.2 Step 3: refined eigenvalue LB

We now derive Step 3 for LunL_{un}, the result being summarized in the following proposition.

Proposition 5.3.

Under the same condition of Proposition 5.2, kmaxk_{max} is fixed. Then, for sufficiently large NN, with the same indicated high probability,

|μkλk|=O(ϵ,logNNϵd/2),1kkmax.|\mu_{k}-\lambda_{k}|=O\left(\epsilon,\,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}\right),\quad 1\leq k\leq k_{max}.
Proof of Proposition 5.3.

We inherit the notations in the proof of Proposition 5.2. Again μ1=λ1=0\mu_{1}=\lambda_{1}=0. For 2kkmax2\leq k\leq k_{max}, note that

ukT(Lunϕkμkϕk)=(λkμk)ukTϕk,u_{k}^{T}(L_{un}\phi_{k}-\mu_{k}\phi_{k})=(\lambda_{k}-\mu_{k})u_{k}^{T}\phi_{k}, (43)

and meanwhile, we have shown that uk=αkϕk+εku_{k}=\alpha_{k}\phi_{k}+\varepsilon_{k}, where αk=1+o(1)\alpha_{k}=1+o(1) and εk2=O(Errpt)\|\varepsilon_{k}\|_{2}=O(\text{Err}_{pt}). Thus the l.h.s. of (43) equals

(αkϕk+εk)T(Lunϕkμkϕk)=αk(ϕkTLunϕkμkϕk22)+εkT(Lunϕkμkϕk)=:+.(\alpha_{k}\phi_{k}+\varepsilon_{k})^{T}(L_{un}\phi_{k}-\mu_{k}\phi_{k})=\alpha_{k}(\phi_{k}^{T}L_{un}\phi_{k}-\mu_{k}\|\phi_{k}\|_{2}^{2})+\varepsilon_{k}^{T}(L_{un}\phi_{k}-\mu_{k}\phi_{k})=:①+②.

By definition of ϕk\phi_{k}, ϕkTLunϕk=1pN(ρXψk)TLun(ρXψk)=1p2EN(ρXψk)\phi_{k}^{T}L_{un}\phi_{k}=\frac{1}{pN}(\rho_{X}\psi_{k})^{T}L_{un}(\rho_{X}\psi_{k})=\frac{1}{p^{2}}E_{N}(\rho_{X}\psi_{k}). The good event in Proposition 5.2 is under the good event EUBE_{UB}, under which Lemma 3.3 and Lemma 3.4 hold. Then by (16), EN(ρXψk)=p2μk+O(ϵ,logNNϵd/2)E_{N}(\rho_{X}\psi_{k})=p^{2}\mu_{k}+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}); By (17), ϕk2=1+O(logNN)\|\phi_{k}\|^{2}=1+O(\sqrt{\frac{\log N}{N}}). Putting together, and by that αk=1+o(1)=O(1)\alpha_{k}=1+o(1)=O(1),

=αk(ϕkTLunϕkμkϕk22)=O(1)(μk+O(ϵ,logNNϵd/2)μk(1+O(logNN)))=O(ϵ,logNNϵd/2).①=\alpha_{k}(\phi_{k}^{T}L_{un}\phi_{k}-\mu_{k}\|\phi_{k}\|_{2}^{2})=O(1)\left(\mu_{k}+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})-\mu_{k}(1+O(\sqrt{\frac{\log N}{N}}))\right)=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}).

Meanwhile, by (40), Lunϕkμkϕk2Errpt\|L_{un}\phi_{k}-\mu_{k}\phi_{k}\|_{2}\leq\text{Err}_{pt}, and then

||εk2Lunϕkμkϕk2=O(Errpt2).|②|\leq\|\varepsilon_{k}\|_{2}\|L_{un}\phi_{k}-\mu_{k}\phi_{k}\|_{2}=O(\text{Err}_{pt}^{2}).

Because ϵd/2+2>cKlogNN\epsilon^{d/2+2}>c_{K}\frac{\log N}{N} for some cK>0c_{K}>0, logNNϵd/2+1=ϵlogNNϵd/2+2<ϵcK\frac{\log N}{N\epsilon^{d/2+1}}=\epsilon\frac{\log N}{N\epsilon^{d/2+2}}<\frac{\epsilon}{c_{K}}, thus Errpt=O(ϵ+logNNϵd/2+1)=O(ϵ)\text{Err}_{pt}=O(\epsilon+\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}})=O(\sqrt{\epsilon}), and then =O(Errpt2)=O(ϵ)②=O(\text{Err}_{pt}^{2})=O(\epsilon). Back to (43), we have that

|λkμk||ukTϕk|=|+|=O(ϵ,logNNϵd/2)+O(ϵ),|\lambda_{k}-\mu_{k}||u_{k}^{T}\phi_{k}|=|①+②|=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})+O(\epsilon),

and by (42), |ukTϕk|=1+o(1)|u_{k}^{T}\phi_{k}|=1+o(1), thus |λkμk|=|+|1+o(1)=O(|+|)=O(ϵ,logNNϵd/2)|\lambda_{k}-\mu_{k}|=\frac{|①+②|}{1+o(1)}=O(|①+②|)=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}). The above holds for all kkmaxk\leq k_{max}. ∎

5.3 Eigen-convergence rate

We are ready to prove the main theorems on eigen-convergence of graph Laplacians, when pp is uniform and the kernel function hh is Gaussian.

Theorem 5.4 (eigen-convergence of LunL_{un}).

Under Assumption 1 (A1), pp is uniform on {\mathcal{M}}, and hh is Gaussian. For kmaxk_{max}\in\mathbb{N} fixed, assume that the eigenvalues μk\mu_{k} for kK:=kmax+1k\leq K:=k_{max}+1 are all single multiplicity, and the constant cKc_{K} as in Proposition 4.1. Consider first kmaxk_{max} eigenvalues and eigenvectors of LunL_{un}, Lunuk=λkukL_{un}u_{k}=\lambda_{k}u_{k}, ukTul=δklu_{k}^{T}u_{l}=\delta_{kl}, and the vectors ϕk\phi_{k} are defined as in (39). If as NN\to\infty, ϵ0+\epsilon\to 0+, ϵd/2+2>cKlogNN\epsilon^{d/2+2}>c_{K}\frac{\log N}{N}, then for sufficiently large NN, w.p. >14K2N10(2K+4)N9>1-4K^{2}N^{-10}-(2K+4)N^{-9},

|μkλk|=O(ϵ,logNNϵd/2),1kkmax,|\mu_{k}-\lambda_{k}|=O\left(\epsilon,\,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}\right),\quad 1\leq k\leq k_{max}, (44)

and there exist scalars αk0\alpha_{k}\neq 0, actually |αk|=1+o(1)|\alpha_{k}|=1+o(1), such that

ukαkϕk2=O(ϵ,logNNϵd/2+1),1kkmax.\|u_{k}-\alpha_{k}\phi_{k}\|_{2}=O\left(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}\right),\quad 1\leq k\leq k_{max}. (45)
Remark 3 (Choice of ϵ\epsilon and overall rates).

The eigen-convergence bounds (44) and (45) are provided in the combined form of ϵ\epsilon and NN, as long as the condition ϵ=o(1)\epsilon=o(1) and ϵd/2+2>cKlogN/N\epsilon^{d/2+2}>c_{K}\log N/N holds. The bias error in both cases is O(ϵ)O(\epsilon), and the variance error has a different inverse power of ϵ\epsilon (d/4-d/4 and d/41/2-d/4-1/2 respectively). The eigenvalue convergence (44) achieves the form rate Errform=O(ϵ,logNNϵd/2){\rm Err}_{form}=O\left(\epsilon,\,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}\right), which is the rate of the Dirichlet form convergence, c.f. Theorem 3.2. The (2-norm) eigenvector convergence (45) achieves the point-wise rate Errpt=O(ϵ,logNNϵd/2+1){\rm Err}_{pt}=O\left(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}\right), which is the rate of point-wise convergence of graph Laplacian, c.f. Theorem 5.1.

The different powers of \epsilon lead to different optimal choices of \epsilon, in order of N, to achieve the best overall rates for eigenvalue and eigenvector convergence respectively. Specifically,

  • The optimal choice of ϵ\epsilon to minimize Errform{\rm Err}_{form} is when ϵ=(clogNN)1/(d/2+2)\epsilon=(c^{\prime}\frac{\log N}{N})^{1/(d/2+2)} for c>cKc^{\prime}>c_{K} (which is also the smallest order of ϵ\epsilon allowed by the theorem). This choice leads to

    |μkλk|=O((logN/N)1/(d/2+2))=O~(N1/(d/2+2)),1kkmax,|\mu_{k}-\lambda_{k}|={O}\left(({\log N}/{N})^{{1}/{(d/2+2)}}\right)=\tilde{O}(N^{-1/(d/2+2)}),\quad 1\leq k\leq k_{max},

    which is the best overall rate of eigenvalue convergence by our theory. We use O~()\tilde{O}(\cdot) to denote the involvement of certain factor of logN\log N. In this case, ukαkϕk2=O((logNN)1/(d+4))\|u_{k}-\alpha_{k}\phi_{k}\|_{2}={O}((\frac{\log N}{N})^{{1}/{(d+4)}}).

  • The optimal choice of ϵ\epsilon to minimize Errpt{\rm Err}_{pt} is when ϵ(logN/N)1/(d/2+3)\epsilon\sim(\log N/N)^{{1}/{(d/2+3)}}, which leads to

    ukαkϕk2=O((logN/N)1/(d/2+3))=O~(N1/(d/2+3)),1kkmax,\|u_{k}-\alpha_{k}\phi_{k}\|_{2}={O}\left(({\log N}/{N})^{{1}/{(d/2+3)}}\right)=\tilde{O}(N^{-{1}/{(d/2+3)}}),\quad 1\leq k\leq k_{max},

    which is the best overall rate of eigenvector convergence. In this case, |\mu_k-\lambda_k|=\tilde{O}(N^{-1/(d/2+3)}).

We can see that the overall rate of eigenvalue convergence achieves the best overall rate of form convergence O~(N1/(d/2+2))\tilde{O}(N^{-1/(d/2+2)}), and that of eigenvector (2-norm) convergence achieves the best overall rate of point-wise convergence O~(N1/(d/2+3))\tilde{O}(N^{-1/(d/2+3)}), at the optimal ϵ\epsilon for each convergence respectively.
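As an illustration of the first regime (complementing the numerical section), one can compute the low-lying spectrum of L_{un} on the unit circle, where the eigenvalues of -\Delta are 0,1,1,4,4,\ldots; the nonzero eigenvalues are double, so this is an instance of the multiplicity extension of Remark 5. The sketch below assumes the Gaussian normalization of (29), for which m_2=2, with p=1/(2\pi):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
N, d = 2000, 1
eps = (np.log(N) / N) ** (1.0 / (d / 2 + 2))   # optimal order for the eigenvalue rate
th = rng.uniform(0.0, 2.0 * np.pi, N)
X = np.stack([np.cos(th), np.sin(th)], axis=1)

sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = (4 * np.pi * eps) ** (-d / 2) * np.exp(-sq / (4 * eps))
p, m2 = 1.0 / (2.0 * np.pi), 2.0
L_un = (np.diag(W.sum(axis=1)) - W) / (eps * (m2 / 2) * p * N)

lam = eigh(L_un, eigvals_only=True, subset_by_index=[0, 4])
print(lam)   # approaches (0, 1, 1, 4, 4) as N grows, up to O~(N^{-1/(d/2+2)})
```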

Proof of Theorem 5.4.

Under the condition of the theorem, the eigenvector and eigenvalue error bounds have been proved in Proposition 5.2 and Proposition 5.3. For the two specific asymptotic scalings of \epsilon, the rates follow from the bounds involving both \epsilon and N. ∎

Remark 4 (Comparison to compactly supported hh).

For h={\bf 1}_{[0,1)} (see also Remark 2), the point-wise convergence of the graph Laplacian is known to have rate \text{Err}_{pt,ind}=O(\sqrt{\epsilon},\,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}), see [19, 4, 27, 7] among others. While our version of Step 1 cannot be applied to such h, [7] covered this case when d\geq 2, and provided the eigenvalue and eigenvector consistency up to \text{Err}_{pt,ind} when \epsilon^{d/2+2}=\Omega(\frac{\log N}{N}). The scaling \epsilon^{d/2+2}=\tilde{\Theta}(N^{-1}) is the optimal one to balance the bias and variance errors in \text{Err}_{pt,ind}, and it then gives the overall error rate \tilde{O}(N^{-1/(d+4)}), which agrees with the eigen-convergence rate in [7]. Here \tilde{O}(\cdot) and \tilde{\Theta}(\cdot) indicate that the constant is possibly multiplied by a certain power of \log N. Meanwhile, we note that, if one follows our approach of using the Dirichlet form convergence rate, the eigenvalue consistency can be improved to the square of that rate, namely \tilde{O}(N^{-1/(d/2+2)}), when \epsilon=\tilde{\Theta}(N^{-1/(d/2+2)}). Specifically, by Remark 2, the Dirichlet form convergence with indicator h is \text{Err}_{form,ind}=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}). Then, once the initial crude eigenvalue LB is established, in Step 2 the eigenvector 2-norm consistency can be shown to be \text{Err}_{pt,ind}; in Step 3, the eigenvalue consistency for the first k_{max} eigenvalues can be shown to be O(\text{Err}_{form,ind},\text{Err}_{pt,ind}^2)=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}). This would imply the eigenvalue convergence rate \tilde{O}(N^{-1/(d/2+2)}) in the regime \epsilon=\tilde{\Theta}(N^{-1/(d/2+2)}), while the eigenvector consistency remains \tilde{O}(N^{-1/(d+4)}). Compared to Remark 3, these rates are the same as for the Gaussian kernel when setting \epsilon=\tilde{\Theta}(N^{-1/(d/2+2)}) (the optimal order to minimize the eigenvalue rate, which is {\rm Err}_{form}). However, using the Gaussian kernel allows one to obtain a better rate for eigenvector convergence, namely \tilde{O}(N^{-1/(d/2+3)}), by setting \epsilon=\tilde{\Theta}(N^{-1/(d/2+3)}) (the optimal order to minimize the eigenvector convergence rate, which is {\rm Err}_{pt}). This improved eigenvector (2-norm) rate is due to the point-wise rate of the smooth kernel, {\rm Err}_{pt}, being better than that of the indicator kernel, \text{Err}_{pt,ind}; specifically, the bias error is O(\epsilon) instead of O(\sqrt{\epsilon}).

Remark 5 (Extension to larger eigenvalue multiplicity).

The result extends to when the population eigenvalues \mu_k have multiplicity greater than one. Suppose we consider 0=\mu^{(1)}<\mu^{(2)}<\cdots<\mu^{(M)}<\cdots, the distinct eigenvalues, where \mu^{(m)} has multiplicity l_m\geq 1. Then let k_{max}=\sum_{m=1}^M l_m, K=\sum_{m=1}^{M+1}l_m, \mu_K=\mu^{(M+1)}, and let \{\mu_k,\psi_k\}_{k=1}^K be the sorted eigenvalues and associated eigenfunctions. The Step 0 eigenvalue UB holds, since Proposition 3.1 does not require single multiplicity. In Step 1, the only place in Proposition 4.1 where single multiplicity of \mu_k is used is in the definition of \gamma_K. Then, by changing to

γ(M)=12min1mM(μ(m+1)μ(m))>0,\gamma^{(M)}=\frac{1}{2}\min_{1\leq m\leq M}(\mu^{(m+1)}-\mu^{(m)})>0, (46)

and defining δ=0.5γ(M)μK\delta=0.5\frac{\gamma^{(M)}}{\mu_{K}}, 0<δ<0.50<\delta<0.5 is a positive constant depending on {\mathcal{M}} and KK, Proposition 4.1 proves that |λkμ(m)|<γ(M)|\lambda_{k}-\mu^{(m)}|<\gamma^{(M)} for all kKk\leq K, i.e. mM+1m\leq M+1. This allows to extend Step 2 Proposition 5.2 by considering the projection PSP_{S^{\perp}} where the subspace in N\mathbb{R}^{N} is spanned by eigenvectors whose eigenvalues λk\lambda_{k} approaches μk=μ(m)\mu_{k}=\mu^{(m)}, similar as in the original proof of Theorem 2.6 part 2) in [7]. Specifically, suppose μi==μi+lm1=μ(m)\mu_{i}=\cdots=\mu_{i+l_{m}-1}=\mu^{(m)}, 2mM2\leq m\leq M, let S(m)=Span{ui,,ui+lm1}S^{(m)}=\text{Span}\{u_{i},\cdots,u_{i+l_{m}-1}\}, and the index set Im:={i,,i+lm1}I_{m}:=\{i,\cdots,i+l_{m}-1\}. For eigenfunction ψk\psi_{k}, kImk\in I_{m}, then μk=μ(m)\mu_{k}=\mu^{(m)}, similarly as in the proof of Proposition 5.2, one can verify that

P(S(m))(μkϕkLunϕk)22=jIm(μkλj)2(ujTϕk)2(γ(M))2jIm(ujTϕk)2=(γ(M))2P(S(m))ϕk22,\|P_{(S^{(m)})^{\perp}}(\mu_{k}\phi_{k}-L_{un}\phi_{k})\|_{2}^{2}=\sum_{j\notin I_{m}}(\mu_{k}-\lambda_{j})^{2}(u_{j}^{T}\phi_{k})^{2}\geq(\gamma^{(M)})^{2}\sum_{j\notin I_{m}}(u_{j}^{T}\phi_{k})^{2}=(\gamma^{(M)})^{2}\|P_{(S^{(m)})^{\perp}}\phi_{k}\|_{2}^{2},

which gives that ϕkPS(m)ϕk2=P(S(m))ϕk21γ(M)Errpt\|\phi_{k}-P_{S^{(m)}}\phi_{k}\|_{2}=\|P_{(S^{(m)})^{\perp}}\phi_{k}\|_{2}\leq\frac{1}{\gamma^{(M)}}\text{Err}_{pt}, for all kImk\in I_{m}. By that {ϕk}k=1K\{\phi_{k}\}_{k=1}^{K} are near orthonormal with large NN (Lemma 3.4), this proves that there exists an lml_{m}-by-lml_{m} orthogonal transform QmQ_{m}, and |αk|=1+o(1)|\alpha_{k}|=1+o(1), such that ukαkϕk2=O(Errpt)=O(ϵ,logNNϵd/2+1)\|u_{k}-\alpha_{k}\phi_{k}^{\prime}\|_{2}=O(\text{Err}_{pt})=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}), kImk\in I_{m}, where [ϕk]kIm=[ϕk]kImQm[\phi_{k}^{\prime}]_{k\in I_{m}}=[\phi_{k}]_{k\in I_{m}}Q_{m}, and the notation [vj]jJ[v_{j}]_{j\in J} stands for the NN-by-|J||J| matrix formed by concatenating the vectors vjv_{j} as columns. This proves consistency of empirical eigenvectors uku_{k} up to the point-wise rate for kkmaxk\leq k_{max}. Finally, Step 3 Proposition 5.3 extends by considering (43) for uku_{k} and ϕk\phi_{k}^{\prime}, making use of ukαkϕk2=O(Errpt)\|u_{k}-\alpha_{k}\phi_{k}^{\prime}\|_{2}=O(\text{Err}_{pt}), the Dirichlet form convergence of EN(ρXψk)E_{N}(\rho_{X}\psi_{k}) (Lemma 3.3), and that {ϕk}kIm\{\phi_{k}^{\prime}\}_{k\in I_{m}} is transformed from {ϕk}kIm\{\phi_{k}\}_{k\in I_{m}} by an orthogonal matrix QmQ_{m}.

To address the eigen-convergence of LrwL_{rw}, we define the D/ND/N-weighted 2-norm as

uDN2=1NuTDu,\|u\|_{\frac{D}{N}}^{2}=\frac{1}{N}u^{T}Du,

and recall that eigenvectors of LrwL_{rw} are DD-orthogonal. The following theorem is the counterpart of Theorem 5.4 for LrwL_{rw}, obtaining the same rates.

Theorem 5.5 (eigen-convergence of LrwL_{rw}).

Under the same conditions and setting of {\mathcal{M}}, p being uniform, h being Gaussian, and k_{max}, K, \mu_k, \epsilon the same as in Theorem 5.4. Consider the first k_{max} eigenvalues and eigenvectors of L_{rw}, L_{rw}v_k=\lambda_kv_k, v_k^TDv_l=\delta_{kl}Np, i.e. \|v_k\|_{\frac{D}{N}}^2=p, and the vectors \phi_k defined as in (39). Then, for sufficiently large N, w.p. >1-4K^2N^{-10}-(4K+6)N^{-9}, \|v_k\|_2=1+o(1), and the same bounds on |\mu_k-\lambda_k| and \|v_k-\alpha_k\phi_k\|_2 as in Theorem 5.4 hold for 1\leq k\leq k_{max}, with certain scalars \alpha_k satisfying |\alpha_k|=1+o(1).

The extension to when \mu_k has multiplicity greater than 1 is possible, similarly as in Remark 5. The proof for L_{rw} uses almost the same method as for L_{un}; the difference is that the v_k are no longer orthonormal but D-orthogonal. This is handled by the fact that \|u\|_2^2 and \frac{1}{p}\|u\|_{D/N}^2 agree in relative error up to the form rate, due to the concentration of D_i/N (Lemma 3.5). The detailed proof is left to Appendix C.3.
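At the numerical level, the D-orthogonality of the v_k is obtained for free by computing the spectrum of I-D^{-1}W through the generalized symmetric eigenproblem (D-W)v=\lambda'Dv. A minimal sketch (the function name and the normalization \tilde{m}=\frac{m_2}{2m_0} are ours; the defaults m_0=1, m_2=2 correspond to the normalized Gaussian h of this section):

```python
import numpy as np
from scipy.linalg import eigh

def rw_spectrum(W, eps, m0=1.0, m2=2.0, k=5):
    """First k eigenpairs of L_rw = (1/(eps*m2/(2*m0))) * (I - D^{-1} W),
    computed via the generalized problem (D - W) v = lam' D v; the returned
    eigenvectors satisfy V.T @ D @ V = I, i.e. they are D-orthogonal."""
    D = np.diag(W.sum(axis=1))
    lam, V = eigh(D - W, D, subset_by_index=[0, k - 1])
    return lam / (eps * m2 / (2 * m0)), V
```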

6 Density-corrected graph Laplacian

We consider pp as in Assumption 1(A2). The density-corrected graph Laplacian is defined as [10]

L~rw=1m22m0ϵ(ID~1W~),W~ij=WijDiDj,D~ii=j=1NW~ij,\tilde{L}_{rw}=\frac{1}{\frac{m_{2}}{2m_{0}}\epsilon}(I-\tilde{D}^{-1}\tilde{W}),\quad\tilde{W}_{ij}=\frac{W_{ij}}{D_{i}D_{j}},\quad\tilde{D}_{ii}=\sum_{j=1}^{N}\tilde{W}_{ij},

where W_{ij}=K_\epsilon(x_i,x_j) as before, and D is the degree matrix of W. The density-corrected graph Laplacian recovers the Laplace-Beltrami operator when p is not uniform. In this section, we extend the theory of point-wise convergence, Dirichlet form convergence, and eigen-convergence to this graph Laplacian.
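In matrix form the construction is a two-line computation on top of W. A minimal sketch (function name ours; the defaults m_0=1, m_2=2 again correspond to the normalized Gaussian h used in this paper):

```python
import numpy as np

def density_corrected_laplacian(W, eps, m0=1.0, m2=2.0):
    """L~_rw = (1/((m2/(2*m0)) * eps)) * (I - D~^{-1} W~), where
    W~ = D^{-1} W D^{-1} and D~ is the degree matrix of W~."""
    D = W.sum(axis=1)                   # degrees D_i of W (positive w.h.p.)
    Wt = W / np.outer(D, D)             # density-corrected affinity W~
    Dt = Wt.sum(axis=1)
    N = W.shape[0]
    return (np.eye(N) - Wt / Dt[:, None]) / ((m2 / (2 * m0)) * eps)
```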

6.1 Point-wise convergence of L~rw\tilde{L}_{rw}

This subsection proves Theorem 6.2, which shows that the point-wise rate of \tilde{L}_{rw} is the same as that of L_{rw} without the density correction. The result is for general differentiable h satisfying Assumption 2, and can be of independent interest.

We first establish the counterpart of Lemma 3.5, concerning the concentration of all \frac{1}{N}D_i=\frac{1}{N}\sum_{j=1}^N W_{ij} when p is not uniform. The deviation bound is uniform over all i and has a bias error of O(\epsilon^2).

Lemma 6.1.

Under Assumptions 1 and 2, suppose as NN\to\infty, ϵ0+\epsilon\to 0+, ϵd/2=Ω(logNN)\epsilon^{d/2}=\Omega(\frac{\log N}{N}). Then,

1) When NN is large enough, w.p. >12N9>1-2N^{-9}, Di>0D_{i}>0 for all ii s.t. W~\tilde{W} is well-defined, and

1NDi=m0p~ϵ(xi)+O(ϵ2,logNNϵd/2),p~ϵ:=p+m~ϵ(ωp+Δp),1iN.\frac{1}{N}D_{i}=m_{0}\tilde{p}_{\epsilon}(x_{i})+O\left(\epsilon^{2},\sqrt{\frac{\log N}{N\epsilon^{d/2}}}\right),\quad\tilde{p}_{\epsilon}:=p+\tilde{m}\epsilon(\omega p+\Delta p),\quad 1\leq i\leq N. (47)

where ωC()\omega\in C^{\infty}({\mathcal{M}}) is determined by manifold extrinsic coordinates, and m~[h]=m2[h]2m0[h]\tilde{m}[h]=\frac{m_{2}[h]}{2m_{0}[h]}.

2) When NN is large enough, w.p. >14N9>1-4N^{-9}, D~i>0\tilde{D}_{i}>0 for all ii s.t. L~rw\tilde{L}_{rw} is well-defined, and

j=1NWij1Dj=1+O(ϵ,logNNϵd/2),1iN.\sum_{j=1}^{N}W_{ij}\frac{1}{D_{j}}=1+O\left(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}\right),\quad 1\leq i\leq N. (48)

The constants in the big-O in parts 1) and 2) depend on ({\mathcal{M}},p), and are uniform for all i.

The proof is left to Appendix D. The following theorem proves the point-wise rate of L~rw\tilde{L}_{rw}.

Theorem 6.2.

Under Assumptions 1 and 2, if as NN\to\infty, ϵ0+\epsilon\to 0+, ϵd/2+1=Ω(logNN)\epsilon^{d/2+1}=\Omega(\frac{\log N}{N}), then for any fC4()f\in C^{4}({\mathcal{M}}), when NN is large enough, w.p. >18N9>1-8N^{-9},

1ϵm22m0(ID~1W~)(ρXf)(xi)=Δf(xi)+εi,sup1iN|εi|=O(ϵ)+O(logNNϵd/2+1).\frac{1}{\epsilon\frac{m_{2}}{2m_{0}}}(I-\tilde{D}^{-1}\tilde{W})(\rho_{X}f)(x_{i})=-\Delta f(x_{i})+\varepsilon_{i},\quad\sup_{1\leq i\leq N}|\varepsilon_{i}|=O(\epsilon)+O\left(\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}\right).

The constants in the big-O notation depend on {\mathcal{M}}, pp and the C4C^{4} norm of ff.

The theorem slightly improves the point-wise convergence rate of O(ϵ,logNNϵd/2+2)O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+2}}}) in [28]. It is proved using the same techniques as the analysis of point-wise convergence of LrwL_{rw} in [27, 9], and we include a proof for completeness here.

Proof of Theorem 6.2.

By definition,

1ϵm22m0(ID~1W~)(ρXf)(xi)=1ϵm22m0j=1NWijf(xj)f(xi)Djj=1NWij1Dj.-\frac{1}{\epsilon\frac{m_{2}}{2m_{0}}}(I-\tilde{D}^{-1}\tilde{W})(\rho_{X}f)(x_{i})=\frac{1}{\epsilon\frac{m_{2}}{2m_{0}}}\frac{\sum_{j=1}^{N}W_{ij}\frac{f(x_{j})-f(x_{i})}{D_{j}}}{\sum_{j=1}^{N}W_{ij}\frac{1}{D_{j}}}. (49)

The proof of Lemma 6.1 constructed two good events E_1 and E_2 (E_1 is needed for Part 1) to hold, and Part 2) assumes both E_1 and E_2), such that with large enough N, E_1\cap E_2 happens w.p. >1-4N^{-9}, under which D_i,\tilde{D}_i>0 for all i, \tilde{W} and \tilde{L}_{rw} are well-defined, and equations (47), (A.21), and (48) hold. Equation (48) provides the concentration of the denominator of the r.h.s. of (49). We now consider the numerator. Note that, with sufficiently small \epsilon, \tilde{p}_\epsilon is uniformly bounded from below by an O(1) constant p'_{min}. This is because \omega,p\in C^\infty({\mathcal{M}}) and {\mathcal{M}} is compact, so (\omega p+\Delta p) is uniformly bounded, while p is uniformly bounded from below. Thus, under E_1,

1Nj=1NWijf(xj)f(xi)1NDj=1Nj=1NWij(f(xj)f(xi))m0p~ϵ(xj)(1+εj),max1jN|εj|=O(ϵ2,logNNϵd/2),\frac{1}{N}\sum_{j=1}^{N}W_{ij}\frac{f(x_{j})-f(x_{i})}{\frac{1}{N}D_{j}}=\frac{1}{N}\sum_{j=1}^{N}\frac{W_{ij}(f(x_{j})-f(x_{i}))}{m_{0}\tilde{p}_{\epsilon}(x_{j})(1+\varepsilon_{j})},\quad\max_{1\leq j\leq N}|\varepsilon_{j}|=O(\epsilon^{2},\sqrt{\frac{\log N}{N\epsilon^{d/2}}}),

and this equals

1Nj=1NWij(f(xj)f(xi))m0p~ϵ(xj)(1+εj)\displaystyle\frac{1}{N}\sum_{j=1}^{N}\frac{W_{ij}(f(x_{j})-f(x_{i}))}{m_{0}\tilde{p}_{\epsilon}(x_{j})}(1+\varepsilon_{j}^{\prime}) =1Nj=1NWij(f(xj)f(xi))m0p~ϵ(xj)+1Nj=1NWij(f(xj)f(xi))m0p~ϵ(xj)εj\displaystyle=\frac{1}{N}\sum_{j=1}^{N}\frac{W_{ij}(f(x_{j})-f(x_{i}))}{m_{0}\tilde{p}_{\epsilon}(x_{j})}+\frac{1}{N}\sum_{j=1}^{N}\frac{W_{ij}(f(x_{j})-f(x_{i}))}{m_{0}\tilde{p}_{\epsilon}(x_{j})}\varepsilon_{j}^{\prime}
=:+,max1jN|εj|=O(ϵ2,logNNϵd/2)\displaystyle=:①+②,\quad\max_{1\leq j\leq N}|\varepsilon_{j}^{\prime}|=O(\epsilon^{2},\sqrt{\frac{\log N}{N\epsilon^{d/2}}})

and we analyze the two terms respectively.

To bound |||②|, we use Wij0W_{ij}\geq 0 and again that p~ϵ(x)pmin>0\tilde{p}_{\epsilon}(x)\geq p_{min}^{\prime}>0 to have

||1Nj=1NWij|f(xj)f(xi)|m0p~ϵ(xj)|εj|max1jN|εj|m0pmin1Nj=1NWij|f(xj)f(xi)|.\displaystyle|②|\leq\frac{1}{N}\sum_{j=1}^{N}\frac{W_{ij}|f(x_{j})-f(x_{i})|}{m_{0}\tilde{p}_{\epsilon}(x_{j})}|\varepsilon_{j}^{\prime}|\leq\frac{\max_{1\leq j\leq N}|\varepsilon_{j}^{\prime}|}{m_{0}p_{min}^{\prime}}\cdot\frac{1}{N}\sum_{j=1}^{N}{W_{ij}|f(x_{j})-f(x_{i})|}.

We claim that, for large enough N, w.p. >1-2N^{-9} (we call this good event E_3),

1Nj=1NWij|f(xj)f(xi)|=O(ϵ),1iN,\frac{1}{N}\sum_{j=1}^{N}{W_{ij}|f(x_{j})-f(x_{i})|}=O(\sqrt{\epsilon}),\quad 1\leq i\leq N, (50)

and the proof is given below. With (50), under E_3, |②| can be bounded by

||=(max1jN|εj|)O(ϵ)=O(ϵ2,logNNϵd/2)O(ϵ)=O(ϵ5/2,logNNϵd/21).|②|=(\max_{1\leq j\leq N}|\varepsilon_{j}^{\prime}|)O(\sqrt{\epsilon})=O(\epsilon^{2},\sqrt{\frac{\log N}{N\epsilon^{d/2}}})O(\sqrt{\epsilon})=O(\epsilon^{5/2},\sqrt{\frac{\log N}{N\epsilon^{d/2-1}}}). (51)

The analysis of ① again uses the concentration of an independent sum. Conditioning on x_i, consider

=1N1ji,j=1NKϵ(xi,xj)f(xj)f(xi)p~ϵ(xj)=:1N1ji,j=1NYj,①^{\prime}=\frac{1}{N-1}\sum_{j\neq i,j=1}^{N}K_{\epsilon}(x_{i},x_{j})\frac{f(x_{j})-f(x_{i})}{\tilde{p}_{\epsilon}(x_{j})}=:\frac{1}{N-1}\sum_{j\neq i,j=1}^{N}Y_{j},

and we have ①=\frac{1}{m_0}(1-\frac{1}{N})①'. Due to the uniform boundedness of \tilde{p}_\epsilon from below by p'_{min}>0, |Y_j| is bounded by L_Y=\Theta(\epsilon^{-d/2}). We claim that the expectation satisfies (proof below)

𝔼Yj=Kϵ(xi,y)f(y)p(y)p~ϵ(y)𝑑V(y)f(xi)Kϵ(xi,y)p(y)p~ϵ(y)𝑑V(y)=m22ϵΔf(xi)+O(ϵ2).\mathbb{E}Y_{j}=\int_{{\mathcal{M}}}K_{\epsilon}(x_{i},y)\frac{f(y)p(y)}{\tilde{p}_{\epsilon}(y)}dV(y)-f(x_{i})\int_{{\mathcal{M}}}K_{\epsilon}(x_{i},y)\frac{p(y)}{\tilde{p}_{\epsilon}(y)}dV(y)=\frac{m_{2}}{2}\epsilon\Delta f(x_{i})+O(\epsilon^{2}). (52)

The variance of YjY_{j} is bounded by

𝔼Yj2\displaystyle\mathbb{E}Y_{j}^{2} =Kϵ(xi,y)2(f(y)f(xi)p~ϵ(y))2p(y)𝑑V(y)\displaystyle=\int_{{\mathcal{M}}}K_{\epsilon}(x_{i},y)^{2}\left(\frac{f(y)-f(x_{i})}{\tilde{p}_{\epsilon}(y)}\right)^{2}p(y)dV(y)
1pmin2Kϵ(xi,y)2(f(y)f(xi))2p(y)𝑑V(y)νY=Θf,p(ϵd/2+1),\displaystyle\leq\frac{1}{p_{min}^{\prime 2}}\int_{{\mathcal{M}}}K_{\epsilon}(x_{i},y)^{2}\left(f(y)-f(x_{i})\right)^{2}p(y)dV(y)\leq\nu_{Y}=\Theta_{f,p}(\epsilon^{-d/2+1}),

which follows the same derivation as in the proof of the point-wise convergence of L_{rw} without density correction, c.f. Theorem 5.1 1), and can be directly verified by a calculation similar to (54). We attempt the large deviation bound at \Theta(\sqrt{\frac{\log N}{N}\nu_Y})\sim(\frac{\log N}{N\epsilon^{d/2-1}})^{1/2}, which is of smaller order than \frac{\nu_Y}{L_Y}=\Theta(\epsilon) under the theorem condition that \epsilon^{d/2+1}=\Omega(\frac{\log N}{N}). Thus the classical Bernstein inequality gives that for large enough N, where the threshold is determined by ({\mathcal{M}},f,p) and uniform for x_i, w.p. >1-2N^{-10},

=𝔼Yj+O(logNNνY)=m22ϵΔf(xi)+O(ϵ2)+O(logNNϵd/21),①^{\prime}=\mathbb{E}Y_{j}+O(\sqrt{\frac{\log N}{N}\nu_{Y}})=\frac{m_{2}}{2}\epsilon\Delta f(x_{i})+O(\epsilon^{2})+O(\sqrt{\frac{\log N}{N\epsilon^{d/2-1}}}),

and as a result,

=m~ϵΔf(xi)+O(ϵ2)+O(logNNϵd/21).①=\tilde{m}\epsilon\Delta f(x_{i})+O(\epsilon^{2})+O(\sqrt{\frac{\log N}{N\epsilon^{d/2-1}}}). (53)

By a union bound over the events needed at NN points, we have that (53) holds at all xix_{i} under a good event E4E_{4} which happens w.p. >12N9>1-2N^{-9}.

Putting together, under E3E_{3} and E4E_{4}, by (51) and (53), at all xix_{i},

1ϵj=1NWijf(xj)f(xi)Dj\displaystyle\frac{1}{\epsilon}\sum_{j=1}^{N}W_{ij}\frac{f(x_{j})-f(x_{i})}{D_{j}} =m~Δf(xi)+O(ϵ)+O(logNNϵd/2+1)+O(ϵ3/2,logNNϵd/2+1)\displaystyle=\tilde{m}\Delta f(x_{i})+O(\epsilon)+O(\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}})+O(\epsilon^{3/2},\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}})
=m~Δf(xi)+O(ϵ,logNNϵd/2+1).\displaystyle=\tilde{m}\Delta f(x_{i})+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}).

Combined with (48), under E1,E2,E3,E4E_{1},E_{2},E_{3},E_{4},

1ϵm~j=1NWijf(xj)f(xi)Djj=1NWij1Dj\displaystyle\frac{1}{\epsilon\tilde{m}}\frac{\sum_{j=1}^{N}W_{ij}\frac{f(x_{j})-f(x_{i})}{D_{j}}}{\sum_{j=1}^{N}W_{ij}\frac{1}{D_{j}}} =Δf(xi)+O(ϵ,logNNϵd/2+1)1+O(ϵ,logNNϵd/2)=Δf(xi)+O(ϵ,logNNϵd/2+1).\displaystyle=\frac{\Delta f(x_{i})+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}})}{1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})}=\Delta f(x_{i})+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}).

It remains to establish (50) and (52) to finish the proof of the theorem.

Proof of (50): Define the r.v. Y_j=W_{ij}|f(x_j)-f(x_i)| and condition on x_i; for j\neq i, \mathbb{E}Y_j=\int_{\mathcal{M}}K_\epsilon(x_i,y)|f(y)-f(x_i)|p(y)dV(y). Let \delta_\epsilon=\sqrt{(\frac{d+10}{a})\epsilon\log\frac{1}{\epsilon}}; for any x\in\mathcal{M}, K_\epsilon(x,y)=O(\epsilon^{10}) when y\notin B_{\delta_\epsilon}(x), and then

Kϵ(x,y)|f(y)f(x)|p(y)𝑑V(y)\displaystyle\int_{{\mathcal{M}}}K_{\epsilon}(x,y)|f(y)-f(x)|p(y)dV(y)
=Bδϵ(x)Kϵ(x,y)|f(y)f(x)|p(y)𝑑V(y)+O(ϵ10)fp\displaystyle=\int_{B_{\delta_{\epsilon}}(x)}K_{\epsilon}(x,y)|f(y)-f(x)|p(y)dV(y)+O(\epsilon^{10})\|f\|_{\infty}\|p\|_{\infty}
Bδϵ(x)Kϵ(x,y)(fyx)p(y)𝑑V(y)+Of,p(ϵ10)\displaystyle\leq\int_{B_{\delta_{\epsilon}}(x)}K_{\epsilon}(x,y)(\|\nabla f\|_{\infty}\|y-x\|)p(y)dV(y)+O_{f,p}(\epsilon^{10})
=Of,p(ϵ)+Of,p(ϵ10)=O(ϵ).\displaystyle=O_{f,p}(\sqrt{\epsilon})+O_{f,p}(\epsilon^{10})=O(\sqrt{\epsilon}).

The Of,p(ϵ)O_{f,p}(\sqrt{\epsilon}) is obtained because p\|p\|_{\infty}, f\|\nabla f\|_{\infty} are finite constants, and

1ϵBδϵ(x)Kϵ(x,y)yx𝑑V(y)=Bδϵ(x)ϵd/2h(xy2ϵ)yxϵ𝑑V(y)\displaystyle\frac{1}{\sqrt{\epsilon}}\int_{B_{\delta_{\epsilon}}(x)}K_{\epsilon}(x,y)\|y-x\|dV(y)=\int_{B_{\delta_{\epsilon}}(x)}\epsilon^{-d/2}h(\frac{\|x-y\|^{2}}{\epsilon})\frac{\|y-x\|}{\sqrt{\epsilon}}dV(y)
Bδϵ(x)ϵd/2a0eaxy2ϵyxϵ𝑑V(y)\displaystyle~{}~{}~{}\leq\int_{B_{\delta_{\epsilon}}(x)}\epsilon^{-d/2}a_{0}e^{-a\frac{\|x-y\|^{2}}{\epsilon}}\frac{\|y-x\|}{\sqrt{\epsilon}}dV(y)
u<1.1δϵ,uda0ea1.1u2u0.9(1+O(u2))𝑑u=O(1),\displaystyle~{}~{}~{}\leq\int_{\|u\|<1.1\delta_{\epsilon},\,u\in\mathbb{R}^{d}}a_{0}e^{-\frac{a}{1.1}\|u\|^{2}}\frac{\|u\|}{0.9}(1+O(\|u\|^{2}))du=O(1), (54)

where u\in\mathbb{R}^d denotes the projected coordinates in the tangent plane T_x(\mathcal{M}); the comparison of \|x-y\|_{\mathbb{R}^D} to \|u\| (namely 0.9\|x-y\|_{\mathbb{R}^D}<\|u\|<1.1\|x-y\|_{\mathbb{R}^D}) and the volume comparison (namely dV(y)=(1+O(\|u\|^2))du) hold when \delta_\epsilon<\delta_0(\mathcal{M}), a constant depending on \mathcal{M}; see e.g. Lemma A.1 in [9].

Meanwhile, |Y_j| is bounded by L_Y=\|f\|_\infty\Theta(\epsilon^{-d/2}), and the variance of Y_j is bounded by \mathbb{E}Y_j^2, which in turn is bounded by \nu_Y=\Theta(\epsilon^{-d/2+1}) by a calculation similar to (54). We attempt the large deviation bound at \Theta(\sqrt{\frac{\log N}{N}\nu_Y})\sim(\frac{\log N}{N\epsilon^{d/2-1}})^{1/2}, which is of smaller order than \frac{\nu_Y}{L_Y}=\Theta(\epsilon) under the theorem condition \epsilon^{d/2+1}=\Omega(\frac{\log N}{N}). Thus, for each i, when N is large enough (the threshold being determined by (\mathcal{M},f,p) and uniform in x_i), w.p. >1-2N^{-10},

1N1jiYj=𝔼Yj+O(logNNϵd/21)=O(ϵ)+o(ϵ)=O(ϵ).\frac{1}{N-1}\sum_{j\neq i}Y_{j}=\mathbb{E}Y_{j}+O(\sqrt{\frac{\log N}{N\epsilon^{d/2-1}}})=O(\sqrt{\epsilon})+o(\epsilon)=O(\sqrt{\epsilon}).

The j=i term in (50) equals zero. By the same argument of the independence of x_i from \{x_j\}_{j\neq i} and the union bound over the N events, (50) is proved.

Proof of (52): Note that

pp~ϵ=11+ϵm~(ω+Δpp)=1ϵm~(ω+Δpp)+ϵ2rϵ=1ϵr1+ϵ2rϵ,\frac{p}{\tilde{p}_{\epsilon}}=\frac{1}{1+\epsilon\tilde{m}(\omega+\frac{\Delta p}{p})}=1-\epsilon\tilde{m}(\omega+\frac{\Delta p}{p})+\epsilon^{2}r_{\epsilon}=1-\epsilon r_{1}+\epsilon^{2}r_{\epsilon},

where r_1:=\tilde{m}(\omega+\frac{\Delta p}{p}) is a deterministic function, r_1\in C^\infty(\mathcal{M}); also r_\epsilon\in C^\infty(\mathcal{M}), and \|r_\epsilon\|_\infty=O(1) when \epsilon is below some O(1) threshold, because \|\omega+\frac{\Delta p}{p}\|_\infty=O(1). Then,

Kϵ(xi,y)fpp~ϵ(y)𝑑V(y)=Kϵ(xi,y)f(y)(1ϵr1+ϵ2rϵ)(y)𝑑V(y)\displaystyle\int_{{\mathcal{M}}}K_{\epsilon}(x_{i},y)\frac{fp}{\tilde{p}_{\epsilon}}(y)dV(y)=\int_{{\mathcal{M}}}K_{\epsilon}(x_{i},y)f(y)(1-\epsilon r_{1}+\epsilon^{2}r_{\epsilon})(y)dV(y)
=Kϵ(xi,y)f(y)𝑑V(y)ϵKϵ(xi,y)(fr1)(y)𝑑V(y)+ϵ2Kϵ(xi,y)(frϵ)(y)𝑑V(y)\displaystyle~{}~{}~{}=\int_{{\mathcal{M}}}K_{\epsilon}(x_{i},y)f(y)dV(y)-\epsilon\int_{{\mathcal{M}}}K_{\epsilon}(x_{i},y)(fr_{1})(y)dV(y)+\epsilon^{2}\int_{{\mathcal{M}}}K_{\epsilon}(x_{i},y)(fr_{\epsilon})(y)dV(y)
=(m0f(xi)+m22ϵ(ωf+Δf)(xi)+O(ϵ2))ϵ(m0fr1(xi)+O(ϵ))+O(ϵ2)\displaystyle~{}~{}~{}=\left(m_{0}f(x_{i})+\frac{m_{2}}{2}\epsilon(\omega f+\Delta f)(x_{i})+O(\epsilon^{2})\right)-\epsilon\left(m_{0}fr_{1}(x_{i})+O(\epsilon)\right)+O(\epsilon^{2})
=m0f(xi)+m22ϵ(ωf+Δf1m~fr1)(xi)+O(ϵ2),\displaystyle~{}~{}~{}=m_{0}f(x_{i})+\frac{m_{2}}{2}\epsilon(\omega f+\Delta f-\frac{1}{\tilde{m}}fr_{1})(x_{i})+O(\epsilon^{2}),

and taking f=1f=1 gives that

Kϵ(xi,y)pp~ϵ(y)𝑑V(y)=m0+m22ϵ(ω1m~r1)(xi)+O(ϵ2).\int_{{\mathcal{M}}}K_{\epsilon}(x_{i},y)\frac{p}{\tilde{p}_{\epsilon}}(y)dV(y)=m_{0}+\frac{m_{2}}{2}\epsilon(\omega-\frac{1}{\tilde{m}}r_{1})(x_{i})+O(\epsilon^{2}).

Putting together and subtracting the two terms in (52) proves that 𝔼Yj=m22ϵΔf(xi)+O(ϵ2)\mathbb{E}Y_{j}=\frac{m_{2}}{2}\epsilon\Delta f(x_{i})+O(\epsilon^{2}). ∎

6.2 Dirichlet form convergence of density-corrected graph Laplacian

The graph Dirichlet form of the density-corrected graph Laplacian is defined as

E~N(u):=1m22m02ϵuT(D~W~)u=1m2m02ϵi,j=1NW~i,j(uiuj)2=1m2m02ϵi,j=1NWi,j(uiuj)2DiDj.\tilde{E}_{N}(u):=\frac{1}{\frac{m_{2}}{2m_{0}^{2}}\epsilon}u^{T}(\tilde{D}-\tilde{W})u=\frac{1}{\frac{m_{2}}{m_{0}^{2}}\epsilon}\sum_{i,j=1}^{N}\tilde{W}_{i,j}(u_{i}-u_{j})^{2}=\frac{1}{\frac{m_{2}}{m_{0}^{2}}\epsilon}\sum_{i,j=1}^{N}W_{i,j}\frac{(u_{i}-u_{j})^{2}}{D_{i}D_{j}}. (55)

We establish the counterpart of Theorem 3.2, which achieves the same form rate. The theorem holds for general differentiable h, which can be of independent interest.

Theorem 6.3.

Under Assumptions 1 and 2, if as NN\to\infty, ϵ0+\epsilon\to 0+, ϵd/2N=Ω(logN)\epsilon^{d/2}N=\Omega(\log N), then for any fC()f\in C^{\infty}({{\mathcal{M}}}), when NN is sufficiently large, w.p. >12N92N10>1-2N^{-9}-2N^{-10},

E~N(ρXf)=f,Δf+Op,f(ϵ,logNNϵd/2).\tilde{E}_{N}(\rho_{X}f)=\langle f,-\Delta f\rangle+O_{p,f}\left(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}\right).
Proof of Theorem 6.3.

By definition (55),

E~N(ρXf)=1m2m02ϵ1N2i,j=1NWi,j(f(xi)f(xj))2DiNDjN.\tilde{E}_{N}(\rho_{X}f)=\frac{1}{\frac{m_{2}}{m_{0}^{2}}\epsilon}\frac{1}{N^{2}}\sum_{i,j=1}^{N}W_{i,j}\frac{(f(x_{i})-f(x_{j}))^{2}}{\frac{D_{i}}{N}\frac{D_{j}}{N}}.

The following lemma (proved in Appendix D) makes use of the concentration of D_i/N to reduce the graph Dirichlet form to a V-statistic, up to a relative error at the form rate.

Lemma 6.4.

Under the good event in Lemma 6.1 1),

E~N(u)=(1m2[h]ϵ1N2i,j=1NWi,j(uiuj)2p(xi)p(xj))(1+O(ϵ,logNNϵd/2)),uN,\tilde{E}_{N}(u)=\left(\frac{1}{m_{2}[h]\epsilon}\frac{1}{N^{2}}\sum_{i,j=1}^{N}W_{i,j}\frac{(u_{i}-u_{j})^{2}}{p(x_{i})p(x_{j})}\right)\left(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right),\quad\forall u\in\mathbb{R}^{N},

and the constant in big-OO is determined by (,p)({\mathcal{M}},p) and uniform for all uu.

We work under the good event in Lemma 6.1 1), called E_1, which happens w.p. >1-2N^{-9}. Applying Lemma 6.4 with u=\rho_X f, we have that

E~N(ρXf)={1m2ϵ1N2i,j=1NWi,j(f(xi)f(xj))2p(xi)p(xj)}(1+O(ϵ,logNNϵd/2))=:(1+O(ϵ,logNNϵd/2))\tilde{E}_{N}(\rho_{X}f)=\left\{\frac{1}{m_{2}\epsilon}\frac{1}{N^{2}}\sum_{i,j=1}^{N}W_{i,j}\frac{(f(x_{i})-f(x_{j}))^{2}}{p(x_{i})p(x_{j})}\right\}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))=:③(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})) (56)

The term ③ in (56) equals \frac{1}{N^2}\sum_{i,j=1}^N V_{i,j}, where V_{i,j}:=\frac{1}{m_2\epsilon}K_\epsilon(x_i,x_j)\frac{(f(x_i)-f(x_j))^2}{p(x_i)p(x_j)} and V_{i,i}=0. We follow the same approach as in the proof of Theorem 3.4 in [9] to analyze this V-statistic, and show that (proof in Appendix D)

③=\langle f,-\Delta f\rangle+O_{f,p}\left(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}\right). (57)

Back to (56), we have shown that under E1E3E_{1}\cap E_{3},

E~N(ρXf)\displaystyle\tilde{E}_{N}(\rho_{X}f) =(1+O(ϵ,logNNϵd/2))=(f,Δf+O(ϵ,logNNϵd/2))(1+O(ϵ,logNNϵd/2))\displaystyle=③(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))=\left(\langle f,-\Delta f\rangle+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right)(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))
=f,Δf+O(ϵ,logNNϵd/2),\displaystyle=\langle f,-\Delta f\rangle+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}),

and the constant in big-OO depends on {\mathcal{M}}, ff and pp. ∎
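Theorem 6.3 can also be checked directly by simulation. Below is a minimal sketch (not the released code; the rejection sampler, the helper name dirichlet_form_check, and the Gaussian normalization h(r)=(4\pi)^{-d/2}e^{-r/4}, which gives m_0=1 and m_2=2, are our assumptions). It evaluates (55) at u=\rho_Xf on the non-uniform S^1 data of Appendix A and compares with the closed form \langle f,-\Delta f\rangle=\int_0^1 f'(t)^2dt:

```python
import numpy as np

def dirichlet_form_check(N=2000, eps=1e-3, d=1, seed=0):
    rng = np.random.default_rng(seed)
    # rejection sampling from p(t) = 1 + 0.5 sin(4 pi t) + 0.3 sin(10 pi t) <= 1.8
    t = np.empty(0)
    while t.size < N:
        s = rng.uniform(0.0, 1.0, 2 * N)
        keep = rng.uniform(0.0, 1.8, 2 * N) < (
            1 + 0.5 * np.sin(4 * np.pi * s) + 0.3 * np.sin(10 * np.pi * s))
        t = np.append(t, s[keep])
    t = t[:N]
    c = 1.0 / (2 * np.pi * np.sqrt(5))   # isometric embedding of Appendix A
    X = c * np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t),
                      (2 / 3) * np.cos(6 * np.pi * t),
                      (2 / 3) * np.sin(6 * np.pi * t)], axis=1)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = (4 * np.pi * eps) ** (-d / 2) * np.exp(-sq / (4 * eps))     # Gaussian h
    u = 0.2 * np.sin(4 * np.pi * t) - 0.8 * np.sin(8 * np.pi * t)   # rho_X f
    D = W.sum(axis=1)
    W_tilde = W / np.outer(D, D)                                    # W~ = D^{-1} W D^{-1}
    # (55) with m2/m0^2 = 2: E~_N(u) = (1/(2 eps)) sum_ij W~_ij (u_i - u_j)^2
    E_N = (W_tilde * (u[:, None] - u[None, :]) ** 2).sum() / (2 * eps)
    truth = (0.2 * 4 * np.pi) ** 2 / 2 + (0.8 * 8 * np.pi) ** 2 / 2  # int_0^1 f'(t)^2 dt
    return E_N, truth
```

At moderate N and small \epsilon, the two returned values agree up to the O(\epsilon,\sqrt{\log N/(N\epsilon^{d/2})}) error of the theorem.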

6.3 Eigen-convergence of \tilde{L}_{rw}

In this subsection, let \lambda_k be the eigenvalues of \tilde{L}_{rw} and v_k the associated eigenvectors. Recalling from (55) that \tilde{m}=\frac{m_2}{2m_0}, the analogue of (8) is the following

\lambda_k=\min_{L\subset\mathbb{R}^N,\,\dim(L)=k}\sup_{v\in L,v\neq 0}\frac{\frac{1}{\epsilon\tilde{m}}v^T(\tilde{D}-\tilde{W})v}{v^T\tilde{D}v}=\min_{L\subset\mathbb{R}^N,\,\dim(L)=k}\sup_{v\in L,v\neq 0}\frac{\frac{1}{m_0}\tilde{E}_N(v)}{v^T\tilde{D}v},\quad 1\leq k\leq N. (58)

The methodology is the same as before, with a main difference in the definition of the heat interpolation mapping, which carries weights p(x_j) as in (59). This gives the p-weighted quadratic form \tilde{q}_s(u) defined in (60), for which we derive the concentration argument for \tilde{q}^{(0)}_s in (A.33) and the upper bound of \tilde{q}^{(2)}_s in Lemma D.2. The other difference is that the \tilde{D}-weighted 2-norm is considered, because the eigenvectors are \tilde{D}-orthogonal. All the proofs of Steps 0-3 and Theorem 6.7 are left to Appendix D.
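In computations, since \tilde{D}-\tilde{W} is symmetric and \tilde{D} is diagonal and positive under the good events, the eigenpairs in (58) can be obtained from the generalized symmetric problem (\tilde{D}-\tilde{W})v=\epsilon\tilde{m}\lambda\tilde{D}v rather than from the non-symmetric \tilde{L}_{rw} itself. A minimal sketch (our own illustration, not the released code; the helper name rw_eigenpairs is hypothetical):

```python
import numpy as np
from scipy.linalg import eigh

def rw_eigenpairs(W, eps, m_tilde=1.0, k_max=10):
    """First k_max eigenpairs of the density-corrected random-walk Laplacian.

    W is the (N, N) kernelized affinity matrix; m_tilde = m2/(2 m0) = 1
    for the Gaussian h used in the paper.
    """
    D = W.sum(axis=1)                    # degrees of W
    W_tilde = W / np.outer(D, D)         # W~ = D^{-1} W D^{-1}
    D_tilde = W_tilde.sum(axis=1)        # degrees of W~
    A = np.diag(D_tilde) - W_tilde       # D~ - W~, symmetric PSD
    B = np.diag(D_tilde)                 # diagonal, positive definite
    lam, V = eigh(A, B, subset_by_index=[0, k_max - 1])  # A v = lam B v, ascending
    return lam / (eps * m_tilde), V
```

The returned eigenvalues are the \lambda_k of (58); the columns of V are \tilde{D}-orthogonal (eigh normalizes v^T\tilde{D}v=1, so rescaling by 1/\sqrt{N} gives the normalization N\|v_k\|_{\tilde{D}}^2=1 used below).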

Step 0. We first establish eigenvalue UB based on Lemma 6.1 and the form convergence in Theorem 6.3.

Proposition 6.5 (Eigenvalue UB of L~rw\tilde{L}_{rw}).

Under Assumptions 1 and 2, for fixed K\in\mathbb{N}, suppose 0=\mu_1<\cdots<\mu_K<\infty are all of single multiplicity. If as N\to\infty, \epsilon\to 0+, and \epsilon^{d/2}=\Omega(\frac{\log N}{N}), then for sufficiently large N, w.p. >1-4N^{-9}-4K^2N^{-10}, \tilde{L}_{rw} is well-defined, and

λkμk+O(ϵ,logNNϵd/2),k=1,,K.\lambda_{k}\leq\mu_{k}+O\left(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}\right),\quad k=1,\cdots,K.

Step 1. Eigenvalue crude LB. We prove the crude lower bound with the p-weighted interpolation mapping defined as

I~r[u]=1Nj=1Nujp(xj)Hr(x,xj)=Ir[u~],u~i=ui/p(xi).\tilde{I}_{r}[u]=\frac{1}{N}\sum_{j=1}^{N}\frac{u_{j}}{p(x_{j})}H_{r}(x,x_{j})=I_{r}[\tilde{u}],\quad\tilde{u}_{i}=u_{i}/p(x_{i}). (59)

Then, same as before, I~r[u],I~r[u]=qδϵ(u~)\langle\tilde{I}_{r}[u],\tilde{I}_{r}[u]\rangle=q_{\delta\epsilon}(\tilde{u}), and I~r[u],QtI~r[u]=qϵ(u~)\langle\tilde{I}_{r}[u],Q_{t}\tilde{I}_{r}[u]\rangle=q_{\epsilon}(\tilde{u}), where for s>0s>0,

q~s(u):=1N2i,j=1NHs(xi,xj)p(xi)p(xj)uiuj=qs(u~)=q~s(0)(u)q~s(2)(u),q~s(0)(u):=1Ni=1Nui2(1Nj=1NHs(xi,xj)p(xi)p(xj)),q~s(2)(u):=12N2i,j=1NHs(xi,xj)p(xi)p(xj)(uiuj)2.\begin{split}\tilde{q}_{s}(u)&:=\frac{1}{N^{2}}\sum_{i,j=1}^{N}\frac{{H}_{s}(x_{i},x_{j})}{p(x_{i})p(x_{j})}u_{i}u_{j}=q_{s}(\tilde{u})=\tilde{q}^{(0)}_{s}(u)-\tilde{q}^{(2)}_{s}(u),\\ \tilde{q}^{(0)}_{s}(u)&:=\frac{1}{N}\sum_{i=1}^{N}u_{i}^{2}\left(\frac{1}{N}\sum_{j=1}^{N}\frac{{H}_{s}(x_{i},x_{j})}{p(x_{i})p(x_{j})}\right),\quad\tilde{q}^{(2)}_{s}(u):=\frac{1}{2N^{2}}\sum_{i,j=1}^{N}\frac{{H}_{s}(x_{i},x_{j})}{p(x_{i})p(x_{j})}(u_{i}-u_{j})^{2}.\end{split} (60)
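The decomposition \tilde{q}_s=\tilde{q}^{(0)}_s-\tilde{q}^{(2)}_s in (60) follows from the point-wise identity u_iu_j=\frac{1}{2}(u_i^2+u_j^2)-\frac{1}{2}(u_i-u_j)^2 together with the symmetry H_s(x_i,x_j)=H_s(x_j,x_i); explicitly,

\frac{1}{N^2}\sum_{i,j=1}^N\frac{H_s(x_i,x_j)}{p(x_i)p(x_j)}u_iu_j=\frac{1}{N^2}\sum_{i,j=1}^N\frac{H_s(x_i,x_j)}{p(x_i)p(x_j)}\frac{u_i^2+u_j^2}{2}-\tilde{q}^{(2)}_s(u),

and by symmetry the first sum on the r.h.s. equals \frac{1}{N}\sum_{i=1}^N u_i^2\left(\frac{1}{N}\sum_{j=1}^N\frac{H_s(x_i,x_j)}{p(x_i)p(x_j)}\right)=\tilde{q}^{(0)}_s(u).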
Proposition 6.6 (Initial crude eigenvalue LB of L~rw\tilde{L}_{rw}).

Under Assumption 1, with h Gaussian. For fixed k_{max}\in\mathbb{N} and K=k_{max}+1, let \mu_k, \epsilon and N satisfy the same condition as in Proposition 4.1, where the definition of c_K is the same except that c is a constant depending on (\mathcal{M},p). Then, for sufficiently large N, w.p. >1-4K^2N^{-10}-8N^{-9}, \lambda_k>\mu_k-\gamma_K for k=2,\cdots,K.

Steps 2-3. We prove eigenvector consistency and refined eigenvalue convergence rate. Define

uD~2:=i=1Nui2D~i,uN.\|u\|_{\tilde{D}}^{2}:=\sum_{i=1}^{N}u_{i}^{2}\tilde{D}_{i},\quad\forall u\in\mathbb{R}^{N}. (61)

The proof uses the same techniques as before; the differences are in handling the \tilde{D}-orthogonality of the eigenvectors and in using the concentration arguments in Lemma 6.1. Same as before, the extension to when \mu_k has multiplicity greater than 1 is possible (Remark 5).

Theorem 6.7 (eigen-convergence of L~rw\tilde{L}_{rw}).

Under the same condition and setting of \mathcal{M}, p being uniform, and h being Gaussian, with k_{max}, K, \mu_k, \epsilon the same as in Theorem 5.4 (the definition of c_K is the same except that c is a constant depending on (\mathcal{M},p)), consider the first k_{max} eigenvalues and eigenvectors of \tilde{L}_{rw}, \tilde{L}_{rw}v_k=\lambda_kv_k, where the v_k are normalized s.t. N\|v_k\|_{\tilde{D}}^2=1. Define, for 1\leq k\leq K,

ϕ~k:=ρX(1Nψk).\tilde{\phi}_{k}:=\rho_{X}\left(\frac{1}{\sqrt{N}}\psi_{k}\right).

Then, for sufficiently large N, w.p. >1-4K^2N^{-10}-(4K+8)N^{-9}, \|v_k\|_2=\Theta(1), and the same bounds as in Theorem 5.4 hold for |\mu_k-\lambda_k| and \|v_k-\alpha_k\tilde{\phi}_k\|_2, for 1\leq k\leq k_{max}, with certain scalars \alpha_k satisfying |\alpha_k|=1+o(1).

Figure 2: Data points are sampled uniformly on S^1 embedded in \mathbb{R}^4. (a) The eigenvalue relative error RelErr_\lambda, visualized (in \log_{10}) as a field on a grid of (\log_{10}) N and \epsilon, k_{max}=9. The red curve on the left plot indicates the post-selected optimal \epsilon which minimizes the error, and that minimal error as a function of N is plotted on the right in log-log scale. (b) Same plot as (a) for the eigenvector relative error RelErr_v. The relative errors are defined in (62). The empirical errors are averaged over 500 runs of the experiment, and the log error values are smoothed over the grid for better visualization. Plots of the raw values are shown in Fig. A.1.
Figure 3: Data points are sampled uniformly on S^2 embedded in \mathbb{R}^3; same plots as Fig. 2. k_{max}=9, and the plots of raw values are shown in Fig. A.2.

7 Numerical experiments

This section gives numerical results on the point-wise convergence and eigen-convergence of graph Laplacians built from simulated manifold data. Code is released at https://github.com/xycheng/eigconvergence_gaussian_kernel.

7.1 Eigen-convergence of LrwL_{rw}

We test on two simulated datasets, sampled uniformly on S^1 (embedded in \mathbb{R}^4; the formula is in Appendix A) and on the unit sphere S^2 (embedded in \mathbb{R}^3). For both datasets, we compute over an increasing number of samples N=\{562,\cdots,1584\} and a range of values of \epsilon, where the grid points of both N and \epsilon are evenly spaced in log scale. For each value of N and \epsilon, we generate N data points, construct the kernelized affinity matrix W_{ij}=K_\epsilon(x_i,x_j) as defined in (1) with Gaussian h, and compute the first 10 eigenvalues \lambda_k and eigenvectors v_k of L_{rw}. The errors are computed by

RelErrλ=k=2kmax|λkμk|μk,RelErrv=k=2kmaxvkϕk2ϕk2,\text{RelErr}_{\lambda}=\sum_{k=2}^{k_{max}}\frac{|\lambda_{k}-\mu_{k}|}{\mu_{k}},\quad\text{RelErr}_{v}=\sum_{k=2}^{k_{max}}\frac{\|v_{k}-\phi_{k}\|_{2}}{\|\phi_{k}\|_{2}}, (62)

where \phi_k is as defined in (39). The experiment is repeated over 500 replicas, from which the averaged empirical errors are computed. For the data on S^1, \epsilon=\{10^{-2.8},\cdots,10^{-4}\}; the manifold (in the first 3 coordinates) is illustrated in Fig. 4(a), though the density is uniform here (see more details in Appendix A). For the data on S^2, \epsilon=\{10^{-0.2},\cdots,10^{-1.8}\}. These ranges are chosen so that the minimal error over \epsilon for each N is observed, at least for RelErr_\lambda. Note that for S^1, the population eigenvalues starting from \mu_2 are of multiplicity 2, and for S^2, the multiplicities are 3, 5, \cdots. A minimal sketch of one (N,\epsilon) grid point of this experiment is given after this paragraph.
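The sketch below is our own minimal illustration (not the released code); the function names are hypothetical, and we take the Gaussian normalization h(r)=(4\pi)^{-d/2}e^{-r/4}, so that m_0=1 and \tilde{m}=1, as used for Gaussian h in the proofs. The population eigenvalues are \mu=(2\pi m)^2 with multiplicity 2, since the embedded circle has unit circumference.

```python
import numpy as np
from scipy.linalg import eigh

def embed_s1(t):
    # isometric embedding of S^1 into R^4 (Appendix A); t in [0,1) is arc-length
    c = 1.0 / (2 * np.pi * np.sqrt(5))
    return c * np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t),
                         (2 / 3) * np.cos(6 * np.pi * t),
                         (2 / 3) * np.sin(6 * np.pi * t)], axis=1)

def rel_err_lambda(N=1000, eps=1e-3, k_max=9, d=1, seed=0):
    rng = np.random.default_rng(seed)
    X = embed_s1(rng.uniform(0.0, 1.0, N))               # uniform samples on S^1
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise |x_i - x_j|^2
    # Gaussian h(r) = (4 pi)^{-d/2} e^{-r/4}, giving m0 = 1 and m~ = 1
    W = (4 * np.pi * eps) ** (-d / 2) * np.exp(-sq / (4 * eps))
    D = W.sum(axis=1)
    # L_rw v = lam v  <=>  (D - W) v = eps * lam * D v (generalized symmetric)
    lam = eigh(np.diag(D) - W, np.diag(D),
               subset_by_index=[0, k_max - 1])[0] / eps
    ks = np.arange(2, k_max + 1)
    mu = (2 * np.pi * np.ceil((ks - 1) / 2)) ** 2        # multiplicity-2 spectrum
    return np.sum(np.abs(lam[ks - 1] - mu) / mu)         # RelErr_lambda of (62)
```

Averaging rel_err_lambda over independent seeds at each (N, \epsilon) grid point gives the kind of eigenvalue error field shown in Fig. 2(a), up to Monte Carlo error.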

The results are shown in Figures 2 and 3. For data on S^1, Fig. 2(a) shows that RelErr_\lambda as a function of N (with post-selected best \epsilon) has a convergence order of about N^{-0.4}, consistent with the theoretical bound of N^{-1/(d/2+2)} in Theorem 5.5, since d=1 here. In the left plot of the colored field, the log error values are smoothed over the grid of N and \epsilon, and the best \epsilon scales with N as about N^{-0.4}. The empirical scaling of the optimal \epsilon is less stable to observe: depending on the level of smoothing, the slope of \log_{10}\epsilon varies between -0.2 and -0.5 (the left plot), while the slope of the best (log) error is always about -0.4 (the right plot). The result without smoothing is shown in Fig. A.1. The eigenvector error in Fig. 2(b) shows an order of about N^{-0.5}, which is better than the theoretical prediction. For the data on S^2, the eigenvalue convergence shows an order of about N^{-0.33}, in agreement with the theoretical rate of N^{-1/(d/2+2)} when d=2. The eigenvector error again shows an order of about N^{-0.5}, better than the theory. The small error of eigenvector estimation at very large values of \epsilon may be due to the symmetry of the simple manifolds S^1 and S^2. In both experiments, the eigenvector estimation prefers a much larger value of \epsilon than the eigenvalue estimation, which is consistent with the theory.

Figure 4: (a) Randomly sampled data on S^1 embedded in \mathbb{R}^4; the first 3 coordinates are shown, colored by the density. (b) The density p and the test function f, plotted as functions of the intrinsic coordinate (arc-length) on [0,1) of S^1. (c) One realization of \tilde{L}_{rw}(\rho_X f), plotted in comparison with the true function \rho_X(\Delta f). (d) The log relative error \log_{10}RelErr_{pt}, as defined in (63), computed over a range of values of \epsilon and averaged over 50 runs of repeated experiments. The two fitted lines show the approximate scaling of RelErr_{pt} at small \epsilon, where the variance error dominates, and at large \epsilon, where the bias error dominates.
Figure 5: Same eigenvalue and eigenvector relative error plots as Fig. 2, where the data are non-uniformly sampled on S^1 as in Fig. 4(a). k_{max}=9, and the plots of raw values are shown in Fig. A.3.

7.2 Density-corrected graph Laplacian

To examine the density-corrected graph Laplacian, we switch to a non-uniform density on S^1, illustrated in Fig. 4(a). We first investigate the point-wise convergence of -\tilde{L}_{rw}f to \Delta f on a test function f:S^1\to\mathbb{R}; see more details in Appendix A. The error is computed as

RelErrpt=L~rwρXfρX(Δf)1ρX(Δf)1,\text{RelErr}_{pt}=\frac{\|-\tilde{L}_{rw}\rho_{X}f-\rho_{X}(\Delta f)\|_{1}}{\|\rho_{X}(\Delta f)\|_{1}}, (63)

and the result is shown in Fig. 4. Theorem 6.2 predicts the bias error to be O(\epsilon) and the variance error to be O(\epsilon^{-d/4-1/2})=O(\epsilon^{-3/4}) since N is fixed, which agrees with Fig. 4(d). A minimal sketch of this point-wise test is given below.
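The sketch (not the released code; the rejection sampler and the helper name rel_err_pt are our assumptions) computes (63) with the p and f of Appendix A, using that \Delta f=f'' because t is arc-length on the unit-circumference circle:

```python
import numpy as np

def rel_err_pt(N=2000, eps=1e-3, d=1, seed=0):
    rng = np.random.default_rng(seed)
    # rejection sampling from p(t) = 1 + 0.5 sin(4 pi t) + 0.3 sin(10 pi t) <= 1.8
    t = np.empty(0)
    while t.size < N:
        s = rng.uniform(0.0, 1.0, 2 * N)
        keep = rng.uniform(0.0, 1.8, 2 * N) < (
            1 + 0.5 * np.sin(4 * np.pi * s) + 0.3 * np.sin(10 * np.pi * s))
        t = np.append(t, s[keep])
    t = t[:N]
    c = 1.0 / (2 * np.pi * np.sqrt(5))   # isometric embedding of Appendix A
    X = c * np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t),
                      (2 / 3) * np.cos(6 * np.pi * t),
                      (2 / 3) * np.sin(6 * np.pi * t)], axis=1)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = (4 * np.pi * eps) ** (-d / 2) * np.exp(-sq / (4 * eps))   # Gaussian h, m~ = 1
    f = 0.2 * np.sin(4 * np.pi * t) - 0.8 * np.sin(8 * np.pi * t)
    lap_f = (-0.2 * (4 * np.pi) ** 2 * np.sin(4 * np.pi * t)
             + 0.8 * (8 * np.pi) ** 2 * np.sin(8 * np.pi * t))    # Delta f = f''
    D = W.sum(axis=1)
    W_tilde = W / np.outer(D, D)                                  # W~ = D^{-1} W D^{-1}
    D_tilde = W_tilde.sum(axis=1)
    neg_Lf = (W_tilde @ f / D_tilde - f) / eps   # -L~_rw rho_X f, approximates Delta f
    return np.abs(neg_Lf - lap_f).sum() / np.abs(lap_f).sum()
```

Sweeping eps over a log-spaced grid at fixed N traces out the bias/variance trade-off of Fig. 4(d).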

The results for RelErr_\lambda and RelErr_v are shown in Fig. 5. The order of convergence with the best \epsilon appears to be about N^{-0.8} for both the eigenvalue and eigenvector errors, which is better than those of L_{rw} (when p is uniform) in Fig. 2, and better than the theoretical prediction in Theorem 6.7.

8 Discussion

The current result may be extended in several directions. First, for a manifold with smooth boundary, the random-walk graph Laplacian recovers the Neumann Laplacian [10], and one can expect to prove the spectral convergence as well, as in [22]. Second, extensions to kernels with variable or adaptive bandwidth [5, 9], and to other normalization schemes, e.g., bi-stochastic normalization [23, 20, 36], would be important for improving robustness against low sampling density and noise in data, and possibly for the spectral convergence as well. Related is the problem of spectral convergence to other manifold diffusion operators, e.g., the Fokker-Planck operator, on L^2(\mathcal{M},pdV). It would also be interesting to extend the spectral convergence to more general types of kernel functions h that are not Gaussian, or even not symmetric [37]. Relaxing the condition on the kernel bandwidth \epsilon can also be useful: the optimal transport approach was able to show spectral consistency in the regime just beyond graph connectivity, namely when \epsilon^{d/2}\gg\log N/N [7], which is less restrictive than the condition needed for the Gaussian kernel in the current paper. Being able to extend the analysis to very sparse graphs is important for applications. Finally, further investigation is needed to explain the good spectral convergence observed in experiments, particularly the eigenvector convergence and the faster rate of the density-corrected graph Laplacian. For the eigenvector convergence, the current work focuses on the 2-norm consistency, while the \infty-norm consistency, as derived in [11, 8], is also important to study.

Acknowledgement

The authors thank Hau-Tieng Wu for helpful discussion. Cheng thanks Yiping Lu for helpful discussion on the eigen-convergence problem. The work is supported by NSF DMS-2007040. XC is partially supported by NSF, NIH, and the Alfred P. Sloan Foundation.

References

  • [1] Donald Gary Aronson. Bounds for the fundamental solution of a parabolic equation. Bulletin of the American Mathematical Society, 73(6):890–896, 1967.
  • [2] Mukund Balasubramanian and Eric L Schwartz. The isomap algorithm and topological stability. Science, 295(5552):7–7, 2002.
  • [3] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
  • [4] Mikhail Belkin and Partha Niyogi. Convergence of Laplacian eigenmaps. In Advances in Neural Information Processing Systems, pages 129–136, 2007.
  • [5] Tyrus Berry and John Harlim. Variable bandwidth diffusion kernels. Applied and Computational Harmonic Analysis, 40(1):68–96, 2016.
  • [6] Dmitri Burago, Sergei Ivanov, and Yaroslav Kurylev. A graph discretization of the Laplace-Beltrami operator. Journal of Spectral Theory, 4(4):675–714, 2014.
  • [7] Jeff Calder and Nicolas Garcia Trillos. Improved spectral convergence rates for graph Laplacians on ϵ\epsilon-graphs and k-NN graphs. Applied and Computational Harmonic Analysis, 60:123–175, 2022.
  • [8] Jeff Calder, Nicolas Garcia Trillos, and Marta Lewicka. Lipschitz regularity of graph Laplacians on random data clouds. SIAM Journal on Mathematical Analysis, 54(1):1169–1222, 2022.
  • [9] Xiuyuan Cheng and Hau-Tieng Wu. Convergence of graph Laplacian with knn self-tuned kernels. Information and Inference: A Journal of the IMA, 2021.
  • [10] Ronald R Coifman and Stéphane Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.
  • [11] David B Dunson, Hau-Tieng Wu, and Nan Wu. Spectral convergence of graph Laplacian and heat kernel reconstruction in LL^{\infty} from random samples. Applied and Computational Harmonic Analysis, 55:282–336, 2021.
  • [12] Ahmed El Alaoui, Xiang Cheng, Aaditya Ramdas, Martin J Wainwright, and Michael I Jordan. Asymptotic behavior of p\ell_{p}-based Laplacian regularization in semi-supervised learning. In Conference on Learning Theory, pages 879–906, 2016.
  • [13] Noureddine El Karoui and Hau-Tieng Wu. Graph connection Laplacian methods can be made robust to noise. The Annals of Statistics, 44(1):346–372, 2016.
  • [14] Justin Eldridge, Mikhail Belkin, and Yusu Wang. Unperturbed: spectral analysis beyond Davis-Kahan. arXiv preprint arXiv:1706.06516, 2017.
  • [15] Mauricio Flores, Jeff Calder, and Gilad Lerman. Algorithms for ℓp-based semi-supervised learning on graphs. arXiv preprint arXiv:1901.05031, 2019.
  • [16] Alexander Grigor’yan. Gaussian upper bounds for the heat kernel on arbitrary manifolds. Journal of Differential Geometry, 45:33–52, 1997.
  • [17] Alexander Grigor’yan. Heat kernel and analysis on manifolds, volume 47. American Mathematical Society, Providence, RI, 2009.
  • [18] Matthias Hein. Uniform convergence of adaptive graph-based regularization. In International Conference on Computational Learning Theory, pages 50–64. Springer, 2006.
  • [19] Matthias Hein, Jean-Yves Audibert, and Ulrike Von Luxburg. From graphs to manifolds–weak and strong pointwise consistency of graph Laplacians. In International Conference on Computational Learning Theory, pages 470–485. Springer, 2005.
  • [20] Boris Landa, Ronald R Coifman, and Yuval Kluger. Doubly-stochastic normalization of the Gaussian kernel is robust to heteroskedastic noise. arXiv preprint arXiv:2006.00402, 2020.
  • [21] Peter Li, Shing Tung Yau, et al. On the parabolic kernel of the Schrödinger operator. Acta Mathematica, 156:153–201, 1986.
  • [22] Jinpeng Lu. Graph approximations to the Laplacian spectra. Journal of Topology and Analysis, pages 1–35, 2020.
  • [23] Nicholas F Marshall and Ronald R Coifman. Manifold learning with bi-stochastic kernels. IMA Journal of Applied Mathematics, 84(3):455–482, 2019.
  • [24] Boaz Nadler, Nathan Srebro, and Xueyuan Zhou. Semi-supervised learning with the graph Laplacian: The limit of infinite unlabelled data. Advances in Neural Information Processing Systems, 22:1330–1338, 2009.
  • [25] Steven Rosenberg. The Laplacian on a Riemannian manifold: An introduction to analysis on manifolds. Number 31. Cambridge University Press, 1997.
  • [26] Zuoqiang Shi. Convergence of Laplacian spectra from random samples. arXiv preprint arXiv:1507.00151, 2015.
  • [27] Amit Singer. From graph to manifold Laplacian: The convergence rate. Applied and Computational Harmonic Analysis, 21(1):128–134, 2006.
  • [28] Amit Singer and Hau-Tieng Wu. Spectral convergence of the connection Laplacian from random samples. Information and Inference: A Journal of the IMA, 6(1):58–123, 2016.
  • [29] Dejan Slepcev and Matthew Thorpe. Analysis of p-Laplacian regularization in semisupervised learning. SIAM Journal on Mathematical Analysis, 51(3):2085–2120, 2019.
  • [30] Ronen Talmon, Israel Cohen, Sharon Gannot, and Ronald R Coifman. Diffusion maps for signal processing: A deeper look at manifold-learning techniques based on kernels and graphs. IEEE signal processing magazine, 30(4):75–86, 2013.
  • [31] Daniel Ting, Ling Huang, and Michael Jordan. An analysis of the convergence of graph Laplacians. arXiv preprint arXiv:1101.5435, 2011.
  • [32] Nicolás García Trillos, Moritz Gerlach, Matthias Hein, and Dejan Slepčev. Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace–Beltrami operator. Foundations of Computational Mathematics, 20(4):827–887, 2020.
  • [33] Laurens Van Der Maaten, Eric Postma, and Jaap Van den Herik. Dimensionality reduction: a comparative review. J Mach Learn Res, 10(66-71):13, 2009.
  • [34] Ulrike Von Luxburg, Mikhail Belkin, and Olivier Bousquet. Consistency of spectral clustering. The Annals of Statistics, pages 555–586, 2008.
  • [35] Xu Wang. Spectral convergence rate of graph Laplacian. arXiv preprint arXiv:1510.08110, 2015.
  • [36] Caroline L. Wormell and Sebastian Reich. Spectral convergence of diffusion maps: Improved error bounds and an alternative normalization. SIAM Journal on Numerical Analysis, 59(3):1687–1734, 2021.
  • [37] Hau-Tieng Wu and Nan Wu. Think globally, fit locally under the manifold setup: Asymptotic analysis of locally linear embedding. Annals of Statistics, 46(6B):3805–3837, 2018.

Appendix A Details of numerical experiments

In the example of S1S^{1} data, the isometric embedding in 4\mathbb{R}^{4} is by

ι(t)=12π5(cos(2πt),sin(2πt),23cos(2π3t),23sin(2π3t)),\iota(t)=\frac{1}{2\pi\sqrt{5}}\left(\cos(2\pi t),\sin(2\pi t),\frac{2}{3}\cos(2\pi 3t),\frac{2}{3}\sin(2\pi 3t)\right),

where t\in[0,1) is the intrinsic coordinate (arc-length) of S^1. In the example in Section 7.2 where p is not uniform, p(t)=1+\frac{1}{2}\sin(2\pi 2t)+\frac{0.6}{2}\sin(2\pi 5t), and the test function is f(t)=0.2\sin(4\pi t)-0.8\sin(4\pi 2t). In the example of S^2 data, the samples are on the unit sphere in \mathbb{R}^3. A quick numerical check that \iota is indeed unit-speed is given below.
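The check below is our own illustration (not part of the released code): it confirms \|\iota'(t)\|=1 numerically, so t is the arc-length, the circle has unit circumference, and the eigenvalues of -\Delta are (2\pi m)^2.

```python
import numpy as np

def iota(t):
    # isometric embedding of S^1 into R^4, t in [0, 1)
    c = 1.0 / (2 * np.pi * np.sqrt(5))
    return c * np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t),
                         (2 / 3) * np.cos(6 * np.pi * t),
                         (2 / 3) * np.sin(6 * np.pi * t)], axis=-1)

t = np.linspace(0.0, 1.0, 10001)
speed = np.linalg.norm(np.gradient(iota(t), t, axis=0, edge_order=2), axis=1)
assert np.allclose(speed, 1.0, atol=1e-3)   # |iota'(t)| = 1: t is arc-length
```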

In both plots of the raw error data without smoothing, Figures A.1 and A.2, the slopes of the error convergence rates (about -0.4 and -0.33) are about the same as with smoothing. The slope of the post-selected optimal (log) \epsilon as a function of (log) N changes, due to the closeness of the error values over the multiple values of \epsilon.

Figure A.1: Same plots as Fig. 2, where the log error values on the (log) grid of N and \epsilon are without smoothing.
Figure A.2: Same plots as Fig. 3, where the log error values on the (log) grid of N and \epsilon are without smoothing.
Figure A.3: Same plots as Fig. 5, where the log error values on the (log) grid of N and \epsilon are without smoothing.

Appendix B More preliminaries

Throughout the paper, we use the following version of the classical Bernstein inequality, where the tail probability uses \nu>0, an upper bound of the variance. We use the sub-Gaussian near-tail, which holds when the attempted deviation threshold t<\frac{3\nu}{L}.

Lemma B.1 (Classical Bernstein).

Let ξj\xi_{j} be i.i.d. bounded random variables, j=1,,Nj=1,\cdots,N, 𝔼ξj=0\mathbb{E}\xi_{j}=0. If |ξj|L|\xi_{j}|\leq L and 𝔼ξj2ν\mathbb{E}\xi_{j}^{2}\leq\nu for L,ν>0L,\nu>0, then

Pr[1Nj=1Nξj>t],Pr[1Nj=1Nξj<t]exp{t2N2(ν+tL3)},t>0.\Pr[\frac{1}{N}\sum_{j=1}^{N}\xi_{j}>t],\,\Pr[\frac{1}{N}\sum_{j=1}^{N}\xi_{j}<-t]\leq\exp\{-\frac{t^{2}N}{2(\nu+\frac{tL}{3})}\},\quad\forall t>0.

In particular, when tL<3νtL<3\nu, both the tail probabilities are bounded by exp{14Nt2ν}\exp\{-\frac{1}{4}\frac{Nt^{2}}{\nu}\}.
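As a worked instance of how this lemma produces the recurring "w.p. >1-2N^{-10}" statements in the proofs: taking the deviation threshold t=\sqrt{\frac{40\nu\log N}{N}} gives

\Pr\Big[\Big|\frac{1}{N}\sum_{j=1}^N\xi_j\Big|>t\Big]\leq 2\exp\Big\{-\frac{1}{4}\frac{Nt^2}{\nu}\Big\}=2e^{-10\log N}=2N^{-10},

provided that tL<3\nu, i.e. \frac{\log N}{N}<\frac{9}{40}\frac{\nu}{L^2}. With \nu=\Theta(\epsilon^{-d/2}) and L=\Theta(\epsilon^{-d/2}) as in Lemma 3.5, \nu/L^2=\Theta(\epsilon^{d/2}), so the near-tail regime is exactly the standing assumption \epsilon^{d/2}N=\Omega(\log N); the constant 40 can be adjusted to absorb the \Theta-constants.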

Additional proofs in Section 2:

Proof of Theorem 2.1.

Part 1): For completeness, we provide a direct verification of (10) based on the parametrix construction, which is not explicitly included in [25].

First note that there is t0t_{0}, determined by {\mathcal{M}} s.t. when t<t0t<t_{0},

Gt(x,y)𝑑V(y)=Gt(y,x)𝑑V(y)C6,x,\int_{{\mathcal{M}}}G_{t}(x,y)dV(y)=\int_{{\mathcal{M}}}G_{t}(y,x)dV(y)\leq C_{6},\quad\forall x\in{\mathcal{M}},

for some C_6>0 depending on \mathcal{M}. This is because \int_{\mathcal{M}}G_t(x,y)dV(y), up to an O(t) truncation error, equals the integral on B_t:=\{y\in\mathcal{M},\,d_{\mathcal{M}}(x,y)<\delta_t:=\sqrt{(d/2+1)t\log\frac{1}{t}}\}. Changing to the projected coordinate u in T_x(\mathcal{M}), the integration domain of u is contained in the 1.1\delta_t-ball in \mathbb{R}^d for small enough \delta_t; then

BtGt(x,y)𝑑V(y)\displaystyle\int_{B_{t}}G_{t}(x,y)dV(y) =1(4πt)d/2Bted(x,y)24t𝑑V(y)1(4πt)d/2ud,u<1.1δte0.9u24t(1+O(δt2))𝑑u\displaystyle=\frac{1}{(4\pi t)^{d/2}}\int_{B_{t}}e^{-\frac{d_{\mathcal{M}}(x,y)^{2}}{4t}}dV(y)\leq\frac{1}{(4\pi t)^{d/2}}\int_{u\in\mathbb{R}^{d},\,\|u\|<1.1\delta_{t}}e^{-\frac{0.9\|u\|^{2}}{4t}}(1+O(\delta_{t}^{2}))du
Θ(1)(1+O(tlog1t))=O(1).\displaystyle\leq\Theta(1)(1+O(t\log{\frac{1}{t}}))=O(1).

Next, as has been shown in Chapter 3 of [25], there exist u_l\in C^\infty(\mathcal{M}\times\mathcal{M}) for l=0,\cdots,m, where u_0 satisfies the needed property, and we define P_m(t,x,y)=G_t(x,y)\left(\sum_{l=0}^m t^l u_l(x,y)\right), P_m\in C^\infty((0,\infty),\mathcal{M}\times\mathcal{M}). By Theorem 3.22 of [25],

Ht(x,y)Pm(t,x,y)=0t𝑑sQm(ts,x,z)Pm(s,z,y)𝑑V(z),{H}_{t}(x,y)-P_{m}(t,x,y)=\int_{0}^{t}ds\int_{{\mathcal{M}}}Q_{m}(t-s,x,z)P_{m}(s,z,y)dV(z),

where by Lemma 3.18 of [25], there is C7(t0)C_{7}(t_{0}) and thus is determined by {\mathcal{M}} s.t.

supx,y|Qm(s,x,y)|C7smd/2,0st0.\sup_{x,y\in{\mathcal{M}}}|Q_{m}(s,x,y)|\leq C_{7}s^{m-d/2},\quad\forall 0\leq s\leq t_{0}.

As a result, for t<t0t<t_{0},

|Ht(x,y)Pm(t,x,y)|0t𝑑s|Qm(ts,x,z)|Gs(z,y)|l=0mtlul(z,y)|𝑑V(z)\displaystyle|{H}_{t}(x,y)-P_{m}(t,x,y)|\leq\int_{0}^{t}ds\int_{{\mathcal{M}}}|Q_{m}(t-s,x,z)|G_{s}(z,y)\left|\sum_{l=0}^{m}t^{l}u_{l}(z,y)\right|dV(z)
C7tmd/2(l=0mul)0t𝑑sGs(z,y)𝑑V(z)\displaystyle~{}~{}~{}\leq C_{7}t^{m-d/2}(\sum_{l=0}^{m}\|u_{l}\|_{\infty})\int_{0}^{t}ds\int_{{\mathcal{M}}}G_{s}(z,y)dV(z)
C7tmd/2(l=0mul)C6t=O(tmd/2+1).\displaystyle~{}~{}~{}\leq C_{7}t^{m-d/2}(\sum_{l=0}^{m}\|u_{l}\|_{\infty})C_{6}t=O(t^{m-d/2+1}).

Part 2) is a classical result proved in several places; see e.g. Theorem 1.1 in [16], combined with \sup_{x\in\mathcal{M}}H_t(x,x)\leq C_5t^{-d/2} for some C_5 depending on the manifold, which can be deduced from Part 1). The constant 5 in the 5t in the exponential in (11) can be made any constant greater than 4, with the constant C_3 changing accordingly. ∎

Proof of Lemma 2.2.

Let m=\lceil\frac{d}{2}+3\rceil; then m is a positive integer with m-\frac{d}{2}\geq 3. Since t\to 0 and \delta_t=o(1), the Euclidean ball of radius \delta_t contains the \delta_t-geodesic ball and is contained in the (1.1\delta_t)-geodesic ball, for small enough t. Then both claims in Theorem 2.1 hold when t<\epsilon_0 for some \epsilon_0 depending on \mathcal{M}, and in 1), for y\in B_{\delta_t}(x)\cap\mathcal{M}, C_2t^{m-d/2+1}=O(t^3). Choosing a larger m would make this term of higher order in t, yet O(t^3) is enough for our later analysis.

Proof of (12): We use the shorthand notation O~(t)\tilde{O}(t) to denote O(tlog1t)O(t\log\frac{1}{t}). In Theorem 2.1, mm is fixed, ul\|u_{l}\|_{\infty} for lml\leq m are finite constants depending on {\mathcal{M}}, thus

Ht(x,y)=Gt(x,y)(u0(x,y)+O(t))+O(t3).{H}_{t}(x,y)=G_{t}(x,y)\left(u_{0}(x,y)+O(t)\right)+O(t^{3}).

Note that d(x,y)2=xy2(1+O(xy2))d_{\mathcal{M}}(x,y)^{2}=\|x-y\|^{2}(1+O(\|x-y\|^{2})), and thus when yBδt(x)y\in B_{\delta_{t}}(x), d(x,y)2=O(xy2)=O(δt2)=O~(t)d_{\mathcal{M}}(x,y)^{2}=O(\|x-y\|^{2})=O(\delta_{t}^{2})=\tilde{O}(t). By the property of u0u_{0},

u0(x,y)=1+O(d(x,y)2)=1+O~(t).u_{0}(x,y)=1+O(d_{\mathcal{M}}(x,y)^{2})=1+\tilde{O}(t).

Meanwhile, by the mean value theorem and the fact that d_{\mathcal{M}}(x,y)\geq\|x-y\|,

ed(x,y)2t=exy2(1+O(xy2))t=exy2t(1+O(xy4t)),e^{-\frac{d_{\mathcal{M}}(x,y)^{2}}{t}}=e^{-\frac{\|x-y\|^{2}(1+O(\|x-y\|^{2}))}{t}}=e^{-\frac{\|x-y\|^{2}}{t}}(1+O(\frac{\|x-y\|^{4}}{t})),

and then

Gt(x,y)=Kt(x,y)(1+O(xy4t))=Kt(x,y)(1+O(t(log1t)2)).G_{t}(x,y)=K_{t}(x,y)(1+O(\frac{\|x-y\|^{4}}{t}))=K_{t}(x,y)(1+{O}(t(\log\frac{1}{t})^{2})).

Thus, for any yBδt(x)y\in B_{\delta_{t}}(x)\cap{\mathcal{M}},

Ht(x,y)=Kt(x,y)(1+O(t(log1t)2))(1+O~(t)+O(t))+O(t3),{H}_{t}(x,y)=K_{t}(x,y)(1+{O}(t(\log\frac{1}{t})^{2}))\left(1+\tilde{O}(t)+O(t)\right)+O(t^{3}),

which proves (12), and the constants in big-OO are all determined by {\mathcal{M}}.

Proof of (13) and (14): When yy is outside the δt\delta_{t}-Euclidean ball, it is outside the δt\delta_{t}-geodesic ball. Then, by Theorem 2.1 2) and the definition of δt\delta_{t}, Ht(x,y)C3td/2eδt25tC3t10{H}_{t}(x,y)\leq C_{3}t^{-d/2}e^{-\frac{\delta_{t}^{2}}{5t}}\leq C_{3}t^{10}, which proves (13). (14) directly follows from (11). ∎

Appendix C Proofs about graph Laplacians with WW

C.1 Proofs in Section 3

Proof of (15) in Remark 2.

We want to show that

1ϵKϵ(x,y)(f(x)f(y))2p(x)p(y)𝑑V(x)𝑑V(y)=m2[h]f,Δp2fp2+O(ϵ).\frac{1}{\epsilon}\int_{{\mathcal{M}}}\int_{{\mathcal{M}}}K_{\epsilon}(x,y)(f(x)-f(y))^{2}p(x)p(y)dV(x)dV(y)=m_{2}[h]\langle f,-\Delta_{p^{2}}f\rangle_{p^{2}}+O(\epsilon).

First consider the case when p is uniform. Denote by B_r(x) the Euclidean ball in \mathbb{R}^D centered at x with radius r. When y\in B_{\sqrt{\epsilon}}(x)\cap\mathcal{M}, (f(x)-f(y))^2=(\nabla f(x)^Tu)^2+Q_{x,3}(u)+O(\|u\|^4), where u\in\mathbb{R}^d is the local projected coordinate, i.e., letting \phi_x be the projection onto T_x(\mathcal{M}), u=\phi_x(y-x), and \|u\|\leq\|y-x\|<\sqrt{\epsilon}. Here Q_{x,3}(\cdot) is a third-order polynomial whose coefficients depend on the derivatives of the extrinsic coordinates of \mathcal{M} and of f at x. Then,

1ϵKϵ(x,y)(f(x)f(y))2𝑑V(y)=ϵd/2h(xy2ϵ)(f(x)f(y))2ϵ𝑑V(y)\displaystyle\frac{1}{\epsilon}\int_{{\mathcal{M}}}K_{\epsilon}(x,y)(f(x)-f(y))^{2}dV(y)=\int_{{\mathcal{M}}}\epsilon^{-d/2}h(\frac{\|x-y\|^{2}}{\epsilon})\frac{(f(x)-f(y))^{2}}{\epsilon}dV(y) (A.1)
=ϵd/2B~((f(x)Tu)2ϵ+Qx,3(u)ϵ+O(ϵ))(1+O(ϵ))𝑑u,B~:=ϕx(Bϵ(x))\displaystyle=\epsilon^{-d/2}\int_{\tilde{B}}\left(\frac{(\nabla f(x)^{T}u)^{2}}{\epsilon}+\frac{Q_{x,3}(u)}{\epsilon}+O(\epsilon)\right)(1+O(\epsilon))du,\quad\tilde{B}:=\phi_{x}(B_{\sqrt{\epsilon}}(x)\cap{\mathcal{M}})

and B~Bϵ(0;d)\tilde{B}\subset B_{\sqrt{\epsilon}}(0;\mathbb{R}^{d}), where we used the volume comparison relation dV(y)=(1+O(u2))dudV(y)=(1+O(\|u\|^{2}))du. By the metric comparison, yx=u(1+O(u2))\|y-x\|=\|u\|(1+O(\|u\|^{2})), thus

Vol(Bϵ(0;d)\B~)Vol(Bϵ(0;d)\Bϵ(1O(ϵ))(0;d))=ϵd/2O(ϵ).\mathrm{Vol}(B_{\sqrt{\epsilon}}(0;\mathbb{R}^{d})\backslash\tilde{B})\leq\mathrm{Vol}(B_{\sqrt{\epsilon}}(0;\mathbb{R}^{d})\backslash B_{\sqrt{\epsilon}(1-O(\epsilon))}(0;\mathbb{R}^{d}))=\epsilon^{d/2}O(\epsilon).

Meanwhile, the integrals of odd powers of u vanish on \int_{B_{\sqrt{\epsilon}}(0;\mathbb{R}^d)}du. Thus one can verify that \epsilon^{-d/2}\int_{\tilde{B}}\frac{(\nabla f(x)^Tu)^2}{\epsilon}du=m_2[h]|\nabla f(x)|^2+O(\epsilon) and \epsilon^{-d/2}\int_{\tilde{B}}\frac{Q_{x,3}(u)}{\epsilon}du=O(\epsilon^{3/2}), so that the l.h.s. of (A.1) equals m_2[h]|\nabla f(x)|^2+O(\epsilon). Integrating over \int_{\mathcal{M}}dV(x) proves that the bias error is O(\epsilon). When p is not uniform, one can similarly show that \frac{1}{\epsilon}\int_{\mathcal{M}}K_\epsilon(x,y)(f(x)-f(y))^2p(y)dV(y)=m_2[h]|\nabla f(x)|^2p(x)+O(\epsilon), and the proof extends. ∎

Proof of Lemma 3.3.

Since p is a constant, \Delta_{p^2}=\Delta. Apply Theorem 3.2 to f=\psi_k and to (\psi_k\pm\psi_l) for k\neq l, which are K^2 cases, all in C^\infty(\mathcal{M}). Since the set \{\psi_k\}_{k=1}^K is orthonormal in L^2(\mathcal{M},dV),

p1ψk,Δψkp2=pμk;p1ψk±ψl,Δ(ψk±ψl)p2=p(μk+μl),kl,1k,lK.p^{-1}\langle\psi_{k},-\Delta\psi_{k}\rangle_{p^{2}}=p\mu_{k};\quad p^{-1}\langle\psi_{k}\pm\psi_{l},-\Delta(\psi_{k}\pm\psi_{l})\rangle_{p^{2}}=p(\mu_{k}+\mu_{l}),\quad k\neq l,1\leq k,l\leq K.

Under the intersection of the K2K^{2} good events which happens with the indicated high probability, (16) holds. The needed threshold of NN is the max of the K2K^{2} many ones. These thresholds and the constants in the big-OO’s depend on pp and ψk\psi_{k} for kk up to KK, and KK is a fixed integer. This means that these constants are determined by {\mathcal{M}}, and thus are treated as absolute ones. ∎

Proof of Lemma 3.4.

First, for any fC()f\in C({\mathcal{M}}), when N>NfN>N_{f} depending on ff, w.p. >12N10>1-2N^{-10},

1NρXf22=f,fp+Of(logNN).\frac{1}{N}\|\rho_{X}f\|_{2}^{2}=\langle f,f\rangle_{p}+O_{f}(\sqrt{\frac{\log N}{N}}). (A.2)

This is because, by definition, \frac{1}{N}\|\rho_Xf\|_2^2=\frac{1}{N}\sum_{j=1}^N f(x_j)^2, which is an independent sum of the r.v. Y_j:=f(x_j)^2. Here \mathbb{E}Y_j=\int_{\mathcal{M}}f(y)^2pdV(y)=\langle f,f\rangle_p, and boundedness |Y_j|\leq L_Y:=\|f\|_{\infty,\mathcal{M}}^2, which is an O_f(1) constant. The variance of Y_j is bounded by \mathbb{E}Y_j^2=\int_{\mathcal{M}}f(y)^4pdV(y):=\nu_Y, again an O_f(1) constant. Since \log N/N=o(1), (A.2) follows by the classical Bernstein inequality.

Now consider the K vectors u_k=\frac{1}{\sqrt{p}}\rho_X\psi_k. Apply (A.2) to f=\frac{1}{\sqrt{p}}\psi_k and \frac{1}{\sqrt{p}}(\psi_k\pm\psi_l) for k\neq l, and consider the intersection of the K^2 good events, which happens w.p. >1-2K^2N^{-10} when N exceeds the maximum of the thresholds for the K^2 cases. By \langle\psi_k,\psi_l\rangle_p=p\delta_{kl} and the polarization identity 4u_k^Tu_l=\|u_k+u_l\|^2-\|u_k-u_l\|^2, this gives (17). Both the K^2 thresholds and all the constants in the big-O's in (17) depend on \{\psi_k\}_{k=1}^K. ∎

Proof of Lemma 3.5.

Suppose Part 1) has been shown with a uniform constant in the big-O for each i; then, under the good event of Part 1), Part 2) holds automatically. In particular, since (19) is a property of the random variables W_{ij} only, where the W_{ij} are determined by the random points x_i and are irrelevant to the vector u, the threshold of large N is determined by when Part 1) holds and is uniform for all u.

It suffices to prove Part 1) to finish proving the lemma. For each ii, we construct an event under which the bound in (19) holds for DiD_{i}, and then apply a union bound. For ii fixed,

1NDi=1NKϵ(xi,xi)+1NjiKϵ(xi,xj)=:+.\frac{1}{N}D_{i}=\frac{1}{N}K_{\epsilon}(x_{i},x_{i})+\frac{1}{N}\sum_{j\neq i}K_{\epsilon}(x_{i},x_{j})=:①+②.

By Assumption 2(C2), K_\epsilon(x_i,x_i)=\epsilon^{-d/2}h(0)\leq\Theta(\epsilon^{-d/2}), and thus ①=O(N^{-1}\epsilon^{-d/2}). Consider ②':=\frac{1}{N-1}\sum_{j\neq i}K_\epsilon(x_i,x_j), which is an independent sum conditioned on x_i and over the randomness of \{x_j\}_{j\neq i}. The (N-1) r.v.

Yj:=Kϵ(xi,xj),ji,Y_{j}:=K_{\epsilon}(x_{i},x_{j}),\quad j\neq i,

satisfies that (Lemma 8 in [10], Lemma A.3 in [9])

𝔼Yj=Kϵ(xi,y)p𝑑V(y)=pm0+O(ϵ).\mathbb{E}Y_{j}=\int_{\mathcal{M}}K_{\epsilon}(x_{i},y)pdV(y)=pm_{0}+O(\epsilon).

Boundedness: again by Assumption 2(C2), |Yj|LY=Θ(ϵd/2)|Y_{j}|\leq L_{Y}=\Theta(\epsilon^{-d/2}). Variance of YjY_{j} is bounded by

𝔼Yj2\displaystyle\mathbb{E}Y_{j}^{2} =Kϵ(xi,y)2p𝑑V(y)=pϵdh2(xiy2ϵ)𝑑V(y),\displaystyle=\int_{\mathcal{M}}K_{\epsilon}(x_{i},y)^{2}pdV(y)=p\int_{\mathcal{M}}\epsilon^{-d}h^{2}(\frac{\|x_{i}-y\|^{2}}{\epsilon})dV(y),

where since h2(r)h^{2}(r) as a function on [0,)[0,\infty) also satisfies Assumption 2,

𝔼Yj2=ϵd/2p(m0[h2]+O(ϵ))νY=Θ(ϵd/2).\mathbb{E}Y_{j}^{2}=\epsilon^{-d/2}p(m_{0}[h^{2}]+O(\epsilon))\leq\nu_{Y}=\Theta(\epsilon^{-d/2}).

The constants in the big-\Theta notation of L_Y and \nu_Y are absolute ones depending on \mathcal{M} and do not depend on x_i. Since \sqrt{\frac{\log N}{N\epsilon^{d/2}}}=o(1), the classical Bernstein inequality gives that, when N is sufficiently large, w.p. >1-2N^{-10},

|𝔼Yj|=O(νYlogNN)=O(logNNϵd/2)| condition on xi.|②^{\prime}-\mathbb{E}Y_{j}|=O(\sqrt{\nu_{Y}\frac{\log N}{N}})=O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\quad|\text{ condition on }x_{i}.

Under this event, =O(1)②^{\prime}=O(1), and then =(11N)②=(1-\frac{1}{N})②^{\prime} gives that

=m0p+O(ϵ)+O(logNNϵd/2)+O(1N)=m0p+O(ϵ,logNNϵd/2),②=m_{0}p+O(\epsilon)+O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}})+O(\frac{1}{N})=m_{0}p+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}),

and then

1NDi=O(N1ϵd/2)+m0p+O(ϵ,logNNϵd/2)=m0p+O(ϵ,logNNϵd/2).\frac{1}{N}D_{i}=O(N^{-1}\epsilon^{-d/2})+m_{0}p+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})=m_{0}p+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}).

Since x_i is independent of \{x_j\}_{j\neq i} and the bound is uniform over all locations of x_i, we have that w.p. >1-2N^{-10} the bound in (19) holds for this i, and applying a union bound to the N events proves Part 1). ∎

Proof of Proposition 3.6.

Under the condition of the current proposition, Lemma 3.5 applies. For fixed K, take the intersection of the good events in Lemmas 3.5, 3.4 and 3.3, which happens w.p. >1-4K^2N^{-10}-2N^{-9} for large enough N. Same as before, let u_k=\frac{1}{\sqrt{p}}\rho_X\psi_k; by Lemma 3.4, the set \{u_1,\cdots,u_K\} is linearly independent. Let L=\mathrm{Span}\{u_1,\cdots,u_k\}; then \dim(L)=k for each k\leq K. For any v\in L, v\neq 0, there are c_j, 1\leq j\leq k, such that v=\sum_{j=1}^k c_ju_j. Again by (17), we have \frac{1}{N}\|v\|^2=\|c\|^2(1+O(\sqrt{\frac{\log N}{N}})), and together with Lemma 3.5 2),

1m01N2vTDv\displaystyle\frac{1}{m_{0}}\frac{1}{N^{2}}v^{T}Dv =1Nv2(p+O(ϵ,logNNϵd/2))=c2(1+O(logNN))(p+O(ϵ,logNNϵd/2))\displaystyle=\frac{1}{N}\|v\|^{2}(p+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))=\|c\|^{2}(1+O(\sqrt{\frac{\log N}{N}}))(p+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))
=c2p(1+O(ϵ,logNNϵd/2)),\displaystyle=\|c\|^{2}p(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})), (A.3)

and the constant in O()O(\cdot) is uniform for all vv. For EN(v)E_{N}(v), (18) still holds, and by that KK is fixed it gives

EN(v)\displaystyle E_{N}(v) c2(pμk+O(ϵ,logNNϵd/2)).\displaystyle\leq\|c\|^{2}\left(p\mu_{k}+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right).

Together with (A.3), we have that

EN(v)1m01N2vTDvpμk+O(ϵ,logNNϵd/2)p(1+O(ϵ,logNNϵd/2))=μk+O(ϵ,logNNϵd/2),\frac{E_{N}(v)}{\frac{1}{m_{0}}\frac{1}{N^{2}}v^{T}Dv}\leq\frac{p\mu_{k}+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})}{p(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))}=\mu_{k}+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}),

and the r.h.s. upper bounds λk(Lrw)\lambda_{k}(L_{rw}) by (8). ∎

C.2 Proofs in Section 4

Proof of (25) in Lemma 4.2.

Suppose ss is small enough such that Lemma 2.2 holds with ϵ\epsilon being ss here. For each ii, we construct an event under which the bound in (25) holds for (Ds)i(D_{s})_{i}, and then apply a union bound. For ii fixed,

(Ds)i=1NHs(xi,xi)+1NjiHs(xi,xj)=:+.(D_{s})_{i}=\frac{1}{N}{H}_{s}(x_{i},x_{i})+\frac{1}{N}\sum_{j\neq i}{H}_{s}(x_{i},x_{j})=:①+②.

By (14), H_s(x_i,x_i)=O(s^{-d/2}), and thus ①=O(N^{-1}s^{-d/2}). Consider ②':=\frac{1}{N-1}\sum_{j\neq i}H_s(x_i,x_j), which is an independent sum conditioned on x_i and over the randomness of \{x_j\}_{j\neq i}. The (N-1) r.v. Y_j:=H_s(x_i,x_j), j\neq i, satisfy \mathbb{E}Y_j=\int_{\mathcal{M}}H_s(x_i,y)pdV(y)=p and, again by (14), the boundedness |Y_j|\leq L_Y=\Theta(s^{-d/2}). The variance of Y_j is bounded by \mathbb{E}Y_j^2=\int_{\mathcal{M}}H_s(x_i,y)^2pdV(y)=pH_{2s}(x_i,x_i)\leq\nu_Y=\Theta(s^{-d/2}). The constants in the big-\Theta notation of L_Y and \nu_Y are from (14), depending only on \mathcal{M} and not on x_i; we use the notation O_{\mathcal{M}}(\cdot) to stress this. Since \sqrt{\frac{\log N}{Ns^{d/2}}}=o(1), the classical Bernstein inequality gives that, for sufficiently large N, w.p. >1-2N^{-10},

|p|=O(νYlogNN)=O(logNNsd/2)| condition on xi.|②^{\prime}-p|=O(\sqrt{\nu_{Y}\frac{\log N}{N}})=O_{\mathcal{M}}(\sqrt{\frac{\log N}{Ns^{d/2}}})\quad|\text{ condition on }x_{i}.

The rest of the proof is the same as that of Lemma 3.5 1): namely, using ②=(1-\frac{1}{N})②', one can verify that both ② and then (D_s)_i equal p+O_{\mathcal{M}}(\sqrt{\frac{\log N}{Ns^{d/2}}}) w.p. >1-2N^{-10}, and then (25) follows from applying a union bound to the N events. ∎

Proof of Proposition 4.4.

The proof is by the same method as that of Proposition 4.1, and the difference is that the eigenvectors are DD-orthogonal here and normalized differently. Denote λk(Lrw)\lambda_{k}(L_{rw}) as λk\lambda_{k}, and let Lrwvk=λkvkL_{rw}v_{k}=\lambda_{k}v_{k}, normalized s.t.

1N2vkTDvl=δkl,1k,lN.\frac{1}{N^{2}}v_{k}^{T}Dv_{l}=\delta_{kl},\quad 1\leq k,l\leq N.

Note that this normalization of v_k differs from what is used in the final eigen-convergence rate result, Theorem 5.5, because the current proposition concerns eigenvalues only.

Because \epsilon^{d/2+2}>c_K\frac{\log N}{N} and \epsilon^{d/2}=\Omega(\frac{\log N}{N}), the conditions needed in Proposition 3.6 are satisfied. Thus, for sufficiently large N, there is an event E'_{UB} which happens w.p. >1-2N^{-9}-4K^2N^{-10}, under which D_i>0 for all i, so that L_{rw} is well-defined, and (32) holds for \lambda_k=\lambda_k(L_{rw}). Because the good event E'_{UB} in Proposition 3.6 assumes the good event in Lemma 3.5, (20) also holds for all the v_k and v_k\pm v_l, which gives that (m_0=1 because h is Gaussian)

1=1N2vkTDvk=1Nvk2(p+O(ϵ,logNNϵd/2)),1kK,2=1N2(vk±vl)TD(vk±vl)=1Nvk±vl2(p+O(ϵ,logNNϵd/2))kl,1k,lK,\begin{split}1=\frac{1}{N^{2}}v_{k}^{T}Dv_{k}&=\frac{1}{N}\|v_{k}\|^{2}(p+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad 1\leq k\leq K,\\ 2=\frac{1}{N^{2}}(v_{k}\pm v_{l})^{T}D(v_{k}\pm v_{l})&=\frac{1}{N}\|v_{k}\pm v_{l}\|^{2}(p+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))\quad k\neq l,1\leq k,l\leq K,\end{split}

and, equivalently (because p>0p>0 is a constant)

1Nvk2=1p(1+O(ϵ,logNNϵd/2)),1kK,1Nvk±vl2=1p(2+O(ϵ,logNNϵd/2)),kl,1k,lK.\begin{split}\frac{1}{N}\|v_{k}\|^{2}&=\frac{1}{p}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad 1\leq k\leq K,\\ \frac{1}{N}\|v_{k}\pm v_{l}\|^{2}&=\frac{1}{p}(2+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad k\neq l,1\leq k,l\leq K.\end{split} (A.4)

We set δ\delta, rr, tt, in the same way, and let fk=Ir[vk]f_{k}=I_{r}[v_{k}], fkC()f_{k}\in C^{\infty}({\mathcal{M}}). Because the good event E(0)E^{(0)} only concerns randomness of Hδϵ(xi,xj)H_{\delta\epsilon}(x_{i},x_{j}), under E(0)E^{(0)} which happens w.p. >12N9>1-2N^{-9},

qδϵ(0)(vk)=1Nvk2(p+O(logNNϵd/2))=1+O(ϵ,logNNϵd/2),1kK,qδϵ(0)(vk±vl)=1Nvk±vl2(p+O(logNNϵd/2))=2+O(ϵ,logNNϵd/2),kl,1k,lK.\begin{split}q^{(0)}_{\delta\epsilon}(v_{k})&=\frac{1}{N}\|v_{k}\|^{2}(p+O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))=1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}),\quad 1\leq k\leq K,\\ q^{(0)}_{\delta\epsilon}(v_{k}\pm v_{l})&=\frac{1}{N}\|v_{k}\pm v_{l}\|^{2}(p+O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))=2+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}),\quad k\neq l,1\leq k,l\leq K.\end{split} (A.5)

Next, note that (D-W)v_k=\tilde{m}\epsilon\lambda_kDv_k, that \tilde{m}=1 with Gaussian h, and that the v_k are D-orthogonal; hence

vkT(DW)vkN2=ϵλk1N2vkTDvk=ϵλk,1kK,(vk±vl)T(DW)(vk±vl)N2=ϵ(λk+λl),kl,1k,lK.\begin{split}&~{}~{}~{}\frac{v_{k}^{T}(D-W)v_{k}}{N^{2}}=\epsilon\lambda_{k}\frac{1}{N^{2}}v_{k}^{T}Dv_{k}=\epsilon\lambda_{k},\quad 1\leq k\leq K,\\ &\frac{(v_{k}\pm v_{l})^{T}(D-W)(v_{k}\pm v_{l})}{N^{2}}=\epsilon(\lambda_{k}+\lambda_{l}),\quad k\neq l,1\leq k,l\leq K.\end{split} (A.6)

Then, (27) in Lemma 4.3 where α=δ\alpha=\delta gives that

qδϵ(2)(vk)=O(δd/2)ϵλk+O(ϵ3),1kK,qδϵ(2)(vk±vl)=O(δd/2)ϵ(λk+λl)+2O(ϵ3),kl, 1k,lK,\begin{split}q^{(2)}_{\delta\epsilon}(v_{k})&=O(\delta^{-d/2})\epsilon\lambda_{k}+O(\epsilon^{3}),\quad 1\leq k\leq K,\\ q^{(2)}_{\delta\epsilon}(v_{k}\pm v_{l})&=O(\delta^{-d/2})\epsilon(\lambda_{k}+\lambda_{l})+2O(\epsilon^{3}),\quad k\neq l,\,1\leq k,l\leq K,\end{split}

then same as in (33), they are both O(ϵ)O(\epsilon). Together with (A.5), this gives that

fk,fk=1+O(ϵ,logNNϵd/2)+O(ϵ),1kK,fk,fl=14(qδϵ(vk+vl)qδϵ(vkvl))=O(ϵ,logNNϵd/2)+O(ϵ),kl, 1k,lK.\begin{split}\langle f_{k},f_{k}\rangle&=1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})+O(\epsilon),\quad 1\leq k\leq K,\\ \langle f_{k},f_{l}\rangle&=\frac{1}{4}(q_{\delta\epsilon}(v_{k}+v_{l})-q_{\delta\epsilon}(v_{k}-v_{l}))=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})+O(\epsilon),\quad k\neq l,\,1\leq k,l\leq K.\end{split} (A.7)

Since O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})=o(1), the set \{f_j\}_{j=1}^K is linearly independent for large enough N.

Again, we let Lk=Span{f1,,fk}L_{k}=\text{Span}\{f_{1},\cdots,f_{k}\}, and have (35). For any fLkf\in L_{k}, f=j=1kcjfjf=\sum_{j=1}^{k}c_{j}f_{j}, f=Ir[v]f=I_{r}[v], v:=j=1kcjvjv:=\sum_{j=1}^{k}c_{j}v_{j},

1N2vTDv=j=1kcj21N2vjTDvj=c2,\frac{1}{N^{2}}v^{T}Dv=\sum_{j=1}^{k}c_{j}^{2}\frac{1}{N^{2}}v_{j}^{T}Dv_{j}=\|c\|^{2},

and, since Lemma 3.5 2) holds, (20) applies to v to give \frac{1}{N^2}v^TDv=\frac{1}{N}\|v\|^2(p+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})); thus

1Nv2=c2p(1+O(ϵ,logNNϵd/2)).\frac{1}{N}\|v\|^{2}=\frac{\|c\|^{2}}{p}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})). (A.8)

Meanwhile, by (A.6),

vT(DW)vN2=j=1kcj2vjT(DW)vjN2=j=1kcj2ϵλjϵλkc2.\frac{v^{T}(D-W)v}{N^{2}}=\sum_{j=1}^{k}c_{j}^{2}\frac{v_{j}^{T}(D-W)v_{j}}{N^{2}}=\sum_{j=1}^{k}c_{j}^{2}\epsilon\lambda_{j}\leq\epsilon\lambda_{k}\|c\|^{2}. (A.9)

With the good event E^{(1)} the same as before (Lemma 4.2 at s=\epsilon), under E^{(0)}\cap E^{(1)} (here the O_{\mathcal{M}}(\cdot) notation means that the constant depends on \mathcal{M} only and not on K),

qϵ(0)(v)=1Nv2(p+O(logNNϵd/2)),qδϵ(0)(v)=1Nv2(p+O(δd/2logNNϵd/2)),q^{(0)}_{\epsilon}(v)=\frac{1}{N}\|v\|^{2}(p+O_{\mathcal{M}}(\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad q^{(0)}_{\delta\epsilon}(v)=\frac{1}{N}\|v\|^{2}(p+O_{\mathcal{M}}(\sqrt{\delta^{-d/2}\frac{\log N}{N\epsilon^{d/2}}})), (A.10)

and then, again,

qδϵ(0)(v)qϵ(0)(v)\displaystyle q^{(0)}_{\delta\epsilon}(v)-q^{(0)}_{\epsilon}(v) =1Nv2O(δd/4logNNϵd/2)=c2p(1+O(ϵ,logNNϵd/2))O(δd/4logNNϵd/2)\displaystyle=\frac{1}{N}\|v\|^{2}O_{\mathcal{M}}(\delta^{-d/4}\sqrt{\frac{\log N}{N\epsilon^{d/2}}})=\frac{\|c\|^{2}}{p}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))O_{\mathcal{M}}(\delta^{-d/4}\sqrt{\frac{\log N}{N\epsilon^{d/2}}})
=c2O(δd/4logNNϵd/2),\displaystyle=\|c\|^{2}O_{\mathcal{M}}(\delta^{-d/4}\sqrt{\frac{\log N}{N\epsilon^{d/2}}}),

where we used (A.8) to substitute the \frac{1}{N}\|v\|^2 term after the leading \frac{1}{N}\|v\|^2p term is canceled in the subtraction. The UB of q^{(2)}_\epsilon(v) is similar to before: by (26) in Lemma 4.3, inserting (A.9), and with the shorthand \tilde{O}(\epsilon) for O(\epsilon(\log\frac{1}{\epsilon})^2),

qϵ(2)(v)=vT(DW)vN2(1+O~(ϵ))+c2O(ϵ3)ϵc2(λk(1+O~(ϵ))+O(ϵ2)).q^{(2)}_{\epsilon}(v)=\frac{v^{T}(D-W)v}{N^{2}}(1+\tilde{O}(\epsilon))+\|c\|^{2}O(\epsilon^{3})\leq\epsilon\|c\|^{2}(\lambda_{k}(1+\tilde{O}(\epsilon))+O(\epsilon^{2})).

Thus we have that

f,ff,Qtf\displaystyle\langle f,f\rangle-\langle f,Q_{t}f\rangle (qδϵ(0)(v)qϵ(0)(v))+qϵ(2)(v)\displaystyle\leq(q^{(0)}_{\delta\epsilon}(v)-q^{(0)}_{\epsilon}(v))+q^{(2)}_{\epsilon}(v)
ϵc2(λk(1+O~(ϵ))+O(ϵ2)+δd/4O(1ϵlogNNϵd/2))\displaystyle\leq\epsilon\|c\|^{2}\left(\lambda_{k}(1+\tilde{O}(\epsilon))+O(\epsilon^{2})+\delta^{-d/4}O_{\mathcal{M}}(\frac{1}{\epsilon}\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right)
=ϵc2(λk+O~(ϵ)+δd/4O(1ϵlogNNϵd/2)).(by λk1.1μK)\displaystyle=\epsilon\|c\|^{2}\left(\lambda_{k}+\tilde{O}(\epsilon)+\delta^{-d/4}O_{\mathcal{M}}(\frac{1}{\epsilon}\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right).\quad\text{(by $\lambda_{k}\leq 1.1\mu_{K}$)} (A.11)

To lower bound f,f\langle f,f\rangle, again by (27) in Lemma 4.3, inserting (A.9),

0qδϵ(2)(v)Θ(δd/2)vT(DW)vN2+c2O(ϵ3)ϵc2(λkΘ(δd/2)+O(ϵ2)),0\leq q^{(2)}_{\delta\epsilon}(v)\leq\Theta(\delta^{-d/2})\frac{v^{T}(D-W)v}{N^{2}}+\|c\|^{2}O(\epsilon^{3})\leq\epsilon\|c\|^{2}\left(\lambda_{k}\Theta(\delta^{-d/2})+O(\epsilon^{2})\right),

and then, since \lambda_k\Theta(\delta^{-d/2})+O(\epsilon^2)=O(1), we again have that q^{(2)}_{\delta\epsilon}(v)=\|c\|^2O(\epsilon). We have derived the formula for q^{(0)}_{\delta\epsilon}(v) in (A.10) under E^{(0)}\cap E^{(1)}, and inserting (A.8),

qδϵ(0)(v)=1Nv2(p+O(logNNϵd/2))=c2(1+O(ϵ,logNNϵd/2)).q^{(0)}_{\delta\epsilon}(v)=\frac{1}{N}\|v\|^{2}(p+O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))=\|c\|^{2}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})). (A.12)

Thus,

f,f=qδϵ(0)(v)qδϵ(2)(v)=c2(1+O(ϵ,logNNϵd/2)O(ϵ))c2(1O(ϵ,logNNϵd/2)).\langle f,f\rangle=q^{(0)}_{\delta\epsilon}(v)-q^{(2)}_{\delta\epsilon}(v)=\|c\|^{2}\left(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})-O(\epsilon)\right)\geq\|c\|^{2}\left(1-O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right).

Together with (A.11), this gives

f,ff,Qtff,fϵ(λk+O~(ϵ)+δd/4O(1ϵlogNNϵd/2))1O(ϵ,logNNϵd/2)ϵ(λk+O~(ϵ)+CϵlogNNϵd/2),\frac{\langle f,f\rangle-\langle f,Q_{t}f\rangle}{\langle f,f\rangle}\leq\frac{\epsilon\left(\lambda_{k}+\tilde{O}(\epsilon)+\delta^{-d/4}O_{\mathcal{M}}(\frac{1}{\epsilon}\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right)}{1-O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})}\leq\epsilon\left(\lambda_{k}+\tilde{O}(\epsilon)+\frac{C}{\epsilon}\sqrt{\frac{\log N}{N\epsilon^{d/2}}}\right),

where the constant CC is defined in the same way as in the proof of Proposition 4.1. The rest of the proof is the same, restricting to the intersection of all the needed good events E(0)E^{(0)}, E(1)E^{(1)}, and EUBE_{UB}^{\prime}, which happens w.p.>12N94K2N104N9>1-2N^{-9}-4K^{2}N^{-10}-4N^{-9}. ∎
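Before moving on, the following tiny sketch (our illustration, not part of the paper's numerics) isolates the semigroup mechanism used in the argument above: on S^1, Q_t acts diagonally on Fourier modes, so for f = cos(theta) one has <f,f> - <f,Q_t f> = (1 - e^{-t}) pi, and dividing by t recovers <f,-Delta f> = pi as t -> 0, which is exactly the type of comparison behind (A.11).

```python
# Minimal sketch (our illustration, not the paper's numerics): on S^1 the heat
# semigroup acts diagonally on Fourier modes, Q_t cos(k.) = exp(-k^2 t) cos(k.).
# For f = cos(theta): <f,f> = pi, <f,Q_t f> = exp(-t)*pi, hence
# (<f,f> - <f,Q_t f>)/t -> <f,-Delta f> = pi as t -> 0.
import numpy as np

for t in [1e-1, 1e-2, 1e-3]:
    ratio = np.pi * (1.0 - np.exp(-t)) / t
    print(f"t = {t:.0e}: ratio = {ratio:.4f}, target pi = {np.pi:.4f}")
```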

C.3 Proofs in Section 5

Proof of Theorem 5.5.

With sufficiently large NN, we restrict to the intersection of the good events in Proposition 4.4 and the K=kmax+1K=k_{max}+1 good events of applying Theorem 5.1 1) to {ψk}k=1K\{\psi_{k}\}_{k=1}^{K}, which happens w.p.>14K2N10(6+4K)N9>1-4K^{2}N^{-10}-(6+4K)N^{-9}. The good event in Proposition 4.4 is contained in the good event EUBE_{UB}^{\prime} of Proposition 3.6 (the eigenvalue UB), which is again contained in the good event of Lemma 3.5. As a result, Di>0D_{i}>0 for all ii, and thus LrwL_{rw} is well-defined, and (20) holds.

Applying (20) to u=vku=v_{k}, and because vkD/N2=p\|v_{k}\|_{D/N}^{2}=p, we have that (m0=1m_{0}=1 since hh is Gaussian)

p=vkDN2=pvk22(1+O(ϵ,logNNϵd/2)),1kK.p=\|v_{k}\|_{\frac{D}{N}}^{2}=p\|v_{k}\|_{2}^{2}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad 1\leq k\leq K. (A.13)

This verifies that vk22=1+O(ϵ,logNNϵd/2)=1+o(1)\|v_{k}\|_{2}^{2}=1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})=1+o(1), for 1kK1\leq k\leq K.

Because the good event EUBE_{UB}^{\prime} is contained in that of Lemma 3.4, ϕk22=1+O(logNN)\|\phi_{k}\|_{2}^{2}=1+O(\sqrt{\frac{\log N}{N}}), 1kK1\leq k\leq K, and then, applying (20) to u=ϕku=\phi_{k},

ϕkDN2=pϕk2(1+O(ϵ,logNNϵd/2))=p(1+O(ϵ,logNNϵd/2)),1kK.\|\phi_{k}\|_{\frac{D}{N}}^{2}=p\|\phi_{k}\|^{2}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))=p(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad 1\leq k\leq K. (A.14)

Step 2. for LrwL_{rw}: We follow an approach similar to that in Proposition 5.2. When k=1k=1, λ1=0\lambda_{1}=0, and v1v_{1} is always the constant vector, thus the discrepancy is zero. Consider 2kK2\leq k\leq K; by Theorem 5.1 1), and that u2Nu\|u\|_{2}\leq\sqrt{N}\|u\|_{\infty} for any uNu\in\mathbb{R}^{N},

Lrwϕkμkϕk2=O(ϵ,logNNϵd/2+1),2kK,\|L_{rw}\phi_{k}-\mu_{k}\phi_{k}\|_{2}=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}),\quad 2\leq k\leq K, (A.15)

and then by (20) which holds uniformly for all uNu\in\mathbb{R}^{N},

LrwϕkμkϕkDN=Lrwϕkμkϕk2p(1+O(ϵ,logNNϵd/2))=O(Lrwϕkμkϕk2).\|L_{rw}\phi_{k}-\mu_{k}\phi_{k}\|_{\frac{D}{N}}=\|L_{rw}\phi_{k}-\mu_{k}\phi_{k}\|_{2}\sqrt{p}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))=O(\|L_{rw}\phi_{k}-\mu_{k}\phi_{k}\|_{2}).

Thus, there is Errpt>0\text{Err}_{pt}>0, s.t.

LrwϕkμkϕkDNErrpt,2kK,Errpt=O(ϵ,logNNϵd/2+1).\|L_{rw}\phi_{k}-\mu_{k}\phi_{k}\|_{\frac{D}{N}}\leq\text{Err}_{pt},\quad 2\leq k\leq K,\quad\text{Err}_{pt}=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}). (A.16)

The constant in big-OO depends on the first KK eigenfunctions, and is an absolute one because KK is fixed. Next, same as in the proof of Proposition 5.2, under the good event of Proposition 4.4 and by the definition of γK\gamma_{K} as the maximum (half) eigen-gap among {μk}1kK\{\mu_{k}\}_{1\leq k\leq K}, (41) holds for λk\lambda_{k}.

Let Sk=Span{(DN)1/2vk}S_{k}=\text{Span}\{(\frac{D}{N})^{1/2}v_{k}\}, which is a one-dimensional subspace of N\mathbb{R}^{N}. Because the vjv_{j}’s are DD-orthogonal, Sk=Span{(DN)1/2vj,jk,1jN}S_{k}^{\perp}=\text{Span}\{(\frac{D}{N})^{1/2}v_{j},\,j\neq k,1\leq j\leq N\}. Note that

PSk((DN)1/2μkϕk)=(DN)1/2jk,j=1NvjT(DN)ϕkvjDN2μkvj,P_{S_{k}^{\perp}}\left((\frac{D}{N})^{1/2}\mu_{k}\phi_{k}\right)=(\frac{D}{N})^{1/2}\sum_{j\neq k,j=1}^{N}\frac{v_{j}^{T}(\frac{D}{N})\phi_{k}}{\|v_{j}\|_{\frac{D}{N}}^{2}}\mu_{k}v_{j}, (A.17)

and because

LrwTDvj=1ϵ(IWD1)Dvj=1ϵ(DW)vj=Dλjvj,L_{rw}^{T}Dv_{j}=\frac{1}{\epsilon}(I-WD^{-1})Dv_{j}=\frac{1}{\epsilon}(D-W)v_{j}=D\lambda_{j}v_{j}, (A.18)
PSk((DN)1/2Lrwϕk)\displaystyle P_{S_{k}^{\perp}}\left((\frac{D}{N})^{1/2}L_{rw}\phi_{k}\right) =(DN)1/2jk,j=1NvjT(DN)LrwϕkvjDN2vj=(DN)1/2jk,j=1N1N(LrwTDvj)TϕkvjDN2vj\displaystyle=(\frac{D}{N})^{1/2}\sum_{j\neq k,j=1}^{N}\frac{v_{j}^{T}(\frac{D}{N})L_{rw}\phi_{k}}{\|v_{j}\|_{\frac{D}{N}}^{2}}v_{j}=(\frac{D}{N})^{1/2}\sum_{j\neq k,j=1}^{N}\frac{\frac{1}{N}(L_{rw}^{T}Dv_{j})^{T}\phi_{k}}{\|v_{j}\|_{\frac{D}{N}}^{2}}v_{j}
=(DN)1/2jk,j=1N1N(Dvj)TϕkvjDN2λjvj.\displaystyle=(\frac{D}{N})^{1/2}\sum_{j\neq k,j=1}^{N}\frac{\frac{1}{N}(Dv_{j})^{T}\phi_{k}}{\|v_{j}\|_{\frac{D}{N}}^{2}}\lambda_{j}v_{j}. (A.19)

Subtracting (A.17) and (A.19) gives

PSk((DN)1/2(Lrwϕkμkϕk))=jk,j=1N(λjμk)vjTDNϕkvjDN2(DN)1/2vj,P_{S_{k}^{\perp}}\left((\frac{D}{N})^{1/2}(L_{rw}\phi_{k}-\mu_{k}\phi_{k})\right)=\sum_{j\neq k,j=1}^{N}(\lambda_{j}-\mu_{k})\frac{v_{j}^{T}\frac{D}{N}\phi_{k}}{\|v_{j}\|_{\frac{D}{N}}^{2}}(\frac{D}{N})^{1/2}v_{j},

and by that vjv_{j} are DD-orthogonal, and (41),

PSk((DN)1/2(Lrwϕkμkϕk))22=jk,j=1N|λjμk|2|vjTDNϕk|2vjDN2γK2jk,j=1N|vjTDNϕk|2vjDN2.\|P_{S_{k}^{\perp}}\left((\frac{D}{N})^{1/2}(L_{rw}\phi_{k}-\mu_{k}\phi_{k})\right)\|_{2}^{2}=\sum_{j\neq k,j=1}^{N}|\lambda_{j}-\mu_{k}|^{2}\frac{|v_{j}^{T}\frac{D}{N}\phi_{k}|^{2}}{\|v_{j}\|_{\frac{D}{N}}^{2}}\geq\gamma_{K}^{2}\sum_{j\neq k,j=1}^{N}\frac{|v_{j}^{T}\frac{D}{N}\phi_{k}|^{2}}{\|v_{j}\|_{\frac{D}{N}}^{2}}.

The square root of the l.h.s. satisfies

PSk((DN)1/2(Lrwϕkμkϕk))2(DN)1/2(Lrwϕkμkϕk)2=LrwϕkμkϕkDNErrpt,\|P_{S_{k}^{\perp}}\left((\frac{D}{N})^{1/2}(L_{rw}\phi_{k}-\mu_{k}\phi_{k})\right)\|_{2}\leq\|(\frac{D}{N})^{1/2}(L_{rw}\phi_{k}-\mu_{k}\phi_{k})\|_{2}=\|L_{rw}\phi_{k}-\mu_{k}\phi_{k}\|_{\frac{D}{N}}\leq\text{Err}_{pt},

and the last inequality is by (A.16). This gives that

(jk,j=1N|vjTDNϕk|2vjDN2)1/2ErrptγK.\left(\sum_{j\neq k,j=1}^{N}\frac{|v_{j}^{T}\frac{D}{N}\phi_{k}|^{2}}{\|v_{j}\|_{\frac{D}{N}}^{2}}\right)^{1/2}\leq\frac{\text{Err}_{pt}}{\gamma_{K}}.

Meanwhile, PSk((DN)1/2ϕk)=jk,j=1NvjT(DN)ϕkvjDN2(DN)1/2vjP_{S_{k}^{\perp}}\left((\frac{D}{N})^{1/2}\phi_{k}\right)=\sum_{j\neq k,j=1}^{N}\frac{v_{j}^{T}(\frac{D}{N})\phi_{k}}{\|v_{j}\|_{\frac{D}{N}}^{2}}(\frac{D}{N})^{1/2}v_{j}, and by DD-orthogonality of vjv_{j} again, jk,j=1N|vjTDNϕk|2vjDN2=PSk((DN)1/2ϕk)22\sum_{j\neq k,j=1}^{N}\frac{|v_{j}^{T}\frac{D}{N}\phi_{k}|^{2}}{\|v_{j}\|_{\frac{D}{N}}^{2}}=\|P_{S_{k}^{\perp}}\left((\frac{D}{N})^{1/2}\phi_{k}\right)\|_{2}^{2}. Thus,

PSk((DN)1/2ϕk)2=(jk,j=1N|vjTDNϕk|2vjDN2)1/2ErrptγK=O(ϵ,logNNϵd/2+1).\|P_{S_{k}^{\perp}}\left((\frac{D}{N})^{1/2}\phi_{k}\right)\|_{2}=\left(\sum_{j\neq k,j=1}^{N}\frac{|v_{j}^{T}\frac{D}{N}\phi_{k}|^{2}}{\|v_{j}\|_{\frac{D}{N}}^{2}}\right)^{1/2}\leq\frac{\text{Err}_{pt}}{\gamma_{K}}=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}). (A.20)

Finally, define

βk:=vkT(DN)ϕkvkDN2,βk(DN)1/2vk=PSk(DN)1/2ϕk,\beta_{k}:=\frac{v_{k}^{T}(\frac{D}{N})\phi_{k}}{\|v_{k}\|_{\frac{D}{N}}^{2}},\quad\beta_{k}(\frac{D}{N})^{1/2}v_{k}=P_{S_{k}}(\frac{D}{N})^{1/2}\phi_{k},
PSk((DN)1/2ϕk)=(DN)1/2ϕkPSk(DN)1/2ϕk=(DN)1/2(ϕkβkvk),P_{S_{k}^{\perp}}\left((\frac{D}{N})^{1/2}\phi_{k}\right)=(\frac{D}{N})^{1/2}\phi_{k}-P_{S_{k}}(\frac{D}{N})^{1/2}\phi_{k}=(\frac{D}{N})^{1/2}\left(\phi_{k}-\beta_{k}v_{k}\right),

and then, together with (A.20),

ϕkβkvkDN=PSk((DN)1/2ϕk)2=O(ϵ,logNNϵd/2+1).\|\phi_{k}-\beta_{k}v_{k}\|_{\frac{D}{N}}=\|P_{S_{k}^{\perp}}\left((\frac{D}{N})^{1/2}\phi_{k}\right)\|_{2}=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}).

Applying (20) to u=ϕkβkvku=\phi_{k}-\beta_{k}v_{k}, ϕkβkvk2=(1p(1+O(ϵ,logNNϵd/2)))1/2ϕkβkvkDN=O(ϕkβkvkDN)\|\phi_{k}-\beta_{k}v_{k}\|_{2}=(\frac{1}{p}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})))^{1/2}\|\phi_{k}-\beta_{k}v_{k}\|_{\frac{D}{N}}=O(\|\phi_{k}-\beta_{k}v_{k}\|_{\frac{D}{N}}), and we have shown that

ϕkβkvk2=O(ϕkβkvkDN)=O(ϵ,logNNϵd/2+1).\|\phi_{k}-\beta_{k}v_{k}\|_{2}=O(\|\phi_{k}-\beta_{k}v_{k}\|_{\frac{D}{N}})=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}).

To finish Step 2, it remains to show that |βk|=1+o(1)|\beta_{k}|=1+o(1), and then we define αk=1βk\alpha_{k}=\frac{1}{\beta_{k}}. By definition of βk\beta_{k},

ϕkDN2\displaystyle\|\phi_{k}\|_{\frac{D}{N}}^{2} =(DN)1/2ϕk22=PSk((DN)1/2ϕk)22+βk(DN)1/2vk22=PSk((DN)1/2ϕk)22+βk2vkDN2,\displaystyle=\|(\frac{D}{N})^{1/2}\phi_{k}\|_{2}^{2}=\|P_{S_{k}^{\perp}}\left((\frac{D}{N})^{1/2}\phi_{k}\right)\|_{2}^{2}+\|\beta_{k}(\frac{D}{N})^{1/2}v_{k}\|_{2}^{2}=\|P_{S_{k}^{\perp}}\left((\frac{D}{N})^{1/2}\phi_{k}\right)\|_{2}^{2}+\beta_{k}^{2}\|v_{k}\|_{\frac{D}{N}}^{2},

since vkDN2=p\|v_{k}\|_{\frac{D}{N}}^{2}=p, combining this with (A.14) and (A.20) gives p(1+o(1))=o(1)+βk2pp(1+o(1))=o(1)+\beta_{k}^{2}p, and thus βk2=1+o(1)\beta_{k}^{2}=1+o(1).

Step 3. of Lrw{L}_{rw}: For 2kkmax2\leq k\leq k_{max}, by the relation (A.18),

vkTD(Lrwϕkμkϕk)=(LrwTDvk)TϕkμkvkTDϕk=(λkμk)vkTDϕk,v_{k}^{T}D(L_{rw}\phi_{k}-\mu_{k}\phi_{k})=(L_{rw}^{T}Dv_{k})^{T}\phi_{k}-\mu_{k}v_{k}^{T}D\phi_{k}=(\lambda_{k}-\mu_{k})v_{k}^{T}D\phi_{k},

and we have shown that

vk=αkϕk+εk,αk=1+o(1),εkDN=O(ϵ,logNNϵd/2+1).v_{k}=\alpha_{k}\phi_{k}+\varepsilon_{k},\quad\alpha_{k}=1+o(1),\quad\|\varepsilon_{k}\|_{\frac{D}{N}}=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}).

Similar as in the proof of Proposition 5.3,

|λkμk||vkTDNϕk|=|vkTDN(Lrwϕkμkϕk)|=|(αkϕk+εk)TDN(Lrwϕkμkϕk)|\displaystyle|\lambda_{k}-\mu_{k}||v_{k}^{T}\frac{D}{N}\phi_{k}|=|v_{k}^{T}\frac{D}{N}(L_{rw}\phi_{k}-\mu_{k}\phi_{k})|=|(\alpha_{k}\phi_{k}+\varepsilon_{k})^{T}\frac{D}{N}(L_{rw}\phi_{k}-\mu_{k}\phi_{k})|
|αk||ϕkTDNLrwϕkμkϕkDN2|+|εkTDN(Lrwϕkμkϕk)|=:+.\displaystyle~{}~{}~{}\leq|\alpha_{k}||\phi_{k}^{T}\frac{D}{N}L_{rw}\phi_{k}-\mu_{k}\|\phi_{k}\|^{2}_{\frac{D}{N}}|+|\varepsilon_{k}^{T}\frac{D}{N}(L_{rw}\phi_{k}-\mu_{k}\phi_{k})|=:①+②.

By (A.14), ϕkDN2=p(1+O(ϵ,logNNϵd/2))\|\phi_{k}\|_{\frac{D}{N}}^{2}=p(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})), and meanwhile, ϕkTDNLrwϕk=1pEN(ρXψk)=pμk+O(ϵ,logNNϵd/2)\phi_{k}^{T}\frac{D}{N}L_{rw}\phi_{k}=\frac{1}{p}E_{N}(\rho_{X}\psi_{k})=p\mu_{k}+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}) by (16). Thus =O(|ϕkTDNLrwϕkμkϕkDN2|)=O(ϵ,logNNϵd/2)①=O(|\phi_{k}^{T}\frac{D}{N}L_{rw}\phi_{k}-\mu_{k}\|\phi_{k}\|^{2}_{\frac{D}{N}}|)=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}). By (A.16) and the bound of εk\varepsilon_{k}, ||εkDNLrwϕkμkϕkDN=O(Errpt2)|②|\leq\|\varepsilon_{k}\|_{\frac{D}{N}}\|L_{rw}\phi_{k}-\mu_{k}\phi_{k}\|_{\frac{D}{N}}=O(\text{Err}_{pt}^{2}) which is O(ϵ)O(\epsilon) as shown in the proof of Proposition 5.3. Finally, by the definition of βk\beta_{k}, and that vkDN2=p\|v_{k}\|_{\frac{D}{N}}^{2}=p,

|λkμk||βk|||+||vkDN2=O(ϵ,logNNϵd/2)+O(ϵ)p=O(ϵ,logNNϵd/2).|\lambda_{k}-\mu_{k}||\beta_{k}|\leq\frac{|①|+|②|}{\|v_{k}\|_{\frac{D}{N}}^{2}}=\frac{O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})+O(\epsilon)}{{p}}=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}).

Since |βk|=1+o(1)|\beta_{k}|=1+o(1), this proves the bound on |λkμk||\lambda_{k}-\mu_{k}|, and the argument holds for all kkmaxk\leq k_{max}. ∎
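As a concrete illustration of the theorem just proved, the following minimal sketch (with illustrative values of N and epsilon of our choosing, not the paper's reported experiments) samples the unit circle uniformly, forms L_rw with the Gaussian kernel normalized so that m_0 = 1 and m_2 = 2, and compares its low-lying eigenvalues with mu_k = 0, 1, 1, 4, 4 of -Delta on S^1:

```python
# Minimal sketch (illustrative N, eps of our choosing; not the paper's
# experiments): eigenvalues of L_rw = (1/eps)(I - D^{-1} W) for uniform samples
# on S^1 approximate mu_k = 0, 1, 1, 4, 4, ... of -Delta, as in Theorem 5.5.
import numpy as np

rng = np.random.default_rng(0)
N, eps = 2000, 0.05
theta = rng.uniform(0, 2 * np.pi, N)                  # uniform density, d = 1
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)

dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
# Gaussian h with m0 = 1, m2 = 2: K_eps(x,y) = (4*pi*eps)^{-1/2} exp(-|x-y|^2/(4*eps))
W = np.exp(-dist2 / (4 * eps)) / np.sqrt(4 * np.pi * eps)
D = W.sum(axis=1)

# L_rw is conjugate to the symmetric matrix below (same spectrum, stabler to solve)
Lsym = (np.eye(N) - W / np.sqrt(np.outer(D, D))) / eps
lam = np.sort(np.linalg.eigvalsh(Lsym))
print(np.round(lam[:5], 2))                           # approximately [0, 1, 1, 4, 4]
```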

Appendix D Proofs about the density-corrected graph Laplacian with W~\tilde{W}

D.1 Proofs of the point-wise convergence of L~rw\tilde{L}_{rw}

Proof of Lemma 6.1.

Part 1): Write 1NDi=1N(Yi+jiNYj)\frac{1}{N}D_{i}=\frac{1}{N}(Y_{i}+\sum_{j\neq i}^{N}Y_{j}), where Yj:=Kϵ(xi,xj)Y_{j}:=K_{\epsilon}(x_{i},x_{j}). For jij\neq i, YjY_{j} has expectation (Lemma 8 in [10], Lemma A.3 in [9])

Kϵ(xi,y)p(y)𝑑V(y)=m0p(xi)+m22ϵ(ωp(xi)+Δp(xi))+Op(ϵ2),\int_{{\mathcal{M}}}K_{\epsilon}(x_{i},y)p(y)dV(y)=m_{0}p(x_{i})+\frac{m_{2}}{2}\epsilon(\omega p(x_{i})+\Delta p(x_{i}))+O_{p}(\epsilon^{2}),

where ωC()\omega\in C^{\infty}({\mathcal{M}}) is determined by the manifold extrinsic coordinates; meanwhile, Kϵ(xi,xi)=ϵd/2h(0)=O(ϵd/2)K_{\epsilon}(x_{i},x_{i})=\epsilon^{-d/2}h(0)=O(\epsilon^{-d/2}); in the independent sum 1N1jiYj\frac{1}{N-1}\sum_{j\neq i}Y_{j}, |Yj||Y_{j}| is bounded by Θ(ϵd/2)\Theta(\epsilon^{-d/2}) and has variance bounded by Θ(ϵd/2)\Theta(\epsilon^{-d/2}). The rest of the proof is the same as in proving Lemma 3.5 1).

Part 2): By part 1), under a good event E1E_{1}, which happens w.p. >12N9>1-2N^{-9}, (47) holds. Because p(x)pmin>0p(x)\geq p_{min}>0 for any xx\in{\mathcal{M}}, we then have

1NDi=m0p(xi)(1+εi(D)),sup1iN|εi(D)|=O(ϵ,logNNϵd/2).\frac{1}{N}D_{i}=m_{0}p(x_{i})(1+\varepsilon^{(D)}_{i}),\quad\sup_{1\leq i\leq N}|\varepsilon^{(D)}_{i}|=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}). (A.21)

Since O(ϵ,logNNϵd/2)=o(1)O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})=o(1), with large enough NN and under E1E_{1}, Di>0D_{i}>0, and thus W~\tilde{W} is well-defined. Further, by (A.21),

1Nj=1NWij11NDj=1Nj=1NWijm0p(xj)(1+εj(D))\displaystyle\frac{1}{N}\sum_{j=1}^{N}W_{ij}\frac{1}{\frac{1}{N}D_{j}}=\frac{1}{N}\sum_{j=1}^{N}\frac{W_{ij}}{m_{0}p(x_{j})(1+\varepsilon^{(D)}_{j})}
=(1m01Nj=1NWij1p(xj))(1+O(ϵ,logNNϵd/2)).(by that p>0Wij0)\displaystyle~{}~{}~{}=\left(\frac{1}{m_{0}}\frac{1}{N}\sum_{j=1}^{N}W_{ij}\frac{1}{p(x_{j})}\right)\left(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right).\quad\text{(by that $p>0$, $W_{ij}\geq 0$)}

Consider the r.v. Yj=Kϵ(xi,xj)p1(xj)Y_{j}=K_{\epsilon}(x_{i},x_{j})p^{-1}(x_{j}) (conditioning on xix_{i}), for jij\neq i,

𝔼Yj=Kϵ(xi,y)p1(y)p(y)𝑑V(y)=Kϵ(xi,y)𝑑V(y)=m0+O(ϵ),\mathbb{E}Y_{j}=\int_{{\mathcal{M}}}K_{\epsilon}(x_{i},y)p^{-1}(y)p(y)dV(y)=\int_{{\mathcal{M}}}K_{\epsilon}(x_{i},y)dV(y)=m_{0}+O(\epsilon),

YjY_{j} is bounded by Θ(ϵd/2)\Theta(\epsilon^{-d/2}) and so is its variance, where the constants in big-Θ\Theta depend on pp. Then, similarly as in proving (47), we have a good event E2E_{2} which happens w.p. >12N9>1-2N^{-9}, under which

1m01Nj=1NWij1p(xj)=1+O(ϵ,logNNϵd/2),1iN,\frac{1}{m_{0}}\frac{1}{N}\sum_{j=1}^{N}W_{ij}\frac{1}{p(x_{j})}=1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}),\quad 1\leq i\leq N, (A.22)

and the constant in big-OO depends on pp, the function hh, and is uniform for all xix_{i}. Then under E1E2E_{1}\cap E_{2},

j=1NWij1Dj=(1+O(ϵ,logNNϵd/2))(1+O(ϵ,logNNϵd/2))=1+O(ϵ,logNNϵd/2),\sum_{j=1}^{N}W_{ij}\frac{1}{D_{j}}=\left(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right)\left(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right)=1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}),

which proves (48). Meanwhile, combining (48) and (A.21),

ND~i=NDij=1NWijDj=1m0p(xi)(1+εi(D))(1+O(ϵ,logNNϵd/2))=1m0p(xi)(1+O(ϵ,logNNϵd/2)),N\tilde{D}_{i}=\frac{N}{D_{i}}\sum_{j=1}^{N}\frac{W_{ij}}{D_{j}}=\frac{1}{m_{0}p(x_{i})(1+\varepsilon^{(D)}_{i})}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))=\frac{1}{m_{0}p(x_{i})}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})), (A.23)

and thus under E1E2E_{1}\cap E_{2}, with large NN, D~i>0\tilde{D}_{i}>0 and L~rw\tilde{L}_{rw} is well-defined. ∎
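The two estimates just proved can be visualized with a short numerical sketch; the setup below (unit circle, density p proportional to 1 + 0.5 sin(theta), and the values of N and epsilon) is our illustrative assumption, not the paper's experimental configuration.

```python
# Sketch of Lemma 6.1 (illustrative setup, not the paper's experiments): on S^1
# with a non-uniform density p, D_i/N tracks m0*p(x_i) as in (47), and
# N*Dt_i tracks 1/(m0*p(x_i)) as in (A.23); here m0 = 1 (Gaussian h).
# Printed deviations should be small: O(eps) plus sampling error.
import numpy as np

rng = np.random.default_rng(0)
N, eps = 2000, 0.05
t = rng.uniform(0, 2 * np.pi, 4 * N)    # rejection-sample p ~ 1 + 0.5*sin(theta)
t = t[rng.uniform(0, 1.5, 4 * N) < 1 + 0.5 * np.sin(t)]
assert t.size >= N
theta = t[:N]
p = (1 + 0.5 * np.sin(theta)) / (2 * np.pi)   # density w.r.t. arc length dV

X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
dist2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-dist2 / (4 * eps)) / np.sqrt(4 * np.pi * eps)
D = W.sum(axis=1)

Wt = W / np.outer(D, D)      # density-corrected affinity Wt = D^{-1} W D^{-1}
Dt = Wt.sum(axis=1)          # its degree Dt

print("max |D_i/(N p_i) - 1| :", np.abs(D / (N * p) - 1).max())   # checks (47)
print("max |N Dt_i p_i - 1|  :", np.abs(N * Dt * p - 1).max())    # checks (A.23)
```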

D.2 Proofs of the Dirichlet form convergence

Proof of Lemma 6.4.

As has been shown in the proof of Lemma 6.1, under the good event in Lemma 6.1 1), (47) and then (A.21) hold. With the notation εi(D)\varepsilon^{(D)}_{i} as in (A.21), and omitting hh in the notations m2m_{2}, m0m_{0}, we have that

E~N(u)\displaystyle\tilde{E}_{N}(u) =1m2m02ϵ1N2i,j=1NWi,j(uiuj)2DiNDjN\displaystyle=\frac{1}{\frac{m_{2}}{m_{0}^{2}}\epsilon}\frac{1}{N^{2}}\sum_{i,j=1}^{N}W_{i,j}\frac{(u_{i}-u_{j})^{2}}{\frac{D_{i}}{N}\frac{D_{j}}{N}}
=1m2ϵ1N2i,j=1NWi,j(uiuj)2p(xi)p(xj)(1+εi(D))(1+εj(D))\displaystyle=\frac{1}{m_{2}\epsilon}\frac{1}{N^{2}}\sum_{i,j=1}^{N}W_{i,j}\frac{(u_{i}-u_{j})^{2}}{p(x_{i})p(x_{j})(1+\varepsilon^{(D)}_{i})(1+\varepsilon^{(D)}_{j})}
=1m2ϵ1N2i,j=1NWi,j(uiuj)2p(xi)p(xj)(1+εij),εij=O(εi(D),εj(D))\displaystyle=\frac{1}{m_{2}\epsilon}\frac{1}{N^{2}}\sum_{i,j=1}^{N}W_{i,j}\frac{(u_{i}-u_{j})^{2}}{p(x_{i})p(x_{j})}(1+\varepsilon_{ij}),\quad\varepsilon_{ij}=O(\varepsilon^{(D)}_{i},\varepsilon^{(D)}_{j})
=(1m2ϵ1N2i,j=1NWi,j(uiuj)2p(xi)p(xj))(1+O(ϵ,logNNϵd/2)),\displaystyle=\left(\frac{1}{m_{2}\epsilon}\frac{1}{N^{2}}\sum_{i,j=1}^{N}W_{i,j}\frac{(u_{i}-u_{j})^{2}}{p(x_{i})p(x_{j})}\right)(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),

where the last row uses the non-negativity of Wi,j(uiuj)2p(xi)p(xj)W_{i,j}\frac{(u_{i}-u_{j})^{2}}{p(x_{i})p(x_{j})}. ∎
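Since W~ij=Wij/(DiDj)\tilde{W}_{ij}=W_{ij}/(D_{i}D_{j}), the form above can equivalently be assembled from W~\tilde{W} directly, giving E~N(u)=2m02m21ϵuT(D~W~)u\tilde{E}_{N}(u)=\frac{2m_{0}^{2}}{m_{2}}\frac{1}{\epsilon}u^{T}(\tilde{D}-\tilde{W})u, which is 1ϵuT(D~W~)u\frac{1}{\epsilon}u^{T}(\tilde{D}-\tilde{W})u for Gaussian hh. The sketch below (same illustrative circle setup as the earlier sketch; our assumption, not the paper's experiments) checks this identity numerically and that E~N(ρXψ2)\tilde{E}_{N}(\rho_{X}\psi_{2}) lands near μ2=1\mu_{2}=1, consistent with Theorem 6.3.

```python
# Sketch (same illustrative setup as above; m0 = 1, m2 = 2 for Gaussian h):
# the density-corrected Dirichlet form equals (1/eps) u^T (Dt - Wt) u exactly,
# and at u = rho_X psi_2 with psi_2 = cos/sqrt(pi) it is close to mu_2 = 1.
import numpy as np

rng = np.random.default_rng(0)
N, eps = 2000, 0.05
t = rng.uniform(0, 2 * np.pi, 4 * N)
t = t[rng.uniform(0, 1.5, 4 * N) < 1 + 0.5 * np.sin(t)]
assert t.size >= N
theta = t[:N]
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
W = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (4 * eps)) \
    / np.sqrt(4 * np.pi * eps)
D = W.sum(axis=1)
Wt = W / np.outer(D, D)
Dt = Wt.sum(axis=1)

u = np.cos(theta) / np.sqrt(np.pi)      # rho_X psi_2, eigenfunction with mu_2 = 1
# the definition of the form above, with m2/m0^2 = 2 and D_i/N in denominators
E_def = (W * (u[:, None] - u[None, :]) ** 2 / np.outer(D / N, D / N)).sum() \
        / (2 * eps * N ** 2)
E_quad = (u @ (Dt * u - Wt @ u)) / eps  # (1/eps) u^T (Dt - Wt) u
print(E_def, E_quad)                    # equal up to round-off; both near mu_2 = 1
```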

Proof of (57) in the proof of Theorem 6.3:

Proof.

By definition, for iji\neq j,

𝔼Vi,j\displaystyle\mathbb{E}V_{i,j} =1m2ϵKϵ(x,y)(f(x)f(y))2𝑑V(x)𝑑V(y)\displaystyle=\frac{1}{m_{2}\epsilon}\int_{\mathcal{M}}\int_{\mathcal{M}}K_{\epsilon}(x,y)(f(x)-f(y))^{2}dV(x)dV(y)
=2m2ϵf(x)(Kϵ(x,y)(f(x)f(y))𝑑V(y))𝑑V(x)\displaystyle=\frac{2}{m_{2}\epsilon}\int_{\mathcal{M}}f(x)\left(\int_{\mathcal{M}}K_{\epsilon}(x,y)(f(x)-f(y))dV(y)\right)dV(x)

By Lemma A.3 in [9], Kϵ(x,y)(f(x)f(y))𝑑V(y)=ϵm22Δf(x)+Of(ϵ2)\int_{\mathcal{M}}K_{\epsilon}(x,y)(f(x)-f(y))dV(y)=-\epsilon\frac{m_{2}}{2}\Delta f(x)+O_{f}(\epsilon^{2}), and thus,

𝔼Vi,j=f,Δf+Of(ϵ).\mathbb{E}V_{i,j}=\langle f,-\Delta f\rangle+O_{f}(\epsilon).

Meanwhile, since ppmin>0p\geq p_{min}>0, 0VijΘp(1)1m2ϵKϵ(xi,xj)(f(xi)f(xj))20\leq V_{ij}\leq\Theta_{p}(1)\frac{1}{m_{2}\epsilon}K_{\epsilon}(x_{i},x_{j})(f(x_{i})-f(x_{j}))^{2}, and then by the boundedness and variance calculation in the proof of Theorem 3.4 of [9], one can verify that, with constants depending on (f,p)(f,p),

|Vij|L=Θ(ϵd/2),𝔼Vij2ν=Θ(ϵd/2).|V_{ij}|\leq L=\Theta(\epsilon^{-d/2}),\quad\mathbb{E}V_{ij}^{2}\leq\nu=\Theta(\epsilon^{-d/2}).

Then, by the same decoupling argument used to derive the concentration of V-statistics, under a good event E3E_{3} which happens w.p. >12N10>1-2N^{-10},

1N(N1)ij,i,j=1NVij=𝔼Vij+Of,p(logNNϵd/2).\frac{1}{N(N-1)}\sum_{i\neq j,i,j=1}^{N}V_{ij}=\mathbb{E}V_{ij}+O_{f,p}(\sqrt{\frac{\log N}{N\epsilon^{d/2}}}).

As a result,

 ③ in (56=(11N)1N(N1)ij,i,j=1NVij=(11N)(f,Δf+Of(ϵ)+Of,p(logNNϵd/2)),\text{ ③ in \eqref{eq:form-pf-1} }=(1-\frac{1}{N})\frac{1}{N(N-1)}\sum_{i\neq j,i,j=1}^{N}V_{ij}=(1-\frac{1}{N})\left(\langle f,-\Delta f\rangle+O_{f}(\epsilon)+O_{f,p}(\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right),

which proves (57) because O(1N)O(\frac{1}{N}) is higher order than O(logNNϵd/2)O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}}). ∎
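The concentration can be seen in a small Monte Carlo sketch. Here we take Vij=1m2ϵKϵ(xi,xj)(f(xi)f(xj))2/(p(xi)p(xj))V_{ij}=\frac{1}{m_{2}\epsilon}K_{\epsilon}(x_{i},x_{j})(f(x_{i})-f(x_{j}))^{2}/(p(x_{i})p(x_{j})), an assumed reading consistent with the expectation display above, and the circle setup is again our illustrative choice rather than the paper's experiments; for f = cos(theta) on S^1, <f,-Delta f> = pi.

```python
# Monte Carlo sketch (illustrative; we take V_ij = (1/(m2*eps)) K_eps(x_i,x_j)
# * (f(x_i)-f(x_j))^2 / (p(x_i) p(x_j)), an assumed reading consistent with the
# expectation display above). For f = cos on S^1, <f,-Delta f> = pi, so the
# off-diagonal mean should be pi + O(eps) + sampling error.
import numpy as np

rng = np.random.default_rng(0)
N, eps = 2000, 0.05
t = rng.uniform(0, 2 * np.pi, 4 * N)
t = t[rng.uniform(0, 1.5, 4 * N) < 1 + 0.5 * np.sin(t)]
assert t.size >= N
theta = t[:N]
p = (1 + 0.5 * np.sin(theta)) / (2 * np.pi)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (4 * eps)) \
    / np.sqrt(4 * np.pi * eps)

f = np.cos(theta)
V = K * (f[:, None] - f[None, :]) ** 2 / np.outer(p, p) / (2 * eps)   # m2 = 2
V_mean = V.sum() / (N * (N - 1))     # diagonal vanishes since f_i - f_i = 0
print(V_mean, np.pi)
```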

D.3 Proofs of the eigen-convergence of L~rw\tilde{L}_{rw}

Proof of Proposition 6.5.

The proof is similar to that of Proposition 3.6. We first restrict to the good events E1E2E_{1}\cap E_{2} in Lemma 6.1, which happens w.p. >14N9>1-4N^{-9}, under which W~\tilde{W} and L~rw\tilde{L}_{rw} are well-defined, and (47) and (48) hold.

Let uk=ρXψku_{k}=\rho_{X}\psi_{k}. The following lemma, proved below, shows the near D~\tilde{D}-orthonormality of the vectors uku_{k} and is an analogue of Lemma 3.4.

Lemma D.1.

Under the same assumptions as in Lemma 6.1, when NN is sufficiently large, w.p. >14N92K2N10>1-4N^{-9}-2K^{2}N^{-10},

ρXψkD~2=1m0(1+O(ϵ,logNNϵd/2)),1kK;(ρXψk)TD~(ρXψl)=O(ϵ,logNNϵd/2),kl, 1k,lK.\begin{split}\|\rho_{X}{\psi}_{k}\|_{\tilde{D}}^{2}&=\frac{1}{m_{0}}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad 1\leq k\leq K;\\ (\rho_{X}{\psi}_{k})^{T}\tilde{D}(\rho_{X}{\psi}_{l})&=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}),\quad k\neq l,\,1\leq k,l\leq K.\end{split} (A.24)

Under the good event of Lemma D.1, called E5E1E2E_{5}\subset E_{1}\cap E_{2}, D~i>0\tilde{D}_{i}>0 for all ii, and with large enough NN, the set {D~1/2uk}k=1K\{\tilde{D}^{1/2}u_{k}\}_{k=1}^{K} is linearly independent, and then so is the set {uk}k=1K\{u_{k}\}_{k=1}^{K}. Let L=Span{u1,,uk}L=\text{Span}\{u_{1},\cdots,u_{k}\}, then dim(L)=kdim(L)=k for each kKk\leq K. For any vLv\in L, v0v\neq 0, there are cjc_{j}, 1jk1\leq j\leq k, such that v=j=1kcjujv=\sum_{j=1}^{k}c_{j}u_{j}. By (A.24), we have

m0vD~2=c2(1+O(ϵ,logNNϵd/2)).m_{0}\|v\|_{\tilde{D}}^{2}=\|c\|^{2}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})). (A.25)

Meanwhile, by defining B~N(u,v):=14(E~N(u+v)E~N(uv))\tilde{B}_{N}(u,v):=\frac{1}{4}(\tilde{E}_{N}(u+v)-\tilde{E}_{N}(u-v)), similarly as in Lemma 3.3, applying Theorem 6.3 to the K2K^{2} cases where f=ψkf=\psi_{k} and (ψk±ψl)(\psi_{k}\pm\psi_{l}) gives that, under a good event E6E_{6} which happens w.p.>12K2N10>1-2K^{2}N^{-10},

E~N(ρXψk)=μk+O(ϵ,logNNϵd/2),k=1,,K,B~N(ρXψk,ρXψl)=O(ϵ,logNNϵd/2),kl, 1k,lK.\begin{split}\tilde{E}_{N}(\rho_{X}\psi_{k})&=\mu_{k}+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}),\quad k=1,\cdots,K,\\ \tilde{B}_{N}(\rho_{X}\psi_{k},\rho_{X}\psi_{l})&=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}),\quad k\neq l,\,1\leq k,l\leq K.\end{split} (A.26)

Then, similar as in (18),

E~N(v)\displaystyle\tilde{E}_{N}(v) =j,l=1kcjclB~N(uj,ul)=j=1kcj2(μj+O(ϵ,logNNϵd/2))+jl,j,l=1k|cj||cl|O(ϵ,logNNϵd/2)\displaystyle=\sum_{j,l=1}^{k}c_{j}c_{l}\tilde{B}_{N}(u_{j},u_{l})=\sum_{j=1}^{k}c_{j}^{2}\left(\mu_{j}+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right)+\sum_{j\neq l,j,l=1}^{k}|c_{j}||c_{l}|O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})
=j=1kμjcj2+c2KO(ϵ,logNNϵd/2)c2(μk+O(ϵ,logNNϵd/2)).\displaystyle=\sum_{j=1}^{k}\mu_{j}c_{j}^{2}+\|c\|^{2}KO(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\leq\|c\|^{2}\left(\mu_{k}+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right). (A.27)

Back to the r.h.s. of (58), together with (A.25), we have that

1m0E~N(v)vTD~vμk+O(ϵ,logNNϵd/2)1+O(ϵ,logNNϵd/2)=μk+O(ϵ,logNNϵd/2),\frac{\frac{1}{m_{0}}\tilde{E}_{N}(v)}{v^{T}\tilde{D}v}\leq\frac{\mu_{k}+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})}{1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})}=\mu_{k}+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}), (A.28)

and thus provides a UB of λk\lambda_{k}. The bound holds for all 1kK1\leq k\leq K, under the good events E5E6E_{5}\cap E_{6}. ∎

Proof of Lemma D.1.

Restrict to the good events E1E2E_{1}\cap E_{2} in Lemma 6.1, which happens w.p. >14N9>1-4N^{-9}, under which W~\tilde{W} and L~rw\tilde{L}_{rw} are well-defined, and (A.23) holds. Then,

ρXψkD~2=1Ni=1Nψk(xi)2m0p(xi)(1+O(ϵ,logNNϵd/2))=ρX(p1/2ψk)2Nm0(1+O(ϵ,logNNϵd/2)),1kK,\|\rho_{X}{\psi}_{k}\|_{\tilde{D}}^{2}=\frac{1}{N}\sum_{i=1}^{N}\frac{{\psi}_{k}(x_{i})^{2}}{m_{0}p(x_{i})}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))=\frac{\|\rho_{X}(p^{-1/2}\psi_{k})\|^{2}}{Nm_{0}}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad 1\leq k\leq K,
ρX(ψk±ψl)D~2=ρX(p1/2(ψk±ψl))2Nm0(1+O(ϵ,logNNϵd/2)),kl,1k,lK.\|\rho_{X}({\psi}_{k}\pm{\psi}_{l})\|_{\tilde{D}}^{2}=\frac{\|\rho_{X}(p^{-1/2}({\psi}_{k}\pm{\psi}_{l}))\|^{2}}{Nm_{0}}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad k\neq l,1\leq k,l\leq K.

Applying (A.2) with f=p1/2ψkf=p^{-1/2}{\psi}_{k} and p1/2(ψk±ψl)p^{-1/2}({\psi}_{k}\pm{\psi}_{l}) for klk\neq l, and recalling that ψk,ψl=δkl\langle\psi_{k},\psi_{l}\rangle=\delta_{kl}, we have

1NρX(p1/2ψk)2=1+O(logNN),1NρX(p1/2(ψk±ψl))2=2+O(logNN),\frac{1}{N}\|\rho_{X}(p^{-1/2}{\psi}_{k})\|^{2}=1+O(\sqrt{\frac{\log N}{N}}),\quad\frac{1}{N}\|\rho_{X}(p^{-1/2}({\psi}_{k}\pm\psi_{l}))\|^{2}=2+O(\sqrt{\frac{\log N}{N}}),

under a good event which happens w.p.>12K2N10>1-2K^{2}N^{-10} with large enough NN, and then

ρXψkD~2\displaystyle\|\rho_{X}{\psi}_{k}\|_{\tilde{D}}^{2} =1m0(1+O(ϵ,logNNϵd/2)),1kK,\displaystyle=\frac{1}{m_{0}}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad 1\leq k\leq K,
ρX(ψk±ψl)D~2\displaystyle\|\rho_{X}({\psi}_{k}\pm{\psi}_{l})\|_{\tilde{D}}^{2} =2m0(1+O(ϵ,logNNϵd/2)),kl,1k,lK,\displaystyle=\frac{2}{m_{0}}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad k\neq l,1\leq k,l\leq K,

which proves (A.24). ∎
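Lemma D.1 can also be checked numerically (same illustrative setup as the earlier sketches; the functions psi_1 = 1/sqrt(2 pi), psi_2 = cos(theta)/sqrt(pi), psi_3 = sin(theta)/sqrt(pi) are the L^2(M, dV)-normalized eigenfunctions on S^1): the Gram matrix in the D~\tilde{D} inner product should be close to 1m0I=I\frac{1}{m_{0}}I=I despite the non-uniform density.

```python
# Sketch of Lemma D.1 (same illustrative setup as the earlier sketches): the
# Gram matrix G_{kl} = (rho_X psi_k)^T Dt (rho_X psi_l) should be near the
# identity (1/m0) I = I, despite the non-uniform sampling density.
import numpy as np

rng = np.random.default_rng(0)
N, eps = 2000, 0.05
t = rng.uniform(0, 2 * np.pi, 4 * N)
t = t[rng.uniform(0, 1.5, 4 * N) < 1 + 0.5 * np.sin(t)]
assert t.size >= N
theta = t[:N]
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
W = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (4 * eps)) \
    / np.sqrt(4 * np.pi * eps)
D = W.sum(axis=1)
Dt = (W / np.outer(D, D)).sum(axis=1)            # degree of Wt = D^{-1} W D^{-1}

Psi = np.stack([np.full(N, 1 / np.sqrt(2 * np.pi)),   # L^2(M, dV)-normalized
                np.cos(theta) / np.sqrt(np.pi),
                np.sin(theta) / np.sqrt(np.pi)], axis=1)
G = Psi.T @ (Dt[:, None] * Psi)       # entries (rho_X psi_k)^T Dt (rho_X psi_l)
print(np.round(G, 3))                 # approximately the 3 x 3 identity
```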

Proof of Proposition 6.6.

The proof follows the same strategy as that of Proposition 4.4, where we introduce weights by p(xi)p(x_{i}) in the heat kernel interpolation map when constructing candidate eigenfunctions from eigenvectors.

We restrict to the good event EUB′′E_{UB}^{\prime\prime} in Proposition 6.5, which is contained in E1E2E_{1}\cap E_{2} in Lemma 6.1. Under EUB′′E_{UB}^{\prime\prime}, Di>0D_{i}>0, D~i>0\tilde{D}_{i}>0, and L~rw\tilde{L}_{rw} is well-defined, and, with sufficiently large NN, λkλK1.1μK=O(1)\lambda_{k}\leq\lambda_{K}\leq 1.1\mu_{K}=O(1). Let L~rwvk=λkvk\tilde{L}_{rw}v_{k}=\lambda_{k}v_{k}, normalized s.t.

vkTD~vl=δkl,1k,lN.v_{k}^{T}\tilde{D}v_{l}=\delta_{kl},\quad 1\leq k,l\leq N.

Note that always λ1=0\lambda_{1}=0. Under E1E2E_{1}\cap E_{2}, (A.23) holds, and thus

m0uD~2=m0Ni=1Nui2(ND~i)=(1Ni=1Nui2p(xi))(1+O(ϵ,logNNϵd/2)),uN,m_{0}\|u\|_{\tilde{D}}^{2}=\frac{m_{0}}{N}\sum_{i=1}^{N}u_{i}^{2}(N\tilde{D}_{i})=\left(\frac{1}{N}\sum_{i=1}^{N}\frac{u_{i}^{2}}{p(x_{i})}\right)(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad\forall u\in\mathbb{R}^{N}, (A.29)

and the constant in big-OO is determined by (,p)({\mathcal{M}},p) and uniform for all uu. Define the notation

up12:=1Ni=1Nui2p(xi),uN.\|u\|_{p^{-1}}^{2}:=\frac{1}{N}\sum_{i=1}^{N}\frac{u_{i}^{2}}{p(x_{i})},\quad\forall u\in\mathbb{R}^{N}. (A.30)

Taking uu to be vkv_{k} and (vk±vl)(v_{k}\pm v_{l}) gives that

m0=vkp12(1+O(ϵ,logNNϵd/2)),1kK,2m0=vk±vlp12(1+O(ϵ,logNNϵd/2)),kl,1k,lK.\begin{split}m_{0}&=\|v_{k}\|_{p^{-1}}^{2}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad 1\leq k\leq K,\\ 2m_{0}&=\|v_{k}\pm v_{l}\|_{p^{-1}}^{2}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad k\neq l,1\leq k,l\leq K.\end{split} (A.31)

Set δ\delta, rr, tt in the same way as in the proof of Proposition 4.4, and define I~r[u]\tilde{I}_{r}[u] as in (59). We have I~r[u],I~r[u]=qδϵ(u~)\langle\tilde{I}_{r}[u],\tilde{I}_{r}[u]\rangle=q_{\delta\epsilon}(\tilde{u}), I~r[u],QtI~r[u]=qϵ(u~)\langle\tilde{I}_{r}[u],Q_{t}\tilde{I}_{r}[u]\rangle=q_{\epsilon}(\tilde{u}), and (60) for s>0s>0. Next, similar as in the proof of Lemma 4.2, one can show that with large NN and w.p.>12N9>1-2N^{-9},

1Nj=1NHs(xi,xj)p(xi)p(xj)=1p(xi)(1+O,p(logNNsd/2)),1iN,\frac{1}{N}\sum_{j=1}^{N}\frac{{H}_{s}(x_{i},x_{j})}{p(x_{i})p(x_{j})}=\frac{1}{p(x_{i})}(1+O_{{\mathcal{M}},p}(\sqrt{\frac{\log N}{Ns^{d/2}}})),\quad 1\leq i\leq N, (A.32)

where the notation O,p()O_{{\mathcal{M}},p}(\cdot) indicates that the constant depends on (,p)({\mathcal{M}},p) and is uniform for all xix_{i}. Applying (A.32) to s=δϵs=\delta\epsilon gives that, under a good event E(0)E_{(0)}^{\prime}, which happens w.p.>12N9>1-2N^{-9},

q~δϵ(0)(u)\displaystyle\tilde{q}^{(0)}_{\delta\epsilon}(u) =1Ni=1Nui2p(xi)(1+O,p(δd/4logNNϵd/2))\displaystyle=\frac{1}{N}\sum_{i=1}^{N}\frac{u_{i}^{2}}{p(x_{i})}(1+O_{{\mathcal{M}},p}(\delta^{-d/4}\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))
=up12(1+O,p(δd/4logNNϵd/2)),uN.\displaystyle=\|u\|_{p^{-1}}^{2}(1+O_{{\mathcal{M}},p}(\delta^{-d/4}\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad\forall u\in\mathbb{R}^{N}. (A.33)

Applying (A.32) to s=ϵs=\epsilon gives the good event E(1)E_{(1)}^{\prime}, which happens w.p.>12N9>1-2N^{-9}, under which

q~ϵ(0)(u)=up12(1+O,p(logNNϵd/2)),uN.\tilde{q}^{(0)}_{\epsilon}(u)=\|u\|_{p^{-1}}^{2}(1+O_{{\mathcal{M}},p}(\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad\forall u\in\mathbb{R}^{N}. (A.34)

The constants in big-OO in (A.33) and (A.34) are determined by (,p)({\mathcal{M}},p) only and uniform for all uu.

We also need an analogue of Lemma 4.3 to upper bound q~s(2)\tilde{q}^{(2)}_{s}, proved below. The proof follows the same method as Lemma 4.3, and makes use of the uniform boundedness of pp from below and of Lemma 6.4.

Lemma D.2.

Under Assumption 1, with hh being Gaussian, let 0<α<10<\alpha<1 be a fixed constant. Suppose ϵ=o(1)\epsilon=o(1), ϵd/2=Ω(logNN)\epsilon^{d/2}=\Omega(\frac{\log N}{N}), then with sufficiently large NN, and under the good event E1E_{1} of Lemma 6.1 1),

0q~ϵ(2)(u)=(1+O(ϵ(log1ϵ)2,logNNϵd/2))(uT(D~W~)u)+up12O(ϵ3),uN,0\leq\tilde{q}^{(2)}_{\epsilon}(u)=\left(1+O\left(\epsilon(\log\frac{1}{\epsilon})^{2},\sqrt{\frac{\log N}{N\epsilon^{d/2}}}\right)\right)(u^{T}(\tilde{D}-\tilde{W})u)+\|u\|_{p^{-1}}^{2}O(\epsilon^{3}),\quad\forall u\in\mathbb{R}^{N}, (A.35)

and

0q~αϵ(2)(u)1.1αd/2(uT(D~W~)u)+up12O(ϵ3),uN.0\leq\tilde{q}^{(2)}_{\alpha\epsilon}(u)\leq 1.1\alpha^{-d/2}(u^{T}(\tilde{D}-\tilde{W})u)+\|u\|_{p^{-1}}^{2}O(\epsilon^{3}),\quad\forall u\in\mathbb{R}^{N}. (A.36)

The constants in big-OO only depend on (,p)({\mathcal{M}},p) and are uniform for all uu and α\alpha.

We proceed to define fk=I~r[vk]f_{k}=\tilde{I}_{r}[v_{k}], fkC()f_{k}\in C^{\infty}({\mathcal{M}}). Next, since (ID~1W~)vk=ϵλkvk(I-\tilde{D}^{-1}\tilde{W})v_{k}=\epsilon\lambda_{k}v_{k} and the vkv_{k} are D~\tilde{D}-orthonormal, we have

vkT(D~W~)vk=ϵλkvkTD~vk=ϵλk,1kK,(vk±vl)T(D~W~)(vk±vl)=ϵ(λk+λl),kl,1k,lK.\begin{split}v_{k}^{T}(\tilde{D}-\tilde{W})v_{k}&=\epsilon\lambda_{k}v_{k}^{T}\tilde{D}v_{k}=\epsilon\lambda_{k},\quad 1\leq k\leq K,\\ (v_{k}\pm v_{l})^{T}(\tilde{D}-\tilde{W})(v_{k}\pm v_{l})&=\epsilon(\lambda_{k}+\lambda_{l}),\quad k\neq l,1\leq k,l\leq K.\end{split} (A.37)

Taking α=δ\alpha=\delta in Lemma D.2, (A.36) then gives

q~δϵ(2)(vk)=O(δd/2)ϵλk+O(ϵ3),1kK,q~δϵ(2)(vk±vl)=O(δd/2)ϵ(λk+λl)+2O(ϵ3),kl,1k,lK,\begin{split}\tilde{q}^{(2)}_{\delta\epsilon}(v_{k})&=O(\delta^{-d/2})\epsilon\lambda_{k}+O(\epsilon^{3}),\quad 1\leq k\leq K,\\ \tilde{q}^{(2)}_{\delta\epsilon}(v_{k}\pm v_{l})&=O(\delta^{-d/2})\epsilon(\lambda_{k}+\lambda_{l})+2O(\epsilon^{3}),\quad k\neq l,1\leq k,l\leq K,\end{split}

and both are O(ϵ)O(\epsilon). Meanwhile, (A.33) and (A.31) give that (recalling that δ>0\delta>0 is a fixed constant determined by KK and Δ-\Delta)

q~δϵ(0)(vk)=vkp12(1+O(logNNϵd/2))=m0(1+O(ϵ,logNNϵd/2)),1kK,q~δϵ(0)(vk±vl)=vk±vlp12(1+O(logNNϵd/2))=2m0(1+O(ϵ,logNNϵd/2)),kl, 1k,lK.\begin{split}\tilde{q}^{(0)}_{\delta\epsilon}(v_{k})&=\|v_{k}\|_{p^{-1}}^{2}(1+O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))=m_{0}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad 1\leq k\leq K,\\ \tilde{q}^{(0)}_{\delta\epsilon}(v_{k}\pm v_{l})&=\|v_{k}\pm v_{l}\|_{p^{-1}}^{2}(1+O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))=2m_{0}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad k\neq l,\,1\leq k,l\leq K.\end{split} (A.38)

Putting this together with the above bounds on q~δϵ(2)\tilde{q}_{\delta\epsilon}^{(2)}, this gives that

fk,fk=q~δϵ(0)(vk)q~δϵ(2)(vk)=m0(1+O(ϵ,logNNϵd/2))O(ϵ),1kK,fk,fl=14(q~δϵ(vk+vl)q~δϵ(vkvl))=O(ϵ,logNNϵd/2)+O(ϵ),kl, 1k,lK.\begin{split}\langle f_{k},f_{k}\rangle&=\tilde{q}_{\delta\epsilon}^{(0)}(v_{k})-\tilde{q}_{\delta\epsilon}^{(2)}(v_{k})=m_{0}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))-O(\epsilon),\quad 1\leq k\leq K,\\ \langle f_{k},f_{l}\rangle&=\frac{1}{4}(\tilde{q}_{\delta\epsilon}(v_{k}+v_{l})-\tilde{q}_{\delta\epsilon}(v_{k}-v_{l}))=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})+O(\epsilon),\quad k\neq l,\,1\leq k,l\leq K.\end{split} (A.39)

Then, since O(ϵ,logNNϵd/2)=o(1)O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})=o(1), we have linear independence of {fj}j=1K\{f_{j}\}_{j=1}^{K} with large enough NN.

Same as before, for any 2kK2\leq k\leq K, we let Lk=Span{f1,,fk}L_{k}=\text{Span}\{f_{1},\cdots,f_{k}\}, and have (35). For any fLkf\in L_{k}, f=j=1kcjfjf=\sum_{j=1}^{k}c_{j}f_{j}, f=I~r[v]f=\tilde{I}_{r}[v], v:=j=1kcjvjv:=\sum_{j=1}^{k}c_{j}v_{j}, and

vTD~v=j=1kcj2vjTD~vj=c2.v^{T}\tilde{D}v=\sum_{j=1}^{k}c_{j}^{2}v_{j}^{T}\tilde{D}v_{j}=\|c\|^{2}.

Meanwhile, by (A.29) and m0=1m_{0}=1 (hh being Gaussian),

c2=vD~2=vp12(1+O(ϵ,logNNϵd/2)),\|c\|^{2}=\|v\|_{\tilde{D}}^{2}=\|v\|_{p^{-1}}^{2}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})), (A.40)

and by (A.37),

vT(D~W~)v=ϵj=1kλjcj2ϵc2λk.v^{T}(\tilde{D}-\tilde{W})v=\epsilon\sum_{j=1}^{k}\lambda_{j}c_{j}^{2}\leq\epsilon\|c\|^{2}\lambda_{k}. (A.41)

Then, as we work under E(0)E(1)E_{(0)}^{\prime}\cap E_{(1)}^{\prime}, (A.33) and (A.34) hold. Applying them to u=vu=v and subtracting the two,

q~δϵ(0)(v)q~ϵ(0)(v)\displaystyle\tilde{q}^{(0)}_{\delta\epsilon}(v)-\tilde{q}^{(0)}_{\epsilon}(v) =vp12O,p(δd/4logNNϵd/2)=c2(1+O(ϵ,logNNϵd/2))O,p(δd/4logNNϵd/2)\displaystyle=\|v\|_{p^{-1}}^{2}O_{{\mathcal{M}},p}(\delta^{-d/4}\sqrt{\frac{\log N}{N\epsilon^{d/2}}})=\|c\|^{2}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))O_{{\mathcal{M}},p}(\delta^{-d/4}\sqrt{\frac{\log N}{N\epsilon^{d/2}}})
=c2O,p(δd/4logNNϵd/2),\displaystyle=\|c\|^{2}O_{{\mathcal{M}},p}(\delta^{-d/4}\sqrt{\frac{\log N}{N\epsilon^{d/2}}}),

where we used (A.40) to obtain the 2nd equality. To upper bound q~ϵ(2)(v)\tilde{q}^{(2)}_{\epsilon}(v), by (A.35), and with the shorthand that O~(ϵ)\tilde{O}(\epsilon) stands for O(ϵ(log1ϵ)2){O}(\epsilon(\log\frac{1}{\epsilon})^{2}),

q~ϵ(2)(v)\displaystyle\tilde{q}^{(2)}_{\epsilon}(v) =(1+O~(ϵ)+O(logNNϵd/2))(vT(D~W~)v)+vp12O(ϵ3)\displaystyle=\left(1+\tilde{O}(\epsilon)+O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right)(v^{T}(\tilde{D}-\tilde{W})v)+\|v\|_{p^{-1}}^{2}O(\epsilon^{3})
(1+O~(ϵ)+O(logNNϵd/2))ϵc2λk+c2(1+O(ϵ,logNNϵd/2))O(ϵ3)\displaystyle\leq\left(1+\tilde{O}(\epsilon)+O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right)\epsilon\|c\|^{2}\lambda_{k}+\|c\|^{2}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))O(\epsilon^{3})
ϵc2{λk(1+O~(ϵ)+O(logNNϵd/2))+O(ϵ2)}.\displaystyle\leq\epsilon\|c\|^{2}\left\{\lambda_{k}\left(1+\tilde{O}(\epsilon)+O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right)+O(\epsilon^{2})\right\}.

Thus we have that

f,ff,Qtf(q~δϵ(0)(v)q~ϵ(0)(v))+q~ϵ(2)(v)\displaystyle\langle f,f\rangle-\langle f,Q_{t}f\rangle\leq(\tilde{q}^{(0)}_{\delta\epsilon}(v)-\tilde{q}^{(0)}_{\epsilon}(v))+\tilde{q}^{(2)}_{\epsilon}(v)
ϵc2{λk(1+O~(ϵ)+O(logNNϵd/2))+O(ϵ2)+O,p(δd/41ϵlogNNϵd/2)}\displaystyle~{}~{}~{}\leq\epsilon\|c\|^{2}\left\{\lambda_{k}\left(1+\tilde{O}(\epsilon)+O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right)+O(\epsilon^{2})+O_{{\mathcal{M}},p}(\delta^{-d/4}\frac{1}{\epsilon}\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right\}
=ϵc2{λk+O~(ϵ)+O,p(δd/41ϵlogNNϵd/2)}.(by λk1.1μK)\displaystyle~{}~{}~{}=\epsilon\|c\|^{2}\left\{\lambda_{k}+\tilde{O}(\epsilon)+O_{{\mathcal{M}},p}(\delta^{-d/4}\frac{1}{\epsilon}\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right\}.\quad\text{(by $\lambda_{k}\leq 1.1\mu_{K}$)} (A.42)

To lower bound f,f\langle f,f\rangle, again by (A.36), (A.40) and (A.41),

0q~δϵ(2)(v)Θ(δd/2)(vT(D~W~)v)+vp12O(ϵ3)ϵc2(λkΘ(δd/2)+O(ϵ2))=c2O(ϵ).0\leq\tilde{q}^{(2)}_{\delta\epsilon}(v)\leq\Theta(\delta^{-d/2})(v^{T}(\tilde{D}-\tilde{W})v)+\|v\|_{p^{-1}}^{2}O(\epsilon^{3})\leq\epsilon\|c\|^{2}\left(\lambda_{k}\Theta(\delta^{-d/2})+O(\epsilon^{2})\right)=\|c\|^{2}O(\epsilon).

By (A.33) and (A.40),

q~δϵ(0)(v)=vp12(1+O(logNNϵd/2))=c2(1+O(ϵ,logNNϵd/2)),\tilde{q}^{(0)}_{\delta\epsilon}(v)=\|v\|_{p^{-1}}^{2}(1+O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))=\|c\|^{2}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})), (A.43)

Thus,

f,f=q~δϵ(0)(v)q~δϵ(2)(v)=c2(1+O(ϵ,logNNϵd/2)O(ϵ))c2(1O(ϵ,logNNϵd/2)).\langle f,f\rangle=\tilde{q}^{(0)}_{\delta\epsilon}(v)-\tilde{q}^{(2)}_{\delta\epsilon}(v)=\|c\|^{2}\left(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})-O(\epsilon)\right)\geq\|c\|^{2}\left(1-O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})\right).

The rest of the proof is the same as that of Proposition 4.4, where the constant CC is defined as C=c,pδd/4C=c_{{\mathcal{M}},p}\delta^{-d/4}, c,pc_{{\mathcal{M}},p} being a constant determined by (,p)({\mathcal{M}},p), and then the constant cc in the definition of cKc_{K} also depends on pp. The needed good events are E(0)E_{(0)}^{\prime}, E(1)E_{(1)}^{\prime}, and EUB′′E_{UB}^{\prime\prime}, and the LB holds for kKk\leq K. ∎

Proof of Lemma D.2.

By definition, for any uNu\in\mathbb{R}^{N},

q~ϵ(2)(u)=121N2i,j=1NHϵ(xi,xj)p(xi)p(xj)(uiuj)20.\tilde{q}^{(2)}_{\epsilon}(u)=\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}\frac{{H}_{\epsilon}(x_{i},x_{j})}{p(x_{i})p(x_{j})}(u_{i}-u_{j})^{2}\geq 0.

Take tt in Lemma 2.2 to be ϵ\epsilon; since ϵ=o(1)\epsilon=o(1), the three equations hold when ϵ<ϵ0\epsilon<\epsilon_{0}. By (13), truncating at a δϵ=6(10+d2)ϵlog1ϵ\delta_{\epsilon}=\sqrt{6(10+\frac{d}{2})\epsilon\log{\frac{1}{\epsilon}}} Euclidean ball, there is C3C_{3}, a positive constant determined by {\mathcal{M}}, s.t.

121N2i,j=1NHϵ(xi,xj)p(xi)p(xj)𝟏{xjBδϵ(xi)}(uiuj)2C3ϵ101N2i,j=1N(uiuj)2p(xi)p(xj)𝟏{xjBδϵ(xi)}.\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}\frac{{H}_{\epsilon}(x_{i},x_{j})}{p(x_{i})p(x_{j})}{\bf 1}_{\{x_{j}\notin B_{\delta_{\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}\leq C_{3}\epsilon^{10}\frac{1}{N^{2}}\sum_{i,j=1}^{N}\frac{(u_{i}-u_{j})^{2}}{p(x_{i})p(x_{j})}{\bf 1}_{\{x_{j}\notin B_{\delta_{\epsilon}}(x_{i})\}}.

Note that

1N2i,j=1N(uiuj)2p(xi)p(xj)=2Ni=1Nui2p(xi)(1Nj=1N1p(xj))2(1Ni=1Nuip(xi))2\displaystyle\frac{1}{N^{2}}\sum_{i,j=1}^{N}\frac{(u_{i}-u_{j})^{2}}{p(x_{i})p(x_{j})}=\frac{2}{N}\sum_{i=1}^{N}\frac{u_{i}^{2}}{p(x_{i})}\left(\frac{1}{N}\sum_{j=1}^{N}\frac{1}{p(x_{j})}\right)-2\left(\frac{1}{N}\sum_{i=1}^{N}\frac{u_{i}}{p(x_{i})}\right)^{2}
2Ni=1Nui2p(xi)(1Nj=1N1p(xj))2Ni=1Nui2p(xi)1pmin=2pminup12,\displaystyle\leq\frac{2}{N}\sum_{i=1}^{N}\frac{u_{i}^{2}}{p(x_{i})}\left(\frac{1}{N}\sum_{j=1}^{N}\frac{1}{p(x_{j})}\right)\leq\frac{2}{N}\sum_{i=1}^{N}\frac{u_{i}^{2}}{p(x_{i})}\frac{1}{p_{min}}=\frac{2}{p_{min}}\|u\|^{2}_{p^{-1}}, (A.44)

thus,

q~ϵ(2)(u)=121N2i,j=1NHϵ(xi,xj)p(xi)p(xj)𝟏{xjBδϵ(xi)}(uiuj)2+up12O(ϵ10).\tilde{q}^{(2)}_{\epsilon}(u)=\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}\frac{{H}_{\epsilon}(x_{i},x_{j})}{p(x_{i})p(x_{j})}{\bf 1}_{\{x_{j}\in B_{\delta_{\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}+\|u\|^{2}_{p^{-1}}O(\epsilon^{10}). (A.45)

Apply (12) with the shorthand that O~(ϵ)\tilde{O}(\epsilon) stands for O(ϵ(log1ϵ)2){O}(\epsilon(\log\frac{1}{\epsilon})^{2}),

q~ϵ(2)(u)=121N2i,j=1NKϵ(xi,xj)(1+O~(ϵ))+O(ϵ3)p(xi)p(xj)𝟏{xjBδϵ(xi)}(uiuj)2+up12O(ϵ10)\displaystyle\tilde{q}^{(2)}_{\epsilon}(u)=\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}\frac{K_{\epsilon}(x_{i},x_{j})(1+\tilde{O}(\epsilon))+O(\epsilon^{3})}{p(x_{i})p(x_{j})}{\bf 1}_{\{x_{j}\in B_{\delta_{\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}+\|u\|^{2}_{p^{-1}}O(\epsilon^{10})
=(1+O~(ϵ))121N2i,j=1NKϵ(xi,xj)p(xi)p(xj)𝟏{xjBδϵ(xi)}(uiuj)2+O(ϵ3)1N2i,j=1N(uiuj)2p(xi)p(xj)+up12O(ϵ10)\displaystyle=(1+\tilde{O}(\epsilon))\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}\frac{K_{\epsilon}(x_{i},x_{j})}{p(x_{i})p(x_{j})}{\bf 1}_{\{x_{j}\in B_{\delta_{\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}+O(\epsilon^{3})\frac{1}{N^{2}}\sum_{i,j=1}^{N}\frac{(u_{i}-u_{j})^{2}}{p(x_{i})p(x_{j})}+\|u\|^{2}_{p^{-1}}O(\epsilon^{10})
=(1+O~(ϵ))121N2i,j=1NKϵ(xi,xj)p(xi)p(xj)𝟏{xjBδϵ(xi)}(uiuj)2+up12O(ϵ3)(by (A.44)).\displaystyle=(1+\tilde{O}(\epsilon))\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}\frac{K_{\epsilon}(x_{i},x_{j})}{p(x_{i})p(x_{j})}{\bf 1}_{\{x_{j}\in B_{\delta_{\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}+\|u\|^{2}_{p^{-1}}O(\epsilon^{3})\quad\text{(by \eqref{eq:bound-quadratic-sum-with-p})}.

The truncation for Kϵ(xi,xj)K_{\epsilon}(x_{i},x_{j}) gives that Kϵ(xi,xj)𝟏{xjBδϵ(xi)}=O(ϵ10)K_{\epsilon}(x_{i},x_{j}){\bf 1}_{\{x_{j}\notin B_{\delta_{\epsilon}}(x_{i})\}}=O(\epsilon^{10}), and then similarly as in (A.45),

121N2i,j=1NKϵ(xi,xj)p(xi)p(xj)𝟏{xjBδϵ(xi)}(uiuj)2=121N2i,j=1NKϵ(xi,xj)p(xi)p(xj)(uiuj)2up12O(ϵ10).\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}\frac{K_{\epsilon}(x_{i},x_{j})}{p(x_{i})p(x_{j})}{\bf 1}_{\{x_{j}\in B_{\delta_{\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}=\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}\frac{K_{\epsilon}(x_{i},x_{j})}{p(x_{i})p(x_{j})}(u_{i}-u_{j})^{2}-\|u\|^{2}_{p^{-1}}O(\epsilon^{10}). (A.46)

By Lemma 6.4, and m2=2m_{2}=2 with Gaussian hh, we have that under the good event E1E_{1} of Lemma 6.1 1),

E~N(u)=(12ϵ1N2i,j=1NWi,j(uiuj)2p(xi)p(xj))(1+O(ϵ,logNNϵd/2)),uN,\tilde{E}_{N}(u)=\left(\frac{1}{2\epsilon}\frac{1}{N^{2}}\sum_{i,j=1}^{N}W_{i,j}\frac{(u_{i}-u_{j})^{2}}{p(x_{i})p(x_{j})}\right)(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad\forall u\in\mathbb{R}^{N},

and the constant in big-OO is determined by (,p)({\mathcal{M}},p) and uniform for all uu. This gives that

121N2i,j=1NKϵ(xi,xj)p(xi)p(xj)(uiuj)2=ϵE~N(u)(1+O(ϵ,logNNϵd/2)),\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}\frac{K_{\epsilon}(x_{i},x_{j})}{p(x_{i})p(x_{j})}(u_{i}-u_{j})^{2}=\epsilon\tilde{E}_{N}(u)(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})), (A.47)

and as a result, together with (A.46),

q~ϵ(2)(u)\displaystyle\tilde{q}^{(2)}_{\epsilon}(u) =(1+O~(ϵ))(ϵE~N(u)(1+O(ϵ,logNNϵd/2))up12O(ϵ10))+up12O(ϵ3)\displaystyle=(1+\tilde{O}(\epsilon))\left(\epsilon\tilde{E}_{N}(u)(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))-\|u\|^{2}_{p^{-1}}O(\epsilon^{10})\right)+\|u\|^{2}_{p^{-1}}O(\epsilon^{3})
=ϵE~N(u)(1+O~(ϵ)+O(logNNϵd/2))+up12O(ϵ3).\displaystyle=\epsilon\tilde{E}_{N}(u)(1+\tilde{O}(\epsilon)+O(\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))+\|u\|^{2}_{p^{-1}}O(\epsilon^{3}).

Recalling that E~N(u)=1ϵuT(D~W~)u\tilde{E}_{N}(u)=\frac{1}{\epsilon}u^{T}(\tilde{D}-\tilde{W})u, this proves (A.35).

To prove (A.36), since 0<αϵ<ϵ0<\alpha\epsilon<\epsilon, apply Lemma 2.2 with t=αϵt=\alpha\epsilon, and similarly as in (A.45),

q~αϵ(2)(u)=121N2i,j=1NHαϵ(xi,xj)p(xi)p(xj)𝟏{xjBδαϵ(xi)}(uiuj)2+up12O(ϵ10)\displaystyle\tilde{q}^{(2)}_{\alpha\epsilon}(u)=\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}\frac{{H}_{\alpha\epsilon}(x_{i},x_{j})}{p(x_{i})p(x_{j})}{\bf 1}_{\{x_{j}\in B_{\delta_{\alpha\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}+\|u\|^{2}_{p^{-1}}O(\epsilon^{10})
=121N2i,j=1NKαϵ(xi,xj)(1+O~(αϵ))+O(α3ϵ3)p(xi)p(xj)𝟏{xjBδαϵ(xi)}(uiuj)2+up12O(ϵ10)(by (12))\displaystyle=\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}\frac{K_{\alpha\epsilon}(x_{i},x_{j})(1+\tilde{O}(\alpha\epsilon))+O(\alpha^{3}\epsilon^{3})}{p(x_{i})p(x_{j})}{\bf 1}_{\{x_{j}\in B_{\delta_{\alpha\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}+\|u\|^{2}_{p^{-1}}O(\epsilon^{10})\quad\text{(by \eqref{eq:H-eps-local})}
=(1+O~(ϵ))121N2i,j=1NKαϵ(xi,xj)p(xi)p(xj)𝟏{xjBδαϵ(xi)}(uiuj)2+up12O(ϵ3).(by (A.44))\displaystyle=(1+\tilde{O}(\epsilon))\frac{1}{2}\frac{1}{N^{2}}\sum_{i,j=1}^{N}\frac{K_{\alpha\epsilon}(x_{i},x_{j})}{p(x_{i})p(x_{j})}{\bf 1}_{\{x_{j}\in B_{\delta_{\alpha\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}+\|u\|^{2}_{p^{-1}}O(\epsilon^{3}).\quad\text{(by \eqref{eq:bound-quadratic-sum-with-p})}

Then, using (29), (A.46) and (A.47),

q~αϵ(2)(u)\displaystyle\tilde{q}^{(2)}_{\alpha\epsilon}(u) (1+O~(ϵ))αd/212N2i,j=1NKϵ(xi,xj)p(xi)p(xj)𝟏{xjBδαϵ(xi)}(uiuj)2+up12O(ϵ3)\displaystyle\leq(1+\tilde{O}(\epsilon))\alpha^{-d/2}\frac{1}{2N^{2}}\sum_{i,j=1}^{N}\frac{K_{\epsilon}(x_{i},x_{j})}{p(x_{i})p(x_{j})}{\bf 1}_{\{x_{j}\in B_{\delta_{\alpha\epsilon}}(x_{i})\}}(u_{i}-u_{j})^{2}+\|u\|^{2}_{p^{-1}}O(\epsilon^{3})
=(1+O~(ϵ))αd/2(ϵE~N(u)(1+O(ϵ,logNNϵd/2))up12O(ϵ10))+up12O(ϵ3)\displaystyle=(1+\tilde{O}(\epsilon))\alpha^{-d/2}\left(\epsilon\tilde{E}_{N}(u)(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))-\|u\|^{2}_{p^{-1}}O(\epsilon^{10})\right)+\|u\|^{2}_{p^{-1}}O(\epsilon^{3})
=(1+O~(ϵ)+O(ϵ,logNNϵd/2))αd/2ϵE~N(u)+up12O(ϵ3),\displaystyle=(1+\tilde{O}(\epsilon)+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))\alpha^{-d/2}\epsilon\tilde{E}_{N}(u)+\|u\|^{2}_{p^{-1}}O(\epsilon^{3}),

which proves (A.36) because O~(ϵ)+O(ϵ,logNNϵd/2)=o(1)\tilde{O}(\epsilon)+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})=o(1) and thus the constant in front of αd/2\alpha^{-d/2} is less than 1.1 for sufficiently small ϵ\epsilon. ∎

Proof of Theorem 6.7.

With sufficiently large NN, we restrict to the intersection of the good events in Proposition 6.6 and the K=kmax+1K=k_{max}+1 good events of applying Theorem 6.2 to {ψk}k=1K\{\psi_{k}\}_{k=1}^{K}. Because the good event in Proposition 6.6 is already contained in EUB′′E_{UB}^{\prime\prime} of Proposition 6.5, and in E1E2E_{1}\cap E_{2} of Lemma 6.1, the extra good events in addition to what is needed in Proposition 6.6 are those corresponding to E3E4E_{3}\cap E_{4} in the proof of Theorem 6.2 where f=ψkf=\psi_{k} for each 1kK1\leq k\leq K, and, by a union bound, their intersection happens w.p.>1K4N9>1-K\cdot 4N^{-9}. This gives the final high probability indicated in the theorem. In addition, Di>0D_{i}>0, D~i>0\tilde{D}_{i}>0 for all ii, and L~rw\tilde{L}_{rw} is well-defined.

The rest of the proof follows a similar method to that of Theorem 5.5, but differs in the normalization of the eigenvectors and that of the eigenfunctions. With the definitions of uD~\|u\|_{\tilde{D}} and up1\|u\|_{p^{-1}} in (61) and (A.30) respectively, and as has been shown in (A.29), under E1E2E_{1}\cap E_{2},

uD~2=up12(1+O(ϵ,logNNϵd/2)),uN,\|u\|_{\tilde{D}}^{2}=\|u\|_{p^{-1}}^{2}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad\forall u\in\mathbb{R}^{N}, (A.48)

and the constant in big-O is determined by (,p)({\mathcal{M}},p) and uniform for all uu. This also gives that with sufficiently large NN,

0.9pmaxu22N0.9up12uD~21.1up121.1pminu22N,uN,\frac{0.9}{p_{max}}\frac{\|u\|_{2}^{2}}{N}\leq 0.9\|u\|_{p^{-1}}^{2}\leq\|u\|_{\tilde{D}}^{2}\leq 1.1\|u\|_{p^{-1}}^{2}\leq\frac{1.1}{p_{min}}\frac{\|u\|_{2}^{2}}{N},\quad\forall u\in\mathbb{R}^{N}, (A.49)

because up12=1Ni=1Nui2p(xi)\|u\|_{p^{-1}}^{2}=\frac{1}{N}\sum_{i=1}^{N}\frac{u_{i}^{2}}{p(x_{i})} is upper bounded by 1pminNu22\frac{1}{p_{min}N}\|u\|_{2}^{2} and lower bounded by 1pmaxu22N\frac{1}{p_{max}}\frac{\|u\|_{2}^{2}}{N}. Applying (A.49) to u=vku=v_{k} gives that 0.9pmaxvk22vkD~2N=11.1pminvk22\frac{0.9}{p_{max}}\|v_{k}\|_{2}^{2}\leq\|v_{k}\|_{\tilde{D}}^{2}N=1\leq\frac{1.1}{p_{min}}\|v_{k}\|_{2}^{2}, that is

pmin1.1vk2pmax0.9,1kK,\sqrt{\frac{p_{min}}{1.1}}\leq\|v_{k}\|_{2}\leq\sqrt{\frac{p_{max}}{0.9}},\quad 1\leq k\leq K,

and this verifies that vk2=Θ(1)\|v_{k}\|_{2}=\Theta(1) under the high probability event.

Meanwhile, because the good event EUB′′E_{UB}^{\prime\prime} is contained in the one needed in Lemma D.1, as shown in the proof of Lemma D.1, we have that

ρXψkp12=1Ni=1Nψk(xi)2p(xi)=1+O(logNN),1kK,\|\rho_{X}{\psi}_{k}\|_{p^{-1}}^{2}=\frac{1}{N}\sum_{i=1}^{N}\frac{\psi_{k}(x_{i})^{2}}{p(x_{i})}=1+O(\sqrt{\frac{\log N}{N}}),\quad 1\leq k\leq K,

where the constant in big-OO depends on (,p)({\mathcal{M}},p) and is uniform for all kKk\leq K. By definition, Nϕ~kp12=ρXψkp12N\|\tilde{\phi}_{k}\|_{p^{-1}}^{2}=\|\rho_{X}{\psi}_{k}\|_{p^{-1}}^{2}, and then, applying (A.48) to u=ϕ~ku=\tilde{\phi}_{k},

ϕ~kD~2=ϕ~kp12(1+O(ϵ,logNNϵd/2))=1N(1+O(ϵ,logNNϵd/2)),1kK.\|\tilde{\phi}_{k}\|_{\tilde{D}}^{2}=\|\tilde{\phi}_{k}\|_{p^{-1}}^{2}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))=\frac{1}{N}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})),\quad 1\leq k\leq K. (A.50)

Step 2. for L~rw\tilde{L}_{rw}: When k=1k=1, λ1=0\lambda_{1}=0, and v1v_{1} is always the constant vector, thus the discrepancy is zero. Consider 2kK2\leq k\leq K, by Theorem 6.2 and that u2Nu\|u\|_{2}\leq\sqrt{N}\|u\|_{\infty},

L~rwϕ~kμkϕ~k2=O(ϵ,logNNϵd/2+1),2kK.\|\tilde{L}_{rw}\tilde{\phi}_{k}-\mu_{k}\tilde{\phi}_{k}\|_{2}=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}),\quad 2\leq k\leq K. (A.51)

Then, by (A.49), NL~rwϕ~kμkϕ~kD~=O(L~rwϕ~kμkϕ~k2)=O(ϵ,logNNϵd/2+1)\sqrt{N}\|\tilde{L}_{rw}\tilde{\phi}_{k}-\mu_{k}\tilde{\phi}_{k}\|_{\tilde{D}}=O(\|\tilde{L}_{rw}\tilde{\phi}_{k}-\mu_{k}\tilde{\phi}_{k}\|_{2})=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}), that is, there is Errpt>0\text{Err}_{pt}>0, s.t.

NL~rwϕ~kμkϕ~kD~Errpt,2kK,Errpt=O(ϵ,logNNϵd/2+1).\sqrt{N}\|\tilde{L}_{rw}\tilde{\phi}_{k}-\mu_{k}\tilde{\phi}_{k}\|_{\tilde{D}}\leq\text{Err}_{pt},\quad 2\leq k\leq K,\quad\text{Err}_{pt}=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}). (A.52)

Meanwhile, because we are under EUB′′E_{UB}^{\prime\prime}, (41) holds for λk\lambda_{k}. The proof then proceeds in the same way as Step 2. in Theorem 5.5, replacing DN\frac{D}{N} with D~\tilde{D}. Specifically, let Sk=Span{D~1/2vk}S_{k}=\text{Span}\{\tilde{D}^{1/2}v_{k}\}, Sk=Span{D~1/2vj,jk,1jN}S_{k}^{\perp}=\text{Span}\{\tilde{D}^{1/2}v_{j},\,j\neq k,1\leq j\leq N\}. We then have PSk(D~1/2μkϕ~k)=D~1/2jk,j=1NvjTD~ϕ~kvjD~2μkvjP_{S_{k}^{\perp}}\left(\tilde{D}^{1/2}\mu_{k}\tilde{\phi}_{k}\right)=\tilde{D}^{1/2}\sum_{j\neq k,j=1}^{N}\frac{v_{j}^{T}\tilde{D}\tilde{\phi}_{k}}{\|v_{j}\|_{\tilde{D}}^{2}}\mu_{k}v_{j}, and because

L~rwTD~vj=1ϵ(IW~D~1)D~vj=1ϵ(D~W~)vj=D~λjvj,\tilde{L}_{rw}^{T}\tilde{D}v_{j}=\frac{1}{\epsilon}(I-\tilde{W}\tilde{D}^{-1})\tilde{D}v_{j}=\frac{1}{\epsilon}(\tilde{D}-\tilde{W})v_{j}=\tilde{D}\lambda_{j}v_{j}, (A.53)

we also have PSk(D~1/2L~rwϕ~k)=D~1/2jk,j=1NvjTD~ϕ~kvjD~2λjvjP_{S_{k}^{\perp}}\left(\tilde{D}^{1/2}\tilde{L}_{rw}\tilde{\phi}_{k}\right)=\tilde{D}^{1/2}\sum_{j\neq k,j=1}^{N}\frac{v_{j}^{T}\tilde{D}\tilde{\phi}_{k}}{\|v_{j}\|_{\tilde{D}}^{2}}\lambda_{j}v_{j}. Taking the subtraction PSk(D~1/2(L~rwϕ~kμkϕ~k))P_{S_{k}^{\perp}}\left(\tilde{D}^{1/2}(\tilde{L}_{rw}\tilde{\phi}_{k}-\mu_{k}\tilde{\phi}_{k})\right) and doing the same calculation as before, by (A.52), gives that

PSk(D~1/2ϕ~k)2=(jk,j=1N|vjTD~ϕ~k|2vjD~2)1/2ErrptNγK=1NO(ϵ,logNNϵd/2+1).\|P_{S_{k}^{\perp}}\left(\tilde{D}^{1/2}\tilde{\phi}_{k}\right)\|_{2}=\left(\sum_{j\neq k,j=1}^{N}\frac{|v_{j}^{T}\tilde{D}\tilde{\phi}_{k}|^{2}}{\|v_{j}\|_{\tilde{D}}^{2}}\right)^{1/2}\leq\frac{\text{Err}_{pt}}{\sqrt{N}\gamma_{K}}=\frac{1}{\sqrt{N}}O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}). (A.54)

We similarly define βk:=vkTD~ϕ~kvkD~2\beta_{k}:=\frac{v_{k}^{T}\tilde{D}\tilde{\phi}_{k}}{\|v_{k}\|_{\tilde{D}}^{2}}, βkD~1/2vk=PSkD~1/2ϕ~k\beta_{k}\tilde{D}^{1/2}v_{k}=P_{S_{k}}\tilde{D}^{1/2}\tilde{\phi}_{k}, and PSk(D~1/2ϕ~k)=D~1/2ϕ~kPSkD~1/2ϕ~k=D~1/2(ϕ~kβkvk)P_{S_{k}^{\perp}}\left(\tilde{D}^{1/2}\tilde{\phi}_{k}\right)=\tilde{D}^{1/2}\tilde{\phi}_{k}-P_{S_{k}}\tilde{D}^{1/2}\tilde{\phi}_{k}=\tilde{D}^{1/2}\left(\tilde{\phi}_{k}-\beta_{k}v_{k}\right). Then, by (A.54), we have ϕ~kβkvkD~=PSk(D~1/2ϕ~k)2=1NO(ϵ,logNNϵd/2+1)\|\tilde{\phi}_{k}-\beta_{k}v_{k}\|_{\tilde{D}}=\|P_{S_{k}^{\perp}}\left(\tilde{D}^{1/2}\tilde{\phi}_{k}\right)\|_{2}=\frac{1}{\sqrt{N}}O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}), and by (A.49),

ϕ~kβkvk2=O(ϵ,logNNϵd/2+1).\|\tilde{\phi}_{k}-\beta_{k}v_{k}\|_{2}=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}).

To finish Step 2, it remains to show that |βk|=1+o(1)|\beta_{k}|=1+o(1), and then we define αk=1βk\alpha_{k}=\frac{1}{\beta_{k}}. Note that

ϕ~kD~2=D~1/2ϕ~k22=PSk(D~1/2ϕ~k)22+PSk(D~1/2ϕ~k)22=PSk(D~1/2ϕ~k)22+βk2vkD~2.\|\tilde{\phi}_{k}\|_{\tilde{D}}^{2}=\|\tilde{D}^{1/2}\tilde{\phi}_{k}\|_{2}^{2}=\|P_{S_{k}^{\perp}}\left(\tilde{D}^{1/2}\tilde{\phi}_{k}\right)\|_{2}^{2}+\|P_{S_{k}}\left(\tilde{D}^{1/2}\tilde{\phi}_{k}\right)\|_{2}^{2}=\|P_{S_{k}^{\perp}}\left(\tilde{D}^{1/2}\tilde{\phi}_{k}\right)\|_{2}^{2}+\beta_{k}^{2}\|v_{k}\|_{\tilde{D}}^{2}. (A.55)

Since vkD~2=1N\|v_{k}\|_{\tilde{D}}^{2}=\frac{1}{N}, inserting (A.54) and (A.50) into (A.55) gives

1N(1+O(ϵ,logNNϵd/2))=(1NO(ϵ,logNNϵd/2+1))2+βk21N,\frac{1}{N}(1+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}))=(\frac{1}{\sqrt{N}}O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}))^{2}+\beta_{k}^{2}\frac{1}{N},

which gives 1+o(1)=o(1)+βk21+o(1)=o(1)+\beta_{k}^{2} after multiplying both sides by NN.

Step 3. of L~rw\tilde{L}_{rw}: The proof is the same as Step 3. in Theorem 5.5, replacing DN\frac{D}{N} with D~\tilde{D}. Specifically, using the relation (A.53), and the eigenvector consistency in Step 2, we have

|λkμk||vkTD~ϕ~k||αk||ϕ~kTD~L~rwϕ~kμkϕ~kD~2|+|εkTD~(L~rwϕ~kμkϕ~k)|=:+,|\lambda_{k}-\mu_{k}||v_{k}^{T}\tilde{D}\tilde{\phi}_{k}|\leq|\alpha_{k}||\tilde{\phi}_{k}^{T}\tilde{D}\tilde{L}_{rw}\tilde{\phi}_{k}-\mu_{k}\|\tilde{\phi}_{k}\|_{\tilde{D}}^{2}|+|\varepsilon_{k}^{T}\tilde{D}(\tilde{L}_{rw}\tilde{\phi}_{k}-\mu_{k}\tilde{\phi}_{k})|=:①+②,

where εkD~=1NO(ϵ,logNNϵd/2+1)\|\varepsilon_{k}\|_{\tilde{D}}=\frac{1}{\sqrt{N}}O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2+1}}}) and αk=1+o(1)\alpha_{k}=1+o(1). By (A.26), ϕ~kTD~L~rwϕ~k=E~N(ϕ~k)=1N(μk+O(ϵ,logNNϵd/2))\tilde{\phi}_{k}^{T}\tilde{D}\tilde{L}_{rw}\tilde{\phi}_{k}=\tilde{E}_{N}(\tilde{\phi}_{k})=\frac{1}{N}(\mu_{k}+O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})). Together with (A.50), one can show that N=O(ϵ,logNNϵd/2)N①=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}). For ②, with (A.52), one can verify that εkD~L~rwϕ~kμkϕ~kD~=1NO(Errpt2)=O(ϵ)N②\leq\|\varepsilon_{k}\|_{\tilde{D}}\|\tilde{L}_{rw}\tilde{\phi}_{k}-\mu_{k}\tilde{\phi}_{k}\|_{\tilde{D}}=\frac{1}{N}O(\text{Err}_{pt}^{2})=\frac{O(\epsilon)}{N}, where we used that O(Errpt2)=O(ϵ)O(\text{Err}_{pt}^{2})=O(\epsilon), same as before. Putting these together, and with the definition of βk\beta_{k} above,

|λkμk||βk|+vkD~2=(O(ϵ,logNNϵd/2)+O(ϵ))/N1/N=O(ϵ,logNNϵd/2).|\lambda_{k}-\mu_{k}||\beta_{k}|\leq\frac{①+②}{\|v_{k}\|_{\tilde{D}}^{2}}=\frac{(O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}})+O(\epsilon))/N}{1/N}=O(\epsilon,\sqrt{\frac{\log N}{N\epsilon^{d/2}}}).

We have shown that |βk|=1+o(1)|\beta_{k}|=1+o(1), thus the bound on |λkμk||\lambda_{k}-\mu_{k}| is proved, and it holds for all kkmaxk\leq k_{max}. ∎
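We close with a numerical sketch of the theorem just proved (the same assumed circle setup as the earlier sketches, not the paper's reported experiments): the low-lying spectrum of L~rw\tilde{L}_{rw} recovers μk=0,1,1,4,4\mu_{k}=0,1,1,4,4 on S^1 even though the sampling density is non-uniform.

```python
# Sketch of Theorem 6.7 (same illustrative setup; not the paper's experiments):
# the low-lying eigenvalues of the density-corrected Laplacian
# Lt_rw = (1/eps)(I - Dt^{-1} Wt) approximate mu_k = 0, 1, 1, 4, 4, ... of
# -Delta on S^1, even though the sampling density is non-uniform.
import numpy as np

rng = np.random.default_rng(0)
N, eps = 2000, 0.05
t = rng.uniform(0, 2 * np.pi, 4 * N)
t = t[rng.uniform(0, 1.5, 4 * N) < 1 + 0.5 * np.sin(t)]
assert t.size >= N
theta = t[:N]
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
W = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / (4 * eps)) \
    / np.sqrt(4 * np.pi * eps)
D = W.sum(axis=1)
Wt = W / np.outer(D, D)                 # Wt = D^{-1} W D^{-1}
Dt = Wt.sum(axis=1)

# conjugate symmetric form: same spectrum as Lt_rw, stabler for eigensolvers
Lsym = (np.eye(N) - Wt / np.sqrt(np.outer(Dt, Dt))) / eps
lam = np.sort(np.linalg.eigvalsh(Lsym))
print(np.round(lam[:5], 2))             # approximately [0, 1, 1, 4, 4]
```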