Eigen-convergence of Gaussian kernelized graph Laplacian by manifold heat interpolation
Abstract
This work studies the spectral convergence of the graph Laplacian to the Laplace-Beltrami operator when the graph affinity matrix is constructed from $n$ random samples on a $d$-dimensional manifold embedded in a possibly high dimensional space. By analyzing Dirichlet form convergence and constructing candidate approximate eigenfunctions via convolution with the manifold heat kernel, we prove that, with Gaussian kernel, one can set the kernel bandwidth parameter $\epsilon \sim (\log n / n)^{1/(d/2+2)}$ such that the eigenvalue convergence rate is $\tilde{O}(n^{-1/(d/2+2)})$ and the eigenvector convergence in 2-norm has rate $\tilde{O}(n^{-1/(d+4)})$; when $\epsilon \sim (\log n / n)^{1/(d/2+3)}$, both eigenvalue and eigenvector rates are $\tilde{O}(n^{-1/(d/2+3)})$. These rates are up to a $\log n$ factor and are proved for finitely many low-lying eigenvalues. The result holds for the un-normalized and random-walk graph Laplacians when data are uniformly sampled on the manifold, as well as for the density-corrected graph Laplacian (where the affinity matrix is normalized by the degree matrix from both sides) with non-uniformly sampled data. As an intermediate result, we prove new point-wise and Dirichlet form convergence rates for the density-corrected graph Laplacian. Numerical results are provided to verify the theory.
Keywords: Graph Laplacian, heat kernel, Laplace-Beltrami operator, manifold learning, Gaussian kernel, spectral convergence
1 Introduction
Table 1: Default and asymptotic notations.

| Notation | Meaning |
|---|---|
| $\mathcal{M}$ | $d$-dimensional manifold in $\mathbb{R}^m$ |
| $p$ | data sampling density on $\mathcal{M}$ |
| $\Delta$ | Laplace-Beltrami operator, also as $\Delta_{\mathcal{M}}$ |
| $\mu_k$ | population eigenvalue of $-\Delta$ |
| $\psi_k$ | population eigenfunctions of $-\Delta$ |
| $\lambda_k$ | empirical eigenvalue of graph Laplacian |
| $v_k$ | empirical eigenvector of graph Laplacian |
| $\nabla$ | manifold gradient, also as $\nabla_{\mathcal{M}}$ |
| $G_t(x,y)$ | manifold heat kernel |
| $Q_t$ | semi-group operator of manifold diffusion, $Q_t = e^{t\Delta}$ |
| $X$ | dataset points used for computing, $X = \{x_i\}_{i=1}^n$ |
| $n$ | number of samples in $X$ |
| $\epsilon$ | kernel bandwidth parameter |
| $K_\epsilon$ | graph affinity kernel, $K_\epsilon(x,y) = \epsilon^{-d/2} h(\|x-y\|^2/\epsilon)$, cf. (1) |
| $h$ | a function as in (1) |
| $W$ | kernelized graph affinity matrix |
| $D$ | degree matrix of $W$, $D_{ii} = \sum_{j=1}^n W_{ij}$ |
| $L_{un}$ | un-normalized graph Laplacian |
| $L_{rw}$ | random-walk graph Laplacian |
| $E_n$ | graph Dirichlet form |
| $\rho_X$ | function evaluation operator, $\rho_X f = (f(x_1), \ldots, f(x_n))^T$ |
| $\widetilde{W}$ | density-corrected affinity matrix, $\widetilde{W} = D^{-1} W D^{-1}$ |
| $\widetilde{D}$ | degree matrix of $\widetilde{W}$ |

| Asymptotic notations | |
|---|---|
| $O(\cdot)$ | $f = O(g)$: $\lvert f\rvert \le C \lvert g\rvert$ in the limit; a superscript as in $O^{[a]}(\cdot)$ declares the constant dependence on $a$ |
| $o(\cdot)$ | for $g > 0$, $f = o(g)$: $\lvert f\rvert / g \to 0$ in the limit |
| $o^{[a]}(\cdot)$ | same as for $O^{[a]}(\cdot)$, declaring the constant dependence |
| $\Omega(\cdot)$ | for $g > 0$, $f = \Omega(g)$: $f \ge c\, g$ in the limit |
| $\Theta(\cdot)$ | for $g > 0$, $f = \Theta(g)$: $c\, g \le f \le C\, g$ in the limit |
| $\tilde{O}(\cdot)$, $\tilde{\Theta}(\cdot)$ | multiplied by another factor involving a log, defined every time used in the text |

When the superscript $a$ is omitted, the constants are absolute ones.
Graph Laplacian matrices built from data samples are widely used in data analysis and machine learning. The earlier works include Isomap [2], Laplacian Eigenmap [3], Diffusion Map [10, 30], among others. Apart from being a widely-used unsupervised learning method for clustering analysis and dimension reduction (see, e.g., the review papers [33, 30]), graph Laplacian methods also drew attention via the application in semi-supervised learning [24, 12, 29, 15]. Under the manifold setting, data samples are assumed to lie on low-dimensional manifolds embedded in a possibly high-dimensional ambient space. A fundamental problem is the convergence of the graph Laplacian matrix to the manifold Laplacian operator in the large sample limit. The operator point-wise convergence has been intensively studied and established in a series of works [19, 18, 4, 10, 27], and extended to variant settings, such as different kernel normalizations [23, 36] and general class of kernels [31, 5, 9]. The eigen-convergence, namely how the empirical eigenvalues and eigenvectors converge to the population eigenvalues and eigenfunctions of the manifold Laplacian, is a more subtle issue and has been studied in [4, 34, 6, 35, 28, 14] (among others) and recently in [32, 7, 11, 8].
The current work proves the eigen-convergence, specifically the consistency of eigenvalues and eigenvectors in 2-norm, for finitely many low-lying eigenvalues of the graph Laplacian constructed using Gaussian kernel from i.i.d. sampled manifold data. The result covers the un-normalized and random-walk graph Laplacian when data density is uniform, and the density-corrected graph Laplacian (defined below) with non-uniformly sampled data. For the latter, we also prove new point-wise and Dirichlet form convergence rates as an intermediate result. We overview the main results in Section 1.1 in the context of literature, which are also summarized in Table 2.
The framework of our work follows the variational principle formulation of eigenvalues using the graph and manifold Dirichlet forms. The Dirichlet form-based approach to prove graph Laplacian eigen-convergence was first carried out in [6] under a non-probabilistic setting. [32, 7] extended the approach to the probabilistic setting, where the data points are i.i.d. samples, using optimal transport techniques. Our analysis follows the same form-based approach and differs from previous works in the following aspects. Let $\epsilon$ be the (squared) kernel bandwidth parameter corresponding to diffusion time, $n$ the number of samples, and $d$ the manifold intrinsic dimensionality:
Leveraging the observation in [10, 27] that the bias error in the point-wise rate of the graph Laplacian can be improved from $O(\sqrt{\epsilon})$ to $O(\epsilon)$ using a differentiable kernel function, we show that the improved point-wise rate of the Gaussian kernelized graph Laplacian translates into an improved eigen-convergence rate compared to using compactly supported kernels. Specifically, the eigenvector (2-norm) convergence rate is $\tilde{O}(n^{-1/(d/2+3)})$, achieved at the optimal choice of $\epsilon \sim (\log n/n)^{1/(d/2+3)}$.
We show that the eigenvalue convergence rate matches the Dirichlet form convergence rate in [9], which is better than the point-wise rate. This leads to an eigenvalue convergence rate of $\tilde{O}(n^{-1/(d/2+2)})$, achieved at the optimal choice of $\epsilon \sim (\log n/n)^{1/(d/2+2)}$. The optimal $\epsilon$ for eigenvalue and eigenvector estimation thus differs in order of $n$.
In obtaining the initial crude eigenvalue lower bound (LB), called Step 1 below, we develop a short proof using the manifold heat kernel to define the "interpolation mapping", which constructs from a vector in $\mathbb{R}^n$ a smooth function on $\mathcal{M}$. The manifold variational form of the interpolated function, defined via the heat kernel, naturally relates to the graph Dirichlet form of the vector when the graph affinity matrix is constructed using a Gaussian kernel. The analysis makes use of special properties of the manifold heat kernel and only holds when the graph affinity kernel locally approximates the heat kernel, like the Gaussian. This specialty of the heat kernel has not been exploited in previous graph Laplacian analyses to obtain eigen-convergence rates.
Towards the eigen-convergence, our work also recaps and develops several intermediate results under weaker assumptions on the kernel function (i.e., non-Gaussian), including an improved point-wise convergence rate of the density-corrected graph Laplacian. The density-corrected graph Laplacian, originally proposed in [10], is an important variant of the kernelized graph Laplacian where the affinity matrix is $\widetilde{W} = D^{-1} W D^{-1}$. In applications, the data distribution is often not uniform on the manifold, and then the standard graph Laplacian with affinity $W$ recovers the Fokker-Planck operator (weighted Laplacian) associated with the sampling density, which involves a drift term depending on $p$. The density-corrected graph Laplacian, in contrast, recovers the Laplace-Beltrami operator consistently when $p$ satisfies certain regularity conditions, and thus is useful in many applications. In this work, we first prove the point-wise convergence and Dirichlet form convergence of the density-corrected graph Laplacian with differentiable $h$, both matching those of the standard graph Laplacian, and this can be of independent interest. Then the eigen-consistency result extends to such graph Laplacians (with Gaussian kernel function), also achieving the same rate as the standard graph Laplacian with uniform $p$.
Below, we give an overview of the theoretical results starting from the assumptions, and end the introduction with some further literature review. In the rest of the paper, Section 2 gives preliminaries needed in the analysis. Sections 3-5 develop the eigen-convergence of standard graph Laplacians, both the un-normalized and the normalized (random-walk) ones. Section 6 extends to the density-corrected graph Laplacian, and Section 7 gives numerical results. We discuss possible extensions in the last section.
Notations. Default and asymptotic notations such as $O(\cdot)$, $o(\cdot)$, $\Omega(\cdot)$, $\Theta(\cdot)$ are listed in Table 1. In this paper, we treat constants which are determined by $\mathcal{M}$ and $p$ as absolute ones, including the intrinsic dimension $d$. We mainly track the number of samples $n$ and the kernel diffusion time parameter $\epsilon$, and we may emphasize the constant dependence on $\mathcal{M}$ or $p$ in certain circumstances, using a superscript notation such as $O^{[\mathcal{M}]}(\cdot)$. All constant dependence can be tracked in the proofs.
Table 2: Summary of results.

| | uniform $p$, with $L_{un}$ | uniform $p$, with $L_{rw}$ | non-uniform $p$, density-corrected | Needed assumptions on $h$ | Error bound |
|---|---|---|---|---|---|
| Eigenvalue UB | Prop. 3.1 | Prop. 3.6 | Prop. 6.5 | Assump. 2 | form rate |
| Crude eigenvalue LB | Prop. 4.1 | Prop. 4.4 | Prop. 6.6 | Gaussian | |
| Eigenvector convergence | Prop. 5.2 | - | - | Gaussian | point-wise rate |
| Eigenvalue convergence | Prop. 5.3 | - | - | Gaussian | form rate |
| Eigenvalue/vector combined convergence | Thm. 5.4 | Thm. 5.5 | Thm. 6.7 | Gaussian | $\epsilon \sim (\log n/n)^{1/(d/2+2)}$ (optimal order to minimize the eigenvalue error): eigenvalue error $\tilde{O}(n^{-1/(d/2+2)})$, eigenvector error $\tilde{O}(n^{-1/(d+4)})$; $\epsilon \sim (\log n/n)^{1/(d/2+3)}$ (optimal order to minimize the eigenvector error): both errors $\tilde{O}(n^{-1/(d/2+3)})$ |
| Point-wise convergence | Thm. 5.1 [27, 9]∗ | | Thm. 6.2 | Assump. 2 | point-wise rate |
| Dirichlet form convergence | Thm. 3.2 [9]∗ | | Thm. 6.3 | Assump. 2 | form rate |

The "form rate" is $O\big(\epsilon + \sqrt{\log n/(n\epsilon^{d/2})}\big)$, and the "point-wise rate" is $O\big(\epsilon + \sqrt{\log n/(n\epsilon^{d/2+1})}\big)$.

In the table, convergence of the first $K$ eigenvalues and eigenvectors is concerned, where $K$ is fixed. In the rightmost column, the eigenvalue error refers to the error of eigenvalue convergence, and the eigenvector error refers to the error of eigenvector convergence (in 2-norm); $\tilde{O}$ stands for the possible involvement of a factor of some power of $\log n$. In the $L_{rw}$ (density-corrected) column, the eigenvector and eigenvalue convergences are proved in Thm. 5.5 (Thm. 6.7) and are not written as separate propositions. ∗The point-wise convergence and Dirichlet form convergence results of the standard graph Laplacian also hold when $p$ satisfies Assump. 1(A2), i.e., when $p$ is not uniform. The Dirichlet form convergence at the form rate may hold when $h$ is not differentiable, e.g., when $h$ is an indicator function, c.f. Remark 2.
1.1 Overview of main results
We first introduce needed assumptions, and then provide a technical overview of our analysis in Section 1.1.2 (Steps 0-1) and Section 1.1.3 (Steps 2-3), summarized as a roadmap at the end of the section.
1.1.1 Set-up and assumptions
The current paper inherits the probabilistic manifold data setting, namely, the dataset $X = \{x_i\}_{i=1}^n$ consists of $n$ i.i.d. samples drawn from a distribution on $\mathcal{M}$ with density $p$ satisfying the following assumption:
Assumption 1 (Smooth $\mathcal{M}$ and $p$).
(A1) $\mathcal{M}$ is a $d$-dimensional compact connected manifold (without boundary) isometrically embedded in $\mathbb{R}^m$.
(A2) $p$ is smooth and uniformly bounded both from below and above, that is, there exist constants $p_{\min}, p_{\max} > 0$ s.t. $p_{\min} \le p(x) \le p_{\max}$ for all $x \in \mathcal{M}$.
Suppose $\mathcal{M}$ is embedded via $\iota : \mathcal{M} \to \mathbb{R}^m$, and when there is no danger of confusion, we use the same notation $x$ to denote a point $x \in \mathcal{M}$ and its image $\iota(x) \in \mathbb{R}^m$. We have the measure space $(\mathcal{M}, dV)$: when $\mathcal{M}$ is orientable, $dV$ is the Riemannian volume form; otherwise, $dV$ is the measure associated with the local volume form. The smoothness of $\mathcal{M}$ and $p$ fulfills many application scenarios, and possible extensions to less regular $\mathcal{M}$ or $p$ are postponed. Our analysis first addresses the basic case where $p$ is uniform on $\mathcal{M}$, i.e., $p = 1/\mathrm{Vol}(\mathcal{M})$ is a positive constant. For non-uniform $p$ as in (A2), we adopt and analyze the density-corrected graph Laplacian in Section 6. In both cases, the graph Laplacian recovers the Laplace-Beltrami operator. Below, we write $\Delta_{\mathcal{M}}$ as $\Delta$, and $\nabla_{\mathcal{M}}$ as $\nabla$.
Given the $n$ data samples, the graph affinity matrix $W$ and the degree matrix $D$ are defined as
$W_{ij} = K_\epsilon(x_i, x_j)$, $D_{ii} = \sum_{j=1}^n W_{ij}$, for $i, j = 1, \ldots, n$.
$W$ is real symmetric, typically with non-negative entries, and for the kernelized affinity matrix
$K_\epsilon(x, y) = \epsilon^{-d/2}\, h\!\left( \frac{\| x - y \|^2}{\epsilon} \right)$,   (1)
for a function $h : [0, \infty) \to \mathbb{R}$. The parameter $\epsilon$ can be viewed as the "time" of the diffusion process. Some results in the literature are written in terms of the parameter $\sqrt{\epsilon}$, which corresponds to the scale of the local distance such that $\|x - y\|$ is of $O(\sqrt{\epsilon})$ magnitude. Our results are written with respect to the time parameter $\epsilon$, which corresponds to the squared local distance length scale.
Our main result of graph Laplacian eigen-convergence considers when the kernelized graph affinity is computed with
$h(r) = \frac{1}{(4\pi)^{d/2}}\, e^{-r/4}$,   (2)
and we call such $h$ the Gaussian kernel function. (The constant factor $(4\pi)^{-d/2}$ is included in the definition of $h$ for theoretical convenience, and may not be needed in an algorithm; e.g., in the normalized graph Laplacian the constant factor is cancelled.)
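For concreteness, a minimal sketch of the affinity construction in (1)-(2); the $(4\pi)^{-d/2}$ factor follows (2) as written above and cancels in the random-walk normalization:

```python
import numpy as np
from scipy.spatial.distance import cdist

def gaussian_affinity(X, eps, d):
    """W_ij = eps^{-d/2} h(||x_i - x_j||^2 / eps) as in (1), with the Gaussian
    h(r) = (4*pi)^{-d/2} exp(-r/4) as in (2).
    X: (n, m) array of samples on the manifold, eps: bandwidth (diffusion time),
    d: intrinsic dimension of the manifold."""
    sqdist = cdist(X, X, metric="sqeuclidean")
    W = eps ** (-d / 2) * (4.0 * np.pi) ** (-d / 2) * np.exp(-sqdist / (4.0 * eps))
    deg = W.sum(axis=1)  # degrees; the degree matrix is diag(deg)
    return W, deg
```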
The Gaussian belongs to a larger family of differentiable functions:
Assumption 2 (Differentiable $h$).
(C1) Regularity. $h$ is continuous on $[0, \infty)$ and $C^2$ on $(0, \infty)$.
(C2) Decay condition. There exist $a, a_k > 0$ s.t., for all $r \ge 0$, $\lvert h^{(k)}(r)\rvert \le a_k e^{-a r}$ for $k = 0, 1, 2$.
(C3) Non-negativity. $h \ge 0$ on $[0, \infty)$. To exclude the case that $h \equiv 0$, assume $h(r) > 0$ for some $r > 0$.
A summary of results with needed assumptions is provided in Table 2, from which we can see that several important intermediate results, which can be of independent interest, only require to satisfy Assumption 2 or weaker, including
- Point-wise convergence of graph Laplacians.
- Convergence of the graph Dirichlet form.
- The eigenvalue upper bound (UB), which matches to the Dirichlet form convergence rate.
The point-wise convergence and Dirichlet form convergence of the standard graph Laplacian only require a differentiable $h$ with a decay condition, as originally taken in [10], and hold even without the non-negativity in Assumption 2(C3). Our analysis of the density-corrected graph Laplacian assumes non-negativity, and our main result of eigen-convergence needs $h$ to be Gaussian; thus we include (C3) in Assumption 2 to simplify exposition. The need for a Gaussian $h$ shows up in proving the (initial crude) eigenvalue lower bound (LB), to be explained below, and it is due to the fundamental connection between the Gaussian kernel and the manifold heat kernel.
1.1.2 Eigenvalue UB/LB and the interpolation mapping
To explain these results and the difference in proving eigenvalue UB and LB, we start by introducing the notion of point-wise rate and form rate. In the current paper,
Point-wise convergence of graph Laplacians is shown to have the rate of $O\big(\epsilon + \sqrt{\log n/(n\epsilon^{d/2+1})}\big)$. We call this rate the "point-wise rate".
Convergence of the graph Dirichlet form applied to smooth manifold functions, i.e., $E_n(\rho_X f)$ for smooth $f$ on $\mathcal{M}$, is shown to have the rate of $O\big(\epsilon + \sqrt{\log n/(n\epsilon^{d/2})}\big)$. We call this rate the "form rate".
In the literature, the point-wise convergence of the random-walk graph Laplacian with differentiable $h$ having suitable decay was first shown to have this rate in [27]. The exposition in [27] was for Gaussian $h$, but the analysis therein extends directly to general $h$. The Dirichlet form convergence with differentiable $h$ was shown to have the form rate in [9] via a V-statistic analysis. [9] also derived the point-wise rate for both the random-walk and the un-normalized graph Laplacians. The analysis in [9] was mainly developed for kernels with adaptive bandwidth, and higher-order regularity of $h$ (up to the 4th derivative instead of the 2nd) was assumed to handle the complication due to the variable kernel bandwidth. For the fixed-bandwidth kernel as in (1), the analysis in [9] can be simplified to proceed under less restrictive conditions on $h$. We include more details below when quoting these previous results, which pave the way towards proving eigen-convergence.
Table 2 illustrates a difference between the eigenvalue UB and LB analysis. Specifically, the eigenvalue UB holds for general differentiable $h$, while the initial crude eigenvalue LB, and consequently the final eigenvalue and eigenvector convergence rates, need $h$ to be Gaussian. This difference between the eigenvalue UB and LB analysis is due to the subtlety of the variational principle approach in analyzing empirical eigenvalues. To be more specific, by "projecting" the population eigenfunctions to vectors in $\mathbb{R}^n$ and using them as "candidate" eigenvectors in the variational form, the Dirichlet form convergence rate directly translates into a rate for the eigenvalue UB (for fixed finitely many low-lying eigenvalues). This is why the eigenvalue UB matches the form rate before any LB is derived, and we call this "Step 0" of our analysis.
The eigenvalue LB, however, is more difficult, as has been pointed out in [6]. In [6] and following works taking the variational principle approach, the LB analysis is by "interpolating" the empirical eigenvectors to be functions on $\mathcal{M}$. Unlike with the population eigenfunctions, which are known to be smooth, there is less property of the empirical eigenvectors that one can use, and any regularity property of these discrete objects is usually non-trivial to obtain [8]. The interpolation mapping in [6] first assigns a point to a Voronoi cell, assuming that the point set forms an $\varepsilon$-net of $\mathcal{M}$ to begin with (a non-probabilistic setting), and this maps a vector to a piece-wise constant function on $\mathcal{M}$; next, this function is convolved with a kernel function which is compactly supported on a small geodesic ball, and this produces "candidate" eigenfunctions, whose manifold differential Dirichlet form is upper bounded by the graph Dirichlet form of the vector, up to an error, through differential geometry calculations. Under the probabilistic setting of i.i.d. samples, [32] constructed the mapping using an optimal transport (OT) map, where the OT distance between the empirical measure and the population measure is bounded by constructing a Voronoi tessellation of $\mathcal{M}$. This led to an overall eigen-convergence rate in [32] when $h$ is compactly supported and satisfies certain regularity conditions, with a possible factor of a certain power of $\log n$. A typical example is when $h$ is an indicator function, which is called the "$\varepsilon$-graph" in the computer science literature (the distance scale $\varepsilon$ corresponds to $\sqrt{\epsilon}$ in our notation). The approach was extended to k-NN graphs in [7], where the rate of eigenvalue and 2-norm eigenvector convergence was also improved to match the point-wise rate of the $\varepsilon$-graph or k-NN graph Laplacians. The same rate was shown for $\infty$-norm consistency of eigenvectors in [8], combined with Lipschitz regularity analysis of empirical eigenvectors using advanced PDE tools. Eigenvalue consistency with a degraded rate was obtained in a very sparse graph regime just beyond the graph connectivity threshold [7].
In the current work, we take a different approach to the interpolation mapping in the eigenvalue LB analysis. Our method is based on manifold heat kernels, and the analysis makes use of the fact that at short time and on small local neighborhoods, the heat kernel can be approximated by
$G_t(x, y) \approx \frac{1}{(4\pi t)^{d/2}} \exp\!\left( -\frac{d_{\mathcal{M}}(x, y)^2}{4t} \right)$,   (3)
where $d_{\mathcal{M}}$ is the geodesic distance, and consequently by $K_t(x, y)$ when $h$ is Gaussian as in (2). The first approximation is by classical results on elliptic operators on Riemannian manifolds, c.f. Theorem 2.1. The second holds because $K_t$ replaces the geodesic distance with the Euclidean distance in $\mathbb{R}^m$, and the two locally agree up to higher-order corrections. (The constant in the big-O here depends on the second fundamental form, and by compactness of $\mathcal{M}$ is universal for all points. Similar universal constants in big-O hold throughout the paper.) These estimates allow us to construct interpolated functions from a discrete vector by convolving with the heat kernel at a fixed time, which is a constant determined by the first low-lying population eigenvalues of $-\Delta$. Specifically, this time is inversely proportional to the smallest eigen-gap among the first $K$ population eigenvalues (assumed to have single multiplicity in the first place; the result then generalizes to multiplicity greater than one), which is an $O(1)$ constant determined by $\mathcal{M}$ and $K$. Applying the variational principle to the operator $I - Q_t$, where $Q_t$ is the diffusion semi-group operator whose spectrum is determined by that of $\Delta$, allows us to prove an initial eigenvalue LB with error smaller than half of the minimum first-$K$ eigen-gap.
The step to derive initial crude eigenvalue LB using manifold heat kernel interpolation mapping is called “Step 1” in our analysis. While the interpolation mapping by convolving with a smooth kernel has been used in previous works [6, 32, 7], using the manifold heat kernel plays a special role in the eigenvalue LB analysis, and this cannot be equivalently achieved by other choices of kernels (unless the kernel locally approximates the heat kernel, like the Gaussian kernel here). Specifically, Lemma 4.3 is proved using heat kernel properties (without using concentration of i.i.d. data samples), and the lemma connects the continuous integral form of interpolated candidate eigenfunctions with the graph Dirichlet form.
1.1.3 Road-map of analysis
The previous subsection has explained Step 0 and 1 of our analysis. Here we summarize the rest of the analysis and provide a road-map.
After an initial crude eigenvalue LB is obtained in Step 1, we adopt the "bootstrap strategy" from [7] to obtain a refined (2-norm) eigenvector consistency rate matching the graph Laplacian point-wise convergence rate. We call this "Step 2". Note that the use of a smooth kernel (like Gaussian) has an improved bias error in the point-wise rate compared to a compactly supported kernel function, which consequently improves the eigen-convergence rate; see more in Remark 4.
Next, leveraging the eigenvector consistency proved in Step 2, we further improve the eigenvalue convergence to match the form rate, which is better than the point-wise rate. We call this "Step 3". Then the refined eigenvalue LB matches the eigenvalue UB in rate. In the process, the first $K$ empirical eigenvalues are upper bounded by $O(1)$, which follows from the eigenvalue UB proved in the beginning.
In summary, our eigen-convergence analysis consists of the following four steps:
- Step 0. Eigenvalue UB by the Dirichlet form convergence, matching the form rate.
- Step 1. Initial crude eigenvalue LB, providing eigenvalue error up to the smallest first-$K$ eigen-gap.
- Step 2. 2-norm consistency of eigenvectors, up to the point-wise rate.
- Step 3. Refined eigenvalue consistency, up to the form rate.
Step 1 requires $h$ to be non-negative and currently only covers the Gaussian case. This may be relaxed, since the proof only uses the property that the kernel locally approximates the manifold heat kernel. In this work, we restrict to the Gaussian case for simplicity and because of the wide use of Gaussian kernels in applications.
1.2 More related works
As we adopt a Dirichlet form-based analysis, the eigen-convergence result in the current paper is of the same type as in previous works using variational principle [6, 32, 7]. In particular, the rate concerns the convergence of the first many low-lying eigenvalues of the Laplacian, where is a fixed finite integer. The constants in the big- notations in the bounds are treated as , and they depend on and these leading eigenvalues and eigenfunctions of the manifold Laplacian. Such results are useful for applications where leading eigenvectors are the primary focus, e.g., spectral clustering and dimension-reduced spectral embedding. An alternative approach is to analyze functional operator consistency [4, 34, 28, 26], which may provide different eigen-consistency bounds, e.g., -norm consistency of eigenvectors using compact embedding of Glivenko-Cantelli function classes [11].
The current work considers noise-less data on , while the robustness of graph Laplacian against noise in data is important for applications. When manifold data vectors are perturbed by noise in the ambient space, [13] showed that Gaussian kernel function has special property to make kernelized graph Laplacian robust to noise (by a modification of diagonal entries). More recently, [20] showed that bi-stochastic normalization can make the Gaussian kernelized graph affinity matrix robust to high dimensional heteroskedastic noise in data. These results suggest that Gaussian is a special and useful choice of kernel function for graph Laplacian methods.
Meanwhile, bi-stochastically normalized graph Laplacian has been studied in [23], where the point-wise convergence of the kernel integral operator to the manifold operator was proved. The spectral convergence of bi-stochastically normalized graph Laplacian for data on hyper-torus was recently proved to be in [36]. The density-corrected affinity kernel matrix , which is analyzed in the current work, provides another normalization of the graph Laplacian which recovers the Laplace-Beltrami operator. It would be interesting to explore the connections to these works and extend our analysis to bi-stochastically normalized graph Laplacians, which may have better properties of spectral convergence and noise-robustness.
2 Preliminaries
2.1 Graph and manifold Laplacians
We define the following moment constants of function satisfying Assumption 2,
By (C3), and the case is excluded, thus . With Gaussian as in (2), , , and . Denote and by and for a shorthand notation, and
- The un-normalized graph Laplacian is defined as
(4)
Note that the standard un-normalized graph Laplacian is usually $D - W$, and we divide by the constant for the convergence of $L_{un}$ to $-\Delta$.
- The random-walk graph Laplacian is defined as
(5)
with the constant normalization to ensure convergence to $-\Delta$.
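A minimal sketch of the constructions in (4)-(5), with the constant normalizations (which involve $\epsilon$, $n$, and the kernel moment constants) abstracted into placeholder arguments; they rescale eigenvalues but do not change eigenvectors:

```python
import numpy as np

def graph_laplacians(W, c_un=1.0, c_rw=1.0):
    """Un-normalized and random-walk graph Laplacians, cf. (4)-(5).
    c_un, c_rw stand in for the constant normalizations that make the
    Laplacians converge to the (minus) Laplace-Beltrami operator."""
    deg = W.sum(axis=1)
    L_un = (np.diag(deg) - W) / c_un                       # cf. (4)
    L_rw = (np.eye(W.shape[0]) - W / deg[:, None]) / c_rw  # cf. (5)
    return L_un, L_rw
```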
The matrix is real-symmetric, positive semi-definite (PSD), and the smallest eigenvalue is zero. Suppose eigenvalues of are , , and sorted in ascending order, that is,
The matrix is well-defined when for all , which holds w.h.p. under the regime that , c.f. Lemma 3.5. We always work under this regime, namely the connectivity regime. Since is similar to which is PSD, is also real-diagonalizable and has non-negative real eigenvalues, sorted and denoted as . We also have that, by the min-max variational formula for real-symmetric matrices,
We define the graph Dirichlet form for as
(6) |
By (4), , and thus
(7) |
Similarly, we have
(8) |
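The graph Dirichlet form and the Rayleigh quotients behind (6)-(8) can be evaluated directly from $W$; a minimal sketch, with the constant normalization of the form omitted since it does not affect the min-max structure:

```python
import numpy as np

def dirichlet_form(W, u):
    """Graph Dirichlet form E_n(u), up to its constant normalization:
    proportional to sum_{ij} W_ij (u_i - u_j)^2 = 2 u^T (D - W) u, cf. (6)-(7)."""
    deg = W.sum(axis=1)
    return float(u @ ((np.diag(deg) - W) @ u))

def rayleigh_quotient(L, u):
    """Rayleigh quotient whose min-max over k-dimensional subspaces gives the
    k-th eigenvalue of the symmetric matrix L, as used in (7)."""
    return float(u @ (L @ u)) / float(u @ u)
```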
To introduce notations of the manifold Laplacian, we define the inner-product in as , for . We also use to denote the inner-product in , being a general measure on (not necessarily a probability measure), that is , for . For smooth connected compact , the (minus) manifold Laplace-Beltrami operator has eigen-pairs ,
The second eigenvalue due to connectivity of . When for some eigenvalue of having multiplicity , the eigenfunctions can be set to be an orthonormal basis of the -dimensional eigenspace associated with . Note that for generic smooth .
2.2 Heat kernel on
We leverage the special property of Gaussian kernel in the ambient space that it locally approximates the manifold heat kernel on . We start from the notations of manifold heat kernel. Since is smooth compact (no-boundary), the Green’s function of the heat equation on exists, namely the heat kernel of . We denote the heat diffusion semi-group operator as which can be formally written as , and
Since is a semi-group, we have the reproducing property
Meanwhile, by the probability interpretation,
Using the eigenvalue and eigenfunctions of , the heat kernel has the expansion representation . We will not use the spectral expansion of in our analysis, but only that are also eigenfunctions of , that is,
(9) |
Next, we derive Lemma 2.2, which characterizes two properties of the heat kernel at sufficiently short time: First, on a local neighborhood on , can be approximated by in the leading order, where is defined as in (1) with Gaussian ; Second, globally on the manifold the heat kernel has a sub-Gaussian decay. These are based on classical results about heat kernel on Riemannian manifolds [21, 16, 25, 17], summarized in the following theorem.
Theorem 2.1 (Heat kernel parametrix and decay [25, 16]).
Suppose is as in Assumption 1 (A1), and is a positive integer. Then there are positive constants , i.e. the injectivity radius of , and both and depend on , and
1) Local approximation: There are positive constants , which depend on , and , where satisfies that
and is defined as in (3), such that, when , for any ,
(10) |
2) Global decay: There is positive constant depending on such that, when ,
(11) |
Part 1) is by the classical parametrix construction of the heat kernel on , see e.g. Chapter 3 of [25], and Part 2) follows the classical upper bound of the heat kernel by Gaussian estimates dating back to the 1960s [1, 17]. We include a proof of the theorem in Appendix B for completeness.
The theorem directly gives the following lemma (proof in Appendix B), which is useful for our construction of the interpolation mapping using the heat kernel. We denote by the Euclidean ball in centered at a point of radius .
Lemma 2.2.
Suppose is as in Assumption 1 (A1), and . Let , and be with Gaussian kernel , i.e., . Then there is positive constant depending on such that, when , for any ,
(12) | |||
(13) | |||
(14) |
The constants in big- in all the equations only depend on and are uniform for all .
3 Eigenvalue upper bound
In this section, we consider uniform on , and standard graph Laplacians and with the kernelized affinity matrix , defined as in (1). We show the eigenvalue UB for general differentiable satisfying Assumption 2, not necessarily Gaussian.
3.1 Un-normalized graph Laplacian eigenvalue UB
We now derive Step 0 for , the result being summarized in the following proposition.
Proposition 3.1 (Eigenvalue UB of ).
The proposition holds when the population eigenvalues have multiplicity greater than one, as long as they are sorted in ascending order. The proof is by constructing a -dimensional subspace in (7) spanned by vectors in which are produced by evaluating the population eigenfunctions at the data points. The proof is given at the end of this subsection after we introduce a few needed intermediate results.
Given , define the function evaluation operator applied to as
We will use as “candidate” approximate eigenvectors. To analyze , the following result from [9] shows that it converges to the differential Dirichlet form
with the form rate. The result is for general smooth and weighted Laplacian , which is defined as for measure on . is reduced to when is uniform.
Theorem 3.2 (Theorem 3.4 in [9]).
Proof of Theorem 3.2.
The proof goes through the proof of Theorem 3.4 of [9] under the simplified situation when no normalization by the estimated density is involved. Specifically, the proof uses the concentration of the -statistics . The expectation of , , equals . Meanwhile, is bounded by , and the variance of the can also be bounded by with the constant as in the theorem, following the calculation in the proof of Theorem 3.4 in [9]. The concentration of at then follows by the decoupling of the -statistics, and it gives the high probability bound in the theorem.
Note that the results in [9] are proved under the assumption that to be rather than , that is, requiring Assumption 2(C1)(C2) to hold for up to 4-th derivative of . This is because regularity of is used to handle complication of the adaptive bandwidth in the other analysis in [9]. With the fixed bandwidth kernel as defined in (1), regularity suffices, as originally assumed in [10]. ∎
Remark 1 (Relaxation of Assumption 2).
Since the proof only involves the computation of moments of the -statistic, it is possible to relax Assumption 2(C3) non-negativity of and replace with certain non-vanishing conditions on and , e.g., as in [10] and Assumption A.3 in [9]. Since the non-negativity of is used in other places in the paper, and our eigenvalue LB needs to be Gaussian, we adopt the non-negativity of in Assumption 2 for simplicity. The regularity of may also be relaxed, and the constant in may be improved accordingly. These extensions are not further pursued here.
Remark 2 (Dirichlet form convergence with compactly supported ).
The "epsilon-graph" corresponds to constructing the graph affinity using the indicator function kernel . Note that the "epsilon" stands for the scale of local distance and thus corresponds to the square root of our parameter, because ours is "time". When , using the same method as in the proof of Lemma 8 in [10], one can verify (proof in Appendix C.1) that, for ,
(15) |
The boundedness and variance of are again bounded by , and thus the Dirichlet form convergence with has the same rate as in Theorem 3.2. This firstly implies that the eigenvalue UB also has the same rate, following the same proof of Proposition 3.1. The final eigen-convergence rate also depends on the point-wise rate of the graph Laplacian, see more in Remark 4.
In Theorem 3.2 and in below, the factor in the variance error bound is due to the concentration argument. Throughout the paper, the classical Bernstein inequality Lemma B.1 is intensively used.
To proceed, recall the definition of as in (6), we define the bi-linear form for as
which is symmetric, i.e., , and . The following lemma characterizes the forms and applied to , proved in Appendix C.1.
Lemma 3.3.
We need to show the linear independence of the vectors such that they span a -dimensional subspace in . This holds w.h.p. at large , by the following lemma showing the near-isometry of the projection mapping , proved in Appendix C.1.
Lemma 3.4.
Under Assumptions 1 (A1), being uniform on . For fixed , when is sufficiently large, w.p. ,
(17) |
Given these estimates, we are ready to prove Proposition 3.1.
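As a sanity check of Step 0, the following minimal sketch restricts the Rayleigh quotient of $L_{un}$ to the span of projected eigenfunctions on the unit circle, where the Laplace-Beltrami eigenfunctions are explicit ($1, \cos k\theta, \sin k\theta$ with eigenvalues $0, 1, 1, 4, 4, \ldots$); by the min-max principle, the sorted generalized eigenvalues of the restricted pair upper-bound the first empirical eigenvalues. The constant normalization of $L_{un}$ is omitted here, which does not affect the upper-bound property.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n, d, eps = 800, 1, 5e-3
theta = 2 * np.pi * rng.random(n)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)        # uniform samples on S^1

sqdist = cdist(X, X, metric="sqeuclidean")
W = eps ** (-d / 2) * (4 * np.pi) ** (-d / 2) * np.exp(-sqdist / (4 * eps))
L_un = np.diag(W.sum(axis=1)) - W                           # constant normalization omitted

# candidate vectors rho_X psi_k for the first few population eigenfunctions
Psi = np.stack([np.ones(n), np.cos(theta), np.sin(theta),
                np.cos(2 * theta), np.sin(2 * theta)], axis=1)
A, B = Psi.T @ L_un @ Psi, Psi.T @ Psi
ub = eigh(A, B, eigvals_only=True)                          # Rayleigh quotients on the span
emp = np.linalg.eigvalsh(L_un)[: Psi.shape[1]]              # first empirical eigenvalues
print(np.all(emp <= ub + 1e-8))                             # min-max: upper bounds hold
```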
3.2 Random-walk graph Laplacian eigenvalue UB
We first establish a concentration argument for the degrees in the following lemma, which shows that, w.h.p., each degree concentrates at its expected value, with deviation uniformly bounded for all $i$; this will be used in analyzing (8).
Lemma 3.5.
Under Assumption 1(A1), uniform, and Assumption 2. Suppose as , and . Then, when is large enough, w.p. ,
1) The degree concentrates for all , namely,
(19) |
Part 2) immediately follows from Part 1), the latter being proved by standard concentration argument of independent sum and a union bound for events. With Lemma 3.5, the proof of the following proposition is similar to that of Proposition 3.1, and the difference lies in handling the denominator of the Rayleigh quotient in (8). The proofs of Lemma 3.5 and Proposition 3.6 are in Appendix C.1.
Proposition 3.6 (Eigenvalue UB of ).
Suppose , uniform, , , , and are under the same condition as in Proposition 3.1, then for sufficiently large , w.p. , for all , and
4 Eigenvalue crude lower bound in Step 1
In this section, we prove eigenvalue LB in Step 1, first for , and then the proof for is similar.
We consider for the operator on defined as
The semi-group operator is Hilbert-Schmidt, compact, and has eigenvalues and eigenfunctions as in (9). Thus, the operator is self-adjoint and PSD, and has
For any , the eigenvalues are ascending from 0 and have limit point 1. We denote for . By the variational principle, we have that when , for any ,
(21) |
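For concreteness, a hedged reconstruction of what (21) expresses, assuming the operator in question is $I - Q_t$ (whose eigenvalues $1 - e^{-\mu_k t}$ ascend from $0$ with limit point $1$, matching the description above):
\[
  1 - e^{-\mu_k t}
  \;=\;
  \min_{\substack{S \subset L^2(\mathcal{M}),\\ \dim S = k}}
  \;\max_{f \in S,\, f \neq 0}\;
  \frac{\langle f, (I - Q_t) f \rangle}{\langle f, f \rangle},
  \qquad k = 1, 2, \ldots
\]
Applied to the $k$-dimensional subspace spanned by the interpolated candidate functions, the maximum on the right-hand side is what gets compared with the graph Dirichlet form in Step 1.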
For the first result, we assume that are all of multiplicity 1 for simplicity. When population eigenvalues have greater than one multiplicity, the result extends by considering eigenspace rather than eigenvectors in the standard way, see Remark 5.
4.1 Un-normalized graph Laplacian eigenvalue crude LB
We now derive Step 1 for , the result being summarized in the following proposition.
Proposition 4.1 (Initial crude eigenvalue LB of ).
Under Assumptions 1 (A1), suppose is uniform on , and is Gaussian. For fixed , , suppose are all of single multiplicity, and define
(22) |
and is a fixed constant. Then there is an absolute constant determined by and (specifically, , where is a constant depending on ), such that, if as , , and , then for sufficiently large , w.p. ,
We prove Proposition 4.1 in the end of this subsection after we introduce heat kernel interpolation and establish the needed lemmas.
Suppose are eigenvalue and eigenvectors of , to construct a test function on from the vector , we define the interpolation mapping (the terminology “interpolation” is inherited from [6]) by the heat kernel with diffusion time , to be determined. Specifically, define
and then for any ,
(23) |
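The interpolation mapping is an analytical device rather than an algorithm, but it can be sketched numerically; the following substitutes the ambient Gaussian $K_r$ for the heat kernel $G_r$, which by Lemma 2.2 is a valid local approximation at short time $r$, and the $1/n$ weighting in (23) is our assumption:

```python
import numpy as np
from scipy.spatial.distance import cdist

def heat_interpolation(u, X, X_eval, r, d):
    """Sketch of the interpolation I u(x) ~ (1/n) sum_j G_r(x, x_j) u_j, cf. (23),
    turning a vector u on the samples X into a smooth function evaluated at X_eval.
    The heat kernel G_r is replaced by the ambient Gaussian K_r, which matches G_r
    locally at short time r (Lemma 2.2); r is a fixed O(1) constant in the analysis."""
    n = X.shape[0]
    sqdist = cdist(X_eval, X, metric="sqeuclidean")
    K_r = (4 * np.pi * r) ** (-d / 2) * np.exp(-sqdist / (4 * r))
    return K_r @ u / n
```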
We define the quadratic form
We also define and as below, and then for any , , where
(24) |
We will show that by concentration of the independent sum ; by definition, and will be when is an eigenvector with .
Lemma 4.2.
Under Assumptions 1 (A1), being uniform on . Suppose as , and . Then, when is large enough, w.p. ,
The notation indicates that the constant depends on and is uniform for all .
Proof of Lemma 4.2.
By definition, , where , and are positive valued random variables. It suffices to show that with large enough , w.p. indicated in the lemma,
(25) |
This can be proved using concentration argument, similar as in the proof of Lemma 3.5 1), where we use the boundedness of the heat kernel (14) in Lemma 2.2. The proof of (25) is given in Appendix C.2. Note that (25) is a property of the r.v. only, which is irrelevant to the vector . Thus the threshold of large in the lemma and the constant in big- depend on and are uniform for all . ∎
Lemma 4.3.
Under Assumptions 1 ( can be non-uniform), being Gaussian, let be a fixed constant. Suppose as , then with sufficiently small , for any realization of ,
(26) |
and
(27) |
The constants in big- only depend on and are uniform for all and .
Proof of Lemma 4.3.
For any , . Since , take in Lemma 2.2 to be , when , the three equations hold. By (13), truncate at an Euclidean ball,
We are ready to prove Proposition 4.1.
Proof of Proposition 4.1.
For fixed , since , define
(30) |
and is a fixed constant determined by and . For , let
For , where are normalized s.t.
(31) |
let , , then . Because , and , . Thus, under the assumption of the current proposition, the condition needed in Proposition 3.1 is satisfied, and then when is sufficiently large, there is an event which happens w.p. , under which
(32) |
We first show that are linearly independent by considering . By definition, for ,
and for , ,
Because , under the condition of the proposition, satisfies the condition in Lemma 4.2, and thus, with sufficiently large , there is an event which happens w.p. , under which
where we used that the factor is a fixed constant. Meanwhile, applying (27) in Lemma 4.3 where , and note that
we have that
and by that which is a fixed constant, so is , we have that
(33) |
Putting together, we have that
(34) |
This proves linear independence of when is large enough, since .
We consider first eigenvalues of , . For each , let be a -dimensional subspace in , then by (21),
(35) |
For any , , there is , , such that . Thus
Because are orthogonal, , we have that
By definition, , and .
We first upper bound the numerator of the r.h.s. of (35). By that ,
(36) |
We have already obtained the good event when applying Lemma 4.2 with . We apply the lemma again to , which gives that with sufficiently large there is an event which happens , and then under ,
(37) |
We track the constant dependence here: the constant in in Lemma 4.2 is only depending on (and not on ), thus we use the notation in (37) and below to emphasize that the constant is -dependent only and independent from . Then (37) gives that
The UB of follows from (26) in Lemma 4.3, with the shorthand that stands for ,
Thus, (36) continues as
(38) |
Next we lower bound the denominator . Here we use (27) in Lemma 4.3, which gives that
Note that we assume under event so that the eigenvalue UB (32) holds, thus . Together with that is a fixed constant, we have that
Then, again under ,
Putting together, and by that , we have that
where , and is a constant only depending on . We set
and since we assume in the current proposition, we have that . Then, comparing to l.h.s. of (35), we have that
By the relation that for any , , and when is sufficiently small s.t. ,
Noting that for , , because . Thus, when is sufficiently small and the term is less than , under the good events , which happens w.p. , we have that
Recall that by definition (30), , then , also . Re-arranging the terms gives that . This can be verified for all , and note that the good event is w.r.t , and is constructed for fixed , and none is for specific . ∎
4.2 Random-walk graph Laplacian eigenvalue crude LB
The counterpart result of random-walk graph Laplacian is the following proposition. It replaces Proposition 3.1 with Proposition 3.6 in obtaining the eigenvalue UB in the analysis, and consequently the high probability differs slightly.
Proposition 4.4 (Initial crude eigenvalue LB of ).
Under the same condition and setting of , being uniform, being Gaussian, and , , same as in Proposition 4.1. Then, for sufficiently large , w.p., , for .
5 Steps 2-3 and eigen-convergence
[Figure 1: illustration of the first $K$ low-lying empirical eigenvalues falling into disjoint intervals around the population eigenvalues, as guaranteed by the crude bounds in Step 1.]
In this section, we obtain eigen-convergence rate of and from the initial crude eigenvalue bound in Step 1. We first derive the Steps 2-3 for , and the proof for is similar.
5.1 Step 2 eigenvector consistency
In Step 1, the crude bound of eigenvalues (the UB already matches the form rate, the LB is crude) gives that, for fixed $K$ and at large $n$, each empirical eigenvalue will fall into an interval whose width is less than half of the smallest of the first $K$ eigenvalue gaps, illustrated in Fig. 1. This means that each empirical eigenvalue is separated from the neighboring population eigenvalues by a distance bounded away from zero. This initial separation is enough for proving eigenvector consistency up to the point-wise rate, which is a standard argument, see e.g. the proof of Theorem 2.6 part 2) in [7]. Below we provide an informal explanation and then the formal statement in Proposition 5.2, with a proof for completeness.
We first give an illustrative informal derivation. Take for example, let , , and we want to show that and are aligned.
the point-wise convergence of graph Laplacian gives bound of the residual vector , suppose . Meanwhile, for any , the crude bound of eigenvalues gives that
where is an constant determined by and . Because empirical eigenvalues are sorted, for are also away from . As a result,
Then we use the relation that for each , , which gives that
This shows that has alignment with all the other eigenvectors than , and since are orthogonal basis in , this guarantees alignment between and .
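A hedged reconstruction of this computation, written for a generic index $k$ and writing $\gamma_K$ for the smallest first-$K$ eigen-gap of (22) (the symbol is assumed here): let $\phi_k := \rho_X \psi_k / \|\rho_X \psi_k\|_2$ and let the residual be $r := L_{un}\phi_k - \mu_k \phi_k$, whose 2-norm is controlled by the point-wise rate. Expanding in the orthonormal eigenbasis $\{v_j\}$ of $L_{un}$,
\[
  \langle v_j, r\rangle \;=\; (\lambda_j - \mu_k)\,\langle v_j, \phi_k\rangle ,
  \qquad j \neq k,
\]
and since the crude eigenvalue bound keeps $\lvert \lambda_j - \mu_k\rvert$ of order $\gamma_K$ for $j \neq k$,
\[
  \sum_{j \neq k} \langle v_j, \phi_k\rangle^2
  \;=\; \sum_{j \neq k} \frac{\langle v_j, r\rangle^2}{(\lambda_j - \mu_k)^2}
  \;\le\; \frac{C\,\|r\|_2^2}{\gamma_K^2},
  \qquad\text{hence}\qquad
  \lvert\langle v_k, \phi_k\rangle\rvert^2 \;\ge\; 1 - \frac{C\,\|r\|_2^2}{\gamma_K^2},
\]
which is the claimed alignment between $v_k$ and $\phi_k$ up to the point-wise rate.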
To proceed, we use the point-wise rate of graph Laplacian with kernel as in the next theorem. The analysis of point-wise convergence was given in [27] and [9]: The original theorem in [27] considers the normalized graph Laplacian . The analysis is similar for and leads to the same rate, which was derived in [9] under the setting of variable kernel bandwidth. These previous works consider a fixed point on , and since the concentration result has exponentially high probability, it directly gives the version of uniform error bound at every data point , which is needed here.
Theorem 5.1 ([27, 9]).
1) When is large enough, w.p. ,
2) When is large enough, w.p. ,
The constants in the big-O notations depend on , and the norm of .
Note that Theorem 5.1 holds for non-uniform , while in our eigen-convergence analysis of graph Laplacian with in below, we only use the result when is uniform. Meanwhile, similar to Theorem 3.2, Assumption 2(C3) may be relaxed for Theorem 5.1 to hold, c.f. Remark 1.
Proof of Theorem 5.1.
Consider the events such that is less than the error bound. For each of the -th event, condition on , Theorem 3.8 in [9] can be directly used to show that the event holds w.p. for the case 1) random-walk graph Laplacian. For the case 2) un-normalized graph Laplacian, adopting the same technique of Theorem 3.6 in [9] proves the same rate as for the fixed-bandwidth kernel, and gives that the event holds w.p. . Specifically, the proof is by showing the concentration of the , which is an independent summation condition on . The r.v. , , has expectation , and can be shown to be bounded by , and is also bounded by , following the same calculation as in the proof of Theorem 3.6 in [9]. This shows that the bias error is , and the variance error is , by classical Bernstein. Same as in Theorem 3.2, regularity and decay up to 2nd derivative of are enough here.
Strictly speaking, the analysis in [9] is for the “” summation and not the “” one here. However, the difference between and only introduces an relative error and is of higher order, and the term cancels out in the summation of . In proving this large deviation bound at , the needed threshold for large is determined by and uniform for . Then, when exceeds a threshold uniform for all , by the independence of the ’s, the -th event holds w.p. and for cases 1) and 2) respectively. The current theorem, in both 1) and 2), follows by a union bound. ∎
We are ready for Step 2 for the unnormalized graph Laplacian . Here we consider eigenvectors normalized to have 2-norm 1, i.e., , , and we compare to
(39) |
where are population eigenfunctions which are orthonormal in , same as above.
Proposition 5.2.
Proof of Proposition 5.2.
The proof uses the same approach as that of Theorem 2.6 part 2) in [7], and since our setting is different, we include a proof for completeness.
When , we always have , is the constant vector , and is the constant function, and thus up to a sign. Under the condition of the current proposition, the assumptions of Proposition 4.1 are satisfied, and because implies that , the assumptions of Theorem 5.1 2) are also satisfied. We apply Theorem 5.1 2) to the functions . By a union bound, we have that when is large enough, w.p. , for . By that for any , this gives that there is ,
(40) |
The constants in big-O depends on first eigenfunctions and are absolute ones because is fixed. Applying Proposition 4.1, and consider the intersection with the good event in Proposition 4.1, we have for each , . By definition of as in (22),
(41) |
For each , let be the 1-dimensional subspace in , and let be its orthogonal complement. We will show that is small. By definition, , and meanwhile, . Subtracting the two gives that . By that are orthonormal vectors, and (41),
Then, combined with (40), we have that , namely, .
By definition, , where . Note that are unit vectors up to an error: Because the good event in Proposition 4.1 is under that in the eigenvalue UB Proposition 3.1, and specifically that of Lemma 3.4. Thus (17) holds, which means that , , where . Then, one can verify that
(42) |
and then we set , and have that
The bound holds for each . ∎
5.2 Step 3: refined eigenvalue LB
We now derive Step 3 for , the result being summarized in the following proposition.
Proposition 5.3.
Under the same condition of Proposition 5.2, is fixed. Then, for sufficiently large , with the same indicated high probability,
Proof of Proposition 5.3.
We inherit the notations in the proof of Proposition 5.2. Again . For , note that
(43) |
and meanwhile, we have shown that , where and . Thus the l.h.s. of (43) equals
By definition of , . The good event in Proposition 5.2 is under the good event , under which Lemma 3.3 and Lemma 3.4 hold. Then by (16), ; By (17), . Putting together, and by that ,
Meanwhile, by (40), , and then
Because for some , , thus , and then . Back to (43), we have that
and by (42), , thus . The above holds for all . ∎
5.3 Eigen-convergence rate
We are ready to prove the main theorems on eigen-convergence of graph Laplacians, when is uniform and the kernel function is Gaussian.
Theorem 5.4 (eigen-convergence of ).
Under Assumption 1 (A1), is uniform on , and is Gaussian. For fixed, assume that the eigenvalues for are all single multiplicity, and the constant as in Proposition 4.1. Consider first eigenvalues and eigenvectors of , , , and the vectors are defined as in (39). If as , , , then for sufficiently large , w.p. ,
(44) |
and there exist scalars , actually , such that
(45) |
Remark 3 (Choice of $\epsilon$ and overall rates).
The eigen-convergence bounds (44) and (45) are provided in the combined bias-variance form, as long as the conditions on $\epsilon$ and $n$ in the theorem hold. The bias error in both cases is $O(\epsilon)$, and the variance error has a different inverse power of $\epsilon$ ($\epsilon^{d/2}$ and $\epsilon^{d/2+1}$ respectively). The eigenvalue convergence (44) achieves the form rate $O\big(\epsilon + \sqrt{\log n/(n\epsilon^{d/2})}\big)$, which is the rate of the Dirichlet form convergence, c.f. Theorem 3.2. The (2-norm) eigenvector convergence (45) achieves the point-wise rate $O\big(\epsilon + \sqrt{\log n/(n\epsilon^{d/2+1})}\big)$, which is the rate of point-wise convergence of the graph Laplacian, c.f. Theorem 5.1.
The different powers of $\epsilon$ lead to different optimal choices of $\epsilon$, in order of $n$, to achieve the best overall rates for eigenvalue and eigenvector convergence respectively. Specifically,
- The optimal choice of $\epsilon$ to minimize the eigenvalue error is $\epsilon \sim (\log n/n)^{1/(d/2+2)}$ (which is also the smallest order of $\epsilon$ allowed by the theorem). This choice leads to an eigenvalue error of $\tilde{O}(n^{-1/(d/2+2)})$, which is the best overall rate of eigenvalue convergence by our theory. We use $\tilde{O}$ to denote the involvement of a certain factor of $\log n$. In this case, the eigenvector error is $\tilde{O}(n^{-1/(d+4)})$.
- The optimal choice of $\epsilon$ to minimize the eigenvector error is $\epsilon \sim (\log n/n)^{1/(d/2+3)}$, which leads to an eigenvector error of $\tilde{O}(n^{-1/(d/2+3)})$, the best overall rate of eigenvector convergence. In this case, the eigenvalue error is also $\tilde{O}(n^{-1/(d/2+3)})$.
We can see that the overall rate of eigenvalue convergence achieves the best overall rate of form convergence, and that of eigenvector (2-norm) convergence achieves the best overall rate of point-wise convergence, at the optimal $\epsilon$ for each convergence respectively.
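A small helper reproducing the bandwidth orders in this remark (constants set to one; the expressions are exact only up to these omitted constants):

```python
import numpy as np

def bandwidth_orders(n, d):
    """Orders of the diffusion-time bandwidth eps from Remark 3:
    eps_val balances the form rate      eps + sqrt(log n / (n eps^{d/2})),
    eps_vec balances the point-wise rate eps + sqrt(log n / (n eps^{d/2 + 1}))."""
    eps_val = (np.log(n) / n) ** (1.0 / (d / 2 + 2))  # eigenvalue rate ~ (log n/n)^{1/(d/2+2)}
    eps_vec = (np.log(n) / n) ** (1.0 / (d / 2 + 3))  # eigenvector rate ~ (log n/n)^{1/(d/2+3)}
    return eps_val, eps_vec
```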
Proof of Theorem 5.4.
Remark 4 (Comparison to compactly supported $h$).
For the indicator kernel $h$ (see also Remark 2), the point-wise convergence of the graph Laplacian is known to have the rate $O\big(\sqrt{\epsilon} + \sqrt{\log n/(n\epsilon^{d/2+1})}\big)$, see [19, 4, 27, 7] among others. While our way of Step 1 cannot be applied to such $h$, [7] covered this case and provided the eigenvalue and eigenvector consistency up to this point-wise rate. The scaling $\epsilon \sim (\log n/n)^{1/(d/2+2)}$ is the optimal one to balance the bias and variance errors, and then it gives the overall error rate $\tilde{\Theta}(n^{-1/(d+4)})$, which agrees with the eigen-convergence rate in [7]. Here $\tilde{O}$ and $\tilde{\Theta}$ indicate that the constant is possibly multiplied by a factor of a certain power of $\log n$. Meanwhile, we note that, if following our approach of using the Dirichlet form convergence rate, the eigenvalue consistency can be improved to be squared, namely $\tilde{O}(n^{-2/(d+4)})$, when $\epsilon \sim (\log n/n)^{1/(d/2+2)}$. Specifically, by Remark 2, the Dirichlet form convergence with the indicator $h$ is $O\big(\epsilon + \sqrt{\log n/(n\epsilon^{d/2})}\big)$. Then, once the initial crude eigenvalue LB is established, in Step 2, the eigenvector 2-norm consistency can be shown up to the point-wise rate. In Step 3, the eigenvalue consistency for the first $K$ eigenvalues can be shown up to the form rate. This would imply the eigenvalue convergence rate of $\tilde{O}(n^{-2/(d+4)})$ under the regime where $\epsilon \sim (\log n/n)^{1/(d/2+2)}$, while the eigenvector consistency remains $\tilde{O}(n^{-1/(d+4)})$. Compared to Remark 3, these rates are the same as with the Gaussian kernel when setting $\epsilon \sim (\log n/n)^{1/(d/2+2)}$ (the optimal order to minimize the eigenvalue rate). However, using the Gaussian kernel allows one to obtain a better rate for eigenvector convergence, namely $\tilde{O}(n^{-1/(d/2+3)})$, by setting $\epsilon \sim (\log n/n)^{1/(d/2+3)}$ (the optimal order to minimize the eigenvector convergence rate). This improved eigenvector (2-norm) rate is due to the improved point-wise rate of the smooth kernel over that of the indicator kernel, and specifically, the bias error is $O(\epsilon)$ instead of $O(\sqrt{\epsilon})$.
Remark 5 (Extension to larger eigenvalue multiplicity).
The result extends when the population eigenvalues have multiplicity greater than one. Suppose we consider , which are distinct eigenvalues, and has multiplicity . Then let , , , and are sorted eigenvalues and associated eigenfunctions. Step 0. eigenvalue UB holds, since Proposition 3.1 does not require single multiplicity. In Step 1, the only place in Proposition 4.1 where single multiplicity of is used is in the definition of . Then, by changing to
(46) |
and defining , is a positive constant depending on and , Proposition 4.1 proves that for all , i.e. . This allows to extend Step 2 Proposition 5.2 by considering the projection where the subspace in is spanned by eigenvectors whose eigenvalues approaches , similar as in the original proof of Theorem 2.6 part 2) in [7]. Specifically, suppose , , let , and the index set . For eigenfunction , , then , similarly as in the proof of Proposition 5.2, one can verify that
which gives that , for all . By that are near orthonormal with large (Lemma 3.4), this proves that there exists an -by- orthogonal transform , and , such that , , where , and the notation stands for the -by- matrix formed by concatenating the vectors as columns. This proves consistency of empirical eigenvectors up to the point-wise rate for . Finally, Step 3 Proposition 5.3 extends by considering (43) for and , making use of , the Dirichlet form convergence of (Lemma 3.3), and that is transformed from by an orthogonal matrix .
To address the eigen-convergence of , we define the -weighted 2-norm as
and recall that eigenvectors of are -orthogonal. The following theorem is the counterpart of Theorem 5.4 for , obtaining the same rates.
Theorem 5.5 (eigen-convergence of ).
Under the same condition and setting of , being uniform, being Gaussian, and , K, , same as in Theorem 5.4. Consider first eigenvalues and eigenvectors of , , , i.e. , and the vectors defined as in (39). Then, for sufficiently large , w.p. , , and the same bound of and as in Theorem 5.4 hold for , with certain scalars satisfying ,
The extension to when has greater than 1 multiplicity is possible, similarly as in Remark 5. The proof of uses almost the same method as for , and the difference is that are no longer orthonormal but -orthogonal. This is handled by that and agrees in relative error up to the form rate, due to the concentration of (Lemma 3.5). The detailed proof is left to Appendix C.3.
6 Density-corrected graph Laplacian
We consider as in Assumption 1(A2). The density-corrected graph Laplacian is defined as [10]
where and are as before, and is the degree matrix of . The density-corrected graph Laplacian recovers the Laplace-Beltrami operator when is not uniform. In this section, we extend the theory of point-wise convergence, Dirichlet form convergence, and eigen-convergence to such graph Laplacians.
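A minimal sketch of this construction, assuming (consistently with the $\widetilde{D}$-orthogonality of the eigenvectors used below) that the density-corrected Laplacian is the random-walk-type Laplacian built from $\widetilde{W} = D^{-1} W D^{-1}$ and its degree matrix $\widetilde{D}$, with the constant normalization left as a placeholder:

```python
import numpy as np

def density_corrected_laplacian(W, c_norm=1.0):
    """Density-corrected graph Laplacian of [10]: normalize the affinity from both
    sides by the degrees, W_tilde = D^{-1} W D^{-1}, then form the random-walk-type
    Laplacian from W_tilde and its degree matrix D_tilde. c_norm stands in for the
    constant making it converge to the Laplace-Beltrami operator."""
    deg = W.sum(axis=1)
    W_t = W / np.outer(deg, deg)        # W_tilde = D^{-1} W D^{-1}
    deg_t = W_t.sum(axis=1)             # D_tilde
    n = W.shape[0]
    return (np.eye(n) - W_t / deg_t[:, None]) / c_norm
```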
6.1 Point-wise convergence of
This subsection proves Theorem 6.2, which shows that the point-wise rate of the density-corrected graph Laplacian is the same as that without the density correction. The result is for general differentiable satisfying Assumption 2, which can be of independent interest.
We first establish the counterpart of Lemma 3.5 about the concentration of all when is not uniform. The deviation bound is uniform for all and has an bias error at .
Lemma 6.1.
1) When is large enough, w.p. , for all s.t. is well-defined, and
(47) |
where is determined by manifold extrinsic coordinates, and .
2) When is large enough, w.p. , for all s.t. is well-defined, and
(48) |
The constants in big- in parts 1) and 2) depend on (, and are uniform for all .
The proof is left to Appendix D. The following theorem proves the point-wise rate of .
Theorem 6.2.
The theorem slightly improves the point-wise convergence rate of in [28]. It is proved using the same techniques as the analysis of point-wise convergence of in [27, 9], and we include a proof for completeness here.
Proof of Theorem 6.2.
By definition,
(49) |
The proof of Lemma 6.1 has constructed two good events and ( is for Part 1) to hold, Part 2) assumes and ), such that with large enough , happens w.p. , under which , for all , and are well-defined, and equations (47), (A.21), and (48) hold. (48) provides the concentration of the denominator of the r.h.s. of (49). We now consider the numerator. Note that, with sufficiently small , is uniformly bounded from below by constant . This is because , is compact, then is uniformly bounded, and meanwhile is uniformly bounded from below. Thus, under ,
and the equation equals
and we analyze the two terms respectively.
To bound , we use and again that to have
We claim that, for large enough , w.p. , and we call this good event , under which
(50) |
and the proof is in below. With (50), under , can be bounded by
(51) |
The analysis of uses concentration of independent sum again. Condition on and consider
and we have . Due to uniform boundedness of from below by , are bounded by . We claim that the expectation (proof in below)
(52) |
The variance of is bounded by
which follows the same derivation as in the proof of the point-wise convergence of without density-correction, c.f. Theorem 5.1 1), and can be directly verified by a similar calculation as in (54). We attempt at the large deviation bound at which is of small order than under the theorem condition that . Thus the classical Bernstein gives that for large enough , where the threshold is determined by and uniform for , w.p. ,
and as a result,
(53) |
By a union bound over the events needed at points, we have that (53) holds at all under a good event which happens w.p. .
Putting together, under and , by (51) and (53), at all ,
Combined with (48), under ,
It remains to establish (50) and (52)
to finish the proof of the theorem.
Proof of (50): Define r.v. and condition on , for , . Let , for any , when , then
The is obtained because , are finite constants, and
(54) |
where is the projected coordinates in the tangent plane , and the comparison of to (namely ) and the volume comparison (namely ) hold when which is a constant depending on , see e.g. Lemma A.1 in [9].
Meanwhile, is bounded by , and the variance of is bounded by and then bounded by , by a similar calculation as in (54). We attempt at the large deviation bound at which is of small order than under the theorem condition that . Thus, for each , when is enough where the threshold is determined by and uniform for , w.p. ,
The term in (50) equals zero.
By the same argument of independence of from
and the union bound over events, we have proved (50).
6.2 Dirichlet form convergence of density-corrected graph Laplacian
The graph Dirichlet form of density-corrected graph Laplacian is defined as
(55) |
We establish the counterpart of Theorem 3.2, which achieves the same form rate. The theorem is for general differentiable , which can be of independent interest.
Theorem 6.3.
Proof of Theorem 6.3.
By definition (55),
The following lemma (proved in Appendix D) makes use of the concentration of to reduce the graph Dirichlet form to a V-statistic up to a relative error at the form rate.
Lemma 6.4.
Under the good event in Lemma 6.1 1),
and the constant in big- is determined by and uniform for all .
We consider under the good event in Lemma 6.1 1), which is called and happens w.p. . Then applying Lemma 6.4 with , we have that
(56) |
The term in (56) equals , where , and . We follow the same approach as in the proof of Theorem 3.4 in [9] to analyze this V-statistic, and show that (proof in Appendix D)
(57) |
Back to (56), we have shown that under ,
and the constant in big- depends on , and . ∎
6.3 Eigen convergence of
In this subsection, let be the eigenvalues of and the associated eigenvectors. By (55), recalling that , the analogue of (8) is the following
(58) |
The methodology is the same as before, with a main difference in the definition of the heat interpolation mapping with weights as in (59). This gives rise to the -weighted quadratic form defined in (60), for which we derive the concentration argument of for in (A.33) and the upper bound of in Lemma D.2. The other difference is that the -weighted 2-norm is considered because the eigenvectors are -orthogonal. All the proofs of Steps 0-3 and Theorem 6.7 are left to Appendix D.
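As a computational aside, the -orthogonality of the eigenvectors noted above means that the eigenpairs of the density-corrected random-walk graph Laplacian can be computed as a symmetric generalized eigenproblem, with the degree matrix of the density-corrected affinity on the right-hand side. A minimal sketch, assuming the density-corrected affinity is the kernelized matrix normalized by its degree matrix from both sides and omitting the paper's scaling of the eigenvalues by the kernel bandwidth:

```python
import numpy as np
from scipy.linalg import eigh

def density_corrected_eigenpairs(W, k):
    """First k eigenpairs of the random-walk Laplacian of the density-corrected
    affinity W1 = D^{-1} W D^{-1}.  The random-walk Laplacian D1^{-1}(D1 - W1)
    is not symmetric, but its eigenproblem is equivalent to the symmetric
    generalized problem (D1 - W1) v = mu * D1 v, and scipy returns eigenvectors
    that are D1-orthonormal, matching the weighted 2-norm used in the text.
    The 1/eps scaling of the eigenvalues is omitted (assumption)."""
    d = W.sum(axis=1)
    W1 = W / np.outer(d, d)            # density-corrected affinity
    D1 = np.diag(W1.sum(axis=1))       # degree matrix of W1
    mu, V = eigh(D1 - W1, D1)          # ascending eigenvalues; V.T @ D1 @ V = I
    return mu[:k], V[:, :k]
```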
Step 0. We first establish eigenvalue UB based on Lemma 6.1 and the form convergence in Theorem 6.3.
Proposition 6.5 (Eigenvalue UB of ).
Step 1. Eigenvalue crude LB. We prove with the -weighted interpolation mapping defined as
(59) |
Then, same as before, , and , where for ,
(60) |
Proposition 6.6 (Initial crude eigenvalue LB of ).
Steps 2-3. We prove eigenvector consistency and refined eigenvalue convergence rate. Define
(61) |
The proof uses the same techniques as before, and the difference is in handling the -orthogonality of the eigenvectors and in using the concentration arguments in Lemma 6.1. Same as before, extension to the case when has multiplicity greater than 1 is possible (Remark 5).
Theorem 6.7 (eigen-convergence of ).
Under the same conditions and setting of , being uniform, being Gaussian, and , , , as in Theorem 5.4, where the definition of is the same except that is a constant depending on . Consider the first eigenvalues and eigenvectors of , , and are normalized s.t. . Define, for ,
Then, for sufficiently large , w.p. , and the same bounds as in Theorem 5.4 hold for and , for , with certain scalars satisfying ,
7 Numerical experiments
This section gives numerical results on the point-wise convergence and eigen-convergence of graph Laplacians built from simulated manifold data. Codes are released at https://github.com/xycheng/eigconvergence_gaussian_kernel.
7.1 Eigen-convergence of
We test on two simulated datasets, which are uniformly sampled on (embedded in , the formula is in Appendix A) and the unit sphere (embedded in ). For both datasets, we compute over an increasing number of samples and a range of values of , where the grid points of both and are evenly spaced on a log scale. For each value of and , we generate data points, construct the kernelized matrix as defined in (1) with Gaussian , and compute the first 10 eigenvalues and eigenvectors of . The errors are computed by
(62) |
where is as defined by (39). The experiment is repeated for 500 replicas, from which the averaged empirical errors are computed. For the data on , . The manifold (in the first 3 coordinates) is illustrated in Fig. 4(a), but the density is uniform here. See more details in Appendix A. For the data on , . These ranges are chosen so that the minimal error over for each is observed, at least for . Note that for , the population eigenvalues starting from are of multiplicity 2, and for , the multiplicities are 3, 5, .
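A minimal sketch of such an experiment for the circle dataset follows; it uses the standard embedding of the unit circle into the plane as a stand-in for the isometric embedding of Appendix A, and compares scale-invariant eigenvalue ratios so that the kernel's normalizing constant, which only rescales the graph-Laplacian eigenvalues, does not enter.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Uniform samples on the unit circle; a simplified stand-in for the
# embedding used in the paper (Appendix A).
n, eps = 1000, 0.02
t = rng.uniform(0.0, 2.0 * np.pi, size=n)
X = np.stack([np.cos(t), np.sin(t)], axis=1)

# Gaussian affinity; the kernel's normalizing constant is omitted.
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq / (4.0 * eps))
D = np.diag(W.sum(axis=1))

# Random-walk graph Laplacian eigenproblem, solved as a symmetric
# generalized problem; eigenvalues are returned in ascending order.
mu, V = eigh(D - W, D)

# Laplace-Beltrami eigenvalues on the unit circle: 0, 1, 1, 4, 4, 9, 9, ...
pop = np.array([0, 1, 1, 4, 4, 9, 9, 16, 16, 25], dtype=float)
# The overall scale of mu depends on the omitted kernel constant, so compare
# the scale-invariant ratios mu_k / mu_1 against the population ratios.
print(np.round(mu[2:10] / mu[1], 3))
print(pop[2:10] / pop[1])
```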
The results are shown in Figures 2 and 3. For data on , Fig. 2 (a) shows that as a function of (with post-selected best ) shows a convergence order of about , which is consistent with the theoretical bound of in Theorem 5.5, since here. In the left plot of the colored field, the log error values are smoothed over the grid of and , and the best scales with as about . The empirical scaling of the optimal is less stable to observe: depending on the level of smoothing, the slope of varies between -0.2 and -0.5 (the left plot), while the slope of the best (log) error is always about -0.4 (the right plot). The result without smoothing is shown in Fig. A.1. The eigenvector error in Fig. 2(b) shows an order of about , which is better than the theoretical prediction. For the data on , the eigenvalue convergence shows an order of about , in agreement with the theoretical rate of when . The eigenvector error again shows an order of about , which is better than the theory. The small error of eigenvector estimation at very large values of may be due to the symmetry of the simple manifolds and . In both experiments, the eigenvector estimation prefers a much larger value of than the eigenvalue estimation, which is consistent with the theory.
Figures 2 and 3: eigenvalue and eigenvector estimation errors for the two simulated datasets.
7.2 Density-corrected graph Laplacian
To examine the density-corrected graph Laplacian, we switch to a non-uniform density on , illustrated in Fig. 4(a). We first investigate the point-wise convergence of to on a test function ; see more details in Appendix A. The error is computed as
(63) |
and the result is shown in Fig. 4. Theorem 6.2 predicts the bias error to be and the variance error to be since is fixed, which agrees with Fig. 4(d).
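A sketch of such a point-wise check is given below. The sampling density and the test function of the paper are not reproduced here, so a hypothetical density proportional to 1 + 0.5 sin(t) on the circle and the test function cos(t) are used for illustration; a single fitted scalar absorbs the kernel's omitted normalizing constant and the sign convention of the Laplacian, and the reported relative residual plays the role of the point-wise error (63).

```python
import numpy as np

rng = np.random.default_rng(1)

# Rejection sampling from a hypothetical non-uniform density on the circle,
# proportional to 1 + 0.5*sin(t); not the density used in the paper.
def sample_circle(n):
    samples = []
    while len(samples) < n:
        t = rng.uniform(0.0, 2.0 * np.pi, size=n)
        keep = rng.uniform(0.0, 1.5, size=n) < 1.0 + 0.5 * np.sin(t)
        samples.extend(t[keep].tolist())
    return np.array(samples[:n])

n, eps = 2000, 0.01
t = sample_circle(n)
X = np.stack([np.cos(t), np.sin(t)], axis=1)   # simplified circle embedding
f = np.cos(t)                                  # hypothetical test function
lap_f = np.cos(t)                              # minus its second derivative on the circle

# Density-corrected graph Laplacian applied to f.
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq / (4.0 * eps))                  # kernel constant omitted
W1 = W / np.outer(W.sum(axis=1), W.sum(axis=1))
d1 = W1.sum(axis=1)
Lf = (d1 * f - W1 @ f) / (d1 * eps)            # random-walk Laplacian of W1, scaled by 1/eps

# Fit one scalar to absorb the omitted constant, then report the relative residual.
c = np.dot(Lf, lap_f) / np.dot(lap_f, lap_f)
print(np.linalg.norm(Lf - c * lap_f) / np.linalg.norm(c * lap_f))
```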
8 Discussion
The current result may be extended in several directions. First, for a manifold with smooth boundary, the random-walk graph Laplacian recovers the Neumann Laplacian [10], and one can expect to prove the spectral convergence as well, as in [22]. Second, extension to kernels with variable or adaptive bandwidth [5, 9], and to other normalization schemes, e.g., bi-stochastic normalization [23, 20, 36], would be important for improving the robustness against low sampling density and noise in data, and possibly the spectral convergence as well. Related is the problem of spectral convergence to other manifold diffusion operators, e.g., the Fokker-Planck operator, on . It would also be interesting to extend the spectral convergence to more general types of kernel functions, which may be non-Gaussian and even non-symmetric [37]. Relaxing the condition on the kernel bandwidth can also be useful: the optimal transport approach was able to show spectral consistency in the regime just beyond graph connectivity, namely when [7], which is less restrictive than the condition needed by the Gaussian kernel in the current paper. Being able to extend the analysis to very sparse graphs is important for applications. Lastly, further investigation is needed to explain the good spectral convergence observed in experiments, particularly the eigenvector convergence and the faster rate of the density-corrected graph Laplacian. For the eigenvector convergence, the current work focuses on the 2-norm consistency, while the -norm consistency, as derived in [11, 8], is also important to study.
Acknowledgement
The authors thank Hau-Tieng Wu for helpful discussion. Cheng thanks Yiping Lu for helpful discussion on the eigen-convergence problem. The work is supported by NSF DMS-2007040. XC is partially supported by NSF, NIH, and the Alfred P. Sloan Foundation.
References
- [1] Donald Gary Aronson. Bounds for the fundamental solution of a parabolic equation. Bulletin of the American Mathematical Society, 73(6):890–896, 1967.
- [2] Mukund Balasubramanian and Eric L Schwartz. The isomap algorithm and topological stability. Science, 295(5552):7–7, 2002.
- [3] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
- [4] Mikhail Belkin and Partha Niyogi. Convergence of Laplacian eigenmaps. In Advances in Neural Information Processing Systems, pages 129–136, 2007.
- [5] Tyrus Berry and John Harlim. Variable bandwidth diffusion kernels. Applied and Computational Harmonic Analysis, 40(1):68–96, 2016.
- [6] Dmitri Burago, Sergei Ivanov, and Yaroslav Kurylev. A graph discretization of the Laplace-Beltrami operator. Journal of Spectral Theory, 4(4):675–714, 2014.
- [7] Jeff Calder and Nicolas Garcia Trillos. Improved spectral convergence rates for graph Laplacians on -graphs and k-NN graphs. Applied and Computational Harmonic Analysis, 60:123–175, 2022.
- [8] Jeff Calder, Nicolas Garcia Trillos, and Marta Lewicka. Lipschitz regularity of graph Laplacians on random data clouds. SIAM Journal on Mathematical Analysis, 54(1):1169–1222, 2022.
- [9] Xiuyuan Cheng and Hau-Tieng Wu. Convergence of graph Laplacian with knn self-tuned kernels. Information and Inference: A Journal of the IMA, 2021.
- [10] Ronald R Coifman and Stéphane Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.
- [11] David B Dunson, Hau-Tieng Wu, and Nan Wu. Spectral convergence of graph Laplacian and heat kernel reconstruction in from random samples. Applied and Computational Harmonic Analysis, 55:282–336, 2021.
- [12] Ahmed El Alaoui, Xiang Cheng, Aaditya Ramdas, Martin J Wainwright, and Michael I Jordan. Asymptotic behavior of -based Laplacian regularization in semi-supervised learning. In Conference on Learning Theory, pages 879–906, 2016.
- [13] Noureddine El Karoui and Hau-Tieng Wu. Graph connection Laplacian methods can be made robust to noise. The Annals of Statistics, 44(1):346–372, 2016.
- [14] Justin Eldridge, Mikhail Belkin, and Yusu Wang. Unperturbed: spectral analysis beyond Davis-Kahan. arXiv preprint arXiv:1706.06516, 2017.
- [15] M Flores, J Calder, and G Lerman. Algorithms for lp-based semi-supervised learning on graphs. arXiv preprint arXiv:1901.05031, 2019.
- [16] Alexander Grigor’yan. Gaussian upper bounds for the heat kernel on arbitrary manifolds. Journal of Differential Geometry, 45:33–52, 1997.
- [17] Alexander Grigor’yan. Heat kernel and analysis on manifolds, volume 47. American Mathematical Society, Providence, RI, 2009.
- [18] Matthias Hein. Uniform convergence of adaptive graph-based regularization. In International Conference on Computational Learning Theory, pages 50–64. Springer, 2006.
- [19] Matthias Hein, Jean-Yves Audibert, and Ulrike Von Luxburg. From graphs to manifolds–weak and strong pointwise consistency of graph Laplacians. In International Conference on Computational Learning Theory, pages 470–485. Springer, 2005.
- [20] Boris Landa, Ronald R Coifman, and Yuval Kluger. Doubly-stochastic normalization of the Gaussian kernel is robust to heteroskedastic noise. arXiv preprint arXiv:2006.00402, 2020.
- [21] Peter Li, Shing Tung Yau, et al. On the parabolic kernel of the Schrödinger operator. Acta Mathematica, 156:153–201, 1986.
- [22] Jinpeng Lu. Graph approximations to the Laplacian spectra. Journal of Topology and Analysis, pages 1–35, 2020.
- [23] Nicholas F Marshall and Ronald R Coifman. Manifold learning with bi-stochastic kernels. IMA Journal of Applied Mathematics, 84(3):455–482, 2019.
- [24] Boaz Nadler, Nathan Srebro, and Xueyuan Zhou. Semi-supervised learning with the graph Laplacian: The limit of infinite unlabelled data. Advances in Neural Information Processing Systems, 22:1330–1338, 2009.
- [25] Steven Rosenberg. The Laplacian on a Riemannian manifold: An introduction to analysis on manifolds. Number 31. Cambridge University Press, 1997.
- [26] Zuoqiang Shi. Convergence of Laplacian spectra from random samples. arXiv preprint arXiv:1507.00151, 2015.
- [27] Amit Singer. From graph to manifold Laplacian: The convergence rate. Applied and Computational Harmonic Analysis, 21(1):128–134, 2006.
- [28] Amit Singer and Hau-Tieng Wu. Spectral convergence of the connection Laplacian from random samples. Information and Inference: A Journal of the IMA, 6(1):58–123, 2016.
- [29] Dejan Slepcev and Matthew Thorpe. Analysis of p-Laplacian regularization in semisupervised learning. SIAM Journal on Mathematical Analysis, 51(3):2085–2120, 2019.
- [30] Ronen Talmon, Israel Cohen, Sharon Gannot, and Ronald R Coifman. Diffusion maps for signal processing: A deeper look at manifold-learning techniques based on kernels and graphs. IEEE signal processing magazine, 30(4):75–86, 2013.
- [31] Daniel Ting, Ling Huang, and Michael Jordan. An analysis of the convergence of graph Laplacians. arXiv preprint arXiv:1101.5435, 2011.
- [32] Nicolás García Trillos, Moritz Gerlach, Matthias Hein, and Dejan Slepčev. Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace–Beltrami operator. Foundations of Computational Mathematics, 20(4):827–887, 2020.
- [33] Laurens Van Der Maaten, Eric Postma, and Jaap Van den Herik. Dimensionality reduction: a comparative review. J Mach Learn Res, 10(66-71):13, 2009.
- [34] Ulrike Von Luxburg, Mikhail Belkin, and Olivier Bousquet. Consistency of spectral clustering. The Annals of Statistics, pages 555–586, 2008.
- [35] Xu Wang. Spectral convergence rate of graph Laplacian. arXiv preprint arXiv:1510.08110, 2015.
- [36] Caroline L. Wormell and Sebastian Reich. Spectral convergence of diffusion maps: Improved error bounds and an alternative normalization. SIAM Journal on Numerical Analysis, 59(3):1687–1734, 2021.
- [37] Hau-Tieng Wu and Nan Wu. Think globally, fit locally under the manifold setup: Asymptotic analysis of locally linear embedding. Annals of Statistics, 46(6B):3805–3837, 2018.
Appendix A Details of numerical experiments
In the example of data, the isometric embedding in is by
where is the intrinsic coordinate of (arc-length). In the example in Section 7.2 where is not uniform, , and the test function . In the example of data, samples are on the unit sphere in .
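Since the embedding formula is not reproduced above, the following sketch constructs a hypothetical arc-length-parametrized closed curve in a 4-dimensional ambient space (chosen here only for illustration, not the paper's embedding) to show what an isometric embedding of the circle looks like, and verifies numerically that the parametrization has unit speed, so that the intrinsic coordinate is the arc length.

```python
import numpy as np

# A hypothetical isometric (unit-speed) embedding of the circle into a
# 4-dimensional ambient space; NOT the formula used in the paper.
a, k = 0.5, 2                      # illustrative parameters
c = np.sqrt(1.0 + (a * k) ** 2)    # normalization giving |x'(t)| = 1

def embed(t):
    return np.stack([np.cos(t), np.sin(t),
                     a * np.cos(k * t), a * np.sin(k * t)], axis=-1) / c

# Check numerically that the speed is 1, so t is the arc-length coordinate.
t = np.linspace(0.0, 2.0 * np.pi, 100001)
speeds = np.linalg.norm(np.diff(embed(t), axis=0), axis=1) / np.diff(t)
print(speeds.min(), speeds.max())   # both approximately 1
```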
In both plots of the raw error data without smoothing, Figures A.1 and A.2, the slopes of the error convergence rates (about -0.4 and -0.33) are about the same. The slope of the post-selected optimal (log) as a function of (log) changes, due to the closeness of the error values over the multiple values of .
Figures A.1 and A.2: raw error data without smoothing for the two datasets.
Appendix B More preliminaries
Throughout the paper, we use the following version of the classical Bernstein inequality, where the tail probability uses , which is an upper bound of the variance. We use the sub-Gaussian near-tail, which holds when the attempted deviation threshold .
Lemma B.1 (Classical Bernstein).
Let be i.i.d. bounded random variables, , . If and for , then
In particular, when , both the tail probabilities are bounded by .
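As an illustration only (the exact constants of Lemma B.1 are not reproduced above), the following sketch compares the empirical tail probability of the sample mean of bounded i.i.d. variables against the standard two-sided Bernstein bound; the variance proxy and deviation threshold are chosen for the example and are not those used in the proofs.

```python
import numpy as np

rng = np.random.default_rng(2)

# Bounded i.i.d. variables: Uniform[-1, 1], so |Y| <= L = 1 and Var(Y) = 1/3.
n, reps, L, nu = 500, 10000, 1.0, 1.0 / 3.0
Y = rng.uniform(-1.0, 1.0, size=(reps, n))

t = 0.05  # deviation threshold, small relative to nu / L (sub-Gaussian near-tail)
empirical_tail = np.mean(np.abs(Y.mean(axis=1)) > t)

# Standard two-sided Bernstein bound for the deviation of the sample mean.
bernstein_bound = 2.0 * np.exp(-n * t ** 2 / (2.0 * (nu + L * t / 3.0)))
print(f"empirical tail {empirical_tail:.4f} <= bound {bernstein_bound:.4f}")
```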
Additional proofs in Section 2:
Proof of Theorem 2.1.
Part 1): We provide a direct verification of (10) based on the parametrix construction for completeness, which is not explicitly included in [25].
First note that there is , determined by s.t. when ,
for some depending on . This is because up to an truncation error equals the integral on . By changing to the projected coordinate in , the integral domain of is contained in -ball in for small enough , then
Proof of Lemma 2.2.
Let , is a positive integer . Since , and , the Euclidean ball of radius contains the -geodesic ball and is contained in the ()-geodesic ball, for small enough . Then both claims in Theorem 2.1 hold when for some depending on , and in 1) for , . Here, choosing a larger can make the term of higher order in , yet is enough for our later analysis.
Proof of (12): We use the shorthand notation to denote . In Theorem 2.1, is fixed, for are finite constants depending on , thus
Note that , and thus when , . By the property of ,
Meanwhile, by mean value theorem and that ,
and then
Thus, for any ,
which proves (12), and the constants in big- are all determined by .
Appendix C Proofs about graph Laplacians with
C.1 Proofs in Section 3
Proof of (15) in Remark 2.
We want to show that
First consider when is uniform. Denote by the Euclidean ball in centered at with radius . When , , where is the local projected coordinate, i.e., let be the projection onto , , also . is a third-order polynomial whose coefficients depend on the derivatives of the extrinsic coordinates of and at . Then,
(A.1) | |||
and , where we used the volume comparison relation . By the metric comparison, , thus
Meanwhile, the integration of odd power of vanishes on . Thus one can verify that , , and thus the l.h.s. of (A.1) . Integrating over proves that the bias error is . When is not uniform, one can similarly show that and the proof extends. ∎
Proof of Lemma 3.3.
Since is a constant, . Apply Theorem 3.2 to when , and where , which are cases and are all in . Since the set is orthonormal in ,
Under the intersection of the good events, which happens with the indicated high probability, (16) holds. The needed threshold of is the maximum of the many thresholds. These thresholds and the constants in the big-'s depend on and for up to , and is a fixed integer. This means that these constants are determined by , and thus are treated as absolute ones. ∎
Proof of Lemma 3.4.
First, for any , when depending on , w.p. ,
(A.2) |
This is because, by definition, , which is an independent sum of r.v. . , and boundedness , which is constant. The variance of is bounded by , which again is constant. Since , (A.2) follows by the classical Bernstein.
Now consider the vectors . Apply (A.2) to when and for , and consider the intersection of the good events, which happens w.p. , when exceeds the maximum of the thresholds of for the cases. By , and the polar formula , this gives (17). Both the thresholds and all the constants in big-O in (17) depend on . ∎
Proof of Lemma 3.5.
Suppose Part 1) has been shown with uniform constant in big- for each , then under the good event of Part 2), Part 2) holds automatically. In particular, since (19) is a property of the random r.v. only, where are determined by the random points and irrelevant to the vector , the threshold of large is determined by when Part 1) holds and is uniform for all .
It suffices to prove Part 1) to finish proving the lemma. For each , we construct an event under which the bound in (19) holds for , and then apply a union bound. For fixed,
By Assumption 2(C2), , and thus . Consider , which is an independent sum conditioned on and over the randomness of . The r.v.
satisfies that (Lemma 8 in [10], Lemma A.3 in [9])
Boundedness: again by Assumption 2(C2), . Variance of is bounded by
where since as a function on also satisfies Assumption 2,
The constants in the big- notation of and are absolute ones depending on and do not depend on . Since , the classical Bernstein gives that when is sufficiently large w.p. ,
Under this event, , and then gives that
and then
By that is independent from , and that the bound is uniform for all locations of , we have that, w.p. , the bound in (19) holds for , and applying a union bound to the events proves Part 1). ∎
Proof of Proposition 3.6.
Under the condition of the current proposition, Lemma 3.5 applies. For fixed , take the intersection of the good events in Lemmas 3.5, 3.4 and 3.3, which happens w.p. for large enough . Same as before, let , and by Lemma 3.4, the set is linearly independent. Let , then for each . For any , , there are , , such that . Again, by (17), we have , and together with Lemma 3.5 2),
(A.3) |
and the constant in is uniform for all . For , (18) still holds, and by that is fixed it gives
Together with (A.3), we have that
and the r.h.s. upper bounds by (8). ∎
C.2 Proofs in Section 4
Proof of (25) in Lemma 4.2.
Suppose is small enough such that Lemma 2.2 holds with being here. For each , we construct an event under which the bound in (25) holds for , and then apply a union bound. For fixed,
By (14), , and thus . Consider , which is an independent sum conditioned on and over the randomness of . The r.v. , , satisfies that , and boundedness: again by (14), . Variance of is bounded by . The constants in the big- notation of and are from (14), which only depend on and not on . We use the notation to stress this. Since , the classical Bernstein gives that, with sufficiently large , w.p. ,
The rest of the proof is the same as that of Lemma 3.5 1), namely, by that , one can verify that both and then equals w.p. , and then (25) follows from applying a union bound to the events. ∎
Proof of Proposition 4.4.
The proof is by the same method as that of Proposition 4.1, and the difference is that the eigenvectors are -orthogonal here and normalized differently. Denote as , and let , normalized s.t.
Note that this normalization of differs from what is used in the final eigen-convergence rate result, Theorem 5.5, because the current proposition concerns eigenvalue only.
Because , , then the conditions needed in Proposition 3.6 are satisfied. Thus, with sufficiently large , there is an event which happens w.p. , under which for all s.t. is well-defined, and (32) holds for . Because the good event in Proposition 3.6 assumes the good event in Lemma 3.5, then (20) also holds for all the and , which gives that ( because is Gaussian)
and, equivalently (because is a constant)
(A.4) |
We set , , , in the same way, and let , . Because the good event only concerns randomness of , under which happens w.p. ,
(A.5) |
Next, note that since , and with Gaussian , , and are -orthogonal,
(A.6) |
Then, (27) in Lemma 4.3 where gives that
then same as in (33), they are both . Together with (A.5), this gives that
(A.7) |
Then, since , we have linear independence of for large enough .
Again, we let , and have (35). For any , , , ,
and, by that Lemma 3.5 2) holds, (20) applies to to give , thus
(A.8) |
Meanwhile, by (A.6),
(A.9) |
With the good event same as before (Lemma 4.2 at ), under , and the notation means that the constant depends on only and not on ,
(A.10) |
and then, again,
where we used (A.8) to substitute the term after the leading term is canceled in the subtraction. The UB of is similar to before, namely, by (26) in Lemma 4.3, inserting (A.9), and with the shorthand that stands for ,
Thus we have that
(A.11) |
To lower bound , again by (27) in Lemma 4.3, inserting (A.9),
and then since , we again have that . We have derived the formula of in (A.10) under , and inserting (A.8),
(A.12) |
Thus,
Together with (A.11), this gives
where the notation of is defined in the same way as in the proof of Proposition 4.1. The rest of the proof is the same, under the intersection of all the needed good events , , and , which happens w.p. . ∎
C.3 Proofs in Section 5
Proof of Theorem 5.5.
With sufficiently large , we restrict to the intersection of the good events in Proposition 4.4 and the good events of applying Theorem 5.1 1) to , which happens w.p.. The good event in Proposition 4.4 is contained in the good event of Proposition 3.6 of the eigenvalue UB, which is again contained in the good event of Lemma 3.5. As a result, for all , and thus is well-defined, and (20) holds.
Applying (20) to , and because , we have that ( due to that is Gaussian)
(A.13) |
This verifies that , for .
Step 2. for : We follow a similar approach as in Proposition 5.2. When , , and is always the constant vector, thus the discrepancy is zero. Consider , by Theorem 5.1 1), and that for any ,
(A.15) |
and then by (20) which holds uniformly for all ,
Thus, there is , s.t.
(A.16) |
The constant in big- depends on first eigenfunctions, and is an absolute one because is fixed. Next, same as in the proof of Proposition 5.2, under the good event of Proposition 4.4 and by the definition of as the maximum (half) eigen-gap among , (41) holds for .
Let , is a 1-dimensional subspace in . Because ’s are -orthogonal, . Note that
(A.17) |
and because
(A.18) |
(A.19) |
Subtracting (A.17) and (A.19) gives
and by that are -orthogonal, and (41),
The square-root of the l.h.s.
and the last inequality is by (A.16). This gives that
Meanwhile, , and by -orthogonality of again, . Thus,
(A.20) |
Finally, define
and then, together with (A.20),
Applying (20) to , , and we have shown that
To finish Step 2, it remains to show that , and then we define . By definition of ,
Step 3. of : For , by the relation (A.18),
and we have shown that
Similar as in the proof of Proposition 5.3,
By (A.14), , and meanwhile, by (16). Thus . By (A.16) and the bound of , which is as shown in the proof of Proposition 5.3. Finally, by the definition of , and that ,
Since , this proves the bound of , and the argument for all . ∎
Appendix D Proofs about the density-corrected graph Laplacian with
D.1 Proofs of the point-wise convergence of
Proof of Lemma 6.1.
Part 1): By that , . For , has expectation (Lemma 8 in [10], Lemma A.3 in [9])
where is determined by manifold extrinsic coordinates; meanwhile, ; in the independent sum , is bounded by and has variance bounded by . The rest of the proof is the same as in proving Lemma 3.5 1).
Part 2): By part 1), under a good event , which happens w.p. , (47) holds. Because for any , we then have
(A.21) |
Since , with large enough and under , , then is well-defined. Further, by (A.21),
Consider the r.v. (condition on ), for ,
is bounded by and so is its variance, where the constants in big- depend on . Then, similar as in proving (47), we have a good event which happens w.p. , under which
D.2 Proofs of the Dirichlet form convergence
Proof of Lemma 6.4.
Proof.
Proof of (57) : By definition, for ,
By Lemma A.3 in [9], , and thus,
Meanwhile, by that , , and then by the boundedness and variance calculation in the proof of Theorem 3.4 of [9], one can verify that, with constants depending on ,
Then, by the same decoupling argument to derive the concentration of V-statistics, under good event which happens w.p. ,
As a result,
which proves (57) because is higher order than . ∎
D.3 Proofs of the eigen-convergence of
Proof of Proposition 6.5.
The proof is similar to that of Proposition 3.6. We first restrict to the good events in Lemma 6.1, which happens w.p. , under which and are well-defined, and (47) and (48) hold.
Let . The following lemma, proved below, shows the near -orthonormality of the vectors and is an analogue of Lemma 3.4.
Lemma D.1.
Under the same assumption of Lemma 6.1, when is sufficiently large, w.p. ,
(A.24) |
Under the good event of Lemma D.1, called , for all , and with large enough , the set is linearly independent, and then so is the set . Let , then for each . For any , , there are , , such that . By (A.24), we have
(A.25) |
Meanwhile, by defining , similarly as in Lemma 3.3, applying Theorem 6.3 to the cases where and gives that, under a good event which happens w.p.,
(A.26) |
Then, similar as in (18),
(A.27) |
Back to the r.h.s. of (58), together with (A.25), we have that
(A.28) |
and thus provides an UB of . The bound holds for all the , under good events . ∎
Proof of Lemma D.1.
Proof of Proposition 6.6.
The proof follows the same strategy as that of Proposition 4.4, where we introduce weights by in the heat kernel interpolation map when constructing candidate eigenfunctions from eigenvectors.
We restrict to the good event in Proposition 6.5, which is contained in in Lemma 6.1. Under , , , and is well-defined, and, with sufficiently large , . Let , normalized s.t.
Note that always . Under , (A.23) holds, and thus
(A.29) |
and the constant in big- is determined by and uniform for all . Define the notation
(A.30) |
Taking to be and gives that
(A.31) |
Set , , in the same way as in the proof of Proposition 4.4, and define as in (59). We have , , and (60) for . Next, similar as in the proof of Lemma 4.2, one can show that with large and w.p.,
(A.32) |
where the notation indicates that the constant depends on and is uniform for all . Applying (A.32) to gives that, under a good event , which happens w.p.,
(A.33) |
Applying (A.32) to gives the good event , which happens w.p., under which
(A.34) |
The constants in big- in (A.33) and (A.34) are determined by only and uniform for all .
We also need an analogue of Lemma 4.3 to upper bound , proved below. The proof follows the same method as Lemma 4.3, and makes use of the uniform boundedness of from below, and Lemma 6.4.
Lemma D.2.
We proceed to define , . Next, note that since , and are -orthonormal, then
(A.37) |
Taking in Lemma D.2, (A.36) then gives
and both are . Meanwhile, (A.33) and (A.31) give that (with that is a fixed constant determined by and )
(A.38) |
Putting together with the bounds of , this gives that
(A.39) |
Then, since , we have linear independence of for large enough .
Same as before, for any , we let , and have (35). For any , , , , and
Meanwhile, by (A.29), ,
(A.40) |
and by (A.37),
(A.41) |
Then, as we work under , (A.33) and (A.34) hold. Applying to and subtracting the two,
where we used (A.40) to obtain the 2nd equality. To upper bound , by (A.35), and with the shorthand that stands for ,
Thus we have that
(A.42) |
To lower bound , again by (A.36), (A.40) and (A.41),
(A.43) |
Thus,
the rest of the proof is the same as that in Proposition 4.4, where the constant is defined as , being a constant determined by , and then the constant in the definition of also depends on . The needed good events are , , and , and the LB holds for . ∎
Proof of Lemma D.2.
By definition, for any ,
Take in Lemma 2.2 to be ; since , the three equations hold when . By (13), truncating at a Euclidean ball, there is , a positive constant determined by , s.t.
Note that
(A.44) |
thus,
(A.45) |
Apply (12) with the shorthand that stands for ,
The truncation for gives that , and then similarly as in (A.45),
(A.46) |
By Lemma 6.4, and with Gaussian , we have that under the good event of Lemma 6.1 1),
and the constant in big- is determined by and uniform for all . This gives that
(A.47) |
and as a result, together with (A.46),
Recalling that , this proves (A.35). ∎
Proof of Theorem 6.7.
With sufficiently large , we restrict to the intersection of the good events in Proposition 6.6 and the good events of applying Theorem 6.2 to . Because the good event in Proposition 6.6 is already under of Proposition 6.5, and under of Lemma 6.1, the extra good events in addition to what is needed in Proposition 6.6 are those corresponding to in the proof of Theorem 6.2 where for each , and, by a union bound, happens w.p. . This gives the final high probability indicated in the theorem. In addition, , for all , and is well-defined.
The rest of the proof follows a similar method to that of Theorem 5.5, but differs in the normalization of the eigenvectors and that of the eigenfunctions. With the definitions of and in (61) and (A.30) respectively, as has been shown in (A.29), under ,
(A.48) |
and the constant in big-O is determined by and uniform for all . This also gives that with sufficiently large ,
(A.49) |
because is upper bounded by and lower bounded by . Apply (A.49) to , this gives that , that is
and this verifies that under the high probability event.
Meanwhile, because the good event is under the one needed in Lemma D.1, as shown in the proof of Lemma D.1, we have that
where the constant in big- depends on and is uniform for all . By definition, , and then, apply (A.48) to ,
(A.50) |
Step 2. for : When , , and is always the constant vector, thus the discrepancy is zero. Consider , by Theorem 6.2 and that ,
(A.51) |
Then, by (A.49), , that is, there is , s.t.
(A.52) |
Meanwhile, because we are under , (41) holds for . The proof then proceeds in the same way as the Step 2. in Theorem 5.5, replacing with . Specifically, let , . We then have , and because
(A.53) |
we also have . Take subtraction and do the same calculation as before, by (A.52), it gives that
(A.54) |
We similarly define , , and . Then, by (A.54), we have , and by (A.49),
To finish Step 2, it remains to show that , and then we define . Note that
(A.55) |
By that , inserting into (A.55) together with (A.54), (A.50),
which gives that
by multiplying to both sides.
Step 3. of : The proof is the same as Step 3. in Theorem 5.5, replacing with . Specifically, using the relation (A.53), and the eigenvector consistency in Step 2, we have
where and . By (A.26), . Together with (A.50), one can show that . For , with (A.52), one can verify that , where we used that same as before. Putting together, and with the definition of above,
We have shown that , thus the bound of is proved, and holds for . ∎