Eigen-convergence of Gaussian kernelized graph Laplacian by manifold heat interpolation
Abstract
This work studies the spectral convergence of the graph Laplacian to the Laplace-Beltrami operator when the graph affinity matrix is constructed from $n$ random samples on a $d$-dimensional manifold embedded in a possibly high dimensional space. By analyzing Dirichlet form convergence and constructing candidate approximate eigenfunctions via convolution with the manifold heat kernel, we prove that, with Gaussian kernel, one can set the kernel bandwidth parameter $\epsilon \sim (\log n / n)^{1/(d/2+2)}$ such that the eigenvalue convergence rate is $\tilde{O}(n^{-1/(d/2+2)})$ and the eigenvector convergence in 2-norm has rate $\tilde{O}(n^{-1/(d+4)})$; when $\epsilon \sim (\log n / n)^{1/(d/2+3)}$, both eigenvalue and eigenvector rates are $\tilde{O}(n^{-1/(d/2+3)})$. These rates are up to a $\log n$ factor and are proved for finitely many low-lying eigenvalues. The result holds for the un-normalized and random-walk graph Laplacians when data are uniformly sampled on the manifold, as well as for the density-corrected graph Laplacian (where the affinity matrix is normalized by the degree matrix from both sides) with non-uniformly sampled data. As an intermediate result, we prove new point-wise and Dirichlet form convergence rates for the density-corrected graph Laplacian. Numerical results are provided to verify the theory.
Keywords: Graph Laplacian, heat kernel, Laplace-Beltrami operator, manifold learning, Gaussian kernel, spectral convergence
1 Introduction
Table 1: Default and asymptotic notations.

| Notation | Meaning |
|---|---|
| $\mathcal{M}$ | $d$-dimensional manifold in $\mathbb{R}^m$ |
| $p$ | data sampling density on $\mathcal{M}$ |
| $\Delta$ | Laplace-Beltrami operator, also as $\Delta_{\mathcal{M}}$ |
| $\mu_k$ | population eigenvalue of $-\Delta$ |
| $\psi_k$ | population eigenfunctions of $-\Delta$ |
| $\lambda_k$ | empirical eigenvalue of graph Laplacian |
| $v_k$ | empirical eigenvector of graph Laplacian |
| $\nabla$ | manifold gradient, also as $\nabla_{\mathcal{M}}$ |
| $G_t(x,y)$ | manifold heat kernel |
| $Q_t$ | semi-group operator of manifold diffusion, $Q_t = e^{t\Delta}$ |
| $X$ | dataset points used for computing, $X = \{x_i\}_{i=1}^n$ |
| $n$ | number of samples in $X$ |
| $\epsilon$ | kernel bandwidth parameter |
| $K_\epsilon$ | graph affinity kernel, $K_\epsilon(x,y) = \epsilon^{-d/2} h(\|x-y\|^2/\epsilon)$, cf. (1) |
| $h$ | a function as in (1) |
| $W$ | kernelized graph affinity matrix |
| $D$ | degree matrix of $W$, $D_{ii} = \sum_{j=1}^n W_{ij}$ |
| $L_{un}$ | un-normalized graph Laplacian |
| $L_{rw}$ | random-walk graph Laplacian |
| $E_n$ | graph Dirichlet form |
| $\rho_X$ | function evaluation operator, $\rho_X f = (f(x_1), \ldots, f(x_n))^T$ |
| $\widetilde{W}$ | density-corrected affinity matrix, $\widetilde{W} = D^{-1} W D^{-1}$ |
| $\widetilde{D}$ | degree matrix of $\widetilde{W}$ |

| Asymptotic notations | |
|---|---|
| $O(\cdot)$ | $f = O(g)$: $\lvert f\rvert \le C \lvert g\rvert$ in the limit; a superscript as in $O^{[a]}(\cdot)$ declares the constant dependence on $a$ |
| $o(\cdot)$ | for $g > 0$, $f = o(g)$: $\lvert f\rvert / g \to 0$ in the limit |
| $o^{[a]}(\cdot)$ | same as for $O^{[a]}(\cdot)$, declaring the constant dependence |
| $\Omega(\cdot)$ | for $g > 0$, $f = \Omega(g)$: $f \ge c\, g$ in the limit |
| $\Theta(\cdot)$ | for $g > 0$, $f = \Theta(g)$: $c\, g \le f \le C\, g$ in the limit |
| $\tilde{O}(\cdot)$, $\tilde{\Theta}(\cdot)$ | multiplied by another factor involving a log, defined every time used in the text |

When the superscript $a$ is omitted, the constants are absolute ones.
Graph Laplacian matrices built from data samples are widely used in data analysis and machine learning. The earlier works include Isomap [2], Laplacian Eigenmap [3], Diffusion Map [10, 30], among others. Apart from being a widely-used unsupervised learning method for clustering analysis and dimension reduction (see, e.g., the review papers [33, 30]), graph Laplacian methods also drew attention via the application in semi-supervised learning [24, 12, 29, 15]. Under the manifold setting, data samples are assumed to lie on low-dimensional manifolds embedded in a possibly high-dimensional ambient space. A fundamental problem is the convergence of the graph Laplacian matrix to the manifold Laplacian operator in the large sample limit. The operator point-wise convergence has been intensively studied and established in a series of works [19, 18, 4, 10, 27], and extended to variant settings, such as different kernel normalizations [23, 36] and general class of kernels [31, 5, 9]. The eigen-convergence, namely how the empirical eigenvalues and eigenvectors converge to the population eigenvalues and eigenfunctions of the manifold Laplacian, is a more subtle issue and has been studied in [4, 34, 6, 35, 28, 14] (among others) and recently in [32, 7, 11, 8].
The current work proves the eigen-convergence, specifically the consistency of eigenvalues and eigenvectors in 2-norm, for finitely many low-lying eigenvalues of the graph Laplacian constructed using Gaussian kernel from i.i.d. sampled manifold data. The result covers the un-normalized and random-walk graph Laplacian when data density is uniform, and the density-corrected graph Laplacian (defined below) with non-uniformly sampled data. For the latter, we also prove new point-wise and Dirichlet form convergence rates as an intermediate result. We overview the main results in Section 1.1 in the context of literature, which are also summarized in Table 2.
The framework of our work follows the variational principle formulation of eigenvalues using the graph and manifold Dirichlet forms. The Dirichlet form-based approach to prove graph Laplacian eigen-convergence was first carried out in [6] under a non-probabilistic setting. [32, 7] extended the approach to the probabilistic setting, where the data points are i.i.d. samples, using optimal transport techniques. Our analysis follows the same form-based approach and differs from previous works in the following aspects. Let $\epsilon$ be the (squared) kernel bandwidth parameter corresponding to diffusion time, $n$ the number of samples, and $d$ the manifold intrinsic dimensionality:
Leveraging the observation in [10, 27] that the bias error in the point-wise rate of the graph Laplacian can be improved from $O(\sqrt{\epsilon})$ to $O(\epsilon)$ using a differentiable kernel function, we show that the improved point-wise rate of the Gaussian kernelized graph Laplacian translates into an improved eigen-convergence rate compared to using compactly supported kernels. Specifically, the eigenvector (2-norm) convergence rate is $\tilde{O}(n^{-1/(d/2+3)})$, achieved at the optimal choice of $\epsilon \sim (\log n/n)^{1/(d/2+3)}$.
We show that the eigenvalue convergence rate matches the Dirichlet form convergence rate in [9], which is better than the point-wise rate. This leads to an eigenvalue convergence rate of $\tilde{O}(n^{-1/(d/2+2)})$, achieved at the optimal choice of $\epsilon \sim (\log n/n)^{1/(d/2+2)}$. The optimal $\epsilon$ for eigenvalue and eigenvector estimation thus differs in order of $n$.
In obtaining the initial crude eigenvalue lower bound (LB), called Step 1 below, we develop a short proof using the manifold heat kernel to define the "interpolation mapping", which constructs from a vector in $\mathbb{R}^n$ a smooth function on $\mathcal{M}$. The manifold variational form of the interpolated function, defined via the heat kernel, naturally relates to the graph Dirichlet form of the vector when the graph affinity matrix is constructed using a Gaussian kernel. The analysis makes use of special properties of the manifold heat kernel and only holds when the graph affinity kernel locally approximates the heat kernel, like the Gaussian. This specialty of the heat kernel has not been exploited in previous graph Laplacian analyses to obtain eigen-convergence rates.
Towards the eigen-convergence, our work also recaps and develops several intermediate results under weaker assumptions on the kernel function (i.e., non-Gaussian), including an improved point-wise convergence rate of the density-corrected graph Laplacian. The density-corrected graph Laplacian, originally proposed in [10], is an important variant of the kernelized graph Laplacian where the affinity matrix is $\widetilde{W} = D^{-1} W D^{-1}$. In applications, the data distribution is often not uniform on the manifold, and then the standard graph Laplacian with affinity $W$ recovers the Fokker-Planck operator (weighted Laplacian) associated with the sampling density, which involves a drift term depending on $p$. The density-corrected graph Laplacian, in contrast, recovers the Laplace-Beltrami operator consistently when $p$ satisfies certain regularity conditions, and thus is useful in many applications. In this work, we first prove the point-wise convergence and Dirichlet form convergence of the density-corrected graph Laplacian with differentiable $h$, both matching those of the standard graph Laplacian, and this can be of independent interest. Then the eigen-consistency result extends to such graph Laplacians (with Gaussian kernel function), also achieving the same rate as the standard graph Laplacian with uniform $p$.
Below, we give an overview of the theoretical results starting from the assumptions, and end the introduction with some further literature review. In the rest of the paper, Section 2 gives preliminaries needed in the analysis. Sections 3-5 develop the eigen-convergence of standard graph Laplacians, both the un-normalized and the normalized (random-walk) ones. Section 6 extends to the density-corrected graph Laplacian, and Section 7 gives numerical results. We discuss possible extensions in the last section.
Notations. Default and asymptotic notations such as $O(\cdot)$, $o(\cdot)$, $\Omega(\cdot)$, $\Theta(\cdot)$ are listed in Table 1. In this paper, we treat constants which are determined by $\mathcal{M}$ and $p$ as absolute ones, including the intrinsic dimension $d$. We mainly track the number of samples $n$ and the kernel diffusion time parameter $\epsilon$, and we may emphasize the constant dependence on $\mathcal{M}$ or $p$ in certain circumstances, using a superscript notation such as $O^{[\mathcal{M}]}(\cdot)$. All constant dependence can be tracked in the proofs.
Table 2: Summary of results.

| | uniform $p$, with $L_{un}$ | uniform $p$, with $L_{rw}$ | non-uniform $p$, density-corrected | Needed assumptions on $h$ | Error bound |
|---|---|---|---|---|---|
| Eigenvalue UB | Prop. 3.1 | Prop. 3.6 | Prop. 6.5 | Assump. 2 | form rate |
| Crude eigenvalue LB | Prop. 4.1 | Prop. 4.4 | Prop. 6.6 | Gaussian | |
| Eigenvector convergence | Prop. 5.2 | - | - | Gaussian | point-wise rate |
| Eigenvalue convergence | Prop. 5.3 | - | - | Gaussian | form rate |
| Eigenvalue/vector combined convergence | Thm. 5.4 | Thm. 5.5 | Thm. 6.7 | Gaussian | $\epsilon \sim (\log n/n)^{1/(d/2+2)}$ (optimal order to minimize the eigenvalue error): eigenvalue error $\tilde{O}(n^{-1/(d/2+2)})$, eigenvector error $\tilde{O}(n^{-1/(d+4)})$; $\epsilon \sim (\log n/n)^{1/(d/2+3)}$ (optimal order to minimize the eigenvector error): both errors $\tilde{O}(n^{-1/(d/2+3)})$ |
| Point-wise convergence | Thm. 5.1 [27, 9]∗ | | Thm. 6.2 | Assump. 2 | point-wise rate |
| Dirichlet form convergence | Thm. 3.2 [9]∗ | | Thm. 6.3 | Assump. 2 | form rate |

The "form rate" is $O\big(\epsilon + \sqrt{\log n/(n\epsilon^{d/2})}\big)$, and the "point-wise rate" is $O\big(\epsilon + \sqrt{\log n/(n\epsilon^{d/2+1})}\big)$.

In the table, convergence of the first $K$ eigenvalues and eigenvectors is concerned, where $K$ is fixed. In the rightmost column, the eigenvalue error refers to the error of eigenvalue convergence, and the eigenvector error refers to the error of eigenvector convergence (in 2-norm); $\tilde{O}$ stands for the possible involvement of a factor of some power of $\log n$. In the $L_{rw}$ (density-corrected) column, the eigenvector and eigenvalue convergences are proved in Thm. 5.5 (Thm. 6.7) and are not written as separate propositions. ∗The point-wise convergence and Dirichlet form convergence results of the standard graph Laplacian also hold when $p$ satisfies Assump. 1(A2), i.e., when $p$ is not uniform. The Dirichlet form convergence at the form rate may hold when $h$ is not differentiable, e.g., when $h$ is an indicator function, c.f. Remark 2.
1.1 Overview of main results
We first introduce needed assumptions, and then provide a technical overview of our analysis in Section 1.1.2 (Steps 0-1) and Section 1.1.3 (Steps 2-3), summarized as a roadmap at the end of the section.
1.1.1 Set-up and assumptions
The current paper inherits the probabilistic manifold data setting, namely, the dataset $X = \{x_i\}_{i=1}^n$ consists of $n$ i.i.d. samples drawn from a distribution on $\mathcal{M}$ with density $p$ satisfying the following assumption:
Assumption 1 (Smooth $\mathcal{M}$ and $p$).
(A1) $\mathcal{M}$ is a $d$-dimensional compact connected manifold (without boundary) isometrically embedded in $\mathbb{R}^m$.
(A2) $p$ is smooth and uniformly bounded both from below and above, that is, there exist constants $p_{\min}, p_{\max} > 0$ s.t. $p_{\min} \le p(x) \le p_{\max}$ for all $x \in \mathcal{M}$.
Suppose $\mathcal{M}$ is embedded via $\iota : \mathcal{M} \to \mathbb{R}^m$, and when there is no danger of confusion, we use the same notation $x$ to denote a point $x \in \mathcal{M}$ and its image $\iota(x) \in \mathbb{R}^m$. We have the measure space $(\mathcal{M}, dV)$: when $\mathcal{M}$ is orientable, $dV$ is the Riemannian volume form; otherwise, $dV$ is the measure associated with the local volume form. The smoothness of $\mathcal{M}$ and $p$ fulfills many application scenarios, and possible extensions to less regular $\mathcal{M}$ or $p$ are postponed. Our analysis first addresses the basic case where $p$ is uniform on $\mathcal{M}$, i.e., $p = 1/\mathrm{Vol}(\mathcal{M})$ is a positive constant. For non-uniform $p$ as in (A2), we adopt and analyze the density-corrected graph Laplacian in Section 6. In both cases, the graph Laplacian recovers the Laplace-Beltrami operator. Below, we write $\Delta_{\mathcal{M}}$ as $\Delta$, and $\nabla_{\mathcal{M}}$ as $\nabla$.
Given the $n$ data samples, the graph affinity matrix $W$ and the degree matrix $D$ are defined as
$W_{ij} = K_\epsilon(x_i, x_j)$, $D_{ii} = \sum_{j=1}^n W_{ij}$, for $i, j = 1, \ldots, n$.
$W$ is real symmetric, typically with non-negative entries, and for the kernelized affinity matrix
$K_\epsilon(x, y) = \epsilon^{-d/2}\, h\!\left( \frac{\| x - y \|^2}{\epsilon} \right)$,   (1)
for a function $h : [0, \infty) \to \mathbb{R}$. The parameter $\epsilon$ can be viewed as the "time" of the diffusion process. Some results in the literature are written in terms of the parameter $\sqrt{\epsilon}$, which corresponds to the scale of the local distance such that $\|x - y\|$ is of $O(\sqrt{\epsilon})$ magnitude. Our results are written with respect to the time parameter $\epsilon$, which corresponds to the squared local distance length scale.
Our main result of graph Laplacian eigen-convergence considers when the kernelized graph affinity is computed with
$h(r) = \frac{1}{(4\pi)^{d/2}}\, e^{-r/4}$,   (2)
and we call such $h$ the Gaussian kernel function. (The constant factor $(4\pi)^{-d/2}$ is included in the definition of $h$ for theoretical convenience, and may not be needed in an algorithm; e.g., in the normalized graph Laplacian the constant factor is cancelled.)
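For concreteness, a minimal sketch of the affinity construction in (1)-(2); the $(4\pi)^{-d/2}$ factor follows (2) as written above and cancels in the random-walk normalization:

```python
import numpy as np
from scipy.spatial.distance import cdist

def gaussian_affinity(X, eps, d):
    """W_ij = eps^{-d/2} h(||x_i - x_j||^2 / eps) as in (1), with the Gaussian
    h(r) = (4*pi)^{-d/2} exp(-r/4) as in (2).
    X: (n, m) array of samples on the manifold, eps: bandwidth (diffusion time),
    d: intrinsic dimension of the manifold."""
    sqdist = cdist(X, X, metric="sqeuclidean")
    W = eps ** (-d / 2) * (4.0 * np.pi) ** (-d / 2) * np.exp(-sqdist / (4.0 * eps))
    deg = W.sum(axis=1)  # degrees; the degree matrix is diag(deg)
    return W, deg
```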
The Gaussian belongs to a larger family of differentiable functions:
Assumption 2 (Differentiable $h$).
(C1) Regularity. $h$ is continuous on $[0, \infty)$ and $C^2$ on $(0, \infty)$.
(C2) Decay condition. There exist $a, a_k > 0$ s.t., for all $r \ge 0$, $\lvert h^{(k)}(r)\rvert \le a_k e^{-a r}$ for $k = 0, 1, 2$.
(C3) Non-negativity. $h \ge 0$ on $[0, \infty)$. To exclude the case that $h \equiv 0$, assume $h(r) > 0$ for some $r > 0$.
A summary of results with needed assumptions is provided in Table 2, from which we can see that several important intermediate results, which can be of independent interest, only require to satisfy Assumption 2 or weaker, including
- Point-wise convergence of graph Laplacians.
- Convergence of the graph Dirichlet form.
- The eigenvalue upper bound (UB), which matches to the Dirichlet form convergence rate.
The point-wise convergence and Dirichlet form convergence of the standard graph Laplacian only require a differentiable $h$ with a decay condition, as originally taken in [10], and hold even without the non-negativity in Assumption 2(C3). Our analysis of the density-corrected graph Laplacian assumes non-negativity, and our main result of eigen-convergence needs $h$ to be Gaussian; thus we include (C3) in Assumption 2 to simplify exposition. The need for a Gaussian $h$ shows up in proving the (initial crude) eigenvalue lower bound (LB), to be explained below, and it is due to the fundamental connection between the Gaussian kernel and the manifold heat kernel.
1.1.2 Eigenvalue UB/LB and the interpolation mapping
To explain these results and the difference in proving eigenvalue UB and LB, we start by introducing the notion of point-wise rate and form rate. In the current paper,
Point-wise convergence of graph Laplacians is shown to have the rate of $O\big(\epsilon + \sqrt{\log n/(n\epsilon^{d/2+1})}\big)$. We call this rate the "point-wise rate".
Convergence of the graph Dirichlet form applied to smooth manifold functions, i.e., $E_n(\rho_X f)$ for smooth $f$ on $\mathcal{M}$, is shown to have the rate of $O\big(\epsilon + \sqrt{\log n/(n\epsilon^{d/2})}\big)$. We call this rate the "form rate".
In the literature, the point-wise convergence of the random-walk graph Laplacian with differentiable $h$ having suitable decay was first shown to have this rate in [27]. The exposition in [27] was for Gaussian $h$, but the analysis therein extends directly to general $h$. The Dirichlet form convergence with differentiable $h$ was shown to have the form rate in [9] via a V-statistic analysis. [9] also derived the point-wise rate for both the random-walk and the un-normalized graph Laplacians. The analysis in [9] was mainly developed for kernels with adaptive bandwidth, and higher-order regularity of $h$ (up to the 4th derivative instead of the 2nd) was assumed to handle the complication due to the variable kernel bandwidth. For the fixed-bandwidth kernel as in (1), the analysis in [9] can be simplified to proceed under less restrictive conditions on $h$. We include more details below when quoting these previous results, which pave the way towards proving eigen-convergence.
Table 2 illustrates a difference between the eigenvalue UB and LB analysis. Specifically, the eigenvalue UB holds for general differentiable $h$, while the initial crude eigenvalue LB, and consequently the final eigenvalue and eigenvector convergence rates, need $h$ to be Gaussian. This difference between the eigenvalue UB and LB analysis is due to the subtlety of the variational principle approach in analyzing empirical eigenvalues. To be more specific, by "projecting" the population eigenfunctions to vectors in $\mathbb{R}^n$ and using them as "candidate" eigenvectors in the variational form, the Dirichlet form convergence rate directly translates into a rate for the eigenvalue UB (for fixed finitely many low-lying eigenvalues). This is why the eigenvalue UB matches the form rate before any LB is derived, and we call this "Step 0" of our analysis.
The eigenvalue LB, however, is more difficult, as has been pointed out in [6]. In [6] and following works taking the variational principle approach, the LB analysis is by "interpolating" the empirical eigenvectors to be functions on $\mathcal{M}$. Unlike with the population eigenfunctions, which are known to be smooth, there is less property of the empirical eigenvectors that one can use, and any regularity property of these discrete objects is usually non-trivial to obtain [8]. The interpolation mapping in [6] first assigns a point to a Voronoi cell, assuming that the point set forms an $\varepsilon$-net of $\mathcal{M}$ to begin with (a non-probabilistic setting), and this maps a vector to a piece-wise constant function on $\mathcal{M}$; next, this function is convolved with a kernel function which is compactly supported on a small geodesic ball, and this produces "candidate" eigenfunctions, whose manifold differential Dirichlet form is upper bounded by the graph Dirichlet form of the vector, up to an error, through differential geometry calculations. Under the probabilistic setting of i.i.d. samples, [32] constructed the mapping using an optimal transport (OT) map, where the OT distance between the empirical measure and the population measure is bounded by constructing a Voronoi tessellation of $\mathcal{M}$. This led to an overall eigen-convergence rate in [32] when $h$ is compactly supported and satisfies certain regularity conditions, with a possible factor of a certain power of $\log n$. A typical example is when $h$ is an indicator function, which is called the "$\varepsilon$-graph" in the computer science literature (the distance scale $\varepsilon$ corresponds to $\sqrt{\epsilon}$ in our notation). The approach was extended to k-NN graphs in [7], where the rate of eigenvalue and 2-norm eigenvector convergence was also improved to match the point-wise rate of the $\varepsilon$-graph or k-NN graph Laplacians. The same rate was shown for $\infty$-norm consistency of eigenvectors in [8], combined with Lipschitz regularity analysis of empirical eigenvectors using advanced PDE tools. Eigenvalue consistency with a degraded rate was obtained in a very sparse graph regime just beyond the graph connectivity threshold [7].
In the current work, we take a different approach to the interpolation mapping in the eigenvalue LB analysis. Our method is based on manifold heat kernels, and the analysis makes use of the fact that at short time and on small local neighborhoods, the heat kernel can be approximated by
$G_t(x, y) \approx \frac{1}{(4\pi t)^{d/2}} \exp\!\left( -\frac{d_{\mathcal{M}}(x, y)^2}{4t} \right)$,   (3)
where $d_{\mathcal{M}}$ is the geodesic distance, and consequently by $K_t(x, y)$ when $h$ is Gaussian as in (2). The first approximation is by classical results on elliptic operators on Riemannian manifolds, c.f. Theorem 2.1. The second holds because $K_t$ replaces the geodesic distance with the Euclidean distance in $\mathbb{R}^m$, and the two locally agree up to higher-order corrections. (The constant in the big-O here depends on the second fundamental form, and by compactness of $\mathcal{M}$ is universal for all points. Similar universal constants in big-O hold throughout the paper.) These estimates allow us to construct interpolated functions from a discrete vector by convolving with the heat kernel at a fixed time, which is a constant determined by the first low-lying population eigenvalues of $-\Delta$. Specifically, this time is inversely proportional to the smallest eigen-gap among the first $K$ population eigenvalues (assumed to have single multiplicity in the first place; the result then generalizes to multiplicity greater than one), which is an $O(1)$ constant determined by $\mathcal{M}$ and $K$. Applying the variational principle to the operator $I - Q_t$, where $Q_t$ is the diffusion semi-group operator whose spectrum is determined by that of $\Delta$, allows us to prove an initial eigenvalue LB with error smaller than half of the minimum first-$K$ eigen-gap.
The step to derive initial crude eigenvalue LB using manifold heat kernel interpolation mapping is called “Step 1” in our analysis. While the interpolation mapping by convolving with a smooth kernel has been used in previous works [6, 32, 7], using the manifold heat kernel plays a special role in the eigenvalue LB analysis, and this cannot be equivalently achieved by other choices of kernels (unless the kernel locally approximates the heat kernel, like the Gaussian kernel here). Specifically, Lemma 4.3 is proved using heat kernel properties (without using concentration of i.i.d. data samples), and the lemma connects the continuous integral form of interpolated candidate eigenfunctions with the graph Dirichlet form.
1.1.3 Road-map of analysis
The previous subsection has explained Step 0 and 1 of our analysis. Here we summarize the rest of the analysis and provide a road-map.
After an initial crude eigenvalue LB is obtained in Step 1, we adopt the "bootstrap strategy" from [7] to obtain a refined (2-norm) eigenvector consistency rate matching the graph Laplacian point-wise convergence rate. We call this "Step 2". Note that the use of a smooth kernel (like Gaussian) has an improved bias error in the point-wise rate compared to a compactly supported kernel function, which consequently improves the eigen-convergence rate; see more in Remark 4.
Next, leveraging the eigenvector consistency proved in Step 2, we further improve the eigenvalue convergence to match the form rate, which is better than the point-wise rate. We call this "Step 3". Then the refined eigenvalue LB matches the eigenvalue UB in rate. In the process, the first $K$ empirical eigenvalues are upper bounded by $O(1)$, which follows from the eigenvalue UB proved in the beginning.
In summary, our eigen-convergence analysis consists of the following four steps:
- Step 0. Eigenvalue UB by the Dirichlet form convergence, matching the form rate.
- Step 1. Initial crude eigenvalue LB, providing eigenvalue error up to the smallest first-$K$ eigen-gap.
- Step 2. 2-norm consistency of eigenvectors, up to the point-wise rate.
- Step 3. Refined eigenvalue consistency, up to the form rate.
Step 1 requires $h$ to be non-negative and currently only covers the Gaussian case. This may be relaxed, since the proof only uses the property that the kernel locally approximates the manifold heat kernel. In this work, we restrict to the Gaussian case for simplicity and because of the wide use of Gaussian kernels in applications.
1.2 More related works
As we adopt a Dirichlet form-based analysis, the eigen-convergence result in the current paper is of the same type as in previous works using variational principle [6, 32, 7]. In particular, the rate concerns the convergence of the first many low-lying eigenvalues of the Laplacian, where is a fixed finite integer. The constants in the big- notations in the bounds are treated as , and they depend on and these leading eigenvalues and eigenfunctions of the manifold Laplacian. Such results are useful for applications where leading eigenvectors are the primary focus, e.g., spectral clustering and dimension-reduced spectral embedding. An alternative approach is to analyze functional operator consistency [4, 34, 28, 26], which may provide different eigen-consistency bounds, e.g., -norm consistency of eigenvectors using compact embedding of Glivenko-Cantelli function classes [11].
The current work considers noise-less data on , while the robustness of graph Laplacian against noise in data is important for applications. When manifold data vectors are perturbed by noise in the ambient space, [13] showed that Gaussian kernel function has special property to make kernelized graph Laplacian robust to noise (by a modification of diagonal entries). More recently, [20] showed that bi-stochastic normalization can make the Gaussian kernelized graph affinity matrix robust to high dimensional heteroskedastic noise in data. These results suggest that Gaussian is a special and useful choice of kernel function for graph Laplacian methods.
Meanwhile, bi-stochastically normalized graph Laplacian has been studied in [23], where the point-wise convergence of the kernel integral operator to the manifold operator was proved. The spectral convergence of bi-stochastically normalized graph Laplacian for data on hyper-torus was recently proved to be in [36]. The density-corrected affinity kernel matrix , which is analyzed in the current work, provides another normalization of the graph Laplacian which recovers the Laplace-Beltrami operator. It would be interesting to explore the connections to these works and extend our analysis to bi-stochastically normalized graph Laplacians, which may have better properties of spectral convergence and noise-robustness.
2 Preliminaries
2.1 Graph and manifold Laplacians
We define the following moment constants of function satisfying Assumption 2,
By (C3), and the case is excluded, thus . With Gaussian as in (2), , , and . Denote and by and for a shorthand notation, and
- The un-normalized graph Laplacian is defined as
(4)
Note that the standard un-normalized graph Laplacian is usually $D - W$, and we divide by the constant for the convergence of $L_{un}$ to $-\Delta$.
- The random-walk graph Laplacian is defined as
(5)
with the constant normalization to ensure convergence to $-\Delta$.
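A minimal sketch of the constructions in (4)-(5), with the constant normalizations (which involve $\epsilon$, $n$, and the kernel moment constants) abstracted into placeholder arguments; they rescale eigenvalues but do not change eigenvectors:

```python
import numpy as np

def graph_laplacians(W, c_un=1.0, c_rw=1.0):
    """Un-normalized and random-walk graph Laplacians, cf. (4)-(5).
    c_un, c_rw stand in for the constant normalizations that make the
    Laplacians converge to the (minus) Laplace-Beltrami operator."""
    deg = W.sum(axis=1)
    L_un = (np.diag(deg) - W) / c_un                       # cf. (4)
    L_rw = (np.eye(W.shape[0]) - W / deg[:, None]) / c_rw  # cf. (5)
    return L_un, L_rw
```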
The matrix is real-symmetric, positive semi-definite (PSD), and the smallest eigenvalue is zero. Suppose eigenvalues of are , , and sorted in ascending order, that is,
The matrix is well-defined when for all , which holds w.h.p. under the regime that , c.f. Lemma 3.5. We always work under this regime, namely the connectivity regime. Since is similar to which is PSD, is also real-diagonalizable and has non-negative real eigenvalues, sorted and denoted as . We also have that, by the min-max variational formula for real-symmetric matrices,
We define the graph Dirichlet form for as
(6) |
By (4), , and thus
(7) |
Similarly, we have
(8) |
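The graph Dirichlet form and the Rayleigh quotients behind (6)-(8) can be evaluated directly from $W$; a minimal sketch, with the constant normalization of the form omitted since it does not affect the min-max structure:

```python
import numpy as np

def dirichlet_form(W, u):
    """Graph Dirichlet form E_n(u), up to its constant normalization:
    proportional to sum_{ij} W_ij (u_i - u_j)^2 = 2 u^T (D - W) u, cf. (6)-(7)."""
    deg = W.sum(axis=1)
    return float(u @ ((np.diag(deg) - W) @ u))

def rayleigh_quotient(L, u):
    """Rayleigh quotient whose min-max over k-dimensional subspaces gives the
    k-th eigenvalue of the symmetric matrix L, as used in (7)."""
    return float(u @ (L @ u)) / float(u @ u)
```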
To introduce notations of the manifold Laplacian, we define the inner-product in as , for . We also use to denote the inner-product in , being a general measure on (not necessarily a probability measure), that is , for . For smooth connected compact , the (minus) manifold Laplace-Beltrami operator has eigen-pairs ,
The second eigenvalue due to connectivity of . When for some eigenvalue of having multiplicity , the eigenfunctions can be set to be an orthonormal basis of the -dimensional eigenspace associated with . Note that for generic smooth .
2.2 Heat kernel on
We leverage the special property of Gaussian kernel in the ambient space that it locally approximates the manifold heat kernel on . We start from the notations of manifold heat kernel. Since is smooth compact (no-boundary), the Green’s function of the heat equation on exists, namely the heat kernel of . We denote the heat diffusion semi-group operator as which can be formally written as , and
Since is a semi-group, we have the reproducing property
Meanwhile, by the probability interpretation,
Using the eigenvalue and eigenfunctions of , the heat kernel has the expansion representation . We will not use the spectral expansion of in our analysis, but only that are also eigenfunctions of , that is,
(9) |
Next, we derive Lemma 2.2, which characterizes two properties of the heat kernel at sufficiently short time: First, on a local neighborhood on , can be approximated by in the leading order, where is defined as in (1) with Gaussian ; Second, globally on the manifold the heat kernel has a sub-Gaussian decay. These are based on classical results about heat kernel on Riemannian manifolds [21, 16, 25, 17], summarized in the following theorem.
Theorem 2.1 (Heat kernel parametrix and decay [25, 16]).
Suppose is as in Assumption 1 (A1), and is a positive integer. Then there are positive constants , i.e. the injectivity radius of , and both and depend on , and
1) Local approximation: There are positive constants , which depend on , and , where satisfies that
and is defined as in (3), such that, when , for any ,
(10) |
2) Global decay: There is positive constant depending on such that, when ,
(11) |
Part 1) is by the classical parametrix construction of the heat kernel on , see e.g. Chapter 3 of [25], and Part 2) follows the classical upper bound of the heat kernel by Gaussian estimates dating back to the 1960s [1, 17]. We include a proof of the theorem in Appendix B for completeness.
The theorem directly gives the following lemma (proof in Appendix B), which is useful for our construction of the interpolation mapping using the heat kernel. We denote by the Euclidean ball in centered at a point of radius .
Lemma 2.2.
Suppose is as in Assumption 1 (A1), and . Let , and be with Gaussian kernel , i.e., . Then there is positive constant depending on such that, when , for any ,
(12) | |||
(13) | |||
(14) |
The constants in big- in all the equations only depend on and are uniform for all .
3 Eigenvalue upper bound
In this section, we consider uniform on , and standard graph Laplacians and with the kernelized affinity matrix , defined as in (1). We show the eigenvalue UB for general differentiable satisfying Assumption 2, not necessarily Gaussian.
3.1 Un-normalized graph Laplacian eigenvalue UB
We now derive Step 0 for , the result being summarized in the following proposition.
Proposition 3.1 (Eigenvalue UB of ).
The proposition holds when the population eigenvalues have multiplicity greater than one, as long as they are sorted in ascending order. The proof is by constructing a -dimensional subspace in (7) spanned by vectors in which are produced by evaluating the population eigenfunctions at the data points. The proof is given at the end of this subsection after we introduce a few needed intermediate results.
Given , define the function evaluation operator applied to as
We will use as “candidate” approximate eigenvectors. To analyze , the following result from [9] shows that it converges to the differential Dirichlet form
with the form rate. The result is for general smooth and weighted Laplacian , which is defined as for measure on . is reduced to when is uniform.
Theorem 3.2 (Theorem 3.4 in [9]).
Proof of Theorem 3.2.
The proof goes through the proof of Theorem 3.4 of [9] under the simplified situation when no normalization by the estimated density is involved. Specifically, the proof uses the concentration of the -statistics . The expectation of , , equals . Meanwhile, is bounded by , and the variance of the can also be bounded by with the constant as in the theorem, following the calculation in the proof of Theorem 3.4 in [9]. The concentration of at then follows by the decoupling of the -statistics, and it gives the high probability bound in the theorem.
Note that the results in [9] are proved under the assumption that to be rather than , that is, requiring Assumption 2(C1)(C2) to hold for up to 4-th derivative of . This is because regularity of is used to handle complication of the adaptive bandwidth in the other analysis in [9]. With the fixed bandwidth kernel as defined in (1), regularity suffices, as originally assumed in [10]. ∎
Remark 1 (Relaxation of Assumption 2).
Since the proof only involves the computation of moments of the -statistic, it is possible to relax Assumption 2(C3) non-negativity of and replace with certain non-vanishing conditions on and , e.g., as in [10] and Assumption A.3 in [9]. Since the non-negativity of is used in other places in the paper, and our eigenvalue LB needs to be Gaussian, we adopt the non-negativity of in Assumption 2 for simplicity. The regularity of may also be relaxed, and the constant in may be improved accordingly. These extensions are not further pursued here.
Remark 2 (Dirichlet form convergence with compactly supported ).
The "epsilon-graph" corresponds to constructing the graph affinity using the indicator function kernel . Note that the "epsilon" stands for the scale of local distance and thus corresponds to the square root of our parameter, because ours is "time". When , using the same method as in the proof of Lemma 8 in [10], one can verify (proof in Appendix C.1) that, for ,
(15) |
The boundedness and variance of are again bounded by , and thus the Dirichlet form convergence with has the same rate as in Theorem 3.2. This firstly implies that the eigenvalue UB also has the same rate, following the same proof of Proposition 3.1. The final eigen-convergence rate also depends on the point-wise rate of the graph Laplacian, see more in Remark 4.
In Theorem 3.2 and in below, the factor in the variance error bound is due to the concentration argument. Throughout the paper, the classical Bernstein inequality Lemma B.1 is intensively used.
To proceed, recall the definition of as in (6), we define the bi-linear form for as
which is symmetric, i.e., , and . The following lemma characterizes the forms and applied to , proved in Appendix C.1.
Lemma 3.3.
We need to show the linear independence of the vectors such that they span a -dimensional subspace in . This holds w.h.p. at large , by the following lemma showing the near-isometry of the projection mapping , proved in Appendix C.1.
Lemma 3.4.
Under Assumptions 1 (A1), being uniform on . For fixed , when is sufficiently large, w.p. ,
(17) |
Given these estimates, we are ready to prove Proposition 3.1.
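As a sanity check of Step 0, the following minimal sketch restricts the Rayleigh quotient of $L_{un}$ to the span of projected eigenfunctions on the unit circle, where the Laplace-Beltrami eigenfunctions are explicit ($1, \cos k\theta, \sin k\theta$ with eigenvalues $0, 1, 1, 4, 4, \ldots$); by the min-max principle, the sorted generalized eigenvalues of the restricted pair upper-bound the first empirical eigenvalues. The constant normalization of $L_{un}$ is omitted here, which does not affect the upper-bound property.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
n, d, eps = 800, 1, 5e-3
theta = 2 * np.pi * rng.random(n)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)        # uniform samples on S^1

sqdist = cdist(X, X, metric="sqeuclidean")
W = eps ** (-d / 2) * (4 * np.pi) ** (-d / 2) * np.exp(-sqdist / (4 * eps))
L_un = np.diag(W.sum(axis=1)) - W                           # constant normalization omitted

# candidate vectors rho_X psi_k for the first few population eigenfunctions
Psi = np.stack([np.ones(n), np.cos(theta), np.sin(theta),
                np.cos(2 * theta), np.sin(2 * theta)], axis=1)
A, B = Psi.T @ L_un @ Psi, Psi.T @ Psi
ub = eigh(A, B, eigvals_only=True)                          # Rayleigh quotients on the span
emp = np.linalg.eigvalsh(L_un)[: Psi.shape[1]]              # first empirical eigenvalues
print(np.all(emp <= ub + 1e-8))                             # min-max: upper bounds hold
```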
3.2 Random-walk graph Laplacian eigenvalue UB
We first establish a concentration argument for the degrees in the following lemma, which shows that, w.h.p., each degree concentrates at its expected value, with deviation uniformly bounded for all $i$; this will be used in analyzing (8).
Lemma 3.5.
Under Assumption 1(A1), uniform, and Assumption 2. Suppose as , and . Then, when is large enough, w.p. ,
1) The degree concentrates for all , namely,
(19) |
Part 2) immediately follows from Part 1), the latter being proved by standard concentration argument of independent sum and a union bound for events. With Lemma 3.5, the proof of the following proposition is similar to that of Proposition 3.1, and the difference lies in handling the denominator of the Rayleigh quotient in (8). The proofs of Lemma 3.5 and Proposition 3.6 are in Appendix C.1.
Proposition 3.6 (Eigenvalue UB of ).
Suppose , uniform, , , , and are under the same condition as in Proposition 3.1, then for sufficiently large , w.p. , for all , and
4 Eigenvalue crude lower bound in Step 1
In this section, we prove eigenvalue LB in Step 1, first for , and then the proof for is similar.
We consider for the operator on defined as
The semi-group operator is Hilbert-Schmidt, compact, and has eigenvalues and eigenfunctions as in (9). Thus, the operator is self-adjoint and PSD, and has
For any , the eigenvalues are ascending from 0 and have limit point 1. We denote for . By the variational principle, we have that when , for any ,
(21) |
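For concreteness, a hedged reconstruction of what (21) expresses, assuming the operator in question is $I - Q_t$ (whose eigenvalues $1 - e^{-\mu_k t}$ ascend from $0$ with limit point $1$, matching the description above):
\[
  1 - e^{-\mu_k t}
  \;=\;
  \min_{\substack{S \subset L^2(\mathcal{M}),\\ \dim S = k}}
  \;\max_{f \in S,\, f \neq 0}\;
  \frac{\langle f, (I - Q_t) f \rangle}{\langle f, f \rangle},
  \qquad k = 1, 2, \ldots
\]
Applied to the $k$-dimensional subspace spanned by the interpolated candidate functions, the maximum on the right-hand side is what gets compared with the graph Dirichlet form in Step 1.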
For the first result, we assume that are all of multiplicity 1 for simplicity. When population eigenvalues have greater than one multiplicity, the result extends by considering eigenspace rather than eigenvectors in the standard way, see Remark 5.
4.1 Un-normalized graph Laplacian eigenvalue crude LB
We now derive Step 1 for , the result being summarized in the following proposition.
Proposition 4.1 (Initial crude eigenvalue LB of ).
Under Assumptions 1 (A1), suppose is uniform on , and is Gaussian. For fixed , , suppose are all of single multiplicity, and define
(22) |
and is a fixed constant. Then there is an absolute constant determined by and (specifically, , where is a constant depending on ), such that, if as , , and , then for sufficiently large , w.p. ,
We prove Proposition 4.1 in the end of this subsection after we introduce heat kernel interpolation and establish the needed lemmas.
Suppose are eigenvalue and eigenvectors of , to construct a test function on from the vector , we define the interpolation mapping (the terminology “interpolation” is inherited from [6]) by the heat kernel with diffusion time , to be determined. Specifically, define
and then for any ,
(23) |
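The interpolation mapping is an analytical device rather than an algorithm, but it can be sketched numerically; the following substitutes the ambient Gaussian $K_r$ for the heat kernel $G_r$, which by Lemma 2.2 is a valid local approximation at short time $r$, and the $1/n$ weighting in (23) is our assumption:

```python
import numpy as np
from scipy.spatial.distance import cdist

def heat_interpolation(u, X, X_eval, r, d):
    """Sketch of the interpolation I u(x) ~ (1/n) sum_j G_r(x, x_j) u_j, cf. (23),
    turning a vector u on the samples X into a smooth function evaluated at X_eval.
    The heat kernel G_r is replaced by the ambient Gaussian K_r, which matches G_r
    locally at short time r (Lemma 2.2); r is a fixed O(1) constant in the analysis."""
    n = X.shape[0]
    sqdist = cdist(X_eval, X, metric="sqeuclidean")
    K_r = (4 * np.pi * r) ** (-d / 2) * np.exp(-sqdist / (4 * r))
    return K_r @ u / n
```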
We define the quadratic form
We also define and as below, and then for any , , where
(24) |
We will show that by concentration of the independent sum ; by definition, and will be when is an eigenvector with .
Lemma 4.2.
Under Assumptions 1 (A1), being uniform on . Suppose as , and . Then, when is large enough, w.p. ,
The notation indicates that the constant depends on and is uniform for all .
Proof of Lemma 4.2.
By definition, , where , and are positive valued random variables. It suffices to show that with large enough , w.p. indicated in the lemma,
(25) |
This can be proved using concentration argument, similar as in the proof of Lemma 3.5 1), where we use the boundedness of the heat kernel (14) in Lemma 2.2. The proof of (25) is given in Appendix C.2. Note that (25) is a property of the r.v. only, which is irrelevant to the vector . Thus the threshold of large in the lemma and the constant in big- depend on and are uniform for all . ∎
Lemma 4.3.
Under Assumptions 1 ( can be non-uniform), being Gaussian, let be a fixed constant. Suppose as , then with sufficiently small , for any realization of ,
(26) |
and
(27) |
The constants in big- only depend on and are uniform for all and .
Proof of Lemma 4.3.
For any , . Since , take in Lemma 2.2 to be , when , the three equations hold. By (13), truncate at an Euclidean ball,
We are ready to prove Proposition 4.1.
Proof of Proposition 4.1.
For fixed , since , define
(30) |
and is a fixed constant determined by and . For , let
For , where are normalized s.t.
(31) |
let , , then . Because , and , . Thus, under the assumption of the current proposition, the condition needed in Proposition 3.1 is satisfied, and then when is sufficiently large, there is an event which happens w.p. , under which
(32) |
We first show that are linearly independent by considering . By definition, for ,
and for , ,
Because , under the condition of the proposition, satisfies the condition in Lemma 4.2, and thus, with sufficiently large , there is an event which happens w.p. , under which
where we used that the factor is a fixed constant. Meanwhile, applying (27) in Lemma 4.3 where , and note that
we have that
and by that which is a fixed constant, so is , we have that
(33) |
Putting together, we have that
(34) |
This proves linear independence of when is large enough, since .
We consider first eigenvalues of , . For each , let be a -dimensional subspace in , then by (21),
(35) |
For any , , there is , , such that . Thus
Because are orthogonal, , we have that
By definition, , and .
We first upper bound the numerator of the r.h.s. of (35). By that ,
(36) |
We have already obtained the good event when applying Lemma 4.2 with . We apply the lemma again to , which gives that with sufficiently large there is an event which happens , and then under ,
(37) |
We track the constant dependence here: the constant in in Lemma 4.2 is only depending on (and not on ), thus we use the notation in (37) and below to emphasize that the constant is -dependent only and independent from . Then (37) gives that
The UB of follows from (26) in Lemma 4.3, with the shorthand that stands for ,
Thus, (36) continues as
(38) |
Next we lower bound the denominator . Here we use (27) in Lemma 4.3, which gives that
Note that we assume under event so that the eigenvalue UB (32) holds, thus . Together with that is a fixed constant, we have that
Then, again under ,
Putting together, and by that , we have that
where , and is a constant only depending on . We set
and since we assume in the current proposition, we have that . Then, comparing to l.h.s. of (35), we have that
By the relation that for any , , and when is sufficiently small s.t. ,
Noting that for , , because . Thus, when is sufficiently small and the term is less than , under the good events , which happens w.p. , we have that
Recall that by definition (30), , then , also . Re-arranging the terms gives that . This can be verified for all , and note that the good event is w.r.t , and is constructed for fixed , and none is for specific . ∎
4.2 Random-walk graph Laplacian eigenvalue crude LB
The counterpart result of random-walk graph Laplacian is the following proposition. It replaces Proposition 3.1 with Proposition 3.6 in obtaining the eigenvalue UB in the analysis, and consequently the high probability differs slightly.
Proposition 4.4 (Initial crude eigenvalue LB of ).
Under the same condition and setting of , being uniform, being Gaussian, and , , same as in Proposition 4.1. Then, for sufficiently large , w.p., , for .
5 Steps 2-3 and eigen-convergence
[Figure 1: illustration of the first $K$ low-lying empirical eigenvalues falling into disjoint intervals around the population eigenvalues, as guaranteed by the crude bounds in Step 1.]
In this section, we obtain eigen-convergence rate of and from the initial crude eigenvalue bound in Step 1. We first derive the Steps 2-3 for , and the proof for is similar.
5.1 Step 2 eigenvector consistency
In Step 1, the crude bound of eigenvalues (the UB already matches the form rate, the LB is crude) gives that, for fixed $K$ and at large $n$, each empirical eigenvalue will fall into an interval whose width is less than half of the smallest of the first $K$ eigenvalue gaps, illustrated in Fig. 1. This means that each empirical eigenvalue is separated from the neighboring population eigenvalues by a distance bounded away from zero. This initial separation is enough for proving eigenvector consistency up to the point-wise rate, which is a standard argument, see e.g. the proof of Theorem 2.6 part 2) in [7]. Below we provide an informal explanation and then the formal statement in Proposition 5.2, with a proof for completeness.
We first give an illustrative informal derivation. Take for example, let , , and we want to show that and are aligned.
the point-wise convergence of graph Laplacian gives bound of the residual vector , suppose . Meanwhile, for any , the crude bound of eigenvalues gives that
where is an constant determined by and . Because empirical eigenvalues are sorted, for are also away from . As a result,
Then we use the relation that for each , , which gives that
This shows that has alignment with all the other eigenvectors than , and since are orthogonal basis in , this guarantees alignment between and .
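A hedged reconstruction of this computation, written for a generic index $k$ and writing $\gamma_K$ for the smallest first-$K$ eigen-gap of (22) (the symbol is assumed here): let $\phi_k := \rho_X \psi_k / \|\rho_X \psi_k\|_2$ and let the residual be $r := L_{un}\phi_k - \mu_k \phi_k$, whose 2-norm is controlled by the point-wise rate. Expanding in the orthonormal eigenbasis $\{v_j\}$ of $L_{un}$,
\[
  \langle v_j, r\rangle \;=\; (\lambda_j - \mu_k)\,\langle v_j, \phi_k\rangle ,
  \qquad j \neq k,
\]
and since the crude eigenvalue bound keeps $\lvert \lambda_j - \mu_k\rvert$ of order $\gamma_K$ for $j \neq k$,
\[
  \sum_{j \neq k} \langle v_j, \phi_k\rangle^2
  \;=\; \sum_{j \neq k} \frac{\langle v_j, r\rangle^2}{(\lambda_j - \mu_k)^2}
  \;\le\; \frac{C\,\|r\|_2^2}{\gamma_K^2},
  \qquad\text{hence}\qquad
  \lvert\langle v_k, \phi_k\rangle\rvert^2 \;\ge\; 1 - \frac{C\,\|r\|_2^2}{\gamma_K^2},
\]
which is the claimed alignment between $v_k$ and $\phi_k$ up to the point-wise rate.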
To proceed, we use the point-wise rate of graph Laplacian with kernel as in the next theorem. The analysis of point-wise convergence was given in [27] and [9]: The original theorem in [27] considers the normalized graph Laplacian . The analysis is similar for and leads to the same rate, which was derived in [9] under the setting of variable kernel bandwidth. These previous works consider a fixed point on , and since the concentration result has exponentially high probability, it directly gives the version of uniform error bound at every data point , which is needed here.
Theorem 5.1 ([27, 9]).
1) When is large enough, w.p. ,
2) When is large enough, w.p. ,
The constants in the big-O notations depend on , and the norm of .
Note that Theorem 5.1 holds for non-uniform , while in our eigen-convergence analysis of graph Laplacian with in below, we only use the result when is uniform. Meanwhile, similar to Theorem 3.2, Assumption 2(C3) may be relaxed for Theorem 5.1 to hold, c.f. Remark 1.
Proof of Theorem 5.1.
Consider the events such that is less than the error bound. For each of the -th event, condition on , Theorem 3.8 in [9] can be directly used to show that the event holds w.p. for the case 1) random-walk graph Laplacian. For the case 2) un-normalized graph Laplacian, adopting the same technique of Theorem 3.6 in [9] proves the same rate as for the fixed-bandwidth kernel, and gives that the event holds w.p. . Specifically, the proof is by showing the concentration of the , which is an independent summation condition on . The r.v. , , has expectation , and can be shown to be bounded by , and is also bounded by , following the same calculation as in the proof of Theorem 3.6 in [9]. This shows that the bias error is , and the variance error is , by classical Bernstein. Same as in Theorem 3.2, regularity and decay up to 2nd derivative of are enough here.
Strictly speaking, the analysis in [9] is for the “” summation and not the “” one here. However, the difference between and only introduces an relative error and is of higher order, and the term cancels out in the summation of . In proving this large deviation bound at , the needed threshold for large is determined by and uniform for . Then, when exceeds a threshold uniform for all , by the independence of the ’s, the -th event holds w.p. and for cases 1) and 2) respectively. The current theorem, in both 1) and 2), follows by a union bound. ∎
We are ready for Step 2 for the unnormalized graph Laplacian . Here we consider eigenvectors normalized to have 2-norm 1, i.e., , , and we compare to
(39) |
where are population eigenfunctions which are orthonormal in , same as above.
Proposition 5.2.
Proof of Proposition 5.2.
The proof uses the same approach as that of Theorem 2.6 part 2) in [7], and since our setting is different, we include a proof for completeness.
When , we always have , is the constant vector , and is the constant function, and thus up to a sign. Under the condition of the current proposition, the assumptions of Proposition 4.1 are satisfied, and because implies that , the assumptions of Theorem 5.1 2) are also satisfied. We apply Theorem 5.1 2) to the functions . By a union bound, we have that when is large enough, w.p. , for . By that for any , this gives that there is ,
(40) |
The constants in big-O depends on first eigenfunctions and are absolute ones because is fixed. Applying Proposition 4.1, and consider the intersection with the good event in Proposition 4.1, we have for each , . By definition of as in (22),
(41) |
For each , let be the 1-dimensional subspace in , and let be its orthogonal complement. We will show that is small. By definition, , and meanwhile, . Subtracting the two gives that . By that are orthonormal vectors, and (41),
Then, combined with (40), we have that , namely, .
By definition, , where . Note that are unit vectors up to an error: Because the good event in Proposition 4.1 is under that in the eigenvalue UB Proposition 3.1, and specifically that of Lemma 3.4. Thus (17) holds, which means that , , where . Then, one can verify that
(42) |
and then we set , and have that
The bound holds for each . ∎
5.2 Step 3: refined eigenvalue LB
We now derive Step 3 for , the result being summarized in the following proposition.
Proposition 5.3.
Under the same condition of Proposition 5.2, is fixed. Then, for sufficiently large , with the same indicated high probability,
Proof of Proposition 5.3.
We inherit the notations in the proof of Proposition 5.2. Again . For , note that
(43) |
and meanwhile, we have shown that , where and . Thus the l.h.s. of (43) equals
By definition of , . The good event in Proposition 5.2 is under the good event , under which Lemma 3.3 and Lemma 3.4 hold. Then by (16), ; By (17), . Putting together, and by that ,
Meanwhile, by (40), , and then
Because for some , , thus , and then . Back to (43), we have that
and by (42), , thus . The above holds for all . ∎
5.3 Eigen-convergence rate
We are ready to prove the main theorems on eigen-convergence of graph Laplacians, when is uniform and the kernel function is Gaussian.
Theorem 5.4 (eigen-convergence of ).
Under Assumption 1 (A1), is uniform on , and is Gaussian. For fixed, assume that the eigenvalues for are all single multiplicity, and the constant as in Proposition 4.1. Consider first eigenvalues and eigenvectors of , , , and the vectors are defined as in (39). If as , , , then for sufficiently large , w.p. ,
(44) |
and there exist scalars , actually , such that
(45) |
Remark 3 (Choice of $\epsilon$ and overall rates).
The eigen-convergence bounds (44) and (45) are provided in the combined bias-variance form, as long as the conditions on $\epsilon$ and $n$ in the theorem hold. The bias error in both cases is $O(\epsilon)$, and the variance error has a different inverse power of $\epsilon$ ($\epsilon^{d/2}$ and $\epsilon^{d/2+1}$ respectively). The eigenvalue convergence (44) achieves the form rate $O\big(\epsilon + \sqrt{\log n/(n\epsilon^{d/2})}\big)$, which is the rate of the Dirichlet form convergence, c.f. Theorem 3.2. The (2-norm) eigenvector convergence (45) achieves the point-wise rate $O\big(\epsilon + \sqrt{\log n/(n\epsilon^{d/2+1})}\big)$, which is the rate of point-wise convergence of the graph Laplacian, c.f. Theorem 5.1.
The different powers of $\epsilon$ lead to different optimal choices of $\epsilon$, in order of $n$, to achieve the best overall rates for eigenvalue and eigenvector convergence respectively. Specifically,
- The optimal choice of $\epsilon$ to minimize the eigenvalue error is $\epsilon \sim (\log n/n)^{1/(d/2+2)}$ (which is also the smallest order of $\epsilon$ allowed by the theorem). This choice leads to an eigenvalue error of $\tilde{O}(n^{-1/(d/2+2)})$, which is the best overall rate of eigenvalue convergence by our theory. We use $\tilde{O}$ to denote the involvement of a certain factor of $\log n$. In this case, the eigenvector error is $\tilde{O}(n^{-1/(d+4)})$.
- The optimal choice of $\epsilon$ to minimize the eigenvector error is $\epsilon \sim (\log n/n)^{1/(d/2+3)}$, which leads to an eigenvector error of $\tilde{O}(n^{-1/(d/2+3)})$, the best overall rate of eigenvector convergence. In this case, the eigenvalue error is also $\tilde{O}(n^{-1/(d/2+3)})$.
We can see that the overall rate of eigenvalue convergence achieves the best overall rate of form convergence, and that of eigenvector (2-norm) convergence achieves the best overall rate of point-wise convergence, at the optimal $\epsilon$ for each convergence respectively.
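A small helper reproducing the bandwidth orders in this remark (constants set to one; the expressions are exact only up to these omitted constants):

```python
import numpy as np

def bandwidth_orders(n, d):
    """Orders of the diffusion-time bandwidth eps from Remark 3:
    eps_val balances the form rate      eps + sqrt(log n / (n eps^{d/2})),
    eps_vec balances the point-wise rate eps + sqrt(log n / (n eps^{d/2 + 1}))."""
    eps_val = (np.log(n) / n) ** (1.0 / (d / 2 + 2))  # eigenvalue rate ~ (log n/n)^{1/(d/2+2)}
    eps_vec = (np.log(n) / n) ** (1.0 / (d / 2 + 3))  # eigenvector rate ~ (log n/n)^{1/(d/2+3)}
    return eps_val, eps_vec
```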
Proof of Theorem 5.4.
Remark 4 (Comparison to compactly supported $h$).
For the indicator kernel $h$ (see also Remark 2), the point-wise convergence of the graph Laplacian is known to have the rate $O\big(\sqrt{\epsilon} + \sqrt{\log n/(n\epsilon^{d/2+1})}\big)$, see [19, 4, 27, 7] among others. While our way of Step 1 cannot be applied to such $h$, [7] covered this case and provided the eigenvalue and eigenvector consistency up to this point-wise rate. The scaling $\epsilon \sim (\log n/n)^{1/(d/2+2)}$ is the optimal one to balance the bias and variance errors, and then it gives the overall error rate $\tilde{\Theta}(n^{-1/(d+4)})$, which agrees with the eigen-convergence rate in [7]. Here $\tilde{O}$ and $\tilde{\Theta}$ indicate that the constant is possibly multiplied by a factor of a certain power of $\log n$. Meanwhile, we note that, if following our approach of using the Dirichlet form convergence rate, the eigenvalue consistency can be improved to be squared, namely $\tilde{O}(n^{-2/(d+4)})$, when $\epsilon \sim (\log n/n)^{1/(d/2+2)}$. Specifically, by Remark 2, the Dirichlet form convergence with the indicator $h$ is $O\big(\epsilon + \sqrt{\log n/(n\epsilon^{d/2})}\big)$. Then, once the initial crude eigenvalue LB is established, in Step 2, the eigenvector 2-norm consistency can be shown up to the point-wise rate. In Step 3, the eigenvalue consistency for the first $K$ eigenvalues can be shown up to the form rate. This would imply the eigenvalue convergence rate of $\tilde{O}(n^{-2/(d+4)})$ under the regime where $\epsilon \sim (\log n/n)^{1/(d/2+2)}$, while the eigenvector consistency remains $\tilde{O}(n^{-1/(d+4)})$. Compared to Remark 3, these rates are the same as with the Gaussian kernel when setting $\epsilon \sim (\log n/n)^{1/(d/2+2)}$ (the optimal order to minimize the eigenvalue rate). However, using the Gaussian kernel allows one to obtain a better rate for eigenvector convergence, namely $\tilde{O}(n^{-1/(d/2+3)})$, by setting $\epsilon \sim (\log n/n)^{1/(d/2+3)}$ (the optimal order to minimize the eigenvector convergence rate). This improved eigenvector (2-norm) rate is due to the improved point-wise rate of the smooth kernel over that of the indicator kernel, and specifically, the bias error is $O(\epsilon)$ instead of $O(\sqrt{\epsilon})$.
Remark 5 (Extension to larger eigenvalue multiplicity).
The result extends when the population eigenvalues have multiplicity greater than one. Suppose we consider , which are distinct eigenvalues, and has multiplicity . Then let , , , and are sorted eigenvalues and associated eigenfunctions. Step 0. eigenvalue UB holds, since Proposition 3.1 does not require single multiplicity. In Step 1, the only place in Proposition 4.1 where single multiplicity of is used is in the definition of . Then, by changing to
(46) |
and defining , is a positive constant depending on and , Proposition 4.1 proves that for all , i.e. . This allows to extend Step 2 Proposition 5.2 by considering the projection where the subspace in is spanned by eigenvectors whose eigenvalues approaches , similar as in the original proof of Theorem 2.6 part 2) in [7]. Specifically, suppose , , let , and the index set . For eigenfunction , , then , similarly as in the proof of Proposition 5.2, one can verify that
which gives that , for all . By that are near orthonormal with large (Lemma 3.4), this proves that there exists an -by- orthogonal transform , and , such that , , where , and the notation stands for the -by- matrix formed by concatenating the vectors as columns. This proves consistency of empirical eigenvectors up to the point-wise rate for . Finally, Step 3 Proposition 5.3 extends by considering (43) for and , making use of , the Dirichlet form convergence of (Lemma 3.3), and that is transformed from by an orthogonal matrix .
To address the eigen-convergence of , we define the -weighted 2-norm as
and recall that eigenvectors of are -orthogonal. The following theorem is the counterpart of Theorem 5.4 for , obtaining the same rates.
Theorem 5.5 (eigen-convergence of ).
Under the same condition and setting of , being uniform, being Gaussian, and , K, , same as in Theorem 5.4. Consider first eigenvalues and eigenvectors of , , , i.e. , and the vectors defined as in (39). Then, for sufficiently large , w.p. , , and the same bound of and as in Theorem 5.4 hold for , with certain scalars satisfying ,
The extension to when has greater than 1 multiplicity is possible, similarly as in Remark 5. The proof of uses almost the same method as for , and the difference is that are no longer orthonormal but -orthogonal. This is handled by that and agrees in relative error up to the form rate, due to the concentration of (Lemma 3.5). The detailed proof is left to Appendix C.3.
6 Density-corrected graph Laplacian
We consider as in Assumption 1(A2). The density-corrected graph Laplacian is defined as [10]
where and are as before, and is the degree matrix of . The density-corrected graph Laplacian recovers the Laplace-Beltrami operator when is not uniform. In this section, we extend the theory of point-wise convergence, Dirichlet form convergence, and eigen-convergence to such graph Laplacians.
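A minimal sketch of this construction, assuming (consistently with the $\widetilde{D}$-orthogonality of the eigenvectors used below) that the density-corrected Laplacian is the random-walk-type Laplacian built from $\widetilde{W} = D^{-1} W D^{-1}$ and its degree matrix $\widetilde{D}$, with the constant normalization left as a placeholder:

```python
import numpy as np

def density_corrected_laplacian(W, c_norm=1.0):
    """Density-corrected graph Laplacian of [10]: normalize the affinity from both
    sides by the degrees, W_tilde = D^{-1} W D^{-1}, then form the random-walk-type
    Laplacian from W_tilde and its degree matrix D_tilde. c_norm stands in for the
    constant making it converge to the Laplace-Beltrami operator."""
    deg = W.sum(axis=1)
    W_t = W / np.outer(deg, deg)        # W_tilde = D^{-1} W D^{-1}
    deg_t = W_t.sum(axis=1)             # D_tilde
    n = W.shape[0]
    return (np.eye(n) - W_t / deg_t[:, None]) / c_norm
```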
6.1 Point-wise convergence of
This subsection proves Theorem 6.2, which shows that the point-wise rate of the density-corrected graph Laplacian is the same as that without the density correction. The result is for general differentiable satisfying Assumption 2, which can be of independent interest.
We first establish the counterpart of Lemma 3.5 about the concentration of all when is not uniform. The deviation bound is uniform for all and has an bias error at .
Lemma 6.1.
1) When is large enough, w.p. , for all s.t. is well-defined, and
(47) |
where is determined by manifold extrinsic coordinates, and .
2) When is large enough, w.p. , for all s.t. is well-defined, and
(48) |
The constants in big- in parts 1) and 2) depend on (, and are uniform for all .
The proof is left to Appendix D. The following theorem proves the point-wise rate of .
Theorem 6.2.
The theorem slightly improves the point-wise convergence rate of in [28]. It is proved using the same techniques as the analysis of point-wise convergence of in [27, 9], and we include a proof for completeness here.
Proof of Theorem 6.2.
By definition,
(49) |
The proof of Lemma 6.1 has constructed two good events and ( is for Part 1) to hold, Part 2) assumes and ), such that with large enough , happens w.p. , under which , for all , and are well-defined, and equations (47), (A.21), and (48) hold. (48) provides the concentration of the denominator of the r.h.s. of (49). We now consider the numerator. Note that, with sufficiently small , is uniformly bounded from below by constant . This is because , is compact, then is uniformly bounded, and meanwhile is uniformly bounded from below. Thus, under ,
and the equation equals
and we analyze the two terms respectively.
To bound , we use and again that to have
We claim that, for large enough , w.p. , and we call this good event , under which
(50) |
and the proof is in below. With (50), under , can be bounded by
(51) |
The analysis of uses concentration of independent sum again. Condition on and consider
and we have . Due to uniform boundedness of from below by , are bounded by . We claim that the expectation (proof in below)
(52) |
The variance of is bounded by
which follows the same derivation as in the proof of the point-wise convergence of without density-correction, c.f. Theorem 5.1 1), and can be directly verified by a similar calculation as in (54). We attempt at the large deviation bound at which is of small order than under the theorem condition that . Thus the classical Bernstein gives that for large enough , where the threshold is determined by and uniform for , w.p. ,
and as a result,
(53) |
By a union bound over the events needed at points, we have that (53) holds at all under a good event which happens w.p. .
Putting together, under and , by (51) and (53), at all ,
Combined with (48), under ,
It remains to establish (50) and (52)
to finish the proof of the theorem.
Proof of (50): Define r.v. and condition on , for , . Let , for any , when , then
The is obtained because , are finite constants, and
(54) |
where is the projected coordinates in the tangent plane , and the comparison of to (namely ) and the volume comparison (namely ) hold when which is a constant depending on , see e.g. Lemma A.1 in [9].
Meanwhile, is bounded by , and the variance of is bounded by and then bounded by , by a similar calculation as in (54). We attempt at the large deviation bound at which is of small order than under the theorem condition that . Thus, for each , when is enough where the threshold is determined by and uniform for , w.p. ,
The term in (50) equals zero.
By the same argument of independence of from
and the union bound over events, we have proved (50).
6.2 Dirichlet form convergence of density-corrected graph Laplacian
The graph Dirichlet form of density-corrected graph Laplacian is defined as
(55) |
We establish the counterpart of Theorem 3.2, which achieves the same form rate. The theorem is for general differentiable , which can be of independent interest.
Theorem 6.3.
Proof of Theorem 6.3.
By definition (55),
The following lemma (proved in Appendix D) makes use of the concentration of to reduce the graph Dirichlet form to a V-statistic up to a relative error at the form rate.
Lemma 6.4.
Under the good event in Lemma 6.1 1),
and the constant in big- is determined by and uniform for all .
We consider under the good event in Lemma 6.1 1), which is called and happens w.p. . Then applying Lemma 6.4 with , we have that
(56) |
The term in (56) equals , where , and . We follow the same approach as in the proof of Theorem 3.4 in [9] to analyze this V-statistic, and show that (proof in Appendix D)
(57) |
Back to (56), we have shown that under ,
and the constant in big- depends on , and . ∎
6.3 Eigen convergence of
In this subsection, let be the eigenvalues of and the associated eigenvectors. By (55), recalling that , the analogue of (8) is the following
(58) |
The methodology is the same as before, with a main difference in the definition of the heat interpolation mapping with weights as in (59). This gives rise to the -weighted quadratic form defined in (60), for which we derive the concentration argument of for in (A.33) and the upper bound of in Lemma D.2. The other difference is that the -weighted 2-norm is considered because the eigenvectors are -orthogonal. All the proofs of Steps 0-3 and Theorem 6.7 are left to Appendix D.
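As a computational aside, the -orthogonality of the eigenvectors noted above means that the eigenpairs of the density-corrected random-walk graph Laplacian can be computed as a symmetric generalized eigenproblem, with the degree matrix of the density-corrected affinity on the right-hand side. A minimal sketch, assuming the density-corrected affinity is the kernelized matrix normalized by its degree matrix from both sides and omitting the paper's scaling of the eigenvalues by the kernel bandwidth:

```python
import numpy as np
from scipy.linalg import eigh

def density_corrected_eigenpairs(W, k):
    """First k eigenpairs of the random-walk Laplacian of the density-corrected
    affinity W1 = D^{-1} W D^{-1}.  The random-walk Laplacian D1^{-1}(D1 - W1)
    is not symmetric, but its eigenproblem is equivalent to the symmetric
    generalized problem (D1 - W1) v = mu * D1 v, and scipy returns eigenvectors
    that are D1-orthonormal, matching the weighted 2-norm used in the text.
    The 1/eps scaling of the eigenvalues is omitted (assumption)."""
    d = W.sum(axis=1)
    W1 = W / np.outer(d, d)            # density-corrected affinity
    D1 = np.diag(W1.sum(axis=1))       # degree matrix of W1
    mu, V = eigh(D1 - W1, D1)          # ascending eigenvalues; V.T @ D1 @ V = I
    return mu[:k], V[:, :k]
```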
Step 0. We first establish eigenvalue UB based on Lemma 6.1 and the form convergence in Theorem 6.3.
Proposition 6.5 (Eigenvalue UB of ).
Step 1. Eigenvalue crude LB. We prove with the -weighted interpolation mapping defined as
(59) |
Then, same as before, , and , where for ,
(60) |
Proposition 6.6 (Initial crude eigenvalue LB of ).
Steps 2-3. We prove eigenvector consistency and refined eigenvalue convergence rate. Define
(61) |
The proof uses the same techniques as before, and the difference is in handling the -orthogonality of the eigenvectors and in using the concentration arguments in Lemma 6.1. Same as before, extension to the case when has multiplicity greater than 1 is possible (Remark 5).
Theorem 6.7 (eigen-convergence of ).
Under the same conditions and setting of , being uniform, being Gaussian, and , , , as in Theorem 5.4, where the definition of is the same except that is a constant depending on . Consider the first eigenvalues and eigenvectors of , , and are normalized s.t. . Define, for ,
Then, for sufficiently large , w.p. , and the same bounds as in Theorem 5.4 hold for and , for , with certain scalars satisfying ,
7 Numerical experiments
This section gives numerical results on the point-wise convergence and eigen-convergence of graph Laplacians built from simulated manifold data. Codes are released at https://github.com/xycheng/eigconvergence_gaussian_kernel.
7.1 Eigen-convergence of
We test on two simulated datasets, which are uniformly sampled on (embedded in , the formula is in Appendix A) and the unit sphere (embedded in ). For both datasets, we compute over an increasing number of samples and a range of values of , where the grid points of both and are evenly spaced on a log scale. For each value of and , we generate data points, construct the kernelized matrix as defined in (1) with Gaussian , and compute the first 10 eigenvalues and eigenvectors of . The errors are computed by
(62) |
where is as defined by (39). The experiment is repeated for 500 replicas, from which the averaged empirical errors are computed. For the data on , . The manifold (in the first 3 coordinates) is illustrated in Fig. 4(a), but the density is uniform here. See more details in Appendix A. For the data on , . These ranges are chosen so that the minimal error over for each is observed, at least for . Note that for , the population eigenvalues starting from are of multiplicity 2, and for , the multiplicities are 3, 5, .
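A minimal sketch of such an experiment for the circle dataset follows; it uses the standard embedding of the unit circle into the plane as a stand-in for the isometric embedding of Appendix A, and compares scale-invariant eigenvalue ratios so that the kernel's normalizing constant, which only rescales the graph-Laplacian eigenvalues, does not enter.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Uniform samples on the unit circle; a simplified stand-in for the
# embedding used in the paper (Appendix A).
n, eps = 1000, 0.02
t = rng.uniform(0.0, 2.0 * np.pi, size=n)
X = np.stack([np.cos(t), np.sin(t)], axis=1)

# Gaussian affinity; the kernel's normalizing constant is omitted.
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq / (4.0 * eps))
D = np.diag(W.sum(axis=1))

# Random-walk graph Laplacian eigenproblem, solved as a symmetric
# generalized problem; eigenvalues are returned in ascending order.
mu, V = eigh(D - W, D)

# Laplace-Beltrami eigenvalues on the unit circle: 0, 1, 1, 4, 4, 9, 9, ...
pop = np.array([0, 1, 1, 4, 4, 9, 9, 16, 16, 25], dtype=float)
# The overall scale of mu depends on the omitted kernel constant, so compare
# the scale-invariant ratios mu_k / mu_1 against the population ratios.
print(np.round(mu[2:10] / mu[1], 3))
print(pop[2:10] / pop[1])
```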
The results are shown in Figures 2 and 3. For data on , Fig. 2 (a) shows that as a function of (with post-selected best ) shows a convergence order of about , which is consistent with the theoretical bound of in Theorem 5.5, since here. In the left plot of the colored field, the log error values are smoothed over the grid of and , and the best scales with as about . The empirical scaling of the optimal is less stable to observe: depending on the level of smoothing, the slope of varies between -0.2 and -0.5 (the left plot), while the slope of the best (log) error is always about -0.4 (the right plot). The result without smoothing is shown in Fig. A.1. The eigenvector error in Fig. 2(b) shows an order of about , which is better than the theoretical prediction. For the data on , the eigenvalue convergence shows an order of about , in agreement with the theoretical rate of when . The eigenvector error again shows an order of about , which is better than the theory. The small error of eigenvector estimation at very large values of may be due to the symmetry of the simple manifolds and . In both experiments, the eigenvector estimation prefers a much larger value of than the eigenvalue estimation, which is consistent with the theory.
Figures 2 and 3: eigenvalue and eigenvector estimation errors for the two simulated datasets.
7.2 Density-corrected graph Laplacian
To examine the density-corrected graph Laplacian, we switch to a non-uniform density on , illustrated in Fig. 4(a). We first investigate the point-wise convergence of to on a test function ; see more details in Appendix A. The error is computed as
(63) |
and the result is shown in Fig. 4. Theorem 6.2 predicts the bias error to be and the variance error to be since is fixed, which agrees with Fig. 4(d).
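A sketch of such a point-wise check is given below. The sampling density and the test function of the paper are not reproduced here, so a hypothetical density proportional to 1 + 0.5 sin(t) on the circle and the test function cos(t) are used for illustration; a single fitted scalar absorbs the kernel's omitted normalizing constant and the sign convention of the Laplacian, and the reported relative residual plays the role of the point-wise error (63).

```python
import numpy as np

rng = np.random.default_rng(1)

# Rejection sampling from a hypothetical non-uniform density on the circle,
# proportional to 1 + 0.5*sin(t); not the density used in the paper.
def sample_circle(n):
    samples = []
    while len(samples) < n:
        t = rng.uniform(0.0, 2.0 * np.pi, size=n)
        keep = rng.uniform(0.0, 1.5, size=n) < 1.0 + 0.5 * np.sin(t)
        samples.extend(t[keep].tolist())
    return np.array(samples[:n])

n, eps = 2000, 0.01
t = sample_circle(n)
X = np.stack([np.cos(t), np.sin(t)], axis=1)   # simplified circle embedding
f = np.cos(t)                                  # hypothetical test function
lap_f = np.cos(t)                              # minus its second derivative on the circle

# Density-corrected graph Laplacian applied to f.
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
W = np.exp(-sq / (4.0 * eps))                  # kernel constant omitted
W1 = W / np.outer(W.sum(axis=1), W.sum(axis=1))
d1 = W1.sum(axis=1)
Lf = (d1 * f - W1 @ f) / (d1 * eps)            # random-walk Laplacian of W1, scaled by 1/eps

# Fit one scalar to absorb the omitted constant, then report the relative residual.
c = np.dot(Lf, lap_f) / np.dot(lap_f, lap_f)
print(np.linalg.norm(Lf - c * lap_f) / np.linalg.norm(c * lap_f))
```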
8 Discussion
The current result may be extended in several directions. First, for a manifold with smooth boundary, the random-walk graph Laplacian recovers the Neumann Laplacian [10], and one can expect to prove the spectral convergence as well, as in [22]. Second, extension to kernels with variable or adaptive bandwidth [5, 9], and to other normalization schemes, e.g., bi-stochastic normalization [23, 20, 36], would be important for improving the robustness against low sampling density and noise in data, and possibly the spectral convergence as well. Related is the problem of spectral convergence to other manifold diffusion operators, e.g., the Fokker-Planck operator, on . It would also be interesting to extend the spectral convergence to more general types of kernel functions, which may be non-Gaussian and even non-symmetric [37]. Relaxing the condition on the kernel bandwidth can also be useful: the optimal transport approach was able to show spectral consistency in the regime just beyond graph connectivity, namely when [7], which is less restrictive than the condition needed by the Gaussian kernel in the current paper. Being able to extend the analysis to very sparse graphs is important for applications. Lastly, further investigation is needed to explain the good spectral convergence observed in experiments, particularly the eigenvector convergence and the faster rate of the density-corrected graph Laplacian. For the eigenvector convergence, the current work focuses on the 2-norm consistency, while the -norm consistency, as derived in [11, 8], is also important to study.
Acknowledgement
The authors thank Hau-Tieng Wu for helpful discussion. Cheng thanks Yiping Lu for helpful discussion on the eigen-convergence problem. The work is supported by NSF DMS-2007040. XC is partially supported by NSF, NIH, and the Alfred P. Sloan Foundation.
References
- [1] Donald Gary Aronson. Bounds for the fundamental solution of a parabolic equation. Bulletin of the American Mathematical Society, 73(6):890–896, 1967.
- [2] Mukund Balasubramanian and Eric L Schwartz. The isomap algorithm and topological stability. Science, 295(5552):7–7, 2002.
- [3] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
- [4] Mikhail Belkin and Partha Niyogi. Convergence of Laplacian eigenmaps. In Advances in Neural Information Processing Systems, pages 129–136, 2007.
- [5] Tyrus Berry and John Harlim. Variable bandwidth diffusion kernels. Applied and Computational Harmonic Analysis, 40(1):68–96, 2016.
- [6] Dmitri Burago, Sergei Ivanov, and Yaroslav Kurylev. A graph discretization of the Laplace-Beltrami operator. Journal of Spectral Theory, 4(4):675–714, 2014.
- [7] Jeff Calder and Nicolas Garcia Trillos. Improved spectral convergence rates for graph Laplacians on -graphs and k-NN graphs. Applied and Computational Harmonic Analysis, 60:123–175, 2022.
- [8] Jeff Calder, Nicolas Garcia Trillos, and Marta Lewicka. Lipschitz regularity of graph Laplacians on random data clouds. SIAM Journal on Mathematical Analysis, 54(1):1169–1222, 2022.
- [9] Xiuyuan Cheng and Hau-Tieng Wu. Convergence of graph Laplacian with knn self-tuned kernels. Information and Inference: A Journal of the IMA, 2021.
- [10] Ronald R Coifman and Stéphane Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.
- [11] David B Dunson, Hau-Tieng Wu, and Nan Wu. Spectral convergence of graph Laplacian and heat kernel reconstruction in from random samples. Applied and Computational Harmonic Analysis, 55:282–336, 2021.
- [12] Ahmed El Alaoui, Xiang Cheng, Aaditya Ramdas, Martin J Wainwright, and Michael I Jordan. Asymptotic behavior of -based Laplacian regularization in semi-supervised learning. In Conference on Learning Theory, pages 879–906, 2016.
- [13] Noureddine El Karoui and Hau-Tieng Wu. Graph connection Laplacian methods can be made robust to noise. The Annals of Statistics, 44(1):346–372, 2016.
- [14] Justin Eldridge, Mikhail Belkin, and Yusu Wang. Unperturbed: spectral analysis beyond Davis-Kahan. arXiv preprint arXiv:1706.06516, 2017.
- [15] M Flores, J Calder, and G Lerman. Algorithms for lp-based semi-supervised learning on graphs. arXiv preprint arXiv:1901.05031, 2019.
- [16] Alexander Grigor’yan. Gaussian upper bounds for the heat kernel on arbitrary manifolds. Journal of Differential Geometry, 45:33–52, 1997.
- [17] Alexander Grigor’yan. Heat kernel and analysis on manifolds, volume 47. American Mathematical Society, Providence, RI, 2009.
- [18] Matthias Hein. Uniform convergence of adaptive graph-based regularization. In International Conference on Computational Learning Theory, pages 50–64. Springer, 2006.
- [19] Matthias Hein, Jean-Yves Audibert, and Ulrike Von Luxburg. From graphs to manifolds–weak and strong pointwise consistency of graph Laplacians. In International Conference on Computational Learning Theory, pages 470–485. Springer, 2005.
- [20] Boris Landa, Ronald R Coifman, and Yuval Kluger. Doubly-stochastic normalization of the Gaussian kernel is robust to heteroskedastic noise. arXiv preprint arXiv:2006.00402, 2020.
- [21] Peter Li, Shing Tung Yau, et al. On the parabolic kernel of the Schrödinger operator. Acta Mathematica, 156:153–201, 1986.
- [22] Jinpeng Lu. Graph approximations to the Laplacian spectra. Journal of Topology and Analysis, pages 1–35, 2020.
- [23] Nicholas F Marshall and Ronald R Coifman. Manifold learning with bi-stochastic kernels. IMA Journal of Applied Mathematics, 84(3):455–482, 2019.
- [24] Boaz Nadler, Nathan Srebro, and Xueyuan Zhou. Semi-supervised learning with the graph Laplacian: The limit of infinite unlabelled data. Advances in Neural Information Processing Systems, 22:1330–1338, 2009.
- [25] Steven Rosenberg. The Laplacian on a Riemannian manifold: An introduction to analysis on manifolds. Number 31. Cambridge University Press, 1997.
- [26] Zuoqiang Shi. Convergence of Laplacian spectra from random samples. arXiv preprint arXiv:1507.00151, 2015.
- [27] Amit Singer. From graph to manifold Laplacian: The convergence rate. Applied and Computational Harmonic Analysis, 21(1):128–134, 2006.
- [28] Amit Singer and Hau-Tieng Wu. Spectral convergence of the connection Laplacian from random samples. Information and Inference: A Journal of the IMA, 6(1):58–123, 2016.
- [29] Dejan Slepcev and Matthew Thorpe. Analysis of p-Laplacian regularization in semisupervised learning. SIAM Journal on Mathematical Analysis, 51(3):2085–2120, 2019.
- [30] Ronen Talmon, Israel Cohen, Sharon Gannot, and Ronald R Coifman. Diffusion maps for signal processing: A deeper look at manifold-learning techniques based on kernels and graphs. IEEE signal processing magazine, 30(4):75–86, 2013.
- [31] Daniel Ting, Ling Huang, and Michael Jordan. An analysis of the convergence of graph Laplacians. arXiv preprint arXiv:1101.5435, 2011.
- [32] Nicolás García Trillos, Moritz Gerlach, Matthias Hein, and Dejan Slepčev. Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace–Beltrami operator. Foundations of Computational Mathematics, 20(4):827–887, 2020.
- [33] Laurens Van Der Maaten, Eric Postma, and Jaap Van den Herik. Dimensionality reduction: a comparative review. J Mach Learn Res, 10(66-71):13, 2009.
- [34] Ulrike Von Luxburg, Mikhail Belkin, and Olivier Bousquet. Consistency of spectral clustering. The Annals of Statistics, pages 555–586, 2008.
- [35] Xu Wang. Spectral convergence rate of graph Laplacian. arXiv preprint arXiv:1510.08110, 2015.
- [36] Caroline L. Wormell and Sebastian Reich. Spectral convergence of diffusion maps: Improved error bounds and an alternative normalization. SIAM Journal on Numerical Analysis, 59(3):1687–1734, 2021.
- [37] Hau-Tieng Wu and Nan Wu. Think globally, fit locally under the manifold setup: Asymptotic analysis of locally linear embedding. Annals of Statistics, 46(6B):3805–3837, 2018.
Appendix A Details of numerical experiments
In the example of data, the isometric embedding in is by
where is the intrinsic coordinate of (arc-length). In the example in Section 7.2 where is not uniform, , and the test function . In the example of data, samples are on the unit sphere in .
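Since the embedding formula is not reproduced above, the following sketch constructs a hypothetical arc-length-parametrized closed curve in a 4-dimensional ambient space (chosen here only for illustration, not the paper's embedding) to show what an isometric embedding of the circle looks like, and verifies numerically that the parametrization has unit speed, so that the intrinsic coordinate is the arc length.

```python
import numpy as np

# A hypothetical isometric (unit-speed) embedding of the circle into a
# 4-dimensional ambient space; NOT the formula used in the paper.
a, k = 0.5, 2                      # illustrative parameters
c = np.sqrt(1.0 + (a * k) ** 2)    # normalization giving |x'(t)| = 1

def embed(t):
    return np.stack([np.cos(t), np.sin(t),
                     a * np.cos(k * t), a * np.sin(k * t)], axis=-1) / c

# Check numerically that the speed is 1, so t is the arc-length coordinate.
t = np.linspace(0.0, 2.0 * np.pi, 100001)
speeds = np.linalg.norm(np.diff(embed(t), axis=0), axis=1) / np.diff(t)
print(speeds.min(), speeds.max())   # both approximately 1
```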
In both plots of the raw error data without smoothing, Figures A.1 and A.2, the slopes of the error convergence rates (about -0.4 and -0.33) are about the same. The slope of the post-selected optimal (log) as a function of (log) changes, due to the closeness of the error values over the multiple values of .
Figures A.1 and A.2: raw error data without smoothing for the two datasets.
Appendix B More preliminaries
Throughout the paper, we use the following version of the classical Bernstein inequality, where the tail probability uses , which is an upper bound of the variance. We use the sub-Gaussian near-tail, which holds when the attempted deviation threshold .
Lemma B.1 (Classical Bernstein).
Let be i.i.d. bounded random variables, , . If and for , then
In particular, when , both the tail probabilities are bounded by .
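As an illustration only (the exact constants of Lemma B.1 are not reproduced above), the following sketch compares the empirical tail probability of the sample mean of bounded i.i.d. variables against the standard two-sided Bernstein bound; the variance proxy and deviation threshold are chosen for the example and are not those used in the proofs.

```python
import numpy as np

rng = np.random.default_rng(2)

# Bounded i.i.d. variables: Uniform[-1, 1], so |Y| <= L = 1 and Var(Y) = 1/3.
n, reps, L, nu = 500, 10000, 1.0, 1.0 / 3.0
Y = rng.uniform(-1.0, 1.0, size=(reps, n))

t = 0.05  # deviation threshold, small relative to nu / L (sub-Gaussian near-tail)
empirical_tail = np.mean(np.abs(Y.mean(axis=1)) > t)

# Standard two-sided Bernstein bound for the deviation of the sample mean.
bernstein_bound = 2.0 * np.exp(-n * t ** 2 / (2.0 * (nu + L * t / 3.0)))
print(f"empirical tail {empirical_tail:.4f} <= bound {bernstein_bound:.4f}")
```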
Additional proofs in Section 2:
Proof of Theorem 2.1.
Part 1): We provide a direct verification of (10) based on the parametrix construction for completeness, which is not explicitly included in [25].
First note that there is , determined by s.t. when ,
for some depending on . This is because up to an truncation error equals the integral on . By changing to the projected coordinate in , the integral domain of is contained in -ball in for small enough , then
Proof of Lemma 2.2.
Let , is a positive integer . Since , and , the Euclidean ball of radius contains the -geodesic ball and is contained in the ()-geodesic ball, for small enough . Then both claims in Theorem 2.1 hold when for some depending on , and in 1) for , . Here, choosing a larger can make the term of higher order in , yet is enough for our later analysis.
Proof of (12): We use the shorthand notation to denote . In Theorem 2.1, is fixed, for are finite constants depending on , thus
Note that , and thus when , . By the property of ,
Meanwhile, by mean value theorem and that ,
and then
Thus, for any ,
which proves (12), and the constants in big- are all determined by .
Appendix C Proofs about graph Laplacians with
C.1 Proofs in Section 3
Proof of (15) in Remark 2.
We want to show that
First consider when is uniform. Denote by the Euclidean ball in centered at with radius . When , , where is the local projected coordinate, i.e., let be the projection onto , , also . is a third-order polynomial whose coefficients depend on the derivatives of the extrinsic coordinates of and at . Then,
(A.1) | |||
and , where we used the volume comparison relation . By the metric comparison, , thus
Meanwhile, the integration of odd power of vanishes on . Thus one can verify that , , and thus the l.h.s. of (A.1) . Integrating over proves that the bias error is . When is not uniform, one can similarly show that and the proof extends. ∎
Proof of Lemma 3.3.
Since is a constant, . Apply Theorem 3.2 to when , and where , which are cases and are all in . Since the set is orthonormal in ,
Under the intersection of the good events, which happens with the indicated high probability, (16) holds. The needed threshold of is the maximum of the many thresholds. These thresholds and the constants in the big-'s depend on and for up to , and is a fixed integer. This means that these constants are determined by , and thus are treated as absolute ones. ∎
Proof of Lemma 3.4.
First, for any , when depending on , w.p. ,
(A.2) |
This is because, by definition, , which is an independent sum of r.v. . , and boundedness , which is constant. The variance of is bounded by , which again is constant. Since , (A.2) follows by the classical Bernstein.
Now consider the vectors . Apply (A.2) to when and for , and consider the intersection of the good events, which happens w.p. , when exceeds the maximum of the thresholds of for the cases. By , and the polar formula , this gives (17). Both the thresholds and all the constants in big-O in (17) depend on . ∎
Proof of Lemma 3.5.
Suppose Part 1) has been shown with uniform constant in big- for each , then under the good event of Part 2), Part 2) holds automatically. In particular, since (19) is a property of the random r.v. only, where are determined by the random points and irrelevant to the vector , the threshold of large is determined by when Part 1) holds and is uniform for all .
It suffices to prove Part 1) to finish proving the lemma. For each , we construct an event under which the bound in (19) holds for , and then apply a union bound. For fixed,
By Assumption 2(C2), , and thus . Consider , which is an independent sum conditioned on and over the randomness of . The r.v.
satisfies that (Lemma 8 in [10], Lemma A.3 in [9])
Boundedness: again by Assumption 2(C2), . Variance of is bounded by
where since as a function on also satisfies Assumption 2,
The constants in the big- notation of and are absolute ones depending on and do not depend on . Since , the classical Bernstein gives that when is sufficiently large w.p. ,
Under this event, , and then gives that
and then
By that is independent from , and that the bound is uniform for all locations of , we have that, w.p. , the bound in (19) holds for , and applying a union bound to the events proves Part 1). ∎
Proof of Proposition 3.6.
Under the condition of the current proposition, Lemma 3.5 applies. For fixed , take the intersection of the good events in Lemmas 3.5, 3.4 and 3.3, which happens w.p. for large enough . Same as before, let , and by Lemma 3.4, the set is linearly independent. Let , then for each . For any , , there are , , such that . Again, by (17), we have , and together with Lemma 3.5 2),
(A.3) |
and the constant in is uniform for all . For , (18) still holds, and by that is fixed it gives
Together with (A.3), we have that
and the r.h.s. upper bounds by (8). ∎
C.2 Proofs in Section 4
Proof of (25) in Lemma 4.2.
Suppose is small enough such that Lemma 2.2 holds with being here. For each , we construct an event under which the bound in (25) holds for , and then apply a union bound. For fixed,
By (14), , and thus . Consider , which is an independent sum conditioned on and over the randomness of . The r.v. , , satisfies that , and boundedness: again by (14), . Variance of is bounded by . The constants in the big- notation of and are from (14), which only depend on and not on . We use the notation to stress this. Since , the classical Bernstein gives that, with sufficiently large , w.p. ,
The rest of the proof is the same as that of Lemma 3.5 1), namely, by that , one can verify that both and then equals w.p. , and then (25) follows from applying a union bound to the events. ∎
Proof of Proposition 4.4.
The proof is by the same method as that of Proposition 4.1, and the difference is that the eigenvectors are -orthogonal here and normalized differently. Denote as , and let , normalized s.t.
Note that this normalization of differs from what is used in the final eigen-convergence rate result, Theorem 5.5, because the current proposition concerns eigenvalue only.
Because , , then the conditions needed in Proposition 3.6 are satisfied. Thus, with sufficiently large , there is an event which happens w.p. , under which for all s.t. is well-defined, and (32) holds for . Because the good event in Proposition 3.6 assumes the good event in Lemma 3.5, then (20) also holds for all the and , which gives that ( because is Gaussian)
and, equivalently (because is a constant)
(A.4) |
We set , , , in the same way, and let , . Because the good event only concerns randomness of , under which happens w.p. ,
(A.5) |
Next, note that since , and with Gaussian , , and are -orthogonal,
(A.6) |
Then, (27) in Lemma 4.3 where gives that
then same as in (33), they are both . Together with (A.5), this gives that
(A.7) |
Then, since , we have linear independence of for large enough .
Again, we let , and have (35). For any , , , ,
and, by that Lemma 3.5 2) holds, (20) applies to to give , thus
(A.8) |
Meanwhile, by (A.6),
(A.9) |
With the good event same as before (Lemma 4.2 at ), under , and the notation means that the constant depends on only and not on ,
(A.10) |
and then, again,
where we used (A.8) to substitute the term after the leading term is canceled in the subtraction. The UB of is similar to before, namely, by (26) in Lemma 4.3, inserting (A.9), and with the shorthand that stands for ,
Thus we have that
(A.11) |
To lower bound , again by (27) in Lemma 4.3, inserting (A.9),
and then since , we again have that . We have derived the formula of in (A.10) under , and inserting (A.8),
(A.12) |
Thus,
Together with (A.11), this gives
where the notation of is defined in the same way as in the proof of Proposition 4.1. The rest of the proof is the same, under the intersection of all the needed good events , , and , which happens w.p. . ∎
C.3 Proofs in Section 5
Proof of Theorem 5.5.
With sufficiently large , we restrict to the intersection of the good events in Proposition 4.4 and the good events of applying Theorem 5.1 1) to , which happens w.p.. The good event in Proposition 4.4 is contained in the good event of Proposition 3.6 of the eigenvalue UB, which is again contained in the good event of Lemma 3.5. As a result, for all , and thus is well-defined, and (20) holds.
Applying (20) to , and because , we have that ( due to that is Gaussian)
(A.13) |
This verifies that , for .
Step 2. for : We follow a similar approach as in Proposition 5.2. When , , and is always the constant vector, thus the discrepancy is zero. Consider , by Theorem 5.1 1), and that for any ,
(A.15) |
and then by (20) which holds uniformly for all ,
Thus, there is , s.t.
(A.16) |
The constant in big- depends on first eigenfunctions, and is an absolute one because is fixed. Next, same as in the proof of Proposition 5.2, under the good event of Proposition 4.4 and by the definition of as the maximum (half) eigen-gap among , (41) holds for .
Let , is a 1-dimensional subspace in . Because ’s are -orthogonal, . Note that
(A.17) |
and because
(A.18) |
(A.19) |
Subtracting (A.17) and (A.19) gives
and by that are -orthogonal, and (41),
The square-root of the l.h.s.
and the last inequality is by (A.16). This gives that
Meanwhile, , and by -orthogonality of again, . Thus,
(A.20) |
Finally, define
and then, together with (A.20),
Applying (20) to , , and we have shown that
To finish Step 2, it remains to show that , and then we define . By definition of ,
Step 3. of : For , by the relation (A.18),
and we have shown that
Similar as in the proof of Proposition 5.3,
By (A.14), , and meanwhile, by (16). Thus . By (A.16) and the bound of , which is as shown in the proof of Proposition 5.3. Finally, by the definition of , and that ,
Since , this proves the bound of , and the argument for all . ∎
Appendix D Proofs about the density-corrected graph Laplacian with
D.1 Proofs of the point-wise convergence of
Proof of Lemma 6.1.
Part 1): By that , . For , has expectation (Lemma 8 in [10], Lemma A.3 in [9])
where is determined by manifold extrinsic coordinates; meanwhile, ; in the independent sum , is bounded by and has variance bounded by . The rest of the proof is the same as in proving Lemma 3.5 1).
Part 2): By part 1), under a good event , which happens w.p. , (47) holds. Because for any , we then have
(A.21) |
Since , with large enough and under , , then is well-defined. Further, by (A.21),
Consider the r.v. (condition on ), for ,
is bounded by and so is its variance, where the constants in big- depend on . Then, similar as in proving (47), we have a good event which happens w.p. , under which
D.2 Proofs of the Dirichlet form convergence
Proof of Lemma 6.4.
Proof.
Proof of (57) : By definition, for ,
By Lemma A.3 in [9], , and thus,
Meanwhile, by that , , and then by the boundedness and variance calculation in the proof of Theorem 3.4 of [9], one can verify that, with constants depending on ,
Then, by the same decoupling argument to derive the concentration of V-statistics, under good event which happens w.p. ,
As a result,
which proves (57) because is higher order than . ∎
D.3 Proofs of the eigen-convergence of
Proof of Proposition 6.5.
The proof is similar to that of Proposition 3.6. We first restrict to the good events in Lemma 6.1, which happens w.p. , under which and are well-defined, and (47) and (48) hold.
Let . The following lemma, proved below, shows the near -orthonormality of the vectors and is an analogue of Lemma 3.4.
Lemma D.1.
Under the same assumption of Lemma 6.1, when is sufficiently large, w.p. ,
(A.24) |
Under the good event of Lemma D.1, called , for all , and with large enough , the set is linearly independent, and then so is the set . Let , then for each . For any , , there are , , such that . By (A.24), we have
(A.25) |
Meanwhile, by defining , similarly as in Lemma 3.3, applying Theorem 6.3 to the cases where and gives that, under a good event which happens w.p.,
(A.26) |
Then, similar as in (18),
(A.27) |
Back to the r.h.s. of (58), together with (A.25), we have that
(A.28) |
and thus provides an UB of . The bound holds for all the , under good events . ∎
Proof of Lemma D.1.
Proof of Proposition 6.6.
The proof follows the same strategy as that of Proposition 4.4, where we introduce weights by in the heat kernel interpolation map when constructing candidate eigenfunctions from eigenvectors.
We restrict to the good event in Proposition 6.5, which is contained in in Lemma 6.1. Under , , , and is well-defined, and, with sufficiently large , . Let , normalized s.t.
Note that always . Under , (A.23) holds, and thus
(A.29) |
and the constant in big- is determined by and uniform for all . Define the notation
(A.30) |
Taking to be and gives that
(A.31) |
Set , , in the same way as in the proof of Proposition 4.4, and define as in (59). We have , , and (60) for . Next, similar as in the proof of Lemma 4.2, one can show that with large and w.p.,
(A.32) |
where the notation indicates that the constant depends on and is uniform for all . Applying (A.32) to gives that, under a good event , which happens w.p.,
(A.33) |
Applying (A.32) to gives the good event , which happens w.p., under which
(A.34) |
The constants in big- in (A.33) and (A.34) are determined by only and uniform for all .
We also need an analogue of Lemma 4.3 to upper bound , proved below. The proof follows the same method as Lemma 4.3, and makes use of the uniform boundedness of from below, and Lemma 6.4.
Lemma D.2.
We proceed to define , . Next, note that since , and are -orthonormal, then
(A.37) |
Taking in Lemma D.2, (A.36) then gives
and both are . Meanwhile, (A.33) and (A.31) give that (with that is a fixed constant determined by and )
(A.38) |
Putting together with the bounds of , this gives that
(A.39) |
Then, since , we have linear independence of for large enough .
Same as before, for any , we let , and have (35). For any , , , , and
Meanwhile, by (A.29), ,
(A.40) |
and by (A.37),
(A.41) |
Then, as we work under , (A.33) and (A.34) hold. Applying to and subtracting the two,
where we used (A.40) to obtain the 2nd equality. To upper bound , by (A.35), and with the shorthand that stands for ,
Thus we have that
(A.42) |
To lower bound , again by (A.36), (A.40) and (A.41),
(A.43) |
Thus,
the rest of the proof is the same as that in Proposition 4.4, where the constant is defined as , being a constant determined by , and then the constant in the definition of also depends on . The needed good events are , , and , and the LB holds for . ∎
Proof of Lemma D.2.
By definition, for any ,
Take in Lemma 2.2 to be ; since , the three equations hold when . By (13), truncating at a Euclidean ball, there is , a positive constant determined by , s.t.
Note that
(A.44) |
thus,
(A.45) |
Apply (12) with the shorthand that stands for ,
The truncation for gives that , and then similarly as in (A.45),
(A.46) |
By Lemma 6.4, and with Gaussian , we have that under the good event of Lemma 6.1 1),
and the constant in big- is determined by and uniform for all . This gives that
(A.47) |
and as a result, together with (A.46),
Recalling that , this proves (A.35). ∎
Proof of Theorem 6.7.
With sufficiently large , we restrict to the intersection of the good events in Proposition 6.6 and the good events of applying Theorem 6.2 to . Because the good event in Proposition 6.6 is already under of Proposition 6.5, and under of Lemma 6.1, the extra good events in addition to what is needed in Proposition 6.6 are those corresponding to in the proof of Theorem 6.2 where for each , and, by a union bound, happens w.p. . This gives the final high probability indicated in the theorem. In addition, , for all , and is well-defined.
The rest of the proof follows a similar method to that of Theorem 5.5, but differs in the normalization of the eigenvectors and that of the eigenfunctions. With the definitions of and in (61) and (A.30) respectively, as has been shown in (A.29), under ,
(A.48) |
and the constant in big-O is determined by and uniform for all . This also gives that with sufficiently large ,
(A.49) |
because is upper bounded by and lower bounded by . Apply (A.49) to , this gives that , that is
and this verifies that under the high probability event.
Meanwhile, because the good event is under the one needed in Lemma D.1, as shown in the proof of Lemma D.1, we have that
where the constant in big- depends on and is uniform for all . By definition, , and then, apply (A.48) to ,
(A.50) |
Step 2. for : When , , and is always the constant vector, thus the discrepancy is zero. Consider , by Theorem 6.2 and that ,
(A.51) |
Then, by (A.49), , that is, there is , s.t.
(A.52) |
Meanwhile, because we are under , (41) holds for . The proof then proceeds in the same way as the Step 2. in Theorem 5.5, replacing with . Specifically, let , . We then have , and because
(A.53) |
we also have . Take subtraction and do the same calculation as before, by (A.52), it gives that
(A.54) |
We similarly define , , and . Then, by (A.54), we have , and by (A.49),
To finish Step 2, it remains to show that , and then we define . Note that
(A.55) |
By that , inserting into (A.55) together with (A.54), (A.50),
which gives that
by multiplying to both sides.
Step 3. of : The proof is the same as Step 3. in Theorem 5.5, replacing with . Specifically, using the relation (A.53), and the eigenvector consistency in Step 2, we have
where and . By (A.26), . Together with (A.50), one can show that . For , with (A.52), one can verify that , where we used that same as before. Putting together, and with the definition of above,
We have shown that , thus the bound of is proved, and holds for . ∎