Balancing Geometry and Density: Path Distances on High-Dimensional Data
Abstract
New geometric and computational analyses of power-weighted shortest-path distances (PWSPDs) are presented. By illuminating the way these metrics balance geometry and density in the underlying data, we clarify their key parameters and illustrate how they provide multiple perspectives for data analysis. Comparisons are made with related data-driven metrics, which illustrate the broader role of density in kernel-based unsupervised and semi-supervised machine learning. Computationally, we relate PWSPDs on complete weighted graphs to their analogues on weighted nearest neighbor graphs, providing high probability guarantees on their equivalence that are near-optimal. Connections with percolation theory are developed to establish estimates on the bias and variance of PWSPDs in the finite sample setting. The theoretical results are bolstered by illustrative experiments, demonstrating the versatility of PWSPDs for a wide range of data settings. Throughout the paper, our results generally require only that the underlying data is sampled from a compact low-dimensional manifold, and depend most crucially on the intrinsic dimension of this manifold, rather than its ambient dimension.
1 Introduction
The analysis of high-dimensional data is a central challenge in modern statistics and machine learning. In order to defeat the curse of dimensionality [38, 34, 10], distance metrics that efficiently and accurately capture intrinsically low-dimensional latent structure in high-dimensional data are required. Indeed, this need to capture low-dimensional linear and nonlinear structure in data has led to the development of a range of data-dependent distances and related dimension reduction methods, which have been widely employed in applications [44, 57, 8, 26, 21, 58]. Understanding how these metrics trade off fundamental properties in the data (e.g. local versus global structure, geometry versus density) when making pointwise comparisons is an important challenge in their use, and may be understood as a form of model selection in unsupervised and semi-supervised machine learning problems.
1.1 Power-Weighted Shortest Path Distances
In this paper we analyze power-weighted shortest path distances (PWSPDs) and develop their applications to problems in machine learning. These metrics compute the shortest path between two points in the data, accounting for the underlying density of the points along the path. Paths through low-density regions are penalized, so that the optimal path must balance being “short” (in the sense of the classical geodesic distance) with passing through high-density regions. We consider a finite data set $X = \{x_i\}_{i=1}^{n} \subset \mathbb{R}^D$, which we usually assume to be intrinsically low-dimensional, in the sense that there exists a compact $d$-dimensional Riemannian data manifold $\mathcal{M} \subset \mathbb{R}^D$ and a probability density function $f$ supported on $\mathcal{M}$ such that $x_1, \dots, x_n$ are drawn i.i.d. from $f$.
Definition 1.1.
For $p \geq 1$ and for $x, y \in X$, the (discrete) $p$-weighted shortest path distance (PWSPD) from $x$ to $y$ is:

$$\ell_p(x, y) = \min_{\pi} \left( \sum_{i=1}^{T-1} \|\pi_{i+1} - \pi_i\|^p \right)^{\frac{1}{p}}, \qquad (1)$$

where $\pi = \{\pi_i\}_{i=1}^{T}$ is a path of points in $X$ with $\pi_1 = x$ and $\pi_T = y$, and $\|\cdot\|$ is the Euclidean norm.
Early uses of density-based distances for interpolation [54] led to the formulation of PWSPD in the context of unsupervised and semi-supervised learning and applications [30, 60, 17, 53, 18, 13, 47, 46, 42, 64, 16]. It will occasionally be useful to think of $\ell_p$ as the path distance in the complete graph on $X$ with edge weights $\|x_i - x_j\|^p$ between $x_i$ and $x_j$, which we shall denote $G_{\mathrm{full}}$. When $p = 1$, $\ell_1(x, y) = \|x - y\|$, i.e. the Euclidean distance. As $p$ increases, the largest elements in the set of path edge lengths begin to dominate the optimization (1), so that paths through higher density regions (with shorter edge lengths) are promoted. When $p = \infty$, $\ell_p$ converges (up to rescaling by the number of edges achieving maximal length) to the longest-leg path distance [42] and is thus driven by the density function $f$. Outside these extremes, $\ell_p$ balances taking a “short” path and taking one through regions of high density. Note that $\ell_p$ can be defined for $0 < p < 1$, but it does not satisfy the triangle inequality and is thus not a metric ($\ell_p^p$ however is a metric for all $p > 0$). This case was studied in [2], where it is shown to have counterintuitive properties that should preclude its use in machine learning and data analysis.
While (1) is defined for finite data, it admits a corresponding continuum formulation.
Definition 1.2.
Let $\mathcal{M}$ be a compact, $d$-dimensional Riemannian manifold and $f$ a continuous density function on $\mathcal{M}$ that is lower bounded away from zero (i.e. $f \geq f_{\min} > 0$ on $\mathcal{M}$). For $p \geq 1$ and $x, y \in \mathcal{M}$, the (continuum) $p$-weighted shortest path distance from $x$ to $y$ is:

$$\mathcal{L}_p(x, y) = \min_{\gamma} \left( \int_0^1 \frac{1}{f(\gamma(t))^{\frac{p-1}{d}}} \, |\gamma'(t)| \, dt \right)^{\frac{1}{p}}, \qquad (2)$$

where $\gamma : [0, 1] \to \mathcal{M}$ is a path with $\gamma(0) = x$ and $\gamma(1) = y$.
Note $\mathcal{L}_1$ is simply the geodesic distance on $\mathcal{M}$. However, for $p > 1$ and a nonuniform density, the optimal path is generally not the geodesic on $\mathcal{M}$: $\mathcal{L}_p$ favors paths which travel along high-density regions, and detours off the classical geodesics are thus acceptable. The parameter $p$ controls how large of a detour is optimal; for large $p$, optimal paths may become highly nonlocal and very different from classical geodesic paths.
It is known [39, 33] that when $f$ is continuous and positive, for $p > 1$ and all $x, y \in \mathcal{M}$,

$$\lim_{n \to \infty} n^{\frac{p-1}{pd}} \, \ell_p(x, y) = C_{p,d} \, \mathcal{L}_p(x, y), \qquad (3)$$

for an absolute constant $C_{p,d}$ depending only on $p$ and $d$, i.e. that the discrete PWSPD computed on an i.i.d. sample from $f$ (appropriately rescaled) is a consistent estimator for the continuum PWSPD. In particular, (3) is established by [33] for compact, isometrically embedded manifolds and by [39] for smooth, compact manifolds without boundary and for $\mathcal{L}_p$ defined using geodesic distance. We thus define the normalized (discrete) path metric

$$\tilde{\ell}_p(x, y) := \frac{n^{\frac{p-1}{pd}}}{C_{p,d}} \, \ell_p(x, y). \qquad (4)$$
The normalization factor accounts for the fact that for $p > 1$, $\ell_p$ converges uniformly to 0 as $n \to \infty$ [46]. Note that the exponent $1/p$ in (1) and (3) is necessary to obtain a metric that is homogeneous. Moreover, as $p \to \infty$, $\mathcal{L}_p$ is constant on regions of constant density, but $\mathcal{L}_p^p$ is not. Indeed, consider a uniform distribution, which has constant density. Then for all $p$ and for all $x, y$, $\mathcal{L}_p^p(x, y)$ equals the geodesic distance from $x$ to $y$. On the other hand, for all $x \neq y$, $\mathcal{L}_p(x, y) \to 1$ as $p \to \infty$, i.e. all points are equidistant in the limit $p \to \infty$. Thus the exponent $1/p$ in (1) and (3) is necessary to obtain an entirely density-based metric for large $p$.
In practice, it is more efficient to compute PWSPDs in a sparse graph instead of a complete graph. It is thus natural to define PWSPDs with respect to a subgraph of $G_{\mathrm{full}}$.
Definition 1.3.
Let $G$ be any subgraph of $G_{\mathrm{full}}$. For $x, y \in X$, let $\Pi_G(x, y)$ be the set of paths connecting $x$ and $y$ in $G$. For $p \geq 1$ and for $x, y \in X$, the (discrete) $p$-weighted shortest path distance (PWSPD) with respect to $G$ from $x$ to $y$ is:

$$\ell_p^{G}(x, y) = \min_{\pi \in \Pi_G(x, y)} \left( \sum_{i=1}^{T-1} \|\pi_{i+1} - \pi_i\|^p \right)^{\frac{1}{p}}.$$
Clearly $\ell_p^{G}(x, y) \geq \ell_p(x, y)$. In order to compute all-pairs PWSPDs in a complete graph with $n$ nodes (i.e. $\ell_p(x_i, x_j)$ for all $1 \leq i, j \leq n$), a direct application of Dijkstra’s algorithm has complexity $O(n^3)$. Let $G_{kNN}$ denote the $k$NN graph, constructed from $G_{\mathrm{full}}$ by retaining only the edge between $x_i$ and $x_j$ if $x_j$ is amongst the $k$ nearest neighbors of $x_i$ in $X$ (we say: “$x_j$ is a $k$NN of $x_i$” for short) or vice versa. In some cases the PWSPDs with respect to $G_{kNN}$ are known to coincide with those computed in $G_{\mathrm{full}}$ [33, 20]. If so, we say the $k$NN graph is a 1-spanner of $G_{\mathrm{full}}$. This provides a significant computational advantage, since $k$NN graphs are much sparser, and reduces the complexity of computing all-pairs PWSPD to $O(n^2(k + \log n))$ [40].
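To make the computational discussion concrete, the following sketch computes all-pairs PWSPD on a symmetrized kNN graph with standard scientific Python tools. The function name, the default choices of p and k, and the use of scipy/scikit-learn are illustrative assumptions, not the implementation used in this paper.

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra
from sklearn.neighbors import kneighbors_graph


def pwspd_knn(X, p=2.0, k=15):
    """All-pairs p-weighted shortest path distances on a symmetrized kNN graph.

    Each retained edge carries weight ||x_i - x_j||^p; Dijkstra sums these costs
    along paths, and the final 1/p root recovers the PWSPD of Definitions 1.1
    and 1.3. Pairs in different connected components are reported as infinity.
    """
    G = kneighbors_graph(X, n_neighbors=k, mode="distance")  # sparse graph, Euclidean lengths
    G = G.maximum(G.T)          # keep an edge if x_i is a kNN of x_j or vice versa
    G.data = G.data ** p        # power-weight the retained edges
    costs = dijkstra(G, directed=False)   # all-pairs shortest path costs
    return costs ** (1.0 / p)
```

When k satisfies the spanner condition developed in Section 3 (k on the order of log n), the distances returned by such a routine coincide with the complete-graph PWSPDs with high probability, at a fraction of the computational cost.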
1.2 Summary of Contributions
This article develops new analyses, computational insights, and applications of PWSPDs, which may be summarized in three major contributions. First, we establish that when $p$ is not too large, PWSPDs locally are density-rescaled Euclidean distances. We give precise error bounds that improve over known bounds [39] and are tight enough to prove the local equivalence of Gaussian kernels constructed with PWSPD and density-rescaled Euclidean distances. We also develop related theory which clarifies the role of density in machine learning kernels more broadly. A range of machine learning kernels that normalize in order to mitigate or leverage differences in underlying density are considered and compared to PWSPD. Relatedly, we analyze how PWSPDs become increasingly influenced by the underlying density as $p \to \infty$. We also illustrate the role of density and the benefits of PWSPDs on synthetic data sets.
Second, we improve and extend known bounds on $k$ [33, 46, 20] guaranteeing that the $k$NN graph is a 1-spanner of $G_{\mathrm{full}}$. Specifically, we show that the $k$NN graph is a 1-spanner of $G_{\mathrm{full}}$ with high probability whenever $k \geq C \log n$, for an explicit constant $C$ that depends on the density power $p$, intrinsic dimension $d$, underlying density $f$, and the geometry of the manifold $\mathcal{M}$, but is crucially independent of $n$. These results are proved both in the case that the manifold is isometrically embedded and in the case that the edge lengths are defined in terms of intrinsic geodesic distance on the manifold. Our results provide an essential computational tool for the practical use of PWSPDs, and their key dependencies are verified numerically with extensive large-scale experiments.
Third, we bound the convergence rate of PWSPD to its continuum limit using a percolation theory framework, thereby quantifying the asymptotic convergence result (4) of [39, 33]. Specifically, we develop bias and variance estimates by relating results on Euclidean first passage percolation (FPP) to the PWSPD setting. Surprisingly, these results suggest that the variance of PWSPD is essentially independent of $p$, and depends on the intrinsic dimension $d$ in complex ways. Numerical experiments verify our theoretical analyses and suggest several conjectures related to Euclidean FPP that are of independent interest.
1.3 Notation
We shall use the notation in Table 1 consistently, though certain specialized notation will be introduced as required. We assume throughout that the data $X$ is drawn from a compact Riemannian data manifold $\mathcal{M}$, with additional assumptions imposed on $\mathcal{M}$ as needed; we do not rigorously consider the more general case that $X$ is drawn from a distribution supported near $\mathcal{M}$. If $\mathcal{M} \subset \mathbb{R}^D$, we assume that it is isometrically embedded in $\mathbb{R}^D$, i.e. the Riemannian metric is the one induced by restricting the Euclidean metric on $\mathbb{R}^D$ to $\mathcal{M}$, unless otherwise stated. If an event holds with probability at least $1 - Cn^{-t}$, where $t > 0$ and $C$ is independent of $n$, we say it holds with high probability (w.h.p.).
Notation | Definition |
---|---|
, a finite data set | |
ambient dimension of data set | |
intrinsic dimension of data set | |
, the Euclidean -norm of | |
, the Euclidean -norm | |
the absolute value of | |
complete graph on with edge weight between | |
edge between nodes in a graph | |
a Riemannian manifold with associated metric | |
measure of curvature on ; see Definition 2.1 | |
measure of regularity on ; see Definition 3.7 | |
reach of a manifold ; see Definition 3.8 | |
probability density function from which is drawn | |
, | minimum and maximum values of density defined on compact manifold |
, | discrete, continuous path |
discrete PWSPD, see (1) | |
rescaled version of , see (4) | |
discrete PWSPD defined on the subgraph ; see Definition 1.3 | |
continuum PWSPD, see (2) | |
geodesic distance on manifold | |
density-based stretch of Euclidean distance with respect to | |
weight, degree, and Laplacian matrices associated to a graph | |
arbitrary metric | |
, ball of radius centered at with respect to | |
Euclidean ball of radius centered at , dimension determined by context | |
-elongated set of radius based at points ; see Definition 3.4 | |
number of nearest neighbors (NN), sometimes dependent on (i.e. ) | |
percolation time, fluctuation constants | |
intensity parameter in a Poisson point process | |
complement of the set | |
expectation, variance of a random variable | |
, the Euclidean diameter of a set | |
volume of a set , with dimension depending on context | |
complement of a set | |
boundary of a set | |
for a constant independent of the dependencies of | |
quantity is proportional to quantity , i.e. and |
2 Local Analysis: Density and Kernels
Density-driven methods are commonly used for unsupervised and semi-supervised learning [19, 27, 21, 51, 13, 7, 52]. Despite this popularity, the role of density is not completely clear in this context. Indeed, some methods seek to leverage variations in density while others mitigate them. In this section, we explore the role that density plays in popular machine learning kernels, including those used in self-tuning spectral clustering and diffusion maps. We compare with the effect of density in PWSPD-based kernels, and illustrate the primary advantages and disadvantages on toy data sets.
2.1 Role of Density in Graph Laplacian Kernels
A large family of algorithms [8, 9, 56, 48, 61] view data points as the nodes of a graph, and define the corresponding edge weights via a kernel function. In general, by kernel we mean a function that captures a notion of similarity between elements of $X$. More precisely, we suppose that the kernel is of the form $K(x, y) = \eta(\rho(x, y))$ for some metric $\rho$ on $X$ and smooth, positive, rapidly decaying (hence integrable) function $\eta$. Our technical results will pertain exclusively to the Gaussian kernel $K(x, y) = \exp(-\rho(x, y)^2/\sigma^2)$ for some metric $\rho$ and scaling parameter $\sigma > 0$, albeit more general kernels have been considered in the literature [4, 23, 11]. Given $X$, one first defines a weight matrix $W$ by $W_{ij} = K(x_i, x_j)$ for some kernel $K$, and a diagonal degree matrix $\mathcal{D}$ by $\mathcal{D}_{ii} = \sum_{j=1}^{n} W_{ij}$. A graph Laplacian is then defined from $W$ and $\mathcal{D}$. Then, the lowest frequency eigenvectors of the Laplacian define a low-dimensional spectral embedding of the data, in which each point is represented by its corresponding entries in these eigenvectors. Commonly, a standard clustering algorithm such as $k$-means is then applied to the spectral embedding. This procedure is known as spectral clustering (SC). In unnormalized SC, the Laplacian $L = \mathcal{D} - W$ is used, while in normalized SC either the random walk Laplacian $L_{\mathrm{rw}} = I - \mathcal{D}^{-1} W$ or the symmetric normalized Laplacian $L_{\mathrm{sym}} = I - \mathcal{D}^{-1/2} W \mathcal{D}^{-1/2}$ is used.
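As a concrete reference point, the following minimal sketch implements the spectral clustering pipeline just described with a Gaussian kernel and the symmetric normalized Laplacian. The dense formulation, the bandwidth handling, and the function name are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import pdist, squareform
from sklearn.cluster import KMeans


def spectral_clustering(X, n_clusters, sigma):
    """Spectral clustering with a Gaussian kernel and symmetric normalized Laplacian."""
    dist = squareform(pdist(X))                        # pairwise Euclidean distances
    W = np.exp(-dist ** 2 / sigma ** 2)                # Gaussian kernel weight matrix
    deg = W.sum(axis=1)                                # degrees
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, evecs = eigh(L_sym)                             # eigenvalues in ascending order
    embedding = evecs[:, :n_clusters]                  # lowest-frequency eigenvectors
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embedding)
```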
Many modifications of this general framework have been considered. Although SC is better able to handle irregularly shaped clusters than many traditional algorithms [5, 55], it is often unstable in the presence of low degree points and sensitive to the choice of scaling parameter $\sigma$ when using the Gaussian kernel [61]. These shortcomings motivated [63] to apply SC with the self-tuning kernel $W_{ij} = \exp\!\left(-\|x_i - x_j\|^2/(\sigma_i \sigma_j)\right)$, where $\sigma_i$ is the Euclidean distance of $x_i$ to its $k^{\mathrm{th}}$ NN. To clarify how the data density influences this kernel, consider how $\sigma_i$ relates to the $k$NN density estimator at $x_i$:
$$\hat{f}_k(x_i) = \frac{k}{n \, v_D \, \sigma_i^D}, \qquad (5)$$

where $v_D$ denotes the volume of the unit ball in $\mathbb{R}^D$.
It is known [43] that if $k = k(n)$ is such that $k \to \infty$ while $k/n \to 0$, then $\hat{f}_k$ is a consistent estimator of $f$, as long as $f$ is continuous and positive. Furthermore, if $f$ is uniformly continuous and $k/\log n \to \infty$ while $k/n \to 0$, then $\hat{f}_k \to f$ uniformly with probability 1 [25]. Although these results assume the density is supported in $\mathbb{R}^D$, the density estimator (5) is consistent in the general case when $f$ is supported on a $d$-dimensional Riemannian manifold, with the ambient dimension $D$ replaced by the intrinsic dimension $d$ [28]. For such $f$, $\sigma_i \approx c_d \left(\frac{k}{n f(x_i)}\right)^{1/d}$ for some constant $c_d$ depending on $d$. Thus, for large $n$ the kernel for self-tuning spectral clustering is approximately:
$$W_{ij} \approx \exp\!\left(-\frac{\|x_i - x_j\|^2 \, (f(x_i) f(x_j))^{1/d}}{c_d^2 \, (k/n)^{2/d}}\right). \qquad (6)$$
Relative to a standard SC kernel, (6) weakens connections in high density regions and strengthens connections in low density regions.
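A minimal sketch of the self-tuning affinity of [63] follows; the default choice of k and the helper name are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.neighbors import NearestNeighbors


def self_tuning_kernel(X, k=7):
    """Self-tuning Gaussian affinity: W_ij = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j))."""
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1 because each point is its own 0th NN
    knn_dists, _ = nbrs.kneighbors(X)
    sigma = knn_dists[:, -1]          # distance to the kth NN; small sigma_i <=> high local density (cf. (5))
    dist = squareform(pdist(X))
    return np.exp(-dist ** 2 / np.outer(sigma, sigma))
```

Because points in dense regions receive small bandwidths, their mutual connections are relatively weakened compared to a fixed-bandwidth Gaussian kernel, consistent with the discussion of (6).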
Diffusion maps [22, 21] is a more general framework which reduces to SC for certain parameter choices. More specifically, [21] considered the family of kernels
$$W^{(\alpha)}_{ij} = \frac{W_{ij}}{q(x_i)^{\alpha} \, q(x_j)^{\alpha}}, \qquad W_{ij} = \exp\!\left(-\frac{\|x_i - x_j\|^2}{\sigma^2}\right), \qquad (7)$$

parametrized by $\alpha \in [0, 1]$, which determines the degree of density normalization. Since $q(x_i) = \sum_{j=1}^{n} W_{ij}$, $q$ is a kernel density estimator of the density $f$ (up to normalization) [12] and, up to higher order terms,

$$W^{(\alpha)}_{ij} \approx \frac{1}{(f(x_i) f(x_j))^{\alpha}} \exp\!\left(-\frac{\|x_i - x_j\|^2}{\sigma^2}\right). \qquad (8)$$
Note that the normalization in (7) has an effect on the kernel similar to the self-tuning kernel (6): connections in high density regions are weakened, and connections in low density regions are strengthened. Let $L^{(\alpha)}_{\mathrm{rw}}$ denote the discrete random walk Laplacian using the weights given in (7). The discrete operator converges, as $n \to \infty$ and $\sigma \to 0$, to the continuum Kolmogorov operator $\Delta u + 2(1-\alpha)\,\nabla \log f \cdot \nabla u$, for Laplacian operator $\Delta$ and gradient $\nabla$, both taken with respect to the Riemannian metric inherited from the ambient space [8, 21, 12]. When $\alpha = 0$, we recover standard spectral clustering; there is no density renormalization in the kernel but the limiting operator is density dependent. When $\alpha = 1$, the limiting operator is the Laplace–Beltrami operator $\Delta$; in this case the discrete operator is density dependent but the limiting operator is purely geometric, since the density term is eliminated. We note that Laplacians and diffusion maps with various metrics and norms have been considered in a range of settings [62, 15, 59, 41].
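The density normalization in (7) can be sketched as follows; the values of alpha and sigma, the dense implementation, and the function name are illustrative assumptions. The row normalization at the end yields the random walk Laplacian whose continuum limit is discussed above.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def alpha_normalized_laplacian(X, sigma, alpha=1.0):
    """Random-walk Laplacian built from the density-normalized kernel family (7)."""
    W = np.exp(-squareform(pdist(X)) ** 2 / sigma ** 2)   # base Gaussian kernel
    q = W.sum(axis=1)                                      # kernel density estimate (up to constants)
    W_alpha = W / np.outer(q, q) ** alpha                  # alpha = 0: standard SC; alpha = 1: geometry only
    P = W_alpha / W_alpha.sum(axis=1, keepdims=True)       # row-stochastic transition matrix
    return np.eye(len(X)) - P                              # random-walk graph Laplacian
```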
2.2 Local Characterization of PWSPD-Based Kernels
While the kernels discussed in Section 2.1 compensate for discrepancies in density, PWSPD-based kernels strengthen connections through high-density regions and weaken connections through low-density regions. To illustrate more clearly the role of density in PWSPD-based kernels, we first show that locally the continuum PWSPD is well-approximated by a density-based stretch of Euclidean distance as long as $f$ does not vary too rapidly and $\mathcal{M}$ does not curve too quickly. This is quantified in Lemma 2.2, which is then used to prove Theorem 2.3, which bounds the local deviation of the continuum PWSPD from the density-based stretch of Euclidean distance. Finally, Corollary 2.4 establishes that Gaussian kernels constructed with these two metrics are locally similar. Throughout this section we assume $\mathcal{M}$ belongs to the manifold class defined below.
Definition 2.1.
An isometrically embedded Riemannian manifold is an element of if it is compact with dimension , , and for all such that , where is geodesic distance on .
The condition for all such that is equivalent to an upper bound on the second fundamental form: for all [4, 45]. Note that this is also equivalent to a positive lower bound on the reach [29] of (e.g. Proposition 6.1 in [49] and Proposition A.1 in [1]); see Definition 3.8.
Let and denote, respectively, the (closed) and geodesic balls centered at of radius . Let , be the global density maximum and minimum. Define the following local quantities:
Let , which characterizes the local discrepancy in density in a ball of radius around the point .
The following Lemma establishes that and are locally equivalent, and that discrepancies depend on and the curvature constant . We note similar estimates appear in [2] for the special case . The proof appears in Appendix A.
Lemma 2.2.
Let . Then for all with and ,
(9) |
Note that corresponding bounds in terms of geodesic distance follow easily from the definition of : . Lemma 2.2 thus establishes that the metrics and are locally equivalent when (i) is close to 1, (ii) is not too large, and (iii) is not too large. However, when , balls may become highly nonlocal in terms of geodesics.
The following Theorem establishes the local equivalence of and (and thus kernels constructed using these metrics). Assuming the density does not vary too quickly, Lemma 2.2 can be used to show that locally the difference between and is small. Variations in density are controlled by requiring that is -Lipschitz with respect to geodesic distance, i.e. . This Lipschitz assumption allows us to establish a higher-order equivalence compared to existing results (e.g. Corollary 9 in [39]), which we leverage to obtain the local kernel equivalence stated in Corollary 2.4. The following analysis also establishes explicit dependencies of the equivalence on .
Theorem 2.3.
Assume and that is a bounded -Lipschitz density function on with . Let and let
Then for all such that and ,
Proof.
We first show that is close to 1. Let satisfy and satisfy (since these sets are compact, these points must exist). Then by the Lipschitz condition:
Let be a path achieving . Note that
so that We thus obtain
(10) |
Letting , Taylor expanding around and (10) give . Applying Lemma 2.2 yields , which gives
Rewriting the above yields:
We thus obtain
∎
Note the coefficient increases exponentially in ; thus the equivalence between and is weaker for large . We also emphasize that in a Euclidean ball of radius , the metric scales like ; Theorem 2.3 thus guarantees that the relative error of approximating with is .
When is locally well-approximated by , the kernels constructed from these two metrics are also locally similar. The following Corollary leverages the error term in Theorem 2.3 to make this precise for Gaussian kernels. It is a direct consequence of Theorem 2.3 and Taylor expanding the Gaussian kernel, and its proof is given in Appendix C. Let so that is the Gaussian kernel with metric and scaling parameter . Note .
Corollary 2.4.
Under the assumptions and notation of Theorem 2.3, for ,
When is not too large relative to , a kernel constructed with is locally well-approximated by a kernel constructed with . Thus, in a Euclidean ball of radius , we may think of the Gaussian kernel as:
Density plays a different role in this kernel compared with those of Section 2.1. This kernel strengthens connections in high density regions and weakens them in low density regions.
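A sketch of the resulting PWSPD-based Gaussian kernel is below. It assumes the hypothetical pwspd_knn helper from the sketch in Section 1.1 is in scope, and the percentile-based bandwidth mirrors (but is not necessarily identical to) the choice used in the experiments of Section 2.3; all parameter values are illustrative.

```python
import numpy as np


def pwspd_gaussian_kernel(X, p=2.0, k=15, percentile=50):
    """Gaussian kernel built from pairwise PWSPDs, with bandwidth set from a distance percentile."""
    D = pwspd_knn(X, p=p, k=k)                            # pairwise PWSPDs in a kNN graph
    sigma = np.percentile(D[np.isfinite(D)], percentile)  # bandwidth from a fixed percentile
    # Unlike the kernels of Section 2.1, small PWSPD through high-density regions
    # strengthens connections there rather than normalizing them away.
    return np.exp(-D ** 2 / sigma ** 2)
```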
We note that the -power in Definition 1.2 has a large impact, in that -based and -based kernels have very different properties. More specifically, is a local kernel as defined in [12], so it is sufficient to analyze the kernel locally. However is a non-local kernel, so that non-trivial connections between distant points are possible. The analysis in this Section thus establishes the global equivalence of and (when is not too large relative to ) but only the local equivalence of and .
2.3 The Role of : Examples
This subsection illustrates the useful properties of PWSPDs and the role of $p$ on three synthetic data sets in $\mathbb{R}^2$: (1) Two Rings data, consisting of two non-convex clusters that are well-separated by a low-density region; (2) Long Bottleneck data, consisting of two isotropic clusters each with a density gap connected by a long, thin bottleneck; (3) Short Bottleneck data, where two elongated clusters are connected by a short bottleneck. The data sets are shown in Figures 1, 2, and 3, respectively. We also show the PWSPD spectral embedding (denoted PWSPD SE) for various $p$, computed from a symmetric normalized Laplacian constructed with PWSPD. The scaling parameter $\sigma$ for each data set is chosen as a fixed percentile of the pairwise PWSPDs.
Different aspects of the data are emphasized in the low-dimensional PWSPD embedding as $p$ varies. Indeed, in Figure 1, we see the PWSPD embedding separates the rings for large $p$ but not for small $p$. In Figure 2, we see separation across the bottleneck for $p$ small, while for large $p$ there is separation with respect to the density gradients that appear in the two bells of the dumbbell. Interestingly, separation with respect to both density and geometry is observed for moderate $p$ (see Figure 2(g)). In Figure 3, the clusters are both elongated and lack robust density separation, but the PWSPD embedding well-separates the two clusters for moderate $p$. In general, $p$ close to 1 emphasizes the geometry of the data, large $p$ emphasizes the density structure of the data, and moderate $p$ defines a metric balancing these two considerations.
[Figures 1–3: the Two Rings, Long Bottleneck, and Short Bottleneck data sets, their PWSPD spectral embeddings for various p, and the corresponding clustering and eigengap results.]
2.3.1 Comparison with Euclidean Spectral Clustering
To evaluate how $p$ impacts the clusterability of the PWSPD spectral embedding, we consider experiments in which we run spectral clustering under various graph constructions. We run $k$-means for a range of parameters on the spectral embedding given by the lowest frequency eigenvectors of the graph Laplacian. We construct the symmetric normalized Laplacian using PWSPD (denoted PWSPD SC) and also using Euclidean distances (denoted SC), as well as the Laplacian with diffusion maps normalization (denoted SC+DMN). We vary $\sigma$ in the SC and SC+DMN methods, and both $\sigma$ and $p$ in the PWSPD SC method. Results for self-tuning SC, in which the number of NNs used to compute the local scaling parameter varies, are in Appendix D. To allow for comparisons across figures, $\sigma$ is varied across the percentiles of the pairwise distances in the underlying data, up to a fixed maximum percentile. We measure two outputs of the clustering experiments:
- (i) The overall accuracy (OA), namely the proportion of data points correctly labeled after alignment when the number of clusters $K$ is known a priori. For $K = 2$, similar results were observed when thresholding the Fiedler eigenvector at 0 instead of running $k$-means; see Appendix D.
- (ii) The eigengap estimate of the number of latent clusters: $\hat{K} = \operatorname{argmax}_{k} (\lambda_{k+1} - \lambda_{k})$, where $\lambda_1 \leq \lambda_2 \leq \cdots$ are the eigenvalues of the corresponding graph Laplacian (a minimal sketch of this estimator follows the list). We note that experiments estimating $K$ by considering the ratio of consecutive eigenvalues were also performed, with similar results. In the case of PWSPD SC, we plot heatmaps of where $K$ is correctly estimated, with yellow corresponding to success ($\hat{K} = K$) and blue corresponding to failure ($\hat{K} \neq K$).
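A minimal sketch of the eigengap estimator used in (ii) is given below; the cap on the number of eigenvalues searched is an assumed implementation detail.

```python
import numpy as np


def eigengap_estimate(laplacian, k_max=10):
    """Estimate the number of clusters from the largest gap among the lowest Laplacian eigenvalues."""
    evals = np.sort(np.linalg.eigvalsh(laplacian))[:k_max + 1]  # lambda_1 <= ... <= lambda_{k_max + 1}
    gaps = np.diff(evals)                                       # gaps lambda_{k+1} - lambda_k
    return int(np.argmax(gaps)) + 1                             # K-hat = argmax_k of the gap
```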
The results in terms of OA and $\hat{K}$ as a function of $\sigma$ and $p$ are in Figures 1, 2, 3. We see that when density separates the data clearly, as in the Two Rings data, PWSPD SC with large $p$ gives accurate clustering results, while small $p$ may fail. In this dataset, very small $\sigma$ allows for the data to be correctly clustered with SC and SC+DMN when $K$ is known a priori. However, this regime of $\sigma$ is so small that the eigenvalues become unhelpful for estimating the number of latent clusters. Unlike Euclidean spectral clustering, PWSPD SC correctly estimates $K$ for a range of parameters, and achieves near-perfect clustering results for those parameters as well. Indeed, as shown by Figures 1(f), 1(h), PWSPD SC with large $p$ is able to do fully unsupervised clustering on the Two Rings data.
In the case of the Long Bottleneck dataset, there are three reasonable latent clusterings, depending on whether geometry, density, or both matter (see Figures 2(a), 2(k), 2(f)). PWSPD is able to balance between the geometry- and density-driven cluster structure in the data. Indeed, all of the cluster configurations shown in Figures 2(a), 2(k), 2(f) are learnable without supervision for some choice of the parameters $(p, \sigma)$. To capture the density cluster structure, $p$ should be taken large, as suggested in Figures 2(m), 2(n). To capture the geometry cluster structure, $p$ should be taken small and $\sigma$ large, as suggested by Figures 2(c), 2(d). Interestingly, both the density and geometry cluster structure can be captured by choosing $p$ moderate, as in Figures 2(h), 2(i). For Euclidean SC, varying $\sigma$ is insufficient to capture the rich structure of this data.
In the case of the Short Bottleneck, taking $\sigma$ large allows the Euclidean methods to correctly estimate the number of clusters. But, in this regime, the methods do not cluster accurately. On the other hand, taking $p$ between 2 and 3 and $\sigma$ large allows PWSPD to correctly estimate $K$ and also cluster accurately.
Overall, this suggests that varying $p$ in PWSPD SC has a different impact than varying the scaling parameter $\sigma$, and can allow for richer cluster structures to be learned when compared to SC with Euclidean distances. In addition, PWSPDs generally allow for the underlying cluster structures to be learned in a fully unsupervised manner, while Euclidean methods may struggle to simultaneously cluster well and estimate $K$ accurately.
3 Spanners for PWSPD
Let $G$ denote a subgraph of $G_{\mathrm{full}}$ and recall the definition of $\ell_p^{G}$ given in Definition 1.3.
Definition 3.1.
For $t \geq 1$, $G$ is a $t$-spanner of $G_{\mathrm{full}}$ if $\ell_p^{G}(x, y) \leq t \, \ell_p(x, y)$ for all $x, y \in X$.
Clearly always, as any path in is a path in . Hence if is a -spanner we have equality: . Define the NN graph, , by retaining only edges if is a NN of or vice versa. For appropriate and it is known that is a -spanner of w.h.p. Specifically, [33] shows this when is an open connected set with boundary, and for a constant depending on . One can deduce , while the dependence on is more obscure. A different approach is used in [20] to show this for arbitrary smooth, closed, isometrically embedded , and , where hides constants depending on the geometry of . In both cases must be continuous and bounded away from zero.
Under these assumptions, we prove is a -spanner w.h.p., for any smooth, closed, isometrically embedded with mild restrictions on its curvature. Our results hold generally for and enjoy improved dependence of on and explicit dependence of on and the geometry of compared to [33, 20]. We also consider an intrinsic version of PWSPD,
where the geodesic distance on $\mathcal{M}$ is assumed known, which is not typically the case in data science. However this situation can occur when $X$ is presented as a subset of $\mathbb{R}^D$, but one wishes to analyze $X$ with an exotic metric (i.e. not the Euclidean metric). For example, if each $x_i$ is an image, a Wasserstein metric may be more appropriate than the Euclidean distance. As this case closely mirrors the statement and proof of Theorem 3.9 we leave it to Appendix E. Before proceeding we introduce some further terminology:
Definition 3.2.
The edge $\{x_i, x_j\}$ is critical if it is in the shortest path from $x_i$ to $x_j$ in $G_{\mathrm{full}}$.
Lemma 3.3.
[20] $G$ is a 1-spanner of $G_{\mathrm{full}}$ if it contains every critical edge of $G_{\mathrm{full}}$.
3.1 Nearest Neighbors and PWSPD Spanners
A key proof ingredient is the following definition, which generalizes the role of spheres in the proof of Theorem 1.3 in [20].
Definition 3.4.
For any and , the -elongated set associated to is
Visualizations of are shown in Figure 4. is the set of points such that the two-hop path, , is -shorter than the one-hop path, . Hence:
[Figure 4: visualizations of the elongated sets of Definition 3.4.]
Lemma 3.5.
If there exists then the edge is not critical.
We defer the proof of the following technical Lemma to Appendix B.
Lemma 3.6.
Let , , and for . Then:
For , [33] makes a similar claim but crucially does not quantify the dependence of the radius of this ball on . Before proceeding, we introduce two regularity assumptions:
Definition 3.7.
is in for and if it is connected and for all we have: .
Definition 3.8.
A compact manifold has reach if every satisfying has a unique projection onto .
Theorem 3.9.
Let be a compact manifold with reach . Let be drawn i.i.d. from according to a probability distribution with continuous density satisfying for all . For and sufficiently large, is a -spanner of with probability at least if
(11) |
Proof.
In light of Lemma 3.3 we prove that, with probability at least , contains every critical edge of . Equivalently, we show every edge of not contained in is not critical.
For any , for sufficiently large [46]. So, let be sufficiently large so that . Pick any which are not NNs and let . If , then and thus the edge is not critical. So, suppose without loss of generality in what follows that .
Define and ; note that by the assumption . Let and let be the projection of onto , which is unique because . By Lemma 3.6, . By Lemma B.1, . Let denote the NNs of , ordered randomly. Because is not a NN of , for . Thus, and so by Lemma B.2 we bound for fixed
(12) | ||||
(13) |
Because the are all independently drawn:
A routine calculation reveals that for ,
(14) |
By Lemma 3.5 we conclude the edge is not critical with probability exceeding . There are fewer than such non-NN pairs . These edges are precisely those contained in but not in . By the union bound and (14) we conclude that none of these are critical with probability greater than . This was conditioned on for all , which holds with probability exceeding . Thus, all critical edges are contained in with probability exceeding . Unpacking yields the claimed lower bound on . ∎
In (11), the explicit dependence of on , and are shown. The factor corresponds to the geometry of . The numerical constant 4, which is not tight, stems from accounting for the reach of . If is convex (i.e. ) then it can be replaced with . The second factor in (11) is controlled by the probability distribution while the third corresponds to and . For and ignoring geometric and density factors we attain as in [20]. For large we get , thus improving the dependence of on given in [33, 20]. Finally, using Corollary 4.4 of [46] we can sharpen the qualitative requirement that be “sufficiently large” to the quantitative lower bound for a constant depending on the geometry of . So, when is high-dimensional, has small reach, or when is close to 1, may need to be quite large for as in (11) to yield a 1-spanner.
3.2 Numerical Experiments
We numerically verify that the claimed dependence of $k$ on $n$ ensures that $G_{kNN}$ is a 1-spanner of $G_{\mathrm{full}}$; a sketch of this procedure, with assumed parameter choices, follows the list below. To generate Figures 5(a)–5(f) we:
- (1) Fix $p$ and $d$, then generate a sequence of $(n, k)$ pairs.
- (2) For each $(n, k)$ pair, do:
  - (i) Generate $X$ by sampling $n$ points i.i.d. from the given distribution on the domain under consideration.
  - (ii) For all pairs $x_i, x_j$ compute $\ell_p(x_i, x_j)$ and $\ell_p^{G_{kNN}}(x_i, x_j)$.
  - (iii) If $\ell_p^{G_{kNN}}(x_i, x_j) \neq \ell_p(x_i, x_j)$ for any pair, record “failure”; else, record “success”.
- (3) Repeat step 2 twenty times and compute the proportion of successes.
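A minimal sketch of this procedure is given below, reusing the kNN graph construction from the Section 1.1 sketch. The uniform sampling domain, the number of trials, and the equality test via a numerical tolerance are assumptions for illustration.

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra
from scipy.spatial.distance import pdist, squareform
from sklearn.neighbors import kneighbors_graph


def spanner_success_rate(n, k, p=2.0, d=2, trials=20, rng=None):
    """Fraction of trials in which the kNN graph is a 1-spanner for PWSPD.

    Points are sampled uniformly from [0,1]^d; each trial compares all-pairs
    power-weighted path costs in the complete graph and in the kNN graph.
    """
    if rng is None:
        rng = np.random.default_rng()
    successes = 0
    for _ in range(trials):
        X = rng.random((n, d))
        full_costs = dijkstra(squareform(pdist(X)) ** p, directed=False)   # complete graph
        G = kneighbors_graph(X, n_neighbors=k, mode="distance")
        G = G.maximum(G.T)
        G.data = G.data ** p
        knn_costs = dijkstra(G, directed=False)                            # kNN graph
        successes += bool(np.allclose(knn_costs, full_costs))              # 1-spanner: all PWSPDs agree
    return successes / trials
```

Sweeping over n and k with such a routine and recording the proportion of successes reproduces the kind of phase-transition plots described in Figure 5.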
As can be seen from Figure 5, there is a sharp transition between an “all failures” and an “all successes” regime. The transition line is roughly linear when viewed using semi-log-x axes, i.e. $k \propto \log n$. Moreover the slope of the line-of-best-fit to this transition line decreases with increasing $p$ (compare Figure 5(a)-5(c)) and depends on the intrinsic, not extrinsic, dimension (compare Figure 5(b) and 5(d)), as predicted by Theorem 3.9. Intriguingly, there is little difference between Figure 5(b) (uniform distribution) and Figure 5(e) (Gaussian distribution), suggesting that perhaps the assumption in Theorem 3.9 that the density is bounded away from zero is unnecessary. Finally, we observe that the constant of proportionality (i.e. $C$ such that $k \geq C \log n$) predicted by Theorem 3.9 appears pessimistic: for Figure 5(a)-5(c), the constants predicted by Theorem 3.9 (taking the reach factor to be trivial on the flat domain) are substantially larger than the empirical slopes of the lines of best fit.
In Figure 5(f), we consider an intrinsically 4-dimensional set corrupted with Gaussian noise in the fifth dimension. Interestingly, the scaling of $k$ with $n$ is more efficient than that shown in Figure 5(a) for the intrinsically 5-dimensional data. This suggests that measures which concentrate near low-dimensional sets benefit from that low-dimensionality, even if they are not supported exactly on it.
We also consider relaxing the success condition (2.iii). We define to be a -spanner if for proportion of the edges, so that Theorem 3.9 pertains to (1,1)-spanners. Figures 5(g) and 5(h) show the minimal (averaged across simulations) for which is a -spanner and a -spanner respectively; the red lines trace out the requirements for to be a -spanner and -spanner respectively. Comparing with Figure 5(b), we see that the required scaling for to be a -spanner is similar to the required scaling to be a -spanner, at least for small. However, the required scaling for -spanners () is quite different and much less restrictive, even for very close to 1; for example the requirement for to be a -spanner appears sublinear in the versus plot (see Figure 5(h)). If this notion of approximation is acceptable, our empirical results suggest one can enjoy much greater sparsity. Finally, in Figure 5(i) we compute the minimal such that is a -spanner of ; again the overall transition patterns for -spanners are similar to the -spanner case in Figure 5(b) when is close to 1. Overall we see that greater sparsity is permissible in these relaxed cases, and analyzing such notions rigorously is a topic of ongoing research.
[Figure 5: proportion of trials in which the kNN graph is a 1-spanner as a function of n and k under various settings; panels (g)–(i) show the minimal k required for relaxed spanner notions.]
4 Global Analysis: Statistics on PWSPD and Percolation
We recall that after a suitable normalization, $\tilde{\ell}_p$ is a consistent estimator for $\mathcal{L}_p$. Indeed, [39, 33] prove that for any $p > 1$ and $x, y \in \mathcal{M}$, there exists a constant $C_{p,d}$ independent of $n$ such that $\lim_{n \to \infty} n^{\frac{p-1}{pd}} \ell_p(x, y) = C_{p,d} \, \mathcal{L}_p(x, y)$. The important question then arises: how quickly does $\tilde{\ell}_p$ converge? How large does $n$ need to be to guarantee the error incurred by approximating $\mathcal{L}_p$ with $\tilde{\ell}_p$ is small? To answer this question we turn to results from Euclidean first passage percolation (FPP) [36, 37, 6, 24]. For any discrete set $A$, we let $\ell_p^{A}$ denote the PWSPD computed using only the points of $A$.
4.1 Overview of Euclidean First Passage Percolation
Euclidean FPP analyzes $\ell_p^{\mathcal{H}}(x, y)$, where $\mathcal{H}$ is a homogeneous, unit intensity Poisson point process (PPP) on $\mathbb{R}^d$.
Definition 4.1.
A (homogeneous) Poisson point process (PPP) on $\mathbb{R}^d$ is a point process such that for any bounded subset $A \subset \mathbb{R}^d$, the number of points in $A$ is a Poisson random variable with mean $\lambda \, |A|$; $\lambda > 0$ is the intensity of the PPP.
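For concreteness, a homogeneous PPP on a cube can be sampled as follows; the function name and the restriction to a cube are assumptions for illustration.

```python
import numpy as np


def sample_homogeneous_ppp(intensity, side, d, rng=None):
    """Sample a homogeneous Poisson point process of the given intensity on [0, side]^d.

    The point count in the cube is Poisson(intensity * side^d); conditional on
    that count, the points are i.i.d. uniform on the cube.
    """
    if rng is None:
        rng = np.random.default_rng()
    n_points = rng.poisson(intensity * side ** d)
    return rng.uniform(0.0, side, size=(n_points, d))
```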
It is known that
(15) |
where is a constant depending only on known as the time constant. The convergence of is studied by decomposing the error into random and deterministic fluctuations, i.e.
In terms of mean squared error (MSE), one has the standard bias-variance decomposition:
The following Proposition is well known in the Euclidean FPP literature.
Proposition 4.2.
Let and . Then for a constant depending only on .
Although is the best bound which has been proved, the fluctuation rate is known to in fact depend on the dimension, i.e. for some exponent . Strong evidence is provided in [24] that the bias can be bounded by the variance, so the exponent very likely controls the total convergence rate.
The following tail bound is also known [37].
Proposition 4.3.
Let , and . For any , there exist constants and (depending on ) such that for and , .
4.2 Convergence Rates for PWSPD
We wish to utilize the results in Section 4.1 to obtain convergence rates for PWSPD. However, we are interested in PWSPD computed on a compact set with boundary and the convergence rate of rather than . To simplify the analysis, we restrict our attention to the following idealized model.
Assumption 1.
Let be a convex, compact, -dimensional set of unit volume containing the origin. Assume we sample points independently and uniformly from , i.e. , to obtain the discrete set . Let denote the points in which are at least distance from the boundary of , i.e. .
We establish three things: (i) Euclidean FPP results apply away from ; (ii) the time constant equals the constant in (3); (iii) has the same convergence rate as .
To establish (i), we let denote a homogeneous PPP with rate , and let denote the length of the shortest path connecting and in . We also let and denote the PWSPD in ; note . To apply percolation results to our setting, the statistical equivalence of , and must be established. For large, the equivalence of and is standard and we omit any analysis. The equivalence of and is less clear. In particular, how far away from do need to be to ensure these metrics are the same? The following Proposition is a direct consequence of Theorem 2.4 from [37], and essentially guarantees the equivalence of the metrics as long as and are at least distance from .
Proposition 4.4.
Let , , , , and , and . Then for constants (depending on ), for all , the geodesics connecting in and are equal with probability at least , so that .
Next we establish the equivalence of (percolation time constant) and (PWSPD discrete-to-continuum normalization constant).
Proof.
Finally, we bound our real quantity of interest: the convergence rate of to .
Theorem 4.6.
Assume Assumption 1, , , , , and . Then for large enough, .
Proof.
To simplify notation throughout the proof we denote simply by . By Proposition 4.5 and for large enough,
where and is a homogeneous PPP with rate . Let be the event that the geodesics from to in and are equal. Since we assume , we may apply Proposition 4.4 with to conclude for . Conditioning on , and observing , we obtain
where decays exponentially in (for the last line note that conditioning on means conditioning on the geodesics being local, which can only decrease the expected error).
A Lipschitz analysis applied to the function yields:
By Proposition 4.3,
(16) |
with probability at least for any , where . Fix and let be the event that (16) is satisfied. On ,
for large enough. Note also that
and decreases exponentially in . We thus obtain
where is a constant depending on , and the last line follows since once again the expected error is lower conditioned on than unconditionally. We have thus established
for exponentially small in . Finally let be the unit intensity homogeneous PPP obtained from by multiplying each axis by . By Proposition 4.2,
For large, the above dominates , so that for a constant depending on ,
∎
4.3 Estimating the Fluctuation Exponent
As an application, we utilize the 1-spanner results of Section 3 to empirically estimate the fluctuation rate . Since there is evidence that the variance dominates the bias, this important parameter likely determines the convergence rate of to . Once again utilizing the change of variable , we note
and we estimate the right hand side from simulations. Specifically, we sample $n$ points uniformly from the unit cube and compute the PWSPD between two fixed points in a $k$NN graph on the sample, with $k \propto \log n$ as suggested by Theorem 3.9 (note that the density is uniform in this example). We vary $n$ over a wide range, and for each $n$ we estimate the variance from repeated simulations. Figure 6 shows the resulting log-log variance plots for various $p$ and $d$, as well as the slopes from a linear regression. The observed slopes determine the fluctuation exponent $\chi$, and one thus obtains the estimates for $\chi$ reported in Table 2. See Appendix F for confidence interval estimates.
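A sketch of this variance experiment is below. It reuses the hypothetical pwspd_knn helper from the Section 1.1 sketch, fixes two interior endpoints, and applies the discrete-to-continuum rescaling before taking the variance; the endpoints, k, and trial counts are illustrative assumptions, and the returned value is the raw log-log regression slope from which the fluctuation exponent is derived as described in the text.

```python
import numpy as np


def variance_slope(ns, p=2.0, d=2, k=30, trials=100, rng=None):
    """Log-log slope of Var[normalized PWSPD(x, y)] versus n for uniform samples on [0,1]^d."""
    if rng is None:
        rng = np.random.default_rng()
    x, y = np.full(d, 0.25), np.full(d, 0.75)          # two fixed interior points
    log_vars = []
    for n in ns:
        vals = []
        for _ in range(trials):
            X = np.vstack([x, y, rng.random((n, d))])  # sample including the two endpoints
            # discrete-to-continuum rescaling n^{(p-1)/(pd)} applied to the PWSPD
            vals.append(n ** ((p - 1) / (p * d)) * pwspd_knn(X, p=p, k=k)[0, 1])
        log_vars.append(np.log(np.var(vals)))
    slope, _ = np.polyfit(np.log(ns), log_vars, 1)
    return slope
```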
These simulations confirm that is indeed independent of . It is conjectured in the percolation literature that as increases, with , , which is consistent with our results. For , the empirical convergence rate is thus (not as given in Theorem 4.6), and for large one expects an MSE of order instead of . However estimating empirically becomes increasingly difficult as increases, since one has less sparsity in the NN graph, and because is obtained from by , so errors incurred in estimating the regression slopes are amplified by a factor of . Table 2 also reports the factor , which can be interpreted as the expected computational speed-up obtained by running the simulation in a NN graph instead of a complete graph. We were unable to obtain empirical speed-up factors since computational resources prevented running the simulations in a complete graph.
An important open problem is establishing that computed from a nonuniform density enjoys the same convergence rate (with respect to ) as the uniform case. Although this seems intuitively true and preliminary simulation results support this equivalence, to the best of our knowledge it has not been proven, as the current proof techniques rely on “straight line” geodesics.
[Figure 6: log-log plots of the variance of the normalized PWSPD against n, with fitted regression slopes, for various p and d.]
| $d$ | $\hat{\chi}$ | speed-up | $d$ | $\hat{\chi}$ | speed-up | $d$ | $\hat{\chi}$ | speed-up |
|---|---|---|---|---|---|---|---|---|
| 2 | 0.30 | 394 | 3 | 0.28 | 152 | 4 | 0.19 | 58 |
| 2 | 0.31 | 667 | 3 | 0.23 | 336 | 4 | 0.16 | 169 |
| 2 | 0.33 | 1204 | 3 | 0.24 | 820 | 4 | 0.14 | 558 |
| 2 | 0.34 | 1545 | 3 | 0.29 | 1204 | 4 | 0.19 | 927 |
5 Conclusion and Future Work
This article establishes local equivalence of PWSPD to a density-based stretch of Euclidean distance. We derive a near-optimal condition on the number of nearest neighbors $k$ for the $k$NN graph to be a 1-spanner for PWSPD, quantifying and improving the dependence on $p$ and $d$. Moreover, we leverage the theory of Euclidean FPP to establish statistical convergence rates for PWSPD to its continuum limit, and apply our spanner results to empirically support conjectures on the optimal dimension-dependent rates of convergence.
Many directions remain for future work. Our statistical convergence rates for PWSPD in Section 4 are limited to uniform distributions. Preliminary numerical experiments indicate that these rates also hold for PWSPDs defined with varying density, but rigorous convergence rates for nonhomogeneous PPPs are lacking in the literature.
The analysis of Section 2 proved the local equivalence of PWSPDs with density-stretched Euclidean distances. These results and the convergence results of Section 4 are the first steps in a program of developing a discrete-to-continuum limit analysis for PWSPDs and PWSPD-based operators. A major problem is to develop conditions so that the discrete graph Laplacian (defined with ) converges to a continuum second order differential operator as . A related direction is the analysis of how data clusterability with PWSPDs depends on for various random data models and in specific applications.
The numerical results of Section 3.2 confirm that $k \propto \log n$ is required for the $k$NN graph to be a 1-spanner, as predicted by theory. Relaxing the notion of exact 1-spanners to approximate spanners, as suggested in Section 3.2, is a topic of future research.
Finally, the results of this article require data to be generated from a distribution supported exactly on a low-dimensional manifold $\mathcal{M}$. An arguably more realistic setting is the noisy one in which the data is distributed only approximately on $\mathcal{M}$. Two potential models are of interest: (i) replacing $\mathcal{M}$ with a tube of small radius around it (tube model) and (ii) considering a density that concentrates on $\mathcal{M}$, rather than being supported on it (concentration model). PWSPDs may exhibit very different properties under these two noise models, for example under bounded uniform noise and Gaussian noise, especially for large $p$. For the concentration model one expects noisy PWSPDs to converge to manifold PWSPDs for $p$ large, since the optimal PWSPD paths are density driven. Preliminary empirical results (Figure 5(f)) suggest that when the measure concentrates sufficiently near a low-dimensional set, the number of nearest neighbors needed for a 1-spanner benefits from the intrinsic low-dimensional structure. For the tube model, although noisy PWSPDs will not converge to manifold PWSPDs, they will still scale according to the intrinsic manifold dimension when the noise level is small. For both models, incorporating a denoising procedure such as local averaging [31] or diffusion [35] before computing PWSPDs is expected to be advantageous. Future research will investigate robust denoising procedures for PWSPD, computing PWSPDs after dimension reduction, and which types of noise distributions are most adversarial to PWSPD.
Acknowledgements
AVL acknowledges partial support from the US National Science Foundation under grant DMS-1912906. DM acknowledges partial support from the US National Science Foundation under grant DMS-1720237 and the Office of Naval Research under grant N000141712162. JMM acknowledges partial support from the US National Science Foundation under grants DMS-1912737 and DMS-1924513. DM thanks Matthias Wink for several useful discussions on Riemannian geometry. We thank the two reviewers and the associate editor for many helpful comments that greatly improved the manuscript.
References
- [1] E. Aamari, J. Kim, F. Chazal, B. Michel, A. Rinaldo, and L. Wasserman. Estimating the reach of a manifold. Electronic Journal of Statistics, 13(1):1359–1399, 2019.
- [2] M. Alamgir and U. Von Luxburg. Shortest path distance in random k-nearest neighbor graphs. In ICML, pages 1251–1258, 2012.
- [3] K.S. Alexander. A note on some rates of convergence in first-passage percolation. The Annals of Applied Probability, pages 81–90, 1993.
- [4] H. Antil, T. Berry, and J. Harlim. Fractional diffusion maps. Applied and Computational Harmonic Analysis, 54:145–175, 2021.
- [5] E. Arias-Castro. Clustering based on pairwise distances when the data is of mixed dimensions. IEEE Transactions on Information Theory, 57(3):1692–1706, 2011.
- [6] A. Auffinger, M. Damron, and J. Hanson. 50 Years of First-Passage Percolation, volume 68. American Mathematical Soc., 2017.
- [7] M. Azizyan, A. Singh, and L. Wasserman. Density-sensitive semisupervised inference. The Annals of Statistics, 41(2):751–771, 2013.
- [8] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
- [9] M. Belkin and P. Niyogi. Convergence of Laplacian eigenmaps. In NIPS, pages 129–136, 2007.
- [10] R.E. Bellman. Adaptive control processes: a guided tour. Princeton University Press, 2015.
- [11] T. Berry and J. Harlim. Variable bandwidth diffusion kernels. Applied and Computational Harmonic Analysis, 40(1):68–96, 2016.
- [12] T. Berry and T. Sauer. Local kernels and the geometric structure of data. Applied and Computational Harmonic Analysis, 40(3):439–469, 2016.
- [13] A.S. Bijral, N. Ratliff, and N. Srebro. Semi-supervised learning with density based distances. In UAI, pages 43–50, 2011.
- [14] J.-D. Boissonnat, A. Lieutier, and M. Wintraecken. The reach, metric distortion, geodesic convexity and the variation of tangent spaces. Journal of Applied and Computational Topology, 3(1):29–58, 2019.
- [15] L. Boninsegna, G. Gobbo, F. Noé, and C. Clementi. Investigating molecular kinetics by variationally optimized diffusion maps. Journal of Chemical Theory and Computation, 11(12):5947–5960, 2015.
- [16] E. Borghini, X. Fernández, P. Groisman, and G. Mindlin. Intrinsic persistent homology via density-based metric learning. arXiv preprint arXiv:2012.07621, 2020.
- [17] O. Bousquet, O. Chapelle, and M. Hein. Measure based regularization. In NIPS, pages 1221–1228, 2004.
- [18] H. Chang and D.-Y. Yeung. Robust path-based spectral clustering. Pattern Recognition, 41(1):191–203, 2008.
- [19] Y. Cheng. Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8):790–799, 1995.
- [20] T. Chu, G.L. Miller, and D.R. Sheehy. Exact computation of a manifold metric, via Lipschitz embeddings and shortest paths on a graph. In SODA, pages 411–425, 2020.
- [21] R.R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.
- [22] R.R. Coifman, S. Lafon, A.B. Lee, M. Maggioni, B. Nadler, F. Warner, and S.W. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proceedings of the National Academy of Sciences, 102(21):7426–7431, 2005.
- [23] S.B. Damelin, F.J. Hickernell, D.L. Ragozin, and X. Zeng. On energy, discrepancy and group invariant measures on measurable subsets of euclidean space. Journal of Fourier Analysis and Applications, 16(6):813–839, 2010.
- [24] M. Damron and X. Wang. Entropy reduction in Euclidean first-passage percolation. Electronic Journal of Probability, 21, 2016.
- [25] L.P. Devroye and T.J. Wagner. The strong uniform consistency of nearest neighbor density estimates. The Annals of Statistics, pages 536–540, 1977.
- [26] D.L. Donoho and C. Grimes. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 100(10):5591–5596, 2003.
- [27] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, volume 96, pages 226–231, 1996.
- [28] A.M. Farahmand, C. Szepesvári, and J.-Y. Audibert. Manifold-adaptive dimension estimation. In ICML, pages 265–272, 2007.
- [29] H. Federer. Curvature measures. Transactions of the American Mathematical Society, 93(3):418–491, 1959.
- [30] B. Fischer, T. Zöller, and J.M. Buhmann. Path based pairwise data clustering with application to texture segmentation. In International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 235–250. Springer, 2001.
- [31] N. García Trillos, D. Sanz-Alonso, and R. Yang. Local regularization of noisy point clouds: Improved global geometric estimates and data analysis. Journal of Machine Learning Research, 20(136):1–37, 2019.
- [32] A. Gray. The volume of a small geodesic ball of a Riemannian manifold. The Michigan Mathematical Journal, 20(4):329–344, 1974.
- [33] P. Groisman, M. Jonckheere, and F. Sapienza. Nonhomogeneous Euclidean first-passage percolation and distance learning. arXiv preprint arXiv:1810.09398, 2018.
- [34] L. Györfi, M. Kohler, A. Krzyzak, and H. Walk. A distribution-free theory of nonparametric regression. Springer Science & Business Media, 2006.
- [35] M. Hein and M. Maier. Manifold denoising. In NIPS, volume 19, pages 561–568, 2006.
- [36] C.D. Howard and C.M. Newman. Euclidean models of first-passage percolation. Probability Theory and Related Fields, 108(2):153–170, 1997.
- [37] C.D. Howard and C.M. Newman. Geodesics and spanning trees for Euclidean first-passage percolation. Annals of Probability, pages 577–623, 2001.
- [38] G. Hughes. On the mean accuracy of statistical pattern recognizers. IEEE Transactions on Information Theory, 14(1):55–63, 1968.
- [39] S.J. Hwang, S.B. Damelin, and A. Hero. Shortest path through random points. The Annals of Applied Probability, 26(5):2791–2823, 2016.
- [40] D.B. Johnson. Efficient algorithms for shortest paths in sparse networks. Journal of the ACM, 24(1):1–13, 1977.
- [41] J. Kileel, A. Moscovich, N. Zelesko, and A. Singer. Manifold learning with arbitrary norms. arXiv preprint arXiv:2012.14172, 2020.
- [42] A. Little, M. Maggioni, and J.M Murphy. Path-based spectral clustering: Guarantees, robustness to outliers, and fast algorithms. Journal of Machine Learning Research, 21(6):1–66, 2020.
- [43] D.O. Loftsgaarden and C.P. Quesenberry. A nonparametric estimate of a multivariate density function. The Annals of Mathematical Statistics, 36(3):1049–1051, 1965.
- [44] P. C. Mahalanobis. On the generalized distance in statistics. National Institute of Science of India, 1936.
- [45] J. Malik, C. Shen, H.-T. Wu, and N. Wu. Connecting dots: from local covariance to empirical intrinsic geometry and locally linear embedding. Pure and Applied Analysis, 1(4):515–542, 2019.
- [46] D. Mckenzie and S. Damelin. Power weighted shortest paths for clustering Euclidean data. Foundations of Data Science, 1(3):307, 2019.
- [47] A. Moscovich, A. Jaffe, and B.Nadler. Minimax-optimal semi-supervised regression on unknown manifolds. In AISTATS, pages 933–942, 2017.
- [48] A.Y. Ng, M.I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, pages 849–856, 2002.
- [49] P. Niyogi, S. Smale, and S. Weinberger. Finding the homology of submanifolds with high confidence from random samples. Discrete & Computational Geometry, 39(1-3):419–441, 2008.
- [50] P. Petersen, S. Axler, and K.A. Ribet. Riemannian Geometry, volume 171. Springer, 2006.
- [51] A. Rinaldo and L. Wasserman. Generalized density clustering. The Annals of Statistics, 38(5):2678–2722, 2010.
- [52] A. Rodriguez and A. Laio. Clustering by fast search and find of density peaks. Science, 344(6191):1492–1496, 2014.
- [53] Sajama and A. Orlitsky. Estimating and computing density based distance metrics. In ICML, pages 760–767, 2005.
- [54] L.K. Saul and M.I. Jordan. A variational principle for model-based interpolation. In NIPS, pages 267–273, 1997.
- [55] G. Schiebinger, M.J. Wainwright, and B. Yu. The geometry of kernelized spectral clustering. The Annals of Statistics, 43(2):819–846, 2015.
- [56] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
- [57] J.B. Tenenbaum, V. De Silva, and J.C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
- [58] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
- [59] D. Van Dijk, R. Sharma, J. Nainys, K. Yim, P. Kathail, A.J. Carr, C. Burdziak, K.R Moon, C.L. Chaffer, D. Pattabiraman, B. Bierie, L. Mazutis, G. Wolf, S. Krishnaswamy, and D. Pe’er. Recovering gene interactions from single-cell data using data diffusion. Cell, 174(3):716–729, 2018.
- [60] P. Vincent and Y. Bengio. Density-sensitive metrics and kernels. In Snowbird Learning Workshop, 2003.
- [61] U. Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
- [62] R. Xu, S. Damelin, B. Nadler, and D.C. Wunsch II. Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps. Artificial Intelligence in Medicine, 48(2-3):91–98, 2010.
- [63] L. Zelnik-Manor and P. Perona. Self-tuning spectral clustering. In NIPS, pages 1601–1608, 2005.
- [64] S. Zhang and J.M. Murphy. Hyperspectral image clustering with spatially-regularized ultrametrics. Remote Sensing, 13(5):955, 2021.
Appendix A Proofs for Section 2
Proof of Lemma 2.2.
Let be a path which achieves . Since , for all . Then:
Note implies , and thus , so that . This yields
which proves the upper bound. Now let be a path achieving ; note that since , the path is contained in . Thus
so that
∎
Appendix B Proofs for Section 3
Proof of Lemma 3.6.
Let and choose a coordinate system such that , and . is now the interior of:
In spherical coordinates the boundary of this region may be expressed as:
(17) |
where and . Define as the unique positive solution of (17). Implicitly differentiating in yields
Solving for and setting the result to 0 yields
Thus we obtain two solutions to :
Thus the minimal radius occurs when . Substituting into (17) yields:
Hence , as desired. To see observe that if then:
(18) |
hence cannot be in . ∎
Lemma B.1.
With assumptions and notation as in Theorem 3.9, .
Lemma B.2.
With notation and assumptions as in Theorem 3.9:
(19) |
Appendix C Local Analysis: Proof of Corollary 2.4
Proof.
First note that by Theorem 2.3
Thus
∎
Appendix D Additional Clustering Results
D.1 Spectral Clustering with Self-Tuning
Clustering results for Euclidean SC with self-tuning (ST) are shown in Figure 7. Similarly to Euclidean SC and Euclidean SC with the diffusion maps normalization, SC+ST can cluster well when $K$ is given a priori but struggles to simultaneously learn $K$ and cluster accurately.
[Figure 7: clustering results for Euclidean spectral clustering with self-tuning on the synthetic data sets.]
D.2 Clustering with Fiedler Eigenvector
Clustering results on the Two Rings and Short Bottleneck data (both of which have $K = 2$) appear in Figure 8. The results are very similar to running $k$-means on the second eigenvector of the graph Laplacian.
[Figure 8: clustering results obtained by thresholding the Fiedler eigenvector on the Two Rings and Short Bottleneck data.]
Appendix E PWSPD Spanners: Intrinsic Path Distance
In this section we assume that is a compact Riemannian manifold with metric , but we do not assume that is isometrically embedded in . Let us first establish some notation. For precise definitions of terms in italics, we refer the reader to [50].
-
•
For any , denotes the Riemannian curvature while denotes the sectional curvature. In this notation, is a matrix while is a scalar. These notions of curvature are related:
(22) We shall drop the “” in the subscript when it is clear from context. Let be such that for all and . Because is compact, such a exists. Similarly, denotes the scalar curvature and while .
-
•
For , define , . Both quantities can be expressed as a Taylor series in , with coefficients depending on the curvature of [32]:
(23) (24) -
•
For any , denotes the exponential map. By construction, . The exponential map is used to construct normal coordinates.
-
•
The injectivity radius at is denoted by , while denotes the injectivity radius of . Because is closed, the injectivity radius is bounded away from 0.
Proposition E.1.
In normal coordinates the metric has the following expansion:
where if and is zero otherwise. Moreover, for any and :
(25) |
Proof.
See the discussion in [50] above Theorem 5.5.8 and Exercises 5.9.26 and 5.9.27. ∎
Combining (22) and (25) yields:
For any let denote the distance to the NN of . Because as , we have that for all , almost surely for large enough.
The proof of Theorem E.2 proceeds in a similar manner to the proof of Theorem 3.9. However, care must be taken to account for the curvature. Note that for any quantity implicitly dependent on , we shall say that if for any there exists an such that for all we have that . Let denote the complete graph on with edge weights while shall denote the NN subgraph of .
Theorem E.2.
Let be a closed and compact Riemannian manifold. Let be drawn i.i.d. from according to a probability distribution with continuous density satisfying for all . For , sufficiently large, and
(26) |
is a -spanner of with probability at least .
Proof.
As in Theorem 3.9, we prove this by showing that every edge of not contained in is not critical. As before, w.h.p. uniformly in [39]. So, let be large enough so that .
Pick any which are not NNs and let . If , then and thus the edge is not critical. So, suppose without loss of generality in what follows that . Let denote the NNs of . Because is not a NN of , for . We show is not critical by showing there exists an such that with probability at least .
Let be the midpoint of the (shortest) geodesic from to . As the exponential map is a diffeomorphism onto . Choose Riemannian normal coordinates centered at such that and . For any , we may write for some . Now, by (25)
and similarly: We split the analysis into the case where and where .
Case (Positive Sectional Curvature): If then the terms proportional to are strictly non-positive, and hence may be dropped. We get:
and hence Thus, for sufficiently small we may guarantee that
(27) |
by ensuring where is such that the term is less than . As with , we observe that .
Case (Negative Sectional Curvature): If then one can upper bound the terms proportional to by to obtain:
and so:
(28) |
As in the positive sectional curvature case, one can guarantee:
(29) |
by ensuring
(30) |
where is such that the term is less than . Again, we observe that . Note that if (30) holds with then , and so (28) becomes:
(31) |
For both cases, consider the -elongated set defined in the tangent space:
as well as its scaled image under the exponential map: . From the above arguments, (27) (resp. (29)) will hold as long as . By Lemma 3.6 as long as is sufficiently large that we have so where . Hence:
(32) |
As in Theorem 3.9:
As in Theorem 3.9 for ,
(33) |
If then from (27) (resp. (29)):
(34) |
and so is not critical. Thus, is not critical with probability exceeding . These edges are precisely those contained in but not in . There are fewer than such non -NN pairs . By the union bound and (33) we conclude that none of these are critical with probability greater than . This was conditioned on for all , which holds with probability . Thus, all critical edges are contained in with probability exceeding . Unpacking yields the claimed lower bound on . ∎
Appendix F Estimating the Fluctuation Exponent
Table 3 shows confidence interval estimates for obtained by computing in a sparse graph.
| $d$ | $\hat{\chi}$ | CI for $\hat{\chi}$ | $d$ | $\hat{\chi}$ | CI for $\hat{\chi}$ | $d$ | $\hat{\chi}$ | CI for $\hat{\chi}$ |
|---|---|---|---|---|---|---|---|---|
| 2 | 0.30 | (0.28, 0.32) | 3 | 0.28 | (0.20, 0.36) | 4 | 0.19 | (0.03, 0.36) |
| 2 | 0.31 | (0.30, 0.32) | 3 | 0.23 | (0.20, 0.25) | 4 | 0.16 | (0.13, 0.19) |
| 2 | 0.33 | (0.31, 0.34) | 3 | 0.24 | (0.22, 0.25) | 4 | 0.14 | (0.11, 0.18) |
| 2 | 0.34 | (0.32, 0.37) | 3 | 0.29 | (0.27, 0.32) | 4 | 0.19 | (0.14, 0.23) |