
Balancing Geometry and Density: Path Distances on High-Dimensional Data

Anna Little (Department of Mathematics, University of Utah, Salt Lake City, UT 84112, USA; little@math.utah.edu), Daniel McKenzie (Department of Mathematics, UCLA, Los Angeles, CA 90095, USA; mckenzie@math.ucla.edu), James M. Murphy (Department of Mathematics, Tufts University, Medford, MA 02155, USA; jm.murphy@tufts.edu)
Abstract

New geometric and computational analyses of power-weighted shortest-path distances (PWSPDs) are presented. By illuminating the way these metrics balance geometry and density in the underlying data, we clarify their key parameters and illustrate how they provide multiple perspectives for data analysis. Comparisons are made with related data-driven metrics, which illustrate the broader role of density in kernel-based unsupervised and semi-supervised machine learning. Computationally, we relate PWSPDs on complete weighted graphs to their analogues on weighted nearest neighbor graphs, providing high probability guarantees on their equivalence that are near-optimal. Connections with percolation theory are developed to establish estimates on the bias and variance of PWSPDs in the finite sample setting. The theoretical results are bolstered by illustrative experiments, demonstrating the versatility of PWSPDs for a wide range of data settings. Throughout the paper, our results generally require only that the underlying data is sampled from a compact low-dimensional manifold, and depend most crucially on the intrinsic dimension of this manifold, rather than its ambient dimension.

1 Introduction

The analysis of high-dimensional data is a central challenge in modern statistics and machine learning. In order to defeat the curse of dimensionality [38, 34, 10], distance metrics that efficiently and accurately capture intrinsically low-dimensional latent structure in high-dimensional data are required. Indeed, this need to capture low-dimensional linear and nonlinear structure in data has led to the development of a range of data-dependent distances and related dimension reduction methods, which have been widely employed in applications [44, 57, 8, 26, 21, 58]. Understanding how these metrics trade off fundamental properties of the data (e.g. local versus global structure, geometry versus density) when making pointwise comparisons is an important challenge in their use, and may be understood as a form of model selection in unsupervised and semi-supervised machine learning problems.

1.1 Power-Weighted Shortest Path Distances

In this paper we analyze power-weighted shortest path distances (PWSPDs) and develop their applications to problems in machine learning. These metrics compute the shortest path between two points in the data, accounting for the underlying density of the points along the path. Paths through low-density regions are penalized, so that the optimal path must balance being “short” (in the sense of the classical geodesic distance) with passing through high-density regions. We consider a finite data set 𝒳={xi}i=1nD\mathcal{X}=\{x_{i}\}_{i=1}^{n}\subset\mathbb{R}^{D}, which we usually assume to be intrinsically low-dimensional, in the sense that there exists a compact dd-dimensional Riemannian data manifold D\mathcal{M}\subset\mathbb{R}^{D} and a probability density function f(x)f(x) supported on \mathcal{M} such that {xi}i=1ni.i.d.f(x)\{x_{i}\}_{i=1}^{n}\mathbin{\overset{i.i.d.}{\kern 0.0pt\leavevmode\resizebox{17.63316pt}{3.66875pt}{$\sim$}}}f(x).

Definition 1.1.

For p[1,)p\in[1,\infty) and for x,y𝒳x,y\in\mathcal{X}, the (discrete) pp-weighted shortest path distance (PWSPD) from xx to yy is:

\ell_{p}(x,y)=\min_{\pi=\{x_{i_{j}}\}_{j=1}^{T}}\left(\sum_{j=1}^{T-1}\|x_{i_{j}}-x_{i_{j+1}}\|^{p}\right)^{\frac{1}{p}}, (1)

where \pi is a path of points in \mathcal{X} with x_{i_{1}}=x and x_{i_{T}}=y, and \|\cdot\| is the Euclidean norm.

Early uses of density-based distances for interpolation [54] led to the formulation of PWSPD in the context of unsupervised and semi-supervised learning and applications [30, 60, 17, 53, 18, 13, 47, 46, 42, 64, 16]. It will occasionally be useful to think of pp(,)\ell_{p}^{p}(\cdot,\cdot) as the path distance in the complete graph on 𝒳\mathcal{X} with edge weights xixjp\|x_{i}-x_{j}\|^{p}, which we shall denote 𝒢𝒳p\mathcal{G}_{\mathcal{X}}^{p}. When p=1p=1, 1(x,y)=xy\ell_{1}(x,y)=\|x-y\|, i.e. the Euclidean distance. As pp increases, the largest elements in the set of path edge lengths {xijxij+1}j=1T1\left\{\left\|x_{i_{j}}-x_{i_{j+1}}\right\|\right\}_{j=1}^{T-1} begin to dominate the optimization (1), so that paths through higher density regions (with shorter edge lengths) are promoted. When pp\rightarrow\infty, p\ell_{p} converges (up to rescaling by the number of edges achieving maximal length) to the longest-leg path distance (x,y)=minπ={xij}j=1Tmaxj=1,,T1xijxij+1\displaystyle\ell_{\infty}(x,y)=\min_{\pi=\{x_{i_{j}}\}_{j=1}^{T}}\max_{j=1,\dots,T-1}\|x_{i_{j}}-x_{i_{j+1}}\| [42] and is thus driven by the density function ff. Outside these extremes, p\ell_{p} balances taking a “short” path and taking one through regions of high density. Note that p\ell_{p} can be defined for p<1p<1, but it does not satisfy the triangle inequality and is thus not a metric (pp\ell_{p}^{p} however is a metric for all p>0p>0). This case was studied in [2], where it is shown to have counterintuitive properties that should preclude its use in machine learning and data analysis.
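For concreteness, the following minimal sketch (ours, not the authors' reference implementation; names and parameters are illustrative) computes all-pairs \ell_{p} as in (1) by running Dijkstra's algorithm on the complete graph with edge weights \|x_{i}-x_{j}\|^{p} and then taking the 1/p root.

```python
# Minimal sketch (ours, not the authors' reference code) of the discrete PWSPD in (1):
# Dijkstra on the complete graph G_X^p with edge weights ||x_i - x_j||^p, then a 1/p root.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import shortest_path

def pwspd(X, p):
    """All-pairs ell_p for an (n x D) array of points X."""
    W = squareform(pdist(X)) ** p                              # edge weights ||x_i - x_j||^p
    return shortest_path(W, method='D', directed=False) ** (1.0 / p)

# For p = 1 the shortest path is the direct edge, so ell_1 is Euclidean distance.
X = np.random.rand(50, 2)
assert np.allclose(pwspd(X, 1), squareform(pdist(X)))
```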

While (1) is defined for finite data, it admits a corresponding continuum formulation.

Definition 1.2.

Let (,g)(\mathcal{M},g) be a compact, dd-dimensional Riemannian manifold and ff a continuous density function on \mathcal{M} that is lower bounded away from zero (i.e. fmin:=minxf(x)>0f_{\min}:=\min_{x\in\mathcal{M}}f(x)>0 on \mathcal{M}). For p[1,)p\in[1,\infty) and x,yx,y\in\mathcal{M}, the (continuum) pp-weighted shortest path distance from xx to yy is:

\mathcal{L}_{p}(x,y)=\left(\inf_{\gamma}\int_{0}^{1}\frac{1}{f(\gamma(t))^{\frac{p-1}{d}}}\sqrt{g\left(\gamma^{\prime}(t),\gamma^{\prime}(t)\right)}\,dt\right)^{\frac{1}{p}}, (2)

where \gamma:[0,1]\rightarrow\mathcal{M} is a \mathcal{C}^{1} path with \gamma(0)=x,\ \gamma(1)=y.

Note \mathcal{L}_{1} is simply the geodesic distance on \mathcal{M}. However for p>1 and a nonuniform density, the optimal path \gamma is generally not a geodesic on \mathcal{M}: \mathcal{L}_{p} favors paths which travel along high-density regions, and detours off the classical \mathcal{L}_{1} geodesics are thus acceptable. The parameter p controls how large of a detour is optimal; for large p, optimal paths may become highly nonlocal and differ substantially from classical geodesic paths.

It is known [39, 33] that when f is continuous and positive, for p>1 and all x,y\in\mathcal{M},

\lim_{n\rightarrow\infty}n^{\frac{p-1}{pd}}\ell_{p}(x,y)=C_{p,d}\,\mathcal{L}_{p}(x,y) (3)

for an absolute constant C_{p,d} depending only on p and d, i.e. that the discrete PWSPD computed on an i.i.d. sample from f (appropriately rescaled) is a consistent estimator for the continuum PWSPD. In particular, (3) is established by [33] for C^{1}, isometrically embedded manifolds and by [39] for smooth, compact manifolds without boundary and for \ell_{p} defined using geodesic distance. We thus define the normalized (discrete) path metric

\tilde{\ell}_{p}(x,y):=n^{\frac{p-1}{pd}}\ell_{p}(x,y). (4)

The np1pdn^{\frac{p-1}{pd}} normalization factor accounts for the fact that for p>1p>1, p\ell_{p} converges uniformly to 0 as nn\rightarrow\infty [46]. Note that the 1/p1/p exponent in (1) and (3) is necessary to obtain a metric that is homogeneous. Moreover, as pp\rightarrow\infty, p\mathcal{L}_{p} is constant on regions of constant density, but pp\mathcal{L}_{p}^{p} is not. Indeed, consider a uniform distribution on [0,1]d[0,1]^{d}, which has density f=𝟙[0,1]df=\mathbbm{1}_{[0,1]^{d}}. Then for all x,y[0,1]dx,y\in[0,1]^{d} and for all pp, pp(x,y)=xy\mathcal{L}_{p}^{p}(x,y)=\|x-y\|. On the other hand, for all x,y[0,1]dx,y\in[0,1]^{d}, p(x,y)=xy1/p1\mathcal{L}_{p}(x,y)=\|x-y\|^{1/p}\rightarrow 1 as pp\rightarrow\infty, i.e. all points are equidistant in the limit pp\rightarrow\infty. Thus the 1/p1/p exponent in (1) and (3) is necessary to obtain an entirely density-based metric for large pp.
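As a rough numerical illustration (ours; the sample sizes are arbitrary) of the scaling behind (3) and (4), one can fix two endpoints, draw uniform samples on [0,1]^{d}, and observe that \ell_{p} shrinks while the normalized quantity n^{\frac{p-1}{pd}}\ell_{p} stabilizes as n grows:

```python
# Rough numerical illustration (ours; sample sizes arbitrary) of the normalization (4):
# ell_p(x, y) -> 0 as n grows, while n^{(p-1)/(pd)} ell_p(x, y) stabilizes, cf. (3).
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import shortest_path

def pwspd_between_first_two(X, p):
    W = squareform(pdist(X)) ** p
    return shortest_path(W, method='D', directed=False, indices=[0])[0, 1] ** (1.0 / p)

d, p = 2, 2
x, y = np.zeros(d), np.ones(d)                       # two fixed endpoints of the unit square
for n in [200, 800, 3200]:
    X = np.vstack([x, y, np.random.rand(n - 2, d)])  # uniform samples plus the endpoints
    ell = pwspd_between_first_two(X, p)
    print(n, ell, n ** ((p - 1) / (p * d)) * ell)    # raw vs. normalized PWSPD
```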

In practice, it is more efficient to compute PWSPDs in a sparse graph instead of a complete graph. It is thus natural to define PWSPDs with respect to a subgraph \mathcal{H} of 𝒢𝒳p\mathcal{G}_{\mathcal{X}}^{p}.

Definition 1.3.

Let \mathcal{H} be any subgraph of \mathcal{G}_{\mathcal{X}}^{p}. For x,y\in\mathcal{X}, let \mathcal{P}_{\mathcal{H}}(x,y) be the set of paths connecting x and y in \mathcal{H}. For p\in[1,\infty) and for x,y\in\mathcal{X}, the (discrete) p-weighted shortest path distance (PWSPD) with respect to \mathcal{H} from x to y is:

\ell_{p}^{\mathcal{H}}(x,y)=\min_{\pi=\{x_{i_{j}}\}_{j=1}^{T}\in\mathcal{P}_{\mathcal{H}}(x,y)}\left(\sum_{j=1}^{T-1}\|x_{i_{j}}-x_{i_{j+1}}\|^{p}\right)^{\frac{1}{p}}.

Clearly p𝒢𝒳p(,)=p(,)\ell_{p}^{\mathcal{G}_{\mathcal{X}}^{p}}(\cdot,\cdot)=\ell_{p}(\cdot,\cdot). In order to compute all-pairs PWSPDs in a complete graph with nn nodes (i.e. p(xi,xj)\ell_{p}(x_{i},x_{j}) for all xi,xj𝒳x_{i},x_{j}\in\mathcal{X}), a direct application of Dijkstra’s algorithm has complexity O(n3)O(n^{3}). Let 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} denote the kkNN graph, constructed from 𝒢𝒳p\mathcal{G}^{p}_{\mathcal{X}} by retaining only edges {x,y}\{x,y\} if xx is amongst the kk nearest neighbors of yy in 𝒳\mathcal{X} (we say: “xx is a kkNN of yy” for short) or vice versa. In some cases the PWSPDs with respect to 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} are known to coincide with those computed in 𝒢𝒳p\mathcal{G}_{\mathcal{X}}^{p} [33, 20]. If so, we say the kkNN graph is a 11-spanner of 𝒢𝒳p\mathcal{G}_{\mathcal{X}}^{p}. This provides a significant computational advantage, since kkNN graphs are much sparser, and reduces the complexity of computing all-pairs PWSPD to O(kn2)O(kn^{2}) [40].
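A short sketch (ours; the library calls and parameters are illustrative) of the sparse computation: build a symmetrized kNN graph, raise the retained edge lengths to the power p, and run Dijkstra on the sparse graph. When \mathcal{G}^{p,k}_{\mathcal{X}} is a 1-spanner, this reproduces \ell_{p} exactly.

```python
# Short sketch (ours) of PWSPD on the sparse kNN graph G_X^{p,k}: symmetrize the kNN
# graph, raise retained edge lengths to the power p, and run Dijkstra on the result.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def pwspd_knn(X, p, k):
    G = kneighbors_graph(X, n_neighbors=k, mode='distance')   # sparse kNN distances
    G = G.maximum(G.T)        # keep edge {x,y} if x is a kNN of y or vice versa
    G.data = G.data ** p      # edge weights ||x_i - x_j||^p
    return shortest_path(G, method='D', directed=False) ** (1.0 / p)

X = np.random.rand(500, 3)
ell_sparse = pwspd_knn(X, p=2, k=15)   # equals the complete-graph ell_p w.h.p. for k large enough
```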

1.2 Summary of Contributions

This article develops new analyses, computational insights, and applications of PWSPDs, which may be summarized in three major contributions. First, we establish that when \frac{p}{d} is not too large, PWSPDs locally are density-rescaled Euclidean distances. We give precise error bounds that improve over known bounds [39] and are tight enough to prove the local equivalence of Gaussian kernels constructed with PWSPD and density-rescaled Euclidean distances. We also develop related theory which clarifies the role of density in machine learning kernels more broadly. A range of machine learning kernels that normalize in order to mitigate or leverage differences in underlying density are considered and compared to PWSPD. Relatedly, we analyze how PWSPDs become increasingly influenced by the underlying density as p\rightarrow\infty. We also demonstrate the role of density and the benefits of PWSPDs on illustrative data sets.

Second, we improve and extend known bounds on kk [33, 46, 20] guaranteeing that the kkNN graph is a 11-spanner of 𝒢𝒳p\mathcal{G}_{\mathcal{X}}^{p}. Specifically, we show that for any 1<p<1<p<\infty, the kkNN graph is a 11-spanner of 𝒢𝒳p\mathcal{G}_{\mathcal{X}}^{p} with probability exceeding 11/n1-1/n if kCp,d,f,log(n)k\geq C_{p,d,f,\mathcal{M}}\cdot\log(n), for an explicit constant Cp,d,f,C_{p,d,f,\mathcal{M}} that depends on the density power pp, intrinsic dimension dd, underlying density ff, and the geometry of the manifold \mathcal{M}, but is crucially independent of nn. These results are proved both in the case that the manifold is isometrically embedded and in the case that the edge lengths are in terms of intrinsic geodesic distance on the manifold. Our results provide an essential computational tool for the practical use of PWSPDs, and their key dependencies are verified numerically with extensive large-scale experiments.

Third, we bound the convergence rate of PWSPD to its continuum limit using a percolation theory framework, thereby quantifying the asymptotic convergence result (3) of [39, 33]. Specifically, we develop bias and variance estimates by relating results on Euclidean first passage percolation (FPP) to the PWSPD setting. Surprisingly, these results suggest that the variance of PWSPD is essentially independent of p, and depends on the intrinsic dimension d in complex ways. Numerical experiments verify our theoretical analyses and suggest several conjectures related to Euclidean FPP that are of independent interest.

1.3 Notation

We shall use the notation in Table 1 consistently, though certain specialized notation will be introduced as required. We assume throughout that the data 𝒳\mathcal{X} is drawn from a compact Riemannian data manifold (,g)(\mathcal{M},g), with additional assumptions imposed on \mathcal{M} as needed; we do not rigorously consider the more general case that 𝒳\mathcal{X} is drawn from a distribution supported near \mathcal{M}. If D\mathcal{M}\subset\mathbb{R}^{D}, we assume that it is isometrically embedded in D\mathbb{R}^{D}, i.e. gg is the unique metric induced by restricting the Euclidean metric on D\mathbb{R}^{D} to \mathcal{M}, unless otherwise stated. If an event holds with probability 1c/n1-c/n, where n=|𝒳|n=|\mathcal{X}| and cc is independent of nn, we say it holds with high probability (w.h.p.).

Notation Definition
\mathcal{X} \mathcal{X}=\{x_{i}\}_{i=1}^{n}\subset\mathbb{R}^{D}, a finite data set
D ambient dimension of data set \mathcal{X}
d intrinsic dimension of data set \mathcal{X}
\|v\|_{p} \|v\|_{p}=(\sum_{i=1}^{D}|v_{i}|^{p})^{\frac{1}{p}}, the Euclidean p-norm of v\in\mathbb{R}^{D}
\|v\| \|v\|_{2}, the Euclidean 2-norm
|c| the absolute value of c\in\mathbb{R}
\mathcal{G}_{\mathcal{X}}^{p} complete graph on \mathcal{X} with edge weight \|x_{i}-x_{j}\|^{p} between x_{i},x_{j}\in\mathcal{X}
\{x,y\} edge between nodes x,y in a graph
(\mathcal{M},g) a Riemannian manifold with associated metric g
\kappa measure of curvature on \mathcal{M}; see Definition 2.1
\kappa_{0} measure of regularity on \mathcal{M}; see Definition 3.7
\zeta reach of a manifold \mathcal{M}; see Definition 3.8
f(x) probability density function from which \mathcal{X} is drawn
f_{\text{min}}, f_{\text{max}} minimum and maximum values of density f defined on compact manifold \mathcal{M}
\{\pi_{i}\}_{i=1}^{T}, \gamma(t) discrete, continuous path
\ell_{p}(x,y) discrete PWSPD, see (1)
\tilde{\ell}_{p}(x,y) rescaled version of \ell_{p}(x,y), see (4)
\ell_{p}^{\mathcal{H}}(x,y) discrete PWSPD defined on the subgraph \mathcal{H}\subset\mathcal{G}_{\mathcal{X}}^{p}; see Definition 1.3
\mathcal{L}_{p}(x,y) continuum PWSPD, see (2)
\mathscr{D}(x,y) geodesic distance on manifold \mathcal{M}
\mathscr{D}_{f,\text{Euc}}(x,y) density-based stretch of Euclidean distance with respect to f
W,\text{Deg},L weight, degree, and Laplacian matrices associated to a graph
\delta(\cdot,\cdot) arbitrary metric
B_{\delta}(x,\epsilon) \{y\ |\ \delta(x,y)\leq\epsilon\}, ball of radius \epsilon>0 centered at x with respect to \delta
B(x,\epsilon) Euclidean ball of radius \epsilon>0 centered at x, dimension determined by context
\mathcal{D}_{\alpha,p}(x,y) p-elongated set of radius \alpha based at points x,y; see Definition 3.4
k number of nearest neighbors (NN), sometimes dependent on n (i.e. k=k(n))
\mu,\chi percolation time, fluctuation constants
\lambda intensity parameter in a Poisson point process
\bar{A} complement of a set A
\mathbb{E}[\xi],\text{Var}[\xi] expectation, variance of a random variable \xi
\text{diam}(A) \sup_{x,y\in A}\|x-y\|, the Euclidean diameter of a set A
\text{vol}(A) volume of a set A, with dimension depending on context
\partial A boundary of a set A
a\lesssim b a\leq Cb for a constant C independent of the dependencies of a,b
a\propto b quantity a is proportional to quantity b, i.e. a\lesssim b and b\lesssim a
Table 1: Notation used throughout the paper.

2 Local Analysis: Density and Kernels

Density-driven methods are commonly used for unsupervised and semi-supervised learning [19, 27, 21, 51, 13, 7, 52]. Despite this popularity, the role of density is not completely clear in this context. Indeed, some methods seek to leverage variations in density while others mitigate it. In this section, we explore the role that density plays in popular machine learning kernels, including those used in self-tuning spectral clustering and diffusion maps. We compare with the effect of density in p\ell_{p}-based kernels, and illustrate the primary advantages and disadvantages on toy data sets.

2.1 Role of Density in Graph Laplacian Kernels

A large family of algorithms [8, 9, 56, 48, 61] view data points as the nodes of a graph, and define the corresponding edge weights via a kernel function. In general, by kernel we mean a function 𝒦:D×D\mathcal{K}:\mathbb{R}^{D}\times\mathbb{R}^{D}\rightarrow\mathbb{R} that captures a notion of similarity between elements of D\mathbb{R}^{D}. More precisely, we suppose that 𝒦\mathcal{K} is of the form 𝒦(xi,xj)=h(δ(xi,xj))\mathcal{K}(x_{i},x_{j})=h(\delta(x_{i},x_{j})) for some metric δ\delta on D\mathbb{R}^{D} and smooth, positive, rapidly decaying (hence integrable) function h:h:\mathbb{R}\rightarrow\mathbb{R}. Our technical results will pertain exclusively to the Gaussian kernel 𝒦(xi,xj)=exp(δ(xi,xj)2/ϵ2)\mathcal{K}(x_{i},x_{j})=\exp(-\delta(x_{i},x_{j})^{2}/\epsilon^{2}) for some metric δ\delta and scaling parameter ϵ>0\epsilon>0, albeit more general kernels have been considered in the literature [4, 23, 11]. Given 𝒳D\mathcal{X}\subset\mathbb{R}^{D}, one first defines a weight matrix Wn×nW\in\mathbb{R}^{n\times n} by Wij=𝒦(xi,xj)W_{ij}=\mathcal{K}(x_{i},x_{j}) for some kernel 𝒦\mathcal{K}, and diagonal degree matrix Degn×n\text{Deg}\in\mathbb{R}^{n\times n} by Degii=j=1nWij\text{Deg}_{ii}=\sum_{j=1}^{n}W_{ij}. A graph Laplacian LL is then defined using W,DegW,\text{Deg}. Then, the KK lowest frequency eigenvectors of LL, denoted ϕ1,,ϕK\phi_{1},\ldots,\phi_{K}, define a KK-dimensional spectral embedding of the data by xi(ϕ1(xi),ϕ2(xi),,ϕK(xi))x_{i}\mapsto(\phi_{1}(x_{i}),\phi_{2}(x_{i}),\dots,\phi_{K}(x_{i})), where ϕj(xi)=(ϕj)i\phi_{j}(x_{i})=(\phi_{j})_{i}. Commonly, a standard clustering algorithm such as KK-means is then applied to the spectral embedding. This procedure is known as spectral clustering (SC). In unnormalized SC, L=DegWL=\text{Deg}-W, while in normalized SC either the random walk Laplacian LRW=Deg1LL_{\text{RW}}=\text{Deg}^{-1}L or the symmetric normalized Laplacian LSYM=Deg1/2LDeg1/2L_{\text{SYM}}=\text{Deg}^{-1/2}L\text{Deg}^{-1/2} is used.
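The following schematic sketch (ours; the scaling parameter and Laplacian choice are illustrative, not prescribed) traces the SC pipeline just described: Gaussian weights, a degree matrix, the lowest-frequency eigenvectors of a graph Laplacian, and K-means on the resulting spectral embedding.

```python
# Schematic sketch (ours; parameters arbitrary) of the spectral clustering pipeline described
# above: Gaussian weights, degree matrix, random-walk Laplacian eigenvectors, then K-means.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(X, K, eps):
    W = np.exp(-squareform(pdist(X)) ** 2 / eps ** 2)   # Gaussian kernel weight matrix
    Deg = np.diag(W.sum(axis=1))                        # diagonal degree matrix
    L = Deg - W                                         # unnormalized Laplacian
    # L phi = lambda Deg phi is the eigenproblem of the random-walk Laplacian Deg^{-1} L
    _, vecs = eigh(L, Deg)
    embedding = vecs[:, :K]                             # K lowest-frequency eigenvectors
    return KMeans(n_clusters=K, n_init=10).fit_predict(embedding)
```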

Many modifications of this general framework have been considered. Although SC is better able to handle irregularly shaped clusters than many traditional algorithms [5, 55], it is often unstable in the presence of low degree points and sensitive to the choice of scaling parameter ϵ\epsilon when using the Gaussian kernel [61]. These shortcomings motivated [63] to apply SC with the self-tuning kernel Wij=exp(xixj2σi,kσj,k),W_{ij}=\exp\left(-\frac{\|x_{i}-x_{j}\|^{2}}{\sigma_{i,k}\sigma_{j,k}}\right), where σi,k\sigma_{i,k} is Euclidean distance of xix_{i} to its kthk^{\text{th}} NN. To clarify how the data density influences this kernel, consider how σi,k\sigma_{i,k} relates to the kkNN density estimator at xix_{i}:

f_{n}(x_{i}):=\frac{k}{n\,\text{vol}(B(0,1))\,\sigma_{i,k}^{d}}. (5)

It is known [43] that if k=k(n)k=k(n) is such that k(n)k(n)\rightarrow\infty while k(n)/n0k(n)/n\rightarrow 0, then fn(xi)f_{n}(x_{i}) is a consistent estimator of f(xi)f(x_{i}), as long as ff is continuous and positive. Furthermore, if ff is uniformly continuous and k(n)/lognk(n)/\log n\rightarrow\infty while k(n)/n0k(n)/n\rightarrow 0, then supi|fn(xi)f(xi)|0\sup_{i}|f_{n}(x_{i})-f(x_{i})|\rightarrow 0 with probability 1 [25]. Although these results assume the density ff is supported in d\mathbb{R}^{d}, the density estimator (5) is consistent in the general case when ff is supported on a dd-dimensional Riemannian manifold D\mathcal{M}\subseteq\mathbb{R}^{D} for lognk(n)n\log n\ll k(n)\ll n [28]. For such k(n)k(n), σi,kϵn,df(xi)1d\sigma_{i,k}\rightarrow\epsilon_{n,d}f(x_{i})^{-\frac{1}{d}} for some constant ϵn,d\epsilon_{n,d} depending on n,dn,d. Thus, for nn large the kernel for self-tuning spectral clustering is approximately:

W_{ij}\approx\exp\left(-f(x_{i})^{\frac{1}{d}}f(x_{j})^{\frac{1}{d}}\frac{\|x_{i}-x_{j}\|^{2}}{\epsilon_{n,d}^{2}}\right). (6)

Relative to a standard SC kernel, (6) weakens connections in high density regions and strengthens connections in low density regions.
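A minimal sketch (ours) of the self-tuning kernel, with \sigma_{i,k} read off from the sorted pairwise distance matrix:

```python
# Minimal sketch (ours) of the self-tuning kernel of [63]: sigma_{i,k} is the Euclidean
# distance from x_i to its k-th nearest neighbor.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def self_tuning_kernel(X, k):
    D = squareform(pdist(X))
    sigma = np.sort(D, axis=1)[:, k]   # column 0 is the point itself, column k its k-th NN
    return np.exp(-D ** 2 / np.outer(sigma, sigma))
```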

Diffusion maps [22, 21] is a more general framework which reduces to SC for certain parameter choices. More specifically, [21] considered the family of kernels

W_{ij}=\frac{\exp\left(-\|x_{i}-x_{j}\|^{2}/\epsilon^{2}\right)}{\text{deg}(x_{i})^{a}\text{deg}(x_{j})^{a}},\qquad\text{deg}(x_{i})=\sum_{j=1}^{n}\exp\left(-\|x_{i}-x_{j}\|^{2}/\epsilon^{2}\right) (7)

parametrized by a\in[0,1], which determines the degree of density normalization. Since \text{deg}(x_{i})\propto f(x_{i})+O(\epsilon^{2}), \text{deg}(x_{i}) is a kernel density estimator of the density f(x_{i}) [12] and, up to higher order terms,

W_{ij}\propto\frac{\exp\left(-\|x_{i}-x_{j}\|^{2}/\epsilon^{2}\right)}{f(x_{i})^{a}f(x_{j})^{a}}. (8)

Note that ff has an effect on the kernel similar to the self-tuning kernel (6): connections in high density regions are weakened, and connections in low density regions are strengthened. Let LRWa,ϵL_{\text{RW}}^{a,\epsilon} denote the discrete random walk Laplacian using the weights WijW_{ij} given in (7). The discrete operator LRWa,ϵ/ϵ2-L_{\text{RW}}^{a,\epsilon}/\epsilon^{2} converges to the continuum Kolmogorov operator ψ=Δψ+(22a)ψff\mathscr{L}\psi=\Delta\psi+(2-2a)\nabla\psi\cdot\frac{\nabla f}{f} as n,ϵ0+n\rightarrow\infty,\epsilon\rightarrow 0^{+} for Laplacian operator Δ\Delta and gradient \nabla, both taken with respect to the Riemannian metric inherited from the ambient space [8, 21, 12]. When a=0a=0, we recover standard spectral clustering; there is no density renormalization in the kernel but the limiting operator is density dependent. When a=1a=1, LRW1,ϵ/ϵ2Δ-L_{\text{RW}}^{1,\epsilon}/\epsilon^{2}\rightarrow\Delta; in this case the discrete operator is density dependent but the limiting operator is purely geometric, since the density term is eliminated. We note that Laplacians and diffusion maps with various metrics and norms have been considered in a range of settings [62, 15, 59, 41].
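A minimal sketch (ours) of the density-normalized kernel (7) and the associated random walk Laplacian, with a interpolating between no normalization (a=0) and full normalization (a=1):

```python
# Minimal sketch (ours) of the density-normalized kernel (7): a = 0 gives standard SC weights,
# a = 1 removes the density dependence of the limiting operator.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def diffusion_maps_kernel(X, eps, a):
    K = np.exp(-squareform(pdist(X)) ** 2 / eps ** 2)
    deg = K.sum(axis=1)                                        # deg(x_i), a kernel density estimate
    W = K / np.outer(deg ** a, deg ** a)                       # normalized weights as in (7)
    L_rw = np.eye(len(X)) - W / W.sum(axis=1, keepdims=True)   # random-walk Laplacian of W
    return W, L_rw
```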

2.2 Local Characterization of PWSPD-Based Kernels

While the kernels discussed in Section 2.1 compensate for discrepancies in density, PWSPD-based kernels strengthen connections through high-density regions and weaken connections through low-density regions. To illustrate more clearly the role of density in PWSPD-based kernels, we first show that locally the continuum PWSPD pp\mathcal{L}_{p}^{p} is well-approximated by the density-based stretch of Euclidean distance 𝒟f,Euc(x,y)=xy(f(x)f(y))p12d,\displaystyle\mathscr{D}_{f,\text{Euc}}(x,y)=\frac{\|x-y\|}{\left(f(x)f(y)\right)^{\frac{p-1}{2d}}}, as long as ff does not vary too rapidly and \mathcal{M} does not curve too quickly. This is quantified in Lemma 2.2, which is then used to prove Theorem 2.3, which bounds the local deviation of p\mathcal{L}_{p} from 𝒟f,Euc1/p\mathscr{D}_{f,\text{Euc}}^{1/p}. Finally, Corollary 2.4 establishes that Gaussian kernels constructed with p\mathcal{L}_{p} and 𝒟f,Euc1/p\mathscr{D}_{f,\text{Euc}}^{1/p} are locally similar. Throughout this section we assume S(d,κ,ϵ0)\mathcal{M}\in S(d,\kappa,\epsilon_{0}) as defined below.

Definition 2.1.

An isometrically embedded Riemannian manifold \mathcal{M}\subset\mathbb{R}^{D} is an element of S(d,\kappa,\epsilon_{0}) if it is compact with dimension d, \text{vol}(\mathcal{M})=1, and \mathscr{D}(x,y)\leq\|x-y\|(1+\kappa\|x-y\|^{2}) for all x,y\in\mathcal{M} such that \mathscr{D}(x,y)\leq\epsilon_{0}, where \mathscr{D}(\cdot,\cdot) is the geodesic distance on \mathcal{M}.

The condition 𝒟(x,y)xy(1+κxy2)\mathscr{D}(x,y)\leq\|x-y\|(1+\kappa\|x-y\|^{2}) for all x,yx,y\in\mathcal{M} such that 𝒟(x,y)ϵ0\mathscr{D}(x,y)\leq\epsilon_{0} is equivalent to an upper bound on the second fundamental form: IIxκ\|II_{x}\|\leq\kappa for all xx\in\mathcal{M} [4, 45]. Note that this is also equivalent to a positive lower bound on the reach [29] of \mathcal{M} (e.g. Proposition 6.1 in [49] and Proposition A.1 in [1]); see Definition 3.8.

Let B_{\mathcal{L}_{p}^{p}}(x,\epsilon) and B_{\mathscr{D}}(x,\epsilon) denote, respectively, the (closed) \mathcal{L}_{p}^{p} and geodesic balls centered at x of radius \epsilon. Let f_{\text{max}}=\max_{y\in\mathcal{M}}f(y) and f_{\text{min}}=\min_{y\in\mathcal{M}}f(y) be the global density maximum and minimum. Define the following local quantities:

f_{\text{min}}(x,\epsilon)=\min_{y}\left\{f(y):y\in B_{\mathscr{D}}(x,\epsilon(1+\kappa\epsilon^{2}))\right\},
f_{\text{max}}(x,\epsilon)=\max_{y}\left\{f(y):y\in B_{\mathcal{L}_{p}^{p}}(x,\epsilon(1+\kappa\epsilon^{2})/f_{\text{min}}(x,\epsilon)^{\frac{p-1}{d}})\right\}.

Let \rho_{x,\epsilon}=f_{\text{max}}(x,\epsilon)/f_{\text{min}}(x,\epsilon), which characterizes the local discrepancy in density in a ball of radius O(\epsilon) around the point x.

The following Lemma establishes that \mathcal{L}_{p}^{p} and \mathscr{D}_{f,\text{Euc}} are locally equivalent, and that discrepancies depend on (\rho_{x,\epsilon})^{\frac{p-1}{d}} and the curvature constant \kappa. We note similar estimates appear in [2] for the special case p=0. The proof appears in Appendix A.

Lemma 2.2.

Let \mathcal{M}\in S(d,\kappa,\epsilon_{0}). Then for all x,y\in\mathcal{M} with \mathscr{D}(x,y)\leq\epsilon_{0} and \|x-y\|\leq\epsilon,

\frac{1}{(\rho_{x,\epsilon})^{\frac{p-1}{d}}}\mathscr{D}_{f,\text{Euc}}(x,y)\leq\mathcal{L}_{p}^{p}(x,y)\leq(\rho_{x,\epsilon})^{\frac{p-1}{d}}(1+\kappa\epsilon^{2})\,\mathscr{D}_{f,\text{Euc}}(x,y). (9)

Note that corresponding bounds in terms of geodesic distance follow easily from the definition of p\mathcal{L}_{p}: fmax(x,ϵ)p1d𝒟(x,y)pp(x,y)fmin(x,ϵ)p1d𝒟(x,y)f_{\max}(x,\epsilon)^{-\frac{p-1}{d}}\mathscr{D}(x,y)\leq\mathcal{L}^{p}_{p}(x,y)\leq f_{\min}(x,\epsilon)^{-\frac{p-1}{d}}\mathscr{D}(x,y). Lemma 2.2 thus establishes that the metrics pp\mathcal{L}_{p}^{p} and 𝒟f,Euc\mathscr{D}_{f,\text{Euc}} are locally equivalent when (i) ρx,ϵ\rho_{x,\epsilon} is close to 1, (ii) p1d\frac{p-1}{d} is not too large, and (iii) κ\kappa is not too large. However, when p1d1\frac{p-1}{d}\gg 1, pp\mathcal{L}_{p}^{p} balls may become highly nonlocal in terms of geodesics.

The following Theorem establishes the local equivalence of p\mathcal{L}_{p} and 𝒟f,Euc1/p\mathscr{D}_{f,\text{Euc}}^{1/p} (and thus kernels constructed using these metrics). Assuming the density does not vary too quickly, Lemma 2.2 can be used to show that locally the difference between 𝒟f,Euc1/p\mathscr{D}_{f,\text{Euc}}^{1/p} and p\mathcal{L}_{p} is small. Variations in density are controlled by requiring that ff is FF-Lipschitz with respect to geodesic distance, i.e. |f(x)f(y)|F𝒟(x,y)|f(x)-f(y)|\leq F\mathscr{D}(x,y). This Lipschitz assumption allows us to establish a higher-order equivalence compared to existing results (e.g. Corollary 9 in [39]), which we leverage to obtain the local kernel equivalence stated in Corollary 2.4. The following analysis also establishes explicit dependencies of the equivalence on d,p,F,κd,p,F,\kappa.

Theorem 2.3.

Assume \mathcal{M}\in S(d,\kappa,\epsilon_{0}) and that f is a bounded F-Lipschitz density function on \mathcal{M} with f_{\text{min}}>0. Let \epsilon>0 and let

\rho=\max_{x\in\mathcal{M}}\rho_{x,\epsilon},\quad C_{1}=\frac{F(\rho^{\frac{p-1}{d}}+1)(p-1)}{f_{\text{min}}^{1+\frac{p-1}{pd}}pd},\quad C_{2}=\frac{\kappa}{f_{\text{min}}^{\frac{p-1}{pd}}p}.

Then for all x,y\in\mathcal{M} such that \mathscr{D}(x,y)\leq\epsilon_{0} and \|x-y\|\leq\epsilon,

|\mathcal{L}_{p}(x,y)-\mathscr{D}_{f,\text{Euc}}^{1/p}(x,y)|\leq C_{1}\epsilon^{1+\frac{1}{p}}+C_{2}\epsilon^{2+\frac{1}{p}}+O(\epsilon^{3+\frac{1}{p}}).
Proof.

We first show that ρx,ϵ\rho_{x,\epsilon} is close to 1. Let y1Bpp(x,ϵ(1+κϵ2)/fmin(x,ϵ)p1d)y_{1}\in B_{\mathcal{L}_{p}^{p}}(x,\epsilon(1+\kappa\epsilon^{2})/f_{\text{min}}(x,\epsilon)^{\frac{p-1}{d}}) satisfy f(y1)=fmax(x,ϵ)f(y_{1})=f_{\text{max}}(x,\epsilon) and y2B𝒟(x,ϵ(1+κϵ2))y_{2}\in B_{\mathscr{D}}(x,\epsilon(1+\kappa\epsilon^{2})) satisfy f(y2)=fmin(x,ϵ)f(y_{2})=f_{\text{min}}(x,\epsilon) (since these sets are compact, these points must exist). Then by the Lipschitz condition:

|\rho_{x,\epsilon}-1|=\frac{|f(y_{1})-f(y_{2})|}{f(y_{2})}\leq\frac{F\mathscr{D}(y_{1},y_{2})}{f(y_{2})}\leq\frac{F\mathscr{D}(x,y_{1})+F\mathscr{D}(x,y_{2})}{f(y_{2})}.

Let \gamma_{2}(t) be a path achieving \mathcal{L}_{p}^{p}(x,y_{1}). Note that

\frac{\mathscr{D}(x,y_{1})}{f_{\text{max}}(x,\epsilon)^{\frac{p-1}{d}}}\leq\int_{0}^{1}\frac{1}{f(\gamma_{2}(t))^{\frac{p-1}{d}}}|\gamma_{2}^{\prime}(t)|\,dt=\mathcal{L}_{p}^{p}(x,y_{1})\leq\frac{\epsilon(1+\kappa\epsilon^{2})}{f_{\text{min}}(x,\epsilon)^{\frac{p-1}{d}}}

so that \mathscr{D}(x,y_{1})\leq\rho_{x,\epsilon}^{\frac{p-1}{d}}\epsilon(1+\kappa\epsilon^{2}) and \mathscr{D}(x,y_{2})\leq\epsilon(1+\kappa\epsilon^{2}). We thus obtain

\rho_{x,\epsilon}\leq 1+F\left(\frac{\rho_{x,\epsilon}^{\frac{p-1}{d}}+1}{f_{\text{min}}(x,\epsilon)}\right)\epsilon(1+\kappa\epsilon^{2}). (10)

Letting Cx,ϵ=F(ρx,ϵp1d+1)/fmin(x,ϵ)C_{x,\epsilon}=F(\rho_{x,\epsilon}^{\frac{p-1}{d}}+1)/f_{\text{min}}(x,\epsilon), Taylor expanding around ϵ=0\epsilon=0 and (10) give ρx,ϵp1pd(1+Cx,ϵϵ(1+κϵ2))p1pd=1+Cx,ϵ(p1)pdϵ+O(ϵ3)\rho_{x,\epsilon}^{\frac{p-1}{pd}}\leq(1+C_{x,\epsilon}\epsilon(1+\kappa\epsilon^{2}))^{\frac{p-1}{pd}}=1+C_{x,\epsilon}\frac{(p-1)}{pd}\epsilon+O(\epsilon^{3}). Applying Lemma 2.2 yields (ρx,ϵ)p1pd𝒟f,Euc1/p(x,y)p(x,y)(ρx,ϵ)p1pd(1+κϵ2)1p𝒟f,Euc1/p(x,y)(\rho_{x,\epsilon})^{-\frac{p-1}{pd}}\mathscr{D}_{f,\text{Euc}}^{1/p}(x,y)\leq\mathcal{L}_{p}(x,y)\leq(\rho_{x,\epsilon})^{\frac{p-1}{pd}}(1+\kappa\epsilon^{2})^{\frac{1}{p}}\mathscr{D}_{f,\text{Euc}}^{1/p}(x,y), which gives

\frac{\mathscr{D}_{f,\text{Euc}}^{1/p}(x,y)}{\left(1+C_{x,\epsilon}\frac{(p-1)}{pd}\epsilon+O(\epsilon^{3})\right)}\leq\mathcal{L}_{p}(x,y)\leq\left(1+C_{x,\epsilon}\frac{(p-1)}{pd}\epsilon+\frac{\kappa}{p}\epsilon^{2}+O(\epsilon^{3})\right)\mathscr{D}_{f,\text{Euc}}^{1/p}(x,y).

Rewriting the above yields:

\left(1-C_{x,\epsilon}\frac{(p-1)}{pd}\epsilon-\frac{\kappa}{p}\epsilon^{2}+O(\epsilon^{3})\right)\mathcal{L}_{p}(x,y)\leq\mathscr{D}_{f,\text{Euc}}^{1/p}(x,y)\leq\mathcal{L}_{p}(x,y)\left(1+C_{x,\epsilon}\frac{(p-1)}{pd}\epsilon+O(\epsilon^{3})\right).

We thus obtain

\left|\mathcal{L}_{p}(x,y)-\mathscr{D}_{f,\text{Euc}}^{1/p}(x,y)\right|\leq\left(C_{x,\epsilon}\frac{(p-1)}{pd}\epsilon+\frac{\kappa}{p}\epsilon^{2}+O(\epsilon^{3})\right)\mathcal{L}_{p}(x,y)
\leq\left(C_{x,\epsilon}\frac{(p-1)}{pd}\epsilon+\frac{\kappa}{p}\epsilon^{2}+O(\epsilon^{3})\right)\frac{\epsilon^{\frac{1}{p}}(1+\kappa\epsilon^{2})^{\frac{1}{p}}}{f_{\text{min}}(x,\epsilon)^{\frac{p-1}{pd}}}
=\frac{C_{x,\epsilon}}{f_{\text{min}}(x,\epsilon)^{\frac{p-1}{pd}}}\frac{(p-1)}{pd}\epsilon^{1+\frac{1}{p}}+\frac{\kappa}{p\,f_{\text{min}}(x,\epsilon)^{\frac{p-1}{pd}}}\epsilon^{2+\frac{1}{p}}+O(\epsilon^{3+\frac{1}{p}}). ∎

Note the coefficient C1C_{1} increases exponentially in pp; thus the equivalence between p\mathcal{L}_{p} and 𝒟f,Euc1/p\mathscr{D}_{f,\text{Euc}}^{1/p} is weaker for large pp. We also emphasize that in a Euclidean ball of radius ϵ\epsilon, the metric p\mathcal{L}_{p} scales like ϵ1p\epsilon^{\frac{1}{p}}; Theorem 2.3 thus guarantees that the relative error of approximating p\mathcal{L}_{p} with 𝒟f,Euc1/p\mathscr{D}_{f,\text{Euc}}^{1/p} is O(ϵ)O(\epsilon).

When p\mathcal{L}_{p} is locally well-approximated by 𝒟f,Euc1/p\mathscr{D}_{f,\text{Euc}}^{1/p}, the kernels constructed from these two metrics are also locally similar. The following Corollary leverages the error term in Theorem 2.3 to make this precise for Gaussian kernels. It is a direct consequence of Theorem 2.3 and Taylor expanding the Gaussian kernel, and its proof is given in Appendix C. Let ha(x)=exp(x2a)h_{a}(x)=\exp(-x^{2a}) so that h1(δ(,)ϵ)h_{1}\left(\frac{\delta(\cdot,\cdot)}{\epsilon}\right) is the Gaussian kernel with metric δ(,)\delta(\cdot,\cdot) and scaling parameter ϵ>0\epsilon>0. Note h1(pϵ1/p)=h1p(ppϵ)h_{1}\left(\frac{\mathcal{L}_{p}}{\epsilon^{1/p}}\right)=h_{\frac{1}{p}}\left(\frac{\mathcal{L}_{p}^{p}}{\epsilon}\right).

Corollary 2.4.

Under the assumptions and notation of Theorem 2.3, for \tilde{C}_{i}=C_{i}/f_{\text{min}}^{\frac{p-1}{pd}},

\frac{\left|h_{\frac{1}{p}}\left(\mathcal{L}_{p}^{p}(x,y)/\epsilon\right)-h_{\frac{1}{p}}\left(\mathscr{D}_{f,\text{Euc}}(x,y)/\epsilon\right)\right|}{h_{\frac{1}{p}}\left(\mathcal{L}_{p}^{p}(x,y)/\epsilon\right)}\leq\tilde{C}_{1}\epsilon+\left(\tilde{C}_{2}+\frac{1}{2}\tilde{C}_{1}^{2}\right)\epsilon^{2}+O(\epsilon^{3}).

When p-1 is not too large relative to d, a kernel constructed with \mathcal{L}_{p} is locally well-approximated by a kernel constructed with \mathscr{D}_{f,\text{Euc}}^{1/p}. Thus, in a Euclidean ball of radius \epsilon, we may think of the Gaussian \mathcal{L}_{p} kernel as:

h_{1}\left(\frac{\mathcal{L}_{p}(x_{i},x_{j})}{\epsilon^{1/p}}\right)\approx h_{\frac{1}{p}}\left(\frac{\|x_{i}-x_{j}\|}{\epsilon\,(f(x_{i})f(x_{j}))^{\frac{p-1}{2d}}}\right).

Density plays a different role in this kernel compared with those of Section 2.1. This kernel strengthens connections in high density regions and weakens them in low density regions.

We note that the 1p\frac{1}{p}-power in Definition 1.2 has a large impact, in that p\mathcal{L}_{p}-based and pp\mathcal{L}_{p}^{p}-based kernels have very different properties. More specifically, h1(pp/ϵ)h_{1}(\mathcal{L}_{p}^{p}/\epsilon) is a local kernel as defined in [12], so it is sufficient to analyze the kernel locally. However h1(p/ϵ1/p)h_{1}(\mathcal{L}_{p}/\epsilon^{1/p}) is a non-local kernel, so that non-trivial connections between distant points are possible. The analysis in this Section thus establishes the global equivalence of h1(pp/ϵ)h_{1}(\mathcal{L}_{p}^{p}/\epsilon) and h1(𝒟f,Euc/ϵ)h_{1}(\mathscr{D}_{f,\text{Euc}}/\epsilon) (when pp is not too large relative to dd) but only the local equivalence of h1p(pp/ϵ)h_{\frac{1}{p}}(\mathcal{L}_{p}^{p}/\epsilon) and h1p(𝒟f,Euc/ϵ)h_{\frac{1}{p}}(\mathscr{D}_{f,\text{Euc}}/\epsilon).
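To make this concrete, the following sketch (ours) assembles the density-stretched Gaussian kernel h_{\frac{1}{p}}(\mathscr{D}_{f,\text{Euc}}/\epsilon) discussed above, with the unknown density f replaced by the kNN estimator (5); the choice of estimator and all parameters are illustrative assumptions rather than part of the theory.

```python
# Sketch (ours) of the locally equivalent density-stretched kernel h_{1/p}(D_{f,Euc}/eps),
# with the unknown density f replaced by the kNN estimator (5); parameters are illustrative.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.special import gamma

def knn_density(X, k, d):
    n = len(X)
    D = squareform(pdist(X))
    sigma = np.sort(D, axis=1)[:, k]                     # distance to the k-th NN
    vol_unit_ball = np.pi ** (d / 2) / gamma(d / 2 + 1)
    return k / (n * vol_unit_ball * sigma ** d)          # estimator (5)

def density_stretch_kernel(X, p, d, k, eps):
    f = knn_density(X, k, d)
    D = squareform(pdist(X))
    stretch = D / np.outer(f, f) ** ((p - 1) / (2 * d))  # D_{f,Euc}(x_i, x_j)
    return np.exp(-(stretch / eps) ** (2.0 / p))         # h_{1/p}(D_{f,Euc}/eps), cf. Corollary 2.4
```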

2.3 The Role of pp: Examples

This subsection illustrates the useful properties of PWSPDs and the role of pp on three synthetic data sets in 2\mathbb{R}^{2}: (1) Two Rings data, consisting of two non-convex clusters that are well-separated by a low-density region; (2) Long Bottleneck data, consisting of two isotropic clusters each with a density gap connected by a long, thin bottleneck; (3) Short Bottleneck data, where two elongated clusters are connected by a short bottleneck. The data sets are shown in Figures 1, 2, and 3, respectively. We also show the PWSPD spectral embedding (denoted PWSPD SE) for various pp, computed from a symmetric normalized Laplacian constructed with PWSPD. The scaling parameter ϵ\epsilon for each data set is chosen as the 15th15^{\text{th}} percentile of pairwise PWSPD distances.

Different aspects of the data are emphasized in the low-dimensional PWSPD embedding as pp varies. Indeed, in Figure 1, we see the PWSPD embedding separates the rings for large pp but not for small pp. In Figure 2, we see separation across the bottleneck for pp small, while for pp large there is separation with respect to the density gradients that appear in the two bells of the dumbbell. Interestingly, separation with respect to both density and geometry is observed for p=2p=2 (see Figure 2(g)). In Figure 3, the clusters are both elongated and lack robust density separation, but the PWSPD embedding well-separates the two clusters for moderate pp. In general, pp close to 1 emphasizes the geometry of the data, large pp emphasizes the density structure of the data, and moderate pp defines a metric balancing these two considerations.

[Figure 1 panels: (a) Two Rings; (b) PWSPD SE, p=1.2; (c) PWSPD SE, p=2; (d) PWSPD SE, p=5; (e) OA, Euc. SC; (f) OA, PWSPD SC; (g) \hat{K}, Euc. SC; (h) \hat{K}, PWSPD SC (binarized).]
Figure 1: Two Rings dataset. Because the underlying cluster structure is density-driven, the PWSPD SE separates the clusters for large pp (Figure 1(d)). While taking ϵ\epsilon small in Euclidean spectral clustering can allow for good clustering accuracy (see Figure 1(e)), the range is narrow and does not permit accurate estimation of KK via the eigengap (see Figure 1(g)). On the other hand, PWSPD consistently clusters well and correctly captures K=2K=2 for a wide range of (ϵ,p)(\epsilon,p) pairs (see Figures 1(g), 1(h)). Generally, PWSPD allows for fully unsupervised clustering as long as pp is sufficiently large and ϵ\epsilon not too small.
[Figure 2 panels: (a) L. Bottleneck, K=2; (b) PWSPD SE, p=1.2; (c) OA, PWSPD SC; (d) \hat{K}, PWSPD SC; (e) OA, Euc. SC; (f) L. Bottleneck, K=4; (g) PWSPD SE, p=2; (h) OA, PWSPD SC; (i) \hat{K}, PWSPD SC; (j) OA, Euc. SC+DMN; (k) L. Bottleneck, K=3; (l) PWSPD SE, p=5; (m) OA, PWSPD SC; (n) \hat{K}, PWSPD SC; (o) \hat{K}, Euc. SC.]
Figure 2: Long Bottleneck dataset. Different latent cluster structures exist in this data, driven by geometry (Figure 2(a), K=2K=2), density (Figure 2(k), K=3K=3), and a combination of geometry and density (Figure 2(f), K=4K=4). When varying pp, the PWSPD SE separates by geometry (Figure 2(b)) for pp near 1, before separating by density for p1p\gg 1 (Figure 2(l)). Given the correct choice of ϵ\epsilon and a priori knowledge of KK, any of the three natural clusterings can be learned by Euclidean SC (Figure 2(e), 2(j)). However, in the Euclidean SC case, correct estimation of KK fails to coincide with parameters that give good clustering results (Figure 2(o)). On the other hand, PWSPD SC is able to correctly estimate each of K=2,3,4K=2,3,4 for some choice of (ϵ,p)(\epsilon,p) parameters in the same region that such parameters yield high clustering accuracy (Figures 2(c), 2(d) for K=2K=2; Figures 2(m), 2(n) for K=3K=3; Figures 2(h), 2(i) for K=4K=4).
[Figure 3 panels: (a) Short Bottleneck; (b) PWSPD SE, p=1.2; (c) PWSPD SE, p=2; (d) PWSPD SE, p=5; (e) OA, Euc. SC; (f) OA, PWSPD SC; (g) \hat{K}, Euc. SC; (h) \hat{K}, PWSPD SC (binarized).]
Figure 3: Short Bottleneck dataset. Because the underlying cluster structure is not driven entirely by geometry or density, the PWSPD SE separates the clusters for moderate pp (see Figure 3(c)). We note PWSPD is able to correctly learn KK and cluster accurately for ϵ\epsilon somewhat large and pp between 2 and 3 (see Figures 3(f), 3(h)), while Euclidean SC cannot simultaneously learn KK and cluster accurately (see Figures 3(e), 3(g)).

2.3.1 Comparison with Euclidean Spectral Clustering

To evaluate how p impacts the clusterability of the PWSPD spectral embedding, we consider experiments in which we run spectral clustering under various graph constructions. We run K-means for a range of parameters on the spectral embedding x_{i}\mapsto(\phi_{2}(x_{i}),\dots,\phi_{K}(x_{i})), where \phi_{k} is the k^{\text{th}} lowest frequency eigenvector of the Laplacian. We construct the symmetric normalized Laplacian using PWSPD (denoted PWSPD SC) and also using Euclidean distances (denoted SC) and the Laplacian with diffusion maps normalization a=1 (denoted SC+DMN). We vary \epsilon in the SC and SC+DMN methods, and both \epsilon and p in the PWSPD SC method. Results for self-tuning SC, in which the number of nearest neighbors k used to compute the local scaling parameter varies, are in Appendix D. To allow for comparisons across figures, \epsilon is varied across the percentiles of the pairwise distances in the underlying data, up to the 25^{\text{th}} percentile. We measure two outputs of the clustering experiments:

  (i)

    The overall accuracy (OA), namely the proportion of data points correctly labeled after alignment when K is known a priori. For K=2, similar results were observed when thresholding \phi_{2} at 0 instead of running K-means; see Appendix D.

  (ii)

    The eigengap estimate of the number of latent clusters: \hat{K}=\operatorname*{arg\,max}_{k\geq 2}\lambda_{k+1}-\lambda_{k}, where 0=\lambda_{1}\leq\lambda_{2}\leq\dots\leq\lambda_{n} are the eigenvalues of the corresponding graph Laplacian (a short sketch of this estimator is given below). We note that experiments estimating K by considering the ratio of consecutive eigenvalues were also performed, with similar results. In the case of PWSPD SC, we plot heatmaps of where K is correctly estimated, with yellow corresponding to success (\hat{K}=K) and blue corresponding to failure (\hat{K}\neq K).
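A small sketch (ours) of the eigengap estimator used in (ii), applied to the sorted Laplacian eigenvalues:

```python
# Small sketch (ours) of the eigengap estimate in (ii): K_hat = argmax_{k >= 2} (lambda_{k+1} - lambda_k).
import numpy as np

def eigengap_estimate(eigenvalues, k_max=10):
    lam = np.sort(eigenvalues)[:k_max + 1]   # 0 = lambda_1 <= lambda_2 <= ... <= lambda_{k_max+1}
    gaps = np.diff(lam)                      # gaps[k-1] = lambda_{k+1} - lambda_k
    return int(np.argmax(gaps[1:]) + 2)      # restrict the argmax to k >= 2
```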

The results in terms of OA and K^\hat{K} as a function of ϵ\epsilon and pp are in Figures 1, 2, 3. We see that when density separates the data clearly, as in the Two Rings data, PWSPD SC with large pp gives accurate clustering results, while small pp may fail. In this dataset, ϵ\epsilon very small allows for the data to be correctly clustered with SC and SC+DMN when KK is known a priori. However, the regime of ϵ\epsilon is so small that the eigenvalues become unhelpful for estimating the number of latent clusters. Unlike Euclidean spectral clustering, PWSPD SC correctly estimates K^=2\hat{K}=2 for a range of parameters, and achieves near-perfect clustering results for those parameters as well. Indeed, as shown by Figures 1(f), 1(h), PWSPD SC with pp large is able to do fully unsupervised clustering on the Two Rings data.

In the case of the Long Bottleneck dataset, there are three reasonable latent clusterings, depending on whether geometry, density, or both matter (see Figure 2(a), 2(k), 2(f)). PWSPD is able to balance between the geometry and density-driven cluster structure in the data. Indeed, all of the cluster configurations shown in Figure 2(a), 2(k), 2(f) are learnable without supervision for some choice of parameters (ϵ,p)(\epsilon,p). To capture the density cluster structure (K=3K=3), pp should be taken large, as suggested in Figure 2(m), 2(n). To capture the geometry cluster structure (K=2K=2), pp should be taken small and ϵ\epsilon large, as suggested by Figures 2(c), 2(d). Interestingly, both cluster and geometry (K=4K=4) can be captured by choosing pp moderate, as in Figure 2(h), 2(i). For Euclidean SC, varying ϵ\epsilon is insufficient to capture the rich structure of this data.

In the case of the Short Bottleneck, taking ϵ\epsilon large allows for the Euclidean methods to correctly estimate the number of clusters. But, in this ϵ\epsilon regime, the methods do not cluster accurately. On the other hand, taking pp between 2 and 3 and ϵ\epsilon large allows PWSPD to correctly estimate KK and also cluster accurately.

Overall, this suggests that varying pp in PWSPD SC has a different impact than varying the scaling parameter ϵ\epsilon, and can allow for richer cluster structures to be learned when compared to SC with Euclidean distances. In addition, PWSPDs generally allow for the underlying cluster structures to be learned in a fully unsupervised manner, while Euclidean methods may struggle to simultaneously cluster well and estimate KK accurately.

3 Spanners for PWSPD

Let \mathcal{H}\subset\mathcal{G}_{\mathcal{X}}^{p} denote a subgraph and recall the definition of \ell^{\mathcal{H}}_{p}(\cdot,\cdot) given in Definition 1.3.

Definition 3.1.

For t\geq 1, \mathcal{H}\subset\mathcal{G}_{\mathcal{X}}^{p} is a t-spanner if \ell^{\mathcal{H}}_{p}(x,y)\leq t\,\ell_{p}(x,y) for all x,y\in\mathcal{X}.

Clearly p(x,y)p(x,y)\ell_{p}(x,y)\leq\ell^{\mathcal{H}}_{p}(x,y) always, as any path in \mathcal{H} is a path in 𝒢𝒳p\mathcal{G}^{p}_{\mathcal{X}}. Hence if \mathcal{H} is a 11-spanner we have equality: p(x,y)=p(x,y)\ell^{\mathcal{H}}_{p}(x,y)=\ell_{p}(x,y). Define the kkNN graph, 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}}, by retaining only edges {x,y}\{x,y\} if xx is a kkNN of yy or vice versa. For appropriate k,pk,p and \mathcal{M} it is known that 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} is a 11-spanner of 𝒢𝒳p\mathcal{G}^{p}_{\mathcal{X}} w.h.p. Specifically, [33] shows this when \mathcal{M} is an open connected set with C1C^{1} boundary, 1<p<1<p<\infty and k=O(cp,dlog(n))k=O(c_{p,d}\log(n)) for a constant cp,dc_{p,d} depending on p,dp,d. One can deduce cp,d2d+13ddd/2c_{p,d}\geq 2^{d+1}3^{d}d^{d/2}, while the dependence on pp is more obscure. A different approach is used in [20] to show this for arbitrary smooth, closed, isometrically embedded \mathcal{M}, 2p<2\leq p<\infty and k=O(2dlog(n))k=O(2^{d}\log(n)), where OO hides constants depending on the geometry of \mathcal{M}. In both cases ff must be continuous and bounded away from zero.

Under these assumptions, we prove 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} is a 11-spanner w.h.p., for any smooth, closed, isometrically embedded \mathcal{M} with mild restrictions on its curvature. Our results hold generally for 1<p<1<p<\infty and enjoy improved dependence of kk on dd and explicit dependence of kk on pp and the geometry of \mathcal{M} compared to [33, 20]. We also consider an intrinsic version of PWSPD,

\ell_{\mathcal{M},p}(x,y)=\left(\min_{\pi=\{x_{i_{j}}\}_{j=1}^{T}}\sum_{j=1}^{T-1}\mathscr{D}(x_{i_{j}},x_{i_{j+1}})^{p}\right)^{1/p},

where \mathscr{D}(\cdot,\cdot) is assumed known, which is not typically the case in data science. However this situation can occur when \mathcal{X} is presented as a subset of \mathbb{R}^{D}, but one wishes to analyze \mathcal{X} with an exotic metric (i.e. not \|\cdot\|). For example, if each x_{i}\in\mathcal{X} is an image, a Wasserstein metric may be more appropriate than \|\cdot\|. As this case closely mirrors the statement and proof of Theorem 3.9, we leave it to Appendix E. Before proceeding we introduce some further terminology:

Definition 3.2.

The edge \{x,y\} is critical if it is in the shortest path from x to y in \mathcal{G}_{\mathcal{X}}^{p}.

Lemma 3.3.

[20] \mathcal{H}\subset\mathcal{G}_{\mathcal{X}}^{p} is a 1-spanner if it contains every critical edge of \mathcal{G}_{\mathcal{X}}^{p}.

3.1 Nearest Neighbors and PWSPD Spanners

A key proof ingredient is the following definition, which generalizes the role of spheres in the proof of Theorem 1.3 in [20].

Definition 3.4.

For any x,y\in\mathbb{R}^{d} and \alpha\in(0,1], the p-elongated set associated to x,y is

\mathcal{D}_{\alpha,p}(x,y)=\left\{z\in\mathbb{R}^{d}:\ \|x-z\|^{p}+\|y-z\|^{p}\leq\alpha\|x-y\|^{p}\right\}.

Visualizations of \mathcal{D}_{1,p}(x,y)\subset\mathbb{R}^{2} are shown in Figure 4. \mathcal{D}_{1,p}(x,y) is the set of points z such that the two-hop path, x\to z\to y, is \ell_{p}-shorter than the one-hop path, x\to y. Hence:

Figure 4: Plots of \mathcal{D}_{1,p}((-\frac{1}{2},0),(\frac{1}{2},0)) for p=1.01,2,100. We see that for smaller p, the set becomes quite small, converging to a line segment as p\rightarrow 1^{+}. For p=2, the p-elongated set is a disk. As p increases, \mathcal{D}_{1,p}((-\frac{1}{2},0),(\frac{1}{2},0)) converges to a set resembling a vertically-oriented American football.
Lemma 3.5.

If there exists z\in\mathcal{D}_{1,p}(x,y)\cap\mathcal{X} then the edge \{x,y\} is not critical.

We defer the proof of the following technical Lemma to Appendix B.

Lemma 3.6.

Let r:=\|x-y\|, x_{M}=\frac{x+y}{2}, and r^{\star}:=r\sqrt{\frac{\alpha^{2/p}}{4^{1/p}}-\frac{1}{4}} for \alpha>2^{1-p}. Then:

B(x_{M},r^{\star})\subset\mathcal{D}_{\alpha,p}(x,y)\subset B(x,r).

For \alpha=1, [33] makes a similar claim but crucially does not quantify the dependence of the radius of this ball on p. Before proceeding, we introduce two regularity assumptions:

Definition 3.7.

\mathcal{M}\subset\mathbb{R}^{D} is in V(d,\kappa_{0},\epsilon_{0}) for \kappa_{0}\geq 1 and \epsilon_{0}>0 if it is connected and for all x\in\mathcal{M},\ \epsilon\in(0,\epsilon_{0}) we have: \kappa_{0}^{-1}\epsilon^{d}\leq\text{vol}(\mathcal{M}\cap B(x,\epsilon))/\text{vol}(B(0,1))\leq\kappa_{0}\epsilon^{d}.

Definition 3.8.

A compact manifold \mathcal{M}\subset\mathbb{R}^{D} has reach \zeta>0 if every x\in\mathbb{R}^{D} satisfying \text{dist}(x,\mathcal{M}):=\min_{y\in\mathcal{M}}\|x-y\|<\zeta has a unique projection onto \mathcal{M}.

Theorem 3.9.

Let \mathcal{M}\in V(d,\kappa_{0},\epsilon_{0}) be a compact manifold with reach \zeta>0. Let \mathcal{X}=\{x_{i}\}_{i=1}^{n} be drawn i.i.d. from \mathcal{M} according to a probability distribution with continuous density f satisfying 0<f_{\min}\leq f(x)\leq f_{\max} for all x\in\mathcal{M}. For p>1 and n sufficiently large, \mathcal{G}^{p,k}_{\mathcal{X}} is a 1-spanner of \mathcal{G}^{p}_{\mathcal{X}} with probability at least 1-1/n if

k\geq 4\kappa_{0}^{2}\left[\frac{f_{\max}}{f_{\min}}\right]\left[\frac{4}{4^{1-1/p}-1}\right]^{d/2}\log(n). (11)
Proof.

In light of Lemma 3.3 we prove that, with probability at least 11/n1-1/n, 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} contains every critical edge of 𝒢𝒳p\mathcal{G}^{p}_{\mathcal{X}}. Equivalently, we show every edge of 𝒢𝒳p\mathcal{G}^{p}_{\mathcal{X}} not contained in 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} is not critical.

For any c,ϵ>0c,\epsilon>0, [maxx,y𝒳p(x,y)ϵ]1c/n\mathbb{P}\left[\displaystyle\max_{x,y\in\mathcal{X}}\ell_{p}(x,y)\leq\epsilon\right]\geq 1-c/n for nn sufficiently large [46]. So, let nn be sufficiently large so that [p(x,y)min{ϵ0,ζd141/p14} for all x,y𝒳](112n)\mathbb{P}\left[\ell_{p}(x,y)\leq\min\left\{\epsilon_{0},\frac{\zeta}{d}\sqrt{\frac{1}{4^{1/p}}-\frac{1}{4}}\right\}\text{ for all }x,y\in\mathcal{X}\right]\geq\left(1-\frac{1}{2n}\right). Pick any x,y𝒳x,y\in\mathcal{X} which are not kkNNs and let r:=xyr:=\|x-y\|. If r>min{ϵ0,ζd141/p14}r>\min\left\{\epsilon_{0},\frac{\zeta}{d}\sqrt{\frac{1}{4^{1/p}}-\frac{1}{4}}\right\}, then p(x,y)<xy\ell_{p}(x,y)<\|x-y\| and thus the edge {x,y}\{x,y\} is not critical. So, suppose without loss of generality in what follows that rmin{ϵ0,ζd141/p14}r\leq\min\left\{\epsilon_{0},\frac{\zeta}{d}\sqrt{\frac{1}{4^{1/p}}-\frac{1}{4}}\right\}.

Define r1:=r141/p14r_{1}^{\star}:=r\sqrt{\frac{1}{4^{1/p}}-\frac{1}{4}} and r2:=r(141/p14r4ζ)r^{\star}_{2}:=r\left(\sqrt{\frac{1}{4^{1/p}}-\frac{1}{4}}-\frac{r}{4\zeta}\right); note that r2>0r_{2}^{\star}>0 by the assumption rζd141/p14r\leq\frac{\zeta}{d}\sqrt{\frac{1}{4^{1/p}}-\frac{1}{4}}. Let xM:=x+y2x_{M}:=\frac{x+y}{2} and let x~M:=argminzxMz\tilde{x}_{M}:=\operatorname*{arg\,min}_{z\in\mathcal{M}}\|x_{M}-z\| be the projection of xMx_{M} onto \mathcal{M}, which is unique because r<ζr<\zeta. By Lemma 3.6, B(xM,r1)𝒟1,p(x,y)B(x,r)B(x_{M},r_{1}^{\star})\subset\mathcal{D}_{1,p}(x,y)\subset B(x,r). By Lemma B.1, B(x~M,r2)B(xM,r1)B(\tilde{x}_{M},r_{2}^{\star})\subset B(x_{M},r_{1}^{\star}). Let xi1,,xikx_{i_{1}},\ldots,x_{i_{k}} denote the kkNNs of xx, ordered randomly. Because yy is not a kkNN of xx, xxijxy=r\|x-x_{i_{j}}\|\leq\|x-y\|=r for j=1,,kj=1,\ldots,k. Thus, xijB(x,r)x_{i_{j}}\in B(x,r) and so by Lemma B.2 we bound for fixed jj

\mathbb{P}\left[x_{i_{j}}\in\mathcal{D}_{1,p}(x,y)\ |\ x_{i_{j}}\in B(x,r)\right]\geq\mathbb{P}\left[x_{i_{j}}\in B(\tilde{x}_{M},r_{2}^{\star})\ |\ x_{i_{j}}\in B(x,r)\right] (12)
\geq\frac{3}{4}\kappa_{0}^{-2}\frac{f_{\min}}{f_{\max}}\left(\frac{1}{4^{1/p}}-\frac{1}{4}\right)^{d/2}=:\varepsilon_{\mathcal{M},p,f}. (13)

Because the x_{i_{j}} are all independently drawn:

\mathbb{P}\left[\not\exists j\text{ with }x_{i_{j}}\in\mathcal{D}_{1,p}(x,y)\right]=\prod_{j=1}^{k}\mathbb{P}\left[x_{i_{j}}\notin\mathcal{D}_{1,p}(x,y)\ |\ x_{i_{j}}\in B(x,r)\right]\leq\left(1-\varepsilon_{\mathcal{M},p,f}\right)^{k}.

A routine calculation reveals that for k\geq\frac{3\log n}{-\log(1-\varepsilon_{\mathcal{M},p,f})},

\mathbb{P}\left[\exists j\text{ with }x_{i_{j}}\in\mathcal{D}_{1,p}(x,y)\right]=1-\mathbb{P}\left[\not\exists j\text{ with }x_{i_{j}}\in\mathcal{D}_{1,p}(x,y)\right]\geq 1-\frac{1}{n^{3}}. (14)

By Lemma 3.5 we conclude the edge \{x,y\} is not critical with probability exceeding 1-\frac{1}{n^{3}}. There are fewer than n(n-1)/2 such non-kNN pairs x,y\in\mathcal{X}. These edges \{x,y\} are precisely those contained in \mathcal{G}^{p}_{\mathcal{X}} but not in \mathcal{G}^{p,k}_{\mathcal{X}}. By the union bound and (14) we conclude that none of these are critical with probability greater than 1-\frac{n(n-1)}{2}\frac{1}{n^{3}}\geq 1-\frac{1}{2n}. This was conditioned on \ell_{p}(x,y)\leq\min\left\{\epsilon_{0},\frac{\zeta}{d}\sqrt{\frac{1}{4^{1/p}}-\frac{1}{4}}\right\} for all x,y\in\mathcal{X}, which holds with probability exceeding 1-\frac{1}{2n}. Thus, all critical edges are contained in \mathcal{G}^{p,k}_{\mathcal{X}} with probability exceeding 1-\left(\frac{1}{2n}+\frac{1}{2n}\right)=1-\frac{1}{n}. Unpacking \varepsilon_{\mathcal{M},p,f} yields the claimed lower bound on k. ∎

In (11), the explicit dependence of kk on κ0,p\kappa_{0},p, and dd is shown. The 4κ024\kappa_{0}^{2} factor corresponds to the geometry of \mathcal{M}. The numerical constant 4, which is not tight, stems from accounting for the reach of \mathcal{M}. If \mathcal{M} is convex (i.e. ζ=\zeta=\infty) then this constant can be replaced with 33. The second factor in (11) is controlled by the probability distribution, while the third corresponds to pp and dd. For p=2p=2 and ignoring geometric and density factors we obtain k=O(2dlog(n))k=O(2^{d}\log(n)) as in [20]. For large pp we get kO((43)d/2log(n))k\approx O\left(\left(\frac{4}{3}\right)^{d/2}\log(n)\right), thus improving the dependence of kk on dd given in [33, 20]. Finally, using Corollary 4.4 of [46] we can sharpen the qualitative requirement that nn be “sufficiently large” to the quantitative lower bound nCmax{[dζ]pdp1[4411/p1]pd2(p1),[1ϵ0]pdp1}n\geq C\max\left\{\left[\frac{d}{\zeta}\right]^{\frac{pd}{p-1}}\left[\frac{4}{4^{1-1/p}-1}\right]^{\frac{pd}{2(p-1)}},\left[\frac{1}{\epsilon_{0}}\right]^{\frac{pd}{p-1}}\right\} for a constant CC depending on the geometry of \mathcal{M}. So, when \mathcal{M} is high-dimensional, has small reach, or when pp is close to 1, nn may need to be quite large for kk as in (11) to yield a 1-spanner.

3.2 Numerical Experiments

We verify numerically that the claimed dependence of kk on n,pn,p, and dd ensures that 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} is a 11-spanner of 𝒢𝒳p\mathcal{G}^{p}_{\mathcal{X}}. To generate Figures 5(a)5(f) we perform the following procedure (a code sketch of a single trial appears after the list):

  1. Fix p,d,,p,d,\mathcal{M}, and ff, then generate a sequence of (n,k)(n,k) pairs.

  2. For each (n,k)(n,k), do:

    (i) Generate 𝒳={xi}i=1n\mathcal{X}=\{x_{i}\}_{i=1}^{n} by sampling i.i.d. from ff on \mathcal{M}.

    (ii) For all pairs {xi,xj}\{x_{i},x_{j}\} compute p(xi,xj)\ell_{p}(x_{i},x_{j}) and p𝒢𝒳p,k(xi,xj)\ell_{p}^{\mathcal{G}^{p,k}_{\mathcal{X}}}(x_{i},x_{j}).

    (iii) If max1i<jn|p(xi,xj)p𝒢𝒳p,k(xi,xj)|>1010\displaystyle\max_{1\leq i<j\leq n}\left|\ell_{p}(x_{i},x_{j})-\ell_{p}^{\mathcal{G}^{p,k}_{\mathcal{X}}}(x_{i},x_{j})\right|>10^{-10} record “failure”; else, record “success”.

  3. Repeat step 2 twenty times and compute the proportion of successes.
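The following is a minimal sketch, in Python with NumPy and SciPy, of a single trial of step 2. It is our own illustration rather than the code used to produce Figure 5; in particular, the symmetrization of the kkNN graph (keeping an edge whenever either endpoint is a nearest neighbor of the other) and the parameter values are assumptions for the sake of the example.

```python
# Sketch of one trial of step 2: compare all-pairs PWSPDs on the complete graph
# and on a kNN graph. Illustrative only; not the code used to generate Figure 5.
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def pwspd_matrix(X, p, k=None):
    """All-pairs PWSPD; if k is given, restrict to a symmetrized kNN graph."""
    W = cdist(X, X) ** p                            # edge weights ||x_i - x_j||^p
    if k is not None:
        order = np.argsort(W, axis=1)               # column 0 is the point itself
        mask = np.zeros_like(W, dtype=bool)
        rows = np.repeat(np.arange(len(X)), k)
        mask[rows, order[:, 1:k + 1].ravel()] = True
        W = np.where(mask | mask.T, W, 0.0)         # zero entries = "no edge" for csgraph
    D = shortest_path(W, method="D", directed=False)
    return D ** (1.0 / p)                           # ell_p = (min sum of ||.||^p)^(1/p)

rng = np.random.default_rng(0)
n, k, p, d = 500, 30, 2.0, 2                        # toy values; Figure 5 uses larger n
X = rng.uniform(size=(n, d))                        # i.i.d. uniform samples on [0,1]^d
gap = np.max(np.abs(pwspd_matrix(X, p) - pwspd_matrix(X, p, k)))
print("success" if gap <= 1e-10 else "failure")     # criterion (iii) above
```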

As can be seen from Figure 5, there is a sharp transition between an “all failures” and an “all successes” regime. The transition line is roughly linear when viewed using semi-log-x axes, i.e. klog(n)k\propto\log(n). Moreover, the slope of the line-of-best-fit to this transition line decreases with increasing pp (compare Figure 5(a)-5(c)) and depends on intrinsic, not extrinsic dimension (compare Figure 5(b) and 5(d)), as predicted by Theorem 3.9. Intriguingly, there is little difference between Figure 5(b) (uniform distribution) and Figure 5(e) (Gaussian distribution), suggesting that perhaps the assumption fmin>0f_{\min}>0 in Theorem 3.9 is unnecessary. Finally, we observe that the constant of proportionality (i.e. CC such that k=Clognk=C\log n) predicted by Theorem 3.9 appears pessimistic. For Figure 5(a)-5(c), Theorem 3.9 predicts C=484.03C=484.03, 128128, and 21.7621.76, respectively (taking κ0=1\kappa_{0}=1 due to the flat domain), while empirically the slope of the line-of-best-fit is 43.4343.43, 25.3325.33, and 0.290.29, respectively.

In Figure 5(f), we consider an intrinsically 4-dimensional set corrupted with Gaussian noise (standard deviation 0.10.1) in the fifth dimension. Interestingly, the required scaling of kk is more favorable than that shown in Figure 5(a) for the intrinsically 5-dimensional data. This suggests that measures which concentrate near low-dimensional sets benefit from that low-dimensionality, even if they are not supported exactly on it.

We also consider relaxing the success condition (2.iii). We define \mathcal{H} to be a (t,ω)(t,\omega)-spanner if p(x,y)tp(x,y)\ell^{\mathcal{H}}_{p}(x,y)\leq t\ell_{p}(x,y) for at least an ω\omega proportion of the edges, where ω(0,1]\omega\in(0,1], so that Theorem 3.9 pertains to (1,1)-spanners. Figures 5(g) and 5(h) show the minimal ω\omega (averaged across simulations) for which 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} is a (1.1,ω)(1.1,\omega)-spanner and a (1.01,ω)(1.01,\omega)-spanner respectively; the red lines trace out the requirements for 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} to be a (1.1,1)(1.1,1)-spanner and (1.01,1)(1.01,1)-spanner respectively. Comparing with Figure 5(b), we see that the required scaling for 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} to be a (1+ϵ,1)(1+\epsilon,1)-spanner is similar to the required scaling to be a (1,1)(1,1)-spanner, at least for ϵ>0\epsilon>0 small. However, the required scaling for (1+ϵ,ω)(1+\epsilon,\omega)-spanners (ω<1\omega<1) is quite different and much less restrictive, even for ω\omega very close to 1; for example the requirement for 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} to be a (1.01,0.95)(1.01,0.95)-spanner appears sublinear in the log2(n)\log_{2}(n) versus kk plot (see Figure 5(h)). If this notion of approximation is acceptable, our empirical results suggest one can enjoy much greater sparsity. Finally, in Figure 5(i) we compute the minimal t1t\geq 1 such that 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} is a (t,1)(t,1)-spanner of 𝒢𝒳p\mathcal{G}^{p}_{\mathcal{X}}; again the overall transition patterns for (t,1)(t,1)-spanners are similar to the (1,1)(1,1)-spanner case in Figure 5(b) when tt is close to 1. Overall we see that greater sparsity is permissible in these relaxed cases, and analyzing such notions rigorously is a topic of ongoing research.
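Continuing the sketch above, the largest ω for which the kkNN graph is a (t,ω)-spanner can be read off directly from the two PWSPD matrices; the helper below is hypothetical (not from the paper) and only illustrates the definition.

```python
# Hypothetical helper: the fraction of pairs {x_i, x_j} whose kNN-graph PWSPD is
# within a factor t of the complete-graph PWSPD, i.e. the largest omega such that
# the kNN graph is a (t, omega)-spanner. Matrices as returned by pwspd_matrix above.
import numpy as np

def spanner_fraction(D_full, D_knn, t):
    iu = np.triu_indices_from(D_full, k=1)          # each unordered pair exactly once
    return np.mean(D_knn[iu] <= t * D_full[iu])
```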

(a) p=1.5,=[0,1]5p=1.5,\mathcal{M}=[0,1]^{5}. Uniform.
(b) p=2,=[0,1]5p=2,\mathcal{M}=[0,1]^{5}. Uniform.
(c) p=10,=[0,1]5p=10,\mathcal{M}=[0,1]^{5}. Uniform.
(d) p=2,=𝕊45p=2,\mathcal{M}=\mathbb{S}^{4}\subset\mathbb{R}^{5}. Uniform.
(e) p=2,=[0,1]5p=2,\mathcal{M}=[0,1]^{5}. Gaussian dist.
(f) p=1.5,=[0,1]4p=1.5,\mathcal{M}=[0,1]^{4}, Uniform+noise.
(g) p=2,=[0,1]5p=2,\mathcal{M}=[0,1]^{5}. Uniform.
(h) p=2,=[0,1]5p=2,\mathcal{M}=[0,1]^{5}. Uniform.
(i) p=2,=[0,1]5p=2,\mathcal{M}=[0,1]^{5}. Uniform.
Figure 5: Figures 5(a)5(f) show the proportion of randomly generated data sets for which 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} is a 11-spanner of 𝒢𝒳p\mathcal{G}^{p}_{\mathcal{X}}. The red line is the line of best fit through the cells representing the first value of kk, for each value of nn, for which all trials were successful, i.e. it is the line ensuring 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} is a 11-spanner. The slopes of Figure 5(a)5(f) are, respectively, 43.4343.43, 25.3325.33, 0.290.29, 14.1814.18, 18.7918.79, and 40.040.0. Figures 5(g) and 5(h) show the minimal ω\omega (averaged across simulations) for which 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} is a (1.1,ω)(1.1,\omega)-spanner and a (1.01,ω)(1.01,\omega)-spanner respectively; the red lines trace out the requirements for 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} to be a (1.1,1)(1.1,1)-spanner and (1.01,1)(1.01,1)-spanner respectively. Figure 5(i) shows the minimal t1t\geq 1 such that 𝒢𝒳p,k\mathcal{G}^{p,k}_{\mathcal{X}} is a (t,1)(t,1)-spanner of 𝒢𝒳p\mathcal{G}^{p}_{\mathcal{X}}, and the red line traces out the (1,1)(1,1)-spanner requirement.

4 Global Analysis: Statistics on PWSPD and Percolation

We recall that after a suitable normalization, p\ell_{p} is a consistent estimator for p\mathcal{L}_{p}. Indeed, [39, 33] prove that for any d1d\geq 1, p>1p>1, there exists a constant Cp,dC_{p,d} independent of nn such that limn~p(x,y)=Cp,dp(x,y)\displaystyle\lim_{n\rightarrow\infty}\tilde{\ell}_{p}(x,y)=C_{p,d}\mathcal{L}_{p}(x,y). The important question then arises: how quickly does ~p\widetilde{\ell}_{p} converge? How large does nn need to be to guarantee the error incurred by approximating p\mathcal{L}_{p} with ~p\widetilde{\ell}_{p} is small? To answer this question we turn to results from Euclidean first passage percolation (FPP) [36, 37, 6, 24]. For any discrete set 𝒳\mathcal{X}, we let p(x,y,𝒳)\ell_{p}(x,y,\mathcal{X}) denote the PWSPD computed in the set 𝒳{x}{y}\mathcal{X}\cup\{x\}\cup\{y\}.

4.1 Overview of Euclidean First Passage Percolation

Euclidean FPP analyzes pp(0,z,H1)\ell_{p}^{p}(0,z,H_{1}), where H1H_{1} is a homogeneous, unit intensity Poisson point process (PPP) on d\mathbb{R}^{d}.

Definition 4.1.

A (homogeneous) Poisson point process (PPP) on d\mathbb{R}^{d} is a point process such that for any bounded subset AdA\subset\mathbb{R}^{d}, nAn_{A} (the number of points in AA) is a random variable with distribution [nA=m]=1m!(λ|A|)meλ|A|\mathbb{P}[n_{A}=m]=\frac{1}{m!}(\lambda|A|)^{m}e^{-\lambda|A|}; λ\lambda is the intensity of the PPP.
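For intuition, a homogeneous PPP can be sampled on a bounded window exactly as the definition suggests, using the standard fact that, conditional on the number of points in AA, those points are i.i.d. uniform on AA. The sketch below is our own illustration and is not used elsewhere in the paper.

```python
# Minimal sketch (our own illustration) of sampling a homogeneous PPP of intensity
# lam on the box [0, L]^d: draw a Poisson number of points, then place them uniformly.
import numpy as np

def sample_ppp(lam, L, d, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n_pts = rng.poisson(lam * L ** d)               # number of points ~ Poisson(lam * |A|)
    return rng.uniform(0.0, L, size=(n_pts, d))     # conditionally i.i.d. uniform on A

window = sample_ppp(lam=1.0, L=20.0, d=2)           # unit-intensity PPP (as for H_1) on a finite window
```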

It is known that

limzpp(0,z,H1)z\displaystyle\lim_{\|z\|\rightarrow\infty}\frac{\ell_{p}^{p}(0,z,H_{1})}{\|z\|} =μ,\displaystyle=\mu\,, (15)

where μ=μp,d\mu=\mu_{p,d} is a constant depending only on p,dp,d known as the time constant. The convergence of pp(0,z,H1)\ell_{p}^{p}(0,z,H_{1}) is studied by decomposing the error into random and deterministic fluctuations, i.e.

pp(0,z,H1)μz\displaystyle\ell_{p}^{p}(0,z,H_{1})-\mu\|z\| =pp(0,z,H1)𝔼[pp(0,z,H1)]random+𝔼[pp(0,z,H1)]μzdeterministic.\displaystyle=\underbrace{\ell_{p}^{p}(0,z,H_{1})-\mathbb{E}[\ell_{p}^{p}(0,z,H_{1})]}_{\text{random}}+\underbrace{\mathbb{E}[\ell_{p}^{p}(0,z,H_{1})]-\mu\|z\|}_{\text{deterministic}}\,.

In terms of mean squared error (MSE), one has the standard bias-variance decomposition:

𝔼[(pp(0,z,H1)μz)2]=(𝔼[pp(0,z,H1)]μz)2+Var[pp(0,z,H1)].\mathbb{E}\left[\left(\ell_{p}^{p}(0,z,H_{1})-\mu\|z\|\right)^{2}\right]=\left(\mathbb{E}[\ell_{p}^{p}(0,z,H_{1})]-\mu\|z\|\right)^{2}+\text{Var}\left[\ell_{p}^{p}(0,z,H_{1})\right].

The following Proposition is well known in the Euclidean FPP literature.

Proposition 4.2.

Let d2d\geq 2 and p>1p>1. Then 𝔼[(pp(0,z,H1)μz)2]Czlog2(z)\mathbb{E}\left[\left(\ell_{p}^{p}(0,z,H_{1})-\mu\|z\|\right)^{2}\right]\leq C\|z\|\log^{2}(\|z\|) for a constant CC depending only on p,dp,d.

Proof.

By Theorem 2.1 in [37], Var[pp(0,z,H1)]Cz\text{Var}\left[\ell_{p}^{p}(0,z,H_{1})\right]\leq C\|z\|. By Theorem 2.1 in [3], (𝔼[pp(0,z,H1)]μz)2Czlog2(z)\left(\mathbb{E}\left[\ell_{p}^{p}(0,z,H_{1})\right]-\mu\|z\|\right)^{2}\leq C\|z\|\log^{2}(\|z\|). ∎

Although Var[pp(0,z,H1)]Cz\text{Var}\left[\ell_{p}^{p}(0,z,H_{1})\right]\leq C\|z\| is the best bound which has been proved, the fluctuation rate is in fact known to depend on the dimension, i.e. Var[pp(0,z,H1)]z2χ\text{Var}\left[\ell_{p}^{p}(0,z,H_{1})\right]\sim\|z\|^{2\chi} for some exponent χ=χ(d)12\chi=\chi(d)\leq\frac{1}{2}. Strong evidence is provided in [24] that the bias can be bounded by the variance, so the exponent χ\chi very likely controls the total convergence rate.

The following tail bound is also known [37].

Proposition 4.3.

Let d2,p>1,β1=min{1,d/p}d\geq 2,p>1,\beta_{1}=\min\{1,d/p\}, and β2=1/(4p+3)\beta_{2}=1/(4p+3). For any ϵ(0,β2)\epsilon\in(0,\beta_{2}), there exist constants C0C_{0} and C1C_{1} (depending on ϵ\epsilon) such that for z>0\|z\|>0 and z12+ϵtz12+β2ϵ\|z\|^{\frac{1}{2}+\epsilon}\leq t\leq\|z\|^{\frac{1}{2}+\beta_{2}-\epsilon}, [|pp(0,z,H1)μz|t]C1exp(C0(t/z)β1)\mathbb{P}\left[\left|\ell_{p}^{p}(0,z,H_{1})-\mu\|z\|\right|\geq t\right]\leq C_{1}\exp\left(-C_{0}(t/\sqrt{\|z\|})^{\beta_{1}}\right).

4.2 Convergence Rates for PWSPD

We wish to utilize the results in Section 4.1 to obtain convergence rates for PWSPD. However, we are interested in PWSPD computed on a compact set MM with boundary, and in the convergence rate of p\ell_{p} rather than that of pp\ell_{p}^{p}. To simplify the analysis, we restrict our attention to the following idealized model.

Assumption 1.

Let MdM\subseteq\mathbb{R}^{d} be a convex, compact, dd-dimensional set of unit volume containing the origin. Assume we sample nn points independently and uniformly from MM, i.e. f=1Mf=1_{M}, to obtain the discrete set 𝒳n\mathcal{X}_{n}. Let MτM_{\tau} denote the points in MM which are at least distance τ\tau from the boundary of MM, i.e. Mτ:={xM:minyMxy>τ}M_{\tau}:=\{x\in M:\min_{y\in\partial M}\|x-y\|>\tau\}.

We establish three things: (i) Euclidean FPP results apply away from M\partial M; (ii) the time constant μ\mu determines the constant Cp,dC_{p,d} in (3) via C_{p,d}=\mu_{p,d}^{1/p}; (iii) p\ell_{p} has the same convergence rate as pp\ell_{p}^{p}.

To establish (i), we let HnH_{n} denote a homogeneous PPP with rate λ=n\lambda=n, and let p(0,y,Hn)\ell_{p}(0,y,H_{n}) denote the length of the shortest path connecting 0 and yy in HnH_{n}. We also let 𝒳N=HnM\mathcal{X}_{N}=H_{n}\cap M and p(0,y,𝒳N)\ell_{p}(0,y,\mathcal{X}_{N}) denote the PWSPD in 𝒳N\mathcal{X}_{N}; note 𝔼[|𝒳N|]=n\mathbb{E}[|\mathcal{X}_{N}|]=n. To apply percolation results to our setting, the statistical equivalence of p(0,y,𝒳n),\ell_{p}(0,y,\mathcal{X}_{n}), p(0,y,𝒳N)\ell_{p}(0,y,\mathcal{X}_{N}), and p(0,y,Hn)\ell_{p}(0,y,H_{n}) must be established. For nn large, the equivalence of p(0,y,𝒳n)\ell_{p}(0,y,\mathcal{X}_{n}) and p(0,y,𝒳N)\ell_{p}(0,y,\mathcal{X}_{N}) is standard and we omit any analysis. The equivalence of p(0,y,𝒳N)\ell_{p}(0,y,\mathcal{X}_{N}) and p(0,y,Hn)\ell_{p}(0,y,H_{n}) is less clear. In particular, how far away from M\partial M do 0,y0,y need to be to ensure these metrics are the same? The following Proposition is a direct consequence of Theorem 2.4 from [37], and essentially guarantees the equivalence of the metrics as long as 0 and yy are at least distance O(n14d)O(n^{-\frac{1}{4d}}) from M\partial M.

Proposition 4.4.

Let d2d\geq 2, p>1p>1, β1=min{1,dp}\beta_{1}=\min\{1,\frac{d}{p}\}, β2=1/(4p+3)\beta_{2}=1/(4p+3), ϵ(0,β22)\epsilon\in(0,\frac{\beta_{2}}{2}), and τ=n14d+ϵddiam(M)34+ϵ\tau=n^{-\frac{1}{4d}+\frac{\epsilon}{d}}\text{diam}(M)^{\frac{3}{4}+\epsilon}. Then for constants C0,C1C_{0},C_{1} (depending on ϵ\epsilon), for all 0,yMτ0,y\in M_{\tau}, the geodesics connecting 0,y0,y in 𝒳N\mathcal{X}_{N} and HnH_{n} are equal with probability at least 1C1exp(C0(n1dy)34ϵβ1)1-C_{1}\exp(-C_{0}(n^{\frac{1}{d}}\|y\|)^{\frac{3}{4}\epsilon\beta_{1}}), so that p(0,y,𝒳N)=p(0,y,Hn)\ell_{p}(0,y,\mathcal{X}_{N})=\ell_{p}(0,y,H_{n}).

Next we establish the equivalence of μp,d\mu_{p,d} (percolation time constant) and Cp,dC_{p,d} (PWSPD discrete-to-continuum normalization constant).

Proposition 4.5.

Let μp,d\mu_{p,d} be as in (15) and Cp,dC_{p,d} as in (3). Then μp,d1/p=Cp,d\mu_{p,d}^{1/p}=C_{p,d}.

Proof.

Suppose Assumption 1 holds and choose yMy\in M with y=1\|y\|=1 and let MM be such that 0,y0,y are not on the boundary. By Proposition 4.4, limnp(0,y,𝒳n)=limnp(0,y,Hn)\displaystyle\lim_{n\rightarrow\infty}\ell_{p}(0,y,\mathcal{X}_{n})=\lim_{n\rightarrow\infty}\ell_{p}(0,y,H_{n}). Let H1H_{1} be the unit intensity PPP obtained from HnH_{n} by rescaling each axis by n1/dn^{1/d}, so that p(0,y,Hn)=n1dp(0,n1dy,H1)\ell_{p}(0,y,H_{n})=n^{-\frac{1}{d}}\ell_{p}(0,n^{\frac{1}{d}}y,H_{1}). For notational convenience, let z=n1dyz=n^{\frac{1}{d}}y. Then

limn~p(0,y,𝒳n)\displaystyle\lim_{n\rightarrow\infty}\widetilde{\ell}_{p}(0,y,\mathcal{X}_{n}) =limn~p(0,y,Hn)\displaystyle=\lim_{n\rightarrow\infty}\widetilde{\ell}_{p}(0,y,H_{n})
=limnnp1pdp(0,y,Hn)\displaystyle=\lim_{n\rightarrow\infty}n^{\frac{p-1}{pd}}\ell_{p}(0,y,H_{n})
=limnnp1pdn1dp(0,n1dy,H1)\displaystyle=\lim_{n\rightarrow\infty}n^{\frac{p-1}{pd}}n^{-\frac{1}{d}}\ell_{p}(0,n^{\frac{1}{d}}y,H_{1})
=limzzp1pz1p(0,z,H1)\displaystyle=\lim_{\|z\|\rightarrow\infty}\|z\|^{\frac{p-1}{p}}\|z\|^{-1}\ell_{p}(0,z,H_{1})
=limzp(0,z,H1)z1/p.\displaystyle=\lim_{\|z\|\rightarrow\infty}\frac{\ell_{p}(0,z,H_{1})}{\|z\|^{1/p}}.

Thus, Cp,d=Cp,dp(0,y)=limn~p(0,y,𝒳n)=limzp(0,z,H1)z1/p=μp,d1/p\displaystyle C_{p,d}=C_{p,d}\mathcal{L}_{p}(0,y)=\lim_{n\rightarrow\infty}\widetilde{\ell}_{p}(0,y,\mathcal{X}_{n})=\lim_{\|z\|\rightarrow\infty}\frac{\ell_{p}(0,z,H_{1})}{\|z\|^{1/p}}=\mu_{p,d}^{1/p}. ∎

Finally, we bound our real quantity of interest: the convergence rate of ~p\widetilde{\ell}_{p} to Cp,dpC_{p,d}\mathcal{L}_{p}.

Theorem 4.6.

Suppose Assumption 1 holds and let d2d\geq 2, p>1p>1, β2=1/(4p+3)\beta_{2}=1/(4p+3), τ=n(1β2)4ddiam(M)3+β24\tau=n^{-\frac{(1-\beta_{2})}{4d}}\text{diam}(M)^{\frac{3+\beta_{2}}{4}}, and 0,yMτ0,y\in M_{\tau}. Then for nn large enough, 𝔼[(~p(0,y,𝒳n)Cp,dp(0,y))2]n1dlog2(n)\mathbb{E}\left[\left(\tilde{\ell}_{p}(0,y,\mathcal{X}_{n})-C_{p,d}\mathcal{L}_{p}(0,y)\right)^{2}\right]\lesssim n^{-\frac{1}{d}}\log^{2}(n).

Proof.

To simplify notation throughout the proof we denote p(0,y)\mathcal{L}_{p}(0,y) simply by p\mathcal{L}_{p}. By Proposition 4.5 and the statistical equivalence of 𝒳n\mathcal{X}_{n} and 𝒳N\mathcal{X}_{N}, for nn large enough,

𝔼[(~p(0,y,𝒳n)Cp,dp)2]\displaystyle\mathbb{E}\left[\left(\tilde{\ell}_{p}(0,y,\mathcal{X}_{n})-C_{p,d}\mathcal{L}_{p}\right)^{2}\right] 𝔼[(~p(0,y,𝒳N)μ1/pp)2]=:(I),\displaystyle\lesssim\mathbb{E}\left[\left(\tilde{\ell}_{p}(0,y,\mathcal{X}_{N})-\mu^{1/p}\mathcal{L}_{p}\right)^{2}\right]=:(I)\,,

where 𝒳N=HnM\mathcal{X}_{N}=H_{n}\cap M and HnH_{n} is a homogeneous PPP with rate nn. Let AA be the event that the geodesics from 0 to yy in 𝒳N\mathcal{X}_{N} and HnH_{n} are equal. Since we assume τ=n(1β2)4ddiam(M)3+β24\tau=n^{-\frac{(1-\beta_{2})}{4d}}\text{diam}(M)^{\frac{3+\beta_{2}}{4}}, we may apply Proposition 4.4 with ϵ=β2/4\epsilon=\beta_{2}/4 to conclude [A]1C1exp(C0yνdnν)\mathbb{P}[A]\geq 1-C_{1}\exp\left(-C_{0}\|y\|^{\frac{\nu}{d}}n^{\nu}\right) for ν=3β216min{1,dp}\nu=\frac{3\beta_{2}}{16}\min\{1,\frac{d}{p}\}. Conditioning on AA, and observing ~p(0,y,𝒳N)=np1pdp(0,y,𝒳N)np1pdy\tilde{\ell}_{p}(0,y,\mathcal{X}_{N})=n^{\frac{p-1}{pd}}\ell_{p}(0,y,\mathcal{X}_{N})\leq n^{\frac{p-1}{pd}}\|y\|, we obtain

(I)\displaystyle(I) =𝔼[(~p(0,y,𝒳N)μ1/pp)2|A][A]+𝔼[(~p(0,y,𝒳N)μ1/pp)2|A¯][A¯]\displaystyle=\mathbb{E}\left[\left(\tilde{\ell}_{p}(0,y,\mathcal{X}_{N})-\mu^{1/p}\mathcal{L}_{p}\right)^{2}\ \big{|}\ A\right]\mathbb{P}[A]+\mathbb{E}\left[\left(\tilde{\ell}_{p}(0,y,\mathcal{X}_{N})-\mu^{1/p}\mathcal{L}_{p}\right)^{2}\ \big{|}\ \bar{A}\right]\mathbb{P}[\bar{A}]
𝔼[(~p(0,y,Hn)μ1/pp)2|A]+(n2(p1)pdy2+μ2/pp2)C1exp(C0yνdnν)\displaystyle\leq\mathbb{E}\left[\left(\tilde{\ell}_{p}(0,y,H_{n})-\mu^{1/p}\mathcal{L}_{p}\right)^{2}\ \big{|}\ A\right]+\left(n^{\frac{2(p-1)}{pd}}\|y\|^{2}+\mu^{2/p}\mathcal{L}_{p}^{2}\right)C_{1}\exp\left(-C_{0}\|y\|^{\frac{\nu}{d}}n^{\nu}\right)
𝔼[(~p(0,y,Hn)μ1/pp)2]+q1\displaystyle\leq\mathbb{E}\left[\left(\tilde{\ell}_{p}(0,y,H_{n})-\mu^{1/p}\mathcal{L}_{p}\right)^{2}\right]+q_{1}

where q1q_{1} decays exponentially in nn (for the last line note that conditioning on AA means conditioning on the geodesics being local, which can only decrease the expected error).

A Lipschitz analysis applied to the function g(x)=x1/pg(x)=x^{1/p} yields:

(~p(0,y,Hn)μ1/pp)2\displaystyle\left(\widetilde{\ell}_{p}(0,y,H_{n})-\mu^{1/p}\mathcal{L}_{p}\right)^{2} p2~p(0,y,Hn)2(1p)/p(~pp(0,y,Hn)μpp)2.\displaystyle\leq p^{-2}\widetilde{\ell}_{p}(0,y,H_{n})^{2(1-p)/p}\cdot\left(\widetilde{\ell}^{p}_{p}(0,y,H_{n})-\mu\mathcal{L}_{p}^{p}\right)^{2}\ .

By Proposition 4.3,

~pp(0,y,Hn)\displaystyle\widetilde{\ell}^{p}_{p}(0,y,H_{n}) μppy12+ϵ/n1d(12ϵ)\displaystyle\geq\mu\mathcal{L}_{p}^{p}-\|y\|^{\frac{1}{2}+\epsilon}/n^{\frac{1}{d}\left(\frac{1}{2}-\epsilon\right)} (16)

with probability at least 1C1exp(C0yϵβ1nϵβ1d)1-C_{1}\exp\left(-C_{0}\|y\|^{\epsilon\beta_{1}}n^{\frac{\epsilon\beta_{1}}{d}}\right) for any ϵ(0,β2)\epsilon\in(0,\beta_{2}), where β1=min{1,d/p}\beta_{1}=\min\{1,d/p\}. Fix ϵ(0,β2)\epsilon\in(0,\beta_{2}) and let BB be the event that (16) is satisfied. On BB,

~p(0,y,Hn)2(1p)p\displaystyle\widetilde{\ell}_{p}(0,y,H_{n})^{\frac{2(1-p)}{p}} (μ1/pp)2(1p)p(1y12+ϵμppn1d(12ϵ))2(1p)p2\displaystyle\leq(\mu^{1/p}\mathcal{L}_{p})^{\frac{2(1-p)}{p}}\left(1-\frac{\|y\|^{\frac{1}{2}+\epsilon}}{\mu\mathcal{L}_{p}^{p}n^{\frac{1}{d}\left(\frac{1}{2}-\epsilon\right)}}\right)^{\frac{2(1-p)}{p^{2}}}
(μ1/pp)2(1p)p(1+2(p1)y12+ϵp2μppn1d(12ϵ)+higher order terms)\displaystyle\leq(\mu^{1/p}\mathcal{L}_{p})^{\frac{2(1-p)}{p}}\left(1+\frac{2(p-1)\|y\|^{\frac{1}{2}+\epsilon}}{p^{2}\mu\mathcal{L}_{p}^{p}n^{\frac{1}{d}\left(\frac{1}{2}-\epsilon\right)}}+\text{higher order terms}\right)
2(μ1/pp)2(1p)p,\displaystyle\leq 2(\mu^{1/p}\mathcal{L}_{p})^{\frac{2(1-p)}{p}},

for nn large enough. Note also that

𝔼[(~p(0,y,Hn)μ1/pp)2|B¯][B¯]\displaystyle\mathbb{E}\left[\left(\widetilde{\ell}_{p}(0,y,H_{n})-\mu^{1/p}\mathcal{L}_{p}\right)^{2}\,|\,\bar{B}\right]\mathbb{P}[\bar{B}] (n2(p1)pdy2+μ2/pp2)exp(C0yϵβ1nϵβ1d):=q2\displaystyle\leq\left(n^{\frac{2(p-1)}{pd}}\|y\|^{2}+\mu^{2/p}\mathcal{L}_{p}^{2}\right)\exp\left(-C_{0}\|y\|^{\epsilon\beta_{1}}n^{\frac{\epsilon\beta_{1}}{d}}\right):=q_{2}

and q2q_{2} decreases exponentially in nn. We thus obtain

𝔼[(~p(0,y,Hn)μ1/pp)2]\displaystyle\mathbb{E}\left[\left(\widetilde{\ell}_{p}(0,y,H_{n})-\mu^{1/p}\mathcal{L}_{p}\right)^{2}\right] 𝔼[(~p(0,y,Hn)μ1/pp)2|B][B]+q2\displaystyle\leq\mathbb{E}\left[\left(\widetilde{\ell}_{p}(0,y,H_{n})-\mu^{1/p}\mathcal{L}_{p}\right)^{2}\ \big{|}\ B\right]\mathbb{P}[B]+q_{2}
2p2(μ1/pp)2(1p)p𝔼[(~pp(0,y,Hn)μpp)2|B]+q2\displaystyle\leq\frac{2}{p^{2}}(\mu^{1/p}\mathcal{L}_{p})^{\frac{2(1-p)}{p}}\mathbb{E}\left[\left(\widetilde{\ell}^{p}_{p}(0,y,H_{n})-\mu\mathcal{L}_{p}^{p}\right)^{2}\ \big{|}\ B\right]+q_{2}
≤C𝔼[(~pp(0,y,Hn)μpp)2]+q2\displaystyle\leq C\mathbb{E}\left[\left(\widetilde{\ell}^{p}_{p}(0,y,H_{n})-\mu\mathcal{L}_{p}^{p}\right)^{2}\right]+q_{2}

where CC is a constant depending on p,d,yp,d,\|y\|, and the last line follows since once again the expected error is lower conditioned on BB than unconditionally. We have thus established

𝔼[(~p(0,y,𝒳n)Cp,dp)2]\displaystyle\mathbb{E}\left[\left(\tilde{\ell}_{p}(0,y,\mathcal{X}_{n})-C_{p,d}\mathcal{L}_{p}\right)^{2}\right] 𝔼[(~pp(0,y,Hn)μpp)2]+q1+q2\displaystyle\lesssim\mathbb{E}\left[\left(\widetilde{\ell}^{p}_{p}(0,y,H_{n})-\mu\mathcal{L}_{p}^{p}\right)^{2}\right]+q_{1}+q_{2}

for q1,q2q_{1},q_{2} exponentially small in nn. Finally let H1H_{1} be the unit intensity homogeneous PPP obtained from HnH_{n} by multiplying each axis by n1/dn^{1/d}. By Proposition 4.2,

𝔼[(pp(0,n1dy,H1)μn1dy)2]\displaystyle\mathbb{E}\left[\left(\ell_{p}^{p}(0,n^{\frac{1}{d}}y,H_{1})-\mu n^{\frac{1}{d}}\|y\|\right)^{2}\right] n1dylog2(n1dy)\displaystyle\lesssim n^{\frac{1}{d}}\|y\|\log^{2}(n^{\frac{1}{d}}\|y\|)
𝔼[(npdpp(0,y,Hn)n1dμpp)2]\displaystyle\Rightarrow\mathbb{E}\left[\left(n^{\frac{p}{d}}\ell_{p}^{p}(0,y,H_{n})-n^{\frac{1}{d}}\mu\mathcal{L}_{p}^{p}\right)^{2}\right] n1dylog2(n1dy)\displaystyle\lesssim n^{\frac{1}{d}}\|y\|\log^{2}(n^{\frac{1}{d}}\|y\|)
𝔼[(np1dpp(0,y,Hn)μpp)2]\displaystyle\Rightarrow\mathbb{E}\left[\left(n^{\frac{p-1}{d}}\ell_{p}^{p}(0,y,H_{n})-\mu\mathcal{L}_{p}^{p}\right)^{2}\right] n1dylog2(n1dy)\displaystyle\lesssim n^{-\frac{1}{d}}\|y\|\log^{2}(n^{\frac{1}{d}}\|y\|)
𝔼[(~pp(0,y,Hn)μpp)2]\displaystyle\Rightarrow\mathbb{E}\left[\left(\widetilde{\ell}_{p}^{p}(0,y,H_{n})-\mu\mathcal{L}_{p}^{p}\right)^{2}\right] n1dlog2(n).\displaystyle\lesssim n^{-\frac{1}{d}}\log^{2}(n)\,.

For nn large, the above dominates q1,q2q_{1},q_{2}, so that for a constant CC depending on p,d,yp,d,\|y\|,

𝔼[(~p(0,y,𝒳n)Cp,dp)2]Cn1dlog2(n).\mathbb{E}\left[\left(\tilde{\ell}_{p}(0,y,\mathcal{X}_{n})-C_{p,d}\mathcal{L}_{p}\right)^{2}\right]\leq Cn^{-\frac{1}{d}}\log^{2}(n).

4.3 Estimating the Fluctuation Exponent

As an application, we utilize the 1-spanner results of Section 3 to empirically estimate the fluctuation rate χ(d)\chi(d). Since there is evidence that the variance dominates the bias, this important parameter likely determines the convergence rate of ~p\widetilde{\ell}_{p} to p\mathcal{L}_{p}. Once again utilizing the change of variable z=n1dyz=n^{\frac{1}{d}}y, we note

Var[pp(0,z,H1)]\displaystyle\text{Var}\left[\ell_{p}^{p}(0,z,H_{1})\right] z2χVar[~p(0,y,𝒳n)]n2(χ1)d,\displaystyle\lesssim\|z\|^{2\chi}\iff\text{Var}\left[\widetilde{\ell}_{p}(0,y,\mathcal{X}_{n})\right]\lesssim n^{\frac{2(\chi-1)}{d}}\,,

and we estimate the right hand side from simulations. Specifically, we sample nn points uniformly from the unit cube [0,1]d[0,1]^{d} and compute ~p(x,y,𝒳n)\widetilde{\ell}_{p}(x,y,\mathcal{X}_{n}) for x=(0.25,0.5,,0.5)x=(0.25,0.5,\ldots,0.5), y=(0.75,0.5,,0.5)y=(0.75,0.5,\ldots,0.5) in a kkNN graph on 𝒳n\mathcal{X}_{n}, with k=1+3(4411/p1)d/2log(n)k=\lceil 1+3\left(\frac{4}{4^{1-1/p}-1}\right)^{d/2}\log(n)\rceil as suggested by Theorem 3.9 (note that fmin=fmax,ζ=,κ0=1f_{min}=f_{max},\zeta=\infty,\kappa_{0}=1 in this example). We vary nn from nmin=11,586n_{\min}=11,586 to nmax=92,682n_{\max}=92,682, and for each nn we estimate Var[~p(x,y,𝒳n)]\text{Var}\left[\widetilde{\ell}_{p}(x,y,\mathcal{X}_{n})\right] from NsimN_{\text{sim}} simulations. Figure 6 shows the resulting log-log variance plots for d=2,3,4d=2,3,4 and various pp, as well as the slopes mm from a linear regression. The observed slopes are related to χ\chi by χ=md/2+1\chi=md/2+1, and one thus obtains the estimates for χ\chi reported in Table 2. See Appendix F for confidence interval estimates.
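A rough, scaled-down sketch of this pipeline is given below; it is our own illustration (not the simulation code behind Figure 6 and Table 2), it recomputes the kkNN-graph PWSPD with a slow dense Dijkstra, and it uses far smaller values of n and N_sim than the experiments.

```python
# Rough sketch of the fluctuation-exponent estimate: regress log Var[tilde_ell_p]
# on log n and convert the slope m via chi = m*d/2 + 1. Toy-sized n and N_sim.
import numpy as np
from scipy.sparse.csgraph import dijkstra
from scipy.spatial.distance import cdist

def tilde_ell_p(X, p, k):
    """Normalized PWSPD between rows 0 and 1 of X, in a symmetrized kNN graph."""
    n, d = X.shape
    W = cdist(X, X) ** p
    order = np.argsort(W, axis=1)
    mask = np.zeros_like(W, dtype=bool)
    mask[np.repeat(np.arange(n), k), order[:, 1:k + 1].ravel()] = True
    W = np.where(mask | mask.T, W, 0.0)
    ell_p = dijkstra(W, directed=False, indices=0)[1] ** (1.0 / p)
    return n ** ((p - 1) / (p * d)) * ell_p          # tilde_ell_p = n^{(p-1)/(pd)} * ell_p

def chi_estimate(d=2, p=2.0, n_values=(500, 1000, 2000), n_sim=50, seed=0):
    rng = np.random.default_rng(seed)
    x, y = np.full(d, 0.5), np.full(d, 0.5)
    x[0], y[0] = 0.25, 0.75                          # the two fixed endpoints
    log_var = []
    for n in n_values:
        k = int(np.ceil(1 + 3 * (4 / (4 ** (1 - 1 / p) - 1)) ** (d / 2) * np.log(n)))
        vals = [tilde_ell_p(np.vstack([x, y, rng.uniform(size=(n - 2, d))]), p, k)
                for _ in range(n_sim)]
        log_var.append(np.log(np.var(vals)))
    m = np.polyfit(np.log(n_values), log_var, 1)[0]  # slope of the log-log plot
    return m * d / 2 + 1                             # chi = m*d/2 + 1
```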

These simulations confirm that χ\chi is indeed independent of pp. It is conjectured in the percolation literature that χ(d)0+\chi(d)\rightarrow 0^{+} as dd increases, with χ(2)=13\chi(2)=\frac{1}{3}, χ(3)14\chi(3)\approx\frac{1}{4}, which is consistent with our results. For d=2d=2, the empirical convergence rate is thus n23n^{-\frac{2}{3}} (not n12n^{-\frac{1}{2}} as given in Theorem 4.6), and for large dd one expects an MSE of order n2dn^{-\frac{2}{d}} instead of n1dn^{-\frac{1}{d}}. However, estimating χ\chi empirically becomes increasingly difficult as dd increases, since one has less sparsity in the kkNN graph, and because χ\chi is obtained from mm by χ=md/2+1\chi=md/2+1, so errors incurred in estimating the regression slopes are amplified by a factor of d/2d/2. Table 2 also reports the factor nmax/kn_{\max}/k, which can be interpreted as the expected computational speed-up obtained by running the simulation in a kkNN graph instead of a complete graph. We were unable to obtain empirical speed-up factors since computational resources prevented running the simulations in a complete graph.
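For concreteness, the speed-up factors in Table 2 follow directly from the prescription for kk above: for d=2d=2 and p=2p=2 one has k=\lceil 1+3\cdot 2^{d}\log(n_{\max})\rceil=\lceil 1+12\log(92{,}682)\rceil=139, so that n_{\max}/k\approx 92{,}682/139\approx 667, the value reported for this case in Table 2.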

An important open problem is establishing that ~p\widetilde{\ell}_{p} computed from a nonuniform density enjoys the same convergence rate (with respect to nn) as the uniform case. Although this seems intuitively true and preliminary simulation results support this equivalence, to the best of our knowledge it has not been proven, as the current proof techniques rely on “straight line” geodesics.

(a) Data for d=2d=2
(b) d=2d=2
(c) d=3d=3
(d) d=4d=4
Figure 6: Variance plots for ~p\tilde{\ell}_{p}. For each nn, the variance was estimated from a maximum of Nsim=24000N_{\text{sim}}=24000 simulations, with a smaller NsimN_{\text{sim}} when pp was small and/or the dimension was large. Specifically, when d=2d=2, Nsim=14000N_{\text{sim}}=14000 was used for p=1.5p=1.5; when d=3d=3, Nsim=5000,12000N_{\text{sim}}=5000,12000 was used for p=1.5,2p=1.5,2; when d=4d=4, Nsim=2000,6000,19000N_{\text{sim}}=2000,6000,19000 was used for p=1.5,2,4p=1.5,2,4.
dd pp χ^\hat{\chi} nmax/kn_{\max}/k dd pp χ^\hat{\chi} nmax/kn_{\max}/k dd pp χ^\hat{\chi} nmax/kn_{\max}/k
2 1.51.5 0.30 394 3 1.51.5 0.28 152 4 1.51.5 0.19 58
2 22 0.31 667 3 22 0.23 336 4 22 0.16 169
2 44 0.33 1204 3 44 0.24 820 4 44 0.14 558
2 88 0.34 1545 3 88 0.29 1204 4 88 0.19 927
Table 2: Estimates χ^\hat{\chi} of the fluctuation exponent, obtained from the slopes of the log(n)\log(n) versus log(Var[~p])\log(\text{Var}[\tilde{\ell}_{p}]) regressions, are shown for uniform data for different density weightings (pp) and different dimensions (dd), together with the speed-up factor nmax/kn_{\max}/k.

5 Conclusion and Future Work

This article establishes local equivalence of PWSPD to a density-based stretch of Euclidean distance. We derive a near-optimal condition on kk for the kkNN graph to be a 1-spanner for PWSPD, quantifying and improving the dependence on pp and dd. Moreover, we leverage the theory of Euclidean FPP to establish statistical convergence rates for PWSPD to its continuum limit, and apply our spanner results to empirically support conjectures on the optimal dimension-dependent rates of convergence.

Many directions remain for future work. Our statistical convergence rates for PWSPD in Section 4 are limited to uniform distributions. Preliminary numerical experiments indicate that these rates also hold for PWSPDs defined with varying density, but rigorous convergence rates for nonhomogeneous PPPs are lacking in the literature.

The analysis of Section 2 proved the local equivalence of PWSPDs with density-stretched Euclidean distances. These results and the convergence results of Section 4 are the first steps in a program of developing a discrete-to-continuum limit analysis for PWSPDs and PWSPD-based operators. A major problem is to develop conditions so that the discrete graph Laplacian (defined with ~p\tilde{\ell}_{p}) converges to a continuum second order differential operator as nn\rightarrow\infty. A related direction is the analysis of how data clusterability with PWSPDs depends on pp for various random data models and in specific applications.

The numerical results of Section 3.2 confirm that klog(n)k\propto\log(n) is required for the kkNN graph to be a 1-spanner, as predicted by theory. Relaxing the notion of tt-spanners to (t,ω)(t,\omega)-spanners, as suggested in Section 3.2, is a topic of future research.

Finally, the results of this article require data to be generated from a distribution supported exactly on a low-dimensional manifold \mathcal{M}. An arguably more realistic setting is the noisy one in which the data is distributed only approximately on \mathcal{M}. Two potential models are of interest: (i) replacing \mathcal{M} with B(,τ)={xD|dist(x,)τ}B(\mathcal{M},\tau)=\{x\in\mathbb{R}^{D}\ |\ \text{dist}(x,\mathcal{M})\leq\tau\} (tube model) and (ii) considering a density that concentrates on \mathcal{M}, rather than being supported on it (concentration model). PWSPDs may exhibit very different properties under these two noise models, for example under bounded uniform noise and Gaussian noise, especially for large pp. For the concentration model one expects noisy PWSPDs to converge to manifold PWSPDs for pp large, since the optimal PWSPD paths are density driven. Preliminary empirical results (Figure 5(f)) suggest that when the measure concentrates sufficiently near a low-dimensional set \mathcal{M}, the number of nearest neighbors needed for a 1-spanner benefits from the intrinsic low-dimensional structure. For the tube model, although noisy PWSPDs will not converge to manifold PWSPDs, they will still scale according to the intrinsic manifold dimension for τ\tau small. For both models, incorporating a denoising procedure such as local averaging [31] or diffusion [35] before computing PWSPDs is expected to be advantageous. Future research will investigate robust denoising procedures for PWSPD, computing PWSPDs after dimension reduction, and which type of noise distributions are most adversarial to PWSPD.

Acknowledgements

AVL acknowledges partial support from the US National Science Foundation under grant DMS-1912906. DM acknowledges partial support from the US National Science Foundation under grant DMS-1720237 and the Office of Naval Research under grant N000141712162. JMM acknowledges partial support from the US National Science Foundation under grants DMS-1912737 and DMS-1924513. DM thanks Matthias Wink for several useful discussions on Riemannian geometry. We thank the two reviewers and the associate editor for many helpful comments that greatly improved the manuscript.

References

  • [1] E. Aamari, J. Kim, F. Chazal, B. Michel, A. Rinaldo, and L. Wasserman. Estimating the reach of a manifold. Electronic Journal of Statistics, 13(1):1359–1399, 2019.
  • [2] M. Alamgir and U. Von Luxburg. Shortest path distance in random k-nearest neighbor graphs. In ICML, pages 1251–1258, 2012.
  • [3] K.S. Alexander. A note on some rates of convergence in first-passage percolation. The Annals of Applied Probability, pages 81–90, 1993.
  • [4] H. Antil, T. Berry, and J. Harlim. Fractional diffusion maps. Applied and Computational Harmonic Analysis, 54:145–175, 2021.
  • [5] E. Arias-Castro. Clustering based on pairwise distances when the data is of mixed dimensions. IEEE Transactions on Information Theory, 57(3):1692–1706, 2011.
  • [6] A. Auffinger, M. Damron, and J. Hanson. 50 Years of First-Passage Percolation, volume 68. American Mathematical Soc., 2017.
  • [7] M. Azizyan, A. Singh, and L. Wasserman. Density-sensitive semisupervised inference. The Annals of Statistics, 41(2):751–771, 2013.
  • [8] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
  • [9] M. Belkin and P. Niyogi. Convergence of Laplacian eigenmaps. In NIPS, pages 129–136, 2007.
  • [10] R.E. Bellman. Adaptive control processes: a guided tour. Princeton University Press, 2015.
  • [11] T. Berry and J. Harlim. Variable bandwidth diffusion kernels. Applied and Computational Harmonic Analysis, 40(1):68–96, 2016.
  • [12] T. Berry and T. Sauer. Local kernels and the geometric structure of data. Applied and Computational Harmonic Analysis, 40(3):439–469, 2016.
  • [13] A.S. Bijral, N. Ratliff, and N. Srebro. Semi-supervised learning with density based distances. In UAI, pages 43–50, 2011.
  • [14] J.-D. Boissonnat, A. Lieutier, and M. Wintraecken. The reach, metric distortion, geodesic convexity and the variation of tangent spaces. Journal of Applied and Computational Topology, 3(1):29–58, 2019.
  • [15] L. Boninsegna, G. Gobbo, F. Noé, and C. Clementi. Investigating molecular kinetics by variationally optimized diffusion maps. Journal of Chemical Theory and Computation, 11(12):5947–5960, 2015.
  • [16] E. Borghini, X. Fernández, P. Groisman, and G. Mindlin. Intrinsic persistent homology via density-based metric learning. arXiv preprint arXiv:2012.07621, 2020.
  • [17] O. Bousquet, O. Chapelle, and M. Hein. Measure based regularization. In NIPS, pages 1221–1228, 2004.
  • [18] H. Chang and D.-Y. Yeung. Robust path-based spectral clustering. Pattern Recognition, 41(1):191–203, 2008.
  • [19] Y. Cheng. Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8):790–799, 1995.
  • [20] T. Chu, G.L. Miller, and D.R. Sheehy. Exact computation of a manifold metric, via Lipschitz embeddings and shortest paths on a graph. In SODA, pages 411–425, 2020.
  • [21] R.R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30, 2006.
  • [22] R.R. Coifman, S. Lafon, A.B. Lee, M. Maggioni, B. Nadler, F. Warner, and S.W. Zucker. Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proceedings of the National Academy of Sciences, 102(21):7426–7431, 2005.
  • [23] S.B. Damelin, F.J. Hickernell, D.L. Ragozin, and X. Zeng. On energy, discrepancy and group invariant measures on measurable subsets of euclidean space. Journal of Fourier Analysis and Applications, 16(6):813–839, 2010.
  • [24] M. Damron and X. Wang. Entropy reduction in Euclidean first-passage percolation. Electronic Journal of Probability, 21, 2016.
  • [25] L.P. Devroye and T.J. Wagner. The strong uniform consistency of nearest neighbor density estimates. The Annals of Statistics, pages 536–540, 1977.
  • [26] D.L. Donoho and C. Grimes. Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 100(10):5591–5596, 2003.
  • [27] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, volume 96, pages 226–231, 1996.
  • [28] A.M. Farahmand, C. Szepesvári, and J.-Y. Audibert. Manifold-adaptive dimension estimation. In ICML, pages 265–272, 2007.
  • [29] H. Federer. Curvature measures. Transactions of the American Mathematical Society, 93(3):418–491, 1959.
  • [30] B. Fischer, T. Zöller, and J.M. Buhmann. Path based pairwise data clustering with application to texture segmentation. In International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, pages 235–250. Springer, 2001.
  • [31] N. García Trillos, D. Sanz-Alonso, and R. Yang. Local regularization of noisy point clouds: Improved global geometric estimates and data analysis. Journal of Machine Learning Research, 20(136):1–37, 2019.
  • [32] A. Gray. The volume of a small geodesic ball of a Riemannian manifold. The Michigan Mathematical Journal, 20(4):329–344, 1974.
  • [33] P. Groisman, M. Jonckheere, and F. Sapienza. Nonhomogeneous Euclidean first-passage percolation and distance learning. arXiv preprint arXiv:1810.09398, 2018.
  • [34] L. Györfi, M. Kohler, A. Krzyzak, and H. Walk. A distribution-free theory of nonparametric regression. Springer Science & Business Media, 2006.
  • [35] M. Hein and M. Maier. Manifold denoising. In NIPS, volume 19, pages 561–568, 2006.
  • [36] C.D. Howard and C.M. Newman. Euclidean models of first-passage percolation. Probability Theory and Related Fields, 108(2):153–170, 1997.
  • [37] C.D. Howard and C.M. Newman. Geodesics and spanning trees for Euclidean first-passage percolation. Annals of Probability, pages 577–623, 2001.
  • [38] G. Hughes. On the mean accuracy of statistical pattern recognizers. IEEE Transactions on Information Theory, 14(1):55–63, 1968.
  • [39] S.J. Hwang, S.B. Damelin, and A. Hero. Shortest path through random points. The Annals of Applied Probability, 26(5):2791–2823, 2016.
  • [40] D.B. Johnson. Efficient algorithms for shortest paths in sparse networks. Journal of the ACM, 24(1):1–13, 1977.
  • [41] J. Kileel, A. Moscovich, N. Zelesko, and A. Singer. Manifold learning with arbitrary norms. arXiv preprint arXiv:2012.14172, 2020.
  • [42] A. Little, M. Maggioni, and J.M Murphy. Path-based spectral clustering: Guarantees, robustness to outliers, and fast algorithms. Journal of Machine Learning Research, 21(6):1–66, 2020.
  • [43] D.O. Loftsgaarden and C.P. Quesenberry. A nonparametric estimate of a multivariate density function. The Annals of Mathematical Statistics, 36(3):1049–1051, 1965.
  • [44] P. C. Mahalanobis. On the generalized distance in statistics. National Institute of Science of India, 1936.
  • [45] J. Malik, C. Shen, H.-T. Wu, and N. Wu. Connecting dots: from local covariance to empirical intrinsic geometry and locally linear embedding. Pure and Applied Analysis, 1(4):515–542, 2019.
  • [46] D. Mckenzie and S. Damelin. Power weighted shortest paths for clustering Euclidean data. Foundations of Data Science, 1(3):307, 2019.
  • [47] A. Moscovich, A. Jaffe, and B. Nadler. Minimax-optimal semi-supervised regression on unknown manifolds. In AISTATS, pages 933–942, 2017.
  • [48] A.Y. Ng, M.I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, pages 849–856, 2002.
  • [49] P. Niyogi, S. Smale, and S. Weinberger. Finding the homology of submanifolds with high confidence from random samples. Discrete & Computational Geometry, 39(1-3):419–441, 2008.
  • [50] P. Petersen, S. Axler, and K.A. Ribet. Riemannian Geometry, volume 171. Springer, 2006.
  • [51] A. Rinaldo and L. Wasserman. Generalized density clustering. The Annals of Statistics, 38(5):2678–2722, 2010.
  • [52] A. Rodriguez and A. Laio. Clustering by fast search and find of density peaks. Science, 344(6191):1492–1496, 2014.
  • [53] Sajama and A. Orlitsky. Estimating and computing density based distance metrics. In ICML, pages 760–767, 2005.
  • [54] L.K. Saul and M.I. Jordan. A variational principle for model-based interpolation. In NIPS, pages 267–273, 1997.
  • [55] G. Schiebinger, M.J. Wainwright, and B. Yu. The geometry of kernelized spectral clustering. The Annals of Statistics, 43(2):819–846, 2015.
  • [56] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
  • [57] J.B. Tenenbaum, V. De Silva, and J.C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
  • [58] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.
  • [59] D. Van Dijk, R. Sharma, J. Nainys, K. Yim, P. Kathail, A.J. Carr, C. Burdziak, K.R Moon, C.L. Chaffer, D. Pattabiraman, B. Bierie, L. Mazutis, G. Wolf, S. Krishnaswamy, and D. Pe’er. Recovering gene interactions from single-cell data using data diffusion. Cell, 174(3):716–729, 2018.
  • [60] P. Vincent and Y. Bengio. Density-sensitive metrics and kernels. In Snowbird Learning Workshop, 2003.
  • [61] U. Von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
  • [62] R. Xu, S. Damelin, B. Nadler, and D.C. Wunsch II. Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps. Artificial Intelligence in Medicine, 48(2-3):91–98, 2010.
  • [63] L. Zelnik-Manor and P. Perona. Self-tuning spectral clustering. In NIPS, pages 1601–1608, 2005.
  • [64] S. Zhang and J.M. Murphy. Hyperspectral image clustering with spatially-regularized ultrametrics. Remote Sensing, 13(5):955, 2021.

Appendix A Proofs for Section 2

Proof of Lemma 2.2.

Let γ1(t)\gamma_{1}(t) be a path which achieves 𝒟(x,y)\mathscr{D}(x,y). Since 𝒟(x,y)ϵ(1+κϵ2)\mathscr{D}(x,y)\leq\epsilon(1+\kappa\epsilon^{2}), f(γ1(t))fmin(x,ϵ)f(\gamma_{1}(t))\geq f_{\text{min}}(x,\epsilon) for all tt. Then:

pp(x,y)\displaystyle\mathcal{L}_{p}^{p}(x,y) 011f(γ1(t))p1d|γ1(t)|𝑑t𝒟(x,y)fmin(x,ϵ)p1dϵ(1+κϵ2)fmin(x,ϵ)p1d.\displaystyle\leq\int_{0}^{1}\frac{1}{f(\gamma_{1}(t))^{\frac{p-1}{d}}}|\gamma_{1}^{\prime}(t)|\ dt\leq\frac{\mathscr{D}(x,y)}{f_{\text{min}}(x,\epsilon)^{\frac{p-1}{d}}}\leq\frac{\epsilon(1+\kappa\epsilon^{2})}{f_{\text{min}}(x,\epsilon)^{\frac{p-1}{d}}}\,.

Note yBpp(x,ϵ(1+κϵ2)/fmin(x,ϵ)p1d)y\in B_{\mathcal{L}_{p}^{p}}(x,\epsilon(1+\kappa\epsilon^{2})/f_{\text{min}}(x,\epsilon)^{\frac{p-1}{d}}) implies f(y)fmax(x,ϵ)f(y)\leq f_{\text{max}}(x,\epsilon), and thus fmax(x,ϵ)p1d(f(x)f(y))p12d1\frac{f_{\text{max}}(x,\epsilon)^{\frac{p-1}{d}}}{(f(x)f(y))^{\frac{p-1}{2d}}}\geq 1, so that pp(x,y)𝒟(x,y)fmin(x,ϵ)p1dfmax(x,ϵ)p1d(f(x)f(y))p12d\mathcal{L}_{p}^{p}(x,y)\leq\frac{\mathscr{D}(x,y)}{f_{\text{min}}(x,\epsilon)^{\frac{p-1}{d}}}\frac{f_{\text{max}}(x,\epsilon)^{\frac{p-1}{d}}}{(f(x)f(y))^{\frac{p-1}{2d}}}. This yields

pp(x,y)(ρx,ϵ)p1dxy(1+κxy2)(f(x)f(y))p12d(ρx,ϵ)p1d(1+κϵ2)𝒟f,Euc(x,y),\displaystyle\mathcal{L}_{p}^{p}(x,y)\leq(\rho_{x,\epsilon})^{\frac{p-1}{d}}\frac{\|x-y\|(1+\kappa\|x-y\|^{2})}{(f(x)f(y))^{\frac{p-1}{2d}}}\leq(\rho_{x,\epsilon})^{\frac{p-1}{d}}(1+\kappa\epsilon^{2})\mathscr{D}_{f,\text{Euc}}(x,y),

which proves the upper bound. Now let γ0(t)\gamma_{0}(t) be a path achieving pp(x,y)\mathcal{L}_{p}^{p}(x,y); note that since pp(x,y)𝒟(x,y)fmin(x,ϵ)p1d\mathcal{L}_{p}^{p}(x,y)\leq\frac{\mathscr{D}(x,y)}{f_{\text{min}}(x,\epsilon)^{\frac{p-1}{d}}}, the path γ0\gamma_{0} is contained in Bpp(x,ϵ(1+κϵ2)/fmin(x,ϵ)p1d)B_{\mathcal{L}_{p}^{p}}(x,\epsilon(1+\kappa\epsilon^{2})/f_{\text{min}}(x,\epsilon)^{\frac{p-1}{d}}). Thus

pp(x,y)\displaystyle\mathcal{L}_{p}^{p}(x,y) =011f(γ0(t))p1d|γ0(t)|𝑑t𝒟(x,y)fmax(x,ϵ)p1d𝒟(x,y)fmax(x,ϵ)p1dfmin(x,ϵ)p1d(f(x)f(y))p12d\displaystyle=\int_{0}^{1}\frac{1}{f(\gamma_{0}(t))^{\frac{p-1}{d}}}|\gamma_{0}^{\prime}(t)|\ dt\geq\frac{\mathscr{D}(x,y)}{f_{\text{max}}(x,\epsilon)^{\frac{p-1}{d}}}\geq\frac{\mathscr{D}(x,y)}{f_{\text{max}}(x,\epsilon)^{\frac{p-1}{d}}}\cdot\frac{f_{\text{min}}(x,\epsilon)^{\frac{p-1}{d}}}{(f(x)f(y))^{\frac{p-1}{2d}}}

so that

pp(x,y)\displaystyle\mathcal{L}_{p}^{p}(x,y) 𝒟(x,y)(ρx,ϵ)p1d(f(x)f(y))p12dxy(ρx,ϵ)p1d(f(x)f(y))p12d=1(ρx,ϵ)p1d𝒟f,Euc(x,y).\displaystyle\geq\frac{\mathscr{D}(x,y)}{(\rho_{x,\epsilon})^{\frac{p-1}{d}}(f(x)f(y))^{\frac{p-1}{2d}}}\geq\frac{\|x-y\|}{(\rho_{x,\epsilon})^{\frac{p-1}{d}}(f(x)f(y))^{\frac{p-1}{2d}}}=\frac{1}{(\rho_{x,\epsilon})^{\frac{p-1}{d}}}\mathscr{D}_{f,\text{Euc}}(x,y).

Appendix B Proofs for Section 3

Proof of Lemma 3.6.

Let s:=xys:=\|x-y\| and choose a coordinate system x(1),,x(n)x^{(1)},\ldots,x^{(n)} such that y=(s/2,0,0)y=(-s/2,0,\ldots 0), x=(s/2,0,,0)x=(s/2,0,\ldots,0) and xM=𝟎x_{M}=\mathbf{0}. 𝒟α,p(x,y)\mathcal{D}_{\alpha,p}(x,y) is now the interior of:

((x(1)+s2)2+(x(2))2++(x(n))2)p/2+((x(1)s2)2+(x(2))2++(x(n))2)p/2=αsp.\left(\left(x^{(1)}+\frac{s}{2}\right)^{2}+(x^{(2)})^{2}+\ldots+(x^{(n)})^{2}\right)^{p/2}+\left(\left(x^{(1)}-\frac{s}{2}\right)^{2}+(x^{(2)})^{2}+\ldots+(x^{(n)})^{2}\right)^{p/2}=\alpha s^{p}.

In spherical coordinates the boundary of this region may be expressed as:

(r2+srcosθ1+s2/4)p/2+(r2srcosθ1+s2/4)p/2\displaystyle\left(r^{2}+sr\cos\theta_{1}+s^{2}/4\right)^{p/2}+\left(r^{2}-sr\cos\theta_{1}+s^{2}/4\right)^{p/2} =αsp\displaystyle=\alpha s^{p} (17)

where (x(1))2++(x(n))2=r2(x^{(1)})^{2}+\ldots+(x^{(n)})^{2}=r^{2} and x1=rcosθ1x_{1}=r\cos\theta_{1}. Define r=H(θ1)r=H(\theta_{1}) as the unique positive solution of (17). Implicitly differentiating in θ1\theta_{1} yields

p2(r2+srcos(θ1)+s24)p21(2rdrdθ1srsin(θ1)+scos(θ1)drdθ1)\displaystyle\frac{p}{2}\left(r^{2}+sr\cos(\theta_{1})+\frac{s^{2}}{4}\right)^{\frac{p}{2}-1}\left(2r\frac{dr}{d\theta_{1}}-sr\sin(\theta_{1})+s\cos(\theta_{1})\frac{dr}{d\theta_{1}}\right)
+\displaystyle+ p2(r2srcos(θ1)+s24)p21(2rdrdθ1+srsin(θ1)scos(θ1)drdθ1)=0.\displaystyle\frac{p}{2}\left(r^{2}-sr\cos(\theta_{1})+\frac{s^{2}}{4}\right)^{\frac{p}{2}-1}\left(2r\frac{dr}{d\theta_{1}}+sr\sin(\theta_{1})-s\cos(\theta_{1})\frac{dr}{d\theta_{1}}\right)=0\,.

Solving for drdθ1\frac{dr}{d\theta_{1}} and setting the result to 0 yields

[(r2+srcos(θ1)+s24)p22(r2srcos(θ1)+s24)p22]sin(θ1):=(I)(II)\displaystyle\left[\left(r^{2}+sr\cos(\theta_{1})+\frac{s^{2}}{4}\right)^{\frac{p-2}{2}}-\left(r^{2}-sr\cos(\theta_{1})+\frac{s^{2}}{4}\right)^{\frac{p-2}{2}}\right]\sin(\theta_{1}):=\text{(I)}\cdot\text{(II)} =0.\displaystyle=0.

Thus we obtain two solutions to drdθ1=0\frac{dr}{d\theta_{1}}=0:

(I)=0cos(θ1)=0θ1=π2(min.)(II)=0sin(θ1)=0θ1=0(max.)\displaystyle\text{(I)}=0\Rightarrow\cos(\theta_{1})=0\Rightarrow\theta_{1}=\frac{\pi}{2}\qquad\text{(min.)}\quad\text{(II)}=0\Rightarrow\sin(\theta_{1})=0\Rightarrow\theta_{1}=0\qquad\text{(max.)}

Thus the minimal radius occurs when θ1=π2\theta_{1}=\frac{\pi}{2}. Substituting θ1=π2\theta_{1}=\frac{\pi}{2} into (17) yields:

r=sα2/p/41/p1/4=xyα2/p/41/p1/4.r=s\sqrt{\alpha^{2/p}\big{/}4^{1/p}-1/4}=\|x-y\|\sqrt{\alpha^{2/p}\big{/}4^{1/p}-1/4}.

Hence B(xM,r)B(x_{M},r)\subset\mathcal{D}_{\alpha,p}(x,y), as desired. To see 𝒟α,p(x,y)B(x,s)\mathcal{D}_{\alpha,p}(x,y)\subset B(x,s) observe that if zB(x,s)z\notin B(x,s) then:

xz>s=xyxzp>αxyp for all α(0,1] and p1\|x-z\|>s=\|x-y\|\Rightarrow\|x-z\|^{p}>\alpha\|x-y\|^{p}\quad\text{ for all }\alpha\in(0,1]\text{ and }p\geq 1 (18)

hence zz cannot be in 𝒟α,p(x,y)\mathcal{D}_{\alpha,p}(x,y). ∎
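As an aside, the closed form just derived can be checked numerically by solving (17) at θ1=π/2 for illustrative values of s, p, and α; the short sketch below is our own sanity check and not part of the proof.

```python
# Numerical sanity check (not part of the proof): solve (17) at theta_1 = pi/2,
# i.e. 2*(r^2 + s^2/4)^(p/2) = alpha*s^p, and compare with the closed form above.
import numpy as np
from scipy.optimize import brentq

s, p, alpha = 1.7, 3.0, 1.0                          # arbitrary illustrative values
boundary = lambda r: 2.0 * (r ** 2 + s ** 2 / 4.0) ** (p / 2.0) - alpha * s ** p
r_numeric = brentq(boundary, 0.0, s)                 # sign change on [0, s] for these values
r_formula = s * np.sqrt(alpha ** (2.0 / p) / 4.0 ** (1.0 / p) - 0.25)
assert np.isclose(r_numeric, r_formula)
```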

Lemma B.1.

With assumptions and notation as in Theorem 3.9, B(x~M,r2)B(xM,r1)B\left(\tilde{x}_{M},r_{2}^{\star}\right)\subset B(x_{M},r_{1}^{\star}).

Proof.

By [14, Lemma 1], xMx~Mζζ2r2/4<r2/(4ζ)\|x_{M}-\tilde{x}_{M}\|\leq\zeta-\sqrt{\zeta^{2}-r^{2}/4}<r^{2}/(4\zeta). Now, suppose zB(x~M,r2)z\in B(\tilde{x}_{M},r_{2}^{\star}). Then

xMzxMx~M+x~Mzr2/(4ζ)+r(1/41/p1/4r/(4ζ))=r1,\|x_{M}-z\|\leq\|x_{M}-\tilde{x}_{M}\|+\|\tilde{x}_{M}-z\|\leq r^{2}\big{/}(4\zeta)+r\left(\sqrt{1/4^{1/p}-1/4}-r/(4\zeta)\right)=r_{1}^{\star},

so that zB(xM,r1)z\in B(x_{M},r_{1}^{\star}), as desired.

Lemma B.2.

With notation and assumptions as in Theorem 3.9:

[xijB(x~M,r2)|xijB(x,r)]34κ02fminfmax(141/p14)d/2.\displaystyle\mathbb{P}\left[x_{i_{j}}\in B(\tilde{x}_{M},r_{2}^{\star})\ |\ x_{i_{j}}\in B(x,r)\right]\geq\frac{3}{4}\kappa_{0}^{-2}\frac{f_{\min}}{f_{\max}}\left(\frac{1}{4^{1/p}}-\frac{1}{4}\right)^{d/2}. (19)
Proof.

By the definition of conditional probability and B(x~M,r2)B(x,r)B(\tilde{x}_{M},r_{2}^{\star})\subset B(x,r):

[xijB(x~M,r2)|xijB(x,r)]=B(x~M,r2)f/B(x,r)f.\displaystyle\mathbb{P}\left[x_{i_{j}}\in B(\tilde{x}_{M},r_{2}^{\star})\ |\ x_{i_{j}}\in B(x,r)\right]=\int_{B(\tilde{x}_{M},r_{2}^{\star})\cap\mathcal{M}}f\bigg{/}\displaystyle\int_{B(x,r)\cap\mathcal{M}}f. (20)

By Definition 3.7, B(x~M,r2)ffminvol(B(x~M,r2))fminκ01(r2)dvol(B(0,1))\int_{B(\tilde{x}_{M},r_{2}^{\star})\cap\mathcal{M}}f\geq f_{\min}\text{vol}\left(B(\tilde{x}_{M},r_{2}^{\star})\cap\mathcal{M}\right)\geq f_{\min}\kappa_{0}^{-1}(r_{2}^{\star})^{d}\text{vol}\left(B(0,1)\right) and B(x,r)ffmaxvol(B(x,r))fmaxκ0rdvol(B(0,1))\int_{B(x,r)\cap\mathcal{M}}f\leq f_{\max}\text{vol}\left(B(x,r)\cap\mathcal{M}\right)\leq f_{\max}\kappa_{0}r^{d}\text{vol}\left(B(0,1)\right). Returning to (20):

[xijB(x~M,r2)|xijB(x,r)]κ02fminfmax(r2r)d.\mathbb{P}\left[x_{i_{j}}\in B(\tilde{x}_{M},r_{2}^{\star})\ |\ x_{i_{j}}\in B(x,r)\right]\geq\kappa_{0}^{-2}\frac{f_{\min}}{f_{\max}}\left(\frac{r_{2}^{\star}}{r}\right)^{d}. (21)

The result follows by noting

(r2r)d=(141/p14r4ζ)d(141/p1414d141/p14)d34(141/p14)d/2.\left(\frac{r_{2}^{\star}}{r}\right)^{d}=\left(\sqrt{\frac{1}{4^{1/p}}-\frac{1}{4}}-\frac{r}{4\zeta}\right)^{d}\geq\left(\sqrt{\frac{1}{4^{1/p}}-\frac{1}{4}}-\frac{1}{4d}\sqrt{\frac{1}{4^{1/p}}-\frac{1}{4}}\right)^{d}\geq\frac{3}{4}\left(\frac{1}{4^{1/p}}-\frac{1}{4}\right)^{d/2}.

Here the first inequality uses the assumption r\leq\frac{\zeta}{d}\sqrt{\frac{1}{4^{1/p}}-\frac{1}{4}}, and the second follows from \left(1-\frac{1}{4d}\right)^{d}\geq 1-\frac{1}{4}=\frac{3}{4} (Bernoulli's inequality).

Appendix C Local Analysis: Proof of Corollary 2.4

Proof.

First note that by Theorem 2.3

|p2(x,y)𝒟f,Euc2/p(x,y)|\displaystyle\left|\mathcal{L}_{p}^{2}(x,y)-\mathscr{D}_{f,\text{Euc}}^{2/p}(x,y)\right| =|(p(x,y)𝒟f,Euc1/p(x,y))(p(x,y)+𝒟f,Euc1/p(x,y))|\displaystyle=\left|\left(\mathcal{L}_{p}(x,y)-\mathscr{D}_{f,\text{Euc}}^{1/p}(x,y)\right)\left(\mathcal{L}_{p}(x,y)+\mathscr{D}_{f,\text{Euc}}^{1/p}(x,y)\right)\right|
|(C1ϵ1+1p+C2ϵ2+1p+O(ϵ3+1p))ϵ1/p(1+O(ϵ2))fminp1pd|\displaystyle\leq\left|\left(C_{1}\epsilon^{1+\frac{1}{p}}+C_{2}\epsilon^{2+\frac{1}{p}}+O(\epsilon^{3+\frac{1}{p}})\right)\frac{\epsilon^{1/p}(1+O(\epsilon^{2}))}{f_{\text{min}}^{\frac{p-1}{pd}}}\right|
=C~1ϵ1+2/p+C~2ϵ2+2/p+O(ϵ3+2/p).\displaystyle=\tilde{C}_{1}\epsilon^{1+2/p}+\tilde{C}_{2}\epsilon^{2+2/p}+O(\epsilon^{3+2/p})\,.

Thus

|h1p(pp(x,y)/ϵ)h1p(𝒟f,Euc(x,y)/ϵ)|h1p(pp(x,y)/ϵ)\displaystyle\frac{\left|h_{\frac{1}{p}}\left(\mathcal{L}_{p}^{p}(x,y)/\epsilon\right)-h_{\frac{1}{p}}\left(\mathscr{D}_{f,\text{Euc}}(x,y)/\epsilon\right)\right|}{h_{\frac{1}{p}}\left(\mathcal{L}_{p}^{p}(x,y)/\epsilon\right)} =|1exp(𝒟f,Euc2/p(x,y)ϵ2/p+p2(x,y)ϵ2/p)|\displaystyle=\left|1-\exp\left(-\frac{\mathscr{D}_{f,\text{Euc}}^{2/p}(x,y)}{\epsilon^{2/p}}+\frac{\mathcal{L}_{p}^{2}(x,y)}{\epsilon^{2/p}}\right)\right|
=|1exp(±[C~1ϵ+C~2ϵ2+O(ϵ3)])|\displaystyle=\left|1-\exp(\pm[\tilde{C}_{1}\epsilon+\tilde{C}_{2}\epsilon^{2}+O(\epsilon^{3})])\right|
C~1ϵ+(C~2+12C~12)ϵ2+O(ϵ3).\displaystyle\leq\tilde{C}_{1}\epsilon+\left(\tilde{C}_{2}+\frac{1}{2}\tilde{C}_{1}^{2}\right)\epsilon^{2}+O(\epsilon^{3}).

Appendix D Additional Clustering Results

D.1 Spectral Clustering with Self-Tuning

Clustering results for Euclidean SC with self-tuning (ST) are shown in Figure 7. Similarly to Euclidean SC and Euclidean SC with the diffusion maps normalization, SC+ST can cluster well given KK a priori but struggles to simultaneously learn KK and cluster accurately.

(a) OA, Euc. SC+ST, Two Rings
(b) OA, Euc. SC+ST, Long Bottleneck
(c) OA, Euc. SC+ST, Short Bottleneck
(d) K^\hat{K}, Euc. SC+ST, Two Rings
(e) K^\hat{K}, Euc. SC+ST, Long Bottleneck
(f) K^\hat{K}, Euc. SC+ST, Short Bottleneck
Figure 7: Clustering results with self-tuning SC.

D.2 Clustering with Fiedler Eigenvector

Clustering results on the Two Rings and Short Bottleneck data (both of which have K=2K=2) appear in Figure 8. The results are very similar to running KK-means on the second eigenvector of LL.

(a) OA, Euc. SC, Two Rings
(b) OA, Euc. SC+ST, Two Rings
(c) OA, PWSPD SC, Two Rings
(d) OA, Euc. SC, Short Bottleneck
(e) OA, Euc. SC+ST, Short Bottleneck
(f) OA, Euc. SC, Short Bottleneck
Figure 8: Clustering with Fiedler eigenvector.

Appendix E PWSPD Spanners: Intrinsic Path Distance

In this section we assume that \mathcal{M} is a compact Riemannian manifold with metric gg, but we do not assume that \mathcal{M} is isometrically embedded in D\mathbb{R}^{D}. Let us first establish some notation. For precise definitions of terms in italics, we refer the reader to [50].

  • For any v,wTxv,w\in T_{x}\mathcal{M}, Rx(v,w)R_{x}(v,w) denotes the Riemannian curvature while Kx(v,w)K_{x}(v,w) denotes the sectional curvature. In this notation, Rx(v,w)R_{x}(v,w) is a matrix while Kx(v,w)K_{x}(v,w) is a scalar. These notions of curvature are related:

    Rx(v,w)w,v=Kx(v,w)(v2w2v,w2).\langle R_{x}(v,w)w,v\rangle=K_{x}(v,w)\left(\|v\|^{2}\|w\|^{2}-\langle v,w\rangle^{2}\right). (22)

    We shall drop the “xx” in the subscript when it is clear from context. Let KminK_{\min} be such that Kx(v,w)KminK_{x}(v,w)\geq K_{\min} for all xx\in\mathcal{M} and v,wTxv,w\in T_{x}\mathcal{M}. Because \mathcal{M} is compact, such a KminK_{\min} exists. Similarly, SxS_{x} denotes the scalar curvature and Smin=minxSxS_{\min}=\displaystyle\min_{x\in\mathcal{M}}S_{x} while Smax=maxxSxS_{\max}=\displaystyle\max_{x\in\mathcal{M}}S_{x}.

  • For r>0r>0, define Vmin(r):=minxvol(B𝒟(x,r))V_{\min}(r):=\displaystyle\min_{x\in\mathcal{M}}\text{vol}(B_{\mathscr{D}}(x,r)), Vmax(r)=maxxvol(B𝒟(x,r))V_{\max}(r)=\displaystyle\max_{x\in\mathcal{M}}\text{vol}(B_{\mathscr{D}}(x,r)). Both quantities can be expressed as a Taylor series in rr, with coefficients depending on the curvature of \mathcal{M} [32]:

    Vmin(r)\displaystyle V_{\min}(r) =(1Smax6(d+2)r2+O(r4))vol(B(0,1))rd,\displaystyle=\left(1-\frac{S_{\max}}{6(d+2)}r^{2}+O(r^{4})\right)\text{vol}(B(0,1))r^{d}, (23)
    Vmax(r)\displaystyle V_{\max}(r) =(1Smin6(d+2)r2+O(r4))vol(B(0,1))rd.\displaystyle=\left(1-\frac{S_{\min}}{6(d+2)}r^{2}+O(r^{4})\right)\text{vol}(B(0,1))r^{d}. (24)
  • For any xx\in\mathcal{M}, expx:Tx\exp_{x}:T_{x}\mathcal{M}\to\mathcal{M} denotes the exponential map. By construction, 𝒟(x,expx(v))=gx(v,v)\mathscr{D}(x,\exp_{x}(v))=\sqrt{g_{x}(v,v)} whenever gx(v,v)\sqrt{g_{x}(v,v)} is less than the injectivity radius at xx. The exponential map is used to construct normal coordinates.

  • The injectivity radius at xx is denoted by inj(x)\text{inj}(x), while inj()\displaystyle\text{inj}(\mathcal{M}) denotes the injectivity radius of \mathcal{M}. Because \mathcal{M} is closed, the injectivity radius is bounded away from 0.

Proposition E.1.

In normal coordinates the metric has the following expansion:

gkl(x)=δkl+13Rijklxixj+O(x3),g_{kl}(x)=\delta_{kl}+\frac{1}{3}R_{ijkl}x^{i}x^{j}+O(\|x\|^{3}),

where δkl=1\delta_{kl}=1 if k=lk=l and is zero otherwise. Moreover, for any v,wB(0,1)v,w\in B(0,1) and rinj()r\leq\text{inj}(\mathcal{M}):

𝒟(expp(rv),expp(rw))2=r2vw2+r43R(w,v)w,v+O(r5).\mathscr{D}\left(\exp_{p}(rv),\exp_{p}(rw)\right)^{2}=r^{2}\|v-w\|^{2}+\frac{r^{4}}{3}\langle R(w,v)w,v\rangle+O(r^{5}). (25)
Proof.

See the discussion in [50] above Theorem 5.5.8 and Exercises 5.9.26 and 5.9.27. ∎

Combining (22) and (25), and using the antisymmetry R(w,v)=-R(v,w), yields:

𝒟(expp(rv),expp(rw))2=r2vw2r4K(v,w)3(v2w2v,w2)+O(r5).\mathscr{D}\left(\exp_{p}(rv),\exp_{p}(rw)\right)^{2}=r^{2}\|v-w\|^{2}-\frac{r^{4}K(v,w)}{3}\left(\|v\|^{2}\|w\|^{2}-\langle v,w\rangle^{2}\right)+O(r^{5}).
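As a sanity check of this expansion (our own, not from [50] or the original derivation), one can compare it against exact geodesic distances on the unit sphere, where the sectional curvature is identically 1:

```python
# Numerical check of the expansion on the unit sphere S^2 (sectional curvature K = 1).
import numpy as np

def exp_north(u):
    """Exponential map of S^2 at the north pole, with the tangent plane identified with the xy-plane."""
    t = np.linalg.norm(u)
    return np.array([np.sin(t) * u[0] / t, np.sin(t) * u[1] / t, np.cos(t)])

r, v, w = 0.05, np.array([1.0, 0.0]), np.array([0.3, 0.8])
exact = np.arccos(np.clip(exp_north(r * v) @ exp_north(r * w), -1.0, 1.0)) ** 2
approx = r ** 2 * np.sum((v - w) ** 2) - (r ** 4 / 3.0) * (v @ v * (w @ w) - (v @ w) ** 2)
print(exact - approx)                                # higher-order remainder, O(r^5)
```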

For any xi𝒳x_{i}\in\mathcal{X} let ri,kr_{i,k} denote the distance to the kkNN of xix_{i}. Because ri,k0+r_{i,k}\to 0^{+} as nn\to\infty, we have that ri,kinj()r_{i,k}\leq\text{inj}(\mathcal{M}) for all xix_{i}, almost surely for nn large enough.

The proof of Theorem E.2 proceeds in a similar manner to the proof of Theorem 3.9. However, care must be taken to account for the curvature. Note that for any quantity c:=c(n)c:=c(n) implicitly dependent on nn, we shall say that c=o(1)c=o(1) if for any ϵ>0\epsilon>0 there exists an NN such that for all n>Nn>N we have that c(n)<ϵc(n)<\epsilon. Let 𝒢,𝒳p\mathcal{G}_{\mathcal{M},\mathcal{X}}^{p} denote the complete graph on 𝒳\mathcal{X} with edge weights 𝒟(xi,xj)p\mathscr{D}(x_{i},x_{j})^{p} while 𝒢,𝒳p,k\mathcal{G}^{p,k}_{\mathcal{M},\mathcal{X}} shall denote the kkNN subgraph of 𝒢,𝒳p\mathcal{G}_{\mathcal{M},\mathcal{X}}^{p}.

Theorem E.2.

Let (,g)(\mathcal{M},g) be a closed and compact Riemannian manifold. Let 𝒳={xi}i=1n\mathcal{X}=\{x_{i}\}_{i=1}^{n} be drawn i.i.d. from \mathcal{M} according to a probability distribution with continuous density ff satisfying 0<fminf(x)fmax0<f_{\min}\leq f(x)\leq f_{\max} for all xx\in\mathcal{M}. For p>1p>1, nn sufficiently large, and

k3[1+o(1)][fmaxfmin][4411/p(1o(1))2/p1]d/2log(n),k\geq 3\left[1+o(1)\right]\left[\frac{f_{\max}}{f_{\min}}\right]\left[\frac{4}{4^{1-1/p}(1-o(1))^{2/p}-1}\right]^{d/2}\log(n), (26)

𝒢,𝒳p,k\mathcal{G}^{p,k}_{\mathcal{M},\mathcal{X}} is a 11-spanner of 𝒢,𝒳p\mathcal{G}^{p}_{\mathcal{M},\mathcal{X}} with probability at least 11/n1-1/n.

Proof.

As in Theorem 3.9, we prove this by showing that every edge of \mathcal{G}^{p}_{\mathcal{M},\mathcal{X}} not contained in \mathcal{G}^{p,k}_{\mathcal{M},\mathcal{X}} is not critical. As before, w.h.p. \ell_{\mathcal{M},p}(x,y)\to 0 uniformly in x,y [39]. So, let n be large enough so that \mathbb{P}\left[\ell_{\mathcal{M},p}(x,y)\leq\text{inj}(\mathcal{M})\text{ for all }x,y\in\mathcal{X}\right]\geq 1-\frac{1}{2n}.

Pick any x,y\in\mathcal{X} which are not kNNs and let r:=\mathscr{D}(x,y). If r>\text{inj}(\mathcal{M}), then on the above event \ell_{\mathcal{M},p}(x,y)\leq\text{inj}(\mathcal{M})<\mathscr{D}(x,y), and thus the edge \{x,y\} is not critical. So, suppose without loss of generality in what follows that r\leq\text{inj}(\mathcal{M}). Let x_{i_{1}},\ldots,x_{i_{k}} denote the kNNs of x. Because y is not a kNN of x, \mathscr{D}(x,x_{i_{j}})\leq\mathscr{D}(x,y)=r for j=1,\ldots,k. We show \{x,y\} is not critical by showing that, with probability at least 1-\frac{1}{n^{3}}, there exists an \ell\in\{1,\dots,k\} such that \mathscr{D}(x,x_{i_{\ell}})^{p}+\mathscr{D}(x_{i_{\ell}},y)^{p}\leq\mathscr{D}(x,y)^{p}.

Let x_{M} be the midpoint of the (shortest) geodesic from x to y. As r\leq\text{inj}(\mathcal{M}), the exponential map \exp:=\exp_{x_{M}} is a diffeomorphism onto B_{\mathscr{D}}(x_{M},r). Choose Riemannian normal coordinates centered at x_{M} such that y=\exp_{x_{M}}(\frac{r}{2}e_{1}) and x=\exp_{x_{M}}(-\frac{r}{2}e_{1}). For any z\in B_{\mathscr{D}}(x_{M},r), we may write z=\exp_{x_{M}}(rv) for some v\in B(0,1)\subset T_{x_{M}}\mathcal{M}. Now, by (25),

\mathscr{D}(x,z)^{2} =\mathscr{D}\left(\exp_{x_{M}}\left(-\frac{r}{2}e_{1}\right),\exp_{x_{M}}(rv)\right)^{2}
=r^{2}\left\|\frac{1}{2}e_{1}+v\right\|^{2}-\frac{r^{4}K(e_{1},v)}{3}\left(\frac{1}{4}\|v\|^{2}-\frac{1}{4}\langle e_{1},v\rangle^{2}\right)+O(r^{5})
\leq r^{2}\left\|\frac{1}{2}e_{1}+v\right\|^{2}-\frac{r^{4}K_{\min}}{12}\left(\|v\|^{2}-\langle e_{1},v\rangle^{2}\right)+O(r^{5})

and similarly: \mathscr{D}(z,y)^{2}\leq r^{2}\left\|-\frac{1}{2}e_{1}+v\right\|^{2}-\frac{r^{4}K_{\min}}{12}\left(\|v\|^{2}-\langle e_{1},v\rangle^{2}\right)+O(r^{5}). We split the analysis into the case where K_{\min}\geq 0 and where K_{\min}<0.

Case K_{\min}\geq 0 (Positive Sectional Curvature): If K_{\min}\geq 0 then the terms proportional to r^{4} are non-positive, and hence may be dropped. We get:

\mathscr{D}(x,z)^{2}\leq r^{2}\left\|\frac{1}{2}e_{1}+v\right\|^{2}+O(r^{5})\text{ and }\mathscr{D}(z,y)^{2}\leq r^{2}\left\|-\frac{1}{2}e_{1}+v\right\|^{2}+O(r^{5}),

and hence \mathscr{D}(x,z)^{p}+\mathscr{D}(z,y)^{p}\leq r^{p}\left(\left\|\frac{1}{2}e_{1}+v\right\|^{p}+\left\|-\frac{1}{2}e_{1}+v\right\|^{p}\right)+O(r^{p+3}). Thus, for r sufficiently small we may guarantee that

\mathscr{D}(x,z)^{p}+\mathscr{D}(z,y)^{p}\leq r^{p}=\mathscr{D}(x,y)^{p}   (27)

by ensuring \left\|\frac{1}{2}e_{1}+v\right\|^{p}+\left\|-\frac{1}{2}e_{1}+v\right\|^{p}\leq 1-\alpha, where \alpha\in(0,1) is such that the O(r^{p+3}) term is less than \alpha r^{p}. As r\to 0 with n\to\infty, we observe that \alpha=o(1).

Case K_{\min}<0 (Negative Sectional Curvature): If K_{\min}<0 then one can upper bound the terms proportional to r^{4} by -r^{4}K_{\min}/12, using 0\leq\|v\|^{2}-\langle e_{1},v\rangle^{2}\leq 1 for v\in B(0,1), to obtain:

\mathscr{D}(x,z)^{2}\leq r^{2}\left\|\frac{1}{2}e_{1}+v\right\|^{2}-\frac{r^{4}K_{\min}}{12}+O(r^{5}),
\mathscr{D}(z,y)^{2}\leq r^{2}\left\|-\frac{1}{2}e_{1}+v\right\|^{2}-\frac{r^{4}K_{\min}}{12}+O(r^{5}),

and so:

\mathscr{D}(x,z)^{p}+\mathscr{D}(z,y)^{p}\leq r^{p}\left(\left\|\frac{1}{2}e_{1}+v\right\|^{p}+\left\|-\frac{1}{2}e_{1}+v\right\|^{p}\right)
\quad+r^{p+2}\frac{-K_{\min}}{12}\left(\left\|\frac{1}{2}e_{1}+v\right\|^{p-2}+\left\|-\frac{1}{2}e_{1}+v\right\|^{p-2}\right)+O(r^{p+3}).   (28)

As in the positive sectional curvature case, one can guarantee:

\mathscr{D}(x,z)^{p}+\mathscr{D}(z,y)^{p}\leq r^{p}=\mathscr{D}(x,y)^{p}   (29)

by ensuring

\left\|\frac{1}{2}e_{1}+v\right\|^{p}+\left\|-\frac{1}{2}e_{1}+v\right\|^{p}\leq 1-\alpha   (30)

where \alpha\in(0,1) is such that the O(r^{p+2}) term is less than \alpha r^{p}. Again, we observe that \alpha=o(1). Note that if (30) holds with \alpha<1 then \left\|\frac{1}{2}e_{1}+v\right\|^{p-2}+\left\|-\frac{1}{2}e_{1}+v\right\|^{p-2}<1, and so (28) becomes:

\mathscr{D}(x,z)^{p}+\mathscr{D}(z,y)^{p}\leq(1-\alpha)r^{p}+\frac{-K_{\min}}{12}r^{p+2}+O(r^{p+3})\leq r^{p},   (31)
where the final inequality follows from the choice of \alpha, so that (29) holds.

For both cases, consider the pp-elongated set defined in the tangent space:

\mathcal{D}_{1-\alpha,p}:=\mathcal{D}_{1-\alpha,p}\left(\frac{1}{2}e_{1},-\frac{1}{2}e_{1}\right)=\left\{v\in T_{x_{M}}\mathcal{M}\ \bigg|\ \left\|\frac{1}{2}e_{1}+v\right\|^{p}+\left\|-\frac{1}{2}e_{1}+v\right\|^{p}\leq 1-\alpha\right\}

as well as its scaled image under the exponential map, \exp\left(r\mathcal{D}_{1-\alpha,p}\right). From the above arguments, (27) (resp. (29)) will hold as long as z\in\exp\left(r\mathcal{D}_{1-\alpha,p}\right). By Lemma 3.6, as long as n is sufficiently large that 1-\alpha>2^{1-p}, we have B\left(0,\sqrt{\frac{(1-\alpha)^{2/p}}{4^{1/p}}-\frac{1}{4}}\right)\subset\mathcal{D}_{1-\alpha,p}, so B(0,r^{\star})\subset r\mathcal{D}_{1-\alpha,p} where r^{\star}:=r\sqrt{\frac{(1-\alpha)^{2/p}}{4^{1/p}}-\frac{1}{4}}. Hence:

B_{\mathscr{D}}(x_{M},r^{\star})=\exp\left(B(0,r^{\star})\right)\subset\exp\left(r\mathcal{D}_{1-\alpha,p}\right)\subset B_{\mathscr{D}}(x_{M},r).   (32)
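For instance, when p=2 and \alpha is negligible, r^{\star}\approx r\sqrt{1/2-1/4}=r/2, so the geodesic ball of half the radius centered at the midpoint x_{M} already lies inside \exp\left(r\mathcal{D}_{1-\alpha,p}\right).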

As in Theorem 3.9:

\mathbb{P}\left[x_{i_{j}}\in\exp\left(r\mathcal{D}_{1-\alpha,p}\right)\ \big|\ x_{i_{j}}\in B_{\mathscr{D}}(x,r)\right] =\frac{\mathbb{P}\left[x_{i_{j}}\in\exp\left(r\mathcal{D}_{1-\alpha,p}\right)\right]}{\mathbb{P}\left[x_{i_{j}}\in B_{\mathscr{D}}(x,r)\right]}
\geq\frac{\mathbb{P}\left[x_{i_{j}}\in B_{\mathscr{D}}(x_{M},r^{\star})\right]}{\mathbb{P}\left[x_{i_{j}}\in B_{\mathscr{D}}(x,r)\right]}\quad\text{(using (32))}
\geq\frac{f_{\min}V_{\min}(r^{\star})}{f_{\max}V_{\max}(r)}.\quad\text{(by definition of $V_{\min}$ and $V_{\max}$)}

Using (23), (24) and r=o(1):

\frac{V_{\min}(r^{\star})}{V_{\max}(r)}=\left(1-o(1)\right)\frac{\left(r^{\star}\right)^{d}}{r^{d}}=\left(1-o(1)\right)\left(\frac{(1-\alpha)^{2/p}}{4^{1/p}}-\frac{1}{4}\right)^{d/2}.

Recalling that \alpha=o(1):

\mathbb{P}\left[x_{i_{j}}\in\exp\left(r\mathcal{D}_{1-\alpha,p}\right)\ \big|\ x_{i_{j}}\in B_{\mathscr{D}}(x,r)\right]\geq\left(1-o(1)\right)\frac{f_{\min}}{f_{\max}}\left(\frac{(1-o(1))^{2/p}}{4^{1/p}}-\frac{1}{4}\right)^{d/2}=:\varepsilon_{\mathcal{M},p,f}.

As in Theorem 3.9, for k\geq\frac{3\log n}{-\log(1-\varepsilon_{\mathcal{M},p,f})},

\mathbb{P}\left[\exists j\text{ with }x_{i_{j}}\in\exp\left(r\mathcal{D}_{1-\alpha,p}\right)\right]=1-\mathbb{P}\left[\not\exists j\text{ with }x_{i_{j}}\in\exp\left(r\mathcal{D}_{1-\alpha,p}\right)\right]\geq 1-\frac{1}{n^{3}}.   (33)

If x_{i_{j}}\in\exp\left(r\mathcal{D}_{1-\alpha,p}\right) then from (27) (resp. (29)):

\mathscr{D}(x,x_{i_{j}})^{p}+\mathscr{D}(x_{i_{j}},y)^{p}\leq r^{p}=\mathscr{D}(x,y)^{p}   (34)

and so \{x,y\} is not critical. Thus, \{x,y\} is not critical with probability exceeding 1-\frac{1}{n^{3}}. These edges \{x,y\} are precisely those contained in \mathcal{G}^{p}_{\mathcal{M},\mathcal{X}} but not in \mathcal{G}^{p,k}_{\mathcal{M},\mathcal{X}}. There are fewer than n(n-1)/2 such non-kNN pairs x,y\in\mathcal{X}. By the union bound and (33), we conclude that none of these are critical with probability greater than 1-\frac{n(n-1)}{2}\frac{1}{n^{3}}\geq 1-\frac{1}{2n}. This was conditioned on \ell_{\mathcal{M},p}(x,y)\leq\text{inj}(\mathcal{M}) for all x,y\in\mathcal{X}, which holds with probability at least 1-\frac{1}{2n}. Thus, all critical edges are contained in \mathcal{G}^{p,k}_{\mathcal{M},\mathcal{X}} with probability exceeding 1-\left(\frac{1}{2n}+\frac{1}{2n}\right)=1-\frac{1}{n}. Unpacking \varepsilon_{\mathcal{M},p,f} yields the claimed lower bound on k. ∎
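The spanner guarantee is straightforward to probe numerically. The following is a minimal Python sketch (an illustration only, not one of the experiments reported in this paper) that checks the 1-spanner property on the unit sphere S^{2}\subset\mathbb{R}^{3} with uniform sampling; the sample size n, the power p, and the choice k=\lceil 12\log n\rceil (the d=2, p=2, uniform-density instantiation of (26)) are illustrative assumptions, and geodesic distances are computed in closed form via the arc-cosine.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

rng = np.random.default_rng(0)
n, p = 500, 2.0
k = int(np.ceil(12 * np.log(n)))      # d=2, p=2, uniform-density instantiation of (26)

# Sample n points uniformly from the unit sphere S^2 in R^3.
X = rng.standard_normal((n, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Great-circle distances play the role of D(x_i, x_j); W holds the edge weights D^p.
G = np.arccos(np.clip(X @ X.T, -1.0, 1.0))
np.fill_diagonal(G, 0.0)
W = G ** p

# PWSPD (raised to the power p) on the complete graph and on the symmetrized kNN subgraph.
ell_full = shortest_path(W, method="D", directed=False)
nbrs = np.argsort(G, axis=1)[:, 1:k + 1]       # k nearest neighbors, excluding the point itself
W_knn = np.zeros_like(W)
for i in range(n):
    W_knn[i, nbrs[i]] = W[i, nbrs[i]]
    W_knn[nbrs[i], i] = W[nbrs[i], i]
ell_knn = shortest_path(csr_matrix(W_knn), method="D", directed=False)

# If the kNN graph is a 1-spanner of the complete graph, the two distance matrices agree.
print("max discrepancy:", np.abs(ell_full - ell_knn).max())

A nonzero discrepancy would indicate a critical edge missing from the kNN graph.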

Appendix F Estimating the Fluctuation Exponent

Table 3 shows confidence interval estimates for \chi obtained by computing \tilde{\ell}_{p} in a sparse graph.

d   p     \hat{\chi}   CI for \chi        d   p     \hat{\chi}   CI for \chi        d   p     \hat{\chi}   CI for \chi
2   1.5   0.30         (0.28, 0.32)       3   1.5   0.28         (0.20, 0.36)       4   1.5   0.19         (0.03, 0.36)
2   2     0.31         (0.30, 0.32)       3   2     0.23         (0.20, 0.25)       4   2     0.16         (0.13, 0.19)
2   4     0.33         (0.31, 0.34)       3   4     0.24         (0.22, 0.25)       4   4     0.14         (0.11, 0.18)
2   8     0.34         (0.32, 0.37)       3   8     0.29         (0.27, 0.32)       4   8     0.19         (0.14, 0.23)
Table 3: Confidence interval estimates of \chi for uniform data, for different density weightings p and different dimensions d.
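The exact protocol used to produce Table 3 is not reproduced here. Purely as an illustration, the following hypothetical Python sketch estimates a fluctuation exponent from the classical first-passage-percolation scaling, in which the standard deviation of the path cost between endpoints at separation s behaves like s^{\chi}; it uses the raw PWSPD \ell_{p} (rather than the normalized \tilde{\ell}_{p}), samples points from a homogeneous Poisson process, and all parameter values (intensity, k, the separations, and the number of trials) are arbitrary choices for this sketch.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
d, p, k = 2, 2.0, 15                      # illustrative choices
intensity = 200.0                         # expected samples per unit volume
separations = np.array([2.0, 3.0, 4.0, 6.0, 8.0])
trials = 30

def pwspd(points, src, dst, p, k):
    """PWSPD ell_p between rows src and dst of `points`, on a symmetrized kNN graph."""
    n = len(points)
    dists, nbrs = cKDTree(points).query(points, k=k + 1)   # column 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    W = csr_matrix((dists[:, 1:].ravel() ** p, (rows, nbrs[:, 1:].ravel())), shape=(n, n))
    W = W.maximum(W.T)                                     # symmetrize the kNN graph
    return dijkstra(W, indices=[src])[0, dst] ** (1.0 / p)

stds = []
for s in separations:
    box = np.array([s + 2.0] + [2.0] * (d - 1))            # thin box containing the two endpoints
    vals = []
    for _ in range(trials):
        pts = rng.uniform(size=(rng.poisson(intensity * box.prod()), d)) * box
        endpoints = np.ones((2, d))
        endpoints[1, 0] += s                               # endpoints at separation s along e_1
        # At this sampling density the kNN graph is connected with high probability.
        vals.append(pwspd(np.vstack([endpoints, pts]), 0, 1, p, k))
    stds.append(np.std(vals))

# Fluctuation exponent estimate: slope of log(std) against log(separation).
chi_hat = np.polyfit(np.log(separations), np.log(stds), 1)[0]
print("estimated chi:", chi_hat)

A confidence interval for \chi could then be obtained from the regression (or by bootstrapping over trials), though, again, this is not the procedure behind Table 3.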