
HoroPCA: Hyperbolic Dimensionality Reduction via
Horospherical Projections

Ines Chami*, Albert Gu*, Dat Nguyen, Christopher Ré (*Equal contribution.)
Abstract

This paper studies Principal Component Analysis (PCA) for data lying in hyperbolic spaces. Given directions, PCA relies on: (1) a parameterization of subspaces spanned by these directions, (2) a method of projection onto subspaces that preserves information in these directions, and (3) an objective to optimize, namely the variance explained by projections. We generalize each of these concepts to the hyperbolic space and propose HoroPCA, a method for hyperbolic dimensionality reduction. By focusing on the core problem of extracting principal directions, HoroPCA theoretically better preserves information in the original data such as distances, compared to previous generalizations of PCA. Empirically, we validate that HoroPCA outperforms existing dimensionality reduction methods, significantly reducing error in distance preservation. As a data whitening method, it improves downstream classification by up to 3.9% compared to methods that don’t use whitening. Finally, we show that HoroPCA can be used to visualize hyperbolic data in two dimensions.

1 Introduction

Learning representations of data in hyperbolic spaces has recently attracted significant interest in Machine Learning (ML) [25, 27] due to their ability to represent hierarchical data with high fidelity in low dimensions [28]. Many real-world datasets exhibit hierarchical structures, and hyperbolic embeddings have led to state-of-the-art results in applications such as question answering [30], node classification [5], link prediction [1, 7] and word embeddings [32]. These developments motivate the need for algorithms that operate in hyperbolic spaces, such as nearest neighbor search [21, 35], hierarchical clustering [24, 6], or dimensionality reduction, which is the focus of this work.

Euclidean Principal Component Analysis (PCA) is a fundamental dimensionality reduction technique which seeks directions that best explain the original data. PCA is an important primitive in data analysis and has many important uses such as (i) dimensionality reduction (e.g. for memory efficiency), (ii) data whitening and pre-processing for downstream tasks, and (iii) data visualization.

Here, we seek a generalization of PCA to hyperbolic geometry. Given a core notion of directions, PCA involves the following ingredients:

  1. A nested sequence of affine subspaces (flags) spanned by a set of directions.

  2. A projection method which maps points to these subspaces while preserving information (e.g. dot-product) along each direction.

  3. A variance objective to help choose the best directions.

Figure 1: (a) Euclidean projections. (b) Hyperbolic geodesic projections. (c) Hyperbolic horospherical projections. Given datapoints (black dots), Euclidean and horospherical projections preserve distance information across different subspaces (black lines) pointing towards the same direction or point at infinity, while geodesic projections do not. (a): Distances between points, and therefore the explained variance, are invariant to translations along orthogonal directions (red lines). (b): Geodesic projections do not preserve this property: distances between green projections are not the same as distances between blue projections. (c): Horospherical projections project data by sliding it along the complementary direction defined by horospheres (red circles), and there exists an isometric mapping between the blue and green projections.

The PCA algorithm is then defined as combining these primitives: given a dataset, it chooses directions that maximize the variance of projections onto a subspace spanned by those directions, so that the resulting sequence of directions optimally explains the data. Crucially, the algorithm only depends on the directions of the affine subspaces and not their locations in space (Fig. 1(a)). Thus, in practice we can assume that they all go through the origin (and hence become linear subspaces), which greatly simplifies computations.

Generalizing PCA to manifolds is a challenging problem that has been studied for decades, starting with Principal Geodesic Analysis (PGA) [10], which parameterizes subspaces using tangent vectors at the mean of the data and maximizes distances from projections to the mean to find optimal directions. More recently, the Barycentric Subspace Analysis (BSA) method [26] was introduced. It finds a more general parameterization of nested sequences of submanifolds by minimizing the unexplained variance. However, both PGA and BSA map points onto submanifolds using closest-point or geodesic projections, which do not attempt to preserve information along principal directions; for example, they cannot isometrically preserve hyperbolic distances between any pair of points, and they shrink all path lengths by an exponential factor (Proposition 3.5).

Fundamentally, all previous methods only look for subspaces rather than directions that explain the data, and can perhaps be better understood as principal subspace analysis rather than principal component analysis. Like with PCA, they assume that all optimal subspaces go through a chosen base point, but unlike in the Euclidean setting, this assumption is now unjustified: “translating” the submanifolds does not preserve distances between projections (Fig. 1(b)). Furthermore, the dependence on the base point is sensitive: as noted above, the shrink factor of the projection depends exponentially on the distances between the subspaces and the data. Thus, having to choose a base point increases the number of necessary parameters and reduces stability.

Here, we propose HoroPCA, a dimensionality reduction method for data defined in hyperbolic spaces which better preserves the properties of Euclidean PCA. We show how to interpret directions using points at infinity (or ideal points), which then allows us to generalize core properties of PCA:

  1. To generalize the notion of affine subspace, we propose parameterizing geodesic subspaces as the sets spanned by these ideal points. This yields multiple viable nested subspaces (flags) (Section 3.1).

  2. To maximally preserve information in the original data, we propose a new projection method that uses horospheres, a generalization of complementary directions for hyperbolic space. In contrast with geodesic projections, these projections exactly preserve information – specifically, distances to ideal points – along each direction. Consequently, they preserve distances between points much better than geodesic projections (Section 3.2).

  3. Finally, we introduce a simple generalization of explained variance that is a function of distances only and can be computed in hyperbolic space (Section 3.3).

Combining these notions, we propose an algorithm that seeks a sequence of principal components that best explain variations in hyperbolic data. We show that this formulation retains the location-independence property of PCA: translating target submanifolds along orthogonal directions (horospheres) preserves projected distances (Fig. 1(c)). In particular, the algorithm’s objective depends only on the directions and not locations of the submanifolds (Section 4).

We empirically validate HoroPCA on real datasets and for three standard PCA applications. First, (i) we show that it yields much lower distortion and higher explained variance than existing methods, reducing average distortion by up to 77%. Second, (ii) we validate that it can be used for data pre-processing, improving downstream classification by up to 3.8% in Average Precision score compared to methods that don’t use whitening. Finally, (iii) we show that the low-dimensional representations learned by HoroPCA can be visualized to qualitatively interpret hyperbolic data.

2 Background

We first review some basic notions from hyperbolic geometry; a more in-depth treatment is available in standard texts [22]. We discuss the generalization of coordinates and directions in hyperbolic space and then review geodesic projections. We finally describe generalizations of the notion of mean and variance to non-Euclidean spaces.

2.1 The Poincaré Model of Hyperbolic Space

Hyperbolic geometry is a Riemannian geometry with constant negative curvature $-1$, where curvature measures deviation from flat Euclidean geometry. For easier visualizations, we work with the $d$-dimensional Poincaré model of hyperbolic space: $\mathbb{H}^{d}=\{x\in\mathbb{R}^{d}:\|x\|<1\}$, where $\|\cdot\|$ is the Euclidean norm. In this model, the Riemannian distance can be computed in Cartesian coordinates by:

d_{\mathbb{H}}(x,y)=\cosh^{-1}\left(1+2\frac{\|x-y\|^{2}}{(1-\|x\|^{2})(1-\|y\|^{2})}\right). (1)
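The distance in Eq. 1 is straightforward to evaluate numerically. The following minimal Python sketch (an illustration, not the paper's released code) computes it for points given in Poincaré ball coordinates; later snippets in this document reuse it:

```python
import numpy as np

def poincare_distance(x, y):
    """Hyperbolic distance between points x, y in the Poincare ball (Eq. 1)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sq_diff = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return np.arccosh(1.0 + 2.0 * sq_diff / denom)
```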
Geodesics

Shortest paths in hyperbolic space are called geodesics. In the Poincaré model, they are represented by straight segments going through the origin and circular arcs perpendicular to the boundary of the unit ball (Fig. 2).

Geodesic submanifolds

A submanifold $M\subset\mathbb{H}^{d}$ is called (totally) geodesic if for every $x,y\in M$, the geodesic line connecting $x$ and $y$ belongs to $M$. This generalizes the notion of affine subspaces in Euclidean spaces. In the Poincaré model, geodesic submanifolds are represented by linear subspaces going through the origin and spherical caps perpendicular to the boundary of the unit ball.

              | Euclidean               | Hyperbolic
Component     | Unit vector $w$         | Ideal point $p$
Coordinate    | Dot product $x\cdot w$  | Busemann func. $B_{p}(x)$
Table 1: Analogies of components and their corresponding coordinates, in both Euclidean and hyperbolic space.

2.2 Directions in Hyperbolic space

The notions of directions and of coordinates in a given direction can be generalized to hyperbolic space as follows.

Ideal points

As with parallel rays in Euclidean spaces, geodesic rays in $\mathbb{H}^{d}$ that stay close to each other can be viewed as sharing a common endpoint at infinity, also called an ideal point. Intuitively, ideal points represent directions along which points in $\mathbb{H}^{d}$ can move toward infinity. The set of ideal points $\mathbb{S}_{\infty}^{d-1}$, called the boundary at infinity of $\mathbb{H}^{d}$, is represented by the unit sphere $\mathbb{S}_{\infty}^{d-1}=\{\|x\|=1\}$ in the Poincaré model. We abuse notation and say that a geodesic submanifold $M\subset\mathbb{H}^{d}$ contains an ideal point $p$ if the boundary of $M$ in $\mathbb{S}_{\infty}^{d-1}$ contains $p$.

Busemann coordinates

In Euclidean spaces, each direction can be represented by a unit vector $w$. The coordinate of a point $x$ in the direction of $w$ is simply the dot product $w\cdot x$. In hyperbolic geometry, directions can be represented by ideal points but dot products are not well-defined. Instead, we take a ray-based perspective: note that in Euclidean spaces, if we shoot a ray in the direction of $w$ from the origin, the coordinate $w\cdot x$ can be viewed as the normalized distance to infinity in the direction of that ray. In other words, as a point $y=tw$ (with $t>0$) moves toward infinity in the direction of $w$:

w\cdot x=\lim_{t\to\infty}\left(d(0,tw)-d(x,tw)\right).
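This limit is easy to verify numerically. The short check below (an illustrative snippet, not from the paper) compares the difference of distances against the dot product as $t$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3); w /= np.linalg.norm(w)   # a unit direction
x = rng.normal(size=3)                           # an arbitrary point
for t in [10.0, 100.0, 1000.0]:
    # d(0, tw) - d(x, tw) in Euclidean space, which should approach w . x
    approx = np.linalg.norm(t * w) - np.linalg.norm(x - t * w)
    print(t, approx, w @ x)
```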

This approach generalizes to other geometries: given a unit-speed geodesic ray $\gamma(t)$, the Busemann function $B_{\gamma}(x)$ of $\gamma$ is defined as:

B_{\gamma}(x)=\lim_{t\to\infty}\left(d(x,\gamma(t))-t\right).

(Note that compared to the above formula, the sign convention is flipped for historical reasons.) Up to an additive constant, this function only depends on the endpoint at infinity of the geodesic ray, and not the starting point $\gamma(0)$. Thus, given an ideal point $p$, we define the Busemann function $B_{p}(x)$ of $p$ to be the Busemann function of the geodesic ray that starts from the origin of the unit ball model and has endpoint $p$. Intuitively, it represents the coordinate of $x$ in the direction of $p$. In the Poincaré model, there is a closed formula:

B_{p}(x)=\ln\frac{\|p-x\|^{2}}{1-\|x\|^{2}}.
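This closed formula makes Busemann coordinates cheap to evaluate in the Poincaré ball. A minimal sketch (illustrative code, following the same NumPy conventions as the distance snippet above):

```python
import numpy as np

def busemann(p, x):
    """Busemann coordinate of x in the direction of the ideal point p (||p|| = 1)."""
    p, x = np.asarray(p, float), np.asarray(x, float)
    return np.log(np.sum((p - x) ** 2) / (1.0 - np.sum(x ** 2)))
```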
Horospheres

The level sets of Busemann functions $B_{p}(x)$ are called horospheres centered at $p$. In this sense, they resemble spheres, which are level sets of distance functions. However, intrinsically as Riemannian manifolds, horospheres have curvature zero and thus also exhibit many properties of planes in Euclidean spaces.

Every geodesic with endpoint $p$ is orthogonal to every horosphere centered at $p$. Given two horospheres with the same center, every orthogonal geodesic segment connecting them has the same length. In this sense, concentric horospheres resemble parallel planes in Euclidean spaces. In the Poincaré model, horospheres are Euclidean spheres that touch the boundary sphere $\mathbb{S}_{\infty}^{d-1}$ at their ideal centers (Fig. 2). Given an ideal point $p$ and a point $x$ in $\mathbb{H}^{d}$, there is a unique horosphere $S(p,x)$ passing through $x$ and centered at $p$.

2.3 Geodesic Projections

PCA uses orthogonal projections to project data onto subspaces. Orthogonal projections are usually generalized to other geometries as closest-point projections. Given a target submanifold $M$, each point $x$ in the ambient space is mapped to the closest point to it in $M$:

\pi^{\operatorname{G}}_{M}(x)=\operatorname*{argmin}_{y\in M}d_{\mathbb{H}}(x,y).

One can view $\pi^{\operatorname{G}}_{M}(\cdot)$ as the map that pushes each point $x$ along an orthogonal geodesic until it hits $M$. For this reason, it is also called geodesic projection. In the Poincaré model, these can be computed in closed form (see Appendix C).

2.4 Manifold Statistics

PCA relies on data statistics which do not generalize easily to hyperbolic geometry. One approach to generalizing the arithmetic mean is to notice that it is the minimizer of the sum of squared distances to the inputs. Motivated by this, the Fréchet mean [11] of a set of points $S$ in a Riemannian manifold $(M,d_{M})$ is defined as:

\mu_{M}(S)\coloneqq\operatorname*{argmin}_{y\in M}\sum_{x\in S}d_{M}(x,y)^{2}.

This definition only depends on the intrinsic distance of the manifold. For hyperbolic spaces, since squared distance functions are convex, $\mu(S)$ always exists and is unique. (For more general geometries, existence and uniqueness hold if the data is well-localized [20].) Analogously, the Fréchet variance is defined as:

\sigma^{2}_{M}(S)\coloneqq\frac{1}{|S|}\sum_{x\in S}d_{M}(x,\mu(S))^{2}. (2)
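For hyperbolic data, the Fréchet mean has no closed form and is typically found by iterative optimization. The sketch below (a simplified illustration, not the paper's implementation) estimates the Fréchet mean and variance of Eq. 2 in the Poincaré ball by directly minimizing the sum of squared distances, reusing poincare_distance from the earlier snippet; the reparameterization keeps the iterates inside the ball:

```python
import numpy as np
from scipy.optimize import minimize

def frechet_mean_and_variance(S):
    """Numerically estimate the Frechet mean and variance (Eq. 2) of points S
    (an array of shape n x d) in the Poincare ball."""
    S = np.asarray(S, float)
    to_ball = lambda u: u / (1.0 + np.sqrt(1.0 + u @ u))   # maps R^d into the unit ball
    objective = lambda u: sum(poincare_distance(x, to_ball(u)) ** 2 for x in S)
    result = minimize(objective, np.zeros(S.shape[1]))      # start at the origin
    mu = to_ball(result.x)
    return mu, objective(result.x) / len(S)
```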

We refer to [16] for a discussion on different intrinsic statistics in non-Euclidean spaces, and a study of their asymptotic properties.

Figure 2: Hyperbolic geodesics (black lines) going through an ideal point (in red), and horospheres (red circles) centered at that same ideal point. The hyperbolic lengths of geodesic segments between two horospheres are equal.

3 Generalizing PCA to the Hyperbolic Space

We now describe our approach to generalizing PCA to hyperbolic spaces. The starting point of HoroPCA is to pick $1\leq K\leq d$ ideal points $p_{1},\dots,p_{K}\in\mathbb{S}_{\infty}^{d-1}$ to represent $K$ directions in hyperbolic space (Section 2.2). Then, we generalize the core concepts of Euclidean PCA. In Section 3.1, we show how to generalize flags. In Section 3.2, we show how to project points onto the lower-dimensional submanifold spanned by a given set of directions, while preserving information along each direction. In Section 3.3, we introduce a variance objective to optimize and show that it is a function of the directions only.

3.1 Hyperbolic Flags

In Euclidean spaces, one can take the linear spans of more and more components to define a nested sequence of linear subspaces, called a flag. To generalize this to hyperbolic spaces, we first need to adapt the notion of linear/affine spans. Recall that geodesic submanifolds are generalizations of affine subspaces in Euclidean spaces.

Definition 3.1.

Given a set of points $S$ (that could be inside $\mathbb{H}^{d}$ or on the boundary sphere $\mathbb{S}_{\infty}^{d-1}$), the smallest geodesic submanifold of $\mathbb{H}^{d}$ that contains $S$ is called the geodesic hull of $S$ and denoted by $\operatorname{GH}(S)$.

Thus, given $K$ ideal points $p_{1},p_{2},\dots,p_{K}$ and a base point $b\in\mathbb{H}^{d}$, we can define a nested sequence of geodesic submanifolds $\operatorname{GH}(b,p_{1})\subset\operatorname{GH}(b,p_{1},p_{2})\subset\dots\subset\operatorname{GH}(b,p_{1},\dots,p_{K})$. This will be our notion of flags.

Remark 3.2.

The base point $b$ is only needed here for technical reasons, just like an origin $\mathbf{o}$ is needed to define linear spans in Euclidean spaces. We will see next that it does not affect the projection results or objectives (Theorem 4.1).

Remark 3.3.

We assume that none of $b,p_{1},\dots,p_{K}$ are in the geodesic hull of the other $K$ points. This is analogous to being linearly independent in Euclidean spaces.

3.2 Projections via Horospheres

In Euclidean PCA, points are projected to the subspaces spanned by the given directions in a way that preserves coordinates in those directions. We seek a projection method in hyperbolic spaces with a similar property.

Recall that coordinates are generalized by Busemann functions (Table 1), and that horospheres are level sets of Busemann functions. Thus, we propose a projection that preserves coordinates by moving points along horospheres. It turns out that this projection method also preserves distances better than the traditional geodesic projection.

As a toy example, we first show how the projection is defined in the $K=1$ case (i.e. projecting onto a geodesic) and why it tends to preserve distances well. We will then show how to use $K\geq 1$ ideal points simultaneously.

3.2.1 Projecting onto $K=1$ Directions

For $K=1$, we have one ideal point $p$ and base point $b$, and the geodesic hull $\operatorname{GH}(b,p)$ is just a geodesic $\gamma$. Our goal is to map every $x\in\mathbb{H}^{d}$ to a point $\pi^{\operatorname{H}}_{b,p}(x)$ on $\gamma$ that has the same Busemann coordinate in the direction of $p$:

B_{p}(x)=B_{p}(\pi^{\operatorname{H}}_{b,p}(x)).

Since level sets of $B_{p}(x)$ are horospheres centered at $p$, the above equation simply says that $\pi^{\operatorname{H}}_{b,p}(x)$ belongs to the horosphere $S(p,x)$ centered at $p$ and passing through $x$. Thus, we define:

\pi^{\operatorname{H}}_{b,p}(x)\coloneqq\gamma\cap S(p,x). (3)
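When the base point is taken to be the origin $\mathbf{o}$, this projection has a simple closed form: the geodesic $\operatorname{GH}(\mathbf{o},p)$ is the diameter $\{tp:-1<t<1\}$, on which $B_{p}(tp)=\ln\frac{1-t}{1+t}$, so matching Busemann coordinates gives $t=-\tanh(B_{p}(x)/2)$. A minimal sketch of this special case (a worked-out consequence of Eq. 3 assuming $b=\mathbf{o}$, not code from the paper; it reuses busemann from Section 2.2):

```python
import numpy as np

def horo_project_1d(p, x):
    """Horospherical projection of x onto the geodesic through the origin with
    ideal endpoint p (base point b = origin): the point of that geodesic whose
    Busemann coordinate B_p matches that of x."""
    t = -np.tanh(busemann(p, x) / 2.0)
    return t * np.asarray(p, float)
```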

Another important property that $\pi^{\operatorname{H}}_{b,p}(\cdot)$ shares with orthogonal projections in Euclidean spaces is that it preserves distances along a direction – lengths of geodesic segments that point to $p$ are preserved after projection (Fig. 3):

Proposition 3.4.

For any $x\in\mathbb{H}^{d}$, if $y\in\operatorname{GH}(x,p)$ then:

d_{\mathbb{H}}(\pi^{\operatorname{H}}_{b,p}(x),\pi^{\operatorname{H}}_{b,p}(y))=d_{\mathbb{H}}(x,y).
Proof.

This follows from the remark in Section 2.2 about horospheres: every geodesic going through $p$ is orthogonal to every horosphere centered at $p$, and every orthogonal geodesic segment connecting concentric horospheres has the same length (Fig. 2). In this case, the segments from $x$ to $y$ and from $\pi^{\operatorname{H}}_{b,p}(x)$ to $\pi^{\operatorname{H}}_{b,p}(y)$ are two such segments, connecting $S(p,x)$ and $S(p,y)$. ∎

Figure 3: $x^{\prime},y^{\prime}$ are horospherical (green) projections of $x,y$. Proposition 3.4 shows $d_{\mathbb{H}}(x^{\prime},y^{\prime})=d_{\mathbb{H}}(x,y)$. The distance between the two geodesic (blue) projections is smaller.

3.2.2 Projecting onto $K>1$ Directions

We now generalize the above construction to projections onto higher-dimensional submanifolds. We describe the main ideas here; Appendix A contains more details, including an illustration in the case $K=2$ (Fig. 5).

Fix a base point $b\in\mathbb{H}^{d}$ and $K>1$ ideal points $\{p_{1},\dots,p_{K}\}$. We want to define a map from $\mathbb{H}^{d}$ to $M\coloneqq\operatorname{GH}(b,p_{1},\dots,p_{K})$ that preserves the Busemann coordinates in the directions of $p_{1},\dots,p_{K}$, i.e.:

B_{p_{j}}(x)=B_{p_{j}}\left(\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}(x)\right)\text{ for every }j=1,\dots,K.

As before, the idea is to take the intersection with the horospheres centered at the $p_{j}$'s and passing through $x$:

\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}:\mathbb{H}^{d}\to M,\qquad x\mapsto M\cap S(p_{1},x)\cap\dots\cap S(p_{K},x).

It turns out that this intersection generally consists of two points instead of one. When that happens, one of them will be strictly closer to the base point $b$, and we define $\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}(x)$ to be that point.

As with Proposition 3.4, $\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}(\cdot)$ preserves distances along $K$-dimensional manifolds (A.10). In contrast, geodesic projections in hyperbolic spaces never preserve distances (except between points already in the target):

Proposition 3.5.

Let $M\subset\mathbb{H}^{d}$ be a geodesic submanifold. Then every geodesic segment of distance at least $r$ from $M$ gets at least $\cosh(r)$ times shorter under the geodesic projection $\pi^{\operatorname{G}}_{M}(\cdot)$ to $M$:

\operatorname{length}(\pi^{\operatorname{G}}_{M}(I))\leq\frac{1}{\cosh(r)}\operatorname{length}(I).

In particular, the shrink factor grows exponentially as the segment $I$ moves away from $M$.

The proof is in Appendix B.

Computation

Interestingly, horospherical projections can be computed without actually computing the horospheres. The key idea is that if we let $P=\operatorname{GH}(p_{1},\dots,p_{K})$ be the geodesic hull of the horospheres' centers, then the intersection $S(p_{1},x)\cap\dots\cap S(p_{K},x)$ is simply the orbit of $x$ under the rotations around $P$. (This is true for the same reason that spheres whose centers lie on the same axis must intersect along a circle around that axis.) Thus, $\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}(\cdot)$ can be viewed as the map that rotates $x$ around $P$ until it hits $M$. It follows that it can be computed by:

  1. Find the geodesic projection $c=\pi^{\operatorname{G}}_{P}(x)$ of $x$ onto $P$.

  2. Find the geodesic $\alpha$ on $M$ that is orthogonal to $P$ at $c$.

  3. Among the two points on $\alpha$ whose distance to $c$ equals $d_{\mathbb{H}}(x,c)$, return the one closer to $b$.

The detailed computations and proof that this recovers horospherical projections are provided in Appendix A.
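For intuition, the defining property of the projection (matching the Busemann coordinates of $x$ along each $p_{j}$) can also be solved for directly, which gives a simple, if less efficient, numerical alternative to the closed-form recipe above. The sketch below is such an illustration under the assumption that the base point is the origin $\mathbf{o}$, so that $M$ is the unit ball intersected with the linear span of the $p_{j}$; it is not the paper's implementation, and it reuses busemann from Section 2.2:

```python
import numpy as np
from scipy.optimize import least_squares

def horo_project(ideal_points, x):
    """Numerical sketch of the horospherical projection of x onto
    M = GH(o, p_1, ..., p_K), taking the base point b to be the origin o."""
    P = np.asarray(ideal_points, float)              # K x d array of ideal points
    V = np.linalg.qr(P.T)[0][:, :len(P)]             # orthonormal basis of span(p_1..p_K)
    target = np.array([busemann(p, x) for p in P])   # coordinates to be preserved

    def to_ball(u):                                  # smooth map from R^K onto M
        return (V @ u) / (1.0 + np.sqrt(1.0 + u @ u))

    residual = lambda u: np.array([busemann(p, to_ball(u)) for p in P]) - target
    # Starting from the base point (the origin), the solver tends to return the
    # coordinate-matching point on the base point's side of the spine P.
    sol = least_squares(residual, x0=np.zeros(len(P)))
    return to_ball(sol.x)
```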

3.3 Intrinsic Variance Objective

In Euclidean PCA, directions are chosen to maximally preserve information from the original data. In particular, PCA chooses directions that maximize the Euclidean variance of projected data. To generalize this to hyperbolic geometry, we define an analog of variance that is intrinsic, i.e. dependent only on the distances between data points. As we will see in Section 4, having an intrinsic objective helps make our algorithm location (or base point) independent.

The usual notion of Euclidean variance is the sum of squared distances to the mean of the projected datapoints. Generalizing this is challenging because non-Euclidean spaces do not have a canonical choice of mean. Previous works have generalized variance either by using the unexplained variance or the Fréchet variance. The former is the sum of squared residual distances to the projections, and thus avoids computing a mean. However, it is not intrinsic. The latter is intrinsic [10] but involves finding the Fréchet mean, which is not necessarily a canonical notion of mean and can only be computed by gradient descent.

Our approach uses the observation that in Euclidean space:

\sigma^{2}(S)=\frac{1}{n}\sum_{x\in S}\left\|x-\mu(S)\right\|^{2}=\frac{1}{2n^{2}}\sum_{x,y\in S}\|x-y\|^{2}.
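(The second equality follows by expanding the pairwise sum: $\sum_{x,y\in S}\|x-y\|^{2}=\sum_{x,y}\left(\|x\|^{2}-2x\cdot y+\|y\|^{2}\right)=2n\sum_{x}\|x\|^{2}-2\big\|\sum_{x}x\big\|^{2}=2n\sum_{x}\|x-\mu(S)\|^{2}$, so dividing by $2n^{2}$ recovers the variance.)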

Thus, we propose the following generalization of variance:

\sigma_{\mathbb{H}}^{2}(S)=\frac{1}{2n^{2}}\sum_{x,y\in S}d_{\mathbb{H}}(x,y)^{2}. (4)

This function agrees with the usual variance in Euclidean space, while being a function of distances only. Thus it is well defined in non-Euclidean space, is easily computed, and, as we will show next, has the desired invariance due to isometry properties of horospherical projections.
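Since Eq. 4 only needs pairwise hyperbolic distances, it is simple to evaluate. A minimal sketch (illustrative, reusing poincare_distance from Section 2.1):

```python
import numpy as np

def hyperbolic_variance(S):
    """Distance-based variance of Eq. 4 for points S (array of shape n x d)."""
    S = np.asarray(S, float)
    n = len(S)
    total = sum(poincare_distance(x, y) ** 2 for x in S for y in S)
    return total / (2.0 * n ** 2)
```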

4 HoroPCA

Section 3 formulated several simple primitives – including directions, flags, projections, and variance – in a way that is generalizable to hyperbolic geometry. We now revisit standard PCA, showing how it has a simple definition that combines these primitives using optimization. This directly leads to the full HoroPCA algorithm by simply using the hyperbolic analogs of these primitives.

Euclidean PCA

Given a dataset $S$ and a target dimension $K$, Euclidean PCA greedily finds a sequence of principal components $p_{1},\dots,p_{K}$ that maximizes the variance of orthogonal projections $\pi^{\operatorname{E}}_{\mathbf{o},p_{1},\dots,p_{k}}(\cdot)$ onto the linear subspaces spanned by these components (here $\mathbf{o}$ denotes the origin, and $\pi^{\operatorname{E}}_{\mathbf{o},p_{1},\dots,p_{k}}(\cdot)$ denotes the projection onto the affine span of $\{\mathbf{o},p_{1},\dots,p_{k}\}$, which is equivalent to the linear span of $\{p_{1},\dots,p_{k}\}$):

p_{1}=\operatorname*{argmax}_{\|p\|=1}\sigma^{2}(\pi^{\operatorname{E}}_{\mathbf{o},p}(S))
\quad\text{and}\quad
p_{k+1}=\operatorname*{argmax}_{\|p\|=1}\sigma^{2}(\pi^{\operatorname{E}}_{\mathbf{o},p_{1},\dots,p_{k},p}(S)).

Thus, for each $1\leq k\leq K$, $\{p_{1},\dots,p_{k}\}$ is the optimal set of directions of dimension $k$.

HoroPCA

Because we have generalized the notions of flag, projection, and variance to hyperbolic geometry, the HoroPCA algorithm can be defined in the same fashion. Given a dataset $S$ in $\mathbb{H}^{d}$ and a base point $b\in\mathbb{H}^{d}$, we seek a sequence of $K$ directions that maximizes the variance of horosphere-projected data:

p_{1}=\operatorname*{argmax}_{p\in\mathbb{S}_{\infty}^{d-1}}\sigma^{2}_{\mathbb{H}}(\pi^{\operatorname{H}}_{b,p}(S))
\quad\text{and}\quad
p_{k+1}=\operatorname*{argmax}_{p\in\mathbb{S}_{\infty}^{d-1}}\sigma^{2}_{\mathbb{H}}(\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{k},p}(S)). (5)
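To make the greedy structure of Eq. 5 concrete, the sketch below optimizes each new ideal point by simple random search over the boundary sphere, reusing horo_project and hyperbolic_variance from the earlier snippets. This illustrates the objective only; it is not the paper's optimization procedure, and the function names and defaults are hypothetical:

```python
import numpy as np

def horopca_greedy(S, K, n_candidates=200, seed=0):
    """Greedy HoroPCA sketch (Eq. 5): each new ideal direction is the candidate
    unit vector maximizing the hyperbolic variance of the projected data."""
    rng = np.random.default_rng(seed)
    S = np.asarray(S, float)
    components = []
    for _ in range(K):
        best_p, best_var = None, -np.inf
        for _ in range(n_candidates):
            p = rng.normal(size=S.shape[1])
            p /= np.linalg.norm(p)                       # candidate ideal point on the boundary
            proj = np.array([horo_project(components + [p], x) for x in S])
            var = hyperbolic_variance(proj)
            if var > best_var:
                best_p, best_var = p, var
        components.append(best_p)
    return components                                    # list of K ideal points
```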
Base point independence

Finally, we show that algorithm (5) always returns the same results regardless of the choice of base point $b\in\mathbb{H}^{d}$. Since our variance objective only depends on the distances between projected data points, it suffices to show that these distances do not depend on $b$.

Theorem 4.1.

For any $b,b^{\prime}$ and any $x,y\in\mathbb{H}^{d}$, the two projected distances $d_{\mathbb{H}}(\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}(x),\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}(y))$ and $d_{\mathbb{H}}(\pi^{\operatorname{H}}_{b^{\prime},p_{1},\dots,p_{K}}(x),\pi^{\operatorname{H}}_{b^{\prime},p_{1},\dots,p_{K}}(y))$ are equal.

The proof is included in Appendix A. Thus, HoroPCA retains the location-independence property of Euclidean PCA: only the directions of target subspaces matter; their exact locations do not (Fig. 1). Therefore, just like in the Euclidean setting, we can assume without loss of generality that $b$ is the origin $\mathbf{o}$ of the Poincaré model. This:

  1. alleviates the need to use $d$ extra parameters to search for an appropriate base point, and

  2. simplifies computations, since in the Poincaré model, geodesic submanifolds that go through the origin are simply linear subspaces, which are easier to deal with.

After computing the principal directions, which span the target $M=\operatorname{GH}(\mathbf{o},p_{1},\dots,p_{K})$, the reduced-dimensionality data can be found by applying a Euclidean rotation that sends $M$ to $\mathbb{H}^{K}$, which also preserves hyperbolic distances.

5 Experiments


Method     | Balanced Tree          | Phylo Tree              | Diseases                | CS Ph.D.
           | dist.(↓)   var.(↑)     | dist.(↓)   var.(↑)      | dist.(↓)   var.(↑)      | dist.(↓)   var.(↑)
PCA        | 0.84       0.34        | 0.94       0.40         | 0.90       0.26         | 0.84       1.68
tPCA       | 0.70       1.16        | 0.63       14.34        | 0.63       3.92         | 0.56       11.09
PGA        | 0.63±0.07  2.11±0.47   | 0.64±0.01  15.29±0.51   | 0.66±0.02  3.16±0.39    | 0.73±0.02  6.14±0.60
PGA-Noise  | 0.87±0.08  0.29±0.30   | 0.64±0.02  15.08±0.77   | 0.88±0.04  0.53±0.19    | 0.79±0.03  4.58±0.64
BSA        | 0.50±0.00  3.02±0.01   | 0.61±0.03  18.60±1.16   | 0.52±0.02  5.95±0.25    | 0.70±0.01  8.15±0.96
BSA-Noise  | 0.74±0.12  1.06±0.67   | 0.68±0.02  13.71±0.72   | 0.80±0.11  1.62±1.30    | 0.79±0.02  4.41±0.59
hAE        | 0.26±0.00  6.91±0.00   | 0.32±0.04  45.87±3.52   | 0.18±0.00  14.23±0.06   | 0.37±0.02  22.12±2.47
hMDS       | 0.22       7.54        | 0.74       40.51        | 0.21       15.05        | 0.83       19.93
HoroPCA    | 0.19±0.00  7.15±0.00   | 0.13±0.01  69.16±1.96   | 0.15±0.01  15.46±0.19   | 0.16±0.02  36.79±0.70
Table 2: Dimensionality reduction results on 10-dimensional hyperbolic embeddings reduced to two dimensions, reporting average distortion (lower is better) and explained Fréchet variance (higher is better). Results are averaged over 5 runs for non-deterministic methods.

We now validate the empirical benefits of HoroPCA on three PCA uses. First, for dimensionality reduction, HoroPCA preserves information (distances and variance) better than previous methods which are sensitive to base point choices and distort distances more (Section 5.2). Next, we validate that our notion of hyperbolic coordinates captures variation in the data and can be used for whitening in classification tasks (Section 5.3). Finally, we visualize the representations learned by HoroPCA in two dimensions (Section 5.4).

5.1 Experimental Setup

Baselines

We compare HoroPCA to several dimensionality reduction methods, including: (1) Euclidean PCA, which should perform poorly on hyperbolic data, (2) Exact PGA, (3) Tangent PCA (tPCA), which approximates PGA by mapping the data to the tangent space at the Fréchet mean and then solving Euclidean PCA, (4) BSA, (5) Hyperbolic Multi-dimensional Scaling (hMDS) [27], which takes a distance matrix as input and recovers a configuration of points that best approximates these distances, and (6) a Hyperbolic autoencoder (hAE) trained with gradient descent [12, 15]. To demonstrate their dependence on base points, we also include two baselines that perturb the base point in PGA and BSA. We open-source our implementation (https://github.com/HazyResearch/HoroPCA) and refer to Appendix C for details on how we implemented all baselines and HoroPCA.

Datasets

For dimensionality reduction experiments, we consider standard hierarchical datasets previously used to evaluate the benefits of hyperbolic embeddings. More specifically, we use the datasets in [27], including a fully balanced tree, a phylogenetic tree, a biological graph of disease relationships, and a graph of Computer Science (CS) Ph.D. advisor-advisee relationships. These datasets have respectively 40, 344, 516 and 1025 nodes, and we use the code from [13] to embed them in the Poincaré ball. For data whitening experiments, we reproduce the experimental setup from [8] and use the Polbooks, Football and Polblogs datasets, which have 105, 115 and 1224 nodes respectively. These real-world networks are embedded in two dimensions using Chamberlain et al. [4]'s embedding method.

Evaluation metrics

To measure distance preservation after projection, we use the average distortion. If $\pi(\cdot)$ denotes a mapping from high- to low-dimensional representations, the average distortion of a dataset $S$ is computed as:

\frac{1}{\binom{|S|}{2}}\sum_{x\neq y\in S}\frac{|d_{\mathbb{H}}(\pi(x),\pi(y))-d_{\mathbb{H}}(x,y)|}{d_{\mathbb{H}}(x,y)}.

We also measure the Fréchet variance in Eq. 2, which is the analogue of the objective that Euclidean PCA optimizes. (All mentioned PCA methods, including HoroPCA, optimize for some form of variance, but not Fréchet variance or distortion.) Note that the mean in Eq. 2 cannot be computed in closed form, and we therefore compute it with gradient descent.
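The distortion metric is simple to compute given the two sets of representations. A minimal sketch (illustrative, reusing poincare_distance from Section 2.1):

```python
def average_distortion(S_high, S_low):
    """Average distortion between original points S_high and their reduced
    representations S_low (two arrays with matching row order)."""
    n = len(S_high)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d_orig = poincare_distance(S_high[i], S_high[j])
            d_proj = poincare_distance(S_low[i], S_low[j])
            total += abs(d_proj - d_orig) / d_orig
            count += 1
    return total / count
```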

5.2 Dimensionality Reduction

We report metrics for the reduction of 10-dimensional embeddings to two dimensions in Table 2, and refer to Appendix C for additional results, such as more component and dimension configurations. All results suggest that HoroPCA better preserves information contained in the high-dimensional representations.

On distance preservation, HoroPCA outperforms all methods with significant improvements on larger datasets. This supports our theoretical result that horospherical projections better preserve distances than geodesic projections. Furthermore, HoroPCA also outperforms existing methods on the explained Fréchet variance metric on all but one dataset. This suggests that our distance-based formulation of the variance (Eq. 4) effectively captures variations in the data. We also note that as expected, both PGA and BSA are sensitive to base point choices: adding Gaussian noise to the base point leads to significant drops in performance. In contrast, HoroPCA is by construction base-point independent.

5.3 Hyperbolic Data Whitening

An important use of PCA is for data whitening, as it allows practitioners to remove noise and decorrelate the data, which can improve downstream tasks such as regression or classification. Recall that standard PCA data whitening consists of (i) finding principal directions that explain the data, (ii) calculating the coordinates of each data point along these directions, and (iii) normalizing the coordinates for each direction (to have zero mean and unit variance).

Because of the close analogy between HoroPCA and Euclidean PCA, these steps can easily map to the hyperbolic case, where we (i) use HoroPCA to find principal directions (ideal points), (ii) calculate the Busemann coordinates along these directions, and (iii) normalize them as Euclidean coordinates. Note that this yields Euclidean representations, which allow leveraging powerful tools developed specifically for learning on Euclidean data.
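Concretely, the three whitening steps can be chained as follows (a sketch under the same assumptions as the earlier snippets, reusing horopca_greedy and busemann; the names and defaults are illustrative, not taken from the paper's code):

```python
import numpy as np

def horopca_whiten(S_train, S_test, K):
    """Hyperbolic data whitening sketch: (i) principal ideal directions via HoroPCA,
    (ii) Busemann coordinates along them, (iii) per-direction standardization."""
    components = horopca_greedy(S_train, K)
    coords = lambda S: np.array([[busemann(p, x) for p in components] for x in S])
    Z_train, Z_test = coords(S_train), coords(S_test)
    mean, std = Z_train.mean(axis=0), Z_train.std(axis=0)
    # The resulting features are Euclidean and can be fed to e.g. a linear SVM.
    return (Z_train - mean) / std, (Z_test - mean) / std
```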

We evaluate the benefit of this whitening step on a simple classification task. We compare to directly classifying the data with a Euclidean Support Vector Machine (eSVM) or its hyperbolic counterpart (hSVM), and also to whitening with tPCA. Note that most baselines in Section 5.1 are incompatible with data whitening: hMDS does not learn a transformation that can be applied to unseen test data, while methods like PGA and BSA do not naturally return Euclidean coordinates for us to normalize. To obtain another baseline, we use a logarithmic map to extract Euclidean coordinates from PGA.


              | Polbooks     | Football     | Polblogs
eSVM          | 69.9 ± 1.2   | 20.7 ± 3.0   | 92.3 ± 1.5
hSVM          | 68.3 ± 0.6   | 20.9 ± 2.5   | 92.2 ± 1.6
tPCA+eSVM     | 68.5 ± 0.9   | 21.2 ± 2.2   | 92.4 ± 1.5
PGA+eSVM      | 64.4 ± 4.1   | 21.7 ± 2.2   | 82.3 ± 1.2
HoroPCA+eSVM  | 72.2 ± 2.8   | 25.0 ± 1.0   | 92.8 ± 0.9
Table 3: Data whitening experiments. We report classification accuracy averaged over 5 embedding configurations.

We reproduce the experimental setup from [8], which splits the datasets into 50% train and 50% test sets, runs classification on 2-dimensional embeddings, and averages results over 5 different embedding configurations as was done in the original paper (Table 3). (Note that the results slightly differ from [8], which could be because of different implementations or data splits.) HoroPCA whitening improves downstream classification on all datasets compared to eSVM and hSVM, or to tPCA and PGA whitening. This suggests that HoroPCA can be leveraged for hyperbolic data whitening. Further, this confirms that Busemann coordinates do capture variations in the original data.

5.4 Visualizations

When learning embeddings for ML applications (e.g. classification), increasing the dimensionality can significantly improve the embeddings’ quality. To effectively work with these higher-dimensional embeddings, it is useful to visualize their structure and organization, which often requires reducing their representations to two or three dimensions. Here, we consider embeddings of the mammals subtree of the Wordnet noun hierarchy learned with the algorithm from [25]. We reduce embeddings to two dimensions using PGA and HoroPCA and show the results in Fig. 4. We also include more visualizations for PCA and BSA in Fig. 8 in the Appendix. As we can see, the reduced representations obtained with HoroPCA yield better visualizations. For instance, we can see some hierarchical patterns such as “feline hypernym of cat” or “cat hypernym of burmese cat”. These patterns are harder to visualize for other methods, since these do not preserve distances as well as HoroPCA, e.g. PGA has 0.534 average distortion on this dataset compared to 0.078 for HoroPCA.

Figure 4: Visualization of embeddings of the WordNet mammal subtree computed by reducing 10-dimensional Poincaré embeddings [25]. (a) PGA (average distortion: 0.534). (b) HoroPCA (average distortion: 0.078).

6 Related Work

PCA methods in Riemannian manifolds

We first review some approaches to extending PCA to general Riemannian geometries, of which hyperbolic geometry is a special case. For a more detailed discussion, see Pennec [26]. The simplest such approach is tangent PCA (tPCA), which maps the data to the tangent space at the Fréchet mean $\mu$ using the logarithm map, then applies Euclidean PCA. A similar approach, Principal Geodesic Analysis (PGA) [10], seeks geodesic subspaces at $\mu$ that minimize the sum of squared Riemannian distances to the data. Compared to tPCA, PGA searches through the same subspaces but uses a more natural loss function.

Both PGA and tPCA project on submanifolds that go through the Fréchet mean. When the data is not well-centered, this may be sub-optimal, and Geodesic PCA (GPCA) was proposed to alleviate this issue [17, 18]. GPCA first finds a geodesic $\gamma$ that best fits the data, then finds other orthogonal geodesics that go through some common point $b$ on $\gamma$. In other words, GPCA removes the constraint of PGA that $b$ is the Fréchet mean. Extensions of GPCA have been proposed, such as probabilistic methods [36] and Horizontal Component Analysis [29].

Pennec [26] proposes a more symmetric approach. Instead of using the exponential map at a base point, it parameterizes $K$-dimensional subspaces as the barycenter loci of $K+1$ points. Nested sequences of subspaces (flags) can be formed by simply adding more points. In hyperbolic geometry, this construction coincides with the one based on geodesic hulls that we use in Section 3, except that it applies to points inside $\mathbb{H}^{d}$ instead of ideal points and thus needs more parameters to parameterize a flag (see Remark A.8).

By considering a more general type of submanifold, Hauberg [14] gives another way to avoid the sensitive dependence on the Fréchet mean. However, its bigger search space also makes the method computationally expensive, especially when the target dimension $K$ is larger than $1$.

In contrast with all methods so far, HoroPCA relies on horospherical projections instead of geodesic projections. This yields a generalization of PCA that depends only on the directions and not specific locations of the subspaces.

Dimension reduction in hyperbolic geometry

We now review some dimension reduction methods proposed specifically for hyperbolic geometry. Cvetkovski & Crovella [9] and Sala et al. [27] are examples of hyperbolic multidimensional scaling methods, which seek configurations of points in lower-dimensional hyperbolic spaces whose pairwise distances best approximate a given dissimilarity matrix. Unlike HoroPCA, they do not learn a projection that can be applied to unseen data.

Tran & Vu [33] construct a map to lower-dimensional hyperbolic spaces whose preimages of compact sets are compact. Unlike most methods, it is data-agnostic and does not optimize any objective.

Benjamini & Makarychev [2] adapt the Euclidean Johnson–Lindenstrauss transform to hyperbolic geometry and obtain a distortion bound when the dataset size is not too big compared to the target dimension. They do not seek an analog of Euclidean directions or projections, but nevertheless implicitly use a projection method based on pushing points along horocycles, which shares many properties with our horospherical projections. In fact, the latter converges to the former as the ideal points get closer to each other.

From hyperbolic to Euclidean

Liu et al. [23] use “distances to centroids” to compute Euclidean representations of hyperbolic data. The Busemann functions we use bear resemblances to these centroid-based functions but are better analogs of coordinates along given directions, which is a central concept in PCA, and have better regularity properties [3]. Recent works have also used Busemann functions for hyperbolic prototype learning [19, 34]. These works do not define projections to lower-dimensional hyperbolic spaces. In contrast, HoroPCA naturally returns both hyperbolic representations (via horospherical projections) and Euclidean representations (via Busemann coordinates). This allows leveraging techniques in both settings.

7 Conclusion

We proposed HoroPCA, a method to generalize PCA to hyperbolic spaces. In contrast with previous PCA generalizations, HoroPCA preserves the core location-independence PCA property. Empirically, HoroPCA significantly outperforms previous methods on the reduction of hyperbolic data. Future extensions of this work include deriving a closed-form solution, analyzing the stability properties of HoroPCA, or using the concepts introduced in this work to derive efficient nearest neighbor search algorithms or neural network operations.

Acknowledgements

We gratefully acknowledge the support of NIH under No. U54EB020405 (Mobilize), NSF under Nos. CCF1763315 (Beyond Sparsity), CCF1563078 (Volume to Velocity), and 1937301 (RTML); ONR under No. N000141712266 (Unifying Weak Supervision); the Moore Foundation, NXP, Xilinx, LETI-CEA, Intel, IBM, Microsoft, NEC, Toshiba, TSMC, ARM, Hitachi, BASF, Accenture, Ericsson, Qualcomm, Analog Devices, the Okawa Foundation, American Family Insurance, Google Cloud, Swiss Re, Total, the HAI-AWS Cloud Credits for Research program, the Stanford Data Science Initiative (SDSI), and members of the Stanford DAWN project: Facebook, Google, and VMWare. The Mobilize Center is a Biomedical Technology Resource Center, funded by the NIH National Institute of Biomedical Imaging and Bioengineering through Grant P41EB027060. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views, policies, or endorsements, either expressed or implied, of NIH, ONR, or the U.S. Government.

References

  • Balazevic et al. [2019] Balazevic, I., Allen, C., and Hospedales, T. Multi-relational poincaré graph embeddings. In Advances in Neural Information Processing Systems, pp. 4463–4473, 2019.
  • Benjamini & Makarychev [2009] Benjamini, I. and Makarychev, Y. Dimension reduction for hyperbolic space. Proceedings of the American Mathematical Society, pp. 695–698, 2009.
  • Busemann [1955] Busemann, H. The geometry of geodesics. Academic Press Inc., New York, N. Y., 1955.
  • Chamberlain et al. [2017] Chamberlain, B., Clough, J., and Deisenroth, M. Neural embeddings of graphs in hyperbolic space. In CoRR. MLG Workshop 2017, 2017.
  • Chami et al. [2019] Chami, I., Ying, Z., Ré, C., and Leskovec, J. Hyperbolic graph convolutional neural networks. In Advances in neural information processing systems, pp. 4868–4879, 2019.
  • Chami et al. [2020a] Chami, I., Gu, A., Chatziafratis, V., and Ré, C. From trees to continuous embeddings and back: Hyperbolic hierarchical clustering. Advances in Neural Information Processing Systems, 33, 2020a.
  • Chami et al. [2020b] Chami, I., Wolf, A., Juan, D.-C., Sala, F., Ravi, S., and Ré, C. Low-dimensional hyperbolic knowledge graph embeddings. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp.  6901–6914, 2020b.
  • Cho et al. [2019] Cho, H., DeMeo, B., Peng, J., and Berger, B. Large-margin classification in hyperbolic space. In The 22nd International Conference on Artificial Intelligence and Statistics, pp.  1832–1840. PMLR, 2019.
  • Cvetkovski & Crovella [2011] Cvetkovski, A. and Crovella, M. Multidimensional scaling in the poincaré disk. arXiv preprint arXiv:1105.5332, 2011.
  • Fletcher et al. [2004] Fletcher, P. T., Conglin Lu, Pizer, S. M., and Sarang Joshi. Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Transactions on Medical Imaging, 23(8):995–1005, 2004.
  • Fréchet [1948] Fréchet, M. Les éléments aléatoires de nature quelconque dans un espace distancié. In Annales de l’institut Henri Poincaré, volume 10, pp. 215–310, 1948.
  • Ganea et al. [2018] Ganea, O.-E., Bécigneul, G., and Hofmann, T. Hyperbolic neural networks. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp.  5350–5360, 2018.
  • Gu et al. [2018] Gu, A., Sala, F., Gunel, B., and Ré, C. Learning mixed-curvature representations in product spaces. In International Conference on Learning Representations, 2018.
  • Hauberg [2016] Hauberg, S. Principal curves on riemannian manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(9):1915–1921, 2016. doi: 10.1109/TPAMI.2015.2496166.
  • Hinton & Salakhutdinov [2006] Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. science, 313(5786):504–507, 2006.
  • Huckemann & Eltzner [2020] Huckemann, S. and Eltzner, B. Statistical methods generalizing principal component analysis to non-euclidean spaces. In Handbook of Variational Methods for Nonlinear Geometric Data, pp.  317–338. Springer, 2020.
  • Huckemann & Ziezold [2006] Huckemann, S. and Ziezold, H. Principal component analysis for riemannian manifolds, with an application to triangular shape spaces. Advances in Applied Probability, 38(2):299–319, 2006.
  • Huckemann et al. [2010] Huckemann, S., Hotz, T., and Munk, A. Intrinsic shape analysis: Geodesic pca for riemannian manifolds modulo isometric lie group actions. Statistica Sinica, pp.  1–58, 2010.
  • Keller-Ressel [2020] Keller-Ressel, M. A theory of hyperbolic prototype learning. arXiv preprint arXiv:2010.07744, 2020.
  • Kendall [1990] Kendall, W. S. Probability, convexity, and harmonic maps with small image i: uniqueness and fine existence. Proceedings of the London Mathematical Society, 3(2):371–406, 1990.
  • Krauthgamer & Lee [2006] Krauthgamer, R. and Lee, J. R. Algorithms on negatively curved spaces. In 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), pp.  119–132. IEEE, 2006.
  • Lee [2013] Lee, J. M. Smooth manifolds. In Introduction to Smooth Manifolds, pp.  1–31. Springer, 2013.
  • Liu et al. [2019] Liu, Q., Nickel, M., and Kiela, D. Hyperbolic graph neural networks. In Advances in Neural Information Processing Systems, pp. 8230–8241, 2019.
  • Monath et al. [2019] Monath, N., Zaheer, M., Silva, D., McCallum, A., and Ahmed, A. Gradient-based hierarchical clustering using continuous representations of trees in hyperbolic space. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp.  714–722, 2019.
  • Nickel & Kiela [2017] Nickel, M. and Kiela, D. Poincaré embeddings for learning hierarchical representations. In Advances in neural information processing systems, pp. 6338–6347, 2017.
  • Pennec [2018] Pennec, X. Barycentric subspace analysis on manifolds. Annals of Statistics, 46(6A):2711–2746, 2018.
  • Sala et al. [2018] Sala, F., De Sa, C., Gu, A., and Ré, C. Representation tradeoffs for hyperbolic embeddings. In International Conference on Machine Learning, pp. 4460–4469, 2018.
  • Sarkar [2011] Sarkar, R. Low distortion delaunay embedding of trees in hyperbolic plane. In International Symposium on Graph Drawing, pp.  355–366. Springer, 2011.
  • Sommer [2013] Sommer, S. Horizontal dimensionality reduction and iterated frame bundle development. In International Conference on Geometric Science of Information, pp.  76–83. Springer, 2013.
  • Tay et al. [2018] Tay, Y., Tuan, L. A., and Hui, S. C. Hyperbolic representation learning for fast and efficient neural question answering. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp.  583–591, 2018.
  • Thurston [1978] Thurston, W. P. Geometry and topology of three-manifolds, 1978. Lecture notes. Available at http://library.msri.org/books/gt3m/.
  • Tifrea et al. [2018] Tifrea, A., Becigneul, G., and Ganea, O.-E. Poincare glove: Hyperbolic word embeddings. In International Conference on Learning Representations, 2018.
  • Tran & Vu [2008] Tran, D. A. and Vu, K. Dimensionality reduction in hyperbolic data spaces: Bounding reconstructed-information loss. In Seventh IEEE/ACIS International Conference on Computer and Information Science (icis 2008), pp.  133–139. IEEE, 2008.
  • Wang [2021] Wang, M.-X. Laplacian eigenspaces, horocycles and neuron models on hyperbolic spaces, 2021. URL https://openreview.net/forum?id=ZglaBL5inu.
  • Wu & Charikar [2020] Wu, X. and Charikar, M. Nearest neighbor search for hyperbolic embeddings. arXiv preprint arXiv:2009.00836, 2020.
  • Zhang & Fletcher [2013] Zhang, M. and Fletcher, T. Probabilistic principal geodesic analysis. In Advances in Neural Information Processing Systems, pp. 1178–1186, 2013.

Appendix A Horospherical Projection: Proofs and Discussions

We show that the horospherical projection $\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}$ in Section 3.2.2 is well-defined, shares many nice properties with Euclidean orthogonal projections, and has a closed-form expression.

More specifically, this section is organized as follows. In Section A.1, we show that horospherical projections are well-defined (Theorem A.4). In Section A.2, we show that they are base-point independent (A.7), non-expanding (A.12), and distance-preserving along a family of $K$-dimensional submanifolds (A.10). Finally, in Section A.4, we explain how they can be computed efficiently in the hyperboloid model.

For an illustration in the case of $K=2$ ideal points in $\mathbb{H}^{3}$, see Fig. 5. This figure might help explain the intuitions behind the theorems in Section A.1 and Section A.2.

A.1 Well-definedness

Recall that given a base point $b\in\mathbb{H}^{d}$ and $K>1$ ideal points $\{p_{1},\dots,p_{K}\}$, we would like to define $\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}$ by

x\to M\cap S(p_{1},x)\cap S(p_{2},x)\cap\dots\cap S(p_{K},x),

where $M=\operatorname{GH}(b,p_{1},\dots,p_{K})$ is the target submanifold and $S(p_{j},x)$ is the horosphere centered at $p_{j}$ and passing through $x$. For this definition to make sense, the intersection in the right hand side must contain exactly one point for each $x\in\mathbb{H}^{d}$. Unfortunately, this is not the case: in fact, the intersection generally consists of two points. Nevertheless, we will show that there is a consistent way to choose one from these two points, making the function $\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}$ well-defined. This is the result of Theorem A.4.

First, to understand the above intersection, we give a more concrete description of $\cap_{j}S(p_{j},x)$.

Lemma A.1.

Let $P=\operatorname{GH}(p_{1},p_{2},\dots,p_{K})$. Then for every $x\in\mathbb{H}^{d}$, the intersection of horospheres

S(x)=S(p_{1},x)\cap S(p_{2},x)\cap\dots\cap S(p_{K},x)

is precisely the orbit of $x$ under the group $G$ of rotations around $P$.

Proof.

First, note that every rotation around $P$ preserves the horospheres $S(p_{j},x)$ - just like how every rotation around an axis preserves every sphere whose center is on that axis. It follows that $S(x)$ is preserved by $G$. In particular, the orbit of $x$ under $G$ is contained in $S(x)$.

It remains to show that $S(x)$ contains no other points. To this end, consider any $y\neq x$ in $S(x)$. The perpendicular bisector $B$ of $x$ and $y$ is a totally geodesic hyperplane of $\mathbb{H}^{d}$ that contains every $p_{j}$ (because each $p_{j}$ is intuitively the center of a sphere that goes through $x$ and $y$). Thus, by the definition of geodesic hull, $B\supset P$. In particular, the reflection through $B$ sends $x$ to $y$ and fixes every point in $P$.

Now take any geodesic hyperplane $A$ that contains both $P$ and $y$, so that the reflection through $A$ fixes $y$ and every point in $P$. Then the composition of the reflections through $B$ and $A$ is a rotation that sends $x$ to $y$ and fixes every point in $P$. In other words, it is a rotation around $P$ that sends $x$ to $y$. Therefore, $y$ belongs to the orbit of $x$ under $G$. ∎

Corollary A.2.

If $x\in P$ then $S(x)=\{x\}$. Otherwise, let $\pi^{\operatorname{G}}_{P}(x)$ be the geodesic projection of $x$ onto $P$, and $Q(x)$ be the geodesic submanifold that orthogonally complements $P$ at $\pi^{\operatorname{G}}_{P}(x)$. Then $S(x)\subset Q(x)$ and is precisely the (hyper)sphere in $Q(x)$ that is centered at $\pi^{\operatorname{G}}_{P}(x)$ and passing through $x$.

Proof.

This follows from Lemma A.1. If $x\in P$ then every rotation around $P$ fixes $x$, so the orbit of $x$ is just itself.

Now consider the case $x\not\in P$. All rotations around $P$ must preserve $\pi^{\operatorname{G}}_{P}(x)$ and the orthogonal complement $Q(x)$ of $P$ at $\pi^{\operatorname{G}}_{P}(x)$. Furthermore, when restricted to the space $Q(x)$, these rotations are precisely the rotations in $Q(x)$ around the point $\pi^{\operatorname{G}}_{P}(x)$. Thus, for every $y\in Q(x)$, the orbit of $y$ under $G$ is a sphere in $Q(x)$ centered at $\pi^{\operatorname{G}}_{P}(x)$. In particular, $S(x)$, which is the orbit of $x$, is the sphere in $Q(x)$ that is centered at $\pi^{\operatorname{G}}_{P}(x)$ and passing through $x$. ∎

Corollary A.2 gives the following characterization of the intersection

M\cap S(p_{1},x)\cap S(p_{2},x)\cap\dots\cap S(p_{K},x)=M\cap S(x):

Note that $P=\operatorname{GH}(p_{1},\dots,p_{K})$ is a geodesic submanifold of $M=\operatorname{GH}(b,p_{1},\dots,p_{K})$ and that $\dim P=\dim M-1$. Thus, through every point $y\in P$, there is a unique geodesic $\alpha$ on $M$ that goes through $y$ and is perpendicular to $P$.

Corollary A.3.

If $x\in P$ then $M\cap S(x)=\{x\}$. Otherwise, let $\alpha$ be the geodesic on $M$ that goes through $\pi^{\operatorname{G}}_{P}(x)$ and is perpendicular to $P$. Then $M\cap S(x)$ consists of the two points on $\alpha$ whose distance to $\pi^{\operatorname{G}}_{P}(x)$ equals $d_{\mathbb{H}}(x,\pi^{\operatorname{G}}_{P}(x))$.

Proof.

The case $x\in P$ is clear since $x\in M$ and $S(x)=\{x\}$ by A.2.

For the other case, let $Q(x)$ be the orthogonal complement of $P$ at $\pi^{\operatorname{G}}_{P}(x)$. Then by A.2, $S(x)$ is precisely the sphere in $Q(x)$ that is centered at $\pi^{\operatorname{G}}_{P}(x)$ and passing through $x$.

Now note that $M\cap Q(x)=\alpha$. Since $S(x)\subset Q(x)$, this gives $M\cap S(x)=M\cap Q(x)\cap S(x)=\alpha\cap S(x)$. We know that every sphere intersects every geodesic through its center at exactly two points. ∎

Therefore, to define $\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}(x)$, we just need to choose one of the two points in $M\cap S(x)$ in a consistent way (so that the map is differentiable). To this end, note that $P$ cuts $M$ into two half-spaces, and exactly one of them contains the base point $b$. (Recall that $b\in M$ and $b\not\in P$ by the “independence” condition.) We denote this half by $P_{b}$. Then, while $S(x)$ contains two points in $M$, it only contains one point in $P_{b}$:

Theorem A.4.

Let α\alpha be the geodesic on MM that goes through πPG(x)\pi^{\operatorname{G}}_{P}(x) and is perpendicular to PP. Let α+\alpha^{+} be the half of α\alpha contained in PbP_{b}. Then α+\alpha^{+} intersects the sphere S(x)S(x) at a unique point xx^{\prime}, which is also the unique intersection point between PbP_{b} and S(x)S(x). Thus, we can define πb,p1,,pKH\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}} by

πb,p1,,pKH(x)=α+S(x)=PbS(x).\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}(x)=\alpha^{+}\cap S(x)=P_{b}\cap S(x).

Equivalently, πb,p1,,pKH(x)\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}(x) is the point in MS(x)M\cap S(x) that is strictly closer to bb.

Proof.

By A.2, S(x)S(x) is a sphere centered at πPG(x)\pi^{\operatorname{G}}_{P}(x). Then, since α+\alpha^{+} is a geodesic ray starting at πPG(x)\pi^{\operatorname{G}}_{P}(x), it must intersect S(x)S(x) at a unique point xx^{\prime}.

Next, we have

PbS(x)=PbMS(x)=PbαS(x)=α+S(x),P_{b}\cap S(x)=P_{b}\cap M\cap S(x)=P_{b}\cap\alpha\cap S(x)=\alpha^{+}\cap S(x),

where the first equality holds because PbMP_{b}\subset M, the second because MS(x)=αS(x)M\cap S(x)=\alpha\cap S(x) by the proof of A.3, and the third because Pbα=α+P_{b}\cap\alpha=\alpha^{+}. Thus, PbS(x)P_{b}\cap S(x) is precisely xx^{\prime}.

Finally, let x′′x^{\prime\prime} be the other point of MS(x)=αS(x)M\cap S(x)=\alpha\cap S(x). Then PP is the perpendicular bisector of xx^{\prime} and x′′x^{\prime\prime} in MM. Thus, every point on the same side of PP in MM as xx^{\prime} (but not on the boundary PP) is strictly closer to xx^{\prime} than to x′′x^{\prime\prime}. By definition, bb is one such point. Thus, πb,p1,,pKH(x)\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}(x) is the point in MS(x)M\cap S(x) that is closer to bb. ∎

Refer to caption
Figure 5: The horospherical projection πb,p1,p2H\pi^{\operatorname{H}}_{b,p_{1},p_{2}} from 3\mathbb{H}^{3} to 22 dimensions. Here p1,p2p_{1},p_{2} are ideal points, and the base point b3b\in\mathbb{H}^{3} is chosen to be the origin of the Poincaré ball. The geodesic hull GH(b,p1,p2)\operatorname{GH}(b,p_{1},p_{2}) is the hyperbolic plane bounded by the yellow circle. The geodesic between p1p_{1} and p2p_{2} is shown in blue and is called the spine P=GH(p1,p2)P=\operatorname{GH}(p_{1},p_{2}) of this projection. For any input x3x\in\mathbb{H}^{3}, the two horospheres S(p1,x)S(p_{1},x) and S(p2,x)S(p_{2},x) (shown in red) intersect along a circle S(x)S(x) (shown in green). Note that this circle is precisely the set {y3:Bpj(y)=Bpj(x),j=1,2}\{y\in\mathbb{H}^{3}:B_{p_{j}}(y)=B_{p_{j}}(x),j=1,2\} and is symmetric around the spine PP. It intersects GH(b,p1,p2)\operatorname{GH}(b,p_{1},p_{2}) at two points xx^{\prime} and x′′x^{\prime\prime}, which lie on opposite sides of the spine. Since xx^{\prime} belongs to the side containing bb, it is closer to bb than x′′x^{\prime\prime} is. Thus, we define πb,p1,p2H(x)\pi^{\operatorname{H}}_{b,p_{1},p_{2}}(x) to be xx^{\prime}.

A.2 Geometric properties

From Lemma A.1 and Theorem A.4, we obtain another interpretation of πb,p1,,pKH\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}: It maps d\mathbb{H}^{d} to PbMP_{b}\subset M by rotating every point xdx\in\mathbb{H}^{d} around PP until it hits PbP_{b}. In other words, we have

Theorem A.5 (The “open book” interpretation).

For any xPx\not\in P, let Mx=GH(P{x})M_{x}=\operatorname{GH}(P\cup\{x\}). Then PP cuts MxM_{x} into two half-spaces; we denote the half that contains xx by PxP_{x}. Then, when restricted to PxP_{x}, the map

πb,p1,,pKH:PxPb\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}:P_{x}\to P_{b}

is simply a rotation around PP.

Following this, we call PP the spine of the horosphere projection. The identity d=xPPx\mathbb{H}^{d}=\cup_{x\not\in P}P_{x} can be thought of as an open book decomposition of d\mathbb{H}^{d} into pages PxP_{x} that are bounded by the spine PP. The horosphere projection πb,p1,,pKH\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}} then simply acts by collapsing every page onto a specified page PbP_{b}.

Here are some consequences of this interpretation:

Corollary A.6.

πb,p1,,pKH\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}} only depends on the spine PP and not specifically on p1,pKp_{1},\dots p_{K}. Thus, when we are not interested in the specific ideal points, we simply write πb,PH\pi^{\operatorname{H}}_{b,P}.

Proof.

As noted above, πb,p1,,pKH(x)\pi^{\operatorname{H}}_{b,p_{1},\dots,p_{K}}(x) can be obtained by rotating xx around PP until it hits PbP_{b}. This operation does not use the exact positions of pjp_{j} at all. ∎

Corollary A.7.

The choice of bb does not affect the geometry of the projection πb,PH\pi^{\operatorname{H}}_{b,P}. More precisely, for any two base points b,bPb,b^{\prime}\not\in P, the horosphere projections

πb,PH:dPbandπb,PH:dPb\pi^{\operatorname{H}}_{b,P}:\mathbb{H}^{d}\to P_{b}\ \ \text{and}\ \ \pi^{\operatorname{H}}_{b^{\prime},P}:\mathbb{H}^{d}\to P_{b^{\prime}}

only differ by a rotation PbPbP_{b}\to P_{b^{\prime}} around PP.

Proof.

By Theorem A.5, when restricted to any page PxP_{x}, the maps πb,PH:dPb\pi^{\operatorname{H}}_{b,P}:\mathbb{H}^{d}\to P_{b} and πb,PH:dPb\pi^{\operatorname{H}}_{b^{\prime},P}:\mathbb{H}^{d}\to P_{b^{\prime}} are just rotations around PP. The difference between any two rotations around PP is another rotation around PP. ∎

In particular, A.7 implies Theorem 4.1 from the main text.

As discussed in Section 4, this theorem helps reduce the number of parameters and simplifies the computation of πb,PH\pi^{\operatorname{H}}_{b,P}.

Remark A.8.

A.6 and A.7 together imply that the HoroPCA algorithm (5) only depends on the geodesic hulls GH(p1),GH(p1,p2),GH(p1,,pK)\operatorname{GH}(p_{1}),\operatorname{GH}(p_{1},p_{2}),\dots\operatorname{GH}(p_{1},\dots,p_{K}) of the ideal points and not the specific ideal points themselves. It follows that, theoretically, the search space of (5) has dimension dK12K(K+1)dK-\frac{1}{2}K(K+1) – the same as the dimension of the space of flags in Euclidean spaces.

In our implementation, for simplicity we parametrize the KK ideal points independently, which results in a suboptimal search space of dimension (d1)K(d-1)K. Nevertheless, this is still slightly more efficient than the parametrizations used in PGA and BSA, which have dimension (d+1)K(d+1)K.

The following corollaries say that horospherical projections share a nice property with Euclidean orthogonal projections: When projecting to a KK-dimensional submanifold, they preserve the distances along KK dimensions and collapse the distances along the other dKd-K orthogonal dimensions:

Corollary A.9.

πb,PH\pi^{\operatorname{H}}_{b,P} is invariant under rotations around PP. In other words, if a rotation around PP takes xx to yy then πb,PH(x)=πb,PH(y)\pi^{\operatorname{H}}_{b,P}(x)=\pi^{\operatorname{H}}_{b,P}(y).

Consequently, every xPx\not\in P belongs to a (dK)(d-K)-dimensional submanifold that is collapsed to a point by πb,PH\pi^{\operatorname{H}}_{b,P}.

Proof.

The open book interpretation tells us that πb,PH(x)\pi^{\operatorname{H}}_{b,P}(x) and πb,PH(y)\pi^{\operatorname{H}}_{b,P}(y) are precisely the intersections of PbP_{b} with S(x)S(x) and S(y)S(y), respectively. If yy belongs to the rotation orbit S(x)S(x) of xx then the rotation orbit S(y)S(y) of yy is the same as S(x)S(x). Thus πb,PH(x)=πb,PH(y)\pi^{\operatorname{H}}_{b,P}(x)=\pi^{\operatorname{H}}_{b,P}(y).

Hence, for every xdx\in\mathbb{H}^{d}, S(x)S(x) is collapsed to a point by πb,PH\pi^{\operatorname{H}}_{b,P}. To see that dimS(x)=dK\dim S(x)=d-K when xPx\not\in P, recall that by A.2, if Q(x)Q(x) is the orthogonal complement of PP at πPG(x)\pi^{\operatorname{G}}_{P}(x) then S(x)S(x) is a hypersphere inside Q(x)Q(x). Since the ideal points pjp_{j} are assumed to be “affinely independent,” we have dimP=K1\dim P=K-1, so dimQ(x)=d(K1)\dim Q(x)=d-(K-1), and dimS(x)=dimQ(x)1=dK\dim S(x)=\dim Q(x)-1=d-K. ∎

Corollary A.10.

For every xdx\in\mathbb{H}^{d}, there exists a KK-dimensional totally geodesic submanifold (with boundary) that contains xx and is mapped isometrically to PbP_{b} by πb,PH\pi^{\operatorname{H}}_{b,P}. If xPx\not\in P then such a manifold is unique.

Proof.

If xPx\not\in P then the submanifold PxP_{x} in Theorem A.5 is a geodesic submanifold that contains xx and is mapped isometrically to PbP_{b} by πb,PH\pi^{\operatorname{H}}_{b,P}. As in the proof of A.9, we have dimP=K1\dim P=K-1 and dimPx=dimP+1=K\dim P_{x}=\dim P+1=K.

Since A.9 implies that the other dKd-K dimensions are collapsed by πb,PH\pi^{\operatorname{H}}_{b,P}, no other distances from xx can be preserved. Thus, PxP_{x} is the unique submanifold with the desired properties.

If xPx\in P then xPyx\in P_{y} for every yPy\not\in P. Note that this means every distance from xx is preserved by πb,PH\pi^{\operatorname{H}}_{b,P}. ∎

The following corollaries say that, like Euclidean orthogonal projections, horospherical projections never increase distances. Thus, minimizing distortion is roughly equivalent to maximizing projected distances. This is another motivation for Eq. 4.

Corollary A.11.

For every xdx\in\mathbb{H}^{d} and every tangent vector v\vec{v} at xx,

πb,PH(v)v.\|\pi^{\operatorname{H}}_{b,P}(\vec{v})\|_{\mathbb{H}}\leq\|\vec{v}\|_{\mathbb{H}}.
Proof.

This follows from A.10 and A.9:

If xPx\in P then the proof of A.10 implies that πb,PH\pi^{\operatorname{H}}_{b,P} preserves every distance from xx. Thus, the desired inequality is actually an equality.

If xPx\not\in P then v\vec{v} has an orthogonal decomposition v=u+u\vec{v}=\vec{u}+\vec{u}^{\perp}, where u\vec{u} and u\vec{u}^{\perp} are tangent and perpendicular to PxP_{x}, respectively. By A.10 and A.9, πb,PH\pi^{\operatorname{H}}_{b,P} preserves the length of u\vec{u} while collapsing u\vec{u}^{\perp} to 0. It follows that πb,PH(v)=uv\|\pi^{\operatorname{H}}_{b,P}(\vec{v})\|_{\mathbb{H}}=\|\vec{u}\|_{\mathbb{H}}\leq\|\vec{v}\|_{\mathbb{H}}. ∎

Corollary A.12.

πb,PH\pi^{\operatorname{H}}_{b,P} is non-expanding. In other words, for every x,ydx,y\in\mathbb{H}^{d},

d(πb,PH(x),πb,PH(y))d(x,y).d_{\mathbb{H}}(\pi^{\operatorname{H}}_{b,P}(x),\pi^{\operatorname{H}}_{b,P}(y))\leq d_{\mathbb{H}}(x,y).
Proof.

We first show that for any path γ(t)\gamma(t) in d\mathbb{H}^{d},

length(πb,PH(γ))length(γ).\operatorname{length}(\pi^{\operatorname{H}}_{b,P}(\gamma))\leq\operatorname{length}(\gamma).

Indeed, note that the velocity vector of πb,PH(γ)\pi^{\operatorname{H}}_{b,P}(\gamma) is precisely πb,PH(γ˙)\pi^{\operatorname{H}}_{b,P}(\dot{\gamma}). Then, by A.11,

length(πb,PH(γ))\displaystyle\operatorname{length}(\pi^{\operatorname{H}}_{b,P}(\gamma)) =πb,PH(γ˙(t))𝑑t\displaystyle=\int\|\pi^{\operatorname{H}}_{b,P}(\dot{\gamma}(t))\|_{\mathbb{H}}dt
γ˙(t)𝑑t=length(γ).\displaystyle\leq\int\|\dot{\gamma}(t)\|_{\mathbb{H}}dt=\operatorname{length}(\gamma).

Now for any x,ydx,y\in\mathbb{H}^{d}, let γ(t)\gamma(t) be the geodesic segment from xx to yy. Then πb,PH(γ)\pi^{\operatorname{H}}_{b,P}(\gamma) is a path connecting πb,PH(x)\pi^{\operatorname{H}}_{b,P}(x) and πb,PH(y)\pi^{\operatorname{H}}_{b,P}(y). Thus, the projected distance is at most the length of πb,PH(γ)\pi^{\operatorname{H}}_{b,P}(\gamma), which by the above argument is at most length(γ)=d(x,y)\operatorname{length}(\gamma)=d_{\mathbb{H}}(x,y). ∎

A.3 Detour: A Review of the Hyperboloid Model

Theorem A.4 suggests that πb,PH(x)\pi^{\operatorname{H}}_{b,P}(x) can be computed in three steps:

  1. 1.

    Find the geodesic projection πPG(x)\pi^{\operatorname{G}}_{P}(x) of xx onto PP.

  2. 2.

    Find a geodesic ray α+\alpha^{+} on PbP_{b} that starts at πPG(x)\pi^{\operatorname{G}}_{P}(x) and is orthogonal to PP.

  3. 3.

    Return the unique point on α+\alpha^{+} that is of distance d(x,πPG(x))d_{\mathbb{H}}(x,\pi^{\operatorname{G}}_{P}(x)) from πPG(x)\pi^{\operatorname{G}}_{P}(x).

It turns out that these subroutines are easier to implement in the hyperboloid model than in the Poincaré model of hyperbolic space. Thus, we first briefly review the basic definitions and properties of this model. Readers already familiar with the hyperboloid model can skip to Section A.4, where we describe the full algorithm.

For a more detailed treatment, see [31].

Remark A.13.

The above steps are slightly different from the ones mentioned in Section 3.2.2. However, by Theorem A.4, these two descriptions are equivalent. We decided to use the “closer-to-bb” description in Section 3.2.2 because it is slightly more self-contained, but the actual implementation will be based on the above three steps.

Minkowski spaces

We first describe Minkowski spaces, the ambient spaces in which the hyperboloid model sits as a hypersurface.

The (d+1)(d+1)-dimensional Minkowski space 1,d\mathbb{R}^{1,d} is like the flat Euclidean space 1+d\mathbb{R}^{1+d} except with a dot product that has a negative sign in the first coordinate. More precisely, 1,d\mathbb{R}^{1,d} is the vector space 1+d\mathbb{R}^{1+d} equipped with the indefinite, non-degenerate bilinear form

B((t,x1,,xd),(u,y1,,yd))=tu+i=1dxiyi,B\left((t,x_{1},\dots,x_{d}),(u,y_{1},\dots,y_{d})\right)=-tu+\sum_{i=1}^{d}x_{i}y_{i},

which serves as the “dot product.”

Like with Euclidean spaces, the quantity B(v,v)B(\vec{v},\vec{v}) is called the (Minkowski) squared norm of the vector v\vec{v}. Two vectors u,v1,d\vec{u},\vec{v}\in\mathbb{R}^{1,d} are called orthogonal if B(u,v)=0B(\vec{u},\vec{v})=0. The orthogonal complement of a linear subspace V1,dV\subset\mathbb{R}^{1,d} is the set V={w1,d:B(w,v)=0 for every vV}V^{\perp}=\{\vec{w}\in\mathbb{R}^{1,d}:B(\vec{w},\vec{v})=0\text{ for every }\vec{v}\in V\}, which still has dimension dim1,ddimV\dim\mathbb{R}^{1,d}-\dim V. However, unlike in Euclidean spaces, vectors in 1,d\mathbb{R}^{1,d} can have negative or zero squared norms, and linear subspaces can intersect their orthogonal complements.

Types of vectors in 1,d\mathbb{R}^{1,d}

Vectors with negative, zero, and positive (Minkowski) squared norms are called time-like, light-like, and space-like, respectively. Time-like and light-like vectors together form a solid double cone in 1,d\mathbb{R}^{1,d}.

If v\vec{v} is a time-like or light-like vector, we call it future-pointing if its first coordinate is positive; otherwise we call it past-pointing. It follows from the Cauchy-Schwarz inequality that:

Proposition A.14.

If u,v\vec{u},\vec{v} are future-pointing time-like or light-like vectors then B(u,v)<0B(\vec{u},\vec{v})<0.

A linear subspace VV of 1,d\mathbb{R}^{1,d} is called space-like if every non-zero vector in VV is space-like. In that case, the bilinear form B(,)B(\cdot,\cdot) restricts to a positive-definite bilinear form on VV, thus making it isometric to a Euclidean vector space. It follows from A.14 that

Proposition A.15.

The orthogonal complement of a time-like vector is a space-like linear subspace.

The hyperboloid model of hyperbolic spaces

We are now ready to introduce the hyperboloid model d\mathbb{H}^{d}. It sits inside 1,d\mathbb{R}^{1,d} in a similar way to how the unit sphere 𝕊d\mathbb{S}^{d} sits inside the Euclidean space 1+d\mathbb{R}^{1+d}.

Definition A.16.

The hyperboloid model of dd-dimensional hyperbolic spaces is the set d\mathbb{H}^{d} of future-pointing vectors in 1,d\mathbb{R}^{1,d} with Minkowski squared norm 1-1.

Remark A.17.

For the rest of Appendix A, we will use d\mathbb{H}^{d} to denote this hyperboloid model as a subset of 1,d\mathbb{R}^{1,d}, and not the abstract hyperbolic space or its other models (e.g. Poincaré ball). This should not lead to any ambiguities because we will not work with any other model.

The following properties of d\mathbb{H}^{d} further illustrate the analogy with spheres in Euclidean spaces.

Proposition A.18.

For every xd\vec{x}\in\mathbb{H}^{d}, the tangent space TxdT_{\vec{x}}\mathbb{H}^{d} of d\mathbb{H}^{d} at x\vec{x} is (parallel to) the orthogonal complement of x\vec{x} in 1,d\mathbb{R}^{1,d}.

Proposition A.19.

When restricted to each tangent space of d\mathbb{H}^{d}, the bilinear form B(,)B(\cdot,\cdot) is positive definite. This defines a Riemannian metric on d\mathbb{H}^{d} which has constant curvature 1-1. The stereographic projection

(t,x1,,xd)(x11+t,,xd1+t)(t,x_{1},\dots,x_{d})\to\left(\frac{x_{1}}{1+t},\dots,\frac{x_{d}}{1+t}\right)

is an isometry between d\mathbb{H}^{d} and the Poincaré model. Its inverse map is given by

(y1,,yd)(1+iyi2,2y1,,2yd)1iyi2.(y_{1},\dots,y_{d})\to\frac{(1+\sum_{i}y_{i}^{2},2y_{1},\dots,2y_{d})}{1-\sum_{i}y_{i}^{2}}.
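For concreteness, these conversion formulas are straightforward to implement. The following NumPy sketch (illustrative only; the function names are ours) maps points between the two models and checks a round trip:

import numpy as np

def hyperboloid_to_poincare(x):
    # stereographic projection (t, x_1, ..., x_d) -> (x_1/(1+t), ..., x_d/(1+t))
    return x[1:] / (1.0 + x[0])

def poincare_to_hyperboloid(y):
    # inverse map (y_1, ..., y_d) -> (1 + |y|^2, 2 y_1, ..., 2 y_d) / (1 - |y|^2)
    s = np.sum(y ** 2)
    return np.concatenate(([1.0 + s], 2.0 * y)) / (1.0 - s)

# round-trip sanity check at an arbitrary point of the Poincare ball
y = np.array([0.3, -0.2, 0.1])
x = poincare_to_hyperboloid(y)
assert abs(-x[0] ** 2 + np.sum(x[1:] ** 2) + 1.0) < 1e-9   # x lies on the hyperboloid
assert np.allclose(hyperboloid_to_poincare(x), y)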
Proposition A.20.

Every kk-dimensional geodesic submanifold of d\mathbb{H}^{d} is the intersection of d\mathbb{H}^{d} with a (k+1)(k+1)-dimensional linear subspace of 1,d\mathbb{R}^{1,d}.

In the hyperboloid model, the ideal points of hyperbolic spaces are represented by light-like directions (instead of individual vectors):

Proposition A.21.

Each ideal point of d\mathbb{H}^{d} is represented by a 11-dimensional linear subspace spanned by some v1,d\vec{v}\in\mathbb{R}^{1,d} with B(v,v)=0B(\vec{v},\vec{v})=0. The map

(t,x1,,xd)(x1t,,xdt)(t,x_{1},\dots,x_{d})\to\left(\frac{x_{1}}{t},\dots,\frac{x_{d}}{t}\right)

gives a correspondence between a light-like vector v1,d\vec{v}\in\mathbb{R}^{1,d} and an ideal point p𝕊d1p\in\mathbb{S}_{\infty}^{d-1} in the Poincaré model that is represented by span(v)\operatorname{span}(\vec{v}). This correspondence is compatible with the stereographic projection in A.19. Its inverse map is given by

(y1,,yd)(1,y1,,yd).(y_{1},\dots,y_{d})\to(1,y_{1},\dots,y_{d}).

We conclude this section by noting that geodesic hulls in the hyperboloid model are closely related to linear spans in Minkowski space:

Proposition A.22.

Let SS be a set of vectors that are either in d\mathbb{H}^{d} or represent ideal directions of d\mathbb{H}^{d}. Then the geodesic hull of SS in d\mathbb{H}^{d} is the intersection of d\mathbb{H}^{d} with the linear span of SS.

In particular, since the spine PP in our setting (Theorem A.4) is the geodesic hull of some ideal points, it is cut out by the linear span of the corresponding ideal directions.

A.4 Computation of Horospherical Projections

Algorithm 1 Horospherical Projection
  Input: point xx, ideal points {p1,,pK}\{p_{1},\dots,p_{K}\}, base point bb.
  zπspan(p1,,pK)Mink(x)z\leftarrow\pi^{\operatorname{Mink}}_{\operatorname{span}(p_{1},\dots,p_{K})}(x) {orthogonal projection in ambient space}
  wz/zw\leftarrow z/\|z\| {rescale to a vector on the hyperboloid}
  ubwu\leftarrow b-w {subtraction in ambient space}
  uuπspan(p1,,pK)Mink(u)u\leftarrow u-\pi^{\operatorname{Mink}}_{\operatorname{span}(p_{1},\dots,p_{K})}(u) {orthogonal projection in ambient space}
  uu/uu\leftarrow u/\|u\| {make uu the unit tangent vector orthogonal to the spine PP}
  yexpw(d(x,w)u)y\leftarrow\exp_{w}(d_{\mathbb{H}}(x,w)\cdot u) {exponential map in d\mathbb{H}^{d}}
  
  return  yy

Now we describe how the three steps mentioned at the beginning of Section A.3 can be implemented in the hyperboloid model. The results of this section are summarized in Algorithm 1. To transfer back and forth between the hyperboloid and Poincaré models, we use the formulas in A.19 and A.21.

Step 1: Computing geodesic projections

We first describe how to compute the geodesic (or closest-point) projection from d\mathbb{H}^{d} to a geodesic submanifold PP in d\mathbb{H}^{d}. This process is very similar to how geodesic projections work on spheres in Euclidean space: we first perform an orthogonal projection onto the linear subspace VV that cuts out PP, then rescale the result to get a vector on d\mathbb{H}^{d}.

Generally, orthogonal projections in Minkowski spaces are very similar to those in Euclidean spaces. However, since vectors in 1,d\mathbb{R}^{1,d} can have norm zero, the orthogonal projection πVMink\pi^{\operatorname{Mink}}_{V} is not well-defined for every subspace VV. Thus, a little extra argument is needed.

Proposition A.23 (Orthogonal projections onto time-containing linear subspaces).

Let VV be a linear subspace of 1,d\mathbb{R}^{1,d} that contains some time-like vectors. Then

  1. 1.

    VV={0}V\cap V^{\perp}=\{0\}. Consequently, we can define a linear orthogonal projection πVMink:1,dV\pi^{\operatorname{Mink}}_{V}:\mathbb{R}^{1,d}\to V as follows:

    Since 1,d=VV\mathbb{R}^{1,d}=V\oplus V^{\perp}, every vector x1,d\vec{x}\in\mathbb{R}^{1,d} can be uniquely written as x=z+n\vec{x}=\vec{z}+\vec{n} for some zV\vec{z}\in V and nV\vec{n}\in V^{\perp}. Then, we let πVMink(x)z.\pi^{\operatorname{Mink}}_{V}(\vec{x})\coloneqq\vec{z}.

  2. 2.

    Let AA be a matrix whose column vectors form a linear basis of VV. Let BB be the (1+d)×(1+d)(1+d)\times(1+d) symmetric matrix associated to the bilinear form BB, i.e. the diagonal matrix with entries (1,1,1,1,,1)(-1,1,1,1,\dots,1). Then ABAA^{\top}BA is non-singular, and the linear projection πVMink\pi^{\operatorname{Mink}}_{V} is given by xA(ABA)1ABx\vec{x}\to A(A^{\top}BA)^{-1}A^{\top}B\vec{x}. (A short code sketch of this formula is given after the proof below.)

  3. 3.

    If x\vec{x} is a future-pointing time-like vector then so is πVMink(x)\pi^{\operatorname{Mink}}_{V}(\vec{x}).

Proof.

  1. 1.

    Let v\vec{v} be a time-like vector in VV. Then v\vec{v}^{\perp} is space-like by A.15. Since VVVvV\cap V^{\perp}\subset V^{\perp}\subset\vec{v}^{\perp}, the intersection VVV\cap V^{\perp} must be space-like. On the other hand, for every wVV\vec{w}\in V\cap V^{\perp}, we have B(w,w)=0B(\vec{w},\vec{w})=0, so w\vec{w} must be light-like. It follows that VV={0}V\cap V^{\perp}=\{0\}. The 1,d=VV\mathbb{R}^{1,d}=V\oplus V^{\perp} part of the claim is a standard linear algebra fact.

  2. 2.

    This follows from the same argument that deduces the formula for Euclidean orthogonal projections.

  3. 3.

    To avoid cumbersome notations, let z=πVMink(x)\vec{z}=\pi^{\operatorname{Mink}}_{V}(\vec{x}). Then since xz\vec{x}-\vec{z} is orthogonal to VV and in particular to z\vec{z}, we have the Pythagorean formula

    B(z,z)+B(xz,xz)=B(x,x).B(\vec{z},\vec{z})+B(\vec{x}-\vec{z},\vec{x}-\vec{z})=B(\vec{x},\vec{x}).

    On the other hand, since xz\vec{x}-\vec{z} is orthogonal to VV and in particular orthogonal to some time-like vectors in VV, by A.15, it must be space-like. Thus, if B(x,x)<0B(\vec{x},\vec{x})<0 then the above equation implies B(z,z)<0B(\vec{z},\vec{z})<0, which means z\vec{z} is time-like.

    Now either z\vec{z} or z-\vec{z} is future-pointing. In the latter case, since x\vec{x} is future-pointing, A.14 implies B(z,x)<0B(-\vec{z},\vec{x})<0. On the other hand, we have B(z,x)=B(z,z)<0B(\vec{z},\vec{x})=B(\vec{z},\vec{z})<0, contradicting the above inequality. Thus, z\vec{z} is future-pointing. ∎
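As promised, here is a minimal NumPy sketch of the projection formula in item 2 (the function name is ours); it builds the matrix A(A^T B A)^{-1} A^T B and verifies that the residual is Minkowski-orthogonal to V:

import numpy as np

def minkowski_projection(x, A):
    # Orthogonal projection of x onto V = column span of A with respect to the
    # Minkowski form; V is assumed to contain a time-like vector (A.23).
    B = np.diag([-1.0] + [1.0] * (A.shape[0] - 1))
    return A @ np.linalg.inv(A.T @ B @ A) @ (A.T @ (B @ x))

# example in R^{1,3}: project onto the span of the first two coordinate vectors
A = np.eye(4)[:, :2]
x = np.array([2.0, 0.5, -1.0, 3.0])
z = minkowski_projection(x, A)
B = np.diag([-1.0, 1.0, 1.0, 1.0])
assert np.allclose(A.T @ (B @ (x - z)), 0.0)   # x - z is orthogonal to V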

Proposition A.24 (Geodesic projections in d\mathbb{H}^{d}).

Let PP be a geodesic submanifold of d\mathbb{H}^{d}. Recall that by A.20, P=dVP=\mathbb{H}^{d}\cap V for some linear subspace VV of 1,d\mathbb{R}^{1,d}. Then for every xd\vec{x}\in\mathbb{H}^{d}, the geodesic projection πPG(x)\pi^{\operatorname{G}}_{P}(\vec{x}) of x\vec{x} onto PP in d\mathbb{H}^{d} is given by

πPG(x)=zB(z,z),\pi^{\operatorname{G}}_{P}(\vec{x})=\frac{\vec{z}}{\sqrt{-B(\vec{z},\vec{z})}},

where z=πVMink(x)\vec{z}=\pi^{\operatorname{Mink}}_{V}(\vec{x}) is the linear orthogonal projection of x\vec{x} onto VV.

Proof.

Again, to avoid cumbersome notations, let

w=zB(z,z).\vec{w}=\frac{\vec{z}}{\sqrt{-B(\vec{z},\vec{z})}}.

We will show that w=πPG(x)\vec{w}=\pi^{\operatorname{G}}_{P}(\vec{x}). First, note that since z\vec{z} is a future-pointing time-like vector by A.23, wd\vec{w}\in\mathbb{H}^{d}. Since wV\vec{w}\in V, it follows that wP\vec{w}\in P.

Let WW be the linear span of w\vec{w} and x\vec{x}, so that WdW\cap\mathbb{H}^{d} is the geodesic γ\gamma in d\mathbb{H}^{d} that connects w\vec{w} and x\vec{x}. Note that w\vec{w} and xz\vec{x}-\vec{z} form a basis of WW. They are both orthogonal to the tangent space TwPT_{\vec{w}}P of PP at w\vec{w} because:

  • By A.18, w\vec{w} is orthogonal to every tangent vector of d\mathbb{H}^{d} at w\vec{w}.

  • TwPT_{\vec{w}}P is contained in VV, which is orthogonal to xz\vec{x}-\vec{z}.

Thus, WW is orthogonal to TwPT_{\vec{w}}P. It follows that the geodesic γ\gamma is orthogonal to PP, which means w=πPG(x)\vec{w}=\pi^{\operatorname{G}}_{P}(\vec{x}). ∎
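Combining A.23 and A.24, geodesic projections onto P take only a few lines in the hyperboloid model. The sketch below is illustrative only (names are ours): it projects onto V in the Minkowski sense and rescales the result back to the hyperboloid.

import numpy as np

def geodesic_projection(x, A):
    # Project x in H^d onto P = H^d ∩ V, where V is the column span of A
    # (V is assumed to contain a time-like vector).
    B = np.diag([-1.0] + [1.0] * (A.shape[0] - 1))
    z = A @ np.linalg.inv(A.T @ B @ A) @ (A.T @ (B @ x))   # Minkowski projection onto V
    return z / np.sqrt(-(z @ (B @ z)))                      # rescale onto the hyperboloid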

Step 2: Finding orthogonal geodesic ray

Recall that if PP is a geodesic submanifold of d\mathbb{H}^{d} and bd\vec{b}\in\mathbb{H}^{d} is a point not in PP, then the geodesic hull MM of P{b}P\cup\{\vec{b}\} in d\mathbb{H}^{d} is a geodesic submanifold of d\mathbb{H}^{d} with dimension dimP+1\dim P+1. Thus, PP cuts MM into two halves, and we denote the half that contains b\vec{b} by PbP_{b}.

Given a point wP\vec{w}\in P, there exists a unique geodesic ray α+\alpha^{+} that starts at w\vec{w}, stays on PbP_{b}, and is orthogonal to PP. In this section, we describe how to compute α+\alpha^{+}.

Recall that by A.20, P=VdP=V\cap\mathbb{H}^{d} for some linear subspace VV.

Proposition A.25.

The vector

u=(bw)πVMink(bw)\vec{u}=(\vec{b}-\vec{w})-\pi^{\operatorname{Mink}}_{V}(\vec{b}-\vec{w})

is tangent to MM and orthogonal to PP at w\vec{w}. Furthermore, it points toward the side of PbP_{b}.

Proof.

By construction, u\vec{u} is orthogonal to VV. Note that VV contains both w\vec{w} and the tangent space TwPT_{\vec{w}}P of PP at w\vec{w}. Thus, u\vec{u} is orthogonal to both of them. Together with A.18, this implies that u\vec{u} is a tangent vector of d\mathbb{H}^{d} at w\vec{w} that is orthogonal to PP.

Next, note that MM is the intersection of d\mathbb{H}^{d} with the linear span of V{b}V\cup\{\vec{b}\}, and since b\vec{b} and w\vec{w} both belong to this linear span, so does u\vec{u}. Thus, since u\vec{u} is a tangent vector of d\mathbb{H}^{d} at w\vec{w}, it must be tangent to MM.

By construction, u\vec{u} points from w\vec{w} toward the side of b\vec{b} instead of away from it. More rigorously, note that by essentially the same argument as above,

a(bw)πwMink(bw)\vec{a}\coloneqq(\vec{b}-\vec{w})-\pi^{\operatorname{Mink}}_{\vec{w}}(\vec{b}-\vec{w})

is a tangent vector at w\vec{w} of the geodesic on d\mathbb{H}^{d} that goes from w\vec{w} to b\vec{b}. Also, note that

au=πVMink(bw)πwMink(bw).\vec{a}-\vec{u}=\pi^{\operatorname{Mink}}_{V}(\vec{b}-\vec{w})-\pi^{\operatorname{Mink}}_{\vec{w}}(\vec{b}-\vec{w}).

Since V=TwPspan(w)V=T_{\vec{w}}P\oplus\operatorname{span}(\vec{w}) is an orthogonal decomposition, this means

au=πTwPMink(bw),\vec{a}-\vec{u}=\pi^{\operatorname{Mink}}_{T_{\vec{w}}P}(\vec{b}-\vec{w}),

which in particular implies auTwP\vec{a}-\vec{u}\in T_{\vec{w}}P. It follows that u\vec{u} is the projection of a\vec{a} onto the orthogonal complement of TwPT_{\vec{w}}P in TwdT_{\vec{w}}\mathbb{H}^{d}.

In a Euclidean vector space, the dot product of any vector with its projection onto any direction is positive unless the projection is zero. Since TwdT_{\vec{w}}\mathbb{H}^{d} is a space-like subspace, this statement applies to a\vec{a} and u\vec{u}. Note that u\vec{u} cannot be zero because otherwise b\vec{b} would belong to VV and hence PP. Thus, we conclude that au>0\vec{a}\cdot\vec{u}>0, which means that u\vec{u} points toward the side of PbP_{b}. ∎

Thus, u\vec{u} is the tangent vector at w\vec{w} of the desired ray α+\alpha^{+}. To compute points on this ray, we can use the exponential map at w\vec{w}.

Step 3: The exponential map

Finally, given a distance dd and a tangent vector u\vec{u} at w\vec{w} of a geodesic ray α+\alpha^{+} in d\mathbb{H}^{d}, we need to compute the point on α+\alpha^{+} that is of hyperbolic distance dd from w\vec{w}. This is based on the following lemma:

Lemma A.26.

Suppose that u\vec{u} is a unit tangent vector of d\mathbb{H}^{d} at w\vec{w}. Then

α(t)=(cosht)w+(sinht)u\alpha(t)=(\cosh t)\vec{w}+(\sinh t)\vec{u}

is a unit-speed geodesic with α(0)=w\alpha(0)=\vec{w} and α˙(0)=u\dot{\alpha}(0)=\vec{u}.

Proof.

Note that w\vec{w} and u\vec{u} are orthogonal by A.18. The facts that α(0)=w\alpha(0)=\vec{w}, α˙(0)=u\dot{\alpha}(0)=\vec{u}, and that α(t)\alpha(t) is a unit-speed curve on d\mathbb{H}^{d} follow from simple, direct computations. To see that α(t)\alpha(t) is a geodesic, note that it is the intersection of d\mathbb{H}^{d} with the linear subspace span(w,u)\operatorname{span}(\vec{w},\vec{u}) and apply A.20. ∎

It follows that

Proposition A.27.

If u\vec{u} is the tangent vector at the starting point w\vec{w} of a geodesic ray α+\alpha^{+} in d\mathbb{H}^{d} then

(coshd)w+(sinhd)uB(u,u)(\cosh d)\vec{w}+(\sinh d)\frac{\vec{u}}{\sqrt{B(\vec{u},\vec{u})}}

is the point on α+\alpha^{+} that is of distance dd from w\vec{w}. ∎

From the results in this section, we conclude that Algorithm 1 computes the horospherical projection πb,PH(x)\pi^{\operatorname{H}}_{b,P}(x).
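For completeness, the following NumPy sketch assembles the three steps of Algorithm 1 in hyperboloid coordinates. It is only an illustrative implementation (names are ours): it assumes K ≥ 2 ideal points given as light-like column vectors as in A.21, so that their span contains time-like vectors and the projection of A.23 is well-defined.

import numpy as np

def minkowski_form(u, v):
    return -u[0] * v[0] + np.dot(u[1:], v[1:])

def horospherical_projection(x, p, b):
    # x: point of H^d (shape d+1); b: base point of H^d;
    # p: (d+1) x K matrix whose columns are light-like vectors representing the
    #    ideal points (K >= 2 so that span(p) contains time-like vectors).
    B = np.diag([-1.0] + [1.0] * (x.shape[0] - 1))
    proj = lambda v: p @ np.linalg.inv(p.T @ B @ p) @ (p.T @ (B @ v))

    # Step 1: geodesic projection of x onto the spine P = H^d ∩ span(p) (A.24)
    z = proj(x)
    w = z / np.sqrt(-minkowski_form(z, z))

    # Step 2: unit tangent vector at w orthogonal to P, pointing toward b's side (A.25)
    u = (b - w) - proj(b - w)
    u = u / np.sqrt(minkowski_form(u, u))

    # Step 3: move along the ray by the hyperbolic distance d(x, w) (A.27)
    t = np.arccosh(max(-minkowski_form(x, w), 1.0))
    return np.cosh(t) * w + np.sinh(t) * u

In practice, Poincaré inputs would first be lifted to the hyperboloid (e.g. with the conversion sketch after A.19) and the output mapped back.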

Appendix B Geodesic Projection: Distortion Analysis

We show that in hyperbolic geometry, the geodesic projection πMG\pi^{\operatorname{G}}_{M} (also called closest-point projection) to a geodesic submanifold MM always strictly decreases distances unless the inputs already lie in MM; furthermore, any path that stays at distance at least rr from MM shrinks in length by a factor of at least coshr\cosh r under the projection. Note that most ideas and results in this section already exist in the literature. For a comprehensive treatment, see [31].

This section is organized as follows. First, we use basic hyperbolic trigonometry to prove a bound on projected distances onto geodesics in dimension two (B.3). Then we generalize this bound to higher dimensions (Theorem B.4) and obtain an infinitesimal version of it (B.5). The main theorem (Theorem B.6) then follows immediately.

Hyperbolic trigonometry in two dimensions

We first recall the laws of sines and cosines for triangles in the hyperbolic plane:

Proposition B.1.

Consider any triangle in 2\mathbb{H}^{2} with edge lengths A,B,CA,B,C. Let α,β,γ\alpha,\beta,\gamma be the angles at the vertices opposite to edges A,B,CA,B,C. Then

sinhAsinα=sinhBsinβ=sinhCsinγ,\frac{\sinh A}{\sin\alpha}=\frac{\sinh B}{\sin\beta}=\frac{\sinh C}{\sin\gamma},
cosγ=coshAcoshBcoshCsinhAsinhB.\cos\gamma=\frac{\cosh A\cosh B-\cosh C}{\sinh A\sinh B}.
Proof.

See [31], section 2.6. ∎

Proposition B.2.

Consider any quadrilateral in 2\mathbb{H}^{2} with vertices x,y,z,wx,y,z,w (in that order). Suppose that the angles at zz and ww are 9090 degrees. Then

cosh(xy)=\displaystyle\cosh(xy)= cosh(zw)cosh(xw)cosh(yz)\displaystyle\cosh(zw)\cosh(xw)\cosh(yz)
sinh(xw)sinh(yz).\displaystyle-\sinh(xw)\sinh(yz).

Here cosh(xy)\cosh(xy) is a shorthand for cosh(d(x,y))\cosh(d_{\mathbb{H}}(x,y)).

Proof.

Applying the law of sines to triangle wxzwxz and using that the angle at ww is 9090 degrees gives

sinh(xw)sin(zx,zw)=sinh(xz)sin(wx,wz)=sinh(xz),\frac{\sinh(xw)}{\sin\angle(zx,zw)}=\frac{\sinh(xz)}{\sin\angle(wx,wz)}=\sinh(xz),

so

sin(zx,zw)=sinh(xw)sinh(xz).\sin\angle(zx,zw)=\frac{\sinh(xw)}{\sinh(xz)}.

On the other hand, applying the law of cosines to triangle xyzxyz gives

cos(zx,zy)=cosh(zx)cosh(zy)cosh(xy)sinh(zx)sinh(zy).\cos\angle(zx,zy)=\frac{\cosh(zx)\cosh(zy)-\cosh(xy)}{\sinh(zx)\sinh(zy)}.

Now note that sin(zx,zw)=cos(zx,zy)\sin\angle(zx,zw)=\cos\angle(zx,zy), because the diagonal from zz to xx splits the right angle at zz into these two complementary angles. Thus,

sinh(xw)sinh(xz)=cosh(zx)cosh(zy)cosh(xy)sinh(zx)sinh(zy),\frac{\sinh(xw)}{\sinh(xz)}=\frac{\cosh(zx)\cosh(zy)-\cosh(xy)}{\sinh(zx)\sinh(zy)},

which can be simplified to

cosh(zx)cosh(zy)cosh(xy)=sinh(xw)sinh(zy).\cosh(zx)\cosh(zy)-\cosh(xy)=\sinh(xw)\sinh(zy).

Finally, note that the law of cosines for triangle wxzwxz gives

cosh(wx)cosh(wz)cosh(xz)sinh(wx)sinh(wz)=cos(wx,wz)=0,\frac{\cosh(wx)\cosh(wz)-\cosh(xz)}{\sinh(wx)\sinh(wz)}=\cos\angle(wx,wz)=0,

so cosh(xz)=cosh(wx)cosh(wz)\cosh(xz)=\cosh(wx)\cosh(wz). Plugging this into the above equation gives

cosh(wx)cosh(wz)cosh(zy)cosh(xy)\displaystyle\cosh(wx)\cosh(wz)\cosh(zy)-\cosh(xy)
=\displaystyle= sinh(xw)sinh(zy),\displaystyle\sinh(xw)\sinh(zy),

which can be rearranged to give the desired identity. ∎

Now we are ready to prove a bound on the projected distances onto a geodesic in 2\mathbb{H}^{2}:

Proposition B.3.

Let LL be a geodesic in 2\mathbb{H}^{2}. Consider any x,y2x,y\in\mathbb{H}^{2} that are on the same side of LL and at distance at least rr from LL. Let d=d(x,y)d=d_{\mathbb{H}}(x,y) and d=d(πLG(x),πLG(y))d^{\prime}=d_{\mathbb{H}}(\pi^{\operatorname{G}}_{L}(x),\pi^{\operatorname{G}}_{L}(y)) denote the original and projected distances of xx and yy. Then,

sinh(d/2)sinh(d/2)coshr.\sinh(d^{\prime}/2)\leq\frac{\sinh(d/2)}{\cosh r}.
Proof.

Let rx=d(x,πLG(x))r_{x}=d_{\mathbb{H}}(x,\pi^{\operatorname{G}}_{L}(x)) and ry=d(y,πLG(y))r_{y}=d_{\mathbb{H}}(y,\pi^{\operatorname{G}}_{L}(y)). Then by B.2,

coshd=coshdcoshrxcoshrysinhrxsinhry.\cosh d=\cosh d^{\prime}\cosh r_{x}\cosh r_{y}-\sinh r_{x}\sinh r_{y}.

Since coshacoshbsinhasinhb=cosh(ab)\cosh a\cosh b-\sinh a\sinh b=\cosh(a-b), this gives

coshd=(coshd1)coshrxcoshry+cosh(rxry).\cosh d=(\cosh d^{\prime}-1)\cosh r_{x}\cosh r_{y}+\cosh(r_{x}-r_{y}).

Subtracting 11 from both sides and using cosh(a)1=2sinh2(a/2)\cosh(a)-1=2\sinh^{2}(a/2) gives

2sinh2(d/2)\displaystyle 2\sinh^{2}(d/2)
=\displaystyle= 2sinh2(d/2)coshrxcoshry+2sinh2((rxry)/2).\displaystyle 2\sinh^{2}(d^{\prime}/2)\cosh r_{x}\cosh r_{y}+2\sinh^{2}((r_{x}-r_{y})/2).

Thus,

sinh2(d/2)sinh2(d/2)coshrxcoshry.\sinh^{2}(d/2)\geq\sinh^{2}(d^{\prime}/2)\cosh r_{x}\cosh r_{y}.

Since rx,ryrr_{x},r_{y}\geq r, this implies

sinh2(d/2)sinh2(d/2)cosh2r,\sinh^{2}(d/2)\geq\sinh^{2}(d^{\prime}/2)\cosh^{2}r,

which is equivalent to the desired inequality. ∎
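As a quick numerical sanity check of this bound (not part of the argument above), one can sample pairs of points on one side of a fixed geodesic of the hyperbolic plane, with the geodesic cut out by the span of the first two coordinate vectors in the hyperboloid model, and verify the inequality. The sketch below does this with hypothetical helper names:

import numpy as np

rng = np.random.default_rng(0)
B = lambda u, v: -u[0] * v[0] + u[1] * v[1] + u[2] * v[2]   # Minkowski form on R^{1,2}
dist = lambda u, v: np.arccosh(max(-B(u, v), 1.0))

def lift(y):                      # Poincare disk -> hyperboloid
    s = np.sum(y ** 2)
    return np.concatenate(([1.0 + s], 2.0 * y)) / (1.0 - s)

def proj_L(u):                    # geodesic projection onto L = H^2 ∩ span(e_0, e_1)
    z = np.array([u[0], u[1], 0.0])
    return z / np.sqrt(-B(z, z))

def sample():                     # random point strictly on one side of L
    return lift(np.array([rng.uniform(-0.7, 0.7), rng.uniform(0.05, 0.7)]))

for _ in range(1000):
    x, y = sample(), sample()
    r = min(dist(x, proj_L(x)), dist(y, proj_L(y)))
    d, d_proj = dist(x, y), dist(proj_L(x), proj_L(y))
    assert np.sinh(d_proj / 2) <= np.sinh(d / 2) / np.cosh(r) + 1e-9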

General dimensions

Now we show that B.3 generalizes to higher dimensions.

Theorem B.4.

Let MdM\subset\mathbb{H}^{d} be a totally geodesic submanifold. Consider any x,ydx,y\in\mathbb{H}^{d} that are at distance at least rr from MM. Let d=d(x,y)d=d_{\mathbb{H}}(x,y) and d=d(πMG(x),πMG(y))d^{\prime}=d_{\mathbb{H}}(\pi^{\operatorname{G}}_{M}(x),\pi^{\operatorname{G}}_{M}(y)) denote the original and projected distances of xx and yy. Then,

sinh(d/2)sinh(d/2)coshr.\sinh(d^{\prime}/2)\leq\frac{\sinh(d/2)}{\cosh r}.
Proof.

This is clearly true if πMG(x)=πMG(y)\pi^{\operatorname{G}}_{M}(x)=\pi^{\operatorname{G}}_{M}(y) or if both xx and yy belong to MM. Thus, without loss of generality, we can assume that πMG(x)πMG(y)\pi^{\operatorname{G}}_{M}(x)\neq\pi^{\operatorname{G}}_{M}(y) and πMG(x)x\pi^{\operatorname{G}}_{M}(x)\neq x.

Let LML\subset M be the geodesic joining πMG(x)\pi^{\operatorname{G}}_{M}(x) and πMG(y)\pi^{\operatorname{G}}_{M}(y), so that d(x,M)=d(x,L)d_{\mathbb{H}}(x,M)=d_{\mathbb{H}}(x,L) and d(y,M)=d(y,L)d_{\mathbb{H}}(y,M)=d_{\mathbb{H}}(y,L). Let LxL_{x} be the half-plane of GH(L{x})\operatorname{GH}(L\cup\{x\}) that is bounded by LL and contains xx. Note that if we rotate yy around LL until it hits LxL_{x} at a point y¯\bar{y} then

  1. (i)

    πLG(y¯)=πLG(y)\pi^{\operatorname{G}}_{L}(\bar{y})=\pi^{\operatorname{G}}_{L}(y),

  2. (ii)

    d(y¯,L)=d(y,L)d_{\mathbb{H}}(\bar{y},L)=d_{\mathbb{H}}(y,L),

  3. (iii)

    πx,LH(y)=y¯\pi^{\operatorname{H}}_{x,L}(y)=\bar{y} by the “open book interpretation” of horospherical projections (see the beginning of Section A.2).

Thus, using (i), (ii) and applying B.3 to L,x,y¯L,x,\bar{y} gives,

sinh(d/2)sinh(d(x,y¯)/2)coshr.\sinh(d^{\prime}/2)\leq\frac{\sinh(d_{\mathbb{H}}(x,\bar{y})/2)}{\cosh r}.

By (iii) and A.12,

d(x,y¯)d(x,y).d_{\mathbb{H}}(x,\bar{y})\leq d_{\mathbb{H}}(x,y).

Combining these two inequalities gives the desired inequality. ∎

Since sinh(t)t\sinh(t)\approx t as t0t\to 0, applying Theorem B.4 as yxy\to x gives

Corollary B.5.

Under the geodesic projection πMG()\pi^{\operatorname{G}}_{M}(\cdot), every tangent vector v\vec{v} at xx shrinks in length by a factor of at least cosh(d(x,M))\cosh(d_{\mathbb{H}}(x,M)).

This implies the main theorem of this section:

Theorem B.6.

Let γ(t)\gamma(t) be any smooth path in d\mathbb{H}^{d} that stays at distance at least rr from MM. Then

length(πMG(γ))length(γ)coshr.\operatorname{length}(\pi^{\operatorname{G}}_{M}(\gamma))\leq\frac{\operatorname{length}(\gamma)}{\cosh r}.

In particular, length(πMG(γ))<length(γ)\operatorname{length}(\pi^{\operatorname{G}}_{M}(\gamma))<\operatorname{length}(\gamma) unless γ\gamma is entirely in MM.

Proof.

Note that the velocity vector of πMG(γ)\pi^{\operatorname{G}}_{M}(\gamma) is precisely πMG(γ˙)\pi^{\operatorname{G}}_{M}(\dot{\gamma}). Then, by B.5,

length(πMG(γ))\displaystyle\operatorname{length}(\pi^{\operatorname{G}}_{M}(\gamma)) =πMG(γ˙(t))𝑑t\displaystyle=\int\|\pi^{\operatorname{G}}_{M}(\dot{\gamma}(t))\|_{\mathbb{H}}dt
γ˙(t)coshr𝑑t=length(γ)coshr.\displaystyle\leq\int\frac{\|\dot{\gamma}(t)\|_{\mathbb{H}}}{\cosh r}dt=\frac{\operatorname{length}(\gamma)}{\cosh r}.

Since coshr1\cosh r\geq 1, this implies

length(πMG(γ))length(γ).\operatorname{length}(\pi^{\operatorname{G}}_{M}(\gamma))\leq\operatorname{length}(\gamma).

This can only be an equality if the coshr\cosh r factor is constantly 11, which means γ\gamma stays in MM. ∎

When γ\gamma is a geodesic, this gives a proof of 3.5 from the main text.

Remark B.7.

The above proposition does not imply that if xx and yy are at distance at least rr from MM then d(x,y)d_{\mathbb{H}}(x,y) shrinks by a factor of at least coshr\cosh r under the geodesic projection. This is because the geodesic segment between xx and yy can get closer to MM in the middle. This is inevitable when xx and yy are extremely far apart: for example, if d(x,M)=d(y,M)=rd_{\mathbb{H}}(x,M)=d_{\mathbb{H}}(y,M)=r, then the triangle inequality implies

d(πMG(x),πMG(y))d(x,y)2r.d_{\mathbb{H}}(\pi^{\operatorname{G}}_{M}(x),\pi^{\operatorname{G}}_{M}(y))\geq d_{\mathbb{H}}(x,y)-2r.

Hence, if d(x,y)rd_{\mathbb{H}}(x,y)\gg r then the shrink factor cannot be much smaller than 11. As a rule of thumb, geodesic projections shrink distances by a large factor when the input points are close to each other but far from the target submanifold.

Sarkar PyTorch
dimension 2 10 50 2 10 50
Balanced Tree 0.088 0.136 0.152 0.097 0.022 0.022
Phylo Tree 0.639 0.639 0.638 0.292 0.286 0.284
Diseases 0.247 0.212 0.210 0.126 0.054 0.055
CS Ph.D. 0.388 0.384 0.380 0.189 0.140 0.139
Table 4: Average distortion for embeddings learned with different methods. Sarkar refers to combinatorial embeddings learned with Sarkar’s construction [28, 27] while PyTorch refers to embeddings learned using the PyTorch code from [13].

Appendix C Additional Experimental Details

We first provide more details about our experimental setup, including a description of how we implemented different methods and then provide additional experimental results.


Balanced Tree Phylo Tree Diseases CS Ph.D.
distortion (\downarrow) variance (\uparrow) distortion (\downarrow) variance (\uparrow) distortion (\downarrow) variance (\uparrow) distortion (\downarrow) variance (\uparrow)
PCA 0.91 1.09 0.99 0.28 0.95 0.75 0.96 1.25
tPCA 0.83 4.01 0.27 62.54 0.64 35.45 0.55 86.10
PGA 0.67 ±\pm 0.013 11.15 ±\pm 0.941 0.08 ±\pm 0.001 1110.30 ±\pm 0.577 0.55 ±\pm 0.006 56.82 ±\pm 0.493 0.25 ±\pm 0.002 270.28 ±\pm 0.487
BSA 0.58 ±\pm 0.018 16.74 ±\pm 1.278 0.08 ±\pm 0.000 1110.05 ±\pm 0.130 0.54 ±\pm 0.008 57.73 ±\pm 0.486 0.25 ±\pm 0.001 270.98 ±\pm 0.363
hMDS 0.12 58.62 0.08 498.05 0.32 568.22 0.18 1241.34
HoroPCA 0.06 ±\pm 0.010 62.72 ±\pm 0.924 0.01 ±\pm 0.000 1190.27 ±\pm 2.974 0.09 ±\pm 0.010 161.32 ±\pm 1.185 0.05 ±\pm 0.004 994.69 ±\pm 321.359
Table 5: Dimensionality reduction results on 10-dimensional combinatorial embeddings (Sarkar’s construction) reduced to two dimensions. Results are averaged over 5 runs for non-deterministic methods. Best in bold and second best underlined.

C.1 Datasets

Embedding

The datasets we use are structured as graphs with nodes and edges. We map these graphs to the hyperbolic space using different embedding methods. The datasets from [27] can be found in the open-source implementation (https://github.com/HazyResearch/hyperbolics), and we compute hyperbolic embeddings for different dimensions using the PyTorch code from [13]. We also consider embeddings computed with Sarkar’s combinatorial construction [28] and report the average distortion with respect to the original graph distances in Table 4.

For classification experiments, we follow the experimental protocol in [8] and use the embeddings provided in the open-source implementation (https://github.com/hhcho/hyplinear).

Centering

For simplicity, we center the input embeddings so that their Fréchet mean is at the origin. This makes computations of projections more efficient. To do so, we compute the Fréchet mean of the input hyperbolic points using gradient descent, and then apply an isometric hyperbolic reflection that maps the mean to the origin. (This reflection can be computed in closed form using circle inversions; see for instance Section 2 in [27].)

C.2 Implementation Details in the Poincaré Model

Section A.4 (in particular, A.24 and Algorithm 1) describes how geodesic projections and horospherical projections can be computed efficiently in the hyperboloid model. However, because the Poincaré ball model is useful for visualizations and is popular in the ML literature, here we also give high-level descriptions of simple alternative methods to compute these projections directly in the Poincaré model.

This subsection is organized as follows. We first describe an implementation of geodesic projections in the Poincaré model (Section C.2.1). We then describe our implementation of all baseline methods, which rely on geodesic projections (Section C.2.2). Finally, we describe how HoroPCA could also be implemented in the Poincaré model (Section C.2.3).

C.2.1 Computing Geodesic Projections

Recall that geodesic projections map points to a target submanifold such that the projection of a point is the point on the submanifold that is closest to it. In the Poincaré model, when the target submanifold goes through the origin, geodesic projections can be computed efficiently as follows.

Consider any geodesic submanifold PP that contains the origin of the Poincaré ball: this must be a linear subspace, since any geodesic through the origin in this model is a straight line. The Euclidean reflection with respect to PP is a hyperbolic isometry (i.e. it preserves hyperbolic distances). Given any point xx, we can compute its Euclidean (and hyperbolic) reflection rP(x)r_{P}(x) with respect to PP. Then the hyperbolic midpoint between xx and rP(x)r_{P}(x) belongs to PP and is the geodesic projection of xx onto PP. There is a closed-form expression for this midpoint, which can be derived using Möbius operations.
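A minimal sketch of this procedure follows (illustrative only; for simplicity the midpoint is computed here by passing through the hyperboloid model, which yields the same point as the closed-form Möbius expression):

import numpy as np

def lift(y):                                  # Poincare ball -> hyperboloid
    s = np.sum(y ** 2)
    return np.concatenate(([1.0 + s], 2.0 * y)) / (1.0 - s)

def to_ball(x):                               # hyperboloid -> Poincare ball
    return x[1:] / (1.0 + x[0])

def geodesic_projection_ball(x, Q):
    # Q: matrix with orthonormal columns spanning the linear subspace P
    #    through the origin of the Poincare ball.
    r = 2.0 * Q @ (Q.T @ x) - x               # Euclidean (= hyperbolic) reflection across P
    m = lift(x) + lift(r)                      # midpoint direction in the hyperboloid model
    m = m / np.sqrt(m[0] ** 2 - np.sum(m[1:] ** 2))
    return to_ball(m)                          # hyperbolic midpoint = geodesic projection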

C.2.2 Implementation of Baselines

We now detail our baseline implementation.

PCA and tPCA

We used the PyTorch implementation of the Singular Value Decomposition (SVD) to implement both PCA and tPCA. Before applying the SVD, tPCA maps points to the tangent space at the Fréchet mean using the logarithmic map. Since the data is centered, this can be done using the logarithmic map at the origin, which is simply:

log𝐨(x)=tanh1(x)xx.\log_{\mathbf{o}}(x)=\operatorname{tanh^{-1}}(||x||)\frac{x}{||x||}.
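A NumPy sketch of this pipeline (illustrative only; as noted above, our actual implementation uses the PyTorch SVD, and the helper names here are hypothetical):

import numpy as np

def log0(X, eps=1e-10):
    # logarithmic map of the Poincare ball at the origin, applied row-wise
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return np.arctanh(np.clip(norms, 0.0, 1.0 - eps)) * X / np.maximum(norms, eps)

def tangent_pca(X, k):
    # X: n x d matrix of centered Poincare embeddings; returns n x k tangent coordinates
    Z = log0(X)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:k].T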
PGA and BSA

Both PGA and BSA rely on geodesic projections. When the data is centered, the target submanifolds in BSA and PGA are simply linear spaces going through the origin. We can therefore compute the geodesic projections in these methods using the computations described above. To test their sensitivity to the base point, we also implemented two baselines that perturb the base point by adding Gaussian noise.

hMDS

To implement hMDS, we simply implemented the formulas in the original paper (see Algorithm 2 in [27]).

hAE

For hyperbolic autoencoders, we used two fully-connected layers following the approach of [12]. One layer was used for encoding into the reduced number of dimensions, and one layer was used for decoding into the original number of dimensions. We trained these networks by minimizing the reconstruction error, and used the intermediate hidden representations as low-dimensional representations.

C.2.3 HoroPCA Implementation

The definition of horospherical projections onto K>1K>1 directions involves taking the intersection of KK horospheres S(p1,x)S(pK,x)S(p_{1},x)\cap\dots\cap S(p_{K},x) (Section 3.2.2). Although Section A.4 shows that horospherical projections can be computed efficiently in the hyperboloid model without actually computing these intersections, it is also possible to implement HoroPCA by directly computing these intersections. This can be done in a simple iterative fashion.

For example, in the Poincaré ball, horospheres can be represented as Euclidean spheres. Note that the intersection of two dd-dimensional Euclidean hyperspheres S(p0,r0)S(p_{0},r_{0}) and S(p1,r1)S(p_{1},r_{1}) is a (d1)(d-1)-dimensional hypersphere, which can be represented by a center, a radius, and a (d1)(d-1)-dimensional subspace. The radius is uniquely determined by the radii r0,r1r_{0},r_{1} of the original spheres and the distance p0p1\|p_{0}-p_{1}\| between their centers, by reducing to the 2-dimensional case, for which there is a simple closed formula. The center can similarly be found as a weighted combination of the centers p0,p1p_{0},p_{1} based on the formula for the 2D case. Lastly, the (d1)(d-1)-dimensional subspace can be represented by noting that it is orthogonal to the vector p1p0p_{1}-p_{0}. Thus the intersection of these dd-dimensional hyperspheres can be easily computed, and this process can be iterated to find the intersection of KK spheres S(pi,ri)S(p_{i},r_{i}).
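As an illustration of a single step of this iteration, the intersection of two hyperspheres can be computed as follows (a sketch only; the function name is ours and degenerate configurations are not handled):

import numpy as np

def intersect_spheres(p0, r0, p1, r1):
    # Intersection of the hyperspheres S(p0, r0) and S(p1, r1), assumed to meet
    # in a (d-1)-sphere. Returns its center, radius, and the unit normal of the
    # hyperplane that contains it.
    delta = p1 - p0
    dist = np.linalg.norm(delta)
    n = delta / dist
    a = (dist ** 2 + r0 ** 2 - r1 ** 2) / (2.0 * dist)   # signed distance from p0 to the plane
    center = p0 + a * n
    radius = np.sqrt(r0 ** 2 - a ** 2)
    return center, radius, n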

Refer to caption
Figure 6: Histogram of average distortion computed for the projections of 1000 random hyperbolic points onto 100 random directions.

C.3 Additional Experimental Results

C.3.1 Distortion Analysis

We first analyze the average distortion incurred by horospherical and geodesic projections on a toy example with synthetically generated data. We generate 1000 points in the Poincaré disk by sampling tangent vectors from a multivariate Gaussian and mapping them to the disk using the exponential map at the origin. We then sample 100 random ideal points (directions) in the disk and consider the corresponding straight-line geodesics pointing towards these directions. We project the datapoints onto these geodesics, measure the average distortion between distances before and after projection, and visualize the results in a histogram (Fig. 6). As we can see, horospherical projections achieve much lower distortion than geodesic projections on average, suggesting that these projections may better preserve information such as distances in higher-dimensional datasets.

C.3.2 Dimensionality Reduction Results

Sarkar embeddings

Table 2 showed dimensionality reduction results for embeddings learned with optimization. Here, we consider the same reduction experiment on combinatorial embeddings and report the results in Table 5. The results confirm the trends observed in Table 2: HoroPCA outperforms baseline methods, with significant improvements on distance preservation.

More dimension/component configurations

We consider the reduction of 50-dimensional PyTorch embeddings of the Diseases and CS Ph.D. datasets. We plot average distortion and explained Fréchet variance for different numbers of components in Fig. 7. HoroPCA significantly outperforms all previous generalizations of PCA. HoroPCA also outperforms hMDS, which is a competitive baseline but not a PCA method, as discussed before.

Refer to caption
(a) Diseases average distortion.
Refer to caption
(b) Diseases explained Fréchet variance.
Refer to caption
(c) CS Ph.D. average distortion.
Refer to caption
(d) CS Ph.D. explained Fréchet variance.
Figure 7: Average distortion and explained Fréchet variance for different numbers of principal components.
Refer to caption
(a) PCA (average distortion: 0.942)
Refer to caption
(b) PGA (average distortion: 0.534)
Refer to caption
(c) BSA (average distortion: 0.532)
Refer to caption
(d) HoroPCA (average distortion: 0.078)
Figure 8: Visualization of embeddings of the WordNet mammal subtree computed by reducing 10-dimensional Poincaré embeddings [25].