
¹ Department of Statistics and Operations Research, University of North Carolina at Chapel Hill (USA). ² Department of Statistics, Carlos III University of Madrid (Spain). ³ Corresponding author. E-mail: pavlos@live.unc.edu.

Scaled torus principal component analysis

Pavlos Zoubouloglou¹,³, Eduardo García-Portugués², and J. S. Marron¹
Abstract

A particularly challenging context for dimensionality reduction is multivariate circular data, i.e., data supported on a torus. Such data appear, e.g., in the analysis of various phenomena in ecology and astronomy, as well as in molecular structures. This paper introduces Scaled Torus Principal Component Analysis (ST-PCA), a novel approach to perform dimensionality reduction with toroidal data. ST-PCA finds a data-driven map from a torus to a sphere of the same dimension and a certain radius. The map is constructed with multidimensional scaling to minimize the discrepancy between pairwise geodesic distances in both spaces. ST-PCA then resorts to principal nested spheres to obtain a nested sequence of subspheres that best fits the data, which can afterwards be inverted back to the torus. Numerical experiments illustrate how ST-PCA can be used to achieve meaningful dimensionality reduction on low-dimensional tori, particularly for the purpose of cluster separation, while two data applications in astronomy (on a three-dimensional torus) and molecular biology (on a seven-dimensional torus) show that ST-PCA outperforms existing methods for the investigated datasets.

Keywords: Dimension reduction; Directional statistics; Multidimensional scaling; Principal component analysis; Statistics on manifolds.

1 Introduction

Recent advances in the way information is collected have brought along the need to analyze data of a non-Euclidean nature. Important among these are multivariate circular data, or toroidal data, which are vectors of angles that are naturally represented on the torus $\mathbb{T}^{d}:=[-\pi,\pi)^{d}$, $d\geq 1$, where $-\pi$ and $\pi$ are identified. An important instance of toroidal data arises in molecular biology, where vectors of torsion angles characterize protein and RNA structures (Ramachandran et al., 1963; Duarte and Pyle, 1998). A different source of data on $\mathbb{T}^{d}$ is environmental sciences, with planar directions being key constituents of wind and sea current vector fields; see Ducharme et al. (2012) for an application in the latter context. Another popular instance of non-Euclidean data is spherical data, i.e., data on $\mathbb{S}^{d}:=\{\bm{x}\in\mathbb{R}^{d+1}:\|\bm{x}\|=1\}$ with $d=2$. Among other fields, spherical data plays a relevant role in astronomy; see Jupp et al. (2003) for a statistical analysis of the normal vectors associated with long-period comet orbits. Pewsey and García-Portugués (2021) and Marron and Alonso (2014) provide recent surveys of existing statistical methodology for data on spheres and tori, and for other types of non-Euclidean spaces, respectively.

Principal Component Analysis (PCA) is arguably the most popular multivariate technique, and its adaptation to non-Euclidean data has been the subject of considerable research in the last decades. A first step towards identifying modes of variation for manifold-valued data was taken by tangent plane-based methods, which seek to exploit the locally Euclidean nature of manifolds. An important representative is the Principal Geodesic Analysis (PGA) of Fletcher et al. (2004). Recall that (classic) Euclidean PCA builds the directions of maximal projected variation starting at the sample mean. PGA parallels this by starting with the Fréchet (sample) mean and finding the geodesic through that mean on which the projected variation in the tangent plane is maximal. This yields the first principal geodesic, and the remaining modes of variation follow sequentially by imposing orthogonality of geodesics at the Fréchet mean. Huckemann and Ziezold (2006) showed that the constraint of geodesics passing through that mean seriously limited the flexibility of PGA and proposed Geodesic-PCA, which removes precisely that constraint. Later, Huckemann et al. (2010) proposed generalized geodesics and orthogonal projections on quotient spaces, and used these as components for a PCA on certain Riemannian manifolds.

In the context of skeletal models for object shape representations, modes of variation on the sphere that follow non-great subspheres (i.e., non-geodesics) can provide more effective dimensionality reduction, as noted, e.g., in Pizer and Marron (2017) and Pizer et al. (2020). This motivated the principal arc analysis of Jung et al. (2011) and its generalization on $\mathbb{S}^{d}$, $d\geq 2$, the Principal Nested Spheres (PNS) of Jung et al. (2012). PNS exploits the fact that intersecting $\mathbb{S}^{d}$ with a hyperplane of its Euclidean embedding yields a subsphere isomorphic to $\mathbb{S}^{d-1}$ that is not necessarily a great subsphere. Therefore, PNS offers a richer family of possible modes of variation on $\mathbb{S}^{d}$ than manifold-generic methods for dimensionality reduction, which focus on great subspheres only. This provided evidence for the utility of exploiting the specificity of the data support in building a more informative dimension-reduction analysis. PNS features an iterative process that yields a sequence of best-fitting subspheres of decreasing dimensions within $\mathbb{S}^{d}$. This motivated the concept of "backwards analogues" of PCA, as discussed in Damon and Marron (2014): for a $d$-dimensional manifold, a $(d-1)$-dimensional best-fitting submanifold is identified, and this process is iterated, in a nested fashion, until a one-dimensional mode of variation is obtained. Under fairly general assumptions, Huckemann and Eltzner (2018) proved that backwards component sequences (including PNS) satisfy asymptotic results, such as strong consistency and asymptotic normality. Yet both forward and backward analogues of PCA are greedy algorithms, which do not necessarily attain a global optimum in the sequence of proportions of explained variance. In response to this point, Pennec (2018) developed barycentric subspace analysis, which introduces a nested (backwards or forward) sequence of submanifolds defined as the locus of points that are manifold-affine spans of a set of $d+1$ points in a $d$-dimensional manifold (great subspheres for $\mathbb{S}^{d}$).

Despite its apparent simplicity, the torus is a particularly challenging manifold for dimensionality reduction. Tangent plane methods disregard long-range periodicity and tend to underperform unless the data are fairly concentrated. An attempt to adapt geodesic-based methods to the torus is the torus principal geodesic analysis of Nodehi et al. (2015). Yet, as noted in Eltzner et al. (2018), geodesics on the torus face the problem of producing space-filling curves almost surely by winding around indefinitely. For that reason, existing PCA adaptations for toroidal data can be classified into those relying either on (a) imposing a certain distributional model on the data or (b) transforming the data to a space where dimension-reduction tools are already consolidated. With respect to approach (a), one can find Kent and Mardia (2009)'s approach of PCA on the covariance matrix of a wrapped normal model on the torus. A recent approach proposed by Nodehi et al. (2020) extends probabilistic PCA to Torus Probabilistic PCA (TPPCA), where the toroidal data are expressed as a latent variable model modulo $2\pi$, while assuming Gaussianity for the latent variables. The parameters of the model are then estimated by an Expectation–Maximization algorithm. With regard to approach (b), typically the arrival space has been the Euclidean space or variations thereof. For example, Mu et al. (2005) proposed dihedral PCA (dPCA), which leverages the usual embedding $\mathbb{T}^{d}\rightarrow[-1,1]^{2d}$ via sines and cosines, and then performs PCA on $\mathbb{R}^{2d}$, a drawback being that ordinary PCA ignores the fact that the data lie on a curved $d$-dimensional manifold within $[-1,1]^{2d}$. A similar transformation is that of Altis et al. (2007), using the complex representation of the angles. Riccardi et al. (2009) propose angular PCA (aPCA), where the toroidal data are first centered and then subjected to usual PCA. Sittel et al. (2017) modified aPCA into dPCA+, a variant that attempts to minimize the distortion by shifting the "boundaries" of $\mathbb{T}^{d}$ to the lowest-density region.

Still within (b), but differently from the previous approaches, the Geodesic PCA of Sargsyan et al. (2012) deforms $\mathbb{T}^{d}$ to $\mathbb{S}^{d}$ and then performs dimensionality reduction in the latter space by PGA. Nevertheless, this transformation directly uses the angles in $\mathbb{T}^{d}$ as the $d$ angles that determine the hyperspherical coordinates of $\mathbb{S}^{d}$, with the consequent appearance of singularities (e.g., when $d=2$ the two maximally-separated points $(\pm\pi/2,\pm\pi/2)$ on $\mathbb{T}^{2}$ are mapped to the same point on $\mathbb{S}^{2}$). Perhaps the most successful and advanced existing method for PCA on the torus is the Torus PCA (T-PCA) of Eltzner et al. (2018), where $\mathbb{T}^{d}$ is also deformed to $\mathbb{S}^{d}$, now with a more sophisticated transformation, and PNS is then applied. However, the deformation in T-PCA is not invariant to permutations of the angular variables, which introduces a source of subjectivity in the analysis. To alleviate this issue, the authors propose two data-driven orderings of the variables, the SI and SO orderings, corresponding to sorting the variables by decreasing and increasing circular spread, respectively. Yet the practitioner has to choose one ordering, and the two may yield significantly different scores that affect subsequent analyses (as evidenced in Figure 6). Furthermore, the deformation to $\mathbb{S}^{d}$, although designed to cut the torus open at a point of minimum distortion, may induce relevant artifacts in certain datasets. These deformation artifacts can create spurious cluster structures in the scores of datasets without any cluster structure (see Figure 5). A final limitation of T-PCA, presented in Remark 2.3 of Eltzner et al. (2018), is that it is only applicable when there is a structural data gap in all angles except for at most two.

[Figure 1: panels (a), (b), (c); images not reproduced.]
Figure 1: Scores (scatterplots (a) and (b)) and modes of variation (panel (c)) for the sunspots data; panels (a) and (b) show the first (horizontal axis) and second (vertical axis) scores for PCA and ST-PCA, respectively. In panel (c), the original data are shown as black points, while the T-PCA and ST-PCA first modes of variation are represented as red and blue solid lines, respectively. The figure illustrates that ST-PCA gives a more efficient mode of variation than standard PCA and T-PCA.

In this paper we propose a toroidal PCA based on embedding the sample on $\mathbb{T}^{d}$ into $\mathbb{S}^{d}$ by means of a data-driven map aimed at preserving the pairwise geodesic distances of the data. Our dimension-reduction method is summarized in three steps. First, we transform the original sample on $\mathbb{T}^{d}$ to a similar configuration of points on a $d$-dimensional sphere by Spherical MultiDimensional Scaling (SMDS). SMDS is an optimization procedure designed to minimize the squared differences between the sample pairwise geodesic distances on $\mathbb{T}^{d}$ and the corresponding pairwise geodesic distances of the transformed sample on a $d$-sphere. The radius of the arrival $d$-sphere is also chosen automatically in a further attempt to minimize distortions. Second, we apply PNS to the transformed data to find the sequence of best-fitting subspheres and the scores associated with them. Finally, we show how to optionally invert the SMDS map, in a step usually referred to as prediction, to produce a low-dimensional representation of the data in the original torus via the inverted PNS scores. Due to the multidimensional scaling operation, we henceforth refer to our method as Scaled Torus PCA (ST-PCA). An illustration of the adequacy of ST-PCA for data analysis is advanced in Figure 1. Figure 1c shows astronomical data on the appearance of sunspots, which lie on $\mathbb{T}^{3}$ in a roughly diagonal pattern and are described in detail in Section 5.1. Figure 1a displays the scores from standard PCA, which disregards the periodicity of the data, while Figure 1b shows that ST-PCA gives more sensible scores. Figure 1c shows that while T-PCA honors the periodicity of the data, it yields a less efficient mode of variation (red curve) when compared to ST-PCA (blue curve). In this case study, and among other competitors, ST-PCA performs best in the sense of preserving data fidelity and explaining the most variance in the first principal component. A second case study on an RNA dataset in $\mathbb{T}^{7}$ that has lately been used as a benchmark in the literature also evidences the adequacy of ST-PCA over existing alternatives, in the sense of correctly clustering the largest number of points based on the first principal component scores. We also evidence the ability of ST-PCA to perform dimension reduction and identify clusters with numerical experiments on different data patterns on $\mathbb{T}^{2}$ and $\mathbb{T}^{3}$.

The rest of this paper is organized as follows. Section 2 explores the links between tori and spheres and provides a review of multidimensional scaling, with discussion of relevant literature. In Section 3 we develop ST-PCA in full detail and discuss the subtleties of its implementation. A simulation study on clustered data is provided in Section 4 to illustrate the strengths of ST-PCA. We apply ST-PCA to two real datasets in Section 5, evidencing the benefits of ST-PCA with respect to available alternatives. The paper concludes with a discussion in Section 6.

2 Background

The first subsection introduces some required notions from geometry and provides justification for why the sphere is a more natural arrival space than the Euclidean space. The reader curious about further relevant geometric concepts is referred to Huckemann et al. (2010) for a concise summary that is very accessible to statisticians. The second subsection provides details on classical Euclidean MDS and its spherical variant, together with an overview of related work.

2.1 Some geometrical considerations

Traditional data embeddings into metric spaces take place in low-dimensional Euclidean spaces, as these allow for the easiest visualization and inference. Results for isometric embeddings into the Euclidean space are due to the Nash embedding theorem for Riemannian manifolds and to Morgan (1974) for arbitrary metric spaces. Recent advances in machine learning and computer graphics have popularized the idea of embedding data that lie in inherently curved spaces into lower-dimensional non-Euclidean spaces, as these spaces may reduce the distortion of the embedding.

In our setting, the $d$-sphere of radius $r$ and the $d$-torus are especially relevant. These spaces are respectively defined as $\mathbb{S}^{d}_{r}:=\{\bm{x}\in\mathbb{R}^{d+1}:\|\bm{x}\|=r\}$, where $\|\cdot\|$ is the Euclidean norm, and $\mathbb{T}^{d}:=(\mathbb{S}^{1})^{d}:=\mathbb{S}^{1}\times\cdots\times\mathbb{S}^{1}$ ($d$ times) or, equivalently and as previously introduced, $\mathbb{T}^{d}:=[-\pi,\pi)^{d}$ with $-\pi$ and $\pi$ identified, since $\mathbb{S}^{1}$ can also be thought of as the quotient space $\mathbb{R}/(2\pi\mathbb{Z})$. For simplicity of notation, we denote $\mathbb{S}^{d}_{1}$ by $\mathbb{S}^{d}$. Geodesic distances on $\mathbb{S}^{d}_{r}$ are given by

$$\delta_{\mathbb{S}^{d}_{r}}(\bm{x},\bm{y})=r\arccos\left(\bm{x}^{\prime}\bm{y}/r^{2}\right),\quad\bm{x},\bm{y}\in\mathbb{S}^{d}_{r}.\tag{1}$$

For angular coordinates on $\mathbb{S}^{1}$, $\delta_{\mathbb{S}^{1}}(\phi,\psi)=\min\{|\phi-\psi|,2\pi-|\phi-\psi|\}$, $\phi,\psi\in[-\pi,\pi)$. The distance on $\mathbb{T}^{d}$ benefits from the space's product structure: $\delta_{\mathbb{T}^{d}}(\bm{\phi},\bm{\psi}):=\big(\sum_{i=1}^{d}\delta_{\mathbb{S}^{1}}(\phi_{i},\psi_{i})^{2}\big)^{1/2}$, $\bm{\phi},\bm{\psi}\in\mathbb{T}^{d}$.
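For concreteness, these distances can be coded in a few lines of R (a minimal sketch; the function names are ours, and inputs are angles in $[-\pi,\pi)$ or Cartesian coordinates, as appropriate):

```r
# Circle, torus, and sphere geodesic distances (sketch; names are ours).
dist_circ <- function(phi, psi) {
  # delta_{S^1}: distance on the unit circle for angles in [-pi, pi)
  a <- abs(phi - psi)
  pmin(a, 2 * pi - a)
}
dist_torus <- function(phi, psi) {
  # delta_{T^d}: product-space distance for d-vectors of angles
  sqrt(sum(dist_circ(phi, psi)^2))
}
dist_sph <- function(x, y, r = 1) {
  # delta_{S^d_r} in (1) for Cartesian x, y with ||x|| = ||y|| = r
  r * acos(min(max(sum(x * y) / r^2, -1), 1))  # clamped for numerical safety
}
```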

The construction of our dimensionality reduction tool rests upon the underlying idea of Eltzner et al. (2018) that spherical embeddings are more appropriate than Euclidean ones when considering transformations from $\mathbb{T}^{d}$. While this claim is certainly arguable, we next present a series of reasons supporting it. First, $\mathbb{S}_{r}^{d}$ is a space benign enough to isometrically embed an arbitrary distance under certain conditions. Precisely, Schoenberg (1937) provides necessary and sufficient conditions for an isometric embedding of an arbitrary distance into the sphere, as shown in Theorem 3.22 of Pekalska and Duin (2005):

Theorem 1 (Schoenberg, 1937).

An $n\times n$ distance matrix $\bm{D}=(\delta_{ij})_{i,j=1}^{n}$ can be isometrically embedded into a sphere $\mathbb{S}^{m}_{r}$ of radius $r>0$ and dimension $m$ if and only if $\delta_{ij}\leq\pi r$ for all $i,j=1,\ldots,n$ and the matrix $\bm{G}=(\cos(\delta_{ij}/r))_{i,j=1}^{n}$ is positive semidefinite. Then, the smallest $m$ such that $\bm{D}$ embeds into $\mathbb{S}^{m}_{r}$ is $m=\mathrm{rank}(\bm{G})-1$, while the solution is undefined for $\mathrm{rank}(\bm{G})=1$.
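Theorem 1 lends itself to a direct numerical check. The following R sketch tests the two conditions for a given distance matrix D and radius r, taking the rank of G as the number of eigenvalues above a tolerance (our choice):

```r
# Check Schoenberg's embeddability conditions for a distance matrix D.
schoenberg_check <- function(D, r, tol = 1e-10) {
  if (max(D) > pi * r) return(list(embeddable = FALSE, m = NA))
  G <- cos(D / r)
  ev <- eigen(G, symmetric = TRUE, only.values = TRUE)$values
  psd <- all(ev > -tol)                      # positive semidefiniteness
  m <- if (psd) sum(ev > tol) - 1 else NA    # smallest dimension, rank(G) - 1
  list(embeddable = psd, m = m)
}
```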

Second, both $\mathbb{S}_{r}^{d}$ and $\mathbb{T}^{d}$, endowed with their respective metrics, are compact Riemannian manifolds for every $d\in\mathbb{N}$, unlike $\mathbb{R}^{d}$ with the usual Euclidean distance. Hence the second similarity of the two manifolds (and dissimilarity with the Euclidean space) comes from topology. Third, $\mathbb{S}_{r}^{d}$ has a subspace isomorphic to $\mathbb{S}^{1}$ (recall, e.g., its parametrization in terms of hyperspherical coordinates) that contributes to preserving the $d$-dimensional periodicity of $\mathbb{T}^{d}$, as opposed to $\mathbb{R}^{d}$, where periodicity is nonexistent. Fourth, as noted in the context of data analysis at least as early as Cox and Cox (1991), convex hulls in $\mathbb{S}^{d}$ and $\mathbb{T}^{d}$ present definition issues, in contrast to $\mathbb{R}^{d}$, where the convex hull of the data provides a natural notion of boundary. A final, yet very important, reason comes from statistics and concerns the ability of PNS to capture cluster structure when compared to PCA. Consider a dataset with three normal-like clusters on $\mathbb{S}^{d}$ and on $\mathbb{R}^{d}$. In $\mathbb{S}^{d}$ it is always possible to capture three clusters with a one-dimensional periodic mode of variation ($\mathbb{S}^{1}$), as long as small circles are allowed. However, unless the three clusters have collinear centers, in $\mathbb{R}^{d}$ a straight line can flawlessly capture at most two clusters. Therefore, an embedding into $\mathbb{S}^{d}$ followed by PNS allows for more efficient cluster hunting than one into $\mathbb{R}^{d}$ with subsequent PCA.

2.2 Multidimensional scaling

MultiDimensional Scaling (MDS) is a well-established method whose origins date back to the psychometrics literature of the 1950s. Many reviews and books already exist on the topic; see, e.g., Borg and Groenen (2005), Cox and Cox (2008), or Saeed et al. (2018). We review the Spherical MDS literature relevant to our work, and we only discuss and work with what is called metric MDS, in particular its classical and spherical variants.

2.2.1 Classical Euclidean MDS

Let $\bm{x}_{1},\ldots,\bm{x}_{n}$ be a sample of elements in an arbitrary metric space $(X,\delta_{X})$, where $\delta_{X}$ is a distance function on $X$. For a given $p\in\mathbb{N}$, the metric version of Classical MDS seeks a configuration $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}\in\mathbb{R}^{p}$ such that

$$(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n})=\arg\min_{(\bm{y}_{1},\ldots,\bm{y}_{n})\in(\mathbb{R}^{p})^{n}}\frac{1}{n(n-1)}\sum_{i\neq j}(\delta_{X}(\bm{x}_{i},\bm{x}_{j})-\|\bm{y}_{i}-\bm{y}_{j}\|)^{2}.\tag{2}$$

When $(X,\delta_{X})=(\mathbb{R}^{d},\|\cdot\|)$ for $d\in\mathbb{N}$, $d\geq p$, (2) is essentially a standard PCA up to translation (see, e.g., Section 2.2.7 of Cox and Cox, 2008), where $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}$ are the projections of $\bm{x}_{1},\ldots,\bm{x}_{n}$ onto the best-fitting $\mathbb{R}^{p}$, up to isometry.
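This equivalence is easy to verify numerically with base R: classical scaling (cmdscale), which solves the metric problem for Euclidean input, reproduces the PCA scores up to an isometry (a quick check on simulated data of our own making):

```r
set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y_mds <- cmdscale(dist(x), k = 2)            # classical MDS configuration
y_pca <- prcomp(x)$x[, 1:2]                  # PCA scores
# Identical interpoint distances => same configuration up to isometry
max(abs(dist(y_mds) - dist(y_pca)))          # ~ 0, up to numerical error
```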

2.2.2 Spherical MDS

In the non-classical context, we consider a second metric space $(Y,\delta_{Y})$, not necessarily Euclidean, to which we want to map our original data; see, e.g., Cox and Cox (1991, 2008). Then, (2) generalizes to finding a configuration $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}\in Y$ such that

$$(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n})=\arg\min_{(\bm{y}_{1},\ldots,\bm{y}_{n})\in Y^{n}}\frac{1}{n(n-1)}\sum_{i\neq j}(\delta_{X}(\bm{x}_{i},\bm{x}_{j})-\delta_{Y}(\bm{y}_{i},\bm{y}_{j}))^{2}.\tag{3}$$

The objective function in (2) and (3) is referred to as the stress function of the given method. In the case $(Y,\delta_{Y})=(\mathbb{S}^{d}_{r},\delta_{\mathbb{S}^{d}_{r}})$, we refer to (3) as Spherical MDS (SMDS). Typically, to solve the SMDS problem with arbitrary $(X,\delta_{X})$, one of two strategies has been used. The first represents the data on $\mathbb{S}_{r}^{d}$ with hyperspherical coordinates and numerically optimizes (3) over $Y=[0,\pi]^{d-1}\times[-\pi,\pi)$. Its main advantage is computational convenience, as the optimization only requires box constraints. A nuisance of this approach is that the computation of Cartesian coordinates is required for computing geodesic distances on $\mathbb{S}_{r}^{d}$ for $d\geq 3$; perhaps for this reason, instances of this approach seem to have focused on $d=1,2$ (Elad et al., 2005; Cox and Cox, 1991; Papazoglou and Mylonas, 2017). The second strategy employs the Euclidean parametrization, together with either linear or quadratic constraints that force the new configuration to lie on a sphere. This, however, may yield a somewhat demanding optimization problem. Instances of such work include Lee and Bentler (1980), Borg and Lingoes (1980), and de Leeuw and Heiser (1980). A particularly successful approach in this subcategory, able to solve MDS with quadratic constraints by employing majorization techniques, is the primal method of MDS-Quadratic (MDS-Q), proposed by de Leeuw and Mair (2009). MDS-Q reduces to SMDS when $Y=\mathbb{S}^{d}$ in (3). A faster algorithm is also supported by MDS-Q, referred to as the dual method of MDS-Q, which yields approximately spherical solutions (so a projection to the sphere must follow).

3 Scaled Torus-PCA

We next provide a high-level overview of Scaled Torus-PCA (ST-PCA). Given a sample $\bm{x}_{1},\ldots,\bm{x}_{n}\in\mathbb{T}^{d}$, $d\in\mathbb{N}$, ST-PCA proceeds in three main steps:

  1.

    Obtain $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}\in\mathbb{S}_{r}^{d}$ by solving the SMDS problem arising from (3) with $(X,\delta_{X})=(\mathbb{T}^{d},\delta_{\mathbb{T}^{d}})$ and $(Y,\delta_{Y})=(\mathbb{S}^{d}_{r},\delta_{\mathbb{S}^{d}_{r}})$:

    $$(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n})=\arg\min_{(\bm{y}_{1},\ldots,\bm{y}_{n})\in(\mathbb{S}^{d}_{r})^{n}}\frac{1}{n(n-1)}\sum_{i\neq j}(\delta_{\mathbb{T}^{d}}(\bm{x}_{i},\bm{x}_{j})-\delta_{\mathbb{S}^{d}_{r}}(\bm{y}_{i},\bm{y}_{j}))^{2}.\tag{4}$$
  2.

    Obtain $S^{j}$, $j=d-1,\ldots,1$, the nested sequence of subspaces that are isomorphic to $\mathbb{S}^{j}$ and that best fit $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}$ according to PNS.

  3.

    Optionally, "invert" $S^{1}$ through a reverse SMDS problem to obtain a principal curve on $\mathbb{T}^{d}$.

Step 1 is discussed in Subsection 3.1, while considerations on choosing the radius of $\mathbb{S}^{d}_{r}$ are addressed in Subsection 3.2. Steps 2 and 3 are elaborated in Subsections 3.3 and 3.4, respectively.

3.1 Solving the SMDS problem

For a given dataset $\bm{x}_{1},\ldots,\bm{x}_{n}\in\mathbb{T}^{d}$ and radius $r\in\mathbb{R}_{+}$, in Step 1 we seek a solution to the optimization problem (4). While this could be achieved using MDS-Q, as previously described, we have also explored the unconstrained optimization approach

$$(\widehat{\bm{z}}_{1},\ldots,\widehat{\bm{z}}_{n})=\arg\min_{(\bm{z}_{1},\ldots,\bm{z}_{n})\in(\mathbb{R}^{d+1})^{n}}\frac{1}{n(n-1)}\sum_{i\neq j}\left(\delta_{\mathbb{T}^{d}}(\bm{x}_{i},\bm{x}_{j})-r\arccos\left(\frac{\bm{z}_{i}^{\prime}\bm{z}_{j}}{\|\bm{z}_{i}\|\|\bm{z}_{j}\|}\right)\right)^{2},\tag{5}$$

whose solution is then projected as $\widehat{\bm{y}}_{i}:=r\widehat{\bm{z}}_{i}/\|\widehat{\bm{z}}_{i}\|\in\mathbb{S}^{d}_{r}$, $i=1,\ldots,n$. An advantage of this relatively simple approach over MDS-Q is that it can be further modified straightforwardly, as later discussed in Subsection 3.2. More importantly, even though the framework for a geodesic MDS-Q exists, existing software only features chordal distances on $\mathbb{S}^{d}$. When solving either (4) or (5), some care is advised: neither has a unique global minimum, as their objective functions are invariant under rotations of point configurations. In (4), this issue can be addressed by fixing $\bm{y}_{1}$ to, say, $(0,\ldots,0,1)^{\prime}$ and optimizing over the remaining $(\bm{y}_{2},\ldots,\bm{y}_{n})$. In (5), one can fix $\bm{z}_{1}=(0,\ldots,0,s)^{\prime}$ and optimize over $(s,\bm{z}_{2},\ldots,\bm{z}_{n})\in\mathbb{R}_{+}\times(\mathbb{R}^{d+1})^{n-1}$.
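A minimal R sketch of this unconstrained approach follows, with fixed radius r and a toroidal distance matrix DT (for brevity, the rotational invariance is not removed by fixing the first point):

```r
# Stress of (5) for a flattened configuration z_vec (n x (d+1) entries).
smds_stress <- function(z_vec, DT, n, d, r) {
  Z <- matrix(z_vec, n, d + 1)
  Z <- Z / sqrt(rowSums(Z^2))              # project rows onto the unit sphere
  C <- pmin(pmax(tcrossprod(Z), -1), 1)    # pairwise cosines, clamped
  DS <- r * acos(C)                        # spherical geodesic distances
  sum((DT - DS)[upper.tri(DT)]^2)          # stress, up to a constant factor
}
# DT: n x n matrix with entries delta_{T^d}(x_i, x_j); Z0: initial value,
# e.g., the classical MDS solution projected to the sphere.
# fit <- optim(as.vector(Z0), smds_stress, method = "BFGS",
#              DT = DT, n = n, d = d, r = r)
# Y <- matrix(fit$par, n, d + 1); Y <- r * Y / sqrt(rowSums(Y^2))
```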

We ran experiments comparing the MDS-Q approach with the BFGS algorithm, as implemented in R's optim function, to solve (6) (a slight modification of (5)). MDS-Q was run as implemented in the smacofSphere function of the smacof R package (de Leeuw and Mair, 2009). Both SMDS algorithms were initiated by a projection to $\mathbb{S}^{d}$ of the classical MDS solution. Calculation times for the optimization of (6) also include the initialization of the radius according to (7). Optimization in (6) was conducted at a relative tolerance level of $5\times 10^{-2}$ and MDS-Q at $\varepsilon=5\times 10^{-2}$, levels chosen so that the computational efforts were comparable. Our approach attained notable improvements in stress over MDS-Q, which can partly be attributed to MDS-Q's non-geodesic calculation of distances. For the dual MDS-Q algorithm, the default penalty (10) was used. Each of the three numerical examples in Section 4 was generated 50 times and the results were averaged. In the setting of Subsection 4.1, our optimization of (6) took 6.45s and yielded a stress of 0.068, primal MDS-Q needed 14.29s and yielded a stress of 0.479, while dual MDS-Q reported 2.16s and 1.076, respectively. The respective metrics for the setting of Subsection 4.2 are: 7.28s and 0.064; 75.02s (with a range of 43–157s) and 1.23; 2.6s and 1.753. For the setting in Subsection 4.3 we obtained: 30.93s and 0.159; 14.65s and 0.645; and 2.19s and 0.691. Due to the above considerations, and since we specifically target geodesic distances, we use approach (6) in the sequel.

3.2 Considerations on the radius

The choice of $r$ in (4) and (5) affects the scaling of the spherical distances. This has an important effect, since distances on $\mathbb{S}^{d}_{r}$ are bounded by $r\pi$, while distances on $\mathbb{T}^{d}$ are bounded by $\sqrt{d}\,\pi$. Besides, $\mathbb{S}^{d}_{r}$ becomes a very different space when $r\to 0$ or $r\to\infty$, collapsing to $\{\bm{0}\}$ in the first limit case and becoming flat in the second (the curvature of $\mathbb{S}_{r}^{d}$ is $1/r^{2}$).

A possibility is to incorporate $r$ into the optimization problem, modifying the radius of the arrival space in a data-driven form, as done by Bai et al. (2015). Precisely, we can turn (5) into

$$(\widehat{r},\widehat{\bm{z}}_{1},\ldots,\widehat{\bm{z}}_{n})=\arg\min_{(r,\bm{z}_{1},\ldots,\bm{z}_{n})\in\mathbb{R}_{+}\times(\mathbb{R}^{d+1})^{n}}\frac{1}{n(n-1)}\sum_{i\neq j}\left(\delta_{\mathbb{T}^{d}}(\bm{x}_{i},\bm{x}_{j})-r\arccos\left(\frac{\bm{z}_{i}^{\prime}\bm{z}_{j}}{\|\bm{z}_{i}\|\|\bm{z}_{j}\|}\right)\right)^{2}.\tag{6}$$

Including $r$ within the optimization in (5) is immediate, which is an additional benefit of (5). A simple initial value for $r$ is $\sqrt{d}$, which equates the maximum distances on $\mathbb{T}^{d}$ and $\mathbb{S}_{r}^{d}$.

A more sophisticated approach to select $r$ that does not require expanding (5) into (6) is described next. Denote by $\mu$ and $\nu_{r}$ the uniform measures on $\mathbb{T}^{d}$ and $\mathbb{S}_{r}^{d}$, respectively. Let $\bm{u}_{1},\ldots,\bm{u}_{n}\sim\mu$ and $\bm{v}_{1},\ldots,\bm{v}_{n}\sim\nu_{r}$ be iid samples. Denote by $F_{N,\mathbb{T}^{d}}$ the empirical cumulative distribution function (ecdf) of the pairwise distances $\{\delta_{\mathbb{T}^{d}}(\bm{u}_{i},\bm{u}_{j})\}_{1\leq i<j\leq n}$ and by $G_{N,\mathbb{S}_{r}^{d}}$ the analogous ecdf for $\{\delta_{\mathbb{S}_{r}^{d}}(\bm{v}_{i},\bm{v}_{j})\}_{1\leq i<j\leq n}$, with $N:=n(n-1)/2$. Define

$$r^{*}:=\arg\min_{r\geq 0}\mathbb{E}_{\mu\times\nu_{r}}\left[W_{1}(F_{N,\mathbb{T}^{d}},G_{N,\mathbb{S}_{r}^{d}})\right],\quad W_{1}(F_{N,\mathbb{T}^{d}},G_{N,\mathbb{S}_{r}^{d}})=\int_{0}^{1}\big|F^{(-1)}_{N,\mathbb{T}^{d}}(s)-G^{(-1)}_{N,\mathbb{S}_{r}^{d}}(s)\big|\,\mathrm{d}s,\tag{7}$$

which corresponds to the expected 1-Wasserstein distance between the two pairwise-distance ecdfs under the product measure $\mu\times\nu_{r}$. Above, $F^{(-1)}$ stands for the generalized inverse of $F$. As defined, $r^{*}$ guarantees that, on average, the most spread sample of size $n$ on $\mathbb{T}^{d}$ can be allocated on $\mathbb{S}^{d}_{r^{*}}$ while optimally maintaining its interpoint-distance distribution. Uniformly-distributed samples on $\mathbb{T}^{d}$ can be regarded as the worst-case scenario on which to perform SMDS effectively; $r^{*}$ is designed to aid precisely in this situation. In practice, the expectation in (7) can be estimated by Monte Carlo, and the Wasserstein distance is evaluated through its equivalent form $W_{1}(F_{N,\mathbb{T}^{d}},G_{N,\mathbb{S}_{r}^{d}})=\frac{1}{N}\sum_{i=1}^{N}\big|d_{(i),\mathbb{T}^{d}}-d_{(i),\mathbb{S}_{r}^{d}}\big|$, where $d_{(k),X}$ is the $k$-th order statistic of $\{\delta_{X}(\bm{u}_{i},\bm{u}_{j})\}_{1\leq i<j\leq n}$. In the experiments described in Subsection 3.1, as well as in the sequel, we computed $r^{*}$ as described in (7), estimating the expectation with $M=100$ Monte Carlo replicates of $n=100$ uniformly generated points each, and then used it as the initial value for $\widehat{r}$ in solving (6).
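A Monte Carlo sketch of this initialization in R is given below, using the order-statistic form of $W_{1}$ (dist_circ is the helper of Subsection 2.1; the optimization interval for $r$ is our choice):

```r
runif_torus <- function(n, d) matrix(runif(n * d, -pi, pi), n, d)
runif_sphere <- function(n, d, r) {        # uniform sample on S^d_r
  z <- matrix(rnorm(n * (d + 1)), n, d + 1)
  r * z / sqrt(rowSums(z^2))
}
pdist_torus <- function(U) {               # sorted pairwise toroidal distances
  D2 <- 0
  for (k in seq_len(ncol(U))) {
    A <- as.matrix(dist(U[, k]))           # |phi_ik - phi_jk|
    D2 <- D2 + pmin(A, 2 * pi - A)^2       # wrap to circle distances
  }
  sort(sqrt(D2[upper.tri(D2)]))
}
pdist_sphere <- function(V, r) {           # sorted pairwise spherical distances
  C <- pmin(pmax(tcrossprod(V / r), -1), 1)
  sort((r * acos(C))[upper.tri(C)])
}
w1_hat <- function(r, d, n = 100, M = 100) {
  mean(replicate(M, {                      # Monte Carlo estimate of (7)
    mean(abs(pdist_torus(runif_torus(n, d)) -
             pdist_sphere(runif_sphere(n, d, r), r)))
  }))
}
# r_star <- optimize(w1_hat, interval = c(0.5, 2 * sqrt(d)), d = d)$minimum
```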

3.3 PNS fit

By solving (5) we have found a configuration $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}\in\mathbb{S}^{d}_{r}$. For ease of presentation, we assume this configuration has been projected to $\mathbb{S}^{d}$ before applying PNS (which is invariant to scaling). We can now use PNS on $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}$ to obtain a nested sequence of subspaces

$$\mathbb{S}^{d}\supset S^{d-1}\supset S^{d-2}\supset\cdots\supset S^{1}\supset\{\widehat{\bm{\mu}}\},\tag{8}$$

where $S^{d-1}\cong\mathbb{S}^{d-1},\ldots,S^{1}\cong\mathbb{S}^{1}$, and $\widehat{\bm{\mu}}$ is the backwards mean, which is the Fréchet mean of the projections of the data onto $S^{1}$. A possible parametrization of the $(d-1)$-dimensional subsphere $S^{d-1}$ that sheds light on its small-subsphere structure is attained with the tangent-normal decomposition: $S^{d-1}=\{\bm{x}\in\mathbb{S}^{d}:\bm{x}=t_{1}\bm{\theta}_{1}+(1-t_{1}^{2})^{1/2}\bm{B}_{\bm{\theta}_{1}}\bm{\xi}_{1},\ \bm{\xi}_{1}\in\mathbb{S}^{d-1}\}$, where $(t_{1},\bm{\theta}_{1})\in[-1,1]\times\mathbb{S}^{d}$ and $\bm{B}_{\bm{\theta}_{1}}$ is an arbitrary semi-orthonormal $(d+1)\times d$ matrix such that $\bm{B}_{\bm{\theta}_{1}}\bm{B}_{\bm{\theta}_{1}}^{\prime}=\bm{I}_{d+1}-\bm{\theta}_{1}\bm{\theta}_{1}^{\prime}$ and $\bm{B}_{\bm{\theta}_{1}}^{\prime}\bm{B}_{\bm{\theta}_{1}}=\bm{I}_{d}$. Further subspheres can be parametrized iteratively in a nested fashion, yet explicit forms are more convoluted.
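A short R sketch of this parametrization, sampling points on the subsphere determined by $(t_{1},\bm{\theta}_{1})$ and using MASS::Null() to supply a semi-orthonormal $\bm{B}_{\bm{\theta}_{1}}$:

```r
library(MASS)
subsphere_points <- function(m, t1, theta1) {
  # t1 in [-1, 1]; theta1 a unit vector in R^(d+1)
  B <- Null(cbind(theta1))                 # (d+1) x d, B'B = I_d, B'theta1 = 0
  d <- ncol(B)
  xi <- matrix(rnorm(m * d), m, d)
  xi <- xi / sqrt(rowSums(xi^2))           # m uniform points on S^(d-1)
  t(t1 * theta1 + sqrt(1 - t1^2) * B %*% t(xi))  # m x (d+1) points on the subsphere of S^d
}
```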

PNS also yields the matrix of scores $\{\bm{\xi}_{i}\}_{i=1}^{n}$, where $\bm{\xi}_{i}$ is a $d$-vector (in angular parametrization) that gives the coordinates of $\widehat{\bm{y}}_{i}$ with respect to the spherical components. This matrix gives rise to the space of principal scores $E=[-\pi,\pi)\times[-\pi/2,\pi/2]^{d-1}$, a map $h:\mathbb{S}^{d}\to E$ of PNS such that $h(\widehat{\bm{y}}_{i})=\bm{\xi}_{i}$, and its inverse, $\tilde{h}$. Further, we denote by $\bm{\xi}_{i}^{(k)}$ the projection of the $i$-th datum to $S^{k-1}$, so that $\bm{\xi}_{i}^{(k)}$ agrees with $\bm{\xi}_{i}$ in the first $k$ coordinates and is $0$ in the rest. Essentially, given $\bm{\xi}\in E$, where $\xi_{j}$ denotes the deviation from the $j$-th nested sphere (of dimension $d-j$),

$$\tilde{h}(\bm{\xi})=(\tilde{g}_{1}\circ\cdots\circ\tilde{g}_{d-1})(\bm{\xi}),\quad\tilde{g}_{k}(\bm{\xi}^{(k-1)},\xi_{k}):=\bm{R}(\bm{v}_{k})^{\prime}\begin{pmatrix}\sin(r_{k}+\xi_{k})\,\bm{\xi}^{(k-1)}\\ \cos(r_{k}+\xi_{k})\end{pmatrix},\quad k=1,\ldots,d,$$

where $\bm{R}(\bm{v}_{k})$ is the matrix that rotates $\bm{v}_{k}$, the center of the $k$-th subsphere, to the north pole, and $r_{k}$ is its radius.

For more information on the maps $h$ and $\tilde{h}$, the reader is referred to Section 4 of the Supplementary Material of Jung et al. (2012), and to the main paper for the construction of the nested spheres. PNS is implemented in the pns function of the shapes R package (Dryden, 2021), which we have used.
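As a usage note, a hypothetical invocation is sketched below, assuming (per the shapes documentation) that pns() expects points on the unit sphere as the columns of a $(d+1)\times n$ matrix; Y denotes the $n\times(d+1)$ SMDS configuration:

```r
library(shapes)
Yu <- Y / sqrt(rowSums(Y^2))     # rescale the SMDS configuration to unit norm
pns_fit <- pns(t(Yu))            # points as columns (assumed interface)
# pns_fit$resmat is then assumed to hold the PNS scores used in the sequel
```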

3.4 Prediction

Step 1 of ST-PCA grants an association between the point configurations $(\bm{x}_{1},\ldots,\bm{x}_{n})\in(\mathbb{T}^{d})^{n}$ and $(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n})\in(\mathbb{S}^{d}_{r})^{n}$. Suppose that we are given an arbitrary point $\bm{y}\in\mathbb{S}_{r}^{d}$ and are asked to predict a point $\bm{x}\in\mathbb{T}^{d}$ that best preserves the interpoint distances with respect to the original configuration $\bm{x}_{1},\ldots,\bm{x}_{n}$. This problem is referred to as the prediction problem for MDS; see, e.g., Gower and Hand (1996). This "prediction" can be achieved by defining the set-valued map $\widetilde{T}:\mathbb{S}^{d}_{r}\to\mathbb{T}^{d}$ given by

$$\widetilde{T}(\bm{y}):=\arg\min_{\bm{x}\in\mathbb{T}^{d}}\frac{1}{n}\sum_{i=1}^{n}\left(\delta_{\mathbb{T}^{d}}(\bm{x},\bm{x}_{i})-\delta_{\mathbb{S}_{r}^{d}}(\bm{y},\widehat{\bm{y}}_{i})\right)^{2},\tag{9}$$

which is a coherent prediction problem (Gower and Hand, 1996), since the same objective function (stress) as in the original SMDS problem is used. In practice, an appropriate initialization for the numerical optimization of (9) is the sample point $\bm{x}_{i}$ whose $\widehat{\bm{y}}_{i}$ is closest to $\bm{y}$. For the sake of completeness, we mention that the interpolation problem for MDS refers to the problem reverse to (9), which induces the definition of the set-valued map $\widehat{T}:\mathbb{T}^{d}\to\mathbb{S}^{d}_{r}$ given by

$$\widehat{T}(\bm{x}):=\arg\min_{\bm{y}\in\mathbb{S}_{r}^{d}}\frac{1}{n}\sum_{i=1}^{n}\left(\delta_{\mathbb{T}^{d}}(\bm{x},\bm{x}_{i})-\delta_{\mathbb{S}_{r}^{d}}(\bm{y},\widehat{\bm{y}}_{i})\right)^{2}.\tag{10}$$

Both (9) and (10) give a means of extending the SMDS matching between $(\bm{x}_{1},\ldots,\bm{x}_{n})$ and $(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n})$ to arbitrary points on $\mathbb{T}^{d}$ and $\mathbb{S}^{d}_{r}$, respectively.
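A minimal R sketch of the prediction map (9), with angles wrapped back to $[-\pi,\pi)$ (dist_torus and dist_sph are the helpers of Subsection 2.1):

```r
# Sketch of (9). X: n x d torus sample; Y: n x (d+1) spherical configuration.
wrap <- function(x) (x + pi) %% (2 * pi) - pi
predict_torus <- function(y, X, Y, r, x0) {
  dS <- apply(Y, 1, dist_sph, y = y, r = r)       # fixed spherical distances
  obj <- function(x) {                            # stress in (9)
    x <- wrap(x)
    mean((apply(X, 1, dist_torus, psi = x) - dS)^2)
  }
  # x0: e.g., the sample point x_i whose y_i-hat is closest to y
  wrap(optim(x0, obj, method = "BFGS")$par)
}
```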

We now explain how prediction is used to find a tractable first principal component on $\mathbb{T}^{d}$. Recall from Subsection 3.3 that PNS yields an explicit map $\tilde{h}:E\rightarrow\mathbb{S}^{d}$. Using $\tilde{h}$, we can obtain the Euclidean parametrization of $S^{1}$ in $\mathbb{S}^{d}$. The descriptor is obtained in two steps:

  i.

    Take a grid $\{\xi_{j}\}_{j=1}^{m}\subset[-\pi,\pi)$. Then $\{\bm{\xi}_{j}\}_{j=1}^{m}\subset E$ is a discrete representation of $S^{1}$, where $\bm{\xi}_{j}=(\xi_{j},0,\ldots,0)$ (the last $d-1$ coordinates being zero), and obtain $\{\bm{\psi}_{j}\}_{j=1}^{m}$ with $\bm{\psi}_{j}:=\tilde{h}(\bm{\xi}_{j})$.

  ii.

    Compute $\widetilde{T}(\bm{\psi}_{j})$ for each $j=1,\ldots,m$ by numerically optimizing (9). To aid continuity in the resulting sequence of solutions and avoid spurious minima, use the previous prediction $\widetilde{T}(\bm{\psi}_{j})$ as the starting value for obtaining the next one, $\widetilde{T}(\bm{\psi}_{j+1})$.

The above procedure generates a discretization of the set

$$\widetilde{S}^{1}:=\widetilde{T}(S^{1})=\{\widetilde{T}(\bm{\psi})\in\mathbb{T}^{d}:\bm{\psi}\in S^{1}\}.$$
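In R, this grid prediction with warm starts admits the following sketch, where h_tilde() is assumed to implement the inverse PNS map $\tilde{h}$ of Subsection 3.3 (e.g., extracted from the PNS fit) and predict_torus() is the sketch of Subsection 3.4:

```r
xi_grid <- seq(-pi, pi, length.out = 100)
S1_torus <- matrix(NA, length(xi_grid), d)
psi <- h_tilde(c(xi_grid[1], rep(0, d - 1)))
# initialize at the sample point whose spherical image is closest to psi
x_start <- X[which.min(apply(Y, 1, dist_sph, y = psi, r = r)), ]
for (j in seq_along(xi_grid)) {
  psi <- h_tilde(c(xi_grid[j], rep(0, d - 1)))   # point on S^1 inside S^d
  S1_torus[j, ] <- predict_torus(psi, X, Y, r, x0 = x_start)
  x_start <- S1_torus[j, ]                       # warm start for continuity
}
```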

We close this subsection with a remark. The strength and flexibility of ST-PCA rely on the fact that it searches for a transformation that is, in some sense, optimal. Since this transformation is the solution of optimization problems, there is no guaranteed regularity for ST-PCA's principal components on the torus. This can become apparent when predicting $S^{1}$, a continuous curve on $\mathbb{S}^{d}$ whose predicted curve on $\mathbb{T}^{d}$ may feature discontinuities or non-closed segments in data-sparse regions. In practice, the continuity of $\widetilde{S}^{1}$ can be facilitated by sensible initializations, as described above, followed by a fine-grained reconstruction of $\widetilde{S}^{1}$ through linear interpolations of $\{\widetilde{T}(\bm{\psi}_{j})\}_{j=1}^{m}$ adapted to $\mathbb{T}^{d}$. Nevertheless, for the purposes of clustering and data visualization, Steps 1 and 2 of our algorithm suffice. Hence, even if the principal component constructed in Step 3 is discontinuous, ST-PCA can yield meaningful dimensionality reduction.

3.5 Explained variance

Consider the projection of $\widehat{\bm{y}}_{i}$ onto $S^{k}$, denoted by $\widehat{\bm{y}}_{i}^{(k)}$, $k=0,\ldots,d$, where $\widehat{\bm{y}}_{i}^{(0)}=\widehat{\bm{\mu}}$, $i=1,\ldots,n$, and $\widehat{\bm{\mu}}$ is the Fréchet mean of the projections of the data on $S^{1}\cong\mathbb{S}^{1}$. Then, the variance decomposition provided by PNS for such a sample is

$$\mathrm{var}_{\mathbb{S}^{d}}(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}):=\sum_{k=1}^{d}\mathrm{var}_{\mathbb{S}^{d}}^{(k)}(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}):=\sum_{k=1}^{d}\sum_{i=1}^{n}\left(\delta_{\mathbb{S}^{d}}\big(\widehat{\bm{y}}_{i}^{(k-1)},\widehat{\bm{y}}_{i}^{(k)}\big)\right)^{2}\tag{11}$$

and the proportion of variance explained by the $k$-th component is $\mathrm{var}_{\mathbb{S}^{d}}^{(k)}(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n})/\mathrm{var}_{\mathbb{S}^{d}}(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n})$.

Denote by $\widehat{\bm{x}}_{i}^{(k)}=\widetilde{T}(\widehat{\bm{y}}_{i}^{(k)})$, $k=0,\ldots,d$, the prediction of the projection $\widehat{\bm{y}}_{i}^{(k)}$, which can be equivalently regarded as the projection of $\bm{x}_{i}$ onto $\widetilde{S}^{k}$. Then, based on (11), we define the total variance of $\bm{x}_{1},\ldots,\bm{x}_{n}$ on $\mathbb{T}^{d}$ as

$$\mathrm{var}_{\mathbb{T}^{d}}(\bm{x}_{1},\ldots,\bm{x}_{n}):=\sum_{k=1}^{d}\mathrm{var}^{(k)}_{\mathbb{T}^{d}}(\bm{x}_{1},\ldots,\bm{x}_{n}):=\sum_{k=1}^{d}\sum_{i=1}^{n}\left(\delta_{\mathbb{T}^{d}}\big(\widehat{\bm{x}}_{i}^{(k-1)},\widehat{\bm{x}}_{i}^{(k)}\big)\right)^{2}.$$

The proportion of variance explained by the $k$-th component is then defined as $\mathrm{var}_{\mathbb{T}^{d}}^{(k)}(\bm{x}_{1},\ldots,\bm{x}_{n})/\mathrm{var}_{\mathbb{T}^{d}}(\bm{x}_{1},\ldots,\bm{x}_{n})$. In particular, this ratio allows quantifying the proportion of data variance captured by $\widetilde{S}^{1}$, a popular metric of success in dimensionality reduction. As is typical for PCA analogues in non-Euclidean spaces (see, e.g., the discussion in Section 2.3 of Eltzner et al., 2018), there is no guarantee that $k\mapsto\mathrm{var}_{\mathbb{T}^{d}}^{(k)}(\bm{x}_{1},\ldots,\bm{x}_{n})/\mathrm{var}_{\mathbb{T}^{d}}(\bm{x}_{1},\ldots,\bm{x}_{n})$ is a non-increasing function.
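These decompositions translate directly into code. A sketch (reusing dist_torus from Subsection 2.1), where X_proj is a list of length $d+1$ whose $(k+1)$-th element stores the $n\times d$ matrix of predicted projections $\widehat{\bm{x}}_{i}^{(k)}$:

```r
var_torus <- function(X_proj) {
  d <- length(X_proj) - 1
  n <- nrow(X_proj[[1]])
  v <- numeric(d)
  for (k in 1:d) for (i in 1:n)        # var^(k) sums squared toroidal gaps
    v[k] <- v[k] + dist_torus(X_proj[[k]][i, ], X_proj[[k + 1]][i, ])^2
  list(by_component = v, proportion = v / sum(v))
}
```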

4 Numerical experiments

4.1 Clustered toy data on $\mathbb{T}^{2}$

We next demonstrate the ability of ST-PCA to separate clustered data on $\mathbb{T}^{2}$ without distorting the cluster structure. To that aim, a toy dataset consisting of three isotropic clusters with means $\bm{\mu}_{1}=(-1,-2)^{\prime}$, $\bm{\mu}_{2}=(3,0.5)^{\prime}$, and $\bm{\mu}_{3}=(-0.8,2.5)^{\prime}$, each with $n=100$ points, was considered. The dataset is displayed in Figure 2a. We performed Step 1 to obtain a configuration of the data on $\mathbb{S}^{2}$, displayed in Figure 2b. For this specific dataset, we obtained $r^{*}=1.47$ as an initial value from (7), which was then modified to $\widehat{r}=1.35$ through (6). In Step 2, we obtained the PNS scores plot for this spherical configuration, displayed in Figure 2c. The black curve in Figure 2b shows $S^{1}$, the best-fitting $\mathbb{S}^{1}$ to the spherical data, which corresponds to the horizontal axis of the scores plot in Figure 2c.

[Figure 2: panels (a), (b), (c); images not reproduced.]
Figure 2: Panel (a) shows the three clusters of simulated data on $\mathbb{T}^{2}$. Panel (b) presents the sample SMDS-mapped to $\mathbb{S}^{2}$. Panel (c) displays the PNS scores. The principal mode of variation on the torus, sphere, and scores is displayed in all panels as a black curve. The first ST-PCA mode of variation clearly captures the three clusters.

The resulting $\widetilde{S}^{1}$, computed in Step 3 of ST-PCA, is shown as the black curve in Figure 2a. The figure reveals that the first toroidal component efficiently adapts to the clustered structure of the data (percentage of variance explained: $95\%$). $\widetilde{S}^{1}$ was obtained by predicting an equispaced grid $\{\bm{\psi}_{j}\}_{j=1}^{100}\subset S^{1}$ and then linearly interpolating the resulting predictions on $\mathbb{T}^{2}$ to yield a smooth curve.

4.2 Clustered toy data on $\mathbb{T}^{3}$

We now simulate data from an isotropic mixture of three wrapped normal distributions on $\mathbb{T}^{3}$. We simulated $n=300$ points equally distributed among the three components with centers $\bm{\mu}_{1}=(-1,-2,0)^{\prime}$, $\bm{\mu}_{2}=(1,1,-1)^{\prime}$, and $\bm{\mu}_{3}=(0,-1,3)^{\prime}$. Figure 3a displays the three clusters on $\mathbb{T}^{3}$, where each side of the cube is glued to the opposite side. After running SMDS in Step 1, we obtained a configuration of points $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}\in\mathbb{S}^{3}$. For visualization purposes, we show in Figure 3b the spherical view of the configuration's projections onto $S^{2}\cong\mathbb{S}^{2}$. For this dataset we obtained $r^{*}=1.79$, which then yielded $\widehat{r}=1.63$. Figure 3c shows the scores plot after applying PNS to $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}\in\mathbb{S}^{3}$ (Step 2). Finally, we inverted $S^{1}$, the black curve in Figure 3b, to produce $\widetilde{S}^{1}$, displayed as the black curve in Figure 3a. We did so with an equispaced grid $\{\bm{\xi}_{j}\}_{j=1}^{100}\subset S^{1}$, complemented by a periodic linear interpolation on $\mathbb{T}^{3}$. The entire cluster structure is captured by the first mode of variation, which clearly exploits the periodicity of $\mathbb{T}^{3}$ to efficiently explain the data variability. The percentage of variance explained by the first and second ST-PCA components is $94\%$ and $1\%$, respectively.
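A minimal R sketch of this simulation setting (the isotropic spread sigma is our illustrative choice; the paper's exact value is not reported here):

```r
wrap <- function(x) (x + pi) %% (2 * pi) - pi   # wrap angles to [-pi, pi)
mu <- rbind(c(-1, -2, 0), c(1, 1, -1), c(0, -1, 3))
n_k <- 100; sigma <- 0.4                        # assumed isotropic spread
X <- do.call(rbind, lapply(1:3, function(k)
  wrap(matrix(rnorm(n_k * 3, sd = sigma), n_k, 3) +
       matrix(mu[k, ], n_k, 3, byrow = TRUE)))) # wrapped normal mixture
```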

[Figure 3: panels (a), (b), (c); images not reproduced.]
Figure 3: Panel (a) shows the three clusters of simulated data on $\mathbb{T}^{3}$. Panel (b) presents the projection to $S^{2}\cong\mathbb{S}^{2}$ of the sample SMDS-mapped to $\mathbb{S}^{3}$. Panel (c) displays the first PNS scores. The principal mode of variation on the torus, sphere, and scores is displayed in all panels as a black curve. The first ST-PCA principal component successfully separates the three clusters.

4.3 Anisotropic wrapped normal on $\mathbb{T}^{2}$

In the final numerical experiment, we simulated $n=500$ observations from a wrapped normal distribution on $\mathbb{T}^{2}$ with mean $\bm{\mu}=(-1,0)^{\prime}$ and covariance matrix $\bm{\Sigma}=\left(\begin{smallmatrix}2&2\\2&3\end{smallmatrix}\right)$. Figure 4 shows the analogous analysis to that carried out in Figure 2, this time using a rainbow palette for easier identification of the first ST-PCA principal component, and with $r^{*}=1.48$ and $\widehat{r}=1.39$. The mode of variation successfully utilizes the periodicity of the torus to optimally capture the data variability, attaining a percentage of variance explained of $89\%$.

[Figure 4: panels (a), (b), (c); images not reproduced.]
Figure 4: Panel (a) shows the sample from a wrapped Gaussian distribution on $\mathbb{T}^{2}$; panel (b) shows the spherical view of the configuration of points obtained by SMDS. Panel (c) displays the first PNS scores. The principal mode of variation from the sphere is displayed in all panels in a rainbow palette. The first ST-PCA mode of variation spans the presumed center of the banded data-dense region.

5 Real data applications

5.1 Sunspots

Sunspots are cooler regions of the Sun's photosphere that are related to solar magnetic field concentrations. They have been used as a measure of the Sun's solar activity and are speculated to have an impact on Earth's long-term climate (see, e.g., Haigh, 2007). The solar magnetic field generating the sunspots varies according to a nearly periodic, 11-year pattern that is referred to as a solar cycle. Sunspot observations have been well documented in the Debrecen Photoheliographic Data and the Greenwich Photoheliographic Results. From the latter source, García-Portugués et al. (2020) give a curated dataset with the centers of groups of sunspots. The locations of the centers refer to the first-ever observations of such sunspots (henceforth referred to as "births"). We focus on the last cycle that has been fully observed with curated data, the 23rd solar cycle. It includes a total of 5373 records ranging from August of 1996 to November of 2001. Even though the main generator of sunspots, the twisting of the solar magnetic field, has a rotationally symmetric nature, the existence of preferred longitudes for the recurrent appearance of sunspots has been speculated (Babcock, 1961; Bogart, 1982; Pelt et al., 2010).

We explore the dependence structure of the longitudes $\theta\in[-\pi,\pi)$ of sunspot births. Since the Earth rotates around the Sun, sunspot visibility is limited by which side of the Sun is facing the Earth. Therefore, if there existed significantly preferred longitudes, a significant deviation from a linear dependence structure in the series of sunspot longitudes should be expected. To investigate this serial structure, we jointly consider the series of longitudes and its lagged versions of orders one and two,

$$\bm{\Theta}_{i}:=(\theta_{i},\theta_{i+1},\theta_{i+2})^{\prime}\in\mathbb{T}^{3},\quad i=1,\ldots,5371,$$

the main driver behind that selection being to graphically show the output of ST-PCA on $\mathbb{T}^{3}$. The principal mode of variation for the longitudes of the births of three consecutive groups of sunspots is shown in Figure 1c. The figure reveals that the data are concentrated around the (periodic) diagonal line and that ST-PCA (with $\widehat{r}=1.81$) correctly identifies this line as the first mode of variation, $\widetilde{S}^{1}$. Recall that the points on the edges of the cube belong to the cloud of points around the diagonal, thus rendering it the most sensible choice for a principal mode of variation. $\widetilde{S}^{1}$ is remarkably linear: a suitably-adapted linear fit to the grid of 100 points defining $\widetilde{S}^{1}$ yields $R^{2}=1-3\times 10^{-5}$. Therefore, the apparent linearity of the innovations in the series of sunspot birth longitudes points towards the non-existence of (at least major) preferred longitudes during the 23rd solar cycle, a result coherent with the non-rejection of rotational symmetry for that cycle reported in García-Portugués et al. (2020).

An application of T-PCA, in either of its variants, proposes a mode of variation that is statistically less efficient (Figure 1c for the SO ordering; analogous results hold for the SI ordering). Furthermore, the deformation of $\mathbb{T}^{3}$ to $\mathbb{S}^{3}$ in T-PCA introduces distortion in the form of an artificial cluster structure in the first scores, as shown in Figures 5b and 5c, despite the marginals of $\{\bm{\Theta}_{i}\}_{i=1}^{5371}$ being fairly uniformly distributed. This contrasts with Figure 5a. To formally compare the distributions of the first scores of T-PCA and ST-PCA, we performed three omnibus circular uniformity tests using their asymptotic distributions, as implemented in the sphunif R package (García-Portugués and Verdebout, 2021). The results are summarized in Table 1. The resulting $p$-values illustrate that the uniformity hypothesis cannot be rejected for the marginal distribution of the original longitudes $\{\theta_{i}\}_{i=1}^{5373}$ or the corresponding ST-PCA scores at a significance level $\alpha=0.05$, but is emphatically rejected for the scores obtained by the two possible orderings of T-PCA.
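For reference, such an analysis could be reproduced along the following hypothetical lines, assuming the test identifiers and arguments documented in sphunif (scores1 stands for the vector of first scores of a given method, treated as circular data):

```r
library(sphunif)
# Gine's Fn, Watson, and Projected Anderson-Darling tests, asymptotic p-values
unif_test(scores1, type = c("Gine_Fn", "Watson", "PAD"), p_value = "asymp")
```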

Finally, we provide a comprehensive comparison of ST-PCA with seven existing methods of dimensionality reduction on the torus for this dataset. To do so, we sampled a common subset of 1000 sunspots and computed, for each method, the percentage of variance explained by each of its three components. Table 2 summarizes the results of that analysis, with several remarks following. First, dPCA gives a six-dimensional embedding and, as a result, the variance explained by its first three dimensions does not add up to $100\%$. For dPCA+, histograms of ten bins were used to identify the regions of lowest density. Existing software for GeoPCA uses PGA and does not report the percentage of total variance explained, but rather the percentage of variance explained within the first two components; for that reason, we have not included it in the table. As a result, we were only able to find upper bounds for the percentages of variance explained by the first two components ($73.54\%$ and $26.46\%$, respectively). Due to the apparent lack of appropriate software, the model-based approaches have not been implemented. As can be seen, ST-PCA explains the most variance with the first component among its competitors. Its second component explains a smaller percentage than the third component, which is an often-encountered situation in non-Euclidean PCA (see Subsection 3.5). For example, an instance of this issue for T-PCA is reported in Table 5(b) of Eltzner et al. (2018). We conclude by remarking that the methods that go through the sphere seem to yield a much higher signal compression than the other methods, with ST-PCA slightly outperforming T-PCA with regard to the percentage of variance explained by the first component.

[Figure 5: panels (a), (b), (c); images not reproduced.]
Figure 5: Histograms of the first scores of ST-PCA and T-PCA: (a) ST-PCA; (b) T-PCA with SI ordering; (c) T-PCA with SO ordering. Both versions of T-PCA suggest the existence of spurious clusters (the marginals of the data are not significantly non-uniform), which are not present in ST-PCA.
Test | Longitudes | ST-PCA | T-PCA (SO) | T-PCA (SI)
Giné's $F_{n}$ | 0.3215 | 0.0855 | 0 | 0
Watson | 0.3595 | 0.1843 | 0 | 0
Projected Anderson–Darling | 0.7554 | 0.1273 | 0 | $9.522\times 10^{-9}$
Table 1: $p$-values for various circular uniformity tests. From left to right: name of the test; $p$-values for testing uniformity of the original longitudes; $p$-values for testing uniformity of the first scores of the corresponding method. The original longitudes and ST-PCA's first scores are not significantly non-uniform, while uniformity is rejected for T-PCA's first scores.

5.2 An RNA dataset

This RNA dataset consists of nine angles: six dihedral angles $(\alpha,\beta,\gamma,\delta,\epsilon,\zeta)$, one base angle $(\chi)$, and two pseudotorsion angles $(\eta,\theta)$. The dataset was put together by Duarte and Pyle (1998) and updated by Wadley et al. (2007) in an endeavor to cluster the complex structure of RNA nucleotides. The pseudotorsion angles were found to approximate their folding structure well. Sargsyan et al. (2012) then curated a subset consisting of $190$ observations and three clusters (C3'-endo clusters I, II, and V of Wadley et al. (2007), containing $59$, $88$, and $43$ points, respectively), henceforth referred to as the small RNA dataset. The $\eta$–$\theta$ plot of the pseudotorsion angles of the small RNA dataset reveals a clear cluster structure that is not visible in the pairwise scatterplots of the other seven backbone and base angles (see, e.g., Figure 7 in Eltzner et al., 2018). We want to investigate whether it is possible to predict the cluster structure revealed by the $\eta$–$\theta$ plot using the remaining seven variables only. Following the analyses by Sargsyan et al. (2012), Eltzner et al. (2018), and Nodehi et al. (2020), the small RNA dataset has become a de facto benchmark for dimensionality reduction methods on the torus.

Method | Component 1 | Component 2 | Component 3
dPCA | 38.48 | 34.06 | 7.67
dPCA+ | 42.69 | 30.53 | 26.78
aPCA | 43.27 | 31.28 | 25.45
Euclidean PCA | 55.43 | 24.48 | 20.09
Complex-dPCA | 72.51 | 14.59 | 12.90
T-PCA (SI) | 88.27 | 6.97 | 4.76
T-PCA (SO) | 89.32 | 6.43 | 4.25
ST-PCA | 90.51 | 3.07 | 6.40
Table 2: Percentage of variance explained by the components of dimension-reduction methods on the torus, sorted decreasingly according to the percentage of variance explained by their first components. ST-PCA and T-PCA explain the largest percentage of variance with their first components. Variance calculations are based either on standard PCA or on the torus variance defined in Subsection 3.5, depending on the method.

Figures 3 and 6 of Sargsyan et al. (2012) reveal that dPCA fails to accomplish any separation using the dihedral and base angles, while GeoPCA succeeds in separating cluster I from cluster II in the first two principal components. However, as can be seen from the $\eta$–$\theta$ plot, cluster V is close to cluster II, and GeoPCA offers no further insight on such a separation based on the first two components. An analysis by TPPCA on a subset of the small RNA dataset consisting of 181 nucleotides gives a decent separation using the first two components. Once more, however, the distinction between the two nearby clusters is not very clear (see Figure 2 of Nodehi et al., 2020). Among torus dimensionality reduction techniques, T-PCA on the 181-nucleotide subset, followed by mode hunting, has been the most successful at revealing the structure of all three clusters using only the first principal component (see Figure 8 of Eltzner et al., 2018).

To produce a clear comparison between ST-PCA and T-PCA, we obtain the spherical scores of the 190-nucleotide dataset with ST-PCA ($\widehat{r}=1.49$), as well as with T-PCA under the SI and SO orderings (version “20200518”, as provided by the authors). Then, we proceed with a clustering scheme based on kernel density estimation on the scores of the first principal components, modified to accommodate the periodicity of the data. The bandwidth is determined by the plug-in selector for density derivative estimation (Wand and Jones, 1995, pages 49 and 70) implemented in the R package ks (Duong, 2021). Figure 6 displays the results of this analysis, where the colors of the points correspond to their true labels: red, yellow, and green for clusters I, II, and V, respectively. The domains of attraction of each cluster are represented by different colors in the scores plots and are separated by dashed vertical lines.
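Although the clustering scheme is described only verbally, a minimal sketch of the mode-hunting step is given below; the three-period wrapping trick and the helper cluster_first_scores are assumptions for illustration, while hpi and kde are functions of the ks package.

# Periodic KDE-based mode hunting on the first scores (a sketch)
library(ks)

cluster_first_scores <- function(scores1) {
  # Replicate the sample over three periods so that the density estimate
  # is continuous across the -pi / pi boundary (the overall scale is
  # irrelevant for locating modes and antimodes)
  wrapped <- c(scores1 - 2 * pi, scores1, scores1 + 2 * pi)

  # Plug-in bandwidth for density derivative estimation
  # (Wand and Jones, 1995)
  h <- hpi(wrapped, deriv.order = 1)

  # Kernel density estimate evaluated on a grid spanning one period
  grid <- seq(-pi, pi, length.out = 512)
  fhat <- kde(x = wrapped, h = h, eval.points = grid)$estimate

  # Local minima of the density are the boundaries between domains of
  # attraction (the dashed vertical lines in Figure 6)
  antimodes <- grid[which(diff(sign(diff(fhat))) > 0) + 1]

  # Assign each observation to the domain of attraction it falls into
  findInterval(scores1, antimodes) + 1
}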

The left column of Figure 6 shows that deforming the torus with the SI ordering underperforms, recognizing two clusters as well as three ill-defined ones. In particular, it fails to separate clusters II and V and thus misclassifies 105 points (classification rate: $(190-105)/190\approx 0.447$). The middle column shows that T-PCA with the SO ordering correctly recognizes all three clusters, while misclassifying 23 points (classification rate: 0.879). Finally, the right column shows that ST-PCA also reveals three clusters and misclassifies only 16 points (classification rate: 0.916), hence outperforming both variants of T-PCA. An additional benefit of ST-PCA is that the shape of the estimated densities suggests a clearer separation of the clusters.

Figure 6: Upper row: scores scatterplots for the first two components (first component on the horizontal axis, second on the vertical). Lower row: kernel density estimates used to investigate the modality of the first scores. From left to right: T-PCA with SI ordering; T-PCA with SO ordering; ST-PCA. ST-PCA gives the clearest separation of the clusters in the first scores.

6 Discussion

In this paper, a novel approach to dimensionality reduction on the torus has been developed. It is an analogue of PCA, as it provides modes of variation and scores. Consistent with previous work, we have provided evidence that, for toroidal data, a transformation to the sphere induces less distortion than direct Euclidean transformations. We propose a fully data-driven transformation that is optimal in terms of minimizing the stress function inherited from the MDS literature. This key advantage of ST-PCA also comes at a price: the inheritance of the computational limitations of MDS for larger datasets and the delicate inversion of the transformation. ST-PCA’s benefits are validated by real data applications on the appearance of sunspots and on a benchmark RNA dataset, where ST-PCA outperforms previous methods in both qualitative and quantitative metrics.
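Schematically, given the toroidal sample $\bm{x}_1,\ldots,\bm{x}_n$, the spherical configuration $\bm{y}_1,\ldots,\bm{y}_n$ on a sphere of radius $r$ minimizes a raw stress between pairwise geodesic distances of the form
$$\sum_{i<j}\left\{d_{\mathbb{T}^d}(\bm{x}_i,\bm{x}_j)-d_{r\mathbb{S}^d}(\bm{y}_i,\bm{y}_j)\right\}^2,$$
where $d_{\mathbb{T}^d}$ and $d_{r\mathbb{S}^d}$ denote the geodesic distances on the torus and on the scaled sphere, respectively (this is a generic restatement; the precise objective and the choice of $r$ are given in the earlier sections).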

Acknowledgments

We are grateful to Benjamin Eltzner for kindly sharing his implementation of T-PCA. The second author acknowledges support from grants PGC2018-097284-B-100 and IJCI-2017-32005, funded by Spain’s Ministry of Science, Innovation and Universities. Both grants are co-funded with FEDER funds.

References

  • Altis et al., (2007) Altis, A., Nguyen, P. H., Hegger, R., and Stock, G. (2007). Dihedral angle principal component analysis of molecular dynamics simulations. J. Chem. Phys., 126(24):244111.
  • Babcock, (1961) Babcock, H. W. (1961). The topology of the Sun’s magnetic field and the 22-year cycle. Astrophys. J., 133(2):572–587.
  • Bai et al., (2015) Bai, S., Qi, H.-D., and Xiu, N. (2015). Constrained best Euclidean distance embedding on a sphere: A matrix optimization approach. SIAM J. Optim., 25(1):439–467.
  • Bogart, (1982) Bogart, R. S. (1982). Recurrence of solar activity: Evidence for active longitudes. Solar Phys., 76(1):155–165.
  • Borg and Groenen, (2005) Borg, I. and Groenen, P. J. F. (2005). Modern Multidimensional Scaling Theory and Applications. Springer, New York.
  • Borg and Lingoes, (1980) Borg, I. and Lingoes, J. (1980). A model and algorithm for multidimensional scaling with external constraints on the distances. Psychometrika, 45:25–38.
  • Cox and Cox, (2008) Cox, M. A. A. and Cox, T. F. (2008). Multidimensional scaling. In Handbook of Data Visualization, pages 315–347. Springer, Berlin.
  • Cox and Cox, (1991) Cox, T. F. and Cox, M. A. A. (1991). Multidimensional scaling on a sphere. Commun. Stat. Theory Methods, 20(9):2943–2953.
  • Damon and Marron, (2014) Damon, J. and Marron, J. S. (2014). Backwards principal component analysis and principal nested relations. J. Math. Imaging Vis., 50(1):107–114.
  • de Leeuw and Heiser, (1980) de Leeuw, J. and Heiser, W. (1980). Multidimensional scaling with restrictions on the configuration. J. Multivar. Anal., 5:501–522.
  • de Leeuw and Mair, (2009) de Leeuw, J. and Mair, P. (2009). Multidimensional scaling using majorization: SMACOF in R. J. Stat. Softw., 31(3):1–30.
  • Dryden, (2021) Dryden, I. L. (2021). shapes: Statistical Shape Analysis. R package version 1.2.6.
  • Duarte and Pyle, (1998) Duarte, C. M. and Pyle, A. M. (1998). Stepping through an RNA structure: A novel approach to conformational analysis. J. Mol. Biol., 284(5):1465–1478.
  • Ducharme et al., (2012) Ducharme, G. R., Vincent, C., and Aliaume, C. (2012). A statistical test to detect vortices in the current fields of bodies of water. Environ. Ecol. Stat., 19(3):345–367.
  • Duong, (2021) Duong, T. (2021). ks: Kernel Smoothing. R package version 1.13.2.
  • Elad et al., (2005) Elad, A., Keller, Y., and Kimmel, R. (2005). Texture mapping via spherical multi-dimensional scaling. In Kimmel, R., Sochen, N. A., and Weickert, J., editors, Scale Space and PDE Methods in Computer Vision, pages 433–455, Berlin. Springer.
  • Eltzner et al., (2018) Eltzner, B., Huckemann, S., and Mardia, K. V. (2018). Torus principal component analysis with applications to RNA structure. Ann. Appl. Stat., 12(2):1332–1359.
  • Fletcher et al., (2004) Fletcher, P. T., Lu, C., Pizer, S. M., and Joshi, S. (2004). Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Trans. Med. Imaging, 23(8):995–1005.
  • García-Portugués et al., (2020) García-Portugués, E., Paindaveine, D., and Verdebout, T. (2020). On optimal tests for rotational symmetry against new classes of hyperspherical distributions. J. Am. Stat. Assoc., 115(532):1873–1887.
  • García-Portugués and Verdebout, (2021) García-Portugués, E. and Verdebout, T. (2021). sphunif: Uniformity Tests on the Circle, Sphere, and Hypersphere. R package version 1.0.1.
  • Gower and Hand, (1996) Gower, J. C. and Hand, D. J. (1996). Biplots. Monographs on Statistics and Applied Probability. Chapman and Hall, London.
  • Haigh, (2007) Haigh, J. (2007). The Sun and the Earth’s climate. Living Rev. Sol. Phys., 4(2).
  • Huckemann et al., (2010) Huckemann, S., Hotz, T., and Munk, A. (2010). Intrinsic shape analysis: geodesic PCA for Riemannian manifolds modulo isometric Lie group actions. Stat. Sin., 20(1):1–58.
  • Huckemann and Ziezold, (2006) Huckemann, S. and Ziezold, H. (2006). Principal component analysis for Riemannian manifolds, with an application to triangular shape spaces. Adv. Appl. Probab., 38(2):299–319.
  • Huckemann and Eltzner, (2018) Huckemann, S. F. and Eltzner, B. (2018). Backward nested descriptors asymptotics with inference on stem cell differentiation. Ann. Stat., 46(5):1994–2019.
  • Jung et al., (2012) Jung, S., Dryden, I. L., and Marron, J. S. (2012). Analysis of principal nested spheres. Biometrika, 99(3):551–568.
  • Jung et al., (2011) Jung, S., Foskey, M., and Marron, J. S. (2011). Principal arc analysis on direct product manifolds. Ann. Appl. Stat., 5(1):578–603.
  • Jupp et al., (2003) Jupp, P. E., Kim, P. T., Koo, J.-Y., and Wiegert, P. (2003). The intrinsic distribution and selection bias of long-period cometary orbits. J. Am. Stat. Assoc., 98(463):515–521.
  • Kent and Mardia, (2009) Kent, J. T. and Mardia, K. V. (2009). Principal component analysis for the wrapped normal torus model. In Gusnanto, A., Mardia, K. V., and Fallaize, C. J., editors, LASR 2009 – Statistical Tools for Challenges in Bioinformatics, pages 39–41, Leeds. Department of Statistics, University of Leeds.
  • Lee and Bentler, (1980) Lee, S. Y. and Bentler, P. M. (1980). Functional relations in multidimensional scaling. British J. Math. Stat. Psych., 33(2):142–150.
  • Marron and Alonso, (2014) Marron, J. S. and Alonso, A. M. (2014). Overview of object oriented data analysis. Biom. J., 56(5):732–753.
  • Morgan, (1974) Morgan, C. L. (1974). Embedding metric spaces in Euclidean space. J. Geom., 5(1):101–107.
  • Mu et al., (2005) Mu, Y., Nguyen, P. H., and Stock, G. (2005). Energy landscape of a small peptide revealed by dihedral angle principal component analysis. Proteins, 58(1):45–52.
  • Nodehi et al., (2015) Nodehi, A., Golalizadeh, M., and Heydari, A. (2015). Dihedral angles principal geodesic analysis using nonlinear statistics. J. Appl. Stat., 42(9):1962–1972.
  • Nodehi et al., (2020) Nodehi, A., Golalizadeh, M., Maadooliat, M., and Agostinelli, C. (2020). Torus probabilistic principal component analysis. arXiv:2008.10725.
  • Papazoglou and Mylonas, (2017) Papazoglou, S. and Mylonas, K. (2017). An examination of alternative multidimensional scaling techniques. Educ. Psychol. Meas., 77:429–448.
  • Pekalska and Duin, (2005) Pekalska, E. and Duin, R. P. W. (2005). The Dissimilarity Representation for Pattern Recognition: Foundations and Applications. World Scientific Publishing, Singapore.
  • Pelt et al., (2010) Pelt, J., Korpi, M. J., and Tuominen, I. (2010). Solar active regions: A nonparametric statistical analysis. Astron. Astrophys., 513:A48.
  • Pennec, (2018) Pennec, X. (2018). Barycentric subspace analysis on manifolds. Ann. Stat., 46(6A):2711–2746.
  • Pewsey and García-Portugués, (2021) Pewsey, A. and García-Portugués, E. (2021). Recent advances in directional statistics. Test, 30(1):1–58.
  • Pizer et al., (2020) Pizer, S. M., Hong, J., Vicory, J., Liu, Z., Marron, J. S., Choi, H.-Y., Damon, J., Jung, S., Paniagua, B., Schulz, J., Sharma, A., Tu, L., and Wang, J. (2020). Object shape representation via skeletal models (s-reps) and statistical analysis. In Pennec, X., Sommer, S., and Fletcher, T., editors, Riemannian Geometric Statistics in Medical Image Analysis, pages 233–271. Academic Press, London.
  • Pizer and Marron, (2017) Pizer, S. M. and Marron, J. (2017). Object statistics on curved manifolds. In Zheng, G., Li, S., and Székely, G., editors, Statistical Shape and Deformation Analysis, pages 137–164. Academic Press, Cambridge, Massachusetts.
  • Ramachandran et al., (1963) Ramachandran, G. N., Ramakrishnan, C., and Sasisekharan, V. (1963). Stereochemistry of polypeptide chain configurations. J. Mol. Biol., 7:95–99.
  • Riccardi et al., (2009) Riccardi, L., Nguyen, P. H., and Stock, G. (2009). Free-energy landscape of RNA hairpins constructed via dihedral angle principal component analysis. J. Phys. Chem. B, 113(52):16660–16668.
  • Saeed et al., (2018) Saeed, N., Nam, H., Haq, M. I. U., and Muhammad Saqib, D. B. (2018). A survey on multidimensional scaling. ACM Comput. Surv., 51(3).
  • Sargsyan et al., (2012) Sargsyan, K., Wright, J., and Lim, C. (2012). GeoPCA: a new tool for multivariate analysis of dihedral angles based on principal component geodesics. Nucleic Acids Res., 40(3):e25–e25.
  • Schoenberg, (1937) Schoenberg, I. J. (1937). On certain metric spaces arising from Euclidean spaces by a change of metric and their imbedding in Hilbert space. Ann. Math., 38(4):787–793.
  • Sittel et al., (2017) Sittel, F., Filk, T., and Stock, G. (2017). Principal component analysis on a torus: theory and application to protein dynamics. J. Chem. Phys., 147(24):244101.
  • Wadley et al., (2007) Wadley, L. M., Keating, K. S., Duarte, C. M., and Pyle, A. M. (2007). Evaluating and learning from RNA pseudotorsional space: Quantitative validation of a reduced representation for RNA structure. J. Mol. Biol., 372(4):942–957.
  • Wand and Jones, (1995) Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing, volume 60 of Monographs on Statistics and Applied Probability. Chapman & Hall, London.