
¹ Department of Statistics and Operations Research, University of North Carolina at Chapel Hill (USA). ² Department of Statistics, Carlos III University of Madrid (Spain). ³ Corresponding author. E-mail: pavlos@live.unc.edu.

Scaled torus principal component analysis

Pavlos Zoubouloglou¹,³, Eduardo García-Portugués², and J. S. Marron¹
Abstract

A particularly challenging context for dimensionality reduction is multivariate circular data, i.e., data supported on a torus. Such data appear, e.g., in the analysis of various phenomena in ecology and astronomy, as well as in molecular structures. This paper introduces Scaled Torus Principal Component Analysis (ST-PCA), a novel approach to perform dimensionality reduction with toroidal data. ST-PCA finds a data-driven map from a torus to a sphere of the same dimension and a certain radius. The map is constructed with multidimensional scaling to minimize the discrepancy between pairwise geodesic distances in both spaces. ST-PCA then resorts to principal nested spheres to obtain a nested sequence of subspheres that best fits the data, which can afterwards be inverted back to the torus. Numerical experiments illustrate how ST-PCA can be used to achieve meaningful dimensionality reduction on low-dimensional tori, particularly for the purpose of cluster separation, while two data applications in astronomy (on a three-dimensional torus) and molecular biology (on a seven-dimensional torus) show that ST-PCA outperforms existing methods for the investigated datasets.

Keywords: Dimension reduction; Directional statistics; Multidimensional scaling; Principal component analysis; Statistics on manifolds.

1 Introduction

Recent advances in the way information is collected have brought along the need to analyze data of a non-Euclidean nature. Important among these are multivariate circular data, or toroidal data, which are vectors of angles that are naturally represented on the torus $\mathbb{T}^{d}:=[-\pi,\pi)^{d}$, $d\geq 1$, where $-\pi$ and $\pi$ are identified. An important instance of toroidal data arises in molecular biology, where vectors of torsion angles characterize protein and RNA structures (Ramachandran et al., 1963; Duarte and Pyle, 1998). A different source of data on $\mathbb{T}^{d}$ is environmental sciences, with planar directions being key constituents of wind and sea current vector fields; see Ducharme et al. (2012) for an application in the latter context. Another popular instance of non-Euclidean data is spherical data, i.e., data on $\mathbb{S}^{d}:=\{\bm{x}\in\mathbb{R}^{d+1}:\|\bm{x}\|=1\}$ with $d=2$. Among other fields, spherical data plays a relevant role in astronomy; see Jupp et al. (2003) for a statistical analysis of the normal vectors associated with long-period comet orbits. Pewsey and García-Portugués (2021) and Marron and Alonso (2014) provide recent surveys of existing statistical methodology for data on spheres and tori, and for other types of non-Euclidean spaces, respectively.

Principal Component Analysis (PCA) is arguably the most popular multivariate technique, and its adaptation to non-Euclidean data has been the subject of considerable research in the last decades. A first step towards identifying modes of variation for manifold-valued data was taken by tangent plane-based methods, which seek to exploit the locally Euclidean nature of manifolds. An important representative is the Principal Geodesic Analysis (PGA) of Fletcher et al. (2004). Recall that (classic) Euclidean PCA builds the directions of maximal projected variation starting at the sample mean. PGA parallels this by starting with the Fréchet (sample) mean and finding the geodesic through that mean on which the projected variation in the tangent plane is maximal. This yields the first principal geodesic, and the remaining modes of variation follow sequentially by imposing orthogonality of geodesics at the Fréchet mean. Huckemann and Ziezold (2006) showed that the constraint of geodesics passing through that mean seriously limited the flexibility of PGA and proposed Geodesic-PCA, which removes precisely that constraint. Later, Huckemann et al. (2010) proposed generalized geodesics and orthogonal projections on quotient spaces, and used these as components for a PCA on certain Riemannian manifolds.

In the context of skeletal models for object shape representations, modes of variation on the sphere that follow non-great subspheres (i.e., non-geodesics) can provide more effective dimensionality reduction, as noted, e.g., in Pizer and Marron (2017) and Pizer et al. (2020). This motivated the principal arc analysis of Jung et al. (2011) and its generalization on $\mathbb{S}^{d}$, $d\geq 2$, the Principal Nested Spheres (PNS) of Jung et al. (2012). PNS exploits the fact that intersecting $\mathbb{S}^{d}$ with a hyperplane of its Euclidean embedding yields a subsphere isomorphic to $\mathbb{S}^{d-1}$ that is not necessarily a great subsphere. Therefore, PNS offers a richer family of possible modes of variation on $\mathbb{S}^{d}$ than manifold-generic methods for dimensionality reduction, which focus on great subspheres only. This provided evidence for the utility of exploiting the specificity of the data support in building a more informative dimension-reduction analysis. PNS features an iterative process that yields a sequence of best-fitting subspheres of decreasing dimensions within $\mathbb{S}^{d}$. This motivated the concept of "backwards analogues" of PCA, as discussed in Damon and Marron (2014): for a $d$-dimensional manifold, a $(d-1)$-dimensional best-fitting submanifold is identified, and this process is iterated, in a nested fashion, until a one-dimensional mode of variation is obtained. Under fairly general assumptions, Huckemann and Eltzner (2018) proved that backwards component sequences (including PNS) satisfy asymptotic results, such as strong consistency and asymptotic normality. Yet both forward and backward analogues of PCA are greedy algorithms, which do not necessarily attain a global optimum in the sequence of proportions of explained variance. In response to this point, Pennec (2018) developed barycentric subspace analysis, which introduces a nested (backwards or forward) sequence of submanifolds defined as the locus of points that are manifold-affine spans of a set of $d+1$ points in a $d$-dimensional manifold (great subspheres for $\mathbb{S}^{d}$).

Despite its apparent simplicity, the torus is a particularly challenging manifold for dimensionality reduction. Tangent plane methods disregard long-range periodicity and tend to underperform unless the data are fairly concentrated. An attempt to adapt geodesic-based methods to the torus is the torus principal geodesic analysis of Nodehi et al. (2015). Yet, as noted in Eltzner et al. (2018), geodesics on the torus face the problem of producing space-filling curves almost surely by winding around indefinitely. For that reason, existing PCA adaptations for toroidal data can be classified into those relying either on (a) imposing a certain distributional model on the data or (b) transforming the data to a space where dimension-reduction tools are already consolidated. With respect to approach (a), one can find Kent and Mardia (2009)'s approach of PCA on the covariance matrix of a wrapped normal model on the torus. A recent approach proposed by Nodehi et al. (2020) extends probabilistic PCA to Torus Probabilistic PCA (TPPCA), where the toroidal data are expressed as a latent variable model modulo $2\pi$, while assuming Gaussianity for the latent variables. The parameters of the model are then estimated by an Expectation–Maximization algorithm. With regard to approach (b), typically the arrival space has been the Euclidean space or variations thereof. For example, Mu et al. (2005) proposed dihedral PCA (dPCA), which leverages the usual embedding $\mathbb{T}^{d}\rightarrow[-1,1]^{2d}$ via sines and cosines, and then performs PCA on $\mathbb{R}^{2d}$, a drawback being that ordinary PCA ignores the fact that the data lie on a curved $d$-dimensional manifold within $[-1,1]^{2d}$. A similar transformation is that of Altis et al. (2007), using the complex representation of the angles. Riccardi et al. (2009) propose angular PCA (aPCA), where the toroidal data are first centered and then subjected to usual PCA. Sittel et al. (2017) modified aPCA into dPCA+, a variant that attempts to minimize the distortion by shifting the "boundaries" of $\mathbb{T}^{d}$ to the lowest-density region.

Still within (b), but differently from the previous approaches, the Geodesic PCA of Sargsyan et al. (2012) deforms $\mathbb{T}^{d}$ to $\mathbb{S}^{d}$ and then performs dimensionality reduction in the latter space by PGA. Nevertheless, this transformation directly uses the angles in $\mathbb{T}^{d}$ as the $d$ angles that determine the hyperspherical coordinates of $\mathbb{S}^{d}$, with the consequent appearance of singularities (e.g., when $d=2$ the two maximally-separated points $(\pm\pi/2,\pm\pi/2)$ on $\mathbb{T}^{2}$ are mapped to the same point on $\mathbb{S}^{2}$). Perhaps the most successful and advanced existing method for PCA on the torus is the Torus PCA (T-PCA) of Eltzner et al. (2018), where $\mathbb{T}^{d}$ is also deformed to $\mathbb{S}^{d}$, now with a more sophisticated transformation, and PNS is then applied. However, the deformation in T-PCA is not invariant to permutations of the angular variables, which introduces a source of subjectivity in the analysis. To alleviate this issue, the authors propose two data-driven orderings of the variables, the SI and SO orderings, corresponding to sorting the variables by decreasing and increasing circular spread, respectively. Yet the practitioner has to choose one ordering, and the two may yield significantly different scores that affect subsequent analyses (as evidenced in Figure 6). Furthermore, the deformation to $\mathbb{S}^{d}$, although designed to cut the torus open at a point of minimum distortion, may induce relevant artifacts in certain datasets. These deformation artifacts can create spurious cluster structures in the scores of datasets without any cluster structure (see Figure 5). A final limitation of T-PCA, presented in Remark 2.3 of Eltzner et al. (2018), is that it is only applicable when there is a structural data gap in all angles except for at most two.

[Figure 1: panels (a), (b), (c); images not reproduced.]
Figure 1: Scores (scatterplots (a) and (b)) and modes of variation (panel (c)) for the sunspots data; panels (a) and (b) show the first (horizontal axis) and second (vertical axis) scores for PCA and ST-PCA, respectively. In panel (c), the original data are shown as black points, while the T-PCA and ST-PCA first modes of variation are represented as red and blue solid lines, respectively. The figure illustrates that ST-PCA gives a more efficient mode of variation than standard PCA and T-PCA.

In this paper we propose a toroidal PCA based on embedding the sample on $\mathbb{T}^{d}$ into $\mathbb{S}^{d}$ by means of a data-driven map aimed at preserving the pairwise geodesic distances of the data. Our dimension-reduction method is summarized in three steps. First, we transform the original sample on $\mathbb{T}^{d}$ to a similar configuration of points on a $d$-dimensional sphere by Spherical MultiDimensional Scaling (SMDS). SMDS is an optimization procedure designed to minimize the squared differences between the sample pairwise geodesic distances on $\mathbb{T}^{d}$ and the corresponding pairwise geodesic distances of the transformed sample on a $d$-sphere. The radius of the arrival $d$-sphere is also chosen automatically in a further attempt to minimize distortions. Second, we apply PNS to the transformed data to find the sequence of best-fitting subspheres and the scores associated with them. Finally, we show how to optionally invert the SMDS map, in a step usually referred to as prediction, to produce a low-dimensional representation of the data in the original torus via the inverted PNS scores. Due to the multidimensional scaling operation, we henceforth refer to our method as Scaled Torus PCA (ST-PCA). An illustration of the adequacy of ST-PCA for data analysis is advanced in Figure 1. Figure 1c shows astronomical data on the appearance of sunspots, which lie on $\mathbb{T}^{3}$ in a roughly diagonal pattern and are described in detail in Section 5.1. Figure 1a displays the scores from standard PCA, which disregards the periodicity of the data, while Figure 1b shows that ST-PCA gives more sensible scores. Figure 1c shows that while T-PCA honors the periodicity of the data, it yields a less efficient mode of variation (red curve) when compared to ST-PCA (blue curve). In this case study, and among other competitors, ST-PCA performs best in the sense of preserving data fidelity and explaining the most variance in the first principal component. A second case study on an RNA dataset in $\mathbb{T}^{7}$ that has lately been used as a benchmark in the literature also evidences the adequacy of ST-PCA over existing alternatives, in the sense of correctly clustering the largest number of points based on the first principal component scores. We also evidence the ability of ST-PCA to perform dimension reduction and identify clusters with numerical experiments on different data patterns on $\mathbb{T}^{2}$ and $\mathbb{T}^{3}$.

The rest of this paper is organized as follows. Section 2 explores the links between tori and spheres and provides a review of multidimensional scaling, with discussion of relevant literature. In Section 3 we develop ST-PCA in full detail and discuss the subtleties of its implementation. A simulation study on clustered data is provided in Section 4 to illustrate the strengths of ST-PCA. We apply ST-PCA to two real datasets in Section 5, evidencing the benefits of ST-PCA with respect to available alternatives. The paper concludes with a discussion in Section 6.

2 Background

The first subsection introduces some required notions from geometry and provides justification for why the sphere is a more natural arrival space than the Euclidean space. The reader curious about further relevant geometric concepts is referred to Huckemann et al. (2010) for a concise summary that is very accessible to statisticians. The second subsection provides details on classical Euclidean MDS and its spherical variant, together with an overview of related work.

2.1 Some geometrical considerations

Traditional data embeddings into metric spaces take place in low-dimensional Euclidean spaces, as these allow for the easiest visualization and inference. Results for isometric embeddings into the Euclidean space are due to the Nash embedding theorem for Riemannian manifolds and to Morgan (1974) for arbitrary metric spaces. Recent advances in machine learning and computer graphics have popularized the idea of embedding data that lie in inherently curved spaces into lower-dimensional non-Euclidean spaces, as these spaces may reduce the distortion of the embedding.

In our setting, the $d$-sphere of radius $r$ and the $d$-torus are especially relevant. These spaces are respectively defined as $\mathbb{S}^{d}_{r}:=\{\bm{x}\in\mathbb{R}^{d+1}:\|\bm{x}\|=r\}$, where $\|\cdot\|$ is the Euclidean norm, and $\mathbb{T}^{d}:=(\mathbb{S}^{1})^{d}:=\mathbb{S}^{1}\times\cdots\times\mathbb{S}^{1}$ ($d$ times) or, equivalently and as previously introduced, $\mathbb{T}^{d}:=[-\pi,\pi)^{d}$ with $-\pi$ and $\pi$ identified, since $\mathbb{S}^{1}$ can also be thought of as the quotient space $\mathbb{R}/(2\pi\mathbb{Z})$. For simplicity of notation, we denote $\mathbb{S}^{d}_{1}$ by $\mathbb{S}^{d}$. Geodesic distances on $\mathbb{S}^{d}_{r}$ are given by

$$\delta_{\mathbb{S}^{d}_{r}}(\bm{x},\bm{y})=r\arccos\left(\bm{x}^{\prime}\bm{y}/r^{2}\right),\quad\bm{x},\bm{y}\in\mathbb{S}^{d}_{r}.\tag{1}$$

For angular coordinates on $\mathbb{S}^{1}$, $\delta_{\mathbb{S}^{1}}(\phi,\psi)=\min\{|\phi-\psi|,2\pi-|\phi-\psi|\}$, $\phi,\psi\in[-\pi,\pi)$. The distance on $\mathbb{T}^{d}$ benefits from the space's product structure: $\delta_{\mathbb{T}^{d}}(\bm{\phi},\bm{\psi}):=\big(\sum_{i=1}^{d}\delta_{\mathbb{S}^{1}}(\phi_{i},\psi_{i})^{2}\big)^{1/2}$, $\bm{\phi},\bm{\psi}\in\mathbb{T}^{d}$.
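For concreteness, these distances can be coded in a few lines of R (a minimal sketch; the function names are ours, and inputs are angles in $[-\pi,\pi)$ or Cartesian coordinates, as appropriate):

```r
# Circle, torus, and sphere geodesic distances (sketch; names are ours).
dist_circ <- function(phi, psi) {
  # delta_{S^1}: distance on the unit circle for angles in [-pi, pi)
  a <- abs(phi - psi)
  pmin(a, 2 * pi - a)
}
dist_torus <- function(phi, psi) {
  # delta_{T^d}: product-space distance for d-vectors of angles
  sqrt(sum(dist_circ(phi, psi)^2))
}
dist_sph <- function(x, y, r = 1) {
  # delta_{S^d_r} in (1) for Cartesian x, y with ||x|| = ||y|| = r
  r * acos(min(max(sum(x * y) / r^2, -1), 1))  # clamped for numerical safety
}
```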

The construction of our dimensionality reduction tool rests upon the underlying idea of Eltzner et al. (2018) that spherical embeddings are more appropriate than Euclidean ones when considering transformations from $\mathbb{T}^{d}$. While this claim is certainly arguable, we next present a series of reasons supporting it. First, $\mathbb{S}_{r}^{d}$ is a space benign enough to isometrically embed an arbitrary distance under certain conditions. Precisely, Schoenberg (1937) provides necessary and sufficient conditions for an isometric embedding of an arbitrary distance into the sphere, as shown in Theorem 3.22 of Pekalska and Duin (2005):

Theorem 1 (Schoenberg, 1937).

An $n\times n$ distance matrix $\bm{D}=(\delta_{ij})_{i,j=1}^{n}$ can be isometrically embedded into a sphere $\mathbb{S}^{m}_{r}$ of radius $r>0$ and dimension $m$ if and only if $\delta_{ij}\leq\pi r$ for all $i,j=1,\ldots,n$ and the matrix $\bm{G}=(\cos(\delta_{ij}/r))_{i,j=1}^{n}$ is positive semidefinite. Then, the smallest $m$ such that $\bm{D}$ embeds into $\mathbb{S}^{m}_{r}$ is $m=\mathrm{rank}(\bm{G})-1$, while the solution is undefined for $\mathrm{rank}(\bm{G})=1$.
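Theorem 1 lends itself to a direct numerical check. The following R sketch tests the two conditions for a given distance matrix D and radius r, taking the rank of G as the number of eigenvalues above a tolerance (our choice):

```r
# Check Schoenberg's embeddability conditions for a distance matrix D.
schoenberg_check <- function(D, r, tol = 1e-10) {
  if (max(D) > pi * r) return(list(embeddable = FALSE, m = NA))
  G <- cos(D / r)
  ev <- eigen(G, symmetric = TRUE, only.values = TRUE)$values
  psd <- all(ev > -tol)                      # positive semidefiniteness
  m <- if (psd) sum(ev > tol) - 1 else NA    # smallest dimension, rank(G) - 1
  list(embeddable = psd, m = m)
}
```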

Second, both $\mathbb{S}_{r}^{d}$ and $\mathbb{T}^{d}$, endowed with their respective metrics, are compact Riemannian manifolds for every $d\in\mathbb{N}$, unlike $\mathbb{R}^{d}$ with the usual Euclidean distance. Hence the second similarity of the two manifolds (and dissimilarity with the Euclidean space) comes from topology. Third, $\mathbb{S}_{r}^{d}$ has a subspace isomorphic to $\mathbb{S}^{1}$ (recall, e.g., its parametrization in terms of hyperspherical coordinates) that contributes to preserving the $d$-dimensional periodicity of $\mathbb{T}^{d}$, as opposed to $\mathbb{R}^{d}$, where periodicity is nonexistent. Fourth, as noted in the context of data analysis at least as early as Cox and Cox (1991), convex hulls in $\mathbb{S}^{d}$ and $\mathbb{T}^{d}$ present definition issues, in contrast to $\mathbb{R}^{d}$, where the convex hull of the data provides a natural notion of boundary. A final, yet very important, reason comes from statistics and concerns the ability of PNS to capture cluster structure when compared to PCA. Consider a dataset with three normal-like clusters on $\mathbb{S}^{d}$ and on $\mathbb{R}^{d}$. In $\mathbb{S}^{d}$ it is always possible to capture three clusters with a one-dimensional periodic mode of variation ($\mathbb{S}^{1}$), as long as small circles are allowed. However, unless the three clusters have collinear centers, in $\mathbb{R}^{d}$ a straight line can flawlessly capture at most two clusters. Therefore, an embedding into $\mathbb{S}^{d}$ followed by PNS allows for more efficient cluster hunting than one into $\mathbb{R}^{d}$ with subsequent PCA.

2.2 Multidimensional scaling

MultiDimensional Scaling (MDS) is a well-established method whose origins date back to the psychometrics literature of the 1950s. Many reviews and books already exist on the topic; see, e.g., Borg and Groenen (2005), Cox and Cox (2008), or Saeed et al. (2018). We review the Spherical MDS literature relevant to our work, and we only discuss and work with what is called metric MDS, in particular its classical and spherical variants.

2.2.1 Classical Euclidean MDS

Let $\bm{x}_{1},\ldots,\bm{x}_{n}$ be a sample of elements in an arbitrary metric space $(X,\delta_{X})$, where $\delta_{X}$ is a distance function on $X$. For a given $p\in\mathbb{N}$, the metric version of Classical MDS seeks a configuration $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}\in\mathbb{R}^{p}$ such that

$$(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n})=\arg\min_{(\bm{y}_{1},\ldots,\bm{y}_{n})\in(\mathbb{R}^{p})^{n}}\frac{1}{n(n-1)}\sum_{i\neq j}(\delta_{X}(\bm{x}_{i},\bm{x}_{j})-\|\bm{y}_{i}-\bm{y}_{j}\|)^{2}.\tag{2}$$

When $(X,\delta_{X})=(\mathbb{R}^{d},\|\cdot\|)$ for $d\in\mathbb{N}$, $d\geq p$, (2) is essentially a standard PCA up to translation (see, e.g., Section 2.2.7 of Cox and Cox, 2008), where $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}$ are the projections of $\bm{x}_{1},\ldots,\bm{x}_{n}$ onto the best-fitting $\mathbb{R}^{p}$, up to isometry.
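This equivalence is easy to verify numerically with base R: classical scaling (cmdscale), which solves the metric problem for Euclidean input, reproduces the PCA scores up to an isometry (a quick check on simulated data of our own making):

```r
set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y_mds <- cmdscale(dist(x), k = 2)            # classical MDS configuration
y_pca <- prcomp(x)$x[, 1:2]                  # PCA scores
# Identical interpoint distances => same configuration up to isometry
max(abs(dist(y_mds) - dist(y_pca)))          # ~ 0, up to numerical error
```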

2.2.2 Spherical MDS

In the non-classical context, we consider a second metric space $(Y,\delta_{Y})$, not necessarily Euclidean, to which we want to map our original data; see, e.g., Cox and Cox (1991, 2008). Then, (2) generalizes to finding a configuration $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}\in Y$ such that

$$(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n})=\arg\min_{(\bm{y}_{1},\ldots,\bm{y}_{n})\in Y^{n}}\frac{1}{n(n-1)}\sum_{i\neq j}(\delta_{X}(\bm{x}_{i},\bm{x}_{j})-\delta_{Y}(\bm{y}_{i},\bm{y}_{j}))^{2}.\tag{3}$$

The objective function in (2) and (3) is referred to as the stress function of the given method. In the case $(Y,\delta_{Y})=(\mathbb{S}^{d}_{r},\delta_{\mathbb{S}^{d}_{r}})$, we refer to (3) as Spherical MDS (SMDS). Typically, to solve the SMDS problem with arbitrary $(X,\delta_{X})$, one of two strategies has been used. The first represents the data on $\mathbb{S}_{r}^{d}$ with hyperspherical coordinates and numerically optimizes (3) over $Y=[0,\pi]^{d-1}\times[-\pi,\pi)$. Its main advantage is computational convenience, as the optimization only requires box constraints. A nuisance of this approach is that the computation of Cartesian coordinates is required for computing geodesic distances on $\mathbb{S}_{r}^{d}$ for $d\geq 3$; perhaps for this reason, instances of this approach seem to have focused on $d=1,2$ (Elad et al., 2005; Cox and Cox, 1991; Papazoglou and Mylonas, 2017). The second strategy employs the Euclidean parametrization, together with either linear or quadratic constraints that force the new configuration to lie on a sphere. This, however, may yield a somewhat demanding optimization problem. Instances of such work include Lee and Bentler (1980), Borg and Lingoes (1980), and de Leeuw and Heiser (1980). A particularly successful approach in this subcategory, able to solve MDS with quadratic constraints by employing majorization techniques, is the primal method of MDS-Quadratic (MDS-Q), proposed by de Leeuw and Mair (2009). MDS-Q reduces to SMDS when $Y=\mathbb{S}^{d}$ in (3). A faster algorithm is also supported by MDS-Q, referred to as the dual method of MDS-Q, which yields approximately spherical solutions (so a projection to the sphere must follow).

3 Scaled Torus-PCA

We next provide a high-level overview of Scaled Torus-PCA (ST-PCA). Given a sample $\bm{x}_{1},\ldots,\bm{x}_{n}\in\mathbb{T}^{d}$, $d\in\mathbb{N}$, ST-PCA proceeds in three main steps:

  1.

    Obtain $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}\in\mathbb{S}_{r}^{d}$ by solving the SMDS problem arising from (3) with $(X,\delta_{X})=(\mathbb{T}^{d},\delta_{\mathbb{T}^{d}})$ and $(Y,\delta_{Y})=(\mathbb{S}^{d}_{r},\delta_{\mathbb{S}^{d}_{r}})$:

    $$(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n})=\arg\min_{(\bm{y}_{1},\ldots,\bm{y}_{n})\in(\mathbb{S}^{d}_{r})^{n}}\frac{1}{n(n-1)}\sum_{i\neq j}(\delta_{\mathbb{T}^{d}}(\bm{x}_{i},\bm{x}_{j})-\delta_{\mathbb{S}^{d}_{r}}(\bm{y}_{i},\bm{y}_{j}))^{2}.\tag{4}$$
  2.

    Obtain $S^{j}$, $j=d-1,\ldots,1$, the nested sequence of subspaces that are isomorphic to $\mathbb{S}^{j}$ and that best fit $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}$ according to PNS.

  3.

    Optionally, "invert" $S^{1}$ through a reverse SMDS problem to obtain a principal curve on $\mathbb{T}^{d}$.

Step 1 is discussed in Subsection 3.1, while considerations on choosing the radius of $\mathbb{S}^{d}_{r}$ are addressed in Subsection 3.2. Steps 2 and 3 are elaborated in Subsections 3.3 and 3.4, respectively.

3.1 Solving the SMDS problem

For a given dataset $\bm{x}_{1},\ldots,\bm{x}_{n}\in\mathbb{T}^{d}$ and radius $r\in\mathbb{R}_{+}$, in Step 1 we seek a solution to the optimization problem (4). While this could be achieved using MDS-Q, as previously described, we have also explored the unconstrained optimization approach

$$(\widehat{\bm{z}}_{1},\ldots,\widehat{\bm{z}}_{n})=\arg\min_{(\bm{z}_{1},\ldots,\bm{z}_{n})\in(\mathbb{R}^{d+1})^{n}}\frac{1}{n(n-1)}\sum_{i\neq j}\left(\delta_{\mathbb{T}^{d}}(\bm{x}_{i},\bm{x}_{j})-r\arccos\left(\frac{\bm{z}_{i}^{\prime}\bm{z}_{j}}{\|\bm{z}_{i}\|\|\bm{z}_{j}\|}\right)\right)^{2},\tag{5}$$

whose solution is then projected as $\widehat{\bm{y}}_{i}:=r\widehat{\bm{z}}_{i}/\|\widehat{\bm{z}}_{i}\|\in\mathbb{S}^{d}_{r}$, $i=1,\ldots,n$. An advantage of this relatively simple approach over MDS-Q is that it can be further modified straightforwardly, as later discussed in Subsection 3.2. More importantly, even though the framework for a geodesic MDS-Q exists, existing software only features chordal distances on $\mathbb{S}^{d}$. When solving either (4) or (5), some care is advised: neither has a unique global minimum, as their objective functions are invariant under rotations of point configurations. In (4), this issue can be addressed by fixing $\bm{y}_{1}$ to, say, $(0,\ldots,0,1)^{\prime}$ and optimizing over the remaining $(\bm{y}_{2},\ldots,\bm{y}_{n})$. In (5), one can fix $\bm{z}_{1}=(0,\ldots,0,s)^{\prime}$ and optimize over $(s,\bm{z}_{2},\ldots,\bm{z}_{n})\in\mathbb{R}_{+}\times(\mathbb{R}^{d+1})^{n-1}$.
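A minimal R sketch of this unconstrained approach follows, with fixed radius r and a toroidal distance matrix DT (for brevity, the rotational invariance is not removed by fixing the first point):

```r
# Stress of (5) for a flattened configuration z_vec (n x (d+1) entries).
smds_stress <- function(z_vec, DT, n, d, r) {
  Z <- matrix(z_vec, n, d + 1)
  Z <- Z / sqrt(rowSums(Z^2))              # project rows onto the unit sphere
  C <- pmin(pmax(tcrossprod(Z), -1), 1)    # pairwise cosines, clamped
  DS <- r * acos(C)                        # spherical geodesic distances
  sum((DT - DS)[upper.tri(DT)]^2)          # stress, up to a constant factor
}
# DT: n x n matrix with entries delta_{T^d}(x_i, x_j); Z0: initial value,
# e.g., the classical MDS solution projected to the sphere.
# fit <- optim(as.vector(Z0), smds_stress, method = "BFGS",
#              DT = DT, n = n, d = d, r = r)
# Y <- matrix(fit$par, n, d + 1); Y <- r * Y / sqrt(rowSums(Y^2))
```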

We ran experiments comparing the MDS-Q approach with the BFGS algorithm, as implemented in R's optim function, to solve (6) (a slight modification of (5)). MDS-Q was run as implemented in the smacofSphere function of the smacof R package (de Leeuw and Mair, 2009). Both SMDS algorithms were initiated by a projection to $\mathbb{S}^{d}$ of the classical MDS solution. Calculation times for the optimization of (6) also include the initialization of the radius according to (7). Optimization in (6) was conducted at a relative tolerance level of $5\times 10^{-2}$ and MDS-Q at $\varepsilon=5\times 10^{-2}$, levels chosen so that the computational efforts were comparable. Our approach attained notable improvements in stress over MDS-Q, which can partly be attributed to MDS-Q's non-geodesic calculation of distances. For the dual MDS-Q algorithm, the default penalty (10) was used. Each of the three numerical examples in Section 4 was generated 50 times and the results were averaged. In the setting of Subsection 4.1, our optimization of (6) took 6.45s and yielded a stress of 0.068, primal MDS-Q needed 14.29s and yielded a stress of 0.479, while dual MDS-Q reported 2.16s and 1.076, respectively. The respective metrics for the setting of Subsection 4.2 are: 7.28s and 0.064; 75.02s (with a range of 43–157s) and 1.23; 2.6s and 1.753. For the setting in Subsection 4.3 we obtained: 30.93s and 0.159; 14.65s and 0.645; and 2.19s and 0.691. Due to the above considerations, and since we specifically target geodesic distances, we use approach (6) in the sequel.

3.2 Considerations on the radius

The choice of $r$ in (4) and (5) affects the scaling of the spherical distances. This has an important effect, since distances on $\mathbb{S}^{d}_{r}$ are bounded by $r\pi$, while distances on $\mathbb{T}^{d}$ are bounded by $\sqrt{d}\,\pi$. Besides, $\mathbb{S}^{d}_{r}$ becomes a very different space when $r\to 0$ or $r\to\infty$, collapsing to $\{\bm{0}\}$ in the first limit case and becoming flat in the second (the curvature of $\mathbb{S}_{r}^{d}$ is $1/r^{2}$).

A possibility is to incorporate $r$ into the optimization problem, modifying the radius of the arrival space in a data-driven form, as done by Bai et al. (2015). Precisely, we can turn (5) into

$$(\widehat{r},\widehat{\bm{z}}_{1},\ldots,\widehat{\bm{z}}_{n})=\arg\min_{(r,\bm{z}_{1},\ldots,\bm{z}_{n})\in\mathbb{R}_{+}\times(\mathbb{R}^{d+1})^{n}}\frac{1}{n(n-1)}\sum_{i\neq j}\left(\delta_{\mathbb{T}^{d}}(\bm{x}_{i},\bm{x}_{j})-r\arccos\left(\frac{\bm{z}_{i}^{\prime}\bm{z}_{j}}{\|\bm{z}_{i}\|\|\bm{z}_{j}\|}\right)\right)^{2}.\tag{6}$$

Including $r$ within the optimization in (5) is immediate, which is an additional benefit of (5). A simple initial value for $r$ is $\sqrt{d}$, which equates the maximum distances on $\mathbb{T}^{d}$ and $\mathbb{S}_{r}^{d}$.

A more sophisticated approach to select $r$ that does not require expanding (5) into (6) is described next. Denote by $\mu$ and $\nu_{r}$ the uniform measures on $\mathbb{T}^{d}$ and $\mathbb{S}_{r}^{d}$, respectively. Let $\bm{u}_{1},\ldots,\bm{u}_{n}\sim\mu$ and $\bm{v}_{1},\ldots,\bm{v}_{n}\sim\nu_{r}$ be iid samples. Denote by $F_{N,\mathbb{T}^{d}}$ the empirical cumulative distribution function (ecdf) of the pairwise distances $\{\delta_{\mathbb{T}^{d}}(\bm{u}_{i},\bm{u}_{j})\}_{1\leq i<j\leq n}$ and by $G_{N,\mathbb{S}_{r}^{d}}$ the analogous ecdf for $\{\delta_{\mathbb{S}_{r}^{d}}(\bm{v}_{i},\bm{v}_{j})\}_{1\leq i<j\leq n}$, with $N:=n(n-1)/2$. Define

$$r^{*}:=\arg\min_{r\geq 0}\mathbb{E}_{\mu\times\nu_{r}}\left[W_{1}(F_{N,\mathbb{T}^{d}},G_{N,\mathbb{S}_{r}^{d}})\right],\quad W_{1}(F_{N,\mathbb{T}^{d}},G_{N,\mathbb{S}_{r}^{d}})=\int_{0}^{1}\big|F^{(-1)}_{N,\mathbb{T}^{d}}(s)-G^{(-1)}_{N,\mathbb{S}_{r}^{d}}(s)\big|\,\mathrm{d}s,\tag{7}$$

which corresponds to the expected 1-Wasserstein distance between the two pairwise-distance ecdfs under the product measure $\mu\times\nu_{r}$. Above, $F^{(-1)}$ stands for the generalized inverse of $F$. As defined, $r^{*}$ guarantees that, on average, the most spread sample of size $n$ on $\mathbb{T}^{d}$ can be allocated on $\mathbb{S}^{d}_{r^{*}}$ while optimally maintaining its interpoint-distance distribution. Uniformly-distributed samples on $\mathbb{T}^{d}$ can be regarded as the worst-case scenario on which to perform SMDS effectively; $r^{*}$ is designed to aid precisely in this situation. In practice, the expectation in (7) can be estimated by Monte Carlo, and the Wasserstein distance is evaluated through its equivalent form $W_{1}(F_{N,\mathbb{T}^{d}},G_{N,\mathbb{S}_{r}^{d}})=\frac{1}{N}\sum_{i=1}^{N}\big|d_{(i),\mathbb{T}^{d}}-d_{(i),\mathbb{S}_{r}^{d}}\big|$, where $d_{(k),X}$ is the $k$-th order statistic of $\{\delta_{X}(\bm{u}_{i},\bm{u}_{j})\}_{1\leq i<j\leq n}$. In the experiments described in Subsection 3.1, as well as in the sequel, we computed $r^{*}$ as described in (7), estimating the expectation with $M=100$ Monte Carlo replicates of $n=100$ uniformly generated points each, and then used it as the initial value for $\widehat{r}$ in solving (6).
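A Monte Carlo sketch of this initialization in R is given below, using the order-statistic form of $W_{1}$ (dist_circ is the helper of Subsection 2.1; the optimization interval for $r$ is our choice):

```r
runif_torus <- function(n, d) matrix(runif(n * d, -pi, pi), n, d)
runif_sphere <- function(n, d, r) {        # uniform sample on S^d_r
  z <- matrix(rnorm(n * (d + 1)), n, d + 1)
  r * z / sqrt(rowSums(z^2))
}
pdist_torus <- function(U) {               # sorted pairwise toroidal distances
  D2 <- 0
  for (k in seq_len(ncol(U))) {
    A <- as.matrix(dist(U[, k]))           # |phi_ik - phi_jk|
    D2 <- D2 + pmin(A, 2 * pi - A)^2       # wrap to circle distances
  }
  sort(sqrt(D2[upper.tri(D2)]))
}
pdist_sphere <- function(V, r) {           # sorted pairwise spherical distances
  C <- pmin(pmax(tcrossprod(V / r), -1), 1)
  sort((r * acos(C))[upper.tri(C)])
}
w1_hat <- function(r, d, n = 100, M = 100) {
  mean(replicate(M, {                      # Monte Carlo estimate of (7)
    mean(abs(pdist_torus(runif_torus(n, d)) -
             pdist_sphere(runif_sphere(n, d, r), r)))
  }))
}
# r_star <- optimize(w1_hat, interval = c(0.5, 2 * sqrt(d)), d = d)$minimum
```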

3.3 PNS fit

By solving (5) we have found a configuration $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}\in\mathbb{S}^{d}_{r}$. For ease of presentation, we assume this configuration has been projected to $\mathbb{S}^{d}$ before applying PNS (which is invariant to scaling). We can now use PNS on $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}$ to obtain a nested sequence of subspaces

$$\mathbb{S}^{d}\supset S^{d-1}\supset S^{d-2}\supset\cdots\supset S^{1}\supset\{\widehat{\bm{\mu}}\},\tag{8}$$

where $S^{d-1}\cong\mathbb{S}^{d-1},\ldots,S^{1}\cong\mathbb{S}^{1}$, and $\widehat{\bm{\mu}}$ is the backwards mean, which is the Fréchet mean of the projections of the data onto $S^{1}$. A possible parametrization of the $(d-1)$-dimensional subsphere $S^{d-1}$ that sheds light on its small-subsphere structure is attained with the tangent-normal decomposition: $S^{d-1}=\{\bm{x}\in\mathbb{S}^{d}:\bm{x}=t_{1}\bm{\theta}_{1}+(1-t_{1}^{2})^{1/2}\bm{B}_{\bm{\theta}_{1}}\bm{\xi}_{1},\ \bm{\xi}_{1}\in\mathbb{S}^{d-1}\}$, where $(t_{1},\bm{\theta}_{1})\in[-1,1]\times\mathbb{S}^{d}$ and $\bm{B}_{\bm{\theta}_{1}}$ is an arbitrary semi-orthonormal $(d+1)\times d$ matrix such that $\bm{B}_{\bm{\theta}_{1}}\bm{B}_{\bm{\theta}_{1}}^{\prime}=\bm{I}_{d+1}-\bm{\theta}_{1}\bm{\theta}_{1}^{\prime}$ and $\bm{B}_{\bm{\theta}_{1}}^{\prime}\bm{B}_{\bm{\theta}_{1}}=\bm{I}_{d}$. Further subspheres can be parametrized iteratively in a nested fashion, yet explicit forms are more convoluted.
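A short R sketch of this parametrization, sampling points on the subsphere determined by $(t_{1},\bm{\theta}_{1})$ and using MASS::Null() to supply a semi-orthonormal $\bm{B}_{\bm{\theta}_{1}}$:

```r
library(MASS)
subsphere_points <- function(m, t1, theta1) {
  # t1 in [-1, 1]; theta1 a unit vector in R^(d+1)
  B <- Null(cbind(theta1))                 # (d+1) x d, B'B = I_d, B'theta1 = 0
  d <- ncol(B)
  xi <- matrix(rnorm(m * d), m, d)
  xi <- xi / sqrt(rowSums(xi^2))           # m uniform points on S^(d-1)
  t(t1 * theta1 + sqrt(1 - t1^2) * B %*% t(xi))  # m x (d+1) points on the subsphere of S^d
}
```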

PNS also yields the matrix of scores $\{\bm{\xi}_{i}\}_{i=1}^{n}$, where $\bm{\xi}_{i}$ is a $d$-vector (in angular parametrization) that gives the coordinates of $\widehat{\bm{y}}_{i}$ with respect to the spherical components. This matrix gives rise to the space of principal scores $E=[-\pi,\pi)\times[-\pi/2,\pi/2]^{d-1}$, a map $h:\mathbb{S}^{d}\to E$ of PNS such that $h(\widehat{\bm{y}}_{i})=\bm{\xi}_{i}$, and its inverse, $\tilde{h}$. Further, we denote by $\bm{\xi}_{i}^{(k)}$ the projection of the $i$-th datum to $S^{k-1}$, so that $\bm{\xi}_{i}^{(k)}$ agrees with $\bm{\xi}_{i}$ in the first $k$ coordinates and is $0$ in the rest. Essentially, given $\bm{\xi}\in E$, where $\xi_{j}$ denotes the deviation from the $j$-th nested sphere (of dimension $d-j$),

$$\tilde{h}(\bm{\xi})=(\tilde{g}_{1}\circ\cdots\circ\tilde{g}_{d-1})(\bm{\xi}),\quad\tilde{g}_{k}(\bm{\xi}^{(k-1)},\xi_{k}):=\bm{R}(\bm{v}_{k})^{\prime}\begin{pmatrix}\sin(r_{k}+\xi_{k})\,\bm{\xi}^{(k-1)}\\ \cos(r_{k}+\xi_{k})\end{pmatrix},\quad k=1,\ldots,d,$$

where $\bm{R}(\bm{v}_{k})$ is the matrix that rotates $\bm{v}_{k}$, the center of the $k$-th subsphere, to the north pole, and $r_{k}$ is its radius.

For more information on the maps $h$ and $\tilde{h}$, the reader is referred to Section 4 of the Supplementary Material of Jung et al. (2012), and to the main paper for the construction of the nested spheres. PNS is implemented in the pns function of the shapes R package (Dryden, 2021), which we have used.
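As a usage note, a hypothetical invocation is sketched below, assuming (per the shapes documentation) that pns() expects points on the unit sphere as the columns of a $(d+1)\times n$ matrix; Y denotes the $n\times(d+1)$ SMDS configuration:

```r
library(shapes)
Yu <- Y / sqrt(rowSums(Y^2))     # rescale the SMDS configuration to unit norm
pns_fit <- pns(t(Yu))            # points as columns (assumed interface)
# pns_fit$resmat is then assumed to hold the PNS scores used in the sequel
```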

3.4 Prediction

Step 1 of ST-PCA grants an association between the point configurations $(\bm{x}_{1},\ldots,\bm{x}_{n})\in(\mathbb{T}^{d})^{n}$ and $(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n})\in(\mathbb{S}^{d}_{r})^{n}$. Suppose that we are given an arbitrary point $\bm{y}\in\mathbb{S}_{r}^{d}$ and are asked to predict a point $\bm{x}\in\mathbb{T}^{d}$ that best preserves the interpoint distances with respect to the original configuration $\bm{x}_{1},\ldots,\bm{x}_{n}$. This problem is referred to as the prediction problem for MDS; see, e.g., Gower and Hand (1996). This "prediction" can be achieved by defining the set-valued map $\widetilde{T}:\mathbb{S}^{d}_{r}\to\mathbb{T}^{d}$ given by

$$\widetilde{T}(\bm{y}):=\arg\min_{\bm{x}\in\mathbb{T}^{d}}\frac{1}{n}\sum_{i=1}^{n}\left(\delta_{\mathbb{T}^{d}}(\bm{x},\bm{x}_{i})-\delta_{\mathbb{S}_{r}^{d}}(\bm{y},\widehat{\bm{y}}_{i})\right)^{2},\tag{9}$$

which is a coherent prediction problem (Gower and Hand, 1996), since the same objective function (stress) as in the original SMDS problem is used. In practice, an appropriate initialization for the numerical optimization of (9) is the sample point $\bm{x}_{i}$ whose $\widehat{\bm{y}}_{i}$ is closest to $\bm{y}$. For the sake of completeness, we mention that the interpolation problem for MDS refers to the problem reverse to (9), which induces the definition of the set-valued map $\widehat{T}:\mathbb{T}^{d}\to\mathbb{S}^{d}_{r}$ given by

$$\widehat{T}(\bm{x}):=\arg\min_{\bm{y}\in\mathbb{S}_{r}^{d}}\frac{1}{n}\sum_{i=1}^{n}\left(\delta_{\mathbb{T}^{d}}(\bm{x},\bm{x}_{i})-\delta_{\mathbb{S}_{r}^{d}}(\bm{y},\widehat{\bm{y}}_{i})\right)^{2}.\tag{10}$$

Both (9) and (10) give a means of extending the SMDS matching between $(\bm{x}_{1},\ldots,\bm{x}_{n})$ and $(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n})$ to arbitrary points on $\mathbb{T}^{d}$ and $\mathbb{S}^{d}_{r}$, respectively.
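A minimal R sketch of the prediction map (9), with angles wrapped back to $[-\pi,\pi)$ (dist_torus and dist_sph are the helpers of Subsection 2.1):

```r
# Sketch of (9). X: n x d torus sample; Y: n x (d+1) spherical configuration.
wrap <- function(x) (x + pi) %% (2 * pi) - pi
predict_torus <- function(y, X, Y, r, x0) {
  dS <- apply(Y, 1, dist_sph, y = y, r = r)       # fixed spherical distances
  obj <- function(x) {                            # stress in (9)
    x <- wrap(x)
    mean((apply(X, 1, dist_torus, psi = x) - dS)^2)
  }
  # x0: e.g., the sample point x_i whose y_i-hat is closest to y
  wrap(optim(x0, obj, method = "BFGS")$par)
}
```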

We now explain how prediction is used to find a tractable first principal component on $\mathbb{T}^{d}$. Recall from Subsection 3.3 that PNS yields an explicit map $\tilde{h}:E\rightarrow\mathbb{S}^{d}$. Using $\tilde{h}$, we can obtain the Euclidean parametrization of $S^{1}$ in $\mathbb{S}^{d}$. The descriptor is obtained in two steps:

  i.

    Take a grid $\{\xi_{j}\}_{j=1}^{m}\subset[-\pi,\pi)$. Then $\{\bm{\xi}_{j}\}_{j=1}^{m}\subset E$ is a discrete representation of $S^{1}$, where $\bm{\xi}_{j}=(\xi_{j},0,\ldots,0)$ (the last $d-1$ coordinates being zero), and obtain $\{\bm{\psi}_{j}\}_{j=1}^{m}$ with $\bm{\psi}_{j}:=\tilde{h}(\bm{\xi}_{j})$.

  ii.

    Compute $\widetilde{T}(\bm{\psi}_{j})$ for each $j=1,\ldots,m$ by numerically optimizing (9). To aid continuity in the resulting sequence of solutions and avoid spurious minima, use the previous prediction $\widetilde{T}(\bm{\psi}_{j})$ as the starting value for obtaining the next one, $\widetilde{T}(\bm{\psi}_{j+1})$.

The above procedure generates a discretization of the set

$$\widetilde{S}^{1}:=\widetilde{T}(S^{1})=\{\widetilde{T}(\bm{\psi})\in\mathbb{T}^{d}:\bm{\psi}\in S^{1}\}.$$
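In R, this grid prediction with warm starts admits the following sketch, where h_tilde() is assumed to implement the inverse PNS map $\tilde{h}$ of Subsection 3.3 (e.g., extracted from the PNS fit) and predict_torus() is the sketch of Subsection 3.4:

```r
xi_grid <- seq(-pi, pi, length.out = 100)
S1_torus <- matrix(NA, length(xi_grid), d)
psi <- h_tilde(c(xi_grid[1], rep(0, d - 1)))
# initialize at the sample point whose spherical image is closest to psi
x_start <- X[which.min(apply(Y, 1, dist_sph, y = psi, r = r)), ]
for (j in seq_along(xi_grid)) {
  psi <- h_tilde(c(xi_grid[j], rep(0, d - 1)))   # point on S^1 inside S^d
  S1_torus[j, ] <- predict_torus(psi, X, Y, r, x0 = x_start)
  x_start <- S1_torus[j, ]                       # warm start for continuity
}
```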

We close this subsection with a remark. The strength and flexibility of ST-PCA rely on the fact that it searches for a transformation that is, in some sense, optimal. Since this transformation is the solution of optimization problems, there is no guaranteed regularity for ST-PCA's principal components on the torus. This can become apparent when predicting $S^{1}$, a continuous curve on $\mathbb{S}^{d}$ whose predicted curve on $\mathbb{T}^{d}$ may feature discontinuities or non-closed segments in data-sparse regions. In practice, the continuity of $\widetilde{S}^{1}$ can be facilitated by sensible initializations, as described above, followed by a fine-grained reconstruction of $\widetilde{S}^{1}$ through linear interpolations of $\{\widetilde{T}(\bm{\psi}_{j})\}_{j=1}^{m}$ adapted to $\mathbb{T}^{d}$. Nevertheless, for the purposes of clustering and data visualization, Steps 1 and 2 of our algorithm suffice. Hence, even if the principal component constructed in Step 3 is discontinuous, ST-PCA can yield meaningful dimensionality reduction.

3.5 Explained variance

Consider the projection of $\widehat{\bm{y}}_{i}$ onto $S^{k}$, denoted by $\widehat{\bm{y}}_{i}^{(k)}$, $k=0,\ldots,d$, where $\widehat{\bm{y}}_{i}^{(0)}=\widehat{\bm{\mu}}$, $i=1,\ldots,n$, and $\widehat{\bm{\mu}}$ is the Fréchet mean of the projections of the data on $S^{1}\cong\mathbb{S}^{1}$. Then, the variance decomposition provided by PNS for such a sample is

$$\mathrm{var}_{\mathbb{S}^{d}}(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}):=\sum_{k=1}^{d}\mathrm{var}_{\mathbb{S}^{d}}^{(k)}(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}):=\sum_{k=1}^{d}\sum_{i=1}^{n}\left(\delta_{\mathbb{S}^{d}}\big(\widehat{\bm{y}}_{i}^{(k-1)},\widehat{\bm{y}}_{i}^{(k)}\big)\right)^{2}\tag{11}$$

and the proportion of variance explained by the $k$-th component is $\mathrm{var}_{\mathbb{S}^{d}}^{(k)}(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n})/\mathrm{var}_{\mathbb{S}^{d}}(\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n})$.

Denote by $\widehat{\bm{x}}_{i}^{(k)}=\widetilde{T}(\widehat{\bm{y}}_{i}^{(k)})$, $k=0,\ldots,d$, the prediction of the projection $\widehat{\bm{y}}_{i}^{(k)}$, which can be equivalently regarded as the projection of $\bm{x}_{i}$ onto $\widetilde{S}^{k}$. Then, based on (11), we define the total variance of $\bm{x}_{1},\ldots,\bm{x}_{n}$ on $\mathbb{T}^{d}$ as

$$\mathrm{var}_{\mathbb{T}^{d}}(\bm{x}_{1},\ldots,\bm{x}_{n}):=\sum_{k=1}^{d}\mathrm{var}^{(k)}_{\mathbb{T}^{d}}(\bm{x}_{1},\ldots,\bm{x}_{n}):=\sum_{k=1}^{d}\sum_{i=1}^{n}\left(\delta_{\mathbb{T}^{d}}\big(\widehat{\bm{x}}_{i}^{(k-1)},\widehat{\bm{x}}_{i}^{(k)}\big)\right)^{2}.$$

The proportion of variance explained by the $k$-th component is then defined as $\mathrm{var}_{\mathbb{T}^{d}}^{(k)}(\bm{x}_{1},\ldots,\bm{x}_{n})/\mathrm{var}_{\mathbb{T}^{d}}(\bm{x}_{1},\ldots,\bm{x}_{n})$. In particular, this ratio allows quantifying the proportion of data variance captured by $\widetilde{S}^{1}$, a popular metric of success in dimensionality reduction. As is typical for PCA analogues in non-Euclidean spaces (see, e.g., the discussion in Section 2.3 of Eltzner et al., 2018), there is no guarantee that $k\mapsto\mathrm{var}_{\mathbb{T}^{d}}^{(k)}(\bm{x}_{1},\ldots,\bm{x}_{n})/\mathrm{var}_{\mathbb{T}^{d}}(\bm{x}_{1},\ldots,\bm{x}_{n})$ is a non-increasing function.
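These decompositions translate directly into code. A sketch (reusing dist_torus from Subsection 2.1), where X_proj is a list of length $d+1$ whose $(k+1)$-th element stores the $n\times d$ matrix of predicted projections $\widehat{\bm{x}}_{i}^{(k)}$:

```r
var_torus <- function(X_proj) {
  d <- length(X_proj) - 1
  n <- nrow(X_proj[[1]])
  v <- numeric(d)
  for (k in 1:d) for (i in 1:n)        # var^(k) sums squared toroidal gaps
    v[k] <- v[k] + dist_torus(X_proj[[k]][i, ], X_proj[[k + 1]][i, ])^2
  list(by_component = v, proportion = v / sum(v))
}
```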

4 Numerical experiments

4.1 Clustered toy data on $\mathbb{T}^{2}$

We next demonstrate the ability of ST-PCA to separate clustered data on $\mathbb{T}^{2}$ without distorting the cluster structure. To that aim, a toy dataset consisting of three isotropic clusters with means $\bm{\mu}_{1}=(-1,-2)^{\prime}$, $\bm{\mu}_{2}=(3,0.5)^{\prime}$, and $\bm{\mu}_{3}=(-0.8,2.5)^{\prime}$, each with $n=100$ points, was considered. The dataset is displayed in Figure 2a. We performed Step 1 to obtain a configuration of the data on $\mathbb{S}^{2}$, displayed in Figure 2b. For this specific dataset, we obtained $r^{*}=1.47$ as an initial value from (7), which was then modified to $\widehat{r}=1.35$ through (6). In Step 2, we obtained the PNS scores plot for this spherical configuration, displayed in Figure 2c. The black curve in Figure 2b shows $S^{1}$, the best-fitting $\mathbb{S}^{1}$ to the spherical data, which corresponds to the horizontal axis of the scores plot in Figure 2c.

[Figure 2: panels (a), (b), (c); images not reproduced.]
Figure 2: Panel (a) shows the three clusters of simulated data on $\mathbb{T}^{2}$. Panel (b) presents the sample SMDS-mapped to $\mathbb{S}^{2}$. Panel (c) displays the PNS scores. The principal mode of variation on the torus, sphere, and scores is displayed in all panels as a black curve. The first ST-PCA mode of variation clearly captures the three clusters.

The resulting $\widetilde{S}^{1}$, computed in Step 3 of ST-PCA, is shown as the black curve in Figure 2a. The figure reveals that the first toroidal component efficiently adapts to the clustered structure of the data (percentage of variance explained: $95\%$). $\widetilde{S}^{1}$ was obtained by predicting an equispaced grid $\{\bm{\psi}_{j}\}_{j=1}^{100}\subset S^{1}$ and then linearly interpolating the resulting predictions on $\mathbb{T}^{2}$ to yield a smooth curve.

4.2 Clustered toy data on $\mathbb{T}^{3}$

We now simulate data from an isotropic mixture of three wrapped normal distributions on $\mathbb{T}^{3}$. We simulated $n=300$ points equally distributed among the three components with centers $\bm{\mu}_{1}=(-1,-2,0)^{\prime}$, $\bm{\mu}_{2}=(1,1,-1)^{\prime}$, and $\bm{\mu}_{3}=(0,-1,3)^{\prime}$. Figure 3a displays the three clusters on $\mathbb{T}^{3}$, where each side of the cube is glued to the opposite side. After running SMDS in Step 1, we obtained a configuration of points $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}\in\mathbb{S}^{3}$. For visualization purposes, we show in Figure 3b the spherical view of the configuration's projections onto $S^{2}\cong\mathbb{S}^{2}$. For this dataset we obtained $r^{*}=1.79$, which then yielded $\widehat{r}=1.63$. Figure 3c shows the scores plot after applying PNS to $\widehat{\bm{y}}_{1},\ldots,\widehat{\bm{y}}_{n}\in\mathbb{S}^{3}$ (Step 2). Finally, we inverted $S^{1}$, the black curve in Figure 3b, to produce $\widetilde{S}^{1}$, displayed as the black curve in Figure 3a. We did so with an equispaced grid $\{\bm{\xi}_{j}\}_{j=1}^{100}\subset S^{1}$, complemented by a periodic linear interpolation on $\mathbb{T}^{3}$. The entire cluster structure is captured by the first mode of variation, which clearly exploits the periodicity of $\mathbb{T}^{3}$ to efficiently explain the data variability. The percentage of variance explained by the first and second ST-PCA components is $94\%$ and $1\%$, respectively.
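A minimal R sketch of this simulation setting (the isotropic spread sigma is our illustrative choice; the paper's exact value is not reported here):

```r
wrap <- function(x) (x + pi) %% (2 * pi) - pi   # wrap angles to [-pi, pi)
mu <- rbind(c(-1, -2, 0), c(1, 1, -1), c(0, -1, 3))
n_k <- 100; sigma <- 0.4                        # assumed isotropic spread
X <- do.call(rbind, lapply(1:3, function(k)
  wrap(matrix(rnorm(n_k * 3, sd = sigma), n_k, 3) +
       matrix(mu[k, ], n_k, 3, byrow = TRUE)))) # wrapped normal mixture
```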

[Figure 3: panels (a), (b), (c); images not reproduced.]
Figure 3: Panel (a) shows the three clusters of simulated data on $\mathbb{T}^{3}$. Panel (b) presents the projection to $S^{2}\cong\mathbb{S}^{2}$ of the sample SMDS-mapped to $\mathbb{S}^{3}$. Panel (c) displays the first PNS scores. The principal mode of variation on the torus, sphere, and scores is displayed in all panels as a black curve. The first ST-PCA principal component successfully separates the three clusters.

4.3 Anisotropic wrapped normal on $\mathbb{T}^{2}$

In the final numerical experiment, we simulated $n=500$ observations from a wrapped normal distribution on $\mathbb{T}^{2}$ with mean $\bm{\mu}=(-1,0)^{\prime}$ and covariance matrix $\bm{\Sigma}=\left(\begin{smallmatrix}2&2\\2&3\end{smallmatrix}\right)$. Figure 4 shows the analogous analysis to that carried out in Figure 2, this time using a rainbow palette for easier identification of the first ST-PCA principal component, and with $r^{*}=1.48$ and $\widehat{r}=1.39$. The mode of variation successfully utilizes the periodicity of the torus to optimally capture the data variability, attaining a percentage of variance explained of $89\%$.

[Figure 4: panels (a), (b), (c); images not reproduced.]
Figure 4: Panel (a) shows the sample from a wrapped Gaussian distribution on $\mathbb{T}^{2}$; panel (b) shows the spherical view of the configuration of points obtained by SMDS. Panel (c) displays the first PNS scores. The principal mode of variation from the sphere is displayed in all panels in a rainbow palette. The first ST-PCA mode of variation spans the presumed center of the banded data-dense region.

5 Real data applications

5.1 Sunspots

Sunspots are cooler regions of the Sun's photosphere that are related to solar magnetic field concentrations. They have been used as a measure of the Sun's solar activity and are speculated to have an impact on Earth's long-term climate (see, e.g., Haigh, 2007). The solar magnetic field generating the sunspots varies according to a nearly periodic, 11-year pattern that is referred to as a solar cycle. Sunspot observations have been well documented in the Debrecen Photoheliographic Data and the Greenwich Photoheliographic Results. From the latter source, García-Portugués et al. (2020) give a curated dataset with the centers of groups of sunspots. The locations of the centers refer to the first-ever observations of such sunspots (henceforth referred to as "births"). We focus on the last cycle that has been fully observed with curated data, the 23rd solar cycle. It includes a total of 5373 records ranging from August of 1996 to November of 2001. Even though the main generator of sunspots, the twisting of the solar magnetic field, has a rotationally symmetric nature, the existence of preferred longitudes for the recurrent appearance of sunspots has been speculated (Babcock, 1961; Bogart, 1982; Pelt et al., 2010).

We explore the dependence structure of the longitudes $\theta\in[-\pi,\pi)$ of sunspot births. Since the Earth rotates around the Sun, sunspot visibility is limited by which side of the Sun is facing the Earth. Therefore, if there existed significantly preferred longitudes, a significant deviation from a linear dependence structure in the series of sunspot longitudes should be expected. To investigate this serial structure, we jointly consider the series of longitudes and its lagged versions of orders one and two,

$$\bm{\Theta}_{i}:=(\theta_{i},\theta_{i+1},\theta_{i+2})^{\prime}\in\mathbb{T}^{3},\quad i=1,\ldots,5371,$$

the main driver behind that selection being to graphically show the output of ST-PCA on $\mathbb{T}^{3}$. The principal mode of variation for the longitudes of the births of three consecutive groups of sunspots is shown in Figure 1c. The figure reveals that the data are concentrated around the (periodic) diagonal line and that ST-PCA (with $\widehat{r}=1.81$) correctly identifies this line as the first mode of variation, $\widetilde{S}^{1}$. Recall that the points on the edges of the cube belong to the cloud of points around the diagonal, thus rendering it the most sensible choice for a principal mode of variation. $\widetilde{S}^{1}$ is remarkably linear: a suitably-adapted linear fit to the grid of 100 points defining $\widetilde{S}^{1}$ yields $R^{2}=1-3\times 10^{-5}$. Therefore, the apparent linearity of the innovations in the series of sunspot birth longitudes points towards the non-existence of (at least major) preferred longitudes during the 23rd solar cycle, a result coherent with the non-rejection of rotational symmetry for that cycle reported in García-Portugués et al. (2020).

An application of T-PCA, in either of its variants, proposes a mode of variation that is statistically less efficient (Figure 1c for the SO ordering; analogous results hold for the SI ordering). Furthermore, the deformation of $\mathbb{T}^{3}$ to $\mathbb{S}^{3}$ in T-PCA introduces distortion in the form of an artificial cluster structure in the first scores, as shown in Figures 5b and 5c, despite the marginals of $\{\bm{\Theta}_{i}\}_{i=1}^{5371}$ being fairly uniformly distributed. This contrasts with Figure 5a. To formally compare the distributions of the first scores of T-PCA and ST-PCA, we performed three omnibus circular uniformity tests using their asymptotic distributions, as implemented in the sphunif R package (García-Portugués and Verdebout, 2021). The results are summarized in Table 1. The resulting $p$-values illustrate that the uniformity hypothesis cannot be rejected for the marginal distribution of the original longitudes $\{\theta_{i}\}_{i=1}^{5373}$ or the corresponding ST-PCA scores at a significance level $\alpha=0.05$, but is emphatically rejected for the scores obtained by the two possible orderings of T-PCA.
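For reference, such an analysis could be reproduced along the following hypothetical lines, assuming the test identifiers and arguments documented in sphunif (scores1 stands for the vector of first scores of a given method, treated as circular data):

```r
library(sphunif)
# Gine's Fn, Watson, and Projected Anderson-Darling tests, asymptotic p-values
unif_test(scores1, type = c("Gine_Fn", "Watson", "PAD"), p_value = "asymp")
```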

Finally, we provide a comprehensive comparison of ST-PCA with seven existing methods of dimensionality reduction on the torus for this dataset. To do so, we sampled a common subset of 1000 sunspots and computed, for each method, the percentage of variance explained by each of its three components. Table 2 summarizes the results of that analysis, with several remarks following. First, dPCA gives a six-dimensional embedding and, as a result, the variance explained by its first three dimensions does not add up to $100\%$. For dPCA+, histograms of ten bins were used to identify the regions of lowest density. Existing software for GeoPCA uses PGA and does not report the percentage of total variance explained, but rather the percentage of variance explained within the first two components; for that reason, we have not included it in the table. As a result, we were only able to find upper bounds for the percentages of variance explained by the first two components ($73.54\%$ and $26.46\%$, respectively). Due to the apparent lack of appropriate software, the model-based approaches have not been implemented. As can be seen, ST-PCA explains the most variance with the first component among its competitors. Its second component explains a smaller percentage than the third component, which is an often-encountered situation in non-Euclidean PCA (see Subsection 3.5). For example, an instance of this issue for T-PCA is reported in Table 5(b) of Eltzner et al. (2018). We conclude by remarking that the methods that go through the sphere seem to yield a much higher signal compression than the other methods, with ST-PCA slightly outperforming T-PCA with regard to the percentage of variance explained by the first component.

[Figure 5: panels (a), (b), (c); images not reproduced.]
Figure 5: Histograms of the first scores of ST-PCA and T-PCA: (a) ST-PCA; (b) T-PCA with SI ordering; (c) T-PCA with SO ordering. Both versions of T-PCA suggest the existence of spurious clusters (the marginals of the data are not significantly non-uniform), which are not present in ST-PCA.
Test | Longitudes | ST-PCA | T-PCA (SO) | T-PCA (SI)
Giné's $F_{n}$ | 0.3215 | 0.0855 | 0 | 0
Watson | 0.3595 | 0.1843 | 0 | 0
Projected Anderson–Darling | 0.7554 | 0.1273 | 0 | $9.522\times 10^{-9}$
Table 1: $p$-values for various circular uniformity tests. From left to right: name of the test; $p$-values for testing uniformity of the original longitudes; $p$-values for testing uniformity of the first scores of the corresponding method. The original longitudes and ST-PCA's first scores are not significantly non-uniform, while uniformity is rejected for T-PCA's first scores.

5.2 An RNA dataset

This RNA dataset consists of nine angles: six dihedral angles $(\alpha,\beta,\gamma,\delta,\epsilon,\zeta)$, one base angle $(\chi)$, and two pseudotorsion angles $(\eta,\theta)$. The dataset was put together by Duarte and Pyle (1998) and updated by Wadley et al. (2007) in an endeavor to cluster the complex structure of RNA nucleotides. The pseudotorsion angles were found to approximate their folding structure well. Sargsyan et al. (2012) then curated a subset consisting of $190$ observations and three clusters (C3'-endo clusters I, II, and V of Wadley et al. (2007), containing $59$, $88$, and $43$ points, respectively), henceforth referred to as the small RNA dataset. The $\eta$–$\theta$ plot of the pseudotorsion angles of the small RNA dataset reveals a clear cluster structure that is not visible in the pairwise scatterplots of the other seven backbone and base angles (see, e.g., Figure 7 in Eltzner et al., 2018). We want to investigate whether it is possible to predict the cluster structure revealed by the $\eta$–$\theta$ plot using the remaining seven variables only. Following the analyses by Sargsyan et al. (2012), Eltzner et al. (2018), and Nodehi et al. (2020), the small RNA dataset has become a de facto benchmark for dimensionality reduction methods on the torus.

Method | Component 1 | Component 2 | Component 3
dPCA | 38.48 | 34.06 | 7.67
dPCA+ | 42.69 | 30.53 | 26.78
aPCA | 43.27 | 31.28 | 25.45
Euclidean PCA | 55.43 | 24.48 | 20.09
Complex-dPCA | 72.51 | 14.59 | 12.90
T-PCA (SI) | 88.27 | 6.97 | 4.76
T-PCA (SO) | 89.32 | 6.43 | 4.25
ST-PCA | 90.51 | 3.07 | 6.40
Table 2: Percentage of variance explained by the components of dimension-reduction methods on the torus, sorted decreasingly according to the percentage of variance explained by their first components. ST-PCA and T-PCA explain the largest percentage of variance with their first components. Variance calculations are based either on standard PCA or on the torus variance defined in Subsection 3.5, depending on the method.

Figures 3 and 6 of Sargsyan et al. (2012) reveal that dPCA fails to accomplish any separation using the dihedral and base angles, while GeoPCA succeeds in separating cluster I from cluster II in the first two principal components. However, as can be seen from the $\eta$–$\theta$ plot, cluster V is close to cluster II, and GeoPCA offers no further insight on such a separation based on the first two components. An analysis by TPPCA on a subset of the small RNA dataset consisting of 181 nucleotides gives a decent separation using the first two components. Once more, however, the distinction between the two nearby clusters is not very clear (see Figure 2 of Nodehi et al., 2020). Among torus dimensionality reduction techniques, T-PCA on the 181-nucleotide subset, followed by mode hunting, has been the most successful at revealing the structure of all three clusters using only the first principal component (see Figure 8 of Eltzner et al., 2018).

To produce a clear comparison between ST-PCA and T-PCA, we obtain the spherical scores of the 190-nucleotide dataset with ST-PCA ($\widehat{r}=1.49$), as well as with T-PCA under the SI and SO orderings (version “20200518”, as provided by the authors). Then, we proceed with a clustering scheme based on kernel density estimation on the scores of the first principal components, modified to accommodate the periodicity of the data. The bandwidth is determined by the plug-in selector for density derivative estimation (Wand and Jones, 1995, pages 49 and 70) implemented in the R package ks (Duong, 2021). Figure 6 displays the results of this analysis, where the colors of the points correspond to their true labels: red, yellow, and green for clusters I, II, and V, respectively. The domains of attraction of each cluster are represented by different colors in the scores plots and are separated by dashed vertical lines.
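Although the clustering scheme is described only verbally, a minimal sketch of the mode-hunting step is given below; the three-period wrapping trick and the helper cluster_first_scores are assumptions for illustration, while hpi and kde are functions of the ks package.

# Periodic KDE-based mode hunting on the first scores (a sketch)
library(ks)

cluster_first_scores <- function(scores1) {
  # Replicate the sample over three periods so that the density estimate
  # is continuous across the -pi / pi boundary (the overall scale is
  # irrelevant for locating modes and antimodes)
  wrapped <- c(scores1 - 2 * pi, scores1, scores1 + 2 * pi)

  # Plug-in bandwidth for density derivative estimation
  # (Wand and Jones, 1995)
  h <- hpi(wrapped, deriv.order = 1)

  # Kernel density estimate evaluated on a grid spanning one period
  grid <- seq(-pi, pi, length.out = 512)
  fhat <- kde(x = wrapped, h = h, eval.points = grid)$estimate

  # Local minima of the density are the boundaries between domains of
  # attraction (the dashed vertical lines in Figure 6)
  antimodes <- grid[which(diff(sign(diff(fhat))) > 0) + 1]

  # Assign each observation to the domain of attraction it falls into
  findInterval(scores1, antimodes) + 1
}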

The left column of Figure 6 shows that deforming the torus with the SI ordering underperforms, recognizing two clusters as well as three ill-defined ones. In particular, it fails to separate clusters II and V and thus misclassifies 105 points (classification rate: $(190-105)/190\approx 0.447$). The middle column shows that T-PCA with the SO ordering correctly recognizes all three clusters, while misclassifying 23 points (classification rate: 0.879). Finally, the right column shows that ST-PCA also reveals three clusters and misclassifies only 16 points (classification rate: 0.916), hence outperforming both variants of T-PCA. An additional benefit of ST-PCA is that the shape of the estimated densities suggests a clearer separation of the clusters.

Figure 6: Upper row: scores scatterplots for the first two components (first component on the horizontal axis, second on the vertical). Lower row: kernel density estimates used to investigate the modality of the first scores. From left to right: T-PCA with SI ordering; T-PCA with SO ordering; ST-PCA. ST-PCA gives the clearest separation of the clusters in the first scores.

6 Discussion

In this paper, a novel approach to dimensionality reduction on the torus has been developed. It is an analogue of PCA, as it provides modes of variation and scores. Consistent with previous work, we have provided evidence that, for toroidal data, a transformation to the sphere induces less distortion than direct Euclidean transformations. We propose a fully data-driven transformation that is optimal in terms of minimizing the stress function inherited from the MDS literature. This key advantage of ST-PCA also comes at a price: the inheritance of the computational limitations of MDS for larger datasets and the delicate inversion of the transformation. ST-PCA’s benefits are validated by real data applications on the appearance of sunspots and on a benchmark RNA dataset, where ST-PCA outperforms previous methods in both qualitative and quantitative metrics.
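Schematically, given the toroidal sample $\bm{x}_1,\ldots,\bm{x}_n$, the spherical configuration $\bm{y}_1,\ldots,\bm{y}_n$ on a sphere of radius $r$ minimizes a raw stress between pairwise geodesic distances of the form
$$\sum_{i<j}\left\{d_{\mathbb{T}^d}(\bm{x}_i,\bm{x}_j)-d_{r\mathbb{S}^d}(\bm{y}_i,\bm{y}_j)\right\}^2,$$
where $d_{\mathbb{T}^d}$ and $d_{r\mathbb{S}^d}$ denote the geodesic distances on the torus and on the scaled sphere, respectively (this is a generic restatement; the precise objective and the choice of $r$ are given in the earlier sections).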

Acknowledgments

We are grateful to Benjamin Eltzner for kindly sharing his implementation of T-PCA. The second author acknowledges support from grants PGC2018-097284-B-100 and IJCI-2017-32005, funded by Spain’s Ministry of Science, Innovation and Universities. Both grants are co-funded with FEDER funds.

References

  • Altis et al., (2007) Altis, A., Nguyen, P. H., Hegger, R., and Stock, G. (2007). Dihedral angle principal component analysis of molecular dynamics simulations. J. Chem. Phys., 126(24):244111.
  • Babcock, (1961) Babcock, H. W. (1961). The topology of the Sun’s magnetic field and the 22-year cycle. Astrophys. J., 133(2):572–587.
  • Bai et al., (2015) Bai, S., Qi, H.-D., and Xiu, N. (2015). Constrained best Euclidean distance embedding on a sphere: A matrix optimization approach. SIAM J. Optim., 25(1):439–467.
  • Bogart, (1982) Bogart, R. S. (1982). Recurrence of solar activity: Evidence for active longitudes. Solar Phys., 76(1):155–165.
  • Borg and Groenen, (2005) Borg, I. and Groenen, P. J. F. (2005). Modern Multidimensional Scaling Theory and Applications. Springer, New York.
  • Borg and Lingoes, (1980) Borg, I. and Lingoes, J. (1980). A model and algorithm for multidimensional scaling with external constraints on the distances. Psychometrika, 45:25–38.
  • Cox and Cox, (2008) Cox, M. A. A. and Cox, T. F. (2008). Multidimensional scaling. In Handbook of Data Visualization, pages 315–347. Springer, Berlin.
  • Cox and Cox, (1991) Cox, T. F. and Cox, M. A. A. (1991). Multidimensional scaling on a sphere. Commun. Stat. Theory Methods, 20(9):2943–2953.
  • Damon and Marron, (2014) Damon, J. and Marron, J. S. (2014). Backwards principal component analysis and principal nested relations. J. Math. Imaging Vis., 50(1):107–114.
  • de Leeuw and Heiser, (1980) de Leeuw, J. and Heiser, W. (1980). Multidimensional scaling with restrictions on the configuration. J. Multivar. Anal., 5:501–522.
  • de Leeuw and Mair, (2009) de Leeuw, J. and Mair, P. (2009). Multidimensional scaling using majorization: SMACOF in R. J. Stat. Softw., 31(3):1–30.
  • Dryden, (2021) Dryden, I. L. (2021). shapes: Statistical Shape Analysis. R package version 1.2.6.
  • Duarte and Pyle, (1998) Duarte, C. M. and Pyle, A. M. (1998). Stepping through an RNA structure: A novel approach to conformational analysis. J. Mol. Biol., 284(5):1465–1478.
  • Ducharme et al., (2012) Ducharme, G. R., Vincent, C., and Aliaume, C. (2012). A statistical test to detect vortices in the current fields of bodies of water. Environ. Ecol. Stat., 19(3):345–367.
  • Duong, (2021) Duong, T. (2021). ks: Kernel Smoothing. R package version 1.13.2.
  • Elad et al., (2005) Elad, A., Keller, Y., and Kimmel, R. (2005). Texture mapping via spherical multi-dimensional scaling. In Kimmel, R., Sochen, N. A., and Weickert, J., editors, Scale Space and PDE Methods in Computer Vision, pages 433–455, Berlin. Springer.
  • Eltzner et al., (2018) Eltzner, B., Huckemann, S., and Mardia, K. V. (2018). Torus principal component analysis with applications to RNA structure. Ann. Appl. Stat., 12(2):1332–1359.
  • Fletcher et al., (2004) Fletcher, P. T., Lu, C., Pizer, S. M., and Joshi, S. (2004). Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Trans. Med. Imaging, 23(8):995–1005.
  • García-Portugués et al., (2020) García-Portugués, E., Paindaveine, D., and Verdebout, T. (2020). On optimal tests for rotational symmetry against new classes of hyperspherical distributions. J. Am. Stat. Assoc., 115(532):1873–1887.
  • García-Portugués and Verdebout, (2021) García-Portugués, E. and Verdebout, T. (2021). sphunif: Uniformity Tests on the Circle, Sphere, and Hypersphere. R package version 1.0.1.
  • Gower and Hand, (1996) Gower, J. C. and Hand, D. J. (1996). Biplots. Monographs on Statistics and Applied Probability. Chapman and Hall, London.
  • Haigh, (2007) Haigh, J. (2007). The Sun and the Earth’s climate. Living Rev. Sol. Phys., 4(2).
  • Huckemann et al., (2010) Huckemann, S., Hotz, T., and Munk, A. (2010). Intrinsic shape analysis: geodesic PCA for Riemannian manifolds modulo isometric Lie group actions. Stat. Sin., 20(1):1–58.
  • Huckemann and Ziezold, (2006) Huckemann, S. and Ziezold, H. (2006). Principal component analysis for Riemannian manifolds, with an application to triangular shape spaces. Adv. Appl. Probab., 38(2):299–319.
  • Huckemann and Eltzner, (2018) Huckemann, S. F. and Eltzner, B. (2018). Backward nested descriptors asymptotics with inference on stem cell differentiation. Ann. Stat., 46(5):1994–2019.
  • Jung et al., (2012) Jung, S., Dryden, I. L., and Marron, J. S. (2012). Analysis of principal nested spheres. Biometrika, 99(3):551–568.
  • Jung et al., (2011) Jung, S., Foskey, M., and Marron, J. S. (2011). Principal arc analysis on direct product manifolds. Ann. Appl. Stat., 5(1):578–603.
  • Jupp et al., (2003) Jupp, P. E., Kim, P. T., Koo, J.-Y., and Wiegert, P. (2003). The intrinsic distribution and selection bias of long-period cometary orbits. J. Am. Stat. Assoc., 98(463):515–521.
  • Kent and Mardia, (2009) Kent, J. T. and Mardia, K. V. (2009). Principal component analysis for the wrapped normal torus model. In Gusnanto, A., Mardia, K. V., and Fallaize, C. J., editors, LASR 2009 – Statistical Tools for Challenges in Bioinformatics, pages 39–41, Leeds. Department of Statistics, University of Leeds.
  • Lee and Bentler, (1980) Lee, S. Y. and Bentler, P. M. (1980). Functional relations in multidimensional scaling. British J. Math. Stat. Psych., 33(2):142–150.
  • Marron and Alonso, (2014) Marron, J. S. and Alonso, A. M. (2014). Overview of object oriented data analysis. Biom. J., 56(5):732–753.
  • Morgan, (1974) Morgan, C. L. (1974). Embedding metric spaces in Euclidean space. J. Geom., 5(1):101–107.
  • Mu et al., (2005) Mu, Y., Nguyen, P. H., and Stock, G. (2005). Energy landscape of a small peptide revealed by dihedral angle principal component analysis. Proteins, 58(1):45–52.
  • Nodehi et al., (2015) Nodehi, A., Golalizadeh, M., and Heydari, A. (2015). Dihedral angles principal geodesic analysis using nonlinear statistics. J. Appl. Stat., 42(9):1962–1972.
  • Nodehi et al., (2020) Nodehi, A., Golalizadeh, M., Maadooliat, M., and Agostinelli, C. (2020). Torus probabilistic principal component analysis. arXiv:2008.10725.
  • Papazoglou and Mylonas, (2017) Papazoglou, S. and Mylonas, K. (2017). An examination of alternative multidimensional scaling techniques. Educ. Psychol. Meas., 77:429–448.
  • Pekalska and Duin, (2005) Pekalska, E. and Duin, R. P. W. (2005). The Dissimilarity Representation for Pattern Recognition: Foundations and Applications. World Scientific Publishing, Singapore.
  • Pelt et al., (2010) Pelt, J., Korpi, M. J., and Tuominen, I. (2010). Solar active regions: A nonparametric statistical analysis. Astron. Astrophys., 513:A48.
  • Pennec, (2018) Pennec, X. (2018). Barycentric subspace analysis on manifolds. Ann. Stat., 46(6A):2711–2746.
  • Pewsey and García-Portugués, (2021) Pewsey, A. and García-Portugués, E. (2021). Recent advances in directional statistics. Test, 30(1):1–58.
  • Pizer et al., (2020) Pizer, S. M., Hong, J., Vicory, J., Liu, Z., Marron, J. S., Choi, H.-Y., Damon, J., Jung, S., Paniagua, B., Schulz, J., Sharma, A., Tu, L., and Wang, J. (2020). Object shape representation via skeletal models (s-reps) and statistical analysis. In Pennec, X., Sommer, S., and Fletcher, T., editors, Riemannian Geometric Statistics in Medical Image Analysis, pages 233–271. Academic Press, London.
  • Pizer and Marron, (2017) Pizer, S. M. and Marron, J. (2017). Object statistics on curved manifolds. In Zheng, G., Li, S., and Székely, G., editors, Statistical Shape and Deformation Analysis, pages 137–164. Academic Press, Cambridge, Massachusetts.
  • Ramachandran et al., (1963) Ramachandran, G. N., Ramakrishnan, C., and Sasisekharan, V. (1963). Stereochemistry of polypeptide chain configurations. J. Mol. Biol., 7:95–99.
  • Riccardi et al., (2009) Riccardi, L., Nguyen, P. H., and Stock, G. (2009). Free-energy landscape of RNA hairpins constructed via dihedral angle principal component analysis. J. Phys. Chem. B, 113(52):16660–16668.
  • Saeed et al., (2018) Saeed, N., Nam, H., Haq, M. I. U., and Muhammad Saqib, D. B. (2018). A survey on multidimensional scaling. ACM Comput. Surv., 51(3).
  • Sargsyan et al., (2012) Sargsyan, K., Wright, J., and Lim, C. (2012). GeoPCA: a new tool for multivariate analysis of dihedral angles based on principal component geodesics. Nucleic Acids Res., 40(3):e25–e25.
  • Schoenberg, (1937) Schoenberg, I. J. (1937). On certain metric spaces arising from Euclidean spaces by a change of metric and their imbedding in Hilbert space. Ann. Math., 38(4):787–793.
  • Sittel et al., (2017) Sittel, F., Filk, T., and Stock, G. (2017). Principal component analysis on a torus: theory and application to protein dynamics. J. Chem. Phys., 147(24):244101.
  • Wadley et al., (2007) Wadley, L. M., Keating, K. S., Duarte, C. M., and Pyle, A. M. (2007). Evaluating and learning from RNA pseudotorsional space: Quantitative validation of a reduced representation for RNA structure. J. Mol. Biol., 372(4):942–957.
  • Wand and Jones, (1995) Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing, volume 60 of Monographs on Statistics and Applied Probability. Chapman & Hall, London.