
Topological time-series analysis with delay-variant embedding

Quoc Hoan Tran zoro@biom.t.u-tokyo.ac.jp Department of Information and Communication Engineering, Graduate School of Information Science and Technology,
The University of Tokyo, Tokyo 113-8656, Japan
   Yoshihiko Hasegawa hasegawa@biom.t.u-tokyo.ac.jp Department of Information and Communication Engineering, Graduate School of Information Science and Technology,
The University of Tokyo, Tokyo 113-8656, Japan
Abstract

Identifying qualitative changes in time-series data provides insights into the dynamics associated with such data. Such qualitative changes can be detected through topological approaches, which first embed the data into a high-dimensional space using a time-delay parameter and subsequently extract topological features describing the shape of the data from the embedded points. However, the topological features extracted using a single time delay can be insufficient for evaluating the aforementioned qualitative changes, even when a well-selected time delay is used. We therefore propose a delay-variant embedding method that constructs extended topological features by considering the time delay as a variable parameter instead of a single fixed value. This delay-variant embedding method reveals multiple-time-scale patterns in a time series by allowing the observation of variations in the topological features, with the time delay serving as an additional dimension in the topological feature space. We theoretically prove that the constructed topological features are robust when the time series is perturbed by noise. Furthermore, we combine these features with the kernel technique in machine learning algorithms to classify general time-series data. We demonstrate the effectiveness of our method in classifying synthetic noisy biological data and real time-series data. Our method outperforms a method based on a single time delay and, surprisingly, achieves, on average, the highest classification accuracy among standard time-series analysis techniques.


I Introduction

Time-series data can undergo qualitative changes such as transitioning from a quiescent state to oscillatory dynamics through a bifurcation. Identifying such changes enables a deep understanding of the underlying dynamics; however, this is challenging when the data are subject to noise. In materials science, topological features, which indicate the "shape" of the data, can be used to detect qualitative changes, i.e., phase transitions Donato et al. (2016); Kusano et al. (2016) or transitions in morphological and hierarchical structures Ardanza-Trevijano et al. (2014); Nakamura et al. (2015); Hiraoka et al. (2016); Ichinomiya et al. (2017). Because topology is a qualitative property that is stable under the influence of noise, the topological features of time-series data are expected to reflect qualitative changes in the dynamics. These features are constructed through delay embedding, in which a time series x(t) is mapped to m-dimensional points using the delay coordinates [x(t), x(t−τ), ..., x(t−(m−1)τ)] in the embedded space, where τ denotes the predefined time delay and m denotes the embedding dimension. The embedded points form geometric features, such as clusters and loops, and the topological features, which monitor the emergence and disappearance of these geometric features, can be used to characterize the dynamics of the system Maletić et al. (2016); Mittal and Gupta (2017).
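As an illustration of the delay coordinates described above, the following is a minimal Python sketch of delay embedding; the function name delay_embed and the NumPy-based implementation are illustrative choices and not the code released with this paper.

import numpy as np

def delay_embed(x, m, tau):
    """Return the delay-coordinate point cloud of a 1D time series x.
    Each row is [x(t), x(t + tau), ..., x(t + (m - 1) tau)], which gives the
    same set of embedded points as the backward convention used in the text."""
    x = np.asarray(x, dtype=float)
    n_points = len(x) - (m - 1) * tau
    if n_points <= 0:
        raise ValueError("time series too short for this (m, tau)")
    return np.column_stack([x[i * tau: i * tau + n_points] for i in range(m)])

# Example: embed a noisy sine wave with m = 3 and tau = 5.
t = np.arange(500)
x = np.sin(2 * np.pi * t / 50) + 0.1 * np.random.randn(len(t))
points = delay_embed(x, m=3, tau=5)  # shape (490, 3)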

Theoretically, for a noise-free time series of unlimited length, embeddings with the same value of m but different values of the time delay τ are equivalent in terms of "optimal" reconstruction Takens (1981). Here, optimal reconstruction refers to a reconstruction that preserves the invariants of the dynamics, such as the fractal dimension and the Lyapunov exponent, when the time series is embedded. However, because a real time series is noisy and of finite length, the selection of τ is an inherently difficult problem. Rather than being based on mathematically rigorous criteria, the majority of methods for selecting τ are based on heuristics, i.e., they involve a tradeoff between redundancy and irrelevance in successive delay coordinates Casdagli et al. (1991). Traditional methods determine τ as the first time scale that maximizes indices of independence, such as linear independence, nonlinear independent information Fraser and Swinney (1986), and the correlation sum Liebert and Schuster (1989). Meanwhile, geometry-based strategies for estimating τ examine measures of the expansion of the attractor in the reconstructed phase space Buzug and Pfister (1992a, b); Rosenstein et al. (1994). However, the optimal value of τ exhibits no definite theoretical properties because τ has no theoretical relevance in the mathematical framework of delay embedding Kantz and Schreiber (2003). Furthermore, there is no universal strategy for selecting τ, and a value of τ that works well for one application may not work well for another even when the same data are used Bradley and Kantz (2015). Therefore, the optimal selection of τ depends on the objective of the analysis and is obtained by trial and error without any systematic method. This defect limits the power of topological features in time-series data analysis.

Here, we propose a delay-variant embedding method in which topological features are constructed by treating τ as a variable parameter; with this method, topological changes are monitored in the embedded space as τ varies. Embedding with a single value of τ is sensitive to noise, whereas considering a range of τ can provide useful information with which the dynamics can be understood. We theoretically prove the stability of the constructed features against noise and apply these features to the classification of several time-series datasets. We demonstrate that our method outperforms a method based on a single value of τ in classifying the oscillatory activity of synthetic noisy biological data. Surprisingly, in classifying real time-series data, our approach achieves a higher average accuracy than several standard time-series analysis techniques.

II Method

II.1 Topological features from time-series data with delay-variant embedding

To qualitatively evaluate the characteristics of the time series, we apply topological data analysis Carlsson (2009), which is a computational method for characterizing the topological features of high-dimensional data. We construct a simplicial-complex model Edelsbrunner and Harer (2010) from the points in the embedded space and obtain topological information such as the number, position, and size of single- or multi-dimensional clusters and loops. We build a complex over a set of points by connecting pairs of points whose pairwise distance is less than or equal to 2ε, where ε is a given non-negative scale parameter. Different values of ε result in different complexes and different topological information. If ε is too small, no connections are created, and the resulting simplicial complex is no different from the original set of points. As we gradually increase ε, connections appear between the points; however, if ε becomes too large, all the points are connected with each other, and no useful information can be conveyed. If we increase ε as ε_1 < ... < ε_n, we obtain a nested sequence of simplicial complexes, which is referred to as a filtration (see Appendix A).
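To make the role of the scale parameter ε concrete, the short Python sketch below (an illustration of ours, not code from the paper) counts the connected components of the complex built on a small point cloud by connecting points whose pairwise distance is at most 2ε; as ε grows, the components merge from one per point into a single component.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
# Two well-separated noisy clusters in the plane.
points = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
                    rng.normal(3.0, 0.1, size=(20, 2))])
dist = squareform(pdist(points))  # pairwise Euclidean distances

for eps in [0.05, 0.2, 1.0, 2.0]:
    adjacency = csr_matrix(dist <= 2 * eps)  # connect points within distance 2*eps
    n_components, _ = connected_components(adjacency, directed=False)
    print(f"eps = {eps:4.2f}: {n_components} connected component(s)")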

We use persistent homology theory Edelsbrunner et al. (2002); Zomorodian and Carlsson (2005) to study the topological features across the filtration. A practical way to visualize the results of persistent homology is through the multiset of points in a two-dimensional persistence diagram. In this diagram, each point (b, d) represents an l-dimensional hole (connected components are zero-dimensional holes, loops and tunnels are one-dimensional holes, and voids are two-dimensional holes) that appears at ε = b (the birth scale) and disappears at ε = d (the death scale) across the filtration (see Appendix A). This information encapsulates the topological features of the time series after embedding and provides valuable insights into the behavior of a dynamical system; for instance, the emergence of an oscillation in a time series manifests as the birth of a loop in the embedded space. Using these features, the qualitative properties of the time series can be captured in a robust and efficient manner.

For the observed time series x(t), we use F_{l,τ}(x(t)) to denote the two-dimensional persistence diagram calculated for l-dimensional holes from the points embedded with time delay τ. We consider τ in a predefined set 𝒯 = {τ_1, τ_2, ..., τ_K}, where τ_1 < τ_2 < ... < τ_K (K denotes the size of the set). The three-dimensional persistence diagram for the time series x(t) can be defined as

\text{PD}^{(3)}_{l}(x(t)) = \left\{(b,d,\tau) \mid (b,d) \in F_{l,\tau}(x(t)),\ \tau \in \mathcal{T}\right\}. \quad (1)

Figure 1 shows a schematic of the three-dimensional persistence diagram PD^(3)_1(x(t)) for loops and tunnels (one-dimensional holes), wherein the embedded points exhibit different shapes and topological features for different values of τ. In the middle panel of Fig. 1, there are two large loops for τ_1, whereas for τ_2 and τ_3 the loops persist only over short scale ranges, with different distributions of birth and death scales (right panel). This information cannot be obtained using a single value of τ in the embedding.
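A minimal sketch of how the three-dimensional persistence diagram in Eq. (1) could be assembled is given below. It uses a compact version of the delay-embedding sketch from the Introduction and assumes the third-party ripser package for the per-τ Vietoris–Rips persistence computation; note that ripser reports birth and death values on a distance (diameter) scale, which may differ from the radius convention above by a factor of 2.

import numpy as np
from ripser import ripser  # third-party package; an assumption, not the code released with this paper

def delay_embed(x, m, tau):
    # Compact version of the delay-embedding sketch given earlier.
    n = len(x) - (m - 1) * tau
    return np.column_stack([np.asarray(x, float)[i * tau: i * tau + n] for i in range(m)])

def three_dim_diagram(x, m, taus, hole_dim=1):
    """Stack (birth, death, tau) triples over all delays in `taus`
    for holes of dimension `hole_dim`, as in Eq. (1)."""
    triples = []
    for tau in taus:
        dgm = ripser(delay_embed(x, m, tau), maxdim=hole_dim)['dgms'][hole_dim]
        for birth, death in dgm:
            if np.isfinite(death):  # drop an infinite bar, if present
                triples.append((birth, death, tau))
    return np.array(triples)

# Example: loops (one-dimensional holes) with m = 3 and tau = 1, ..., 10.
t = np.arange(500)
x = np.sin(2 * np.pi * t / 50) + 0.1 * np.random.randn(len(t))
pd3 = three_dim_diagram(x, m=3, taus=range(1, 11), hole_dim=1)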

Figure 1: A time series x(t) is embedded in an m-dimensional space (in this illustration, m = 3) through delay embedding (left panel). The embedded points at different values of the time delay τ have different geometric features, such as clusters and loops (middle panel). We extract topological features such as the emergence (birth scale) and disappearance (death scale) of these geometric features. The birth and death scales at each τ are represented as points in a two-dimensional persistence diagram. By observing how the topological features vary with τ, which serves as an additional dimension, we obtain the three-dimensional persistence diagram (right panel), which can be regarded as a characteristic feature of the time series.

II.2 Stability of topological features

Time-series data tend to be considerably noisy, whether as a measurement artifact or as an inherent part of the dynamics. Therefore, the persistence diagram should be stable when the data are perturbed by noise. To evaluate the stability of the persistence diagram, we introduce the bottleneck distance as a metric for comparing persistence diagrams. Given two three-dimensional persistence diagrams Dg_1^(3) and Dg_2^(3), consider all matchings ψ such that a point on one diagram is matched either to a point on the other diagram or to its projection on the diagonal plane 𝒲^(3) = {(b, b, τ) | b, τ ∈ ℝ} (Fig. 2(a)). For each pair (p, q) ∈ ψ with p = (b_1, d_1, τ_1) and q = (b_2, d_2, τ_2), we define the relative infinity-norm distance between p and q as d^(∞)_ξ(p, q) = max(|b_1 − b_2|, |d_1 − d_2|, ξ|τ_1 − τ_2|), where ξ is a positive rescaling coefficient introduced to adjust the scale difference between the point-wise distance and time. The bottleneck distance d^(3)_{B,ξ}(Dg_1^(3), Dg_2^(3)) is defined as the infimum over all matchings ψ of the longest matched relative infinity-norm distance:

d^{(3)}_{B,\xi}(Dg_{1}^{(3)},Dg_{2}^{(3)}) = \inf_{\psi}\max_{(\bm{p},\bm{q})\in\psi} d^{(\infty)}_{\xi}(\bm{p},\bm{q}). \quad (2)

We show that three-dimensional persistence diagrams are stable with respect to the bottleneck distance under perturbations of the time series. Given two time series x(t) and y(t) of the same length, let Dg_(m)^(3)(x) and Dg_(m)^(3)(y) be their respective three-dimensional persistence diagrams calculated with embedding dimension m. Based on the stability properties of two-dimensional persistence diagrams Chazal et al. (2014), we can prove the following stability property of three-dimensional persistence diagrams (see Appendix B):

d^{(3)}_{B,\xi}(Dg_{(m)}^{(3)}(x),Dg_{(m)}^{(3)}(y)) \leq 2\sqrt{m}\,\max_{t}|x(t)-y(t)| \quad (3)

for an arbitrary positive ξ. If we regard y(t) as the perturbed data obtained by adding noise to x(t), Eq. (3) shows that the bottleneck distance between Dg_(m)^(3)(x) and Dg_(m)^(3)(y) is bounded above by a quantity governed by the magnitude of the noise. Thus, the inequality in Eq. (3) states that our three-dimensional persistence diagrams are robust when the time-series data are perturbed by noise. Therefore, these diagrams can be used as discriminating features for characterizing time series.
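The following sketch numerically illustrates the bound in Eq. (3): for each τ it compares the two-dimensional bottleneck distance between the diagrams of a signal and its noisy copy against the right-hand side 2√m max_t |x(t) − y(t)|, which bounds each per-τ distance by Eqs. (6) and (15). The ripser and persim packages are assumptions on our part, and their diameter-scale convention may introduce a benign factor-of-2 difference.

import numpy as np
from persim import bottleneck  # third-party packages; assumptions, not the paper's own code
from ripser import ripser

def delay_embed(x, m, tau):
    n = len(x) - (m - 1) * tau
    return np.column_stack([np.asarray(x, float)[i * tau: i * tau + n] for i in range(m)])

rng = np.random.default_rng(1)
t = np.arange(300)
x = np.sin(2 * np.pi * t / 40)
y = x + 0.05 * rng.standard_normal(len(x))  # noisy perturbation of x

m = 3
bound = 2 * np.sqrt(m) * np.max(np.abs(x - y))  # right-hand side of Eq. (3)
for tau in range(1, 6):
    dgm_x = ripser(delay_embed(x, m, tau), maxdim=1)['dgms'][1]
    dgm_y = ripser(delay_embed(y, m, tau), maxdim=1)['dgms'][1]
    print(f"tau = {tau}: bottleneck = {bottleneck(dgm_x, dgm_y):.4f}, bound = {bound:.4f}")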

Figure 2: (a) Bottleneck distance between two diagrams Dg_1^(3) (green points) and Dg_2^(3) (red points). The τ axis is rescaled by the positive coefficient ξ to adjust the scale difference between the point-wise distance and time. A green (red) point is matched either to a red (green) point or to its projection on the diagonal plane 𝒲^(3). The distance between two matched points is the largest of their coordinate-wise differences. The bottleneck distance is the longest distance between matched points, minimized over all matchings. (b) Kernel between Dg_1^(3) (green points) and Dg_2^(3) (dark red points). The faint red points are symmetric to the dark red points with respect to 𝒲^(3). Points near 𝒲^(3), i.e., p_1 and q_2, have little influence, whereas pairs of points far from 𝒲^(3), i.e., (p_2, q_1), (p_2, q_3), (p_3, q_1), and (p_3, q_3), make essential contributions to the kernel.

II.3 Kernel method for the topological features

To use persistence diagrams as features in statistical learning tasks such as classification, we must define a similarity measure, such as a kernel, on such diagrams Reininghaus et al. (2015); Kusano et al. (2016); Carrière et al. (2017). Because points close to 𝒲^(3) correspond to insignificant topological features, they should not influence the value of this kernel. Given a positive bandwidth σ and a positive rescaling parameter ξ, the kernel k_{σ,ξ} between two three-dimensional persistence diagrams Dg_1^(3) and Dg_2^(3) is defined as

k_{\sigma,\xi}(Dg^{(3)}_{1},Dg^{(3)}_{2}) = \frac{1}{\sigma\sqrt{2\pi}} \sum_{\bm{p}\in Dg^{(3)}_{1},\,\bm{q}\in Dg^{(3)}_{2}} \left(e^{-\frac{d_{\xi}^{2}(\bm{p},\bm{q})}{2\sigma^{2}}} - e^{-\frac{d_{\xi}^{2}(\bm{p},\bm{\bar{q}})}{2\sigma^{2}}}\right), \quad (4)

where q̄ is the point symmetric to q with respect to 𝒲^(3), d²_ξ(p, q) = |b_1 − b_2|² + |d_1 − d_2|² + ξ²|τ_1 − τ_2|², and d²_ξ(p, q̄) = |b_1 − d_2|² + |d_1 − b_2|² + ξ²|τ_1 − τ_2|², with p = (b_1, d_1, τ_1) and q = (b_2, d_2, τ_2). If p or q is near 𝒲^(3), i.e., b_1 ≈ d_1 or b_2 ≈ d_2, then d²_ξ(p, q) ≈ d²_ξ(p, q̄) and the pair (p, q) has only a minor influence on the value of the kernel (Fig. 2(b)). Based on Ref. Reininghaus et al. (2015), we can prove that k_{σ,ξ} is a positive-definite kernel (see Appendix C). When we apply k_{σ,ξ} to time-series classification, the parameters σ and ξ affect the classification performance and can be chosen either by cross-validation or in a heuristic manner Gretton et al. (2008) (see Appendix C). In our classification tasks, we use the normalized version of the kernel, calculated as k_{σ,ξ}(E, F) ← k_{σ,ξ}(E, F)/√(k_{σ,ξ}(E, E) k_{σ,ξ}(F, F)), where E and F denote two persistence diagrams. The source code used to calculate the persistence diagrams and kernels is available on GitHub Tran and Hasegawa.
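A minimal NumPy sketch of the kernel in Eq. (4), with three-dimensional persistence diagrams stored as arrays whose rows are (b, d, τ) triples, is given below; the function names are illustrative, and the normalization follows the formula quoted above.

import numpy as np

def kernel_3d(dg1, dg2, sigma, xi):
    """Kernel k_{sigma,xi} of Eq. (4); dg1 and dg2 are arrays of shape (n, 3)
    whose rows are (birth, death, tau) triples."""
    b1, d1, t1 = dg1[:, 0][:, None], dg1[:, 1][:, None], dg1[:, 2][:, None]
    b2, d2, t2 = dg2[:, 0][None, :], dg2[:, 1][None, :], dg2[:, 2][None, :]
    dtau2 = xi ** 2 * (t1 - t2) ** 2
    d_pq = (b1 - b2) ** 2 + (d1 - d2) ** 2 + dtau2      # d_xi^2(p, q)
    d_pqbar = (b1 - d2) ** 2 + (d1 - b2) ** 2 + dtau2   # d_xi^2(p, q-bar)
    return np.sum(np.exp(-d_pq / (2 * sigma ** 2))
                  - np.exp(-d_pqbar / (2 * sigma ** 2))) / (sigma * np.sqrt(2 * np.pi))

def normalized_kernel(dg1, dg2, sigma, xi):
    """Normalized kernel k(E, F) / sqrt(k(E, E) k(F, F))."""
    k12 = kernel_3d(dg1, dg2, sigma, xi)
    return k12 / np.sqrt(kernel_3d(dg1, dg1, sigma, xi) * kernel_3d(dg2, dg2, sigma, xi))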

III Results

III.1 Classification of the synthetic oscillatory and non-oscillatory data

We first apply the proposed method to classify periodic and aperiodic time series using synthetic single-cell data. This task is challenging because of the difficulty of discriminating between a noisy oscillation and a mere noisy fluctuation Phillips et al. (2017). We generate synthetic mRNA and protein time-series data from a stochastic model of the Hes1 genetic oscillator Monk (2003); Galla (2009), which exhibits negative autoregulation with delay. We use the delayed version of the Gillespie algorithm Gillespie (1977); Anderson (2007) to generate data from 1,000 cells in both the oscillatory and non-oscillatory parameter regimes Galla (2009); Brett and Galla (2013). We measure the protein levels every ν (= 64, 32, 16, 8) min for 4,096 min. We normalize each time series to zero mean and unit variance; further, we add Gaussian white noise with variance σ²_n = 0.1 to assess the robustness of the method.

Figure 3 depicts examples of the time series and persistence diagrams for the two regimes with an embedding dimension of m = 3. Figures 3(a) and (d) show time series generated by measurements every ν = 16 min in the non-oscillatory and oscillatory regimes, respectively, and Figs. 3(b) and (e) show their respective two-dimensional persistence diagrams computed from single-delay embedding. In Figs. 3(b) and (e), τ is selected by the mutual-information method, which maximizes the statistical measure of nonlinear independence between the delay coordinates of the embedded points Fraser and Swinney (1986). Figures 3(c) and (f) depict the three-dimensional persistence diagrams obtained from Figs. 3(a) and (d), respectively, with K = 10 time-delay values τ_1 = 1, τ_2 = 2, ..., τ_10 = 10. In these examples, it is difficult to distinguish between the two regimes using either the original time series or the two-dimensional diagrams, whereas there is a distinct pattern in the three-dimensional diagrams because all the topological variations are considered as τ changes. In the three-dimensional diagram of the non-oscillatory data, the points, which represent loops in the embedded space, are widely distributed along the birth and death scales, corresponding to random fluctuations in the time series. In contrast, in the oscillatory data, the points are densely distributed along the birth and death scales; this reflects the appearance of repeated loop patterns in the embedded space, indicating periodic patterns in the time series.

Figure 3: (a)(d) Examples of two time series generated from the (a) non-oscillatory and (d) oscillatory regimes of the Hes1 model, measured every 16 min. (b)(e) Two-dimensional persistence diagrams of the loop patterns calculated with embedding dimension m = 3 and a single value of τ selected according to Fraser and Swinney (1986) for the (b) non-oscillatory (τ = 4) and (e) oscillatory (τ = 2) time series. (c)(f) Three-dimensional persistence diagrams of loop patterns calculated with m = 3 and 10 values of τ given by τ_1 = 1, τ_2 = 2, ..., τ_10 = 10 for the (c) non-oscillatory and (f) oscillatory time series.

We next assess the effectiveness of the proposed topological features by classifying the data obtained from the Hes1 model, which are randomly split with equal probability into training and test sets. We use a support vector machine Bishop (2006) to perform classification in the kernel space. In the delay-variant method, we consider K = 10 time-delay values τ_1 = 1, τ_2 = 2, ..., τ_10 = 10. In the single-delay method, we choose a single value of τ from {τ_1, ..., τ_10}. For each embedding dimension of the single-delay method, we use the maximum accuracy on the test dataset over all values of τ for comparison with the delay-variant method. Figure 4 depicts the average classification accuracy over 100 random splits at different embedding dimensions m and different measurement intervals ν = 8 (Fig. 4(a)), ν = 16 (Fig. 4(c)), ν = 32 (Fig. 4(e)), and ν = 64 (Fig. 4(g)). For the same embedding dimension m, the total dimension of the delay-variant method (mK) is higher than that of the single-delay method (m). In classification tasks, high-dimensional features tend to achieve high classification accuracies. Therefore, to ensure a fair comparison between the single-delay and delay-variant methods, we define the effective embedding dimension as M = mK and further compare the accuracy for different values of M with ν = 8 (Fig. 4(b)), ν = 16 (Fig. 4(d)), ν = 32 (Fig. 4(f)), and ν = 64 (Fig. 4(h)). For ν = 64 (Fig. 4(h)), we do not consider embeddings with M ≥ 60 because the length of the time series is 64.
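Classification in the kernel space can be carried out, for example, with a support vector machine that accepts a precomputed Gram matrix. The sketch below assumes scikit-learn and the normalized_kernel sketch from Sec. II.3; the variables diagrams_train, diagrams_test, y_train, and y_test are illustrative placeholders for the three-dimensional persistence diagrams and labels of the two splits.

import numpy as np
from sklearn.svm import SVC

def gram_matrix(diagrams_a, diagrams_b, sigma, xi):
    """Matrix of normalized kernel values between two lists of 3D persistence diagrams."""
    return np.array([[normalized_kernel(da, db, sigma, xi) for db in diagrams_b]
                     for da in diagrams_a])

# diagrams_train, diagrams_test: lists of (n_i, 3) arrays of (birth, death, tau) triples
# y_train, y_test: class labels (e.g., oscillatory vs. non-oscillatory)
K_train = gram_matrix(diagrams_train, diagrams_train, sigma=0.5, xi=0.5)
K_test = gram_matrix(diagrams_test, diagrams_train, sigma=0.5, xi=0.5)

clf = SVC(kernel='precomputed')  # SVM operating directly on the Gram matrix
clf.fit(K_train, y_train)
accuracy = clf.score(K_test, y_test)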

When the comparison is performed over the embedding dimension m, Fig. 4 shows that both methods achieve almost the same accuracy when the time series is long enough (ν = 8, 16) and m ≥ 4. In the single-delay method, τ should ideally be selected by cross-validation for a fair comparison; however, even when τ is chosen by trial and error to maximize the test-set accuracy of the single-delay method, our delay-variant method still performs better. When ν is increased, the time series becomes shorter, which increases the difficulty of the classification; nevertheless, the delay-variant method attains a higher accuracy than the single-delay method. For ν = 32, 64, the delay-variant method outperforms the single-delay method over a wide range of m (4–10) in terms of accuracy. This observation indicates that the single-delay method degrades quickly for shorter time series because it is highly sensitive to the choice of τ. By considering a range of τ in the delay embedding, this sensitivity is reduced, and stable performance is achieved.

When the comparison is performed over the effective embedding dimension M, the accuracy of the delay-variant method does not change over a wide range of M (40–100), indicating that this method is more reliable than the single-delay method (Fig. 4(b), (d), (f), and (h)). For a given effective dimension, the embedded points of the single-delay method become highly folded and excessively sparse in the high-dimensional space because the embedding dimension used in the single-delay method is considerably higher than that used in the delay-variant method. Furthermore, selecting a very high embedding dimension increases the computational complexity of the embedding and amplifies the impact of noise, which acts on a large proportion of the elements in the delay coordinates. Therefore, selecting a very high embedding dimension makes topological features less effective for characterizing the time series, thereby degrading the classification performance.

Figure 4: Average classification accuracies of the delay-variant (circles) and single-delay (squares) methods over 100 random splits of the Hes1-model data at different (a)(c)(e)(g) embedding dimensions m and different (b)(d)(f)(h) effective embedding dimensions M. The time-series data were obtained at different measurement intervals (a)(b) ν = 8, (c)(d) ν = 16, (e)(f) ν = 32, and (g)(h) ν = 64. The insets in (a) (ν = 8) and (c) (ν = 16) highlight the accuracy over the embedding dimensions m = 6–10.

To demonstrate that the delay-variant method is more robust than the single-delay method when the time series is perturbed by noise, we compare the two methods for noise variances σ²_n of 0.1, 0.3, 0.5, and 0.7. Figure 5 depicts the average classification accuracy versus the embedding dimension m over 100 random splits at a measurement interval of ν = 16 min for noise variances σ²_n = 0.1 (Fig. 5(a)), σ²_n = 0.3 (Fig. 5(b)), σ²_n = 0.5 (Fig. 5(c)), and σ²_n = 0.7 (Fig. 5(d)). As σ²_n increases, the classification becomes more difficult because the time-series data become noisier. Even so, the delay-variant method still outperforms the single-delay method over the range of m. Furthermore, the single-delay method degrades more quickly as σ²_n increases, indicating that our delay-variant method is more robust and stable under noisy conditions.

Figure 5: Average classification accuracies of the delay-variant (circles) and single-delay (squares) methods over 100 random splits of the Hes1-model data at different embedding dimensions m. The time-series data were obtained by adding Gaussian white noise with variance (a) σ²_n = 0.1, (b) σ²_n = 0.3, (c) σ²_n = 0.5, and (d) σ²_n = 0.7 to the time series generated with ν = 16 min. The inset in (a) (σ²_n = 0.1) depicts the accuracy over the embedding dimensions m = 6–10.

III.2 Classification of the real time-series data

We further evaluate the performance of the delay-variant method in classifying different heartbeat-signal patterns based on six real electrocardiogram (ECG) datasets from Chen et al. (2015): ECG200 (separating normal and myocardial-infarction heartbeats; 200 time series of length 96), ECG5000 (separating five levels of congestive heart failure; 5,000 time series of length 140), ECGFiveDays (separating records from two different days for the same patient; 884 time series of length 136), TwoLeadECG (separating records from two different leads; 1,162 time series of length 82), and Non-InvasiveFetalECGThorax1 and Non-InvasiveFetalECGThorax2 (separating records from the left and right thorax with expert labeling of 42 classes of non-invasive fetal ECG; 3,765 time series of length 750 in each dataset). We also use the delay-variant method to classify Caenorhabditis elegans roundworms from the EigenWorms dataset as either wild-type or mutant based on their movements Brown et al. (2013); Yemini et al. (2013); the movement trajectories are processed as 259 time series of length 900. Finally, we show that our method can diagnose whether a certain symptom is present in an automotive subsystem using engine-noise time-series data from the FordB dataset, which contains 4,446 time series of length 500 Chen et al. (2015). Here, the training data in the FordB dataset were collected under typical operating conditions, whereas the test data were collected under noisy conditions. For these datasets, we employ the train-test split provided in Chen et al. (2015). For the Non-InvasiveFetalECGThorax1, Non-InvasiveFetalECGThorax2, EigenWorms, and FordB datasets, we downsample the time series with sampling rates of 2, 2, 3, and 2, respectively, before computing the persistence diagrams.

In the delay-variant method, we use K = 10 time-delay values τ_1 = 1, τ_2 = 2, ..., τ_10 = 10 with embedding dimensions m = 2, 3, ..., 10. In the single-delay method, we choose a single τ from {τ_1, ..., τ_10}. For each embedding dimension of the single-delay method, we use the maximum accuracy on the test dataset over all values of τ for comparison with the delay-variant method. In both methods, we use a linear combination of normalized kernels for the zero-dimensional and one-dimensional holes. The combination weights and the embedding dimension m are selected by cross-validation (see Appendix C). Further, we compare the delay-variant and single-delay methods with several alternative standard approaches. Most previous research on time-series classification has focused on finding appropriate similarity measures for the 1st-nearest-neighbor (NN) classifier. For similarity in the time domain, we consider either the Euclidean (E) or dynamic time warping (D) distance. For similarity in the frequency domain and in the autocorrelation, we consider the power spectrum (PS) and the autocorrelation function (AC), respectively. We also examine the elastic ensemble (EE) Lines and Bagnall (2014), which combines nearest-neighbor classifiers using multiple distance measures in the time domain. Finally, we include the learned-shapelets (LS) method, which classifies time series by learning representative shapelets (i.e., short discriminative time-series subsequences) Grabocka et al. (2014). Instead of implementing these standard algorithms, we use the results from Ref. Bagnall et al. (2016). The test results are presented in Table 1, wherein the best and second-best accuracy scores for each dataset are colored in dark pink and light pink, respectively. The delay-variant method shows better results than the single-delay method for all datasets and outperforms all the other algorithms on average. Further, the delay-variant method achieves the best results for four of the eight datasets and the second-best results for the remaining four, suggesting that our method is an effective mechanism for classifying time series.

Table 1: Classification accuracies (%) for the ECG200, ECG5000, Non-InvasiveFetalECGThorax1 (Thorax1), Non-InvasiveFetalECGThorax2 (Thorax2), ECGFiveDays (FiveDays), TwoLeadECG (TwoLead), EigenWorms (Worms), and FordB datasets. For each dataset, the best and second-best scores are colored in dark pink and light pink, respectively. The notations are: 1st-nearest-neighbor classifier (NN), Euclidean distance (E), dynamic time warping distance (D), autocorrelation function (AC), power spectrum (PS), elastic ensemble (EE), and learned-shapelets method (LS).
Data Delay variant Single delay NN (E) NN (D) NN (AC) NN (PS) EE LS
ECG200 90.0 87.0 88.0 88.0 82.0 86.0 88.0 88.0
ECG5000 93.6 92.1 92.5 92.5 91.0 93.6 93.9 93.2
Thorax1 91.8 78.2 82.9 82.9 72.1 87.5 84.6 25.9
Thorax2 93.0 83.6 88.0 87.0 75.2 88.4 91.4 77.0
FiveDays 99.9 92.0 79.7 79.7 98.1 100.0 82.0 100.0
TwoLead 99.4 94.0 74.7 86.8 80.4 96.1 97.1 99.6
Worms 83.1 83.1 61.0 58.4 76.6 81.8 68.8 72.7
FordB 90.8 78.2 60.6 59.9 78.0 79.0 66.2 91.7

III.3 Multiple-time-scale patterns captured by delay-variant embedding

We next investigate the types of features that can be effectively captured using the delay-variant method. Because the time scale of an extracted pattern depends partially on the time delay, the delay-variant method should be able to extract patterns that comprise multiple different time scales. To better understand this ability, we analyze synthetic noisy time series obtained from a frequency-modulated model. Given an original signal s_m(t) = A_m sin(2π f_m t) and a carrier signal s_c(t) = A_c sin(2π f_c t), we consider the modulated signal s(t) = A_c sin(2π f_c t + A_m sin(2π f_m t)), where t ∈ [0, 0.1], A_m = 10, A_c = 1.0, and f_m = 25. Further, we set the carrier frequency to f_c = 5f with f = 1–20 to represent patterns with multiple time scales in the modulated signal. For each value of f_c, we generate 20 noisy discrete time series z_l(n) = s(0.0002n) + w_l(n), where l = 1, 2, ..., 20, n = 0, 1, ..., 500, and w_l(n) is Gaussian white noise with a variance of 0.1.
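A minimal sketch of how the noisy frequency-modulated series described above could be generated (an illustrative implementation, not the paper's released code) is:

import numpy as np

A_m, A_c, f_m = 10.0, 1.0, 25.0
rng = np.random.default_rng(0)

def modulated_series(f_c, n_samples=501, dt=0.0002, noise_var=0.1):
    """Sampled s(t) = A_c sin(2 pi f_c t + A_m sin(2 pi f_m t)) plus Gaussian noise."""
    t = dt * np.arange(n_samples)
    s = A_c * np.sin(2 * np.pi * f_c * t + A_m * np.sin(2 * np.pi * f_m * t))
    return s + np.sqrt(noise_var) * rng.standard_normal(n_samples)

# 20 noisy realizations for each carrier frequency f_c = 5 f, f = 1, ..., 20.
series = {5 * f: [modulated_series(5 * f) for _ in range(20)] for f in range(1, 21)}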

Figure 6: (a) Time series generated from the frequency-modulated model. (b) Three-dimensional persistence diagram of the loop patterns calculated with m = 6 and 20 values of τ given by τ_1 = 1, τ_2 = 2, ..., τ_20 = 20. The multiple-time-scale components in the time series, corresponding to limit-cycle oscillations of the dynamics, are revealed by the points in the diagram over different values of τ.

In the delay-variant method, we use K = 20 time-delay values τ_1 = 1, τ_2 = 2, ..., τ_20 = 20. Figure 6 depicts a time series with f_c = 25 and its three-dimensional persistence diagram for loops (cycles) at an embedding dimension of m = 6. In the persistence diagram, the points near the diagonal plane 𝒲^(3) = {(b, b, τ) | b, τ ∈ ℝ} represent noise in the time series, whereas the points far from 𝒲^(3) represent significant cycles in the embedded space. Therefore, the points in the diagram plotted over different values of τ reveal the multiple-time-scale components in the time series, corresponding to the limit-cycle oscillations of the dynamics.

An example of the principal-component projection from the kernel space is depicted in Fig. 7 for (a) the delay-variant method with embedding dimension m = 6, (b) the single-delay method with m = 6 and τ = 1, 2, ..., 20, and (c) the single-delay method with m = 120 (to compare with the delay-variant method at the same effective embedding dimension M = 120). Because the length of the time series is 501, in the single-delay approach τ can only take values from {1, 2, 3, 4} when m = 120. Different colors represent the data generated using different values of f_c. Figure 7 shows that the delay-variant method produces more distinct regions corresponding to different values of f_c than the single-delay method. Note that the points obtained using the single-delay method cannot be separated, regardless of the viewing angle. These results indicate that the delay-variant method is superior to the single-delay method in its ability to identify changes in the multiple-time-scale patterns of a time series.
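The principal-component projection from the kernel space shown in Fig. 7 can be reproduced in spirit with kernel principal component analysis on a precomputed Gram matrix; the sketch below assumes scikit-learn and a normalized Gram matrix K built as in the classification experiments.

from sklearn.decomposition import KernelPCA

# K: (N, N) normalized Gram matrix between the three-dimensional persistence diagrams.
kpca = KernelPCA(n_components=3, kernel='precomputed')
projection = kpca.fit_transform(K)  # (N, 3) coordinates to plot, colored by f_c as in Fig. 7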

Figure 7: Projection of the features of the synthetic frequency-modulated time series from the kernel space onto principal components for (a) the delay-variant method with embedding dimension m = 6, (b) the single-delay method with m = 6 and τ = 1, 2, ..., 20, and (c) the single-delay method with m = 120 (for comparison with the delay-variant method at the same effective embedding dimension M = 120) and τ = 1, 2, 3, 4. Different colors represent the data generated using different values of the carrier frequency f_c.

IV Concluding remarks

We have demonstrated that topological features constructed using delay-variant embedding can capture the topological variation in a time series as the time-delay value changes. Therefore, these features can be used to discriminate between different time series. We have also mathematically proved that these features are robust when the time series is perturbed by noise. Our method outperformed standard time-series analysis techniques in classifying both synthetic and real time-series data. These results indicate that topological features derived from delay-variant embedding reveal representative features of the original time series.

In general, the mathematical model of a time series is a stochastic process. It is important to understand what distinguishes this model from other models with respect to the dynamics. Recent powerful deep learning methods have focused on accurately classifying time series with predefined labels without identifying the essential behavior of the underlying model. Unfortunately, these approaches can fail when the input is slightly but deliberately perturbed, resulting in misclassification by a neural network. Such perturbed examples are referred to as adversarial examples and have recently gained considerable attention in the discussion of the robustness of deep learning Szegedy et al. (2014). With machine learning tools such as deep learning, it is difficult to understand why adversarial examples are misclassified because of the black-box nature of the causality between the input data and the underlying model. From another viewpoint, topological data analysis provides a systematic methodology for understanding the true behavior of the data, which will eventually become significant in characterizing the behavior of the model.

We expect that our study will lead to a unified analysis mechanism for time-series data. This study opens several opportunities for applying topological tools to produce effective algorithms for analyzing time-series data. For instance, given a time series, we could use delay-variant embedding to predict its future values or to detect anomalous values or outliers therein.

Appendix A Topological features obtained from data

To extract topological features from a set P of points in the Euclidean space ℝ^L, we build an ε-scale Vietoris–Rips complex, denoted VR(P, ε), from a union of L-dimensional hyperspheres of radius ε centered at each point in P. Every collection of n + 1 affinely independent points in P forms an n-simplex in VR(P, ε) if the pairwise distances between the points are less than or equal to 2ε. The complex VR(P, ε) gives us the topological information of P associated with the radius ε. For example, in Fig. 8(d), there are two loops (one-dimensional holes) and one connected component (a zero-dimensional hole). However, this information depends on the choice of the radius ε. If ε is too small, the complex created by the union of hyperspheres (Fig. 8(b)) remains almost the same as the discrete points (Fig. 8(a)). If ε is too large, we obtain a trivially connected and overlapping complex without any holes inside it (Fig. 8(f)).

Figure 8: We consider L-dimensional hyperspheres with radius ε centered at each point. From (a) to (f), the radius ε is increased gradually. As ε increases, holes appear and disappear in the region. The first one-dimensional hole (blue loop) appears at (c), while the second (red loop) appears at (d). Then at (e), the first hole disappears, and finally the second one disappears at (f). (g) An exemplary two-dimensional persistence diagram calculated by considering the appearance and disappearance of loops in a filtration of the Vietoris–Rips complex. The radius ε takes discrete values from the set {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6}. There are two persistence pairs, (0.2, 0.4) and (0.3, 0.6), which represent the appearance and disappearance of the blue loop and the red loop, respectively. These persistence pairs are displayed as a blue circle and a red triangle in the Cartesian plane. The collection of all persistence pairs in the filtration is the two-dimensional persistence diagram.

The above problem can be solved by considering not a single radius but all choices of the radius ε. This yields a filtration, i.e., a sequence of simplicial complexes used to monitor the appearance of holes such as clusters and loops as ε changes. For example, in Fig. 8, we start from ε = 0 (Fig. 8(a)) and then increase ε gradually to see whether holes appear or disappear. When a hole appears at radius ε = b and disappears at radius ε = d, the hole is characterized by the pair (b, d), where b, d, and (b, d) are referred to as the birth scale, the death scale, and the persistence pair, respectively.

The persistence pairs, which we use as topological features, are displayed in the Cartesian plane as a two-dimensional persistence diagram in which the birth and death scales appear as the horizontal and vertical coordinates, respectively. In the two-dimensional persistence diagram, points far from the diagonal generally correspond to robust features, which persist over a long range of scales, whereas those near the diagonal are regarded as noise in the data. We provide an exemplary two-dimensional persistence diagram in Fig. 8(g), where we consider the appearance and disappearance of loops in a filtration of the Vietoris–Rips complex constructed from points when ε takes discrete values from the set {0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6}.

Appendix B Proof of the stability of the three-dimensional persistence diagrams

We prove the result in Eq. (3). First, we define the bottleneck distance d^(2)_B between two two-dimensional persistence diagrams Dg_1^(2) and Dg_2^(2) as

d^{(2)}_{B}(Dg_{1}^{(2)},Dg_{2}^{(2)}) = \inf_{\gamma}\max_{(\bm{p},\bm{q})\in\gamma}\|\bm{p}-\bm{q}\|_{\infty}, \quad (5)

where γ is a matching between Dg_1^(2) and Dg_2^(2) such that a point on one diagram is matched either to a point on the other or to its projection on the diagonal line 𝒲^(2) = {(b, b) | b ∈ ℝ}. Here, the norm ‖z‖_∞ is defined as ‖z‖_∞ = max{|z_1|, ..., |z_n|} for z = (z_1, ..., z_n) ∈ ℝ^n.

The bottleneck distance between the two-dimensional persistence diagrams satisfies the following property Chazal et al. (2014):

Proposition A1

Let X and Y be finite sets of points embedded in the Euclidean space ℝ^n. Denote their two-dimensional persistence diagrams as Dg^(2)(X) and Dg^(2)(Y), respectively. Then,

d^{(2)}_{B}(Dg^{(2)}(X),Dg^{(2)}(Y)) \leq 2 d_{H}(X,Y), \quad (6)

where d_H(X, Y) is the Hausdorff distance given by

d_{H}(X,Y) = \max\left\{\max_{\bm{x}\in X}\min_{\bm{y}\in Y} d(\bm{x},\bm{y}),\ \max_{\bm{y}\in Y}\min_{\bm{x}\in X} d(\bm{x},\bm{y})\right\}.

Here, d(x, y) is the Euclidean distance between x and y.

For each τ ∈ 𝒯 = {τ_1, τ_2, ..., τ_K}, let X_(m)^τ and Y_(m)^τ denote the embedded points of x(t) and y(t), respectively, in an embedding space with dimension m and time delay τ. Consider the two two-dimensional persistence diagrams Dg^(2)(X_(m)^τ) and Dg^(2)(Y_(m)^τ) calculated from X_(m)^τ and Y_(m)^τ, respectively. Let Γ_τ be the set of matchings defined in Eq. (5) between Dg^(2)(X_(m)^τ) and Dg^(2)(Y_(m)^τ). For each collection Λ = {γ_1, γ_2, ..., γ_K | γ_i ∈ Γ_{τ_i}, i = 1, 2, ..., K}, we construct a matching ψ between the two three-dimensional persistence diagrams Dg_(m)^(3)(x) and Dg_(m)^(3)(y) such that, for each (p, q) ∈ ψ, we have p = (b_1, d_1, τ), q = (b_2, d_2, τ), and (p_γ, q_γ) ∈ γ, where p_γ = (b_1, d_1), q_γ = (b_2, d_2), and γ ∈ Λ ∩ Γ_τ. Let Γ be the set of all matchings ψ constructed in this way. From the definition of the bottleneck distance, we have the following inequality:

d^{(3)}_{B,\xi}(Dg_{(m)}^{(3)}(x),Dg_{(m)}^{(3)}(y)) \leq \inf_{\psi\in\Gamma}\max_{(\bm{p},\bm{q})\in\psi} d^{(\infty)}_{\xi}(\bm{p},\bm{q}). \quad (7)

For (p, q) ∈ ψ, we have

d^{(\infty)}_{\xi}(\bm{p},\bm{q}) = \max\{|b_{1}-b_{2}|,|d_{1}-d_{2}|,\xi|\tau-\tau|\} \quad (8)
= \max\{|b_{1}-b_{2}|,|d_{1}-d_{2}|\} \quad (9)
= \|\bm{p}_{\gamma}-\bm{q}_{\gamma}\|_{\infty}, \quad (10)

and Eq. (7) becomes

d^{(3)}_{B,\xi}(Dg_{(m)}^{(3)}(x),Dg_{(m)}^{(3)}(y)) \leq \max_{\tau\in\mathcal{T}}\inf_{\gamma\in\Gamma_{\tau}}\max_{(\bm{p}_{\gamma},\bm{q}_{\gamma})\in\gamma}\|\bm{p}_{\gamma}-\bm{q}_{\gamma}\|_{\infty} \quad (11)
= \max_{\tau\in\mathcal{T}} d^{(2)}_{B}(Dg^{(2)}(X_{(m)}^{\tau}),Dg^{(2)}(Y_{(m)}^{\tau})). \quad (12)

From Eq. (12) and Proposition A1, we have

d^{(3)}_{B,\xi}(Dg_{(m)}^{(3)}(x),Dg_{(m)}^{(3)}(y)) \leq 2\max_{\tau\in\mathcal{T}} d_{H}(X_{(m)}^{\tau},Y_{(m)}^{\tau}). \quad (13)

At each time point t_0, we consider 𝒳_(m)^τ(t_0) = (x(t_0), x(t_0 − τ), ..., x(t_0 − (m−1)τ)) ∈ X_(m)^τ and 𝒴_(m)^τ(t_0) = (y(t_0), y(t_0 − τ), ..., y(t_0 − (m−1)τ)) ∈ Y_(m)^τ. From the definition of the Euclidean distance, we have

d(\mathcal{X}_{(m)}^{\tau}(t_{0}),\mathcal{Y}_{(m)}^{\tau}(t_{0})) = \sqrt{\sum_{i=0}^{m-1}\left[x(t_{0}-i\tau)-y(t_{0}-i\tau)\right]^{2}} \leq \sqrt{m}\,\max_{t}|x(t)-y(t)|. \quad (14)

From Eq. (14) and the definition of Hausdorff distance, we have the following inequality:

d_{H}(X_{(m)}^{\tau},Y_{(m)}^{\tau}) = \max\left\{\max_{t_{1}}\min_{t_{2}} d(\mathcal{X}_{(m)}^{\tau}(t_{1}),\mathcal{Y}_{(m)}^{\tau}(t_{2})),\ \max_{t_{2}}\min_{t_{1}} d(\mathcal{X}_{(m)}^{\tau}(t_{1}),\mathcal{Y}_{(m)}^{\tau}(t_{2}))\right\}
\leq \max\left\{\max_{t_{1}} d(\mathcal{X}_{(m)}^{\tau}(t_{1}),\mathcal{Y}_{(m)}^{\tau}(t_{1})),\ \max_{t_{2}} d(\mathcal{X}_{(m)}^{\tau}(t_{2}),\mathcal{Y}_{(m)}^{\tau}(t_{2}))\right\} \leq \sqrt{m}\,\max_{t}|x(t)-y(t)|. \quad (15)

Equation (3) is obtained by taking the maximum of Eq. (15) over τ ∈ 𝒯 and applying it to Eq. (13).

Appendix C Kernel of the three-dimensional persistence diagrams

C.1 Proof of the positive-definite property

We prove that the proposed kernel for three-dimensional persistence diagrams is positive-definite. We give the proof for the case where the positive rescaling coefficient ξ = 1; the proof is straightforward for other positive values of ξ.

For each parameter σ and a three-dimensional persistence diagram Dg^(3), we define the following feature mapping Φ_σ : 𝒟^(3) → L²(Ω), where 𝒟^(3) is the space of three-dimensional persistence diagrams, Ω = {x = (x_1, x_2, x_3) | x_1, x_2, x_3 ∈ ℝ, x_1 ≤ x_2}, and L²(Ω) is the Hilbert space of square-integrable functions defined on the domain Ω:

\Phi_{\sigma}(Dg^{(3)})(\bm{x}) = \frac{1}{\sqrt{\kappa}}\sum_{\bm{p}\in Dg^{(3)}}\left[\exp\left(-\frac{\|\bm{x}-\bm{p}\|^{2}}{\sigma^{2}}\right)-\exp\left(-\frac{\|\bm{x}-\bm{\bar{p}}\|^{2}}{\sigma^{2}}\right)\right], \quad (16)

where p̄ is the point symmetric to p with respect to the diagonal plane 𝒲^(3) in ℝ³ and κ is a positive constant depending on σ, whose value is determined below.

We show that the kernel defined in Eq. (4) is the inner product of Φ_σ on L²(Ω):

k_{\sigma,\xi}(Dg^{(3)}_{1},Dg^{(3)}_{2}) = \langle\Phi_{\sigma}(Dg^{(3)}_{1}),\Phi_{\sigma}(Dg^{(3)}_{2})\rangle_{L^{2}(\Omega)} \quad (17)
= \int_{\Omega}\Phi_{\sigma}(Dg^{(3)}_{1})(\bm{x})\,\Phi_{\sigma}(Dg^{(3)}_{2})(\bm{x})\,d\bm{x}. \quad (18)

We extend the domain of the function Φ_σ(Dg^(3))(x) from Ω to ℝ³. Because ‖x − p‖ = ‖x̄ − p̄‖ and ‖x − p̄‖ = ‖x̄ − p‖, the extended function satisfies Φ_σ(Dg^(3))(x̄) = −Φ_σ(Dg^(3))(x), so the product Φ_σ(Dg_1^(3))(x) Φ_σ(Dg_2^(3))(x) is symmetric with respect to the diagonal plane 𝒲^(3). Then we have

\int_{\Omega}\Phi_{\sigma}(Dg^{(3)}_{1})(\bm{x})\,\Phi_{\sigma}(Dg^{(3)}_{2})(\bm{x})\,d\bm{x} = \frac{1}{2}\int_{\mathbb{R}^{3}}\Phi_{\sigma}(Dg^{(3)}_{1})(\bm{x})\,\Phi_{\sigma}(Dg^{(3)}_{2})(\bm{x})\,d\bm{x}
= \frac{1}{2\kappa}\sum_{\bm{p}\in Dg^{(3)}_{1},\,\bm{q}\in Dg^{(3)}_{2}}\left[A(\bm{p},\bm{q})+A(\bm{\bar{p}},\bm{\bar{q}})-A(\bm{\bar{p}},\bm{q})-A(\bm{p},\bm{\bar{q}})\right], \quad (19)

where A(c, d) is defined as

A(\bm{c},\bm{d}) = \int_{\mathbb{R}^{3}}\exp\left(-\frac{\|\bm{x}-\bm{c}\|^{2}+\|\bm{x}-\bm{d}\|^{2}}{\sigma^{2}}\right)d\bm{x} = \frac{\sigma^{3}(2\pi)^{3/2}}{8}\exp\left(-\frac{\|\bm{c}-\bm{d}\|^{2}}{2\sigma^{2}}\right). \quad (20)

Here, c, d ∈ ℝ³. Since ‖p − q‖² = ‖p̄ − q̄‖² and ‖p̄ − q‖² = ‖p − q̄‖², Eqs. (19) and (20) yield the closed form of the kernel with κ = π²σ⁴/2.

The kernel is positive-definite because it is the inner product of the feature mapping. Consider the three-dimensional persistence diagrams Dg_1^(3), ..., Dg_N^(3) of the l-dimensional holes that are needed to compute the kernel. The Gram matrix for these diagrams is defined as K^(l)_{N×N} = [k_ij], whose elements are k_ij = k_{σ,ξ}(Dg_i^(3), Dg_j^(3)) with i = 1, ..., N and j = 1, ..., N.

C.2 Selection of the kernel parameters

In the time-series classification experiments, we compute the kernel by taking the time-delay values as discrete values with sampling interval 1 and set the rescaling coefficient to ξ = σ. The kernel bandwidth σ can be chosen by cross-validation; however, following Gretton et al. (2008), we present here a heuristic way to select σ. Consider the three-dimensional persistence diagrams Dg_1^(3), Dg_2^(3), ..., Dg_N^(3) that are required to compute the kernel. We define σ²_s = median{(b_i − b_j)² + (d_i − d_j)² | (b_i, d_i, τ_i), (b_j, d_j, τ_j) ∈ Dg_s^(3)} for s = 1, 2, ..., N. We then set σ² = (1/2) median{σ²_s | s = 1, ..., N}, so that 2σ² is close to many of the (b_i − b_j)² + (d_i − d_j)² values.
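A minimal NumPy sketch of this median heuristic (illustrative code of ours; pairs are taken over distinct points, which is one reading of the definition above) is:

import numpy as np

def median_heuristic_sigma(diagrams):
    """Bandwidth sigma from the median heuristic described above; `diagrams`
    is a list of (n_s, 3) arrays whose rows are (birth, death, tau) triples."""
    per_diagram_medians = []
    for dg in diagrams:
        b, d = dg[:, 0], dg[:, 1]
        i, j = np.triu_indices(len(b), k=1)  # all pairs of distinct points
        sq = (b[i] - b[j]) ** 2 + (d[i] - d[j]) ** 2
        per_diagram_medians.append(np.median(sq))
    sigma2 = 0.5 * np.median(per_diagram_medians)  # sigma^2 = (1/2) median of sigma_s^2
    return np.sqrt(sigma2)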

C.3 Using kernels with holes of multiple dimensions

For each dimension l, we obtain the persistence diagrams of the l-dimensional holes and compute their kernel k_{σ_l,ξ_l}, where σ_l is the bandwidth of this kernel and ξ_l is the corresponding positive rescaling coefficient. To use persistence diagrams for different dimensions of holes, we combine the kernels at the various dimensions through a linear combination. In our time-series classification experiments, we only consider the topological features of zero-dimensional holes (connected components) and one-dimensional holes (loops). Thus, the combined Gram matrix of N data can be defined as

\mathbf{K}_{N\times N} = \alpha_{0}\mathbf{K}^{(0)}_{N\times N} + \alpha_{1}\mathbf{K}^{(1)}_{N\times N}, \quad (21)

where 0 ≤ α_0, α_1 ≤ 1, α_0 + α_1 = 1, and K^(0)_{N×N} and K^(1)_{N×N} are the Gram matrices of the N persistence diagrams for the zero-dimensional and one-dimensional holes, respectively. In our time-series classification experiments, we choose α_0 from {0, 0.0001, 0.0002, 0.0005, 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0} by cross-validation.
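A sketch of forming the combined Gram matrix of Eq. (21) and selecting α_0 on a held-out split is given below; scikit-learn is assumed, and the single train/validation split is an illustrative stand-in for the cross-validation used in the paper.

import numpy as np
from sklearn.svm import SVC

# K0, K1: (N, N) Gram matrices for zero- and one-dimensional holes; y: array of N labels.
alphas = [0, 0.0001, 0.0002, 0.0005, 0.001, 0.002, 0.005,
          0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0]

rng = np.random.default_rng(0)
idx = rng.permutation(len(y))
tr, va = idx[:len(y) // 2], idx[len(y) // 2:]  # one train/validation split for illustration

best_alpha, best_acc = None, -1.0
for a0 in alphas:
    K = a0 * K0 + (1 - a0) * K1  # Eq. (21) with alpha_1 = 1 - alpha_0
    clf = SVC(kernel='precomputed').fit(K[np.ix_(tr, tr)], y[tr])
    acc = clf.score(K[np.ix_(va, tr)], y[va])
    if acc > best_acc:
        best_alpha, best_acc = a0, acc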

Appendix D Synthetic data

We generate data from the Hes1 regulatory model, which is a stochastic model of the Hes1 genetic oscillator exhibiting negative autoregulation with delay Monk (2003); Galla (2009). The model describes the concentrations and interactions of two types of particles: hes1 mRNA molecules, denoted by M, and Hes1 protein molecules, denoted by P. The stochastic dynamics are defined by the following reactions:

M \stackrel{\mu_{m}}{\longrightarrow} \varnothing; \quad (22)
P \stackrel{\mu_{p}}{\longrightarrow} \varnothing; \quad (23)
M \stackrel{\beta_{p}}{\longrightarrow} M + P; \quad (24)
\varnothing \stackrel{g(n_{p}),\,K(\zeta)}{\Longrightarrow} M. \quad (25)

The degradation of mRNA and protein is described by reactions (22) and (23), with degradation rates μ_m and μ_p, respectively. mRNA molecules are translated into protein via reaction (24) with translation rate β_p. The final reaction, (25), written with a double arrow, describes the delayed transcription process that produces mRNA. The rate of hes1 mRNA production depends on the concentration of Hes1 protein molecules through a negative-feedback mechanism, described by the function g(n_p) = β_m [1 + [n_p/(P_0 Θ)]^h]^{-1}. Here, n_p is the number of protein molecules in the system, β_m, P_0, and h are constants, and Θ is the system size. The transcription process is associated with a time delay ζ drawn from a distribution K(ζ); that is, the protein concentration at time t − ζ affects the production of mRNA at time t.

In the simulations, the protein levels were measured every ν (= 64, 32, 16, 8) min for 4,096 min. The measurements start at t = 5,000 min to allow the system to equilibrate. The parameters for the oscillatory regime are P_0 = 300, h = 1, ζ = 0, β_m = β_p = 1, μ_m = μ_p = 0.07, Θ = 20, and those for the non-oscillatory regime are P_0 = 100, h = 3, ζ = 18, β_m = β_p = 1, μ_m = μ_p = 0.03, Θ = 20.

References