
Time-Aware Knowledge Representations of
Dynamic Objects with Multidimensional Persistence

Baris Coskunuzer, Ignacio Segovia-Dominguez, Yuzhou Chen, Yulia R. Gel
Abstract

Learning time-evolving objects such as multivariate time series and dynamic networks requires the development of novel knowledge representation mechanisms and neural network architectures, which allow for capturing implicit time-dependent information contained in the data. Such information is typically not directly observed but plays a key role in the learning task performance. In turn, the lack of a time dimension in knowledge encoding mechanisms for time-dependent data leads to frequent model updates, poor learning performance, and, as a result, subpar decision-making. Here we propose a new approach to a time-aware knowledge representation mechanism that notably focuses on implicit time-dependent topological information along multiple geometric dimensions. In particular, we propose a new approach, named Temporal MultiPersistence (TMP), which produces multidimensional topological fingerprints of the data by using existing single parameter topological summaries. The main idea behind TMP is to merge the two newest directions in topological representation learning, that is, multipersistence, which simultaneously describes data shape evolution along multiple key parameters, and zigzag persistence, which enables us to extract the most salient data shape information over time. We derive theoretical guarantees of TMP vectorizations and show their utility in application to forecasting on benchmark traffic flow, Ethereum blockchain, and electrocardiogram datasets, demonstrating competitive performance, especially in scenarios with limited data records. In addition, our TMP method improves the computational efficiency of state-of-the-art multipersistence summaries by up to 59.5 times.

1 Introduction

Over the last decade, the field of topological data analysis (TDA) has demonstrated its effectiveness in revealing concealed patterns within diverse types of data that conventional methods struggle to access. Notably, in cases where conventional approaches frequently falter, tools such as persistent homology (PH) within TDA have showcased remarkable capabilities in identifying both localized and overarching patterns. These tools have the potential to generate a distinctive topological signature, a trait that holds great promise for a range of ML applications. This inherent capacity of PH becomes particularly appealing for capturing implicit temporal traits of evolving data, which might hold the crucial insights underlying the performance of learning tasks.

In turn, the concept of multiparameter persistence (MP) introduces a groundbreaking dimension to machine learning by enhancing the capabilities of persistent homology. Its objective is to analyze data across multiple dimensions concurrently, in a more nuanced manner. However, due to the complex algebraic challenges intrinsic to its framework, MP has yet to be universally defined in all contexts (Botnan and Lesnick 2022; Carrière and Blumberg 2020).

In response, we present a novel approach designed to effectively harness MP homology for the dual purposes of time-aware learning and the representation of time-dependent data. Specifically, the temporal parameter within time-dependent data furnishes the crucial dimension necessary for the application of the slicing concept within the MP framework. Our method yields a distinctive topological MP signature for the provided time-dependent data, manifested as multidimensional vectors (matrices or tensors). These vectors are highly compatible with ML applications. Notably, our findings possess broad applicability and can be tailored to various forms of PH vectorization, rendering them suitable for diverse categories of time-dependent data.

Our key contributions can be summarized as follows:

  • We bring a new perspective on the use of TDA for time-dependent data via the multipersistence approach.

  • We introduce the TMP vectorization framework, which provides a multidimensional topological fingerprint of the data. The TMP framework expands many known single persistence vectorizations to multiple dimensions by effectively utilizing the time dimension in the PH machinery.

  • The versatility of our TMP framework allows its application to diverse types of time-dependent data. Furthermore, we show that TMP enjoys the same important stability guarantees as most single persistence summaries.

  • Rooted in computational linear algebra, TMP vectorizations generate multidimensional arrays (i.e., matrices or tensors) which serve as compatible inputs for various ML models. Notably, our proposed TMP approach boasts a speed advantage, performing up to 59.5 times faster than the cutting-edge MP methods.

  • Through successful integration of the latest TDA techniques with deep learning tools, our TMP-Nets model consistently and cohesively outperforms the majority of state-of-the-art deep learning models.

2 Related Work

2.1 Time Series Forecasting

Recurrent Neural Networks (RNNs) are among the most successful deep learning techniques for modeling datasets with time-dependent variables (Lipton, Berkowitz, and Elkan 2015). Long Short-Term Memory networks (LSTMs) addressed prior RNN limitations in learning long-term dependencies by solving known issues with exploding and vanishing gradients (Yu et al. 2019), serving as the basis for other improved RNNs, such as Gated Recurrent Units (GRUs) (Dey and Salem 2017), Bidirectional LSTMs (BI-LSTM) (Wang, Yang, and Meinel 2018), and seq2seq LSTMs (Sutskever, Vinyals, and Le 2014). Despite the widespread adoption of RNNs in multiple applications (Xiang, Yan, and Demir 2020; Schmidhuber 2017; Shin and Kim 2020; Shewalkar, Nyavanandi, and Ludwig 2019; Segovia-Dominguez et al. 2021; Bin et al. 2018), RNNs are limited by the structure of the input data and cannot naturally handle data structures from manifolds and graphs, i.e., non-Euclidean spaces.

2.2 Graph Convolutional Networks

New graph convolution-based methods overcome prior limitations of traditional GCN approaches, e.g., learning underlying local and global connectivity patterns (Veličković et al. 2018; Defferrard, Bresson, and Vandergheynst 2016; Kipf and Welling 2017). GCNs handle graph-structured data via aggregation of node information from neighborhoods using graph filters. Lately, there has been increasing interest in expanding GCN capabilities to the time series forecasting domain. In this context, modern approaches have reached outstanding results in COVID-19 forecasting, money laundering detection, transportation forecasting, and scene recognition (Pareja et al. 2020; Segovia Dominguez et al. 2021; Yu, Yin, and Zhu 2018; Yan, Xiong, and Lin 2018; Guo et al. 2019; Weber et al. 2019; Yao et al. 2018). However, a major drawback of these approaches is their lack of versatility, as they assume a fixed graph structure and rely on the existing correlation among spatial and temporal features.

2.3 Multiparameter Persistence

Multipersistence (MP) is a highly promising approach to significantly improve the success of single parameter persistence (SP) in applied TDA, but the theory is not yet complete (Botnan and Lesnick 2022). Except for some special cases, MP theory suffers from the nonexistence of a barcode decomposition because of the partially ordered structure of the index set $\{(\alpha_{i},\beta_{j})\}$. The existing approaches remedy this issue via the slicing technique, i.e., by studying one-dimensional fibers of the multiparameter domain. However, this approach tends to lose most of the information the MP approach produces. Another idea along these lines is to use several such directions (vineyards) and produce a vectorization summarizing the resulting SP vectorizations (Carrière and Blumberg 2020). However, again, choosing these directions suitably and computing the restricted SP vectorizations are computationally costly, which limits these approaches in many real-life applications. There are several promising recent studies in this direction (Botnan, Oppermann, and Oudot 2022; Vipond 2020), but these techniques often do not provide a topological summary that can readily serve as input to ML models. In this paper, we develop a highly efficient way to use the MP approach for time-dependent data and provide a multidimensional topological summary with TMP vectorizations. We discuss the current fundamental challenges in MP theory and the contributions of our TMP vectorizations in Section D.2.

3 Background

We start by providing the basic background for our machinery. While our techniques are applicable to any type of time-dependent data, here we mainly focus on the dynamic networks since our primary motivation comes from time-aware learning of time-evolving graphs as well as time series and spatio-temporal processes, also represented as graph structures. (For discussion on other types of data see Section D.3.)

Notation Table: All the notations used in the paper are given in Table 12 in the appendix.

Time-Dependent Data: Throughout the paper, by time-dependent data we mean data which implicitly or explicitly has time information embedded in itself. Such data include but are not limited to multivariate time series, spatio-temporal processes, and dynamic networks. Since our paper is primarily motivated by time-aware graph neural networks and their broader applications to forecasting, we focus on dynamic networks. Let $\{\mathcal{G}_{1},\mathcal{G}_{2},\dots,\mathcal{G}_{T}\}$ be a sequence of weighted graphs for time steps $t=1,\ldots,T$. In particular, $\mathcal{G}_{t}=\{\mathcal{V}_{t},\mathcal{E}_{t},W_{t}\}$ with node set $\mathcal{V}_{t}$ and edge set $\mathcal{E}_{t}$. Let $|\mathcal{V}_{t}|=N_{t}$ be the cardinality of the node set. $W_{t}$ represents the edge weights for $\mathcal{E}_{t}$ as a nonnegative symmetric $N_{t}\times N_{t}$ matrix with entries $\{\omega^{t}_{rs}\}_{1\leq r,s\leq N_{t}}$, i.e., the adjacency matrix of $\mathcal{G}_{t}$. In other words, $\omega^{t}_{rs}>0$ for any $e^{t}_{rs}\in\mathcal{E}_{t}$ and $\omega^{t}_{rs}=0$ otherwise. In the case of unweighted networks, we let $\omega^{t}_{rs}=1$ for any $e^{t}_{rs}\in\mathcal{E}_{t}$ and $\omega^{t}_{rs}=0$ otherwise.
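To make this setup concrete, the following minimal sketch (hypothetical toy data, assuming NumPy and NetworkX are available) builds such a sequence of weighted graphs and the corresponding adjacency matrices $W_{t}$.

```python
# A toy, hypothetical instance of the dynamic-network input described above: a sequence
# of weighted graphs G_1, ..., G_T with nonnegative symmetric adjacency matrices W_t.
import numpy as np
import networkx as nx

T = 4                                    # number of time steps
rng = np.random.default_rng(0)

graphs = []
for t in range(T):
    G = nx.Graph()
    G.add_nodes_from(range(5))           # V_t (a fixed node set here, for simplicity)
    for r in range(5):
        for s in range(r + 1, 5):
            if rng.random() < 0.4:        # random sparsity pattern per time step
                G.add_edge(r, s, weight=float(rng.uniform(0.1, 1.0)))  # omega^t_{rs} > 0
    graphs.append(G)

# W_t as an N_t x N_t matrix; entries are zero where no edge exists
W = [nx.to_numpy_array(G, weight="weight") for G in graphs]
print(len(W), W[0].shape)                 # 4 adjacency matrices of shape (5, 5)
```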

3.1 Background on Persistent Homology

Persistent homology (PH) is a mathematical machinery to capture the hidden shape patterns in the data by using algebraic topology tools. PH extracts this information by keeping track of the evolution of the topological features (components, loops, cavities) created in the data while looking at it using different resolutions. Here, we give basic background for PH in the graph setting. For further details, see (Dey and Wang 2022; Edelsbrunner and Harer 2010).

For a given graph $\mathcal{G}$, consider a nested sequence of subgraphs $\mathcal{G}^{1}\subseteq\ldots\subseteq\mathcal{G}^{N}=\mathcal{G}$. For each $\mathcal{G}^{i}$, define an abstract simplicial complex $\widehat{\mathcal{G}}^{i}$, $1\leq i\leq N$, yielding a filtration of complexes $\widehat{\mathcal{G}}^{1}\subseteq\ldots\subseteq\widehat{\mathcal{G}}^{N}$. Here, clique complexes are among the most common choices: the clique complex $\widehat{\mathcal{G}}$ is obtained by assigning (filling with) a $k$-simplex to each complete $(k+1)$-subgraph in $\mathcal{G}$, e.g., a $3$-clique (a complete subgraph on three vertices) in $\mathcal{G}$ is filled with a $2$-simplex (triangle). Then, in this sequence of simplicial complexes, one can systematically keep track of the evolution of the topological patterns mentioned above. A $k$-dimensional topological feature (or $k$-hole) may represent connected components ($0$-holes), loops ($1$-holes), and cavities ($2$-holes). For each $k$-hole $\sigma$, PH records its first appearance in the filtration sequence, say $\widehat{\mathcal{G}}^{b_{\sigma}}$, and its first disappearance in a later complex, $\widehat{\mathcal{G}}^{d_{\sigma}}$, as a unique pair $(b_{\sigma},d_{\sigma})$, where $1\leq b_{\sigma}<d_{\sigma}\leq N$. We call $b_{\sigma}$ the birth time of $\sigma$, $d_{\sigma}$ the death time of $\sigma$, and $d_{\sigma}-b_{\sigma}$ the life span of $\sigma$. PH records all these birth and death times of the topological features in persistence diagrams. Let $0\leq k\leq D$, where $D$ is the highest dimension in the simplicial complex $\widehat{\mathcal{G}}^{N}$. Then the $k^{th}$ persistence diagram is ${\rm PD}_{k}(\mathcal{G})=\{(b_{\sigma},d_{\sigma})\mid\sigma\in H_{k}(\widehat{\mathcal{G}}^{i})\mbox{ for }b_{\sigma}\leq i<d_{\sigma}\}$. Here, $H_{k}(\widehat{\mathcal{G}}^{i})$ represents the $k^{th}$ homology group of $\widehat{\mathcal{G}}^{i}$, which keeps the information of the $k$-holes in the simplicial complex $\widehat{\mathcal{G}}^{i}$. With the intuition that topological features with a long life span (persistent features) describe the hidden shape patterns in the data, these persistence diagrams provide a unique topological fingerprint of $\mathcal{G}$.

As one can easily notice, the most important step in the PH machinery is the construction of the nested sequence of subgraphs $\mathcal{G}^{1}\subseteq\ldots\subseteq\mathcal{G}^{N}=\mathcal{G}$. For a given unweighted graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$, the most common technique is to use a filtering function $f:\mathcal{V}\to\mathbb{R}$ with a choice of thresholds $\mathcal{I}=\{\alpha_{i}\}_{1}^{N}$, where $\alpha_{1}=\min_{v\in\mathcal{V}}f(v)<\alpha_{2}<\ldots<\alpha_{N}=\max_{v\in\mathcal{V}}f(v)$. For $\alpha_{i}\in\mathcal{I}$, let $\mathcal{V}_{i}=\{v_{r}\in\mathcal{V}\mid f(v_{r})\leq\alpha_{i}\}$, and let $\mathcal{G}^{i}$ be the subgraph of $\mathcal{G}$ induced by $\mathcal{V}_{i}$, i.e., $\mathcal{G}^{i}=(\mathcal{V}_{i},\mathcal{E}_{i})$ where $\mathcal{E}_{i}=\{e_{rs}\in\mathcal{E}\mid v_{r},v_{s}\in\mathcal{V}_{i}\}$. This process yields a nested sequence of subgraphs $\mathcal{G}^{1}\subset\mathcal{G}^{2}\subset\ldots\subset\mathcal{G}^{N}=\mathcal{G}$, called the sublevel filtration induced by the filtering function $f$. The choice of $f$ is crucial here; in most cases, $f$ is either an important function from the domain of the data, e.g., amount of transactions or volume transfer, or a function defined from intrinsic properties of the graph, e.g., degree or betweenness. Similarly, for a weighted graph, one can use a sublevel filtration on the edge weights and obtain a suitable filtration reflecting the domain information stored in the edge weights. For further details on different filtration types for networks, see (Aktas, Akbas, and El Fatmaoui 2019; Hofer et al. 2020).
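As an illustration of this construction, the sketch below computes a degree-based sublevel filtration and its clique-complex persistence diagrams, assuming the GUDHI library is available; the graph and the filtering function are illustrative choices, not those used in our experiments.

```python
# Hedged illustration of a sublevel filtration and clique-complex persistence, assuming
# the GUDHI library is available; the graph and the degree-based filtering function are
# illustrative choices only.
import networkx as nx
import gudhi

G = nx.karate_club_graph()
f = dict(G.degree())                           # filtering function f: V -> R (node degree)

st = gudhi.SimplexTree()
for v in G.nodes():
    st.insert([v], filtration=f[v])            # vertex v enters at threshold f(v)
for u, v in G.edges():
    st.insert([u, v], filtration=max(f[u], f[v]))  # edge enters once both endpoints are present

st.expansion(2)                                # fill 3-cliques with 2-simplices (clique complex)
st.persistence()                               # compute persistence over the sublevel filtration
pd0 = st.persistence_intervals_in_dimension(0) # PD_0(G): births/deaths of connected components
pd1 = st.persistence_intervals_in_dimension(1) # PD_1(G): births/deaths of loops
print(len(pd0), len(pd1))
```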

3.2 Multidimensional Persistence

In the previous section, we discussed single-parameter persistence theory. The reason for the term "single" is that we filter the data in only one direction $\mathcal{G}^{1}\subset\dots\subset\mathcal{G}^{N}=\mathcal{G}$. Here, the choice of direction is the key to extracting the hidden patterns from the observed data. For some tasks and data types, it is sufficient to consider only one dimension (or filtering function $f:\mathcal{V}\to\mathbb{R}$), e.g., atomic numbers for protein networks, in order to extract the intrinsic data properties. However, the observed data often have more than one direction to be analyzed (for example, in the case of money laundering detection on the Bitcoin network, we may need to use both transaction amounts and numbers of transactions between any two traders). With this intuition, multiparameter persistence (MP) theory was suggested as a natural generalization of single persistence (SP).

In simpler terms, if one uses only one filtering function, the sublevel sets induce a single parameter filtration $\widehat{\mathcal{G}}^{1}\subset\dots\subset\widehat{\mathcal{G}}^{N}=\widehat{\mathcal{G}}$. Using two or more functions instead enables us to study finer substructures and patterns in the data. In particular, let $f:\mathcal{V}\to\mathbb{R}$ and $g:\mathcal{V}\to\mathbb{R}$ be two filtering functions carrying valuable complementary information about the network. Then, the MP idea is to produce a unique topological fingerprint combining the information from both functions. This pair of functions $f,g$ induces a multivariate filtering function $F:\mathcal{V}\to\mathbb{R}^{2}$ with $F(v)=(f(v),g(v))$. Again, we can define sets of nondecreasing thresholds $\{\alpha_{i}\}_{1}^{m}$ and $\{\beta_{j}\}_{1}^{n}$ for $f$ and $g$, respectively. Then $\mathcal{V}^{ij}=\{v_{r}\in\mathcal{V}\mid f(v_{r})\leq\alpha_{i},\ g(v_{r})\leq\beta_{j}\}$, i.e., $\mathcal{V}^{ij}=F^{-1}((-\infty,\alpha_{i}]\times(-\infty,\beta_{j}])$. Let $\mathcal{G}^{ij}$ be the subgraph of $\mathcal{G}$ induced by $\mathcal{V}^{ij}$, i.e., the smallest subgraph of $\mathcal{G}$ containing $\mathcal{V}^{ij}$. Then, instead of a single filtration of complexes, we get a bifiltration of complexes $\{\widehat{\mathcal{G}}^{ij}\mid 1\leq i\leq m,\ 1\leq j\leq n\}$. See Figure 2 (Appendix) for an explicit example.
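The following hypothetical sketch builds such a bifiltration grid $\{\mathcal{G}^{ij}\}$ from two illustrative node functions (degree and betweenness); the functions and the quantile thresholds are placeholders, not the choices used in the paper.

```python
# Hypothetical sketch of the bifiltration {G^{ij}}: sublevel sets of the bivariate node
# function F(v) = (f(v), g(v)) over an m x n grid of thresholds. The functions (degree,
# betweenness) and the quantile thresholds are placeholders, not the paper's choices.
import numpy as np
import networkx as nx

G = nx.karate_club_graph()
f = dict(G.degree())                            # first filtering function f
g = nx.betweenness_centrality(G)                # second filtering function g

alphas = np.quantile(list(f.values()), np.linspace(0, 1, 4))   # thresholds {alpha_i}, m = 4
betas = np.quantile(list(g.values()), np.linspace(0, 1, 4))    # thresholds {beta_j},  n = 4

bifiltration = {}
for i, a in enumerate(alphas):
    for j, b in enumerate(betas):
        V_ij = [v for v in G.nodes() if f[v] <= a and g[v] <= b]
        bifiltration[(i, j)] = G.subgraph(V_ij)  # induced subgraph G^{ij}

# The grid is monotone: G^{ij} is a subgraph of G^{kl} whenever i <= k and j <= l.
print(bifiltration[(3, 3)].number_of_nodes(), G.number_of_nodes())
```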

As illustrated in Figure 2, we can imagine $\{\widehat{\mathcal{G}}^{ij}\}$ as a rectangular grid of size $m\times n$ such that for each $1\leq i_{0}\leq m$, $\{\widehat{\mathcal{G}}^{i_{0}j}\}_{j=1}^{n}$ gives a nested (horizontal) sequence of simplicial complexes. Similarly, for each $1\leq j_{0}\leq n$, $\{\widehat{\mathcal{G}}^{ij_{0}}\}_{i=1}^{m}$ gives a nested (vertical) sequence of simplicial complexes. By computing the homology groups of these complexes, $\{H_{k}(\widehat{\mathcal{G}}^{ij})\}$, we obtain the induced bigraded persistence module (a rectangular grid of size $m\times n$). Again, the idea is to keep track of the $k$-dimensional topological features via the homology groups $\{H_{k}(\widehat{\mathcal{G}}^{ij})\}$ in this grid. As detailed in Section D.2, because of technical issues in commutative algebra arising from the partially ordered structure of the multipersistence module, the MP approach has not yet reached the completeness of the SP theory, and there is a need to facilitate this promising idea effectively in real-life applications.

In this paper, for time-dependent data, we overcome this problem by using the naturally inherited special direction in the data: Time. By using this canonical direction in the multipersistence module, we bypass the partial ordering problem and generalize the ideas from single parameter persistence to produce a unique topological fingerprint of the data via MP. Our approach provides a general framework to utilize various vectorization forms defined for single PH and gives a multidimensional topological summary of the data.

Utilizing Time Direction - Zigzag Persistence: While our intuition is to use the time direction in MP for forecasting purposes, the time parameter is not very suitable for PH construction in its original form. This is because PH construction needs nested subgraphs to keep track of the existing topological features, while time-dependent data do not come nested, i.e., $\mathcal{G}_{t_{1}}\not\subseteq\mathcal{G}_{t_{2}}$ in general for $t_{1}\leq t_{2}$. However, a generalized version of the PH construction helps us overcome this problem: we want to keep track of topological features which exist at different time instances. Zigzag homology (Carlsson and Silva 2010) bypasses the requirement of a nested sequence by using the "zigzag scheme". We provide the details of zigzag persistent homology in Section C.1.
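For intuition, one standard zigzag scheme for dynamic networks interleaves consecutive snapshots with their unions, so that inclusion maps exist in alternating directions; the sketch below constructs this sequence (the persistence computation itself, described in Section C.1, would then be run by a zigzag-capable backend).

```python
# Sketch of a standard zigzag scheme for dynamic networks: consecutive snapshots are
# interleaved with their unions, G_1 -> (G_1 u G_2) <- G_2 -> (G_2 u G_3) <- ..., so that
# inclusion maps exist in alternating directions even though G_{t1} need not be a subset
# of G_{t2}. A zigzag-capable PH backend would then be run on this sequence (Section C.1).
import networkx as nx

def zigzag_sequence(graphs):
    """Return [G_1, G_{1.5}, G_2, G_{2.5}, ..., G_T], where G_{t.5} = G_t union G_{t+1}."""
    seq = []
    for t in range(len(graphs) - 1):
        seq.append(graphs[t])
        seq.append(nx.compose(graphs[t], graphs[t + 1]))   # union of consecutive snapshots
    seq.append(graphs[-1])
    return seq

# Toy snapshots whose node/edge sets change over time (so they are not nested).
g1 = nx.path_graph([0, 1, 2])
g2 = nx.path_graph([1, 2, 3])
g3 = nx.cycle_graph([2, 3, 4])
print([g.number_of_nodes() for g in zigzag_sequence([g1, g2, g3])])   # 2T - 1 = 5 complexes
```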

4 TMP Vectorizations

We now introduce a general framework to define vectorizations for multipersistence homology on time-dependent data. First, we recall the single persistence vectorizations which we will expand as multidimensional vectorizations with our TMP framework.

4.1 Single Persistence Vectorizations

While PH extracts hidden shape patterns from data as persistence diagrams (PDs), PDs, being collections of points $\{(b_{\sigma},d_{\sigma})\}$ in $\mathbb{R}^{2}$, are by themselves not very practical for statistical and machine learning purposes. Instead, the common techniques accurately represent PDs as kernels (Kriege, Johansson, and Morris 2020) or vectorizations (Ali et al. 2023). Single persistence vectorizations transform the obtained PH information (PDs) into a function or a feature vector form, which is much more suitable for ML tools. Common single persistence (SP) vectorization methods are Persistence Images (Adams et al. 2017), Persistence Landscapes (Bubenik 2015), Silhouettes (Chazal et al. 2014), and various Persistence Curves (Chung and Lawson 2022). These vectorizations define single-variable or multivariable functions out of PDs, which can be used as fixed-size $1D$ or $2D$ vectors in applications, i.e., $1\times m$ or $m\times n$ vectors. For example, a Betti curve for a PD with $m$ thresholds can be written as a $1\times m$ vector. Similarly, Persistence Images are an example of $2D$ vectors with the chosen resolution (grid) size. See the examples below and in Section D.1 for further details.
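As a simple concrete example of an SP vectorization, the sketch below computes the Betti curve of a persistence diagram as a $1\times m$ vector; the diagram and thresholds are toy values.

```python
# A simple SP vectorization example: the Betti curve of a persistence diagram, evaluated
# at m thresholds and returned as a 1 x m vector. The diagram and thresholds are toy values.
import numpy as np

def betti_curve(pd, thresholds):
    """pd: list of (birth, death) pairs; returns the number of features alive at each threshold."""
    pd = np.asarray(pd, dtype=float)
    return np.array([np.sum((pd[:, 0] <= t) & (t < pd[:, 1])) for t in thresholds])

pd_example = [(0.0, 2.0), (1.0, 3.0), (1.0, 1.5)]
print(betti_curve(pd_example, thresholds=np.linspace(0, 3, 7)))
```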

4.2 TMP Vectorizations

Finally, we define our Temporal MultiPersistence (TMP) framework for time-dependent data. In particular, by using existing single-parameter persistence vectorizations, we produce multidimensional vectorizations by effectively using the time direction in the multipersistence module. The idea is to use zigzag homology in the time direction and consider $d$-dimensional filtering for the other directions. This process produces $(d+1)$-dimensional vectorizations of the dataset. While the most common choice would be $d=1$ for computational purposes, we describe the case $d=2$ to give a general idea; the construction easily generalizes to higher dimensions. Below and in Section D.1, we provide explicit examples of TMP vectorizations. While we mainly focus on network data in this part, we describe how to generalize TMP vectorizations to other types of data (e.g., point clouds, images) in Section D.3.

Again, let $\widetilde{\mathcal{G}}=\{\mathcal{G}_{1},\mathcal{G}_{2},\dots,\mathcal{G}_{T}\}$ be a sequence of weighted (or unweighted) graphs for time steps $t=1,\ldots,T$ with $\mathcal{G}_{t}=\{\mathcal{V}_{t},\mathcal{E}_{t},W_{t}\}$ as defined in Section 3. By using a filtering function $F_{t}:\mathcal{V}_{t}\to\mathbb{R}^{d}$ or the edge weights, define a bifiltration for each $t_{0}$, i.e., $\{\mathcal{G}_{t_{0}}^{ij}\}$ for $1\leq i\leq m$ and $1\leq j\leq n$. For each fixed $i_{0},j_{0}$, consider the sequence $\{\mathcal{G}^{i_{0}j_{0}}_{1},\mathcal{G}^{i_{0}j_{0}}_{2},\dots,\mathcal{G}^{i_{0}j_{0}}_{T}\}$. This sequence of subgraphs induces a zigzag sequence of clique complexes as described in Section C.1:

\widehat{\mathcal{G}}_{1}^{i_{0}j_{0}}\hookrightarrow\widehat{\mathcal{G}}^{i_{0}j_{0}}_{1.5}\hookleftarrow\widehat{\mathcal{G}}^{i_{0}j_{0}}_{2}\hookrightarrow\widehat{\mathcal{G}}^{i_{0}j_{0}}_{2.5}\hookleftarrow\widehat{\mathcal{G}}^{i_{0}j_{0}}_{3}\hookrightarrow\dots\hookleftarrow\widehat{\mathcal{G}}^{i_{0}j_{0}}_{T}.

Now, let $ZPD_{k}(\widetilde{\mathcal{G}}^{i_{0}j_{0}})$ be the induced zigzag persistence diagram. Let $\varphi$ represent an SP vectorization as described above, e.g., Persistence Landscape, Silhouette, Persistence Image, or a Persistence Curve. That is, if $PD(\mathcal{G})$ is the persistence diagram for some filtration induced by $\mathcal{G}$, then $\varphi(\mathcal{G})$ is the corresponding vectorization of $PD(\mathcal{G})$ (see Figure 1). In most cases, $\varphi(\mathcal{G})$ is represented as a function on the threshold domain (Persistence Curves, Landscapes, Silhouettes, Persistence Surfaces). However, the discrete structure of the threshold domain enables us to interpret the function $\varphi(\mathcal{G})$ as a $1D$ vector $\vec{\varphi}(\mathcal{G})$ (Persistence Curves, Landscapes, Silhouettes) or a $2D$ vector $\vec{\varphi}(\mathcal{G})$ (Persistence Images). See the examples given below and in Section D.1 for more details.

Figure 1: TMP outline. Given $\widetilde{\mathcal{G}}=\{\mathcal{G}_{1},\mathcal{G}_{2},\dots,\mathcal{G}_{T}\}$ with time index $t=1,\ldots,T$ (1st row), we apply a bifiltration on node/edge features at $t$, i.e., $\{\mathcal{G}_{t}^{ij}\}$ for $1\leq i\leq m$ and $1\leq j\leq n$ (2nd row). The sequence of subgraphs $\{\mathcal{G}^{i_{0}j_{0}}_{1},\mathcal{G}^{i_{0}j_{0}}_{2},\dots,\mathcal{G}^{i_{0}j_{0}}_{T}\}$ at fixed $i_{0},j_{0}$ is the input to the zigzag persistence method, which produces a zigzag persistence barcode (3rd row). Then, $\vec{\varphi}(\widetilde{\mathcal{G}}^{i_{0}j_{0}})$ is the corresponding vectorization of the zigzag persistence diagram $ZPD_{k}(\widetilde{\mathcal{G}}^{i_{0}j_{0}})$ of $k$-dimensional features (4th row).

Now, let $\vec{\varphi}(\widetilde{\mathcal{G}}^{ij})$ be the corresponding vector for the zigzag persistence diagram $ZPD_{k}(\widetilde{\mathcal{G}}^{ij})$. Then, for any $1\leq i\leq m$ and $1\leq j\leq n$, we have a ($1D$ or $2D$) vector $\vec{\varphi}(\widetilde{\mathcal{G}}^{ij})$. Now, define the induced TMP vectorization $\mathbf{M}_{\varphi}$ as the corresponding tensor with entries $\mathbf{M}_{\varphi}^{ij}=\vec{\varphi}(\widetilde{\mathcal{G}}^{ij})$ for $1\leq i\leq m$ and $1\leq j\leq n$.

In particular, if $\vec{\varphi}$ is a $1D$ vector of size $1\times k$, then $\mathbf{M}_{\varphi}$ is a $3D$ vector (rank-$3$ tensor) of size $m\times n\times k$. If $\vec{\varphi}$ is a $2D$ vector of size $k\times r$, then $\mathbf{M}_{\varphi}$ is a rank-$4$ tensor of size $m\times n\times k\times r$. In the examples below, we provide explicit constructions of $\mathbf{M}_{\varphi}$ for the most common SP vectorizations $\varphi$.
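Putting the pieces together, the assembly of $\mathbf{M}_{\varphi}$ can be summarized by the schematic sketch below; `sublevel_subgraph`, `zigzag_pd`, and `vectorize` are placeholders for the bifiltration restriction, a zigzag persistence backend, and a chosen SP vectorization $\varphi$, respectively.

```python
# Schematic assembly of the TMP vectorization M_phi (the d = 2 case): for each grid cell
# (i, j), restrict every snapshot, compute the zigzag PD along time, vectorize it with the
# chosen SP vectorization phi, and stack the results into a tensor. `sublevel_subgraph`,
# `zigzag_pd`, and `vectorize` are placeholders for the bifiltration restriction, a zigzag
# persistence backend, and phi, respectively.
import numpy as np

def tmp_vectorization(graphs, sublevel_subgraph, m, n, zigzag_pd, vectorize):
    rows = []
    for i in range(m):
        row = []
        for j in range(n):
            restricted = [sublevel_subgraph(G, i, j) for G in graphs]  # {G_t^{ij}}_{t=1}^{T}
            pd_ij = zigzag_pd(restricted)      # ZPD_k of the zigzag sequence in the time direction
            row.append(vectorize(pd_ij))       # phi applied to ZPD: a 1D or 2D vector
        rows.append(np.stack(row))
    return np.stack(rows)                      # shape m x n x k (or m x n x k x r)
```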

4.3 Examples of TMP Vectorizations

While we describe TMP vectorizations for $d=2$, in most applications $d=1$ would be preferable for computational purposes. In that case, if the preferred single persistence (SP) vectorization $\varphi$ produces a $1D$ vector (say of size $1\times r$), then the induced TMP vectorization $\mathbf{M}_{\varphi}$ is a $2D$ vector (a matrix) of size $m\times r$, where $m$ is the number of thresholds for the filtering function used, e.g., $f:\mathcal{V}_{t}\to\mathbb{R}$. These $m\times r$ matrices provide unique topological fingerprints for each time-dependent dataset $\{\mathcal{G}_{t}\}_{t=1}^{T}$. These multidimensional fingerprints are produced by using persistent homology with two-dimensional filtering, where the first dimension is the natural time direction $t$ and the second dimension comes from the filtering function $f$.

Here, we discuss explicit constructions of two examples of TMP vectorizations. As mentioned above, the framework is very general and can be applied to various vectorization methods. In Section D.1, we provide details of further examples of TMP vectorizations for time-dependent data, i.e., TMP Silhouettes and TMP Betti Summaries.

TMP Landscapes

Persistence Landscapes $\lambda$ are one of the most common SP vectorization methods, introduced in (Bubenik 2015). For a given persistence diagram $PD(\mathcal{G})=\{(b_{i},d_{i})\}$, $\lambda$ produces a function $\lambda(\mathcal{G})$ by using generating functions $\Lambda_{i}$ for each $(b_{i},d_{i})\in PD(\mathcal{G})$, i.e., $\Lambda_{i}:[b_{i},d_{i}]\to\mathbb{R}$ is the piecewise linear function obtained from the two line segments starting at $(b_{i},0)$ and $(d_{i},0)$ and meeting at the common point $(\frac{b_{i}+d_{i}}{2},\frac{d_{i}-b_{i}}{2})$. Then, the Persistence Landscape function $\lambda(\mathcal{G}):[\epsilon_{1},\epsilon_{q}]\to\mathbb{R}$ is defined as $\lambda(\mathcal{G})(t)=\max_{i}\Lambda_{i}(t)$ for $t\in[\epsilon_{1},\epsilon_{q}]$, where $\{\epsilon_{k}\}_{1}^{q}$ are the thresholds for the filtration used.

Considering the piecewise linear structure of the function, $\lambda(\mathcal{G})$ is completely determined by its values at $2q-1$ points, i.e., $\frac{b_{i}\pm d_{i}}{2}\in\{\epsilon_{1},\epsilon_{1.5},\epsilon_{2},\epsilon_{2.5},\dots,\epsilon_{q}\}$, where $\epsilon_{k.5}=(\epsilon_{k}+\epsilon_{k+1})/2$. Hence, a vector of size $1\times(2q-1)$ whose entries are the values of this function at these points suffices to capture all the needed information, i.e., $\vec{\lambda}=[\lambda(\epsilon_{1})\ \lambda(\epsilon_{1.5})\ \lambda(\epsilon_{2})\ \lambda(\epsilon_{2.5})\ \lambda(\epsilon_{3})\ \dots\ \lambda(\epsilon_{q})]$.
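For illustration, a minimal sketch of this landscape vector (restricted to the first landscape function $\lambda$ and toy inputs) is given below.

```python
# Toy sketch of the (first) persistence landscape evaluated on the 2q-1 grid points
# eps_1, eps_1.5, ..., eps_q, returning the 1 x (2q-1) vector described above.
import numpy as np

def landscape_vector(pd, eps):
    """pd: (birth, death) pairs; eps: q increasing thresholds."""
    eps = np.asarray(eps, dtype=float)
    grid = np.sort(np.concatenate([eps, (eps[:-1] + eps[1:]) / 2.0]))   # the 2q-1 evaluation points
    values = np.zeros_like(grid)
    for b, d in pd:
        tent = np.minimum(grid - b, d - grid)          # generating (tent) function Lambda_i
        values = np.maximum(values, np.clip(tent, 0.0, None))
    return values                                      # lambda evaluated on the grid

print(landscape_vector([(0.0, 2.0), (1.0, 3.0)], eps=np.linspace(0, 3, 4)))
```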

Now, for the time-dependent data $\widetilde{\mathcal{G}}=\{\mathcal{G}_{t}\}_{t=1}^{T}$, to construct our induced TMP vectorization $\mathbf{M}_{\lambda}$, the TMP Landscape, we use $\lambda$ in the time direction, $t=1,\dots,T$. For zigzag persistence, we have $2T-1$ threshold steps. Hence, by taking $q=2T-1$, we obtain a vector $\vec{\lambda}(\widetilde{\mathcal{G}})$ of length $4T-3$.

For the other (multipersistence) direction, by using a filtering function $f:\mathcal{V}_{t}\to\mathbb{R}$ with the threshold set $\mathcal{I}=\{\alpha_{j}\}_{1}^{m}$, we obtain the TMP Landscape $\mathbf{M}_{\lambda}$ as follows: $\mathbf{M}_{\lambda}^{j}=\vec{\lambda}(\widetilde{\mathcal{G}}^{j})$, where $\mathbf{M}_{\lambda}^{j}$ represents the $j^{th}$ row of the $2D$ vector $\mathbf{M}_{\lambda}$. Here, $\widetilde{\mathcal{G}}^{j}=\{\mathcal{G}_{t}^{j}\}_{t=1}^{T}$ is induced by the sublevel filtration for $f:\mathcal{V}_{t}\to\mathbb{R}$ as described above, i.e., $\mathcal{G}^{j}_{t}$ is the subgraph induced by $\mathcal{V}^{j}_{t}=\{v_{r}\in\mathcal{V}_{t}\mid f(v_{r})\leq\alpha_{j}\}$.

Hence, for time-dependent data $\widetilde{\mathcal{G}}=\{\mathcal{G}_{t}\}_{t=1}^{T}$, the TMP Landscape $\mathbf{M}_{\lambda}(\widetilde{\mathcal{G}})$ is a $2D$ vector of size $m\times(4T-3)$, where $T$ is the number of time steps.

TMP Persistence Images

The next SP vectorization on our list is the Persistence Image (Adams et al. 2017). Unlike most SP vectorizations, Persistence Images produce $2D$ vectors. The idea is to capture the location of the points in the persistence diagram with a multivariable function by using $2D$ Gaussian functions centered at these points. For $PD(\mathcal{G})=\{(b_{i},d_{i})\}$, let $\phi_{i}$ represent a $2D$ Gaussian centered at the point $(b_{i},d_{i})\in\mathbb{R}^{2}$. Then, one defines a multivariable function, the Persistence Surface, $\widetilde{\mu}=\sum_{i}w_{i}\phi_{i}$, where $w_{i}$ is a weight, usually a function of the life span $d_{i}-b_{i}$. To represent this multivariable function as a $2D$ vector, one defines a $k\times l$ grid (resolution size) on the domain of $\widetilde{\mu}$, i.e., the threshold domain of $PD(\mathcal{G})$. Then, one obtains the Persistence Image, a $2D$ vector $\vec{\mu}=[\mu_{rs}]$ of size $k\times l$, where $\mu_{rs}=\int_{\Delta_{rs}}\widetilde{\mu}(x,y)\,dx\,dy$ and $\Delta_{rs}$ is the corresponding pixel (rectangle) in the $k\times l$ grid.

Following a similar route, for our TMP vectorization, we use time as one direction and the filtering function in the other direction, i.e., $f:\mathcal{V}_{t}\to\mathbb{R}$ with threshold set $\mathcal{I}=\{\alpha_{j}\}_{1}^{m}$. Then, for time-dependent data $\widetilde{\mathcal{G}}=\{\mathcal{G}_{t}\}_{t=1}^{T}$, in the time direction we use zigzag PDs and their Persistence Images. Hence, for each $1\leq j\leq m$, we define the TMP Persistence Image via $\mathbf{M}_{\mu}^{j}(\widetilde{\mathcal{G}})=\vec{\mu}(\widetilde{\mathcal{G}}^{j})$, where the $2D$ vector $\mathbf{M}_{\mu}^{j}$ is the $j^{th}$ slice of the $3D$ vector $\mathbf{M}_{\mu}$. Then, the TMP Persistence Image $\mathbf{M}_{\mu}(\widetilde{\mathcal{G}})$ is a $3D$ vector of size $m\times k\times l$.
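A minimal sketch of the underlying Persistence Image computation is shown below; the grid range, bandwidth, and lifespan weight are illustrative choices, and the pixel integral is approximated by the value of the Persistence Surface at the pixel center.

```python
# Minimal sketch of a persistence image: a lifespan-weighted sum of 2D Gaussians centered
# at the PD points, discretized on a k x l grid. The grid range, bandwidth sigma, and the
# approximation of the pixel integral by the value at the pixel center are illustrative.
import numpy as np

def persistence_image(pd, k=8, l=8, sigma=0.1, xlim=(0.0, 1.0), ylim=(0.0, 1.0)):
    xs = np.linspace(*xlim, k)
    ys = np.linspace(*ylim, l)
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    pixel_area = (xs[1] - xs[0]) * (ys[1] - ys[0])
    img = np.zeros((k, l))
    for b, d in pd:
        w = d - b                                       # weight w_i: life span of the feature
        g = np.exp(-((X - b) ** 2 + (Y - d) ** 2) / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
        img += w * g
    return img * pixel_area                             # the 2D vector mu = [mu_rs]

print(persistence_image([(0.2, 0.6), (0.3, 0.9)]).shape)   # (8, 8)
```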

More details for TMP Persistence Surfaces and TMP Silhouettes are provided in Section D.1.

4.4 Stability of TMP Vectorizations

We now prove that if the source single parameter vectorization $\varphi$ is stable, then so is its induced TMP vectorization $\mathbf{M}_{\varphi}$. We discuss the details of the stability notion in persistence theory and examples of stable SP vectorizations in Section C.2.

Let $\widetilde{\mathcal{G}}=\{\mathcal{G}_{t}\}_{t=1}^{T}$ and $\widetilde{\mathcal{H}}=\{\mathcal{H}_{t}\}_{t=1}^{T}$ be two time sequences of networks. Let $\varphi$ be a stable SP vectorization with the stability inequality

\mathrm{d}(\varphi(\widetilde{\mathcal{G}}),\varphi(\widetilde{\mathcal{H}}))\leq C_{\varphi}\cdot\mathcal{W}_{p_{\varphi}}(PD(\widetilde{\mathcal{G}}),PD(\widetilde{\mathcal{H}}))

for some $1\leq p_{\varphi}\leq\infty$. Here, $\mathcal{W}_{p}$ represents the Wasserstein-$p$ distance as defined in Section C.2.

Now, consider the bifiltrations $\{\widehat{\mathcal{G}}_{t}^{ij}\}$ and $\{\widehat{\mathcal{H}}_{t}^{ij}\}$ for each $1\leq t\leq T$. We define the induced matching distance between the collections of zigzag persistence diagrams (see Remark 2) as $\mathbf{D}(\{ZPD(\widetilde{\mathcal{G}})\},\{ZPD(\widetilde{\mathcal{H}})\})=\max_{i,j}\mathcal{W}_{p_{\varphi}}(ZPD(\widetilde{\mathcal{G}}^{ij}),ZPD(\widetilde{\mathcal{H}}^{ij}))$.

Now, define the distance between TMP vectorizations as $\mathbf{D}(\mathbf{M}_{\varphi}(\widetilde{\mathcal{G}}),\mathbf{M}_{\varphi}(\widetilde{\mathcal{H}}))=\max_{i,j}\mathrm{d}(\varphi(\widetilde{\mathcal{G}}^{ij}),\varphi(\widetilde{\mathcal{H}}^{ij}))$.

Theorem 1.

Let $\varphi$ be a stable vectorization for single parameter PDs. Then, the induced TMP vectorization $\mathbf{M}_{\varphi}$ is also stable, i.e., with the notation above, there exists $\widehat{C}_{\varphi}>0$ such that for any pair of time-aware network sequences $\widetilde{\mathcal{G}}$ and $\widetilde{\mathcal{H}}$, we have the following inequality:

\mathbf{D}(\mathbf{M}_{\varphi}(\widetilde{\mathcal{G}}),\mathbf{M}_{\varphi}(\widetilde{\mathcal{H}}))\leq\widehat{C}_{\varphi}\cdot\mathbf{D}(\{ZPD(\widetilde{\mathcal{G}})\},\{ZPD(\widetilde{\mathcal{H}})\})

The proof of the theorem is given in Appendix E.
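For intuition, the inequality follows by applying the stability of $\varphi$ cell-wise over the grid and taking maxima (with $\widehat{C}_{\varphi}=C_{\varphi}$); this is only a sketch, and the full argument is in Appendix E:

\mathbf{D}(\mathbf{M}_{\varphi}(\widetilde{\mathcal{G}}),\mathbf{M}_{\varphi}(\widetilde{\mathcal{H}}))=\max_{i,j}\mathrm{d}(\varphi(\widetilde{\mathcal{G}}^{ij}),\varphi(\widetilde{\mathcal{H}}^{ij}))\leq\max_{i,j}C_{\varphi}\cdot\mathcal{W}_{p_{\varphi}}(ZPD(\widetilde{\mathcal{G}}^{ij}),ZPD(\widetilde{\mathcal{H}}^{ij}))=C_{\varphi}\cdot\mathbf{D}(\{ZPD(\widetilde{\mathcal{G}})\},\{ZPD(\widetilde{\mathcal{H}})\}).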

5 TMP-Nets

To fully take advantage of the signatures extracted by TMP vectorizations, we propose a GNN-based module to track and learn significant temporal and topological patterns. Our Time-Aware Multiparameter Persistence Nets (TMP-Nets) capture spatio-temporal relationships via trainable node embedding dictionaries in a GDL-based framework.

Graph Convolution on Adaptive Adjacency Matrix

To model the hidden dependencies among nodes in the spatio-temporal graph, we define the spatial graph convolution operation based on an adaptive adjacency matrix and a given node feature matrix. Inspired by (Wu et al. 2019), to investigate the beyond-pairwise relations among nodes, we use an adaptive adjacency matrix based on trainable node embedding dictionaries, i.e., $Z^{(\ell)}_{t,\text{Spatial}}=LZ_{t,\text{Spatial}}^{(\ell-1)}W^{(\ell-1)}$, where $L=\text{Softmax}(\text{ReLU}(E_{\theta}E^{\top}_{\theta}))$ (here $E_{\theta}\in\mathbb{R}^{N\times d_{c}}$ and $d_{c}\geq 1$), $Z_{t,\text{Spatial}}^{(\ell-1)}$ and $Z_{t,\text{Spatial}}^{(\ell)}$ are the input and output of the $\ell$-th layer, $Z_{t,\text{Spatial}}^{(0)}=X\in\mathbb{R}^{N\times F}$ (here $F$ represents the number of features per node), and $W^{(\ell-1)}$ are the trainable weights.
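A hypothetical PyTorch sketch of this adaptive-adjacency convolution is given below; sizes and initializations are illustrative, not the exact TMP-Nets settings.

```python
# Hypothetical PyTorch sketch of the adaptive-adjacency graph convolution described above:
# L = Softmax(ReLU(E E^T)) from a trainable node-embedding dictionary E, then Z <- L Z W.
# Sizes and initializations are illustrative, not the exact TMP-Nets settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveGraphConv(nn.Module):
    def __init__(self, num_nodes, in_dim, out_dim, emb_dim=10):
        super().__init__()
        self.E = nn.Parameter(torch.randn(num_nodes, emb_dim))    # node embedding dictionary E_theta
        self.W = nn.Parameter(torch.randn(in_dim, out_dim) * 0.01)

    def forward(self, Z):                          # Z: (num_nodes, in_dim), e.g. Z^{(l-1)}_{t,Spatial}
        L = F.softmax(F.relu(self.E @ self.E.t()), dim=-1)         # adaptive adjacency matrix
        return L @ Z @ self.W                      # Z^{(l)}_{t,Spatial}

layer = AdaptiveGraphConv(num_nodes=207, in_dim=2, out_dim=16)
print(layer(torch.randn(207, 2)).shape)            # torch.Size([207, 16])
```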

Topological Signatures Representation Learning

In our experiments, we use a CNN-based model to learn the TMP topological features. Given the TMP topological features of resolution $p$, i.e., $\text{TMP}_{t}\in\mathbb{R}^{p\times p}$, we employ a CNN-based model and global max pooling to obtain the image-level local topological feature $Z_{t,\text{TMP}}$ as

Z_{t,\text{TMP}}=f_{\text{GMP}}(f_{\theta}(\text{TMP}_{t})),

where $f_{\text{GMP}}$ is global max pooling, $f_{\theta}$ is a CNN-based neural network with parameter set $\theta$, and $Z_{t,\text{TMP}}\in\mathbb{R}^{d_{c}}$ is the output TMP representation.

Lastly, we combine the two embeddings to obtain the final embedding $Z_{t}$:

Z_{t}=\text{Concat}(Z_{t,\text{Spatial}},Z_{t,\text{TMP}}).

To capture both spatial and temporal correlations in the time series, we feed the final embedding $Z_{t}$ into Gated Recurrent Units (GRUs) to forecast future time points.
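The remaining components can be sketched in the same hypothetical PyTorch style: a small CNN with global max pooling over the TMP feature image, concatenation with the spatial embedding, and a GRU over the time dimension; all sizes are illustrative placeholders.

```python
# Hypothetical PyTorch sketch of the remaining TMP-Nets components: a small CNN with global
# max pooling over the TMP feature image, concatenation with the spatial embedding, and a
# GRU over the time dimension. All sizes are illustrative placeholders.
import torch
import torch.nn as nn

class TMPBranch(nn.Module):
    def __init__(self, d_c=16):
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(1, d_c, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(d_c, d_c, 3, padding=1), nn.ReLU())

    def forward(self, tmp_t):                          # tmp_t: (p, p) TMP feature image
        h = self.cnn(tmp_t.unsqueeze(0).unsqueeze(0))  # (1, d_c, p, p)
        return h.amax(dim=(-1, -2)).squeeze(0)         # global max pooling -> Z_{t,TMP}: (d_c,)

T, N, d_c = 12, 207, 16
tmp_branch = TMPBranch(d_c)
gru = nn.GRU(input_size=2 * d_c, hidden_size=64, batch_first=True)

Z = []
for t in range(T):
    z_spatial = torch.randn(N, d_c)                        # stand-in for Z^{(l)}_{t,Spatial}
    z_tmp = tmp_branch(torch.randn(8, 8)).expand(N, d_c)   # broadcast Z_{t,TMP} to all nodes
    Z.append(torch.cat([z_spatial, z_tmp], dim=-1))        # Z_t = Concat(Z_{t,Spatial}, Z_{t,TMP})

out, _ = gru(torch.stack(Z, dim=1))                        # hidden states over time, per node
print(out.shape)                                           # torch.Size([207, 12, 64])
```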

6 Experiments

Datasets:

We consider three types of data: two widely used benchmark datasets on California (CA) traffic (Chen et al. 2001) and electrocardiography (ECG5000) (Chen et al. 2015a), and newly emerged data on Ethereum blockchain tokens (Shamsi et al. 2022). (Results on ECG5000 are presented in Section A.4.) More detailed descriptions of the datasets can be found in Section B.1.

6.1 Experimental Results

We compare our TMP-Nets with 6 state-of-the-art baselines. We use three standard performance metrics: Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and Mean Absolute Percentage Error (MAPE). We provide additional experimental results in Appendix A. In Appendix B, we provide further details on the experimental setup and empirical evaluation. Our source code is available at https://www.dropbox.com/sh/h28f1cf98t9xmzj/AACBavvHc_ctCB1FVQNyf-XRa?dl=0.

Model | Bytom | Decentraland | Golem
DCRNN (Li et al. 2018) | 35.36±1.18 | 27.69±1.77 | 23.15±1.91
STGCN (Yu, Yin, and Zhu 2018) | 37.33±1.06 | 28.22±1.69 | 23.68±2.31
GraphWaveNet (Wu et al. 2019) | 39.18±0.96 | 37.67±1.76 | 28.89±2.34
AGCRN (Bai et al. 2020) | 34.46±1.37 | 26.75±1.51 | 22.83±1.91
Z-GCNETs (Chen, Segovia, and Gel 2021) | 31.04±0.78 | 23.81±2.43 | 22.32±1.42
StemGNN (Cao et al. 2020) | 34.91±1.04 | 28.37±1.96 | 22.50±2.01
TMP-Nets | 28.77±3.30 | 22.97±1.80 | 29.01±1.05
Table 1: Experimental results (MAPE with standard deviation) on Bytom, Decentraland, and Golem.

Results on Blockchain Datasets: Table 1 shows performance on Bytom, Decentraland, and Golem. It suggests the following: (i) TMP-Nets achieves the best performance on Bytom and Decentraland, with relative gains over the best baseline (i.e., Z-GCNETs) of 7.89% and 3.66% on Bytom and Decentraland, respectively; (ii) compared with Z-GCNETs, the size of the TMP topological features used in this work is much smaller than that of the zigzag persistence image utilized in Z-GCNETs.

An interesting question is why TMP-Nets performs differently on Golem vs. Bytom and Decentraland. Success on each network token depends on the diversity of connections among nodes. In cryptocurrency networks, we expect nodes/addresses to be connected both with other nodes of similar transaction connectivity (e.g., interactions among whales) and with nodes of low connectivity (e.g., terminal nodes). However, the assortativity of Golem (-0.47) is considerably lower than that of Bytom (-0.42) and Decentraland (-0.35), leading to disassortativity patterns (i.e., repetitive isolated clusters) in the Golem network, which, in turn, downgrade the forecasting success rate.

Model | PeMSD4 (MAE / RMSE / MAPE %) | PeMSD8 (MAE / RMSE / MAPE %)
AGCRN | 110.36±0.20 / 150.37±0.15 / 208.36±0.20 | 87.12±0.25 / 109.20±0.33 / 277.44±0.26
Z-GCNETs | 112.65±0.12 / 153.47±0.17 / 206.09±0.33 | 69.82±0.16 / 95.83±0.37 / 102.74±0.53
StemGNN | 112.83±0.07 / 150.22±0.30 / 209.52±0.51 | 65.16±0.36 / 89.60±0.60 / 108.71±0.51
TMP-Nets | 108.38±0.10 / 147.57±0.23 / 208.66±0.27 | 59.82±0.82 / 85.86±0.64 / 109.88±0.65
Table 2: Forecasting performance on the first 1,000 networks of the PeMSD4 and PeMSD8 benchmark datasets.
Model | PeMSD4 (MAE / RMSE / MAPE %) | PeMSD8 (MAE / RMSE / MAPE %)
AGCRN | 90.36±0.10 / 122.61±0.13 / 176.90±0.35 | 55.20±0.19 / 83.01±0.53 / 167.39±0.25
Z-GCNETs | 89.57±0.11 / 117.94±0.15 / 180.11±0.26 | 47.11±0.20 / 80.25±0.24 / 98.15±0.33
StemGNN | 93.27±0.16 / 131.49±0.21 / 189.18±0.30 | 53.86±0.39 / 82.00±0.52 / 97.78±0.30
TMP-Nets | 85.15±0.12 / 115.00±0.16 / 170.97±0.22 | 50.20±0.37 / 80.17±0.26 / 100.31±0.58
Table 3: Forecasting performance on the first 2,000 networks of the PeMSD4 and PeMSD8 benchmark datasets.

Results on Traffic Datasets: For the traffic flow data PeMSD4 and PeMSD8, we evaluate performance on varying sequence lengths. This allows us to further explore the learning capabilities of our TMP-Nets as a function of sample size. In particular, in many real-world scenarios, there exists only a limited number of temporal records to be used in the training stage, and the learning problem with lower sample sizes becomes substantially more challenging. Tables 2 and 3 show that under the scenario of limited data records for both PeMSD4 and PeMSD8 (i.e., $\mathcal{T}=1{,}000$ and $\mathcal{T}^{\prime}=2{,}000$), our TMP-Nets always outperforms the three representative baselines in MAE and RMSE. For example, TMP-Nets significantly outperforms the SOTA baselines, achieving relative gains of 1.79% and 4.36% in RMSE on PeMSD4 (T=1,000) and PeMSD8 (T=1,000), respectively. Overall, the results demonstrate that our proposed TMP-Nets can accurately capture the hidden complex spatial and temporal correlations in correlated time series datasets and achieve promising forecasting performance under the scenario of limited data records. Moreover, we conduct experiments on the whole PeMSD4 and PeMSD8 datasets. As Table 6 (Appendix) indicates, our TMP-Nets still achieves competitive performance on both datasets.

Finally, we applied our approach in a different domain with a benchmark electrocardiogram dataset, ECG5000. Again, our model gives highly competitive results with the SOTA methods (Section A.4).

Ablation Studies:

To better evaluate the importance of different components of TMP-Nets, we perform ablation studies on the two traffic datasets, i.e., PeMSD4 and PeMSD8, by using only (i) $Z^{(\ell)}_{t,\text{Spatial}}$ or (ii) $Z_{t,\text{TMP}}$ as input. Table 4 reports the forecasting performance of (i) $Z^{(\ell)}_{t,\text{Spatial}}$, (ii) $Z_{t,\text{TMP}}$, and (iii) TMP-Nets (our proposed model). We find that our TMP-Nets outperforms both $Z^{(\ell)}_{t,\text{Spatial}}$ and $Z_{t,\text{TMP}}$ on both datasets, yielding highly statistically significant gains. Hence, we can conclude that (i) TMP vectorizations help to better capture global and local hidden topological information in the time dimension, and (ii) the spatial graph convolution operation accurately learns the inter-dependencies (i.e., spatial correlations) among spatio-temporal graphs. We provide further ablation studies comparing the effect of the slicing direction and the MP vectorization methods in Section A.2.

Model | PeMSD4 | PeMSD8
TMP-Nets | 147.57±0.23 | 85.86±0.64
$Z_{t,\text{TMP}}$ | 165.67±0.30 | 90.23±0.15
$Z^{(\ell)}_{t,\text{Spatial}}$ | 153.75±0.22 | 88.38±1.05
Table 4: Ablation study on PeMSD4 and PeMSD8 (RMSE results for the first 1,000 networks).

Computational Complexity:

One of the key issues preventing MP from propagating widely into practice is its high computational cost. Our method improves on the state-of-the-art MP summaries (ranging from 23.8 to 59.5 times faster than the Multiparameter Persistence Image (MP-I) (Carrière and Blumberg 2020), and from 1.2 to 8.6 times faster than the Multiparameter Persistence Kernel (MP-L) (Corbet et al. 2019)) and, armed with a computationally fast vectorization method (e.g., the Betti summary (Lesnick and Wright 2022)), TMP yields competitive computational costs for a lower number of filtering functions (see Section A.3). Nevertheless, scaling to very large-scale problems remains a challenge. In the future, we will explore TMP constructed only on landmark points, that is, TMP will be built not on all nodes but on the most important landmark nodes, which would lead to substantial sparsification of the graph representation.

Comparison with Other Topological GNN Models for Dynamic Networks:

The two existing time-aware topological GNNs for dynamic networks are TAMP-S2GCNets (Chen et al. 2021) and Z-GCNETs (Chen, Segovia, and Gel 2021). The pivotal distinction between our model and these counterparts is that our model serves as a comprehensive extension of both, applicable across diverse data types encompassing point clouds and images (see Section D.3). Z-GCNETs employs a single persistence approach, rendering it unsuitable for datasets that involve two or more significant domain functions. In contrast, TAMP-S2GCNets employs multipersistence; however, its Euler-Characteristic surface vectorization fails to encapsulate the lifespan information present in persistence diagrams. Notably, in scenarios involving sparse data, barcodes with longer lifespans signify the main data characteristics, while short barcodes are considered topological noise. The limitation of Euler-Characteristic surfaces, being simply a linear combination of bigraded Betti numbers, lies in their inability to capture this distinction. In stark contrast, our framework encompasses all forms of vectorizations, permitting practitioners to choose their preferred vectorization technique while adapting it to dynamic networks or time-dependent data comprehensively. For instance, compared to the TAMP-S2GCNets model, our TMP-Nets achieves better performance on the Bytom dataset, i.e., TMP-Nets (MAPE: 28.77±3.30) vs. TAMP-S2GCNets (MAPE: 29.26±1.06). Furthermore, from the computational time perspective, the average computation times of TMP and the Dynamic Euler-Poincaré Surface (used in the TAMP-S2GCNets model) are 1.85 seconds and 38.99 seconds, respectively, i.e., our TMP is more efficient.

7 Discussion

We have proposed a new highly computationally efficient summary for multidimensional persistence for time-dependent objects, Temporal MultiPersistence (TMP). By successfully combining the latest TDA methods with deep learning tools, our TMP approach outperforms many popular state-of-the-art deep learning models in a consistent and unified manner. Further, we have shown that TMP enjoys important theoretical stability guarantees. As such, TMP makes an important step toward bringing the theoretical concepts of multipersistence from pure mathematics to the machine learning community and to the practical problems of time-aware learning of time-conditioned objects, such as dynamic graphs, time series, and spatio-temporal processes.

Still, scaling for ultra high-dimensional processes, especially in modern data streaming scenarios, may be infeasible for TMP. In the future, we will investigate algorithms such as those based on landmarks or pruning, with the goal to advance the computational efficiency of TMP for streaming applications.

Acknowledgements

Supported by the NSF grants DMS-2220613, DMS-2229417, ECCS 2039701, TIP-2333703, Simons Foundation grant # 579977, and ONR grant N00014-21-1-2530. Also, the paper is based upon work supported by (while Y.R.G. was serving at) the NSF. The views expressed in the article do not necessarily represent the views of NSF or ONR.

References

  • Adams et al. (2017) Adams, H.; et al. 2017. Persistence images: A stable vector representation of persistent homology. JMLR, 18.
  • Akcora, Gel, and Kantarcioglu (2022) Akcora, C. G.; Gel, Y. R.; and Kantarcioglu, M. 2022. Blockchain networks: Data structures of bitcoin, monero, zcash, ethereum, ripple, and iota. Wiley Int. Reviews: Data Mining and Knowledge Discovery, 12(1): e1436.
  • Aktas, Akbas, and El Fatmaoui (2019) Aktas, M. E.; Akbas, E.; and El Fatmaoui, A. 2019. Persistence homology of networks: methods and applications. Applied Network Science, 4(1): 1–28.
  • Ali et al. (2023) Ali, D.; et al. 2023. A survey of vectorization methods in topological data analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence.
  • Atienza et al. (2020) Atienza, N.; et al. 2020. On the stability of persistent entropy and new summary functions for topological data analysis. Pattern Recognition, 107: 107509.
  • Bai et al. (2020) Bai, L.; Yao, L.; Li, C.; Wang, X.; and Wang, C. 2020. Adaptive Graph Convolutional Recurrent Network for Traffic Forecasting. NeurIPS, 33.
  • Bin et al. (2018) Bin, Y.; et al. 2018. Describing video with attention-based bidirectional LSTM. IEEE transactions on cybernetics, 49(7): 2631–2641.
  • Botnan and Lesnick (2022) Botnan, M. B.; and Lesnick, M. 2022. An introduction to multiparameter persistence. arXiv preprint arXiv:2203.14289.
  • Botnan, Oppermann, and Oudot (2022) Botnan, M. B.; Oppermann, S.; and Oudot, S. 2022. Signed barcodes for multi-parameter persistence via rank decompositions. In SoCG.
  • Bubenik (2015) Bubenik, P. 2015. Statistical Topological Data Analysis using Persistence Landscapes. JMLR, 16(1): 77–102.
  • Cao et al. (2020) Cao, D.; et al. 2020. Spectral Temporal Graph Neural Network for Multivariate Time-series Forecasting. In NeurIPS, volume 33, 17766–17778.
  • Carlsson, De Silva, and Morozov (2009) Carlsson, G.; De Silva, V.; and Morozov, D. 2009. Zigzag persistent homology and real-valued functions. In SoCG.
  • Carlsson and Silva (2010) Carlsson, G.; and Silva, V. 2010. Zigzag Persistence. Found. Comput. Math., 10(4): 367–405.
  • Carrière and Blumberg (2020) Carrière, M.; and Blumberg, A. 2020. Multiparameter persistence image for topological machine learning. NeurIPS.
  • Chazal et al. (2014) Chazal, F.; Fasy, B. T.; Lecci, F.; Rinaldo, A.; and Wasserman, L. 2014. Stochastic convergence of persistence landscapes and silhouettes. In SoCG.
  • Chen et al. (2001) Chen, C.; Petty, K.; Skabardonis, A.; Varaiya, P.; and Jia, Z. 2001. Freeway performance measurement system: mining loop detector data. Transportation Research Record, 1748(1): 96–102.
  • Chen et al. (2015a) Chen, Y.; Keogh, E.; Hu, B.; Begum, N.; Bagnall, A.; Mueen, A.; and Batista, G. 2015a. The UCR time series classification archive.
  • Chen, Segovia, and Gel (2021) Chen, Y.; Segovia, I.; and Gel, Y. R. 2021. Z-GCNETs: time zigzags at graph convolutional networks for time series forecasting. In ICML, 1684–1694. PMLR.
  • Chen et al. (2021) Chen, Y.; Segovia-Dominguez, I.; Coskunuzer, B.; and Gel, Y. 2021. TAMP-S2GCNets: coupling time-aware multipersistence knowledge representation with spatio-supra graph convolutional networks for time-series forecasting. In ICLR.
  • Chen et al. (2015b) Chen, Y.; et al. 2015b. A general framework for never-ending learning from time series streams. Data mining and knowledge discovery, 29: 1622–1664.
  • Chung and Lawson (2022) Chung, Y.-M.; and Lawson, A. 2022. Persistence curves: A canonical framework for summarizing persistence diagrams. Advances in Computational Mathematics, 48(1): 6.
  • Corbet et al. (2019) Corbet, R.; Fugacci, U.; Kerber, M.; Landi, C.; and Wang, B. 2019. A kernel for multi-parameter persistent homology. Computers & graphics: X, 2: 100005.
  • Defferrard, Bresson, and Vandergheynst (2016) Defferrard, M.; Bresson, X.; and Vandergheynst, P. 2016. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In NeurIPS, volume 29, 3844–3852.
  • Dey and Salem (2017) Dey, R.; and Salem, F. M. 2017. Gate-variants of Gated Recurrent Unit (GRU) neural networks. In MWSCAS.
  • Dey and Wang (2022) Dey, T. K.; and Wang, Y. 2022. Computational Topology for Data Analysis. Cambridge University Press.
  • di Angelo and Salzer (2020) di Angelo, M.; and Salzer, G. 2020. Tokens, Types, and Standards: Identification and Utilization in Ethereum. In 2020 IEEE DAPPS, 1–10.
  • Edelsbrunner and Harer (2010) Edelsbrunner, H.; and Harer, J. 2010. Computational topology: an introduction. American Mathematical Soc.
  • Eisenbud (2013) Eisenbud, D. 2013. Commutative algebra: with a view toward algebraic geometry, volume 150. Springer Science & Business Media.
  • Guo et al. (2019) Guo, S.; Lin, Y.; Feng, N.; Song, C.; and Wan, H. 2019. Attention based spatial-temporal graph convolutional networks for traffic flow forecasting. In AAAI.
  • Hofer et al. (2020) Hofer, C.; Graf, F.; Rieck, B.; Niethammer, M.; and Kwitt, R. 2020. Graph filtration learning. In ICML, 4314–4323. PMLR.
  • Jiang and Luo (2022) Jiang, W.; and Luo, J. 2022. Graph neural network for traffic forecasting: A survey. Expert Systems with Applications, 207: 117921.
  • Johnson and Jung (2021) Johnson, M.; and Jung, J.-H. 2021. Instability of the Betti Sequence for Persistent Homology. J. Korean Soc. Ind. and Applied Math., 25(4): 296–311.
  • Kipf and Welling (2017) Kipf, T. N.; and Welling, M. 2017. Semi-supervised classification with graph convolutional networks. ICLR.
  • Kriege, Johansson, and Morris (2020) Kriege, N. M.; Johansson, F. D.; and Morris, C. 2020. A survey on graph kernels. Applied Network Science, 1–42.
  • Lesnick and Wright (2022) Lesnick, M.; and Wright, M. 2022. Computing minimal presentations and bigraded betti numbers of 2-parameter persistent homology. SIAGA.
  • Li et al. (2018) Li, Y.; Yu, R.; Shahabi, C.; and Liu, Y. 2018. Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting. In ICLR.
  • Lipton, Berkowitz, and Elkan (2015) Lipton, Z. C.; Berkowitz, J.; and Elkan, C. 2015. A critical review of recurrent neural networks for sequence learning. arXiv preprint arXiv:1506.00019.
  • Pareja et al. (2020) Pareja, A.; Domeniconi, G.; Chen, J.; Ma, T.; Suzumura, T.; Kanezashi, H.; Kaler, T.; Schardl, T.; and Leiserson, C. 2020. Evolvegcn: Evolving graph convolutional networks for dynamic graphs. In AAAI.
  • Schmidhuber (2017) Schmidhuber, J. 2017. LSTM: Impact on the world’s most valuable public companies. http://people.idsia.ch/~juergen/impact-on-most-valuable-companies.html. Accessed: 2020-03-19.
  • Segovia Dominguez et al. (2021) Segovia Dominguez, I.; et al. 2021. Does Air Quality Really Impact COVID-19 Clinical Severity: Coupling NASA Satellite Datasets with Geometric Deep Learning. KDD.
  • Segovia-Dominguez et al. (2021) Segovia-Dominguez, I.; et al. 2021. TLife-LSTM: Forecasting Future COVID-19 Progression with Topological Signatures of Atmospheric Conditions. In PAKDD (1), 201–212.
  • Shamsi et al. (2022) Shamsi, K.; Victor, F.; Kantarcioglu, M.; Gel, Y.; and Akcora, C. G. 2022. Chartalist: Labeled Graph Datasets for UTXO and Account-based Blockchains. NeurIPS.
  • Shewalkar, Nyavanandi, and Ludwig (2019) Shewalkar, A.; Nyavanandi, D.; and Ludwig, S. A. 2019. Performance evaluation of deep neural networks applied to speech recognition: RNN, LSTM and GRU. J. Artificial Intelligence and Soft Computing Research, 9(4): 235–245.
  • Shin and Kim (2020) Shin, S.; and Kim, W. 2020. Skeleton-Based Dynamic Hand Gesture Recognition Using a Part-Based GRU-RNN for Gesture-Based Interface. IEEE Access, 8: 50236–50243.
  • Sutskever, Vinyals, and Le (2014) Sutskever, I.; Vinyals, O.; and Le, Q. V. 2014. Sequence to Sequence Learning with Neural Networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS’14, 3104–3112. Cambridge, MA, USA: MIT Press.
  • Vassilevska, Williams, and Yuster (2006) Vassilevska, V.; Williams, R.; and Yuster, R. 2006. Finding the Smallest H-Subgraph in Real Weighted Graphs and Related Problems. In Automata, Languages and Programming.
  • Veličković et al. (2018) Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; and Bengio, Y. 2018. Graph attention networks. ICLR.
  • Vipond (2020) Vipond, O. 2020. Multiparameter Persistence Landscapes. J. Mach. Learn. Res., 21: 61–1.
  • Wang, Yang, and Meinel (2018) Wang, C.; Yang, H.; and Meinel, C. 2018. Image Captioning with Deep Bidirectional LSTMs and Multi-Task Learning. ACM Trans. Multimedia Comput. Commun. Appl., 14(2s).
  • Weber et al. (2019) Weber, M.; et al. 2019. Anti-Money Laundering in Bitcoin: Experimenting with GCNs for Financial Forensics. In KDD.
  • Wu et al. (2019) Wu, Z.; Pan, S.; Long, G.; Jiang, J.; and Zhang, C. 2019. Graph WaveNet for Deep Spatial-Temporal Graph Modeling. In IJCAI.
  • Xiang, Yan, and Demir (2020) Xiang, Z.; Yan, J.; and Demir, I. 2020. A rainfall-runoff model with LSTM-based sequence-to-sequence learning. Water resources research, 56(1): e2019WR025326.
  • Yan, Xiong, and Lin (2018) Yan, S.; Xiong, Y.; and Lin, D. 2018. Spatial temporal graph convolutional networks for skeleton-based action recognition. In AAAI.
  • Yao et al. (2018) Yao, H.; Wu, F.; Ke, J.; Tang, X.; Jia, Y.; Lu, S.; Gong, P.; Ye, J.; and Li, Z. 2018. Deep multi-view spatial-temporal network for taxi demand prediction. In AAAI.
  • Yu, Yin, and Zhu (2018) Yu, B.; Yin, H.; and Zhu, Z. 2018. Spatio-temporal GCNs: a deep learning framework for traffic forecasting. In IJCAI.
  • Yu et al. (2019) Yu, Y.; Si, X.; Hu, C.; and Zhang, J. 2019. A Review of Recurrent Neural Networks: Lstm Cells and Network Architectures. Neural Comput., 31(7): 1235–1270.

Appendix

In this part, we give additional details about our experiments and methods. In Appendix A, we provide additional experimental results, including ablation studies and extra baselines. In Appendix B, we discuss the datasets and our experimental setup. In Appendix C, we provide more theoretical background on persistent homology. In Appendix D, we give further examples of TMP vectorizations and generalizations to other types of data; we also discuss fundamental challenges in applying multipersistence theory to spatio-temporal data, and our contributions in this context, in Section D.2. Finally, in Appendix E, we prove the stability of TMP vectorizations. Our notation table (Table 12) can be found at the end of the appendix.

Appendix A Additional Results on Experiments

A.1 Additional Baselines

Contrary to other papers (e.g., (Jiang and Luo 2022)) which consider only a single option of 16,992 (PeMSD4) / 17,856 (PeMSD8) time stamps, we evaluate performance on varying lengths of 1,000 and 2,000 time stamps (see Section 6.3, Experimental Results, for details). This allows us to further explore learning capabilities as a function of sample size and, most importantly, to assess the performance of TMP-Nets and its competitors under a more challenging and much more realistic scenario of limited temporal records. To better highlight the effectiveness of our proposed TMP-Nets model, we compare it with additional baselines: DCRNN (Li et al. 2018), STGCN (Yu, Yin, and Zhu 2018), and GraphWaveNet (Wu et al. 2019). As shown in Table 5, our TMP-Nets is statistically significantly better than DCRNN, STGCN, and GraphWaveNet on the PeMSD4 dataset.

Dataset Model RMSE
PeMSD4 TMP-Nets 147.57±0.23
PeMSD4 DCRNN 153.34±0.55
PeMSD4 STGCN 174.75±0.35
PeMSD4 GraphWaveNet 151.87±0.22
Table 5: Comparison of TMP-Nets and baselines on PeMSD4 (first 1,000 networks).
Model | PeMSD4 (MAE / RMSE / MAPE %) | PeMSD8 (MAE / RMSE / MAPE %)
AGCRN | 19.83 / 32.26 / 12.97% | 15.95 / 25.22 / 10.09%
Z-GCNETs | 19.50 / 31.61 / 12.78% | 15.76 / 25.11 / 10.01%
StemGNN | 20.24 / 32.15 / 10.03% | 15.83 / 24.93 / 9.26%
TMP-Nets | 19.57 / 31.69 / 12.89% | 16.36 / 25.85 / 10.36%
Table 6: Forecasting performance on the whole PeMSD4 and PeMSD8 datasets.

A.2 Further Ablation Studies

Slicing Direction.

To investigate the importance of the time direction, we now consider zigzag persistent homology along the degree axis instead of the time axis. We then conduct comparison experiments between TMP-Nets (i.e., $Z_{t,\text{TMP}}$ is generated along the time axis) and TMPdeg-Nets (i.e., $Z_{t,\text{TMP}}$ is generated along the degree axis instead of time). As Table 7 indicates, TMP-Nets based on the time component outperforms TMPdeg-Nets. These findings are expected, as time is one of the core variables in spatio-temporal processes; hence, we conclude that extracting the zigzag-based topological summary along the time dimension is important for forecasting tasks. Nevertheless, we underline that the TMP idea can also be applied to non-time-varying processes, as long as there exists some alternative natural geometric dimension.

Dataset Model MAPE
Bytom TMP-Nets 28.77±3.30
Bytom TMPdeg-Nets 29.15±4.17
Table 7: Comparison of TMP-Nets and TMPdeg-Nets on Bytom dataset.

Bigraded Betti Numbers vs. TMP.

To assess the effectiveness of $Z_{t,\text{TMP}}$, which combines the time direction with zigzag persistence for spatio-temporal forecasting, we conduct additional experiments on the traffic datasets PeMSD4 and PeMSD8 using (i) TMP-Nets (based on Z-Meta) and (ii) MPBetti-Nets (based on bigraded Betti numbers). Table 8 below shows the results when using bigraded Betti numbers as the source of topological signatures in the ML model. As Table 8 indicates, our TMP-Nets achieves better forecasting accuracy (i.e., lower RMSE) than MPBetti-Nets on both the PeMSD4 and PeMSD8 datasets, and the difference in performance is statistically significant.

Such results can potentially be attributed to the fact that TMP-Nets tends to better capture the most important topological signals by choosing a vectorization method suitable for the task at hand. In particular, MPBetti-Nets only counts the number of topological features but does not give any emphasis to the longer barcodes appearing in the temporal direction; that is, MPBetti-Nets is limited in distinguishing topological signal from topological noise. However, longer barcodes (or the density of short barcodes) in the temporal dimension are typically key to accurately capturing intrinsic topological patterns in spatio-temporal data.

Model PeMSD4 PeMSD8
TMP-Nets 147.57±0.23 85.86±0.64
MPBetti-Nets 151.58±0.19 87.71±0.70
Table 8: TMP-Nets vs. MPBetti-Nets on PeMSD4 and PeMSD8 (RMSE results for first 1,000 networks).
Dataset Dim Betweenness Closeness Degree Power-Tran Power-Volume
Bytom {0,1} 236.95 sec 239.36 sec 237.60 sec 987.90 sec 941.39 sec
Decentraland {0,1} 134.75 sec 138.81 sec 133.82 sec 2007.50 sec 1524.10 sec
Golem {0,1} 571.35 sec 581.36 sec 573.93 sec 4410.47 sec 4663.52 sec
Table 9: Computational time on Ethereum tokens using five different filtering functions.

A.3 Computational Time on Different Filtering Functions

As expected, Table 9 shows that the computational time depends strongly on the complexity of the selected filtering function. However, even in the worst case, computing the TMP vectorizations takes less than two hours, which keeps our approach practical for ML tasks.

A.4 Experiments on ECG5000 Benchmark Dataset

To demonstrate that our methodology applies to other dynamic networks, we run additional experiments on the ECG5000 dataset (Chen et al. 2015b). This benchmark dataset contains 140 nodes and 5,000 time stamps. When running our methodology, we extract patterns via Betti, Silhouette, and Entropy vectorizations, set the window size to 12, and set the forecasting step to 3. Following preliminary cross-validation experiments, we set the resolution to 50 and use a quantile-based selection of thresholds. We perform an edge-weight filtration on graphs created from a correlation matrix (a minimal sketch of this construction is given after Table 10). In our experiments, we found no significant difference between the results based on Betti and Silhouette vectorizations; in Table 10, we therefore only report the results of TMP-Nets based on the Silhouette vectorization. From Table 10, we find that our TMP-Nets either performs on par with or outperforms the state-of-the-art baselines (with a smaller standard deviation). Note that ECG5000 is a small dataset (i.e., 140 nodes only), and as such, the differences among models cannot be expected to be large for such a small network.

Dataset Model RMSE
ECG5000 TMP-Nets 0.52±0.005
ECG5000 StemGNN 0.52±0.006
ECG5000 AGCRN 0.52±0.008
ECG5000 DCRNN 0.55±0.005
Table 10: Comparison of TMP-Nets and baselines on ECG5000.
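To make the graph-construction step above concrete, the following minimal sketch (in Python, with our own function and parameter names; the released pipeline may differ) builds a weighted correlation graph from one window of multivariate signals and drops edges whose weight falls below a quantile-based threshold:

import numpy as np
import networkx as nx

def correlation_graph(X, quantile=0.75):
    # X is a (num_nodes x window_length) array; edge weights are absolute
    # Pearson correlations, and edges below the chosen quantile are dropped
    # (edge-weight filtration at a single threshold).
    corr = np.abs(np.corrcoef(X))
    n = corr.shape[0]
    tau = np.quantile(corr[np.triu_indices(n, k=1)], quantile)
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] >= tau:
                G.add_edge(i, j, weight=corr[i, j])
    return G

# toy usage: 140 channels, one sliding window of 12 time points
X = np.random.rand(140, 12)
G = correlation_graph(X, quantile=0.75)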

A.5 Connectivity in Ethereum Networks

An interesting question is why TMP-Nets performs differently on Golem vs. Bytom and Decentraland. Success on each token network depends on the diversity of connections among nodes. In cryptocurrency networks, we expect nodes/addresses to be connected both with nodes of similar transaction connectivity (e.g., interactions among whales) and with nodes of low connectivity (e.g., terminal nodes). However, the assortativity of Golem (-0.47) is considerably lower than that of Bytom (-0.42) and Decentraland (-0.35), leading to stronger disassortativity patterns (i.e., repetitive isolated clusters) in the Golem network, which in turn degrade forecasting performance. A minimal sketch of how such network statistics can be computed follows Table 11.

Token Degree Betweenness Density Assortativity
Bytom 0.1995789 0.0002146 0.0020159 -0.4276000
Decentraland 0.3387378 0.0004677 0.0034215 -0.3589580
Golem 0.3354401 0.0004175 0.0033882 -0.4731063
Table 11: Comparison of statistics on Ethereum token networks.
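The statistics of Table 11 can be reproduced in spirit with standard networkx measures; the sketch below is ours, and the exact normalizations used in the paper may differ:

import networkx as nx

def token_statistics(G):
    # Standard networkx stand-ins for the statistics reported in Table 11;
    # the paper's exact normalizations may differ.
    n = max(G.number_of_nodes(), 1)
    return {
        "degree": sum(nx.degree_centrality(G).values()) / n,
        "betweenness": sum(nx.betweenness_centrality(G).values()) / n,
        "density": nx.density(G),
        "assortativity": nx.degree_assortativity_coefficient(G),
    }

# toy usage on a random graph standing in for one daily token network
print(token_statistics(nx.gnp_random_graph(100, 0.05, seed=0)))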

Appendix B Further Details on Experimental Setup

B.1 Datasets

CA Traffic.

We consider two traffic flow datasets, PeMSD4 and PeMSD8, collected in California from January 1, 2018, to February 28, 2018, and from July 1, 2016, to August 31, 2016, respectively. Both PeMSD4 and PeMSD8 are aggregated into 5-minute intervals, i.e., there are 12 time points per hour in the flow data. Following the settings of (Guo et al. 2019), we split the traffic datasets with a ratio of 6:2:2 into training, validation, and test sets; furthermore, in our experiments, we evaluate our TMP-Nets and the baselines on the two traffic flow datasets with varying sequence lengths, i.e., $\mathcal{T}=1,000$ (first 1,000 networks of the whole dataset) and $\mathcal{T}^{\prime}=2,000$ (first 2,000 networks of the whole dataset).

Electrocardiogram.

We use the electrocardiogram dataset ECG5000 (i.e., 5,000 sequences) from the UCR time series archive (Chen et al. 2015a), where each time series has length 140.

Ethereum blockchain tokens.

We use three token networks from the Ethereum blockchain (Bytom, Decentraland, and Golem), each with more than $100M in market value (https://EtherScan.io). These dynamic networks are composed of user addresses, i.e., nodes, and daily transactions among users, i.e., edges (di Angelo and Salzer 2020; Akcora, Gel, and Kantarcioglu 2022). Since the original token networks have, on average, 442,788 nodes and 1,192,722 edges, we compute a subgraph via a maximum weight subgraph approximation (Vassilevska, Williams, and Yuster 2006) using the transaction amounts as weights. The dynamic networks contain different numbers of snapshots, since each token was created on a different day: Bytom (285), Decentraland (206), and Golem (443). Hence, given the dynamic network $\mathcal{G}_{t}=\{\mathcal{V}_{t},\mathcal{E}_{t},\tilde{W}_{t}\}$ and its corresponding node feature matrix $X_{t}\in\mathbb{R}^{N\times F}$, where $F$ represents the number of features, we test our algorithm with both node and edge features and use the set of the most active nodes, i.e., $N=100$. A minimal sketch of this node-selection step is given below.
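As a rough illustration of the node-selection step (not the exact maximum weight subgraph approximation of Vassilevska, Williams, and Yuster 2006), the sketch below keeps the $N=100$ addresses with the largest total transaction amount in a daily snapshot:

import networkx as nx

def most_active_subgraph(G, n_keep=100, weight="amount"):
    # Keep the n_keep addresses with the largest total transaction amount
    # (weighted degree) and return the induced daily subgraph.
    strength = dict(G.degree(weight=weight))
    top_nodes = sorted(strength, key=strength.get, reverse=True)[:n_keep]
    return G.subgraph(top_nodes).copy()

# toy usage
G = nx.Graph()
G.add_edge("a", "b", amount=5.0)
G.add_edge("b", "c", amount=1.0)
H = most_active_subgraph(G, n_keep=2)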

B.2 Experimental Setup

We implement our TMP-Nets with the PyTorch framework on an NVIDIA GeForce RTX 3090 GPU. For all datasets, TMP-Nets is trained end-to-end using the Adam optimizer with an L1 loss. For the Ethereum blockchain token networks, we use the Adam optimizer with weight decay, initial learning rate, batch size, and number of epochs set to 0, 0.01, 8, and 80, respectively. For the traffic datasets, we use the Adam optimizer with weight decay, initial learning rate, batch size, and number of epochs set to 0.3, 0.003, 64, and 350, respectively (where the learning rate is reduced every 10 epochs after epoch 110). In our experiments, we compare against six state-of-the-art methods: DCRNN (Li et al. 2018), STGCN (Yu, Yin, and Zhu 2018), GraphWaveNet (Wu et al. 2019), AGCRN (Bai et al. 2020), Z-GCNETs (Chen, Segovia, and Gel 2021), and StemGNN (Cao et al. 2020). We search the hidden feature dimension of the CNN-based model for TMP representation learning over $\{16,32,64,128\}$ and the embedding dimension over $\{1,2,3,5,10\}$. The resolution of TMP is 50 for all three datasets (i.e., the input TMP has shape $50\times 50$). The tuning of our proposed TMP-Nets on each dataset is done via grid search over a fixed set of choices, and the same cross-validation setup is used to tune the above baselines. For both PeMSD4 and PeMSD8, we consider the first 1,000 and 2,000 timestamps, which allows us to assess the performance of TMP-Nets and its competitors under a more challenging and more realistic scenario of limited temporal records. For all methods (including our TMP-Nets and the baselines), we run 5 times on the same partition and report the average accuracy along with the standard deviation. A minimal sketch of the optimizer configuration is given below.
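The reported optimizer settings for the traffic datasets translate into standard PyTorch calls; the sketch below is ours, the model is only a placeholder, and the learning-rate decay factor is an assumption since it is not reported:

import torch

# "model" is a placeholder standing in for TMP-Nets; only the optimizer
# settings mirror the reported traffic-dataset hyperparameters.
model = torch.nn.Linear(50 * 50, 12)
criterion = torch.nn.L1Loss()                   # L1 training loss
optimizer = torch.optim.Adam(model.parameters(),
                             lr=0.003,          # initial learning rate
                             weight_decay=0.3)  # weight decay
# Learning-rate decay every 10 epochs (applied after epoch 110 during training);
# the decay factor gamma=0.5 is our placeholder, as it is not reported.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)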

Filtering Functions and Thresholds.

We select five filtering functions capturing different graph properties: three node sublevel filtrations (degree, closeness, and betweenness) and two power filtrations on edges (transaction and volume). The thresholds are chosen as equally spaced quantiles of either the function values (node sublevel filtration) or the geodesic distances (power filtration). As a result, the number of thresholds depends on the desired resolution of the MP grid. A resolution that is too low does not provide sufficient topological information for graph learning tasks, whilst a resolution that is too high unnecessarily increases the computational cost. Based on our cross-validation experiments, we found that 50 thresholds is a reasonable rule of thumb that works well in most studies. A minimal sketch of the quantile-based threshold selection is given below.
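A minimal sketch of the quantile-based threshold selection and the induced node sublevel sets, assuming node degree as the filtering function and networkx graphs:

import numpy as np
import networkx as nx

def quantile_thresholds(values, resolution=50):
    # Equally spaced quantiles of the filtering-function values, used as
    # the threshold set {alpha_j} of the spatial filtration.
    return np.quantile(np.asarray(list(values), dtype=float),
                       np.linspace(0.0, 1.0, resolution))

# node sublevel filtration by degree on a toy graph
G = nx.karate_club_graph()
alphas = quantile_thresholds((d for _, d in G.degree()), resolution=50)
sublevel_sets = [G.subgraph([v for v, d in G.degree() if d <= a]) for a in alphas]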

Prediction Horizon.

For the Ethereum blockchain token networks (i.e., Bytom, Decentraland, and Golem), following (Chen, Segovia, and Gel 2021), we set the forecasting step to 7 and the sliding window size to 7. For the traffic datasets (PeMSD4 and PeMSD8), following (Guo et al. 2019), we set the forecasting step to 5 and the sliding window size to 12.

Appendix C More on Persistent Homology

C.1 Zigzag Persistent Homology

While the notion of zigzag persistence is general, to keep the exposition simple, we restrict ourselves to dynamic networks. For a given dynamic network $\widetilde{\mathcal{G}}=\{\mathcal{G}_{t}\}_{1}^{T}$, zigzag persistence detects pairwise compatible topological features in this time-ordered sequence of networks. While in single PH the inclusions always go in the same direction (forward or backward), zigzag persistence and, more generally, the Mayer–Vietoris Diamond Principle allow us to consider the evolution of topological features simultaneously in multiple directions (Carlsson, De Silva, and Morozov 2009). In particular, define a set of network inclusions over time

\mathcal{G}_{1}\hookrightarrow\mathcal{G}_{1}\cup\mathcal{G}_{2}\hookleftarrow\mathcal{G}_{2}\hookrightarrow\mathcal{G}_{2}\cup\mathcal{G}_{3}\hookleftarrow\mathcal{G}_{3}\hookrightarrow\dots,

where $\mathcal{G}_{k}\cup\mathcal{G}_{k+1}$ is defined as the graph with node set $V_{k}\cup V_{k+1}$ and edge set $E_{k}\cup E_{k+1}$.

Then, as before, by passing to the clique complexes of $\mathcal{G}_{i}$ and $\mathcal{G}_{i}\cup\mathcal{G}_{i+1}$, we obtain an "extended" zigzag filtration induced by the dynamic network $\{\mathcal{G}_{t}\}_{1}^{T}$, which allows us to detect the topological features that persist over time. That is, we record the time points where we first and last observe a topological feature $\sigma$ over the considered time period, i.e., the birth and death times of $\sigma$, respectively, with $1\leq b_{\sigma}<d_{\sigma}\leq T$. Notice that, in contrast to ordinary persistence, both birth and death times ($b_{\sigma}$ or $d_{\sigma}$) can be fractional, taking the value $i+\frac{1}{2}$ (corresponding to $\mathcal{G}_{i}\cup\mathcal{G}_{i+1}$) for $1\leq i<T$. We then obtain the $k^{th}$ Zigzag Persistence Diagram ${\rm ZPD}_{k}(\widetilde{\mathcal{G}})=\{(b_{\sigma},d_{\sigma})\mid\sigma\in H_{k}(\widehat{\mathcal{G}}_{i})\mbox{ for }b_{\sigma}\leq i<d_{\sigma}\}$. A minimal sketch of the zigzag sequence construction is given below.
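The following minimal sketch builds the interleaved sequence of snapshots and pairwise unions that underlies the zigzag filtration; computing the zigzag persistence diagrams themselves requires a dedicated solver (e.g., the Dionysus library), which is not shown here:

import networkx as nx

def zigzag_union_sequence(graphs):
    # Interleave consecutive snapshots with their unions, producing the
    # inclusion pattern G_1 -> G_1∪G_2 <- G_2 -> G_2∪G_3 <- ... above.
    # Zigzag persistence of the associated clique complexes is not computed here.
    seq = []
    for t, G in enumerate(graphs):
        seq.append(G)
        if t + 1 < len(graphs):
            seq.append(nx.compose(G, graphs[t + 1]))
    return seq

# toy usage on three snapshots sharing node labels
snapshots = [nx.path_graph(4), nx.cycle_graph(4), nx.path_graph(5)]
seq = zigzag_union_sequence(snapshots)   # G_1, G_1∪G_2, G_2, G_2∪G_3, G_3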

C.2 Stability for Single Persistence Vectorizations

For a given PD vectorization, stability is one of the most important properties for statistical purposes. Intuitively, the stability question asks whether a small perturbation of the PD causes a big change in the vectorization. To make this question meaningful, one needs to formalize what "small" and "big" mean in this context; that is, we need a notion of distance, i.e., a metric on the space of PDs. The most common such metric is the Wasserstein distance (or matching distance), defined as follows. Let $PD(\mathcal{X}^{+})$ and $PD(\mathcal{X}^{-})$ be the persistence diagrams of two datasets $\mathcal{X}^{+}$ and $\mathcal{X}^{-}$ (we omit the dimension subscripts in the PDs). Let $PD(\mathcal{X}^{+})=\{q_{j}^{+}\}\cup\Delta^{+}$ and $PD(\mathcal{X}^{-})=\{q_{l}^{-}\}\cup\Delta^{-}$, where $\Delta^{\pm}$ represents the diagonal (representing trivial cycles) with infinite multiplicity. Here, $q_{j}^{+}=(b^{+}_{j},d_{j}^{+})\in PD(\mathcal{X}^{+})$ represents the birth and death times of a $k$-hole $\sigma_{j}$. Let $\phi:PD(\mathcal{X}^{+})\to PD(\mathcal{X}^{-})$ be a bijection (matching). The presence of the diagonal $\Delta^{\pm}$ on both sides ensures the existence of such bijections even if the cardinalities $|\{q_{j}^{+}\}|$ and $|\{q_{l}^{-}\}|$ differ. Then, the $p^{th}$ Wasserstein distance $\mathcal{W}_{p}$ is defined as

\mathcal{W}_{p}(PD(\mathcal{X}^{+}),PD(\mathcal{X}^{-}))=\min_{\phi}\Big(\sum_{j}\|q_{j}^{+}-\phi(q_{j}^{+})\|_{\infty}^{p}\Big)^{\frac{1}{p}},

where $p\in\mathbb{Z}^{+}$. In particular, the bottleneck distance is $\mathcal{W}_{\infty}(PD(\mathcal{X}^{+}),PD(\mathcal{X}^{-}))=\min_{\phi}\max_{j}\|q_{j}^{+}-\phi(q_{j}^{+})\|_{\infty}$.

Then, a function $\varphi$ is called stable if $\mathrm{d}(\varphi^{+},\varphi^{-})\leq C\cdot\mathcal{W}_{p}(PD(\mathcal{X}^{+}),PD(\mathcal{X}^{-}))$, where $\varphi^{\pm}$ is the vectorization of $PD(\mathcal{X}^{\pm})$ and $\mathrm{d}(\cdot,\cdot)$ is a suitable metric on the space of vectorizations. Here, the constant $C>0$ is independent of $\mathcal{X}^{\pm}$. This stability inequality means that changes in the vectorization are bounded by changes in the PDs. If a given vectorization $\varphi$ satisfies such a stability inequality for some $\mathrm{d}$ and $\mathcal{W}_{p}$, we call $\varphi$ a stable vectorization (Atienza et al. 2020). Persistence Landscapes (Bubenik 2015), Persistence Images (Adams et al. 2017), Stabilized Betti Curves (Johnson and Jung 2021), and several Persistence Curves (Chung and Lawson 2022) are well-known examples of stable vectorizations. A minimal sketch of the Wasserstein distance computation is given below.
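For completeness, the sketch below computes the $p^{th}$ Wasserstein distance between two finite persistence diagrams via the standard diagonal-augmented assignment problem; this is a generic implementation, not the one used in our experiments:

import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein_distance(pd1, pd2, p=2):
    # p-Wasserstein distance between two finite persistence diagrams
    # (arrays of (birth, death) pairs), after the usual augmentation with
    # diagonal points so that a bijection always exists.
    pd1, pd2 = np.atleast_2d(pd1), np.atleast_2d(pd2)
    n, m = len(pd1), len(pd2)
    cost = np.zeros((n + m, n + m))
    # off-diagonal point to off-diagonal point: sup-norm ground distance
    cost[:n, :m] = np.max(np.abs(pd1[:, None, :] - pd2[None, :, :]), axis=2) ** p
    # matching a point to the diagonal costs half of its life span
    cost[:n, m:] = (((pd1[:, 1] - pd1[:, 0]) / 2.0) ** p)[:, None]
    cost[n:, :m] = (((pd2[:, 1] - pd2[:, 0]) / 2.0) ** p)[None, :]
    # diagonal-to-diagonal matches are free (cost stays 0)
    row, col = linear_sum_assignment(cost)
    return cost[row, col].sum() ** (1.0 / p)

# toy example
A = np.array([[0.0, 1.0], [0.5, 2.0]])
B = np.array([[0.1, 1.1]])
print(wasserstein_distance(A, B, p=2))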

Appendix D More on TMP Vectorizations

D.1 Further Examples of TMP Vectorizations

TMP Silhouettes.

Silhouettes are another very popular SP vectorization method in machine learning applications (Chazal et al. 2014). The idea is similar to persistence landscapes, but this vectorization uses the life spans of the topological features more directly. For $PD(\mathcal{G})=\{(b_{i},d_{i})\}_{i=1}^{N}$, let $\Lambda_{i}$ be the generating function of $(b_{i},d_{i})$ as defined for Landscapes (Section 5). Then, the Silhouette function $\psi$ is defined as $\psi(\mathcal{G})=\frac{\sum_{i=1}^{N}w_{i}\Lambda_{i}(t)}{\sum_{i=1}^{N}w_{i}}$ for $t\in[\epsilon_{1},\epsilon_{q}]$, where the weight $w_{i}$ is usually chosen as the life span $d_{i}-b_{i}$, and $\{\epsilon_{k}\}_{k=1}^{q}$ represents the thresholds of the filtration used. As in the persistence landscape case, such a Silhouette function $\psi(\mathcal{G})$ produces a $1D$ vector $\vec{\psi}(\mathcal{G})$ of size $1\times(2q-1)$.

As the structures of Silhouettes and Persistence Landscapes are very similar, so are their TMP vectorizations. For given time-dependent data $\widetilde{\mathcal{G}}=\{\mathcal{G}_{t}\}_{t=1}^{T}$, similar to persistence landscapes, we use the time direction and the filtering-function direction for our TMP Silhouettes. For a filtering function $f:\mathcal{V}_{t}\to\mathbb{R}$ with threshold set $\mathcal{I}=\{\alpha_{j}\}_{1}^{m}$, we obtain the TMP Silhouette as $\mathbf{M}_{\psi}^{j}(\widetilde{\mathcal{G}})=\vec{\psi}(\widetilde{\mathcal{G}}^{j})$, where $\mathbf{M}_{\psi}^{j}$ represents the $j^{th}$ row of the $2D$ vector $\mathbf{M}_{\psi}$ and $\vec{\psi}(\widetilde{\mathcal{G}}^{j})$ is the Silhouette vector induced by the zigzag persistence diagram of the time sequence $\widetilde{\mathcal{G}}^{j}$. Again, similar to the landscapes, by taking $q=2T-1$, $\mathbf{M}_{\psi}(\widetilde{\mathcal{G}})$ becomes a $2D$ vector of size $m\times(4T-3)$, where $T$ is the number of time steps in the data $\widetilde{\mathcal{G}}$. A minimal sketch of the Silhouette computation is given below.
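A minimal sketch of the Silhouette vectorization of a single (zigzag) persistence diagram, with lifespan weights $w_{i}=d_{i}-b_{i}$ and a fixed sampling grid; each row of the TMP Silhouette is obtained by applying it to one $ZPD(\widetilde{\mathcal{G}}^{j})$:

import numpy as np

def silhouette_vector(pd, grid):
    # Lifespan-weighted silhouette of a persistence diagram sampled on a
    # fixed grid; Lambda_i is the usual tent function of (b_i, d_i).
    pd = np.atleast_2d(pd)
    births, deaths = pd[:, 0], pd[:, 1]
    weights = deaths - births                              # w_i = d_i - b_i
    tents = np.maximum(0.0, np.minimum(grid[None, :] - births[:, None],
                                       deaths[:, None] - grid[None, :]))
    return (weights[:, None] * tents).sum(axis=0) / max(weights.sum(), 1e-12)

# toy usage: a zigzag PD over thresholds 1, 1.5, ..., T with T = 4
pd = np.array([[1.0, 3.5], [2.0, 2.5]])
grid = np.linspace(1.0, 4.0, 2 * (2 * 4 - 1) - 1)   # 2q - 1 points with q = 2T - 1
row = silhouette_vector(pd, grid)                    # one row of the TMP Silhouette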

TMP Betti & TMP Persistence Summaries.

Next, we discuss an important family of SP vectorizations, Persistence Curves (Chung and Lawson 2022). This is an umbrella term for several different SP vectorizations, e.g., Betti Curves, Life Entropy, and Landscapes. Our TMP framework naturally adapts to all Persistence Curves to produce multidimensional vectorizations. As Persistence Curves generally produce a single-variable function, they can all be represented as 1D vectors by choosing a suitable mesh size depending on the number of thresholds used. Here, we describe one of the most common Persistence Curves, Betti Curves, in detail; it is straightforward to generalize the construction to other Persistence Curves.

Betti curves are one of the simplest SP vectorizations, as they give the count of topological features at a given threshold. In particular, $\beta_{k}(\Delta)$ is the total count of $k$-dimensional topological features in the simplicial complex $\Delta$, i.e., $\beta_{k}(\Delta)=\mathrm{rank}(H_{k}(\Delta))$ (see Figure 2). For given time-dependent data $\widetilde{\mathcal{G}}=\{\mathcal{G}_{t}\}_{t=1}^{T}$, we use zigzag persistence in the time direction, which yields $2T-1$ threshold steps. Then, the Betti Curve $\beta(\widetilde{\mathcal{G}})$ of the zigzag persistence diagram $ZPD(\widetilde{\mathcal{G}})$ is a step function with $2T-1$ intervals (we add one more threshold after the last one to interpret the final interval). As $\beta(\widetilde{\mathcal{G}})$ is a step function, it can be described as a vector of size $1\times(2T-1)$, i.e., $\vec{\beta}(\widetilde{\mathcal{G}})=[\beta(1)\ \beta(1.5)\ \beta(2)\ \beta(2.5)\ \beta(3)\ \dots\ \beta(T)]$, where $\beta(t)$ is the total count of topological features in $\widehat{\mathcal{G}}_{t}$. Here we omit the homological dimension (i.e., the subscript $k$) to keep the exposition simple.

Then, by using a filtering function $f:\mathcal{V}_{t}\to\mathbb{R}$ with threshold set $\mathcal{I}=\{\alpha_{j}\}_{1}^{m}$ for the other direction, we define the TMP Betti curve as $\mathbf{M}_{\beta}^{j}=\vec{\beta}(\widetilde{\mathcal{G}}^{j})$, where $\mathbf{M}_{\beta}^{j}$ is the $j^{th}$ row of the $2D$ vector $\mathbf{M}_{\beta}$. Here, $\widetilde{\mathcal{G}}^{j}=\{\mathcal{G}_{t}^{j}\}_{t=1}^{T}$ is induced by the sublevel filtration of $f:\mathcal{V}_{t}\to\mathbb{R}$, i.e., $\mathcal{V}^{j}_{t}=\{v_{r}\in\mathcal{V}_{t}\mid f(v_{r})\leq\alpha_{j}\}$. Then, $\mathbf{M}_{\beta}$ is a $2D$ vector of size $m\times(2T-1)$.

An alternative (and computationally friendly) route for TMP Betti summaries is to bypass zigzag persistent homology and the union complexes $\widehat{\mathcal{G}}_{t+\frac{1}{2}}$, and to use the clique complexes $\{\widehat{\mathcal{G}}_{t}\}_{t=1}^{T}$ directly. This is because Betti curves do not require PDs; they can be computed directly from the simplicial complexes $\{\widehat{\mathcal{G}}_{t}\}_{t=1}^{T}$. In this way, we obtain a vector of size $1\times T$, namely $\vec{\beta}(\widetilde{\mathcal{G}})=[\beta(1)\ \beta(2)\ \beta(3)\ \dots\ \beta(T)]$. Then, this version of the induced TMP Betti curve $\mathbf{M}_{\beta}(\widetilde{\mathcal{G}})$ yields a $2D$ vector of size $m\times T$. It may carry less information than the original zigzag version, but it is computationally much faster (Lesnick and Wright 2022), as one skips the computation of PDs. Note that skipping zigzag persistence in the time direction is only possible for Betti curves, as the other vectorizations are built from PDs, i.e., they require life spans, birth, and death times. A minimal sketch of this zigzag-free TMP Betti summary is given below.
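A minimal sketch of this zigzag-free TMP Betti summary, assuming a node sublevel filtration on networkx graphs; for simplicity it computes Betti numbers of the 1-skeleton (the clique complex used in the paper may reduce $\beta_{1}$ by filling triangles):

import numpy as np
import networkx as nx

def tmp_betti_grid(graphs, node_function, alphas):
    # For each spatial threshold alpha_j (rows) and time step t (columns),
    # record Betti numbers of the sublevel subgraph: beta_0 = number of
    # connected components, beta_1 = cycle rank of the 1-skeleton.
    T, m = len(graphs), len(alphas)
    betti0, betti1 = np.zeros((m, T), dtype=int), np.zeros((m, T), dtype=int)
    for t, G in enumerate(graphs):
        f = {v: node_function(G, v) for v in G.nodes()}
        for j, a in enumerate(alphas):
            H = G.subgraph([v for v in G.nodes() if f[v] <= a])
            c = nx.number_connected_components(H)
            betti0[j, t] = c
            betti1[j, t] = H.number_of_edges() - H.number_of_nodes() + c
    return betti0, betti1

# toy usage with node degree as the filtering function
graphs = [nx.erdos_renyi_graph(30, 0.1, seed=s) for s in range(5)]
b0, b1 = tmp_betti_grid(graphs, lambda G, v: G.degree(v), alphas=[1, 2, 3, 4])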

Figure 2: Multidimensional persistence on a graph network (original graph: left). Black numbers denote the degree of each node, whilst red numbers show the edge weights of the network. Hence, shape properties are computed with two filtering functions (i.e., degree and edge weight). Each row filters by degree, while each column filters the corresponding subgraph by its edge weights. For each cell, the lower left corner gives the corresponding threshold values, and $\mathcal{B}_{0}$ and $\mathcal{B}_{1}$ denote the corresponding Betti numbers.

D.2 TMP Vectorizations and Multipersistence

Multipersistence theory is under intense research because of its promise to significantly improve the performance and robustness of single persistence theory. While single persistence obtains a topological fingerprint of a single filtration, a multidimensional filtration with more than one parameter should deliver a much finer summary of the data to be used with ML models. However, multipersistence has virtually not reached applications yet and remains largely unexplored by the ML community because of technical obstacles. Here, we provide a short summary of these issues; for further details, (Botnan and Lesnick 2022) gives a good outline of the current state of the theory and its major obstacles.

In single persistence, the threshold space $\{\alpha_{i}\}$, being a subset of $\mathbb{R}$, is totally ordered, i.e., birth time $<$ death time for any topological feature appearing in the filtration sequence $\{\Delta_{i}\}$. Using this property, it was shown in the 1950s that the "barcode decomposition" is well-defined in single persistence theory [Krull–Schmidt–Azumaya Theorem; (Botnan and Lesnick 2022), Theorem 4.2]. This decomposition makes the persistence module $M=\{H_{k}(\Delta_{i})\}_{i=1}^{N}$ uniquely decomposable into barcodes, and this barcode decomposition is exactly what we call a PD.

However, when one moves to higher dimensions, e.g., $d=2$, the threshold set $\{(\alpha_{i},\beta_{j})\}$ is no longer totally ordered but only partially ordered (a poset). In other words, some index pairs are comparable, e.g., $(1,2)<(4,7)$, while others are not, e.g., $(2,3)$ vs. $(1,5)$. Hence, if one has a multipersistence grid $\{\Delta_{ij}\}$, we can no longer talk about birth or death times, as there is no total ordering anymore. Furthermore, the Krull–Schmidt–Azumaya Theorem no longer holds in higher dimensions [(Botnan and Lesnick 2022), Section 4.2]. Hence, for general multipersistence modules, a barcode decomposition is not possible, and the direct generalization of single persistence to multipersistence fails. On the other hand, even if a multipersistence module admits a good barcode decomposition, representing these barcodes faithfully is another major problem because of the partial ordering. Multipersistence modules are an important subject in commutative algebra; details on the topic can be found in (Eisenbud 2013).

While a complete generalization is out of reach for now, several attempts have been made recently to utilize the MP idea by using one-dimensional slices of the MP grid (Carrière and Blumberg 2020; Vipond 2020). Slicing techniques use the persistence diagrams of predetermined one-dimensional slices of the multipersistence grid and then combine (compress) them into a one-dimensional output (Botnan and Lesnick 2022). One major issue with this approach is that the topological summary highly depends on the predetermined slicing directions. The other problem is the loss of information when compressing the information contained in the various persistence diagrams.

As explained above, the MP approach does not have complete theoretical foundations yet, and there are several attempts to utilize the idea. In this paper, we do not claim to solve the theoretical problems of multipersistence homology, but we offer a novel, highly practical multidimensional topological summary by advancing the existing methods in spatio-temporal settings. We use the time direction of the multipersistence grid as a natural slicing direction and thereby overcome the predetermined-slices problem. Furthermore, for each filtering step in the spatial direction, unlike other MP vectorizations, we do not compress the induced PDs; instead, we combine them as multidimensional vectors (matrices or arrays). As a result, these multidimensional topological fingerprints are capable of capturing very fine topological information hidden in spatio-temporal data. In the spatial direction, we filter the data with one (or more) domain functions and obtain induced substructures, while in the time direction, we capture the evolving topological patterns of these substructures via zigzag persistence. Our fingerprinting process is highly flexible: one can easily choose the right single persistence vectorization to emphasize either the density of short barcodes or the importance of long barcodes appearing in these PDs. We obtain multidimensional vectors (matrices and arrays) as output, which are highly practical for use with various ML models.

D.3 TMP Framework for General Types of Data

So far, to keep the exposition simple, we described our construction for dynamic networks. However, our framework is suitable for various types of time-dependent data. Let $\widetilde{\mathcal{X}}=\{\mathcal{X}_{t}\}_{t=1}^{T}$ be a time sequence of images or point clouds. Let $f:\widetilde{\mathcal{X}}\to\mathbb{R}$ be a filtering function that can be applied to every $\mathcal{X}_{t}$. Ideally, $f$ is a function that does not depend on $t$; e.g., if $\{\mathcal{X}_{t}\}$ represents a sequence of images for times $1\leq t\leq T$, $f$ can be taken as a grayscale function. If $\{\mathcal{X}_{t}\}$ is a sequence of point clouds at different times, then $f$ can be defined as a density function.

Then, the construction is the same as before. Let $f:\widetilde{\mathcal{X}}\to\mathbb{R}$ be the filtering function with threshold set $\mathcal{I}=\{\alpha_{j}\}_{1}^{m}$, and let $\mathcal{X}_{t}^{j}=f^{-1}((-\infty,\alpha_{j}])$. Then, for each $1\leq j_{0}\leq m$, we have a time sequence $\widetilde{\mathcal{X}}^{j_{0}}=\{\mathcal{X}_{t}^{j_{0}}\}_{t=1}^{T}$. Let $\{\widehat{\mathcal{X}}_{t}^{j_{0}}\}_{t=1}^{T}$ be the induced simplicial complexes to be used in the filtration. Then, by taking $\widehat{\mathcal{X}}^{j_{0}}_{k+\frac{1}{2}}=\widehat{\mathcal{X}}^{j_{0}}_{k}\cup\widehat{\mathcal{X}}^{j_{0}}_{k+1}$, we apply zigzag PH to this sequence as before:

\widehat{\mathcal{X}}_{1}^{j_{0}}\hookrightarrow\widehat{\mathcal{X}}^{j_{0}}_{1.5}\hookleftarrow\widehat{\mathcal{X}}^{j_{0}}_{2}\hookrightarrow\widehat{\mathcal{X}}^{j_{0}}_{2.5}\hookleftarrow\widehat{\mathcal{X}}^{j_{0}}_{3}\hookrightarrow\dots\hookleftarrow\widehat{\mathcal{X}}^{j_{0}}_{T}

Then, we obtain the zigzag persistence diagram $ZPD(\widetilde{\mathcal{X}}^{j_{0}})$ of the filtration $\{\widehat{\mathcal{X}}_{t}^{j_{0}}\}_{t=1}^{T}$. Hence, we obtain $m$ zigzag PDs, $ZPD(\widetilde{\mathcal{X}}^{j})$, one for each $1\leq j\leq m$. Then, by applying a preferred SP vectorization $\varphi$ to the persistence diagram $ZPD(\widetilde{\mathcal{X}}^{j})$, we obtain the corresponding vector $\vec{\varphi}(\widetilde{\mathcal{X}}^{j})$ (say, of size $1\times k$). The TMP vectorization $\mathbf{M}_{\varphi}$ is then defined by $\mathbf{M}_{\varphi}^{j}(\widetilde{\mathcal{X}})=\vec{\varphi}(\widetilde{\mathcal{X}}^{j})$, where $\mathbf{M}_{\varphi}^{j}$ represents the $j^{th}$ row of the $2D$ vector $\mathbf{M}_{\varphi}$. Hence, the TMP vectorization of $\widetilde{\mathcal{X}}$, $\mathbf{M}_{\varphi}(\widetilde{\mathcal{X}})$, becomes a $2D$ vector of size $m\times k$. A minimal sketch of the sublevel-set construction for image sequences is given below.
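For image sequences, the sublevel sets $\mathcal{X}_{t}^{j}=f^{-1}((-\infty,\alpha_{j}])$ reduce to boolean masks; the sketch below (grayscale values as the filtering function, with our own names) produces the grid of sublevel sets that would feed the zigzag persistence computation, which itself is not shown:

import numpy as np

def image_sublevel_sequence(frames, alphas):
    # frames: array of shape (T, H, W) of grayscale values; the output
    # masks[j, t] is the sublevel set X_t^j = {pixels with value <= alpha_j}.
    frames = np.asarray(frames, dtype=float)
    masks = np.zeros((len(alphas),) + frames.shape, dtype=bool)
    for j, a in enumerate(alphas):
        masks[j] = frames <= a
    return masks

# toy usage: 6 random 16x16 frames and 5 quantile thresholds
frames = np.random.rand(6, 16, 16)
alphas = np.quantile(frames, np.linspace(0.1, 0.9, 5))
masks = image_sublevel_sequence(frames, alphas)   # shape (5, 6, 16, 16)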

Appendix E Stability of TMP Vectorizations

In this part, we prove the stability theorem (Theorem 1) for TMP vectorizations. In particular, we prove that if the original SP vectorization $\varphi$ is stable, then so is its TMP vectorization $\mathbf{M}_{\varphi}$. Let $\widetilde{\mathcal{G}}=\{\mathcal{G}_{t}\}_{t=1}^{T}$ and $\widetilde{\mathcal{H}}=\{\mathcal{H}_{t}\}_{t=1}^{T}$ be two time sequences of networks. Let $\varphi$ be a stable SP vectorization with the stability inequality

\mathrm{d}(\varphi(\widetilde{\mathcal{G}}),\varphi(\widetilde{\mathcal{H}}))\leq C_{\varphi}\cdot\mathcal{W}_{p_{\varphi}}(PD(\widetilde{\mathcal{G}}),PD(\widetilde{\mathcal{H}}))   (1)

for some $1\leq p_{\varphi}\leq\infty$. Here, $\mathcal{W}_{p}$ denotes the Wasserstein-$p$ distance as defined before.

Now, by taking $d=2$ for the TMP construction, we obtain bifiltrations $\{\widehat{\mathcal{G}}_{t}^{ij}\}$ and $\{\widehat{\mathcal{H}}_{t}^{ij}\}$ for each $1\leq t\leq T$. We define the induced matching distance between the collections of zigzag PDs as

\mathbf{D}(\{ZPD(\widetilde{\mathcal{G}})\},\{ZPD(\widetilde{\mathcal{H}})\})=\max_{i,j}\mathcal{W}_{p_{\varphi}}(ZPD(\widetilde{\mathcal{G}}^{ij}),ZPD(\widetilde{\mathcal{H}}^{ij}))   (2)

Now, we define the distance between induced TMP Vectorizations as

\mathbf{D}(\mathbf{M}_{\varphi}(\widetilde{\mathcal{G}}),\mathbf{M}_{\varphi}(\widetilde{\mathcal{H}}))=\max_{i,j}\mathrm{d}(\varphi(\widetilde{\mathcal{G}}^{ij}),\varphi(\widetilde{\mathcal{H}}^{ij})).   (3)
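Before stating the theorem, and under the assumption that each TMP vectorization is stored as an $m\times n$ grid of equal-length vectors, the distance in Equation (3) is a simple grid maximum, as in the sketch below (with the sup norm as the per-cell distance $\mathrm{d}$):

import numpy as np

def tmp_distance(M1, M2):
    # M1, M2: arrays of shape (m, n, k), i.e., an m x n bifiltration grid of
    # k-dimensional vectors phi(G^{ij}); the per-cell distance is the sup norm,
    # and the result is the maximum over all grid cells (i, j).
    per_cell = np.abs(np.asarray(M1, float) - np.asarray(M2, float)).max(axis=-1)
    return per_cell.max()

# toy usage on random 4 x 3 grids of 7-dimensional vectors
A, B = np.random.rand(4, 3, 7), np.random.rand(4, 3, 7)
print(tmp_distance(A, B))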

Theorem 1: Let $\varphi$ be a stable vectorization for single parameter PDs. Then, the induced TMP vectorization $\mathbf{M}_{\varphi}$ is also stable; i.e., there exists $\widehat{C}_{\varphi}>0$ such that, for any pair of time-aware network sequences $\widetilde{\mathcal{G}}$ and $\widetilde{\mathcal{H}}$, the following inequality holds:

\mathbf{D}(\mathbf{M}_{\varphi}(\widetilde{\mathcal{G}}),\mathbf{M}_{\varphi}(\widetilde{\mathcal{H}}))\leq\widehat{C}_{\varphi}\cdot\mathbf{D}(\{ZPD(\widetilde{\mathcal{G}})\},\{ZPD(\widetilde{\mathcal{H}})\}).
Proof.

WLOG, we assume the SP vectorization $\varphi$ produces a $1D$ vector; for $2D$ or higher-dimensional vectors, the proof is similar. For any $i_{0},j_{0}$ with $1\leq i_{0}\leq m$ and $1\leq j_{0}\leq n$, we have the filtration sequences $\{\widehat{\mathcal{G}}^{i_{0}j_{0}}_{t}\}_{t=1}^{T}$ and $\{\widehat{\mathcal{H}}^{i_{0}j_{0}}_{t}\}_{t=1}^{T}$. These produce the zigzag persistence diagrams $ZPD(\widetilde{\mathcal{G}}^{i_{0}j_{0}})$ and $ZPD(\widetilde{\mathcal{H}}^{i_{0}j_{0}})$. Therefore, we have $m\cdot n$ pairs of zigzag persistence diagrams.

Consider the distance definition for TMP vectorizations (Equation 3). Let $i_{1},j_{1}$ be the indices realizing the maximum on the right-hand side of the equation, i.e.,

\mathrm{d}(\varphi(\widetilde{\mathcal{G}}^{i_{1}j_{1}}),\varphi(\widetilde{\mathcal{H}}^{i_{1}j_{1}}))=\max_{i,j}\mathrm{d}(\varphi(\widetilde{\mathcal{G}}^{ij}),\varphi(\widetilde{\mathcal{H}}^{ij})).   (4)

Then, by the stability of $\varphi$ (i.e., inequality (1)), we have

\mathrm{d}(\varphi(\widetilde{\mathcal{G}}^{i_{1}j_{1}}),\varphi(\widetilde{\mathcal{H}}^{i_{1}j_{1}}))\leq C_{\varphi}\cdot\mathcal{W}_{p_{\varphi}}(ZPD(\widetilde{\mathcal{G}}^{i_{1}j_{1}}),ZPD(\widetilde{\mathcal{H}}^{i_{1}j_{1}})).   (5)

Now, as

\mathcal{W}_{p_{\varphi}}(ZPD(\widetilde{\mathcal{G}}^{i_{1}j_{1}}),ZPD(\widetilde{\mathcal{H}}^{i_{1}j_{1}}))\leq\max_{i,j}\mathcal{W}_{p_{\varphi}}(ZPD(\widetilde{\mathcal{G}}^{ij}),ZPD(\widetilde{\mathcal{H}}^{ij})),   (6)

and by the definition of the induced matching distance between the collections of zigzag PDs (Equation 2), we find that

\mathcal{W}_{p_{\varphi}}(ZPD(\widetilde{\mathcal{G}}^{i_{1}j_{1}}),ZPD(\widetilde{\mathcal{H}}^{i_{1}j_{1}}))\leq\mathbf{D}(\{ZPD(\widetilde{\mathcal{G}})\},\{ZPD(\widetilde{\mathcal{H}})\}).   (7)

Finally,

\mathbf{D}(\mathbf{M}_{\varphi}(\widetilde{\mathcal{G}}),\mathbf{M}_{\varphi}(\widetilde{\mathcal{H}}))=\mathrm{d}(\varphi(\widetilde{\mathcal{G}}^{i_{1}j_{1}}),\varphi(\widetilde{\mathcal{H}}^{i_{1}j_{1}}))\leq C_{\varphi}\cdot\mathcal{W}_{p_{\varphi}}(ZPD(\widetilde{\mathcal{G}}^{i_{1}j_{1}}),ZPD(\widetilde{\mathcal{H}}^{i_{1}j_{1}}))\leq\widehat{C}_{\varphi}\cdot\mathbf{D}(\{ZPD(\widetilde{\mathcal{G}})\},\{ZPD(\widetilde{\mathcal{H}})\}).

Here, the leftmost equality follows from Equations (3) and (4). The first inequality follows from inequality (5). The final inequality follows from inequality (7), taking $\widehat{C}_{\varphi}=C_{\varphi}$. This concludes the proof of the theorem. ∎

Remark 2 (Stability with respect to Matching Distance).

While we define our own distance $\mathbf{D}(\cdot,\cdot)$ on the space of MP modules to suit our setup, if one instead takes the matching distance on the space of MP modules, our result still implies stability for TMP vectorizations $\mathbf{M}_{\varphi}$ induced from a stable SP vectorization $\varphi$ with $p_{\varphi}=\infty$. In particular, our distance $\mathbf{D}(\cdot,\cdot)$ with specified slices is just a version of the matching distance $d_{M}(\cdot,\cdot)$ restricted to horizontal slices. The matching distance between two multipersistence modules $E_{1},E_{2}$ is defined as the supremum of the bottleneck ($W_{\infty}(\cdot,\cdot)$) distances between the single persistence diagrams induced by all one-dimensional fibers (slices) of the MP module, i.e., $d_{M}(E_{1},E_{2})=\sup_{L}W_{\infty}(PD(E_{1}(L)),PD(E_{2}(L)))$, where $E_{i}(L)$ represents the restriction of the MP module $E_{i}$ to the line $L$ in the MP grid (Dey and Wang 2022, Section 12.3).

In this sense, with this notation, our distance is a restricted version of the matching distance $d_{M}(\cdot,\cdot)$: $\mathbf{D}(E_{1},E_{2})=\max_{L_{0}}W_{\infty}(PD(E_{1}(L_{0})),PD(E_{2}(L_{0})))$, where $L_{0}$ ranges over horizontal slices only; hence $\mathbf{D}(E_{1},E_{2})\leq d_{M}(E_{1},E_{2})$. Then, by combining this with our stability theorem, we obtain $\mathbf{D}(\mathbf{M}_{\varphi}(E_{1}),\mathbf{M}_{\varphi}(E_{2}))\leq\widehat{C}_{\varphi}\cdot d_{M}(E_{1},E_{2})$. Hence, if two MP modules are close to each other in the matching distance, then their corresponding TMP vectorizations $\mathbf{M}_{\varphi}$ are also close to each other.

To sum up, for TMP vectorizations $\mathbf{M}_{\varphi}$ induced from a stable SP vectorization $\varphi$ with $p_{\varphi}=\infty$, our result naturally implies stability with respect to the matching distance on multipersistence modules. The condition $p_{\varphi}=\infty$ comes from the bottleneck distance ($W_{\infty}$) used in the definition of $d_{M}$. If one defines a generalization of the matching distance for the other Wasserstein distances $W_{p}$, $p\in[1,\infty)$, then a similar result holds for the other stable TMP vectorizations.

Table 12: Notation and main symbols.
Notation Definition
$\mathcal{G}_{t}$ the spatial network at timestamp $t$
$\mathcal{V}_{t}$ the node set at timestamp $t$
$\mathcal{E}_{t}$ the edge set at timestamp $t$
$W_{t}$ the edge weights at timestamp $t$
$N_{t}$ the number of nodes at timestamp $t$
$\widetilde{\mathcal{G}}$ a time series of graphs
$\widehat{\mathcal{G}}^{i}$ an abstract simplicial complex
$D$ the highest dimension in a simplicial complex
$\sigma$ a $k$-dimensional topological feature
$d_{\sigma}-b_{\sigma}$ the life span of $\sigma$
$\mathrm{PD}_{k}(\mathcal{G})$ $k$-dimensional persistence diagram
$H_{k}(\cdot)$ $k^{th}$ homology group
$f$ and $g$ two filtering functions for sublevel filtration
$F(\cdot,\cdot)$ multivariate filtering function
$m\times n$ rectangular grid for bifiltrations
$ZPD_{k}(\cdot)$ the zigzag persistence diagram
$\varphi$ a single persistence vectorization
$\vec{\varphi}(\cdot)$ the vector obtained from $\varphi$
$\mathbf{M}_{\varphi}$ TMP vectorization of $\varphi$
$\lambda$ persistence landscape
$\mathbf{M}_{\lambda}$ TMP Landscape
$\widetilde{\mu}$ persistence surface
$\mathbf{M}^{j}_{\mu}$ TMP Persistence Image
$\mathbf{D}(\cdot,\cdot)$ distance between (collections of) persistence diagrams
$\mathrm{D}(\cdot,\cdot)$ distance between TMP vectorizations
$X$ node feature matrix
$Z^{(\ell)}_{t,\text{Spatial}}$ graph convolution on the adaptive adjacency matrix
$Z_{t,\text{TMP}}$ image-level local topological feature
$\mathcal{W}_{p}(\cdot,\cdot)$ Wasserstein distance
$\mathcal{W}_{\infty}(\cdot,\cdot)$ bottleneck distance