This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Community-Aware Temporal Walks: Parameter-Free Representation Learning on Continuous-Time Dynamic Graphs

He Yu1    Jing Liu2,21{}^{1},^{2}School of Artificial Intelligence, Xidian University, 2 South Taibai Road, Xi’an, Shaanxi 710071, China
,21{}^{1},^{2}Guangzhou Institute of Technology, Xidian University, Knowledge City, Guangzhou, Guangdong 510555, China
124398862@qq.com, 2neouma@mail.xidian.edu.cn
Abstract

Dynamic graph representation learning plays a crucial role in understanding evolving behaviors. However, existing methods often struggle with flexibility, adaptability, and the preservation of temporal and structural dynamics. To address these issues, we propose Community-aware Temporal Walks (CTWalks), a novel framework for representation learning on continuous-time dynamic graphs. CTWalks integrates three key components: a community-based parameter-free temporal walk sampling mechanism, an anonymization strategy enriched with community labels, and an encoding process that leverages continuous temporal dynamics modeled via ordinary differential equations (ODEs). This design enables precise modeling of both intra- and inter-community interactions, offering a fine-grained representation of evolving temporal patterns in continuous-time dynamic graphs. CTWalks theoretically overcomes locality bias in walks and establishes its connection to matrix factorization. Experiments on benchmark datasets demonstrate that CTWalks outperforms established methods in temporal link prediction tasks, achieving higher accuracy while maintaining robustness. The implementation of our proposed method is publicly available at https://github.com/leonyuhe/CTWalks.

1 Introduction

Continuous-Time Dynamic Graphs (CTDGs) rossi2020temporal ; xu2020inductive ; wang2021causal ; souza2022provably provide a comprehensive framework for modeling temporal interactions in real-world systems. By representing entities and their time-stamped interactions as nodes and edges, CTDGs capture the evolving dynamics of diverse systems, including social networks, biological processes, knowledge graphs, and recommendation platforms nickel2015review ; zhang2020gnnrec ; wang2019kgat ; zhang2021graph ; fout2017protein . Among the myriad tasks supported by CTDGs, temporal link prediction luo2022neighborhood ; yu2023towards —predicting future interactions based on historical data—stands out as a fundamental problem. Success in this task not only enhances our understanding of dynamic systems but also drives practical applications such as personalized recommendations and anomaly detection. Despite significant advancements in dynamic graph representation learning kazemi2020representation ; yang2023dynamic ; zhu2022encoder , existing methods face three critical challenges that limit their flexibility, adaptability, and expressiveness.

Refer to caption
Figure 1: Traditional anonymous encodings fail to differentiate nodes with similar local structures but distinct community roles, such as AA and FF. Without community context, the identical structures of AA and FF make it impossible to predict DD’s relationship with either node.

First, current approaches rely heavily on heuristic sampling strategies to extract key graph features, including node relationships, edge interactions, and temporal patterns. However, these strategies often demand intensive hyperparameter tuning to achieve a balance between exploration and exploitation jin2023motif ; xia2021motif . Such reliance complicates the training process and introduces sensitivity to dataset-specific characteristics, limiting their generalizability.

Second, many existing methods fail to account for mesoscopic structures, such as communities fortunato2010community ; newman2004finding ; Liu2014Multiobjective ; Teng2021Overlapping , which bridge local (node-level) and global (graph-level) patterns. Communities are crucial in real-world networks for distinguishing nodes with similar local structures but differing roles within their communities. Without incorporating community-aware representations, traditional anonymous encodings struggle to differentiate such nodes, resulting in ambiguity when predicting interactions, as demonstrated in Figure 1.

Finally, existing methods struggle to handle the continuous nature of temporal dynamics in real-world networks. Conventional approaches often discretize time into fixed intervals or aggregate observations, which obscures important temporal information, particularly in irregularly-sampled data. Recent advances in continuous-time models, such as Neural Ordinary Differential Equations (Neural ODEs)chen2018neural , demonstrate the potential to preserve temporal continuity by defining hidden states at all time points through differential equations. However, integrating such continuous-time methodologies with dynamic graph structures remains an active area of researchjin2022mtgode ; asikis2020neural ; chen2023signed ; sdgnn2022 , particularly in scenarios that require capturing both intra-community and inter-community temporal dependencies.

To tackle these challenges, we propose Community-Aware Temporal Walks (CTWalks), a novel, parameter-free, and community-aware framework for representation learning on CTDGs. By leveraging the interplay between intra-community and inter-community dynamics, CTWalks achieves robust scalability, expressiveness, and generalizability. Contributions of this work include:

1. Community Detection on CTDGs: CTWalks introduces an innovative method to perform community detection on CTDGs by transforming them into a weighted temporal graph. This transformation aggregates interactions into a static graph, where edge weights reflect interaction frequencies, thereby enabling the identification of node community labels reflecting temporal information.

2. Parameter-Free Temporal Sampling: CTWalks eliminates the need for handcrafted hyperparameters in sampling by adopting a community-driven mechanism. Temporal walks are adaptively performed within or across communities, encoding both structural dependencies and temporal dynamics. In the anonymization process, we embed community labels to restore contextual distinctions between nodes, reducing ambiguity and enhancing inductive capabilities.

3. Continuous-Time Dynamics via ODEs: To explicitly model temporal dependencies and capture essential dynamic laws, CTWalks employs a Neural ODE-based process. This approach learns spatiotemporal dynamics continuously, aligning modeled dynamics with irregularly sampled observations and providing a robust basis for downstream tasks.

4. Empirical Validation: Extensive experiments on benchmark datasets demonstrate CTWalks’ superior performance in the temporal link prediction task. Compared to several well-known methods, CTWalks achieves higher accuracy while maintaining competitive computational efficiency.

2 Related Work

Dynamic graphs kazemi2020representation ; yang2023dynamic ; zhu2022encoder ; Ma2024Higher provide a powerful framework for modeling systems that evolve over time, effectively capturing temporal interactions among nodes, edges, and their attributes. Depending on the representation of temporal information, dynamic graphs are broadly classified into Discrete-Time Dynamic Graphs (DTDGs) cong2021dynamic ; goyal2020dyngraph2vec ; pareja2020evolvegcn ; sankar2020dysat ; you2022roland and Continuous-Time Dynamic Graphs (CTDGs) rossi2020temporal ; xu2020inductive ; wang2021causal ; souza2022provably . DTDGs approximate temporal evolution using discrete graph snapshots, which provide a coarse-grained view of dynamic systems. In contrast, CTDGs represent interactions as a sequence of time-stamped events, which allows for high temporal fidelity, enabling the representation of systems with irregular or rapid interactions. Existing representation learning methods on dynamic graphs can be categorized into two primary paradigms: message-passing-based methods and sequence-model-based methods. Additionally, methods based on Continuous-Time Differential Equations (CTDEs) are discussed as a separate modeling approach chen2018neural ; rubanova2019latent ; xhonneux2020continuous .

Message-Passing-Based Methods extend graph neural networks (GNNs) scarselli2008graph ; Mo2025AutoSGNN to incorporate temporal information by propagating and aggregating features from neighboring nodes. For example, TGN rossi2020temporal integrates a memory module to store historical interactions, dynamically updating it through attention-based message passing. While effective in capturing temporal dependencies, TGN suffers from high computational overhead due to frequent memory updates and neighbor aggregation, limiting its scalability for large graphs. Other approaches, such as MTSN jin2023motif , aim to preserve structural motifs during message passing to enhance the capture of local patterns. However, the computational cost of motif construction and repeated matrix operations makes these methods impractical for large-scale applications.

Sequence-Model-Based Methods rely on recurrent neural networks (RNNs) elman1990finding or related architectures to capture temporal dependencies by sequentially updating node embeddings. For instance, JODIE kumar2019predicting employs coupled RNNs to jointly model static and dynamic embeddings, enabling tasks such as temporal link prediction. However, JODIE’s reliance on hyperparameter tuning, such as decay functions for time intervals, reduces its generalizability across datasets. Similarly, CAWs wang2021causal leverage causal anonymous walks to encode temporal sequences while preserving the temporal order of interactions.

Continuous-Time Differential Equation (CTDE)-based Methods chen2018neural provide a robust framework for modeling temporal dynamics in graphs by treating temporal interactions as continuous processes. For example, ODE-RNN rubanova2019latent leverages ODE solvers to capture the dynamics between observations, effectively bridging the gaps in irregularly sampled data. CGNN xhonneux2020continuous introduces continuous message-passing inspired by diffusion processes, addressing challenges like over-smoothing while enabling long-range dependency modeling. TGAT xu2020inductive employs temporal attention mechanisms coupled with Fourier-based time encoding to model temporal dependencies, showcasing the potential of attention in capturing fine-grained temporal patterns.

3 Preliminaries

Definition 1: Continuous-Time Dynamic Graph.

A continuous-time dynamic graph 𝒢\mathcal{G} is represented as a sequence of non-decreasing chronological interactions:

𝒢={((u1,v1),t1),((u2,v2),t2),},0t1t2.\mathcal{G}=\{((u_{1},v_{1}),t_{1}),((u_{2},v_{2}),t_{2}),\dots\},\quad 0\leq t_{1}\leq t_{2}\leq\cdots. (1)

Here, each pair (ui,vi)(u_{i},v_{i}) represents an undirected link between nodes uiu_{i} and viv_{i}, with a corresponding timestamp tit_{i}.

Definition 2: Temporal Link Prediction.

Temporal link prediction is the task of predicting whether an interaction between two nodes will occur at a specific time, based on historical interactions in the graph. Given the set of historical interactions before time tt:

(t)={((u,v),t)t<t},\mathcal{H}(t)=\{((u^{\prime},v^{\prime}),t^{\prime})\mid t^{\prime}<t\}, (2)

and two nodes uu and vv, the goal is to predict the likelihood of an interaction between uu and vv at time tt. Formally, this is expressed as:

Predict:P(((u,v),t)(t)),\text{Predict:}\;P(((u,v),t)\mid\mathcal{H}(t)), (3)

where PP represents the probability of the interaction, conditioned on the historical interaction sequence.

Definition 3: Temporal Walk.

A temporal walk on a CTDG 𝒢\mathcal{G} is a sequence of node-time pairs:

W={(w1,t1),(w2,t2),,(wl,tl)},W=\{(w_{1},t_{1}),(w_{2},t_{2}),\ldots,(w_{l},t_{l})\}, (4)

where:

  • w1=uw_{1}=u and t1=tt_{1}=t: WW is rooted at node uu at time tt.

  • t1>t2>>tlt_{1}>t_{2}>\ldots>t_{l}: Timestamps are strictly decreasing.

  • ((wi1,wi),ti)((w_{i-1},w_{i}),t_{i}): Each step corresponds to a temporal edge in 𝒢\mathcal{G}.

Definition 4: Community.

A community in a graph is a subset of nodes CVC\subseteq V such that nodes within CC are densely connected to each other, while connections between nodes in CC and those outside CC are sparse.

Formally, community detection algorithms aim to partition the node set VV into kk disjoint subsets 𝒞={C1,C2,,Ck}\mathcal{C}=\{C_{1},C_{2},\ldots,C_{k}\}, where:

CiCj=,ij,andi=1kCi=V.C_{i}\cap C_{j}=\emptyset,\quad\forall i\neq j,\quad\text{and}\quad\bigcup_{i=1}^{k}C_{i}=V. (5)

The quality of a community partition is often quantified using metrics such as modularity newman2004finding . Communities represent mesoscopic structures in graphs, bridging the gap between local node-level interactions and global graph-level patterns.

4 The Proposed Method: CTWalks

CTWalks comprises three key components: sampling, anonymization, and encoding. The complete notation system used throughout CTWalks is detailed in Appendix A, while the algorithm’s computational complexity is analyzed in Appendix C.3.

4.1 Sampling

Weighted Temporal Graph Construction.

We propose an innovative method for community detection in CTDGs by introducing the concept of a weighted temporal graph GwG_{w}, which is derived from the input CTDG 𝒢\mathcal{G}. The set of nodes in GwG_{w} includes all nodes appeared in 𝒢\mathcal{G}, and the edge weights wuvw_{uv} in GwG_{w} encode the frequency of interactions between nodes uu and vv, defined as:

wuv=(u,v,t)𝒢𝕀(u,v),w_{uv}=\sum_{(u,v,t)\in\mathcal{G}}\mathbb{I}(u,v), (6)

where 𝕀\mathbb{I} is an indicator function counting occurrences of (u,v)(u,v) in 𝒢\mathcal{G}. This transformation aggregates temporal interactions into a static graph while retaining the structural essence of the CTDG, thereby enabling efficient processing and community detection. By applying modularity optimization algorithms, such as Louvain newman2004fast ; blondel2008fast , on GwG_{w}, the graph is partitioned into kk communities 𝒞={C1,C2,,Ck}\mathcal{C}=\{C_{1},C_{2},\ldots,C_{k}\}.

Sampling Strategy.

To effectively capture the temporal and structural patterns in 𝒢\mathcal{G}, nodes are categorized based on their roles within the community structure:

  • Non-Bridging Nodes: Nodes confined within a single community CiC_{i}. Their temporal walks are restricted to the neighbors within the intra-community subgraph GCiG_{C_{i}}.

  • Bridging Nodes: Nodes that connect multiple communities. Their temporal walks are restricted to the neighbors within the inter-community subgraph GIG_{I}, ensuring transitions only occur between bridging nodes.

At each step of a temporal walk, the next node uu is sampled based on its temporal relationship with the current node vv under the constraint t<tt^{\prime}<t, where tt is the current timestamp. The transition probability is defined as:

P(u,t|v,t)=e(tt)uNvalid(v)e(tt),P(u,t^{\prime}|v,t)=\frac{e^{-(t-t^{\prime})}}{\sum_{u^{\prime}\in\text{N}_{\text{valid}}(v)}e^{-(t-t^{\prime})}}, (7)

where:

  • Nvalid(v)\text{N}_{\text{valid}}(v) represents the valid neighbor set of vv, determined by its type:

    • For non-bridging nodes vVCiv\in V_{C_{i}}, Nvalid(v)=N(v)GCi\text{N}_{\text{valid}}(v)=\text{N}(v)\cap G_{C_{i}}, where N(v)\text{N}(v) is the neighbor set of vv.

    • For bridging nodes vVIv\in V_{I}, Nvalid(v)=N(v)GI\text{N}_{\text{valid}}(v)=\text{N}(v)\cap G_{I}.

  • tt^{\prime} is the timestamp associated with the interaction between vv and uu.

Algorithm 1 Sampling Strategy

Input: Intra-community subgraphs {GC1,GC2,,GCk}\{G_{C_{1}},G_{C_{2}},\ldots,G_{C_{k}}\}, inter-community subgraph GIG_{I}, walk length ll, number of walks per node RR.
Output: Set of temporal walks 𝒲\mathcal{W}.

1:  Initialize 𝒲\mathcal{W}\leftarrow\emptyset
2:  for each node vVv\in V do
3:     for each walk r=1r=1 to RR do
4:        Initialize walk W[(v,t1)]W\leftarrow[(v,t_{1})], timestamp tt1t\leftarrow t_{1}
5:        if vVIv\in V_{I} then
6:           GcurrentGIG_{\text{current}}\leftarrow G_{I}
7:        else
8:           Find CiC_{i} such that vVCiv\in V_{C_{i}}
9:           GcurrentGCiG_{\text{current}}\leftarrow G_{C_{i}}
10:        end if
11:        for step i=1i=1 to l1l-1 do
12:           Identify valid neighbors N(v)\text{N}(v) with t<tt^{\prime}<t
13:           if N(v)=\text{N}(v)=\emptyset then
14:              Break
15:           end if
16:           Sample next node uu using P(u,t|v,t)P(u,t^{\prime}|v,t) in Eq. (7)
17:           Update WW(u,t)W\leftarrow W\cup(u,t^{\prime}), ttt\leftarrow t^{\prime}
18:        end for
19:        𝒲𝒲{W}\mathcal{W}\leftarrow\mathcal{W}\cup\{W\}
20:     end for
21:  end for
22:  Return 𝒲\mathcal{W}

By utilizing community partitioning to guide walk strategies, CTWalks eliminates the need for additional parameters to control walk directions, simplifying implementation and improving adaptability to diverse graph structures and temporal dynamics. Non-bridging nodes focus on intra-community walks, capturing localized patterns, while bridging nodes explore inter-community relationships. The complete CTWalks sampling process is detailed in Algorithm 1.

4.2 Anonymization

CTWalks replaces node identities with position-based representations while embedding both directionality and community context. For a temporal interaction ((u,v),ti)((u,v),t_{i}), the anonymization process operates on nodes appearing in temporal walk sets originating from the source node uu and target node vv.

Community- and Direction-Aware Representation.

The anonymized representation for a node ww, based on temporal walk sets from the source node uu and target node vv, integrates both directionality and the community labels of the root nodes. This is defined as:

A(w;𝒮u,𝒮v,Cu,Cv)=[A(w;𝒮u)||Cu||A(w;𝒮v)||Cv],A(w;\mathcal{S}_{u},\mathcal{S}_{v},C_{u},C_{v})=\big{[}A(w;\mathcal{S}_{u})||C_{u}||A(w;\mathcal{S}_{v})||C_{v}\big{]}, (8)

where:

  • 𝒮u\mathcal{S}_{u} and 𝒮v\mathcal{S}_{v} are the sets of temporal walks originating from uu and vv, respectively.

  • A(w;𝒮u)A(w;\mathcal{S}_{u}) and A(w;𝒮v)A(w;\mathcal{S}_{v}) aggregate ww’s position-based occurrence information across all walks in 𝒮u\mathcal{S}_{u} and 𝒮v\mathcal{S}_{v}, respectively.

  • CuC_{u} and CvC_{v} are the community labels of the root nodes uu and vv, directly appended to their respective anonymization vectors.

Anonymized Walk Construction.

For a single temporal walk 𝒲={w1,w2,,wl}\mathcal{W}=\{w_{1},w_{2},\dots,w_{l}\} with ascending timestamps t1<t2<<tlt_{1}<t_{2}<\dots<t_{l}, the anonymized walk representation is defined as:

𝒲anon={(A(wi),ti)i=1,2,,l},\mathcal{W}_{\text{anon}}=\{(A(w_{i}),t_{i})\mid i=1,2,\dots,l\}, (9)

where A(wi)A(w_{i}) is the community- and direction-aware anonymized representation of node wiw_{i}, incorporating information from the sets of temporal walks 𝒮u\mathcal{S}_{u} and 𝒮v\mathcal{S}_{v}, as well as the community labels CuC_{u} and CvC_{v}. The anonymization process is detailed in Appendix G.

4.3 Encoding

The encoding mechanism in CTWalks processes anonymized temporal walks 𝒲anon\mathcal{W}_{\text{anon}} by combining continuous temporal evolution and instantaneous updates, while integrating community-aware information. This ensures the final state hlh_{l} captures temporal, structural, and community-aware information for downstream tasks.

The process begins by initializing the cumulative hidden state h0=𝟎h^{\prime}_{0}=\mathbf{0}. Before the first step, the instantaneous hidden state h1h_{1} is computed as:

h1=g(h0,A(w1)).h_{1}=g(h^{\prime}_{0},A(w_{1})). (10)

For each subsequent step ii (1i<l1\leq i<l), the encoding alternates between:

  1. 1.

    Continuous Integration: Update hih^{\prime}_{i} by integrating the temporal evolution function f(h,t)f(h,t) over [ti,ti+1][t_{i},t_{i+1}]:

    hi=titi+1f(hi,t)𝑑t.h^{\prime}_{i}=\int_{t_{i}}^{t_{i+1}}f(h_{i},t)\,dt. (11)
  2. 2.

    Instantaneous Update: Compute hi+1h_{i+1}, incorporating hih^{\prime}_{i} and A(wi+1)A(w_{i+1}):

    hi+1=g(hi,A(wi+1)).h_{i+1}=g(h^{\prime}_{i},A(w_{i+1})). (12)

We define gg and ff as parameterized models, gg updates instantaneous states, while ff governs continuous temporal integration. Details are provided in Appendix C.1.

Algorithm 2 Encoding of CTWalks

Input: Anonymized temporal walk 𝒲anon={(A(w1),t1),,(A(wl),tl)}\mathcal{W}_{\text{anon}}=\{(A(w_{1}),t_{1}),\ldots,(A(w_{l}),t_{l})\}.
Output: Final walk representation hlh_{l}.

1:  Initialize h0𝟎h^{\prime}_{0}\leftarrow\mathbf{0}
2:  Compute h1g(h0,A(w1))h_{1}\leftarrow g(h^{\prime}_{0},A(w_{1}))
3:  for i=1i=1 to l1l-1 do
4:     Update hih^{\prime}_{i} via continuous integration
5:     Compute hi+1h_{i+1} via instantaneous update
6:  end for
7:  Return hlh_{l}

By alternating between cumulative updates and instantaneous hidden states, the encoding ensures hlh_{l} is temporally coherent, structurally rich, and embeds community context for robust downstream tasks. The detailed implementation of this process is provided in Algorithm 2, and an illustration is shown in Fig. 2. For a detailed discussion, please refer to Appendix H.

4.4 Extension for Attributed Graphs

To encode node and edge attributes, the encoding process can be extended by modifying the instantaneous update function gg as follows:

hi=g(hi1,A(wi)XwiXei),h_{i}=g(h^{\prime}_{i-1},A(w_{i})||X_{w_{i}}||X_{e_{i}}), (13)

where:

  • XwiX_{w_{i}} represents the attributes of node wiw_{i},

  • XeiX_{e_{i}} represents the attributes of edge ei={wi1,wi}e_{i}=\{w_{i-1},w_{i}\},

  • |||| denotes the concatenation operation.

In this extended formulation, node attributes XwiX_{w_{i}} and edge attributes XeiX_{e_{i}} introduce additional context for real-world attributed dynamic graphs. Several methods yang2013community ; yang2013overlapping ; du2017community focus on community detection in attributed graphs, leveraging node attributes to improve accuracy. This extension enhances the model’s flexibility and expressiveness, enabling it to capture not only the structural and temporal information but also the rich attributes associated with nodes and edges.

Refer to caption
Figure 2: Illustration of the encoding mechanism in CTWalks. Each node in the temporal walk is processed through instantaneous updates (gg) and continuous temporal integration (ff) along an irregular time trajectory. The final hidden state h4h_{4} represents the walk encoding, incorporating both structural and community-aware information.

5 Theoretical Analysis

Dynamic graph sampling inherently involves both spatial structures and temporal sequences. To isolate the impact of temporal sequences and focus on spatial sampling, it is essential to establish an unbiased spatial sampling environment as a foundation. To this end, we analyze the potential biases introduced by traditional random walks on unweighted, undirected static graphs, which may limit their ability to comprehensively explore the graph structure. Furthermore, we examine whether a parameter-free walk strategy, guided by community partitions and distinguishing intra-community and inter-community transitions, can effectively address these biases.

By simplifying the problem to static graphs, we aim to eliminate the confounding effects of temporal dynamics and concentrate on evaluating the locality bias and the theoretical advantages of community-guided walks in an idealized spatial context.

Analysis 1: Overcoming Locality Bias

Let a traditional random walk generate a sequence {w0,w1,,wLw1}\{w_{0},w_{1},...,w_{L_{w}-1}\}, where wtVw_{t}\in V. At step t+1t+1, the walk transitions to a neighbor of wtw_{t} with probability 1/d(wt)1/d(w_{t}), where d(wt)d(w_{t}) is the degree of node wtw_{t}. Let PuvP_{uv} denote the one-step transition probability from node uu to vv, defined as:

Puv={1/d(u),if (u,v)E,0,otherwise.P_{uv}=\begin{cases}1/d(u),&\text{if }(u,v)\in E,\\ 0,&\text{otherwise.}\end{cases} (14)

Suppose w0=uw_{0}=u and wt=vw_{t}=v, where u,vVu,v\in V and w0,w1,,wt1vw_{0},w_{1},...,w_{t-1}\neq v. The probability that the walk starts at uu and visits vv for the first time at time tt is given by:

ruvt=j𝒩(u)Pujrjvt1=1d(u)j𝒩(u)rjvt1,r^{t}_{uv}=\sum_{j\in\mathcal{N}(u)}P_{uj}\cdot r^{t-1}_{jv}=\frac{1}{d(u)}\sum_{j\in\mathcal{N}(u)}r^{t-1}_{jv}, (15)

where 𝒩(u)\mathcal{N}(u) is the set of neighbors of node uu. This equation shows that ruvtr^{t}_{uv} is the mean of the (t1)(t-1)-step probabilities rjvt1r^{t-1}_{jv} over all neighbors of uu. The reliance on all neighbors dilutes the transition probability towards distant nodes, reinforcing locality bias.

Lemma 1. In CTWalks, let nodes uu and vv belong to two different communities, and u,vVcu,v\in V_{c} are bridging nodes. The probability that the walk starts from uu and visits vv for the first time at time tt satisfies:

ruvt1d(u)j𝒩(u)rjvt1,r^{t}_{uv}\geq\frac{1}{d(u)}\sum_{j\in\mathcal{N}(u)}r^{t-1}_{jv}, (16)

with equality when all neighbors of uu are bridging nodes (𝒩(u)=𝒩inter(u)\mathcal{N}(u)=\mathcal{N}_{\text{inter}}(u)).

Proof. Partition 𝒩(u)\mathcal{N}(u) into two disjoint subsets:

  • 𝒩inter(u)=𝒩(u)Vc\mathcal{N}_{\text{inter}}(u)=\mathcal{N}(u)\cap V_{c}, the set of inter-community neighbors (bridging nodes),

  • 𝒩intra(u)=𝒩(u)𝒩inter(u)\mathcal{N}_{\text{intra}}(u)=\mathcal{N}(u)-\mathcal{N}_{\text{inter}}(u), the set of intra-community neighbors.

The total probability of transitioning to neighbors of uu can be expressed as:

j𝒩(u)rjvt1=j𝒩inter(u)rjvt1+j𝒩intra(u)rjvt1.\sum_{j\in\mathcal{N}(u)}r^{t-1}_{jv}=\sum_{j\in\mathcal{N}_{\text{inter}}(u)}r^{t-1}_{jv}+\sum_{j\in\mathcal{N}_{\text{intra}}(u)}r^{t-1}_{jv}. (17)

In CTWalk, intra-community transitions cannot reach nodes in different communities. Thus, for any j𝒩intra(u)j\in\mathcal{N}_{\text{intra}}(u), rjvt1=0r^{t-1}_{jv}=0. The total probability simplifies to:

j𝒩(u)rjvt1=j𝒩inter(u)rjvt1.\sum_{j\in\mathcal{N}(u)}r^{t-1}_{jv}=\sum_{j\in\mathcal{N}_{\text{inter}}(u)}r^{t-1}_{jv}. (18)

The transition probability ruvtr^{t}_{uv} from uu to vv in CTWalks is:

ruvt=1|𝒩inter(u)|j𝒩inter(u)rjvt1.r^{t}_{uv}=\frac{1}{|\mathcal{N}_{\text{inter}}(u)|}\sum_{j\in\mathcal{N}_{\text{inter}}(u)}r^{t-1}_{jv}. (19)

Since |𝒩inter(u)|d(u)|\mathcal{N}_{\text{inter}}(u)|\leq d(u), it follows that:

1|𝒩inter(u)|1d(u).\frac{1}{|\mathcal{N}_{\text{inter}}(u)|}\geq\frac{1}{d(u)}. (20)

Substituting this into the Eq.(19) for ruvtr^{t}_{uv}, we obtain:

ruvt=1|𝒩inter(u)|j𝒩inter(u)rjvt11d(u)j𝒩(u)rjvt1.r^{t}_{uv}=\frac{1}{|\mathcal{N}_{\text{inter}}(u)|}\sum_{j\in\mathcal{N}_{\text{inter}}(u)}r^{t-1}_{jv}\geq\frac{1}{d(u)}\sum_{j\in\mathcal{N}(u)}r^{t-1}_{jv}. (21)

Equality holds when all neighbors of uu are bridging nodes (𝒩(u)=𝒩inter(u)\mathcal{N}(u)=\mathcal{N}_{\text{inter}}(u)).

Lemma 1 demonstrates that CTWalks reduces locality bias by prioritizing inter-community transitions for bridging nodes. Unlike traditional random walks, which distribute transition probabilities evenly among all neighbors, CTWalks separates intra- and inter-community contributions, enabling effective exploration of global structures.

We also theoretically establish the connection between CTWalks and matrix factorization, details are provided in Appendix F.

6 Experiments

6.1 Experimental Setting

Baselines and Datasets.

CTWalks is evaluated against six state-of-the-art baselines for continuous-time dynamic graphs, including message passing-based methods (DyRep trivedi2019dyrep , TGAT xu2020inductive , and TGN rossi2020temporal ) and sequential model-based methods (CTDNE nguyen2018continuous , JODIE kumar2019predicting , and CAWs wang2021causal ) . The evaluation is conducted on five real-world datasets spanning diverse domains: social networks, email communications, e-commerce, online education, and student forums. For instance, UCI leskovec2016snap tracks social interactions in a university network; Enron leskovec2016snap captures email exchanges among employees; Taobao zhu2018learning records user-item interactions with encoded action types such as clicks and purchases; MOOC leskovec2016snap logs student activities on educational platforms; and Wikipedia kumar2019predicting is a bipartite interaction network consisting of editor-page and user-post activities. The dataset statistics are summarized in Table 1. Additional details on the baselines and datasets are available in the Appendix D.

Dataset Nodes Temporal Edges Duration (seconds) Interaction Intensity
UCI 1,899 59,835 16,621,303 3.79×1063.79\times 10^{-6}
MOOC 7,144 411,749 2,572,086 4.48×1054.48\times 10^{-5}
Enron 143 62,617 72,932,520 6.50×1056.50\times 10^{-5}
Taobao 64,703 77,436 36,000 6.64×1056.64\times 10^{-5}
Wikipedia 9,227 157,474 2,678,373 1.27×1051.27\times 10^{-5}
Table 1: Dataset statistics. Interaction intensity is calculated as 2|E|/(|V|T)2|E|/(|V|T), where TT is the total time range in seconds, |V||V| and |E||E| are number of nodes and temporal links.
Data Preparation.

The data preparation process involves three key steps. First, all edges in the input temporal graph are sorted by their timestamps to ensure chronological consistency. This step facilitates training on past edges while testing on future edges. Next, the dataset is split into training, validation, and testing sets, adhering to a chronological ratio of 70% (training), 15% (validation), and 15% (testing). Finally, negative sampling is performed by generating edges absent in the original graph to form negative edge sets, ensuring a balanced dataset. Both positive and negative samples are used to train, validate, and test the model. Based on the training data, a Weighted Temporal Graph is constructed to capture the aggregate interaction strength between nodes over time. For nodes in the validation and testing datasets that are absent in the Weighted Temporal Graph:

  • Non-bridging nodes: If all neighbors of the node belong to the same community, the node is assigned to that community.

  • Bridging nodes: If the node’s neighbors belong to multiple communities, the node is assigned to a community with a probability proportional to the sum of the edge weights connecting the node to that community, defined as:

    P(Ci|v)=uN(v)CiwvuuN(v)wvu,P(C_{i}|v)=\frac{\sum_{u\in N(v)\cap C_{i}}w_{vu}}{\sum_{u\in N(v)}w_{vu}}, (22)

    where P(Ci|v)P(C_{i}|v) is the probability of node vv belonging to community CiC_{i}, N(v)N(v) represents the neighbors of vv, and wvuw_{vu} is the edge weight between vv and uu.

Evaluation Tasks.

We assess the performance of CTWalks and baselines on link prediction tasks, which are further categorized into transductive and inductive settings: In the transductive link prediction task, temporal links between all nodes are used for training up to a specific time point, with the remaining links allocated for validation and testing. In the inductive link prediction task, the model is evaluated on its ability to predict links involving nodes unseen during training. Two specific scenarios are considered: 1. New-Old: Interactions between unseen (new) and observed (old) nodes. 2. New-New: Interactions exclusively between unseen nodes. To construct these scenarios, 10% of the nodes are masked during training, and interactions associated with them are removed. Validation and testing sets are limited to interactions involving these masked nodes.

Refer to caption
Figure 3: Illustration of the data preparation process for link prediction. The process includes sorting edges by timestamp, splitting into training, validation, and testing sets, and performing negative sampling to ensure balanced datasets.
Training Details.

For evaluation, we report results using two widely adopted metrics: Area Under the ROC Curve (AUC) and Average Precision (AP) which are detailed in Appendix D. These metrics effectively capture model performance across both transductive and inductive tasks. To ensure robustness, each experiment is repeated five times with different random seeds, and we report the average performance.

Task Methods UCI MOOC Enron Taobao Wikipedia
Inductive new v.s. new DyRep 63.76 ±\pm 4.67 74.07±\pm 1.88 72.74±\pm 0.44 87.62 ±\pm 1.23 67.07 ±\pm 1.26
TGAT 76.33 ±\pm 1.38 70.04 ±\pm 1.35 63.34 ±\pm 1.82 76.99 ±\pm 1.54 90.77 ±\pm 0.95
TGN 79.37 ±\pm 1.49 77.07 ±\pm 1.88 80.92 ±\pm 0.02 84.74 ±\pm 0.44 63.76 ±\pm 4.67
CTDNE 62.93 ±\pm 0.72 72.01 ±\pm 0.29 66.49 ±\pm 1.21 78.01 ±\pm 0.08 62.35 ±\pm 1.10
JODIE 70.31 ±\pm 0.53 75.31 ±\pm 4.14 70.95 ±\pm 0.85 81.53 ±\pm 2.12 72.65 ±\pm 0.63
CAWs 88.12±\pm 0.88 90.10 ±\pm 0.35 96.27 ±\pm 1.16 86.34 ±\pm 2.95 96.36 ±\pm 1.48
CTWalks 90.71 ±\pm 0.08 92.86 ±\pm 0.56 95.20 ±\pm 0.68 96.98 ±\pm 0.41 94.16 ±\pm 0.25
new v.s. old DyRep 93.28 ±\pm 3.01 88.23±\pm 1.35 94.39 ±\pm 0.59 89.36 ±\pm 1.48 76.72 ±\pm 1.36
TGAT 76.33 ±\pm 1.03 67.40 ±\pm 1.71 61.80 ±\pm 1.31 65.24 ±\pm 0.98 95.47±\pm 1.18
TGN 87.13 ±\pm 1.07 72.23 ±\pm 1.20 77.98 ±\pm 0.45 88.39 ±\pm 0.32 90.28 ±\pm 0.96
CTDNE 71.11 ±\pm 0.74 69.10 ±\pm 0.88 64.66 ±\pm 0.41 68.33 ±\pm 0.05 88.39 ±\pm 1.08
JODIE 71.61 ±\pm 0.37 88.20 ±\pm 1.92 85.73 ±\pm 1.36 80.53 ±\pm 2.13 74.78 ±\pm 0.22
CAWs 91.25 ±\pm 0.18 90.30 ±\pm 0.08 93.22 ±\pm 1.28 88.76 ±\pm 1.18 96.19 ±\pm 0.88
CTWalks 95.70 ±\pm 0.17 92.08 ±\pm 0.20 92.52 ±\pm 0.21 93.98 ±\pm 0.44 95.15 ±\pm 0.24
Transductive DyRep 95.23 ±\pm 1.48 90.49 ±\pm 0.24 96.71 ±\pm 0.80 83.11 ±\pm 1.13 77.40 ±\pm 1.79
TGAT 77.67 ±\pm 1.02 72.09 ±\pm 1.51 60.88 ±\pm 1.34 62.88 ±\pm 1.46 96.36 ±\pm 1.21
TGN 83.36 ±\pm 1.23 73.09 ±\pm 0.03 68.12 ±\pm 1.21 87.71 ±\pm 0.04 95.23 ±\pm 0.25
CTDNE 76.89 ±\pm 0.92 73.03 ±\pm 0.32 89.36 ±\pm 0.73 65.08 ±\pm 0.02 92.43 ±\pm 0.27
JODIE 74.63 ±\pm 0.52 90.50 ±\pm 0.85 60.36 ±\pm 0.65 82.02 ±\pm 0.31 89.30 ±\pm 0.22
CAWs 95.25 ±\pm 0.06 94.28 ±\pm 0.29 91.64 ±\pm 0.55 85.88 ±\pm 0.37 98.67 ±\pm 0.27
CTWalks 98.05 ±\pm 0.09 94.33 ±\pm 0.08 92.54 ±\pm 0.58 92.06 ±\pm 0.16 95.14 ±\pm 0.25
Table 2: Transductive and inductive link prediction performances w.r.t. AUC. We use bold font and to highlight the best and second best performances.

6.2 Experimental Results and Discussion

We report the AUC performance of CTWalks and six state-of-the-art baselines across inductive and transductive link prediction tasks on various datasets. Table 2 summarizes the results, highlighting the effectiveness of CTWalks in both inductive and transductive settings.

In the inductive setting, CTWalks achieves remarkable results, particularly for the challenging ”new vs. new” links. On most of datasets, CTWalks significantly outperforms the best-performing baseline CAWs, with improvements of up to 5.14% on the UCI dataset and 2.16% on Taobao. Similarly, in the ”new vs. old” scenario, CTWalks consistently surpasses the baselines, achieving an AUC of 95.70 on UCI and 92.08 on MOOC, compared to the respective baseline bests of 93.28 by DyRep and 90.30 by CAWs. These results highlight CTWalks’ superior generalization ability, particularly for interactions involving previously unseen nodes.

In the transductive setting, CTWalks again demonstrates strong performance, achieving the best or second-best results across all datasets. On the Taobao dataset, CTWalks attains an AUC of 92.06. Similarly, on the MOOC dataset, CTWalks achieves an AUC of 94.33, showcasing its competitive edge even on attributed datasets.

The success of CTWalks can be attributed to its ability to integrate temporal dynamics and structural information effectively, leveraging its community-aware temporal walk encoding to capture intricate graph patterns. Its superior performance across diverse datasets highlights its generalizability and robustness, setting a benchmark for future research in dynamic graph learning.

6.3 Ablation Study

We conducted an ablation study to evaluate the contributions of key components in CTWalks, with results presented in Table 3. The study involved systematically removing essential mechanisms to understand their impact on the performance of CTWalks.

No. Ablation UCI Taobao MOOC
1. original method 92.94 ±\pm 0.21 94.88 ±\pm 0.36 92.07 ±\pm 0.21
2. remove intra-community walk 89.28 ±\pm 0.66 88.45 ±\pm 0.41 85.69 ±\pm 0.31
3. remove inter-community walk 86.90 ±\pm 0.17 87.91 ±\pm 0.39 86.25 ±\pm 1.99
4. remove community walk 85.94 ±\pm 0.85 86.01 ±\pm 0.12 85.12 ±\pm 0.20
5. remove anonymous community label 88.94 ±\pm 0.61 87.01 ±\pm 0.14 86.12 ±\pm 0.30
6. remove continuous evolution 79.38 ±\pm 2.05 84.44 ±\pm 0.56 83.72 ±\pm 0.44
Table 3: Ablation study on CTWalks in terms of AUC scores on all inductive test links.

The first three experiments assess the influence of the community-aware sampling mechanism. When the restrictions on intra-community and inter-community walks are removed, nodes are allowed to perform unrestricted temporal walks, with the selection of the next node determined by time-biased probabilities that prioritize temporally closer events. The resulting performance degradation across these variants underscores the significance of preserving community boundaries. This highlights the effectiveness of the proposed community-aware sampling strategy in capturing both local and global temporal dynamics. To further validate the advantages of community walks in enhancing network embedding learning, we conducted experiments on purely static graphs and compared against classical graph node embedding algorithms. Detailed analyses are provided in Appendix E.

The fourth experiment removes the use of community labels during the anonymized walk encoding. This change prevents the model from distinguishing between structurally similar walks based on their community context, reducing its ability to effectively generalize across different network structures.

Finally by removing the continuous integration step, the cumulative hidden state hih^{\prime}_{i} is no longer updated through integration over temporal intervals. Instead, the encoding process relies solely on the instantaneous update function Eq.(12), where the hidden state at each step is directly computed without accounting for continuous temporal evolution. This simplification significantly impacts performance, particularly on datasets with sparse interactions like UCI, as it eliminates the ability to capture nuanced spatiotemporal dynamics over time intervals, underscoring the critical role of the continuous integration mechanism.

These results validate that each component contributes significantly to the overall performance of CTWalks, with the interplay of community-aware sampling and continuous evolution mechanisms being particularly crucial.

7 Discussion and Conclusion

The effectiveness of our model relies heavily on modularity-based community detection methods, such as Louvain, which ties its performance to the quality of the initial community partitions. In networks with overlapping or poorly-defined communities, this dependency may hinder the efficacy of the community-aware sampling mechanism. To address this, future work could explore adaptive sampling techniques that dynamically refine community boundaries or integrate multi-scale community detection methods to enhance robustness.

Additionally, incorporating continuous temporal dynamics via ODE solvers poses scalability challenges for extremely large graphs. To address these issues, approximate solutions such as logarithmic normalization of time intervals and parallel batch processing (see Appendix C.2) have been employed to reduce computational overhead. Nonetheless, future research should prioritize the development of lightweight integration techniques or hybrid approaches that strike a balance between computational efficiency and modeling fidelity.

CTWalks represents a community-aware, parameter-free framework for representation learning on CTDGs. Unlike existing methods, CTWalks simultaneously incorporates intra- and inter-community dynamics, enabling the effective modeling of mesoscopic structures within graphs. Our contributions span several dimensions: (1) a novel temporal walk sampling strategy that adaptively captures community-driven dynamics without requiring extensive parameter tuning, (2) an anonymized encoding process augmented with community labels for generalization, and (3) a continuous temporal integration mechanism to capture nuanced spatiotemporal dependencies. Experimental results on diverse datasets consistently demonstrate the superiority of CTWalks in both inductive and transductive tasks.

This work lays the groundwork for future research in dynamic graph learning. Directions for future exploration include designing scalable methods for handling larger and more complex CTDGs, developing robust community detection mechanisms, and addressing challenges associated with data bias and ethical use in real-world applications. With its potential to generalize across domains, CTWalks represents a significant step forward in understanding and modeling the evolving dynamics of complex systems.

References

  • (1) E. Rossi, B. Chamberlain, F. Frasca, D. Eynard, F. Monti, and M. Bronstein, “Temporal graph networks for deep learning on dynamic graphs,” in ICML 2020 Workshop on Graph Representation Learning, 2020.
  • (2) D. Xu, C. Ruan, E. Körpeoglu, S. Kumar, and K. Achan, “Inductive representation learning on temporal graphs,” in 8th International Conference on Learning Representations, OpenReview.net, 2020.
  • (3) Y. Wang, Y.-Y. Chang, Y. Liu, J. Leskovec, and P. Li, “Inductive representation learning in temporal networks via causal anonymous walks,” in 9th International Conference on Learning Representations, OpenReview.net, 2021.
  • (4) A. H. Souza, D. Mesquita, S. Kaski, and V. Garg, “Provably expressive temporal graph networks,” in Advances in Neural Information Processing Systems (NeurIPS), 2022.
  • (5) M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich, “A review of relational machine learning for knowledge graphs,” Proceedings of the IEEE, vol. 104, no. 1, pp. 11–33, 2015.
  • (6) S. Zhang, Z. Yang, M. Zhao, and G. Li, “Gnnrec: Graph neural network-based recommendation,” arXiv preprint arXiv:2001.09061, 2020.
  • (7) X. Wang, X. He, Y. Cao, M. Liu, and T.-S. Chua, “Kgat: Knowledge graph attention network for recommendation,” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 950–958, 2019.
  • (8) M. Zhang, W. Hu, J. Wang, and Z. Zhang, “Graph representation learning for biological networks,” in ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2021.
  • (9) A. Fout, J. Byrd, B. Shariat, and A. Ben-Hur, “Protein interface prediction using graph convolutional networks,” Advances in neural information processing systems, vol. 30, pp. 6533–6542, 2017.
  • (10) Y. Luo and P. Li, “Neighborhood-aware scalable temporal network representation learning,” in Learning on Graphs Conference, 2022.
  • (11) L. Yu, L. Sun, B. Du, and W. Lv, “Towards better dynamic graph learning: New architecture and unified library,” in Advances in Neural Information Processing Systems, 2023.
  • (12) S. M. Kazemi, R. Goel, K. Jain, I. Kobyzev, A. Sethi, P. Forsyth, and P. Poupart, “Representation learning for dynamic graphs: A survey,” Journal of Machine Learning Research, vol. 21, no. 70, pp. 1–73, 2020.
  • (13) L. Yang, S. Adam, and C. Chatelain, “Dynamic graph representation learning with neural networks: A survey,” arXiv preprint arXiv:2304.05729, 2023.
  • (14) Y. Zhu, F. Lyu, C. Hu, X. Chen, and X. Liu, “Encoder-decoder architecture for supervised dynamic graph learning: A survey,” arXiv preprint arXiv:2203.10480, 2022.
  • (15) X. Jin, Q. Li, J. Zhou, W. Fan, and J. Tang, “Motif-aware representation learning on continuous-time dynamic graphs,” arXiv preprint arXiv:2301.06434, 2023.
  • (16) M. Xia, S. Zhang, P. Zhou, F. Nie, and J. Huang, “Motif-preserving dynamic attributed network embedding,” IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 12, pp. 5750–5763, 2021.
  • (17) S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3-5, pp. 75–174, 2010.
  • (18) M. E. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical review E, vol. 69, no. 2, p. 026113, 2004.
  • (19) C. Liu, J. Liu, and Z. Jiang, “A multiobjective evolutionary algorithm based on similarity for community detection from signed social networks,” IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2274 – 2287, 2014.
  • (20) X. Teng, J. Liu, and M. Li, “Overlapping community detection in directed and undirected attributed networks using a multiobjective evolutionary algorithm,” IEEE Trans. Cybernetics, vol. 51, pp. 138 – 150, January 2021.
  • (21) R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud, “Neural ordinary differential equations,” in Advances in Neural Information Processing Systems, vol. 31, 2018.
  • (22) M. Jin, Y. Zheng, Y.-F. Li, S. Chen, B. Yang, and S. Pan, “Multivariate time series forecasting with dynamic graph neural odes,” arXiv preprint arXiv:2202.08408, 2022.
  • (23) T. Asikis, L. Böttcher, and N. Antulov-Fantulin, “Neural ordinary differential equation control of dynamics on graphs,” arXiv preprint arXiv:2006.09773, 2020.
  • (24) L. Chen, K. Wu, J. Lou, and J. Liu, “Signed graph neural ordinary differential equation for modeling continuous-time dynamics,” arXiv preprint arXiv:2312.11198, 2023.
  • (25) Anonymous, “Streaming dynamic graph neural networks for continuous-time temporal graphs,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
  • (26) H. Ma, K. Wu, H. Wang, and J. Liu, “Higher-order knowledge transfer for dynamic community detection with great changes,” IEEE Transactions on Evolutionary Computation, vol. 28, no. 1, pp. 90 – 104, 2024.
  • (27) W. Cong, Y. Wu, Y. Tian, M. Gu, Y. Xia, M. Mahdavi, and C.-c. J. Chen, “Dynamic graph representation learning via graph transformer networks,” CoRR, vol. abs/2111.10447, 2021.
  • (28) P. Goyal, S. R. Chhetri, and A. Canedo, “dyngraph2vec: Capturing network dynamics using dynamic graph representation learning,” Knowledge-Based Systems, vol. 187, p. 104816, 2020.
  • (29) A. Pareja, G. Domeniconi, J. Chen, T. Ma, T. Suzumura, H. Kanezashi, T. Kaler, T. B. Schardl, and C. E. Leiserson, “Evolvegcn: Evolving graph convolutional networks for dynamic graphs,” in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, pp. 5363–5370, AAAI Press, 2020.
  • (30) A. Sankar, Y. Wu, L. Gou, W. Zhang, and H. Yang, “Dysat: Deep neural representation learning on dynamic graphs via self-attention networks,” in Proceedings of the Thirteenth ACM International Conference on Web Search and Data Mining, pp. 519–527, ACM, 2020.
  • (31) J. You, T. Du, and J. Leskovec, “Roland: graph learning framework for dynamic graphs,” in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2358–2366, ACM, 2022.
  • (32) Y. Rubanova, R. T. Q. Chen, and D. Duvenaud, “Latent ordinary differential equations for irregularly-sampled time series,” in Advances in Neural Information Processing Systems, vol. 32, 2019.
  • (33) L.-P. Xhonneux, M. Qu, and J. Tang, “Continuous graph neural networks,” Advances in Neural Information Processing Systems, vol. 33, pp. 18508–18519, 2020.
  • (34) F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network model,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2008.
  • (35) S. Mo, K. Wu, Q. Gao, X. Teng, and J. Liu, “Autosgnn: Automatic propagation mechanism discovery for spectral graph neural networks,” in AAAI, 2025.
  • (36) J. L. Elman, “Finding structure in time,” Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
  • (37) S. Kumar, X. Zhang, and J. Leskovec, “Predicting dynamic embedding trajectory in temporal interaction networks,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1269–1278, ACM, 2019.
  • (38) M. E. Newman, “Fast algorithm for detecting community structure in networks,” Physical review E, vol. 69, no. 6, p. 066133, 2004.
  • (39) V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008, 2008.
  • (40) J. Yang, J. McAuley, and J. Leskovec, “Community detection in networks with node attributes,” ICDM, pp. 1151–1156, 2013.
  • (41) J. Yang and J. Leskovec, “Overlapping community detection at scale: a nonnegative matrix factorization approach,” Proceedings of the sixth ACM international conference on Web search and data mining, pp. 587–596, 2013.
  • (42) B. Du, Z. Wu, and J. Pei, “Community detection in attributed graphs: An embedding approach,” Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 589–598, 2017.
  • (43) R. Trivedi, M. Farajtabar, P. Biswal, and H. Zha, “Dyrep: Learning representations over dynamic graphs,” in International Conference on Learning Representations (ICLR), 2019.
  • (44) G. Nguyen, J. Lee, R. Rossi, N. Ahmed, E. Koh, and S. Kim, “Continuous-time dynamic network embeddings,” Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), pp. 373–381, 2018.
  • (45) J. Leskovec and A. Krevl, “Snap datasets: Stanford large network dataset collection.” http://snap.stanford.edu/data, 2014. Accessed: 2016-01-01.
  • (46) H. Zhu, X. Li, P. Zhang, G. Li, J. He, H. Li, and K. Gai, “Learning tree-based deep model for recommender systems,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1079–1088, ACM, 2018.
  • (47) K. Cho, B. Van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using rnn encoder-decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734, 2014.
  • (48) R. T. Chen, B. Amos, and M. Nickel, “Neural spatio-temporal point processes,” in International Conference on Learning Representations, 2020.
  • (49) T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” in Proceedings of the International Conference on Learning Representations (ICLR), 2013.
  • (50) A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864, ACM, 2016.
  • (51) B. Perozzi, R. Al-Rfou, and S. Skiena, “DeepWalk: Online learning of social representations,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710, ACM, 2014.
  • (52) J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “LINE: Large-scale information network embedding,” in Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077, 2015.
  • (53) L. F. Ribeiro, P. H. Saverese, and D. R. Figueiredo, “struc2vec: Learning node representations from structural identity,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 385–394, ACM, 2017.
  • (54) W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Proceedings of Advances in Neural Information Processing Systems, pp. 1024–1034, 2017.
  • (55) X. Wang, P. Cui, J. Wang, J. Pei, and W. Zhu, “Community preserving network embedding,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, pp. 203–209, 2017.
  • (56) R. A. Rossi and N. K. Ahmed, “The network data repository with interactive graph analytics and visualization,” 2015. Available at https://networkrepository.com.
  • (57) T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 3111–3119, 2013.
  • (58) O. Levy and Y. Goldberg, “Neural word embedding as implicit matrix factorization,” in Proceedings of Advances in Neural Information Processing Systems, pp. 2177–2185, 2014.
  • (59) J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang, “Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec,” in Proceedings of the 11th ACM International Conference on Web Search and Data Mining, pp. 459–467, 2018.
  • (60) Z. C. Lipton, J. Berkowitz, and C. Elkan, “A critical review of recurrent neural networks for sequence learning,” arXiv preprint arXiv:1506.00019, 2015.
  • (61) Z. Che, S. Purushotham, K. Cho, D. Sontag, and Y. Liu, “Recurrent neural networks for multivariate time series with missing values,” in Scientific reports, vol. 8, pp. 1–12, Nature Publishing Group, 2018.
  • (62) H. Mei and J. Eisner, “Neural hawkes process: A neurally self-modulating multivariate point process,” in Advances in Neural Information Processing Systems, pp. 6754–6764, 2017.

Appendix A Notations and Definitions

Symbol Definition
𝒢={(ei,ti)}i=1N\mathcal{G}=\{(e_{i},t_{i})\}_{i=1}^{N} A continuous-time dynamic graph (CTDG) with time-stamped interactions.
Gw=(V,Ew)G_{w}=(V,E_{w}) The weighted temporal graph with aggregated interaction weights.
wuvw_{uv} Weight of the edge (u,v)(u,v), representing the total number of interactions.
𝒞={C1,C2,,Ck}\mathcal{C}=\{C_{1},C_{2},\ldots,C_{k}\} Partitioned communities in the graph using modularity-based algorithms.
GCi=(VCi,ECi)G_{C_{i}}=(V_{C_{i}},E_{C_{i}}) Subgraph corresponding to community CiC_{i}.
GI=(VI,EI)G_{I}=(V_{I},E_{I}) Inter-community subgraph comprising bridging nodes and edges.
VIV_{I} Set of bridging nodes connecting multiple communities.
N(v)\text{N}(v) Neighbor set of node vv.
WW A temporal walk defined as {(w1,t1),(w2,t2),,(wl,tl)}\{(w_{1},t_{1}),(w_{2},t_{2}),\ldots,(w_{l},t_{l})\}.
Pintra(u|v,t)P_{\text{intra}}(u|v,t) Transition probability for intra-community walks based on temporal constraints.
Pinter(u|v,t)P_{\text{inter}}(u|v,t) Transition probability for inter-community walks based on temporal constraints.
A(w;𝒮u,𝒮v,Cu,Cv)A(w;\mathcal{S}_{u},\mathcal{S}_{v},C_{u},C_{v}) Anonymization operation with community label integration.
𝒲anon\mathcal{W}_{\text{anon}} Anonymized temporal walk incorporating community-aware representations.
hih_{i} Instantaneous hidden state at step ii of the walk.
hih^{\prime}_{i} Cumulative hidden state at step ii updated through continuous integration.
g(hi1,A(wi)g(h^{\prime}_{i-1},A(w_{i}) Instantaneous update function for hidden states.
f(h,t)f(h,t) Temporal evolution function for continuous integration.
titi+1f(hi,t)𝑑t\int_{t_{i}}^{t_{i+1}}f(h_{i},t)\,dt Continuous integration over the temporal interval [ti,ti+1][t_{i},t_{i+1}].
hlh_{l} Final representation of the temporal walk WW.
λ=2|E|/(|V|T)\lambda=2|E|/(|V|T) Average link stream intensity, measuring interaction density over time.
Table 4: Key Notations and Definitions for CTWalks

Additional Notes:

  • The definitions aim to maintain clarity for each symbol and its application within the CTWalks framework.

  • Symbols like g(hi1,A(wi))g(h^{\prime}_{i-1},A(w_{i})) and f(h,t)f(h,t) are parameterized models, where gg handles discrete updates and ff continuous temporal dynamics.

  • The temporal walks WW integrate both intra-community and inter-community dynamics, with anonymization enhancing generalization across unseen nodes.

  • Community-aware sampling eliminates the need for additional parameters to control local/global walks, enhancing model simplicity and robustness.

Appendix B Community Detection Algorithms

B.1 Louvain Method

The Louvain method newman2004fast ; blondel2008fast is an efficient modularity optimization algorithm widely used for community detection in undirected weighted graphs. It operates in two iterative phases:

  1. 1.

    Local Optimization Phase: Initially, each node is treated as a separate community. The algorithm iteratively moves nodes between neighboring communities to maximize the modularity gain ΔQ\Delta Q, defined as:

    ΔQ=in+2kin2m(tot+ktot2m)2(tot2m)2,\Delta Q=\frac{\sum_{\text{in}}+2k_{\text{in}}}{2m}-\left(\frac{\sum_{\text{tot}}+k_{\text{tot}}}{2m}\right)^{2}-\left(\frac{\sum_{\text{tot}}}{2m}\right)^{2}, (23)

    where in\sum_{\text{in}} is the sum of edge weights within a community, kink_{\text{in}} and ktotk_{\text{tot}} are the total weights of edges connected to the node inside the community and in the entire graph, respectively, and mm is the total edge weight in the graph.

  2. 2.

    Community Aggregation Phase: Once no further modularity improvement is possible, each community is treated as a supernode, forming a new graph. The process is repeated until the modularity QQ no longer improves significantly.

The Louvain method is highly scalable and effective, making it suitable for large-scale graphs.

B.2 Other Community Detection Algorithms

Several alternative community detection methods exist, each with unique strengths and weaknesses:

  • Spectral Clustering: Based on the eigenvectors of the graph Laplacian, this method identifies kk communities.

    • Advantages: Effective for small-scale graphs and capable of capturing complex community structures.

    • Disadvantages: Computationally expensive for large graphs.

  • Random Walk-Based Methods (e.g., Infomap): These minimize the description length of random walks to identify communities.

    • Advantages: Naturally handle weighted graphs and hierarchical community structures.

    • Disadvantages: High computational resource requirements.

  • Label Propagation: Communities are formed through the iterative propagation and convergence of node labels.

    • Advantages: Simple and fast, well-suited for dynamic graphs.

    • Disadvantages: Susceptible to local optima and instability.

  • Modularity Maximization: A family of algorithms that optimize modularity QQ. The Louvain method is a prominent example.

  • Deep Learning-Based Methods: Algorithms like DeepWalk and Node2Vec learn node embeddings to capture community structures.

    • Advantages: Integrate node attributes and graph topology information.

    • Disadvantages: Require significant computational resources and are sensitive to hyperparameters.

In our framework, the Louvain method is employed for community detection due to its efficiency and suitability for weighted graphs. This method provides a solid foundation for subsequent node community classification and boundary node detection. Future work may explore advanced algorithms tailored to dynamic weighted graphs for more accurate and robust community detection.

Appendix C Batching and Complexity Analysis

C.1 Function Definitions: gg and ff

In the CTWalks framework, two key functions, gg and ff, are defined to model discrete updates and continuous temporal evolution, respectively. Their complementary roles enable precise representation of both instantaneous and time-dependent dynamics within continuous-time dynamic graphs.

1. Instantaneous Update Function gg

The function gg handles the discrete, step-wise state updates at specific nodes along a temporal walk. For each node wiw_{i}, the hidden state hih_{i} is computed as:

hi=g(hi1,A(wi)),h_{i}=g(h^{\prime}_{i-1},A(w_{i})), (24)

where:

  • hi1h^{\prime}_{i-1} is the cumulative hidden state from the previous step.

  • A(wi)A(w_{i}) is the anonymized representation of node wiw_{i}.

  • gg is instantiated as a Gated Recurrent Unit (GRU) cho2014learning , designed to capture both structural and temporal information.

The GRU-based formulation of gg is given by:

zt=σ(Wzht1+Uzxt+bz),z_{t}=\sigma(W_{z}h_{t-1}+U_{z}x_{t}+b_{z}), (25)
rt=σ(Wrht1+Urxt+br),r_{t}=\sigma(W_{r}h_{t-1}+U_{r}x_{t}+b_{r}), (26)
h~t=tanh(Wh(rtht1)+Uhxt+bh),\tilde{h}_{t}=\tanh(W_{h}(r_{t}\odot h_{t-1})+U_{h}x_{t}+b_{h}), (27)
ht=zth~t+(1zt)ht1,h_{t}=z_{t}\odot\tilde{h}_{t}+(1-z_{t})\odot h_{t-1}, (28)

where:

  • ztz_{t}, rtr_{t}, and h~t\tilde{h}_{t} are the update gate, reset gate, and candidate hidden state, respectively.

  • xt=A(wi)x_{t}=A(w_{i}) represents the input node features.

  • σ()\sigma(\cdot) and tanh()\tanh(\cdot) denote the sigmoid and hyperbolic tangent activation functions, respectively.

  • \odot represents element-wise multiplication.

2. Continuous Temporal Evolution Function ff

The function ff models the continuous evolution of the hidden state over time intervals between nodes. Unlike gg, which incorporates node-specific input features, ff focuses solely on temporal dynamics and acts on the output of gg. The cumulative hidden state hih^{\prime}_{i} at step ii is computed by integrating ff over the time interval [ti,ti+1][t_{i},t_{i+1}]:

hi=titi+1f(hi,t)𝑑t,h^{\prime}_{i}=\int_{t_{i}}^{t_{i+1}}f(h_{i},t)\,dt, (29)

where:

  • hih_{i} is the instantaneous hidden state computed by gg,

  • tit_{i} and ti+1t_{i+1} are the timestamps associated with the current and next nodes in the temporal walk,

  • f(h,t)f(h,t) is parameterized as a GRU-like model, similar to gg, but without external inputs xtx_{t}.

The formulation of ff is as follows:

zt=σ(Wzht1+bz),z_{t}=\sigma(W_{z}h_{t-1}+b_{z}), (30)
rt=σ(Wrht1+br),r_{t}=\sigma(W_{r}h_{t-1}+b_{r}), (31)
h~t=tanh(Wh(rtht1)+bh),\tilde{h}_{t}=\tanh(W_{h}(r_{t}\odot h_{t-1})+b_{h}), (32)
ht=zth~t+(1zt)ht1,h_{t}=z_{t}\odot\tilde{h}_{t}+(1-z_{t})\odot h_{t-1}, (33)

where:

  • ztz_{t}, rtr_{t}, and h~t\tilde{h}_{t} are the update gate, reset gate, and candidate hidden state, respectively,

  • WzW_{z}, WrW_{r}, and WhW_{h} are weight matrices,

  • bzb_{z}, brb_{r}, and bhb_{h} are biases,

  • σ()\sigma(\cdot) and tanh()\tanh(\cdot) denote the sigmoid and hyperbolic tangent activation functions.

By removing external inputs xtx_{t}, ff focuses purely on modeling the temporal continuity of the hidden state, allowing it to integrate spatiotemporal dependencies over time intervals.

Together, gg and ff enable CTWalks to seamlessly integrate spatial and temporal dynamics, delivering a robust framework for learning representations on continuous-time dynamic graphs.

C.2 Batching Analysis

Efficiently encoding temporal walks poses a significant challenge due to the irregular timestamps and varying lengths of walks. For CC temporal walks, each consisting of ll steps, solving C×lC\times l independent ordinary differential equations (ODEs) is computationally expensive. To address this, we employ a time reparameterization strategy, unifying the integration intervals of all walks into a standard range [0,1][0,1]. This unification allows batch processing of the walks, enabling significant computational efficiencychen2020neural .

Reparameterization for Unified Time Intervals

Consider the cc-th walk with timestamps [tstartc,tendc][t_{\text{start}}^{c},t_{\text{end}}^{c}], where tstartct_{\text{start}}^{c} is the time of the first node in the walk, and tendct_{\text{end}}^{c} is the time of the last node. We reparameterize the time tt within this interval using a new variable s[0,1]s\in[0,1], defined as:

s=ttstartctendctstartc,thus,t=s(tendctstartc)+tstartc.s=\frac{t-t_{\text{start}}^{c}}{t_{\text{end}}^{c}-t_{\text{start}}^{c}},\quad\text{thus,}\quad t=s\cdot(t_{\text{end}}^{c}-t_{\text{start}}^{c})+t_{\text{start}}^{c}. (34)

Under this transformation, the instantaneous hidden state hth_{t} at time tt is reformulated as h~s\tilde{h}_{s}, such that:

h~s=ht=hs(tendctstartc)+tstartc.\tilde{h}_{s}=h_{t}=h_{s\cdot(t_{\text{end}}^{c}-t_{\text{start}}^{c})+t_{\text{start}}^{c}}. (35)

This allows all walks, regardless of their original time intervals, to operate within a common temporal scale of [0,1][0,1].

Reformulated ODE for Batch Processing

The cumulative hidden state for the cc-th walk is computed by solving an ODE over the interval [tstartc,tendc][t_{\text{start}}^{c},t_{\text{end}}^{c}]. After reparameterization, the ODE is reformulated for h~s\tilde{h}_{s} as follows:

dh~sds=f(ht,t)dtds=f(h~s,s(tendctstartc)+tstartc)(tendctstartc),\frac{d\tilde{h}_{s}}{ds}=f(h_{t},t)\cdot\frac{dt}{ds}=f(\tilde{h}_{s},s\cdot(t_{\text{end}}^{c}-t_{\text{start}}^{c})+t_{\text{start}}^{c})\cdot(t_{\text{end}}^{c}-t_{\text{start}}^{c}), (36)

where dtds=tendctstartc\frac{dt}{ds}=t_{\text{end}}^{c}-t_{\text{start}}^{c}.

The initial and final states are transformed as:

h~0=htstartc,h~1=htendc.\tilde{h}_{0}=h_{t_{\text{start}}^{c}},\quad\tilde{h}_{1}=h_{t_{\text{end}}^{c}}. (37)

The solution for the cumulative hidden state at the end of the walk becomes:

htendc=ODESolve(htstartc,f,tstartc,tendc)=ODESolve(h~0,f~,0,1).h_{t_{\text{end}}^{c}}=\text{ODESolve}(h_{t_{\text{start}}^{c}},f,t_{\text{start}}^{c},t_{\text{end}}^{c})=\text{ODESolve}(\tilde{h}_{0},\tilde{f},0,1). (38)

Here, the reparameterized function f~\tilde{f} ensures that all walks are solved over a common interval, enabling efficient batch computation.

Parallelism Across Walks

While each walk still involves step-wise computations within the reparameterized interval, the unification of time intervals to [0,1][0,1] allows a single solver instance to process multiple walks in parallel. This is achieved by stacking the dynamics of all walks into a joint system:

dh~jointds=[f1(h~1,sΔt1+tstart1)Δt1f2(h~2,sΔt2+tstart2)Δt2fC(h~C,sΔtC+tstartC)ΔtC],\frac{d\tilde{h}_{\text{joint}}}{ds}=\begin{bmatrix}f_{1}(\tilde{h}_{1},s\cdot\Delta t_{1}+t_{\text{start}}^{1})\cdot\Delta t_{1}\\ f_{2}(\tilde{h}_{2},s\cdot\Delta t_{2}+t_{\text{start}}^{2})\cdot\Delta t_{2}\\ \vdots\\ f_{C}(\tilde{h}_{C},s\cdot\Delta t_{C}+t_{\text{start}}^{C})\cdot\Delta t_{C}\end{bmatrix}, (39)

where Δtc=tendctstartc\Delta t_{c}=t_{\text{end}}^{c}-t_{\text{start}}^{c}. The solver processes this joint system, reducing the computational overhead from CC independent solver calls to a single batched solver call.

Batch Processing Efficiency

For walks with l=1l=1, the reparameterization directly enables batch processing since the time interval for all walks is normalized to [0,1][0,1]. However, for walks with l>1l>1, each step within the walk corresponds to a distinct interval, making true parallelism challenging. Solving multiple sub-intervals within each walk requires sequential processing.

To fully enable parallel processing for l>1l>1, further strategies, such as normalizing intermediate step intervals or discretizing the ODE solutions, could be employed. These approaches remain an open area for further optimization.

Handling Large Time Intervals

In real-world scenarios, large time intervals Δtc\Delta t_{c} can cause numerical instability. To mitigate this, we normalize the intervals while preserving relative temporal differences using a logarithmic transformation:

Δtscaled=log10(Δt+1),\Delta t_{\text{scaled}}=\log_{10}(\Delta t+1), (40)

ensuring that smaller intervals retain fine-grained dynamics while larger intervals are compressed to a manageable scale.

This batching mechanism combines time reparameterization and joint ODE solving, significantly improving computational efficiency and enabling scalable representation learning on dynamic graphs.

C.3 Complexity Analysis

We analyze the time complexity of the main components in CTWalks, including community detection, temporal walk sampling, anonymization, continuous evolution, and instantaneous updates. The detailed breakdown is as follows:

Community Detection.

Community detection is performed using a modularity-based algorithm, such as Louvain, which has a time complexity of O(|E|logn)O(|E|\log n), where |E||E| is the number of edges and nn is the number of nodes. This step is executed once as preprocessing.

Temporal Walk Sampling.

In each batch, BCBC walks are generated, where BB is the batch size (number of root nodes) and CC is the number of walks per root node. The sampling process involves ll steps per walk, with each step querying neighbors at a complexity of O(ks)O(k_{s}), where ksk_{s} is the average number of neighbors. The total complexity for temporal walk sampling is O(BCks)O(BCk_{s}).

Walk Anonymization.

For anonymization, each walk requires O(l)O(l) operations to identify unique nodes and assign positional encodings. Across BCBC walks, the total complexity becomes O(BCl)O(BCl).

Continuous Evolution.

Continuous evolution integrates the temporal evolution function over time intervals using an ODE solver, requiring FF function evaluations per step. For BCBC walks with ll steps, the total complexity is O(BClFdh2)O(BClFd_{h}^{2}). Using the batch processing optimization introduced in Section B.1, this is reduced to O(lFdh2)O(lFd_{h}^{2}).

Instantaneous Updates.

The instantaneous updates compute the hidden state at each step using a parameterized function, resulting in a complexity of O(ldh2)O(ld_{h}^{2}).

Subtotal (Sampling and Anonymization).

The combined complexity for sampling and anonymization, including temporal walk sampling and walk anonymization, is:

O(BC(ks+l)).O(BC(k_{s}+l)). (41)

Subtotal (Encoding).

The combined complexity for continuous evolution and instantaneous updates is:

O(lFdh2).O(lFd_{h}^{2}). (42)
Total Complexity.

Combining all components, the total time complexity of CTWalks per epoch is:

O(|E|logn)+O(BC(ks+l))+O(lFdh2).O(|E|\log n)+O(BC(k_{s}+l))+O(lFd_{h}^{2}). (43)

Table 5 summarizes the complexity of each component.

Component Time Complexity
Community Detection O(|E|logn)O(|E|\log n)
Temporal Walk Sampling O(BCks)O(BCk_{s})
Walk Anonymization O(BCl)O(BCl)
Subtotal (Sampling and Anonymization) O(BC(ks+l))O(BC(k_{s}+l))
Continuous Evolution O(lFdh2)O(lFd_{h}^{2})
Instantaneous Updates O(ldh2)O(ld_{h}^{2})
Subtotal (Encoding) O(lFdh2)O(lFd_{h}^{2})
Total O(|E|logn)+O(BC(ks+l))+O(lFdh2)O(|E|\log n)+O(BC(k_{s}+l))+O(lFd_{h}^{2})
Table 5: Time Complexity Analysis of CTWalks

Appendix D Additional Experimental Details

D.1 Dataset Details and Preprocessing

We evaluate our method on five real-world datasets across diverse domains. The details are as follows:

Data Preprocessing: The following steps ensure consistency and facilitate evaluation:

  1. 1.

    All temporal edges are sorted chronologically to maintain temporal consistency.

  2. 2.

    Edges are split into training (70%), validation (15%), and testing (15%) sets.

  3. 3.

    Negative sampling is performed to generate edges absent in the original graph, ensuring a balanced dataset.

D.2 Baselines and Hyperparameter Tuning

Baseline Methods:

To evaluate the performance of our proposed method, we compare it with six state-of-the-art baselines specifically designed for continuous-time dynamic graphs (CTDGs):

  • CTDNE: CTDNE extends static network embedding techniques to CTDGs by leveraging temporal random walks and a skip-gram model to learn node representations. This method captures the temporal dependency of interactions.666CTDNE repository: https://github.com/LogicJake/CTDNE

  • JODIE: JODIE employs two mutually influenced recurrent neural networks (RNNs) to update the latent states of interacting nodes. The model is capable of predicting future embedding trajectories based on past interactions.777JODIE repository: https://github.com/claws-lab/jodie

  • DyRep: DyRep integrates sequence modeling with an attentive message-passing mechanism, incorporating 2-hop temporal neighborhood information to produce time-aware embeddings. The loss function is built upon temporal point processes.888DyRep repository: https://github.com/uoguelph-mlrg/LDG

  • TGAT: TGAT applies temporal encoding via random Fourier time encodings and attentively aggregates temporal neighborhood information to generate embeddings. The model supports multi-hop temporal message aggregation.999TGAT repository: https://github.com/StatsDLMathsRecomSys/Inductive-representation-learning-on-temporal-graphs

  • TGN: TGN proposes a memory-based message-passing framework, which updates node memories dynamically while combining key designs from JODIE and TGAT to enhance its capability to model temporal interactions.101010TGN repository: https://github.com/twitter-research/tgn

  • CAWs: CAWs encode and aggregate anonymous temporal walks using RNNs, followed by pooling mechanisms to generate temporal node embeddings. Its community-aware design enables it to capture complex temporal dynamics.111111CAWs repository: https://github.com/snap-stanford/CAW

Hyperparameter Tuning:

To ensure a fair comparison, we carefully adapt each baseline to our unified evaluation framework. Key hyperparameters are tuned through grid search to optimize model performance. Below, we summarize the tuning strategies for each method:

  • TGAT: Neighborhood Sampling Degree is tuned from {8, 16, 32, 64}; Model Layers are searched over {1, 2, 3}; Attention Heads (default product attention) are tuned from {2, 4, 6}; and Embedding Dimensions are tuned from {16, 32, 64, 100}.

  • JODIE: Embedding Dimensions are tuned from {32, 64, 128, 256}.

  • DyRep: Neighborhood Sampling Degree is tuned from {10, 16, 32, 64}, and Attention Layers are tuned over {1, 2, 3}.

  • TGN: Attention Heads are tuned from {2, 4, 8}; Embedding Dimensions for nodes, time, and messages are tuned over {16, 32, 64, 100}; and Memory Dimensions are searched over {4, 16, 32, 64, 172}.

  • CAWs: Walk Length is tuned over {1, 2, 3}; Number of Walks is tuned from {16, 32, 64, 128}; and Time Decay Factor is tuned over {10-7, 10-6, 10-5, 10-4}. Walk pooling is set to summation for consistency and simplicity.

  • CTDNE: Skip-gram Window Size is tuned from {3, 5, 10}.

Method Hyperparameter Search Range Optimal Value
TGAT Neighborhood Sampling Degree {8, 16, 32, 64} 32
Model Layers {1, 2, 3} 2
Attention Heads {2, 4, 6} 4
Embedding Dimensions {32, 64, 128, 256} 256
JODIE Embedding Dimensions {32, 64, 128, 256} 256
DyRep Neighborhood Sampling Degree {10, 16, 32, 64} 16
Attention Layers {1, 2, 3} 2
TGN Attention Heads {2, 4, 8} 4
Embedding Dimensions {32, 64, 128, 256} 256
Memory Dimensions {4, 16, 32, 64, 172} 64
CAWs Walk Length {1, 2, 3} 3
Number of Walks {16, 32, 64} 64
Time Decay Factor {10-7, 10-6, 10-5, 10-4} 10-6
CTDNE Skip-gram Window Size {3, 5, 10} 5
Table 6: Hyperparameter search ranges and optimal values for baseline methods.
Implementation Details:

All baselines are implemented using publicly available codebases. We ensure consistency in training and evaluation settings, including dataset splits and metrics, to enable fair comparisons. For additional implementation details, such as the specific training configurations and runtime environments, refer to Appendix C.3.

D.3 Implementation Details

Code Availability:

Our implementation of CTWalks is publicly available at https://github.com/leonyuhe/CTWalks. The repository includes detailed instructions for dataset preparation, model training, and hyperparameter tuning.

General Training Settings:

CTWalks is trained across all datasets using the following settings:

  • Optimizer: Adam with a learning rate of 10410^{-4} and batch size of 32.

  • Early Stopping: Training stops if validation performance does not improve for three consecutive epochs.

  • Epoch Limit: Maximum 50 epochs.

Hyperparameter Tuning:

To ensure optimal performance, key hyperparameters controlling walk sampling are tuned via grid search. The following summarizes the hyperparameter configurations for all datasets:

  • Walk Length (ll): {1, 2, 3}

  • Number of Walks per Node (CC): {16, 32, 64}

  • ODE Solver: Fixed-step Runge-Kutta 3/8 method with a step size of 0.125.

Evaluation Tasks:

  • Transductive: Models are trained on all edges up to a specific time point and tested on future edges.

  • Inductive: Evaluates the ability to predict links involving unseen nodes. Two scenarios are considered:

    1. 1.

      New-Old: Interactions between unseen and observed nodes.

    2. 2.

      New-New: Interactions exclusively between unseen nodes.

Hardware Configuration: Experiments are conducted on an Ubuntu Linux server equipped with four NVIDIA GeForce RTX 3090 GPUs, two Intel(R) Xeon(R) Silver 4210 CPUs (2.20GHz), and 252GB of RAM. Each experiment is repeated five times, and the average results are reported.

Evaluation Metrics:

  • AUC (Area Under the ROC Curve): Measures the quality of predictions.

  • AP (Average Precision): Evaluates the precision-recall relationship, detail in Table 7.

Task Methods UCI MOOC Enron Taobao Wikipedia
Inductive new v.s. new DyRep 64.55 ±\pm 4.22 74.93 ±\pm 1.77 73.41 ±\pm 0.41 88.33 ±\pm 1.15 67.89 ±\pm 1.21
TGAT 77.20 ±\pm 1.32 71.15 ±\pm 1.27 64.11 ±\pm 1.69 77.45 ±\pm 1.45 91.32 ±\pm 0.89
TGN 80.28 ±\pm 1.39 78.31 ±\pm 1.77 81.78 ±\pm 0.02 85.34 ±\pm 0.41 64.56 ±\pm 4.33
CTDNE 63.75 ±\pm 0.67 72.91 ±\pm 0.27 67.14 ±\pm 1.15 78.77 ±\pm 0.08 63.09 ±\pm 1.05
JODIE 71.45 ±\pm 0.49 76.08 ±\pm 4.00 71.66 ±\pm 0.81 82.18 ±\pm 2.04 73.54 ±\pm 0.61
CAWs 88.78 ±\pm 0.82 90.89 ±\pm 0.33 96.95 ±\pm 1.09 86.99 ±\pm 2.80 96.91 ±\pm 1.41
CTWalks 91.26 ±\pm 0.08 93.72 ±\pm 0.52 95.91 ±\pm 0.62 97.75 ±\pm 0.38 94.94 ±\pm 0.24
new v.s. old DyRep 94.11 ±\pm 2.91 89.07 ±\pm 1.29 95.18 ±\pm 0.55 90.11 ±\pm 1.42 77.45 ±\pm 1.27
TGAT 77.10 ±\pm 0.99 68.40 ±\pm 1.62 62.11 ±\pm 1.24 65.93 ±\pm 0.93 96.12 ±\pm 1.10
TGN 87.78 ±\pm 1.01 73.02 ±\pm 1.14 78.56 ±\pm 0.42 89.09 ±\pm 0.30 90.96 ±\pm 0.89
CTDNE 71.89 ±\pm 0.70 70.10 ±\pm 0.83 65.44 ±\pm 0.38 69.02 ±\pm 0.05 89.06 ±\pm 1.02
JODIE 72.55 ±\pm 0.35 89.00 ±\pm 1.81 86.54 ±\pm 1.29 81.23 ±\pm 2.03 75.66 ±\pm 0.20
CAWs 92.10 ±\pm 0.16 91.14 ±\pm 0.07 94.10 ±\pm 1.22 89.56 ±\pm 1.10 96.99 ±\pm 0.81
CTWalks 96.35 ±\pm 0.16 93.81 ±\pm 0.18 93.17 ±\pm 0.19 94.78 ±\pm 0.41 96.01 ±\pm 0.22
Transductive DyRep 96.11 ±\pm 1.42 91.22 ±\pm 0.22 97.41 ±\pm 0.75 83.79 ±\pm 1.07 78.23 ±\pm 1.70
TGAT 78.11 ±\pm 0.99 72.95 ±\pm 1.43 61.32 ±\pm 1.27 63.76 ±\pm 1.38 96.89 ±\pm 1.11
TGN 84.01 ±\pm 1.16 74.21 ±\pm 0.03 69.01 ±\pm 1.14 88.21 ±\pm 0.04 95.93 ±\pm 0.23
CTDNE 77.55 ±\pm 0.86 74.03 ±\pm 0.29 90.08 ±\pm 0.68 65.78 ±\pm 0.02 93.11 ±\pm 0.26
JODIE 75.29 ±\pm 0.49 91.40 ±\pm 0.78 61.11 ±\pm 0.62 82.77 ±\pm 0.29 90.44 ±\pm 0.21
CAWs 95.97 ±\pm 0.05 95.11 ±\pm 0.27 92.32 ±\pm 0.51 86.88 ±\pm 0.34 99.21 ±\pm 0.26
CTWalks 98.65 ±\pm 0.08 95.88 ±\pm 0.07 93.35 ±\pm 0.54 92.98 ±\pm 0.13 97.88 ±\pm 0.23
Table 7: Transductive and inductive link prediction performances w.r.t. AP. We use bold font and to highlight the best and second-best performances.

Appendix E Experiment on Purely Static Graphs

To further validate the benefits of community walks in enhancing network embedding learning, we conducted experiments on purely static graphs, focusing on the community structure without considering temporal information. Specifically, we applied our method, CTWalks, which incorporates community walks to generate node sequences, and subsequently used the word2Vec mikolov2013efficient framework to generate embeddings from these sequences. By deliberately excluding temporal data and leveraging word2Vec, our approach aims to highlight the effectiveness of CTWalks in capturing community structures and achieving competitive performance on static graphs compared to classical graph embedding algorithms.

Experimental Setup

We evaluated CTWalks (CTW) against six widely recognized baseline methods: node2vec (N2V) grover2016node2vec , DeepWalk (DW) perozzi2014deepwalk , LINE tang2015line , Struc2Vec (S2V) ribeiro2017struc2vec , GraphSage (GS) hamilton2017graphsage , and M-NMF (MN) wang2017community . To ensure a fair comparison, the implementations of these methods were obtained from publicly available repositories. Specifically, Node2vec, DeepWalk, and M-NMF were implemented using the Karate Club library121212https://github.com/benedekrozemberczki/karateclub. The implementation of LINE was sourced from its official repository131313https://github.com/tangjianpku/LINE. Similarly, Struc2Vec141414https://github.com/leoribeiro/struc2vec and GraphSage151515https://github.com/williamleif/GraphSAGE were obtained from their respective repositories. This approach ensured the use of standardized and well-maintained implementations for each baseline.

The experiments were conducted on six datasets from the Network Repository nr2015 and SNAP leskovec2016snap , spanning diverse domains such as social, biological, and ecological networks. All graphs were treated as static by disregarding temporal information. Each graph was split into 70% edges for training and 30% for testing. Node embeddings were learned on the training graph, while the testing edges were excluded to ensure unbiased evaluation.

Table 8: Dataset Statistics
Dataset |V||V| |E||E| davgd_{\text{avg}} KavgK_{\text{avg}} TfracT_{\text{frac}}
fb-pages-food 620 2091 6.7 0.33 0.22
ego-Facebook 4039 88234 43.69 0.60 0.26
soc-hamsterster 2000 16097 16.10 0.5375 0.2314
aves-weaver-social 117 304 5.20 0.6924 0.5747
bio-DM-LC 483 997 4.13 0.1047 0.0551
bio-WormNet-v3 2274 78328 68.89 0.8390 0.7211

Notation: |V||V|: Number of nodes; |E||E|: Number of edges; davgd_{\text{avg}}: Average node degree; KavgK_{\text{avg}}: Average clustering coefficient; TfracT_{\text{frac}}: Fraction of closed triangles.

The hyperparameters for CTW, DW, and N2V were set to Lw=80L_{w}=80 (walk length), R=10R=10 (number of walks per node), and d=128d=128 (embedding dimension). Edge embeddings were computed using the Hadamard product of node embeddings. The performance was evaluated using logistic regression and two metrics: AUC (Area Under the Curve) and Average Precision (AP). Each experiment was repeated ten times with different random seeds, and the average results are reported.

E.1 Results and Analysis

Link Prediction.

The results of the link prediction task are presented in Table 10 (AUC) and Table 10 (AP). CTW consistently outperformed baseline methods across all datasets. Notably, CTW demonstrated significant improvements on datasets with strong community structures, such as soc-hamsterster and fb-pages-food. For instance:

  • On bio-WormNet-v3, CTW achieved an AUC of 0.9915, outperforming the second-best method, M-NMF, by 3.2%.

  • On ego-Facebook, CTW achieved an AP of 0.9891, showcasing robust performance in large-scale social networks.

  • On datasets with smaller community structures, such as aves-weaver-social, CTW achieved a notable improvement, with AUC gains of up to 5.1% over the best baseline.

Dataset CTW N2V DW LINE S2V GS MN
fb-pages-food 0.9319 0.8764 0.8632 0.8121 0.8573 0.8902 0.8964
ego-Facebook 0.9885 0.9452 0.9203 0.8947 0.9345 0.9541 0.9721
soc-hamsterster 0.9122 0.8610 0.8423 0.8045 0.8534 0.8812 0.8734
aves-weaver-social 0.9114 0.8732 0.8517 0.8245 0.8651 0.8832 0.8903
bio-DM-LC 0.9337 0.8810 0.8743 0.8547 0.8723 0.8901 0.9121
bio-WormNet-v3 0.9915 0.9543 0.9402 0.9134 0.9624 0.9742 0.9603
Table 9: Average AUC Scores for Link Prediction on Static Graphs. Bold numbers represent the best results.
Dataset CTW N2V DW LINE S2V GS MN
fb-pages-food 0.9421 0.8920 0.8814 0.8289 0.8712 0.9045 0.9102
ego-Facebook 0.9891 0.9534 0.9351 0.9107 0.9443 0.9642 0.9813
soc-hamsterster 0.9215 0.8735 0.8602 0.8224 0.8698 0.8914 0.8803
aves-weaver-social 0.9217 0.8801 0.8655 0.8371 0.8764 0.8942 0.9015
bio-DM-LC 0.9440 0.8954 0.8883 0.8651 0.8850 0.9025 0.9234
bio-WormNet-v3 0.9961 0.9602 0.9456 0.9175 0.9723 0.9815 0.9673
Table 10: Average AP Scores for Link Prediction on Static Graphs. Bold numbers represent the best results.

Appendix F Theoretical Analysis

We interpret CTWalks from the perspective of matrix factorization. We demonstrate that the two-layer random walk process in CTWalks corresponds to a novel matrix factorization form, integrating intra- and inter-community dynamics through a hierarchical transition mechanism. These analyses highlight CTWalks’ theoretical advantages in encoding community-aware structures, providing insights into its effectiveness in network representation learning.

Analysis 2: Matrix Factorization Perspective of CTWalks

The Skip-Gram with negative sampling (SGNS) framework mikolov2013distributed in network embedding methods such as DeepWalk perozzi2014deepwalk and node2vec grover2016node2vec has been shown to implicitly factorize a pointwise mutual information (PMI) matrix levy2014neural ; Qiu:2018 . CTWalks extends this perspective by incorporating a hierarchical, two-layer random walk that explicitly encodes both intra-community and inter-community transitions, resulting in a novel matrix factorization form.

CTWalks Transition Matrices

In CTWalk, the probability of transitioning between two nodes during the random walk is governed by two distinct components: intra-community transitions (within communities) and inter-community transitions (across communities). To model this, we introduce two matrices: 1) a block-diagonal matrix MI|V|×|V|M_{I}\in\mathbb{R}^{|V|\times|V|} that represents transitions within individual communities, and 2) an extended matrix MC|V|×|V|M_{C}\in\mathbb{R}^{|V|\times|V|} that captures transitions between communities via bridging nodes.

Intra-community transition matrix (MIM_{I}).

The matrix MIM_{I} represents the normalized intra-community transitions, structured as a block-diagonal matrix. Each block Di1AiD_{i}^{-1}A_{i} corresponds to a community CiC_{i}, where:

  • Ai|Vi|×|Vi|A_{i}\in\mathbb{R}^{|V_{i}|\times|V_{i}|} is the adjacency matrix of the subgraph Gi=(Vi,Ei)G_{i}=(V_{i},E_{i}), induced by the set of nodes ViV_{i} and edges EiE_{i} within community CiC_{i}.

  • Di1D_{i}^{-1} is the inverse degree matrix of AiA_{i}, ensuring row normalization of transition probabilities within each community.

The resulting block-diagonal structure of MIM_{I} is expressed as:

MI=(D11A1000D21A2000Dk1Ak).M_{I}=\begin{pmatrix}D_{1}^{-1}A_{1}&0&\cdots&0\\ 0&D_{2}^{-1}A_{2}&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&D_{k}^{-1}A_{k}\end{pmatrix}. (44)

This formulation confines random walks within individual communities C1,C2,,CkC_{1},C_{2},\dots,C_{k}, effectively capturing the dense intra-community relationships while isolating transitions between distinct communities.

Inter-community transition matrix (MCM_{C}).

The matrix MCM_{C} captures transitions between inter-community bridging nodes. To construct MCM_{C}, we first normalize the adjacency matrix AcA_{c} of Gc=(Vc,Ec)G_{c}=(V_{c},E_{c}), where VcV_{c} is the set of bridging nodes. The normalized matrix is defined as:

Ac(n)=Dc1Ac,A_{c}^{(n)}=D_{c}^{-1}A_{c}, (45)

where DcD_{c} is the degree matrix of AcA_{c}, ensuring that the rows of Ac(n)A_{c}^{(n)} sum to one. Here, Ac(n)|Vc|×|Vc|A_{c}^{(n)}\in\mathbb{R}^{|V_{c}|\times|V_{c}|} represents the normalized transition probabilities restricted to the bridging nodes.

To integrate MCM_{C} with MIM_{I}, we align the node order in MCM_{C} to match the node ordering used in MIM_{I}. Specifically, the rows and columns of MCM_{C} are arranged such that:

  • Nodes within the same community are grouped together, maintaining the block-diagonal structure of MIM_{I},

  • Rows and columns corresponding to non-bridging nodes wVcw\notin V_{c} are filled with zeros.

This alignment ensures that both MCM_{C} and MIM_{I} are compatible for subsequent matrix operations, such as multi-step transition summation. The resulting MCM_{C} can be visualized as follows:

MC=(0000000Ac(n)[v2,v2]00Ac(n)[v2,v5]00000000000000Ac(n)[v5,v2]00Ac(n)[v5,v5]0000000).M_{C}=\begin{pmatrix}0&0&0&0&0&\cdots&0\\ 0&A_{c}^{(n)}[v_{2},v_{2}]&0&0&A_{c}^{(n)}[v_{2},v_{5}]&\cdots&0\\ 0&0&0&0&0&\cdots&0\\ 0&0&0&0&0&\cdots&0\\ 0&A_{c}^{(n)}[v_{5},v_{2}]&0&0&A_{c}^{(n)}[v_{5},v_{5}]&\cdots&0\\ \vdots&\vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\ 0&0&0&0&0&\cdots&0\end{pmatrix}. (46)

This design ensures that MCM_{C} and MIM_{I} remain consistent in terms of node alignment. By isolating inter-community transitions and aligning them to the global node order of MIM_{I}, MCM_{C} facilitates a seamless integration of intra-community and inter-community dynamics in the CTWalks framework.

Matrix Factorization of CTWalks

Lemma 2: Let MCM_{C} and MIM_{I} represent the inter-community and intra-community transition matrices, respectively. The embeddings learned by CTWalks using Skip-Gram with negative sampling correspond to a low-rank factorization of the following shifted Pointwise Mutual Information (PMI) matrix:

log(vol(G)(1Tr=1T(MCr+MIr))D1)logk,\log\left(\text{vol}(G)\cdot\left(\frac{1}{T}\sum_{r=1}^{T}\left(M_{C}^{r}+M_{I}^{r}\right)\right)D^{-1}\right)-\log k, (47)

where:

  • vol(G)\text{vol}(G) = ijAi,j\sum_{i}\sum_{j}A_{i,j} is the volume of graph GG,

  • TT is the context window size,

  • MCrM_{C}^{r} and MIrM_{I}^{r}: The rr-step transition probability matrices for MCM_{C} (inter-community transitions) and MIM_{I} (intra-community transitions), respectively,

  • D1D^{-1} is the inverse degree matrix of GG,

  • kk is the negative sampling parameter.

Proof. We begin with the Skip-Gram objective, which seeks to maximize the probability of observing a context node cc given a target node ww. Under the CTWalks framework, random walks are independently applied to the inter-community transition matrix MCM_{C} and the intra-community transition matrix MIM_{I}.

Step 1: Joint Probability.

The joint probability P(w,c)P(w,c) of observing a node ww with context cc is defined by the stationary distribution π(w)\pi(w) and the TT-step transition probabilities:

P(w,c)π(w)1Tr=1T(MCr+MIr)[w,c],P(w,c)\approx\pi(w)\cdot\frac{1}{T}\sum_{r=1}^{T}\left(M_{C}^{r}+M_{I}^{r}\right)[w,c], (48)

where π(w)\pi(w) is the stationary probability of node ww, given by:

π(w)=d(w)vol(G).\pi(w)=\frac{d(w)}{\text{vol}(G)}. (49)
Step 2: Marginal Probabilities.

The marginal probability P(w)P(w) of node ww under the stationary distribution is given by:

P(w)=π(w)=d(w)vol(G).P(w)=\pi(w)=\frac{d(w)}{\text{vol}(G)}. (50)

Similarly, the marginal probability P(c)P(c) of context cc is:

P(c)=π(c)=d(c)vol(G).P(c)=\pi(c)=\frac{d(c)}{\text{vol}(G)}. (51)
Step 3: Pointwise Mutual Information (PMI).

The PMI between target node ww and context node cc is defined as:

PMI(w,c)=logP(w,c)P(w)P(c).PMI(w,c)=\log\frac{P(w,c)}{P(w)P(c)}. (52)

Substituting P(w,c)P(w,c), P(w)P(w), and P(c)P(c) into Eq.(52), we obtain:

PMI(w,c)logπ(w)1Tr=1T(MCr+MIr)[w,c]π(w)π(c).PMI(w,c)\approx\log\frac{\pi(w)\cdot\frac{1}{T}\sum_{r=1}^{T}\left(M_{C}^{r}+M_{I}^{r}\right)[w,c]}{\pi(w)\pi(c)}. (53)

Simplifying:

PMI(w,c)log(1Tr=1T(MCr+MIr)[w,c]vol(G)d(c)).PMI(w,c)\approx\log\left(\frac{1}{T}\sum_{r=1}^{T}\left(M_{C}^{r}+M_{I}^{r}\right)[w,c]\cdot\frac{\text{vol}(G)}{d(c)}\right). (54)
Step 4: Matrix Representation.

We can represent the PMI matrix in the following matrix form:

log(vol(G)(1Tr=1T(MCr+MIr))D1).\log\left(\text{vol}(G)\cdot\left(\frac{1}{T}\sum_{r=1}^{T}\left(M_{C}^{r}+M_{I}^{r}\right)\right)D^{-1}\right). (55)
Step 5: SGNS Objective.

The Skip-Gram with negative sampling introduces a shift logk-\log k, where kk is the number of negative samples. Therefore, the final form of the factorization becomes:

log(vol(G)(1Tr=1T(MCr+MIr)D1))logk.\log\left(\text{vol}(G)\cdot\left(\frac{1}{T}\sum_{r=1}^{T}\left(M_{C}^{r}+M_{I}^{r}\right)D^{-1}\right)\right)-\log k. (56)

Lemma 2 demonstrates that the embeddings learned by CTWalks correspond to the low-rank factorization of the combined TT-step transition probabilities derived from MCM_{C} and MIM_{I}. This factorization effectively integrates inter-community and intra-community relationships, capturing both local and global relationships without requiring additional parameters.

Significance of the Factorization.

By combining MCM_{C} and MIM_{I} in a structured manner, CTWalks enables an explicit encoding of both local community structures and global inter-community connectivity. This theoretical insight justifies the empirical superiority of CTWalks in capturing hierarchical and multi-scale graph properties in tasks such as link prediction and node classification.

Appendix G Detailed Explanation of Node Anonymization

Motivation for Anonymization.

The primary goal of anonymization is to achieve generalization across nodes, enabling the model to effectively operate on unseen nodes or structures during inference. By replacing explicit node identities with position-based representations:

  • The model avoids overfitting to specific node features.

  • Community labels enrich the anonymized representation, embedding mesoscopic graph structures.

  • Directionality-aware representations ensure that roles of nodes as sources or targets are preserved.

Process Overview.

Anonymization in CTWalks operates on temporal walks generated from a given interaction ((u,v),ti)((u,v),t_{i}). For each interaction, two sets of walks are constructed:

  • 𝒮u\mathcal{S}_{u}: Walks originating from the source node uu.

  • 𝒮v\mathcal{S}_{v}: Walks originating from the target node vv.

A unique set of nodes VinteractionV_{\text{interaction}} is obtained from 𝒮u𝒮v\mathcal{S}_{u}\cup\mathcal{S}_{v}. Each node wVinteractionw\in V_{\text{interaction}} is anonymized by aggregating its positional occurrences across all walks in the respective sets, incorporating directionality and community context.

Anonymization Steps.

1. For each node ww in VinteractionV_{\text{interaction}}, perform the following steps:

  • Position Vector Computation for 𝒮u\mathcal{S}_{u}:

    • For each walk W𝒮uW\in\mathcal{S}_{u}, compute ww’s positional occurrence vector:

      A(w;W)=[𝕀(w=W[1]),𝕀(w=W[2]),,𝕀(w=W[l])],A(w;W)=[\mathbb{I}(w=W[1]),\mathbb{I}(w=W[2]),\dots,\mathbb{I}(w=W[l])], (57)

      where ll is the walk length and 𝕀\mathbb{I} is an indicator function.

    • Aggregate ww’s positional vectors across all walks in 𝒮u\mathcal{S}_{u}:

      A(w;𝒮u)=W𝒮uA(w;W).A(w;\mathcal{S}_{u})=\sum_{W\in\mathcal{S}_{u}}A(w;W). (58)
  • Community Label Integration for 𝒮u\mathcal{S}_{u}:

    A(w;𝒮u,Cu)=[A(w;𝒮u)||Cu],A(w;\mathcal{S}_{u},C_{u})=[A(w;\mathcal{S}_{u})||C_{u}], (59)

    where CuC_{u} is the community label of the source node uu.

  • Repeat for 𝒮v\mathcal{S}_{v}: Similarly, compute:

    A(w;𝒮v,Cv)=[A(w;𝒮v)||Cv],A(w;\mathcal{S}_{v},C_{v})=[A(w;\mathcal{S}_{v})||C_{v}], (60)

    where CvC_{v} is the community label of the target node vv.

  • Combine Source and Target Representations: Incorporate directionality by concatenating the source and target representations:

    A(w;𝒮u,𝒮v,Cu,Cv)=[A(w;𝒮u,Cu)||A(w;𝒮v,Cv)].A(w;\mathcal{S}_{u},\mathcal{S}_{v},C_{u},C_{v})=[A(w;\mathcal{S}_{u},C_{u})||A(w;\mathcal{S}_{v},C_{v})]. (61)

2. Store the final anonymized representation A(w;𝒮u,𝒮v,Cu,Cv)A(w;\mathcal{S}_{u},\mathcal{S}_{v},C_{u},C_{v}) for all wVinteractionw\in V_{\text{interaction}}.

Anonymized Walk Construction.

After anonymizing individual nodes, the anonymized temporal walk is constructed. For a single walk W={w1,w2,,wl}W=\{w_{1},w_{2},\dots,w_{l}\} with timestamps t1<t2<<tlt_{1}<t_{2}<\dots<t_{l}, the anonymized walk is defined as:

Wanon={(A(w1),t1),(A(w2),t2),,(A(wl),tl)},W_{\text{anon}}=\{(A(w_{1}),t_{1}),(A(w_{2}),t_{2}),\dots,(A(w_{l}),t_{l})\}, (62)

where A(wi)A(w_{i}) is the direction- and community-aware anonymized representation of wiw_{i}.

Discussion.

This anonymization process achieves the following:

  • Generalization: By removing raw node identities and replacing them with position-based representations, the process ensures generalization across different nodes and graph structures.

  • Directionality: Separate representations for 𝒮u\mathcal{S}_{u} and 𝒮v\mathcal{S}_{v} preserve the source and target node roles, capturing directional interactions.

  • Community Awareness: Incorporating community labels CuC_{u} and CvC_{v} enhances context-awareness, allowing the model to leverage mesoscopic structural information.

This integration of positional, directional, and community-based information establishes a strong foundation for robust representation learning in CTWalks. The anonymization process is detailed in Algorithm 3.

Algorithm 3 Node Anonymization in CTWalks for a Given Interaction

Input: Temporal interaction ((u,v),ti)((u,v),t_{i}), walk sets 𝒮u\mathcal{S}_{u} and 𝒮v\mathcal{S}_{v}, community labels CuC_{u} and CvC_{v}.
Output: Anonymized representations {A(w;𝒮u,𝒮v,Cu,Cv)wVinteraction}\{A(w;\mathcal{S}_{u},\mathcal{S}_{v},C_{u},C_{v})\mid w\in V_{\text{interaction}}\}, where VinteractionV_{\text{interaction}} is the set of unique nodes appearing in 𝒮u\mathcal{S}_{u} and 𝒮v\mathcal{S}_{v}.

1:  Initialize Vinteractionunique nodes in 𝒮u𝒮vV_{\text{interaction}}\leftarrow\text{unique nodes in }\mathcal{S}_{u}\cup\mathcal{S}_{v}.
2:  Initialize AnonymizedList\text{AnonymizedList}\leftarrow\emptyset.
3:  for each node wVinteractionw\in V_{\text{interaction}} do
4:     Initialize A(w;𝒮u)𝟎A(w;\mathcal{S}_{u})\leftarrow\mathbf{0} (zero vector of length equal to the walk length).
5:     for each walk W𝒮uW\in\mathcal{S}_{u} do
6:        Compute position vector A(w;W)=[𝕀(w=W[1]),,𝕀(w=W[l])]A(w;W)=[\mathbb{I}(w=W[1]),\dots,\mathbb{I}(w=W[l])].
7:        Aggregate position vectors: A(w;𝒮u)A(w;𝒮u)+A(w;W)A(w;\mathcal{S}_{u})\leftarrow A(w;\mathcal{S}_{u})+A(w;W).
8:     end for
9:     Append community label: A(w;𝒮u,Cu)[A(w;𝒮u)||Cu]A(w;\mathcal{S}_{u},C_{u})\leftarrow[A(w;\mathcal{S}_{u})||C_{u}].
10:     Initialize A(w;𝒮v)𝟎A(w;\mathcal{S}_{v})\leftarrow\mathbf{0} (zero vector of length equal to the walk length).
11:     for each walk W𝒮vW\in\mathcal{S}_{v} do
12:        Compute position vector A(w;W)=[𝕀(w=W[1]),,𝕀(w=W[l])]A(w;W)=[\mathbb{I}(w=W[1]),\dots,\mathbb{I}(w=W[l])].
13:        Aggregate position vectors: A(w;𝒮v)A(w;𝒮v)+A(w;W)A(w;\mathcal{S}_{v})\leftarrow A(w;\mathcal{S}_{v})+A(w;W).
14:     end for
15:     Append community label: A(w;𝒮v,Cv)[A(w;𝒮v)||Cv]A(w;\mathcal{S}_{v},C_{v})\leftarrow[A(w;\mathcal{S}_{v})||C_{v}].
16:     Concatenate direction-aware representations:
A(w;𝒮u,𝒮v,Cu,Cv)[A(w;𝒮u,Cu)||A(w;𝒮v,Cv)].A(w;\mathcal{S}_{u},\mathcal{S}_{v},C_{u},C_{v})\leftarrow[A(w;\mathcal{S}_{u},C_{u})||A(w;\mathcal{S}_{v},C_{v})]. (63)
17:     Add A(w;𝒮u,𝒮v,Cu,Cv)A(w;\mathcal{S}_{u},\mathcal{S}_{v},C_{u},C_{v}) to AnonymizedList.
18:  end for
19:  Return AnonymizedList.

Appendix H Comparison and Advantages of Continuous-Time Approaches in CTWalks

Continuous-time models have gained significant traction in recent years, offering an alternative to traditional discrete-time approaches for modeling irregularly sampled time series and graph data. Below, we explore different continuous-time methodologies and position CTWalks within this landscape, demonstrating its unique contributions and advantages.

Refer to caption
Figure 4: Comparison of temporal dynamics modeling approaches. Standard RNN maintains constant hidden states; RNN-Decay introduces exponential decay; Neural ODE models smooth continuous trajectories; ODE-RNN combines discrete updates with continuous dynamics.

Overview of Continuous-Time Approaches

Standard RNNs. Recurrent Neural Networks (RNNs) excel at modeling high-dimensional, regularly sampled time series such as text and speech. However, they struggle with irregular sampling, as they inherently assume fixed time intervals between observations. This limitation necessitates preprocessing steps such as imputation or aggregation, which can lead to loss of critical temporal information lipton2016critical ; che2018recurrent .

RNN-Decay. To address the irregularity issue, RNN-Decay introduces exponentially decaying hidden states between observations. While this approach maintains some temporal dynamics, it assumes a simplistic decay function, which limits its ability to capture complex temporal dependencies mei2017neural .

Neural ODEs. Neural Ordinary Differential Equations (ODEs) chen2018neural define hidden state dynamics as continuous functions governed by ODEs, offering a principled way to model time as a continuous variable. Neural ODEs eliminate the need for fixed time steps, making them well-suited for irregularly sampled data. However, their reliance on purely continuous trajectories can oversimplify scenarios involving abrupt changes or discrete transitions.

ODE-RNNs. ODE-RNNs rubanova2019latent extend Neural ODEs by combining continuous dynamics with discrete updates. Between observations, hidden states evolve via ODE solvers, while at observation points, updates are applied using neural networks. This hybrid approach captures both continuous and discrete aspects of temporal data, making it particularly effective for sparse time series.

Figure 4 illustrates continuous-time approaches introduce enhancements for dynamic graphs. Standard RNNs fail to model irregular intervals effectively, while RNN-Decay offers limited improvements. Neural ODEs provide continuous trajectories but lack discrete adaptability. ODE-RNNs bridge this gap, and CTWalks extends this framework further by embedding community context into temporal dynamics. Building on the strengths of ODE-RNNs, CTWalks incorporates continuous-time dynamics into graph representation learning. Unlike prior methods, CTWalks integrates a community-aware framework with ODE-driven sampling, offering the following advantages:

  • Preservation of Temporal Dynamics: By employing ODE solvers to model temporal transitions, CTWalks accurately captures both intra- and inter-community dynamics without requiring imputation or aggregation. This aligns with the principles of Neural ODEs but extends their applicability to dynamic graphs.

  • Hybrid Continuous-Discrete Framework: Similar to ODE-RNNs, CTWalks combines continuous-time evolution with discrete updates. This enables the model to adapt to abrupt changes in graph structure, such as the formation or dissolution of communities.

  • Community-Aware Sampling: The integration of community-aware mechanisms ensures that temporal walks respect community boundaries while capturing cross-community interactions. This is a key distinction from traditional ODE-based methods, which lack structural awareness.

By embedding continuous-time dynamics into a community-aware framework, CTWalks bridges temporal modeling with structural awareness, enabling more sophisticated applications in dynamic graph representation learning.