Community-Aware Temporal Walks: Parameter-Free Representation Learning on Continuous-Time Dynamic Graphs
Abstract
Dynamic graph representation learning plays a crucial role in understanding evolving behaviors. However, existing methods often struggle with flexibility, adaptability, and the preservation of temporal and structural dynamics. To address these issues, we propose Community-aware Temporal Walks (CTWalks), a novel framework for representation learning on continuous-time dynamic graphs. CTWalks integrates three key components: a community-based parameter-free temporal walk sampling mechanism, an anonymization strategy enriched with community labels, and an encoding process that leverages continuous temporal dynamics modeled via ordinary differential equations (ODEs). This design enables precise modeling of both intra- and inter-community interactions, offering a fine-grained representation of evolving temporal patterns in continuous-time dynamic graphs. CTWalks theoretically overcomes locality bias in walks and establishes its connection to matrix factorization. Experiments on benchmark datasets demonstrate that CTWalks outperforms established methods in temporal link prediction tasks, achieving higher accuracy while maintaining robustness. The implementation of our proposed method is publicly available at https://github.com/leonyuhe/CTWalks.
1 Introduction
Continuous-Time Dynamic Graphs (CTDGs) rossi2020temporal ; xu2020inductive ; wang2021causal ; souza2022provably provide a comprehensive framework for modeling temporal interactions in real-world systems. By representing entities and their time-stamped interactions as nodes and edges, CTDGs capture the evolving dynamics of diverse systems, including social networks, biological processes, knowledge graphs, and recommendation platforms nickel2015review ; zhang2020gnnrec ; wang2019kgat ; zhang2021graph ; fout2017protein . Among the myriad tasks supported by CTDGs, temporal link prediction luo2022neighborhood ; yu2023towards —predicting future interactions based on historical data—stands out as a fundamental problem. Success in this task not only enhances our understanding of dynamic systems but also drives practical applications such as personalized recommendations and anomaly detection. Despite significant advancements in dynamic graph representation learning kazemi2020representation ; yang2023dynamic ; zhu2022encoder , existing methods face three critical challenges that limit their flexibility, adaptability, and expressiveness.
 
First, current approaches rely heavily on heuristic sampling strategies to extract key graph features, including node relationships, edge interactions, and temporal patterns. However, these strategies often demand intensive hyperparameter tuning to achieve a balance between exploration and exploitation jin2023motif ; xia2021motif . Such reliance complicates the training process and introduces sensitivity to dataset-specific characteristics, limiting their generalizability.
Second, many existing methods fail to account for mesoscopic structures, such as communities fortunato2010community ; newman2004finding ; Liu2014Multiobjective ; Teng2021Overlapping , which bridge local (node-level) and global (graph-level) patterns. Communities are crucial in real-world networks for distinguishing nodes with similar local structures but differing roles within their communities. Without incorporating community-aware representations, traditional anonymous encodings struggle to differentiate such nodes, resulting in ambiguity when predicting interactions, as demonstrated in Figure 1.
Finally, existing methods struggle to handle the continuous nature of temporal dynamics in real-world networks. Conventional approaches often discretize time into fixed intervals or aggregate observations, which obscures important temporal information, particularly in irregularly-sampled data. Recent advances in continuous-time models, such as Neural Ordinary Differential Equations (Neural ODEs)chen2018neural , demonstrate the potential to preserve temporal continuity by defining hidden states at all time points through differential equations. However, integrating such continuous-time methodologies with dynamic graph structures remains an active area of researchjin2022mtgode ; asikis2020neural ; chen2023signed ; sdgnn2022 , particularly in scenarios that require capturing both intra-community and inter-community temporal dependencies.
To tackle these challenges, we propose Community-Aware Temporal Walks (CTWalks), a novel, parameter-free, and community-aware framework for representation learning on CTDGs. By leveraging the interplay between intra-community and inter-community dynamics, CTWalks achieves robust scalability, expressiveness, and generalizability. Contributions of this work include:
1. Community Detection on CTDGs: CTWalks introduces an innovative method to perform community detection on CTDGs by transforming them into a weighted temporal graph. This transformation aggregates interactions into a static graph, where edge weights reflect interaction frequencies, thereby enabling the identification of node community labels reflecting temporal information.
2. Parameter-Free Temporal Sampling: CTWalks eliminates the need for handcrafted hyperparameters in sampling by adopting a community-driven mechanism. Temporal walks are adaptively performed within or across communities, encoding both structural dependencies and temporal dynamics. In the anonymization process, we embed community labels to restore contextual distinctions between nodes, reducing ambiguity and enhancing inductive capabilities.
3. Continuous-Time Dynamics via ODEs: To explicitly model temporal dependencies and capture essential dynamic laws, CTWalks employs a Neural ODE-based process. This approach learns spatiotemporal dynamics continuously, aligning modeled dynamics with irregularly sampled observations and providing a robust basis for downstream tasks.
4. Empirical Validation: Extensive experiments on benchmark datasets demonstrate CTWalks’ superior performance in the temporal link prediction task. Compared to several well-known methods, CTWalks achieves higher accuracy while maintaining competitive computational efficiency.
2 Related Work
Dynamic graphs kazemi2020representation ; yang2023dynamic ; zhu2022encoder ; Ma2024Higher provide a powerful framework for modeling systems that evolve over time, effectively capturing temporal interactions among nodes, edges, and their attributes. Depending on the representation of temporal information, dynamic graphs are broadly classified into Discrete-Time Dynamic Graphs (DTDGs) cong2021dynamic ; goyal2020dyngraph2vec ; pareja2020evolvegcn ; sankar2020dysat ; you2022roland and Continuous-Time Dynamic Graphs (CTDGs) rossi2020temporal ; xu2020inductive ; wang2021causal ; souza2022provably . DTDGs approximate temporal evolution using discrete graph snapshots, which provide a coarse-grained view of dynamic systems. In contrast, CTDGs represent interactions as a sequence of time-stamped events, which allows for high temporal fidelity, enabling the representation of systems with irregular or rapid interactions. Existing representation learning methods on dynamic graphs can be categorized into two primary paradigms: message-passing-based methods and sequence-model-based methods. Additionally, methods based on Continuous-Time Differential Equations (CTDEs) are discussed as a separate modeling approach chen2018neural ; rubanova2019latent ; xhonneux2020continuous .
Message-Passing-Based Methods extend graph neural networks (GNNs) scarselli2008graph ; Mo2025AutoSGNN to incorporate temporal information by propagating and aggregating features from neighboring nodes. For example, TGN rossi2020temporal integrates a memory module to store historical interactions, dynamically updating it through attention-based message passing. While effective in capturing temporal dependencies, TGN suffers from high computational overhead due to frequent memory updates and neighbor aggregation, limiting its scalability for large graphs. Other approaches, such as MTSN jin2023motif , aim to preserve structural motifs during message passing to enhance the capture of local patterns. However, the computational cost of motif construction and repeated matrix operations makes these methods impractical for large-scale applications.
Sequence-Model-Based Methods rely on recurrent neural networks (RNNs) elman1990finding or related architectures to capture temporal dependencies by sequentially updating node embeddings. For instance, JODIE kumar2019predicting employs coupled RNNs to jointly model static and dynamic embeddings, enabling tasks such as temporal link prediction. However, JODIE’s reliance on hyperparameter tuning, such as decay functions for time intervals, reduces its generalizability across datasets. Similarly, CAWs wang2021causal leverage causal anonymous walks to encode temporal sequences while preserving the temporal order of interactions.
Continuous-Time Differential Equation (CTDE)-based Methods chen2018neural provide a robust framework for modeling temporal dynamics in graphs by treating temporal interactions as continuous processes. For example, ODE-RNN rubanova2019latent leverages ODE solvers to capture the dynamics between observations, effectively bridging the gaps in irregularly sampled data. CGNN xhonneux2020continuous introduces continuous message-passing inspired by diffusion processes, addressing challenges like over-smoothing while enabling long-range dependency modeling. TGAT xu2020inductive employs temporal attention mechanisms coupled with Fourier-based time encoding to model temporal dependencies, showcasing the potential of attention in capturing fine-grained temporal patterns.
3 Preliminaries
Definition 1: Continuous-Time Dynamic Graph.
A continuous-time dynamic graph is represented as a sequence of non-decreasing chronological interactions:
| (1) | 
Here, each pair represents an undirected link between nodes and , with a corresponding timestamp .
Definition 2: Temporal Link Prediction.
Temporal link prediction is the task of predicting whether an interaction between two nodes will occur at a specific time, based on historical interactions in the graph. Given the set of historical interactions before time :
| (2) | 
and two nodes and , the goal is to predict the likelihood of an interaction between and at time . Formally, this is expressed as:
| (3) | 
where represents the probability of the interaction, conditioned on the historical interaction sequence.
Definition 3: Temporal Walk.
A temporal walk on a CTDG is a sequence of node-time pairs:
| (4) | 
where:
- 
• 
and : is rooted at node at time . 
- 
• 
: Timestamps are strictly decreasing. 
- 
• 
: Each step corresponds to a temporal edge in . 
Definition 4: Community.
A community in a graph is a subset of nodes such that nodes within are densely connected to each other, while connections between nodes in and those outside are sparse.
Formally, community detection algorithms aim to partition the node set into disjoint subsets , where:
| (5) | 
The quality of a community partition is often quantified using metrics such as modularity newman2004finding . Communities represent mesoscopic structures in graphs, bridging the gap between local node-level interactions and global graph-level patterns.
4 The Proposed Method: CTWalks
CTWalks comprises three key components: sampling, anonymization, and encoding. The complete notation system used throughout CTWalks is detailed in Appendix A, while the algorithm’s computational complexity is analyzed in Appendix C.3.
4.1 Sampling
Weighted Temporal Graph Construction.
We propose an innovative method for community detection in CTDGs by introducing the concept of a weighted temporal graph , which is derived from the input CTDG . The set of nodes in includes all nodes appeared in , and the edge weights in encode the frequency of interactions between nodes and , defined as:
| (6) | 
where is an indicator function counting occurrences of in . This transformation aggregates temporal interactions into a static graph while retaining the structural essence of the CTDG, thereby enabling efficient processing and community detection. By applying modularity optimization algorithms, such as Louvain newman2004fast ; blondel2008fast , on , the graph is partitioned into communities .
Sampling Strategy.
To effectively capture the temporal and structural patterns in , nodes are categorized based on their roles within the community structure:
- 
• 
Non-Bridging Nodes: Nodes confined within a single community . Their temporal walks are restricted to the neighbors within the intra-community subgraph . 
- 
• 
Bridging Nodes: Nodes that connect multiple communities. Their temporal walks are restricted to the neighbors within the inter-community subgraph , ensuring transitions only occur between bridging nodes. 
At each step of a temporal walk, the next node is sampled based on its temporal relationship with the current node under the constraint , where is the current timestamp. The transition probability is defined as:
| (7) | 
where:
- 
• 
represents the valid neighbor set of , determined by its type: - 
– 
For non-bridging nodes , , where is the neighbor set of . 
- 
– 
For bridging nodes , . 
 
- 
– 
- 
• 
is the timestamp associated with the interaction between and . 
Input: Intra-community subgraphs , inter-community subgraph , walk length , number of walks per node .
Output: Set of temporal walks .
By utilizing community partitioning to guide walk strategies, CTWalks eliminates the need for additional parameters to control walk directions, simplifying implementation and improving adaptability to diverse graph structures and temporal dynamics. Non-bridging nodes focus on intra-community walks, capturing localized patterns, while bridging nodes explore inter-community relationships. The complete CTWalks sampling process is detailed in Algorithm 1.
4.2 Anonymization
CTWalks replaces node identities with position-based representations while embedding both directionality and community context. For a temporal interaction , the anonymization process operates on nodes appearing in temporal walk sets originating from the source node and target node .
Community- and Direction-Aware Representation.
The anonymized representation for a node , based on temporal walk sets from the source node and target node , integrates both directionality and the community labels of the root nodes. This is defined as:
| (8) | 
where:
- 
• 
and are the sets of temporal walks originating from and , respectively. 
- 
• 
and aggregate ’s position-based occurrence information across all walks in and , respectively. 
- 
• 
and are the community labels of the root nodes and , directly appended to their respective anonymization vectors. 
Anonymized Walk Construction.
For a single temporal walk with ascending timestamps , the anonymized walk representation is defined as:
| (9) | 
where is the community- and direction-aware anonymized representation of node , incorporating information from the sets of temporal walks and , as well as the community labels and . The anonymization process is detailed in Appendix G.
4.3 Encoding
The encoding mechanism in CTWalks processes anonymized temporal walks by combining continuous temporal evolution and instantaneous updates, while integrating community-aware information. This ensures the final state captures temporal, structural, and community-aware information for downstream tasks.
The process begins by initializing the cumulative hidden state . Before the first step, the instantaneous hidden state is computed as:
| (10) | 
For each subsequent step (), the encoding alternates between:
- 
1. 
Continuous Integration: Update by integrating the temporal evolution function over : (11) 
- 
2. 
Instantaneous Update: Compute , incorporating and : (12) 
We define and as parameterized models, updates instantaneous states, while governs continuous temporal integration. Details are provided in Appendix C.1.
Input: Anonymized temporal walk .
Output: Final walk representation .
By alternating between cumulative updates and instantaneous hidden states, the encoding ensures is temporally coherent, structurally rich, and embeds community context for robust downstream tasks. The detailed implementation of this process is provided in Algorithm 2, and an illustration is shown in Fig. 2. For a detailed discussion, please refer to Appendix H.
4.4 Extension for Attributed Graphs
To encode node and edge attributes, the encoding process can be extended by modifying the instantaneous update function as follows:
| (13) | 
where:
- 
• 
represents the attributes of node , 
- 
• 
represents the attributes of edge , 
- 
• 
denotes the concatenation operation. 
In this extended formulation, node attributes and edge attributes introduce additional context for real-world attributed dynamic graphs. Several methods yang2013community ; yang2013overlapping ; du2017community focus on community detection in attributed graphs, leveraging node attributes to improve accuracy. This extension enhances the model’s flexibility and expressiveness, enabling it to capture not only the structural and temporal information but also the rich attributes associated with nodes and edges.
 
5 Theoretical Analysis
Dynamic graph sampling inherently involves both spatial structures and temporal sequences. To isolate the impact of temporal sequences and focus on spatial sampling, it is essential to establish an unbiased spatial sampling environment as a foundation. To this end, we analyze the potential biases introduced by traditional random walks on unweighted, undirected static graphs, which may limit their ability to comprehensively explore the graph structure. Furthermore, we examine whether a parameter-free walk strategy, guided by community partitions and distinguishing intra-community and inter-community transitions, can effectively address these biases.
By simplifying the problem to static graphs, we aim to eliminate the confounding effects of temporal dynamics and concentrate on evaluating the locality bias and the theoretical advantages of community-guided walks in an idealized spatial context.
Analysis 1: Overcoming Locality Bias
Let a traditional random walk generate a sequence , where . At step , the walk transitions to a neighbor of with probability , where is the degree of node . Let denote the one-step transition probability from node to , defined as:
| (14) | 
Suppose and , where and . The probability that the walk starts at and visits for the first time at time is given by:
| (15) | 
where is the set of neighbors of node . This equation shows that is the mean of the -step probabilities over all neighbors of . The reliance on all neighbors dilutes the transition probability towards distant nodes, reinforcing locality bias.
Lemma 1. In CTWalks, let nodes and belong to two different communities, and are bridging nodes. The probability that the walk starts from and visits for the first time at time satisfies:
| (16) | 
with equality when all neighbors of are bridging nodes ().
Proof. Partition into two disjoint subsets:
- 
• 
, the set of inter-community neighbors (bridging nodes), 
- 
• 
, the set of intra-community neighbors. 
The total probability of transitioning to neighbors of can be expressed as:
| (17) | 
In CTWalk, intra-community transitions cannot reach nodes in different communities. Thus, for any , . The total probability simplifies to:
| (18) | 
The transition probability from to in CTWalks is:
| (19) | 
Since , it follows that:
| (20) | 
Substituting this into the Eq.(19) for , we obtain:
| (21) | 
Equality holds when all neighbors of are bridging nodes ().
Lemma 1 demonstrates that CTWalks reduces locality bias by prioritizing inter-community transitions for bridging nodes. Unlike traditional random walks, which distribute transition probabilities evenly among all neighbors, CTWalks separates intra- and inter-community contributions, enabling effective exploration of global structures.
We also theoretically establish the connection between CTWalks and matrix factorization, details are provided in Appendix F.
6 Experiments
6.1 Experimental Setting
Baselines and Datasets.
CTWalks is evaluated against six state-of-the-art baselines for continuous-time dynamic graphs, including message passing-based methods (DyRep trivedi2019dyrep , TGAT xu2020inductive , and TGN rossi2020temporal ) and sequential model-based methods (CTDNE nguyen2018continuous , JODIE kumar2019predicting , and CAWs wang2021causal ) . The evaluation is conducted on five real-world datasets spanning diverse domains: social networks, email communications, e-commerce, online education, and student forums. For instance, UCI leskovec2016snap tracks social interactions in a university network; Enron leskovec2016snap captures email exchanges among employees; Taobao zhu2018learning records user-item interactions with encoded action types such as clicks and purchases; MOOC leskovec2016snap logs student activities on educational platforms; and Wikipedia kumar2019predicting is a bipartite interaction network consisting of editor-page and user-post activities. The dataset statistics are summarized in Table 1. Additional details on the baselines and datasets are available in the Appendix D.
| Dataset | Nodes | Temporal Edges | Duration (seconds) | Interaction Intensity | 
|---|---|---|---|---|
| UCI | 1,899 | 59,835 | 16,621,303 | |
| MOOC | 7,144 | 411,749 | 2,572,086 | |
| Enron | 143 | 62,617 | 72,932,520 | |
| Taobao | 64,703 | 77,436 | 36,000 | |
| Wikipedia | 9,227 | 157,474 | 2,678,373 | 
Data Preparation.
The data preparation process involves three key steps. First, all edges in the input temporal graph are sorted by their timestamps to ensure chronological consistency. This step facilitates training on past edges while testing on future edges. Next, the dataset is split into training, validation, and testing sets, adhering to a chronological ratio of 70% (training), 15% (validation), and 15% (testing). Finally, negative sampling is performed by generating edges absent in the original graph to form negative edge sets, ensuring a balanced dataset. Both positive and negative samples are used to train, validate, and test the model. Based on the training data, a Weighted Temporal Graph is constructed to capture the aggregate interaction strength between nodes over time. For nodes in the validation and testing datasets that are absent in the Weighted Temporal Graph:
- 
• 
Non-bridging nodes: If all neighbors of the node belong to the same community, the node is assigned to that community. 
- 
• 
Bridging nodes: If the node’s neighbors belong to multiple communities, the node is assigned to a community with a probability proportional to the sum of the edge weights connecting the node to that community, defined as: (22) where is the probability of node belonging to community , represents the neighbors of , and is the edge weight between and . 
Evaluation Tasks.
We assess the performance of CTWalks and baselines on link prediction tasks, which are further categorized into transductive and inductive settings: In the transductive link prediction task, temporal links between all nodes are used for training up to a specific time point, with the remaining links allocated for validation and testing. In the inductive link prediction task, the model is evaluated on its ability to predict links involving nodes unseen during training. Two specific scenarios are considered: 1. New-Old: Interactions between unseen (new) and observed (old) nodes. 2. New-New: Interactions exclusively between unseen nodes. To construct these scenarios, 10% of the nodes are masked during training, and interactions associated with them are removed. Validation and testing sets are limited to interactions involving these masked nodes.
 
Training Details.
For evaluation, we report results using two widely adopted metrics: Area Under the ROC Curve (AUC) and Average Precision (AP) which are detailed in Appendix D. These metrics effectively capture model performance across both transductive and inductive tasks. To ensure robustness, each experiment is repeated five times with different random seeds, and we report the average performance.
| Task | Methods | UCI | MOOC | Enron | Taobao | Wikipedia | |
|---|---|---|---|---|---|---|---|
| Inductive | new v.s. new | DyRep | 63.76 4.67 | 74.07 1.88 | 72.74 0.44 | 87.62 1.23† | 67.07 1.26 | 
| TGAT | 76.33 1.38 | 70.04 1.35 | 63.34 1.82 | 76.99 1.54 | 90.77 0.95 | ||
| TGN | 79.37 1.49 | 77.07 1.88 | 80.92 0.02 | 84.74 0.44 | 63.76 4.67 | ||
| CTDNE | 62.93 0.72 | 72.01 0.29 | 66.49 1.21 | 78.01 0.08 | 62.35 1.10 | ||
| JODIE | 70.31 0.53 | 75.31 4.14 | 70.95 0.85 | 81.53 2.12 | 72.65 0.63 | ||
| CAWs | 88.12 0.88† | 90.10 0.35† | 96.27 1.16 | 86.34 2.95 | 96.36 1.48 | ||
| CTWalks | 90.71 0.08 | 92.86 0.56 | 95.20 0.68† | 96.98 0.41 | 94.16 0.25† | ||
| new v.s. old | DyRep | 93.28 3.01† | 88.23 1.35 | 94.39 0.59 | 89.36 1.48† | 76.72 1.36 | |
| TGAT | 76.33 1.03 | 67.40 1.71 | 61.80 1.31 | 65.24 0.98 | 95.47 1.18 | ||
| TGN | 87.13 1.07 | 72.23 1.20† | 77.98 0.45 | 88.39 0.32 | 90.28 0.96 | ||
| CTDNE | 71.11 0.74 | 69.10 0.88 | 64.66 0.41 | 68.33 0.05 | 88.39 1.08 | ||
| JODIE | 71.61 0.37 | 88.20 1.92 | 85.73 1.36 | 80.53 2.13 | 74.78 0.22 | ||
| CAWs | 91.25 0.18 | 90.30 0.08† | 93.22 1.28† | 88.76 1.18 | 96.19 0.88 | ||
| CTWalks | 95.70 0.17 | 92.08 0.20 | 92.52 0.21 | 93.98 0.44 | 95.15 0.24† | ||
| Transductive | DyRep | 95.23 1.48† | 90.49 0.24 | 96.71 0.80 | 83.11 1.13 | 77.40 1.79 | |
| TGAT | 77.67 1.02 | 72.09 1.51 | 60.88 1.34 | 62.88 1.46 | 96.36 1.21† | ||
| TGN | 83.36 1.23 | 73.09 0.03 | 68.12 1.21 | 87.71 0.04† | 95.23 0.25 | ||
| CTDNE | 76.89 0.92 | 73.03 0.32 | 89.36 0.73 | 65.08 0.02 | 92.43 0.27 | ||
| JODIE | 74.63 0.52 | 90.50 0.85 | 60.36 0.65 | 82.02 0.31 | 89.30 0.22 | ||
| CAWs | 95.25 0.06 | 94.28 0.29† | 91.64 0.55 | 85.88 0.37 | 98.67 0.27 | ||
| CTWalks | 98.05 0.09 | 94.33 0.08 | 92.54 0.58† | 92.06 0.16 | 95.14 0.25 | ||
6.2 Experimental Results and Discussion
We report the AUC performance of CTWalks and six state-of-the-art baselines across inductive and transductive link prediction tasks on various datasets. Table 2 summarizes the results, highlighting the effectiveness of CTWalks in both inductive and transductive settings.
In the inductive setting, CTWalks achieves remarkable results, particularly for the challenging ”new vs. new” links. On most of datasets, CTWalks significantly outperforms the best-performing baseline CAWs, with improvements of up to 5.14% on the UCI dataset and 2.16% on Taobao. Similarly, in the ”new vs. old” scenario, CTWalks consistently surpasses the baselines, achieving an AUC of 95.70 on UCI and 92.08 on MOOC, compared to the respective baseline bests of 93.28 by DyRep and 90.30 by CAWs. These results highlight CTWalks’ superior generalization ability, particularly for interactions involving previously unseen nodes.
In the transductive setting, CTWalks again demonstrates strong performance, achieving the best or second-best results across all datasets. On the Taobao dataset, CTWalks attains an AUC of 92.06. Similarly, on the MOOC dataset, CTWalks achieves an AUC of 94.33, showcasing its competitive edge even on attributed datasets.
The success of CTWalks can be attributed to its ability to integrate temporal dynamics and structural information effectively, leveraging its community-aware temporal walk encoding to capture intricate graph patterns. Its superior performance across diverse datasets highlights its generalizability and robustness, setting a benchmark for future research in dynamic graph learning.
6.3 Ablation Study
We conducted an ablation study to evaluate the contributions of key components in CTWalks, with results presented in Table 3. The study involved systematically removing essential mechanisms to understand their impact on the performance of CTWalks.
| No. | Ablation | UCI | Taobao | MOOC | 
|---|---|---|---|---|
| 1. | original method | 92.94 0.21 | 94.88 0.36 | 92.07 0.21 | 
| 2. | remove intra-community walk | 89.28 0.66 | 88.45 0.41 | 85.69 0.31 | 
| 3. | remove inter-community walk | 86.90 0.17 | 87.91 0.39 | 86.25 1.99 | 
| 4. | remove community walk | 85.94 0.85 | 86.01 0.12 | 85.12 0.20 | 
| 5. | remove anonymous community label | 88.94 0.61 | 87.01 0.14 | 86.12 0.30 | 
| 6. | remove continuous evolution | 79.38 2.05 | 84.44 0.56 | 83.72 0.44 | 
The first three experiments assess the influence of the community-aware sampling mechanism. When the restrictions on intra-community and inter-community walks are removed, nodes are allowed to perform unrestricted temporal walks, with the selection of the next node determined by time-biased probabilities that prioritize temporally closer events. The resulting performance degradation across these variants underscores the significance of preserving community boundaries. This highlights the effectiveness of the proposed community-aware sampling strategy in capturing both local and global temporal dynamics. To further validate the advantages of community walks in enhancing network embedding learning, we conducted experiments on purely static graphs and compared against classical graph node embedding algorithms. Detailed analyses are provided in Appendix E.
The fourth experiment removes the use of community labels during the anonymized walk encoding. This change prevents the model from distinguishing between structurally similar walks based on their community context, reducing its ability to effectively generalize across different network structures.
Finally by removing the continuous integration step, the cumulative hidden state is no longer updated through integration over temporal intervals. Instead, the encoding process relies solely on the instantaneous update function Eq.(12), where the hidden state at each step is directly computed without accounting for continuous temporal evolution. This simplification significantly impacts performance, particularly on datasets with sparse interactions like UCI, as it eliminates the ability to capture nuanced spatiotemporal dynamics over time intervals, underscoring the critical role of the continuous integration mechanism.
These results validate that each component contributes significantly to the overall performance of CTWalks, with the interplay of community-aware sampling and continuous evolution mechanisms being particularly crucial.
7 Discussion and Conclusion
The effectiveness of our model relies heavily on modularity-based community detection methods, such as Louvain, which ties its performance to the quality of the initial community partitions. In networks with overlapping or poorly-defined communities, this dependency may hinder the efficacy of the community-aware sampling mechanism. To address this, future work could explore adaptive sampling techniques that dynamically refine community boundaries or integrate multi-scale community detection methods to enhance robustness.
Additionally, incorporating continuous temporal dynamics via ODE solvers poses scalability challenges for extremely large graphs. To address these issues, approximate solutions such as logarithmic normalization of time intervals and parallel batch processing (see Appendix C.2) have been employed to reduce computational overhead. Nonetheless, future research should prioritize the development of lightweight integration techniques or hybrid approaches that strike a balance between computational efficiency and modeling fidelity.
CTWalks represents a community-aware, parameter-free framework for representation learning on CTDGs. Unlike existing methods, CTWalks simultaneously incorporates intra- and inter-community dynamics, enabling the effective modeling of mesoscopic structures within graphs. Our contributions span several dimensions: (1) a novel temporal walk sampling strategy that adaptively captures community-driven dynamics without requiring extensive parameter tuning, (2) an anonymized encoding process augmented with community labels for generalization, and (3) a continuous temporal integration mechanism to capture nuanced spatiotemporal dependencies. Experimental results on diverse datasets consistently demonstrate the superiority of CTWalks in both inductive and transductive tasks.
This work lays the groundwork for future research in dynamic graph learning. Directions for future exploration include designing scalable methods for handling larger and more complex CTDGs, developing robust community detection mechanisms, and addressing challenges associated with data bias and ethical use in real-world applications. With its potential to generalize across domains, CTWalks represents a significant step forward in understanding and modeling the evolving dynamics of complex systems.
References
- (1) E. Rossi, B. Chamberlain, F. Frasca, D. Eynard, F. Monti, and M. Bronstein, “Temporal graph networks for deep learning on dynamic graphs,” in ICML 2020 Workshop on Graph Representation Learning, 2020.
- (2) D. Xu, C. Ruan, E. Körpeoglu, S. Kumar, and K. Achan, “Inductive representation learning on temporal graphs,” in 8th International Conference on Learning Representations, OpenReview.net, 2020.
- (3) Y. Wang, Y.-Y. Chang, Y. Liu, J. Leskovec, and P. Li, “Inductive representation learning in temporal networks via causal anonymous walks,” in 9th International Conference on Learning Representations, OpenReview.net, 2021.
- (4) A. H. Souza, D. Mesquita, S. Kaski, and V. Garg, “Provably expressive temporal graph networks,” in Advances in Neural Information Processing Systems (NeurIPS), 2022.
- (5) M. Nickel, K. Murphy, V. Tresp, and E. Gabrilovich, “A review of relational machine learning for knowledge graphs,” Proceedings of the IEEE, vol. 104, no. 1, pp. 11–33, 2015.
- (6) S. Zhang, Z. Yang, M. Zhao, and G. Li, “Gnnrec: Graph neural network-based recommendation,” arXiv preprint arXiv:2001.09061, 2020.
- (7) X. Wang, X. He, Y. Cao, M. Liu, and T.-S. Chua, “Kgat: Knowledge graph attention network for recommendation,” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 950–958, 2019.
- (8) M. Zhang, W. Hu, J. Wang, and Z. Zhang, “Graph representation learning for biological networks,” in ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2021.
- (9) A. Fout, J. Byrd, B. Shariat, and A. Ben-Hur, “Protein interface prediction using graph convolutional networks,” Advances in neural information processing systems, vol. 30, pp. 6533–6542, 2017.
- (10) Y. Luo and P. Li, “Neighborhood-aware scalable temporal network representation learning,” in Learning on Graphs Conference, 2022.
- (11) L. Yu, L. Sun, B. Du, and W. Lv, “Towards better dynamic graph learning: New architecture and unified library,” in Advances in Neural Information Processing Systems, 2023.
- (12) S. M. Kazemi, R. Goel, K. Jain, I. Kobyzev, A. Sethi, P. Forsyth, and P. Poupart, “Representation learning for dynamic graphs: A survey,” Journal of Machine Learning Research, vol. 21, no. 70, pp. 1–73, 2020.
- (13) L. Yang, S. Adam, and C. Chatelain, “Dynamic graph representation learning with neural networks: A survey,” arXiv preprint arXiv:2304.05729, 2023.
- (14) Y. Zhu, F. Lyu, C. Hu, X. Chen, and X. Liu, “Encoder-decoder architecture for supervised dynamic graph learning: A survey,” arXiv preprint arXiv:2203.10480, 2022.
- (15) X. Jin, Q. Li, J. Zhou, W. Fan, and J. Tang, “Motif-aware representation learning on continuous-time dynamic graphs,” arXiv preprint arXiv:2301.06434, 2023.
- (16) M. Xia, S. Zhang, P. Zhou, F. Nie, and J. Huang, “Motif-preserving dynamic attributed network embedding,” IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 12, pp. 5750–5763, 2021.
- (17) S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3-5, pp. 75–174, 2010.
- (18) M. E. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical review E, vol. 69, no. 2, p. 026113, 2004.
- (19) C. Liu, J. Liu, and Z. Jiang, “A multiobjective evolutionary algorithm based on similarity for community detection from signed social networks,” IEEE Transactions on Cybernetics, vol. 44, no. 12, pp. 2274 – 2287, 2014.
- (20) X. Teng, J. Liu, and M. Li, “Overlapping community detection in directed and undirected attributed networks using a multiobjective evolutionary algorithm,” IEEE Trans. Cybernetics, vol. 51, pp. 138 – 150, January 2021.
- (21) R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud, “Neural ordinary differential equations,” in Advances in Neural Information Processing Systems, vol. 31, 2018.
- (22) M. Jin, Y. Zheng, Y.-F. Li, S. Chen, B. Yang, and S. Pan, “Multivariate time series forecasting with dynamic graph neural odes,” arXiv preprint arXiv:2202.08408, 2022.
- (23) T. Asikis, L. Böttcher, and N. Antulov-Fantulin, “Neural ordinary differential equation control of dynamics on graphs,” arXiv preprint arXiv:2006.09773, 2020.
- (24) L. Chen, K. Wu, J. Lou, and J. Liu, “Signed graph neural ordinary differential equation for modeling continuous-time dynamics,” arXiv preprint arXiv:2312.11198, 2023.
- (25) Anonymous, “Streaming dynamic graph neural networks for continuous-time temporal graphs,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
- (26) H. Ma, K. Wu, H. Wang, and J. Liu, “Higher-order knowledge transfer for dynamic community detection with great changes,” IEEE Transactions on Evolutionary Computation, vol. 28, no. 1, pp. 90 – 104, 2024.
- (27) W. Cong, Y. Wu, Y. Tian, M. Gu, Y. Xia, M. Mahdavi, and C.-c. J. Chen, “Dynamic graph representation learning via graph transformer networks,” CoRR, vol. abs/2111.10447, 2021.
- (28) P. Goyal, S. R. Chhetri, and A. Canedo, “dyngraph2vec: Capturing network dynamics using dynamic graph representation learning,” Knowledge-Based Systems, vol. 187, p. 104816, 2020.
- (29) A. Pareja, G. Domeniconi, J. Chen, T. Ma, T. Suzumura, H. Kanezashi, T. Kaler, T. B. Schardl, and C. E. Leiserson, “Evolvegcn: Evolving graph convolutional networks for dynamic graphs,” in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, pp. 5363–5370, AAAI Press, 2020.
- (30) A. Sankar, Y. Wu, L. Gou, W. Zhang, and H. Yang, “Dysat: Deep neural representation learning on dynamic graphs via self-attention networks,” in Proceedings of the Thirteenth ACM International Conference on Web Search and Data Mining, pp. 519–527, ACM, 2020.
- (31) J. You, T. Du, and J. Leskovec, “Roland: graph learning framework for dynamic graphs,” in Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2358–2366, ACM, 2022.
- (32) Y. Rubanova, R. T. Q. Chen, and D. Duvenaud, “Latent ordinary differential equations for irregularly-sampled time series,” in Advances in Neural Information Processing Systems, vol. 32, 2019.
- (33) L.-P. Xhonneux, M. Qu, and J. Tang, “Continuous graph neural networks,” Advances in Neural Information Processing Systems, vol. 33, pp. 18508–18519, 2020.
- (34) F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini, “The graph neural network model,” IEEE Transactions on Neural Networks, vol. 20, no. 1, pp. 61–80, 2008.
- (35) S. Mo, K. Wu, Q. Gao, X. Teng, and J. Liu, “Autosgnn: Automatic propagation mechanism discovery for spectral graph neural networks,” in AAAI, 2025.
- (36) J. L. Elman, “Finding structure in time,” Cognitive Science, vol. 14, no. 2, pp. 179–211, 1990.
- (37) S. Kumar, X. Zhang, and J. Leskovec, “Predicting dynamic embedding trajectory in temporal interaction networks,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1269–1278, ACM, 2019.
- (38) M. E. Newman, “Fast algorithm for detecting community structure in networks,” Physical review E, vol. 69, no. 6, p. 066133, 2004.
- (39) V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, p. P10008, 2008.
- (40) J. Yang, J. McAuley, and J. Leskovec, “Community detection in networks with node attributes,” ICDM, pp. 1151–1156, 2013.
- (41) J. Yang and J. Leskovec, “Overlapping community detection at scale: a nonnegative matrix factorization approach,” Proceedings of the sixth ACM international conference on Web search and data mining, pp. 587–596, 2013.
- (42) B. Du, Z. Wu, and J. Pei, “Community detection in attributed graphs: An embedding approach,” Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pp. 589–598, 2017.
- (43) R. Trivedi, M. Farajtabar, P. Biswal, and H. Zha, “Dyrep: Learning representations over dynamic graphs,” in International Conference on Learning Representations (ICLR), 2019.
- (44) G. Nguyen, J. Lee, R. Rossi, N. Ahmed, E. Koh, and S. Kim, “Continuous-time dynamic network embeddings,” Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), pp. 373–381, 2018.
- (45) J. Leskovec and A. Krevl, “Snap datasets: Stanford large network dataset collection.” http://snap.stanford.edu/data, 2014. Accessed: 2016-01-01.
- (46) H. Zhu, X. Li, P. Zhang, G. Li, J. He, H. Li, and K. Gai, “Learning tree-based deep model for recommender systems,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1079–1088, ACM, 2018.
- (47) K. Cho, B. Van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using rnn encoder-decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734, 2014.
- (48) R. T. Chen, B. Amos, and M. Nickel, “Neural spatio-temporal point processes,” in International Conference on Learning Representations, 2020.
- (49) T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation of word representations in vector space,” in Proceedings of the International Conference on Learning Representations (ICLR), 2013.
- (50) A. Grover and J. Leskovec, “node2vec: Scalable feature learning for networks,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864, ACM, 2016.
- (51) B. Perozzi, R. Al-Rfou, and S. Skiena, “DeepWalk: Online learning of social representations,” in Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710, ACM, 2014.
- (52) J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei, “LINE: Large-scale information network embedding,” in Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077, 2015.
- (53) L. F. Ribeiro, P. H. Saverese, and D. R. Figueiredo, “struc2vec: Learning node representations from structural identity,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 385–394, ACM, 2017.
- (54) W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Proceedings of Advances in Neural Information Processing Systems, pp. 1024–1034, 2017.
- (55) X. Wang, P. Cui, J. Wang, J. Pei, and W. Zhu, “Community preserving network embedding,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, pp. 203–209, 2017.
- (56) R. A. Rossi and N. K. Ahmed, “The network data repository with interactive graph analytics and visualization,” 2015. Available at https://networkrepository.com.
- (57) T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, “Distributed representations of words and phrases and their compositionality,” in Advances in Neural Information Processing Systems (NeurIPS), pp. 3111–3119, 2013.
- (58) O. Levy and Y. Goldberg, “Neural word embedding as implicit matrix factorization,” in Proceedings of Advances in Neural Information Processing Systems, pp. 2177–2185, 2014.
- (59) J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang, “Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec,” in Proceedings of the 11th ACM International Conference on Web Search and Data Mining, pp. 459–467, 2018.
- (60) Z. C. Lipton, J. Berkowitz, and C. Elkan, “A critical review of recurrent neural networks for sequence learning,” arXiv preprint arXiv:1506.00019, 2015.
- (61) Z. Che, S. Purushotham, K. Cho, D. Sontag, and Y. Liu, “Recurrent neural networks for multivariate time series with missing values,” in Scientific reports, vol. 8, pp. 1–12, Nature Publishing Group, 2018.
- (62) H. Mei and J. Eisner, “Neural hawkes process: A neurally self-modulating multivariate point process,” in Advances in Neural Information Processing Systems, pp. 6754–6764, 2017.
Appendix A Notations and Definitions
| Symbol | Definition | 
|---|---|
| A continuous-time dynamic graph (CTDG) with time-stamped interactions. | |
| The weighted temporal graph with aggregated interaction weights. | |
| Weight of the edge , representing the total number of interactions. | |
| Partitioned communities in the graph using modularity-based algorithms. | |
| Subgraph corresponding to community . | |
| Inter-community subgraph comprising bridging nodes and edges. | |
| Set of bridging nodes connecting multiple communities. | |
| Neighbor set of node . | |
| A temporal walk defined as . | |
| Transition probability for intra-community walks based on temporal constraints. | |
| Transition probability for inter-community walks based on temporal constraints. | |
| Anonymization operation with community label integration. | |
| Anonymized temporal walk incorporating community-aware representations. | |
| Instantaneous hidden state at step of the walk. | |
| Cumulative hidden state at step updated through continuous integration. | |
| Instantaneous update function for hidden states. | |
| Temporal evolution function for continuous integration. | |
| Continuous integration over the temporal interval . | |
| Final representation of the temporal walk . | |
| Average link stream intensity, measuring interaction density over time. | 
Additional Notes:
- 
• 
The definitions aim to maintain clarity for each symbol and its application within the CTWalks framework. 
- 
• 
Symbols like and are parameterized models, where handles discrete updates and continuous temporal dynamics. 
- 
• 
The temporal walks integrate both intra-community and inter-community dynamics, with anonymization enhancing generalization across unseen nodes. 
- 
• 
Community-aware sampling eliminates the need for additional parameters to control local/global walks, enhancing model simplicity and robustness. 
Appendix B Community Detection Algorithms
B.1 Louvain Method
The Louvain method newman2004fast ; blondel2008fast is an efficient modularity optimization algorithm widely used for community detection in undirected weighted graphs. It operates in two iterative phases:
- 
1. 
Local Optimization Phase: Initially, each node is treated as a separate community. The algorithm iteratively moves nodes between neighboring communities to maximize the modularity gain , defined as: (23) where is the sum of edge weights within a community, and are the total weights of edges connected to the node inside the community and in the entire graph, respectively, and is the total edge weight in the graph. 
- 
2. 
Community Aggregation Phase: Once no further modularity improvement is possible, each community is treated as a supernode, forming a new graph. The process is repeated until the modularity no longer improves significantly. 
The Louvain method is highly scalable and effective, making it suitable for large-scale graphs.
B.2 Other Community Detection Algorithms
Several alternative community detection methods exist, each with unique strengths and weaknesses:
- 
• 
Spectral Clustering: Based on the eigenvectors of the graph Laplacian, this method identifies communities. - 
– 
Advantages: Effective for small-scale graphs and capable of capturing complex community structures. 
- 
– 
Disadvantages: Computationally expensive for large graphs. 
 
- 
– 
- 
• 
Random Walk-Based Methods (e.g., Infomap): These minimize the description length of random walks to identify communities. - 
– 
Advantages: Naturally handle weighted graphs and hierarchical community structures. 
- 
– 
Disadvantages: High computational resource requirements. 
 
- 
– 
- 
• 
Label Propagation: Communities are formed through the iterative propagation and convergence of node labels. - 
– 
Advantages: Simple and fast, well-suited for dynamic graphs. 
- 
– 
Disadvantages: Susceptible to local optima and instability. 
 
- 
– 
- 
• 
Modularity Maximization: A family of algorithms that optimize modularity . The Louvain method is a prominent example. 
- 
• 
Deep Learning-Based Methods: Algorithms like DeepWalk and Node2Vec learn node embeddings to capture community structures. - 
– 
Advantages: Integrate node attributes and graph topology information. 
- 
– 
Disadvantages: Require significant computational resources and are sensitive to hyperparameters. 
 
- 
– 
In our framework, the Louvain method is employed for community detection due to its efficiency and suitability for weighted graphs. This method provides a solid foundation for subsequent node community classification and boundary node detection. Future work may explore advanced algorithms tailored to dynamic weighted graphs for more accurate and robust community detection.
Appendix C Batching and Complexity Analysis
C.1 Function Definitions: and
In the CTWalks framework, two key functions, and , are defined to model discrete updates and continuous temporal evolution, respectively. Their complementary roles enable precise representation of both instantaneous and time-dependent dynamics within continuous-time dynamic graphs.
1. Instantaneous Update Function
The function handles the discrete, step-wise state updates at specific nodes along a temporal walk. For each node , the hidden state is computed as:
| (24) | 
where:
- 
• 
is the cumulative hidden state from the previous step. 
- 
• 
is the anonymized representation of node . 
- 
• 
is instantiated as a Gated Recurrent Unit (GRU) cho2014learning , designed to capture both structural and temporal information. 
The GRU-based formulation of is given by:
| (25) | 
| (26) | 
| (27) | 
| (28) | 
where:
- 
• 
, , and are the update gate, reset gate, and candidate hidden state, respectively. 
- 
• 
represents the input node features. 
- 
• 
and denote the sigmoid and hyperbolic tangent activation functions, respectively. 
- 
• 
represents element-wise multiplication. 
2. Continuous Temporal Evolution Function
The function models the continuous evolution of the hidden state over time intervals between nodes. Unlike , which incorporates node-specific input features, focuses solely on temporal dynamics and acts on the output of . The cumulative hidden state at step is computed by integrating over the time interval :
| (29) | 
where:
- 
• 
is the instantaneous hidden state computed by , 
- 
• 
and are the timestamps associated with the current and next nodes in the temporal walk, 
- 
• 
is parameterized as a GRU-like model, similar to , but without external inputs . 
The formulation of is as follows:
| (30) | 
| (31) | 
| (32) | 
| (33) | 
where:
- 
• 
, , and are the update gate, reset gate, and candidate hidden state, respectively, 
- 
• 
, , and are weight matrices, 
- 
• 
, , and are biases, 
- 
• 
and denote the sigmoid and hyperbolic tangent activation functions. 
By removing external inputs , focuses purely on modeling the temporal continuity of the hidden state, allowing it to integrate spatiotemporal dependencies over time intervals.
Together, and enable CTWalks to seamlessly integrate spatial and temporal dynamics, delivering a robust framework for learning representations on continuous-time dynamic graphs.
C.2 Batching Analysis
Efficiently encoding temporal walks poses a significant challenge due to the irregular timestamps and varying lengths of walks. For temporal walks, each consisting of steps, solving independent ordinary differential equations (ODEs) is computationally expensive. To address this, we employ a time reparameterization strategy, unifying the integration intervals of all walks into a standard range . This unification allows batch processing of the walks, enabling significant computational efficiencychen2020neural .
Reparameterization for Unified Time Intervals
Consider the -th walk with timestamps , where is the time of the first node in the walk, and is the time of the last node. We reparameterize the time within this interval using a new variable , defined as:
| (34) | 
Under this transformation, the instantaneous hidden state at time is reformulated as , such that:
| (35) | 
This allows all walks, regardless of their original time intervals, to operate within a common temporal scale of .
Reformulated ODE for Batch Processing
The cumulative hidden state for the -th walk is computed by solving an ODE over the interval . After reparameterization, the ODE is reformulated for as follows:
| (36) | 
where .
The initial and final states are transformed as:
| (37) | 
The solution for the cumulative hidden state at the end of the walk becomes:
| (38) | 
Here, the reparameterized function ensures that all walks are solved over a common interval, enabling efficient batch computation.
Parallelism Across Walks
While each walk still involves step-wise computations within the reparameterized interval, the unification of time intervals to allows a single solver instance to process multiple walks in parallel. This is achieved by stacking the dynamics of all walks into a joint system:
| (39) | 
where . The solver processes this joint system, reducing the computational overhead from independent solver calls to a single batched solver call.
Batch Processing Efficiency
For walks with , the reparameterization directly enables batch processing since the time interval for all walks is normalized to . However, for walks with , each step within the walk corresponds to a distinct interval, making true parallelism challenging. Solving multiple sub-intervals within each walk requires sequential processing.
To fully enable parallel processing for , further strategies, such as normalizing intermediate step intervals or discretizing the ODE solutions, could be employed. These approaches remain an open area for further optimization.
Handling Large Time Intervals
In real-world scenarios, large time intervals can cause numerical instability. To mitigate this, we normalize the intervals while preserving relative temporal differences using a logarithmic transformation:
| (40) | 
ensuring that smaller intervals retain fine-grained dynamics while larger intervals are compressed to a manageable scale.
This batching mechanism combines time reparameterization and joint ODE solving, significantly improving computational efficiency and enabling scalable representation learning on dynamic graphs.
C.3 Complexity Analysis
We analyze the time complexity of the main components in CTWalks, including community detection, temporal walk sampling, anonymization, continuous evolution, and instantaneous updates. The detailed breakdown is as follows:
Community Detection.
Community detection is performed using a modularity-based algorithm, such as Louvain, which has a time complexity of , where is the number of edges and is the number of nodes. This step is executed once as preprocessing.
Temporal Walk Sampling.
In each batch, walks are generated, where is the batch size (number of root nodes) and is the number of walks per root node. The sampling process involves steps per walk, with each step querying neighbors at a complexity of , where is the average number of neighbors. The total complexity for temporal walk sampling is .
Walk Anonymization.
For anonymization, each walk requires operations to identify unique nodes and assign positional encodings. Across walks, the total complexity becomes .
Continuous Evolution.
Continuous evolution integrates the temporal evolution function over time intervals using an ODE solver, requiring function evaluations per step. For walks with steps, the total complexity is . Using the batch processing optimization introduced in Section B.1, this is reduced to .
Instantaneous Updates.
The instantaneous updates compute the hidden state at each step using a parameterized function, resulting in a complexity of .
Subtotal (Sampling and Anonymization).
The combined complexity for sampling and anonymization, including temporal walk sampling and walk anonymization, is:
| (41) | 
Subtotal (Encoding).
The combined complexity for continuous evolution and instantaneous updates is:
| (42) | 
Total Complexity.
Combining all components, the total time complexity of CTWalks per epoch is:
| (43) | 
Table 5 summarizes the complexity of each component.
| Component | Time Complexity | 
|---|---|
| Community Detection | |
| Temporal Walk Sampling | |
| Walk Anonymization | |
| Subtotal (Sampling and Anonymization) | |
| Continuous Evolution | |
| Instantaneous Updates | |
| Subtotal (Encoding) | |
| Total | 
Appendix D Additional Experimental Details
D.1 Dataset Details and Preprocessing
We evaluate our method on five real-world datasets across diverse domains. The details are as follows:
- 
• 
UCI: Contains 1,899 nodes and 59,835 temporal edges, representing message interactions in a university social forum. No additional preprocessing is required.111Dataset available at: https://snap.stanford.edu/data/CollegeMsg.html 
- 
• 
MOOC: Logs 7,145 nodes and 411,749 temporal edges, capturing student interactions with course units over one month.222Dataset available at: http://snap.stanford.edu/jodie/mooc.csv 
- 
• 
Enron: Includes 184 nodes and 125,235 temporal edges, recording email communications over several years.333Dataset available at: http://www.cs.cmu.edu/~enron/ 
- 
• 
Taobao: Contains 987,994 nodes and 2,099,520 temporal edges, representing user-item interactions with encoded action types (e.g., clicks, purchases).444Dataset available at: https://tianchi.aliyun.com/dataset/dataDetail?dataId=649&userId=1 
- 
• 
Wikipedia: Bipartite graph with 9,227 nodes and 157,474 temporal edges, representing editor-page interactions encoded with LIWC features.555Dataset available at: http://snap.stanford.edu/jodie/wikipedia.csv 
Data Preprocessing: The following steps ensure consistency and facilitate evaluation:
- 
1. 
All temporal edges are sorted chronologically to maintain temporal consistency. 
- 
2. 
Edges are split into training (70%), validation (15%), and testing (15%) sets. 
- 
3. 
Negative sampling is performed to generate edges absent in the original graph, ensuring a balanced dataset. 
D.2 Baselines and Hyperparameter Tuning
Baseline Methods:
To evaluate the performance of our proposed method, we compare it with six state-of-the-art baselines specifically designed for continuous-time dynamic graphs (CTDGs):
- 
• 
CTDNE: CTDNE extends static network embedding techniques to CTDGs by leveraging temporal random walks and a skip-gram model to learn node representations. This method captures the temporal dependency of interactions.666CTDNE repository: https://github.com/LogicJake/CTDNE 
- 
• 
JODIE: JODIE employs two mutually influenced recurrent neural networks (RNNs) to update the latent states of interacting nodes. The model is capable of predicting future embedding trajectories based on past interactions.777JODIE repository: https://github.com/claws-lab/jodie 
- 
• 
DyRep: DyRep integrates sequence modeling with an attentive message-passing mechanism, incorporating 2-hop temporal neighborhood information to produce time-aware embeddings. The loss function is built upon temporal point processes.888DyRep repository: https://github.com/uoguelph-mlrg/LDG 
- 
• 
TGAT: TGAT applies temporal encoding via random Fourier time encodings and attentively aggregates temporal neighborhood information to generate embeddings. The model supports multi-hop temporal message aggregation.999TGAT repository: https://github.com/StatsDLMathsRecomSys/Inductive-representation-learning-on-temporal-graphs 
- 
• 
TGN: TGN proposes a memory-based message-passing framework, which updates node memories dynamically while combining key designs from JODIE and TGAT to enhance its capability to model temporal interactions.101010TGN repository: https://github.com/twitter-research/tgn 
- 
• 
CAWs: CAWs encode and aggregate anonymous temporal walks using RNNs, followed by pooling mechanisms to generate temporal node embeddings. Its community-aware design enables it to capture complex temporal dynamics.111111CAWs repository: https://github.com/snap-stanford/CAW 
Hyperparameter Tuning:
To ensure a fair comparison, we carefully adapt each baseline to our unified evaluation framework. Key hyperparameters are tuned through grid search to optimize model performance. Below, we summarize the tuning strategies for each method:
- 
• 
TGAT: Neighborhood Sampling Degree is tuned from {8, 16, 32, 64}; Model Layers are searched over {1, 2, 3}; Attention Heads (default product attention) are tuned from {2, 4, 6}; and Embedding Dimensions are tuned from {16, 32, 64, 100}. 
- 
• 
JODIE: Embedding Dimensions are tuned from {32, 64, 128, 256}. 
- 
• 
DyRep: Neighborhood Sampling Degree is tuned from {10, 16, 32, 64}, and Attention Layers are tuned over {1, 2, 3}. 
- 
• 
TGN: Attention Heads are tuned from {2, 4, 8}; Embedding Dimensions for nodes, time, and messages are tuned over {16, 32, 64, 100}; and Memory Dimensions are searched over {4, 16, 32, 64, 172}. 
- 
• 
CAWs: Walk Length is tuned over {1, 2, 3}; Number of Walks is tuned from {16, 32, 64, 128}; and Time Decay Factor is tuned over {10-7, 10-6, 10-5, 10-4}. Walk pooling is set to summation for consistency and simplicity. 
- 
• 
CTDNE: Skip-gram Window Size is tuned from {3, 5, 10}. 
| Method | Hyperparameter | Search Range | Optimal Value | 
| TGAT | Neighborhood Sampling Degree | {8, 16, 32, 64} | 32 | 
| Model Layers | {1, 2, 3} | 2 | |
| Attention Heads | {2, 4, 6} | 4 | |
| Embedding Dimensions | {32, 64, 128, 256} | 256 | |
| JODIE | Embedding Dimensions | {32, 64, 128, 256} | 256 | 
| DyRep | Neighborhood Sampling Degree | {10, 16, 32, 64} | 16 | 
| Attention Layers | {1, 2, 3} | 2 | |
| TGN | Attention Heads | {2, 4, 8} | 4 | 
| Embedding Dimensions | {32, 64, 128, 256} | 256 | |
| Memory Dimensions | {4, 16, 32, 64, 172} | 64 | |
| CAWs | Walk Length | {1, 2, 3} | 3 | 
| Number of Walks | {16, 32, 64} | 64 | |
| Time Decay Factor | {10-7, 10-6, 10-5, 10-4} | 10-6 | |
| CTDNE | Skip-gram Window Size | {3, 5, 10} | 5 | 
Implementation Details:
All baselines are implemented using publicly available codebases. We ensure consistency in training and evaluation settings, including dataset splits and metrics, to enable fair comparisons. For additional implementation details, such as the specific training configurations and runtime environments, refer to Appendix C.3.
D.3 Implementation Details
Code Availability:
Our implementation of CTWalks is publicly available at https://github.com/leonyuhe/CTWalks. The repository includes detailed instructions for dataset preparation, model training, and hyperparameter tuning.
General Training Settings:
CTWalks is trained across all datasets using the following settings:
- 
• 
Optimizer: Adam with a learning rate of and batch size of 32. 
- 
• 
Early Stopping: Training stops if validation performance does not improve for three consecutive epochs. 
- 
• 
Epoch Limit: Maximum 50 epochs. 
Hyperparameter Tuning:
To ensure optimal performance, key hyperparameters controlling walk sampling are tuned via grid search. The following summarizes the hyperparameter configurations for all datasets:
- 
• 
Walk Length (): {1, 2, 3} 
- 
• 
Number of Walks per Node (): {16, 32, 64} 
- 
• 
ODE Solver: Fixed-step Runge-Kutta 3/8 method with a step size of 0.125. 
Evaluation Tasks:
- 
• 
Transductive: Models are trained on all edges up to a specific time point and tested on future edges. 
- 
• 
Inductive: Evaluates the ability to predict links involving unseen nodes. Two scenarios are considered: - 
1. 
New-Old: Interactions between unseen and observed nodes. 
- 
2. 
New-New: Interactions exclusively between unseen nodes. 
 
- 
1. 
Hardware Configuration: Experiments are conducted on an Ubuntu Linux server equipped with four NVIDIA GeForce RTX 3090 GPUs, two Intel(R) Xeon(R) Silver 4210 CPUs (2.20GHz), and 252GB of RAM. Each experiment is repeated five times, and the average results are reported.
Evaluation Metrics:
- 
• 
AUC (Area Under the ROC Curve): Measures the quality of predictions. 
- 
• 
AP (Average Precision): Evaluates the precision-recall relationship, detail in Table 7. 
| Task | Methods | UCI | MOOC | Enron | Taobao | Wikipedia | |
|---|---|---|---|---|---|---|---|
| Inductive | new v.s. new | DyRep | 64.55 4.22 | 74.93 1.77 | 73.41 0.41 | 88.33 1.15† | 67.89 1.21 | 
| TGAT | 77.20 1.32 | 71.15 1.27 | 64.11 1.69 | 77.45 1.45 | 91.32 0.89 | ||
| TGN | 80.28 1.39 | 78.31 1.77 | 81.78 0.02 | 85.34 0.41 | 64.56 4.33 | ||
| CTDNE | 63.75 0.67 | 72.91 0.27 | 67.14 1.15 | 78.77 0.08 | 63.09 1.05 | ||
| JODIE | 71.45 0.49 | 76.08 4.00 | 71.66 0.81 | 82.18 2.04 | 73.54 0.61 | ||
| CAWs | 88.78 0.82† | 90.89 0.33† | 96.95 1.09 | 86.99 2.80 | 96.91 1.41 | ||
| CTWalks | 91.26 0.08 | 93.72 0.52 | 95.91 0.62† | 97.75 0.38 | 94.94 0.24† | ||
| new v.s. old | DyRep | 94.11 2.91† | 89.07 1.29 | 95.18 0.55 | 90.11 1.42† | 77.45 1.27 | |
| TGAT | 77.10 0.99 | 68.40 1.62 | 62.11 1.24 | 65.93 0.93 | 96.12 1.10 | ||
| TGN | 87.78 1.01 | 73.02 1.14† | 78.56 0.42 | 89.09 0.30 | 90.96 0.89 | ||
| CTDNE | 71.89 0.70 | 70.10 0.83 | 65.44 0.38 | 69.02 0.05 | 89.06 1.02 | ||
| JODIE | 72.55 0.35 | 89.00 1.81 | 86.54 1.29 | 81.23 2.03 | 75.66 0.20 | ||
| CAWs | 92.10 0.16 | 91.14 0.07† | 94.10 1.22† | 89.56 1.10 | 96.99 0.81 | ||
| CTWalks | 96.35 0.16 | 93.81 0.18 | 93.17 0.19 | 94.78 0.41 | 96.01 0.22† | ||
| Transductive | DyRep | 96.11 1.42† | 91.22 0.22 | 97.41 0.75 | 83.79 1.07 | 78.23 1.70 | |
| TGAT | 78.11 0.99 | 72.95 1.43 | 61.32 1.27 | 63.76 1.38 | 96.89 1.11† | ||
| TGN | 84.01 1.16 | 74.21 0.03 | 69.01 1.14 | 88.21 0.04† | 95.93 0.23 | ||
| CTDNE | 77.55 0.86 | 74.03 0.29 | 90.08 0.68 | 65.78 0.02 | 93.11 0.26 | ||
| JODIE | 75.29 0.49 | 91.40 0.78 | 61.11 0.62 | 82.77 0.29 | 90.44 0.21 | ||
| CAWs | 95.97 0.05 | 95.11 0.27† | 92.32 0.51 | 86.88 0.34 | 99.21 0.26 | ||
| CTWalks | 98.65 0.08 | 95.88 0.07 | 93.35 0.54† | 92.98 0.13 | 97.88 0.23 | ||
Appendix E Experiment on Purely Static Graphs
To further validate the benefits of community walks in enhancing network embedding learning, we conducted experiments on purely static graphs, focusing on the community structure without considering temporal information. Specifically, we applied our method, CTWalks, which incorporates community walks to generate node sequences, and subsequently used the word2Vec mikolov2013efficient framework to generate embeddings from these sequences. By deliberately excluding temporal data and leveraging word2Vec, our approach aims to highlight the effectiveness of CTWalks in capturing community structures and achieving competitive performance on static graphs compared to classical graph embedding algorithms.
Experimental Setup
We evaluated CTWalks (CTW) against six widely recognized baseline methods: node2vec (N2V) grover2016node2vec , DeepWalk (DW) perozzi2014deepwalk , LINE tang2015line , Struc2Vec (S2V) ribeiro2017struc2vec , GraphSage (GS) hamilton2017graphsage , and M-NMF (MN) wang2017community . To ensure a fair comparison, the implementations of these methods were obtained from publicly available repositories. Specifically, Node2vec, DeepWalk, and M-NMF were implemented using the Karate Club library121212https://github.com/benedekrozemberczki/karateclub. The implementation of LINE was sourced from its official repository131313https://github.com/tangjianpku/LINE. Similarly, Struc2Vec141414https://github.com/leoribeiro/struc2vec and GraphSage151515https://github.com/williamleif/GraphSAGE were obtained from their respective repositories. This approach ensured the use of standardized and well-maintained implementations for each baseline.
The experiments were conducted on six datasets from the Network Repository nr2015 and SNAP leskovec2016snap , spanning diverse domains such as social, biological, and ecological networks. All graphs were treated as static by disregarding temporal information. Each graph was split into 70% edges for training and 30% for testing. Node embeddings were learned on the training graph, while the testing edges were excluded to ensure unbiased evaluation.
| Dataset | |||||
|---|---|---|---|---|---|
| fb-pages-food | 620 | 2091 | 6.7 | 0.33 | 0.22 | 
| ego-Facebook | 4039 | 88234 | 43.69 | 0.60 | 0.26 | 
| soc-hamsterster | 2000 | 16097 | 16.10 | 0.5375 | 0.2314 | 
| aves-weaver-social | 117 | 304 | 5.20 | 0.6924 | 0.5747 | 
| bio-DM-LC | 483 | 997 | 4.13 | 0.1047 | 0.0551 | 
| bio-WormNet-v3 | 2274 | 78328 | 68.89 | 0.8390 | 0.7211 | 
Notation: : Number of nodes; : Number of edges; : Average node degree; : Average clustering coefficient; : Fraction of closed triangles.
The hyperparameters for CTW, DW, and N2V were set to (walk length), (number of walks per node), and (embedding dimension). Edge embeddings were computed using the Hadamard product of node embeddings. The performance was evaluated using logistic regression and two metrics: AUC (Area Under the Curve) and Average Precision (AP). Each experiment was repeated ten times with different random seeds, and the average results are reported.
E.1 Results and Analysis
Link Prediction.
The results of the link prediction task are presented in Table 10 (AUC) and Table 10 (AP). CTW consistently outperformed baseline methods across all datasets. Notably, CTW demonstrated significant improvements on datasets with strong community structures, such as soc-hamsterster and fb-pages-food. For instance:
- 
• 
On bio-WormNet-v3, CTW achieved an AUC of 0.9915, outperforming the second-best method, M-NMF, by 3.2%. 
- 
• 
On ego-Facebook, CTW achieved an AP of 0.9891, showcasing robust performance in large-scale social networks. 
- 
• 
On datasets with smaller community structures, such as aves-weaver-social, CTW achieved a notable improvement, with AUC gains of up to 5.1% over the best baseline. 
| Dataset | CTW | N2V | DW | LINE | S2V | GS | MN | 
|---|---|---|---|---|---|---|---|
| fb-pages-food | 0.9319 | 0.8764 | 0.8632 | 0.8121 | 0.8573 | 0.8902 | 0.8964 | 
| ego-Facebook | 0.9885 | 0.9452 | 0.9203 | 0.8947 | 0.9345 | 0.9541 | 0.9721 | 
| soc-hamsterster | 0.9122 | 0.8610 | 0.8423 | 0.8045 | 0.8534 | 0.8812 | 0.8734 | 
| aves-weaver-social | 0.9114 | 0.8732 | 0.8517 | 0.8245 | 0.8651 | 0.8832 | 0.8903 | 
| bio-DM-LC | 0.9337 | 0.8810 | 0.8743 | 0.8547 | 0.8723 | 0.8901 | 0.9121 | 
| bio-WormNet-v3 | 0.9915 | 0.9543 | 0.9402 | 0.9134 | 0.9624 | 0.9742 | 0.9603 | 
| Dataset | CTW | N2V | DW | LINE | S2V | GS | MN | 
|---|---|---|---|---|---|---|---|
| fb-pages-food | 0.9421 | 0.8920 | 0.8814 | 0.8289 | 0.8712 | 0.9045 | 0.9102 | 
| ego-Facebook | 0.9891 | 0.9534 | 0.9351 | 0.9107 | 0.9443 | 0.9642 | 0.9813 | 
| soc-hamsterster | 0.9215 | 0.8735 | 0.8602 | 0.8224 | 0.8698 | 0.8914 | 0.8803 | 
| aves-weaver-social | 0.9217 | 0.8801 | 0.8655 | 0.8371 | 0.8764 | 0.8942 | 0.9015 | 
| bio-DM-LC | 0.9440 | 0.8954 | 0.8883 | 0.8651 | 0.8850 | 0.9025 | 0.9234 | 
| bio-WormNet-v3 | 0.9961 | 0.9602 | 0.9456 | 0.9175 | 0.9723 | 0.9815 | 0.9673 | 
Appendix F Theoretical Analysis
We interpret CTWalks from the perspective of matrix factorization. We demonstrate that the two-layer random walk process in CTWalks corresponds to a novel matrix factorization form, integrating intra- and inter-community dynamics through a hierarchical transition mechanism. These analyses highlight CTWalks’ theoretical advantages in encoding community-aware structures, providing insights into its effectiveness in network representation learning.
Analysis 2: Matrix Factorization Perspective of CTWalks
The Skip-Gram with negative sampling (SGNS) framework mikolov2013distributed in network embedding methods such as DeepWalk perozzi2014deepwalk and node2vec grover2016node2vec has been shown to implicitly factorize a pointwise mutual information (PMI) matrix levy2014neural ; Qiu:2018 . CTWalks extends this perspective by incorporating a hierarchical, two-layer random walk that explicitly encodes both intra-community and inter-community transitions, resulting in a novel matrix factorization form.
CTWalks Transition Matrices
In CTWalk, the probability of transitioning between two nodes during the random walk is governed by two distinct components: intra-community transitions (within communities) and inter-community transitions (across communities). To model this, we introduce two matrices: 1) a block-diagonal matrix that represents transitions within individual communities, and 2) an extended matrix that captures transitions between communities via bridging nodes.
Intra-community transition matrix ().
The matrix represents the normalized intra-community transitions, structured as a block-diagonal matrix. Each block corresponds to a community , where:
- 
• 
is the adjacency matrix of the subgraph , induced by the set of nodes and edges within community . 
- 
• 
is the inverse degree matrix of , ensuring row normalization of transition probabilities within each community. 
The resulting block-diagonal structure of is expressed as:
| (44) | 
This formulation confines random walks within individual communities , effectively capturing the dense intra-community relationships while isolating transitions between distinct communities.
Inter-community transition matrix ().
The matrix captures transitions between inter-community bridging nodes. To construct , we first normalize the adjacency matrix of , where is the set of bridging nodes. The normalized matrix is defined as:
| (45) | 
where is the degree matrix of , ensuring that the rows of sum to one. Here, represents the normalized transition probabilities restricted to the bridging nodes.
To integrate with , we align the node order in to match the node ordering used in . Specifically, the rows and columns of are arranged such that:
- 
• 
Nodes within the same community are grouped together, maintaining the block-diagonal structure of , 
- 
• 
Rows and columns corresponding to non-bridging nodes are filled with zeros. 
This alignment ensures that both and are compatible for subsequent matrix operations, such as multi-step transition summation. The resulting can be visualized as follows:
| (46) | 
This design ensures that and remain consistent in terms of node alignment. By isolating inter-community transitions and aligning them to the global node order of , facilitates a seamless integration of intra-community and inter-community dynamics in the CTWalks framework.
Matrix Factorization of CTWalks
Lemma 2: Let and represent the inter-community and intra-community transition matrices, respectively. The embeddings learned by CTWalks using Skip-Gram with negative sampling correspond to a low-rank factorization of the following shifted Pointwise Mutual Information (PMI) matrix:
| (47) | 
where:
- 
• 
= is the volume of graph , 
- 
• 
is the context window size, 
- 
• 
and : The -step transition probability matrices for (inter-community transitions) and (intra-community transitions), respectively, 
- 
• 
is the inverse degree matrix of , 
- 
• 
is the negative sampling parameter. 
Proof. We begin with the Skip-Gram objective, which seeks to maximize the probability of observing a context node given a target node . Under the CTWalks framework, random walks are independently applied to the inter-community transition matrix and the intra-community transition matrix .
Step 1: Joint Probability.
The joint probability of observing a node with context is defined by the stationary distribution and the -step transition probabilities:
| (48) | 
where is the stationary probability of node , given by:
| (49) | 
Step 2: Marginal Probabilities.
The marginal probability of node under the stationary distribution is given by:
| (50) | 
Similarly, the marginal probability of context is:
| (51) | 
Step 3: Pointwise Mutual Information (PMI).
The PMI between target node and context node is defined as:
| (52) | 
Substituting , , and into Eq.(52), we obtain:
| (53) | 
Simplifying:
| (54) | 
Step 4: Matrix Representation.
We can represent the PMI matrix in the following matrix form:
| (55) | 
Step 5: SGNS Objective.
The Skip-Gram with negative sampling introduces a shift , where is the number of negative samples. Therefore, the final form of the factorization becomes:
| (56) | 
Lemma 2 demonstrates that the embeddings learned by CTWalks correspond to the low-rank factorization of the combined -step transition probabilities derived from and . This factorization effectively integrates inter-community and intra-community relationships, capturing both local and global relationships without requiring additional parameters.
Significance of the Factorization.
By combining and in a structured manner, CTWalks enables an explicit encoding of both local community structures and global inter-community connectivity. This theoretical insight justifies the empirical superiority of CTWalks in capturing hierarchical and multi-scale graph properties in tasks such as link prediction and node classification.
Appendix G Detailed Explanation of Node Anonymization
Motivation for Anonymization.
The primary goal of anonymization is to achieve generalization across nodes, enabling the model to effectively operate on unseen nodes or structures during inference. By replacing explicit node identities with position-based representations:
- 
• 
The model avoids overfitting to specific node features. 
- 
• 
Community labels enrich the anonymized representation, embedding mesoscopic graph structures. 
- 
• 
Directionality-aware representations ensure that roles of nodes as sources or targets are preserved. 
Process Overview.
Anonymization in CTWalks operates on temporal walks generated from a given interaction . For each interaction, two sets of walks are constructed:
- 
• 
: Walks originating from the source node . 
- 
• 
: Walks originating from the target node . 
A unique set of nodes is obtained from . Each node is anonymized by aggregating its positional occurrences across all walks in the respective sets, incorporating directionality and community context.
Anonymization Steps.
1. For each node in , perform the following steps:
- 
• 
Position Vector Computation for : - 
– 
For each walk , compute ’s positional occurrence vector: (57) where is the walk length and is an indicator function. 
- 
– 
Aggregate ’s positional vectors across all walks in : (58) 
 
- 
– 
- 
• 
Community Label Integration for : (59) where is the community label of the source node . 
- 
• 
Repeat for : Similarly, compute: (60) where is the community label of the target node . 
- 
• 
Combine Source and Target Representations: Incorporate directionality by concatenating the source and target representations: (61) 
2. Store the final anonymized representation for all .
Anonymized Walk Construction.
After anonymizing individual nodes, the anonymized temporal walk is constructed. For a single walk with timestamps , the anonymized walk is defined as:
| (62) | 
where is the direction- and community-aware anonymized representation of .
Discussion.
This anonymization process achieves the following:
- 
• 
Generalization: By removing raw node identities and replacing them with position-based representations, the process ensures generalization across different nodes and graph structures. 
- 
• 
Directionality: Separate representations for and preserve the source and target node roles, capturing directional interactions. 
- 
• 
Community Awareness: Incorporating community labels and enhances context-awareness, allowing the model to leverage mesoscopic structural information. 
This integration of positional, directional, and community-based information establishes a strong foundation for robust representation learning in CTWalks. The anonymization process is detailed in Algorithm 3.
Input: Temporal interaction , walk sets  and , community labels  and . 
Output: Anonymized representations , where  is the set of unique nodes appearing in  and .
| (63) | 
Appendix H Comparison and Advantages of Continuous-Time Approaches in CTWalks
Continuous-time models have gained significant traction in recent years, offering an alternative to traditional discrete-time approaches for modeling irregularly sampled time series and graph data. Below, we explore different continuous-time methodologies and position CTWalks within this landscape, demonstrating its unique contributions and advantages.
 
Overview of Continuous-Time Approaches
Standard RNNs. Recurrent Neural Networks (RNNs) excel at modeling high-dimensional, regularly sampled time series such as text and speech. However, they struggle with irregular sampling, as they inherently assume fixed time intervals between observations. This limitation necessitates preprocessing steps such as imputation or aggregation, which can lead to loss of critical temporal information lipton2016critical ; che2018recurrent .
RNN-Decay. To address the irregularity issue, RNN-Decay introduces exponentially decaying hidden states between observations. While this approach maintains some temporal dynamics, it assumes a simplistic decay function, which limits its ability to capture complex temporal dependencies mei2017neural .
Neural ODEs. Neural Ordinary Differential Equations (ODEs) chen2018neural define hidden state dynamics as continuous functions governed by ODEs, offering a principled way to model time as a continuous variable. Neural ODEs eliminate the need for fixed time steps, making them well-suited for irregularly sampled data. However, their reliance on purely continuous trajectories can oversimplify scenarios involving abrupt changes or discrete transitions.
ODE-RNNs. ODE-RNNs rubanova2019latent extend Neural ODEs by combining continuous dynamics with discrete updates. Between observations, hidden states evolve via ODE solvers, while at observation points, updates are applied using neural networks. This hybrid approach captures both continuous and discrete aspects of temporal data, making it particularly effective for sparse time series.
Figure 4 illustrates continuous-time approaches introduce enhancements for dynamic graphs. Standard RNNs fail to model irregular intervals effectively, while RNN-Decay offers limited improvements. Neural ODEs provide continuous trajectories but lack discrete adaptability. ODE-RNNs bridge this gap, and CTWalks extends this framework further by embedding community context into temporal dynamics. Building on the strengths of ODE-RNNs, CTWalks incorporates continuous-time dynamics into graph representation learning. Unlike prior methods, CTWalks integrates a community-aware framework with ODE-driven sampling, offering the following advantages:
- 
• 
Preservation of Temporal Dynamics: By employing ODE solvers to model temporal transitions, CTWalks accurately captures both intra- and inter-community dynamics without requiring imputation or aggregation. This aligns with the principles of Neural ODEs but extends their applicability to dynamic graphs. 
- 
• 
Hybrid Continuous-Discrete Framework: Similar to ODE-RNNs, CTWalks combines continuous-time evolution with discrete updates. This enables the model to adapt to abrupt changes in graph structure, such as the formation or dissolution of communities. 
- 
• 
Community-Aware Sampling: The integration of community-aware mechanisms ensures that temporal walks respect community boundaries while capturing cross-community interactions. This is a key distinction from traditional ODE-based methods, which lack structural awareness. 
By embedding continuous-time dynamics into a community-aware framework, CTWalks bridges temporal modeling with structural awareness, enabling more sophisticated applications in dynamic graph representation learning.