
Generalizable Implicit Neural Representation As a Universal Spatiotemporal Traffic Data Learner

Tong Niea,b, Guoyang Qina, Wei Mab,∗ and Jian Suna,∗
(March 31, 2025)

  Keywords: Implicit neural representations, Traffic data learning, Spatiotemporal traffic data, Traffic dynamics, Meta-learning

1 INTRODUCTION

The unpredictable elements involved in a vehicular traffic system, such as human behavior, weather conditions, energy supply, and socioeconomic factors, make transportation a complex, high-dimensional dynamical system. To better understand this system, Spatiotemporal Traffic Data (STTD) is collected to describe its evolution over space and time. This data comes from various sources, including vehicle trajectories, sensor-based time series, and dynamic mobility flows. The primary aim of STTD learning is to develop data-centric models that accurately depict traffic dynamics and can predict complex system behaviors.

Figure 1: Representing spatiotemporal traffic data as implicit neural functions. (a) Traffic data at arbitrary spatiotemporal coordinates can be represented as a continuous function in an implicit space. (b) Coordinate-based MLPs map coordinates to traffic states. (c) With the resolution-independent property, our model can represent various spatiotemporal traffic data.

Despite this complexity, recent advances in STTD learning have found that the system dynamics evolve with dominant patterns that can be captured by low-dimensional structures. Notably, low-rankness is a widely studied pattern, and models based on it assist in reconstructing sparse data, detecting anomalies, revealing patterns, and predicting unknown system states. However, these models have two primary limitations: 1) they often require a grid-based input with fixed spatiotemporal dimensions, preventing them from accommodating varying spatial resolutions or temporal lengths; 2) the low-rank pattern modeling, fitted to one data source, may not generalize to other data sources. For instance, patterns identified in one data type, such as vehicle trajectories, may not be applicable to differently structured data, such as origin-destination (OD) demand. These constraints make current STTD learning dependent on specific data structures and sources. This limits the potential for a unified representation and emphasizes the need for a universally applicable method to link various types of STTD learning.

To address these limitations, we employ a novel technique called implicit neural representations (INRs) to learn the underlying dynamics of STTD. INRs use deep neural networks to discern patterns from continuous inputs (Sitzmann et al., 2020; Tancik et al., 2020). They operate in a continuous space, taking domain coordinates as input and predicting the corresponding quantity at the queried coordinates. INRs learn patterns on implicit manifolds and fit the processes that generate the target data with a functional representation. This differentiates them from low-rank models that depend on explicit patterns, enhances their expressivity, and enables them to learn dynamics implicitly. Consequently, they eliminate the need for fixed data dimensions and can adapt to traffic data of any scale or resolution, allowing us to model various STTD with a unified input. In this work, we build on these advances and tailor INRs to the characteristics of STTD, resulting in a novel method that serves as a universal traffic data learner (refer to Fig. 1).

Our proof of concept has shown promising results through extensive testing on real-world data. The method is versatile, working across scales from corridor-level to network-level applications, and it generalizes to various input dimensions, data domains, output resolutions, and network topologies. This study offers novel perspectives on STTD modeling and provides an extensive analysis of practical applications, contributing to the state of the art. To our knowledge, this is the first time INRs have been applied to STTD learning and have demonstrated effectiveness in a variety of real-world tasks. We anticipate this could form the basis for developing foundational models for STTD.

2 METHODOLOGY

To formalize a universal data learner, we parameterize the representation by an MLP with parameters $\theta$. Concretely, the function representation is expressed as a continuous mapping from the input domain to the traffic state of interest: $\Phi_{\theta}(x,t):\mathcal{X}\times\mathcal{T}\mapsto\mathcal{Y}$, where $\mathcal{X}\subseteq\mathbb{R}^{N}$ is the spatial domain, $\mathcal{T}\subseteq\mathbb{R}^{+}$ is the temporal domain, and $\mathcal{Y}\subseteq\mathbb{R}$ is the output domain. $\Phi_{\theta}$ is a coordinate-based MLP (Fig. 1(b)).
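As a point of reference, a plain coordinate-based MLP realizing this mapping could look as follows; this is a minimal PyTorch sketch in which the width and depth are arbitrary illustration choices, and Section 2.1 replaces the plain layers with frequency-enhanced ones:

```python
import torch
import torch.nn as nn

# Plain coordinate-based MLP: (x, t) -> traffic state, as in Fig. 1(b).
phi_theta = nn.Sequential(
    nn.Linear(2, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),
)

coords = torch.tensor([[0.25, 0.50]])   # one normalized (x, t) query
state = phi_theta(coords)               # predicted traffic state at that coordinate
```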

2.1 Encoding high-frequency components in function representation

High-frequency components can encode complex details of STTD. To alleviate the spectral bias of neural networks towards low-frequency patterns, we adopt two advanced techniques that enable $\Phi_{\theta}$ to learn high-frequency components. Given the spatiotemporal input coordinate $\mathbf{v}=(x,t)\in\mathbb{R}\times\mathbb{R}^{+}$, the frequency-enhanced MLP is formulated as:

$\mathbf{h}^{(1)}=\texttt{ReLU}(\mathbf{W}^{(0)}\gamma(\mathbf{v})+\mathbf{b}^{(0)}),\quad \mathbf{h}^{(\ell+1)}=\sin(\omega_{0}\cdot\mathbf{W}^{(\ell)}\mathbf{h}^{(\ell)}+\mathbf{b}^{(\ell)}),\quad \Phi(\mathbf{v})=\mathbf{W}^{(L)}\mathbf{h}^{(L)}+\mathbf{b}^{(L)},$ (1)

where $\mathbf{W}^{(\ell)}\in\mathbb{R}^{d_{(\ell)}\times d_{(\ell+1)}}$ and $\mathbf{b}^{(\ell)}\in\mathbb{R}^{d_{(\ell+1)}}$ are layerwise parameters, and $\Phi(\mathbf{v})\in\mathbb{R}^{d_{\text{out}}}$ is the predicted value. $\sin(\cdot)$ is the periodic activation function with frequency factor $\omega_{0}$ (Sitzmann et al., 2020). $\gamma(\mathbf{v})$ denotes the concatenated random Fourier features (CRF) (Tancik et al., 2020) with different Fourier basis frequencies $\mathbf{B}_{k}\in\mathbb{R}^{d/2\times c_{\text{in}}}$ sampled from the Gaussian $\mathcal{N}(0,\sigma_{k}^{2})$:

$\gamma(\mathbf{v})=[\sin(2\pi\mathbf{B}_{1}\mathbf{v}),\cos(2\pi\mathbf{B}_{1}\mathbf{v}),\dots,\sin(2\pi\mathbf{B}_{N_{f}}\mathbf{v}),\cos(2\pi\mathbf{B}_{N_{f}}\mathbf{v})]^{\mathsf{T}}\in\mathbb{R}^{dN_{f}}.$ (2)

By setting a large number of frequency features $N_{f}$ and a series of scale parameters $\{\sigma^{2}_{k}\}$, we can sample a variety of frequency patterns in the input domain. The combination of these two strategies achieves high-frequency, low-dimensional regression, empowering the coordinate-based MLPs to learn complex details at high resolution.
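To make the formulation concrete, the following PyTorch sketch illustrates how Eqs. (1) and (2) could be assembled; it is illustrative only, and the layer widths, depth, scale set $\{\sigma_k\}$, and $\omega_0=30$ are assumed values rather than the paper's settings:

```python
import torch
import torch.nn as nn

class FourierFeatures(nn.Module):
    """Concatenated random Fourier features gamma(v) of Eq. (2)."""
    def __init__(self, c_in, d, sigmas):
        super().__init__()
        # One fixed frequency matrix B_k ~ N(0, sigma_k^2) per scale, shape (d/2, c_in).
        for k, s in enumerate(sigmas):
            self.register_buffer(f"B{k}", torch.randn(d // 2, c_in) * s)
        self.num_scales = len(sigmas)

    def forward(self, v):                              # v: (batch, c_in) coordinates
        feats = []
        for k in range(self.num_scales):
            proj = 2 * torch.pi * v @ getattr(self, f"B{k}").T   # (batch, d/2)
            feats += [torch.sin(proj), torch.cos(proj)]
        return torch.cat(feats, dim=-1)                # (batch, d * num_scales)

class FreqEnhancedMLP(nn.Module):
    """Coordinate MLP of Eq. (1): ReLU first layer, sine-activated hidden layers."""
    def __init__(self, c_in=2, c_out=1, d=32, sigmas=(1.0, 10.0),
                 hidden=128, depth=3, omega0=30.0):
        super().__init__()
        self.gamma = FourierFeatures(c_in, d, sigmas)
        self.first = nn.Linear(d * len(sigmas), hidden)
        self.body = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(depth))
        self.last = nn.Linear(hidden, c_out)
        self.omega0 = omega0

    def forward(self, v):
        h = torch.relu(self.first(self.gamma(v)))      # first layer of Eq. (1)
        for layer in self.body:
            h = torch.sin(self.omega0 * layer(h))      # sine layers with frequency factor
        return self.last(h)                            # linear output layer

# Example: query traffic states at 1,000 random (x, t) coordinates.
model = FreqEnhancedMLP()
states = model(torch.rand(1000, 2))                    # shape (1000, 1)
```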

2.2 Factorizing spatial-temporal variability

Using a single $\Phi_{\theta}$ to model entangled spatiotemporal interactions can be challenging. Therefore, we decompose the spatiotemporal process into two dimensions using separation of variables:

$\Phi(\mathbf{v})=\Phi_{x}(v_{x})\Phi_{t}(v_{t})^{\mathsf{T}},\quad \Phi_{x}:\mathcal{X}\mapsto\mathbb{R}^{d_{x}},\ v_{x}\mapsto\Phi_{x}(v_{x}),\quad \Phi_{t}:\mathcal{T}\mapsto\mathbb{R}^{d_{t}},\ v_{t}\mapsto\Phi_{t}(v_{t}),$ (3)

where $\Phi_{x}$ and $\Phi_{t}$ are defined by Eq. (1). Eq. (3) is an implicit counterpart of the matrix factorization model, yet it can process data or functions that exist beyond the regular mesh grid of matrices. To further align the two components, we adopt an intermediate transform matrix $\mathbf{M}_{xt}\in\mathbb{R}^{d_{x}\times d_{t}}$ to model their interactions in the hidden manifold, which yields $\Phi(\mathbf{v})=\Phi_{x}(v_{x})\mathbf{M}_{xt}\Phi_{t}(v_{t})^{\mathsf{T}}$.
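A minimal sketch of this factorized representation, reusing the hypothetical FreqEnhancedMLP from the previous sketch; the feature widths $d_x$, $d_t$ and the initialization of $\mathbf{M}_{xt}$ are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FactorizedINR(nn.Module):
    """Separation of variables: Phi(v) = Phi_x(v_x) M_xt Phi_t(v_t)^T."""
    def __init__(self, d_x=64, d_t=64):
        super().__init__()
        # Spatial and temporal branches are frequency-enhanced MLPs as in Eq. (1),
        # each mapping a scalar coordinate to a feature vector.
        self.phi_x = FreqEnhancedMLP(c_in=1, c_out=d_x)
        self.phi_t = FreqEnhancedMLP(c_in=1, c_out=d_t)
        # Transform matrix aligning the two components in the hidden manifold.
        self.M_xt = nn.Parameter(torch.randn(d_x, d_t) / d_x ** 0.5)

    def forward(self, x, t):                     # x, t: (batch, 1) coordinates
        fx, ft = self.phi_x(x), self.phi_t(t)    # (batch, d_x), (batch, d_t)
        # Bilinear interaction: one scalar traffic state per (x, t) pair.
        return torch.einsum('bi,ij,bj->b', fx, self.M_xt, ft)

# Example: estimate states on a 50 x 60 space-time grid by querying all pairs.
inr = FactorizedINR()
xs = torch.rand(50, 1).repeat_interleave(60, dim=0)   # (3000, 1)
ts = torch.rand(60, 1).repeat(50, 1)                  # (3000, 1)
grid = inr(xs, ts).reshape(50, 60)
```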

2.3 Generalizable representation with meta-learning

Given an STTD instance, we can sample a set of $M$ data pairs $\mathbf{x}=\{(\mathbf{v}_{i},\mathbf{y}_{i})\}_{i=1}^{M}$, where $\mathbf{v}_{i}\in\mathbb{R}^{c_{\text{in}}}$ is the input coordinate and $\mathbf{y}_{i}\in\mathbb{R}^{c_{\text{out}}}$ is the traffic state value. An INR can then be learned by gradient descent on the loss $\min_{\theta}\mathcal{L}(\theta;\mathbf{x})=\frac{1}{M}\sum_{i=1}^{M}\|\mathbf{y}_{i}-\Phi_{\theta}(\mathbf{v}_{i})\|_{2}^{2}$. A single INR, however, encodes only a single data instance: it cannot be generalized to represent other instances and requires per-sample retraining. Given a series of data instances $\mathcal{X}=\{\mathbf{x}^{(n)}\}_{n=1}^{N}$, we therefore assign each instance a latent code $\phi^{(n)}\in\mathbb{R}^{d_{\text{latent}}}$ to account for instance-specific patterns and make $\Phi_{\theta}$ a base network conditioned on the latent code $\phi$ (Dupont et al., 2022). We then apply per-sample modulations to the intermediate INR layers:

$\mathbf{h}^{(\ell+1)}=\sin(\omega_{0}\cdot\mathbf{W}^{(\ell)}\mathbf{h}^{(\ell)}+\mathbf{b}^{(\ell)}+\mathbf{s}^{(n)}),\quad \mathbf{s}^{(n)}=h_{\omega}^{(\ell)}(\phi^{(n)})=\mathbf{W}^{(\ell)}_{s}\phi^{(n)}+\mathbf{b}^{(\ell)}_{s},$ (4)

where $\mathbf{s}^{(n)}\in\mathbb{R}^{d_{(\ell)}}$ is the shift modulation of instance $n$ at layer $\ell$, and $h_{\omega}^{(\ell)}(\cdot\,|\,\omega\in\Theta):\mathbb{R}^{d_{\text{latent}}}\mapsto\mathbb{R}^{d_{(\ell)}}$ is a shared linear hypernetwork layer that maps the latent code to layerwise modulations. The loss function of the generalizable implicit neural representation (GINR) is then given as:

$\min_{\theta,\phi}\mathcal{L}(\theta,\{\phi^{(n)}\}_{n=1}^{N};\mathcal{X})=\mathbb{E}_{\mathbf{x}\sim\mathcal{X}}[\mathcal{L}(\theta,\phi^{(n)};\mathbf{x}^{(n)})]=\frac{1}{NM}\sum_{n=1}^{N}\sum_{i=1}^{M}\|\mathbf{y}^{(n)}_{i}-\Phi_{\theta,h_{\omega}(\phi)}(\mathbf{v}_{i}^{(n)};\phi^{(n)})\|_{2}^{2}.$ (5)

To learn all codes, we adopt a meta-learning strategy to achieve efficient adaptation and stable optimization. Since the conditional modulations $\mathbf{s}$ are functions of $\phi$, and each $\phi$ represents an individual instance, we can obtain these codes implicitly through an auto-decoding mechanism. For instance $n$, this is achieved by an iterative gradient descent process: $\phi^{(n)}\leftarrow\phi^{(n)}-\alpha\nabla_{\phi^{(n)}}\mathcal{L}(\Phi_{\theta,h_{\omega}(\phi)},\{(\mathbf{v}_{i}^{(n)},\mathbf{y}_{i}^{(n)})\}_{i=1}^{M})$, where $\alpha$ is the learning rate, and the update is repeated for several steps. To integrate auto-decoding into the meta-learning procedure, inner-loop and outer-loop iterations alternately update $\phi$ and $\Phi_{\theta}$.
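The inner/outer-loop optimization could be sketched as follows; this is an illustrative second-order meta-training loop, where the `model(coords, phi)` interface running the modulated forward pass of Eq. (4), the zero initialization of $\phi$, and all step sizes are assumptions:

```python
import torch

def meta_train(model, latent_dim, instances, inner_steps=3, alpha=1e-2,
               outer_lr=1e-4, epochs=100):
    """Sketch of meta-learning with auto-decoded latent codes.

    `model(coords, phi)` is assumed to apply the modulations of Eq. (4);
    `instances` is a list of (coords, values) tensor pairs, one per STTD instance.
    """
    outer_opt = torch.optim.Adam(model.parameters(), lr=outer_lr)
    for _ in range(epochs):
        for coords, values in instances:
            # Inner loop: adapt the instance-specific latent code by gradient descent.
            phi = torch.zeros(latent_dim, requires_grad=True)
            for _ in range(inner_steps):
                loss = ((model(coords, phi) - values) ** 2).mean()
                (grad,) = torch.autograd.grad(loss, phi, create_graph=True)
                phi = phi - alpha * grad
            # Outer loop: update the shared base network and hypernetwork parameters.
            outer_loss = ((model(coords, phi) - values) ** 2).mean()
            outer_opt.zero_grad()
            outer_loss.backward()
            outer_opt.step()
```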

3 RESULTS

We conduct extensive experiments on real-world STTD covering scales from corridor to network, specifically including: (a) Corridor-level application: Highway traffic state estimation; (b-c) Grid-level application: Urban mesh-based flow estimation; and (d-f) Network-level application: Highway and urban network state estimation. We compare our model with SOTA low-rank models and evaluate its generalizability in different scenarios, such as different input domains, multiple resolutions, and distinct topologies. We also find that the encoding of high-frequency components is crucial for learning complex patterns (g-h). Fig. 2 briefly summarizes our results.

Figure 2: Experiments on multiscale STTD. Full results can be found in (Nie et al., 2024).

4 SUMMARY

We have developed a new method for learning spatiotemporal traffic data (STTD) using implicit neural representations. This involves parameterizing STTD as deep neural networks, with INRs trained to map coordinates directly to traffic states. The versatility of this representation allows it to model various STTD types, including vehicle trajectories, origin-destination flows, grid flows, highway networks, and urban networks. Thanks to the meta-learning paradigm, this approach can be generalized to a range of data instances. Experimental results from various real-world benchmarks show that our model consistently surpasses conventional low-rank models. It also demonstrates potential for generalization across different data structures and problem contexts.

References

  • Dupont et al. (2022) Dupont, Emilien, Kim, Hyunjik, Eslami, SM, Rezende, Danilo, & Rosenbaum, Dan. 2022. From data to functa: Your data point is a function and you can treat it like one. arXiv preprint arXiv:2201.12204.
  • Nie et al. (2024) Nie, Tong, Qin, Guoyang, Ma, Wei, & Sun, Jian. 2024. Spatiotemporal Implicit Neural Representation as a Generalized Traffic Data Learner. arXiv preprint arXiv:2405.03185.
  • Sitzmann et al. (2020) Sitzmann, Vincent, Martel, Julien, Bergman, Alexander, Lindell, David, & Wetzstein, Gordon. 2020. Implicit neural representations with periodic activation functions. Advances in Neural Information Processing Systems, 33, 7462–7473.
  • Tancik et al. (2020) Tancik, Matthew, Srinivasan, Pratul, Mildenhall, Ben, Fridovich-Keil, Sara, Raghavan, Nithin, Singhal, Utkarsh, Ramamoorthi, Ravi, Barron, Jonathan, & Ng, Ren. 2020. Fourier features let networks learn high frequency functions in low dimensional domains. Advances in Neural Information Processing Systems, 33, 7537–7547.