Newell’s theory based feature transformations for spatio-temporal traffic prediction

Agnimitra Sengupta¹, S. Ilgin Guler²

Abstract

Deep learning (DL) models for spatio-temporal traffic flow forecasting employ convolutional or graph-convolutional filters along with recurrent neural networks to capture spatial and temporal dependencies in traffic data. These models, such as CNN-LSTM, utilize traffic flows from neighboring detector stations to predict flows at a specific location of interest. However, these models are limited in their ability to capture the broader dynamics of the traffic system, as they primarily learn features specific to the detector configuration and traffic characteristics at the target location. Hence, the transferability of these models to different locations becomes challenging, particularly when data is unavailable at the new location for model training. To address this limitation, we propose a traffic flow physics-based feature transformation for spatio-temporal DL models. This transformation incorporates Newell’s uncongested and congested-state estimators of traffic flows at the target locations, enabling the models to learn broader dynamics of the system. Our methodology is empirically validated using traffic data from two different locations. The results demonstrate that the proposed feature transformation improves the models’ performance in predicting traffic flows over different prediction horizons, as indicated by better goodness-of-fit statistics. An important advantage of our framework is its ability to be transferred to new locations where data is unavailable. This is achieved by appropriately accounting for spatial dependencies based on station distances and various traffic parameters. In contrast, regular DL models are not easily transferable as their inputs remain fixed. It should be noted that due to data limitations, we were unable to perform spatial sensitivity analysis, which calls for further research using simulated data.

Introduction

Short-term traffic forecasts play a crucial role in supporting operational network models and facilitating Intelligent Transportation Systems (ITS) applications. Researchers have long recognized the importance of accurately predicting future traffic conditions for proactive traffic management (Cheslow, Hatcher, and Patel 1992) and comprehensive traveler information services (Kaysi, Ben-Akiva, and Koutsopoulos 1993). These forecasts enable ITS applications to provide drivers with precise and timely information, allowing them to make informed decisions. This, in turn, supports effective congestion mitigation, reduced travel times, and an improved overall travel experience. The success of these strategies relies heavily on the quality and accuracy of the traffic forecasts, which are obtained through sophisticated time-series forecasting models.

The prediction of traffic conditions, including flow, occupancy, and travel speed, is essentially a time-series forecasting problem that relies on input data from a sufficient number of spatially distributed sensors throughout the network. Therefore, in addition to considering temporal dependencies, it is crucial to incorporate the spatial correlations among various sensor data within the network to effectively capture the intricate dynamics of traffic flow.

Various parametric techniques have been used to model temporal dependence in traffic time series, including historical average algorithms, smoothing techniques, and autoregressive integrated moving average (ARIMA) models (Ahmed and Cook 1979; Levin and Tsao 1980). However, studies have indicated that ARIMA models have limitations in capturing complex traffic patterns, especially during congested conditions (Davis and Nihan 1991; Hamed, Al-Masaeid, and Said 1995). Additionally, the development of multi-variable prediction models incorporating flow, speed, and occupancy for traffic forecasting has yielded mixed results (Innamaa 2000; Dougherty and Cobbett 1997; Florio and Mussone 1996; Lyons et al. 1996). These findings highlight the need for more advanced and effective modeling approaches in traffic prediction.

Alternatively, state-space models have been shown to be an excellent foundation for modeling traffic data, since they are multivariate in nature and can also describe simpler univariate time series (Okutani and Stephanedes 1984; Chen and Chien 2001; Chien and Kuchipudi 2003; Whittaker, Garside, and Lindveld 1997). Following advances in computational efficiency and in the capacity to handle large quantities of data, non-parametric techniques, which, unlike parametric methods, do not assume any functional form, were used to model traffic data with greater transferability and robustness across datasets (Smith and Demetsky 1997; Clark 2003). These approaches have more degrees of freedom than parametric methods, allowing them to better adapt to non-linearities and capture spatio-temporal features of traffic. For small-scale traffic scenarios, methods such as nearest neighbors (Smith, Williams, and Oswald 2002), support vector machines (Mingheng et al. 2013), and Bayesian networks (Sun, Zhang, and Yu 2006) have proven beneficial. However, these methods are limited in their ability to extract effective features for large-scale traffic analysis (Lin et al. 2019).

In contrast, neural networks (NN) and deep learning (DL) models use multi-layer architectures to capture complex relations (LeCun, Bengio, and Hinton 2015) in traffic data. The use of NNs to capture temporal patterns in traffic time series has been demonstrated in several studies (Chang and Su 1995; Innamaa 2000; Dia 2001). Against this backdrop, numerous NN variants were developed, such as backpropagation neural networks (Park and Rilett 1999), modular neural networks (Park and Rilett 1998), and radial basis function neural networks (Park, Messer, and Urbanik 1998). However, for sequential data such as traffic time series, recurrent neural networks (RNN) (Van Lint, Hoogendoorn, and van Zuylen 2002; Rumelhart, Hinton, and Williams 1986) and their variants, such as long short-term memory (LSTM) (Hochreiter and Schmidhuber 1997), have been specifically introduced to preserve temporal correlations between observations when predicting future states. For example, Ma et al. (2015) used an LSTM architecture to make short-term travel speed predictions. Yu et al. (2017) applied LSTM and autoencoders to capture sequential dependencies when predicting traffic under extreme conditions, particularly peak-hour and post-accident scenarios. Cui et al. (2020) proposed a deep stacked bidirectional and unidirectional LSTM network for traffic speed prediction.

To account for the spatial aspects of traffic, data from multiple locations across the network need to be used in predicting future traffic states. For instance, Stathopoulos and Karlaftis (2003) incorporated data from upstream detectors to improve predictions at downstream locations using a multivariate time-series state-space model. DL models such as deep belief networks (Huang et al. 2014) and stacked autoencoders (Lv et al. 2014) have also been considered for multi-sensor information fusion; however, because of the underlying structure of the input data, these models could not efficiently capture the spatial relationships. To this end, the convolutional neural network (CNN) (Krizhevsky, Sutskever, and Hinton 2012) has emerged as one of the most successful deep NNs for modeling topological locality or spatial correlation, using filters or kernels to extract local features. For example, Zhang, Zheng, and Qi (2017) developed ST-ResNet, a DL model that uses convolutions for city-wide crowd flow prediction. However, CNNs are most successful when the data have an underlying Euclidean structure. For generalized, non-Euclidean data, graph convolutional neural networks (GCNN) (Henaff, Bruna, and LeCun 2015) have been used for traffic network modeling and prediction tasks (Cui et al. 2019; Mallick et al. 2021). However, while these studies explicitly model temporal or spatial dependency, they do not account for the interaction between the two. Recent studies proposed integrating CNN and LSTM (Yao et al. 2019, 2018) to jointly model spatial and temporal patterns in traffic. Several other hybrid models, such as Sparse Autoencoder LSTM (Lin et al. 2019), DeepTransport (Cheng et al. 2018), which combines CNN and RNN with an attention mechanism, and ST-TrafficNet (Lu et al. 2020), have recently been shown to efficiently capture spatio-temporal features of traffic data.

In modelling traffic data, DL algorithms have shown considerable potential. However, the success of data-driven models depends on data quality and quantity, and limited data availability is a significant obstacle to model development for many freeway corridors. Transfer learning (TL) is a promising way to mitigate data scarcity as well as training and deployment issues: a model learned for one task is reused and/or adapted for a similar task. TL is commonly employed for image classification, sentiment analysis, and document classification (Pan and Yang 2009; Zhuang et al. 2020), but it has received less attention in the traffic forecasting domain (Mallick et al. 2021). Further, physics-informed models may uncover additional dynamics of the system that might not be observable from limited data. Physics-informed deep learning (PIDL) models usually comprise a model-driven component (a physics-informed NN for regularization) and a data-driven component (a NN for estimation), integrating the advantages of both. Using this framework, recent studies (Huang and Agarwal 2020; Shi et al. 2021; Thodi et al. 2022) have demonstrated the superiority of PIDL in traffic state estimation over purely data-driven or purely traffic flow physics-based approaches. This paper presents another approach to incorporating traffic flow physics into a DL model: a physics-based feature enhancement for predicting traffic flow along a freeway corridor. A key advantage of our approach is that it can be transferred to new locations where data is not available.

The remainder of the paper is organised as follows: first, we present the conceptual background of Newell’s solution to Lighthill-Whitham-Richards (LWR) model and the DL model, followed by the physics-based feature enhancement to the model. Next, the problem setup is described, including the modelling results. Finally, some concluding remarks are presented.

Background

In this section, we provide a brief introduction to Newell’s simplified solution to the Lighthill-Whitham-Richards (LWR) model - which forms the basis of the physics-based feature transformation, followed by an overview of the DL model considered.

Newell’s simplified theory estimations

Modeling traffic state evolution on a roadway with the LWR continuum model requires identifying ‘shockwaves’ and their interactions on the time-space diagram. A simplified approach, Newell’s solution (Newell 1993), can instead be used to estimate traffic dynamics at a particular location in terms of cumulative vehicle counts, i.e., the cumulative number of vehicles observed at that location. Using vehicle counts at a detector location, this method estimates cumulative vehicle counts at another location along a homogeneous freeway that is assumed to exhibit a triangular fundamental diagram. Cumulative vehicle counts at detector stations can be translated using fundamental traffic parameters (defined in Table 1) to give an estimate of cumulative counts at the target site that is consistent with traffic flow theory. The basic idea is to determine the change in vehicle counts along the free-flow characteristic from upstream and along the congested characteristic from downstream. This approach therefore assumes that the upstream station is in a free-flow state, whereas the downstream station is congested. Newell’s simplified theory states that the actual state at the target site is given by the minimum of the estimates obtained along the free-flow characteristic and along the congested characteristic. The relevant components of Newell’s solution are summarized below.

Table 1: Macroscopic fundamental variables

Symbol	Variable
$v_{f}$	Free-flow speed
$w$	Congested wave speed
$N_{i}$	Cumulative vehicle count at station $i$
$q$	Flow
$q_{c}$	Flow at capacity
$k$	Density
$k_{c}$	Density at capacity (critical density)
$k_{j}$	Jam density
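The parameters in Table 1 are linked through the triangular fundamental diagram assumed by Newell’s solution. As a minimal sketch, assuming a free-flow branch $q=v_{f}k$ and a congested branch $q=w(k_{j}-k)$, the capacity point is where the two branches meet. The function names and numeric values below are hypothetical and are not estimates from this study.

```python
def triangular_flow(k, v_f, w, k_j):
    """Flow q(k) on a triangular fundamental diagram.

    Free-flow branch: q = v_f * k; congested branch: q = w * (k_j - k).
    The flow at density k is the lower of the two branches.
    """
    if not 0.0 <= k <= k_j:
        raise ValueError("density must lie in [0, k_j]")
    return min(v_f * k, w * (k_j - k))


def capacity_point(v_f, w, k_j):
    """Critical density k_c and capacity q_c where the two branches intersect."""
    # v_f * k_c = w * (k_j - k_c)  =>  k_c = w * k_j / (v_f + w)
    k_c = w * k_j / (v_f + w)
    q_c = v_f * k_c
    return k_c, q_c


# Hypothetical parameters: speeds in mi/h, densities in veh/mi.
v_f, w, k_j = 65.0, 13.0, 180.0
k_c, q_c = capacity_point(v_f, w, k_j)
print(f"k_c = {k_c:.1f} veh/mi, q_c = {q_c:.0f} veh/h")
```

With these illustrative values the branches meet at $k_{c}=30$ veh/mi and $q_{c}=1950$ veh/h.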

Along the free-flow characteristic

Assume a point $A$ in a free-flow state (F) with cumulative vehicle count $N_{A}$. Consider a target location $B$ downstream of location $A$. As we move from $A$ to $B$, which lies on the characteristic that emanates from $A$, the change in vehicle counts is $\Delta N_{A\rightarrow B}$:

\begin{split}\Delta N_{A\rightarrow B}&=\dfrac{\partial N}{\partial x}\Delta x+\dfrac{\partial N}{\partial t}\Delta t\\ \dfrac{\Delta N_{A\rightarrow B}}{\Delta x}&=\dfrac{\partial N}{\partial x}+\dfrac{\partial N}{\partial t}\dfrac{\Delta t}{\Delta x}\\ &=-k+\dfrac{q}{v_{f}}\quad(\text{since}\ \Delta t/\Delta x=1/v_{f})\\ \Delta N_{A\rightarrow B}&=0\quad(\text{since}\ q=kv_{f}\ \text{in free flow})\end{split}

(1)

From Equation 1, the change in cumulative vehicle count along the interface, or signal, travelling at the free-flow speed is zero. The signal, however, takes time $d_{A\rightarrow B}/v_{f}$ to travel from $A$ to $B$, where $d_{A\rightarrow B}$ is the distance between locations $A$ and $B$. Therefore, the counts at $B$ as a function of counts at $A$ are given by:

N_{f}(t,B)=N\left(t-\dfrac{d_{A\rightarrow B}}{v_{f}},A\right)

(2)

Conversely, if location $B$ is upstream of location $A$ and the free-flow regime persists, Equation 2 becomes

N_{f}(t,B)=N\left(t+\dfrac{d_{A\rightarrow B}}{v_{f}},A\right)

(3)
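Equations 2 and 3 amount to shifting the source’s cumulative count curve in time by $d_{A\rightarrow B}/v_{f}$. A minimal sketch, assuming cumulative counts sampled on a regular time grid in hours and linear interpolation for shifts that fall between samples (the text does not specify how fractional shifts are handled); `newell_free_flow` is a hypothetical helper name.

```python
import numpy as np


def newell_free_flow(t, N_source, d, v_f, target_downstream=True):
    """Free-flow estimate of cumulative counts at a target location.

    t        : sample times in hours (increasing)
    N_source : cumulative counts observed at the source station at times t
    d        : distance between source and target in miles
    v_f      : free-flow speed in mi/h
    target_downstream=True  -> Equation 2: N_f(t, B) = N(t - d/v_f, A)
    target_downstream=False -> Equation 3: N_f(t, B) = N(t + d/v_f, A)
    """
    shift = d / v_f if target_downstream else -d / v_f
    # Evaluate the source curve at the shifted times; np.interp holds the
    # first/last observed value outside the sampled range.
    return np.interp(t - shift, t, N_source)


# Hypothetical example: 5-minute samples, constant 1200 veh/h at station A.
t = np.arange(0, 13) / 12.0                          # 0 to 1 h in 5-min steps
N_A = 1200.0 * t                                     # cumulative counts at A
N_B = newell_free_flow(t, N_A, d=5.0, v_f=60.0)      # 5 mi at 60 mi/h = 5-min lag
```

Here the 5-minute travel time equals one sampling step, so the estimate at $B$ is simply the count curve at $A$ delayed by one sample.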

Along the congested flow characteristic

Consider a target location $Y$ upstream of location $X$, which has a known congested state (C) with cumulative vehicle count $N_{X}$. As we move from $X$ to $Y$, which lies on the characteristic that emanates from $X$, the change in vehicle counts is $\Delta N_{X\rightarrow Y}$:

\begin{split}\Delta N_{X\rightarrow Y}&=\dfrac{\partial N}{\partial x}\Delta x+\dfrac{\partial N}{\partial t}\Delta t\\ \dfrac{\Delta N_{X\rightarrow Y}}{\Delta x}&=\dfrac{\partial N}{\partial x}+\dfrac{\partial N}{\partial t}\dfrac{\Delta t}{\Delta x}\\ &=-k-\dfrac{q}{w}\quad(\text{since}\ \Delta t/\Delta x=-1/w)\\ &=-k_{j}\quad(\text{since}\ q=w(k_{j}-k)\ \text{in congestion})\\ \Delta N_{X\rightarrow Y}&=-\Delta x\cdot k_{j}\\ &=d_{X\rightarrow Y}\,k_{j}\quad(\text{since}\ \Delta x=-d_{X\rightarrow Y})\end{split}

(4)

Therefore, in congestion there is a finite change in the vehicle count, equal to $d_{X\rightarrow Y}k_{j}$, which occurs over time $d_{X\rightarrow Y}/w$. Thus, the counts at $Y$ as a function of counts at $X$ are given by:

N_{c}(t,Y)=N\left(t-\dfrac{d_{X\rightarrow Y}}{w},X\right)+d_{X\rightarrow Y}k_{j}

(5)

The free-flow and congested estimators of cumulative counts at a target location using cumulative counts from another location along the homogeneous roadway are represented by Equations 2, 3 and 5, which we suitably utilize in designing the physics-based model.
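Equation 5 and the minimum rule stated at the start of this section can be combined into a single estimator. This is a sketch under the same assumptions as before (regular time grid in hours, linear interpolation, hypothetical helper names); it additionally assumes that the upstream and downstream count curves share a common vehicle numbering, which raw detector counts do not guarantee.

```python
import numpy as np


def shifted(t, N, lag):
    """Evaluate a cumulative count curve N(t) at times t - lag (linear interpolation)."""
    return np.interp(t - lag, t, N)


def newell_estimate(t, N_up, d_up, N_down, d_down, v_f, w, k_j):
    """Newell's lower-envelope estimate at a location between two stations.

    N_up   : cumulative counts at a station d_up miles upstream (free-flow side)
    N_down : cumulative counts at a station d_down miles downstream (congested side)
    Both curves are assumed to share a common vehicle numbering.
    """
    N_free = shifted(t, N_up, d_up / v_f)                      # Equation 2
    N_cong = shifted(t, N_down, d_down / w) + d_down * k_j      # Equation 5
    return np.minimum(N_free, N_cong), N_free, N_cong


# Hypothetical inputs: 2 h of 5-minute samples, t in hours, distances in miles.
t = np.arange(0, 25) / 12.0
N_up = 1500.0 * t
N_down = 1500.0 * t - 40.0
est, N_free, N_cong = newell_estimate(t, N_up, 0.5, N_down, 0.5,
                                      v_f=60.0, w=14.0, k_j=180.0)
```

Taking the pointwise minimum means that, at each time, whichever characteristic imposes the tighter bound on the cumulative count determines the estimated state.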

Spatio-temporal model

In this study, we employ a combination of convolutional neural network (CNN) (Krizhevsky, Sutskever, and Hinton 2012) and long short-term memory (LSTM) (Hochreiter and Schmidhuber 1997), referred to as CNN-LSTM, as the baseline model for jointly capturing the spatio-temporal aspects of traffic states.

CNN and LSTM extract information from the input data from two different perspectives: the CNN learns time-invariant spatial characteristics, while the LSTM learns short- and long-term temporal patterns. Specifically, the convolutional layer learns an internal representation of a two- (or higher-) dimensional input through a feature learning process using kernel functions or filters. Notably, the convolution allows the model to learn features that are invariant across the time dimension. These features are then passed through an activation function to introduce non-linearities into the mapping. Other layers, such as pooling layers, are also commonly used to reduce the number of parameters, while dropout layers help prevent overfitting. The resulting feature representations allow the LSTM layers to learn temporal patterns more accurately. The LSTM uses a feedback loop, in which the output from the previous time step is fed as input to the current step, together with several gating mechanisms that let selected information from the past propagate into, and persist in, future states. This makes it suitable for capturing the temporal evolution of traffic states.

A schematic of the model architecture considered in the study is shown in Figure 1. The architecture comprises a convolutional layer that processes the spatio-temporal input to extract relevant features using filters. The input to the model is an $N\times n$ array constructed using data from $N$ fixed-location stations, with each row representing flows from a particular station over a time window of length $n$. Pooling layers were not used, since the spatial dimension of traffic data is limited. After the convolutional layer, a flattening layer reshapes the outputs appropriately. The resulting features are then fed into an LSTM, whose outputs are passed through a sequence of densely connected layers to produce the prediction.
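The $N\times n$ input described above can be assembled with a sliding window over a station-by-time flow matrix. A minimal NumPy sketch, assuming flows stored as an array of shape (N stations, T time steps) at 5-minute resolution and a target that is the flow at the predicted station `horizon` steps after the window ends; `make_windows` is a hypothetical name. A window of $n=10$ steps corresponds to 50 minutes at this resolution.

```python
import numpy as np


def make_windows(flows, target, n, horizon=1):
    """Build (N x n) input windows and matching targets.

    flows   : array (N, T) of flows from N input stations
    target  : array (T,) of flows at the station being predicted
    n       : number of past time steps per window
    horizon : steps ahead to predict (1 step = 5 minutes)
    Returns X with shape (samples, N, n) and y with shape (samples,).
    """
    _, T = flows.shape
    samples = T - n - horizon + 1
    if samples <= 0:
        raise ValueError("series too short for the requested window and horizon")
    # Each window holds steps s .. s+n-1 for every station (one row per station).
    X = np.stack([flows[:, s:s + n] for s in range(samples)])
    # The matching target is `horizon` steps after the last step of the window.
    y = target[n + horizon - 1 : n + horizon - 1 + samples]
    return X, y


# Hypothetical data: 2 stations, 30 five-minute intervals, n = 10 (50 minutes).
flows = np.arange(60, dtype=float).reshape(2, 30)
target = np.arange(30, dtype=float) + 100.0
X, y = make_windows(flows, target, n=10)
print(X.shape, y.shape)
```

Setting `horizon` above 1 yields the multi-step targets (for example, 5 steps for 25 minutes) without changing the window construction.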

Figure 1: Architecture of the CNN-LSTM model

Methodology

Feature transformation

DL models learn spatial and temporal characteristics specific to the detector configuration on which they have been trained, using traffic flows from neighboring locations. In other words, a model is trained to map temporal patterns from a given spatial configuration to its target location. However, this mapping is specific to the exact locations of the detectors whose data are used (e.g., the distances between input and target detectors). Such models are therefore difficult to adapt to other configurations, i.e., they are not readily transferable. To improve the transferability of the models, we draw on traffic flow theory and propose a physics-based transformation of the model inputs. We explicitly assume that the data available at the transfer location are too limited to train a new model.

In the proposed approach, instead of using the raw flow data from the input detectors, the inputs are modified using Newell’s transformations. The specific modification depends on whether the target location is upstream or downstream of the location providing the input data. The model is therefore expected to learn generalized features that are independent of the distances between detectors, making it suitable for transfer learning.

The physics-based modification in this study includes three approaches: ‘Physics FC’, ‘Physics FF’ and ‘Hybrid’. In all three approaches, free-flow conditions are assumed to prevail at an upstream station (e.g., location $A$), and the cumulative counts at the target or transfer station ($X$) are estimated from the cumulative counts at station $A$ using Equation 2. The three approaches differ in how they treat input from a downstream station $B$. The ‘Physics FC’ approach assumes that congestion prevails downstream of the target or transfer station and therefore applies Equation 5 to the cumulative counts at station $B$ to estimate the cumulative counts at $X$. By contrast, the ‘Physics FF’ approach assumes that free flow prevails at the downstream station and therefore applies Equation 3 to the cumulative counts at station $B$ to estimate the cumulative counts at $X$. The ‘Hybrid’ approach makes no explicit assumption about the prevailing conditions (free flow or congestion) downstream of the target or transfer location. Instead, it uses two inputs for $X$ derived from station $B$: one calculated using Equation 3 and the other using Equation 5. This approach uses both pieces of information to estimate the cumulative counts at the target or transfer station, combining aspects of free-flow and congested conditions. Once the cumulative counts are computed for each case, the corresponding flows are obtained by differencing consecutive observations of the cumulative counts.
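The three variants can be summarized in code. The sketch below is illustrative, not the authors’ implementation: it assumes a 5-minute grid, linear interpolation for fractional lags, and cumulative counts obtained by summing flows from zero, and it builds the flow features for a target $X$ between an upstream station $A$ and a downstream station $B$. All function and argument names are hypothetical.

```python
import numpy as np


def to_cumulative(flow):
    """Cumulative vehicle counts from per-interval flows (counts start at 0)."""
    return np.concatenate([[0.0], np.cumsum(flow)])


def shift_curve(N, lag_steps):
    """Evaluate N at (index - lag_steps); a negative lag looks ahead in time."""
    idx = np.arange(len(N), dtype=float)
    return np.interp(idx - lag_steps, idx, N)


def physics_features(q_A, q_B, d_AX, d_XB, v_f, w, k_j, mode, dt=5 / 60):
    """Newell-transformed flow features at target X (A upstream, B downstream).

    mode: 'FF' (free flow from A and B), 'FC' (free flow from A, congestion
    from B) or 'Hybrid' (free flow from A plus both estimates from B).
    Distances in miles, v_f and w in mi/h, dt is the sampling interval in hours.
    """
    N_A, N_B = to_cumulative(q_A), to_cumulative(q_B)
    est = [shift_curve(N_A, d_AX / v_f / dt)]               # Equation 2 from A
    ff_B = shift_curve(N_B, -d_XB / v_f / dt)                # Equation 3 from B
    fc_B = shift_curve(N_B, d_XB / w / dt) + d_XB * k_j      # Equation 5 from B
    if mode == "FF":
        est.append(ff_B)
    elif mode == "FC":
        est.append(fc_B)
    elif mode == "Hybrid":
        est += [ff_B, fc_B]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Differencing consecutive cumulative counts recovers flows per interval.
    return np.stack([np.diff(N) for N in est])


# Hypothetical example: constant 100 veh per 5-min interval at both stations.
q = np.full(24, 100.0)
F = physics_features(q, q, d_AX=0.5, d_XB=0.5, v_f=60.0, w=14.0, k_j=180.0,
                     mode="Hybrid")
print(F.shape)
```

One consequence worth noting: because the features are differenced, the constant $d_{X\rightarrow Y}k_{j}$ term from Equation 5 cancels in this sketch, so only the lags $d/v_{f}$ and $d/w$ shape the flow inputs.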

It is important to note that accurate estimation of fundamental traffic parameters such as free-flow speed ( $v_{f}$ ), congestion wave speed ( $w$ ), and jam density ( $k_{j}$ ) is crucial for the proper functioning of these transformations. The estimation of these parameters will be discussed in the following section.

Estimation of traffic parameters using detector data

In our methodology, we demonstrate the estimation of traffic parameters using data from multiple loop detectors. For a stretch of freeway, we combine data from multiple detectors to obtain mean estimates of these parameters. This aggregation of detector data assumes homogeneity within the freeway section, implying that the traffic parameters do not significantly vary within the section. This assumption aligns with the assumptions made in Newell’s solutions. However, it is important to note that for a larger number of detectors with wider spatial coverage, this assumption may not hold true.

The data include measurements of flow ($q$), occupancy ($o$), and speed ($v$) recorded at 5-minute intervals at each station. We calculate the corresponding densities ($k$) from the fundamental relation of traffic, $q=k\times v$, and assume a triangular fundamental diagram. To determine the capacity ($q_{c}$) and free-flow speed ($v_{f}$), we use the $95^{th}$ percentile values of flow and speed, respectively, because the maximum flow and speed values do not represent a stable state and persist only for short periods. The critical density ($k_{c}$) is then evaluated from the fundamental relation as $k_{c}=q_{c}/v_{f}$. These parameters are used throughout the subsequent analysis. Figure 2 illustrates the flow versus density relationship for four stations along the SR04 California highway.

The freeway operates in free-flow conditions for most of the day, with congested states appearing only during peak hours. Congested states ($k>k_{c}$) are therefore sparse in the flow-density plot, which prevents direct estimation of the congested branch of the fundamental diagram, including the congestion wave speed ($w$) and jam density ($k_{j}$). Hence, we assume a fixed value of $w=14$ for our analysis.
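The estimation steps above can be sketched as follows, assuming pooled observations of flow (as hourly rates) and strictly positive speeds from all stations in the section. The jam density line is an assumption not stated in the text: it closes the triangular diagram with the fixed $w$ through $q_{c}=w(k_{j}-k_{c})$, i.e., $k_{j}=k_{c}+q_{c}/w$. The function name and example values are hypothetical.

```python
import numpy as np


def estimate_parameters(flow, speed, w=14.0, pct=95.0):
    """Estimate triangular fundamental-diagram parameters from detector data.

    flow  : flows in veh/h, pooled over the stations in the section
    speed : speeds in mi/h (assumed strictly positive), same shape as flow
    w     : congested wave speed, fixed as in the text (units assumed mi/h)
    """
    flow = np.asarray(flow, dtype=float)
    speed = np.asarray(speed, dtype=float)
    density = flow / speed                  # q = k v  =>  k = q / v
    q_c = np.percentile(flow, pct)          # capacity: 95th percentile of flow
    v_f = np.percentile(speed, pct)         # free-flow speed: 95th pct of speed
    k_c = q_c / v_f                         # critical density
    k_j = k_c + q_c / w                     # assumption: triangular closure with w
    return {"density": density, "q_c": q_c, "v_f": v_f, "k_c": k_c, "k_j": k_j}


# Hypothetical pooled observations.
p = estimate_parameters([1000.0, 1500.0, 1800.0, 2000.0], [65.0, 64.0, 60.0, 55.0])
```

NumPy’s default linear interpolation between order statistics is used for the percentiles; other percentile definitions would shift the estimates slightly on small samples.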

Data

The traffic data used in this study for model training and evaluation are obtained from the California Department of Transportation’s Performance Measurement System (PeMS), which is widely used in similar research (Lv et al. 2014; Huang et al. 2014; Mallick et al. 2021). PeMS collects real-time traffic data from sensors along freeway mainlines and ramps at 30-second intervals, which are aggregated to 5-minute intervals. The dataset used for model training and evaluation consists of flow, occupancy, and speed collected from a series of vehicle detection sensors (VDS).

For this work, we use two datasets: (1) Dataset 1, California SR04 Delta Highway, collected over 56 consecutive days from July 1, 2021 to August 25, 2021, including weekdays and weekends; and (2) Dataset 2, California Interstate-05 NB in District 11, collected over one year in 2019. In Dataset 1, the distances between consecutive stations are 0.5 mi, 0.3 mi, and 0.5 mi; in Dataset 2, they are 1.6 mi, 0.6 mi, and 0.9 mi. Together, these datasets indicate how the model scales spatially with different distances between monitoring stations along the freeway. Figures 3 and 4 depict the station configurations, showing the locations of and distances between monitoring stations for Dataset 1 (California SR04 Delta Highway) and Dataset 2 (California Interstate-05 NB in District 11), respectively. Figure 5 displays the temporal patterns of flows at various stations in both datasets.

It is important to note that on- and off-ramp traffic volumes were not always available. Hence, the vehicle inflows and outflows associated with on-ramps and off-ramps could not be adjusted to maintain vehicle conservation along the freeway. As a result, the analysis and modeling in this study focus solely on the traffic data collected from the detection sensors along the main freeway lanes, without accounting for the ramp movements.

Results and discussions

The goal is to predict traffic flow 5 minutes ahead at a station where data is not available, using input data from neighboring detectors over the last 50 minutes for Dataset 1 and 100 minutes for Dataset 2. A sensitivity analysis of performance for longer prediction horizons (from 5 to 25 minutes) is discussed later.

Prediction scenario

To evaluate and compare the performance of the models, several scenarios are considered based on the relative positions of the target and transfer stations. This relative positioning determines which of Newell’s transformations is applied at the target and transfer locations. Table 2 lists these scenarios, labeled A1, A2, B1, B2, C1, C2, D1, and D2; in each, the model’s performance is assessed at both the target location (where it was trained) and the transfer location, without retraining. Up-to-date data are assumed to be available at the two source locations, while no data are available at the target or transfer locations for prediction. For each scenario, a regular model and three physics-modified models are considered.

Different scenarios are considered to better understand when the proposed models can improve predictions. For instance, let us consider a scenario where the model is trained using traffic information from an upstream and downstream station to predict flows at an intermediate target location (Scenario A1 and A2). In this case, the physics-modified feature inputs consist of Newell’s uncongested flow estimate (from the upstream station) and either an uncongested or congested flow estimate (from the downstream station). If the same model is applied to another intermediate location within the bounds of the upstream and downstream stations, the model will still receive similar flow estimates as inputs. However, in other scenarios (e.g., D1 or D2), the models are trained to predict flow at an upstream (or downstream) location relative to both stations but are transferred to predict flow at a downstream (or upstream) location. Since the input features in the target and transfer domains differ significantly, it is important to investigate the model’s performance under these circumstances as well.

Note that Case B always uses free-flow features at both the target and transfer locations, since both source stations are upstream of them. Therefore, performance is reported only for the ‘Physics FF’ method in this case. Additionally, a hybrid model could not be employed for Case D, since the input feature dimensions at the target and transfer locations would differ, so the same model architecture could not be used at both locations.

Table 2: Prediction performance scenarios (stations S1 to S4 ordered from upstream to downstream)

Scenario	S1	S2	S3	S4
A1	Source 1	Target	Transfer	Source 2
A2	Source 1	Transfer	Target	Source 2
B1	Source 1	Source 2	Target	Transfer
B2	Source 1	Source 2	Transfer	Target
C1	Transfer	Target	Source 1	Source 2
C2	Target	Transfer	Source 1	Source 2
D1	Target	Source 1	Source 2	Transfer
D2	Transfer	Source 1	Source 2	Target
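Table 2 determines which of Newell’s estimators each source feeds. The sketch below is an illustrative assumption rather than code from the study: stations are encoded by hypothetical mileposts increasing downstream, an upstream source always uses Equation 2, and a downstream source uses Equation 3 (‘Physics FF’), Equation 5 (‘Physics FC’), or both (‘Hybrid’). Counting the resulting inputs shows why the Hybrid model cannot be used in Case D.

```python
def estimators(location, sources, method):
    """Newell estimators applied to each source for one prediction location.

    location : milepost of the target or transfer station (increasing downstream)
    sources  : mileposts of the source stations
    method   : 'FF', 'FC' or 'Hybrid'
    """
    chosen = []
    for s in sources:
        if s < location:                  # source upstream: free-flow, Equation 2
            chosen.append(["Eq2"])
        elif method == "FF":              # source downstream, free flow assumed
            chosen.append(["Eq3"])
        elif method == "FC":              # source downstream, congestion assumed
            chosen.append(["Eq5"])
        elif method == "Hybrid":          # source downstream, both estimates
            chosen.append(["Eq3", "Eq5"])
        else:
            raise ValueError(f"unknown method: {method}")
    return chosen


# Scenario D1 with hypothetical mileposts: Target < Source 1 < Source 2 < Transfer.
target, s1, s2, transfer = 0.0, 0.5, 0.8, 1.3
n_target = sum(len(e) for e in estimators(target, [s1, s2], "Hybrid"))
n_transfer = sum(len(e) for e in estimators(transfer, [s1, s2], "Hybrid"))
print(n_target, n_transfer)   # 4 2
```

The target sees both sources downstream (four Hybrid inputs), while the transfer location sees both upstream (two inputs), so a model trained at one location cannot accept the other’s input.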

Evaluation metrics

We evaluate the model performances for both single- and multi-step prediction horizons, by comparing the prediction mean with the corresponding true flow values using three metrics: root mean squared error (RMSE), mean absolute percentage error (MAPE) and $R^{2}$ as defined below.

\mathrm{RMSE}={\sqrt{\dfrac{1}{N}\sum_{i=1}^{N}\left[y_{i}-\hat{y_{i}}\right]^{2}}}

(6)

\mathrm{MAPE}=\dfrac{100\%}{N}\sum_{i=1}^{N}\lvert{\dfrac{y_{i}-\hat{y_{i}}}{y_{i}}}\rvert

(7)

\mathrm{R^{2}}=1-\dfrac{\sum_{i=1}^{N}\left[y_{i}-\hat{y_{i}}\right]^{2}}{\sum_{i=1}^{N}\left[y_{i}-\bar{y}\right]^{2}}

(8)

where $y_{i}$ is the true value of observation $i$, $\hat{y_{i}}$ is the predicted value of $y_{i}$ for $i=1,2,\dots,N$, and $\bar{y}$ is the mean of the observed values. Both RMSE and mean absolute error (MAE) measure prediction error; however, RMSE is often preferred because it penalizes large errors more heavily than MAE, which weighs all errors equally. $\mathrm{R^{2}}$, on the other hand, measures the proportion of the variance in the data that is explained by the model.
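Equations 6 to 8 translate directly into NumPy. A minimal sketch; note that MAPE is undefined when an observed flow is zero, a case the text does not address, so the sketch simply requires nonzero observations. The example values are hypothetical.

```python
import numpy as np


def rmse(y, y_hat):
    """Root mean squared error, Equation 6."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))


def mape(y, y_hat):
    """Mean absolute percentage error in percent, Equation 7 (requires y != 0)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(np.abs((y - y_hat) / y))


def r2(y, y_hat):
    """Coefficient of determination, Equation 8."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


y, y_hat = [100, 120, 80], [110, 115, 80]
print(rmse(y, y_hat), mape(y, y_hat), r2(y, y_hat))
```

For these values the residual sum of squares is 125 and the total sum of squares is 800, so $\mathrm{R^{2}}=0.84375$.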

Model training

In this study, two different model architectures are used due to variations in the volume of the two datasets. For Dataset 1, the model architecture comprises a 2-dimensional convolutional layer with 12 filters and a kernel size of (3, 2), activated by the rectified linear unit (ReLU) activation function. The outputs from the convolutional layer are then flattened and reshaped appropriately to be fed into two LSTM layers with 10 and 6 units respectively. Finally, a single-unit dense layer is utilized to predict flows. On the other hand, for Dataset 2, the model architecture consists of a 2-dimensional convolutional layer with 16 filters and a kernel size of (3, 2), activated by the rectified linear unit (ReLU) activation function. This is followed by two LSTM layers with 10 and 6 units, respectively. The outputs from the LSTM layers are then passed through a dense layer with 6 units, activated by the ReLU activation function. Finally, the prediction layer is added to generate the traffic flow predictions for the specified future time point.

To ensure generalizability and prevent overfitting, the models are trained, validated, and tested on three distinct subsets: 60% of the data for training, 15% for validation, and 25% for testing. The model parameters are tuned during training based on performance on the validation set. The mean squared error (MSE) loss is minimized during training, and the model with the lowest validation error is selected as the final model. Models trained on Dataset 1 use a batch size of 10 and a temporal lag of 10, whereas models trained on Dataset 2 use a batch size of 20 and a temporal lag of 20. The models are trained with the Adadelta optimizer (Zeiler 2012), using a learning rate of 0.10, a rho value of 0.95, and an epsilon value of 1e-7.
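The 60/15/25 partition can be sketched as below. The text does not say whether the split is chronological or shuffled; this sketch assumes a chronological split, which keeps future observations out of the training set for time-series data. `chrono_split` is a hypothetical helper.

```python
import numpy as np


def chrono_split(X, y, train=0.60, val=0.15):
    """Split samples in time order into train / validation / test sets.

    The remaining fraction (1 - train - val, here 25%) forms the test set.
    """
    n = len(X)
    i_train = int(round(n * train))
    i_val = int(round(n * (train + val)))
    return ((X[:i_train], y[:i_train]),
            (X[i_train:i_val], y[i_train:i_val]),
            (X[i_val:], y[i_val:]))


# Hypothetical data: 100 samples already ordered in time.
X = np.arange(100).reshape(100, 1)
y = np.arange(100)
(tr, _), (va, _), (te, _) = chrono_split(X, y)
print(len(tr), len(va), len(te))
```

Selecting the model with the lowest validation error then only ever uses observations that precede the test period.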

Dataset 1: California SR04

Prediction performances of the models trained on Dataset 1 are provided in Tables 3, 4 and 5 corresponding to different performance metrics – RMSE, $R^{2}$ and MAPE.

Table 3: Comparison of RMSE between regular and physics-modified models trained on Dataset 1: California SR04

Case	Location	Regular	Physics FC	Physics FF	Hybrid
A1	Target	27.4609	27.4523	27.4251	27.3052
A1	Transfer	24.1859	24.0482	24.2259	24.1502
A2	Target	23.9148	23.9316	23.9943	23.8019
A2	Transfer	27.5623	27.7183	27.6590	27.4935
B1	Target	23.9541	-	23.9417	-
B1	Transfer	26.6676	-	26.6647	-
B2	Target	26.0435	-	25.9842	-
B2	Transfer	24.2526	-	24.4648	-
C1	Target	27.8632	27.8772	27.8294	27.8290
C1	Transfer	29.3075	30.6976	29.2691	29.5725
C2	Target	29.0854	30.4303	29.0880	29.0852
C2	Transfer	28.0220	28.3145	28.1024	28.0230
D1	Target	29.3008	29.2855	29.2226	-
D1	Transfer	27.3864	29.4767	27.4675	-
D2	Target	26.5213	26.5085	26.5085	-
D2	Transfer	29.5364	29.3480	28.5449	-

Table 4: Comparison of $R^{2}$ between regular and physics-modified models trained on Dataset 1: California SR04

Case	Location	Regular	Physics FC	Physics FF	Hybrid
A1	Target	0.9648	0.9648	0.9649	0.9652
A1	Transfer	0.9705	0.9709	0.9704	0.9706
A2	Target	0.9712	0.9711	0.9710	0.9714
A2	Transfer	0.9645	0.9641	0.9643	0.9647
B1	Target	0.9711	-	0.9711	-
B1	Transfer	0.9739	-	0.9739	-
B2	Target	0.9751	-	0.9752	-
B2	Transfer	0.9703	-	0.9698	-
C1	Target	0.9638	0.9637	0.9638	0.9639
C1	Transfer	0.9692	0.9663	0.9693	0.9687
C2	Target	0.9697	0.9668	0.9697	0.9697
C2	Transfer	0.9633	0.9626	0.9631	0.9633
D1	Target	0.9693	0.9693	0.9694	-
D1	Transfer	0.9724	0.9681	0.9723	-
D2	Target	0.9741	0.9742	0.9742	-
D2	Transfer	0.9688	0.9692	0.9708	-

Table 5: Comparison of MAPE between regular and physics-modified models trained on Dataset 1: California SR04

Case	Location	Regular	Physics FC	Physics FF	Hybrid
A1	Target	0.8093	0.7971	0.8065	0.7657
A1	Transfer	0.7156	0.6964	0.7079	0.6809
A2	Target	0.7183	0.7008	0.7070	0.6872
A2	Transfer	0.8201	0.8149	0.8291	0.7958
B1	Target	0.7136	-	0.7184	-
B1	Transfer	1.0218	-	1.0394	-
B2	Target	0.9575	-	0.9838	-
B2	Transfer	0.7440	-	0.7956	-
C1	Target	0.8096	0.8086	0.8139	0.7681
C1	Transfer	5.0663	4.6473	5.0910	5.1809
C2	Target	5.0956	5.0455	5.1540	5.2339
C2	Transfer	0.8543	0.8717	0.8476	0.8487
D1	Target	5.2534	4.8477	5.1256	-
D1	Transfer	1.0569	0.9453	1.0630	-
D2	Target	1.0126	1.0126	1.0127	-
D2	Transfer	4.5709	4.2736	4.3245	-

The results for Cases A, C, and D suggest that either Newell’s assumption of congested states prevailing downstream does not hold, or the interference of ramp flows is too large for Newell’s conservation assumptions to hold. As a result, the ‘Physics FC’ method, which relies on this assumption, does not consistently outperform the regular model at either the target or the transfer location when the source station is downstream of the target location, the transfer location, or both. In contrast, the free-flow assumption (‘Physics FF’) outperforms ‘Physics FC’ in the majority of these scenarios.
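The two Newell estimators can be sketched as time shifts of a source station’s flow series. In Newell’s (1993) simplified kinematic wave theory, free-flow conditions carry traffic from an upstream station to the target after a delay of Δx / v_f, while in congestion changes propagate backward from a downstream station at the wave speed w, so the target flow follows the downstream flow after a delay of Δx / w (the jam-density term in Newell’s cumulative-count formula is constant in time and drops out when working with flows). This is an illustrative sketch, not necessarily the exact transformation used in this study: the function name, the 5-minute interval (inferred from 50 minutes corresponding to 10 time steps), linear interpolation for fractional delays, and the speed values in the example are all assumptions.

```python
import numpy as np


def newell_estimate(q_source, distance_mi, speed_mph, dt_min=5.0):
    """Estimate flow at a target by time-shifting the flow observed at a source station.

    Free-flow (uncongested) estimator: source upstream, speed_mph = v_f.
    Congested estimator: source downstream, speed_mph = |w| (backward wave speed).
    Fractional delays are handled by linear interpolation; times before the first
    observation take the first observed value (np.interp's default behaviour).
    """
    q_source = np.asarray(q_source, dtype=float)
    t = np.arange(len(q_source)) * dt_min          # observation times in minutes
    delay_min = 60.0 * distance_mi / speed_mph     # wave travel time in minutes
    return np.interp(t - delay_min, t, q_source)


if __name__ == "__main__":
    q_down = np.array([1200.0, 1300.0, 1500.0, 1400.0, 1350.0, 1250.0])  # made-up veh/h
    # Hypothetical: target 1.5 mi upstream of the source, backward wave speed 12 mph,
    # i.e. a 7.5-minute (1.5-step) delay.
    print(newell_estimate(q_down, distance_mi=1.5, speed_mph=12.0))
```

Because w is typically much smaller than v_f, the congested estimator imposes far longer delays than the free-flow estimator over the same spacing, which is one reason its usefulness depends strongly on station distances.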

Where it is applicable, the hybrid approach generally outperforms the other approaches in Cases A and C, achieving the lowest RMSE in most of the target and transfer comparisons. In Cases A1 and A2, the feature input has three dimensions: one from the upstream station and two from the downstream stations. This configuration lets the model use free-flow information from both upstream and downstream stations, as well as congested-state information from downstream. Similarly, in Cases C1 and C2, the input is expanded to four dimensions, two from the upstream stations and two from the downstream stations, so that the model can combine information from both directions and account for variations in traffic conditions along the freeway section. By drawing on both upstream and downstream stations, the model can better capture the interactions between different segments of the freeway, which likely explains its improved prediction accuracy.

Dataset 2: Interstate-05 NB

Table 6 reports the RMSE of the models trained on Dataset 2; the other evaluation measures show similar trends and are omitted for brevity. Because this dataset is larger, we also evaluate prediction accuracy separately during free flow and congestion to better understand the benefits of the proposed approaches. To distinguish the two states, each day’s traffic is divided into two categories based on a speed threshold: a location is considered congested when its speed drops below 50 mph (blue dashed lines in Figure 6) and in free flow otherwise.
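The state-specific RMSE values can be obtained by masking the evaluation samples with the 50 mph threshold. In this minimal sketch, the speed series is assumed to be measured at the evaluated location and aligned with the predicted flows; the function name is hypothetical.

```python
import numpy as np

SPEED_THRESHOLD_MPH = 50.0  # congestion threshold used in the text


def rmse_by_state(y_true, y_pred, speed_mph, threshold=SPEED_THRESHOLD_MPH):
    """RMSE over all samples and separately for free-flow and congested samples.

    A sample is labelled congested when the speed at the evaluated location is
    below `threshold`; speeds are assumed to be aligned with the predicted flows.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    congested = np.asarray(speed_mph, dtype=float) < threshold

    def _rmse(mask):
        if not mask.any():
            return float("nan")  # state never observed in this evaluation window
        return float(np.sqrt(np.mean((y_true[mask] - y_pred[mask]) ** 2)))

    return {
        "Combined": _rmse(np.ones_like(congested)),
        "Free-flow": _rmse(~congested),
        "Congestion": _rmse(congested),
    }
```

The combined MSE is the sample-weighted average of the two state MSEs, so the combined RMSE is generally not the average of the free-flow and congestion RMSEs; and since training minimizes the combined MSE, the less frequent state carries less weight in the loss.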

Table 6: Comparison of RMSE between regular and physics-modified models trained on Dataset 2: Interstate-05 NB

Case	State	Location	Regular	Physics FC	Physics FF	Hybrid
A1	Combined	Target	27.6386	27.7352	27.6706	27.4902
	Combined	Transfer	41.4230	41.5126	42.1914	41.4253
	Free-flow	Target	24.7315	24.7775	24.9809	24.5950
	Free-flow	Transfer	37.2959	37.7488	38.0739	37.4430
	Congestion	Target	38.6853	38.9359	38.0415	38.4616
	Congestion	Transfer	57.2159	56.1263	58.0150	56.7572
A2	Combined	Target	46.5795	48.0333	47.0279	45.6740
	Combined	Transfer	38.2043	39.8507	38.1397	37.5140
	Free-flow	Target	47.1252	48.8942	47.5663	46.3539
	Free-flow	Transfer	32.0716	33.4739	32.1922	31.6403
	Congestion	Target	44.4094	44.3008	44.8736	42.8027
	Congestion	Transfer	59.4009	61.9251	58.8517	57.9369
B1	Combined	Target	38.5346	-	38.6557	-
	Combined	Transfer	43.0122	-	42.2568	-
	Free-flow	Target	38.2690	-	38.4363	-
	Free-flow	Transfer	39.9212	-	39.0535	-
	Congestion	Target	39.9460	-	39.8573	-
	Congestion	Transfer	55.7046	-	55.2978	-
B2	Combined	Target	36.5822	-	36.6687	-
	Combined	Transfer	32.0468	-	31.1714	-
	Free-flow	Target	35.4826	-	35.5825	-
	Free-flow	Transfer	29.7955	-	28.9267	-
	Congestion	Target	41.9924	-	42.0334	-
	Congestion	Transfer	41.3182	-	40.3653	-
C1	Combined	Target	32.7486	32.1711	32.9541	33.0664
	Combined	Transfer	41.3909	45.6134	42.0968	44.0113
	Free-flow	Target	29.8942	29.4090	30.3760	30.6260
	Free-flow	Transfer	38.6701	42.6464	39.2933	39.8573
	Congestion	Target	44.0564	43.1335	43.3757	43.0108
	Congestion	Transfer	52.6719	57.8503	53.1065	60.1006
C2	Combined	Target	37.1587	39.6226	37.9617	36.9906
	Combined	Transfer	36.0370	36.9692	36.3954	35.7617
	Free-flow	Target	36.5649	39.0817	37.6206	36.7317
	Free-flow	Transfer	31.5107	32.5991	32.4174	31.6026
	Congestion	Target	40.1009	42.3659	39.7900	38.4324
	Congestion	Transfer	52.5501	53.1936	51.3670	51.2351
D1	Combined	Target	33.5547	35.1328	33.4300	-
	Combined	Transfer	46.7479	50.9093	46.8157	-
	Free-flow	Target	33.0796	34.7593	32.8865	-
	Free-flow	Transfer	41.2508	42.4285	41.3332	-
	Congestion	Target	35.8799	37.0699	36.0348	-
	Congestion	Transfer	67.3285	79.9866	67.3495	-
D2	Combined	Target	35.8765	36.1171	36.1171	-
	Combined	Transfer	40.5691	45.7459	37.9290	-
	Free-flow	Target	34.7978	35.0323	35.0323	-
	Free-flow	Transfer	36.6836	40.0097	34.5566	-
	Congestion	Target	41.1896	41.4576	41.4576	-
	Congestion	Transfer	55.5357	66.6779	51.0994	-

For Cases A1 and A2, where both the target and transfer locations lie between the source detector stations, the results again indicate that Newell’s congested-state modification does not generally improve predictions, likely because congestion is infrequent or because ramp flows interfere with the traffic dynamics. This finding aligns with the observations from Dataset 1. Assuming free-flow conditions yields only minor improvements, and only in some cases. In contrast, the hybrid approach, which combines information from upstream (free-flow) and downstream (congested and uncongested) stations, gives the clearest improvements: in Case A2 it achieves the lowest RMSE in every traffic state at both the target and transfer locations, and in Case A1 it achieves the lowest combined and free-flow RMSE at the target location. Note that training still minimizes the combined mean squared error (MSE) loss, so the model does not separately optimize for free-flow and congested periods. The performance of the hybrid approach therefore highlights the value of incorporating both free-flow and congestion information into the model inputs.

In Case C1, Newell’s assumption of congestion prevailing downstream (‘Physics FC’) works well at the target location (S2), where only congested-state characteristics from the downstream locations are used during training. However, this model does not transfer well to S1. In contrast, the free-flow assumption (‘Physics FF’) performs worse than the regular model at the target location but transfers to S1 better than the other physics-based variants. Surprisingly, the hybrid approach also performs poorly overall at both the target and transfer locations, although it achieves the lowest congestion-state RMSE at the target location.

These differences in performance may be attributed to location-specific factors, such as variations in traffic patterns, congestion levels, or the spatial configuration of the freeway section. In Case C1, the target station (S2) is 0.6 mi and 1.5 mi from the source stations, whereas the transfer location (S1) is 2.2 mi and 3.1 mi away. Such differences in spacing can substantially affect the traffic dynamics observed at each location. To examine this further, Figure 7 compares the flow vs. speed trends at S2 (target in Case C1) and S1 (transfer in Case C1). S1 operates predominantly in free flow, with only short-lived congested states. This is consistent with the better transferability of the FF model, which relies on free-flow characteristics, compared with the hybrid model, which combines free-flow and congested-state information.
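To make the distance argument more concrete, the sketch below converts the Case C1 station spacings into wave travel times for a source lying upstream (free-flow wave) or downstream (congested wave). The free-flow speed of 60 mph and the backward wave speed of 12 mph are hypothetical values chosen for illustration, not the parameters used in this study, and the 5-minute interval is inferred from the 50-minute, 10-step history used in the time-sensitivity analysis. Under these assumptions, a congested wave needs roughly 11 to 15.5 minutes (2 to 3 time steps) to cover the spacings at S1, compared with 3 to 7.5 minutes at S2.

```python
V_F_MPH = 60.0  # hypothetical free-flow speed
W_MPH = 12.0    # hypothetical backward (congested) wave speed
DT_MIN = 5.0    # assumed data interval: 50-minute history / 10 time steps


def wave_delays(distances_mi, v_f=V_F_MPH, w=W_MPH, dt=DT_MIN):
    """Map each distance to (free-flow delay [min], congested delay [min], congested delay [steps])."""
    out = {}
    for d in distances_mi:
        ff = 60.0 * d / v_f
        cg = 60.0 * d / w
        out[d] = (ff, cg, cg / dt)
    return out


if __name__ == "__main__":
    stations = {"S2 (C1 target)": (0.6, 1.5), "S1 (C1 transfer)": (2.2, 3.1)}
    for station, dists in stations.items():
        for d, (ff, cg, steps) in wave_delays(dists).items():
            print(f"{station}: {d} mi -> free flow {ff:.1f} min, "
                  f"congested {cg:.1f} min ({steps:.1f} steps)")
```

Longer delays mean that the congested-state features at S1 describe conditions that are several intervals old, and over longer sections the homogeneity assumptions behind the estimators are also less likely to hold.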

Case C2 shows a different pattern from Case C1. The hybrid approach achieves a lower combined RMSE than the regular model and the other physics-based variants at both the target and transfer locations. The ‘Physics FF’ model also performs reasonably well, consistent with the target station operating mostly in free-flow conditions. However, the hybrid approach performs best, likely because it better accounts for the dependencies between the source and target stations.

In Cases B1 and B2, we observe that while the regular model performs better at the target location, its transferability is compromised, indicating that it does not generalize well to different locations. On the other hand, the physics-based FF model demonstrates better transferability in these cases. This suggests that the ‘Physics FF’ model captures the underlying dynamics of traffic more effectively, allowing it to perform well even at locations it has not been explicitly trained on.

In Cases D1 and D2, the ‘Physics FF’ model matches or outperforms the ‘Physics FC’ model in every comparison, suggesting that free-flow characteristics play an important role in these scenarios. However, the regular model retains minor advantages in some instances, such as the combined RMSE at the D1 transfer and D2 target locations.

These inconsistencies in performance could be attributed to several factors. First, the spatial distances between stations in this dataset are larger, which may violate the assumption of homogeneous traffic parameters over the entire section. Second, major interchanges and ramps between the mainline detectors can significantly affect traffic patterns. When ramps carry high volumes of vehicles entering or exiting the highway, they can disrupt the flow of traffic and cause localized congestion or bottlenecks on the mainline; for example, congestion on an exit ramp can spill back onto the mainline. Such disruptions produce variations in traffic conditions that the models may not capture accurately.
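The effect of ramps on the conservation assumption can be illustrated with a steady-state vehicle balance. Assuming, for illustration only, that an on-ramp and an off-ramp lie between the upstream station and the target and that traffic is stationary, conservation gives q_target = q_upstream + q_on - q_off, so an upstream free-flow estimator that simply carries q_upstream forward errs by q_off - q_on. The function and the flow values are hypothetical.

```python
def ramp_balance(q_upstream_vph, on_ramp_vph=0.0, off_ramp_vph=0.0):
    """Steady-state target flow and the error of an upstream free-flow estimate.

    Assumes the ramps lie between the upstream station and the target and that
    traffic is stationary, so vehicle conservation holds exactly.
    Returns (q_target, estimate_error), with error = estimate - true target flow.
    """
    q_target = q_upstream_vph + on_ramp_vph - off_ramp_vph
    estimate = q_upstream_vph  # the estimator ignores ramp flows
    return q_target, estimate - q_target


# Example: a 300 veh/h off-ramp makes the estimate 300 veh/h too high.
print(ramp_balance(1800.0, off_ramp_vph=300.0))  # (1500.0, 300.0)
```

In practice ramp flows vary over time and are often unmeasured, so this bias is not a constant offset the model can easily learn away, which is consistent with the inconsistencies discussed above.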

Overall, the ‘Physics FF’ and hybrid models (where applicable) generally outperform the regular model and ‘Physics FC’ in terms of transferability, but the inconsistencies relative to Dataset 1 point to the influence of spatial distances, non-homogeneous traffic parameters, and interchanges and ramps on model accuracy. These factors should be considered carefully when applying the models, and interpreting their results, in such complex freeway environments.

Time sensitivity

A sensitivity analysis along the time dimension is important for assessing how robust the models are when predicting traffic flows further into the future. For this analysis, both the regular model and the physics-based hybrid model were trained on Dataset 1 for Case A and evaluated at prediction horizons of up to 25 minutes, using a temporal history of 50 minutes (10 time steps). The prediction errors of both models increase as the prediction horizon grows. This is expected, since longer horizons introduce more uncertainty and make traffic flows harder to forecast. However, the physics-based hybrid model outperforms the regular model at all prediction horizons, and the performance gap tends to widen as the horizon increases, suggesting that the physics-based approach better captures the traffic dynamics. Furthermore, the benefit of the physics-based hybrid model is relatively larger at the transfer location than at the target location. This suggests that incorporating physics-based principles and combining information from upstream and downstream stations provides a more complete representation of traffic conditions and improves the model’s predictive capability.
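The horizon-wise comparison can be sketched as follows: build one target column per horizon, compute the RMSE of each column, and express the physics-based model’s advantage as a percentage RMSE reduction. This is an illustrative sketch; the function names are hypothetical, and five horizons of 5 minutes each (up to 25 minutes) are inferred from the text rather than stated.

```python
import numpy as np


def multi_horizon_targets(target, lag, n_horizons):
    """Targets of shape (n_samples, n_horizons).

    Column h - 1 holds the flow h steps after the last step of each
    `lag`-long input window (window i covers steps i .. i + lag - 1).
    """
    target = np.asarray(target, dtype=float)
    n = len(target) - lag - n_horizons + 1
    return np.stack([target[lag + h - 1: lag + h - 1 + n]
                     for h in range(1, n_horizons + 1)], axis=1)


def horizon_rmse(y_true, y_pred):
    """RMSE computed separately for each horizon (column)."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean(err ** 2, axis=0))


def relative_gain(rmse_regular, rmse_physics):
    """Percentage RMSE reduction of the physics-based model at each horizon."""
    rmse_regular = np.asarray(rmse_regular, dtype=float)
    return 100.0 * (rmse_regular - np.asarray(rmse_physics, dtype=float)) / rmse_regular
```

With real model outputs, a gain that increases with the horizon index would correspond to the widening performance gap described for the hybrid model.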

Conclusions

Real-time traffic prediction on freeways plays a crucial role in the effective functioning of Intelligent Transportation Systems. Extensive research has been conducted to enhance the accuracy of traffic prediction models, including statistical parametric and non-parametric approaches that provide improved flexibility in capturing the spatial and temporal aspects of traffic patterns. However, despite these advancements, state-of-the-art spatio-temporal models such as CNN-LSTM often learn case-specific features that lack generalizability. These models uncover only limited information about the underlying traffic dynamics, thereby limiting their usability to specific training conditions.

In this study, we propose a feature transformation approach for traffic flow prediction models based on traffic flow theory. The proposed model can transfer its learned features to different settings, which enhances its generalizability. The results indicate that the physics-based model can learn more universal features by mapping true traffic flows using estimates of both congested and uncongested flows. Compared with the regular model, the physics-based models exhibit improved prediction performance at both the target and transfer locations across various prediction scenarios. This improvement can be attributed to the ability of the proposed feature inputs to account for spatial shifts, which makes them more transferable. Our analysis also reveals that physics-based FC models trained using congested-state estimators from downstream locations do not perform well, possibly because congested states are absent downstream or because ramp flows interfere significantly, violating Newell’s conservation assumptions. In contrast, the FF and hybrid models generally outperform the regular and FC models in terms of transferability. Minor discrepancies in these trends are observed for the larger dataset, likely due to location-specific attributes. Additionally, the physics-based hybrid model consistently outperforms the regular model across all prediction horizons in a multi-step prediction setting. However, due to limitations in the available data, we were unable to conduct a spatial sensitivity analysis, highlighting the need for further research using simulated data.

References

Ahmed, M. S.; and Cook, A. R. 1979. Analysis of freeway traffic time-series data by using Box-Jenkins techniques. 722.
Chang, G.-L.; and Su, C.-C. 1995. Predicting intersection queue with neural network models. Transportation Research Part C: Emerging Technologies, 3(3): 175–191.
Chen, M.; and Chien, S. I. 2001. Dynamic freeway travel-time prediction with probe vehicle data: Link based versus path based. Transportation Research Record, 1768(1): 157–161.
Cheng, X.; Zhang, R.; Zhou, J.; and Xu, W. 2018. Deeptransport: Learning spatial-temporal dependency for traffic condition forecasting. In 2018 International Joint Conference on Neural Networks (IJCNN), 1–8. IEEE.
Cheslow, M.; Hatcher, S. G.; and Patel, V. M. 1992. An initial evaluation of alternative intelligent vehicle highway systems architectures. Technical report.
Chien, S. I.-J.; and Kuchipudi, C. M. 2003. Dynamic travel time prediction with real-time and historic data. Journal of Transportation Engineering, 129(6): 608–616.
Clark, S. 2003. Traffic prediction using multivariate nonparametric regression. Journal of Transportation Engineering, 129(2): 161–168.
Cui, Z.; Henrickson, K.; Ke, R.; and Wang, Y. 2019. Traffic graph convolutional recurrent neural network: A deep learning framework for network-scale traffic learning and forecasting. IEEE Transactions on Intelligent Transportation Systems, 21(11): 4883–4894.
Cui, Z.; Ke, R.; Pu, Z.; and Wang, Y. 2020. Stacked bidirectional and unidirectional LSTM recurrent neural network for forecasting network-wide traffic state with missing values. Transportation Research Part C: Emerging Technologies, 118: 102674.
Davis, G. A.; and Nihan, N. L. 1991. Nonparametric regression and short-term freeway traffic forecasting. Journal of Transportation Engineering, 117(2): 178–188.
Dia, H. 2001. An object-oriented neural network approach to short-term traffic forecasting. European Journal of Operational Research, 131(2): 253–261.
Dougherty, M. S.; and Cobbett, M. R. 1997. Short-term inter-urban traffic forecasts using neural networks. International Journal of Forecasting, 13(1): 21–31.
Florio, L.; and Mussone, L. 1996. Neural-network models for classification and forecasting of freeway traffic flow stability. Control Engineering Practice, 4(2): 153–164.
Hamed, M. M.; Al-Masaeid, H. R.; and Said, Z. M. B. 1995. Short-term prediction of traffic volume in urban arterials. Journal of Transportation Engineering, 121(3): 249–254.
Henaff, M.; Bruna, J.; and LeCun, Y. 2015. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163.
Hochreiter, S.; and Schmidhuber, J. 1997. Long short-term memory. Neural Computation, 9(8): 1735–1780.
Huang, J.; and Agarwal, S. 2020. Physics Informed Deep Learning for Traffic State Estimation. In 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC), 1–6.
Huang, W.; Song, G.; Hong, H.; and Xie, K. 2014. Deep architecture for traffic flow prediction: deep belief networks with multitask learning. IEEE Transactions on Intelligent Transportation Systems, 15(5): 2191–2201.
Innamaa, S. 2000. Short-term prediction of traffic situation using MLP-neural networks. In Proceedings of the 7th World Congress on Intelligent Transport Systems, Turin, Italy, 6–9.
Kaysi, I.; Ben-Akiva, M. E.; and Koutsopoulos, H. 1993. An integrated approach to vehicle routing and congestion prediction for real-time driver guidance, volume 1408. Transportation Research Board.
Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. Imagenet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25.
LeCun, Y.; Bengio, Y.; and Hinton, G. 2015. Deep learning. Nature, 521(7553): 436–444.
Levin, M.; and Tsao, Y.-D. 1980. On forecasting freeway occupancies and volumes (abridgment). Transportation Research Record, (773).
Lin, F.; Xu, Y.; Yang, Y.; and Ma, H. 2019. A spatial-temporal hybrid model for short-term traffic prediction. Mathematical Problems in Engineering, 2019.
Lu, H.; Huang, D.; Song, Y.; Jiang, D.; Zhou, T.; and Qin, J. 2020. St-trafficnet: A spatial-temporal deep learning network for traffic forecasting. Electronics, 9(9): 1474.
Lv, Y.; Duan, Y.; Kang, W.; Li, Z.; and Wang, F.-Y. 2014. Traffic flow prediction with big data: a deep learning approach. IEEE Transactions on Intelligent Transportation Systems, 16(2): 865–873.
Lyons, G.; McDonald, M.; Hounsell, N.; Williams, B.; Cheese, J.; and Radia, B. 1996. Urban traffic management; the viability of short term congestion forecasting using artificial neural networks. PTRC.
Ma, X.; Tao, Z.; Wang, Y.; Yu, H.; and Wang, Y. 2015. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transportation Research Part C: Emerging Technologies, 54: 187–197.
Mallick, T.; Balaprakash, P.; Rask, E.; and Macfarlane, J. 2021. Transfer learning with graph neural networks for short-term highway traffic forecasting. In 2020 25th International Conference on Pattern Recognition (ICPR), 10367–10374. IEEE.
Mingheng, Z.; Yaobao, Z.; Ganglong, H.; and Gang, C. 2013. Accurate multisteps traffic flow prediction based on SVM. Mathematical Problems in Engineering, 2013.
Newell, G. F. 1993. A simplified theory of kinematic waves in highway traffic, part I: General theory. Transportation Research Part B: Methodological, 27(4): 281–287.
Okutani, I.; and Stephanedes, Y. J. 1984. Dynamic prediction of traffic volume through Kalman filtering theory. Transportation Research Part B: Methodological, 18(1): 1–11.
Pan, S. J.; and Yang, Q. 2009. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10): 1345–1359.
Park, B.; Messer, C. J.; and Urbanik, T. 1998. Short-term freeway traffic volume forecasting using radial basis function neural network. Transportation Research Record, 1651(1): 39–47.
Park, D.; and Rilett, L. R. 1998. Forecasting multiple-period freeway link travel times using modular neural networks. Transportation Research Record, 1617(1): 163–170.
Park, D.; and Rilett, L. R. 1999. Forecasting freeway link travel times with a multilayer feedforward neural network. Computer-Aided Civil and Infrastructure Engineering, 14(5): 357–367.
Rumelhart, D. E.; Hinton, G. E.; and Williams, R. J. 1986. Learning representations by back-propagating errors. Nature, 323(6088): 533–536.
Shi, R.; Mo, Z.; Huang, K.; Di, X.; and Du, Q. 2021. A Physics-Informed Deep Learning Paradigm for Traffic State and Fundamental Diagram Estimation. IEEE Transactions on Intelligent Transportation Systems, 1–11.
Smith, B. L.; and Demetsky, M. J. 1997. Traffic flow forecasting: comparison of modeling approaches. Journal of Transportation Engineering, 123(4): 261–266.
Smith, B. L.; Williams, B. M.; and Oswald, R. K. 2002. Comparison of parametric and nonparametric models for traffic flow forecasting. Transportation Research Part C: Emerging Technologies, 10(4): 303–321.
Stathopoulos, A.; and Karlaftis, M. G. 2003. A multivariate state space approach for urban traffic flow modeling and prediction. Transportation Research Part C: Emerging Technologies, 11(2): 121–135.
Sun, S.; Zhang, C.; and Yu, G. 2006. A Bayesian network approach to traffic flow forecasting. IEEE Transactions on Intelligent Transportation Systems, 7(1): 124–132.
Thodi, B. T.; Khan, Z. S.; Jabari, S. E.; and Menendez, M. 2022. Incorporating kinematic wave theory into a deep learning method for high-resolution traffic speed estimation. IEEE Transactions on Intelligent Transportation Systems.
Van Lint, J.; Hoogendoorn, S.; and van Zuylen, H. J. 2002. Freeway travel time prediction with state-space neural networks: modeling state-space dynamics with recurrent neural networks. Transportation Research Record, 1811(1): 30–39.
Whittaker, J.; Garside, S.; and Lindveld, K. 1997. Tracking and predicting a network traffic process. International Journal of Forecasting, 13(1): 51–61.
Yao, H.; Tang, X.; Wei, H.; Zheng, G.; and Li, Z. 2019. Revisiting spatial-temporal similarity: A deep learning framework for traffic prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, 5668–5675.
Yao, H.; Wu, F.; Ke, J.; Tang, X.; Jia, Y.; Lu, S.; Gong, P.; Ye, J.; and Li, Z. 2018. Deep multi-view spatial-temporal network for taxi demand prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.
Yu, R.; Li, Y.; Shahabi, C.; Demiryurek, U.; and Liu, Y. 2017. Deep learning: A generic approach for extreme condition traffic forecasting. In Proceedings of the 2017 SIAM International Conference on Data Mining, 777–785. SIAM.
Zeiler, M. D. 2012. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701.
Zhang, J.; Zheng, Y.; and Qi, D. 2017. Deep spatio-temporal residual networks for citywide crowd flows prediction. In Thirty-First AAAI Conference on Artificial Intelligence.
Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; and He, Q. 2020. A comprehensive survey on transfer learning. Proceedings of the IEEE, 109(1): 43–76.