Spatially Focused Attack against Spatiotemporal Graph Neural Networks
Abstract
Spatiotemporal forecasting plays an essential role in various applications of intelligent transportation systems (ITS), such as route planning, navigation, and traffic control and management. Deep spatiotemporal graph neural networks (GNNs), which capture both spatial and temporal patterns, have achieved great success in traffic forecasting applications. Understanding how GNNs-based forecasting models work, and how vulnerable and robust they are, is therefore critical for real-world applications. For example, if spatiotemporal GNNs are vulnerable in real-world traffic prediction applications, a hacker can easily manipulate the results and cause serious traffic congestion or even a city-scale breakdown. However, although recent studies have demonstrated that deep neural networks (DNNs) are vulnerable to carefully designed perturbations in multiple domains such as object classification and graph representation, current adversarial works cannot be directly applied to spatiotemporal forecasting due to the causal nature and spatiotemporal mechanisms of forecasting models. To fill this gap, in this paper we design a Spatially Focused Attack (SFA) to break spatiotemporal GNNs by attacking a single vertex. To achieve this, we first propose inverse estimation to address the causality issue; then, we apply a genetic algorithm with a universal attack method as its evaluation function to locate the weakest vertex; finally, perturbations are generated by solving an inverse estimation-based optimization problem. We conduct experiments on real-world traffic data, and our results show that perturbations in one vertex designed by SFA can be diffused into a large part of the graph.
1 Introduction
Spatiotemporal traffic forecasting has been a long-standing research topic and a fundamental application in intelligent transportation systems (ITS). For instance, with better prediction of future traffic states, navigation apps can help drivers avoid traffic congestion, and traffic signals can manage traffic flows to increase network capacity. Essentially, traffic forecasting can be modeled as a multivariate time series prediction problem for a network of connected sensors based on the topology of road networks. Given the complex spatial and temporal patterns governed by traffic dynamics and road network structure, recent studies have developed various Graph Neural Networks-based traffic forecasting models and achieved great success (Fang et al. 2019; Wu et al. 2019; Guo et al. 2019).
It has been shown in many recent studies that deep learning frameworks are very vulnerable to carefully designed attacks (see e.g., Kurakin, Goodfellow, and Bengio 2016b; Goodfellow, Shlens, and Szegedy 2014; Papernot et al. 2016a; Tramèr et al. 2017; Kurakin, Goodfellow, and Bengio 2016a). This raises a critical concern about the application of spatiotemporal GNNs-based models for real-world traffic forecasting, in which robustness and reliability are of utmost importance. For example, with a vulnerable forecasting model, a hacker can manipulate the predicted traffic states and feed these manipulated values into the downstream application, thus causing severe problems such as traffic congestion and even city-scale breakdown. Despite having superior accuracy, GNNs-based traffic prediction models are also facing great cyber-security challenges in practice. It remains a critical question to understand and evaluate the vulnerability of these models.

However, current adversarial works cannot be directly applied to evaluating the vulnerability of GNNs-based spatiotemporal forecasting because of the temporal and spatial gaps shown in Figure 1. In the remainder of this paper, we refer to “sensors” in a “road network” as “vertices” of a “graph” in GNNs and use the two terms interchangeably. First, current adversarial works, such as adversarial attacks against recurrent neural networks (RNNs) (Rosenberg et al. 2019; Papernot et al. 2016b; Hu and Tan 2017), rely on the ground truth to generate perturbations, while the future state—the forecasting model’s ground truth—is inaccessible in traffic forecasting applications. For instance, the traffic condition at 11:35 am cannot be detected by sensors at 11:30 am, yet current adversarial works would require this unavailable information to fool a spatiotemporal forecasting model at 11:30 am. Thus these ground truth-based adversarial models no longer work. We refer to this challenge as the “temporal gap” (see the right and bottom parts of Figure 1). Second, in real-world ITS applications, sensors are deployed on a large-scale road network. Following current adversarial studies, one would have to assume that all sensors can be manipulated at the same time (see red dots in Figure 1). However, this assumption is unrealistic, as it is impossible for a hacker vehicle to poison all sensors in such a large-scale road network. We refer to this challenge as the “spatial gap”. More discussion on why current adversarial works cannot be directly applied to attacking GNNs-based forecasting models is given in Section 2. Overall, it remains unclear how vulnerable these GNNs-based spatiotemporal forecasting models are under existing attack frameworks.
The goal of this paper is to understand and examine the vulnerability and robustness of GNNs-based spatiotemporal forecasting models. In doing so, we design a Spatially Focused Attack (SFA) framework to break these forecasting models by manipulating only one vertex in the graph (see the green square in Figure 1). We first propose Inverse Estimation (IE) to avoid using future ground truth and design an IE-based universal attack mechanism. Then, we utilize a genetic algorithm, whose evaluation function builds on the proposed universal attack method, to locate the “weakest” sensor/vertex. Here the weakest vertex refers to the vertex that will cause the maximum damage to the forecasting model when being attacked. Finally, we generate perturbations by solving an optimization problem. It should be noted that the proposed method does not require future information in designing perturbations. Following the proposed SFA framework, a hacker can break forecasting models by poisoning just one sensor in a large-scale road network. Thus, SFA is a realistic solution to evaluate the robustness and vulnerability of spatiotemporal forecasting models for real-world applications.
To prove the effectiveness of the proposed SFA method, we test it on two spatiotemporal traffic datasets with three different spatiotemporal GNNs, including STGCN (Yu, Yin, and Zhu 2018), DCRNN (Li et al. 2017), and Graph Wavenet (Wu et al. 2019). Our results show that SFA can cause at least a 15% accuracy drop, and about 10% of sensors are severely impacted even when the speed perturbation is bounded to 15 km/h. The main contributions of this paper can be summarized as follows.
- We propose a novel Spatially Focused Attack (SFA) method to find the weakest vertex and break the forecasting model by poisoning a single vertex. To the best of our knowledge, this is the first vulnerability study on GNNs-based spatiotemporal forecasting models that poisons only one vertex.
- We propose to use inverse estimation to avoid using future ground truth when computing perturbations.
- We study the effectiveness of the proposed SFA with extensive experiments on real-world datasets.
2 Related Work
One Pixel Attack for Fooling Deep Neural Networks.
One pixel attack (Su, Vargas, and Kouichi 2019) utilizes Differential Evolution (DE) to generate a perturbation that poisons one pixel in an image and thereby fools CNNs. However, the one-pixel attack requires the ground truth to compute perturbations, which means applying it still faces the temporal gap. Moreover, images are regular-structured, and there are no temporal variations in the one-pixel attack. The one-pixel attack’s poisoning position also varies across frames, which is not applicable to spatiotemporal forecasting domains. These features prevent us from directly using the one-pixel attack on spatiotemporal forecasting models.
Adversarial Attacks against Time Series Analysis.
Some previous works (Chen, Tan, and Zhang 2019; Zhou et al. 2019; Alfeld, Zhu, and Barford 2016; Karim, Majumdar, and Darabi 2019) propose adversarial attack methods against autoregressive models or time series classification models. Essentially, these works only consider univariate time series. Different from these works, we focus on multivariate time series generated from a complex spatial domain/network. The input of spatiotemporal GNNs is a dynamic graph rather than regular matrices or sequences. We take the spatial correlation into consideration, which is overlooked in previous studies.
Adversarial Attacks against Graph Neural Networks.
Many studies (Dai, Li, and Tian 2018; Tang et al. 2020; Zhang and Zitnik 2020; You et al. 2020) utilize reinforcement learning (RL), meta learning, or genetic algorithms to fool GNNs in vertex, edge, and graph classification tasks by tuning the graph topology. Still, these studies involve no temporal variations in their graphs and mainly focus on spatial patterns. These models cannot be applied to fool spatiotemporal forecasting models because they lack temporal correlation. In particular, attacking spatiotemporal forecasting models deployed in real-world applications with graph topology-based attack methods (Zugner and Gunnemann 2019; Chang et al. 2020) is unrealistic, because tuning the graph topology corresponds to modifying the structure of the sensor network (i.e., the road network) that collects spatiotemporal data continuously.
3 Preliminary
Traffic state data collected from a sensor network are often represented as a time-varying graph, which encodes both spatial and temporal information. In general, the spatiotemporal sequence can be represented as $\{\mathcal{G}, X_t\}$ with $\mathcal{G} = (V, E, W)$, where $E$ is the set of edges in the graph, $W \in \mathbb{R}^{N \times N}$ is the weighted adjacency matrix in which every element describes the spatial relationship between different sensors, $X_t \in \mathbb{R}^{N}$ is the set of state values (e.g., traffic speed or traffic volume) collected from the sensors at timestamp $t$, and $N$ is the number of sensors (Shuman et al. 2013). For multi-step spatiotemporal forecasting, future states are estimated as
$$[\hat{X}_{t+1}, \ldots, \hat{X}_{t+T'}] = f\big([X_{t-T+1}, \ldots, X_{t}]\big) \qquad (1)$$

where $\hat{X}_{t+i}$ denotes the prediction of the states at time $t+i$. Previous states from $t-T+1$ to $t$ are fed into a forecasting model $f$ that outputs predictions of future states from $t+1$ to $t+T'$. In general, the output horizon $T'$ need not equal the input horizon $T$. The above process is customarily called sequence-to-sequence (seq2seq) forecasting. Most spatiotemporal forecasting models output a single future state, which is in turn fed as input into the model to forecast the next state. This process is called recursive multistep forecasting, and future states are computed as
$$\hat{X}_{t+1} = f\big([X_{t-T+1}, \ldots, X_{t}]\big), \quad \hat{X}_{t+2} = f\big([X_{t-T+2}, \ldots, X_{t}, \hat{X}_{t+1}]\big), \quad \ldots \qquad (2)$$
State-of-the-art forecasting models $f$ are generally constructed based on spatiotemporal GNNs (see e.g., Li et al. 2017; Wu et al. 2019; Yu, Yin, and Zhu 2018; Guo et al. 2019), consisting of both spatial and temporal layers. In general, these models use a gated linear unit (GLU) or Gated-CNN (Chen et al. 2020; Bai, Kolter, and Koltun 2018) as the temporal layer to capture the temporal patterns embedded in the spatiotemporal sequence, while Graph-CNNs (Shuman et al. 2013; Bruna et al. 2014) are used as spatial layers to capture the spatial patterns. In this paper, we concentrate our adversarial studies on recursive multistep spatiotemporal forecasting models. However, the analysis can be easily extended to seq2seq-based multistep forecasting.
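As a concrete illustration of the recursive scheme in Eq. (2), the following sketch rolls a one-step forecaster forward by feeding each prediction back into the input window. The persistence model and the array sizes are illustrative placeholders of our own, not the GNNs studied in this paper:

```python
import numpy as np

def recursive_forecast(model, history, horizon):
    """Recursive multistep forecasting per Eq. (2): each one-step
    prediction is appended to the input window to predict the next step."""
    window = list(history)                 # most recent T states, oldest first
    preds = []
    for _ in range(horizon):
        x_next = model(np.stack(window))   # one-step-ahead prediction
        preds.append(x_next)
        window = window[1:] + [x_next]     # slide the window forward
    return np.stack(preds)

# toy one-step model: persistence (predicts the latest observed state)
persistence = lambda seq: seq[-1]
history = [np.array([60.0, 55.0]), np.array([58.0, 54.0])]  # T=2, N=2
out = recursive_forecast(persistence, history, horizon=3)
```

Any one-step forecaster that maps a $(T, N)$ window to an $N$-vector can be dropped in for the toy persistence model.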
4 Methodology
In this section, we detail the proposed SFA framework, which essentially consists of three components. We first propose to use Inverse Estimation (IE) as a potential solution to address the temporal gap. Second, we derive a solution to locate the weakest vertex to avoid manipulating all sensors. Finally, an optimization model is introduced to SFA to compute and design the perturbations used to fool spatiotemporal forecasting models.
4.1 Inverse Estimation
In the domain of computer vision, adversarial attacks aim at fooling a machine learning-based classifier into misclassifying objects with undetectable modifications. When it comes to spatiotemporal forecasting, an adversarial attack adds unnoticeable perturbations to historical time series such that the forecasting model generates bad predictions that are far away from the ground truth. We formulate this goal as the following optimization problem:
$$\max_{\rho}\ \mathcal{L}\big(f(\mathcal{X} + \rho),\, Y\big) \quad \text{s.t.}\ \|\rho\| \le \epsilon \qquad (3)$$

where $\mathcal{X}$ denotes the input graph sequence, $Y$ denotes the future traffic state, i.e., the ground truth of the forecasting model, $\rho$ denotes a collection of perturbations, $\mathcal{L}$ denotes the loss function, and $\epsilon$ denotes a pre-specified constant that constrains the perturbation scale in order to make the perturbations unnoticeable.
However, it is difficult to directly solve the optimization model in (3). As an alternative, we design a proxy optimization problem by integrating the constraint into the objective function:
$$\max_{\rho}\ \mathcal{L}\big(f(\mathcal{X} + \rho),\, Y\big) - \lambda \|\rho\| \qquad (4)$$

where $\lambda$ denotes the penalty factor. Even though the regularization term in Eq. (4) is a soft constraint compared to the original constraint $\|\rho\| \le \epsilon$, it can still strictly force the perturbation’s scale below the upper bound $\epsilon$ by setting the penalty factor to a value large enough that the scale penalty dominates the first term in Eq. (4). In real-world traffic applications, $\lambda$ is used to balance the attack performance and the degree of observability.
However, the ground truth of the forecasting model, $Y$ in Eq. (4), is unavailable at time $t$. This issue is referred to as the “temporal gap” in Figure 1. Due to this inevitable reliance on future information, fooling spatiotemporal GNNs with the simple optimization model in Eq. (4) is unrealistic. A simple solution to address this issue is to directly use the most recent observation as the forecast. This is equivalent to approximating the complex GNNs with a “most recent” forecaster:

$$\hat{Y} = X_{t} \qquad (5)$$
However, perturbations computed by this direct estimation may not be effective enough to cause a significant performance drop, because direct estimation introduces large errors into the perturbation computation, as discussed in Section 5.1. In addition, the maximization problem is still difficult to solve given the large search space.

To address this issue, we propose an Inverse Estimation (IE) scheme that offers a simple but clear direction to the optimization model. We introduce the concept of the “opposite state” and transform the maximization problem in (4) into a minimization problem, whose goal is to fool spatiotemporal GNNs into generating predictions opposite to the ground truth:
$$\min_{\rho}\ \mathcal{L}\big(f(\mathcal{X} + \rho),\, \bar{Y}\big) + \lambda \|\rho\| \qquad (6)$$

where $\bar{Y}$ denotes the “opposite state” of $Y$. Perturbations can be generated more effectively by solving Eq. (6). The above idea is similar to targeted attacks (Akhtar and Mian 2018). However, classical targeted attacks still utilize the ground truth in perturbation computations.
To avoid using future information, the opposite of the future state, $\bar{Y}$, is estimated by computing the opposite of the most recent state:

$$\bar{Y} = \bar{X}_{t} \qquad (7)$$
where $\bar{X}_{t}$ denotes a collection of state values opposite to those collected from the sensors. Taking traffic speed as an example, when the condition is “congested/low speed”, its opposite state should be “free/high speed”. In this case, we compute $\bar{X}_{t}$—the opposite of $X_{t}$—using two distinct values:

$$\bar{x}_{i,t} = \begin{cases} x_{\max}, & x_{i,t} < \bar{x} \\ x_{\min}, & x_{i,t} \ge \bar{x} \end{cases} \qquad (8)$$

where $\bar{x}$, $x_{\max}$, and $x_{\min}$ represent the mean, maximum, and minimum values of the spatiotemporal dataset, respectively. We examine the effectiveness of this approach in the experiment section with real-world datasets.
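The two-valued rule in Eq. (8) reduces to a single threshold comparison per sensor. A minimal sketch (the function name and example statistics are our own choices):

```python
import numpy as np

def opposite_state(x_t, x_mean, x_max, x_min):
    """Opposite state per Eq. (8): below-mean (congested) readings flip to
    the dataset maximum, above-mean (free-flow) readings to the minimum."""
    return np.where(x_t < x_mean, x_max, x_min)

speeds = np.array([20.0, 90.0, 55.0])   # current readings of three sensors
bar_x = opposite_state(speeds, x_mean=60.0, x_max=110.0, x_min=0.0)
```

Here the congested 20 km/h and 55 km/h readings flip to the dataset maximum, while the free-flowing 90 km/h reading flips to the minimum.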
4.2 Locating the Weakest Vertex
In this section, we introduce a solution that avoids manipulating (introducing perturbations on) all sensors in the road network. The key idea is to identify the most vulnerable sensor, i.e., the one that produces the largest accuracy drop across the whole sensor network when attacked. To achieve this goal, we first solve a universal attack problem by making the perturbation static: the universal attack implies that the perturbation is constant and independent of the input. The universal perturbation is computed by solving the following optimization problem
$$\rho_{u} = \arg\min_{\rho}\ \sum_{\mathcal{X}}\ \mathcal{L}\big(f(\mathcal{X} + \rho),\, \bar{Y}\big) + \lambda \|\rho\| \qquad (9)$$

where $\rho_{u}$ denotes the universal perturbation and the sum runs over the input sequences. After the universal perturbation is generated, there is no need to update it when new data arrive. It should be noted that this universal perturbation can only break forecasting models by poisoning all sensors; thus it is unrealistic to apply it to real-world forecasting applications. Nevertheless, this universal perturbation will be used to define and quantify sensor weakness.
We define the “weakness” score of the $i$th sensor as the number of influenced/affected sensors when sensor $i$ is attacked by the proposed universal perturbation. Specifically, the weakness is computed as

$$s_{i}(t) = \big\| \Phi_{\eta}\big( f(\mathcal{X}_{t} + e_{i} \odot \rho_{u}) - f(\mathcal{X}_{t}) \big) \big\|_{0} \qquad (10)$$

where $e_{i} \odot \rho_{u}$ denotes the universal perturbation with all elements except those corresponding to the $i$th sensor set to zero, and $\Phi_{\eta}$ denotes an element-wise filter that sets elements whose absolute value is smaller than $\eta$ to 0. As Eq. (10) shows, the weakness is time-dependent. By collecting the weakness over a time frame, a weakness vector can be formed, and the 2-norm of this vector can be regarded as the time-invariant weakness of a vertex. Thus, for a sensor $i$, a greater weakness value suggests that more sensors will be influenced if it is attacked. We attack the vertex with the largest weakness value. For traffic forecasting applications, we consider a vertex where the prediction error is greater than 5 km/h as an influenced vertex, and $\eta$ is set to 5 accordingly.
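The weakness score can be sketched as follows; the toy averaging model stands in for a trained spatiotemporal GNN, and the masking scheme and names are our assumptions:

```python
import numpy as np

def weakness(model, x, rho_u, i, eta=5.0):
    """Weakness of sensor i per Eq. (10): count sensors whose prediction
    changes by more than eta when only sensor i carries the universal
    perturbation rho_u."""
    mask = np.zeros_like(rho_u)
    mask[..., i] = 1.0                      # keep sensor i's perturbation only
    diff = model(x + mask * rho_u) - model(x)
    diff[np.abs(diff) < eta] = 0.0          # element-wise filter Phi_eta
    return int(np.count_nonzero(diff))

# toy model: every sensor's prediction is the mean of the latest state,
# so poisoning one sensor spills over to all outputs
mean_model = lambda seq: np.full(seq.shape[-1], seq[-1].mean())
x_seq = np.tile(np.array([50.0, 60.0, 70.0, 80.0]), (3, 1))  # T=3, N=4
rho_u = np.full((3, 4), 30.0)
w = weakness(mean_model, x_seq, rho_u, i=0)
```

Because the toy model averages over all sensors, a 30 km/h perturbation on sensor 0 shifts every prediction by 7.5 km/h, so all four sensors count as influenced.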

A possible method to locate the weakest vertex is the complete traversal algorithm. However, this method is time-consuming. To reduce the time cost, we utilize a genetic algorithm to locate the weakest vertex, as follows. The genetic process is presented schematically in Figure 2.
- First, the initial candidate set is generated from the $n$ sensors with the most edges.
- Second, the updated set consists of new candidates computed as

  $$p_{i}^{(g+1)} = p_{r_{1}}^{(g)} + F \cdot \big( p_{r_{2}}^{(g)} - p_{r_{3}}^{(g)} \big) \qquad (11)$$

  where $p_{i}^{(g)}$ denotes the position (longitude and latitude) of the $i$th candidate vertex, $g$ denotes the $g$th iteration, $r_{1}$, $r_{2}$, and $r_{3}$ are random indices with different values, and $F$ is set to 0.5 empirically.
- Third, compare the weakness of the updated candidates with the previous candidate set, and keep only the $n$ candidates with the largest weakness values.
- Fourth, repeat the second and third steps until the candidate set converges or the number of iterations $g$ is sufficient, and select the weakest vertex to attack. It should be noted that the bound on $g$ controls the trade-off between the effectiveness and efficiency of the solution: the larger the bound, the closer the proposed solution is to the complete traversal algorithm.
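The four steps above can be sketched as a differential-evolution-style loop. In the actual method each continuous position would be mapped back to a real sensor and scored by the universal-attack weakness; the toy quadratic fitness below only stands in for that score, and all names here are our own:

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(positions, fitness, F=0.5, iters=10):
    """Evolutionary update per Eq. (11): each trial candidate is
    p_r1 + F * (p_r2 - p_r3); after scoring, only the n candidates
    with the largest fitness (weakness) survive."""
    cand = positions.copy()
    n = len(cand)
    for _ in range(iters):
        idx = np.array([rng.choice(n, 3, replace=False) for _ in range(n)])
        trial = cand[idx[:, 0]] + F * (cand[idx[:, 1]] - cand[idx[:, 2]])
        pool = np.vstack([cand, trial])          # parents plus trials
        scores = np.array([fitness(p) for p in pool])
        cand = pool[np.argsort(scores)[-n:]]     # keep the n fittest
    return cand

# toy fitness surrogate: "weakness" peaks at position (0, 0)
fitness = lambda p: -np.sum(p ** 2)
init = rng.normal(size=(5, 2)) * 10              # 5 candidate (lon, lat) pairs
best = evolve(init, fitness)
```

Because the fittest parent is never discarded, the best score is monotonically non-decreasing across iterations, mirroring the keep-the-strongest selection in the third step.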
4.3 Spatially Focused Attack
Once the weakest vertex is located, spatially focused attack is proposed to fool the spatiotemporal forecasting model by poisoning only the selected vertex. The perturbation is computed by solving the following optimization problem:
$$\min_{\rho_{k}}\ \mathcal{L}\big(f(\mathcal{X} + e_{k} \odot \rho_{k}),\, \bar{Y}\big) + \lambda \|\rho_{k}\| \qquad (12)$$

where $\rho_{k}$ is the generated one-vertex perturbation, $k$ denotes the index of the weakest vertex, $e_{k} \odot \rho_{k}$ confines the perturbation to vertex $k$, and $\lambda \|\rho_{k}\|$ is the regularization term that controls the scale of the generated one-vertex perturbation.
Different from adversarial attack methods such as Eq. (6) and Eq. (9) that manipulate all sensors, the proposed SFA poisons only one sensor in the road network while aiming for the largest accuracy drop. Therefore, SFA provides a more realistic and reasonable framework for implementing real-world attacks. In reality, a perturbation on a selected sensor can be introduced by having a hacker vehicle drive by or by controlling the network communication to generate fake sensor readings. Overall, we consider SFA an essential and meaningful attack strategy that can be used to evaluate the robustness and vulnerability of different GNNs-based traffic forecasting models.
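A minimal sketch of solving Eq. (12): gradient descent on the loss toward the opposite state, with the perturbation confined to the chosen sensor's column and clipped to a scale bound. Finite differences stand in for backpropagation through a real GNN, and the names and toy model are our assumptions:

```python
import numpy as np

def sfa_perturbation(model, x, y_bar, i_star, eps, steps=50, lr=0.05, h=1e-4):
    """Sketch of Eq. (12): descend the loss toward the opposite state
    y_bar while perturbing only the column of the weakest vertex i_star,
    clipping each update to the scale bound eps."""
    def poisoned(r):
        xp = x.copy()
        xp[:, i_star] += r              # one perturbation value per time step
        return xp

    loss = lambda r: np.sum((model(poisoned(r)) - y_bar) ** 2)
    rho = np.zeros(x.shape[0])
    for _ in range(steps):
        grad = np.zeros_like(rho)
        for t in range(len(rho)):       # finite-difference gradient w.r.t. rho
            e = np.zeros_like(rho)
            e[t] = h
            grad[t] = (loss(rho + e) - loss(rho - e)) / (2 * h)
        rho = np.clip(rho - lr * grad, -eps, eps)
    return rho

toy_model = lambda seq: seq[-1]              # persistence forecaster
x = np.array([[50.0, 60.0], [55.0, 65.0]])  # T=2 time steps, N=2 sensors
y_bar = np.array([110.0, 0.0])              # opposite state of the last row
rho = sfa_perturbation(toy_model, x, y_bar, i_star=0, eps=15.0)
```

With the toy persistence model only the latest reading of the poisoned sensor matters, so the optimizer pushes that entry to the clip bound while leaving earlier time steps untouched.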
5 Evaluation and Results
In this section, we evaluate the proposed SFA framework on two traffic datasets, namely PeMS and METR-LA(S). PeMS consists of traffic speed data from 200 detectors of the Caltrans Performance Measurement System (PeMS), and METR-LA(S) registers traffic speed data from 100 detectors on the highways of Los Angeles County. These two datasets have been widely used as benchmarks to assess spatiotemporal GNN models. Our experiments are conducted on an NVIDIA Tesla V100 GPU.
We test three spatiotemporal GNNs-based forecasting models, including STGCN (Yu, Yin, and Zhu 2018), DCRNN (Li et al. 2017), and Graph WaveNet (Wu et al. 2019). Each dataset is split into three subsets: 70% for training, 10% for validation, and 20% for testing. All parameters are the same as in the original studies, except that we set the number of input and output channels to be consistent with the number of detectors/sensors. We use the validation set to locate the weakest sensor and generate SFA perturbations in real time for the test set. As for evaluation, we introduce three metrics to quantify the effectiveness of attacks:
- MAPE Increase (MAPEI): MAPE is a measure of prediction accuracy, and a smaller MAPE represents better predictions. An increase in MAPE thus translates into a decrease in prediction accuracy.
- Normalized MAPE Increase (NMAPEI): The ratio between MAPEI and the MAPE before the attack.
- $\beta$%-Impacted Vertices ($\beta$%-IV): The number of vertices with NMAPEI greater than $\beta$%.
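Under the definitions above, the three metrics can be computed as follows (the array layout and function names are our own):

```python
import numpy as np

def mape(y_true, y_pred):
    """Per-sensor MAPE over the test horizon (rows: time, cols: sensors)."""
    return np.mean(np.abs((y_pred - y_true) / y_true), axis=0)

def attack_metrics(y_true, y_clean, y_attacked, beta=0.30):
    """MAPEI, NMAPEI, and beta%-IV for one attack run."""
    m0 = mape(y_true, y_clean)      # per-sensor MAPE before the attack
    m1 = mape(y_true, y_attacked)   # per-sensor MAPE under the attack
    mapei = m1 - m0
    nmapei = mapei / m0
    iv = int(np.count_nonzero(nmapei > beta))   # beta%-impacted vertices
    return float(mapei.mean()), float(nmapei.mean()), iv

y_true = np.ones((4, 2)) * 60.0
y_clean = y_true * 1.1                       # 10% error on both sensors
y_attacked = y_true * np.array([1.3, 1.1])   # attack inflates sensor 0's error
mapei, nmapei, iv = attack_metrics(y_true, y_clean, y_attacked)
```

In this toy run only sensor 0 degrades, so exactly one vertex clears the 30% NMAPEI threshold.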
5.1 Effectiveness of Inverse Estimation
We compare the effectiveness of perturbations generated from the inverse estimation (Eqs. (6) and (7)) with the direct estimation (Eqs. (4) and (5)). It should be noted that in this experiment we apply perturbations on all sensors instead of attacking only one sensor. We conduct the analysis on 15 min-ahead traffic prediction on the PeMS data with STGCN as the base model.

Figure 3 shows the performance comparison between the proposed inverse estimation and the baseline, direct estimation. As can be seen, perturbations generated from inverse estimation cause a greater accuracy drop in the forecasting model than perturbations generated from direct estimation. Inverse estimation outperforms directly estimating the future ground truth because it feeds fewer errors into the perturbation computation. Neither strategy achieves perfect estimation, and estimation errors reduce the effectiveness of the perturbations; however, the proposed inverse estimation is a binary estimation, which is much easier than estimating a continuous value. Thus IE introduces smaller errors into the perturbation computation, which in turn leads to more effective adversarial examples. Taking PeMS for instance, the error fed by IE is small (MAPE: 0.56%), while the error fed by direct estimation is large (MAPE: 3.3%).
5.2 Experiments on Hyperparameters
We next examine the effect of hyperparameters on locating the weakest sensor. The setting is the same as in Section 5.1. We set the number of candidates $n$ to 5 and 10, respectively, and record the number of iterations $g$. We use NMAPEI and computation time to measure and compare the performance of different hyperparameter configurations.
From Figure 4, the computation time generally grows with the number of iterations $g$. When $n$ is set to 5, locating the weakest vertex is much harder than when $n$ is set to 10. A possible reason is that, with few initial candidates, the proposed strategy tends to converge to a local optimum. Note that when we set $n$ to 10, the proposed strategy exits the iteration loop early. For the following experiments, we set both $n$ and $g$ to 10.

5.3 Experiments on Effectiveness and Efficiency of Locating the Weakest Vertex
In this section, we design experiments to demonstrate the effectiveness and efficiency of different strategies in locating the weakest sensor. The setting in Section 5.1 is applied to this experiment. We compare the proposed approach with three simple baselines, including (1) locating the vertex with the highest degree (DEG), i.e., the number of connected sensors, (2) locating the vertex with the highest weighted degree centrality (CEN), i.e., row-sum of the weighted adjacency matrix, and (3) locating the weakest vertex by the complete traversal algorithm (CT). After locating the weakest vertex by different strategies, perturbations are computed based on Eq. (12) and then fed into STGCN. We evaluate different approaches using NMAPEI, 30%-IV, and computation time.
| Strategy | NMAPEI (%) | 30%-IV | time (s) |
|---|---|---|---|
| DEG | 4.5 | 3 | - |
| CEN | 3.2 | 1 | - |
| CT | 15.2 | 17 | 1795 |
| Proposed ($n$=10, $g$=10) | 15.2 | 17 | 1104 |
Table 1 shows the comparison results. As we can see, the proposed solution achieves the same optimum as CT—it identifies the same weakest sensor as the full enumeration. On the other hand, simply poisoning the vertices with the highest degree or the highest centrality does not ensure an effective attack. A possible reason is that the robustness of those sensors is improved by their neighbors through the local/spatial aggregation mechanism in GNNs. Besides, the proposed strategy reduces the computation cost by about 40% compared with CT.
5.4 Tradeoff between Attack Performance and the Attack Observability
| | STGCN | | | | DCRNN | | | | Graph Wavenet | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $\epsilon$ (km/h) | 5 | 10 | 15 | 20 | 5 | 10 | 15 | 20 | 5 | 10 | 15 | 20 |
| 5%-IV | 43 | 71 | 69 | 90 | 51 | 75 | 82 | 88 | 42 | 70 | 82 | 91 |
| 10%-IV | 14 | 52 | 61 | 82 | 17 | 59 | 65 | 71 | 16 | 32 | 52 | 73 |
| 20%-IV | 4 | 22 | 40 | 46 | 8 | 27 | 45 | 50 | 3 | 25 | 41 | 55 |
| 30%-IV | 0 | 1 | 17 | 39 | 1 | 1 | 26 | 40 | 1 | 4 | 19 | 38 |
| 40%-IV | 0 | 1 | 1 | 9 | 0 | 0 | 0 | 13 | 0 | 0 | 1 | 16 |
In this section, we examine the effect of the perturbation scale on the attack performance. The perturbation scale can be considered an indicator of attack observability: a larger perturbation is more likely to be noticed by the user of the GNNs. We first design an experiment to assess how the scale bound $\epsilon$ on the perturbation in Eq. (12) influences the effectiveness of the proposed one-vertex attack. In this subsection, 15 min traffic speed forecasting is undertaken by STGCN, DCRNN, and Graph Wavenet as the targeted models, and the experiment is conducted on METR-LA(S). These models are attacked by the proposed SFA with different $\epsilon$. Note that only one sensor/vertex is poisoned in this experiment.
Table 2 shows the number of impacted vertices (IV) for different $\epsilon$. When setting $\epsilon$ to 20 km/h, we find that around 10% of sensors have an NMAPEI greater than 40%, and 90% of sensors show at least a 5% increase in NMAPEI. This suggests that, with a large $\epsilon$, the whole network can be severely impacted even when attacking only one sensor. With a small $\epsilon$, about 50% of sensors are influenced by at least 5% NMAPEI. Based on Table 2, we can conclude that perturbations are effectively diffused from one vertex to most of the graph in GNNs-based spatiotemporal forecasting models: the greater the perturbation, the larger the number of sensors in the graph that are influenced. Our analysis also suggests that setting $\epsilon$ to an appropriate range is important. An extremely large $\epsilon$, which corresponds to abnormal driving behavior in the traffic domain, can be detected easily by the user of the GNNs. By analyzing PeMS and METR-LA(S), we find that speed variations within 15 km/h occur frequently; consequently, we regard the accessible boundary of speed variation as 15 km/h, namely $\epsilon = 15$ km/h.
| | STGCN | | DCRNN | | Graph Wavenet | |
|---|---|---|---|---|---|---|
| | NMAPEI | 30%-IV | NMAPEI | 30%-IV | NMAPEI | 30%-IV |
| SFA | 15.2% | 17 | 16.7% | 22 | 15.5% | 21 |
| GWN | 1.7% | 0 | 2.3% | 0 | 2.1% | 0 |
| DEG | 4.5% | 3 | 4.7% | 3 | 5.7% | 2 |
| MFGSM ($\epsilon$=2) | 15.4% | - | 15.6% | - | 16.2% | - |
| MFGSM ($\epsilon$=3) | 27.3% | - | 24.4% | - | 25.8% | - |
5.5 Effectiveness of Spatially Focused Attack
We set the parameters and hyperparameters in SFA based on the aforementioned experimental results. Finally, we conduct experiments on PeMS to show the overall effectiveness of the proposed method. We perform 15 min-ahead traffic speed forecasting using the three mentioned GNNs and compare the proposed SFA with the following attack baselines.
- GWN: Generate Gaussian white noise (GWN) with the scale consistent with $\epsilon$ in SFA, and attack the same weakest sensor. This baseline is used to evaluate the effectiveness of the perturbation design in Eq. (12).
- DEG: We simply consider the sensor with the largest number of neighbors as the “weakest” and attack it using the same optimization algorithm as in SFA. This baseline is designed to examine whether the proposed locating strategy can effectively identify the weakest sensor (or a suboptimal one that also works well).
- MFGSM: Attack all sensors with a modified Fast Gradient Sign Method (FGSM) (Szegedy et al. 2015; Moosavi-Dezfooli, Fawzi, and Frossard 2016), in which we replace the ground truth in the original FGSM with the proposed inverse estimation of Eq. (7). The modified FGSM perturbation is computed as

  $$\rho = -\epsilon \cdot \mathrm{sign}\big( \nabla_{\mathcal{X}} \mathcal{L}\big( f(\mathcal{X}; \theta),\, \bar{Y} \big) \big) \qquad (13)$$

  where $\nabla_{\mathcal{X}} \mathcal{L}$ computes the gradient of the cost function of the forecasting model parameterized by $\theta$ w.r.t. the input sequence $\mathcal{X}$, $\mathrm{sign}(\cdot)$ denotes the sign function, $\bar{Y}$ denotes the inverse estimation of the ground truth, and $\epsilon$ controls the perturbation’s scale. This baseline is chosen to compare the targeted attack in SFA with manipulating all sensors. We set the scale parameter $\epsilon$ to 2 and 3, respectively.
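For intuition, Eq. (13) can be sketched as follows; a central finite-difference approximation replaces backpropagation through the model, and the persistence model, names, and shapes are our own stand-ins for the targeted GNNs:

```python
import numpy as np

def mfgsm(model, x, y_bar, eps, h=1e-4):
    """Modified FGSM per Eq. (13): the ground truth is replaced by the
    inverse estimate y_bar, and the step is taken toward y_bar (hence the
    minus sign). Central finite differences approximate the input gradient."""
    loss = lambda inp: np.sum((model(inp) - y_bar) ** 2)
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    for _ in it:
        idx = it.multi_index
        xp, xm = x.copy(), x.copy()
        xp[idx] += h
        xm[idx] -= h
        grad[idx] = (loss(xp) - loss(xm)) / (2 * h)
    return -eps * np.sign(grad)

toy_model = lambda seq: seq[-1]                 # persistence forecaster
x_in = np.array([[50.0, 60.0], [55.0, 65.0]])  # T=2 time steps, N=2 sensors
pert = mfgsm(toy_model, x_in, y_bar=np.array([110.0, 0.0]), eps=2.0)
```

With the persistence stand-in, only the latest time step has a nonzero gradient: sensor 0 is pushed up toward its high opposite state and sensor 1 down toward its low one, each by exactly $\epsilon$.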




In these experiments, we set $\epsilon$ to 15 km/h for the methods that attack only one sensor (i.e., SFA, GWN, and DEG). Table 3 shows the experiment results, which confirm the superior performance of the proposed SFA framework. We can see that SFA outperforms attacking the vertex with the most edges (DEG), showing that SFA can effectively identify a weak sensor. SFA also outperforms GWN, which attacks the same weakest sensor, confirming that the proposed method can generate optimized perturbations for manipulating only one vertex to fool the spatiotemporal forecasting model across the entire graph.
Fig. 5 shows an example of attack results on PeMS using STGCN as the forecasting model. We apply SFA to attack sensor A with optimized/designed perturbations (see Fig. 5(b)). In Figs. 5(c) and 5(d), we show the results on sensors B and C, including the ground truth traffic speed, the default forecasting of STGCN without attacks, and the forecasting results under the perturbations applied to sensor A. As can be seen, the attack on sensor A causes a substantial accuracy drop on sensor B, while sensor C is less affected. A potential reason is that A and B are connected on the highway and thus have strong dependencies, while the forecasting of sensor C might be mainly determined by its local neighbors in the GNNs. Overall, for traffic forecasting on PeMS, the proposed SFA framework causes more than a 15% accuracy drop for all three spatiotemporal GNN models, and about 10% of sensors are severely impacted (their NMAPEI is greater than 30%) when setting $\epsilon$ to 15 km/h.
Nevertheless, it should be noted that attacking all sensors will always be more effective than attacking only one vertex; however, the results in Table 3 show that the one-vertex-based SFA offers performance comparable to MFGSM, which attacks all sensors, at the smaller scale setting (both scale parameters control the level of perturbation). Overall, the above experiments confirm the effectiveness of SFA. The perturbation can be diffused into the entire graph, and even predictions on sensors far from the attacked one can be severely influenced.
6 Conclusion
In this paper, we propose the Spatially Focused Attack (SFA) to break GNNs-based spatiotemporal forecasting models by poisoning only one vertex/sensor. SFA consists of three key components: using inverse estimation to effectively design a universal perturbation, identifying the most vulnerable sensor based on its “weakness”, and redesigning perturbations for the selected sensor. Different from other attack studies, SFA does not require future information in computing the optimal perturbations. Our experiments on two real-world traffic datasets demonstrate the effectiveness of single sensor-based attacks. One direction for future research is to reformulate the white-box attack to generate black-box perturbations. Given that SFA has shown that attacking a single sensor can cause network-wide disruption, how to defend real-world forecasting systems and make them more robust is an urgent research question for agencies and practitioners.
References
- Akhtar and Mian (2018) Akhtar, N.; and Mian, A. 2018. Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey. arXiv preprint arXiv:1801.00553.
- Alfeld, Zhu, and Barford (2016) Alfeld, S.; Zhu, X.; and Barford, P. 2016. Data Poisoning Attacks against Autoregressive Models. In AAAI.
- Bai, Kolter, and Koltun (2018) Bai, S.; Kolter, J. Z.; and Koltun, V. 2018. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv preprint arXiv:1803.01271.
- Bruna et al. (2014) Bruna, J.; Zaremba, W.; Szlam, A.; and LeCun, Y. 2014. Spectral Networks and Deep Locally Connected Networks on Graphs. In ICLR.
- Chang et al. (2020) Chang, H.; Rong, Y.; Xu, T.; Huang, W.; Zhang, H.; Cui, P.; Zhu, W.; and Huang, J. 2020. A Restricted Black-box Adversarial Framework Towards Attacking Graph Embedding Models. In AAAI.
- Chen et al. (2020) Chen, C. H.; Wu, L.; Kong, C.; Hao, X.; and Chen, W. 2020. A Short-Term Load Forecasting Method Based on GRU-CNN Hybrid Neural Network Model. Mathematical Problems in Engineering.
- Chen, Tan, and Zhang (2019) Chen, Y.; Tan, Y.; and Zhang, B. 2019. Exploiting Vulnerabilities of Load Forecasting Through Adversarial Attacks. In ACM ICFES.
- Dai, Li, and Tian (2018) Dai, H.; Li, H.; and Tian, T. 2018. Adversarial Attacks on Graph Structured Data. In ICML.
- Fang et al. (2019) Fang, S.; Zhang, Q.; Meng, G.; Xiang, S.; and Pan, C. 2019. GSTNet: Global Spatial-Temporal Network for Traffic Flow Prediction. In IJCAI, 2286–2293.
- Goodfellow, Shlens, and Szegedy (2014) Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2014. Explaining and Harnessing Adversarial Examples. arXiv preprint arXiv:1412.6572.
- Guo et al. (2019) Guo, S.; Lin, Y.; Feng, N.; Song, C.; and Wan, H. 2019. Attention Based Spatial-Temporal Graph Convolutional Networks for Traffic Flow Forecasting. In AAAI.
- Hu and Tan (2017) Hu, W.; and Tan, Y. 2017. Black-Box Attacks against RNN based Malware Detection Algorithms. arXiv preprint arXiv:1705.08131.
- Karim, Majumdar, and Darabi (2019) Karim, F.; Majumdar, S.; and Darabi, H. 2019. Adversarial Attacks on Time Series. arXiv preprint arXiv:1902.10755.
- Kurakin, Goodfellow, and Bengio (2016a) Kurakin, A.; Goodfellow, I.; and Bengio, S. 2016a. Adversarial Machine Learning at Scale. arXiv preprint arXiv:1611.01236.
- Kurakin, Goodfellow, and Bengio (2016b) Kurakin, A.; Goodfellow, I. J.; and Bengio, S. 2016b. Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.
- Li et al. (2017) Li, Y.; Yu, R.; Shahabi, C.; and Liu, Y. 2017. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv preprint arXiv:1707.01926.
- Moosavi-Dezfooli, Fawzi, and Frossard (2016) Moosavi-Dezfooli, S.-M.; Fawzi, A.; and Frossard, P. 2016. Deepfool: a simple and accurate method to fool deep neural networks. In CVPR.
- Papernot et al. (2016a) Papernot, N.; McDaniel, P. D.; Goodfellow, I. J.; Jha, S.; Celik, Z. B.; and Swami, A. 2016a. Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples. arXiv preprint arXiv:1602.02697.
- Papernot et al. (2016b) Papernot, N.; McDaniel, P. D.; Swami, A.; and Harang, R. E. 2016b. Crafting Adversarial Input Sequences for Recurrent Neural Networks. arXiv preprint arXiv:1604.08275.
- Rosenberg et al. (2019) Rosenberg, I.; Shabtai, A.; Elovici, Y.; and Rokach, L. 2019. Defense Methods Against Adversarial Examples for Recurrent Neural Networks. arXiv preprint arXiv:1901.09963.
- Shuman et al. (2013) Shuman, D. I.; Narang, S. K.; Frossard, P.; Ortega, A.; and Vandergheynst, P. 2013. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine.
- Su, Vargas, and Kouichi (2019) Su, J.; Vargas, D. V.; and Kouichi, S. 2019. One Pixel Attack for Fooling Deep Neural Networks. IEEE Transactions on Evolutionary Computation.
- Szegedy et al. (2015) Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; and Rabinovich, A. 2015. Going deeper with convolutions. In CVPR.
- Tang et al. (2020) Tang, X.; Li, Y.; Sun, Y.; Yao, H.; Mitra, P.; and Wang, S. 2020. Transferring Robustness for Graph Neural Network Against Poisoning Attacks. In WSDM.
- Tramèr et al. (2017) Tramèr, F.; Kurakin, A.; Papernot, N.; Goodfellow, I.; Boneh, D.; and McDaniel, P. 2017. Ensemble Adversarial Training: Attacks and Defenses. arXiv preprint arXiv:1705.07204.
- Wu et al. (2019) Wu, Z.; Pan, S.; Long, G.; Jiang, J.; and Zhang, C. 2019. Graph WaveNet for Deep Spatial-Temporal Graph Modeling. In IJCAI.
- You et al. (2020) You, Y.; Chen, T.; Sui, Y.; Chen, T.; Wang, Z.; and Shen, Y. 2020. Graph contrastive learning with augmentations. In Advances in Neural Information Processing Systems.
- Yu, Yin, and Zhu (2018) Yu, B.; Yin, H.; and Zhu, Z. 2018. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In IJCAI.
- Zhang and Zitnik (2020) Zhang, X.; and Zitnik, M. 2020. GNNGuard: Defending graph neural networks against adversarial attacks. In NeurIPS.
- Zhou et al. (2019) Zhou, X.; Li, Y.; Barreto, C. A.; Volgyesi, P.; and Koutsoukos, X. 2019. Load forecasting with adversarial attacks in power systems using DeepForge. In ASHTSS.
- Zügner and Günnemann (2019) Zügner, D.; and Günnemann, S. 2019. Adversarial Attacks on Graph Neural Networks via Meta Learning. In ICLR.