
High-Confidence Attack Detection via Wasserstein-Metric Computations

Dan Li and Sonia Martínez

D. Li and S. Martínez are with the Department of Mechanical and Aerospace Engineering, University of California San Diego, La Jolla, CA 92092, USA; lidan@ucsd.edu; soniamd@ucsd.edu. This research was developed with funding from ONR N00014-19-1-2471 and AFOSR FA9550-19-1-0235.
Abstract

This paper considers a sensor attack and fault detection problem for linear cyber-physical systems, which are subject to system noise that can obey an unknown light-tailed distribution. We propose a new threshold-based detection mechanism that employs the Wasserstein metric, and which guarantees system performance with high confidence employing a finite number of measurements. The proposed detector may generate false alarms with a rate Δ\Delta in normal operation, where Δ\Delta can be tuned to be arbitrarily small by means of a benchmark distribution which is part of our mechanism. Thus, the proposed detector is sensitive to sensor attacks and faults which have a statistical behavior that is different from that of the system noise. We quantify the impact of stealthy attacks—which aim to perturb the system operation while producing false alarms that are consistent with the natural system noise—via a probabilistic reachable set. To enable tractable implementation of our methods, we propose a linear optimization problem that computes the proposed detection measure and a semidefinite program that produces the proposed reachable set.

I Introduction

Cyber-Physical Systems (CPS) are physical processes that are tightly integrated with computation and communication systems for monitoring and control. Examples include critical infrastructure, such as transportation networks, energy distribution systems, and the Internet. These systems are usually complex, large-scale and insufficiently supervised, making them vulnerable to attacks [1, 2]. A significant literature has studied various denial-of-service [3], false data-injection [4, 5], replay [6, 7], sensor, and integrity attacks [8, 9, 10, 11]. The majority of these works study attack-detection problems in a control-theoretical framework. This approach essentially employs detectors to identify abnormal behaviors by comparing estimates and measurements under some predefined metrics. However, attacks can be stealthy, and exploit knowledge of the system structure, uncertainty, and noise information to inflict significant damage on the physical system while avoiding detection. This motivates the characterization of the impact of stealthy attacks via, e.g., reachable-set analysis [9, 12, 13]. To ensure computational tractability, these works assume either Gaussian or bounded system noise. However, these assumptions fall short in modeling all natural disturbances that can affect a system. Such systems would be vulnerable to stealthy attacks that disguise themselves via an intentionally selected, unbounded and non-Gaussian distribution. When designing detectors, an added difficulty is in obtaining tractable computations that can handle these more general distributions. More recently, novel measure-of-concentration results have opened the way for online, tractable, and robust attack detection with probabilistic guarantees under uncertainty. A first attempt in this direction is [14], where a Chebyshev inequality is used to design a detector and to assess the impact of stealthy attacks, under the assumption that the system noise is bounded. With the aim of obtaining a less conservative detection mechanism, we leverage an alternative measure-concentration result via the Wasserstein metric. This metric is built from data gathered on the system, and can provide concentration results significantly sharper than those obtained via the Chebyshev inequality. In particular, we address the following question for linear CPS: How can we design an online attack-detection mechanism that is robust to light-tailed distributions of the system noise while remaining sensitive to attacks and limiting the impact of stealthy attacks?

To answer this question, we consider a sensor-attack detection problem on a linear dynamical system. The linear system models a remotely-observed process that is subject to an additive noise described by an unknown, not-necessarily bounded, light-tailed distribution. To identify abnormal behavior, we employ a steady-state Kalman filter as well as a benchmark distribution constructed from an offline residual sequence corresponding to normal system operation. In this framework, we propose a novel detection mechanism that achieves online and robust detection of attacks with high confidence.

Statement of Contributions: 1) We propose a novel detection measure, which employs the Wasserstein distance between the benchmark distribution and a distribution of the residual sequence obtained online. 2) We propose a novel threshold-detection mechanism, which exploits measure-of-concentration results to guarantee the robust detection of an attack with high confidence using a finite set of data, and which further enables the robust tuning of the false alarm rate. The proposed detector can effectively identify real-time attacks when their behavior differs from that of the system noise. In addition, the detector can handle system noise that is not necessarily Gaussian. 3) We propose a quantifiable, probabilistic state-reachable set, which reveals the impact of the stealthy attacker and the system noise with high probability. 4) To implement and analyze the proposed mechanism, we formulate a linear optimization problem and a semidefinite program for the computation of the detection measure and the reachable set, respectively. We illustrate our methods in a two-dimensional linear system with irregular noise distributions and stealthy sensor attacks.

II CPS and Its Normal Operation

This section presents the cyber-physical system (CPS) model and the underlying assumptions on the system and attacks.

Figure 1: Cyber-Physical System Diagram.

A remotely-observed, cyber-physical system subject to sensor-measurement attacks, as in Fig. 1, is described as a discrete-time, stochastic, linear, and time-invariant system

𝒙(t+1)\displaystyle\boldsymbol{x}(t+1) =A𝒙(t)+B𝒖(t)+𝒘(t),\displaystyle=\;A\boldsymbol{x}(t)+B\boldsymbol{u}(t)+\boldsymbol{w}(t), (1)
𝒚(t)\displaystyle\boldsymbol{y}(t) =C𝒙(t)+𝒗(t)+𝜸(t),\displaystyle=\;C\boldsymbol{x}(t)+\boldsymbol{v}(t)+\boldsymbol{\gamma}(t),

where 𝒙(t)n\boldsymbol{x}(t)\in^{n}, 𝒖(t)m\boldsymbol{u}(t)\in^{m} and 𝒚(t)p\boldsymbol{y}(t)\in^{p} denote the system state, input and output at time tt\in\mathbb{N}, respectively. The state matrix AA, input matrix BB and output matrix CC are assumed to be known in advance. In particular, we assume that the pair (A,B)(A,B) is stabilizable, and (A,C)(A,C) is detectable. The process noise 𝒘(t)n\boldsymbol{w}(t)\in^{n} and output noise 𝒗(t)p\boldsymbol{v}(t)\in^{p} are independent zero-mean random vectors. We assume that each 𝒘(t)\boldsymbol{w}(t) and 𝒗(t)\boldsymbol{v}(t) are independent and identically distributed (i.i.d.) over time. We denote their (unknown, not-necessarily equal) distributions by 𝒘\mathbb{P}_{\boldsymbol{w}} and 𝒗\mathbb{P}_{\boldsymbol{v}}, respectively. In addition, we assume that 𝒘\mathbb{P}_{\boldsymbol{w}} and 𝒗\mathbb{P}_{\boldsymbol{v}} are light-tailed111 For a random vector 𝒘\boldsymbol{w} such that 𝒘\boldsymbol{w}\sim\mathbb{P}, we say \mathbb{P} is qq-light-tailed, q=1,2,q=1,2,\ldots, if c:=𝔼[exp(b𝒘a)]<c:=\mathbb{E}_{\mathbb{P}}[\exp(b\|\boldsymbol{w}\|^{a})]<\infty for some a>qa>q and b>0b>0. All examples listed have a moment generating function, so their exponential moment can be constructed for at least q=1q=1., excluding scenarios of systems operating under extreme events, or subject to large delays. In fact, Gaussian, Sub-Gaussian, Exponential distributions, and any distribution with a compact support set are admissible. This class of distributions is sufficient to characterize the uncertainty or noise of many practical problems.

An additive sensor-measurement attack is implemented via 𝜸(t)p\boldsymbol{\gamma}(t)\in^{p} in (1), on which we make the following assumption.

Assumption 1 (Attack model): It holds that 1) 𝜸(t)=𝟎\boldsymbol{\gamma}(t)=\boldsymbol{0} whenever there is no attack; 2) the attacker can modulate any component of 𝜸(t)\boldsymbol{\gamma}(t) at any time; 3) the attacker has unlimited computational resources and access to system information, e.g., AA, BB, CC, 𝒖\boldsymbol{u}, 𝒘\mathbb{P}_{\boldsymbol{w}} and 𝒗\mathbb{P}_{\boldsymbol{v}}, to decide on 𝜸(t)\boldsymbol{\gamma}(t), tt\in\mathbb{N}.

II-A Normal System Operation

In what follows, we introduce the state observer that enables prediction in the absence of attacks (when 𝜸(t)=𝟎\boldsymbol{\gamma}(t)=\boldsymbol{0}). Since the distribution of system noise is unknown, we identify a benchmark distribution that can be used to characterize this unknown distribution with high confidence.

To predict (estimate) the system behavior, we leverage the system information (A,B,C)(A,B,C) and employ a Kalman filter

𝒙^(t+1)\displaystyle\hat{\boldsymbol{x}}(t+1) =A𝒙^(t)+B𝒖(t)+L(t)(𝒚(t)𝒚^(t)),\displaystyle=\;A\hat{\boldsymbol{x}}(t)+B\boldsymbol{u}(t)+L(t)\left(\boldsymbol{y}(t)-\hat{\boldsymbol{y}}(t)\right),
𝒚^(t)\displaystyle\hat{\boldsymbol{y}}(t) =C𝒙^(t),\displaystyle=\;C\hat{\boldsymbol{x}}(t),

where 𝒙^(t)\hat{\boldsymbol{x}}(t) is the state estimate and L(t)LL(t)\equiv L is the steady-state Kalman gain matrix. As the pair (A,C)(A,C) is detectable, the gain LL can be selected in such a way that the eigenvalues of ALCA-LC are inside the unit circle. This ensures asymptotic tracking of the state in expectation; that is, the expectation of the estimation error 𝒆(t):=𝒙(t)𝒙^(t)\boldsymbol{e}(t):=\boldsymbol{x}(t)-\hat{\boldsymbol{x}}(t) satisfies

𝔼[𝒆(t)]0 as t, for any 𝒙(0),𝒙^(0).\mathbb{E}[\boldsymbol{e}(t)]\rightarrow 0\textrm{ as }t\rightarrow\infty,\;\textrm{ for any }\boldsymbol{x}(0),\;\hat{\boldsymbol{x}}(0).

The further selection of eigenvalues of ALCA-LC and the structure of LL usually depends on additional control objectives such as noise attenuation requirements. In this paper, we additionally consider the estimated state feedback

𝒖(t)=K𝒙^(t),\boldsymbol{u}(t)=K\hat{\boldsymbol{x}}(t),

where KK is such that the following augmented system is stable222System (2) is input-to-state stable in probability (ISSp) relative to any compact set 𝒜\mathcal{A} which contains the origin, if we select KK such that the eigenvalues of the matrix A+BKA+BK are inside the unit circle, see e.g. [15].

𝝃(t+1)=F𝝃(t)+G𝝈(t),\boldsymbol{\xi}(t+1)=F\boldsymbol{\xi}(t)+G\boldsymbol{\sigma}(t), (2)

where 𝝃(t):=(𝒙(t),𝒆(t))\boldsymbol{\xi}(t):={\left({\boldsymbol{x}}(t),\boldsymbol{e}(t)\right)}^{\top}, 𝝈(t):=(𝒘(t),𝒗(t)+𝜸(t))\boldsymbol{\sigma}(t):={\left(\boldsymbol{w}(t),\boldsymbol{v}(t)+\boldsymbol{\gamma}(t)\right)}^{\top},

F=[A+BKBK0ALC],G=[I0IL] and some 𝝃(0).F=\begin{bmatrix}A+BK&-BK\\ 0&A-LC\end{bmatrix},G=\begin{bmatrix}I&0\\ I&-L\end{bmatrix}\textrm{ and some }\boldsymbol{\xi}(0).
Remark II.1 (Selection of LL and KK).

In general, the selection of the matrices LL and KK for the system (1) is a nontrivial task, especially when certain performance criteria are to be satisfied, such as fast system response, energy conservation, or noise minimization. However, there are a few scenarios in which the Separation Principle can be invoked for a tractable design of LL and KK. For example, 1) when there is no system noise, the matrices LL and KK can be designed separately, such that A+BKA+BK and ALCA-LC each have all their eigenvalues inside the unit circle; 2) when the noise is Gaussian, the gain matrices LL and KK can be designed to minimize the steady-state covariance matrix and the control effort, via a separate design of a Kalman filter (as an observer) and a linear-quadratic regulator (as a controller). The resulting design is referred to as Linear-Quadratic-Gaussian (LQG) control [16].

Consider that the system initially operates normally with a proper selection of LL and KK, and assume that the augmented system (2) is in steady state, i.e., 𝔼[𝝃(t)]=𝟎\mathbb{E}[\boldsymbol{\xi}(t)]=\boldsymbol{0}. In order to design an attack detector later, we need a characterization of the distribution of the residue 𝒓(t)p\boldsymbol{r}(t)\in^{p}, which measures the difference between what we measure and what we expect to receive, as follows

𝒓(t):=\displaystyle\boldsymbol{r}(t)= 𝒚(t)𝒚^(t)=C𝒆(t)+𝒗(t)+𝜸(t).\displaystyle\;\boldsymbol{y}(t)-\hat{\boldsymbol{y}}(t)=C\boldsymbol{e}(t)+\boldsymbol{v}(t)+\boldsymbol{\gamma}(t).

When there is no attack, it can be verified that the random vector 𝒓(t)\boldsymbol{r}(t) is zero-mean and light-tailed333This can be checked from the definition in the footnote 1, and the fact that 𝒓(t)\boldsymbol{r}(t) is a linear combination of zero-mean qq-light-tailed distributions., and we denote its unknown distribution by 𝒓\mathbb{P}_{\boldsymbol{r}}. We assume that a finite but large number NN of i.i.d. samples of 𝒓\mathbb{P}_{\boldsymbol{r}} are accessible, acquired by collecting 𝒓(t)\boldsymbol{r}(t) over a sufficiently long time window. We call these i.i.d. samples a benchmark data set, ΞB:={𝒓(i)}i=1N\Xi_{\operatorname{B}}:=\{\boldsymbol{r}^{(i)}\}_{i=1}^{N}, and construct the resulting empirical distribution 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} by

𝒓,B:=1Ni=1Nδ{𝒓(i)},\mathbb{P}_{\boldsymbol{r},\operatorname{B}}:=\frac{1}{N}\sum_{i=1}^{N}\delta_{\{\boldsymbol{r}^{(i)}\}},

where δ{𝒓}\delta_{\{\boldsymbol{r}\}} denotes the Dirac measure (unit point mass) at 𝒓\boldsymbol{r}, and the subscript B\operatorname{B} indicates that 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} is the benchmark used to approximate the unknown 𝒓\mathbb{P}_{\boldsymbol{r}}. We claim that the benchmark 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} provides a good characterization of the effect of the noise on the system (2), via the following measure-concentration result.

Theorem II.1 (Measure Concentration [17, Application of Theorem 2]).

If 𝐫\mathbb{P}_{\boldsymbol{r}} is a qq-light-tailed distribution for some q1q\geq 1, then for a given β(0,1)\beta\in(0,1), the following holds

Prob(dW,q(𝒓,B,𝒓)ϵB)1β,{\operatorname{Prob}}\left(d_{W,q}(\mathbb{P}_{\boldsymbol{r},\operatorname{B}},\mathbb{P}_{\boldsymbol{r}})\leq\epsilon_{\operatorname{B}}\right)\geq 1-\beta,

where Prob\operatorname{Prob} denotes the Probability, dW,qd_{W,q} denotes the qq-Wasserstein metric444 Let q(𝒵){\mathcal{M}_{q}}(\mathcal{Z}) denote the space of all qq-light-tailed probability distributions supported on 𝒵p\mathcal{Z}\subset^{p}. Then for any two distributions 1\mathbb{Q}_{1}, 2q(𝒵)\mathbb{Q}_{2}\in\mathcal{M}_{q}(\mathcal{Z}), the qq-Wasserstein metric [18] dW,q:q(𝒵)×q(𝒵)0d_{W,q}:\mathcal{M}_{q}(\mathcal{Z})\times\mathcal{M}_{q}(\mathcal{Z})\rightarrow\mathbb{R}_{\geq 0} is defined by dW,q(1,2):=(minΠ𝒵×𝒵q(ξ1,ξ2)Π(dξ1,dξ2))1/q,d_{W,q}(\mathbb{Q}_{1},\mathbb{Q}_{2}):=(\min\limits_{\Pi}\int_{\mathcal{Z}\times\mathcal{Z}}\ell^{q}(\xi_{1},\xi_{2})\Pi(d\xi_{1},d\xi_{2}))^{1/q}, where Π\Pi is in a set of all the probability distributions on 𝒵×𝒵\mathcal{Z}\times\mathcal{Z} with marginals 1\mathbb{Q}_{1} and 2\mathbb{Q}_{2}. The cost (ξ1,ξ2):=ξ1ξ2\ell(\xi_{1},\xi_{2}):=\|\xi_{1}-\xi_{2}\| is a norm on 𝒵\mathcal{Z}. in the probability space, and the parameter ϵB\epsilon_{\operatorname{B}} is selected as

ϵB:={(log(c1β1)c2N)q/a,ifN<log(c1β1)c2,ϵ¯,ifNlog(c1β1)c2,\epsilon_{\operatorname{B}}:=\left\{{\begin{array}[]{*{10}{l}}\left(\frac{\log(c_{1}\beta^{-1})}{c_{2}N}\right)^{q/a},&\textrm{if}\;N<\frac{\log(c_{1}\beta^{-1})}{c_{2}},\\ \bar{\epsilon},&\textrm{if}\;N\geq\frac{\log(c_{1}\beta^{-1})}{c_{2}},\end{array}}\right. (3)

for some constant555The parameter aa is determined as in the definition of 𝐫\mathbb{P}_{\boldsymbol{r}} and the constants c1c_{1}, c2c_{2} depend on qq, mm, and 𝐫\mathbb{P}_{\boldsymbol{r}} (via aa, bb, cc). When information on 𝐫\mathbb{P}_{\boldsymbol{r}} is absent, the parameters aa, c1c_{1} and c2c_{2} can be determined in a data-driven fashion using sufficiently many samples of 𝐫\mathbb{P}_{\boldsymbol{r}}. See [17] for details. a>qa>q, c1c_{1}, c2>0c_{2}>0, and ϵ¯\bar{\epsilon} is such that

ϵ¯log(2+1/ϵ¯)=(log(c1β1)c2N)1/2, if p=2q,\small\small\frac{\bar{\epsilon}}{\log(2+1/\bar{\epsilon})}=\left(\frac{\log(c_{1}\beta^{-1})}{c_{2}N}\right)^{1/2},\textrm{ if }p=2q,

or

ϵ¯=(log(c1β1)c2N)1/max{2,p/q}, if p2q,\small\bar{\epsilon}=\left(\frac{\log(c_{1}\beta^{-1})}{c_{2}N}\right)^{1/\max\{2,p/q\}},\textrm{ if }p\neq 2q,

where pp is the dimension of 𝐫\boldsymbol{r}. \square

Theorem II.1 provides a probabilistic bound ϵB\epsilon_{\operatorname{B}} on the qq-Wasserstein distance between 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} and 𝒓\mathbb{P}_{\boldsymbol{r}}, with a confidence of at least 1β1-\beta. The result indicates how to tune the parameter β\beta and the number of benchmark samples NN that are needed to find a sufficiently good approximation of 𝒓\mathbb{P}_{\boldsymbol{r}} by means of 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}}. In this way, given a fixed ϵ\epsilon, we can increase our confidence (1β1-\beta) that 𝒓\mathbb{P}_{\boldsymbol{r}} and 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} are within distance ϵ\epsilon by increasing the number of samples. We assume that 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} has been determined in advance, selecting a single, very large NN so that the bound ϵB\epsilon_{\operatorname{B}} remains very small for a range of values of β\beta. Later, we discuss how the parameter β\beta can be interpreted as a false alarm rate in the proposed attack detector. The resulting 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}}, with a tunable false alarm rate (depending on β\beta), will allow us to design a detection procedure which is robust to the system noise.
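As an illustration of (3), the following Python sketch computes the radius ϵB\epsilon_{\operatorname{B}} given (N,β)(N,\beta) and the tail parameters (a,c1,c2)(a,c_{1},c_{2}); the function name and the use of a numerical root-finder for the case p=2qp=2q are our own choices and not part of the paper.

```python
import numpy as np
from scipy.optimize import brentq

def concentration_radius(M, beta, a, c1, c2, q=1, p=1):
    """Radius eps such that Prob(d_{W,q}(empirical, true) <= eps) >= 1 - beta,
    following the two regimes of (3); M is the number of i.i.d. samples."""
    rho = np.log(c1 / beta) / (c2 * M)
    if M < np.log(c1 / beta) / c2:           # small-sample regime of (3)
        return rho ** (q / a)
    if p == 2 * q:                           # eps_bar solves eps / log(2 + 1/eps) = sqrt(rho)
        return brentq(lambda e: e / np.log(2.0 + 1.0 / e) - np.sqrt(rho), 1e-12, 1e6)
    return rho ** (1.0 / max(2.0, p / q))    # large-sample regime, p != 2q

# example: benchmark radius with the parameters used later in Section V
eps_B = concentration_radius(M=10**3, beta=0.01, a=1.5, c1=1.84e6, c2=12.5, q=1, p=1)
```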

III Threshold-based robust detection of attacks, and stealthiness

This section presents our online detection procedure, and a threshold-based detector with high-confidence performance guarantees. Then, we propose a tractable computation of the detection measure used for online detection. We finish the section by introducing a class of stealthy attacks.

Online Detection Procedure (ODP): At each time tTt\geq T, we construct a TT-step detector distribution

𝒓,D:=1Tj=0T1δ{𝒓(tj)},\mathbb{P}_{\boldsymbol{r},\operatorname{D}}:=\frac{1}{T}\sum_{j=0}^{T-1}\delta_{\{\boldsymbol{r}(t-j)\}},

where 𝒓(tj)\boldsymbol{r}(t-j) is the residue data collected independently at time tjt-j, for j{0,,T1}j\in\{{0},\dots,{T-1}\}. Then with a given qq and a threshold α>0\alpha>0, we consider the detection measure

z(t):=dW,q(𝒓,B,𝒓,D),z(t):=d_{W,q}(\mathbb{P}_{\boldsymbol{r},\operatorname{B}},\mathbb{P}_{\boldsymbol{r},\operatorname{D}}), (4)

and the attack detector

{z(t)α, no alarm at t:Alarm(t)=0,z(t)>α, alarm at t:Alarm(t)=1,\begin{cases}z(t)\leq\alpha,&\textrm{ no alarm at }t:\operatorname{Alarm}(t)=0,\\ z(t)>\alpha,&\textrm{ alarm at }t:\operatorname{Alarm}(t)=1,\end{cases} (5)

with Alarm(t)\operatorname{Alarm}(t) the sequence of alarms generated online based on the previous threshold.

The distribution 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} uses a small number TT of samples to ensure the online computational tractability of z(t)z(t), so 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} is highly dependent on instantaneous samples. Thus, 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} may significantly deviate from the true 𝒓\mathbb{P}_{\boldsymbol{r}}, and from 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}}. Therefore, even if there is no attack, the attack detector is expected to generate false alarms due to the system noise as well as an improper selection of the threshold α\alpha. In the following, we discuss how to select an α\alpha that is robust to the system noise and which results in a desired false alarm rate. Note that the value α\alpha should be small to be able to distinguish attacks from noise, as discussed later.

Lemma III.1 (Selection of α\alpha for Robust Detectors).

Given parameters NN, TT, qq, β\beta, and a desired false alarm rate Δ>β\Delta>\beta at time tt, select the threshold α\alpha as

α:=ϵB+ϵD,\alpha:=\epsilon_{\operatorname{B}}+\epsilon_{\operatorname{D}},

where ϵB\epsilon_{\operatorname{B}} is chosen as in (3) and ϵD\epsilon_{\operatorname{D}} is selected following the ϵB\epsilon_{\operatorname{B}}-formula (3), but with TT and Δβ1β\frac{\Delta-\beta}{1-\beta} in place of NN and β\beta, respectively. Then, the detection measure (4) satisfies

Prob(z(t)α)1Δ,{\operatorname{Prob}}\left(z(t)\leq\alpha\right)\geq 1-\Delta,

for any zero-mean qq-light-tailed underlying distribution 𝐫\mathbb{P}_{\boldsymbol{r}}.

Due to space limits, please see the arXiv version [19] for the proofs.

Proof.

The proof leverages the triangle inequality

z(t)dW,q(𝒓,B,𝒓)+dW,q(𝒓,D,𝒓),z(t)\leq d_{W,q}(\mathbb{P}_{\boldsymbol{r},\operatorname{B}},\mathbb{P}_{\boldsymbol{r}})+d_{W,q}(\mathbb{P}_{\boldsymbol{r},\operatorname{D}},\mathbb{P}_{\boldsymbol{r}}),

the measure concentration result for each dW,qd_{W,q} term, and that samples of 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} and 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} are collected independently.

Lemma III.1 ensures that the false alarm rate is no higher than Δ\Delta when there is no attack, i.e.,

Prob(Alarm(t)=1|noattack)Δ,t.\operatorname{Prob}(\operatorname{Alarm}(t)=1\;|\;\operatorname{no\;attack})\leq\Delta,\quad\forall\;t.

Note that the rate Δ\Delta can be selected by properly choosing the threshold α\alpha. Intuitively, if we fix all the other parameters, then the smaller the rate Δ\Delta, the larger the threshold α\alpha. Also, large values of NN, TT, 1β1-\beta contribute to small α\alpha.
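For concreteness, a minimal sketch of the threshold selection in Lemma III.1, reusing the hypothetical concentration_radius routine from the sketch above and the parameter values that appear later in Section V:

```python
# Threshold selection of Lemma III.1 (assumes concentration_radius from the earlier sketch).
N, T, q, p = 10**3, 10**2, 1, 1
beta, Delta = 0.01, 0.05
a, c1, c2 = 1.5, 1.84e6, 12.5

eps_B = concentration_radius(N, beta, a, c1, c2, q, p)
eps_D = concentration_radius(T, (Delta - beta) / (1.0 - beta), a, c1, c2, q, p)
alpha = eps_B + eps_D
```

With these values, alpha evaluates to approximately 0.158, which is consistent with the threshold reported in Section V.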

Remark III.1 (Comparison with χ2\chi^{2}-detector).

Consider an alternative detection measure

zχ(t):=𝒓(t)Σ1𝒓(t),z_{\chi}(t):={\boldsymbol{r}(t)}^{\top}\Sigma^{-1}\boldsymbol{r}(t),

where Σ\Sigma is the constant covariance matrix of the residue 𝐫(t)\boldsymbol{r}(t) under normal system operation. In particular, if 𝐫\boldsymbol{r} is Gaussian, the detection measure zχ(t)z_{\chi}(t) is χ2\chi^{2}-distributed and referred to as the χ2\chi^{2} detection measure with pp degrees of freedom. The detector threshold α\alpha is selected via look-up tables of the χ2\chi^{2} distribution, given the desired false alarm rate Δ\Delta. To compare z(t)z(t) with zχ(t)z_{\chi}(t), we leverage the assumption that 𝐫\boldsymbol{r} is Gaussian with the given covariance Σ\Sigma. This gives the following explicit expression of z(t)z(t)

z(t):=(𝔼ξ𝒩(𝒓(t),Σ)[ξq])1/q.z(t):=\left(\mathbb{E}_{\xi\sim\mathcal{N}(\boldsymbol{r}(t),\Sigma)}[\|\xi\|^{q}]\right)^{1/q}.

By selecting q=2q=2, we have

z(t):=(𝒓(t)𝒓(t)+Tr(Σ))1/2.z(t):=\left({\boldsymbol{r}(t)}^{\top}\boldsymbol{r}(t)+Tr(\Sigma)\right)^{1/2}.

Note that the measure-of-concentration result in Theorem II.1 is sharp when 𝐫\boldsymbol{r} is Gaussian, which in fact results in a threshold α\alpha as tight as that derived for the χ2\chi^{2}-detector.

Computation of detection measure: To achieve a tractable computation of the detection measure z(t)z(t), we leverage the definition of the Wasserstein distance (see footnote 4) and the fact that both 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} and 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} are discrete. The solution is given as a linear program.

The Wasserstein distance dW,q(𝒓,B,𝒓,D)d_{W,q}(\mathbb{P}_{\boldsymbol{r},\operatorname{B}},\mathbb{P}_{\boldsymbol{r},\operatorname{D}}), originally solving the Kantorovich optimal transport problem [18], can be interpreted as the minimal work needed to move a mass of points described via a probability distribution 𝒓,B(𝒓)\mathbb{P}_{\boldsymbol{r},\operatorname{B}}(\boldsymbol{r}), on the space 𝒵p\mathcal{Z}\subset^{p}, to a mass of points described by the probability distribution 𝒓,D(𝒓)\mathbb{P}_{\boldsymbol{r},\operatorname{D}}(\boldsymbol{r}) on the same space, with some transportation cost \ell. The minimization that defines dW,qd_{W,q} is taken over the space of all the joint distributions Π\Pi on 𝒵×𝒵\mathcal{Z}\times\mathcal{Z} whose marginals are 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} and 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}}, respectively.

Assuming that both 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} and 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} are discrete, we can equivalently characterize the joint distribution Π\Pi as a discrete mass optimal transportation plan [18]. To do this, let us consider two sets 𝒩:={1,,N}\mathcal{N}:=\{{1},\dots,{N}\} and 𝒯:={0,,T1}\mathcal{T}:=\{{0},\dots,{T-1}\}. Then, Π\Pi can be parameterized (by λ\lambda) as follows

Πλ\displaystyle\Pi_{\lambda} (ξ1,ξ2):=i𝒩j𝒯λijδ{𝒓(i)}(ξ1)δ{𝒓(tj)}(ξ2),\displaystyle(\xi_{1},\xi_{2}):=\sum_{i\in\mathcal{N}}\sum_{j\in\mathcal{T}}\lambda_{ij}\delta_{\{\boldsymbol{r}^{(i)}\}}(\xi_{1})\delta_{\{\boldsymbol{r}(t-j)\}}(\xi_{2}),
s.t.\displaystyle\operatorname{s.t.} i𝒩λij=1T,j𝒯,j𝒯λij=1N,i𝒩,\displaystyle\hskip 0.0pt\sum_{i\in\mathcal{N}}\lambda_{ij}=\frac{1}{T},\;\forall\;j\in\mathcal{T},\quad\sum_{j\in\mathcal{T}}\lambda_{ij}=\frac{1}{N},\;\forall\;i\in\mathcal{N}, (6)
λij0,i𝒩,j𝒯.\displaystyle\hskip 8.5359pt\lambda_{ij}\geq 0,\;\forall\;i\in\mathcal{N},\;j\in\mathcal{T}. (7)

Note that this characterizes all the joint distributions with marginals 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} and 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}}, where λ\lambda is the allocation of the mass from 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} to 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}}. Then, the proposed detection measure z(t)z(t) in (4) reduces to the following

(z(t))q:=minλ\displaystyle(z(t))^{q}=\min\limits_{\lambda} i𝒩j𝒯λij𝒓(i)𝒓(tj)q,\displaystyle\;\sum_{i\in\mathcal{N}}\sum_{j\in\mathcal{T}}\lambda_{ij}\|\boldsymbol{r}^{(i)}-\boldsymbol{r}(t-j)\|^{q}, (P)
s.t.\displaystyle\operatorname{s.t.} (6),(7),\displaystyle\;\eqref{eq:lambda1},\;\eqref{eq:lambda2},

which is a linear program over a compact polyhedral set. Therefore, a solution exists and (P) can be solved to global optimality in polynomial time by, e.g., a CPLEX solver.
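For illustration, a minimal Python sketch of the transport LP (P), using scipy's HiGHS solver in place of CPLEX; the function name and data layout are our own. For the sample sizes used later (N=103N=10^{3}, T=102T=10^{2}), a dedicated optimal-transport solver may be preferable.

```python
import numpy as np
from scipy.optimize import linprog

def detection_measure(R_bench, R_window, q=1):
    """Detection measure z(t) of (4) via the transport LP (P).
    R_bench: (N, p) benchmark residual samples; R_window: (T, p) online residuals."""
    N, T = len(R_bench), len(R_window)
    # transport costs c_ij = ||r^(i) - r(t-j)||^q, lambda flattened as lambda[i*T + j]
    cost = np.linalg.norm(R_bench[:, None, :] - R_window[None, :, :], axis=2) ** q
    A_eq = np.zeros((N + T, N * T))
    for i in range(N):
        A_eq[i, i * T:(i + 1) * T] = 1.0      # sum_j lambda_ij = 1/N, cf. (6)
    for j in range(T):
        A_eq[N + j, j::T] = 1.0               # sum_i lambda_ij = 1/T, cf. (6)
    b_eq = np.concatenate([np.full(N, 1.0 / N), np.full(T, 1.0 / T)])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun ** (1.0 / q)               # z(t) = (optimal value)^{1/q}
```

Here R_window collects the TT most recent residuals 𝒓(tj)\boldsymbol{r}(t-j), j=0,,T1j=0,\ldots,T-1, and R_bench the benchmark set ΞB\Xi_{\operatorname{B}}.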

III-A Detection and Stealthiness of Attacks

Following from the previous discussion, we now introduce a False Alarm Quantification Problem, and then specialize it to the Attack Detection Problem considered in this paper. In particular, we analyze the sensitivity of the proposed attack-detection method for the cyber-physical system under attack.

Problem 1. (False Alarm Quantification Problem): Given the augmented system (2), the online detection procedure in Section III, and the attacker type described in Assumption 1, compute the false alarm rate

Prob(false alarm at t):=\displaystyle\operatorname{Prob}(\textrm{false alarm at }t)=
Prob(Alarm(t)=1|noattack)Prob(noattack)\displaystyle\hskip 28.45274pt\operatorname{Prob}(\operatorname{Alarm}(t)=1\;|\;\operatorname{no\;attack})\operatorname{Prob}(\operatorname{no\;attack})
+Prob(Alarm(t)=0|attack)Prob(attack).\displaystyle\hskip 54.06006pt+\operatorname{Prob}(\operatorname{Alarm}(t)=0\;|\;\operatorname{attack})\operatorname{Prob}(\operatorname{attack}).

Problem 1, on the computation of the false alarm probability, requires prior information of the attack probability Prob(attack)\operatorname{Prob}(\operatorname{attack}). In this work, we are interested in stealthy attacks, i.e., attacks that cannot be effectively detected by (5). We are led to the following problem.

Problem 2. (Attack Detection Problem): Given the setting of Problem 1, provide conditions that characterize stealthy attacks, i.e., attacks that contribute to Prob(Alarm(t)=0|attack)\operatorname{Prob}(\operatorname{Alarm}(t)=0\;|\;\operatorname{attack}), and quantify their impact on the system.

To remain undetected, the attacker must select 𝜸(t)\boldsymbol{\gamma}(t) such that z(t)z(t) remains within the threshold α\alpha. To quantify the effects of these attacks, let us consider an attack sequence of length TT backward in time. At time tt, denote the arbitrary injected attack sequence by 𝜸(tj)p\boldsymbol{\gamma}(t-j)\in^{p}, j{0,,T1}j\in\{{0},\dots,{T-1}\} (if tj<0t-j<0, assume 𝜸(tj)=0\boldsymbol{\gamma}(t-j)=0). This sequence, together with (2), results in a detection sequence {𝒓(tj)}j\{\boldsymbol{r}(t-j)\}_{j} that can be used to construct the detector distribution 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} and the detection measure z(t)z(t). For simplicity and w.l.o.g., let us assume that 𝜸(t)\boldsymbol{\gamma}(t) has the following form

𝜸(t):=C𝒆(t)𝒗(t)+γ¯(t),\boldsymbol{\gamma}(t):=-C\boldsymbol{e}(t)-\boldsymbol{v}(t)+\bar{\gamma}(t), (8)

where γ¯(t)p\bar{\gamma}(t)\in^{p} is any vector selected by the attacker.666 Note that, when there is no attack at tt, we have 𝜸(t)=𝟎\boldsymbol{\gamma}(t)=\boldsymbol{0}, resulting in γ¯(t)=C𝒆(t)+𝒗(t)\bar{\gamma}(t)=C\boldsymbol{e}(t)+\boldsymbol{v}(t). Similar techniques appeared in, e.g., [9, 20]. We characterize the scenarios that can occur, providing a first, partial answer to Problem 2. Then, we will come back to characterizing the impact of stealthy attacks in Section IV.

Definition III.1 (Attack Detection Characterization).

Assume (2) is subject to attack, i.e., 𝛄(t)𝟎\boldsymbol{\gamma}(t)\neq\boldsymbol{0} for some t0t\geq 0.

  • If z(t)αz(t)\leq\alpha, t0\forall\;t\geq 0, then the attack is stealthy with probability one, i.e., Prob(Alarm(t)=0|attack)=1\operatorname{Prob}(\operatorname{Alarm}(t)=0\;|\operatorname{attack})=1.

  • If z(t)αz(t)\leq\alpha, tM\forall t\leq M, then the attack is MM-step stealthy.

  • If z(t)>αz(t)>\alpha, t0\forall t\geq 0, then the attack is active with probability one, i.e., Prob(Alarm(t)=0|attack)=0\operatorname{Prob}(\operatorname{Alarm}(t)=0\;|\operatorname{attack})=0.

Lemma III.2 (Stealthy Attacks Leveraging System Noise).

Assume (2) is subject to attack in form (8), where γ¯(t)\bar{\gamma}(t) is stochastic and distributed as γ¯\mathbb{P}_{\bar{\gamma}}. If γ¯\mathbb{P}_{\bar{\gamma}} is selected such that dW,q(γ¯,𝐫,B)ϵBd_{W,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{B}})\leq\epsilon_{\operatorname{B}}, then the attacker is stealthy with (high) probability at least 1Δ1β\frac{1-\Delta}{1-\beta}, i.e., Prob(Alarm(t)=0|attack)1Δ1β\operatorname{Prob}(\operatorname{Alarm}(t)=0\;|\;\operatorname{attack})\geq\frac{1-\Delta}{1-\beta}.

Proof.

Assume (2) is under attack. Then, we prove the statement leveraging the measure concentration

Prob(dW,q(γ¯,𝒓,D)ϵD)1Δβ1β,{\operatorname{Prob}}\left(d_{W,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{D}})\leq\epsilon_{\operatorname{D}}\right)\geq 1-\frac{\Delta-\beta}{1-\beta},

which holds as 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} is constructed using samples of γ¯\mathbb{P}_{\bar{\gamma}}, and the triangle inequality

z(t)dW,q(𝒓,B,γ¯)+dW,q(𝒓,D,γ¯),z(t)\leq d_{W,q}(\mathbb{P}_{\boldsymbol{r},\operatorname{B}},\mathbb{P}_{\bar{\gamma}})+d_{W,q}(\mathbb{P}_{\boldsymbol{r},\operatorname{D}},\mathbb{P}_{\bar{\gamma}}),

resulting in z(t)αz(t)\leq\alpha with probability at least 1Δ1β\frac{1-\Delta}{1-\beta}.

Note that α>ϵB\alpha>\epsilon_{\operatorname{B}}, which allows the attacker to select γ¯\mathbb{P}_{\bar{\gamma}} with ϵB<dW,q(γ¯,𝒓,B)α\epsilon_{\operatorname{B}}<d_{W,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{B}})\leq\alpha. However, the probability of remaining stealthy can then be arbitrarily low, lying in the range [0,1Δ1β][0,\frac{1-\Delta}{1-\beta}].

IV Stealthy Attack Analysis

In this section, we address the second question in Problem 2 via reachable-set analysis. In particular, we achieve an outer-approximation of the finite-step probabilistic reachable set, quantifying the effect of the stealthy attacks and the system noise in probability.

Consider an attack sequence 𝜸(t)\boldsymbol{\gamma}(t) as in (8), where γ¯(t)p\bar{\gamma}(t)\in^{p} is any vector such that the attack remains stealthy. That is, γ¯(t)\bar{\gamma}(t) results in a detector distribution 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} which is close to 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}}, as prescribed by α\alpha. This results in the following representation of the system (2)

𝝃(t+1)=[A+BKBK0A]H𝝃(t)+[I0IL]G[𝒘(t)γ¯(t)].\small\boldsymbol{\xi}(t+1)=\underbrace{\begin{bmatrix}A+BK&-BK\\ 0&A\end{bmatrix}}_{H}\boldsymbol{\xi}(t)+\underbrace{\begin{bmatrix}I&0\\ I&-L\end{bmatrix}}_{G}\begin{bmatrix}\boldsymbol{w}(t)\\ \bar{\gamma}(t)\end{bmatrix}. (9)
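For later use, a small numpy sketch assembling the matrices HH and GG of (9) from the system matrices; the function name is ours.

```python
import numpy as np

def attacked_augmented_matrices(A, B, C, K, L):
    """H and G of (9) for the attacked closed loop, with augmented state xi = (x, e)."""
    n, p = A.shape[0], C.shape[0]
    H = np.block([[A + B @ K, -B @ K],
                  [np.zeros((n, n)), A]])
    G = np.block([[np.eye(n), np.zeros((n, p))],
                  [np.eye(n), -L]])
    return H, G
```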

We provide in the following remark an intuition of how restrictive the stealthy attacker’s action γ¯(t)\bar{\gamma}(t) has to be.

Remark IV.1 (Constant attacks).

Consider a constant offset attack γ¯(t):=γ0\bar{\gamma}(t):=\gamma_{0} for some γ0p\gamma_{0}\in^{p}, t\forall t. Then by (P),

z(t)=N11/qγ0C(ΞB),C(ΞB):=1Ni𝒩𝒓(i).z(t)=N^{1-1/q}\|\gamma_{0}-\textrm{C}(\Xi_{\operatorname{B}})\|,\quad\textrm{C}(\Xi_{\operatorname{B}}):=\frac{1}{N}\sum_{i\in\mathcal{N}}\boldsymbol{r}^{(i)}.

To ensure stealthiness, we require z(t)αz(t)\leq\alpha, which limits the selection of γ0\gamma_{0} to a ball centered at C(ΞB)\textrm{C}(\Xi_{\operatorname{B}}) with radius α/N11/q\alpha/N^{1-1/q}. Note that the radius can be made arbitrarily small by choosing the benchmark size NN large.

To quantify the reachable set of the system under attacks, prior information on the process noise 𝒘(t)\boldsymbol{w}(t) is needed. To characterize 𝒘(t)\boldsymbol{w}(t), let us assume that, similarly to the benchmark 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}}, we are able to construct a noise benchmark distribution, denoted by 𝒘,B\mathbb{P}_{\boldsymbol{w},\operatorname{B}}. As before,

Prob(dW,q(𝒘,B,𝒘)ϵ𝒘,B)1β,{\operatorname{Prob}}\left(d_{W,q}(\mathbb{P}_{\boldsymbol{w},\operatorname{B}},\mathbb{P}_{\boldsymbol{w}})\leq\epsilon_{\boldsymbol{w},\operatorname{B}}\right)\geq 1-\beta,

for some ϵ𝒘,B\epsilon_{\boldsymbol{w},\operatorname{B}}. Given a time horizon MM, we are interested in where, with high probability, the state of the system can evolve to from some initial condition 𝝃0\boldsymbol{\xi}_{0}. To do this, we consider the MM-step probabilistic reachable set defined as follows

𝒙,M:={𝒙(M)n system (9) with 𝝃(0)=𝝃0,𝒘dW,q(𝒘,𝒘,B)ϵ𝒘,B,γ¯dW,q(γ¯,𝒓,B)α,},\displaystyle\mathcal{R}_{\boldsymbol{x},M}=\left\{\hskip-0.77498pt\boldsymbol{x}(M)\in^{n}\;\rule[-11.38092pt]{0.56917pt}{28.45274pt}\;\begin{array}[]{l}\textrm{system }\eqref{eq:attack}\textrm{ with }\boldsymbol{\xi}(0)=\boldsymbol{\xi}_{0},\\ \exists\;\mathbb{P}_{\boldsymbol{w}}\ni d_{W,q}(\mathbb{P}_{\boldsymbol{w}},\mathbb{P}_{\boldsymbol{w},\operatorname{B}})\leq\epsilon_{\boldsymbol{w},\operatorname{B}},\\ \exists\;\mathbb{P}_{\bar{\gamma}}\ni d_{W,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{B}})\leq\alpha,\\ \end{array}\right\},

then the true system state 𝒙(t)\boldsymbol{x}(t) at time MM, 𝒙(M)\boldsymbol{x}(M), satisfies

Prob(𝒙(M)𝒙,M)1β,\operatorname{Prob}\left(\boldsymbol{x}(M)\in\mathcal{R}_{\boldsymbol{x},M}\right)\geq 1-\beta,

where 1β1-\beta accounts for the independent restriction of the unknown distribution 𝒘\mathbb{P}_{\boldsymbol{w}} to be "close" to its benchmark.

The exact computation of 𝒙,M\mathcal{R}_{\boldsymbol{x},M} is intractable due to the unbounded support of the unknown distributions 𝒘\mathbb{P}_{\boldsymbol{w}} and γ¯\mathbb{P}_{\bar{\gamma}}, even if they are close to their benchmarks. To ensure a tractable approximation, we follow a two-step procedure. First, we equivalently characterize the constraint on each unknown distribution by a probabilistic support set. Then, we outer-approximate the probabilistic supports by ellipsoids, and the reachable set by an ellipsoidal bound.

Step 1: (Probabilistic Support of γ¯dW,q(γ¯,r,B)α\mathbb{P}_{\bar{\gamma}}\;\ni d_{W,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{B}})\leq\alpha) We achieve this by leveraging 1) the Monge formulation [18] of optimal transport, 2) the fact that 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} is discrete, and 3) results from coverage control [21, 22]. W.l.o.g., let us assume that γ¯\mathbb{P}_{\bar{\gamma}} is non-atomic (or continuous), and consider the distributions γ¯\mathbb{P}_{\bar{\gamma}} and 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} supported on 𝒵p\mathcal{Z}\subset^{p}. Let us denote by f:γ¯𝒓,Bf:\mathbb{P}_{\bar{\gamma}}\mapsto\mathbb{P}_{\boldsymbol{r},\operatorname{B}} the transport map that assigns mass over 𝒵\mathcal{Z} from γ¯\mathbb{P}_{\bar{\gamma}} to 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}}. The Monge formulation of optimal transport aims to find an optimal transport map that minimizes the transportation cost \ell as follows

dM,q(γ¯,𝒓,B):=(minf𝒵q(ξ,f(ξ))γ¯(ξ)𝑑ξ)1/q.d_{M,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{B}}):=\left(\min\limits_{f}\int_{\mathcal{Z}}\ell^{q}(\xi,f(\xi))\mathbb{P}_{\bar{\gamma}}(\xi)d\xi\right)^{1/q}.

It is known that if an optimal transport map ff^{\star} exists, then the optimal transportation plan Π\Pi^{\star} exists and the two problems dM,qd_{M,q} and dW,qd_{W,q} coincide777This is because the Kantorovich transport problem is the tightest relaxation of the Monge transport problem. See e.g., [18] for details.. In our setting, for the strongly convex cost q\ell^{q}, and by the fact that γ¯\mathbb{P}_{\bar{\gamma}} is absolutely continuous, a unique optimal transport map can indeed be guaranteed888The Monge formulation is not always well-posed, i.e., there exist optimal transportation plans Π\Pi^{\star} while a transport map does not exist [18]., and therefore, dM,q=dW,qd_{M,q}=d_{W,q}.

Let us now consider any transport map ff of dM,qd_{M,q}, and define a partition of the support of γ¯\mathbb{P}_{\bar{\gamma}} by

Wi:={𝒓𝒵|f(𝒓)=𝒓(i)},i𝒩,W_{i}:=\{\boldsymbol{r}\in\mathcal{Z}\;|\;f(\boldsymbol{r})=\boldsymbol{r}^{(i)}\},\;i\in\mathcal{N},

where 𝒓(i)\boldsymbol{r}^{(i)} are samples in ΞB\Xi_{\operatorname{B}}, which generate 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}}. Then, we equivalently rewrite the value of the objective function defined in dM,qd_{M,q}, as

(γ¯,W1,,WN):=i=1NWiq(ξ,𝒓(i))γ¯(ξ)𝑑ξ,\displaystyle\mathcal{H}(\mathbb{P}_{\bar{\gamma}},W_{1},\ldots,W_{N})=\sum\limits_{i=1}^{N}\int_{W_{i}}\ell^{q}(\xi,\boldsymbol{r}^{(i)})\mathbb{P}_{\bar{\gamma}}(\xi)d\xi,
s.t.Wiγ¯(ξ)𝑑ξ=1N,i𝒩,\displaystyle\small\hskip 28.45274pt\operatorname{s.t.}\;\int_{W_{i}}\mathbb{P}_{\bar{\gamma}}(\xi)d\xi=\frac{1}{N},\;\forall\;i\in\mathcal{N}, (10)

where the ithi^{\textup{th}} constraint comes from the fact that a transport map ff should lead to a consistent calculation of probability masses under 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} and γ¯\mathbb{P}_{\bar{\gamma}}, respectively. This results in the following problem, equivalent to dM,qd_{M,q},

(dM,q(γ¯,𝒓,B))q:=minWi,i𝒩\displaystyle(d_{M,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{B}}))^{q}=\min\limits_{W_{i},i\in\mathcal{N}} (γ¯,W1,,WN),\displaystyle\;\mathcal{H}(\mathbb{P}_{\bar{\gamma}},W_{1},\ldots,W_{N}), (P1)
s.t.\displaystyle\operatorname{s.t.} (10).\displaystyle\;\eqref{eq:mass}.

Given the distribution γ¯\mathbb{P}_{\bar{\gamma}}, Problem (P1) reduces to a load-balancing problem as in [22]. Let us define the Generalized Voronoi Partition (GVP) of 𝒵\mathcal{Z} associated to the sample set ΞB\Xi_{\operatorname{B}} and weight ωN\omega\in^{N}, for all i𝒩i\in\mathcal{N}, as

\mathcal{V}_{i}(\Xi_{\operatorname{B}},\omega):=\{\xi\in\mathcal{Z}\;|\;\|\xi-\boldsymbol{r}^{(i)}\|^{q}-\omega_{i}\leq\|\xi-\boldsymbol{r}^{(j)}\|^{q}-\omega_{j},\;\forall j\in\mathcal{N}\}.

It has been established that the optimal partition of (P1) is a GVP [22, Proposition V.1]. Further, the standard Voronoi partition, i.e., the GVP with equal weights ω¯:=𝟎\bar{\omega}:=\boldsymbol{0}, results in a lower bound of (P1) when the constraints (10) are removed [21], and therefore a lower bound of dM,qd_{M,q}. We denote this lower bound as L(γ¯,𝒱(ΞB))L(\mathbb{P}_{\bar{\gamma}},\mathcal{V}(\Xi_{\operatorname{B}})), and use it to quantify a probabilistic support of γ¯\mathbb{P}_{\bar{\gamma}}. Let us consider the support set

Ω(ΞB,ϵ):=i𝒩(𝒱i(ΞB)𝔹ϵ(𝒓(i))),\Omega(\Xi_{\operatorname{B}},\epsilon):=\cup_{i\in\mathcal{N}}\left(\mathcal{V}_{i}(\Xi_{\operatorname{B}})\cap\mathbb{B}_{\epsilon}(\boldsymbol{r}^{(i)})\right),

where 𝔹ϵ(𝒓(i)):={𝒓p|𝒓𝒓(i)ϵ}\mathbb{B}_{\epsilon}(\boldsymbol{r}^{(i)}):=\{\boldsymbol{r}\in^{p}\;|\;\|\boldsymbol{r}-\boldsymbol{r}^{(i)}\|\leq\epsilon\}. Then we have the following lemma.

Lemma IV.1 (Probabilistic Support).

Let ϵ>0\epsilon>0 and let γ¯\mathbb{P}_{\bar{\gamma}} be a distribution such that L(γ¯,𝒱(ΞB))ϵqL(\mathbb{P}_{\bar{\gamma}},\mathcal{V}(\Xi_{\operatorname{B}}))\leq\epsilon^{q}. Then, for any given s>1s>1, at least a 11/sq1-1/s^{q} fraction of the mass of γ¯\mathbb{P}_{\bar{\gamma}} is supported on Ω(ΞB,sϵ)\Omega(\Xi_{\operatorname{B}},s\epsilon), i.e., Ω(ΞB,sϵ)γ¯(ξ)𝑑ξ11/sq\int_{\Omega(\Xi_{\operatorname{B}},s\epsilon)}\mathbb{P}_{\bar{\gamma}}(\xi)d\xi\geq 1-1/s^{q}.

Proof.

Suppose otherwise, i.e., pΩ(ΞB,sϵ)γ¯(ξ)𝑑ξ=1Ω(ΞB,sϵ)γ¯(ξ)𝑑ξ>1/sq\int_{{}^{p}\setminus\Omega(\Xi_{\operatorname{B}},s\epsilon)}\mathbb{P}_{\bar{\gamma}}(\xi)d\xi=1-\int_{\Omega(\Xi_{\operatorname{B}},s\epsilon)}\mathbb{P}_{\bar{\gamma}}(\xi)d\xi>1/s^{q}. Then,

L(γ¯,𝒱(ΞB))\displaystyle L(\mathbb{P}_{\bar{\gamma}},\mathcal{V}(\Xi_{\operatorname{B}})) pΩ(ΞB,sϵ)ξ𝒓(i)qγ¯(ξ)𝑑ξ,\displaystyle\geq\int_{{}^{p}\setminus\Omega(\Xi_{\operatorname{B}},s\epsilon)}\|\xi-\boldsymbol{r}^{(i)}\|^{q}\mathbb{P}_{\bar{\gamma}}(\xi)d\xi,
sqϵqpΩ(ΞB,sϵ)γ¯(ξ)𝑑ξ>ϵq,\displaystyle\geq s^{q}\epsilon^{q}\int_{{}^{p}\setminus\Omega(\Xi_{\operatorname{B}},s\epsilon)}\mathbb{P}_{\bar{\gamma}}(\xi)d\xi>\epsilon^{q},

which contradicts the assumption L(γ¯,𝒱(ΞB))ϵqL(\mathbb{P}_{\bar{\gamma}},\mathcal{V}(\Xi_{\operatorname{B}}))\leq\epsilon^{q}.

In this way, the support Ω(ΞB,2α)\Omega(\Xi_{\operatorname{B}},2\alpha) contains at least 11/2q1-1/2^{q} of the mass of all the distributions γ¯\mathbb{P}_{\bar{\gamma}} such that dW,q(γ¯,𝒓,B)αd_{W,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{B}})\leq\alpha. Equivalently, we have

Prob(γ¯Ω(ΞB,2α))11/2q,\operatorname{Prob}\left(\bar{\gamma}\in\Omega(\Xi_{\operatorname{B}},2\alpha)\right)\geq 1-1/2^{q},

where the random variable γ¯\bar{\gamma} has a distribution γ¯\mathbb{P}_{\bar{\gamma}} such that dW,q(γ¯,𝒓,B)αd_{W,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{B}})\leq\alpha. We characterize 𝒘\mathbb{P}_{\boldsymbol{w}} similarly.

Note that, in practice, one can choose the ball-radius factor ss large in order to generate a support set that contains a larger portion of the mass of the unknown distribution. However, this comes at the cost of a more conservative approximation of the reachable set.
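Membership in the support set Ω(ΞB,sϵ)\Omega(\Xi_{\operatorname{B}},s\epsilon) is simple to test: since each standard Voronoi cell collects the points closest to its generator, a point belongs to Ω\Omega exactly when its nearest benchmark sample is within distance sϵs\epsilon. A minimal sketch (the function name is ours):

```python
import numpy as np

def in_support(xi, samples, radius):
    """Check xi in Omega(Xi_B, radius) = union_i (V_i(Xi_B) intersect B_radius(r^(i))).
    With the standard (nearest-neighbor) Voronoi partition this reduces to
    checking the distance to the closest benchmark sample."""
    return float(np.min(np.linalg.norm(samples - xi, axis=1))) <= radius
```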

Step 2: (Outer-approximation of x,M\mathcal{R}_{\boldsymbol{x},M}) Making use of the probabilistic support, we can now obtain a finite-dimensional characterization of the probabilistic reachable set, as follows

𝒙,M:={𝒙(M)n system (9),𝝃(0)=𝝃0,𝒘Ω(Ξ𝒘,B,2ϵ𝒘,B),γ¯Ω(ΞB,2α)},\displaystyle\mathcal{R}_{\boldsymbol{x},M}=\left\{\hskip-0.77498pt\boldsymbol{x}(M)\in^{n}\;\rule[-11.38092pt]{0.56917pt}{28.45274pt}\;\begin{array}[]{l}\textrm{system }\eqref{eq:attack},\;\boldsymbol{\xi}(0)=\boldsymbol{\xi}_{0},\\ \boldsymbol{w}\in\Omega(\Xi_{\boldsymbol{w},\operatorname{B}},2\epsilon_{\boldsymbol{w},\operatorname{B}}),\\ \bar{\gamma}\in\Omega(\Xi_{\operatorname{B}},2\alpha)\end{array}\right\},

and the true system state 𝒙(t)\boldsymbol{x}(t) at time MM, 𝒙(M)\boldsymbol{x}(M), satisfies Prob(𝒙(M)𝒙,M)(1β)(11/2q)2.\operatorname{Prob}\left(\boldsymbol{x}(M)\in\mathcal{R}_{\boldsymbol{x},M}\right)\geq(1-\beta)(1-1/2^{q})^{2}. Many works focus on the tractable evolution of geometric shapes; e.g. [9, 13]. Here, we follow [13] and propose outer ellipsoidal bounds for 𝒙,M\mathcal{R}_{\boldsymbol{x},M}. Let Q𝒘Q_{\boldsymbol{w}} be the positive-definite shape matrix such that Ω(Ξ𝒘,B,ϵ𝒘,B)𝒘:={𝒘|𝒘Q𝒘𝒘1}\Omega(\Xi_{\boldsymbol{w},\operatorname{B}},\epsilon_{\boldsymbol{w},\operatorname{B}})\subset\mathcal{E}_{\boldsymbol{w}}:=\{\boldsymbol{w}\;|\;{\boldsymbol{w}}^{\top}Q_{\boldsymbol{w}}\boldsymbol{w}\leq 1\}. Similarly, we denote Qγ¯Q_{\bar{\gamma}} and γ¯\mathcal{E}_{\bar{\gamma}} for that of γ¯\bar{\gamma}. We now state the following lemma, that applies [13, Proposition 1] for our case.

Lemma IV.2 (Outer bounds of x,M\mathcal{R}_{\boldsymbol{x},M}).

Given any a[0,1)a\in[0,1), we claim 𝐱,M(Q):={𝐱n|𝛏Q𝛏aM𝛏0Q𝛏0+(2a)(1aM)1a}\mathcal{R}_{\boldsymbol{x},M}\subset\mathcal{E}(Q):=\{\boldsymbol{x}\in^{n}\;|\;{\boldsymbol{\xi}}^{\top}Q\boldsymbol{\xi}\leq a^{M}{\boldsymbol{\xi}_{0}}^{\top}Q\boldsymbol{\xi}_{0}+\frac{(2-a)(1-a^{M})}{1-a}\}, with QQ satisfying

Q0,[aQHQ𝟎QHQQG𝟎GQW]0,\displaystyle Q\succ 0,\;\;\begin{bmatrix}aQ&{H}^{\top}Q&\bf{0}\\ QH&Q&QG\\ \bf{0}&{G}^{\top}Q&W\end{bmatrix}\succeq 0, (11)

where HH, GG are that in (9) and

W=[(1a1)Q𝒘𝟎𝟎(1a2)Qγ¯],\displaystyle W=\begin{bmatrix}(1-a_{1})Q_{\boldsymbol{w}}&\bf{0}\\ \bf{0}&(1-a_{2})Q_{\bar{\gamma}}\end{bmatrix}, (12)
for some a1+a2a,a1,a2[0,1).\displaystyle\textrm{for some }a_{1}+a_{2}\geq a,\;a_{1},a_{2}\in[0,1).

Figure 2: Statistics of zz.
Figure 3: Probabilistic Support of 𝒘\mathbb{P}_{\boldsymbol{w}}.
Figure 4: Empirical and Bound of 𝒙\mathcal{R}_{\boldsymbol{x}}.

A tight reachable-set bound can now be derived by solving

minQ,a1,a2\displaystyle\min\limits_{Q,a_{1},a_{2}} logdet(Q),\displaystyle\;-\log\det(Q), (P2)
s.t.\displaystyle\operatorname{s.t.} (11),(12),\displaystyle\;\eqref{eq:Q},\;\eqref{eq:W},

which is a convex semidefinite program, solvable via e.g., SeDuMi [23]. Note that the probabilistic reachable set is

𝒙:=M=1𝒙,M,\mathcal{R}_{\boldsymbol{x}}:=\cup_{M=1}^{\infty}\mathcal{R}_{\boldsymbol{x},M},

which again can be approximated via QQ^{\star} solving (P2) for999The set 𝒙\mathcal{R}_{\boldsymbol{x}} is in fact contained in the projection of (Q)\mathcal{E}(Q^{\star}) onto the state subspace, i.e., 𝒙{𝒙|𝒙(QxxQxeQee1Qxe)𝒙(2a)1a}\mathcal{R}_{\boldsymbol{x}}\subset\{\boldsymbol{x}\;|\;{\boldsymbol{x}}^{\top}\big{(}Q_{xx}-Q_{xe}Q_{ee}^{-1}{Q_{xe}}^{\top}\big{)}\boldsymbol{x}\leq\frac{(2-a)}{1-a}\} with Q:=[QxxQxeQxeQee]Q^{\star}:=\begin{bmatrix}Q_{xx}&Q_{xe}\\ {Q_{xe}}^{\top}&Q_{ee}\end{bmatrix}. See, e.g., [13] for details.

𝒙(Q)={𝒙n|𝝃Q𝝃(2a)1a}.\mathcal{R}_{\boldsymbol{x}}\subset\mathcal{E}(Q^{\star})=\{\boldsymbol{x}\in^{n}\;|\;{\boldsymbol{\xi}}^{\top}Q^{\star}\boldsymbol{\xi}\leq\frac{(2-a)}{1-a}\}.
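As an illustration, a minimal CVXPY sketch of (P2) with aa, a1a_{1}, a2a_{2} fixed (so that the problem is a convex SDP in QQ); a grid search over a1,a2a_{1},a_{2} can be wrapped around it. CVXPY/SCS replace SeDuMi here, and the function name and default values are our own choices; HH and GG can be assembled as in the sketch following (9), and Q𝒘Q_{\boldsymbol{w}}, Qγ¯Q_{\bar{\gamma}} are the shape matrices of the ellipsoidal outer bounds of the probabilistic supports.

```python
import numpy as np
import cvxpy as cp

def reachable_set_shape(H, G, Q_w, Q_gamma, a=0.9, a1=0.5, a2=0.4):
    """Solve (P2) for fixed a, a1, a2 (a1 + a2 >= a, all in [0, 1)); the ellipsoid
    {xi : xi' Q xi <= (2 - a)/(1 - a)} then outer-bounds the reachable set of (9)."""
    n2, m2 = H.shape[0], G.shape[1]
    W = np.block([
        [(1 - a1) * Q_w, np.zeros((Q_w.shape[0], Q_gamma.shape[1]))],
        [np.zeros((Q_gamma.shape[0], Q_w.shape[1])), (1 - a2) * Q_gamma],
    ])
    Q = cp.Variable((n2, n2), symmetric=True)
    zero = np.zeros((n2, m2))
    lmi = cp.bmat([[a * Q,  H.T @ Q, zero],
                   [Q @ H,  Q,       Q @ G],
                   [zero.T, G.T @ Q, W]])
    constraints = [Q >> 1e-6 * np.eye(n2),
                   0.5 * (lmi + lmi.T) >> 0]   # LMI (11), symmetrized for the solver
    prob = cp.Problem(cp.Minimize(-cp.log_det(Q)), constraints)
    prob.solve(solver=cp.SCS)
    return Q.value
```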

V Simulations

In this section, we demonstrate the performance of the proposed attack detector, illustrating its distributional robustness w.r.t. the system noise. Then, we consider stealthy attacks as in (8) and analyze their impact by quantifying the probabilistic reachable set and outer-approximation bound.

Consider the stochastic system (2), given as

A=[1.000.100.200.75],B=[0.100.20],L=[0.230.20],\displaystyle A=\begin{bmatrix}1.00&0.10\\ -0.20&0.75\end{bmatrix},\;B=\begin{bmatrix}0.10\\ 0.20\end{bmatrix},\;L=\begin{bmatrix}0.23\\ -0.20\end{bmatrix},
C=[10],K=[0.130.01],n=2,m=p=1,\displaystyle C=\begin{bmatrix}1&0\end{bmatrix},\;K=\begin{bmatrix}-0.13&0.01\end{bmatrix},\;n=2,\;m=p=1,
\boldsymbol{w}_{1}\sim\mathcal{N}(-0.25,0.02)+\mathcal{U}(0,0.5),\;\boldsymbol{v}\sim\mathcal{U}(-0.3,0.3),
\boldsymbol{w}_{2}\sim\mathcal{N}(0,0.04)+\mathcal{U}(-0.2,0.2),

where 𝒩\mathcal{N} and 𝒰\mathcal{U} represent the normal and uniform distributions, respectively. We consider N=103N=10^{3} benchmark samples for 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} and T=102T=10^{2} real-time samples for 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}}. We select the parameter q=1q=1, β=0.01\beta=0.01 and the false alarm rate Δ=0.05\Delta=0.05. We encode the prior information on the system noise via the parameters a=1.5a=1.5, c1=1.84×106c_{1}=1.84\times 10^{6} and c2=12.5c_{2}=12.5. Using the measure-of-concentration results, we determine the detector threshold to be α=0.158\alpha=0.158. In normal system operation (no attack), we run the online detection procedure for 10410^{4} time steps and plot the distribution of the computed detection measure z(t)z(t) in Fig. 2. We verify that the false alarm rate is 3.68%3.68\%, within the required rate Δ=5%\Delta=5\%. When the system is subject to stealthy attacks, we assume 𝝃0=𝟎\boldsymbol{\xi}_{0}=\boldsymbol{0} and visualize the Voronoi partition 𝒱(Ξ𝒘,B)\mathcal{V}(\Xi_{\boldsymbol{w},\operatorname{B}}) (convex sets with blue boundaries) of the probabilistic support Ω(Ξ𝒘,B,ϵ𝒘,B)\Omega(\Xi_{\boldsymbol{w},\operatorname{B}},\epsilon_{\boldsymbol{w},\operatorname{B}}) and its estimated ellipsoidal bound (red line) in Fig. 3. Further, we demonstrate the impact of the stealthy attacks (8) in Fig. 4. We used 10410^{4} empirical points of 𝒙\mathcal{R}_{\boldsymbol{x}} as its estimate and provided an ellipsoidal bound of 𝒙\mathcal{R}_{\boldsymbol{x}} computed by solving (P2). It can be seen that the proposed probabilistic reachable set effectively captures the reachable set in probability. Due to space limits, we omit the comparison of our approach to existing ones, such as the classical χ2\chi^{2} detector in [9] and the CUSUM procedure [8]. However, the difference should be clear: our proposed approach is robust w.r.t. the noise distribution, while the others leverage moment information up to the second order, which captures sufficient information only for certain noise distributions, e.g., Gaussian ones.
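For reproducibility, a minimal sketch of the normal-operation data collection used to build the benchmark set ΞB\Xi_{\operatorname{B}}, with the system matrices above and the reconstructed noise mixtures; reading the second parameter of 𝒩\mathcal{N} as a variance is our assumption. The collected residuals can be fed to the detection_measure sketch of Section III together with a sliding window of online residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.00, 0.10], [-0.20, 0.75]])
B = np.array([[0.10], [0.20]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.23], [-0.20]])
K = np.array([[-0.13, 0.01]])

def sample_w():
    # zero-mean mixture noise (second argument of N read as a variance)
    w1 = rng.normal(-0.25, np.sqrt(0.02)) + rng.uniform(0.0, 0.5)
    w2 = rng.normal(0.0, np.sqrt(0.04)) + rng.uniform(-0.2, 0.2)
    return np.array([[w1], [w2]])

x, xhat = np.zeros((2, 1)), np.zeros((2, 1))
residuals = []
for t in range(10**3):                      # collect N benchmark residuals r(t) = y(t) - yhat(t)
    u = K @ xhat
    y = C @ x + rng.uniform(-0.3, 0.3)      # measurement with v(t), no attack
    r = y - C @ xhat
    residuals.append(r.ravel())
    xhat = A @ xhat + B @ u + L @ r         # steady-state Kalman filter update
    x = A @ x + B @ u + sample_w()          # plant update
R_bench = np.array(residuals)               # benchmark data set Xi_B, shape (N, p)
```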

VI Conclusions

A novel detection measure was proposed to enable distributionally robust detection of attacks w.r.t. unknown, and light-tailed system noise. The proposed detection measure restricts the behavior of the stealthy attacks, whose impact was quantified via reachable-set analysis.

References

  • [1] A. Cardenas, S. Amin, B. Sinopoli, A. Giani, A. Perrig, and S. Sastry, “Challenges for securing cyber physical systems,” in Workshop on Future Directions of Cyber-Physical Systems, vol. 5, no. 1, 2009.
  • [2] F. Pasqualetti, F. Dorfler, and F. Bullo, “Attack detection and identification in cyber-physical systems,” IEEE Transactions on Automatic Control, vol. 58, no. 11, pp. 2715–2729, 2013.
  • [3] S. Amin, A. Cardenas, and S. S. Sastry, “Safe and secure networked control systems under denial-of-service attacks,” in Hybrid Systems: Computation and Control, 2009, pp. 31–45.
  • [4] F. Miao, Q. Zhu, M. Pajic, and G. J. Pappas, “Coding schemes for securing cyber-physical systems against stealthy data injection attacks,” IEEE Transactions on Control of Network Systems, vol. 4, no. 1, pp. 106–117, 2016.
  • [5] C. Bai, F. Pasqualetti, and V. Gupta, “Data-injection attacks in stochastic control systems: Detectability and performance tradeoffs,” Automatica, vol. 82, pp. 251–260, 2017.
  • [6] Y. Mo and B. Sinopoli, “Secure control against replay attacks,” in Allerton Conf. on Communications, Control and Computing, Illinois, USA, September 2009, pp. 911–918.
  • [7] M. Zhu and S. Martínez, “On the performance analysis of resilient networked control systems under replay attacks,” IEEE Transactions on Automatic Control, vol. 59, no. 3, pp. 804–808, 2014.
  • [8] J. Milošević, D. Umsonst, H. Sandberg, and K. Johansson, “Quantifying the impact of cyber-attack strategies for control systems equipped with an anomaly detector,” in European Control Conference, 2018, pp. 331–337.
  • [9] Y. Mo and B. Sinopoli, “On the performance degradation of cyber-physical systems under stealthy integrity attacks,” IEEE Transactions on Automatic Control, vol. 61, no. 9, pp. 2618–2624, 2015.
  • [10] S. Mishra, Y. Shoukry, N. Karamchandani, S. Diggavi, and P. Tabuada, “Secure state estimation against sensor attacks in the presence of noise,” IEEE Transactions on Control of Network Systems, vol. 4, no. 1, pp. 49–59, 2016.
  • [11] C. Murguia, N. V. de Wouw, and J. Ruths, “Reachable sets of hidden CPS sensor attacks: Analysis and synthesis tools,” IFAC World Congress, vol. 50, no. 1, pp. 2088–2094, 2017.
  • [12] C. Bai, F. Pasqualetti, and V. Gupta, “Security in stochastic control systems: Fundamental limitations and performance bounds,” in American Control Conference, 2015, pp. 195–200.
  • [13] C. Murguia, I. Shames, J. Ruths, and D. Nesic, “Security metrics of networked control systems under sensor attacks,” arXiv preprint arXiv:1809.01808, 2018.
  • [14] V. Renganathan, N. Hashemi, J. Ruths, and T. Summers, “Distributionally robust tuning of anomaly detectors in Cyber-Physical systems with stealthy attacks,” arXiv preprint arXiv:1909.12506, 2019.
  • [15] A. R. Teel, J. P. Hespanha, and A. Subbaraman, “Equivalent characterizations of input-to-state stability for stochastic discrete-time systems,” IEEE Transactions on Automatic Control, vol. 59, no. 2, pp. 516–522, 2014.
  • [16] M. Athans, “The role and use of the stochastic linear-quadratic-Gaussian problem in control system design,” IEEE Transactions on Automatic Control, vol. 16, no. 6, pp. 529–552, 1971.
  • [17] N. Fournier and A. Guillin, “On the rate of convergence in Wasserstein distance of the empirical measure,” Probability Theory and Related Fields, vol. 162, no. 3-4, pp. 707–738, 2015.
  • [18] F. Santambrogio, Optimal transport for applied mathematicians. Springer, 2015.
  • [19] D. Li and S. Martínez, “High-confidence attack detection via Wasserstein-metric computations,” preprint arXiv:2003.07880, 2020.
  • [20] C. Murguia and J. Ruths, “Cusum and chi-squared attack detection of compromised sensors,” in IEEE Conf. on Control Applications, 2016, pp. 474–480.
  • [21] F. Bullo, J. Cortés, and S. Martínez, Distributed Control of Robotic Networks, ser. Applied Mathematics Series. Princeton University Press, 2009.
  • [22] J. Cortés, “Coverage optimization and spatial load balancing by robotic sensor networks,” IEEE Transactions on Automatic Control, vol. 55, no. 3, pp. 749–754, 2010.
  • [23] J. Sturm, “Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones,” Optimization Methods and Software, vol. 11, no. 1-4, pp. 625–653, 1999.