
High-Confidence Attack Detection via Wasserstein-Metric Computations

Dan Li and Sonia Martínez

D. Li and S. Martínez are with the Department of Mechanical and Aerospace Engineering, University of California San Diego, La Jolla, CA 92092, USA; lidan@ucsd.edu; soniamd@ucsd.edu. This research was developed with funding from ONR N00014-19-1-2471 and AFOSR FA9550-19-1-0235.
Abstract

This paper considers a sensor attack and fault detection problem for linear cyber-physical systems, which are subject to system noise that can obey an unknown light-tailed distribution. We propose a new threshold-based detection mechanism that employs the Wasserstein metric, and which guarantees system performance with high confidence employing a finite number of measurements. The proposed detector may generate false alarms with a rate Δ\Delta in normal operation, where Δ\Delta can be tuned to be arbitrarily small by means of a benchmark distribution which is part of our mechanism. Thus, the proposed detector is sensitive to sensor attacks and faults which have a statistical behavior that is different from that of the system noise. We quantify the impact of stealthy attacks—which aim to perturb the system operation while producing false alarms that are consistent with the natural system noise—via a probabilistic reachable set. To enable tractable implementation of our methods, we propose a linear optimization problem that computes the proposed detection measure and a semidefinite program that produces the proposed reachable set.

I Introduction

Cyber-Physical Systems (CPS) are physical processes that are tightly integrated with computation and communication systems for monitoring and control. Examples include critical infrastructure, such as transportation networks, energy distribution systems, and the Internet. These systems are usually complex, large-scale and insufficiently supervised, making them vulnerable to attacks [1, 2]. A significant literature has studied various denial-of-service [3], false data-injection [4, 5], replay [6, 7], sensor, and integrity attacks [8, 9, 10, 11]. The majority of these works study attack-detection problems in a control-theoretical framework. This approach essentially employs detectors to identify abnormal behaviors by comparing estimates and measurements under some predefined metrics. However, attacks can be stealthy, and exploit knowledge of the system structure, uncertainty, and noise information to inflict significant damage on the physical system while avoiding detection. This motivates the characterization of the impact of stealthy attacks via, e.g., reachable-set analysis [9, 12, 13]. To ensure computational tractability, these works assume either Gaussian or bounded system noise. However, these assumptions fall short in modeling all natural disturbances that can affect a system. Such systems would be vulnerable to stealthy attacks that disguise themselves via an intentionally selected, unbounded and non-Gaussian distribution. When designing detectors, an added difficulty is in obtaining tractable computations that can handle these more general distributions. More recently, novel measure-of-concentration results have opened the way for online, tractable, and robust attack detection with probabilistic guarantees under uncertainty. A first attempt in this direction is [14], where a Chebyshev inequality is used to design a detector and to assess the impact of stealthy attacks, under the assumption that the system noise is bounded. With the aim of obtaining a less conservative detection mechanism, we leverage an alternative measure-concentration result via the Wasserstein metric. This metric is built from data gathered on the system, and can provide concentration results significantly sharper than those obtained via the Chebyshev inequality. In particular, we address the following question for linear CPS: How can we design an online attack-detection mechanism that is robust to light-tailed distributions of the system noise while remaining sensitive to attacks and limiting the impact of stealthy attacks?

To answer this question, we consider a sensor-attack detection problem on a linear dynamical system. The linear system models a remotely-observed process that is subject to an additive noise described by an unknown, not-necessarily bounded, light-tailed distribution. To identify abnormal behavior, we employ a steady-state Kalman filter as well as a benchmark distribution constructed from an offline residual sequence corresponding to normal system operation. In this framework, we propose a novel detection mechanism that achieves online and robust detection of attacks with high confidence.

Statement of Contributions: 1) We propose a novel detection measure, which employs the Wasserstein distance between the benchmark distribution and a distribution of the residual sequence obtained online. 2) We propose a novel threshold-detection mechanism, which exploits measure-of-concentration results to guarantee the robust detection of an attack with high confidence using a finite set of data, and which further enables the robust tuning of the false alarm rate. The proposed detector can effectively identify real-time attacks when their behavior differs from that of the system noise. In addition, the detector can handle system noise that is not necessarily Gaussian. 3) We propose a quantifiable, probabilistic state-reachable set, which reveals the impact of the stealthy attacker and the system noise with high probability. 4) To implement and analyze the proposed mechanism, we formulate a linear optimization problem and a semidefinite program for the computation of the detection measure and the reachable set, respectively. We illustrate our methods in a two-dimensional linear system with irregular noise distributions and stealthy sensor attacks.

II CPS and Its Normal Operation

This section presents the cyber-physical system (CPS) model and the underlying assumptions on the system and attacks.

Figure 1: Cyber-Physical System Diagram.

A remotely-observed, cyber-physical system subject to sensor-measurement attacks, as in Fig. 1, is described as a discrete-time, stochastic, linear, and time-invariant system

𝒙(t+1)\displaystyle\boldsymbol{x}(t+1) =A𝒙(t)+B𝒖(t)+𝒘(t),\displaystyle=\;A\boldsymbol{x}(t)+B\boldsymbol{u}(t)+\boldsymbol{w}(t), (1)
𝒚(t)\displaystyle\boldsymbol{y}(t) =C𝒙(t)+𝒗(t)+𝜸(t),\displaystyle=\;C\boldsymbol{x}(t)+\boldsymbol{v}(t)+\boldsymbol{\gamma}(t),

where 𝒙(t)n\boldsymbol{x}(t)\in^{n}, 𝒖(t)m\boldsymbol{u}(t)\in^{m} and 𝒚(t)p\boldsymbol{y}(t)\in^{p} denote the system state, input and output at time tt\in\mathbb{N}, respectively. The state matrix AA, input matrix BB and output matrix CC are assumed to be known in advance. In particular, we assume that the pair (A,B)(A,B) is stabilizable, and (A,C)(A,C) is detectable. The process noise 𝒘(t)n\boldsymbol{w}(t)\in^{n} and output noise 𝒗(t)p\boldsymbol{v}(t)\in^{p} are independent zero-mean random vectors. We assume that each 𝒘(t)\boldsymbol{w}(t) and 𝒗(t)\boldsymbol{v}(t) are independent and identically distributed (i.i.d.) over time. We denote their (unknown, not-necessarily equal) distributions by 𝒘\mathbb{P}_{\boldsymbol{w}} and 𝒗\mathbb{P}_{\boldsymbol{v}}, respectively. In addition, we assume that 𝒘\mathbb{P}_{\boldsymbol{w}} and 𝒗\mathbb{P}_{\boldsymbol{v}} are light-tailed111 For a random vector 𝒘\boldsymbol{w} such that 𝒘\boldsymbol{w}\sim\mathbb{P}, we say \mathbb{P} is qq-light-tailed, q=1,2,q=1,2,\ldots, if c:=𝔼[exp(b𝒘a)]<c:=\mathbb{E}_{\mathbb{P}}[\exp(b\|\boldsymbol{w}\|^{a})]<\infty for some a>qa>q and b>0b>0. All examples listed have a moment generating function, so their exponential moment can be constructed for at least q=1q=1., excluding scenarios of systems operating under extreme events, or subject to large delays. In fact, Gaussian, Sub-Gaussian, Exponential distributions, and any distribution with a compact support set are admissible. This class of distributions is sufficient to characterize the uncertainty or noise of many practical problems.

An additive sensor-measurement attack is implemented via 𝜸(t)p\boldsymbol{\gamma}(t)\in^{p} in (1), on which we make the following assumption.

Assumption 1 (Attack model): It holds that 1) 𝜸(t)=𝟎\boldsymbol{\gamma}(t)=\boldsymbol{0} whenever there is no attack; 2) the attacker can modulate any component of 𝜸(t)\boldsymbol{\gamma}(t) at any time; 3) the attacker has unlimited computational resources and access to system information, e.g., AA, BB, CC, 𝒖\boldsymbol{u}, 𝒘\mathbb{P}_{\boldsymbol{w}} and 𝒗\mathbb{P}_{\boldsymbol{v}}, to decide on 𝜸(t)\boldsymbol{\gamma}(t), tt\in\mathbb{N}.

II-A Normal System Operation

In what follows, we introduce the state observer that enables prediction in the absence of attacks (when 𝜸(t)=𝟎\boldsymbol{\gamma}(t)=\boldsymbol{0}). Since the distribution of system noise is unknown, we identify a benchmark distribution that can be used to characterize this unknown distribution with high confidence.

To predict (estimate) the system behavior, we leverage the system information (A,B,C)(A,B,C) and employ a Kalman filter

𝒙^(t+1)\displaystyle\hat{\boldsymbol{x}}(t+1) =A𝒙^(t)+B𝒖(t)+L(t)(𝒚(t)𝒚^(t)),\displaystyle=\;A\hat{\boldsymbol{x}}(t)+B\boldsymbol{u}(t)+L(t)\left(\boldsymbol{y}(t)-\hat{\boldsymbol{y}}(t)\right),
𝒚^(t)\displaystyle\hat{\boldsymbol{y}}(t) =C𝒙^(t),\displaystyle=\;C\hat{\boldsymbol{x}}(t),

where 𝒙^(t)\hat{\boldsymbol{x}}(t) is the state estimate and L(t)LL(t)\equiv L is the steady-state Kalman gain matrix. As the pair (A,C)(A,C) is detectable, the gain LL can be selected in such a way that the eigenvalues of ALCA-LC are inside the unit circle. This ensures asymptotic tracking of the state in expectation; that is, the expectation of the estimation error 𝒆(t):=𝒙(t)𝒙^(t)\boldsymbol{e}(t):=\boldsymbol{x}(t)-\hat{\boldsymbol{x}}(t) satisfies

𝔼[𝒆(t)]0 as t, for any 𝒙(0),𝒙^(0).\mathbb{E}[\boldsymbol{e}(t)]\rightarrow 0\textrm{ as }t\rightarrow\infty,\;\textrm{ for any }\boldsymbol{x}(0),\;\hat{\boldsymbol{x}}(0).

The further selection of eigenvalues of ALCA-LC and the structure of LL usually depends on additional control objectives such as noise attenuation requirements. In this paper, we additionally consider the estimated state feedback

𝒖(t)=K𝒙^(t),\boldsymbol{u}(t)=K\hat{\boldsymbol{x}}(t),

where KK is such that the following augmented system is stable222System (2) is input-to-state stable in probability (ISSp) relative to any compact set 𝒜\mathcal{A} which contains the origin, if we select KK such that the eigenvalues of the matrix A+BKA+BK are inside the unit circle, see e.g. [15].

𝝃(t+1)=F𝝃(t)+G𝝈(t),\boldsymbol{\xi}(t+1)=F\boldsymbol{\xi}(t)+G\boldsymbol{\sigma}(t), (2)

where 𝝃(t):=(𝒙(t),𝒆(t))\boldsymbol{\xi}(t):={\left({\boldsymbol{x}}(t),\boldsymbol{e}(t)\right)}^{\top}, 𝝈(t):=(𝒘(t),𝒗(t)+𝜸(t))\boldsymbol{\sigma}(t):={\left(\boldsymbol{w}(t),\boldsymbol{v}(t)+\boldsymbol{\gamma}(t)\right)}^{\top},

F=[A+BKBK0ALC],G=[I0IL] and some 𝝃(0).F=\begin{bmatrix}A+BK&-BK\\ 0&A-LC\end{bmatrix},G=\begin{bmatrix}I&0\\ I&-L\end{bmatrix}\textrm{ and some }\boldsymbol{\xi}(0).
Remark II.1 (Selection of LL and KK).

In general, the selection of the matrices LL and KK for the system (1) is a nontrivial task, especially when certain performance criteria are to be satisfied, such as fast system response, energy conservation, or noise minimization. However, there are a few scenarios in which the Separation Principle can be invoked for a tractable design of LL and KK. For example, 1) when there is no system noise, the matrices LL and KK can be designed separately, such that A+BKA+BK and ALCA-LC each have all their eigenvalues inside the unit circle; 2) when the noise is Gaussian, the gain matrices LL and KK can be designed to minimize the steady-state covariance matrix and the control effort, via a separate design of a Kalman filter (as an observer) and a linear-quadratic regulator (as a controller). The resulting design is referred to as Linear-Quadratic-Gaussian (LQG) control [16].

Consider that the system initially operates normally with a proper selection of LL and KK, and assume that the augmented system (2) is in steady state, i.e., 𝔼[𝝃(t)]=𝟎\mathbb{E}[\boldsymbol{\xi}(t)]=\boldsymbol{0}. In order to design an attack detector later, we need a characterization of the distribution of the residue 𝒓(t)p\boldsymbol{r}(t)\in^{p}, which measures the difference between what we measure and what we expect to receive, as follows

𝒓(t):=\displaystyle\boldsymbol{r}(t)= 𝒚(t)𝒚^(t)=C𝒆(t)+𝒗(t)+𝜸(t).\displaystyle\;\boldsymbol{y}(t)-\hat{\boldsymbol{y}}(t)=C\boldsymbol{e}(t)+\boldsymbol{v}(t)+\boldsymbol{\gamma}(t).

When there is no attack, it can be verified that the random vector 𝒓(t)\boldsymbol{r}(t) is zero-mean and light-tailed333This can be checked from the definition in the footnote 1, and the fact that 𝒓(t)\boldsymbol{r}(t) is a linear combination of zero-mean qq-light-tailed distributions., and we denote its unknown distribution by 𝒓\mathbb{P}_{\boldsymbol{r}}. We assume that a finite but large number NN of i.i.d. samples of 𝒓\mathbb{P}_{\boldsymbol{r}} are accessible, acquired by collecting 𝒓(t)\boldsymbol{r}(t) over a sufficiently long time window. We call these i.i.d. samples a benchmark data set, ΞB:={𝒓(i)}i=1N\Xi_{\operatorname{B}}:=\{\boldsymbol{r}^{(i)}\}_{i=1}^{N}, and construct the resulting empirical distribution 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} by

𝒓,B:=1Ni=1Nδ{𝒓(i)},\mathbb{P}_{\boldsymbol{r},\operatorname{B}}:=\frac{1}{N}\sum_{i=1}^{N}\delta_{\{\boldsymbol{r}^{(i)}\}},

where δ{𝒓}\delta_{\{\boldsymbol{r}\}} denotes the Dirac measure (unit point mass) at 𝒓\boldsymbol{r}, and the subscript B\operatorname{B} indicates that 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} is the benchmark used to approximate the unknown 𝒓\mathbb{P}_{\boldsymbol{r}}. We claim that the benchmark 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} provides a good characterization of the effect of the noise on the system (2), via the following measure-concentration result.

Theorem II.1 (Measure Concentration [17, Application of Theorem 2]).

If 𝐫\mathbb{P}_{\boldsymbol{r}} is a qq-light-tailed distribution for some q1q\geq 1, then for a given β(0,1)\beta\in(0,1), the following holds

Prob(dW,q(𝒓,B,𝒓)ϵB)1β,{\operatorname{Prob}}\left(d_{W,q}(\mathbb{P}_{\boldsymbol{r},\operatorname{B}},\mathbb{P}_{\boldsymbol{r}})\leq\epsilon_{\operatorname{B}}\right)\geq 1-\beta,

where Prob\operatorname{Prob} denotes the Probability, dW,qd_{W,q} denotes the qq-Wasserstein metric444 Let q(𝒵){\mathcal{M}_{q}}(\mathcal{Z}) denote the space of all qq-light-tailed probability distributions supported on 𝒵p\mathcal{Z}\subset^{p}. Then for any two distributions 1\mathbb{Q}_{1}, 2q(𝒵)\mathbb{Q}_{2}\in\mathcal{M}_{q}(\mathcal{Z}), the qq-Wasserstein metric [18] dW,q:q(𝒵)×q(𝒵)0d_{W,q}:\mathcal{M}_{q}(\mathcal{Z})\times\mathcal{M}_{q}(\mathcal{Z})\rightarrow\mathbb{R}_{\geq 0} is defined by dW,q(1,2):=(minΠ𝒵×𝒵q(ξ1,ξ2)Π(dξ1,dξ2))1/q,d_{W,q}(\mathbb{Q}_{1},\mathbb{Q}_{2}):=(\min\limits_{\Pi}\int_{\mathcal{Z}\times\mathcal{Z}}\ell^{q}(\xi_{1},\xi_{2})\Pi(d\xi_{1},d\xi_{2}))^{1/q}, where Π\Pi is in a set of all the probability distributions on 𝒵×𝒵\mathcal{Z}\times\mathcal{Z} with marginals 1\mathbb{Q}_{1} and 2\mathbb{Q}_{2}. The cost (ξ1,ξ2):=ξ1ξ2\ell(\xi_{1},\xi_{2}):=\|\xi_{1}-\xi_{2}\| is a norm on 𝒵\mathcal{Z}. in the probability space, and the parameter ϵB\epsilon_{\operatorname{B}} is selected as

ϵB:={(log(c1β1)c2N)q/a,ifN<log(c1β1)c2,ϵ¯,ifNlog(c1β1)c2,\epsilon_{\operatorname{B}}:=\left\{{\begin{array}[]{*{10}{l}}\left(\frac{\log(c_{1}\beta^{-1})}{c_{2}N}\right)^{q/a},&\textrm{if}\;N<\frac{\log(c_{1}\beta^{-1})}{c_{2}},\\ \bar{\epsilon},&\textrm{if}\;N\geq\frac{\log(c_{1}\beta^{-1})}{c_{2}},\end{array}}\right. (3)

for some constant555The parameter aa is determined as in the definition of 𝐫\mathbb{P}_{\boldsymbol{r}} and the constants c1c_{1}, c2c_{2} depend on qq, mm, and 𝐫\mathbb{P}_{\boldsymbol{r}} (via aa, bb, cc). When information on 𝐫\mathbb{P}_{\boldsymbol{r}} is absent, the parameters aa, c1c_{1} and c2c_{2} can be determined in a data-driven fashion using sufficiently many samples of 𝐫\mathbb{P}_{\boldsymbol{r}}. See [17] for details. a>qa>q, c1c_{1}, c2>0c_{2}>0, and ϵ¯\bar{\epsilon} is such that

ϵ¯log(2+1/ϵ¯)=(log(c1β1)c2N)1/2, if p=2q,\small\small\frac{\bar{\epsilon}}{\log(2+1/\bar{\epsilon})}=\left(\frac{\log(c_{1}\beta^{-1})}{c_{2}N}\right)^{1/2},\textrm{ if }p=2q,

or

ϵ¯=(log(c1β1)c2N)1/max{2,p/q}, if p2q,\small\bar{\epsilon}=\left(\frac{\log(c_{1}\beta^{-1})}{c_{2}N}\right)^{1/\max\{2,p/q\}},\textrm{ if }p\neq 2q,

where pp is the dimension of 𝐫\boldsymbol{r}. \square

Theorem II.1 provides a probabilistic bound ϵB\epsilon_{\operatorname{B}} on the qq-Wasserstein distance between 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} and 𝒓\mathbb{P}_{\boldsymbol{r}}, with a confidence of at least 1β1-\beta. The result indicates how to tune the parameter β\beta and the number of benchmark samples NN that are needed to find a sufficiently good approximation of 𝒓\mathbb{P}_{\boldsymbol{r}} by means of 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}}. In this way, given a fixed ϵ\epsilon, we can increase our confidence (1β1-\beta) that 𝒓\mathbb{P}_{\boldsymbol{r}} and 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} are within distance ϵ\epsilon by increasing the number of samples. We assume that 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} has been determined in advance, selecting a single, very large NN so that the bound ϵB\epsilon_{\operatorname{B}} remains very small for a range of values of β\beta. Later, we discuss how the parameter β\beta can be interpreted as a false alarm rate in the proposed attack detector. The resulting 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}}, with a tunable false alarm rate (depending on β\beta), will allow us to design a detection procedure which is robust to the system noise.
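As an illustration of (3), the following Python sketch computes the radius ϵB\epsilon_{\operatorname{B}} given (N,β)(N,\beta) and the tail parameters (a,c1,c2)(a,c_{1},c_{2}); the function name and the use of a numerical root-finder for the case p=2qp=2q are our own choices and not part of the paper.

```python
import numpy as np
from scipy.optimize import brentq

def concentration_radius(M, beta, a, c1, c2, q=1, p=1):
    """Radius eps such that Prob(d_{W,q}(empirical, true) <= eps) >= 1 - beta,
    following the two regimes of (3); M is the number of i.i.d. samples."""
    rho = np.log(c1 / beta) / (c2 * M)
    if M < np.log(c1 / beta) / c2:           # small-sample regime of (3)
        return rho ** (q / a)
    if p == 2 * q:                           # eps_bar solves eps / log(2 + 1/eps) = sqrt(rho)
        return brentq(lambda e: e / np.log(2.0 + 1.0 / e) - np.sqrt(rho), 1e-12, 1e6)
    return rho ** (1.0 / max(2.0, p / q))    # large-sample regime, p != 2q

# example: benchmark radius with the parameters used later in Section V
eps_B = concentration_radius(M=10**3, beta=0.01, a=1.5, c1=1.84e6, c2=12.5, q=1, p=1)
```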

III Threshold-based robust detection of attacks, and stealthiness

This section presents our online detection procedure, and a threshold-based detector with high-confidence performance guarantees. Then, we propose a tractable computation of the detection measure used for online detection. We finish the section by introducing a class of stealthy attacks.

Online Detection Procedure (ODP): At each time tTt\geq T, we construct a TT-step detector distribution

𝒓,D:=1Tj=0T1δ{𝒓(tj)},\mathbb{P}_{\boldsymbol{r},\operatorname{D}}:=\frac{1}{T}\sum_{j=0}^{T-1}\delta_{\{\boldsymbol{r}(t-j)\}},

where 𝒓(tj)\boldsymbol{r}(t-j) is the residue data collected independently at time tjt-j, for j{0,,T1}j\in\{{0},\dots,{T-1}\}. Then with a given qq and a threshold α>0\alpha>0, we consider the detection measure

z(t):=dW,q(𝒓,B,𝒓,D),z(t):=d_{W,q}(\mathbb{P}_{\boldsymbol{r},\operatorname{B}},\mathbb{P}_{\boldsymbol{r},\operatorname{D}}), (4)

and the attack detector

{z(t)α, no alarm at t:Alarm(t)=0,z(t)>α, alarm at t:Alarm(t)=1,\begin{cases}z(t)\leq\alpha,&\textrm{ no alarm at }t:\operatorname{Alarm}(t)=0,\\ z(t)>\alpha,&\textrm{ alarm at }t:\operatorname{Alarm}(t)=1,\end{cases} (5)

with Alarm(t)\operatorname{Alarm}(t) the sequence of alarms generated online based on the previous threshold.

The distribution 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} uses a small number TT of samples to ensure the online computational tractability of z(t)z(t), so 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} is highly dependent on instantaneous samples. Thus, 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} may significantly deviate from the true 𝒓\mathbb{P}_{\boldsymbol{r}}, and from 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}}. Therefore, even if there is no attack, the attack detector is expected to generate false alarms due to the system noise as well as an improper selection of the threshold α\alpha. In the following, we discuss how to select an α\alpha that is robust to the system noise and which results in a desired false alarm rate. Note that the value α\alpha should be small to be able to distinguish attacks from noise, as discussed later.

Lemma III.1 (Selection of α\alpha for Robust Detectors).

Given parameters NN, TT, qq, β\beta, and a desired false alarm rate Δ>β\Delta>\beta at time tt, select the threshold α\alpha as

α:=ϵB+ϵD,\alpha:=\epsilon_{\operatorname{B}}+\epsilon_{\operatorname{D}},

where ϵB\epsilon_{\operatorname{B}} is chosen as in (3) and ϵD\epsilon_{\operatorname{D}} is selected following the ϵB\epsilon_{\operatorname{B}}-formula (3), but with TT and Δβ1β\frac{\Delta-\beta}{1-\beta} in place of NN and β\beta, respectively. Then, the detection measure (4) satisfies

Prob(z(t)α)1Δ,{\operatorname{Prob}}\left(z(t)\leq\alpha\right)\geq 1-\Delta,

for any zero-mean qq-light-tailed underlying distribution 𝐫\mathbb{P}_{\boldsymbol{r}}.

Due to space limits, please see the arXiv version [19] for the proofs.

Proof.

The proof leverages the triangle inequality

z(t)dW,q(𝒓,B,𝒓)+dW,q(𝒓,D,𝒓),z(t)\leq d_{W,q}(\mathbb{P}_{\boldsymbol{r},\operatorname{B}},\mathbb{P}_{\boldsymbol{r}})+d_{W,q}(\mathbb{P}_{\boldsymbol{r},\operatorname{D}},\mathbb{P}_{\boldsymbol{r}}),

the measure concentration result for each dW,qd_{W,q} term, and that samples of 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} and 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} are collected independently.

Lemma III.1 ensures that the false alarm rate is no higher than Δ\Delta when there is no attack, i.e.,

Prob(Alarm(t)=1|noattack)Δ,t.\operatorname{Prob}(\operatorname{Alarm}(t)=1\;|\;\operatorname{no\;attack})\leq\Delta,\quad\forall\;t.

Note that the rate Δ\Delta can be selected by properly choosing the threshold α\alpha. Intuitively, if we fix all the other parameters, then the smaller the rate Δ\Delta, the larger the threshold α\alpha. Also, large values of NN, TT, 1β1-\beta contribute to small α\alpha.
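For concreteness, a minimal sketch of the threshold selection in Lemma III.1, reusing the hypothetical concentration_radius routine from the sketch above and the parameter values that appear later in Section V:

```python
# Threshold selection of Lemma III.1 (assumes concentration_radius from the earlier sketch).
N, T, q, p = 10**3, 10**2, 1, 1
beta, Delta = 0.01, 0.05
a, c1, c2 = 1.5, 1.84e6, 12.5

eps_B = concentration_radius(N, beta, a, c1, c2, q, p)
eps_D = concentration_radius(T, (Delta - beta) / (1.0 - beta), a, c1, c2, q, p)
alpha = eps_B + eps_D
```

With these values, alpha evaluates to approximately 0.158, which is consistent with the threshold reported in Section V.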

Remark III.1 (Comparison with χ2\chi^{2}-detector).

Consider an alternative detection measure

zχ(t):=𝒓(t)Σ1𝒓(t),z_{\chi}(t):={\boldsymbol{r}(t)}^{\top}\Sigma^{-1}\boldsymbol{r}(t),

where Σ\Sigma is the constant covariance matrix of the residue 𝐫(t)\boldsymbol{r}(t) under normal system operation. In particular, if 𝐫\boldsymbol{r} is Gaussian, the detection measure zχ(t)z_{\chi}(t) is χ2\chi^{2}-distributed and referred to as the χ2\chi^{2} detection measure with pp degrees of freedom. The detector threshold α\alpha is selected via look-up tables of the χ2\chi^{2} distribution, given the desired false alarm rate Δ\Delta. To compare z(t)z(t) with zχ(t)z_{\chi}(t), we leverage the assumption that 𝐫\boldsymbol{r} is Gaussian with the given covariance Σ\Sigma. This gives the following explicit expression of z(t)z(t)

z(t):=(𝔼ξ𝒩(𝒓(t),Σ)[ξq])1/q.z(t):=\left(\mathbb{E}_{\xi\sim\mathcal{N}(\boldsymbol{r}(t),\Sigma)}[\|\xi\|^{q}]\right)^{1/q}.

By selecting q=2q=2, we have

z(t):=(𝒓(t)𝒓(t)+Tr(Σ))1/2.z(t):=\left({\boldsymbol{r}(t)}^{\top}\boldsymbol{r}(t)+Tr(\Sigma)\right)^{1/2}.

Note that the measure-of-concentration result in Theorem II.1 is sharp when 𝐫\boldsymbol{r} is Gaussian, which in fact results in a threshold α\alpha as tight as that derived for the χ2\chi^{2}-detector.

Computation of detection measure: To achieve a tractable computation of the detection measure z(t)z(t), we leverage the definition of the Wasserstein distance (see footnote 4) and the fact that both 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} and 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} are discrete. The solution is given as a linear program.

The Wasserstein distance dW,q(𝒓,B,𝒓,D)d_{W,q}(\mathbb{P}_{\boldsymbol{r},\operatorname{B}},\mathbb{P}_{\boldsymbol{r},\operatorname{D}}), originally solving the Kantorovich optimal transport problem [18], can be interpreted as the minimal work needed to move a mass of points described via a probability distribution 𝒓,B(𝒓)\mathbb{P}_{\boldsymbol{r},\operatorname{B}}(\boldsymbol{r}), on the space 𝒵p\mathcal{Z}\subset^{p}, to a mass of points described by the probability distribution 𝒓,D(𝒓)\mathbb{P}_{\boldsymbol{r},\operatorname{D}}(\boldsymbol{r}) on the same space, with some transportation cost \ell. The minimization that defines dW,qd_{W,q} is taken over the space of all the joint distributions Π\Pi on 𝒵×𝒵\mathcal{Z}\times\mathcal{Z} whose marginals are 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} and 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}}, respectively.

Assuming that both 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} and 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} are discrete, we can equivalently characterize the joint distribution Π\Pi as a discrete mass optimal transportation plan [18]. To do this, let us consider two sets 𝒩:={1,,N}\mathcal{N}:=\{{1},\dots,{N}\} and 𝒯:={0,,T1}\mathcal{T}:=\{{0},\dots,{T-1}\}. Then, Π\Pi can be parameterized (by λ\lambda) as follows

Πλ\displaystyle\Pi_{\lambda} (ξ1,ξ2):=i𝒩j𝒯λijδ{𝒓(i)}(ξ1)δ{𝒓(tj)}(ξ2),\displaystyle(\xi_{1},\xi_{2}):=\sum_{i\in\mathcal{N}}\sum_{j\in\mathcal{T}}\lambda_{ij}\delta_{\{\boldsymbol{r}^{(i)}\}}(\xi_{1})\delta_{\{\boldsymbol{r}(t-j)\}}(\xi_{2}),
s.t.\displaystyle\operatorname{s.t.} i𝒩λij=1T,j𝒯,j𝒯λij=1N,i𝒩,\displaystyle\hskip 0.0pt\sum_{i\in\mathcal{N}}\lambda_{ij}=\frac{1}{T},\;\forall\;j\in\mathcal{T},\quad\sum_{j\in\mathcal{T}}\lambda_{ij}=\frac{1}{N},\;\forall\;i\in\mathcal{N}, (6)
λij0,i𝒩,j𝒯.\displaystyle\hskip 8.5359pt\lambda_{ij}\geq 0,\;\forall\;i\in\mathcal{N},\;j\in\mathcal{T}. (7)

Note that this characterizes all the joint distributions with marginals 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} and 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}}, where λ\lambda is the allocation of the mass from 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} to 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}}. Then, the proposed detection measure z(t)z(t) in (4) reduces to the following

(z(t))q:=minλ\displaystyle(z(t))^{q}=\min\limits_{\lambda} i𝒩j𝒯λij𝒓(i)𝒓(tj)q,\displaystyle\;\sum_{i\in\mathcal{N}}\sum_{j\in\mathcal{T}}\lambda_{ij}\|\boldsymbol{r}^{(i)}-\boldsymbol{r}(t-j)\|^{q}, (P)
s.t.\displaystyle\operatorname{s.t.} (6),(7),\displaystyle\;\eqref{eq:lambda1},\;\eqref{eq:lambda2},

which is a linear program over a compact polyhedral set. Therefore, a solution exists and (P) can be solved to global optimality in polynomial time by, e.g., a CPLEX solver.
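For illustration, a minimal Python sketch of the transport LP (P), using scipy's HiGHS solver in place of CPLEX; the function name and data layout are our own. For the sample sizes used later (N=103N=10^{3}, T=102T=10^{2}), a dedicated optimal-transport solver may be preferable.

```python
import numpy as np
from scipy.optimize import linprog

def detection_measure(R_bench, R_window, q=1):
    """Detection measure z(t) of (4) via the transport LP (P).
    R_bench: (N, p) benchmark residual samples; R_window: (T, p) online residuals."""
    N, T = len(R_bench), len(R_window)
    # transport costs c_ij = ||r^(i) - r(t-j)||^q, lambda flattened as lambda[i*T + j]
    cost = np.linalg.norm(R_bench[:, None, :] - R_window[None, :, :], axis=2) ** q
    A_eq = np.zeros((N + T, N * T))
    for i in range(N):
        A_eq[i, i * T:(i + 1) * T] = 1.0      # sum_j lambda_ij = 1/N, cf. (6)
    for j in range(T):
        A_eq[N + j, j::T] = 1.0               # sum_i lambda_ij = 1/T, cf. (6)
    b_eq = np.concatenate([np.full(N, 1.0 / N), np.full(T, 1.0 / T)])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun ** (1.0 / q)               # z(t) = (optimal value)^{1/q}
```

Here R_window collects the TT most recent residuals 𝒓(tj)\boldsymbol{r}(t-j), j=0,,T1j=0,\ldots,T-1, and R_bench the benchmark set ΞB\Xi_{\operatorname{B}}.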

III-A Detection and Stealthiness of Attacks

Following from the previous discussion, we now introduce a False Alarm Quantification Problem, and then specialize it to the Attack Detection Problem considered in this paper. In particular, we analyze the sensitivity of the proposed attack-detection method for the cyber-physical system under attack.

Problem 1. (False Alarm Quantification Problem): Given the augmented system (2), the online detection procedure in Section III, and the attacker type described in Assumption 1, compute the false alarm rate

Prob(false alarm at t):=\displaystyle\operatorname{Prob}(\textrm{false alarm at }t)=
Prob(Alarm(t)=1|noattack)Prob(noattack)\displaystyle\hskip 28.45274pt\operatorname{Prob}(\operatorname{Alarm}(t)=1\;|\;\operatorname{no\;attack})\operatorname{Prob}(\operatorname{no\;attack})
+Prob(Alarm(t)=0|attack)Prob(attack).\displaystyle\hskip 54.06006pt+\operatorname{Prob}(\operatorname{Alarm}(t)=0\;|\;\operatorname{attack})\operatorname{Prob}(\operatorname{attack}).

Problem 1, on the computation of the false alarm probability, requires prior information of the attack probability Prob(attack)\operatorname{Prob}(\operatorname{attack}). In this work, we are interested in stealthy attacks, i.e., attacks that cannot be effectively detected by (5). We are led to the following problem.

Problem 2. (Attack Detection Problem): Given the setting of Problem 1, provide conditions that characterize stealthy attacks, i.e., attacks that contribute to Prob(Alarm(t)=0|attack)\operatorname{Prob}(\operatorname{Alarm}(t)=0\;|\;\operatorname{attack}), and quantify their impact on the system.

To remain undetected, the attacker must select 𝜸(t)\boldsymbol{\gamma}(t) such that z(t)z(t) remains within the threshold α\alpha. To quantify the effects of these attacks, let us consider an attack sequence of length TT backward in time. At time tt, denote the arbitrary injected attack sequence by 𝜸(tj)p\boldsymbol{\gamma}(t-j)\in^{p}, j{0,,T1}j\in\{{0},\dots,{T-1}\} (if tj<0t-j<0, assume 𝜸(tj)=0\boldsymbol{\gamma}(t-j)=0). This sequence, together with (2), results in a detection sequence {𝒓(tj)}j\{\boldsymbol{r}(t-j)\}_{j} that can be used to construct the detector distribution 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} and the detection measure z(t)z(t). For simplicity and w.l.o.g., let us assume that 𝜸(t)\boldsymbol{\gamma}(t) has the following form

𝜸(t):=C𝒆(t)𝒗(t)+γ¯(t),\boldsymbol{\gamma}(t):=-C\boldsymbol{e}(t)-\boldsymbol{v}(t)+\bar{\gamma}(t), (8)

where γ¯(t)p\bar{\gamma}(t)\in^{p} is any vector selected by the attacker.666 Note that, when there is no attack at tt, we have 𝜸(t)=𝟎\boldsymbol{\gamma}(t)=\boldsymbol{0}, resulting in γ¯(t)=C𝒆(t)+𝒗(t)\bar{\gamma}(t)=C\boldsymbol{e}(t)+\boldsymbol{v}(t). Similar techniques appeared in, e.g., [9, 20]. We characterize the scenarios that can occur, providing a first, partial answer to Problem 2. Then, we will come back to characterizing the impact of stealthy attacks in Section IV.

Definition III.1 (Attack Detection Characterization).

Assume (2) is subject to attack, i.e., 𝛄(t)𝟎\boldsymbol{\gamma}(t)\neq\boldsymbol{0} for some t0t\geq 0.

  • If z(t)αz(t)\leq\alpha, t0\forall\;t\geq 0, then the attack is stealthy with probability one, i.e., Prob(Alarm(t)=0|attack)=1\operatorname{Prob}(\operatorname{Alarm}(t)=0\;|\operatorname{attack})=1.

  • If z(t)αz(t)\leq\alpha, tM\forall t\leq M, then the attack is MM-step stealthy.

  • If z(t)>αz(t)>\alpha, t0\forall t\geq 0, then the attack is active with probability one, i.e., Prob(Alarm(t)=0|attack)=0\operatorname{Prob}(\operatorname{Alarm}(t)=0\;|\operatorname{attack})=0.

Lemma III.2 (Stealthy Attacks Leveraging System Noise).

Assume (2) is subject to attack in form (8), where γ¯(t)\bar{\gamma}(t) is stochastic and distributed as γ¯\mathbb{P}_{\bar{\gamma}}. If γ¯\mathbb{P}_{\bar{\gamma}} is selected such that dW,q(γ¯,𝐫,B)ϵBd_{W,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{B}})\leq\epsilon_{\operatorname{B}}, then the attacker is stealthy with (high) probability at least 1Δ1β\frac{1-\Delta}{1-\beta}, i.e., Prob(Alarm(t)=0|attack)1Δ1β\operatorname{Prob}(\operatorname{Alarm}(t)=0\;|\;\operatorname{attack})\geq\frac{1-\Delta}{1-\beta}.

Proof.

Assume (2) is under attack. Then, we prove the statement leveraging the measure concentration

Prob(dW,q(γ¯,𝒓,D)ϵD)1Δβ1β,{\operatorname{Prob}}\left(d_{W,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{D}})\leq\epsilon_{\operatorname{D}}\right)\geq 1-\frac{\Delta-\beta}{1-\beta},

which holds as 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} is constructed using samples of γ¯\mathbb{P}_{\bar{\gamma}}, and the triangle inequality

z(t)dW,q(𝒓,B,γ¯)+dW,q(𝒓,D,γ¯),z(t)\leq d_{W,q}(\mathbb{P}_{\boldsymbol{r},\operatorname{B}},\mathbb{P}_{\bar{\gamma}})+d_{W,q}(\mathbb{P}_{\boldsymbol{r},\operatorname{D}},\mathbb{P}_{\bar{\gamma}}),

resulting in z(t)αz(t)\leq\alpha with probability at least 1Δ1β\frac{1-\Delta}{1-\beta}.

Note that α>ϵB\alpha>\epsilon_{\operatorname{B}}, which allows the attacker to select γ¯\mathbb{P}_{\bar{\gamma}} with ϵB<dW,q(γ¯,𝒓,B)α\epsilon_{\operatorname{B}}<d_{W,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{B}})\leq\alpha. However, the probability of remaining stealthy can then be arbitrarily low, lying in the range [0,1Δ1β][0,\frac{1-\Delta}{1-\beta}].

IV Stealthy Attack Analysis

In this section, we address the second question in Problem 2 via reachable-set analysis. In particular, we achieve an outer-approximation of the finite-step probabilistic reachable set, quantifying the effect of the stealthy attacks and the system noise in probability.

Consider an attack sequence 𝜸(t)\boldsymbol{\gamma}(t) as in (8), where γ¯(t)p\bar{\gamma}(t)\in^{p} is any vector such that the attack remains stealthy. That is, γ¯(t)\bar{\gamma}(t) results in a detector distribution 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}} which is close to 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}}, as prescribed by α\alpha. This results in the following representation of the system (2)

𝝃(t+1)=[A+BKBK0A]H𝝃(t)+[I0IL]G[𝒘(t)γ¯(t)].\small\boldsymbol{\xi}(t+1)=\underbrace{\begin{bmatrix}A+BK&-BK\\ 0&A\end{bmatrix}}_{H}\boldsymbol{\xi}(t)+\underbrace{\begin{bmatrix}I&0\\ I&-L\end{bmatrix}}_{G}\begin{bmatrix}\boldsymbol{w}(t)\\ \bar{\gamma}(t)\end{bmatrix}. (9)
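For later use, a small numpy sketch assembling the matrices HH and GG of (9) from the system matrices; the function name is ours.

```python
import numpy as np

def attacked_augmented_matrices(A, B, C, K, L):
    """H and G of (9) for the attacked closed loop, with augmented state xi = (x, e)."""
    n, p = A.shape[0], C.shape[0]
    H = np.block([[A + B @ K, -B @ K],
                  [np.zeros((n, n)), A]])
    G = np.block([[np.eye(n), np.zeros((n, p))],
                  [np.eye(n), -L]])
    return H, G
```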

We provide in the following remark an intuition of how restrictive the stealthy attacker’s action γ¯(t)\bar{\gamma}(t) has to be.

Remark IV.1 (Constant attacks).

Consider a constant offset attack γ¯(t):=γ0\bar{\gamma}(t):=\gamma_{0} for some γ0p\gamma_{0}\in^{p}, t\forall t. Then by (P),

z(t)=N11/qγ0C(ΞB),C(ΞB):=1Ni𝒩𝒓(i).z(t)=N^{1-1/q}\|\gamma_{0}-\textrm{C}(\Xi_{\operatorname{B}})\|,\quad\textrm{C}(\Xi_{\operatorname{B}}):=\frac{1}{N}\sum_{i\in\mathcal{N}}\boldsymbol{r}^{(i)}.

To ensure stealthiness, we require z(t)αz(t)\leq\alpha, which limits the selection of γ0\gamma_{0} to a ball centered at C(ΞB)\textrm{C}(\Xi_{\operatorname{B}}) with radius α/N11/q\alpha/N^{1-1/q}. Note that the radius can be made arbitrarily small by choosing the benchmark size NN large.

To quantify the reachable set of the system under attacks, prior information on the process noise 𝒘(t)\boldsymbol{w}(t) is needed. To characterize 𝒘(t)\boldsymbol{w}(t), let us assume that, similarly to the benchmark 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}}, we are able to construct a noise benchmark distribution, denoted by 𝒘,B\mathbb{P}_{\boldsymbol{w},\operatorname{B}}. As before,

Prob(dW,q(𝒘,B,𝒘)ϵ𝒘,B)1β,{\operatorname{Prob}}\left(d_{W,q}(\mathbb{P}_{\boldsymbol{w},\operatorname{B}},\mathbb{P}_{\boldsymbol{w}})\leq\epsilon_{\boldsymbol{w},\operatorname{B}}\right)\geq 1-\beta,

for some ϵ𝒘,B\epsilon_{\boldsymbol{w},\operatorname{B}}. Given a time horizon MM, we are interested in where, with high probability, the state of the system can evolve to from some initial condition 𝝃0\boldsymbol{\xi}_{0}. To do this, we consider the MM-step probabilistic reachable set defined as follows

𝒙,M:={𝒙(M)n system (9) with 𝝃(0)=𝝃0,𝒘dW,q(𝒘,𝒘,B)ϵ𝒘,B,γ¯dW,q(γ¯,𝒓,B)α,},\displaystyle\mathcal{R}_{\boldsymbol{x},M}=\left\{\hskip-0.77498pt\boldsymbol{x}(M)\in^{n}\;\rule[-11.38092pt]{0.56917pt}{28.45274pt}\;\begin{array}[]{l}\textrm{system }\eqref{eq:attack}\textrm{ with }\boldsymbol{\xi}(0)=\boldsymbol{\xi}_{0},\\ \exists\;\mathbb{P}_{\boldsymbol{w}}\ni d_{W,q}(\mathbb{P}_{\boldsymbol{w}},\mathbb{P}_{\boldsymbol{w},\operatorname{B}})\leq\epsilon_{\boldsymbol{w},\operatorname{B}},\\ \exists\;\mathbb{P}_{\bar{\gamma}}\ni d_{W,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{B}})\leq\alpha,\\ \end{array}\right\},

then the true system state 𝒙(t)\boldsymbol{x}(t) at time MM, 𝒙(M)\boldsymbol{x}(M), satisfies

Prob(𝒙(M)𝒙,M)1β,\operatorname{Prob}\left(\boldsymbol{x}(M)\in\mathcal{R}_{\boldsymbol{x},M}\right)\geq 1-\beta,

where 1β1-\beta accounts for the independent restriction of the unknown distribution 𝒘\mathbb{P}_{\boldsymbol{w}} to be "close" to its benchmark.

The exact computation of 𝒙,M\mathcal{R}_{\boldsymbol{x},M} is intractable due to the unbounded support of the unknown distributions 𝒘\mathbb{P}_{\boldsymbol{w}} and γ¯\mathbb{P}_{\bar{\gamma}}, even if they are close to their benchmarks. To ensure a tractable approximation, we follow a two-step procedure. First, we equivalently characterize the constraint on each unknown distribution by a probabilistic support set. Then, we outer-approximate the probabilistic supports by ellipsoids, and the reachable set by an ellipsoidal bound.

Step 1: (Probabilistic Support of γ¯dW,q(γ¯,r,B)α\mathbb{P}_{\bar{\gamma}}\;\ni d_{W,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{B}})\leq\alpha) We achieve this by leveraging 1) the Monge formulation [18] of optimal transport, 2) the fact that 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} is discrete, and 3) results from coverage control [21, 22]. W.l.o.g., let us assume that γ¯\mathbb{P}_{\bar{\gamma}} is non-atomic (or continuous), and consider the distributions γ¯\mathbb{P}_{\bar{\gamma}} and 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} supported on 𝒵p\mathcal{Z}\subset^{p}. Let us denote by f:γ¯𝒓,Bf:\mathbb{P}_{\bar{\gamma}}\mapsto\mathbb{P}_{\boldsymbol{r},\operatorname{B}} the transport map that assigns mass over 𝒵\mathcal{Z} from γ¯\mathbb{P}_{\bar{\gamma}} to 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}}. The Monge formulation of optimal transport aims to find an optimal transport map that minimizes the transportation cost \ell as follows

dM,q(γ¯,𝒓,B):=(minf𝒵q(ξ,f(ξ))γ¯(ξ)𝑑ξ)1/q.d_{M,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{B}}):=\left(\min\limits_{f}\int_{\mathcal{Z}}\ell^{q}(\xi,f(\xi))\mathbb{P}_{\bar{\gamma}}(\xi)d\xi\right)^{1/q}.

It is known that if an optimal transport map ff^{\star} exists, then the optimal transportation plan Π\Pi^{\star} exists and the two problems dM,qd_{M,q} and dW,qd_{W,q} coincide777This is because the Kantorovich transport problem is the tightest relaxation of the Monge transport problem. See e.g., [18] for details.. In our setting, for the strongly convex cost q\ell^{q}, and by the fact that γ¯\mathbb{P}_{\bar{\gamma}} is absolutely continuous, a unique optimal transport map can indeed be guaranteed888The Monge formulation is not always well-posed, i.e., there exist optimal transportation plans Π\Pi^{\star} while a transport map does not exist [18]., and therefore, dM,q=dW,qd_{M,q}=d_{W,q}.

Let us now consider any transport map ff of dM,qd_{M,q}, and define a partition of the support of γ¯\mathbb{P}_{\bar{\gamma}} by

Wi:={𝒓𝒵|f(𝒓)=𝒓(i)},i𝒩,W_{i}:=\{\boldsymbol{r}\in\mathcal{Z}\;|\;f(\boldsymbol{r})=\boldsymbol{r}^{(i)}\},\;i\in\mathcal{N},

where 𝒓(i)\boldsymbol{r}^{(i)} are samples in ΞB\Xi_{\operatorname{B}}, which generate 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}}. Then, we equivalently rewrite the value of the objective function defined in dM,qd_{M,q}, as

(γ¯,W1,,WN):=i=1NWiq(ξ,𝒓(i))γ¯(ξ)𝑑ξ,\displaystyle\mathcal{H}(\mathbb{P}_{\bar{\gamma}},W_{1},\ldots,W_{N})=\sum\limits_{i=1}^{N}\int_{W_{i}}\ell^{q}(\xi,\boldsymbol{r}^{(i)})\mathbb{P}_{\bar{\gamma}}(\xi)d\xi,
s.t.Wiγ¯(ξ)𝑑ξ=1N,i𝒩,\displaystyle\small\hskip 28.45274pt\operatorname{s.t.}\;\int_{W_{i}}\mathbb{P}_{\bar{\gamma}}(\xi)d\xi=\frac{1}{N},\;\forall\;i\in\mathcal{N}, (10)

where the ithi^{\textup{th}} constraint comes from the fact that a transport map ff should lead to a consistent calculation of probability masses under 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} and γ¯\mathbb{P}_{\bar{\gamma}}, respectively. This results in the following problem, equivalent to dM,qd_{M,q},

(dM,q(γ¯,𝒓,B))q:=minWi,i𝒩\displaystyle(d_{M,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{B}}))^{q}=\min\limits_{W_{i},i\in\mathcal{N}} (γ¯,W1,,WN),\displaystyle\;\mathcal{H}(\mathbb{P}_{\bar{\gamma}},W_{1},\ldots,W_{N}), (P1)
s.t.\displaystyle\operatorname{s.t.} (10).\displaystyle\;\eqref{eq:mass}.

Given the distribution γ¯\mathbb{P}_{\bar{\gamma}}, Problem (P1) reduces to a load-balancing problem as in [22]. Let us define the Generalized Voronoi Partition (GVP) of 𝒵\mathcal{Z} associated to the sample set ΞB\Xi_{\operatorname{B}} and weight ωN\omega\in^{N}, for all i𝒩i\in\mathcal{N}, as

\mathcal{V}_{i}(\Xi_{\operatorname{B}},\omega):=\{\xi\in\mathcal{Z}\;|\;\|\xi-\boldsymbol{r}^{(i)}\|^{q}-\omega_{i}\leq\|\xi-\boldsymbol{r}^{(j)}\|^{q}-\omega_{j},\;\forall j\in\mathcal{N}\}.

It has been established that the optimal partition of (P1) is a GVP [22, Proposition V.1]. Further, the standard Voronoi partition, i.e., the GVP with equal weights ω¯:=𝟎\bar{\omega}:=\boldsymbol{0}, results in a lower bound of (P1) when the constraints (10) are removed [21], and therefore a lower bound of dM,qd_{M,q}. We denote this lower bound as L(γ¯,𝒱(ΞB))L(\mathbb{P}_{\bar{\gamma}},\mathcal{V}(\Xi_{\operatorname{B}})), and use it to quantify a probabilistic support of γ¯\mathbb{P}_{\bar{\gamma}}. Let us consider the support set

Ω(ΞB,ϵ):=i𝒩(𝒱i(ΞB)𝔹ϵ(𝒓(i))),\Omega(\Xi_{\operatorname{B}},\epsilon):=\cup_{i\in\mathcal{N}}\left(\mathcal{V}_{i}(\Xi_{\operatorname{B}})\cap\mathbb{B}_{\epsilon}(\boldsymbol{r}^{(i)})\right),

where 𝔹ϵ(𝒓(i)):={𝒓p|𝒓𝒓(i)ϵ}\mathbb{B}_{\epsilon}(\boldsymbol{r}^{(i)}):=\{\boldsymbol{r}\in^{p}\;|\;\|\boldsymbol{r}-\boldsymbol{r}^{(i)}\|\leq\epsilon\}. Then we have the following lemma.

Lemma IV.1 (Probabilistic Support).

Let ϵ>0\epsilon>0 and let γ¯\mathbb{P}_{\bar{\gamma}} be a distribution such that L(γ¯,𝒱(ΞB))ϵqL(\mathbb{P}_{\bar{\gamma}},\mathcal{V}(\Xi_{\operatorname{B}}))\leq\epsilon^{q}. Then, for any given s>1s>1, at least a 11/sq1-1/s^{q} fraction of the mass of γ¯\mathbb{P}_{\bar{\gamma}} is supported on Ω(ΞB,sϵ)\Omega(\Xi_{\operatorname{B}},s\epsilon), i.e., Ω(ΞB,sϵ)γ¯(ξ)𝑑ξ11/sq\int_{\Omega(\Xi_{\operatorname{B}},s\epsilon)}\mathbb{P}_{\bar{\gamma}}(\xi)d\xi\geq 1-1/s^{q}.

Proof.

Suppose otherwise, i.e., pΩ(ΞB,sϵ)γ¯(ξ)𝑑ξ=1Ω(ΞB,sϵ)γ¯(ξ)𝑑ξ>1/sq\int_{{}^{p}\setminus\Omega(\Xi_{\operatorname{B}},s\epsilon)}\mathbb{P}_{\bar{\gamma}}(\xi)d\xi=1-\int_{\Omega(\Xi_{\operatorname{B}},s\epsilon)}\mathbb{P}_{\bar{\gamma}}(\xi)d\xi>1/s^{q}. Then,

L(γ¯,𝒱(ΞB))\displaystyle L(\mathbb{P}_{\bar{\gamma}},\mathcal{V}(\Xi_{\operatorname{B}})) pΩ(ΞB,sϵ)ξ𝒓(i)qγ¯(ξ)𝑑ξ,\displaystyle\geq\int_{{}^{p}\setminus\Omega(\Xi_{\operatorname{B}},s\epsilon)}\|\xi-\boldsymbol{r}^{(i)}\|^{q}\mathbb{P}_{\bar{\gamma}}(\xi)d\xi,
sqϵqpΩ(ΞB,sϵ)γ¯(ξ)𝑑ξ>ϵq,\displaystyle\geq s^{q}\epsilon^{q}\int_{{}^{p}\setminus\Omega(\Xi_{\operatorname{B}},s\epsilon)}\mathbb{P}_{\bar{\gamma}}(\xi)d\xi>\epsilon^{q},

which contradicts the assumption L(γ¯,𝒱(ΞB))ϵqL(\mathbb{P}_{\bar{\gamma}},\mathcal{V}(\Xi_{\operatorname{B}}))\leq\epsilon^{q}.

In this way, the support Ω(ΞB,2α)\Omega(\Xi_{\operatorname{B}},2\alpha) contains at least 11/2q1-1/2^{q} of the mass of all the distributions γ¯\mathbb{P}_{\bar{\gamma}} such that dW,q(γ¯,𝒓,B)αd_{W,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{B}})\leq\alpha. Equivalently, we have

Prob(γ¯Ω(ΞB,2α))11/2q,\operatorname{Prob}\left(\bar{\gamma}\in\Omega(\Xi_{\operatorname{B}},2\alpha)\right)\geq 1-1/2^{q},

where the random variable γ¯\bar{\gamma} has a distribution γ¯\mathbb{P}_{\bar{\gamma}} such that dW,q(γ¯,𝒓,B)αd_{W,q}(\mathbb{P}_{\bar{\gamma}},\mathbb{P}_{\boldsymbol{r},\operatorname{B}})\leq\alpha. We characterize 𝒘\mathbb{P}_{\boldsymbol{w}} similarly.

Note that, in practice, one can choose the ball-radius factor ss large in order to generate a support set that contains a larger portion of the mass of the unknown distribution. However, this comes at the cost of a more conservative approximation of the reachable set.
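Membership in the support set Ω(ΞB,sϵ)\Omega(\Xi_{\operatorname{B}},s\epsilon) is simple to test: since each standard Voronoi cell collects the points closest to its generator, a point belongs to Ω\Omega exactly when its nearest benchmark sample is within distance sϵs\epsilon. A minimal sketch (the function name is ours):

```python
import numpy as np

def in_support(xi, samples, radius):
    """Check xi in Omega(Xi_B, radius) = union_i (V_i(Xi_B) intersect B_radius(r^(i))).
    With the standard (nearest-neighbor) Voronoi partition this reduces to
    checking the distance to the closest benchmark sample."""
    return float(np.min(np.linalg.norm(samples - xi, axis=1))) <= radius
```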

Step 2: (Outer-approximation of x,M\mathcal{R}_{\boldsymbol{x},M}) Making use of the probabilistic support, we can now obtain a finite-dimensional characterization of the probabilistic reachable set, as follows

𝒙,M:={𝒙(M)n system (9),𝝃(0)=𝝃0,𝒘Ω(Ξ𝒘,B,2ϵ𝒘,B),γ¯Ω(ΞB,2α)},\displaystyle\mathcal{R}_{\boldsymbol{x},M}=\left\{\hskip-0.77498pt\boldsymbol{x}(M)\in^{n}\;\rule[-11.38092pt]{0.56917pt}{28.45274pt}\;\begin{array}[]{l}\textrm{system }\eqref{eq:attack},\;\boldsymbol{\xi}(0)=\boldsymbol{\xi}_{0},\\ \boldsymbol{w}\in\Omega(\Xi_{\boldsymbol{w},\operatorname{B}},2\epsilon_{\boldsymbol{w},\operatorname{B}}),\\ \bar{\gamma}\in\Omega(\Xi_{\operatorname{B}},2\alpha)\end{array}\right\},

and the true system state 𝒙(t)\boldsymbol{x}(t) at time MM, 𝒙(M)\boldsymbol{x}(M), satisfies Prob(𝒙(M)𝒙,M)(1β)(11/2q)2.\operatorname{Prob}\left(\boldsymbol{x}(M)\in\mathcal{R}_{\boldsymbol{x},M}\right)\geq(1-\beta)(1-1/2^{q})^{2}. Many works focus on the tractable evolution of geometric shapes; e.g. [9, 13]. Here, we follow [13] and propose outer ellipsoidal bounds for 𝒙,M\mathcal{R}_{\boldsymbol{x},M}. Let Q𝒘Q_{\boldsymbol{w}} be the positive-definite shape matrix such that Ω(Ξ𝒘,B,ϵ𝒘,B)𝒘:={𝒘|𝒘Q𝒘𝒘1}\Omega(\Xi_{\boldsymbol{w},\operatorname{B}},\epsilon_{\boldsymbol{w},\operatorname{B}})\subset\mathcal{E}_{\boldsymbol{w}}:=\{\boldsymbol{w}\;|\;{\boldsymbol{w}}^{\top}Q_{\boldsymbol{w}}\boldsymbol{w}\leq 1\}. Similarly, we denote Qγ¯Q_{\bar{\gamma}} and γ¯\mathcal{E}_{\bar{\gamma}} for that of γ¯\bar{\gamma}. We now state the following lemma, that applies [13, Proposition 1] for our case.

Lemma IV.2 (Outer bounds of x,M\mathcal{R}_{\boldsymbol{x},M}).

Given any a[0,1)a\in[0,1), we claim 𝐱,M(Q):={𝐱n|𝛏Q𝛏aM𝛏0Q𝛏0+(2a)(1aM)1a}\mathcal{R}_{\boldsymbol{x},M}\subset\mathcal{E}(Q):=\{\boldsymbol{x}\in^{n}\;|\;{\boldsymbol{\xi}}^{\top}Q\boldsymbol{\xi}\leq a^{M}{\boldsymbol{\xi}_{0}}^{\top}Q\boldsymbol{\xi}_{0}+\frac{(2-a)(1-a^{M})}{1-a}\}, with QQ satisfying

Q0,[aQHQ𝟎QHQQG𝟎GQW]0,\displaystyle Q\succ 0,\;\;\begin{bmatrix}aQ&{H}^{\top}Q&\bf{0}\\ QH&Q&QG\\ \bf{0}&{G}^{\top}Q&W\end{bmatrix}\succeq 0, (11)

where HH, GG are that in (9) and

W=[(1a1)Q𝒘𝟎𝟎(1a2)Qγ¯],\displaystyle W=\begin{bmatrix}(1-a_{1})Q_{\boldsymbol{w}}&\bf{0}\\ \bf{0}&(1-a_{2})Q_{\bar{\gamma}}\end{bmatrix}, (12)
for some a1+a2a,a1,a2[0,1).\displaystyle\textrm{for some }a_{1}+a_{2}\geq a,\;a_{1},a_{2}\in[0,1).

Figure 2: Statistics of zz.
Figure 3: Probabilistic Support of 𝒘\mathbb{P}_{\boldsymbol{w}}.
Figure 4: Empirical and Bound of 𝒙\mathcal{R}_{\boldsymbol{x}}.

A tight reachable-set bound can now be derived by solving

minQ,a1,a2\displaystyle\min\limits_{Q,a_{1},a_{2}} logdet(Q),\displaystyle\;-\log\det(Q), (P2)
s.t.\displaystyle\operatorname{s.t.} (11),(12),\displaystyle\;\eqref{eq:Q},\;\eqref{eq:W},

which is a convex semidefinite program, solvable via e.g., SeDuMi [23]. Note that the probabilistic reachable set is

𝒙:=M=1𝒙,M,\mathcal{R}_{\boldsymbol{x}}:=\cup_{M=1}^{\infty}\mathcal{R}_{\boldsymbol{x},M},

which again can be approximated via QQ^{\star} solving (P2) for999The set 𝒙\mathcal{R}_{\boldsymbol{x}} is in fact contained in the projection of (Q)\mathcal{E}(Q^{\star}) onto the state subspace, i.e., 𝒙{𝒙|𝒙(QxxQxeQee1Qxe)𝒙(2a)1a}\mathcal{R}_{\boldsymbol{x}}\subset\{\boldsymbol{x}\;|\;{\boldsymbol{x}}^{\top}\big{(}Q_{xx}-Q_{xe}Q_{ee}^{-1}{Q_{xe}}^{\top}\big{)}\boldsymbol{x}\leq\frac{(2-a)}{1-a}\} with Q:=[QxxQxeQxeQee]Q^{\star}:=\begin{bmatrix}Q_{xx}&Q_{xe}\\ {Q_{xe}}^{\top}&Q_{ee}\end{bmatrix}. See, e.g., [13] for details.

𝒙(Q)={𝒙n|𝝃Q𝝃(2a)1a}.\mathcal{R}_{\boldsymbol{x}}\subset\mathcal{E}(Q^{\star})=\{\boldsymbol{x}\in^{n}\;|\;{\boldsymbol{\xi}}^{\top}Q^{\star}\boldsymbol{\xi}\leq\frac{(2-a)}{1-a}\}.
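As an illustration, a minimal CVXPY sketch of (P2) with aa, a1a_{1}, a2a_{2} fixed (so that the problem is a convex SDP in QQ); a grid search over a1,a2a_{1},a_{2} can be wrapped around it. CVXPY/SCS replace SeDuMi here, and the function name and default values are our own choices; HH and GG can be assembled as in the sketch following (9), and Q𝒘Q_{\boldsymbol{w}}, Qγ¯Q_{\bar{\gamma}} are the shape matrices of the ellipsoidal outer bounds of the probabilistic supports.

```python
import numpy as np
import cvxpy as cp

def reachable_set_shape(H, G, Q_w, Q_gamma, a=0.9, a1=0.5, a2=0.4):
    """Solve (P2) for fixed a, a1, a2 (a1 + a2 >= a, all in [0, 1)); the ellipsoid
    {xi : xi' Q xi <= (2 - a)/(1 - a)} then outer-bounds the reachable set of (9)."""
    n2, m2 = H.shape[0], G.shape[1]
    W = np.block([
        [(1 - a1) * Q_w, np.zeros((Q_w.shape[0], Q_gamma.shape[1]))],
        [np.zeros((Q_gamma.shape[0], Q_w.shape[1])), (1 - a2) * Q_gamma],
    ])
    Q = cp.Variable((n2, n2), symmetric=True)
    zero = np.zeros((n2, m2))
    lmi = cp.bmat([[a * Q,  H.T @ Q, zero],
                   [Q @ H,  Q,       Q @ G],
                   [zero.T, G.T @ Q, W]])
    constraints = [Q >> 1e-6 * np.eye(n2),
                   0.5 * (lmi + lmi.T) >> 0]   # LMI (11), symmetrized for the solver
    prob = cp.Problem(cp.Minimize(-cp.log_det(Q)), constraints)
    prob.solve(solver=cp.SCS)
    return Q.value
```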

V Simulations

In this section, we demonstrate the performance of the proposed attack detector, illustrating its distributional robustness w.r.t. the system noise. Then, we consider stealthy attacks as in (8) and analyze their impact by quantifying the probabilistic reachable set and outer-approximation bound.

Consider the stochastic system (2), given as

A=[1.000.100.200.75],B=[0.100.20],L=[0.230.20],\displaystyle A=\begin{bmatrix}1.00&0.10\\ -0.20&0.75\end{bmatrix},\;B=\begin{bmatrix}0.10\\ 0.20\end{bmatrix},\;L=\begin{bmatrix}0.23\\ -0.20\end{bmatrix},
C=[10],K=[0.130.01],n=2,m=p=1,\displaystyle C=\begin{bmatrix}1&0\end{bmatrix},\;K=\begin{bmatrix}-0.13&0.01\end{bmatrix},\;n=2,\;m=p=1,
\boldsymbol{w}_{1}\sim\mathcal{N}(-0.25,0.02)+\mathcal{U}(0,0.5),\;\boldsymbol{v}\sim\mathcal{U}(-0.3,0.3),
\boldsymbol{w}_{2}\sim\mathcal{N}(0,0.04)+\mathcal{U}(-0.2,0.2),

where 𝒩\mathcal{N} and 𝒰\mathcal{U} represent the normal and uniform distributions, respectively. We consider N=103N=10^{3} benchmark samples for 𝒓,B\mathbb{P}_{\boldsymbol{r},\operatorname{B}} and T=102T=10^{2} real-time samples for 𝒓,D\mathbb{P}_{\boldsymbol{r},\operatorname{D}}. We select the parameter q=1q=1, β=0.01\beta=0.01 and the false alarm rate Δ=0.05\Delta=0.05. We encode the prior information on the system noise via the parameters a=1.5a=1.5, c1=1.84×106c_{1}=1.84\times 10^{6} and c2=12.5c_{2}=12.5. Using the measure-of-concentration results, we determine the detector threshold to be α=0.158\alpha=0.158. In normal system operation (no attack), we run the online detection procedure for 10410^{4} time steps and plot the distribution of the computed detection measure z(t)z(t) in Fig. 2. We verify that the false alarm rate is 3.68%3.68\%, within the required rate Δ=5%\Delta=5\%. When the system is subject to stealthy attacks, we assume 𝝃0=𝟎\boldsymbol{\xi}_{0}=\boldsymbol{0} and visualize the Voronoi partition 𝒱(Ξ𝒘,B)\mathcal{V}(\Xi_{\boldsymbol{w},\operatorname{B}}) (convex sets with blue boundaries) of the probabilistic support Ω(Ξ𝒘,B,ϵ𝒘,B)\Omega(\Xi_{\boldsymbol{w},\operatorname{B}},\epsilon_{\boldsymbol{w},\operatorname{B}}) and its estimated ellipsoidal bound (red line) in Fig. 3. Further, we demonstrate the impact of the stealthy attacks (8) in Fig. 4. We used 10410^{4} empirical points of 𝒙\mathcal{R}_{\boldsymbol{x}} as its estimate and provided an ellipsoidal bound of 𝒙\mathcal{R}_{\boldsymbol{x}} computed by solving (P2). It can be seen that the proposed probabilistic reachable set effectively captures the reachable set in probability. Due to space limits, we omit the comparison of our approach to existing ones, such as the classical χ2\chi^{2} detector in [9] and the CUSUM procedure [8]. However, the difference should be clear: our proposed approach is robust w.r.t. the noise distribution, while the others leverage moment information up to the second order, which captures sufficient information only for certain noise distributions, e.g., Gaussian ones.
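For reproducibility, a minimal sketch of the normal-operation data collection used to build the benchmark set ΞB\Xi_{\operatorname{B}}, with the system matrices above and the reconstructed noise mixtures; reading the second parameter of 𝒩\mathcal{N} as a variance is our assumption. The collected residuals can be fed to the detection_measure sketch of Section III together with a sliding window of online residuals.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.00, 0.10], [-0.20, 0.75]])
B = np.array([[0.10], [0.20]])
C = np.array([[1.0, 0.0]])
L = np.array([[0.23], [-0.20]])
K = np.array([[-0.13, 0.01]])

def sample_w():
    # zero-mean mixture noise (second argument of N read as a variance)
    w1 = rng.normal(-0.25, np.sqrt(0.02)) + rng.uniform(0.0, 0.5)
    w2 = rng.normal(0.0, np.sqrt(0.04)) + rng.uniform(-0.2, 0.2)
    return np.array([[w1], [w2]])

x, xhat = np.zeros((2, 1)), np.zeros((2, 1))
residuals = []
for t in range(10**3):                      # collect N benchmark residuals r(t) = y(t) - yhat(t)
    u = K @ xhat
    y = C @ x + rng.uniform(-0.3, 0.3)      # measurement with v(t), no attack
    r = y - C @ xhat
    residuals.append(r.ravel())
    xhat = A @ xhat + B @ u + L @ r         # steady-state Kalman filter update
    x = A @ x + B @ u + sample_w()          # plant update
R_bench = np.array(residuals)               # benchmark data set Xi_B, shape (N, p)
```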

VI Conclusions

A novel detection measure was proposed to enable distributionally robust detection of attacks w.r.t. unknown, and light-tailed system noise. The proposed detection measure restricts the behavior of the stealthy attacks, whose impact was quantified via reachable-set analysis.

References

  • [1] A. Cardenas, S. Amin, B. Sinopoli, A. Giani, A. Perrig, and S. Sastry, “Challenges for securing cyber physical systems,” in Workshop on Future Directions of Cyber-Physical Systems, vol. 5, no. 1, 2009.
  • [2] F. Pasqualetti, F. Dorfler, and F. Bullo, “Attack detection and identification in cyber-physical systems,” IEEE Transactions on Automatic Control, vol. 58, no. 11, pp. 2715–2729, 2013.
  • [3] S. Amin, A. Cardenas, and S. S. Sastry, “Safe and secure networked control systems under denial-of-service attacks,” in Hybrid Systems: Computation and Control, 2009, pp. 31–45.
  • [4] F. Miao, Q. Zhu, M. Pajic, and G. J. Pappas, “Coding schemes for securing cyber-physical systems against stealthy data injection attacks,” IEEE Transactions on Control of Network Systems, vol. 4, no. 1, pp. 106–117, 2016.
  • [5] C. Bai, F. Pasqualetti, and V. Gupta, “Data-injection attacks in stochastic control systems: Detectability and performance tradeoffs,” Automatica, vol. 82, pp. 251–260, 2017.
  • [6] Y. Mo and B. Sinopoli, “Secure control against replay attacks,” in Allerton Conf. on Communications, Control and Computing, Illinois, USA, September 2009, pp. 911–918.
  • [7] M. Zhu and S. Martínez, “On the performance analysis of resilient networked control systems under replay attacks,” IEEE Transactions on Automatic Control, vol. 59, no. 3, pp. 804–808, 2014.
  • [8] J. Milošević, D. Umsonst, H. Sandberg, and K. Johansson, “Quantifying the impact of cyber-attack strategies for control systems equipped with an anomaly detector,” in European Control Conference, 2018, pp. 331–337.
  • [9] Y. Mo and B. Sinopoli, “On the performance degradation of cyber-physical systems under stealthy integrity attacks,” IEEE Transactions on Automatic Control, vol. 61, no. 9, pp. 2618–2624, 2015.
  • [10] S. Mishra, Y. Shoukry, N. Karamchandani, S. Diggavi, and P. Tabuada, “Secure state estimation against sensor attacks in the presence of noise,” IEEE Transactions on Control of Network Systems, vol. 4, no. 1, pp. 49–59, 2016.
  • [11] C. Murguia, N. V. de Wouw, and J. Ruths, “Reachable sets of hidden CPS sensor attacks: Analysis and synthesis tools,” IFAC World Congress, vol. 50, no. 1, pp. 2088–2094, 2017.
  • [12] C. Bai, F. Pasqualetti, and V. Gupta, “Security in stochastic control systems: Fundamental limitations and performance bounds,” in American Control Conference, 2015, pp. 195–200.
  • [13] C. Murguia, I. Shames, J. Ruths, and D. Nesic, “Security metrics of networked control systems under sensor attacks,” arXiv preprint arXiv:1809.01808, 2018.
  • [14] V. Renganathan, N. Hashemi, J. Ruths, and T. Summers, “Distributionally robust tuning of anomaly detectors in Cyber-Physical systems with stealthy attacks,” arXiv preprint arXiv:1909.12506, 2019.
  • [15] A. R. Teel, J. P. Hespanha, and A. Subbaraman, “Equivalent characterizations of input-to-state stability for stochastic discrete-time systems,” IEEE Transactions on Automatic Control, vol. 59, no. 2, pp. 516–522, 2014.
  • [16] M. Athans, “The role and use of the stochastic linear-quadratic-Gaussian problem in control system design,” IEEE Transactions on Automatic Control, vol. 16, no. 6, pp. 529–552, 1971.
  • [17] N. Fournier and A. Guillin, “On the rate of convergence in Wasserstein distance of the empirical measure,” Probability Theory and Related Fields, vol. 162, no. 3-4, pp. 707–738, 2015.
  • [18] F. Santambrogio, Optimal transport for applied mathematicians. Springer, 2015.
  • [19] D. Li and S. Martínez, “High-confidence attack detection via Wasserstein-metric computations,” preprint arXiv:2003.07880, 2020.
  • [20] C. Murguia and J. Ruths, “Cusum and chi-squared attack detection of compromised sensors,” in IEEE Conf. on Control Applications, 2016, pp. 474–480.
  • [21] F. Bullo, J. Cortés, and S. Martínez, Distributed Control of Robotic Networks, ser. Applied Mathematics Series. Princeton University Press, 2009.
  • [22] J. Cortés, “Coverage optimization and spatial load balancing by robotic sensor networks,” IEEE Transactions on Automatic Control, vol. 55, no. 3, pp. 749–754, 2010.
  • [23] J. Sturm, “Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones,” Optimization Methods and Software, vol. 11, no. 1-4, pp. 625–653, 1999.