
Risk-Aware Stability of Discrete-Time Systems

Margaret P. Chapman, Member, IEEE, and Dionysios S. Kalogerias, Member, IEEE

The work of M. P. Chapman was supported in part by the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, and in part by the Natural Sciences and Engineering Research Council of Canada Discovery Grants Program [RGPIN-2022-04140]. This research was funded by the Natural Sciences and Engineering Research Council of Canada. The work of D. S. Kalogerias was supported in part by a Microsoft gift. M. P. Chapman is with the Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON M5S 3G4 Canada (email: mchapman@ece.utoronto.ca). D. S. Kalogerias is with the Department of Electrical Engineering, Yale University, New Haven, CT, 06520 USA (email: dionysis.kalogerias@yale.edu).

Abstract

We develop a generalized stability framework for stochastic discrete-time systems, where the generality pertains to the ways in which the distribution of the state energy can be characterized. We use tools from finance and operations research called risk functionals (i.e., risk measures) to facilitate diverse distributional characterizations. In contrast, classical stochastic stability notions characterize the state energy on average or in probability, which can obscure the variability of stochastic system behavior. After drawing connections between various risk-aware stability concepts for nonlinear systems, we specialize to linear systems and derive sufficient conditions for the satisfaction of some risk-aware stability properties. These results pertain to real-valued coherent risk functionals and a mean-conditional-variance functional. The results reveal novel noise-to-state stability properties, which assess disturbances in ways that reflect the chosen measure of risk. We illustrate the theory through examples about robustness, parameter choices, and state-feedback controllers.

Index Terms

Stochastic stability theory, risk functionals, stochastic discrete-time systems, linear systems

1 Introduction

Stability theory of stochastic systems has a long-standing history with contributions from the late 1950s and early 1960s [1, 2, 3] and is an especially active research area currently [4, 5, 6, 7, 8, 9]. The applications of stochastic stability theory are numerous, and we list just a few here: trajectory tracking despite modeling errors [10, p. 426], ensuring that state estimation error stays bounded [11], target tracking for partially observable systems [12], and establishing the convergence of sampling-based optimization algorithms [13, 14].

Stochastic stability theory typically assesses system behavior by evaluating the magnitude $|\cdot|$, e.g., Euclidean norm, of the state $x_t$ over time in probability or in expectation. Evaluating the probability of a harmful event, e.g., $|x_t|$ exceeds a desired threshold, provides one approach to analyze stability. To analyze stability from a different perspective, one may evaluate the moments of $|x_t|$. For example, the uniform exponential $p$-stability property (informally) means that the expectation of $|x_t|^p$ decays exponentially at a rate independent of the initial condition. However, instead of merely analyzing $E(|x_t|^p)$, we may wish to analyze additional characteristics of the distribution of the state energy $|x_t|^p$ over time. For instance, the state energy may represent parameter estimation error in stochastic gradient descent, state estimation error in a Kalman filter, or trajectory tracking error of a robotic system. Rarer, larger realizations of the state energy may arise in practice but may be concealed at the analysis stage by restricting one's attention to the expectation of $|x_t|^p$. There may be incomplete knowledge about the system model, suggesting the importance of analyzing the state energy under distributional ambiguity. Also, the classical $p$-stability property ignores the variability of the state energy with respect to its mean, whereas it may be desirable to stabilize such variability in practice.

Ultimately, we wish to design control systems whose energy dynamics enjoy specific desirable distributional characteristics. For instance, we may wish the state energy to decay in expectation conditioned on a given fraction of the worst cases. In other settings, we may wish the variance of the state energy to decay, or we may wish the expected state energy to decay in a distributionally robust sense. To design such control systems in the future, we require techniques to analyze state energy distributions from more diverse perspectives.

In the literature, the term stability refers to different properties, for example, uniform boundedness of the state trajectory in probability [15, (D4), p. 31], driving the state to the origin within a given time period (i.e., prescribed-time stabilization) [8], and input-to-state and noise-to-state stability concepts. Since Sontag’s pioneering work in the setting of deterministic systems in the late 1980s [16], stochastic input-to-state and noise-to-state stability notions have been developed, e.g., see [17, 18, 19, 20, 21, 22, 6, 7]. In particular, deterministic input-to-state stability was extended to stochastic noise-to-state stability by Krstić, Deng, and colleagues in the late 1990s and early 2000s [17, 19]. In the current work, we emphasize uniform stability notions of the form

$$\rho(\psi(x_t)) \leq a\lambda^{t}\psi(\mathbf{x}) + b, \quad t \in \{1,2,\dots\}, \tag{1}$$

where $\rho$ is a map from random variables to the extended real line, $\psi$ is a state energy function, $\mathbf{x} \in \mathbb{R}^n$ is an initial condition, and $\lambda \in [0,1)$, $a \geq 0$, and $b \in \mathbb{R}$ are parameters. If $b$ depends on a disturbance process, e.g., $b = \sup_{t \in \{0,1,\dots\}} E(|w_t|^2) < \infty$, then the above stability notion describes a noise-to-state stability property.

In the current work, we develop a uniform stability framework for the analysis of diverse characteristics of the state energy's distribution by permitting a fairly general map $\rho$ in (1) called a risk functional. A risk functional (i.e., risk measure) is a map from a space of random variables representing costs to the extended real line [23, p. 261]. Exponential utility is the classical risk functional in the control systems literature pioneered by Whittle in the 1980s and 1990s [24, 25, 26]. Other seminal works about exponential utility include those by Howard and Matheson [27] and by Jacobson [28] in the early 1970s. The exponential utility $\rho_{\text{e},\theta}(Z) \coloneqq \frac{-2}{\theta}\log E(\exp(\frac{-\theta}{2}Z))$ of a nonnegative random variable $Z$ with $\theta \neq 0$ is interpreted as a mean-variance approximation, i.e., if the magnitude of $\theta\,\textbf{var}(Z)$ is small enough, then $\rho_{\text{e},\theta}(Z) \approx E(Z) - \frac{\theta}{4}\textbf{var}(Z)$ [24, p. 765]. From a viewpoint based on expected utility theory [29], $\rho_{\text{e},\theta}$ assumes that an exponential function represents the user's preferences. Such assumptions need not be appropriate for every application, motivating investigations of additional risk-aware criteria.

Building on the above contributions, additional types of risk functionals have been developed [30, 31, 23, 32]. A real-valued coherent risk functional satisfies four useful properties: convexity, monotonicity, translation equivariance $\rho(Z+a) = \rho(Z) + a$ if $a \in \mathbb{R}$, and positive homogeneity $\rho(\lambda Z) = \lambda\rho(Z)$ if $\lambda > 0$ [30], [23, Ch. 6.3]. Such a functional also enjoys a distributionally robust representation [23, Th. 6.6]. A common example is the conditional value-at-risk of an integrable random variable $Z$, which represents the expectation of $Z$ in a given fraction of the worst cases [23, Th. 6.2]. A mean-dispersion risk functional takes the form $\rho(Z) = E(Z) + \lambda D(Z)$, where $\lambda \geq 0$ and $D$ is a measure of dispersion relative to $E(Z)$. Examples of $D$ include variance, standard deviation, and upper-semideviation, i.e., $E(\max\{Z - E(Z), 0\})$. A recursive risk functional of $Z = Z_0 + Z_1 + Z_2$ takes the form $\rho(Z) = \rho_0(Z_0 + \rho_1(Z_1 + \rho_2(Z_2)))$, where $\rho_i$ is a map between spaces of random variables. [Footnote 1: To introduce a recursive risk functional, we have used a time horizon of length $N = 2$ for clarity. More generally, one can use a finite time horizon of length $N \in \mathbb{N}$, an arbitrary natural number, or one can use an infinite time horizon [32, 14].] While the structure of a recursive risk functional $\rho$ facilitates the development of risk-aware algorithms, it can be difficult to interpret $\rho$, except in special cases, such as when $\rho$ is the expectation. We will further describe specific risk functionals in Section 2.

To meet application-specific needs and preferences about managing uncertainty, it is desirable to have at hand a broad collection of maps that assess different distributional characteristics. This is one motivation for the theory of risk functionals and the pursuit of additional research in the intersection of risk functionals and control systems. Research in this intersection has been gaining much momentum in recent years. There are new contributions, for example, in risk-aware mean-field games [33, 34], model predictive control [35, 36], temporal logic [37, 38], barrier functions [39], and optimal-control-based safety analysis [40, 41]. We refer the reader to our recent survey article about risk-aware optimal control theory [42] for additional literature. We will focus the remainder of the literature review on existing work related to risk-aware stability theory, which has been less studied.

1.1 Literature related to risk-aware stability theory

First, we will describe connections between the exponential utility functional and stability theory, which have been known for the past several decades. Then, we will describe the closest related works to the current work: Singh et al. [35], Sopasakis et al. [36], Kishida [43], and Tsiamis et al. [44].

In 1988, Glover and Doyle established connections between the linear time-invariant controller that minimizes the infinite-time exponential utility cost functional and the class of stabilizing controllers that satisfy an $\mathcal{H}_{\infty}$-norm bound in a linear-quadratic setting [45]. In 1995, James and Baras developed a solution approach for the robust $\mathcal{H}_{\infty}$ output feedback nonlinear control problem, which includes an asymptotic stability specification, using techniques from an earlier study of a partially observable exponential utility control problem [46, 47]. In 1999, Pan and Başar provided conditions that guarantee global probabilistic asymptotic stability for nonlinear continuous-time stochastic systems in the context of a long-term exponential utility cost functional [48]. In 2000, Dupuis et al. derived an upper bound for the average output power in terms of the input power and the exponential utility functional, which led to a stochastic small gain theorem [49].

Singh et al. [35] and Sopasakis et al. [36] both studied risk-averse model predictive control with risk-aware stability properties that are defined using a recursive risk functional; see [35, Def. V.1] and [36, Def. 4], respectively. Singh et al. considered a linear system with multiplicative noise, i.e., $x_{t+1} = A(w_t)x_t + B(w_t)u_t$, where there are finitely many realizations of $w_t$ [35]. Sopasakis et al. considered a nonlinear generalization with joint state-input constraints [36]. Lyapunov stability conditions are provided by [35, Lemma VI.1] and [36, Lemma 5], respectively. In particular, the lemma [35, Lemma VI.1] follows directly from a well-established sufficient condition for a classical exponential stability property [50, Def. 1, Lemma 1], which builds on techniques from [51]. In contrast to the above works, we consider linear models with additive noise, which yields noise-to-state stability properties, and we emphasize nonrecursive risk functionals and systems with continuous disturbance spaces.

Kishida developed stability theory for linear systems in the context of a specific coherent risk functional, that is, a distributionally robust conditional value-at-risk (CVaR) functional [43]. The system is linear with additive noise, and the ambiguity set is the family of disturbance distributions with zero mean and known time-invariant covariance [43, Eq. (2)]. Sufficient conditions for the robust CVaR stability property include the maximum singular value $\sigma_{\text{max}}(A)$ of the dynamics matrix $A$ being strictly less than one [43, Lemma 3.2]. These efforts have inspired us to investigate stability conditions in the context of a broad family of coherent risk functionals for linear systems (Theorem 3.3). Our linear system model in Theorem 3.3 does not require zero-mean or identically distributed disturbances, and we circumvent the condition of $\sigma_{\text{max}}(A) < 1$ (instead, our analysis relies on Schur stable matrices). We study connections between stability definitions that apply to nonlinear systems (Theorem 1), and we develop stability theory in the context of a particular noncoherent risk functional as well (Theorem 3.7).

In a linear-quadratic optimal control setting, Tsiamis et al. proposed a constraint on a conditional variance, which admits a quadratic form [44]. We propose a risk functional that is a weighted sum of the mean and a conditional variance (Section 2, Example 4). Then, we incorporate this functional into our risk-aware stability framework, deriving sufficient conditions for stability (Theorem 3.7). The idea of a mean-conditional-variance risk functional comes from the work [44]. However, we require different techniques that are useful for risk-aware stability theory, whereas the focus of [44] is risk-aware linear-quadratic optimal control theory.

Overall, in contrast to the above works, we advocate for a generalized risk-aware stability viewpoint, in which one can evaluate the state energy in terms of any risk functional. We see value in this degree of generality to accommodate the potential needs of diverse applications, in which different characterizations of state energy distributions may be useful.

1.2 Contributions

We propose a uniform stability analysis framework that characterizes diverse features of the state energy's distribution by permitting $\rho$ in (1) to be a general risk functional. In contrast, the standard paradigm characterizes the average energy or the probability that the state trajectory is close to the origin. Using a generalized risk-aware (uniform) stability property for nonlinear systems (Definition 3), our first contribution is to draw connections between different instances of this property for real-valued coherent risk functionals and other common risk functionals (Theorem 1). Then, specializing to linear systems, our second contribution is to derive sufficient conditions under which a risk-aware stability property holds, where the property is defined using a real-valued coherent risk functional (Theorem 3.3). These sufficient conditions also provide a distributionally robust risk-neutral stability property using the ambiguity set corresponding to the coherent risk functional of interest. Our final contribution is to derive sufficient conditions for stability in the context of a mean-conditional-variance functional (Theorem 3.7). The last two theorems uncover novel risk-aware noise-to-state stability properties, which have not been reported or studied in the literature to our knowledge.

1.3 Organization, notation, and terminology

Section 2 studies risk-aware (uniform) stability properties in the context of various risk functionals and discrete-time nonlinear systems. Section 3 specializes to linear systems and derives sufficient conditions under which risk-aware stability properties are guaranteed for some classes of risk functionals. Section 4 offers illustrative examples. In particular, we provide a simple risk-aware controller that displays empirically a disturbance attenuation effect in the setting of Theorem 3.7 (Illustration 3). Lastly, Section 5 presents concluding remarks, and the Appendix offers supporting technical details.

$\mathbb{N}$ is the set of natural numbers, $\mathbb{N}_0 \coloneqq \{0\} \cup \mathbb{N}$, $\mathbb{R}$ is the real line, and $\bar{\mathbb{R}} \coloneqq \mathbb{R} \cup \{\infty,-\infty\}$ is the extended real line. Given $n \in \mathbb{N}$, $\mathbb{R}^n$ is $n$-dimensional Euclidean space, and $\bar{\mathbb{R}}^n$ is extended $n$-dimensional Euclidean space. $\mathbb{R}^{n\times n}$ is the set of $n \times n$ real matrices. $\mathcal{S}_n$ is the set of $n \times n$ real symmetric positive semidefinite matrices. If $M \in \mathcal{S}_n$, then $M^{\frac{1}{2}}$ is the matrix square root of $M$, i.e., $(M^{\frac{1}{2}})^{\top}M^{\frac{1}{2}} = M$. If $H \in \mathcal{S}_n$ and $M \in \mathcal{S}_n$, we define the matrix $H_M$ by

$$H_M \coloneqq (M^{\frac{1}{2}})^{\top}HM^{\frac{1}{2}}. \tag{2}$$

If $M \in \mathcal{S}_n$, then we denote the smallest and largest eigenvalues of $M$ by $\lambda_{\text{min}}(M)$ and $\lambda_{\text{max}}(M)$, respectively. $\mathcal{S}_n^{+}$ is the set of $n \times n$ real symmetric positive definite matrices. $|\cdot|$ is the Euclidean norm on $\mathbb{R}^n$. $|\cdot|_2$ is the matrix norm induced by the Euclidean norm, i.e., $|M|_2 \coloneqq \sup_{|\mathbf{z}|=1}|M\mathbf{z}|$ for any $M \in \mathbb{R}^{n\times n}$. $I_n$ is the $n \times n$ identity matrix. $0_n$ is the origin of $\mathbb{R}^n$. $\text{tr}(M)$ is the trace of $M \in \mathbb{R}^{n\times n}$.

If $\mathbb{M}$ is a metric space, then $\mathcal{B}_{\mathbb{M}}$ is the Borel $\sigma$-algebra on $\mathbb{M}$. Given measurable spaces $(\Omega,\mathcal{F})$ and $(\Omega',\mathcal{F}')$, the notation $f : (\Omega,\mathcal{F}) \rightarrow (\Omega',\mathcal{F}')$ means that the function $f : \Omega \rightarrow \Omega'$ is $(\mathcal{F},\mathcal{F}')$-measurable. If $\Omega$ and $\Omega'$ are metric spaces and $f : (\Omega,\mathcal{B}_{\Omega}) \rightarrow (\Omega',\mathcal{B}_{\Omega'})$, then $f$ is called Borel-measurable. $(\Omega,\mathcal{F},P)$ is a generic probability space, where $E$ is the associated expectation operator. $\mathcal{L}^q(\Omega,\mathcal{F},P)$ is the associated $\mathcal{L}^q$ space, where $\|\cdot\|_q$ is the associated norm and $q \in [1,\infty]$. We denote $\mathcal{L}^q(\Omega,\mathcal{F},P)$ by $\mathcal{L}^q$ for brevity. With slight abuse of notation (which is standard in this case), we use $f \in \mathcal{L}^q$ to denote a function $f : (\Omega,\mathcal{F}) \rightarrow (\mathbb{R},\mathcal{B}_{\mathbb{R}})$ such that $\|f\|_q < \infty$. $\mathcal{L}^{q*}$ is the dual space of $\mathcal{L}^q$, and $\|\cdot\|_{q*}$ is the corresponding norm. The phrase a.e. means almost everywhere with respect to the probability measure $P$. $\mathcal{I}_F$ is the indicator function of $F \in \mathcal{F}$. w.r.t. means with respect to.

We will use basic measurability and integration theorems, such as those from [52, Ch. 1.5], without an explicit reference.

2 Generalized risk-aware stability

Consider a fully observable stochastic discrete-time system

$$x_{t+1} = f(x_t,w_t), \quad t = 0,1,\dots, \tag{3}$$

where $f : \mathbb{R}^n \times \mathbb{R}^d \rightarrow \mathbb{R}^n$ is a Borel-measurable function, $n$ and $d$ are natural numbers, $(x_0,x_1,\dots)$ is an $\mathbb{R}^n$-valued stochastic process, and $(w_0,w_1,\dots)$ is an $\mathbb{R}^d$-valued independent stochastic process. The processes are defined on an arbitrary probability space $(\Omega,\mathcal{F},P)$. We assume that the initial state is fixed at an arbitrary vector $\mathbf{x} \in \mathbb{R}^n$. [Footnote 2: More precisely, we assume that the distribution of $x_0$ is the Dirac measure concentrated at $\mathbf{x}$, where $\mathbf{x}$ can be any vector in $\mathbb{R}^n$.]

There are many types of energy-like functions in control theory, e.g., class-$\mathcal{K}$ functions, locally positive definite functions, and decrescent functions [53, Sec. 5.3.1]. We will find the following energy-like function useful in this work.

Definition 1 (State energy function $\psi$)

A state energy function $\psi : \mathbb{R}^n \rightarrow \mathbb{R}$ is Borel-measurable, nonnegative, and satisfies $\psi(x) = 0$ if and only if $x = 0_n$.

It is natural to choose $\psi$ so that $\psi(x)$ increases as $|x|$ increases. However, we do not need this condition in the current section. In this section, we will study the relationships between different types of risk-aware stability notions for any $\psi$ satisfying Definition 1. In contrast, in Section 3, we will specialize to $\psi(x) = x^{\top}Rx$ with $R \in \mathcal{S}_n^{+}$.

The next definition is a non-risk-aware (i.e., risk-neutral) uniform stability property. Variations of the definition to follow are available from [51, Def. 1], [50, Def. 1], and [10, Def. 7.3.8], for example.

Definition 2 (Risk-neutral stability)

Let a state energy function $\psi$ and a subset $S \subseteq \mathbb{R}^n$ be given. The system (3) is uniformly exponentially stable with an offset with respect to the mean in region $S$ if and only if there exist $\lambda \in [0,1)$, $a \in [0,\infty)$, and $b \in \mathbb{R}$ such that

$$E(\psi(x_t)) \leq a\lambda^{t}\psi(\mathbf{x}) + b \tag{4}$$

for every time $t \in \mathbb{N}$ and initial condition $\mathbf{x} \in S$. For brevity, we often write "stability with respect to the mean" instead of "uniform exponential stability with an offset with respect to the mean in region $S$." We refer to $\lambda$ as a rate parameter, $a$ as a scale parameter, and $b$ as an offset parameter.

Definition 2 describes a uniform property because $(\lambda,a,b)$ does not depend on $\mathbf{x}$ or $t$. While one can also consider a definition in which $(\lambda,a,b)$ can depend on $\mathbf{x}$ or $t$, we focus on the uniform case in this work. Definition 2 describes a local property if $S$ is a strictly proper subset of $\mathbb{R}^n$ and a global property otherwise. If $\psi(x) = |x|^2$ and $b = 0$, then Definition 2 is uniform exponential mean-square stability. If $b = c\sup_{t\in\mathbb{N}_0}E(w_t^{\top}Mw_t) < \infty$ for some $c \in (0,\infty)$ and $M \in \mathcal{S}_n^{+}$, then Definition 2 describes a risk-neutral noise-to-state stability property. (Additional risk-neutral noise-to-state stability definitions have been introduced in the literature; see [17, Def. 4.1] and [22, Def. 2] for two examples.)
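As a hedged illustration of Definition 2 (not an example from the paper), consider a hypothetical scalar system $x_{t+1} = g x_t + w_t$ with $|g| < 1$, $\psi(x) = x^2$, and zero-mean independent disturbances with common variance `sigma2`. Under these assumptions, the second moment obeys $E(x_{t+1}^2) = g^2 E(x_t^2) + \texttt{sigma2}$, so (4) should hold with rate $\lambda = g^2$, scale $a = 1$, and offset $b = \texttt{sigma2}/(1-g^2)$, which has the noise-to-state form $c\sup_t E(w_t^2)$. The sketch below iterates this recursion and checks the bound numerically; the function names are introduced here for illustration only.

```python
# Hypothetical scalar system x_{t+1} = g * x_t + w_t with psi(x) = x**2.
# Assumptions (not from the paper): |g| < 1, and w_t is zero-mean with
# variance sigma2 and independent of x_t.

def second_moments(g, sigma2, x0, horizon):
    """Return [E(x_1^2), ..., E(x_horizon^2)] from m_{t+1} = g^2 * m_t + sigma2."""
    moments = []
    m = x0 ** 2  # the initial state is fixed, so E(x_0^2) = x0^2
    for _ in range(horizon):
        m = g ** 2 * m + sigma2
        moments.append(m)
    return moments


def satisfies_bound(moments, lam, a, b, psi_x0, tol=1e-9):
    """Check E(psi(x_t)) <= a * lam**t * psi(x0) + b for t = 1, ..., len(moments)."""
    return all(
        m <= a * lam ** t * psi_x0 + b + tol
        for t, m in enumerate(moments, start=1)
    )


g, sigma2 = 0.8, 0.5
# Candidate (rate, scale, offset) parameters for inequality (4).
lam, a, b = g ** 2, 1.0, sigma2 / (1.0 - g ** 2)

for x0 in (-10.0, 0.0, 3.0):
    moments = second_moments(g, sigma2, x0, horizon=50)
    print(x0, satisfies_bound(moments, lam, a, b, psi_x0=x0 ** 2))
```

Because the same triple $(\lambda,a,b)$ is used for every initial condition and every time, the check mirrors the uniformity requirement of the definition. Shrinking the offset below `sigma2 / (1 - g**2)` would eventually violate the bound, since the second moment converges to that value.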

As described in the Introduction, we are interested in analyzing the state energy over time, which is a random process $(\psi(x_1),\psi(x_2),\dots)$. If we only consider the average state energy, as in Definition 2, then we ignore other characteristics of the distribution of the state energy, including the dispersion with respect to the mean (e.g., variance) and the shape of the upper tail (e.g., the mean in a fraction of the worst cases). Hence, we will develop a risk-aware stability concept to facilitate the analysis of a broader range of distributional characteristics of the state energy.

To generalize Definition 2, the key observation is that the expectation is a map from a family of random variables to $\bar{\mathbb{R}}$. Hence, we can consider maps that characterize additional features of the distribution of a random variable. Let $\mathcal{Z}$ be a family of random variables defined on $(\Omega,\mathcal{F},P)$, and let $\rho$ be a map from $\mathcal{Z}$ to $\bar{\mathbb{R}}$. [Footnote 3: We prefer smaller realizations of any $Z \in \mathcal{Z}$. That is, $Z$ represents a random cost rather than a random reward. We assume that $\rho(Z) \in \mathbb{R} \cup \{\infty\}$ for every $Z \in \mathcal{Z}$, and we assume that there exists a $Z \in \mathcal{Z}$ such that $\rho(Z)$ is finite, following convention [23, p. 261]. The term risk functional (i.e., risk measure) invokes these assumptions implicitly.] $\rho$ is called a risk functional (i.e., risk measure), and several examples will follow. The precise definition of $\mathcal{Z}$ will depend on the risk functional of interest. While Examples 1-3 can be found in [23], we present them here to keep the current work self-contained.

Example 1 (Value-at-risk)

Let $\mathcal{Z}$ be the entire family of random variables on $(\Omega,\mathcal{F},P)$. The value-at-risk of $Z \in \mathcal{Z}$ at level $\alpha \in (0,1)$ is defined by

$$\text{VaR}_{\alpha}(Z) \coloneqq \inf\{z \in \mathbb{R} : F_Z(z) \geq 1-\alpha\}, \tag{5}$$

where $F_Z(z) \coloneqq P(\{Z \leq z\})$ is the distribution function of $Z$. The map $(1-\alpha) \mapsto \text{VaR}_{\alpha}(Z)$ is the generalized inverse of $F_Z$, i.e., the quantile function of $F_Z$ [54, p. 304].
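To make (5) concrete, the following minimal sketch evaluates the value-at-risk of an empirical distribution, under the assumption that $Z$ takes each of the listed sample values with equal probability. The function name `empirical_var` is hypothetical and is not taken from the paper.

```python
def empirical_var(samples, alpha):
    """Value-at-risk (5) of the empirical distribution of equally likely samples.

    Returns the smallest sample z whose empirical distribution function
    value F(z) is at least 1 - alpha.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if not samples:
        raise ValueError("samples must be nonempty")
    ordered = sorted(samples)
    n = len(ordered)
    for k, z in enumerate(ordered, start=1):
        # At least k of the n samples are <= z, so F(z) >= k / n.
        if k / n >= 1.0 - alpha:
            return z
    return ordered[-1]  # not reached: k = n always satisfies the test


costs = [4.0, 1.0, 3.0, 2.0]
print(empirical_var(costs, alpha=0.25))
print(empirical_var(costs, alpha=0.5))
```

A smaller $\alpha$ moves the returned value further into the upper tail of the cost distribution, in line with the quantile interpretation stated above.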

Example 2 (Conditional value-at-risk)

The conditional value-at-risk of $Z \in \mathcal{Z} = \mathcal{L}^1$ at level $\alpha \in (0,1]$ is defined by

$$\text{CVaR}_{\alpha}(Z) \coloneqq \inf_{s \in \mathbb{R}} \Big( s + \frac{1}{\alpha}E(\max\{Z-s,0\}) \Big). \tag{6}$$

$\text{CVaR}_{1}(Z)$ equals $E(Z)$, and the limit of $\text{CVaR}_{\alpha}(Z)$ as $\alpha \rightarrow 0$ from above equals the essential supremum of $Z$ [55]. The name conditional value-at-risk comes from the following: If $\alpha \in (0,1)$ and $F_Z$ is continuous at $\text{VaR}_{\alpha}(Z)$, then $\text{CVaR}_{\alpha}(Z) = E(Z \,|\, Z \geq \text{VaR}_{\alpha}(Z))$ [23, Th. 6.2]. Synonyms include average value-at-risk and expected shortfall.
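The infimum in (6) can be evaluated exactly for an empirical distribution, which gives a quick way to see the two limiting cases mentioned above. The sketch below assumes equally likely samples; it relies on the fact that the objective in (6) is then convex and piecewise linear in $s$ with breakpoints at the sample values, so searching over the samples suffices. The name `empirical_cvar` is hypothetical.

```python
def empirical_cvar(samples, alpha):
    """Conditional value-at-risk (6) of equally likely samples, alpha in (0, 1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not samples:
        raise ValueError("samples must be nonempty")
    n = len(samples)

    def objective(s):
        # s + (1 / alpha) * E(max{Z - s, 0}) under the empirical distribution
        return s + sum(max(z - s, 0.0) for z in samples) / (alpha * n)

    # The objective is convex and piecewise linear with breakpoints at the
    # sample values, so a minimizer can be found among the samples.
    return min(objective(s) for s in samples)


costs = [1.0, 2.0, 3.0, 4.0]
for alpha in (1.0, 0.5, 0.25):
    print(alpha, empirical_cvar(costs, alpha))
```

For these four samples, level $\alpha = 1$ recovers the mean, level $\alpha = 0.5$ averages the worst half of the outcomes, and level $\alpha = 0.25$ returns the largest sample.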

The value-at-risk and the conditional value-at-risk assess risk in terms of quantiles. The following functionals assess risk in terms of dispersions relative to the mean.

Example 3 (Mean-deviation, mean-upper-semideviation)

Let $q \in [1,\infty)$, $\beta \in [0,\infty)$, and $Z \in \mathcal{Z} = \mathcal{L}^q$ be given. The mean-deviation of $Z$ is defined by

$$\text{MD}_{q,\beta}(Z) \coloneqq E(Z) + \beta\|Z - E(Z)\|_q, \tag{7}$$

while the mean-upper-semideviation of $Z$ is defined by

$$\text{MUS}_{q,\beta}(Z) \coloneqq E(Z) + \beta\|\max\{Z - E(Z),0\}\|_q. \tag{8}$$

The second term in (8) penalizes realizations of $Z$ above the mean but not realizations of $Z$ below the mean. However, the second term in (7) does not distinguish between the two cases.
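The difference between (7) and (8) is easy to see numerically. The following sketch evaluates both functionals for equally likely samples (an assumption made here for illustration), using $\|Y\|_q = (E|Y|^q)^{1/q}$; the helper names are hypothetical.

```python
def lq_norm(values, q):
    """(E|Y|^q)^(1/q) for equally likely realizations of Y, with q in [1, inf)."""
    return (sum(abs(v) ** q for v in values) / len(values)) ** (1.0 / q)


def mean_deviation(samples, q, beta):
    """Mean-deviation (7): E(Z) + beta * ||Z - E(Z)||_q."""
    mean = sum(samples) / len(samples)
    return mean + beta * lq_norm([z - mean for z in samples], q)


def mean_upper_semideviation(samples, q, beta):
    """Mean-upper-semideviation (8): E(Z) + beta * ||max{Z - E(Z), 0}||_q."""
    mean = sum(samples) / len(samples)
    return mean + beta * lq_norm([max(z - mean, 0.0) for z in samples], q)


# A cost that is usually zero but occasionally large.
costs = [0.0, 0.0, 0.0, 4.0]
print(mean_deviation(costs, q=1, beta=0.5))
print(mean_upper_semideviation(costs, q=1, beta=0.5))
```

Since the upper-semideviation discards the realizations below the mean, `mean_upper_semideviation` never exceeds `mean_deviation` for the same `q` and `beta`.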

Conditional value-at-risk and the mean-dispersion functionals of Example 3 for particular choices of $q$ and $\beta$ belong to the class of real-valued coherent risk functionals [23, Ch. 6.3]. As described previously, such a functional satisfies four desirable properties and also admits a dual representation as a distributionally robust expectation (to be presented). Let $\varrho$ be a real-valued coherent risk functional defined on $\mathcal{L}^q$ with $q \in [1,\infty)$. Then, there is a bounded family $\mathcal{A} \subset \mathcal{L}^{q*}$ of densities such that

$$\varrho(Z) = \sup_{\xi \in \mathcal{A}} E(Z\xi), \quad Z \in \mathcal{L}^q, \tag{9}$$

by one direction of [23, Th. 6.6]. [Footnote 4: Moreover, the existence of a dual representation (9) implies that $\varrho$ is a real-valued coherent risk functional; for technical details and information about additional properties of $\mathcal{A}$, please see [23, Th. 6.6].] In particular, every $\xi \in \mathcal{A}$ satisfies $\xi \in \mathcal{L}^{q*}$, $E(\xi) = 1$, and $\xi \geq 0$ a.e. [23, Eq. (6.38)]. Using (9), one can show that if $Z = 0$ a.e., then $\varrho(Z) = 0$. We will find the exact forms of $\mathcal{A}$ for Examples 2 and 3 useful for Theorem 1 to come.

Remark 1 (Special cases of $\mathcal{A}$)

For conditional value-at-risk at level $\alpha \in (0,1)$, $\mathcal{A}$ is given by [23, Eq. (6.70)]

$$\mathcal{A} = \left\{\xi \in \mathcal{L}^{\infty} : 0 \leq \xi \leq \tfrac{1}{\alpha} \text{ a.e.},\; E(\xi) = 1\right\}. \tag{10}$$

Note that $\xi \in \mathcal{A}$ (10) implies that $|\xi| \leq \frac{1}{\alpha}$ a.e. Here, the upper bound $\frac{1}{\alpha}$ is hyperbolic in the parameter $\alpha$. In the following two cases, however, we will present an upper bound that is affine in the corresponding parameter.

For $q = 1$ and $\beta \in [0,\frac{1}{2}]$, mean-deviation is real-valued and coherent [23, Ex. 6.19], and $\mathcal{A}$ is given by [23, Eq. (6.90)]

$$\mathcal{A} = \left\{\xi \in \mathcal{L}^{\infty} : \xi = 1 + \eta - E(\eta),\; \|\eta\|_{\infty} \leq \beta\right\}. \tag{11}$$

If $\xi \in \mathcal{A}$ (11), then $0 \leq \xi \leq 1 + 2\beta$ a.e. The upper bound $1 + 2\beta$ is affine in $\beta \in [0,\frac{1}{2}]$.

For $q \in [1,\infty)$ and $\beta \in [0,1]$, mean-upper-semideviation is real-valued and coherent [23, Ex. 6.20]. The corresponding family $\mathcal{A}$ of densities is given by [23, Eq. (6.96)]

$$\mathcal{A} = \{\xi \in \mathcal{L}^{q*} : \xi = 1 + \eta - E(\eta),\; \|\eta\|_{q*} \leq \beta,\; \eta \geq 0 \text{ a.e.}\}. \tag{12}$$

In particular, if $q = 1$ and $\xi \in \mathcal{A}$ (12), then $0 \leq \xi \leq 1 + 2\beta$ a.e. The upper bound is affine in $\beta \in [0,1]$.
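As an informal numerical check of the dual representation (9) with the family (10), the sketch below works on a finite sample space with equally likely outcomes (an assumption made here, not a setting from the paper). On such a space, the supremum over densities bounded by $\frac{1}{\alpha}$ with unit mean is attained by loading the largest admissible density onto the worst outcomes first, and the result can be compared with the infimum formula for conditional value-at-risk. Both function names are hypothetical.

```python
def cvar_primal(samples, alpha):
    """CVaR of equally likely samples via inf_s ( s + E(max{Z - s, 0}) / alpha )."""
    n = len(samples)
    return min(
        s + sum(max(z - s, 0.0) for z in samples) / (alpha * n)
        for s in samples
    )


def cvar_dual(samples, alpha):
    """sup of E(Z * xi) over densities with 0 <= xi <= 1/alpha and E(xi) = 1."""
    n = len(samples)
    cap = 1.0 / alpha        # pointwise upper bound on the density
    remaining = float(n)     # sum of xi over outcomes must equal n, i.e., E(xi) = 1
    total = 0.0
    # Greedy allocation: the worst (largest) outcomes receive density first.
    for z in sorted(samples, reverse=True):
        xi = min(cap, remaining)
        total += z * xi
        remaining -= xi
        if remaining <= 0.0:
            break
    return total / n


costs = [1.0, 2.0, 3.0, 4.0]
for alpha in (0.3, 0.5, 0.9):
    print(alpha, cvar_primal(costs, alpha), cvar_dual(costs, alpha))
```

The two columns should agree up to rounding, and the density produced by the greedy step never exceeds $\frac{1}{\alpha}$, which matches the bound noted for (10).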

The next risk functional $\rho_{\nu}$ to be described, which takes inspiration from [44], is related to Example 3 because it also assesses risk in terms of a dispersion relative to the mean. While $\rho_{\nu}$ is noncoherent in general, it enjoys a special meaning in dynamical systems applications, as it depends on a sub $\sigma$-algebra, which can encode the history of a stochastic process.

Example 4 (Mean-conditional-variance $\rho_{\nu}$)

Given $\nu \in [0,\infty)$ and a sub $\sigma$-algebra $\mathcal{F}_i$ of $\mathcal{F}$, the mean-conditional-variance of $Z \in \mathcal{Z} = \mathcal{L}^1$ is defined by

$$\rho_{\nu}(Z) \coloneqq E(Z) + \nu E(\Delta_t^2), \tag{13}$$

where $\Delta_t$ is a real-valued prediction error defined by

$$\Delta_t \coloneqq E(Z|\mathcal{F}_i)\,\mathcal{I}_{F_i} - Z, \tag{14}$$
$$F_i \coloneqq \{\omega \in \Omega : E(Z|\mathcal{F}_i)(\omega) \in \mathbb{R}\}. \tag{15}$$

Note that $\rho_{\nu}(Z)$ can be $\infty$. $E(\Delta_t^2)$ is called a conditional variance because $\Delta_t = E(Z|\mathcal{F}_i) - Z$ a.e.
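To illustrate how the sub $\sigma$-algebra shapes (13), the following sketch evaluates the mean-conditional-variance on a finite sample space with equally likely outcomes. The assumptions here are ours: each outcome carries a label, and the sub $\sigma$-algebra is the one generated by the labels, so the conditional expectation is the average of $Z$ over outcomes that share a label. On a finite space the conditional expectation is always finite, so the indicator in (14) equals one. The name `mean_conditional_variance` is hypothetical.

```python
from collections import defaultdict


def mean_conditional_variance(outcomes, nu):
    """Evaluate (13) for equally likely outcomes given as (label, z) pairs.

    The labels identify the atoms of the sub sigma-algebra: the conditional
    expectation of Z is the average of z over outcomes with the same label.
    """
    n = len(outcomes)
    groups = defaultdict(list)
    for label, z in outcomes:
        groups[label].append(z)
    cond_mean = {label: sum(zs) / len(zs) for label, zs in groups.items()}

    mean = sum(z for _, z in outcomes) / n
    # E(Delta^2) with Delta = E(Z | F_i) - Z
    cond_var = sum((cond_mean[label] - z) ** 2 for label, z in outcomes) / n
    return mean + nu * cond_var


values = [0.0, 2.0, 10.0, 10.0]
partial = list(zip(["a", "a", "b", "b"], values))  # history reveals the group
trivial = [("all", z) for z in values]             # history reveals nothing
full = list(enumerate(values))                     # history reveals Z itself

for name, outcomes in (("partial", partial), ("trivial", trivial), ("full", full)):
    print(name, mean_conditional_variance(outcomes, nu=1.0))
```

With the trivial labeling, the functional reduces to the mean plus $\nu$ times the ordinary variance. With the finest labeling, the prediction error vanishes and only the mean remains.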

Any risk functional, e.g., see Examples 1-4, can be incorporated into the following definition.

Definition 3 (Risk-aware stability)

Let a risk functional ρ:𝒵¯\rho:\mathcal{Z}\rightarrow\bar{\mathbb{R}}, a state energy function ψ\psi, and a subset SnS\subseteq\mathbb{R}^{n} be given. The system (3) is uniformly exponentially stable with an offset with respect to ρ\rho in region SS if and only if ψ(xt)𝒵\psi(x_{t})\in\mathcal{Z} for every tt\in\mathbb{N} and there exist λ[0,1)\lambda\in[0,1), a[0,)a\in[0,\infty), and bb\in\mathbb{R} such that

ρ(ψ(xt))aλtψ(𝐱)+b\rho(\psi(x_{t}))\leq a\lambda^{t}\psi(\mathbf{x})+b (16)

for every time tt\in\mathbb{N} and initial condition 𝐱S\mathbf{x}\in S. For brevity, we often write “stability with respect to ρ\rho” instead of “uniform exponential stability with an offset with respect to ρ\rho in region SS.” As before, we refer to λ\lambda as a rate parameter, aa as a scale parameter, and bb as an offset parameter.

In Section 3, we will see how $b$ can depend on the disturbance process and the chosen measure of risk, leading to a risk-aware noise-to-state stability property. (Footnote 5: In this work, we do not need to write Definition 3 in terms of the existence of class-$\mathcal{K}$ and class-$\mathcal{K}\mathcal{L}$ functions. In future work, using these function classes will likely be useful for extending the results of Section 3 (Theorem 3.3 and Theorem 3.7) to nonlinear systems.) In Definition 3, the condition $\psi(x_{t})\in\mathcal{Z}$ for every $t\in\mathbb{N}$ is required because the domain of $\rho$ need not be the entire family of random variables on $(\Omega,\mathcal{F},P)$, e.g., see Examples 2–4. In the special case of $\rho$ being the expectation and $\mathcal{Z}$ being the entire family of random variables on $(\Omega,\mathcal{F},P)$, Definition 3 reduces to Definition 2. While Definition 3 is inspired by [35, Def. V.1], [36, Def. 4], and [43, Def. 3.1], Definition 3 invokes a generalized viewpoint because $\rho$ can be any risk functional. In contrast, a recursive risk functional was considered in [35, Def. V.1] and [36, Def. 4], and a distributionally robust CVaR functional was considered in [43, Def. 3.1]. The next theorem will demonstrate connections between several instances of Definition 3.

Theorem 1 (Analysis of Definition 3)

Consider the discrete-time nonlinear system (3). Let a state energy function $\psi$, a subset $S\subseteq\mathbb{R}^{n}$, scalars $\alpha\in(0,1)$ and $q\in[1,\infty)$, and a real-valued coherent risk functional $\varrho$ on $\mathcal{L}^{q}$ be given. Then, the following statements hold:

  1. Stability w.r.t. $\text{CVaR}_{\alpha}$ in region $S$ with parameters $(\lambda,a,b)$ implies the probabilistic stability property $P(\{\psi(x_{t})\leq a\lambda^{t}\psi(\mathbf{x})+b\})\geq 1-\alpha$ for every $t\in\mathbb{N}$ and $\mathbf{x}\in S$.

  2. Stability w.r.t. the mean with parameters $(\lambda,a,b)$ implies stability w.r.t. $\text{CVaR}_{\alpha}$ with parameters $(\lambda,\frac{a}{\alpha},\frac{b}{\alpha})$.

  3. Stability w.r.t. the mean with parameters $(\lambda,a,b)$ implies stability w.r.t. the mean-deviation on $\mathcal{L}^{1}$ and the mean-upper-semideviation on $\mathcal{L}^{1}$, both with parameters $(\lambda,a(1+2\beta),b(1+2\beta))$. In the case of the mean-deviation, $\beta\in[0,\frac{1}{2}]$. In the case of the mean-upper-semideviation, $\beta\in[0,1]$.

  4. Stability w.r.t. the mean-deviation on $\mathcal{L}^{q}$ implies stability w.r.t. the mean-upper-semideviation on $\mathcal{L}^{q}$ with the same parameters.

  5. Stability w.r.t. $\varrho$ in region $S$ with parameters $(\lambda,a,b)$ implies the distributionally robust stability property w.r.t. the mean

    $$E(\psi(x_{t})\xi)\leq a\lambda^{t}\psi(\mathbf{x})+b,\quad\xi\in\mathcal{A},\quad t\in\mathbb{N},\quad\mathbf{x}\in S,$$

    where $\mathcal{A}$ is the family of densities in the dual representation of $\varrho$.

Remark 2 (Interpretation of Theorem 1)

The first item of Theorem 1 states that a CVaR stability property guarantees a probabilistic stability property. While connections between CVaR and probabilities are well-established, the first item of Theorem 1 is illuminating in light of the extensive history of probabilistic stability theory, e.g., see [15, 10, 56]. The second and third items indicate that stability with respect to the mean ensures stability with respect to some common real-valued coherent risk functionals with transformed scale and offset parameters. The nature of the transformation depends on the specific family of densities in the dual representation. While the mean-deviation and the mean-upper-semideviation are similar functionals, the fourth item indicates that stability with respect to the mean-deviation is a stronger property. The last item states that stability with respect to a real-valued coherent risk functional on $\mathcal{L}^{q}$ guarantees a distributionally robust risk-neutral stability property, which we will see again in the final part of Theorem 3.3 (to be presented in Section 3).

The proof of Theorem 1 has five parts, one for each item. Each part is based on the properties enjoyed by the risk functional at hand.

Proof 2.2.

Part 1: Let $t\in\mathbb{N}$ and $\mathbf{x}\in S$ be given. Since $\alpha\in(0,1)$ and $\psi(x_{t})\in\mathcal{L}^{1}$, a minimizer of the right side of (6) with $Z=\psi(x_{t})$ is $\text{VaR}_{\alpha}(\psi(x_{t}))$ [23, p. 258], and consequently, $\text{VaR}_{\alpha}(\psi(x_{t}))\leq\text{CVaR}_{\alpha}(\psi(x_{t}))$. Hence, $\text{CVaR}_{\alpha}(\psi(x_{t}))\leq a\lambda^{t}\psi(\mathbf{x})+b$ implies that

$$\text{VaR}_{\alpha}(\psi(x_{t}))\leq a\lambda^{t}\psi(\mathbf{x})+b. \tag{17}$$

Since $\alpha\in(0,1)$ and $z=a\lambda^{t}\psi(\mathbf{x})+b\in\mathbb{R}$,

$$\text{VaR}_{\alpha}(\psi(x_{t}))\leq z\iff P(\{\psi(x_{t})\leq z\})\geq 1-\alpha \tag{18}$$

by [54, Lemma 21.1 (i)] and the definition of $\text{VaR}_{\alpha}$ (5).

The following statement applies to both Parts 2 and 3, in which stability with respect to the mean is assumed. By Definition 2 and nonnegativity of $\psi$, it holds that $0\leq E(\psi(x_{t}))\leq a\lambda^{t}\psi(\mathbf{x})+b$ for every $t\in\mathbb{N}$ and $\mathbf{x}\in S$. In particular, $\psi(x_{t})\in\mathcal{L}^{1}$ for every $t\in\mathbb{N}$.

Part 2: Let $t\in\mathbb{N}$ and $\mathbf{x}\in S$ be given. Since $\psi(x_{t})\in\mathcal{L}^{1}$ and $\psi(x_{t})$ is nonnegative, $\text{CVaR}_{\alpha}(\psi(x_{t}))\leq\frac{1}{\alpha}E(\psi(x_{t}))$ [40, Lemma 2]. Thus, $\text{CVaR}_{\alpha}(\psi(x_{t}))\leq\frac{1}{\alpha}(a\lambda^{t}\psi(\mathbf{x})+b)$.
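The inequalities used in Parts 1 and 2 can be checked numerically on an empirical distribution. The sketch below assumes equally likely samples, takes $\text{VaR}_{\alpha}(Z)$ to be the smallest $z$ with $P(\{Z\leq z\})\geq 1-\alpha$, and assumes that (6) is the usual minimization representation $\min_{s}\{s+\frac{1}{\alpha}E(\max\{Z-s,0\})\}$; the function names and the sample data are hypothetical.

```python
import numpy as np

def var_alpha(samples, alpha):
    """Smallest z with empirical P(Z <= z) >= 1 - alpha (equally likely samples)."""
    z = np.sort(np.asarray(samples, dtype=float))
    cdf = np.arange(1, len(z) + 1) / len(z)  # fraction of samples at or below each rank
    # The small tolerance guards against round-off in 1 - alpha.
    return float(z[np.searchsorted(cdf, 1.0 - alpha - 1e-12)])

def cvar_alpha(samples, alpha):
    """Minimum over s of s + E(max(Z - s, 0)) / alpha, searched over the sample points.

    The objective is convex and piecewise linear with kinks at the samples,
    so restricting the search to the sample points loses nothing.
    """
    z = np.asarray(samples, dtype=float)
    objective = [s + np.mean(np.maximum(z - s, 0.0)) / alpha for s in z]
    return float(min(objective))

energies = np.arange(1.0, 11.0)  # hypothetical nonnegative state energies 1, ..., 10
alpha = 0.2
var_value = var_alpha(energies, alpha)
cvar_value = cvar_alpha(energies, alpha)

# Part 1: VaR <= CVaR, so a CVaR bound yields a probability bound.
# Part 2: CVaR <= mean / alpha for nonnegative random variables.
print(var_value, cvar_value, energies.mean() / alpha)
```

Here the CVaR value coincides with the average of the two largest samples, which matches the interpretation of $\text{CVaR}_{\alpha}$ as an average over the worst $\alpha$-fraction of outcomes.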

Part 3: For $\beta\in[0,\frac{1}{2}]$, the mean-deviation on $\mathcal{L}^{1}$ is real-valued and coherent. Denoting the associated family of densities by $\mathcal{A}$, every $\xi\in\mathcal{A}$ satisfies $0\leq\xi\leq 1+2\beta$ a.e. (Remark 1). Let $t\in\mathbb{N}$, $\mathbf{x}\in S$, and $\xi\in\mathcal{A}$ be given. Since $\psi(x_{t})\geq 0$ and $\xi\in[0,1+2\beta]$ a.e., $0\leq E(\psi(x_{t})\xi)\leq E(\psi(x_{t}))(1+2\beta)$. Using $\psi(x_{t})\in\mathcal{L}^{1}$ and taking the supremum over $\mathcal{A}$, we find that $\text{MD}_{1,\beta}(\psi(x_{t}))\leq E(\psi(x_{t}))(1+2\beta)$. Then, the result follows from $0\leq E(\psi(x_{t}))\leq a\lambda^{t}\psi(\mathbf{x})+b$. The derivation in the case of the mean-upper-semideviation is analogous.

Part 4: Let $\beta\in[0,\infty)$ be given. By assumption, $\psi(x_{t})\in\mathcal{L}^{q}$ for every $t\in\mathbb{N}$, and there exist $\lambda\in[0,1)$, $a\in[0,\infty)$, and $b\in\mathbb{R}$ such that

$$\text{MD}_{q,\beta}(\psi(x_{t}))\leq a\lambda^{t}\psi(\mathbf{x})+b$$

for every $t\in\mathbb{N}$ and $\mathbf{x}\in S$. Now, let $t\in\mathbb{N}$ and $\mathbf{x}\in S$ be given. Since $q\in[1,\infty)$, $\psi(x_{t})\in\mathcal{L}^{q}$, and $P(\Omega)=1$, $E(\psi(x_{t}))$ and $\|\psi(x_{t})-E(\psi(x_{t}))\|_{q}$ are finite. Also, since $0\leq\max\{g,0\}\leq|g|$ for any $g:(\Omega,\mathcal{F})\rightarrow(\mathbb{R},\mathcal{B}_{\mathbb{R}})$ and $y\mapsto y^{\gamma}$ is nondecreasing on $[0,\infty)$ for any $\gamma\in(0,\infty)$,

$$\|\max\{\psi(x_{t})-E(\psi(x_{t})),0\}\|_{q}\leq\|\psi(x_{t})-E(\psi(x_{t}))\|_{q}.$$

Hence, $\text{MUS}_{q,\beta}(\psi(x_{t}))\leq\text{MD}_{q,\beta}(\psi(x_{t}))\leq a\lambda^{t}\psi(\mathbf{x})+b$.
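The comparisons in Parts 3 and 4 can be illustrated on sample data. The sketch below assumes that the mean-deviation and the mean-upper-semideviation take the forms $E(Z)+\beta\|Z-E(Z)\|_{q}$ and $E(Z)+\beta\|\max\{Z-E(Z),0\}\|_{q}$, respectively, and that the samples are equally likely; the function names and the data are hypothetical.

```python
import numpy as np

def lq_norm(values, q):
    """Empirical L^q norm (E|V|^q)^(1/q) for equally likely samples."""
    return float(np.mean(np.abs(values) ** q) ** (1.0 / q))

def mean_deviation(samples, q, beta):
    """Assumed form: E(Z) + beta * ||Z - E(Z)||_q."""
    z = np.asarray(samples, dtype=float)
    return float(z.mean() + beta * lq_norm(z - z.mean(), q))

def mean_upper_semideviation(samples, q, beta):
    """Assumed form: E(Z) + beta * ||max(Z - E(Z), 0)||_q."""
    z = np.asarray(samples, dtype=float)
    return float(z.mean() + beta * lq_norm(np.maximum(z - z.mean(), 0.0), q))

energies = np.array([0.0, 1.0, 1.0, 2.0, 6.0])  # hypothetical nonnegative energies
beta = 0.5

md2 = mean_deviation(energies, q=2, beta=beta)
mus2 = mean_upper_semideviation(energies, q=2, beta=beta)
md1 = mean_deviation(energies, q=1, beta=beta)

# Part 4: MUS <= MD for the same (q, beta).
# Part 3: MD on L^1 is at most (1 + 2 * beta) times the mean for nonnegative Z.
print(mus2, md2, md1, (1 + 2 * beta) * energies.mean())
```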

Part 5: By assumption, $\psi(x_{t})\in\mathcal{L}^{q}$ for every $t\in\mathbb{N}$, and there exist $\lambda\in[0,1)$, $a\in[0,\infty)$, and $b\in\mathbb{R}$ such that $\varrho(\psi(x_{t}))\leq a\lambda^{t}\psi(\mathbf{x})+b$ for every $t\in\mathbb{N}$ and $\mathbf{x}\in S$. Using the dual representation (9) of $\varrho$ with the corresponding family $\mathcal{A}$ of densities and $\psi(x_{t})\in\mathcal{L}^{q}$ for every $t\in\mathbb{N}$,

$$E(\psi(x_{t})\xi)\leq\sup_{\xi\in\mathcal{A}}E(\psi(x_{t})\xi)\overset{(9)}{=}\varrho(\psi(x_{t}))\leq a\lambda^{t}\psi(\mathbf{x})+b$$

for every $\xi\in\mathcal{A}$, $t\in\mathbb{N}$, and $\mathbf{x}\in S$.

The previous theorem provides connections between some different instances of Definition 3. In the next section, we will focus on linear systems and derive sufficient conditions for stability with respect to any real-valued coherent risk functional on $\mathcal{L}^{q}$ (Theorem 3.3). Then, we will provide a stability result in the context of the mean-conditional-variance functional on $\mathcal{L}^{1}$ (Theorem 3.7). The techniques will involve a combination of the properties of the risk functionals of interest and Lyapunov stability theory for discrete-time linear systems.

3 Risk-aware stability conditions for stochastic linear systems

Now, we consider a stochastic linear time-invariant system, which is a special case of (3), of the form

$$x_{t+1}=Ax_{t}+w_{t},\quad t=0,1,\dots, \tag{19}$$

where $A\in\mathbb{R}^{n\times n}$ is deterministic, $(x_{0},x_{1},\dots)$ is an $\mathbb{R}^{n}$-valued stochastic process, and $(w_{0},w_{1},\dots)$ is an $\mathbb{R}^{n}$-valued independent stochastic process. The processes are defined on $(\Omega,\mathcal{F},P)$. The initial state is fixed at an arbitrary vector $\mathbf{x}\in\mathbb{R}^{n}$. For every $t\in\mathbb{N}_{0}$, we define the random object $h_{t}\coloneqq(x_{0},x_{1},\dots,x_{t})$ and the $\sigma$-algebra $\mathcal{F}_{t}\coloneqq\sigma(h_{t})$ induced by $h_{t}$. Thus, $x_{t}:(\Omega,\mathcal{F}_{t})\rightarrow(\mathbb{R}^{n},\mathcal{B}_{\mathbb{R}^{n}})$. As well as $(w_{0},w_{1},\dots)$ being independent, we assume that $w_{t}$ and $h_{t}$ are independent and $E(w_{t}w_{t}^{\top})\in\mathbb{R}^{n\times n}$ for every $t\in\mathbb{N}_{0}$. The above conditions are standard and ensure that $E(|x_{t}|^{2})$ is finite for every $t\in\mathbb{N}_{0}$. Observe that $(w_{0},w_{1},\dots)$ need not be zero-mean, and the disturbance process need not be identically distributed. When we consider the linear system (19), we implicitly assume the conditions stated in this paragraph.

The first result of the section will develop sufficient conditions for stability of the linear system (19) with respect to any real-valued coherent risk functional on $\mathcal{L}^{q}$. Then, we will examine how the conditions are related to those that guarantee risk-neutral stability of (19).

Theorem 3.3 (Coherent stability).

Consider the linear system (19) and the quadratic state energy function $\psi(x)\coloneqq x^{\top}Rx$, where $R\in\mathcal{S}_{n}^{+}$ is given. Let $\varrho$ be a real-valued coherent risk functional on $\mathcal{L}^{q}$ with $q\in[1,\infty)$, where $\mathcal{A}$ is the family of densities in the associated dual representation. We make the following assumptions:

  1. $|w_{t}|^{2}\in\mathcal{L}^{q}$ for every $t\in\mathbb{N}_{0}$, $\sup_{t\in\mathbb{N}_{0}}\big\||w_{t}|^{2}\big\|_{q}$ is finite, and

  2. there exists an $H\in\mathcal{S}_{n}^{+}$ such that $H_{R}-A^{\top}H_{R}A\in\mathcal{S}_{n}^{+}$.

(Recall that $H_{R}\coloneqq(R^{\frac{1}{2}})^{\top}HR^{\frac{1}{2}}$.) Then, the system (19) is stable with respect to $\varrho$ in $\mathbb{R}^{n}$ (Definition 3). Define $\eta\coloneqq\frac{\lambda_{\text{min}}(H_{R}-A^{\top}H_{R}A)}{\lambda_{\text{max}}(H_{R})}$, and it follows that $\eta\in(0,1]$. In the case of $\eta<1$, for any fixed $\kappa\in(0,1)$, one can choose

$$\lambda=1-\kappa\eta,\quad\quad a=\frac{\lambda_{\text{max}}(H)}{\lambda_{\text{min}}(H)},\quad\quad b=cb^{\prime}, \tag{20}$$

where $\lambda\in(0,1)$, $c\coloneqq\frac{\lambda}{\lambda_{\text{min}}(H)(1-\lambda)(\lambda-(1-\eta))}$, and

$$b^{\prime}\coloneqq\sup_{t\in\mathbb{N}_{0}}\varrho(w_{t}^{\top}H_{R}w_{t}). \tag{21}$$

In the case of $\eta=1$, one can choose $\lambda=a=0$ and $b=\bar{c}b^{\prime}$, where $b^{\prime}$ is defined by (21) and $\bar{c}\coloneqq 1/\lambda_{\text{min}}(H)$. Moreover, a distributionally robust risk-neutral stability property holds: $E(\psi(x_{t})\xi)\leq a\lambda^{t}\psi(\mathbf{x})+b$ for every density $\xi\in\mathcal{A}$, time $t\in\mathbb{N}$, and initial condition $\mathbf{x}\in\mathbb{R}^{n}$.
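The constants in Theorem 3.3 can be computed numerically once a matrix $H$ is available. The sketch below is one possible way to obtain them, under assumptions that are not part of the theorem: it picks $H_{R}$ by solving $H_{R}-A^{\top}H_{R}A=I_{n}$ (any positive definite right-hand side would serve), uses the symmetric square root of $R$, and relies only on NumPy. The function name `coherent_stability_parameters` and the example matrices are hypothetical.

```python
import numpy as np

def coherent_stability_parameters(A, R, kappa=0.5):
    """Return (H, H_R, eta, lam, a, c) following (20) for a Schur stable A."""
    n = A.shape[0]
    # Solve H_R - A^T H_R A = I_n by vectorization (row-major flattening);
    # I_n is an assumed, convenient positive definite right-hand side.
    K = np.eye(n * n) - np.kron(A.T, A.T)
    H_R = np.linalg.solve(K, np.eye(n).flatten()).reshape(n, n)
    H_R = 0.5 * (H_R + H_R.T)  # remove round-off asymmetry
    # Symmetric R^{1/2} via eigendecomposition, so that H = R^{-1/2} H_R R^{-1/2}.
    evals, evecs = np.linalg.eigh(R)
    R_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
    H = R_inv_half @ H_R @ R_inv_half
    H = 0.5 * (H + H.T)
    lam_min_H, lam_max_H = np.linalg.eigvalsh(H)[[0, -1]]
    gap = H_R - A.T @ H_R @ A
    eta = np.linalg.eigvalsh(0.5 * (gap + gap.T))[0] / np.linalg.eigvalsh(H_R)[-1]
    if np.isclose(eta, 1.0):
        # Case eta = 1: lambda = a = 0, and the returned constant is c_bar.
        return H, H_R, eta, 0.0, 0.0, 1.0 / lam_min_H
    lam = 1.0 - kappa * eta
    a = lam_max_H / lam_min_H
    c = lam / (lam_min_H * (1.0 - lam) * (lam - (1.0 - eta)))
    return H, H_R, eta, lam, a, c

A = np.array([[0.5, 0.2], [0.0, 0.3]])  # hypothetical Schur stable matrix
R = np.diag([2.0, 1.0])                 # hypothetical energy weight
H, H_R, eta, lam, a, c = coherent_stability_parameters(A, R, kappa=0.5)
print(eta, lam, a, c)
```

The offset $b=cb^{\prime}$ additionally requires $b^{\prime}$ from (21), which depends on the chosen risk functional and the noise distribution and is therefore left out of this sketch.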

Remark 3.4 (Discussion of Theorem 3.3).

Theorem 3.3 specifies sufficient conditions that guarantee

$$\varrho(x_{t}^{\top}Rx_{t})\leq a\lambda^{t}\mathbf{x}^{\top}R\mathbf{x}+\tilde{c}\sup_{t\in\mathbb{N}_{0}}\varrho(w_{t}^{\top}H_{R}w_{t}) \tag{22}$$

for every $t\in\mathbb{N}$ and $\mathbf{x}\in\mathbb{R}^{n}$, where $a\in[0,\infty)$, $\lambda\in[0,1)$, $\tilde{c}\in(0,\infty)$, and $H_{R}\in\mathcal{S}_{n}^{+}$ are constant. ($\tilde{c}=c$ or $\tilde{c}=\bar{c}$, where $c$ and $\bar{c}$ are provided by Theorem 3.3.) The statement (22) is a risk-aware noise-to-state stability property, where $\varrho$ evaluates a noise “energy” term $w_{t}^{\top}H_{R}w_{t}$ and the state energy $x_{t}^{\top}Rx_{t}$. Moreover, since $\lambda\in[0,1)$, we derive the following risk-aware stability bound:

$$\limsup_{t\rightarrow\infty}\varrho(x_{t}^{\top}Rx_{t})\leq\tilde{c}\sup_{t\in\mathbb{N}_{0}}\varrho(w_{t}^{\top}H_{R}w_{t}). \tag{23}$$

Now, let us discuss the assumptions of Theorem 3.3. In the case of $q=1$, $\sup_{t\in\mathbb{N}_{0}}\big\||w_{t}|^{2}\big\|_{1}=\sup_{t\in\mathbb{N}_{0}}E(|w_{t}|^{2})$ is finite, for example, if there is a matrix $\Sigma\in\mathcal{S}_{n}$ such that $\Sigma-E(w_{t}w_{t}^{\top})\in\mathcal{S}_{n}$ for every $t\in\mathbb{N}_{0}$. If $E(w_{t}w_{t}^{\top})$ is time-invariant, then such a $\Sigma$ exists immediately. (Recall that $E(w_{t}w_{t}^{\top})\in\mathbb{R}^{n\times n}$ in the model (19).) More generally, the suitability of the first assumption is problem-dependent, and the assumption holds in particular if $w_{t}$ is uniformly bounded almost everywhere. Since $R\in\mathcal{S}_{n}^{+}$, the second assumption is equivalent to $A$ being Schur stable; i.e., every eigenvalue of $A$ has magnitude strictly less than one (Lemma 5.10, Appendix).

We have developed our proof of Theorem 3.3 from first principles using several techniques, including properties of $\mathcal{S}_{n}$ from [57], some basic manipulations from [51, Lemma 1], and properties of real-valued coherent risk functionals from [23].

Proof 3.5.

The property $\eta\in(0,1]$ holds by Lemma 5.12 (Appendix). We shall prove the theorem in the case of $\eta<1$ and leave the case of $\eta=1$ to the reader, as the techniques are similar. First, we will verify some properties of the parameters in (20). The properties $\eta\in(0,1)$ and $\kappa\in(0,1)$ imply that $\lambda\in(0,1)$. The property $a\in(0,\infty)$ holds because $H\in\mathcal{S}_{n}^{+}$. The property $c\in(0,\infty)$ is true because $\lambda\in(0,1)$, $\lambda>1-\eta$, and $H\in\mathcal{S}_{n}^{+}$. For every $\xi\in\mathcal{A}$ and $t\in\mathbb{N}_{0}$,

$$0\leq E(w_{t}^{\top}H_{R}w_{t}\xi)\leq\lambda_{\max}(H_{R})E(|w_{t}|^{2}\xi) \tag{24}$$

because $H_{R}\in\mathcal{S}_{n}^{+}$, $w_{t}$ is $\mathbb{R}^{n}$-valued, and $\xi\geq 0$ a.e. Also,

$$E(|w_{t}|^{2}\xi)\leq\|\,|w_{t}|^{2}\,\|_{q}\;\|\xi\|_{q*},\quad\xi\in\mathcal{A},\quad t\in\mathbb{N}_{0}, \tag{25}$$

by applying [58, Hölder’s Inequality, Th. 6.8 (a)]. Since $\varrho$ is a real-valued coherent risk functional on $\mathcal{L}^{q}$, $\mathcal{A}$ is a bounded subset of $\mathcal{L}^{q*}$ [23, Th. 6.6]. The property $b\in[0,\infty)$ follows from (24)–(25), boundedness of $\mathcal{A}$ in $\|\cdot\|_{q*}$, the assumptions of the theorem, the dual representation of $\varrho$ (9), and $c\in(0,\infty)$.

Second, we will verify properties of some expectations. For any density $\xi\in\mathcal{A}$, initial condition $\mathbf{x}\in\mathbb{R}^{n}$, and function $g:(\mathbb{R}^{n},\mathcal{B}_{\mathbb{R}^{n}})\rightarrow(\mathbb{R},\mathcal{B}_{\mathbb{R}})$,

$$E(g(x_{0})\xi)=g(\mathbf{x}) \tag{26}$$

because $E(\xi)=1$ and the distribution of $x_{0}$ is the Dirac measure concentrated at $\mathbf{x}$. Let $t\in\mathbb{N}_{0}$, $\xi\in\mathcal{A}$, and $M\in\mathcal{S}_{n}$ be given. Then, $x_{t}^{\top}Mx_{t}\in\mathcal{L}^{q}$ and $x_{t}^{\top}Mx_{t}\xi\in\mathcal{L}^{1}$, and these functions are a.e.-nonnegative (Lemma 5.15, Appendix). In particular, since $R\in\mathcal{S}_{n}^{+}$ and $H_{R}\in\mathcal{S}_{n}^{+}$, $\psi(x_{t})=x_{t}^{\top}Rx_{t}\in\mathcal{L}^{q}$ and $E(x_{t}^{\top}H_{R}x_{t}\xi)\in[0,\infty)$.

Having verified the above properties, we are ready to show that $\varrho(\psi(x_{t}))\leq a\lambda^{t}\psi(\mathbf{x})+b$ for every time $t\in\mathbb{N}$ and initial condition $\mathbf{x}\in\mathbb{R}^{n}$. Define $v:\mathbb{R}^{n}\rightarrow\mathbb{R}$ by $v(z)\coloneqq z^{\top}H_{R}z$, which will serve as a Lyapunov function. Recall the definition of $b^{\prime}$ from (21). We claim it suffices to prove that, for every $\xi\in\mathcal{A}$ and $t\in\mathbb{N}_{0}$,

$$E(v(x_{t+1})\xi)\leq\lambda E(v(x_{t})\xi)+\frac{\lambda}{\lambda-(1-\eta)}b^{\prime}. \tag{27}$$

Indeed, the statement (27) and $E(v(x_{t})\xi)\in[0,\infty)$ for every $t\in\mathbb{N}_{0}$ imply

$$0\leq E(v(x_{t})\xi)\leq\lambda^{t}E(v(x_{0})\xi)+\frac{\lambda b^{\prime}}{(1-\lambda)(\lambda-(1-\eta))} \tag{28}$$

for every $t\in\mathbb{N}$, where we use $\lambda\in(0,1)$, $\lambda>1-\eta$, the geometric series formula, and $b^{\prime}\in[0,\infty)$ (Lemma 5.17, Appendix). Since $H\in\mathcal{S}_{n}^{+}$, $R\in\mathcal{S}_{n}^{+}$, and every $\xi\in\mathcal{A}$ is nonnegative a.e., we have

$$\lambda_{\text{min}}(H)E(\psi(x_{t})\xi)\leq E(v(x_{t})\xi)\leq\lambda_{\text{max}}(H)E(\psi(x_{t})\xi) \tag{29}$$

for every $\xi\in\mathcal{A}$ and $t\in\mathbb{N}_{0}$ (Lemma 5.15, Appendix). We divide by $\lambda_{\text{min}}(H)\in(0,\infty)$ to find

$$0\leq E(\psi(x_{t})\xi)\leq\frac{1}{\lambda_{\text{min}}(H)}E(v(x_{t})\xi). \tag{30}$$

Using (29) with $t=0$ and (26) with $g=\psi$ lead to

$$0\leq E(v(x_{0})\xi)\leq\lambda_{\text{max}}(H)E(\psi(x_{0})\xi)=\lambda_{\text{max}}(H)\psi(\mathbf{x}) \tag{31}$$

for every density $\xi\in\mathcal{A}$ and initial condition $\mathbf{x}\in\mathbb{R}^{n}$. The statements (28) and (31) imply

$$0\leq E(v(x_{t})\xi)\leq\lambda^{t}\lambda_{\text{max}}(H)\psi(\mathbf{x})+\frac{\lambda b^{\prime}}{(1-\lambda)(\lambda-(1-\eta))} \tag{32}$$

for every $\xi\in\mathcal{A}$, $t\in\mathbb{N}$, and $\mathbf{x}\in\mathbb{R}^{n}$. We divide (32) by $\lambda_{\text{min}}(H)\in(0,\infty)$ and substitute the expressions for $a$ and $b$ from (20) to find

$$0\leq\frac{1}{\lambda_{\text{min}}(H)}E(v(x_{t})\xi)\leq a\lambda^{t}\psi(\mathbf{x})+b \tag{33}$$

for every $\xi\in\mathcal{A}$, $t\in\mathbb{N}$, and $\mathbf{x}\in\mathbb{R}^{n}$. The statements (30) and (33) imply

$$0\leq E(\psi(x_{t})\xi)\leq a\lambda^{t}\psi(\mathbf{x})+b \tag{34}$$

for every $\xi\in\mathcal{A}$, $t\in\mathbb{N}$, and $\mathbf{x}\in\mathbb{R}^{n}$. Then, taking the supremum over $\mathcal{A}$ and using the dual representation (9) of $\varrho$ lead to

$$\varrho(\psi(x_{t}))\leq a\lambda^{t}\psi(\mathbf{x})+b,\quad t\in\mathbb{N},\quad\mathbf{x}\in\mathbb{R}^{n}. \tag{35}$$

Hence, it suffices to show (27), which we prove in the Appendix (see Lemma 5.14 and Lemma 5.21).
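The step from (27) to (28) in the proof above iterates the one-step bound and sums a geometric series. A sketch of that computation follows, where the shorthand $K\coloneqq\frac{\lambda b^{\prime}}{\lambda-(1-\eta)}$ is introduced here only for illustration; the last inequality uses $K\geq 0$ and $\lambda\in(0,1)$.

```latex
\begin{aligned}
E(v(x_{t})\xi) &\leq \lambda E(v(x_{t-1})\xi) + K \\
&\leq \lambda^{2} E(v(x_{t-2})\xi) + (1+\lambda)K \\
&\leq \cdots \leq \lambda^{t} E(v(x_{0})\xi) + K\sum_{i=0}^{t-1}\lambda^{i} \\
&= \lambda^{t} E(v(x_{0})\xi) + K\,\frac{1-\lambda^{t}}{1-\lambda} \\
&\leq \lambda^{t} E(v(x_{0})\xi) + \frac{\lambda b^{\prime}}{(1-\lambda)(\lambda-(1-\eta))}.
\end{aligned}
```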

Theorem 3.3 provides interpretable sufficient conditions for global stability of the linear system (19) with respect to any real-valued coherent risk functional $\varrho$ on $\mathcal{L}^{q}$ (Definition 3). We have presented some common examples of $\varrho$ in Section 2, which assess different distributional characteristics of the state energy. For instance, $\varrho(\psi(x_{t}))=\text{CVaR}_{\alpha}(\psi(x_{t}))$ is the average energy in the $\alpha$-fraction of the largest energies, if $\psi(x_{t})$ is a continuous random variable. For another example, $\varrho(\psi(x_{t}))=\text{MUS}_{q,\beta}(\psi(x_{t}))$ is a weighted sum of the average energy $\mu_{t}\coloneqq E(\psi(x_{t}))$ and the order-$q$ upper-semideviation $\|\max\{\psi(x_{t})-\mu_{t},0\}\|_{q}$ of the energy. Theorem 3.3 facilitates the analysis of various characteristics of the distribution of the state energy and offers a distributionally robust stability guarantee (see also Part 5 of Theorem 1).

It is quite interesting that the assumptions of Theorem 3.3 in the case of $q=1$ ensure risk-neutral stability of the system (19), i.e., the existence of $\lambda_{\text{N}}\in[0,1)$, $a_{\text{N}}\in[0,\infty)$, and $b_{\text{N}}\in\mathbb{R}$ such that

$$E(\psi(x_{t}))\leq a_{\text{N}}\lambda_{\text{N}}^{t}\psi(\mathbf{x})+b_{\text{N}},\quad t\in\mathbb{N},\quad\mathbf{x}\in\mathbb{R}^{n}. \tag{36}$$

While the proof of this risk-neutral result is nontrivial, it is a simpler version of the proof of Theorem 3.3 (consider $\xi=1$ a.e. and $q=1$). The values of $\lambda_{\text{N}}$ and $a_{\text{N}}$ can be chosen to be the values of $\lambda$ and $a$, respectively, provided by Theorem 3.3. However, the offset term $b_{\text{N}}$ in the risk-neutral result and the offset term $b$ in the risk-aware result (Theorem 3.3) are distinct. In the case of $\eta<1$, one can choose

$$b_{\text{N}}=c\sup_{t\in\mathbb{N}_{0}}E(w_{t}^{\top}H_{R}w_{t}), \tag{37}$$

while in the case of $\eta=1$, one can choose $b_{\text{N}}=\bar{c}\sup_{t\in\mathbb{N}_{0}}E(w_{t}^{\top}H_{R}w_{t})$, where $c$ and $\bar{c}$ are specified by Theorem 3.3.
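The risk-neutral bound (36) with the offset (37) can be compared against the exact mean energy in a simple setting. The sketch below makes several simplifying assumptions that go beyond the text: the noise is zero-mean and identically distributed with covariance `W`, the weight is $R=I_{n}$ (so $H=H_{R}$), and $H$ is obtained by solving $H-A^{\top}HA=I_{n}$. Under these assumptions, $E(x_{t}x_{t}^{\top})$ obeys an exact linear recursion, so no sampling is needed. All numerical values are hypothetical.

```python
import numpy as np

A = np.array([[0.5, 0.2], [0.0, 0.3]])     # hypothetical Schur stable matrix
W = np.array([[0.2, 0.05], [0.05, 0.1]])   # assumed E(w_t w_t^T), zero-mean i.i.d. noise
x0 = np.array([1.0, -2.0])                 # deterministic initial condition
kappa = 0.5
n = A.shape[0]

# With R = I_n (an assumption of this sketch), H = H_R solves H - A^T H A = I_n.
H = np.linalg.solve(np.eye(n * n) - np.kron(A.T, A.T), np.eye(n).flatten()).reshape(n, n)
H = 0.5 * (H + H.T)
eig_H = np.linalg.eigvalsh(H)              # ascending eigenvalues
eta = 1.0 / eig_H[-1]                      # lambda_min(I_n) / lambda_max(H_R)
lam = 1.0 - kappa * eta
a = eig_H[-1] / eig_H[0]
c = lam / (eig_H[0] * (1.0 - lam) * (lam - (1.0 - eta)))
b_N = c * np.trace(H @ W)                  # E(w^T H w) = tr(H W), constant in t

M = np.outer(x0, x0)                       # second moment E(x_t x_t^T) at t = 0
psi0 = float(x0 @ x0)
mean_energy, bound = [], []
for t in range(1, 31):
    M = A @ M @ A.T + W                    # exact because w_t is zero-mean and independent of x_t
    mean_energy.append(np.trace(M))        # E(psi(x_t)) with R = I_n
    bound.append(a * lam ** t * psi0 + b_N)

print(max(m - b for m, b in zip(mean_energy, bound)))  # nonpositive if the bound holds
```

In this example the bound is conservative, which is expected because the constants are chosen for generality rather than tightness.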

Let us further discuss the different offset terms. The offset term $b_{\text{N}}$ in (37) is proportional to the supremum of the mean of $w_{t}^{\top}H_{R}w_{t}$. In contrast, the offset term $b$ in Theorem 3.3 is proportional to the supremum of the risk of $w_{t}^{\top}H_{R}w_{t}$, where the meaning of risk is specific to the functional $\varrho$ of interest. As well as permitting a risk-aware perception of $w_{t}^{\top}H_{R}w_{t}$ that is specific to the primal interpretation of $\varrho$, Theorem 3.3 permits distributional ambiguity in $w_{t}^{\top}H_{R}w_{t}$ due to the dual representation of $\varrho$. This attribute is useful in settings with distributional modeling uncertainty, which we will illustrate by using the mean-upper-semideviation as an example (Section 4, Illustration 1).

Theorem 3.7 to follow will use a mean-conditional-variance functional $\rho_{\nu}$ (Example 4) and additional assumptions about the distribution of $w_{t}$, which are inspired by [44]. Theorem 3.7 will not provide a distributional robustness property, which is not surprising because $\rho_{\nu}$ is noncoherent in general. However, Theorem 3.7 will provide a versatile risk-aware noise-to-state stability property that depends on second-, third-, and fourth-order centered noise statistics, which is relevant for controller design (to be explored in Illustration 3, Section 4). Prior to presenting Theorem 3.7, we will present some useful notation.

Remark 3.6 (Notation for Theorem 3.7).

As in Theorem 3.3, Theorem 3.7 will also involve the linear system (19) and a quadratic state energy function $\psi(x)\coloneqq x^{\top}Rx$, where $R\in\mathcal{S}_{n}^{+}$ is given. Since $\psi(x_{t})\in\mathcal{L}^{1}$ for any $t\in\mathbb{N}$, the mean-conditional-variance of $\psi(x_{t})$ is well-defined (Example 4). For every $t\in\mathbb{N}$, $\rho_{\nu}(\psi(x_{t}))$ denotes the mean-conditional-variance of $\psi(x_{t})$, where $\mathcal{F}_{t-1}=\sigma(x_{0},x_{1},\dots,x_{t-1})$ is the sub $\sigma$-algebra of $\mathcal{F}$ of interest. For every $t\in\mathbb{N}$, we define $\bar{w}_{t-1}\coloneqq E(w_{t-1})$, $d_{t}\coloneqq w_{t-1}-\bar{w}_{t-1}$, and $\Sigma_{t}\coloneqq E(d_{t}d_{t}^{\top})$.

Theorem 3.7 (Mean-cond.-variance stability).

Consider the stochastic linear system (19), and let $R\in\mathcal{S}_{n}^{+}$ and $\nu\in[0,\infty)$ be given. Consider the quadratic state energy function $\psi(x)\coloneqq x^{\top}Rx$. We make the following assumptions:

  1. $E(|w_{t}|^{4})$ is finite for every $t\in\mathbb{N}_{0}$, and there exists a matrix $\Sigma_{u}\in\mathcal{S}_{n}$ such that $\Sigma_{u}-\Sigma_{t}\in\mathcal{S}_{n}$ for every $t\in\mathbb{N}$.

  2. Define $R_{\nu}\coloneqq R+4\nu R\Sigma_{u}R$. There exists a matrix $H^{\nu}\in\mathcal{S}_{n}^{+}$ such that $H_{R_{\nu}}^{\nu}-A^{\top}H_{R_{\nu}}^{\nu}A\in\mathcal{S}_{n}^{+}$.

  3. The statistics $\bar{\mathbf{w}}_{t}$, $\gamma_{t}$, and $\delta_{t}$ defined by

    $$\begin{aligned}
    \bar{\mathbf{w}}_{t} &\coloneqq (I_{n}-A^{t+1})^{-1}\sum_{i=0}^{t}A^{i}\bar{w}_{t-i}, & t&\in\mathbb{N}_{0},\\
    \gamma_{t} &\coloneqq E(d_{t}d_{t}^{\top}Rd_{t}), & t&\in\mathbb{N},\\
    \delta_{t} &\coloneqq E((d_{t}^{\top}Rd_{t}-\text{tr}(\Sigma_{t}R))^{2}), & t&\in\mathbb{N},
    \end{aligned} \tag{38}$$

    satisfy $\sup_{t\in\mathbb{N}_{0}}|\bar{\mathbf{w}}_{t}|^{2}<\infty$, $\sup_{t\in\mathbb{N}}|\gamma_{t}|^{2}<\infty$, and $\sup_{t\in\mathbb{N}}\delta_{t}<\infty$, respectively.

Then, there exist $\{\lambda_{\nu},\lambda_{0}\}\subset[0,1)$, $\{a_{\nu},a_{0}\}\subset[0,\infty)$, and $b_{\nu}\in\mathbb{R}$ such that

$$\rho_{\nu}(\psi(x_{t}))\leq\underbrace{a_{\nu}\lambda_{\nu}^{t}\tilde{\psi}_{\nu}(\mathbf{x})+4\nu|\gamma_{t}|\sqrt{a_{0}\lambda_{0}^{t}\lambda_{\text{max}}(R)\tilde{\psi}_{0}(\mathbf{x})}}_{\text{Exponentially decreasing terms}}\;+\;b_{\nu} \tag{39}$$

for every time $t\in\mathbb{N}$ and initial condition $\mathbf{x}\in\mathbb{R}^{n}$, where we define $\tilde{\psi}_{\nu}(\mathbf{x})\coloneqq 4\psi_{\nu}(\mathbf{x})+4\lambda_{\text{max}}(R_{\nu})\sup_{t\in\mathbb{N}_{0}}|\bar{\mathbf{w}}_{t}|^{2}$ and $\psi_{\nu}(\mathbf{x})\coloneqq\mathbf{x}^{\top}R_{\nu}\mathbf{x}$. In particular, one can choose

$$\lambda_{\nu}=1-\frac{\lambda_{\text{min}}(H_{R_{\nu}}^{\nu}-A^{\top}H_{R_{\nu}}^{\nu}A)}{\lambda_{\text{max}}(H_{R_{\nu}}^{\nu})},\quad a_{\nu}=\frac{\lambda_{\text{max}}(H^{\nu})}{\lambda_{\text{min}}(H^{\nu})}, \tag{40}$$

and

$$b_{\nu}=\sup_{t\in\mathbb{N}}\big\{\rho_{\nu}(\psi(d_{t}))+b_{\nu,t}\big\}, \tag{41}$$

where, for every $t\in\mathbb{N}$,

$$\begin{aligned}
b_{\nu,t} &\coloneqq \underbrace{c_{\nu}\Big(\sup_{t\in\mathbb{N}}E(d_{t}^{\top}H_{R_{\nu}}^{\nu}d_{t})\Big)-E(d_{t}^{\top}R_{\nu,t}d_{t})}_{\text{Risk-neutral zero-mean noise bias}}\\
&\quad+\underbrace{2\bar{\mathbf{w}}_{t-1}^{\top}R_{\nu}\bar{\mathbf{w}}_{t-1}+4\nu\gamma_{t}^{\top}R\bar{\mathbf{w}}_{t-1}}_{\text{Noise mean and skewness terms}},
\end{aligned} \tag{42}$$

$R_{\nu,t}\coloneqq R+4\nu R\Sigma_{t}R$, and $c_{\nu}\coloneqq\frac{2}{\lambda_{\text{min}}(H^{\nu})(1-\lambda_{\nu})}$.

Remark 3.8 (Discussion of Theorem 3.7).

Theorem 3.7 establishes a global stability property with respect to $\rho_{\nu}$, closely resembling Definition 3, where a sum of exponentially decreasing terms appears on the right side of (39). Similar to Theorem 3.3, Theorem 3.7 also uncovers a risk-aware noise-to-state stability property. By denoting the sum of exponentially decreasing terms in (39) by $g_{\nu,\mathbf{x},t}$ and substituting the expression for $b_{\nu}$ from (41), we can express (39) as follows:

$$\rho_{\nu}(\psi(x_{t}))\leq g_{\nu,\mathbf{x},t}+\sup_{t\in\mathbb{N}}\big\{\rho_{\nu}(\psi(d_{t}))+b_{\nu,t}\big\} \tag{43}$$

for every time $t\in\mathbb{N}$ and initial condition $\mathbf{x}\in\mathbb{R}^{n}$. The risk functional $\rho_{\nu}$ evaluates the state energy on the left side of (43), and $\rho_{\nu}$ evaluates the energy of the centered noise $d_{t}$ on the right side of (43). Under the assumptions of Theorem 3.7, $g_{\nu,\mathbf{x},t}$ decays to zero as $t\rightarrow\infty$, and therefore, Theorem 3.7 provides the following risk-aware stability bound:

$$\limsup_{t\rightarrow\infty}\rho_{\nu}(\psi(x_{t}))\leq\sup_{t\in\mathbb{N}}\big\{\rho_{\nu}(\psi(d_{t}))+b_{\nu,t}\big\}. \tag{44}$$

Now, let us discuss the assumptions of Theorem 3.7. For any random variable $Y$ on $(\Omega,\mathcal{F},P)$, $E(Y^{4})$ is finite in many cases, including when $Y$ has a Gaussian distribution, a distribution with bounded support, a log-normal distribution (which is heavy-tailed), or a mixture of these distributions. Using $R_{\nu}\in\mathcal{S}_{n}^{+}$, the second assumption of Theorem 3.7 is equivalent to $A$ being Schur stable (Lemma 5.10, Appendix). Hence, for every $t\in\mathbb{N}_{0}$, every eigenvalue of $A^{t+1}$ has magnitude strictly less than one, so one is not an eigenvalue of $A^{t+1}$, which ensures that $I_{n}-A^{t+1}$ is invertible. In the case of $\bar{w}_{t}=\bar{w}$ for every $t\in\mathbb{N}_{0}$, it follows that $\bar{\mathbf{w}}_{t}=(I_{n}-A)^{-1}\bar{w}$, and so $\sup_{t\in\mathbb{N}_{0}}|\bar{\mathbf{w}}_{t}|^{2}$ is finite (more details to be provided in the proof). We will prove Theorem 3.7 next.

Proof 3.9.

Let ν[0,)\nu\in[0,\infty) be given, and let 𝐱n\mathbf{x}\in\mathbb{R}^{n} be an arbitrary initial condition of the system (19). Unrolling the recursion provides xt+1=At+1x0+i=0tAiwtix_{t+1}=A^{t+1}x_{0}+\sum_{i=0}^{t}A^{i}w_{t-i} for every t0t\in\mathbb{N}_{0}, and thus,

xt+1\displaystyle x_{t+1} =At+1x0+i=0tAiw¯ti+i=0tAi(wtiw¯ti)\displaystyle=A^{t+1}x_{0}+\sum_{i=0}^{t}A^{i}\bar{w}_{t-i}+\sum_{i=0}^{t}A^{i}(w_{t-i}-\bar{w}_{t-i})
=At+1x0+(InAt+1)𝐰¯t+i=0tAidti+1,\displaystyle=A^{t+1}x_{0}+(I_{n}-A^{t+1})\bar{{\bf w}}_{t}+\sum_{i=0}^{t}A^{i}d_{t-i+1}, (45)

where the first line of (38) specifies the definition of 𝐰¯t\bar{{\bf w}}_{t}. That is, for every t0t\in\mathbb{N}_{0}, we have the “centered” solution

xt+1𝐰¯t=At+1(x0𝐰¯t)+i=0tAidti+1.x_{t+1}-\bar{{\bf w}}_{t}=A^{t+1}(x_{0}-\bar{{\bf w}}_{t})+\sum_{i=0}^{t}A^{i}d_{t-i+1}. (46)

It is convenient to define the “centered” state

x^t+1xt+1𝐰¯txt+1=x^t+1+𝐰¯t\hat{x}_{t+1}\coloneqq x_{t+1}-\bar{{\bf w}}_{t}\iff x_{t+1}=\hat{x}_{t+1}+\bar{{\bf w}}_{t} (47)

for every $t\in\mathbb{N}_{0}$. Additionally, the mean of $x_{t+1}$ is given by

$$E(x_{t+1})=A^{t+1}\mathbf{x}+\sum_{i=0}^{t}A^{i}\bar{w}_{t-i}=A^{t+1}(\mathbf{x}-\bar{\mathbf{w}}_{t})+\bar{\mathbf{w}}_{t} \tag{48}$$

for every $t\in\mathbb{N}_{0}$. As a special case, in the setting of mean-stationary noise, i.e., $\bar{w}_{t}=\bar{w}$ for every $t\in\mathbb{N}_{0}$, we obtain

$$
\begin{aligned}
E(x_{t+1}) &= A^{t+1}\mathbf{x}+\sum_{i=0}^{t}A^{i}\bar{w}\\
&= A^{t+1}\mathbf{x}+(I_{n}-A^{t+1})(I_{n}-A)^{-1}\bar{w},
\end{aligned}
\tag{49}
$$

and so $\bar{\mathbf{w}}_{t}=(I_{n}-A)^{-1}\bar{w}$. ($A$ being Schur stable implies that $\sum_{i=0}^{t}A^{i}=(I_{n}-A^{t+1})(I_{n}-A)^{-1}$ [57, Ex. 5.6.P26].) This case can be analyzed separately and may yield improved constants, but we do not pursue it further for the sake of generality.
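The mean-stationary special case can be sanity-checked numerically. The following is a minimal sketch in the scalar case ($n=1$); the values of `a`, `w_bar`, and `x0` are illustrative assumptions and do not come from the paper. It compares the mean obtained by iterating $E(x_{t+1})=aE(x_{t})+\bar{w}$ with the closed form in (48) and checks the geometric-sum identity.

```python
# Scalar (n = 1) sketch; a, w_bar, and x0 are illustrative values, not from the paper.


def mean_by_recursion(a, w_bar, x0, steps):
    """Iterate E(x_{t+1}) = a * E(x_t) + w_bar, starting from E(x_0) = x0."""
    m = x0
    for _ in range(steps):
        m = a * m + w_bar
    return m


def mean_closed_form(a, w_bar, x0, steps):
    """Evaluate a^steps * (x0 - w_bold) + w_bold, where w_bold = w_bar / (1 - a)."""
    w_bold = w_bar / (1.0 - a)
    return a ** steps * (x0 - w_bold) + w_bold


def geometric_sum(a, t):
    """Compute sum_{i=0}^{t} a^i term by term."""
    return sum(a ** i for i in range(t + 1))


a, w_bar, x0 = 0.8, 0.5, 3.0
for t in (0, 1, 5, 20):
    # The index t + 1 matches E(x_{t+1}) in (48) and (49).
    print(t, mean_by_recursion(a, w_bar, x0, t + 1), mean_closed_form(a, w_bar, x0, t + 1))
```

In this sketch the mean converges to $\bar{w}/(1-a)$, which is the scalar version of $(I_{n}-A)^{-1}\bar{w}$.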

Now, let us study the stability of the system with respect to $\rho_{\nu}$. For convenience, we define $R_{\nu}\coloneqq R+4\nu R\Sigma_{u}R$ and $\mathbb{C}_{t}$ for every $t\in\mathbb{N}$ by

$$
\begin{aligned}
\mathbb{C}_{t} &\coloneqq \nu\delta_{t}-4\nu\,\text{tr}((\Sigma_{t}R)^{2})\\
&= \nu\,\textbf{var}(d_{t}^{\top}Rd_{t})-4\nu E(d_{t}^{\top}R\Sigma_{t}Rd_{t}),
\end{aligned}
\tag{50}
$$

where $\textbf{var}$ denotes variance. For every $t\in\mathbb{N}$, we have (see Lemma 5.23 in the Appendix for the first line)

$$
\begin{aligned}
\rho_{\nu}(\psi(x_{t})) &= E\big(x_{t}^{\top}(R+4\nu R\Sigma_{t}R)x_{t}+4\nu x_{t}^{\top}R\gamma_{t}\big)+\mathbb{C}_{t}\\
&= E\big(x_{t}^{\top}(R+4\nu R\Sigma_{t}R)x_{t}\big)+4\nu\gamma_{t}^{\top}RE(x_{t})+\mathbb{C}_{t}\\
&\leq E\big(x_{t}^{\top}(R+4\nu R\Sigma_{u}R)x_{t}\big)+4\nu\gamma_{t}^{\top}RE(x_{t})+\mathbb{C}_{t}\\
&= E(x_{t}^{\top}R_{\nu}x_{t})+4\nu\gamma_{t}^{\top}RE(x_{t})+\mathbb{C}_{t},
\end{aligned}
\tag{51}
$$

where we have used the assumption that $\Sigma_{u}\in\mathcal{S}_{n}$ satisfies $\Sigma_{u}-\Sigma_{t}\in\mathcal{S}_{n}$ for every $t\in\mathbb{N}$.

The risk $\rho_{\nu}(\psi(x_{t}))$ has two distinct dynamical terms whose stability requires analysis, i.e., the quadratic term $E(x_{t}^{\top}R_{\nu}x_{t})$ and the cross-term $\gamma_{t}^{\top}RE(x_{t})$. For the quadratic term, we use (47) to write, for every $t\in\mathbb{N}$,

$$E(x_{t}^{\top}R_{\nu}x_{t})\leq 2E(\hat{x}_{t}^{\top}R_{\nu}\hat{x}_{t})+2\bar{\mathbf{w}}_{t-1}^{\top}R_{\nu}\bar{\mathbf{w}}_{t-1}. \tag{52}$$

For the expectation on the right side of (52), we apply (46) to derive (note that the noise process $d_{t}$ is zero-mean)

$$
\begin{aligned}
E(\hat{x}_{t}^{\top}R_{\nu}\hat{x}_{t})
&= E\bigg((x_{0}-\bar{\mathbf{w}}_{t-1})^{\top}[A^{t}]^{\top}R_{\nu}A^{t}(x_{0}-\bar{\mathbf{w}}_{t-1})
+\bigg[\sum_{i=0}^{t-1}A^{i}d_{t-i}\bigg]^{\top}R_{\nu}\bigg[\sum_{i=0}^{t-1}A^{i}d_{t-i}\bigg]\bigg)\\
&\leq E\bigg(2x_{0}^{\top}[A^{t}]^{\top}R_{\nu}A^{t}x_{0}+2\bar{\mathbf{w}}_{t-1}^{\top}[A^{t}]^{\top}R_{\nu}A^{t}\bar{\mathbf{w}}_{t-1}
+\bigg[\sum_{i=0}^{t-1}A^{i}d_{t-i}\bigg]^{\top}R_{\nu}\bigg[\sum_{i=0}^{t-1}A^{i}d_{t-i}\bigg]\bigg)\\
&= E(\tilde{x}_{t}^{\top}R_{\nu}\tilde{x}_{t})+2\bar{\mathbf{w}}_{t-1}^{\top}[A^{t}]^{\top}R_{\nu}A^{t}\bar{\mathbf{w}}_{t-1},
\end{aligned}
\tag{53}
$$

for every $t\in\mathbb{N}$, where the process $(\tilde{x}_{1},\tilde{x}_{2},\dots)$ is defined by

$$\tilde{x}_{t+1}=A^{t+1}\sqrt{2}x_{0}+\sum_{i=0}^{t}A^{i}d_{t-i+1},\quad t\in\mathbb{N}_{0}. \tag{54}$$

Equivalently, it holds that

$$\tilde{x}_{t+1}=A\tilde{x}_{t}+d_{t+1},\quad t\in\mathbb{N}_{0}, \tag{55}$$

where we define $\tilde{x}_{0}\coloneqq\sqrt{2}x_{0}$. Now, consider a Schur stable deterministic system defined by $z_{t+1}=Az_{t}$ for every $t\in\mathbb{N}_{0}$ with an arbitrary initial state $z_{0}=\mathbf{z}\in\mathbb{R}^{n}$. For any $t\in\mathbb{N}$, we have the chain

$$
\begin{aligned}
\bar{\mathbf{w}}_{t-1}^{\top}[A^{t}]^{\top}R_{\nu}A^{t}\bar{\mathbf{w}}_{t-1}
&= \big|R_{\nu}^{1/2}A^{t}\bar{\mathbf{w}}_{t-1}\big|^{2}\\
&\leq \Big(\big|R_{\nu}^{1/2}A^{t}\big|_{2}\,\big|\bar{\mathbf{w}}_{t-1}\big|\Big)^{2}\\
&= \big|\bar{\mathbf{w}}_{t-1}\big|^{2}\bigg(\sup_{|\mathbf{z}|=1}\big|R_{\nu}^{1/2}A^{t}\mathbf{z}\big|\bigg)^{2}\\
&= \big|\bar{\mathbf{w}}_{t-1}\big|^{2}\sup_{|\mathbf{z}|=1}\big|R_{\nu}^{1/2}A^{t}\mathbf{z}\big|^{2}\\
&= |\bar{\mathbf{w}}_{t-1}|^{2}\sup_{|\mathbf{z}|=1}\mathbf{z}^{\top}[A^{t}]^{\top}R_{\nu}A^{t}\mathbf{z}\\
&= |\bar{\mathbf{w}}_{t-1}|^{2}\sup_{|\mathbf{z}|=1}z_{t}^{\top}R_{\nu}z_{t}.
\end{aligned}
\tag{56}
$$

Hence, it holds that

$$0\leq\bar{\mathbf{w}}_{t-1}^{\top}[A^{t}]^{\top}R_{\nu}A^{t}\bar{\mathbf{w}}_{t-1}\leq|\bar{\mathbf{w}}_{t-1}|^{2}\sup_{|\mathbf{z}|=1}z_{t}^{\top}R_{\nu}z_{t} \tag{57}$$

for every $t\in\mathbb{N}$. Consequently, we have shown that

$$E(\hat{x}_{t}^{\top}R_{\nu}\hat{x}_{t})\leq E(\tilde{x}_{t}^{\top}R_{\nu}\tilde{x}_{t})+2|\bar{\mathbf{w}}_{t-1}|^{2}\sup_{|\mathbf{z}|=1}z_{t}^{\top}R_{\nu}z_{t} \tag{58}$$

for every $t\in\mathbb{N}$. Next, let us study the cross term $\gamma_{t}^{\top}RE(x_{t})$ in (51). For any $t\in\mathbb{N}$, we use (48) to find

$$
\begin{aligned}
\gamma_{t}^{\top}RE(x_{t})
&= \gamma_{t}^{\top}RA^{t}(\mathbf{x}-\bar{\mathbf{w}}_{t-1})+\gamma_{t}^{\top}R\bar{\mathbf{w}}_{t-1}\\
&\leq |R^{1/2}\gamma_{t}|\,|R^{1/2}A^{t}(\mathbf{x}-\bar{\mathbf{w}}_{t-1})|+\gamma_{t}^{\top}R\bar{\mathbf{w}}_{t-1}\\
&= |R^{1/2}\gamma_{t}|\sqrt{(\mathbf{x}-\bar{\mathbf{w}}_{t-1})^{\top}[A^{t}]^{\top}RA^{t}(\mathbf{x}-\bar{\mathbf{w}}_{t-1})}+\gamma_{t}^{\top}R\bar{\mathbf{w}}_{t-1}\\
&\leq |R^{1/2}\gamma_{t}|\sqrt{2\mathbf{x}^{\top}[A^{t}]^{\top}RA^{t}\mathbf{x}+2\bar{\mathbf{w}}_{t-1}^{\top}[A^{t}]^{\top}RA^{t}\bar{\mathbf{w}}_{t-1}}+\gamma_{t}^{\top}R\bar{\mathbf{w}}_{t-1}\\
&\leq |R^{1/2}\gamma_{t}|\sqrt{[x_{t}^{d}]^{\top}Rx_{t}^{d}+2|\bar{\mathbf{w}}_{t-1}|^{2}\sup_{|\mathbf{z}|=1}z_{t}^{\top}Rz_{t}}+\gamma_{t}^{\top}R\bar{\mathbf{w}}_{t-1},
\end{aligned}
\tag{59}
$$

where $x_{t+1}^{d}=Ax_{t}^{d}$ for every $t\in\mathbb{N}_{0}$ with initialization $x_{0}^{d}=\sqrt{2}\mathbf{x}$ is another Schur stable deterministic system. Overall, for every $t\in\mathbb{N}$, we have shown that

$$
\begin{aligned}
\rho_{\nu}(\psi(x_{t})) &\leq 2E(\tilde{x}_{t}^{\top}R_{\nu}\tilde{x}_{t})+4|\bar{\mathbf{w}}_{t-1}|^{2}\sup_{|\mathbf{z}|=1}z_{t}^{\top}R_{\nu}z_{t}\\
&\quad+4\nu|R^{1/2}\gamma_{t}|\sqrt{[x_{t}^{d}]^{\top}Rx_{t}^{d}+2|\bar{\mathbf{w}}_{t-1}|^{2}\sup_{|\mathbf{z}|=1}z_{t}^{\top}Rz_{t}}\\
&\quad+2\bar{\mathbf{w}}_{t-1}^{\top}R_{\nu}\bar{\mathbf{w}}_{t-1}+4\nu\gamma_{t}^{\top}R\bar{\mathbf{w}}_{t-1}+\mathbb{C}_{t}.
\end{aligned}
\tag{60}
$$

To complete the proof, we will analyze the terms in the right side of (60). For convenience, we define

$$s_{\nu}\coloneqq\sup_{t\in\mathbb{N}}E(d_{t}^{\top}H_{R_{\nu}}^{\nu}d_{t}),\quad\eta_{\nu}\coloneqq\dfrac{\lambda_{\text{min}}(H_{R_{\nu}}^{\nu}-A^{\top}H_{R_{\nu}}^{\nu}A)}{\lambda_{\text{max}}(H_{R_{\nu}}^{\nu})}, \tag{61}$$

and $\lambda_{\nu}\coloneqq 1-\eta_{\nu}$, and thus, $s_{\nu}\in[0,\infty)$ and $\lambda_{\nu}\in[0,1)$ using the assumptions of Theorem 3.7 (also see Lemma 5.12, Appendix). Moreover, we define $\psi_{\nu}(y)\coloneqq y^{\top}R_{\nu}y$ for every $y\in\mathbb{R}^{n}$ and $a_{\nu}\coloneqq\lambda_{\text{max}}(H^{\nu})/\lambda_{\text{min}}(H^{\nu})$; note that $a_{\nu}\in(0,\infty)$ by the second assumption. Using $R\in\mathcal{S}_{n}^{+}$ and $R_{\nu}\in\mathcal{S}_{n}^{+}$, the second assumption implies the existence of a matrix $H^{0}\in\mathcal{S}_{n}^{+}$ such that $H^{0}_{R}-A^{\top}H_{R}^{0}A\in\mathcal{S}_{n}^{+}$ (see Lemma 5.10, Appendix). We define $R_{0}\coloneqq R$, $a_{0}\coloneqq\lambda_{\text{max}}(H^{0})/\lambda_{\text{min}}(H^{0})$, $\eta_{0}$ by the analogous expression in (61) with $\nu=0$, $\lambda_{0}\coloneqq 1-\eta_{0}$, and $\psi_{0}(y)\coloneqq y^{\top}R_{0}y$ for every $y\in\mathbb{R}^{n}$. By applying the second assumption in particular, the following statements about the terms in (60) hold:

  • The term $E(\tilde{x}_{t}^{\top}R_{\nu}\tilde{x}_{t})$:

    $$E(\tilde{x}_{t}^{\top}R_{\nu}\tilde{x}_{t})\leq 2a_{\nu}\lambda_{\nu}^{t}\psi_{\nu}(\mathbf{x})+\dfrac{s_{\nu}}{\lambda_{\text{min}}(H^{\nu})(1-\lambda_{\nu})},\quad t\in\mathbb{N}. \tag{62}$$
  • The term $z_{t}^{\top}R_{\nu}z_{t}$ for every $\nu\geq 0$:

    $$z_{t}^{\top}R_{\nu}z_{t}\leq a_{\nu}\lambda_{\nu}^{t}\psi_{\nu}(\mathbf{z}),\quad t\in\mathbb{N},\quad\mathbf{z}\in\mathbb{R}^{n}, \tag{63}$$

    and therefore, for every $t\in\mathbb{N}$,

    $$\sup_{|\mathbf{z}|=1}z_{t}^{\top}R_{\nu}z_{t}\leq a_{\nu}\lambda_{\nu}^{t}\sup_{|\mathbf{z}|=1}\psi_{\nu}(\mathbf{z})=a_{\nu}\lambda_{\text{max}}(R_{\nu})\lambda_{\nu}^{t}. \tag{64}$$
  • The term $[x_{t}^{d}]^{\top}Rx_{t}^{d}$:

    $$[x_{t}^{d}]^{\top}Rx_{t}^{d}\leq 2a_{0}\lambda_{0}^{t}\psi_{0}(\mathbf{x}),\quad t\in\mathbb{N}. \tag{65}$$

By defining

$$
\begin{aligned}
c_{\nu} &\coloneqq \dfrac{2}{\lambda_{\text{min}}(H^{\nu})(1-\lambda_{\nu})}, && (66)\\
\tilde{\psi}_{\nu}(\mathbf{x}) &\coloneqq 4\psi_{\nu}(\mathbf{x})+4\lambda_{\text{max}}(R_{\nu})\sup_{t\in\mathbb{N}}|\bar{\mathbf{w}}_{t-1}|^{2}, && (67)
\end{aligned}
$$

and $\tilde{\psi}_{0}(\mathbf{x})$ by (67) with $\nu=0$, we consolidate our previous results as follows:

$$2E(\tilde{x}_{t}^{\top}R_{\nu}\tilde{x}_{t})+4|\bar{\mathbf{w}}_{t-1}|^{2}\sup_{|\mathbf{z}|=1}z_{t}^{\top}R_{\nu}z_{t}\leq a_{\nu}\lambda_{\nu}^{t}\tilde{\psi}_{\nu}(\mathbf{x})+c_{\nu}s_{\nu} \tag{68}$$

and

$$[x_{t}^{d}]^{\top}Rx_{t}^{d}+2|\bar{\mathbf{w}}_{t-1}|^{2}\sup_{|\mathbf{z}|=1}z_{t}^{\top}Rz_{t}\leq a_{0}\lambda_{0}^{t}\tilde{\psi}_{0}(\mathbf{x}) \tag{69}$$

for every $t\in\mathbb{N}$. The statements (60), (68), and (69) provide the following (almost final) bound for any $t\in\mathbb{N}$:

$$
\begin{aligned}
\rho_{\nu}(\psi(x_{t})) &\leq \underbrace{a_{\nu}\lambda_{\nu}^{t}\tilde{\psi}_{\nu}(\mathbf{x})+4\nu|R^{1/2}\gamma_{t}|\sqrt{a_{0}\lambda_{0}^{t}\tilde{\psi}_{0}(\mathbf{x})}}_{\text{Exponentially decreasing terms}}\\
&\quad\underbrace{+\;c_{\nu}s_{\nu}+2\bar{\mathbf{w}}_{t-1}^{\top}R_{\nu}\bar{\mathbf{w}}_{t-1}+4\nu\gamma_{t}^{\top}R\bar{\mathbf{w}}_{t-1}+\mathbb{C}_{t}}_{\text{Bias terms}}.
\end{aligned}
\tag{70}
$$

Next, we use (50), the definition $R_{\nu,t}\coloneqq R+4\nu R\Sigma_{t}R$, and the independence of $d_{t}$ and $h_{t-1}$ for any $t\in\mathbb{N}$ to derive

$$
\begin{aligned}
\mathbb{C}_{t} &= \nu\,\textbf{var}(d_{t}^{\top}Rd_{t})+E(d_{t}^{\top}Rd_{t})-E(d_{t}^{\top}Rd_{t})-4\nu E(d_{t}^{\top}R\Sigma_{t}Rd_{t})\\
&= \underbrace{E(d_{t}^{\top}Rd_{t})+\nu\,\textbf{var}(d_{t}^{\top}Rd_{t})}_{\text{Noise mean-variance}}-E(d_{t}^{\top}R_{\nu,t}d_{t})\\
&= \rho_{\nu}(\psi(d_{t}))-E(d_{t}^{\top}R_{\nu,t}d_{t}).
\end{aligned}
\tag{71}
$$

Finally, we use $|R^{1/2}\gamma_{t}|\leq\sqrt{\lambda_{\text{max}}(R)}|\gamma_{t}|$, (70), and (71) to derive

$$
\begin{aligned}
\rho_{\nu}(\psi(x_{t})) &\leq \underbrace{a_{\nu}\lambda_{\nu}^{t}\tilde{\psi}_{\nu}(\mathbf{x})+4\nu|\gamma_{t}|\sqrt{a_{0}\lambda_{0}^{t}\lambda_{\text{max}}(R)\tilde{\psi}_{0}(\mathbf{x})}}_{\text{Exponentially decreasing terms}}\\
&\quad+\underbrace{c_{\nu}\Big(\sup_{t\in\mathbb{N}}E(d_{t}^{\top}H_{R_{\nu}}^{\nu}d_{t})\Big)-E(d_{t}^{\top}R_{\nu,t}d_{t})}_{\text{Risk-neutral zero-mean noise bias}}\\
&\quad+\underbrace{\rho_{\nu}(\psi(d_{t}))}_{\text{Noise risk}}+\underbrace{2\bar{\mathbf{w}}_{t-1}^{\top}R_{\nu}\bar{\mathbf{w}}_{t-1}+4\nu\gamma_{t}^{\top}R\bar{\mathbf{w}}_{t-1}}_{\text{Noise mean and skewness terms}}
\end{aligned}
\tag{72}
$$

for every $t\in\mathbb{N}$, completing the proof.
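The deterministic decay bound (63), which drives the exponentially decreasing terms, can be checked numerically. The sketch below makes several assumptions that are not from the paper: it takes $n=2$, $R=I_{2}$, and $\nu=0$ (so that $R_{\nu}=R$ and $H_{R}=H$), and it picks one particular $H$, namely a truncated series solving $H-A^{\top}HA\approx I_{2}$. The matrix `A` and the helper names are illustrative.

```python
import math


def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]


def sym_eigs(S):
    """Return (min, max) eigenvalues of a symmetric 2x2 matrix."""
    p, q, r = S[0][0], S[0][1], S[1][1]
    mid = 0.5 * (p + r)
    rad = math.sqrt((0.5 * (p - r)) ** 2 + q * q)
    return mid - rad, mid + rad


def lyapunov_H(A, terms=400):
    """Truncated series H = sum_k (A^k)^T A^k, so that H - A^T H A is close to I."""
    H = [[0.0, 0.0], [0.0, 0.0]]
    P = [[1.0, 0.0], [0.0, 1.0]]  # P holds A^k
    for _ in range(terms):
        PtP = matmul(transpose(P), P)
        H = [[H[i][j] + PtP[i][j] for j in range(2)] for i in range(2)]
        P = matmul(P, A)
    return H


def decay_constants(A):
    """Return (a, lam) with a = lmax(H)/lmin(H) and lam = 1 - lmin(H - A^T H A)/lmax(H)."""
    H = lyapunov_H(A)
    AtHA = matmul(transpose(A), matmul(H, A))
    Q = [[H[i][j] - AtHA[i][j] for j in range(2)] for i in range(2)]
    h_min, h_max = sym_eigs(H)
    q_min, _ = sym_eigs(Q)
    return h_max / h_min, 1.0 - q_min / h_max


def energy(A, z0, t):
    """Return |z_t|^2 for z_{t+1} = A z_t with z_0 = z0 (R = I)."""
    z = list(z0)
    for _ in range(t):
        z = [A[0][0] * z[0] + A[0][1] * z[1], A[1][0] * z[0] + A[1][1] * z[1]]
    return z[0] ** 2 + z[1] ** 2


A = [[0.8, 0.4], [0.0, -0.8]]  # illustrative Schur stable matrix
a_const, lam = decay_constants(A)
print("a =", a_const, "lambda =", lam)
```

With these constants, $|z_{t}|^{2}\leq a\lambda^{t}|z_{0}|^{2}$ should hold for every $t$, up to floating-point error; other admissible choices of $H$ would give different constants.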

Figure 1: Time trajectories (left) and respective empirical CDFs (right) of the noise energy under the nominal model and the two alternative realities described in Illustration 1. The plots also show the obtained risk-neutral and risk-aware noise energy estimates (straight horizontal & vertical lines).

Similarities and differences between Theorem 3.3 and Theorem 3.7 deserve some discussion. Both theorems show how the risk of the state energy and a risk-aware perception of a disturbance process can be decoupled, although the theorems quantify risk in distinct ways; compare (22) and (43). Theorem 3.3 applies to any real-valued coherent risk functional on $\mathcal{L}^{q}$, while Theorem 3.7 applies to a mean-conditional-variance functional on $\mathcal{L}^{1}$. The offset parameters from the two theorems each reflect a particular perception of noise, which is induced by the chosen measure of risk; indeed, compare (21) and (41). Theorem 3.7 provides a rate parameter $\lambda_{\nu}$, which depends on $\nu$, where $\nu$ is specific to the risk functional $\rho_{\nu}$. In contrast, the rate parameter from Theorem 3.3 is the same as the one in the risk-neutral setting. Later on, we will see that Theorem 3.7 provides insights about the behavior of a simple risk-aware myopic controller, which mitigates extreme peaks of the state energy in simulation (Section 4, Illustration 3). How to develop a simple risk-aware controller in the setting of Theorem 3.3, however, is an open research question.

Ideally, we would like to optimize the parameters provided by Theorem 3.3 or Theorem 3.7 by choosing a suitable state-feedback matrix $K$. For example, one can consider the problem of minimizing the offset parameter $b=c\sup_{t\in\mathbb{N}_{0}}\varrho(w_{t}^{\top}H_{R}w_{t})$ (20) subject to the assumptions of Theorem 3.3, where $A=\check{A}-\check{B}K$, $\check{A}$ and $\check{B}$ are given matrices, and $K$ is to be chosen. This is a difficult risk-aware optimization problem to solve exactly, where the matrix variables $H$ and $K$ are coupled through $A^{\top}H_{R}A=(\check{A}-\check{B}K)^{\top}H_{R}(\check{A}-\check{B}K)$. Moreover, the rate parameter $\lambda$ (20) should be minimized simultaneously, adding to the difficulty. Hence, we reserve investigations of such optimization problems for future work. The next section will provide examples to illuminate aspects of our theoretical developments.

4 Illustrative Examples

Illustration 1 (Risk of noise energy)

Theorem 3.3 indicates that the risk of the state energy $x_{t}^{\top}Rx_{t}$ of the system under study is of the order of the corresponding risk of the noise energy $w_{t}^{\top}H_{R}w_{t}$ under some assumptions. (The term noise energy refers to any quadratic form $w_{t}^{\top}Mw_{t}$ with $M\in\mathcal{S}_{n}^{+}$.) We consider for simplicity a one-dimensional, Schur stable linear system driven by independent and identically distributed noise $w_{t}$ (i.e., a stable order-$1$ autoregressive process). This illustration will examine the risk of $w_{t}^{2}$, which arises in the right side of $b^{\prime}$ (21) in a special case, compared to realizations of $w_{t}^{2}$ to demonstrate the need for and the usefulness of evaluating the noise energy from a risk-aware perspective.

Let us select the mean-upper-semideviation of order $q$ (8) for this illustration. Note that, whenever $\beta\in[0,1]$, we obtain a coherent risk functional for every $q\geq 1$, and its dual representation (9) holds with the uncertainty set $\mathcal{A}$ given by (12).

Figure 2: The time trajectories (top) and empirical CDFs (bottom) of the state energy $|x_{t}|^{2}$ and the respective control effort $|u_{t}|^{2}$ (where applicable) corresponding to no controller ($u_{t}$ is zero), the risk-neutral controller for $\mu=0.25$, and the risk-aware controller for $\mu=0.25$ and $\nu=10$ for the stochastic linear system described in Illustration 3. Despite the simplicity of the risk-aware controller, this controller can mitigate extreme peaks in the state energy without requiring extreme peaks in the control effort.

While we may assume a generic nominal noise model, e.g., $w_{t}\sim\mathcal{U}[-1,1]$, we may at the same time be uncertain about it. In such a case, we would like to obtain an estimate of the average energy of the noise that is robust to potential distributional modeling uncertainty. This is precisely what is achieved by using a (coherent) risk functional to evaluate $w_{t}$ under its nominal model (which arises from characterizing system stability through Theorem 3.3). To illustrate this, let us consider two alternative realities for $w_{t}^{2}$ defined by

$$w_{t,i}^{2}\coloneqq w_{t}^{2}\xi_{t,i},\quad i\in\{1,2\},\quad t\in\mathbb{N}_{0}, \tag{73}$$

where $\xi_{t,i}$ for $i\in\{1,2\}$ are multiplicative random perturbation processes of the form

$$\xi_{t,i}=1+\eta_{t,i}-E(\eta_{t,i}),\quad i\in\{1,2\}, \tag{74}$$

where, for every $t\in\mathbb{N}_{0}$,

$$
\begin{aligned}
\eta_{t,1} &\sim \mathcal{U}[0,1]\quad\text{and} && (75)\\
\eta_{t,2} &= z_{t}\mathcal{I}_{[0,1]}(z_{t})+\mathcal{I}_{>1}(z_{t}), && (76)
\end{aligned}
$$

with $z_{t}=-w_{t}+1/4$. By construction, each $\xi_{t,i}$ for $i\in\{1,2\}$ belongs to the uncertainty set (12) associated with the order-$q$ mean-upper-semideviation risk functional (8) for every $q\geq 1$. Observe that $\xi_{t,1}$ is a uniform perturbation and independent of $w_{t}$, whereas $\xi_{t,2}$ is nonuniform and highly dependent on $w_{t}$. Still, $\xi_{t,1}$ and $\xi_{t,2}$ are each independent over different values of $t$ (and thus the same holds for $w^{2}_{t,i}$, $i\in\{1,2\}$, which is helpful in numerical simulation). It follows that, for $\beta\in[0,1]$ and $q\geq 1$,

$$E(w_{t,i}^{2})\leq\sup_{\xi\in\mathcal{A}}E(w_{t}^{2}\xi)=\mathrm{MUS}_{q,\beta}(w_{t}^{2}),\quad i\in\{1,2\}, \tag{77}$$

illustrating that the mean-upper-semideviation risk functional (which is just one example) can provide worst-case risk-neutral estimates of the noise energy over a particular but rich class of possible realities, including the ones described above.

Figure 1 illuminates the above facts by depicting the time trajectories and corresponding empirical CDFs of the noise energy in all three cases. Figure 1 also shows the risk-neutral estimate ($\beta=0$) and two risk-aware estimates ($\beta=1$, $q=2$ and $\beta=1$, $q=12$). We have computed the risk-aware estimates empirically by using the primal representation of mean-upper-semideviation (8) under the nominal noise model (that is, no information about any possible alternative reality is needed). As anticipated, the risk-neutral estimate (expectation) masks the potential dispersion of the noise energy in every case, with the issue being more pronounced for the two alternative realities, and especially for the alternative reality whose noise distribution exhibits a fatter tail. In contrast, the risk-aware estimates are biased toward more extreme noise energy realizations (especially in the case of $q=12$), and, via risk duality, they provide uniformly cautious estimates for a variety of possible realities, including $w_{t,1}^{2}$ and $w_{t,2}^{2}$, due to the richness of the uncertainty set (12). These facts are readily apparent from the noise energy trajectories (along with the horizontal lines) and the empirical CDFs (along with the vertical lines) in Figure 1.
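The empirical computation described above can be sketched as follows. This is a minimal illustration and not the authors' code. It assumes that (8) has the standard primal form $\mathrm{MUS}_{q,\beta}(Y)=E(Y)+\beta\,(E[(Y-E(Y))_{+}^{q}])^{1/q}$, it replaces $E(\eta_{t,i})$ in (74) by a sample mean, and the function names, sample size, and seed are hypothetical.

```python
import random


def mean_upper_semideviation(samples, q, beta):
    """Empirical E(Y) + beta * (E[(Y - E(Y))_+^q])^(1/q)."""
    n = len(samples)
    mean = sum(samples) / n
    upper = sum(max(y - mean, 0.0) ** q for y in samples) / n
    return mean + beta * upper ** (1.0 / q)


def simulate(n=100_000, seed=0):
    """Draw the nominal noise energy and the two perturbed versions from (73)."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    nominal = [wi * wi for wi in w]
    # Reality 1: eta_1 is uniform on [0, 1] and independent of w.
    eta1 = [rng.uniform(0.0, 1.0) for _ in range(n)]
    # Reality 2: eta_2 equals z = -w + 1/4 clipped to [0, 1], as in (76).
    eta2 = [min(max(-wi + 0.25, 0.0), 1.0) for wi in w]
    m1, m2 = sum(eta1) / n, sum(eta2) / n  # sample means stand in for E(eta)
    alt1 = [y * (1.0 + e - m1) for y, e in zip(nominal, eta1)]
    alt2 = [y * (1.0 + e - m2) for y, e in zip(nominal, eta2)]
    return nominal, alt1, alt2


nominal, alt1, alt2 = simulate()
print("risk-neutral     :", mean_upper_semideviation(nominal, 1, 0.0))
print("beta = 1, q = 2  :", mean_upper_semideviation(nominal, 2, 1.0))
print("beta = 1, q = 12 :", mean_upper_semideviation(nominal, 12, 1.0))
print("means under the two alternative realities:",
      sum(alt1) / len(alt1), sum(alt2) / len(alt2))
```

Only samples from the nominal model enter the risk-aware estimates; the alternative realities are simulated solely to compare their means against those estimates, in the spirit of (77).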

Illustration 2 (Trade-offs related to $\kappa$)

Here, we will illustrate trade-offs related to the parameter $\kappa$ from Theorem 3.3. Let us consider the case of $\eta<1$, where the matrices $R$, $A$, and $H$ have been chosen. The parameters $\lambda$ and $c$ depend on an adjustable parameter $\kappa\in(0,1)$. Since $\lambda=1-\kappa\eta$ (20), where $\eta\in(0,1)$ is determined by $R$, $A$, and $H$, choosing $\kappa$ to be closer to 1 would provide faster decay. However, we would also like $c$ to be small to reduce noise effects. (Recall that $b=cb^{\prime}$ (20), where $b^{\prime}$ (21) is the supremum of the risk of the noise energy $w_{t}^{\top}H_{R}w_{t}$.) The relationship between $c$ and $\kappa$ is nonlinear: by direct substitution, we obtain

$$c=\frac{1-\kappa\eta}{\lambda_{\text{min}}(H)\;\kappa\eta\;(\eta-\kappa\eta)}. \tag{78}$$

Figure 3 depicts $c\lambda_{\text{min}}(H)$ versus $\kappa$ for several values of $\eta\in(0,1)$, illustrating the effect of $\eta$ on the minimum value of $c\lambda_{\text{min}}(H)$ with respect to $\kappa$.
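The trade-off in (78) is easy to explore numerically. The sketch below evaluates $c\lambda_{\text{min}}(H)$ on a grid of $\kappa$ values and compares the grid minimizer with the stationary point $\kappa^{\star}=1/(1+\sqrt{1-\eta})$. That closed form is not stated in the paper; it is our own calculation, obtained by setting the derivative of (78) with respect to $\kappa$ to zero, which gives $\eta\kappa^{2}-2\kappa+1=0$. The function names and grid size are illustrative.

```python
import math


def scaled_offset(kappa, eta):
    """Return c * lambda_min(H) from (78) for kappa in (0, 1) and eta in (0, 1)."""
    return (1.0 - kappa * eta) / (kappa * eta * (eta - kappa * eta))


def grid_minimizer(eta, points=9999):
    """Brute-force search over an evenly spaced grid strictly inside (0, 1)."""
    grid = [(k + 1) / (points + 1) for k in range(points)]
    return min(grid, key=lambda kappa: scaled_offset(kappa, eta))


def stationary_point(eta):
    """Root in (0, 1) of eta * kappa^2 - 2 * kappa + 1 = 0 (our own derivation)."""
    return 1.0 / (1.0 + math.sqrt(1.0 - eta))


for eta in (0.1, 0.5, 0.9):
    k_grid = grid_minimizer(eta)
    print(eta, k_grid, stationary_point(eta), scaled_offset(k_grid, eta))
```

In this sketch the minimizing $\kappa$ always lies in $(1/2,1)$ and moves toward 1 as $\eta$ approaches 1, so the value of $\kappa$ that minimizes $c$ does not give the fastest available decay rate $\lambda=1-\kappa\eta$.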

Figure 3: Plots of $c\lambda_{\text{min}}(H)$ versus $\kappa\in(0,1)$ on a semilog scale for several values of $\eta\in(0,1)$ (see Theorem 3.3 in the case of $\eta<1$).
Illustration 3 (Insights about controllers using Theorem 3.7)

An exciting potential use case of the risk-aware stability bounds we have developed is to inform the design of risk-aware controllers. The mean-conditional-variance functional studied in Theorem 3.7 turns out to be particularly useful for this purpose, mainly due to its quadratic form. To demonstrate this, suppose we are given a linear system of the form

$$x_{t+1}=\check{A}x_{t}+\check{B}u_{t}+\check{w}_{t},\quad t\in\mathbb{N}_{0}, \tag{79}$$

where, for simplicity, we assume that the noise process $\check{w}_{t}$ is stationary but with possibly nontrivial dispersive behavior (to be described). For each $t\in\mathbb{N}_{0}$, we consider the myopic (one-step-ahead) regularized optimal control problem

$$\underset{u_{t}\in\mathbb{R}^{m}}{\mathrm{minimize}}\quad\rho_{\nu}(\psi(x_{t+1})|x_{t})+\mu u_{t}^{\top}u_{t},\quad\mu\geq 0, \tag{80}$$

where $\rho_{\nu}(\cdot|x_{t})$ denotes the conditional version of $\rho_{\nu}$ (note that $\rho_{\nu}$ admits an expectation representation, see the first line of (51)), when the state $x_{t}$ at time $t$ is given. This problem is equivalent to the quadratic program

$$\underset{u_{t}\in\mathbb{R}^{m}}{\mathrm{minimize}}\quad E(x_{t+1}^{\top}R_{\nu}x_{t+1}+4\nu x_{t+1}^{\top}R\gamma\,|\,x_{t})+\mu u_{t}^{\top}u_{t}, \tag{81}$$

where $\gamma_{t}=\gamma$ and $\Sigma_{t}=\Sigma_{u}$ for every $t$. Provided that $\mu I_{m}+\check{B}^{\top}R_{\nu}\check{B}$ is invertible, the solution is

$$u_{t}^{\nu,\mu}=-K_{\nu}^{\mu}x_{t}-T_{\nu}^{\mu}\gamma,\quad t\in\mathbb{N}_{0}, \tag{82}$$

where the inflated gain $K_{\nu}^{\mu}$ and tail bias $T_{\nu}^{\mu}$ are given by

$$
\begin{aligned}
K_{\nu}^{\mu} &\coloneqq (\mu I_{m}+\check{B}^{\top}R_{\nu}\check{B})^{-1}\check{B}^{\top}R_{\nu}\check{A}\quad\text{and} && (83)\\
T_{\nu}^{\mu} &\coloneqq 2\nu(\mu I_{m}+\check{B}^{\top}R_{\nu}\check{B})^{-1}\check{B}^{\top}R. && (84)
\end{aligned}
$$

Observe that, when $\nu=0$, we obtain the standard myopic linear-quadratic-regulator controller $u_{t}^{0,\mu}=-K_{0}^{\mu}x_{t}$. Also, note that the risk-aware controller $u_{t}^{\nu,\mu}$ is distinct from the controller recently developed in [44], the latter being non-myopic and derived via stochastic dynamic programming.
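To make the gain formulas (83) and (84) concrete, the following sketch evaluates them for a two-state, single-input system with $\check{B}=[0\;\,1]^{\top}$, in which case the matrix inverse reduces to a scalar division. It assumes that $\Sigma_{u}$ is the covariance of the noise, computed here for a two-component Gaussian mixture with isotropic components; the numerical values mirror the example given later in this illustration. The helper names are hypothetical, and the statistic $\gamma$ is not computed.

```python
def mixture_covariance(weights, means, variances):
    """Covariance of a mixture of 2-D Gaussians N(mean_k, variance_k * I_2)."""
    mu = [sum(p * m[i] for p, m in zip(weights, means)) for i in range(2)]
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for p, m, v in zip(weights, means, variances):
        for i in range(2):
            for j in range(2):
                cov[i][j] += p * ((v if i == j else 0.0) + m[i] * m[j])
    return [[cov[i][j] - mu[i] * mu[j] for j in range(2)] for i in range(2)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def risk_matrix(R, Sigma, nu):
    """R_nu = R + 4 * nu * R * Sigma * R."""
    RSR = matmul(R, matmul(Sigma, R))
    return [[R[i][j] + 4.0 * nu * RSR[i][j] for j in range(2)] for i in range(2)]


def myopic_gains(A, R, R_nu, nu, mu):
    """K and T from (83)-(84), specialized to B = [0, 1]^T (so m = 1)."""
    scale = mu + R_nu[1][1]  # mu + B^T R_nu B is a scalar here
    row = R_nu[1]            # B^T R_nu is the second row of R_nu
    K = [(row[0] * A[0][j] + row[1] * A[1][j]) / scale for j in range(2)]
    T = [2.0 * nu * R[1][j] / scale for j in range(2)]
    return K, T


def closed_loop(A, K):
    """A - B K for B = [0, 1]^T: only the second row changes."""
    return [[A[0][0], A[0][1]], [A[1][0] - K[0], A[1][1] - K[1]]]


def is_schur_2x2(M):
    """Jury test for a real 2x2 matrix: |det| < 1 and |trace| < 1 + det."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return abs(det) < 1.0 and abs(tr) < 1.0 + det


A_check = [[0.8, 0.4], [0.0, -0.8]]
R = [[1.0, 0.0], [0.0, 1.0]]
Sigma_u = mixture_covariance([0.7, 0.3], [[0.0, 0.0], [2.0, 15.0]], [1.0, 10.0])
for nu in (0.0, 10.0):
    K, T = myopic_gains(A_check, R, risk_matrix(R, Sigma_u, nu), nu, mu=0.25)
    print(nu, K, T, is_schur_2x2(closed_loop(A_check, K)))
```

For $\nu=0$ the sketch returns the risk-neutral gain and a zero tail bias, consistent with the remark above.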

Figure 4: For different values of $\nu$, we show the rate-ratio (top) and bias-ratio (bottom) achieved by using the risk-aware controller relative to the risk-neutral controller for the stochastic linear system described in Illustration 3. ($\lambda$-ratio means rate-ratio. $b$-ratio means bias-ratio.) The bias-ratio plot indicates that the risk-aware controller exhibits a disturbance attenuation effect that becomes more pronounced as $\nu$ increases and saturates near $\nu=10$.

By substituting the stationary controller (82) into (79), we obtain the closed-loop system

$$
\begin{aligned}
x_{t+1} &= (\check{A}-\check{B}K_{\nu}^{\mu})x_{t}+(\check{w}_{t}-\check{B}T_{\nu}^{\mu}\gamma)\\
&\eqqcolon Ax_{t}+w_{t},\quad t\in\mathbb{N}_{0},
\end{aligned}
\tag{85}
$$

where, in contrast to the risk-neutral case ($\nu=0$), the risk-aware controller not only regulates the state (in a risk-aware sense through the inflated gain $K_{\nu}^{\mu}$, driving the state away from directions with high second-order variability, as captured by $R_{\nu}$), but also shifts the noise by a quantity proportional to its third-order moment behavior, as captured by the statistic $\gamma$. Essentially, the risk-aware controller optimally “de-biases” the skewed behavior of $\check{w}_{t}$ by simply shifting its mean. Because this shift is deterministic, the shifted noise $w_{t}$ has the same central statistics (in particular, $\gamma$, $\delta$, and covariance) as the original noise $\check{w}_{t}$.

Figure 2 shows the time trajectories and empirical CDFs of the state energy and the respective control effort (where applicable) corresponding to no controller ($u_{t}$ is zero), the risk-neutral controller for $\mu=0.25$, and the risk-aware controller for $\mu=0.25$ and $\nu=10$, for a Schur stable system of the form of (79) by choosing

$$\check{A}=\begin{bmatrix}0.8&0.4\\ 0&-0.8\end{bmatrix},\quad\check{B}=\begin{bmatrix}0\\ 1\end{bmatrix},\quad\text{and}\quad R=I_{2}, \tag{86}$$

and where $\check{w}_{t}$ follows a Gaussian mixture distribution

$$\check{w}_{t}\overset{\text{iid}}{\sim}0.7\times\mathcal{N}(0_{2},I_{2})+0.3\times\mathcal{N}([2\;\,15]^{\top},10I_{2}),\quad t\in\mathbb{N}_{0}. \tag{87}$$

In other words, $70\%$ of the time, the additive disturbance to the system is standard Gaussian, while the remaining $30\%$ of the time, the system experiences abrupt Gaussian shocks with large mean and variance in both state coordinates.
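A minimal sampler for this mixture noise is sketched below. The function names, sample size, and seed are illustrative assumptions. The sketch draws the mixture component first and then the Gaussian vector, and it reports the empirical mean together with the fraction of shock samples.

```python
import math
import random


def sample_mixture_noise(rng):
    """Draw one sample of the mixture in (87); also flag whether it is a shock."""
    if rng.random() < 0.7:
        # Standard Gaussian component N(0_2, I_2).
        return [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)], False
    # Shock component N([2, 15]^T, 10 * I_2): the standard deviation is sqrt(10).
    std = math.sqrt(10.0)
    return [rng.gauss(2.0, std), rng.gauss(15.0, std)], True


def empirical_summary(n=200_000, seed=1):
    """Return the empirical mean vector and the empirical shock frequency."""
    rng = random.Random(seed)
    total = [0.0, 0.0]
    shocks = 0
    for _ in range(n):
        w, is_shock = sample_mixture_noise(rng)
        total[0] += w[0]
        total[1] += w[1]
        shocks += is_shock
    return [total[0] / n, total[1] / n], shocks / n


mean, shock_rate = empirical_summary()
print("empirical mean:", mean, "shock frequency:", shock_rate)
```

The empirical mean should be close to $0.3\times[2\;\,15]^{\top}=[0.6\;\,4.5]^{\top}$, which shows that the nominal noise is far from zero-mean in the second coordinate.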

We observe that, for the same level of control regularization ($\mu=0.25$), the risk-aware controller offers a dramatic improvement on the state energy $|x_{t}|^{2}$ in terms of stabilizing its statistical variability (i.e., risky behavior) as compared to its risk-neutral counterpart, and it is particularly effective in mitigating the (more infrequent) shocks due to the highly dispersive noise $\check{w}_{t}$. Additionally, the control effort $|u_{t}|^{2}$ of the risk-aware controller is significantly smaller in magnitude, but less sparse and more persistent, than that of the risk-neutral controller. This is explained by the strategically designed affine form of the risk-aware controller, which provides increased degrees of freedom, allowing more effective state regulation.

Interestingly, the risk-aware controller $u_{t}^{\nu,\mu}$ exhibits disturbance attenuation behavior in the sense of the risk functional $\rho_{\nu}$. Indeed, such an effect can be readily observed through an empirical application of Theorem 3.7, which also reveals the usefulness of Theorem 3.7 for informing and evaluating different controller designs. To demonstrate this, in Figure 4 we report, for each value of $\nu$, the ratios of the biases $b_{\nu}$ (respectively, decay rates $\lambda_{\nu}$) achieved by using the risk-aware controller $u_{t}^{\nu,\mu}$ over the risk-neutral controller $u_{t}^{0,\mu}$. From Figure 4 (bottom), we observe that, as $\nu$ increases, the corresponding bias-ratio settles quickly at a value roughly equal to $0.6$. In the context of Theorem 3.7, this finding implies a reduction in the bias term $b_{\nu}$ of our stability bound of as much as $40\%$ (roughly achieved for $\nu=10$) compared to the bias achieved by the standard risk-neutral controller. Such a reduction demonstrates a drastic disturbance attenuation effect exhibited by the risk-aware controller $u_{t}^{\nu,\mu}$. Lastly, we observe a similar decreasing trend for the corresponding rate-ratio (see Figure 4 (top)). Hence, the risk-aware controller also improves (albeit slightly) the decay rate of the exponentially decreasing terms of the bound of Theorem 3.7.

5 Conclusion

By investigating a generalized risk-aware stability viewpoint, we have discovered conditions that guarantee new risk-aware noise-to-state stability properties for linear systems. In the case of any real-valued coherent risk functional on $\mathcal{L}^{q}$, the long-term risk of the state energy is on the same order as the supremum of the risk of the noise energy (23). In the case of a mean-conditional-variance functional, a similar relationship appears in (44), and additionally, the noise-dependent bias term $b_{\nu}$ can be attenuated by a simple risk-aware controller (Illustration 3). In the future, we plan to investigate extensions to nonlinear systems, including systems with incomplete or misspecified models using statistical learning techniques. We are particularly excited about extending the risk-aware stability theory developed here to enrich the analysis and design of stochastic gradient descent algorithms.

Appendix

The following lemma can be viewed as a corollary of the classical discrete-time Lyapunov stability theorem for deterministic linear systems [59, Th. 7.3.2].

Lemma 5.10.

Let $R\in\mathcal{S}_{n}^{+}$ and $A\in\mathbb{R}^{n\times n}$ be given. The existence of an $H\in\mathcal{S}_{n}^{+}$ such that $H_{R}-A^{\top}H_{R}A\in\mathcal{S}_{n}^{+}$ is equivalent to $A$ being Schur stable.

Proof 5.11.

We will show one direction. (The omitted direction uses $H_{R}\in\mathcal{S}_{n}^{+}$ and involves left-multiplying by $v^{*}$ and right-multiplying by $v$, where $v\in\mathbb{C}^{n}$ is an eigenvector of $A$.) Now, $A$ is Schur stable if and only if, for any $Q\in\mathcal{S}_{n}^{+}$, there is a unique $X\in\mathcal{S}_{n}^{+}$ such that $X-A^{\top}XA=Q$ [59, Th. 7.3.2]. Assume that $A$ is Schur stable, and let $Q\in\mathcal{S}_{n}^{+}$ be given. Since $R^{\frac{1}{2}}$ is nonsingular, consider $H\coloneqq(R^{-\frac{1}{2}})^{\top}XR^{-\frac{1}{2}}$. Since $R^{-\frac{1}{2}}$ is nonsingular and $X\in\mathcal{S}_{n}^{+}$, $H\in\mathcal{S}_{n}^{+}$ holds as well. Moreover, since $H_{R}=X$, we have shown the existence of an $H\in\mathcal{S}_{n}^{+}$ such that $H_{R}-A^{\top}H_{R}A\in\mathcal{S}_{n}^{+}$.

Lemma 5.12.

Let $R\in\mathcal{S}_{n}^{+}$ and $A\in\mathbb{R}^{n\times n}$ be given. Suppose that there is an $H\in\mathcal{S}_{n}^{+}$ such that $H_{R}-A^{\top}H_{R}A\in\mathcal{S}_{n}^{+}$. Then, $\eta\coloneqq\frac{\lambda_{\text{min}}(H_{R}-A^{\top}H_{R}A)}{\lambda_{\text{max}}(H_{R})}\in(0,1]$.

Proof 5.13.

The property $\eta\in(0,\infty)$ holds because $H_{R}-A^{\top}H_{R}A\in\mathcal{S}_{n}^{+}$ and $H_{R}\in\mathcal{S}_{n}^{+}$. If

$$\lambda_{\text{max}}(H_{R}-A^{\top}H_{R}A)\leq\lambda_{\text{max}}(H_{R}), \tag{88}$$

then $\eta\leq 1$ because $\lambda_{\text{min}}(H_{R}-A^{\top}H_{R}A)\leq\lambda_{\text{max}}(H_{R}-A^{\top}H_{R}A)$, so it suffices to show (88). Since $A^{\top}H_{R}A\in\mathcal{S}_{n}$, the inequality (88) follows from the eigenvalue monotonicity theorem [57, Cor. 4.3.12].

Lemma 5.14.

For any $M\in\mathcal{S}_{n}$, $y\in\mathbb{R}^{n}$, and $z\in\mathbb{R}^{n}$, and for every $\varepsilon\in(0,\infty)$, it is true that

$$(y+z)^{\top}M(y+z)\leq(1+\varepsilon)y^{\top}My+\Big(1+\frac{1}{\varepsilon}\Big)z^{\top}Mz. \tag{89}$$

Lemma 5.14 is standard, and so we omit the proof.
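Although the proof is omitted, the inequality (89) can be checked numerically. The Python sketch below samples vectors and values of $\varepsilon$ and compares the gap between the two sides of (89) with the quadratic form $(\sqrt{\varepsilon}\,y-z/\sqrt{\varepsilon})^{\top}M(\sqrt{\varepsilon}\,y-z/\sqrt{\varepsilon})$, which is the usual completing-the-square argument behind inequalities of this type. The matrix `M`, the sampling ranges, and the function names are assumptions made for illustration only.

```python
import math
import random

def quad(M, u, v):
    """Bilinear form u^T M v for small dense M."""
    return sum(u[i] * M[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

def gap(M, y, z, eps):
    """Right-hand side of (89) minus its left-hand side."""
    s = [a + b for a, b in zip(y, z)]
    rhs = (1 + eps) * quad(M, y, y) + (1 + 1 / eps) * quad(M, z, z)
    return rhs - quad(M, s, s)

def completed_square(M, y, z, eps):
    """Quadratic form of sqrt(eps)*y - z/sqrt(eps), nonnegative for PSD M."""
    r = math.sqrt(eps)
    u = [r * a - b / r for a, b in zip(y, z)]
    return quad(M, u, u)

# Hypothetical positive semidefinite test matrix (not from the paper).
M = [[2.0, 1.0], [1.0, 1.0]]

rng = random.Random(0)  # fixed seed for reproducibility
results = []
for _ in range(1000):
    y = [rng.uniform(-5.0, 5.0) for _ in range(2)]
    z = [rng.uniform(-5.0, 5.0) for _ in range(2)]
    eps = rng.uniform(0.01, 10.0)
    results.append((gap(M, y, z, eps), completed_square(M, y, z, eps)))

min_gap = min(g for g, _ in results)
max_mismatch = max(abs(g - c) for g, c in results)
print("smallest gap:", min_gap)
print("largest |gap - completed square|:", max_mismatch)
```

The gap is nonnegative in every sample, and it agrees with the completed square up to floating-point error, which is consistent with (89) holding for every $\varepsilon\in(0,\infty)$.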

Lemma 5.15.

Let $\mathcal{A}$ be the family of densities in the dual representation (9) of a real-valued coherent risk functional $\varrho$ on $\mathcal{L}^{q}$ with $q\in[1,\infty)$. Consider the linear system (19) described in the first paragraph of Section 3, and assume that $|w_{t}|^{2}\in\mathcal{L}^{q}$ for every $t\in\mathbb{N}_{0}$.

  1. For any $t\in\mathbb{N}_{0}$, $\xi\in\mathcal{A}$, and $M\in\mathcal{S}_{n}$, $x_{t}^{\top}Mx_{t}\in\mathcal{L}^{q}$ and $x_{t}^{\top}Mx_{t}\xi\in\mathcal{L}^{1}$, and these functions are a.e.-nonnegative.

  2. Given $H\in\mathcal{S}_{n}^{+}$ and $R\in\mathcal{S}_{n}^{+}$, define $\psi(z)\coloneqq z^{\top}Rz$ and $v(z)\coloneqq z^{\top}H_{R}z$ for every $z\in\mathbb{R}^{n}$. For every $\xi\in\mathcal{A}$ and $t\in\mathbb{N}_{0}$, the statement (29) holds.

Proof 5.16.

Part 1: $x_{t}^{\top}Mx_{t}$ is a sum of finitely many $(\mathcal{F},\mathcal{B}_{\mathbb{R}})$-measurable functions, and so it is $(\mathcal{F},\mathcal{B}_{\mathbb{R}})$-measurable. The property $|x_{t}|^{2}\in\mathcal{L}^{q}$ for every $t\in\mathbb{N}_{0}$ holds by induction, where we use $|w_{t}|^{2}\in\mathcal{L}^{q}$ and $x_{t+1}=Ax_{t}+w_{t}$ with $x_{0}$ being fixed. In addition, we use

$$0\leq|x_{t+1}(\omega)|^{2}\leq 2\lambda_{\text{max}}(A^{\top}A)|x_{t}(\omega)|^{2}+2|w_{t}(\omega)|^{2} \tag{90}$$

for every $\omega\in\Omega$ (see Lemma 5.14) and $y\mapsto y^{\gamma}$ being nondecreasing on $[0,\infty)$ for any $\gamma\in(0,\infty)$ to find

$$\big\|\,|x_{t+1}|^{2}\,\big\|_{q}\leq\big\|\,2\lambda_{\text{max}}(A^{\top}A)|x_{t}|^{2}+2|w_{t}|^{2}\,\big\|_{q}. \tag{91}$$

The properties $x_{t}^{\top}Mx_{t}\in\mathcal{L}^{q}$ and $x_{t}^{\top}Mx_{t}\geq 0$ follow from $0\leq x_{t}^{\top}Mx_{t}\leq\lambda_{\text{max}}(M)|x_{t}|^{2}$, $y\mapsto y^{\gamma}$ being nondecreasing on $[0,\infty)$ for any $\gamma\in(0,\infty)$, $|x_{t}|^{2}\in\mathcal{L}^{q}$, and $\big\|\lambda_{\text{max}}(M)|x_{t}|^{2}\big\|_{q}=\lambda_{\text{max}}(M)\big\||x_{t}|^{2}\big\|_{q}$.

Now, $\xi\in\mathcal{A}$ implies that $\xi\geq 0$ a.e. and $\xi\in\mathcal{L}^{q*}$. Hence, $x_{t}^{\top}Mx_{t}\xi\geq 0$ a.e. and $\|x_{t}^{\top}Mx_{t}\xi\|_{1}\leq\|x_{t}^{\top}Mx_{t}\|_{q}\,\|\xi\|_{q*}<\infty$ [58, Hölder’s Inequality, Th. 6.8 (a)].

Part 2: Since $H\in\mathcal{S}_{n}^{+}$,

$$0\leq\lambda_{\text{min}}(H)|z|^{2}\leq z^{\top}Hz\leq\lambda_{\text{max}}(H)|z|^{2},\quad z\in\mathbb{R}^{n}. \tag{92}$$

Given $y\in\mathbb{R}^{n}$, consider $z=R^{\frac{1}{2}}y$ in (92) to find

$$0\leq\lambda_{\text{min}}(H)y^{\top}Ry\leq y^{\top}H_{R}y\leq\lambda_{\text{max}}(H)y^{\top}Ry, \tag{93}$$

where we use $R=(R^{\frac{1}{2}})^{\top}R^{\frac{1}{2}}$ and $H_{R}=(R^{\frac{1}{2}})^{\top}HR^{\frac{1}{2}}$. We substitute $\psi(y)=y^{\top}Ry$ and $v(y)=y^{\top}H_{R}y$ to find

$$0\leq\lambda_{\text{min}}(H)\psi(y)\leq v(y)\leq\lambda_{\text{max}}(H)\psi(y). \tag{94}$$

Now, let $\xi\in\mathcal{A}$ and $t\in\mathbb{N}_{0}$ be given. Since (94) holds for every $y\in\mathbb{R}^{n}$ and since $\xi\geq 0$ a.e.,

$$0\leq\lambda_{\text{min}}(H)\psi(x_{t})\xi\leq v(x_{t})\xi\leq\lambda_{\text{max}}(H)\psi(x_{t})\xi\quad\text{a.e.} \tag{95}$$

The statement (29) follows from (95) and basic integration properties [52, Th. 1.5.9, see also p. 47].

Lemma 5.17.

Suppose that $\lambda\in(0,1)$, $\nu\in[0,\infty)$, and $(s_{t})_{t\in\mathbb{N}_{0}}$ is a sequence in $[0,\infty)$ such that $s_{t+1}\leq\lambda s_{t}+\nu$ for every $t\in\mathbb{N}_{0}$. Then, $0\leq s_{t}\leq\lambda^{t}s_{0}+\frac{\nu}{1-\lambda}$ for every $t\in\mathbb{N}$.

Proof 5.18.

By induction, $0\leq s_{t}\leq\lambda^{t}s_{0}+\nu\sum_{i=0}^{t-1}\lambda^{i}$ for every $t\in\mathbb{N}$. Since $\lambda\in(0,1)$ and $\nu\in[0,\infty)$, we use the geometric series formula to find that $\nu\sum_{i=0}^{t-1}\lambda^{i}\leq\frac{\nu}{1-\lambda}$.
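A minimal Python sketch of Lemma 5.17 follows. It iterates the worst case of the recursion, in which the inequality holds with equality, and compares each term with the bound from the lemma. The parameter values and function names are illustrative assumptions.

```python
def worst_case_sequence(lam, nu, s0, horizon):
    """Iterate s_{t+1} = lam * s_t + nu, the extreme case of the recursion."""
    seq = [s0]
    for _ in range(horizon):
        seq.append(lam * seq[-1] + nu)
    return seq

def lemma_bound(lam, nu, s0, t):
    """Upper bound lam^t * s0 + nu / (1 - lam) from Lemma 5.17."""
    return lam ** t * s0 + nu / (1 - lam)

# Hypothetical parameters: lam in (0, 1), nu >= 0, s0 >= 0.
lam, nu, s0 = 0.9, 0.5, 20.0
seq = worst_case_sequence(lam, nu, s0, horizon=200)

bound_holds = all(
    0.0 <= seq[t] <= lemma_bound(lam, nu, s0, t) + 1e-12
    for t in range(len(seq))
)
print("bound holds for all t:", bound_holds)
print("s_200:", seq[-1], " nu / (1 - lam):", nu / (1 - lam))
```

The transient term $\lambda^{t}s_{0}$ decays geometrically, so the sequence settles near $\nu/(1-\lambda)$, which is the noise-dependent offset in the bound.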

Lemma 5.19.

Let $R\in\mathcal{S}_{n}^{+}$ and $A\in\mathbb{R}^{n\times n}$ be given. Suppose that there is an $H\in\mathcal{S}_{n}^{+}$ such that $H_{R}-A^{\top}H_{R}A\in\mathcal{S}_{n}$. Recall that $\eta\coloneqq\lambda_{\text{min}}(H_{R}-A^{\top}H_{R}A)/\lambda_{\text{max}}(H_{R})$. Then, $\eta\in[0,1]$ and $0\leq z^{\top}A^{\top}H_{R}Az\leq(1-\eta)z^{\top}H_{R}z$ for every $z\in\mathbb{R}^{n}$.

Proof 5.20.

Since $H_{R}-A^{\top}H_{R}A\in\mathcal{S}_{n}$ and $A^{\top}H_{R}A\in\mathcal{S}_{n}$,

$$0\leq\lambda_{\text{min}}(H_{R}-A^{\top}H_{R}A)\leq\lambda_{\text{max}}(H_{R}) \tag{96}$$

by applying the eigenvalue monotonicity theorem [57, Cor. 4.3.12]. Since $H_{R}\in\mathcal{S}_{n}^{+}$, we divide (96) by $\lambda_{\text{max}}(H_{R})\in(0,\infty)$ to find $0\leq\eta\leq 1$. Now, let $z\in\mathbb{R}^{n}$ be given. We multiply the inequalities $0\leq z^{\top}H_{R}z\leq\lambda_{\text{max}}(H_{R})|z|^{2}$ by $\eta$ to find

$$0\leq\eta z^{\top}H_{R}z\leq\lambda_{\text{min}}(H_{R}-A^{\top}H_{R}A)|z|^{2}. \tag{97}$$

We use (97) and $H_{R}-A^{\top}H_{R}A\in\mathcal{S}_{n}$ to derive

$$\eta z^{\top}H_{R}z\leq z^{\top}(H_{R}-A^{\top}H_{R}A)z. \tag{98}$$

Then, we use $A^{\top}H_{R}A\in\mathcal{S}_{n}$, and we add and subtract $z^{\top}H_{R}z$ to find

$$\begin{aligned}0\leq z^{\top}A^{\top}H_{R}Az&=-z^{\top}(H_{R}-A^{\top}H_{R}A)z+z^{\top}H_{R}z\\&\leq-\eta z^{\top}H_{R}z+z^{\top}H_{R}z.\end{aligned} \tag{99}$$

We have applied (98) in the last line to complete the proof.
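The contraction factor $1-\eta$ can be computed directly for a small example. In the Python sketch below, the matrix `G` stands in for $H_{R}$, and both `A` and `G` are hypothetical values chosen so that $G-A^{\top}GA$ is positive definite; eigenvalues of symmetric $2\times 2$ matrices are obtained in closed form.

```python
import math
import random

def matmul(P, Q):
    """Product of two small dense matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(col) for col in zip(*P)]

def sym_eigs_2x2(S):
    """Return (lambda_min, lambda_max) of a symmetric 2x2 matrix."""
    a, b, c = S[0][0], S[0][1], S[1][1]
    mid = (a + c) / 2.0
    rad = math.hypot((a - c) / 2.0, b)
    return mid - rad, mid + rad

def quad(S, z):
    """Quadratic form z^T S z."""
    return sum(z[i] * S[i][j] * z[j] for i in range(2) for j in range(2))

# Hypothetical data: G plays the role of H_R (not taken from the paper).
A = [[0.5, 0.4], [0.0, 0.3]]
G = [[2.0, 0.5], [0.5, 1.0]]

AtGA = matmul(matmul(transpose(A), G), A)
D = [[G[i][j] - AtGA[i][j] for j in range(2)] for i in range(2)]

# eta := lambda_min(H_R - A^T H_R A) / lambda_max(H_R)
eta = sym_eigs_2x2(D)[0] / sym_eigs_2x2(G)[1]

# Check z^T A^T H_R A z <= (1 - eta) z^T H_R z on random samples.
rng = random.Random(1)
worst_violation = max(
    quad(AtGA, z) - (1.0 - eta) * quad(G, z)
    for z in ([rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]
              for _ in range(1000))
)
print("eta:", eta)
print("largest value of lhs - rhs over samples:", worst_violation)
```

A nonpositive value in the last line, up to rounding, is what Lemma 5.19 predicts; the sampling is only a sanity check and does not replace the proof.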

Lemma 5.21.

Let the assumptions of Theorem 3.3 hold, and suppose that $\eta<1$. Then, $E(v(x_{t+1})\xi)\leq\lambda E(v(x_{t})\xi)+\frac{\lambda}{\lambda-(1-\eta)}b^{\prime}$ for every $\xi\in\mathcal{A}$ and $t\in\mathbb{N}_{0}$.

Proof 5.22.

Let $\xi\in\mathcal{A}$ and $t\in\mathbb{N}_{0}$ be given, and recall that $\xi\geq 0$ almost everywhere. We use $v(x_{t+1})=x_{t+1}^{\top}H_{R}x_{t+1}$, $H_{R}\in\mathcal{S}_{n}^{+}$, $x_{t+1}=Ax_{t}+w_{t}$, and Lemma 5.14 to derive

$$v(x_{t+1})\leq(1+\varepsilon)x_{t}^{\top}A^{\top}H_{R}Ax_{t}+\bigg(1+\frac{1}{\varepsilon}\bigg)w_{t}^{\top}H_{R}w_{t} \tag{100}$$

for any $\varepsilon\in(0,\infty)$. Multiplying (100) by $\xi$, taking expectations, and using that $E(w_{t}^{\top}H_{R}w_{t}\xi)\leq b^{\prime}$ from (21) lead to

$$E(v(x_{t+1})\xi)\leq(1+\varepsilon)E(x_{t}^{\top}A^{\top}H_{R}Ax_{t}\xi)+\bigg(1+\frac{1}{\varepsilon}\bigg)b^{\prime}. \tag{101}$$

Lemma 5.14 helps circumvent the issue that the cross term $E(x_{t}^{\top}A^{\top}H_{R}w_{t}\xi)$ need not be zero. Due to (101) and $\lambda=1-\kappa\eta$ (20) for any fixed $\kappa\in(0,1)$, it suffices to show that $E(x_{t}^{\top}A^{\top}H_{R}Ax_{t}\xi)\leq(1-\eta)E(v(x_{t})\xi)$, which readily follows from Lemma 5.19 and $\xi$ being nonnegative almost everywhere. To see this, note that

$$(1+\varepsilon)(1-\eta)<1\iff\varepsilon<\frac{\eta}{1-\eta}. \tag{102}$$

By choosing $\varepsilon=(1-\kappa)\frac{\eta}{1-\eta}$, we obtain

$$\lambda=\bigg(1+(1-\kappa)\frac{\eta}{1-\eta}\bigg)(1-\eta)=1-\kappa\eta. \tag{103}$$

Using the same value for $\varepsilon$ produces the quantity multiplying $b^{\prime}$ in the statement of Lemma 5.21, since $1+\frac{1}{\varepsilon}=\frac{\lambda}{\lambda-(1-\eta)}$.
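The choice of $\varepsilon$ in (103) and the resulting coefficient of $b^{\prime}$ can be verified with exact rational arithmetic. The Python sketch below does so for one hypothetical pair $(\eta,\kappa)$; the function name and the numerical values are assumptions made for illustration.

```python
from fractions import Fraction

def contraction_constants(eta, kappa):
    """Return (epsilon, lambda, noise_gain) for eta, kappa in (0, 1).

    epsilon    = (1 - kappa) * eta / (1 - eta)
    lambda     = (1 + epsilon) * (1 - eta)
    noise_gain = 1 + 1 / epsilon  (the factor multiplying b' in (101))
    """
    eps = (1 - kappa) * eta / (1 - eta)
    lam = (1 + eps) * (1 - eta)
    noise_gain = 1 + 1 / eps
    return eps, lam, noise_gain

# Hypothetical values of eta and kappa (not from the paper).
eta, kappa = Fraction(1, 4), Fraction(1, 2)
eps, lam, noise_gain = contraction_constants(eta, kappa)

print(eps, lam, noise_gain)                      # 1/6 7/8 7
print(lam == 1 - kappa * eta)                    # True
print(noise_gain == lam / (lam - (1 - eta)))     # True
```

With $\eta=1/4$ and $\kappa=1/2$, the contraction factor is $\lambda=7/8$ and the noise term is scaled by $7$, which illustrates the trade-off controlled by $\kappa$: values of $\kappa$ closer to $1$ give a smaller $\lambda$ but a larger coefficient on $b^{\prime}$.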

Lemma 5.23.

Consider the linear system (19), and let $R\in\mathcal{S}_{n}^{+}$ and $\nu\in[0,\infty)$ be given. Consider the quadratic state energy function $\psi(x)\coloneqq x^{\top}Rx$. Under the assumptions of Theorem 3.7, for every $t\in\mathbb{N}$, the following equality holds:

$$\rho_{\nu}(\psi(x_{t}))=E(x_{t}^{\top}(R+4\nu R\Sigma_{t}R)x_{t}+4\nu x_{t}^{\top}R\gamma_{t})+\mathbb{C}_{t}, \tag{104}$$

where $\mathbb{C}_{t}\coloneqq\nu\delta_{t}-4\nu\,\text{tr}((\Sigma_{t}R)^{2})$, and $\rho_{\nu}(\psi(x_{t}))$ is finite.

Proof 5.24.

The proof is inspired by the proof of [44, Prop. 1] with additional steps to circumvent expressions of the form $\infty-\infty$. Let $t\in\mathbb{N}$ be given. The assumptions of Theorem 3.7 imply that $E(w_{t-1})=\bar{w}_{t-1}\in\mathbb{R}^{n}$, $E(d_{t}d_{t}^{\top})=\Sigma_{t}\in\mathcal{S}_{n}$, $\gamma_{t}\in\mathbb{R}^{n}$, and $\delta_{t}\in[0,\infty)$. (We use standard measure-theoretic principles, including Hölder’s inequality [52, Th. 2.4.5] and Minkowski’s inequality [52, Th. 2.4.7]. We do not use $\sup_{t\in\mathbb{N}_{0}}|\bar{{\bf w}}_{t}|^{2}<\infty$, $\sup_{t\in\mathbb{N}}|\gamma_{t}|^{2}<\infty$, or $\sup_{t\in\mathbb{N}}\delta_{t}<\infty$.) The conditional expectation $\hat{\chi}_{t}\coloneqq E(x_{t}|\mathcal{F}_{t-1})$ takes values in $\bar{\mathbb{R}}^{n}$ and is $(\mathcal{F}_{t-1},\mathcal{B}_{\bar{\mathbb{R}}^{n}})$-measurable [52, Th. 6.4.3, Th. 1.5.8]. One should be cautious about using $\hat{\chi}_{t}^{\top}R\hat{\chi}_{t}$ because it may take the ill-defined form of $\infty-\infty$. Hence, we will alter $\hat{\chi}_{t}$ on a set of measure zero. Since $E(x_{t})\in\mathbb{R}^{n}$, $\hat{\chi}_{t}$ is $\mathbb{R}^{n}$-valued almost everywhere [52, Th. 6.5.4 (a), Th. 1.6.6 (a)], and thus, the set $B_{t}\coloneqq\{\omega\in\Omega:\hat{\chi}_{t}(\omega)\in\mathbb{R}^{n}\}$ satisfies $P(B_{t})=1$. Moreover, $B_{t}\in\mathcal{F}_{t-1}$ because $\hat{\chi}_{t}$ is $(\mathcal{F}_{t-1},\mathcal{B}_{\bar{\mathbb{R}}^{n}})$-measurable. We define $\hat{z}_{t}\coloneqq\mathcal{I}_{B_{t}}\hat{\chi}_{t}$. Hence, $\hat{z}_{t}$ is $(\mathcal{F}_{t-1},\mathcal{B}_{\bar{\mathbb{R}}^{n}})$-measurable, $\hat{z}_{t}(\omega)\in\mathbb{R}^{n}$ for every $\omega\in\Omega$, and $\hat{z}_{t}=\hat{\chi}_{t}$ almost everywhere. We use properties of conditional expectations [52, Ch. 6.5], $(\mathcal{F}_{t-1},\mathcal{B}_{\mathbb{R}^{n}})$-measurability of $x_{t-1}$, $x_{t}=Ax_{t-1}+w_{t-1}$, and independence of $w_{t-1}$ and $h_{t-1}$ to find that $\hat{z}_{t}=x_{t}-d_{t}$ almost everywhere. Moreover, we derive that $E(x_{t}^{\top}Rx_{t}|\mathcal{F}_{t-1})=\hat{z}_{t}^{\top}R\hat{z}_{t}+\text{tr}(\Sigma_{t}R)$ almost everywhere. Next, we apply the previous two statements, the definition of $\Delta_{t}$ (14) with $Z=x_{t}^{\top}Rx_{t}$ and $\mathcal{F}_{i}=\mathcal{F}_{t-1}$, and $\hat{z}_{t}^{\top}R\hat{z}_{t}$ being finite to find that $\Delta_{t}=\tilde{\Delta}_{t}$ a.e., where $\tilde{\Delta}_{t}$ is defined by

$$\tilde{\Delta}_{t}\coloneqq\text{tr}(\Sigma_{t}R)-2\hat{z}_{t}^{\top}Rd_{t}-d_{t}^{\top}Rd_{t}. \tag{105}$$

In addition, all terms of $\tilde{\Delta}_{t}^{2}$ are integrable. (In particular, note that $\hat{z}_{t}$ and $d_{t}$ are independent and $E(d_{t})=0_{n}$.) Then, we use $E(\tilde{\Delta}_{t}^{2})=E(\Delta_{t}^{2})$ and $\hat{z}_{t}=x_{t}-d_{t}$ a.e. to find

$$E(\Delta_{t}^{2})=4E(x_{t}^{\top}R\Sigma_{t}Rx_{t})+4E(x_{t}^{\top}R\gamma_{t})+\delta_{t}-4\,\text{tr}((\Sigma_{t}R)^{2}), \tag{106}$$

which is finite. Finally, using $\rho_{\nu}(x_{t}^{\top}Rx_{t})=E(x_{t}^{\top}Rx_{t})+\nu E(\Delta_{t}^{2})$ completes the proof.

Acknowledgement

M.P.C. would like to gratefully acknowledge Erick Mejia Uzeda for discussions.

References

  • [1] J. E. Bertram and P. E. Sarachik, “Stability of circuits with randomly time-varying parameters,” IRE Transactions on Circuit Theory, vol. 6, pp. 260–270, 1959.
  • [2] I. Y. Kats and N. N. Krasovskii, “On the stability of systems with random attributes,” Journal of Applied Mathematics and Mechanics, vol. 24, pp. 1225–1246, 1960, (translated).
  • [3] N. N. Krasovskii and E. A. Lidskii, “Analytical design of controllers in systems with random attributes I, II, III,” Automation and Remote Control, vol. 22, no. 9, pp. 1021–1025, no. 10, pp. 1141–1146, no. 11, pp. 1289–1294, 1961, (translated).
  • [4] H. Min, S. Xu, and Z. Zhang, “Adaptive finite-time stabilization of stochastic nonlinear systems subject to full-state constraints and input saturation,” IEEE Transactions on Automatic Control, vol. 66, no. 3, pp. 1306–1313, 2020.
  • [5] Y. Qin, M. Cao, and B. D. Anderson, “Lyapunov criterion for stochastic systems and its applications in distributed computation,” IEEE Transactions on Automatic Control, vol. 65, no. 2, pp. 546–560, 2020.
  • [6] L. Yao and W. Zhang, “New noise-to-state stability and instability criteria for random nonlinear systems,” International Journal of Robust and Nonlinear Control, vol. 30, no. 2, pp. 526–537, 2020.
  • [7] P. Wang, W. Guo, and H. Su, “Improved input-to-state stability analysis of impulsive stochastic systems,” IEEE Transactions on Automatic Control, vol. 67, no. 5, pp. 2161–2174, 2021.
  • [8] W. Li and M. Krstić, “Stochastic nonlinear prescribed-time stabilization and inverse optimality,” IEEE Transactions on Automatic Control, vol. 67, no. 3, pp. 1179–1193, 2021.
  • [9] R.-H. Cui and X.-J. Xie, “Finite-time stabilization of output-constrained stochastic high-order nonlinear systems with high-order and low-order nonlinearities,” Automatica, vol. 136, p. 110085, 2022.
  • [10] D. Kannan and V. Lakshmikantham, Eds., Handbook of Stochastic Analysis and Applications. New York, NY, USA: Marcel Dekker, Inc., 2002.
  • [11] K. Reif, S. Günther, E. Yaz, and R. Unbehauen, “Stochastic stability of the discrete-time extended Kalman filter,” IEEE Transactions on Automatic Control, vol. 44, no. 4, pp. 714–728, 1999.
  • [12] X. Dong, G. Battistelli, L. Chisci, and Y. Cai, “A variational Bayes moving horizon estimation adaptive filter with guaranteed stability,” Automatica, vol. 142, p. 110374, 2022.
  • [13] H. J. Kushner and G. G. Yin, Stochastic Approximation and Recursive Algorithms and Applications, 2nd ed. New York, NY, USA: Springer-Verlag, 1997, vol. 35.
  • [14] U. Köse and A. Ruszczyński, “Risk-averse learning by temporal difference methods with Markov risk measures,” Journal of Machine Learning Research, vol. 22, pp. 1–34, 2021.
  • [15] H. J. Kushner, Stochastic Stability and Control. New York, NY, USA: Academic Press, Inc., 1967.
  • [16] E. D. Sontag, “Smooth stabilization implies coprime factorization,” IEEE Transactions on Automatic Control, vol. 34, no. 4, pp. 435–443, 1989.
  • [17] M. Krstić and H. Deng, Stabilization of Nonlinear Uncertain Systems. London, U.K.: Springer-Verlag, 1998.
  • [18] J. Tsinias, “The concept of ‘exponential input to state stability’ for stochastic systems and applications to feedback stabilization,” Systems & Control Letters, vol. 36, no. 3, pp. 221–229, 1999.
  • [19] H. Deng, M. Krstić, and R. J. Williams, “Stabilization of stochastic nonlinear systems driven by noise of unknown covariance,” IEEE Transactions on Automatic Control, vol. 46, no. 8, pp. 1237–1253, 2001.
  • [20] L. Huang and X. Mao, “On input-to-state stability of stochastic retarded systems with Markovian switching,” IEEE Transactions on Automatic Control, vol. 54, no. 8, pp. 1898–1902, 2009.
  • [21] P. Zhao, W. Feng, and Y. Kang, “Stochastic input-to-state stability of switched stochastic nonlinear systems,” Automatica, vol. 48, no. 10, pp. 2569–2576, 2012.
  • [22] Z. Wu, “Stability criteria of random nonlinear systems and their applications,” IEEE Transactions on Automatic Control, vol. 60, no. 4, pp. 1038–1049, 2014.
  • [23] A. Shapiro, D. Dentcheva, and A. Ruszczyński, Lectures on Stochastic Programming: Modeling and Theory. Philadelphia, PA, USA: MPS-SIAM, 2009.
  • [24] P. Whittle, “Risk-sensitive linear/quadratic/Gaussian control,” Advances in Applied Probability, vol. 13, pp. 764–777, 1981.
  • [25] ——, Risk-sensitive Optimal Control. Hoboken, NJ, USA: John Wiley & Sons, Inc., 1990.
  • [26] ——, “A risk-sensitive maximum principle: The case of imperfect state observation,” IEEE Transactions on Automatic Control, vol. 36, no. 7, pp. 793–801, 1991.
  • [27] R. A. Howard and J. E. Matheson, “Risk-sensitive Markov decision processes,” Management Science, vol. 18, no. 7, pp. 356–369, 1972.
  • [28] D. Jacobson, “Optimal stochastic linear systems with exponential performance criteria and their relation to deterministic differential games,” IEEE Transactions on Automatic Control, vol. 18, no. 2, pp. 124–131, 1973.
  • [29] J. von Neumann and O. Morgenstern, Theory of Games and Economic Behavior. Princeton, NJ, USA: Princeton University Press, 1944.
  • [30] P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath, “Coherent measures of risk,” Mathematical Finance, vol. 9, no. 3, pp. 203–228, 1999.
  • [31] R. T. Rockafellar and S. Uryasev, “Conditional value-at-risk for general loss distributions,” Journal of Banking & Finance, vol. 26, no. 7, pp. 1443–1471, 2002.
  • [32] A. Ruszczyński, “Risk-averse dynamic programming for Markov decision processes,” Mathematical Programming, vol. 125, no. 2, pp. 235–261, 2010.
  • [33] J. Moon and T. Başar, “Linear quadratic risk-sensitive and robust mean field games,” IEEE Transactions on Automatic Control, vol. 62, no. 3, pp. 1062–1077, 2016.
  • [34] N. Saldi, T. Başar, and M. Raginsky, “Approximate Markov-Nash equilibria for discrete-time risk-sensitive mean-field games,” Mathematics of Operations Research, vol. 45, no. 4, pp. 1596–1620, 2020.
  • [35] S. Singh, Y. Chow, A. Majumdar, and M. Pavone, “A framework for time-consistent, risk-sensitive model predictive control: Theory and algorithms,” IEEE Transactions on Automatic Control, vol. 64, no. 7, pp. 2905–2912, 2018.
  • [36] P. Sopasakis, D. Herceg, A. Bemporad, and P. Patrinos, “Risk-averse model predictive control,” Automatica, vol. 100, pp. 281–288, 2019.
  • [37] L. Lindemann, G. J. Pappas, and D. V. Dimarogonas, “Reactive and risk-aware control for signal temporal logic,” IEEE Transactions on Automatic Control, vol. 67, no. 10, pp. 5262–5277, 2022.
  • [38] S. Safaoui, L. Lindemann, I. Shames, and T. H. Summers, “Risk-bounded temporal logic control of continuous-time stochastic systems,” in 2022 American Control Conference (ACC), 2022, pp. 1555–1562.
  • [39] M. Ahmadi, X. Xiong, and A. D. Ames, “Risk-averse control via CVaR barrier functions: Application to bipedal robot locomotion,” IEEE Control Systems Letters, vol. 6, pp. 878–883, 2021.
  • [40] M. P. Chapman, R. Bonalli, K. M. Smith, I. Yang, M. Pavone, and C. J. Tomlin, “Risk-sensitive safety analysis using conditional value-at-risk,” IEEE Transactions on Automatic Control, to appear, doi: 10.1109/TAC.2021.3131149.
  • [41] M. P. Chapman, M. Fauß, and K. M. Smith, “On optimizing the conditional value-at-risk of a maximum cost for risk-averse safety analysis,” IEEE Transactions on Automatic Control, to appear, doi: 10.1109/TAC.2022.3195381.
  • [42] Y. Wang and M. P. Chapman, “Risk-averse autonomous systems: A brief history and recent developments from the perspective of optimal control,” Artificial Intelligence, p. 103743, 2022.
  • [43] M. Kishida, “Risk-aware stability, ultimate boundedness, and positive invariance,” arXiv preprint arXiv:2204.07329, 2022.
  • [44] A. Tsiamis, D. S. Kalogerias, L. F. Chamon, A. Ribeiro, and G. J. Pappas, “Risk-constrained linear-quadratic regulators,” in 2020 59th IEEE Conference on Decision and Control (CDC). IEEE, 2020, pp. 3040–3047.
  • [45] K. Glover and J. C. Doyle, “State-space formulae for all stabilizing controllers that satisfy an $\mathcal{H}_{\infty}$-norm bound and relations to risk sensitivity,” Systems & Control Letters, vol. 11, no. 3, pp. 167–172, 1988.
  • [46] M. R. James and J. S. Baras, “Robust $\mathcal{H}_{\infty}$ output feedback control for nonlinear systems,” IEEE Transactions on Automatic Control, vol. 40, no. 6, pp. 1007–1017, 1995.
  • [47] M. R. James, J. S. Baras, and R. J. Elliott, “Risk-sensitive control and dynamic games for partially observed discrete-time nonlinear systems,” IEEE Transactions on Automatic Control, vol. 39, no. 4, pp. 780–792, 1994.
  • [48] Z. Pan and T. Başar, “Backstepping controller design for nonlinear stochastic systems under a risk-sensitive cost criterion,” SIAM Journal on Control and Optimization, vol. 37, no. 3, pp. 957–995, 1999.
  • [49] P. Dupuis, M. R. James, and I. Petersen, “Robust properties of risk-sensitive control,” Mathematics of Control, Signals and Systems, vol. 13, no. 4, pp. 318–332, 2000.
  • [50] D. Bernardini and A. Bemporad, “Stabilizing model predictive control of stochastic constrained linear systems,” IEEE Transactions on Automatic Control, vol. 57, no. 6, pp. 1468–1480, 2012.
  • [51] T. Morozan, “Stabilization of some stochastic discrete-time control systems,” Stochastic Analysis and Applications, vol. 1, no. 1, pp. 89–116, 1983.
  • [52] R. Ash, Real Analysis and Probability. New York, NY, USA: Academic Press, Inc., 1972.
  • [53] S. Sastry, Nonlinear Systems: Analysis, Stability, and Control. New York, NY, USA: Springer Science+Business Media, 1999.
  • [54] A. W. van der Vaart, Asymptotic Statistics. Cambridge, U.K.: Cambridge University Press, 1998.
  • [55] A. Shapiro, “Minimax and risk averse multistage stochastic programming,” European Journal of Operational Research, vol. 219, no. 3, pp. 719–726, 2012.
  • [56] S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability. Springer-Verlag, 1993, accessed on July 19, 2022. [Online]. Available: http://probability.ca/MT/BOOK.pdf
  • [57] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd ed. Cambridge University Press, 2012.
  • [58] G. B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed. New York, NY, USA: John Wiley & Sons, Inc., 1999.
  • [59] B. N. Datta, Numerical Methods for Linear Control Systems Design and Analysis. San Diego, CA, USA: Elsevier Academic Press, 2004.