Stochastic receding horizon control with output feedback and bounded control inputs

Peter Hokayem , Eugenio Cinquemani , Debasish Chatterjee , Federico Ramponi and John Lygeros

Abstract.

We provide a solution to the problem of receding horizon control for stochastic discrete-time systems with bounded control inputs and imperfect state measurements. For a suitable choice of control policies, we show that the finite-horizon optimization problem to be solved on-line is convex and successively feasible. Due to the inherent nonlinearity of the feedback loop, a slight extension of the Kalman filter is exploited to estimate the state optimally in mean-square sense. We show that the receding horizon implementation of the resulting control policies renders the state of the overall system mean-square bounded under mild assumptions. Finally, we discuss how some of the quantities required by the finite-horizon optimization problem can be computed off-line, reducing the on-line computation, and present some numerical examples.

Key words and phrases:

Predictive control, output feedback, constrained control, state estimation, stochastic control

This research was partially supported by the Swiss National Science Foundation under grant 200021-122072 and by the European Commission under the project Feednetback FP7-ICT-223866 (www.feednetback.eu). This article was not presented at any IFAC meeting.

P. Hokayem, D. Chatterjee, F. Ramponi, and J. Lygeros are with the Automatic Control Laboratory, ETH Zürich, Physikstrasse 3, 8092 Zürich, Switzerland. E. Cinquemani is with the INRIA Grenoble-Rhône-Alpes. Corresponding author P. Hokayem Tel. +41 44 632 0403. Fax: +41 44 632 1211.

Email: {hokayem,chatterjee,ramponif,lygeros}@control.ee.ethz.ch,
Eugenio.Cinquemani@inria.fr

1. Introduction

A considerable amount of research effort has been devoted to deterministic receding horizon control, see, for example, [Mayne et al., 2000; Bemporad and Morari, 1999; Lazar et al., 2007; Maciejowski, 2001; Qin and Badgwell, 2003] and references therein. Currently we have readily available conclusive proofs of successive feasibility and stability of receding horizon control laws in the noise-free deterministic setting. These techniques can be readily extended to the robust case, i.e., whenever there is an additive noise of bounded nature entering the system. The counterpart for stochastic systems subject to process noise and imperfect state measurements and bounded control inputs, however, is still missing. The principal obstacle is posed by the fact that it may not be possible to determine an a priori bound on the support of the noise, e.g., whenever the noise is additive and Gaussian. This extra ingredient complicates both the stability and feasibility proofs manifold: the noise eventually drives the state outside any bounded set no matter how large the latter is taken to be, and employing any standard linear state feedback means that any hard bounds on the control inputs will eventually be violated.

In this article we propose a solution to the general receding horizon control problem for linear systems with noisy process dynamics, imperfect state information, and bounded control inputs. Both the process and measurement noise sequences are assumed to enter the system in an additive fashion, and we require that the designed control policies satisfy hard bounds. Periodically at times $t=0,N_{c},2N_{c},\cdots$ , where $N_{c}$ is the control horizon, a certain finite-horizon optimal control problem is solved for a prediction horizon $N$ . The underlying cost is the standard expectation of the sum of cost-per-state functions that are quadratic in the state and control inputs [Bertsekas, 2005]. We also impose some variance-like bounds on the predicted future states and inputs—this is one possible way to impose soft state-constraints that are in spirit similar to integrated chance-constraints, e.g., in [Klein Haneveld, 1983; Klein Haneveld and van der Vlerk, 2006].

There are several key challenges that are inherent to our setup. First, since state-information is incomplete and corrupt, the need for a filter to estimate the state from the corrupt state measurements naturally presents itself. Second, it is clear that in the presence of unbounded (e.g., Gaussian) noise, it is not in general possible to ensure any bound on the control values with linear state feedback alone (assuming complete state information is available)—the additive nature of the noise ensures that the states exit from any fixed bounded set at some time almost surely. We see at once that nonlinear feedback is essential, and this issue is further complicated by the fact that only incomplete and imperfect state-information is available. Third, it is unclear whether applicatino of the control policies stabilizes the system in any reasonable sense.

In the backdrop of these challenges, let us mention the main contributions of this article. We

(i)

provide a tractable method for designing, in a receding horizon fashion, bounded causal nonlinear feedback policies, and
(ii)

guarantee that applying the designed control policies in a receding horizon fashion renders the state of the closed-loop system mean-square bounded for any initial state statistics.

We elaborate on the two points above:

(i)

Tractability: Given a subclass of causal policies, we demonstrate that the underlying optimal control problem translates to a convex optimization problem to be solved every $N_{c}$ time steps, that this optimization problem is feasible for any statistics of the initial state, and equivalent to a quadratic program. The subclass of polices we employ is comprised of open-loop terms and nonlinear feedback from the innovations. Under the assumption that the process and measurement noise is Gaussian, (even though the bounded inputs requirement makes the problem inherently nonlinear and it may appear that standard Kalman filtering may not apply,) it turns out that Kalman filtering techniques can indeed be utilized. We provide a low-complexity algorithm (essentially similar to standard Kalman filtering) for updating this conditional density, and report tractable solutions for the off-line computation of the time-dependent optimization parameters.
(ii)

Stability: It is well known that for noise-free discrete-time linear systems it is not possible to globally asymptotically stabilize the system, if the usual matrix $A$ has unstable eigenvalues, see, for example, [Yang et al., 1997] and references therein. Moreover, in the presence of stochastic process noise the hope for achieving asymptotic stability is obviously not realistic. Therefore, we naturally relax the notion of stability into mean-square boundedness of the state and impose the extra conditions that the system matrix $A$ is Lyapunov (or neutrally) stable and that the pair $(A,B)$ is stabilizable. Under such standard assumptions, we show that it is possible to augment the optimization problem with an extra stability constraint, and, consequently, that the successive application of our resulting policies renders the state of the overall system (plant and estimator) mean-square bounded.

Related Works

The research on stochastic receding horizon control is broadly subdivided into two parallel lines: the first treats multiplicative noise that enters the state equations, and the second caters to additive noise. The case of multiplicative noise has been treated in some detail in [Primbs, 2007; Primbs and Sung, 2009; Cannon et al., 2009], where the authors consider also soft constraints on the state and control input. However, in this article we exclusively focus on the case of additive noise.

The approach proposed in this article stems from and generalizes the idea of affine parametrization of control policies for finite-horizon linear quadratic problems proposed in [Ben-Tal et al., 2004, 2006], utilized within the robust MPC framework in [Ben-Tal et al., 2006; Löfberg, 2003; Goulart et al., 2006] for full state feedback, and in [van Hessem and Bosgra, 2003] for output feedback with Gaussian state and measurement noise inputs. More recently, this affine approximation was utilized in [Skaf and Boyd, 2009] for both the robust deterministic and the stochastic setups in the absence of control bounds, and optimality of the affine policies in the scalar deterministic case was reported in [Bertsimas et al., 2009]. In [Bertsimas and Brown, 2007] the authors reformulate the stochastic programming problem as a deterministic one with bounded noise and solve a robust optimization problem over a finite horizon, followed by estimating the performance when the noise can take unbounded values, i.e., when the noise is unbounded, but takes high values with low probability. Similar approach was utilized in [Oldewurtel et al., 2008] as well. There are also other approaches, e.g., those employing randomized algorithms as in [Batina, 2004; Blackmore and Williams, 2007; Maciejowski et al., 2005]. Results on obtaining lower bounds on the value functions of the stochastic optimization problem have been recently reported in [Wang and Boyd, 2009].

Organization

The remainder of this article is organized as follows. In Section 2 we formulate the receding horizon control problem with all the underlying assumptions, the construction of the estimator, and the main optimization problems to be solved. In Section 3 we provide the main results pertaining to tractability of the optimization problems and mean-square boundedness of the closed-loop system. We comment on the obtained results and provide some extensions in Section 4. We provide all the needed preliminary results, derivations, and proofs in Section 5, and some numerical examples in Section 6. Finally, we conclude in Section 7.

Notation

Let $(\Omega,\mathfrak{F},\mathbb{P})$ be a general probability space, we denote the conditional expectation given the sub- $\sigma$ algebra $\mathfrak{F}^{\prime}$ of $\mathfrak{F}$ as $\mathbb{E}_{\mathfrak{F}^{\prime}}[.]$ . For any random vector $s$ we let $\Sigma_{s}\coloneqq\mathbb{E}[ss^{\mathsf{T}}]$ . Hereafter we let $\mathbb{N}_{+}\coloneqq\{1,2,\ldots\}$ and $\mathbb{N}\coloneqq\mathbb{N}_{+}\cup\{0\}$ be the sets of positive and nonnegative integers, respectively. We let $\mathsf{tr}\!\left(\cdot\right)$ denote the trace of a square matrix, $\left\lVert{\cdot}\right\rVert_{p}$ denote the standard $\ell_{p}$ norm, and simply $\left\lVert{.}\right\rVert$ denote the $\ell_{2}$ -norm. In a Euclidean space we denote by ${\mathcal{B}}_{r}$ the closed Euclidean ball or radius $r$ centered at the origin. For any two matrices $A$ and $B$ of compatible dimensions, we denote by $\mathfrak{R}_{k}(A,B)$ the $k$ -th step reachability matrix $\mathfrak{R}_{k}(A,B)\coloneqq\left[\begin{matrix}A^{k-1}B&\cdots&AB&B\end{matrix}\right]$ . For any matrix $M$ , we let $\sigma_{\min}$ and $\sigma_{\max}$ be its minimal and maximal singular values, respectively. We let $(M)_{i_{1}:i_{2}}$ denote the sub-matrix obtained by selecting the rows $i_{1}$ through $i_{2}$ of $M$ , and let $(M)_{i}$ denote its $i$ -th row.

2. Problem Setup

We are given a certain plant with discrete-time dynamics, process noise $w_{t}$ , and imperfect state measurements $y_{t}$ tainted through noise $v_{t}$ (Figure 1). The objective is to design a receding horizon controller which renders the overall closed-loop system mean-square bounded. We discuss the structure and the main assumptions on the dynamics of the system, the performance index to be minimized, and the construction of the optimal mean-square estimator ${\hat{x}}_{t}$ in Subsection 2.1. Then, we formalize the optimization problem to be solved in a receding horizon fashion (to generate the input policies $u_{t}$ ) in Subsection 2.2.

2.1. Dynamics, Performance Index and Estimator

Consider the following affine discrete-time stochastic dynamical model:


$\displaystyle x_{t+1}$	$\displaystyle=Ax_{t}+Bu_{t}+w_{t},$	(2.1a)
$\displaystyle y_{t}$	$\displaystyle=Cx_{t}+v_{t}$	(2.1b)

where $t\in\mathbb{N}$ , $x_{t}\in\mathbb{R}^{n}$ is the state, $u_{t}\in\mathbb{R}^{m}$ is the control input, $y_{t}\in\mathbb{R}^{p}$ is the output, $w_{t}\in\mathbb{R}^{n}$ is a random process noise, $v_{t}\in\mathbb{R}^{p}$ is a random measurement noise, and $A$ , $B$ , and $C$ are known matrices. We posit the following main assumption:

Assumption 1.

(i)
The system matrices in (2.1a) satisfy the following:
1. (a)
  
  The pair $(A,B)$ is stabilizable.
2. (b)
  
  $A$ is discrete-time Lyapunov stable [Bernstein, 2009, Chapter 12], i.e., the eigenvalues $\{\lambda_{i}(A)\mid i=1,\ldots,d\}$ lie in the closed unit disc, and those eigenvalues $\lambda_{j}(A)$ with $\bigl{|}\lambda_{j}(A)\bigr{|}=1$ have equal algebraic and geometric multiplicities.
(ii)

The initial condition and the process and measurement noise vectors are normally distributed, i.e., $x_{0}\sim\mathcal{N}(0,\Sigma_{x_{0}})$ , $w_{t}\sim\mathcal{N}(0,\Sigma_{w})$ , and $v_{t}\sim\mathcal{N}(0,\Sigma_{v})$ . Moreover, $x_{0}$ , $(w_{t})_{t\in\mathbb{N}}$ and $(v_{t})_{t\in\mathbb{N}}$ are mutually independent.
(iii)

The control inputs take values in the control constraint set:

$\mathbb{U}\coloneqq\bigl{\{}\xi\in\mathbb{R}^{m}\,\big{|}\,\left\lVert{\xi}\right\rVert_{\infty}\leqslant U_{\max}\bigr{\}};$ (C1)

i.e., $u_{t}\in\mathbb{U}$ for each $t\in\mathbb{N}$ . $\diamondsuit$

Without any loss of generality, we also assume that $A$ is in real Jordan canonical form. Indeed, given a linear system described by system matrices $\bigl{(}\tilde{A},\tilde{B}\bigr{)}$ , there exists a coordinate transformation in the state-space that brings the pair $\bigl{(}\tilde{A},\tilde{B}\bigr{)}$ to the pair $(A,B)$ , where $A$ is in real Jordan form [Horn and Johnson, 1990, p. 150]. In particular, choosing a suitable ordering of the Jordan blocks, we can ensure that the pair $(A,B)$ has the form $\left(\begin{bmatrix}A_{1}&0\\ 0&A_{2}\end{bmatrix},\begin{bmatrix}B_{1}\\ B_{2}\end{bmatrix}\right)$ , where $A_{1}\in\mathbb{R}^{n_{1}\times n_{1}}$ is Schur stable, and $A_{2}\in\mathbb{R}^{n_{2}\times n_{2}}$ has its eigenvalues on the unit circle. By hypothesis (i)-(i)(b) of Assumption 1, $A_{2}$ is therefore block-diagonal with elements on the diagonal being either $\pm 1$ or $2\times 2$ rotation matrices. As a consequence, $A_{2}$ is orthogonal. Moreover, since $(A,B)$ is stabilizable, the pair $(A_{2},B_{2})$ must be reachable in a number of steps $\kappa\leqslant n_{2}$ that depends on the dimension of $A_{2}$ and the structure of $(A_{2},B_{2})$ , i.e., $\text{rank}(\mathfrak{R}_{\kappa}(A_{2},B_{2}))=n_{2}$ . Summing up, we can start by considering that the state equation (2.1a) has the form

\begin{bmatrix}x^{(1)}_{t+1}\\ --\\ x^{(2)}_{t+1}\end{bmatrix}=\begin{bmatrix}A_{1}x^{(1)}_{t}\\ --\\ A_{2}x^{(2)}_{t}\end{bmatrix}+\begin{bmatrix}B_{1}\\ --\\ B_{2}\end{bmatrix}u_{t}+\begin{bmatrix}w^{(1)}_{t}\\ --\\ w^{(2)}_{t}\end{bmatrix},

(2.2)

where $A_{1}$ is Schur stable, $A_{2}$ is orthogonal, and there exists a nonnegative integer $\kappa\leqslant n_{2}$ such that the subsystem $(A_{2},B_{2})$ is reachable in $\kappa$ steps. This integer $\kappa$ is fixed throughout the rest of the article.

Fix a prediction horizon $N\in\mathbb{N}_{+}$ , and let us consider, at any time $t$ , the cost $V_{t}\coloneqq V(t,\mathcal{Y}_{t},u)$ defined by:

\displaystyle V_{t}

\displaystyle=\mathbb{E}_{\mathcal{Y}_{t}}\Bigg{[}\sum\limits_{k=0}^{N-1}\!\bigl{(}x_{t+k}^{\mathsf{T}}Q_{k}x_{t+k}+u_{t+k}^{\mathsf{T}}R_{k}u_{t+k}\bigr{)}+x_{t+N}^{\mathsf{T}}Q_{N}x_{t+N}\Bigg{]},

(2.3)

where $\mathcal{Y}_{t}=\{y_{0},y_{1},\ldots,y_{t}\}$ is the set of outputs up to time $t$ , (or more precisely the $\sigma$ -algebra generated by the set of outputs,) and $Q_{k}=Q^{\mathsf{T}}_{k}>0$ , $Q_{N}=Q_{N}^{\mathsf{T}}>0$ , and $R_{k}=R^{\mathsf{T}}_{k}>0$ are some given symmetric matrices of appropriate dimension, $k=0,\ldots,N-1$ . At each time instant $t$ , we are interested in minimizing (2.3) over the class of causal output feedback strategies $\mathcal{G}$ defined as:

\left[\begin{matrix}u_{t}\\ u_{t+1}\\ \vdots\\ u_{t+N-1}\end{matrix}\right]\!\!=\!\!\left[\begin{array}[]{l}g_{t}(\mathcal{Y}_{t})\\ g_{t+1}(\mathcal{Y}_{t+1})\\ \vdots\\ g_{t+N-1}(\mathcal{Y}_{t+N-1})\end{array}\right],

(2.4)

for some measurable functions $\mathbf{g}_{t}\coloneqq\{g_{t},$ $\cdots,g_{t+N-1}\}$ which satisfy the hard constraint (C1) on the inputs.

The cost $V_{t}$ in (2.3) is a conditional expectation given the observations up through time $t$ , and as such requires the conditional density $f(x_{t}|\mathcal{Y}_{t})$ of the state given the previous and current measurements. Our choice of causal control policies in (2.4) motivates us to rewrite the cost $V_{t}$ in (2.3) as:

\displaystyle V_{t}

\displaystyle=\mathbb{E}_{\mathcal{Y}_{t}}\!\Bigg{[}\mathbb{E}_{\{\mathcal{Y}_{t},x_{t}\}}\!\Bigg{[}\sum\limits_{k=0}^{N-1}\big{(}x_{t+k}^{\mathsf{T}}Q_{k}x_{t+k}+u_{t+k}^{\mathsf{T}}R_{k}u_{t+k}\big{)}+x_{t+N}^{\mathsf{T}}Q_{N}x_{t+N}\Bigg{]}\Bigg{]}.

(2.5)

The propagation of $f(x_{t}|\mathcal{Y}_{t})$ in time is not a trivial task in general. In what follows we shall report an iterative method for the computation of $f(x_{t}|\mathcal{Y}_{t})$ , whenever the control input is a measurable deterministic feedback from the past and present outputs. For $t,s\in\mathbb{N}_{0}$ , define $\hat{x}_{t|s}=\mathbb{E}_{\mathcal{Y}_{s}}[x_{t}]$ and $P_{t|s}=\mathbb{E}_{\mathcal{Y}_{s}}[(x_{t}-\hat{x}_{t|s})(x_{t}-\hat{x}_{t|s})^{\mathsf{T}}]$ .

Assumption 2.

We require that $\Sigma_{w}>0$ and $\Sigma_{v}>0$ . $\diamondsuit$

The following result is a slight extension of the usual Kalman Filter for which a proof can be found in Appendix A. An alternative proof can also be found in [Kumar and Varaiya, 1986, p.102].

Proposition 3.

Under Assumption 1-(ii) and Assumption 2, $f(x_{t}|\mathcal{Y}_{t})$ and $f(x_{t+1}|\mathcal{Y}_{t})$ are the probability densities of Gaussian distributions $\mathcal{N}(\hat{x}_{t|t},P_{t|t})$ and $\mathcal{N}(\hat{x}_{t+1|t},P_{t+1|t})$ , respectively, with $P_{t|t}>0$ and $P_{t+1|t}>0$ . For $t=0,1,2,\ldots,$ their conditional means and covariances can be computed iteratively as follows:

$\displaystyle\hat{x}_{t+1\|t+1}$	$\displaystyle=\hat{x}_{t+1\|t}$
	$\displaystyle\quad+P_{t+1\|t}C^{\mathsf{T}}(CP_{t+1\|t}C^{\mathsf{T}}+\Sigma_{v})^{-1}(y_{t+1}-C\hat{x}_{t+1\|t}),$	(2.6)
$\displaystyle P_{t+1\|t+1}$	$\displaystyle=P_{t+1\|t}-P_{t+1\|t}C^{\mathsf{T}}(CP_{t+1\|t}C^{\mathsf{T}}+\Sigma_{v})^{-1}CP_{t+1\|t},$	(2.7)

where

	$\displaystyle\hat{x}_{t+1\|t}$	$\displaystyle=A\hat{x}_{t\|t}+Bg_{t}(\mathcal{Y}_{t}),$		(2.8)
	$\displaystyle P_{t+1\|t}$	$\displaystyle=AP_{t\|t}A^{\mathsf{T}}+\Sigma_{w}.$		(2.9)

In particular, the matrix $P_{t|t}$ together with $\hat{x}_{t|t}$ characterize the conditional density $f(x_{t}|\mathcal{Y}_{t})$ , which is needed in the computation of the cost (2.5). Proposition 3 states that the conditional mean and covariances of $x_{t}$ can be propagated by an iterative algorithm which resembles the Kalman filter. Since the system is generally nonlinear due to the function $g$ being nonlinear, we cannot assume that the probability distributions in the problem are Gaussian (in fact, the a priori distributions of $x_{t}$ and of $\mathcal{Y}_{t}$ are not) and the proof cannot be developed in the standard linear framework of the Kalman filter.

Hereafter, we shall denote for notational convenience by $\hat{x}_{t}$ the estimate $\hat{x}_{t|t}$ , and let

\hat{x}_{t}=\left[\begin{matrix}{\hat{x}}^{(1)}_{t}\\ --\\ {\hat{x}}^{(2)}_{t}\end{matrix}\right],

(2.10)

which corresponds to the Jordan decomposition in (2.2).

2.2. Optimization Problem and Control Policies

Having discussed the iterative update of the laws of $x_{t}$ given the measurements $\mathcal{Y}_{t}$ in Section 2.1, we return to our optimization problem in (2.5). We can rewrite our optimization problem as:

\displaystyle\min_{\mathbf{g}_{t}}\Bigl{\{}V_{t}\ \Big{|}\text{ dynamics \eqref{eq:system-a}-\eqref{eq:system-b} and constraint \eqref{eq:C1}}\Bigr{\}},

(OP1)

where the collection of functions $\mathbf{g}_{t}$ were defined following (2.4).

The explicit solution via dynamic programming to Problem (OP1) over the class of causal feedback policies $\mathcal{G}$ , is difficult if not impossible to obtain in general [Bertsekas, 2000, 2007]. One way to circumvent this difficulty is to restrict our search for feedback policies to a subclass of $\mathcal{G}$ . This will result in a suboptimal solution to our problem, but may yield a tractable optimization problem. This is the track we pursue next.

Given a control horizon $N_{c}$ and a prediction horizon $N\;(\geqslant N_{c})$ , we would like to periodically solve Problem (OP1) at times $t=0,N_{c},2N_{c},\cdots$ over the following class of affine-like control policies

u_{\ell}=\eta_{\ell}+\sum_{i=t}^{\ell}\theta_{\ell,i}\varphi_{i}(y_{i}-\hat{y}_{i})\quad\text{for all }\ell=t,\cdots,t+N-1,

(POL)

where $\hat{y}_{i}=C\hat{x}_{i}$ , and for any vector $z\in\mathbb{R}^{p}$ , $\varphi_{i}(z)\coloneqq\bigl{[}\varphi_{i,1}(z_{1}),\ldots,$ $\varphi_{i,p}(z_{p})\bigr{]}^{\mathsf{T}}$ , where $z_{j}$ is the $j$ -th element of the vector $z$ and $\varphi_{i,j}:\mathbb{R}\to\mathbb{R}$ is any function with $\sup\limits_{s\in\mathbb{R}}|\varphi_{i,j}(s)|\leqslant\varphi_{\max}<\infty$ . The feedback gains $\theta_{\ell,i}\in\mathbb{R}^{m\times p}$ and the affine terms $\eta_{\ell}\in\mathbb{R}^{m}$ are the decision variables. The value of $u_{\ell}$ in (POL) depends on the values of the measured outputs from the beginning of the prediction horizon at time $t$ up to time $\ell$ only, which requires a finite amount of memory. Note that we have chosen to saturate the measurements we obtain from the vectors $(y_{i}-\hat{y}_{i})$ before inserting them into the control policy. This way we do not assume that either the process noise or the measurement noise distributions are defined over a compact domain, which is a crucial assumption in robust MPC approaches [Mayne et al., 2000]. Moreover, the choice of element-wise saturation functions $\varphi_{i}(\cdot)$ is left open. As such, we can accommodate standard saturation, piecewise linear, and sigmoidal functions, to name a few.

Finally, we collect all the ingredients of the stochastic receding horizon control problem in Algorithm 1 below.

Algorithm 1 Basic Stochastic Receding Horizon Algorithm

0: density

f(x_{0}|\mathcal{Y}_{-1})=\mathcal{N}(0,\Sigma_{x_{0}})

1: set

t=0

\hat{x}_{0|-1}=0

, and

P_{0|-1}=\Sigma_{x_{0}}

2: loop

3: for

i=0

N_{c}-1

4: measure

y_{t+i}

5: update

\hat{x}_{t+i}(=\hat{x}_{t+i|t+i})

and

P_{t+i|t+i}

using (2.6)-(2.7)

6: if

i=0

then

7: solve the optimization problem (OP1) with the control policies in (POL) to obtain

\{u_{t}^{*},\ \cdots,\ u_{t+N-1}^{*}\}

8: end if

9: apply

u^{*}_{t+i}

10: calculate

\hat{x}_{t+i+1|t+i}

and

P_{t+i+1|t+i}

using (2.8)-(2.9)

11: end for

12: set

t=t+N_{c}

13: end loop

Compact Notation

In order to state the main results in Section 3, it is convenient to utilize a more compact notation that describes the evolution of all signals over the entire prediction horizon $N$ .

The evolution of the system dynamics (2.1a)-(2.1b) over a single prediction horizon $N$ , starting at $t$ , can be described in a lifted form as follows:

	$\displaystyle X_{t}$	$\displaystyle=\mathcal{A}x_{t}+\mathcal{B}U_{t}+\mathcal{D}W_{t}$		(2.11)
	$\displaystyle Y_{t}$	$\displaystyle=\mathcal{C}X_{t}+V_{t}$		(2.11)

where $X_{t}=\left[\begin{matrix}\begin{smallmatrix}x_{t}\\ x_{t+1}\\ \vdots\\ x_{t+N}\end{smallmatrix}\end{matrix}\right]$ , $U_{t}=\left[\begin{matrix}\begin{smallmatrix}u_{t}\\ u_{t+1}\\ \vdots\\ u_{t+N-1}\end{smallmatrix}\end{matrix}\right]$ , $W_{t}=\left[\begin{matrix}\begin{smallmatrix}w_{t}\\ w_{t+1}\\ \vdots\\ w_{t+N-1}\end{smallmatrix}\end{matrix}\right]$ , $Y_{t}=\left[\begin{matrix}\begin{smallmatrix}y_{t}\\ y_{t+1}\\ \vdots\\ y_{t+N}\end{smallmatrix}\end{matrix}\right]$ , $V_{t}=\left[\begin{matrix}\begin{smallmatrix}v_{t}\\ v_{t+1}\\ \vdots\\ v_{t+N}\end{smallmatrix}\end{matrix}\right]$ , $\mathcal{A}=\left[\begin{matrix}\begin{smallmatrix}I\\ A\\ \vdots\\ A^{N}\end{smallmatrix}\end{matrix}\right]$ ,

\mathcal{B}=\left[\begin{matrix}\begin{smallmatrix}0&\cdots&\cdots&0\\ B&\ddots&&\vdots\\ AB&B&\ddots&\vdots\\ \vdots&&\ddots&0\\ A^{N-1}B&\cdots&AB&B\end{smallmatrix}\end{matrix}\right],\mathcal{D}=\left[\begin{matrix}\begin{smallmatrix}0&\cdots&\cdots&0\\ I&\ddots&&\vdots\\ A&I&\ddots&\vdots\\ \vdots&&\ddots&0\\ A^{N-1}&\cdots&A&I\end{smallmatrix}\end{matrix}\right],

and $\mathcal{C}=\mathrm{diag}\{C,\cdots,C\}$ . Let

K_{t}\coloneqq(AP_{t|t}A^{\mathsf{T}}+\Sigma_{w})C^{\mathsf{T}}(C(AP_{t|t}A^{\mathsf{T}}+\Sigma_{w})C^{\mathsf{T}}+\Sigma_{v})^{-1}

be the usual Kalman gain and define $\Gamma_{t}\coloneqq I-K_{t}C$ , and $\Phi_{t}\coloneqq\Gamma_{t}A$ . Then, we can write the estimation error vector as

E_{t}\coloneqq X_{t}-\hat{X}_{t}=\mathcal{F}^{e}_{t}e_{t}+\mathcal{F}^{w}_{t}W_{t}-\mathcal{F}^{v}_{t}V_{t},

(2.12)

where $e_{t}=x_{t}-\hat{x}_{t}$ , $\mathcal{F}^{e}_{t}=\tiny\left[\begin{matrix}I\\ \Phi_{t}\\ \Phi_{t+1}\cdot\Phi_{t}\\ \vdots\\ \Phi_{t+N-1}\cdot\Phi_{t+N-2}\cdots\Phi_{t}\end{matrix}\right]$ ,

	$\displaystyle\mathcal{F}^{w}_{t}$	$\displaystyle=\tiny\left[\begin{matrix}\begin{array}[]{l\|l\|l\|l}0&&0&0\\ \Gamma_{t}&&0&0\\ \Phi_{t+1}\Gamma_{t}&&0&0\\ \vdots&\cdots&\vdots&\vdots\\ \Phi_{t+N-2}\cdots\Phi_{t+1}\Gamma_{t}&&\Gamma_{t+N-2}&0\\ \Phi_{t+N-1}\cdots\Phi_{t+1}\Gamma_{t}&&\Phi_{t+N-1}\Gamma_{t+N-2}&\Gamma_{t+N-1}\end{array}\end{matrix}\right],$
	$\displaystyle\mathcal{F}^{v}_{t}\!$	$\displaystyle=\!\!\tiny\left[\begin{matrix}\begin{array}[]{l\|l\|l\|l\|l}0&0&&0&0\\ 0&K_{t}&&0&0\\ 0&\Phi_{t+1}K_{t}&&0&0\\ \vdots&\vdots&\!\!\cdots\!\!&\vdots&\vdots\\ 0&\Phi_{t+N-2}\cdots\Phi_{t+1}K_{t}&&K_{t+N-2}&0\\ 0&\Phi_{t+N-1}\cdots\Phi_{t+1}K_{t}&&\Phi_{t+N-1}K_{t+N-2}&K_{t+N-1}\end{array}\end{matrix}\right]\!\!.$

Finally, the innovations sequence can be written as

Y_{t}-\hat{Y}_{t}=\mathcal{C}\mathcal{F}^{e}_{t}e_{t}+\mathcal{C}\mathcal{F}^{w}_{t}W_{t}+(I-\mathcal{C}\mathcal{F}^{v}_{t})V_{t},

(2.13)

where $\hat{Y}_{t}\coloneqq\mathcal{C}\hat{X}_{t}$ . Consequently, the innovations sequence over the prediction horizon is independent of the inputs vector $U_{t}$ .

The cost function (2.3) at time $t$ can be written as

V_{t}=\mathbb{E}_{\mathcal{Y}_{t}}\bigl{[}X_{t}^{\mathsf{T}}\mathcal{Q}X_{t}+U_{t}^{\mathsf{T}}\mathcal{R}U_{t}\bigr{]},

(2.14)

where $\mathcal{Q}=\text{diag}\{Q_{0},\cdots,Q_{n}\}$ and $\mathcal{R}=\text{diag}\{R_{0},\cdots,R_{N-1}\}$ . Also, the control policy (POL) at time $t$ is given by

U_{t}=\boldsymbol{\eta}_{t}+\boldsymbol{\Theta}_{t}\varphi(Y_{t}-\hat{Y}_{t}),

(POL)

where

\displaystyle\boldsymbol{\eta}_{t}\!:=\!\!\left[\begin{matrix}\eta_{t}\\ \eta_{t+1}\\ \vdots\\ \eta_{t+N-1}\end{matrix}\right]\!,\varphi(Y_{t}\!-\!\hat{Y}_{t})\!\coloneqq\!\!\left[\begin{matrix}\varphi_{0}(y_{t}\!-\!\hat{y}_{t})\\ \vdots\\ \varphi_{N-1}(y_{t+N-1}\!-\!\hat{y}_{t+N-1})\end{matrix}\right]\!,

and $\boldsymbol{\Theta}_{t}$ has the following lower block diagonal structure

\boldsymbol{\Theta}_{t}\coloneqq\left[\begin{matrix}\theta_{t,t}&0&\ldots&0\\ \theta_{t+1,t}&\theta_{t+1,t+1}&&\vdots\\ \vdots&\vdots&\ddots&0\\ \theta_{t+N-1,t}&\theta_{t+N-1,t+1}&\ldots&\theta_{t+N-1,t+N-1}\end{matrix}\right].

(2.15)

Since the innovation vector $Y_{t}-\hat{Y}_{t}$ in (2.13) is not a function of $\boldsymbol{\eta}_{t}$ and $\boldsymbol{\Theta}_{t}$ , the control inputs $U_{t}$ in (POL) remain affine in the decision variables. This fact is important to show convexity of the optimization problem, as will be seen in the next section. Finally, the constraint (C1) can be rewritten as:

\displaystyle U_{t}\in{\mathbb{U}}\coloneqq\bigl{\{}\xi\in\mathbb{R}^{Nm}\big{|}\left\lVert{\xi}\right\rVert_{\infty}\leqslant U_{\max}\bigr{\}}.

(C1)

3. Main Results

The optimization problem (OP1) we have stated so far, even if it is successively feasible, does not in general guarantee stability of the resulting receding horizon controller in Algorithm 1. Unlike deterministic stability arguments utilized in MPC (see, for example, [Mayne et al., 2000]), we cannot assume the existence of a final region that is control positively invariant and which is crucial to establish stability. This is simply due to the fact that the process noise sequence is not assumed to have a compact domain of support. However, guided by our earlier results in [Ramponi et al., 2009], we shall enforce an extra stability constraint which, if feasible, can render the state of the closed-loop system mean-square bounded.

For $t\in\mathbb{N}$ , the state estimate at time $t+\kappa$ given the state estimate at time $t$ , the control inputs from time $t$ through $t+\kappa-1$ , and the corresponding process and measurement noise sequences can be written as

{\hat{x}}_{t+\kappa|t+\kappa}=A^{\kappa}{\hat{x}}_{t|t}+\mathfrak{R}_{\kappa}(A,B)\left[\begin{matrix}u_{t}\\ \vdots\\ u_{t+\kappa-1}\end{matrix}\right]+\Xi_{t},

where we define

$\displaystyle\Xi_{t}$	$\displaystyle\coloneqq\left[\begin{matrix}A^{\kappa-1}K_{t}CA,A^{\kappa-2}K_{t+1}CA,\cdots,K_{t+\kappa-1}CA\end{matrix}\right]\left[\begin{matrix}e_{t}\\ \vdots\\ e_{t+\kappa-1}\end{matrix}\right]$	(3.1)
	$\displaystyle\quad+\left[\begin{matrix}A^{\kappa-1}K_{t}C,A^{\kappa-2}K_{t+1}C,\cdots,K_{t+\kappa-1}C\end{matrix}\right]\left[\begin{matrix}w_{t}\\ \vdots\\ w_{t+\kappa-1}\end{matrix}\right]$
	$\displaystyle\quad+\left[\begin{matrix}A^{\kappa-1}K_{t},A^{\kappa-2}K_{t+1},\cdots,K_{t+\kappa-1}\end{matrix}\right]\left[\begin{matrix}v_{t+1}\\ \vdots\\ v_{t+\kappa}\end{matrix}\right].$

Scholium 4.

There exists an integer $T$ and a positive constant $\zeta=\zeta(\Sigma_{v},\Sigma_{w},A,B,C)$ such that

\mathbb{E}_{\mathcal{Y}_{t}}\left[\left\lVert{\Xi_{t}}\right\rVert\right]\leqslant\zeta\qquad\text{for all }t\geqslant T.

(3.2)

A proof of Scholium 4 appears in Section 5. Using the constant $\zeta$ , we now require the following “drift condition” to be satisfied: for any given $\varepsilon>0$ and for every $t=0,N_{c},2N_{c},\cdots$ , there exists a $U_{t}\in\mathbb{U}$ , such that the following condition is satisfied

\displaystyle\left\lVert{A_{2}^{\kappa}{\hat{x}}^{(2)}_{t}+\mathfrak{R}_{\kappa}(A_{2},B_{2})\left[\begin{matrix}u_{t}\\ \vdots\\ u_{t+\kappa-1}\end{matrix}\right]}\right\rVert\leqslant\bigl{\|}{\hat{x}}^{(2)}_{t}\bigr{\|}-(\zeta+\tfrac{\varepsilon}{2})\;\;\text{whenever }\bigl{\|}{\hat{x}}^{(2)}_{t}\bigr{\|}\geqslant\zeta+\varepsilon.

(C2)

Note that $\left[\begin{matrix}u_{t}\\ \vdots\\ u_{t+\kappa-1}\end{matrix}\right]=(\boldsymbol{\eta}_{t})_{1:\kappa m}+(\boldsymbol{\Theta}_{t})_{1:\kappa m}\varphi(Y-\hat{Y})$ . (For notational convenience, we have retained $\varphi(Y-\hat{Y})$ with the knowledge that the matrix $(\boldsymbol{\Theta}_{t})_{1:\kappa m}$ causally selects the outputs as they become available, see (2.15).) The constraint (C2) pertains only to the second subsystem of the estimator, as the first subsystem ( $\hat{x}^{(1)}$ ) is Schur stable (see (2.2) and (2.10)). We augment problem (OP1) with the stability constraint (C2) to obtain

\displaystyle\min\Bigl{\{}V_{t}\ \Big{|}\text{ dynamics \eqref{eq:system-a}-\eqref{eq:system-b}, policies \eqref{eq:policy}, and constraints \eqref{eq:C1} \& \eqref{eq:C4}}\Bigr{\}}.

(OP2)

Assumption 5.

In addition to Assumption 1 and Assumption 2, we stipulate that:

(i)

The control authority $U_{\max}\geqslant U_{\max}^{*}$ , where $U_{\max}^{*}\coloneqq\sigma_{\min}(\mathfrak{R}_{\kappa}(A_{2},B_{2}))^{-1}\big{(}\zeta+\tfrac{\varepsilon}{2}\big{)}$ .
(ii)

The control horizon $N_{c}\geqslant\kappa$ , where $\kappa$ is the reachability index.
(iii)

$\bigl{(}A,\Sigma_{w}^{1/2}\bigr{)}$ is stabilizable, and $(A,C)$ is observable. $\diamondsuit$

Theorem 6 (Main Result).

Consider the system (2.1a)-(2.1b), and suppose that Assumption 5 holds. Then:

(i)

For every time $t=0,N_{c},2N_{c},\cdots$ , the optimization problem (OP2) is convex, feasible, and can be approximated to the following quadratic optimization problem:

$\displaystyle\operatorname*{minimize}_{(\boldsymbol{\eta}_{t},\boldsymbol{\Theta}_{t})}\quad$	$\displaystyle 2\hat{x}_{t}^{\mathsf{T}}\mathcal{A}^{\mathsf{T}}\mathcal{Q}\mathcal{B}\boldsymbol{\eta}_{t}+2\mathsf{tr}\!\left(\mathcal{A}^{\mathsf{T}}\mathcal{Q}\mathcal{B}\boldsymbol{\Theta}_{t}\Lambda^{\varphi x}_{t}\right)+2\mathsf{tr}\!\left(\boldsymbol{\Theta}_{t}^{\mathsf{T}}\mathcal{B}^{\mathsf{T}}\mathcal{Q}\mathcal{D}\Lambda^{w\varphi}_{t}\right)$
	$\displaystyle\;\;+\boldsymbol{\eta}_{t}^{\mathsf{T}}\mathcal{M}_{1}\boldsymbol{\eta}_{t}+2\boldsymbol{\eta}_{t}^{\mathsf{T}}\mathcal{M}_{1}\boldsymbol{\Theta}_{t}\Lambda^{\varphi}_{t}+\mathsf{tr}\!\left(\boldsymbol{\Theta}_{t}^{\mathsf{T}}\mathcal{M}_{1}\boldsymbol{\Theta}_{t}\Lambda^{\varphi\varphi}_{t}\right)$	(3.3)
subject to	$\displaystyle\text{the structure of}\ \boldsymbol{\Theta}_{t}\ \text{in}\ \eqref{eq:decision-constraints},$
	$\displaystyle\|(\boldsymbol{\eta}_{t})_{i}\|+\left\lVert{(\boldsymbol{\Theta}_{t})_{i}}\right\rVert_{1}\varphi_{\max}\leqslant U_{\max}\quad\forall\,i=1,\cdots,Nm,$	(3.4)
	$\displaystyle\left\lVert{A_{2}^{\kappa}{\hat{x}}^{(2)}_{t}+\mathfrak{R}_{\kappa}(A_{2},B_{2})(\boldsymbol{\eta}_{t})_{1:\kappa m}}\right\rVert+\sqrt{n_{2}}\left\lVert{\mathfrak{R}_{\kappa}(A_{2},B_{2})(\boldsymbol{\Theta}_{t})_{1:\kappa m}}\right\rVert_{\infty}\varphi_{\max}$
	$\displaystyle\quad\leqslant\bigl{\\|}{\hat{x}}^{(2)}_{t}\bigr{\\|}-(\zeta+\tfrac{\varepsilon}{2})\;\;\text{whenever }\bigl{\\|}{\hat{x}}^{(2)}_{t}\bigr{\\|}\geqslant\zeta+\varepsilon$	(3.5)

where $\mathcal{M}_{1}=\mathcal{R}+\mathcal{B}^{\mathsf{T}}\mathcal{Q}\mathcal{B}$ ,

	$\displaystyle\Lambda^{\varphi}_{t}$	$\displaystyle=\mathbb{E}_{\mathcal{Y}_{t}}[\varphi(Y-\hat{Y})],$	$\displaystyle\Lambda^{\varphi x}_{t}$	$\displaystyle=\mathbb{E}_{\mathcal{Y}_{t}}[\varphi(Y-\hat{Y})x_{t}^{\mathsf{T}}],$		(3.6)
	$\displaystyle\Lambda^{w\varphi}_{t}$	$\displaystyle=\mathbb{E}_{\mathcal{Y}_{t}}[W\varphi(Y-\hat{Y})^{\mathsf{T}}],$	$\displaystyle\Lambda^{\varphi\varphi}_{t}$	$\displaystyle=\mathbb{E}_{\mathcal{Y}_{t}}[\varphi(Y-\hat{Y})\varphi(Y-\hat{Y})^{\mathsf{T}}].$		(3.6)

(ii)

For every initial state and noise statistics $\bigl{(}(\hat{x}_{0},\Sigma_{x_{0}}),\Sigma_{w},\Sigma_{v}\bigr{)}$ , successive application of the control laws given by the optimization problem in (i) for $N_{c}$ steps renders the closed-loop system mean-square bounded in the following sense: there exists a constant $\gamma=\gamma(\mathcal{Y}_{0},\kappa,\hat{x}_{0},\Sigma_{x_{0}},\Sigma_{w},\Sigma_{v},U_{\max}^{*})>0$ such that

$\sup_{t\in\mathbb{N}}\;\mathbb{E}_{\mathcal{Y}_{0}}\bigl{[}\left\lVert{x_{t}}\right\rVert^{2}\bigr{]}\leqslant\gamma.$ (3.7)

In practice, it may be also of interest to impose further some soft constraints both on the state and the input vectors. For example, one may be interested in imposing quadratic or linear constraints on the state, both of which can be captured in the following

\mathbb{E}_{\mathcal{Y}_{t}}\left[X_{t}^{\mathsf{T}}\mathcal{S}X_{t}+\mathcal{L}^{\mathsf{T}}X_{t}\right]\leqslant\alpha_{t},

(C3)

where $\mathcal{S}=\mathcal{S}^{\mathsf{T}}\geqslant 0$ and $\alpha_{t}>0$ . Moreover, expected energy expenditure constraints can be posed as follows

\mathbb{E}_{\mathcal{Y}_{t}}\left[U_{t}^{\mathsf{T}}\tilde{\mathcal{S}}U_{t}\right]\leqslant\beta_{t},

(C4)

where $\tilde{\mathcal{S}}=\tilde{\mathcal{S}}^{\mathsf{T}}\geqslant 0$ and $\beta_{t}>0$ . In the absence of hard input constraints, such expectation-type constraints are commonly used in the stochastic MPC [Primbs and Sung, 2009; Agarwal et al., 2009] and in stochastic optimization in the form of integrated chance constraints [Klein Haneveld, 1983; Klein Haneveld and van der Vlerk, 2006]. This is partly because it is not possible, without posing further restrictions on the boundedness of the process noise $w_{t}$ , to ensure that hard constraints on the state are satisfied. For example, in the standard LQG setting nontrivial hard constraints on the system state would generally be violated with nonzero probability. Moreover, in contrast to chance constraints where a bound is imposed on the probability of constraint violation, expectation-type constraints tend to give rise to convex optimization problems under weak assumptions [Agarwal et al., 2009].

We can also augment problem (OP2) with the constraints (C3) and (C4) to obtain

\displaystyle\min\Bigl{\{}V_{t}\ \Big{|}\text{ dynamics \eqref{eq:system-a}-\eqref{eq:system-b}, policies \eqref{eq:policy}, constraints \eqref{eq:C1}, \eqref{eq:C4}, \eqref{eq:statevar} \& \eqref{eq:inputvar}}\Bigr{\}}.

(OP3)

Notice that the constraints (C3) and (C4) are not necessarily feasible at time $t$ for any choice of parameters $\alpha_{t}$ and $\beta_{t}$ . As such, problem (OP3) may become infeasible over time if we simply apply Algorithm 1. We therefore replace the optimization Step 7 in Algorithm 1 with the following subroutine for some given $\alpha_{t}^{*}$ and $\beta_{t}^{*}$ that make the constraints feasible, precision number $\delta>0$ and maximal number of iterations $\bar{\nu}$ :

Subroutine 7

7a:

Set $\overline{\alpha}=\alpha_{t}^{*}$ , $\underline{\alpha}=0$ , $\overline{\beta}=\beta_{t}^{*}$ , and $\underline{\beta}=0$
7b:

Solve the optimization problem (OP3) using $\alpha_{t}=\overline{\alpha}$ and $\beta_{t}=\overline{\beta}$ to obtain the sequence $\{u_{t}^{*},u_{t+1}^{*},\cdots,u_{t+N-1}^{*}\}$
7c:

Set $\nu=1$
7d:

repeat
7e:

Set $\alpha_{t}=(\overline{\alpha}+\underline{\alpha})/2$ and $\beta_{t}=(\overline{\beta}+\underline{\beta})/2$
7f:

Solve Solve the optimization problem (OP3) using the new $\alpha_{t}$ and $\beta_{t}$ to obtain
the sequence $\{\tilde{u}_{t},\tilde{u}_{t+1},\cdots,\tilde{u}_{t+N-1}\}$
7g:

if step 5f is feasible then
7h:

Set $\overline{\alpha}=\alpha_{t}$ and $\overline{\beta}=\beta_{t}$
7i:

Set $\{u_{t}^{*},u_{t+1}^{*},\cdots,u_{t+N-1}^{*}\}=\{\tilde{u}_{t},\tilde{u}_{t+1},\cdots,\tilde{u}_{t+N-1}\}$
7j:

else
7k:

Set $\underline{\alpha}=\alpha_{t}$ and $\underline{\beta}=\beta_{t}$
7l:

end if
7m:

set $\nu=\nu+1$
7n:

until ( $|\overline{\alpha}-\underline{\alpha}|\leqslant\delta$ and $|\overline{\beta}-\underline{\beta}|\leqslant\delta$ ) or $\nu>\bar{\nu}$

It may be argued that replacing Step 7 in Algorithm 1 with Subroutine 7 above increases the computational burden; however, the parameter $\bar{\nu}$ guarantees that this iterative process is halted after some prespecified number of steps in case the required precision $\delta$ is not reached in the meantime. In some instances, we may be given some fixed parameters $\hat{\alpha}$ and $\hat{\beta}$ with which the constraints (C3) and (C4) have to be satisfied, respectively. This requirement can be easily incorporated into Subroutine 7 by setting $\underline{\alpha}=\hat{\alpha}$ and $\underline{\beta}=\hat{\beta}$ in step 7a.

Assumption 7.

At each time step $t=0,N_{c},2N_{c},\cdots$ , the constants $\alpha^{*}_{t}$ and $\beta_{t}^{*}$ in Subroutine 7 are chosen as

	$\displaystyle\alpha^{*}_{t}$	$\displaystyle\coloneqq 3\mathsf{tr}\!\left(\mathcal{A}^{\mathsf{T}}\mathcal{S}\mathcal{A}\mathbb{E}_{\mathcal{Y}_{t}}\left[x_{t}x_{t}^{\mathsf{T}}\right]+\mathcal{D}^{\mathsf{T}}\mathcal{S}\mathcal{D}\Sigma_{w}\right)+3Nm\sigma_{\max}(\mathcal{B}^{\mathsf{T}}\mathcal{S}\mathcal{B})U_{\max}^{2}+\mathcal{L}^{\mathsf{T}}\mathcal{A}\hat{x}_{t}+\left\lVert{\mathcal{L}^{\mathsf{T}}\mathcal{B}}\right\rVert_{1}U_{\max},$
	$\displaystyle\beta^{*}_{t}$	$\displaystyle\coloneqq Nm\sigma_{\max}(\tilde{\mathcal{S}})U_{\max}^{2}.$

Corollary 8.

Consider the system (2.1a)-(2.1b), and suppose that Assumption 5, and 7 hold. Then:

(i)

For every time $t=0,N_{c},2N_{c},\cdots$ the optimization problem (OP3) solved in Subroutine 7 is convex, feasible, and equivalent to the following quadratically constrained quadratic optimization problem:

$\displaystyle\operatorname*{minimize}_{(\boldsymbol{\eta}_{t},\boldsymbol{\Theta}_{t})}\quad$	$\displaystyle 2\hat{x}_{t}^{\mathsf{T}}\mathcal{A}^{\mathsf{T}}\mathcal{Q}\mathcal{B}\boldsymbol{\eta}_{t}+2\mathsf{tr}\!\left(\mathcal{A}^{\mathsf{T}}\mathcal{Q}\mathcal{B}\boldsymbol{\Theta}_{t}\Lambda^{\varphi x}_{t}\right)+2\mathsf{tr}\!\left(\boldsymbol{\Theta}_{t}^{\mathsf{T}}\mathcal{B}^{\mathsf{T}}\mathcal{Q}\mathcal{D}\Lambda^{w\varphi}_{t}\right)$
	$\displaystyle+\boldsymbol{\eta}_{t}^{\mathsf{T}}\mathcal{M}_{1}\boldsymbol{\eta}_{t}+2\boldsymbol{\eta}_{t}^{\mathsf{T}}\mathcal{M}_{1}\boldsymbol{\Theta}_{t}\Lambda^{\varphi}_{t}+\mathsf{tr}\!\left(\boldsymbol{\Theta}_{t}^{\mathsf{T}}\mathcal{M}_{1}\boldsymbol{\Theta}_{t}\Lambda^{\varphi\varphi}_{t}\right)$
$\displaystyle\mathrm{subject\ to}\quad$	$\displaystyle\text{the structure of}\ \boldsymbol{\Theta}_{t}\ \text{in}\ \eqref{eq:decision-constraints},$
	$\displaystyle\|(\boldsymbol{\eta}_{t})_{i}\|+\left\lVert{(\boldsymbol{\Theta}_{t})_{i}}\right\rVert_{1}\varphi_{\max}\leqslant U_{\max}\quad\forall\,i=1,\cdots,Nm,$
	$\displaystyle\left\lVert{A_{2}^{\kappa}{\hat{x}}^{(2)}_{t}+\mathfrak{R}_{\kappa}(A_{2},B_{2})(\boldsymbol{\eta}_{t})_{1:\kappa m}}\right\rVert+\sqrt{n_{2}}\left\lVert{\mathfrak{R}_{\kappa}(A_{2},B_{2})(\boldsymbol{\Theta}_{t})_{1:\kappa m}}\right\rVert_{\infty}\varphi_{\max}$
	$\displaystyle\hskip 14.22636pt\leqslant\bigl{\\|}{\hat{x}}^{(2)}_{t}\bigr{\\|}-(\zeta+\tfrac{\varepsilon}{2})\;\;\text{whenever }\bigl{\\|}{\hat{x}}^{(2)}_{t}\bigr{\\|}\geqslant\zeta+\varepsilon,\ \forall i=1,\cdots,n_{2}$
	$\displaystyle\boldsymbol{\eta}_{t}^{\mathsf{T}}\mathcal{B}^{\mathsf{T}}\mathcal{S}\mathcal{B}\boldsymbol{\eta}_{t}+2\boldsymbol{\eta}_{t}^{\mathsf{T}}\mathcal{B}^{\mathsf{T}}\mathcal{S}\mathcal{B}\boldsymbol{\Theta}_{t}\Lambda^{\varphi}_{t}+\mathsf{tr}\!\left(\boldsymbol{\Theta}_{t}^{\mathsf{T}}\mathcal{B}^{\mathsf{T}}\mathcal{S}\mathcal{B}\boldsymbol{\Theta}_{t}\Lambda^{\varphi\varphi}_{t}\right)$
	$\displaystyle\quad+2\hat{x}_{t}^{\mathsf{T}}\mathcal{A}^{\mathsf{T}}\mathcal{S}\mathcal{B}\boldsymbol{\eta}_{t}+2\mathsf{tr}\!\left(\mathcal{A}^{\mathsf{T}}\mathcal{S}\mathcal{B}\boldsymbol{\Theta}_{t}\Lambda^{\varphi x}_{t}+\boldsymbol{\Theta}_{t}^{\mathsf{T}}\mathcal{B}^{\mathsf{T}}\mathcal{S}\mathcal{D}\Lambda^{w\varphi}_{t}\right)$
	$\displaystyle\quad+\mathcal{L}^{\mathsf{T}}\mathcal{B}(\boldsymbol{\eta}_{t}+\boldsymbol{\Theta}_{t}\Lambda^{\varphi}_{t})+\mathsf{tr}\!\left(\mathcal{A}^{\mathsf{T}}\mathcal{S}\mathcal{A}\mathbb{E}_{\mathcal{Y}_{t}}\left[x_{t}x_{t}^{\mathsf{T}}\right]\right)+\mathsf{tr}\!\left(\mathcal{D}^{\mathsf{T}}\mathcal{S}\mathcal{D}\Sigma_{w}\right)$
	$\displaystyle\quad+\mathcal{L}^{\mathsf{T}}\mathcal{A}\hat{x}_{t}\leqslant\alpha_{t},$	(3.8)
	$\displaystyle\boldsymbol{\eta}_{t}^{\mathsf{T}}\tilde{\mathcal{S}}\boldsymbol{\eta}_{t}+2\boldsymbol{\eta}_{t}^{\mathsf{T}}\tilde{\mathcal{S}}\boldsymbol{\Theta}_{t}\Lambda^{\varphi}_{t}+\mathsf{tr}\!\left(\boldsymbol{\Theta}_{t}^{\mathsf{T}}\tilde{\mathcal{S}}\boldsymbol{\Theta}_{t}\Lambda^{\varphi\varphi}_{t}\right)\leqslant\beta_{t}.$	(3.9)

(ii)

For every initial state and noise statistics $\bigl{(}(\hat{x}_{0},\Sigma_{x_{0}}),\Sigma_{w},\Sigma_{v}\bigr{)}$ , successive application of the control laws given by the optimization problem in (i) for $N_{c}$ steps renders the closed-loop system mean-square bounded in the following sense: there exists a constant $\gamma=\gamma(\mathcal{Y}_{0},\kappa,\hat{x}_{0},\Sigma_{x_{0}},\Sigma_{w},\Sigma_{v},U_{\max}^{*})>0$ such that

$\sup_{t\in\mathbb{N}}\;\mathbb{E}_{\mathcal{Y}_{0}}\bigl{[}\left\lVert{x_{t}}\right\rVert^{2}\bigr{]}\leqslant\gamma.$

4. Discussion

The optimization problem (OP2) being solved in Theorem 6 is a quadratic program (QP), whereas the optimization problem (OP3) being solved in Corollary (8) is a quadratically constrained quadratic program (QCQP) in the optimization parameters $(\eta,\Theta)$ . As such, both can be easily solved via software packages such as cvx [Grant and Boyd, 2000] or yalmip [Löfberg, 2004].

4.1. Other Constraints and Policies

It is not difficult to see that constraints on the variation of the inputs of the form

\left\lVert{Pu}\right\rVert_{\infty}\leqslant\Delta U_{\max},

where $P=\tiny\left[\begin{matrix}I&-I&0&\cdots&0\\ 0&I&-I&\cdots&0\\ \vdots&&&&\vdots\\ 0&\cdots&0&I&-I\end{matrix}\right]$ , can be incorporated into the optimization problems (OP2) and (OP3). Moreover, we can easily solve the problem using quadratic policies of the form

U_{t}=\eta+\Theta\varphi(Y_{t}-\hat{Y}_{t})+\tilde{\Theta}\tilde{\varphi}(Y_{t}-\hat{Y}_{t}),

(POL^′)

instead of (POL), where

{\tiny\tilde{\Theta}\coloneqq\left[\begin{matrix}\tilde{\theta}_{0,0}&0&\cdots&0\\ \tilde{\theta}_{1,0}&\tilde{\theta}_{1,1}&&\vdots\\ \vdots&\vdots&\ddots&0\\ \tilde{\theta}_{N-1,0}&\tilde{\theta}_{N-1,1}&\cdots&\tilde{\theta}_{N-1,N-1}\end{matrix}\right]},\ \tilde{\varphi}(z)\coloneqq{\tiny\left[\begin{matrix}\tilde{\varphi}_{0}(y_{t}-\hat{y}_{t})^{\mathsf{T}}\tilde{\varphi}_{0}(y_{t}-\hat{y}_{t})\\ \vdots\\ \tilde{\varphi}_{N-1}(y_{t+N-1}-\hat{y}_{t+N-1})^{\mathsf{T}}\tilde{\varphi}_{N-1}(y_{t+N-1}-\hat{y}_{t+N-1})\end{matrix}\right]},

and $\tilde{\theta}_{i,j}\in\mathbb{R}^{m\times 1}$ . The underlying optimization problems (OP2) and (OP3) with the policy (POL^′) are still convex and both Theorem (6) and Corollary (8) still apply with minor changes.

4.2. Off-Line Computation of the $\Lambda$ Matrices

At any time $t=0,N_{c},2N_{c},\cdots$ , our ability to solve the optimization problems (OP2) and (OP3) in Theorem 6 and Corollary 8, respectively, hinges upon our ability to compute the following matrices

$\displaystyle\Lambda^{\varphi e}_{t}$	$\displaystyle=\mathbb{E}_{\mathcal{Y}_{t}}[\varphi(Y_{t}-\hat{Y}_{t})(x_{t}-\hat{x}_{t})^{\mathsf{T}}],$	$\displaystyle\Lambda^{w\varphi}_{t}$	$\displaystyle=\mathbb{E}_{\mathcal{Y}_{t}}[W\varphi(Y_{t}-\hat{Y}_{t})^{\mathsf{T}}],$	(4.1)
$\displaystyle\Lambda^{\varphi}_{t}$	$\displaystyle=\mathbb{E}_{\mathcal{Y}_{t}}[\varphi(Y_{t}-\hat{Y}_{t})],$	$\displaystyle\Lambda^{\varphi\varphi}_{t}$	$\displaystyle=\mathbb{E}_{\mathcal{Y}_{t}}[\varphi(Y_{t}-\hat{Y}_{t})\varphi(Y_{t}-\hat{Y}_{t})^{\mathsf{T}}],$
$\displaystyle\Lambda^{\varphi x}_{t}$	$\displaystyle=\Lambda^{\varphi e}_{t}+\Lambda^{\varphi}_{t}\hat{x}_{t}^{\mathsf{T}}.$

Recall that $Y_{t}-\hat{Y}_{t}$ is the innovations sequence that was given in (2.13), and $\hat{x}_{t}$ is the optimal mean-square estimate of $x_{t}$ given the history $\mathcal{Y}_{t}$ . The matrices (4.1) may be computed by numerical integration with respect to the independent Gaussian measures of $w_{t},\ldots w_{t+N-1}$ , of $v_{t},\ldots$ $v_{t+N}$ , and of $x_{t}$ given $\mathcal{Y}^{t}$ . Due to the large dimensionality of the integration space, this approach may be impractical for online computations. One alternative approach relies on the observation that $\Lambda^{\varphi e}_{t},\Lambda^{\varphi}_{t},\Lambda^{w\varphi}_{t}$ , and $\Lambda^{\varphi\varphi}_{t}$ depend on $x_{t}$ via the difference $x_{t}-\hat{x}_{t}$ . Since $x_{t}-\hat{x}_{t}$ is conditionally zero-mean given $\mathcal{Y}^{t}$ , we can write the dependency of (4.1) on the time-varying statistics of $x_{t}$ given $\mathcal{Y}^{t}$ as follows:

		$\displaystyle\Lambda^{\varphi x}_{t}(\hat{x}_{t},P_{t\|t})=\Lambda^{\varphi e}_{t}(P_{t\|t})+\Lambda^{\varphi}_{t}(P_{t\|t})\hat{x}_{t}^{\mathsf{T}},$		(4.2)
		$\displaystyle\Lambda^{w\varphi}_{t}(P_{t\|t}),\text{ and }\Lambda^{\varphi\varphi}_{t}(P_{t\|t}).$		(4.2)

In principle one may compute off-line and store the matrices $\Lambda^{\varphi e}_{t}(P_{t|t}),\Lambda^{\varphi}_{t}(P_{t|t}),\Lambda^{w\varphi}_{t}(P_{t|t})$ , and $\Lambda^{\varphi\varphi}_{t}(P_{t|t})$ , which depend on the covariance matrices $P_{t|t}$ , and just update online the value of $\Lambda^{\varphi x}_{t}(\hat{x}_{t},P_{t|t})$ as the estimate $\hat{x}_{t}$ becomes available. However, this poses serious requirements in terms of memory. A more appealing alternative is to exploit the convergence properties of $P_{t|t}$ . The following result can be inferred, for instance, from [Kamen and Su, 1999, Theorem 5.1].

Proposition 9.

Under Assumptions 2 and 5-(iii)

•

the (discrete-time) algebraic Riccati equation in $P\in\mathbb{R}^{n\times n}$

$P=A[P-PC^{\mathsf{T}}(CPC^{\mathsf{T}}+\Sigma_{v})^{-1}CP]A^{\mathsf{T}}+\Sigma_{w}$ (4.3)

has a unique solution $P^{*}\geqslant 0$ , and
•

the sequence $P_{t+1|t}$ defined by (2.7) and (2.9) converges to $P^{*}$ as $t$ tends to $\infty$ , for any initial condition $P_{0|-1}\geqslant 0$ .

The assumption that $\Sigma_{v}>0$ can be relaxed to $\Sigma_{v}\geqslant 0$ at the price of some additional technicality (more on this can be found in [Ferrante et al., 2002]). As a consequence of this result, under detectability and stabilizability assumptions, $P_{t|t}$ converges to

P^{\circ}=P^{*}-P^{*}C^{\mathsf{T}}(CP^{*}C^{\mathsf{T}}+\Sigma_{v})^{-1}CP^{*},

(4.4)

which is the asymptotic error covariance matrix of the estimator $\hat{x}_{t}$ . Thus, neglecting the initial transient, it makes sense to just compute off-line and store the matrices $\Lambda^{\varphi e}_{t}(P^{\circ}),\Lambda^{\varphi}_{t}(P^{\circ}),\Lambda^{w\varphi}_{t}(P^{\circ})$ , and $\Lambda^{\varphi\varphi}_{t}(P^{\circ})$ , and just update the matrix $\Lambda^{\varphi x}_{t}$ for new values of the estimate $\hat{x}_{t}$ .

5. Proofs

The proofs of the main results are presented as follows: We begin by showing the result in Scholium 4. Then, we state a fundamental result pertaining to mean-square boundedness for general random sequences in Proposition 11, which is utilized to show the mean-square boundedness conclusions of Theorem 6 and Corollary 8. We proceed to show the first assertion (i) of Theorem 6 in Lemma 12. The proof of the second assertion (ii) of Theorem 6 starts by showing Lemmas 13 and 14 to conclude mean-square boundedness of the orthogonal subsystem $(A_{2},B_{2})$ . We conclude the proof of Theorem 6 by showing mean-square boundedness of the Schur stable subsystem $(A_{1},B_{1})$ . We end this section by proving the extra conclusions of Corollary 8, beyond those present in Theorem 6.

Let us now look at the estimation equation $\hat{x}_{t}(=\hat{x}_{t|t})$ in (2.6) and combine it with (2.8) and the output equation (2.1b) to obtain

{\hat{x}}_{t+1}=A{\hat{x}}_{t}+Bu_{t}+K_{t}\bigl{(}CA(x_{t}-{\hat{x}}_{t})+Cw_{t}+v_{t+1}\bigr{)},

(5.1)

where $K_{t}=(AP_{t|t}A^{\mathsf{T}}+\Sigma_{w})C^{\mathsf{T}}\bigl{(}C(AP_{t|t}A^{\mathsf{T}}+\Sigma_{w})C^{\mathsf{T}}+\Sigma_{v}\bigr{)}^{-1}$ is the Kalman gain. Our next Fact pertains to the boundedness of the error covariance matrices $P_{t|t}$ in (2.7).

Fact 10.

There exists ${T}^{\prime}\in\mathbb{N}$ and $\rho,\rho_{m}>0$ such that $\mathsf{tr}\!\left(P_{t|t}\right)\leqslant\rho$ for all $t\geqslant{T}^{\prime}$ , and $\left\lVert{K_{t}}\right\rVert\leqslant\rho_{m}$ for all $t\geqslant{T}^{\prime}$ .

Fact 10 follows, for example, immediately from Lemma 15 in Appendix B and since by assumption $\Sigma_{v}>0$ , one possible bound on $\bigl{(}\left\lVert{K_{t}}\right\rVert\bigr{)}_{t\geqslant{T}^{\prime}}$ is given by $\left\lVert{K_{t}}\right\rVert\leqslant\frac{\left\lVert{AP_{t|t}A^{\mathsf{T}}+\Sigma_{w}}\right\rVert\left\lVert{C^{\mathsf{T}}}\right\rVert}{\lambda_{\min}(\Sigma_{v})}$ $\leqslant\frac{\bigl{(}\left\lVert{\Sigma_{w}}\right\rVert+\left\lVert{A}\right\rVert^{2}\left\lVert{P_{t|t}}\right\rVert\bigr{)}\left\lVert{C^{\mathsf{T}}}\right\rVert}{\lambda_{\min}(\Sigma_{v})}$ . Note that there are many alternative bounds in the Literature, see, for example, [Anderson and Moore, 1979; Jazwinski, 1970]. Now, using the bounds in Fact 10, we can proceed to prove Scholium 4.

Proof of Scholium 4.

Recall the expression of $\Xi_{t}$ in (3.1) and define the following quantities:

$\displaystyle F^{e}_{\kappa t}$	$\displaystyle\coloneqq\begin{bmatrix}A^{\kappa-1}K_{\kappa t}CA&\cdots&AK_{\kappa(t+1)-2}CA&K_{\kappa(t+1)-1}CA\end{bmatrix},$	(5.2)
$\displaystyle F^{w}_{\kappa t}$	$\displaystyle\coloneqq\begin{bmatrix}A^{\kappa-1}K_{\kappa t}C&\cdots&AK_{\kappa(t+1)-2}C&K_{\kappa(t+1)-1}C\end{bmatrix},$
$\displaystyle F^{v}_{\kappa t}$	$\displaystyle\coloneqq\begin{bmatrix}A^{\kappa-1}K_{\kappa t}&\cdots&AK_{\kappa(t+1)-2}&K_{\kappa(t+1)-1}\end{bmatrix}.$

Then, $\Xi_{\kappa t}$ can be rewritten as

\Xi_{\kappa t}=F^{e}_{\kappa t}e_{\kappa t:\kappa(t+1)-1}+F^{w}_{\kappa t}w_{\kappa t:\kappa(t+1)-1}+F^{v}_{\kappa t}v_{\kappa t+1:\kappa(t+1)}.

(5.3)

But for $t\geqslant{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}$ , ¹¹1Recall that for any positive real number $s$ , $\lceil s\rceil$ denotes the smallest integer that upper-bounds $s$ . we have that

	$\displaystyle\left\lVert{F^{e}_{\kappa t}}\right\rVert$	$\displaystyle\leqslant\kappa\left\lVert{CA}\right\rVert\sup_{\ell\geqslant\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}\left\lVert{K_{\ell}}\right\rVert,$
	$\displaystyle\left\lVert{F^{w}_{\kappa t}}\right\rVert$	$\displaystyle\leqslant\kappa\left\lVert{C}\right\rVert\sup_{\ell\geqslant\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}\left\lVert{K_{\ell}}\right\rVert,$
	$\displaystyle\left\lVert{F^{v}_{\kappa t}}\right\rVert$	$\displaystyle\leqslant\kappa\sup_{\ell\geqslant\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}\left\lVert{K_{\ell}}\right\rVert.$

Using Fact 10 we know that $\sup_{\ell\geqslant\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}\left\lVert{K_{\ell}}\right\rVert\leqslant\rho_{h}$ . Thus, it suffices to take

\zeta=\kappa^{3/2}\rho_{h}\Bigl{(}\left\lVert{CA}\right\rVert\sqrt{\rho}+\left\lVert{C}\right\rVert\sqrt{\mathsf{tr}\!\left(\Sigma_{w}\right)}+\sqrt{\mathsf{tr}\!\left(\Sigma_{v}\right)}\Bigr{)}

in order to upper-bound the expectation of $\Xi(t)$ in (5.3) after time $T\coloneqq\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}$ . ∎

The following result pertains to mean-square boundedness of a random sequence $(\xi_{t})_{t\in\mathbb{N}}$ :

Proposition 11 ([Pemantle and Rosenthal, 1999, Theorem 1]).

Let $(\xi_{t})_{t\in\mathbb{N}}$ be a sequence of nonnegative random variables on some probability space $(\Omega,\mathfrak{F},\mathbb{P})$ , and let $(\mathfrak{F}_{t})_{t\in\mathbb{N}}$ be any filtration to which $(\xi_{t})_{t\in\mathbb{N}}$ is adapted. Suppose that there exist constants $b>0$ , and $J,M<\infty$ , such that

\xi_{0}\leqslant J,

and for all $t\in\mathbb{N}$ :

	$\displaystyle\mathbb{E}\bigl{[}\xi_{t+1}-\xi_{t}\big{\|}\mathfrak{F}_{t}\bigr{]}\leqslant-b\quad\text{on the event }\{\xi_{t}>J\},\quad\text{and}$		(5.4)
	$\displaystyle\mathbb{E}\bigl{[}\left\lvert{\xi_{t+1}-\xi_{t}}\right\rvert^{4}\big{\|}\xi_{0},\ldots,\xi_{t}\bigr{]}\leqslant M.$		(5.5)

Then there exists a constant $\gamma=\gamma(b,J,M)>0$ such that $\displaystyle{\sup_{t\in\mathbb{N}}\mathbb{E}\bigl{[}\xi_{t}^{2}\bigr{]}\leqslant\gamma}$ .

Lemma 12.

Consider the system (2.1a)-(2.1b), and suppose that Assumption 5 holds. Then the first assertion (i) of Theorem 6 holds.

Proof.

It is clear that $X_{t}^{\mathsf{T}}\mathcal{Q}X_{t}+U_{t}^{\mathsf{T}}\mathcal{R}U_{t}$ is convex quadratic in $X_{t}$ and $U_{t}$ , and both $X_{t}$ and $U_{t}$ are affine functions of the design parameters $(\boldsymbol{\eta}_{t},\boldsymbol{\Theta}_{t})$ for every realization of the noise $(w_{t})_{t\in\mathbb{N}}$ . Since taking expectation of a convex function retains convexity [Boyd and Vandenberghe, 2004], we conclude that $\mathbb{E}_{x_{t}}\bigl{[}X_{t}^{\mathsf{T}}\mathcal{Q}X_{t}+U_{t}^{\mathsf{T}}\mathcal{R}U_{t}\bigr{]}$ is convex in $(\boldsymbol{\eta}_{t},\boldsymbol{\Theta}_{t})$ . Also, note that the constraint sets described by (3.4) and (3.5) are convex in $(\boldsymbol{\eta}_{t},\boldsymbol{\Theta}_{t})$ . This settles the claim concerning convexity of the optimization program in Theorem 6-(i).

Concerning the objective function (3.3), we have that

	$\displaystyle\mathbb{E}_{\mathcal{Y}_{t}}\bigl{[}X_{t}^{\mathsf{T}}\mathcal{Q}X_{t}+U_{t}^{\mathsf{T}}\mathcal{R}U_{t}\bigr{]}$
	$\displaystyle\;=\mathbb{E}_{\mathcal{Y}_{t}}\bigl{[}\bigl{(}\mathcal{A}x_{t}+\mathcal{B}U_{t}+\mathcal{D}W_{t}\bigr{)}^{\mathsf{T}}\mathcal{Q}\bigl{(}\mathcal{A}x_{t}+\mathcal{B}U_{t}+\mathcal{D}W_{t}\bigr{)}+U_{t}^{\mathsf{T}}\mathcal{R}U_{t}\bigr{]}$
	$\displaystyle\;=\mathbb{E}_{\mathcal{Y}_{t}}\Bigl{[}\bigl{\\|}\mathcal{A}x_{t}+\mathcal{B}\bigl{(}\boldsymbol{\eta}_{t}+\boldsymbol{\Theta}_{t}\varphi(Y_{t}-\hat{Y}_{t})\bigr{)}+\mathcal{D}W_{t}\bigr{\\|}_{\mathcal{Q}}^{2}+\bigl{\\|}\boldsymbol{\eta}_{t}+\boldsymbol{\Theta}_{t}\varphi(Y_{t}-\hat{Y}_{t})\bigr{\\|}_{\mathcal{R}}^{2}\Bigr{]}$
	$\displaystyle\;=\mathbb{E}_{\mathcal{Y}_{t}}\Bigl{[}x_{t}^{\mathsf{T}}\mathcal{A}^{\mathsf{T}}\mathcal{Q}\mathcal{A}x_{t}+2x_{t}^{\mathsf{T}}\mathcal{A}^{\mathsf{T}}\mathcal{Q}\mathcal{B}\boldsymbol{\eta}_{t}+2x_{t}^{\mathsf{T}}\mathcal{A}^{\mathsf{T}}\mathcal{Q}\mathcal{B}\boldsymbol{\Theta}_{t}\varphi(Y_{t}-\hat{Y}_{t})+2x_{t}^{\mathsf{T}}\mathcal{A}^{\mathsf{T}}\mathcal{Q}\mathcal{D}W_{t}$
	$\displaystyle\qquad+\boldsymbol{\eta}_{t}^{\mathsf{T}}(\mathcal{B}^{\mathsf{T}}\mathcal{Q}\mathcal{B}+\mathcal{R})\boldsymbol{\eta}_{t}+2\boldsymbol{\eta}_{t}^{\mathsf{T}}(\mathcal{B}^{\mathsf{T}}\mathcal{Q}\mathcal{B}+\mathcal{R})\boldsymbol{\Theta}_{t}\varphi(Y_{t}-\hat{Y}_{t})+2\boldsymbol{\eta}_{t}^{\mathsf{T}}\mathcal{B}^{\mathsf{T}}\mathcal{Q}\mathcal{D}W_{t}$
	$\displaystyle\qquad+\varphi(Y_{t}-\hat{Y}_{t})^{\mathsf{T}}\boldsymbol{\Theta}_{t}^{\mathsf{T}}(\mathcal{B}^{\mathsf{T}}\mathcal{Q}\mathcal{B}+\mathcal{R})\boldsymbol{\Theta}_{t}\varphi(Y_{t}-\hat{Y}_{t})+2\varphi(Y_{t}-\hat{Y}_{t})^{\mathsf{T}}\boldsymbol{\Theta}_{t}^{\mathsf{T}}\mathcal{B}^{\mathsf{T}}\mathcal{Q}\mathcal{D}W_{t}$
	$\displaystyle\qquad+W_{t}^{\mathsf{T}}\mathcal{D}^{\mathsf{T}}\mathcal{Q}\mathcal{D}W_{t}\Bigr{]}$

Since $\mathbb{E}_{\{\mathcal{Y}_{t},x_{t}\}}\bigl{[}W_{t}\bigr{]}=0$ and in view of the definitions of the various matrices in (3.6) we have

	$\displaystyle V_{t}$	$\displaystyle=2\hat{x}_{t}^{\mathsf{T}}\mathcal{A}^{\mathsf{T}}\mathcal{Q}\mathcal{B}\boldsymbol{\eta}_{t}+2\mathsf{tr}\!\left(\mathcal{A}^{\mathsf{T}}\mathcal{Q}\mathcal{B}\boldsymbol{\Theta}_{t}\Lambda^{\varphi x}_{t}\right)+2\mathsf{tr}\!\left(\boldsymbol{\Theta}_{t}^{\mathsf{T}}\mathcal{B}^{\mathsf{T}}\mathcal{Q}\mathcal{D}\Lambda^{w\varphi}_{t}\right)$
		$\displaystyle\quad+\boldsymbol{\eta}_{t}^{\mathsf{T}}\mathcal{M}_{1}\boldsymbol{\eta}_{t}+2\boldsymbol{\eta}_{t}^{\mathsf{T}}\mathcal{M}_{1}\boldsymbol{\Theta}_{t}\Lambda^{\varphi}_{t}+\mathsf{tr}\!\left(\boldsymbol{\Theta}_{t}^{\mathsf{T}}\mathcal{M}_{1}\boldsymbol{\Theta}_{t}\Lambda^{\varphi\varphi}_{t}\right)$
		$\displaystyle\quad+\text{terms that do not depend on }(\boldsymbol{\eta}_{t},\boldsymbol{\Theta}_{t}).$

This tallies the objective function in (3.3) with the objective $V_{t}$ in (2.14).

Concerning the constraints, we have shown in [Hokayem et al., 2009; Chatterjee et al., 2009] that combining the constraint $\left\lVert{u_{t}}\right\rVert_{\infty}\leqslant U_{\max}$ and the class of policies (POL) is equivalent to the constraints $\left\lvert{(\boldsymbol{\eta}_{t})_{i}}\right\rvert+\left\lVert{(\boldsymbol{\Theta}_{t})_{i}}\right\rVert_{1}\varphi_{\max}\leqslant U_{\max}$ for all $i=1,\ldots,Nm$ , which accounts for the constraint (3.4). Substituting (POL) into the stability constraint (C2), we obtain

	$\displaystyle\left\lVert{A_{2}^{\kappa}{\hat{x}}^{(2)}_{t}+\mathfrak{R}_{\kappa}(A_{2},B_{2})(\boldsymbol{\eta}_{t})_{1:\kappa m}+\mathfrak{R}_{\kappa}(A_{2},B_{2})(\boldsymbol{\Theta}_{t})_{1:\kappa m}\varphi(W)}\right\rVert$
	$\displaystyle\qquad\leqslant\left\lVert{A_{2}^{\kappa}{\hat{x}}^{(2)}_{t}+\mathfrak{R}_{\kappa}(A_{2},B_{2})(\boldsymbol{\eta}_{t})_{1:\kappa m}}\right\rVert+\left\lVert{\mathfrak{R}_{\kappa}(A_{2},B_{2})(\boldsymbol{\Theta}_{t})_{1:\kappa m}\varphi(W)}\right\rVert$
	$\displaystyle\qquad\leqslant\left\lVert{A_{2}^{\kappa}{\hat{x}}^{(2)}_{t}+\mathfrak{R}_{\kappa}(A_{2},B_{2})(\boldsymbol{\eta}_{t})_{1:\kappa m}}\right\rVert+\sqrt{n_{2}}\left\lVert{\mathfrak{R}_{\kappa}(A_{2},B_{2})(\boldsymbol{\Theta}_{t})_{1:\kappa m}\varphi(W)}\right\rVert_{\infty}$
	$\displaystyle\qquad\leqslant\left\lVert{A_{2}^{\kappa}{\hat{x}}^{(2)}_{t}+\mathfrak{R}_{\kappa}(A_{2},B_{2})(\boldsymbol{\eta}_{t})_{1:\kappa m}}\right\rVert+\sqrt{n_{2}}\left\lVert{\mathfrak{R}_{\kappa}(A_{2},B_{2})(\boldsymbol{\Theta}_{t})_{1:\kappa m}}\right\rVert_{\infty}\varphi_{\max}.$

Accordingly, if condition constraint (3.5) is satisfied, then the stability constraint (C2) is satisfied as well.

It remains to show that the constraints are simultaneously feasible. Inspired by the work in [Ramponi et al., 2009], we consider the candidate controller

\tilde{u}_{t:t+\kappa-1}\coloneqq\left[\begin{matrix}\eta_{t}\\ \vdots\\ \eta_{t+\kappa-1}\end{matrix}\right]\coloneqq-\mathfrak{R}_{\kappa}(A_{2},B_{2})^{\dagger}\operatorname{sat}_{r}(A_{2}^{\kappa}\hat{x}_{t}^{(2)}),

i.e., with $\boldsymbol{\Theta}_{t}\equiv 0$ , where $r\coloneqq\zeta+\varepsilon/2$ . First, we have that

\left\lVert{\tilde{u}_{t:t+\kappa-1}}\right\rVert_{\infty}\leqslant\left\lVert{\tilde{u}_{t:t+\kappa-1}}\right\rVert_{2}\leqslant\sigma_{\min}(\mathfrak{R}_{\kappa}(A_{2},B_{2}))^{-1}(\zeta+\varepsilon/2)=U_{\max}^{*},

and the constraint (C1) is satisfied. Concerning constraint (C2), we have that

	$\displaystyle\left\lVert{A_{2}^{\kappa}\hat{x}_{t}^{(2)}+\mathfrak{R}_{\kappa}(A_{2},B_{2})\tilde{u}_{t:t+\kappa-1}}\right\rVert$	$\displaystyle=\left\lVert{\hat{x}_{t}^{(2)}}\right\rVert-r$
		$\displaystyle=\left\lVert{\hat{x}_{t}^{(2)}}\right\rVert-(\zeta+\varepsilon/2),\ \ \text{whenever }\bigl{\\|}{\hat{x}}^{(2)}_{t}\bigr{\\|}\geqslant\zeta+\varepsilon,$		(5.6)

where the first equality follows from the orthogonality of $A_{2}$ (see [Ramponi et al., 2009]), and the constraint (3.5) is also satisfied. The optimization program (3.3) subject to (3.4)-(3.5) is therefore a quadratic program that is equivalent to (OP2). ∎

Lemma 13.

Consider the system (2.1a)-(2.1b), and suppose that Assumption 5 holds. Then there exist constants $b,J>0$ such that²²2For $z\in\mathbb{R}$ the notation $\left\lceil{z}\right\rceil$ stands for the smallest integer greater or equal to $z$ .

\mathbb{E}_{\mathcal{Y}_{\kappa t}}\Bigl{[}\bigl{\|}{\hat{x}}^{(2)}_{\kappa(t+1)}\bigr{\|}-\bigl{\|}{\hat{x}}^{(2)}_{\kappa t}\bigr{\|}\Bigr{]}\leqslant-b\quad\text{on the set }\bigl{\{}\bigl{\|}{\hat{x}}^{(2)}_{\kappa t}\bigr{\|}>J\bigr{\}}\qquad\text{for all }t\geqslant{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}.

Proof.

Let $\Pi^{(2)}$ be a projection operator that picks the last $n_{2}$ components of any vector in $\mathbb{R}^{n}$ , and consider the subsampled process $({\hat{x}}^{(2)}_{t\kappa})_{t\in\mathbb{N}}$ given by

{\hat{x}}^{(2)}_{\kappa(t+1)}=A_{2}^{\kappa}{\hat{x}}^{(2)}_{\kappa t}+\mathfrak{R}_{\kappa}(A_{2},B_{2})u_{\kappa t:\kappa(t+1)-1}+\Pi^{(2)}\bigl{(}\Xi_{\kappa t}\bigr{)}.

(5.7)

By utilizing the triangle inequality we get

	$\displaystyle\mathbb{E}_{\mathcal{Y}_{\kappa t}}\Bigl{[}\left\lVert{{\hat{x}}^{(2)}_{\kappa(t+1)}}\right\rVert-\left\lVert{{\hat{x}}^{(2)}_{\kappa t}}\right\rVert\Bigr{]}$
	$\displaystyle\;\leqslant\mathbb{E}_{\mathcal{Y}_{\kappa t}}\Bigl{[}\Bigl{\\|}A_{2}^{\kappa}{\hat{x}}^{(2)}_{\kappa t}+\mathfrak{R}_{\kappa}(A_{2},B_{2})u_{\kappa t:\kappa(t+1)-1}\Bigr{\\|}-\left\lVert{{\hat{x}}^{(2)}_{\kappa t}}\right\rVert\Bigr{]}+\mathbb{E}_{\mathcal{Y}_{\kappa t}}\Bigl{[}\left\lVert{\Pi^{(2)}\bigl{(}\Xi_{\kappa t}\bigr{)}}\right\rVert\Bigr{]}.$

We know from Scholium 4 that there exists a uniform (with respect to time $t$ ) upper bound $\zeta$ for the last term on the right-hand side of the preceding inequality for $t\geqslant\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}$ . We rewrite the inequality as

	$\displaystyle\mathbb{E}_{\mathcal{Y}_{\kappa t}}\Bigl{[}\left\lVert{{\hat{x}}^{(2)}_{\kappa(t+1)}}\right\rVert-\left\lVert{{\hat{x}}^{(2)}_{\kappa t}}\right\rVert\Bigr{]}$	$\displaystyle\leqslant\mathbb{E}_{\mathcal{Y}_{\kappa t}}\Bigl{[}\left\lVert{A_{2}^{\kappa}{\hat{x}}^{(2)}_{\kappa t}+\mathfrak{R}_{\kappa}(A_{2},B_{2})u_{\kappa t:\kappa(t+1)-1}}\right\rVert-\left\lVert{{\hat{x}}^{(2)}_{\kappa t}}\right\rVert\Bigr{]}+\zeta$
		$\displaystyle\leqslant-\frac{\varepsilon}{2},\qquad\text{whenever }\bigl{\\|}{\hat{x}}^{(2)}_{t}\bigr{\\|}\geqslant\zeta+\varepsilon,$

where the last inequality follows from (5.6). Setting $b=\varepsilon/2$ and $J=\zeta+\varepsilon$ completes the proof. ∎

Lemma 14.

Consider the system (2.1a)-(2.1b), and suppose that Assumption 5 holds. Then there exists a constant $M>0$ such that

\mathbb{E}\Bigl{[}\left\lvert{\left\lVert{{\hat{x}}^{(2)}_{\kappa(t+1)}}\right\rVert-\left\lVert{{\hat{x}}^{(2)}_{\kappa t}}\right\rVert}\right\rvert^{4}\,\Big{|}\,\left\lVert{{\hat{x}}^{(2)}_{\kappa i}}\right\rVert,\;i={\left\lceil{{T}^{\prime}/\kappa}\right\rceil},\ldots,t\Bigr{]}\leqslant M\qquad\text{for all }t\geqslant{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}.

Proof.

Fix $t\geqslant{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}$ , and consider again the subsampled process in (5.7). We have, by orthogonality of $A_{2}$ , that $\left\lVert{{\hat{x}}^{(2)}_{\kappa t}}\right\rVert=\left\lVert{A_{2}^{\kappa}{\hat{x}}^{(2)}_{\kappa t}}\right\rVert$ and, by the triangle inequality, that

	$\displaystyle\mathbb{E}\Bigl{[}\left\lvert{\left\lVert{{\hat{x}}^{(2)}_{\kappa(t+1)}}\right\rVert-\left\lVert{{\hat{x}}^{(2)}_{\kappa t}}\right\rVert}\right\rvert^{4}\,\Big{\|}\,\left\lVert{{\hat{x}}^{(2)}_{\kappa i}}\right\rVert,\;i={\left\lceil{{T}^{\prime}/\kappa}\right\rceil},\ldots,t\Bigr{]}$
	$\displaystyle=\mathbb{E}\Bigl{[}\left\lvert{\left\lVert{{\hat{x}}^{(2)}_{\kappa(t+1)}}\right\rVert-\left\lVert{A_{2}^{\kappa}{\hat{x}}^{(2)}_{\kappa t}}\right\rVert}\right\rvert^{4}\,\Big{\|}\,\left\lVert{{\hat{x}}^{(2)}_{\kappa i}}\right\rVert,\;i={\left\lceil{{T}^{\prime}/\kappa}\right\rceil},\ldots,t\Bigr{]}$
	$\displaystyle\leqslant\mathbb{E}\Bigl{[}\left\lVert{A_{2}^{\kappa}{\hat{x}}^{(2)}_{\kappa t}+\mathfrak{R}_{\kappa}(A_{2},B_{2})u_{\kappa t:\kappa(t+1)-1}+\Xi_{\kappa t}-A_{2}^{\kappa}{\hat{x}}^{(2)}_{\kappa t}}\right\rVert^{4}\,\Big{\|}\,\left\lVert{{\hat{x}}^{(2)}_{\kappa i}}\right\rVert,\;i={\left\lceil{{T}^{\prime}/\kappa}\right\rceil},\ldots,t\Bigr{]}$
	$\displaystyle=\mathbb{E}\Bigl{[}\Bigl{(}\left\lVert{\mathfrak{R}_{\kappa}(A_{2},B_{2})u_{\kappa t:\kappa(t+1)-1}}\right\rVert+\left\lVert{\Xi_{\kappa t}}\right\rVert\Bigr{)}^{4}\,\Big{\|}\,\left\lVert{{\hat{x}}^{(2)}_{\kappa i}}\right\rVert,\;i={\left\lceil{{T}^{\prime}/\kappa}\right\rceil},\ldots,t\Bigr{]}.$

Recall that for any two positive numbers $a$ and $b$ , $(a+b)^{2}\leqslant 2a^{2}+2b^{2}$ . Using this latter fact, we arrive at the following upper bound

	$\displaystyle\mathbb{E}\Bigl{[}\left\lvert{\left\lVert{{\hat{x}}^{(2)}_{\kappa(t+1)}}\right\rVert-\left\lVert{{\hat{x}}^{(2)}_{\kappa t}}\right\rVert}\right\rvert^{4}\,\Big{\|}\,\left\lVert{{\hat{x}}^{(2)}_{\kappa i}}\right\rVert,\;i={\left\lceil{{T}^{\prime}/\kappa}\right\rceil},\ldots,t\Bigr{]}$
	$\displaystyle\leqslant 8\mathbb{E}\Bigl{[}\left\lVert{\mathfrak{R}_{\kappa}(A_{2},B_{2})u_{\kappa t:\kappa(t+1)-1}}\right\rVert^{4}+\left\lVert{\Xi_{\kappa t}}\right\rVert^{4}\,\Big{\|}\,\left\lVert{{\hat{x}}^{(2)}_{\kappa i}}\right\rVert,\;i={\left\lceil{{T}^{\prime}/\kappa}\right\rceil},\ldots,t\Bigr{]}.$

By design $\left\lVert{u_{i}}\right\rVert_{\infty}\leqslant U_{\max}$ and $\Xi_{\kappa t}$ is Gaussian and independent of $\Bigl{\{}\left\lVert{{\hat{x}}^{(2)}_{\kappa i}}\right\rVert,\;i={\left\lceil{{T}^{\prime}/\kappa}\right\rceil},\ldots,t\Bigr{\}}$ and has its fourth moment bounded. Therefore, we can easily infer that there exists an $M>0$ such that

\mathbb{E}\Bigl{[}\Bigl{(}\left\lVert{\mathfrak{R}_{\kappa}(A_{2},B_{2})u_{\kappa t:\kappa(t+1)-1}}\right\rVert+\left\lVert{\Xi_{\kappa t}}\right\rVert\Bigr{)}^{4}\,\Big{|}\,\left\lVert{{\hat{x}}^{(2)}_{\kappa i}}\right\rVert,\;i={\left\lceil{{T}^{\prime}/\kappa}\right\rceil},\ldots,t\Bigr{]}\leqslant M

for all $t\geqslant{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}$ . ∎

We are finally ready to prove Theorem 6.

Proof of Theorem 6.

Claim (i) of Theorem 6 was proved in Lemma 12. It remains to show claim (ii). We start by asserting the following inequality

\mathbb{E}_{\mathcal{Y}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\bigl{[}\left\lVert{x_{t}}\right\rVert^{2}\bigr{]}\leqslant 2\mathbb{E}_{\mathcal{Y}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\bigl{[}\left\lVert{x_{t}-\hat{x}_{t}}\right\rVert^{2}\bigr{]}+2\mathbb{E}_{\mathcal{Y}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\bigl{[}\left\lVert{\hat{x}_{t}}\right\rVert^{2}\bigr{]}.

We know from Fact 10 that $\mathbb{E}_{\mathcal{Y}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\bigl{[}\left\lVert{x_{t}-\hat{x}_{t}}\right\rVert^{2}\bigr{]}\leqslant\rho$ for all $t\geqslant\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}$ . As such, if we are able to show that the state of the estimator is mean-square bounded, we can immediately infer a mean-square bound on the state of the plant.

We first start by splitting the squared norm of the estimator state as $\left\lVert{\hat{x}_{t}}\right\rVert^{2}=\bigl{\|}\hat{x}_{t}^{(1)}\bigr{\|}^{2}+\bigl{\|}\hat{x}_{t}^{(2)}\bigr{\|}^{2}$ , where $\hat{x}^{(1)}$ and $\hat{x}^{(2)}$ are states corresponding to the Schur and orthogonal parts of the system, respectively. It then follows that

\mathbb{E}_{\mathcal{Y}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\bigl{[}\left\lVert{\hat{x}_{t}}\right\rVert^{2}\bigr{]}=\mathbb{E}_{\mathcal{Y}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\bigl{[}\bigl{\|}\hat{x}_{t}^{(1)}\bigr{\|}^{2}\bigr{]}+\mathbb{E}_{\mathcal{Y}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\bigl{[}\bigl{\|}\hat{x}_{t}^{(2)}\bigr{\|}^{2}\bigr{]},\qquad t\in\mathbb{N}.

(5.8)

Letting $\xi_{t}\coloneqq\bigl{\|}{\hat{x}}^{(2)}_{t}\bigr{\|}$ for $t\in\mathbb{N}$ , we see from Lemma 13 and Lemma 14 the conditions (5.4) and (5.5) of Proposition 11 are verified for the sequence $(\xi_{\kappa t})_{t\geqslant{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}$ . Thus, by Proposition 11, there exists a constant $\gamma^{(2)}=\gamma^{(2)}(b,J,M)>0$ such that

\mathbb{E}_{\mathcal{Y}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\bigl{[}\bigl{\|}{\hat{x}}^{(2)}_{\kappa t}\bigr{\|}^{2}\bigr{]}\leqslant\gamma^{(2)}\qquad\text{for all }t\geqslant{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}.

Hence, the orthogonal subsystem is mean-square bounded.

Since the matrix $A_{1}$ is Schur stable, we know [Bernstein, 2009, Proposition 11.10.5] that there exists a positive definite matrix $M\in\mathbb{R}^{n_{1}\times n_{1}}$ that satisfies $A_{1}^{\mathsf{T}}MA_{1}-M=-I$ . It easily follows that there exists $\rho\in\;]0,1[$ such that $A_{1}^{\mathsf{T}}MA_{1}-M\leqslant-\rho M$ ; in fact, $\rho$ can be chosen from the interval $\bigl{]}0,\min\bigl{\{}1,\lambda_{\min}\bigl{(}I\bigr{)}/\lambda_{\max}\bigl{(}M\bigr{)}\bigr{\}}\bigr{[}$ . Therefore, we see that for any $t\geqslant{T}^{\prime}$ ,

\displaystyle\mathbb{E}_{\mathcal{Y}_{t}}

\displaystyle\left[\left\lVert{{\hat{x}}^{(1)}_{t+1}}\right\rVert^{2}_{M}\right]-\left\lVert{{\hat{x}}^{(1)}_{t}}\right\rVert^{2}_{M}\leqslant-\rho\left\lVert{{\hat{x}}^{(1)}_{t}}\right\rVert^{2}_{M}+2\mathbb{E}_{\mathcal{Y}_{t}}\bigl{[}({\hat{x}}^{(1)}_{t})^{\mathsf{T}}A_{1}^{\mathsf{T}}MB_{1}u^{(1)}_{t}\bigr{]}+\mathbb{E}_{\mathcal{Y}_{t}}\bigl{[}\Pi^{(1)}(\Xi_{t})\bigr{]},

where $\Pi^{(1)}$ is the projection onto the first $n_{1}$ coordinates and $\Xi_{t}$ was defined in (3.1). Young’s inequality shows that $2\mathbb{E}_{\mathcal{Y}_{t}}\bigl{[}({\hat{x}}^{(1)}_{t})^{\mathsf{T}}A_{1}^{\mathsf{T}}MB_{1}u^{(1)}_{t}\bigr{]}\leqslant\epsilon\mathbb{E}_{\mathcal{Y}_{t}}\bigl{[}\bigl{\|}A_{1}{\hat{x}}^{(1)}_{t}\bigr{\|}_{M}^{2}\bigr{]}+\frac{1}{\epsilon}\mathbb{E}_{\mathcal{Y}_{t}}\bigl{[}\bigl{\|}B_{1}u^{(1)}_{t}\bigr{\|}_{M}^{2}\bigr{]}$ for $\varepsilon>0$ . Choosing $\epsilon=3\rho/(2(1-\rho))$ and utilizing Fact 10 and the upper bound on the inputs $U_{\max}$ , it follows that for all $t\geqslant{T}^{\prime}$ ,

\displaystyle\mathbb{E}_{\mathcal{Y}_{t}}

\displaystyle\left[\left\lVert{{\hat{x}}^{(1)}_{t+1}}\right\rVert^{2}_{M}\right]-(1-\tfrac{\rho}{2})\left\lVert{{\hat{x}}^{(1)}_{t}}\right\rVert^{2}_{M}\leqslant\frac{2(1-\rho)}{3\rho}\lambda_{\max}\bigl{(}M\bigr{)}\left\lVert{B_{1}}\right\rVert^{2}U_{\max}^{2}n_{1}+\zeta=:c.

Therefore, for $t\geqslant{T}^{\prime}$ , we have that

	$\displaystyle\mathbb{E}_{\mathcal{Y}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\left[\left\lVert{{\hat{x}}^{(1)}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}+2}}\right\rVert^{2}_{M}\right]$	$\displaystyle=\mathbb{E}_{\mathcal{Y}_{\kappa{T}^{\prime}}}\left[\mathbb{E}_{\mathcal{Y}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}+1}}\left[\left\lVert{{\hat{x}}^{(1)}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}+2}}\right\rVert^{2}_{M}\right]\right]$
		$\displaystyle\leqslant\mathbb{E}_{\mathcal{Y}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\left[\mathbb{E}_{\mathcal{Y}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}+1}}\left[(1-\tfrac{\rho}{2})\left\lVert{{\hat{x}}^{(1)}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}+1}}\right\rVert^{2}_{M}+c\right]\right]$
		$\displaystyle\leqslant(1-\tfrac{\rho}{2})^{2}\left\lVert{{\hat{x}}^{(1)}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\right\rVert^{2}_{M}+\bigl{(}1+(1-\tfrac{\rho}{2})\bigr{)}c.$

Iterating the last inequality, it follows that

\mathbb{E}_{\mathcal{Y}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\left[\left\lVert{{\hat{x}}^{(1)}_{t+\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\right\rVert^{2}_{M}\right]\leqslant(1-\tfrac{\rho}{2})^{t}\left\lVert{{\hat{x}}^{(1)}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\right\rVert^{2}_{M}+\sum_{i=0}^{t-1}(1-\tfrac{\rho}{2})^{i}c,\qquad t\geqslant{T}^{\prime},

\displaystyle\mathbb{E}_{\mathcal{Y}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}

\displaystyle\left[\left\lVert{{\hat{x}}^{(1)}_{t+\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\right\rVert^{2}_{M}\right]\leqslant\left\lVert{{\hat{x}}^{(1)}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\right\rVert^{2}_{M}+(1-\tfrac{\rho}{2})^{-1}c,\qquad t\geqslant{T}^{\prime}.

We can conclude that there exists $\gamma^{(1)}=\gamma^{(1)}({\hat{x}}_{0})>0$ such that

\mathbb{E}_{\mathcal{Y}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\bigl{[}\bigl{\|}{\hat{x}}^{(1)}_{t}\bigr{\|}^{2}\bigr{]}\leqslant\gamma^{(1)}\qquad\text{for all }t\geqslant\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}.

(5.9)

We can therefore conclude that

\mathbb{E}_{\mathcal{Y}_{\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}}}\bigl{[}\bigl{\|}{\hat{x}}_{t}\bigr{\|}^{2}\bigr{]}\leqslant\gamma^{(1)}+\gamma^{(2)}\qquad\text{for all }t\geqslant\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}.

Since the sequence $(x_{t})_{t\in\mathbb{N}}$ in (2.1a) is generated through the addition of independent mean-square bounded random variables and a bounded control input, and since $\kappa{\left\lceil{{T}^{\prime}/\kappa}\right\rceil}<\infty$ , it follows that there exists $\gamma>0$ such that

\mathbb{E}_{\mathcal{Y}_{0}}\bigl{[}\left\lVert{x_{t}}\right\rVert^{2}\bigr{]}\leqslant\gamma\qquad\text{for all }t\in\mathbb{N},

establishing the second claim (ii) of Theorem 6. ∎

Proof of Corollary 8.

The proof of Corollary 8 follows exactly the same reasoning as in the proof of Theorem 6, up to the constraints in (3.8) and (3.9). Also, rewriting the constraints (C3) and (C4) as (3.8) and (3.9), respectively, can be done similarly to the way we rewrote the cost in Theorem (6). It remains to show the upper bounds $\alpha^{*}$ and $\beta^{*}$ in Assumption (7).

The constraint (C3) can be upper-bounded as follows:

	$\displaystyle\mathbb{E}_{\mathcal{Y}_{t}}\left[X_{t}^{\mathsf{T}}\mathcal{S}X_{t}+\mathcal{L}^{\mathsf{T}}X_{t}\right]$
	$\displaystyle\quad=\mathbb{E}_{\mathcal{Y}_{t}}\left[\left\lVert{\mathcal{A}x_{t}+\mathcal{B}U_{t}+\mathcal{D}W_{t}}\right\rVert_{\mathcal{S}}^{2}+\mathcal{L}^{\mathsf{T}}(\mathcal{A}x_{t}+\mathcal{B}U_{t}+\mathcal{D}W_{t})\right]$
	$\displaystyle\quad\leqslant 3\mathbb{E}_{\mathcal{Y}_{t}}\left[\left\lVert{\mathcal{A}x_{t}}\right\rVert^{2}_{\mathcal{S}}+\left\lVert{\mathcal{B}U_{t}}\right\rVert_{\mathcal{S}}^{2}+\left\lVert{\mathcal{D}W_{t}}\right\rVert_{\mathcal{S}}^{2}\right]+\mathcal{L}^{\mathsf{T}}\mathcal{A}\hat{x}_{t}+\|\mathcal{L}^{\mathsf{T}}\mathcal{B}U_{t}\|$
	$\displaystyle\quad\leqslant 3\mathsf{tr}\!\left(\mathcal{A}^{\mathsf{T}}\mathcal{S}\mathcal{A}\mathbb{E}_{\mathcal{Y}_{t}}\left[x_{t}x_{t}^{\mathsf{T}}\right]+\mathcal{D}^{\mathsf{T}}\mathcal{S}\mathcal{D}\Sigma_{w}\right)+\mathcal{L}^{\mathsf{T}}\mathcal{A}\hat{x}_{t}$
	$\displaystyle\quad\quad 3Nm\sigma_{\max}(\mathcal{B}^{\mathsf{T}}\mathcal{S}\mathcal{B})U_{\max}^{2}++\left\lVert{\mathcal{L}^{\mathsf{T}}\mathcal{B}}\right\rVert_{1}U_{\max},$

where the first inequality follows from the fact that $(a+b+c)^{2}\leqslant 3(a^{2}+b^{2}+c^{2})$ , for any $a,b,c>0$ , and the noise being zero-mean, the second inequality follows from applying norm bounds between the 2 and $\infty$ -norms and Hölder’s inequality. As for the constraint (C4), it can be upper-bounded as follows:

\displaystyle\mathbb{E}_{\mathcal{Y}_{t}}\left[U_{t}^{\mathsf{T}}\tilde{\mathcal{S}}U_{t}\right]\leqslant\sigma_{\max}(S)\left\lVert{U_{t}}\right\rVert^{2}\leqslant Nm\sigma_{\max}(S)\left\lVert{U_{t}}\right\rVert_{\infty}^{2}\leqslant Nm\sigma_{\max}(S)U_{\max}^{2}.

This completes the proof. ∎

6. Examples

Consider the system (2.1a)-(2.1b) with the following matrices:

A=\left[\begin{matrix}0.5&0&0\\ 0&0&-1\\ 0&1&0\end{matrix}\right],B=\left[\begin{matrix}1\\ 0\\ 1\end{matrix}\right],\text{ and }C=I.

The simulation data was chosen to be $x_{0}\sim\mathcal{N}(0,I)$ , $w_{t}\sim\mathcal{N}(0,10I)$ , $v_{t}=\mathcal{N}(0,10I)$ , $Q=I$ , $R=1$ , $N=5$ , $N_{c}=\kappa=2$ , and $\varphi$ the usual piecewise linear saturation function with $\varphi_{\max}=1$ . For this example the theoretical bound on the input is $U^{*}_{\max}=168.6783$ for a choice of $\epsilon=10$ .

We simulated the system for the discrete-time interval $[0,200]$ using Algorithm 1, without the constraints (C3) and (C4). The coding was done using MATLAB and the optimization problem was solved using yalmip [Löfberg, 2004] and sdpt3 [Toh et al., 1999]. The computation of the matrices $\Lambda^{\varphi e}(P^{\circ})$ , $\Lambda^{\varphi}(P^{\circ})$ , $\Lambda^{w\varphi}(P^{\circ})$ , and $\Lambda^{\varphi\varphi}(P^{\circ})$ was done off-line using the steady state error covariance matrix $P^{\circ}$ , as discussed in the previous section, via classical Monte Carlo integration [Robert and Casella, 2004, Section 3.2] using $10^{5}$ samples.

The norms of the state trajectories for 200 different sample paths of the process and measurement noises are plotted in Figure 2 starting from $x_{0}=\left[\begin{matrix}97.38&100.19&99.78\end{matrix}\right]^{T}$ . The average state norm as well as the standard deviation of the state norm are depicted in Figure 3, where it is clear that the proposed controller renders the system mean-square bounded. The average total cost normalized by time for this simulation is plotted in Figure 4.

7. Conclusions

We presented a method for stochastic receding horizon control of discrete-time linear systems with process and measurement noise and bounded input policies. We showed that the optimization problem solved periodically is successively feasible and convex. Moreover, we illustrated how a certain stability condition can be utilized to show that the application of the receding horizon controller renders the state of the system mean-square bounded. We discussed how certain matrices in the cost function can be computed off-line and provided an example that illustrates our approach.

References

Agarwal et al. [2009] Agarwal, M., Cinquemani, E., Chatterjee, D., Lygeros, J., 2009. On convexity of stochastic optimization problems with constraints. In: European Control Conference. pp. 2827–2832.
Anderson and Moore [1979] Anderson, B., Moore, J., 1979. Optimal Filtering. Prentice-Hall.
Batina [2004] Batina, I., 2004. Model predictive control for stochastic systems by randomized algorithms. Ph.D. thesis, Technische Universiteit Eindhoven.
Bemporad and Morari [1999] Bemporad, A., Morari, M., 1999. Robust model predictive control: a survey. Robustness in Identification and Control 245, 207–226.
Ben-Tal et al. [2006] Ben-Tal, A., Boyd, S., Nemirovski, A., 2006. Extending scope of robust optimization: Comprehensive robust counterparts of uncertain problems. Journal of Mathematical Programming 107, 63–89.
Ben-Tal et al. [2004] Ben-Tal, A., Goryashko, A., Guslitzer, E., Nemirovski, A., 2004. Adjustable robust solutions of uncertain linear programs. Mathematical Programming 99 (2), 351–376.
Bernstein [2009] Bernstein, D. S., 2009. Matrix Mathematics, 2nd Edition. Princeton University Press.
Bertsekas [2000] Bertsekas, D. P., 2000. Dynamic Programming and Optimal Control, 2nd Edition. Vol. 1. Athena Scientific.
Bertsekas [2005] Bertsekas, D. P., 2005. Dynamic programming and suboptimal control: a survey from ADP to MPC. European Journal of Control 11 (4-5).
Bertsekas [2007] Bertsekas, D. P., 2007. Dynamic Programming and Optimal Control, 3rd Edition. Vol. 2. Athena Scientific.
Bertsimas and Brown [2007] Bertsimas, D., Brown, D. B., 2007. Constrained stochastic LQC: a tractable approach. IEEE Transactions on Automatic Control 52 (10), 1826–1841.
Bertsimas et al. [2009] Bertsimas, D., Iancu, D. A., Parrilo, P. A., 2009. Optimality of affine policies in multi-stage robust optimizationPreprint available at: http://arxiv.org/abs/0904.3986.
Blackmore and Williams [2007] Blackmore, L., Williams, B. C., July 2007. Optimal, robust predictive control of nonlinear systems under probabilistic uncertainty using particles. In: Proceedings of the American Control Conference. pp. 1759 – 1761.
Boyd and Vandenberghe [2004] Boyd, S., Vandenberghe, L., 2004. Convex Optimization. Cambridge University Press, Cambridge, sixth printing with corrections, 2008.
Cannon et al. [2009] Cannon, M., Kouvaritakis, B., Wu, X., July 2009. Probabilistic constrained MPC for systems with multiplicative and additive stochastic uncertainty. IEEE Transactions on Automatic Control 54 (7), 1626–1632.
Chatterjee et al. [2009] Chatterjee, D., Hokayem, P., Lygeros, J., 2009. Stochastic receding horizon control with bounded control inputs: a vector space approach. IEEE Transactions on Automatic ControlUnder review. http://arxiv.org/abs/0903.5444.
Ferrante et al. [2002] Ferrante, A., Picci, G., Pinzoni, S., 2002. Silverman algorithm and the structure of discrete-time stochastic systems. Linear Algebra and its Applications 351/352, 219–242.
Goulart et al. [2006] Goulart, P. J., Kerrigan, E. C., Maciejowski, J. M., 2006. Optimization over state feedback policies for robust control with constraints. Automatica 42 (4), 523–533.
Grant and Boyd [2000] Grant, M., Boyd, S., December 2000. CVX: Matlab software for disciplined convex programming (web page and software). http://stanford.edu/~boyd/cvx.
Hokayem et al. [2009] Hokayem, P., Chatterjee, D., Lygeros, J., 2009. On stochastic model predictive control with bounded control inputs. In: Proceedings of the combined 48th IEEE Conference on Decision & Control and 28th Chinese Control Conference. pp. 6359–6364, available at http://arxiv.org/abs/0902.3944.
Horn and Johnson [1990] Horn, R. A., Johnson, C. R., 1990. Matrix Analysis. Cambridge University Press, Cambridge.
Jazwinski [1970] Jazwinski, A. H., 1970. Stochastic Processes and Filtering Theory. Academic Press.
Kamen and Su [1999] Kamen, E. W., Su, J. K., 1999. Introduction to Optimal Estimation. Springer, London, UK.
Klein Haneveld [1983] Klein Haneveld, W. K., 1983. On integrated chance constraints. In: Stochastic programming (Gargnano). Vol. 76 of Lecture Notes in Control and Inform. Sci. Springer, Berlin, pp. 194–209.
Klein Haneveld and van der Vlerk [2006] Klein Haneveld, W. K., van der Vlerk, M. H., 2006. Integrated chance constraints: reduced forms and an algorithm. Computational Management Science 3 (4), 245–269.
Kumar and Varaiya [1986] Kumar, P. R., Varaiya, P., 1986. Stochastic Systems: Estimation, Identification, and Adaptive Control. Prentice Hall.
Lazar et al. [2007] Lazar, M., Heemels, W. P. M. H., Bemporad, A., Weiland, S., 2007. Discrete-time non-smooth nonlinear MPC: stability and robustness. In: Lecture Notes in Control and Information Sciences. Vol. 358. Springer-Verlag, pp. 93–103.
Löfberg [2003] Löfberg, J., 2003. Minimax Approaches to Robust Model Predictive Control. Ph.D. thesis, Linköpings Universitet.
Löfberg [2004] Löfberg, J., 2004. YALMIP : A Toolbox for Modeling and Optimization in MATLAB. In: Proceedings of the CACSD Conference. Taipei, Taiwan.
Maciejowski [2001] Maciejowski, J. M., 2001. Predictive Control with Constraints. Prentice Hall.
Maciejowski et al. [2005] Maciejowski, M., Lecchini, A., Lygeros, J., 2005. NMPC for complex stochastic systems using Markov Chain Monte Carlo. Vol. 358/2007 of Lecture Notes in Control and Information Sciences. Springer, Stuttgart, Germany, pp. 269–281.
Mayne et al. [2000] Mayne, D. Q., Rawlings, J. B., Rao, C. V., Scokaert, P. O. M., Jun 2000. Constrained model predictive control: stability and optimality. Automatica 36 (6), 789–814.
Oldewurtel et al. [2008] Oldewurtel, F., Jones, C., Morari, M., Dec 2008. A tractable approximation of chance constrained stochastic MPC based on affine disturbance feedback. In: Conference on Decision and Control, CDC. Cancun, Mexico.
Pemantle and Rosenthal [1999] Pemantle, R., Rosenthal, J. S., 1999. Moment conditions for a sequence with negative drift to be uniformly bounded in $L^{r}$ . Stochastic Processes and their Applications 82 (1), 143–155.
Primbs [2007] Primbs, J., 2007. A soft constraint approach to stochastic receding horizon control. In: Proceedings of the 46th IEEE Conference on Decision and Control. pp. 4797 – 4802.
Primbs and Sung [2009] Primbs, J. A., Sung, C. H., Feb. 2009. Stochastic receding horizon control of constrained linear systems with state and control multiplicative noise. IEEE Transactions on Automatic Control 54 (2), 221–230.
Qin and Badgwell [2003] Qin, S. J., Badgwell, T., Jul. 2003. A survey of industrial model predictive control technology. Control Engineering Practice 11 (7), 733–764.
Ramponi et al. [2009] Ramponi, F., Chatterjee, D., Milias-Argeitis, A., Hokayem, P., Lygeros, J., 2009. Attaining mean square boundedness of a marginally stable noisy linear system with a bounded control input. http://arxiv.org/abs/0907.1436, submitted to IEEE Transactionson Automatic Control, Revised Jan 2010.
Robert and Casella [2004] Robert, C. P., Casella, G., 2004. Monte Carlo Statistical Methods, 2nd Edition. Springer.
Skaf and Boyd [2009] Skaf, J., Boyd, S., 2009. Design of affine controllers via convex optimization. http://www.stanford.edu/~boyd/papers/affine_contr.html.
Toh et al. [1999] Toh, K., Todd, M., Tatuncu, R., 1999. SDPT3– a Matlab software package for semidefinite programming. Optimization Methods and Software (11), 545–581, http://www.math.nus.edu.sg/~mattohkc/sdpt3.html.
van Hessem and Bosgra [2003] van Hessem, D. H., Bosgra, O. H., 2003. A full solution to the constrained stochastic closed-loop MPC problem via state and innovations feedback and its receding horizon implementation. In: Proceedings of the 42nd IEEE Conference on Decision and Control. Vol. 1. pp. 929–934.
Wang and Boyd [2009] Wang, Y., Boyd, S., 2009. Peformance bounds for linear stochastic control. Systems and Control Letters 58 (3), 178–182.
Yang et al. [1997] Yang, Y. D., Sontag, E. D., Sussmann, H. J., 1997. Global stabilization of linear discrete-time systems with bounded feedback. Systems and Control Letters 30 (5), 273–281.

Appendix A Proof of Proposition 3

For $t=0$ , $x_{0}\sim\mathcal{N}({\hat{x}}_{0|-1},P_{0|-1})$ , with $P_{0|-1}>0$ , by assumption. Assume now that, for a given $t\geqslant 0$ , $f(x_{t}|\mathcal{Y}_{t-1})$ is normal with mean ${\hat{x}}_{t|t-1}$ and covariance matrix $P_{t|t-1}>0$ . By Assumption 1-(ii) and dynamics in (2.1b), $f(y_{t}|x_{t},\mathcal{Y}_{t-1})$ is also normal with mean $Cx_{t}$ and covariance matrix $\Sigma_{v}$ . Hence, applying Bayes rule we may write

f(x_{t}|\mathcal{Y}_{t})=\frac{f(y_{t}|x_{t},\mathcal{Y}_{t-1})f(x_{t}|\mathcal{Y}_{t-1})}{f(y_{t}|\mathcal{Y}_{t-1})},

where we have $f(y_{t}|\mathcal{Y}_{t-1})=\int f(y_{t}|x_{t},\mathcal{Y}_{t})f(x_{t}|\mathcal{Y}_{t-1})\text{d}x_{t}$ by the Chapman-Kolmogorov equation. It follows that

	$\displaystyle f(x_{t}\|\mathcal{Y}_{t})$	$\displaystyle\propto f(y_{t}\|x_{t},\mathcal{Y}_{t-1})f(x_{t}\|\mathcal{Y}_{t-1})$
		$\displaystyle\propto\exp\biggl{(}-\frac{1}{2}\Bigl{[}(y_{t}-Cx_{t})^{\mathsf{T}}\Sigma_{v}^{-1}(y_{t}-Cx_{t})+(x_{t}-\hat{x}_{t\|t-1})^{\mathsf{T}}P_{t\|t-1}^{-1}(x_{t}-\hat{x}_{t\|t-1})\Bigr{]}\biggr{)},$

where the proportionality constants do not depend on $x_{t}$ . Let us now focus on the term within square brackets and write $x$ , $y$ , $\hat{x}$ and $P$ in place of $x_{t}$ , $y_{t}$ , $\hat{x}_{t|t-1}$ and $P_{t|t-1}$ for shortness. Expanding the products and collecting the linear and quadratic terms in $x$ one gets

\displaystyle x^{\mathsf{T}}(C^{\mathsf{T}}\Sigma_{v}^{-1}C+P^{-1})x-2x^{\mathsf{T}}(C^{\mathsf{T}}\Sigma_{v}^{-1}y+P^{-1}\hat{x})+(y^{\mathsf{T}}\Sigma_{v}^{-1}y+\hat{x}^{\mathsf{T}}P^{-1}\hat{x}).

Since $P>0$ and $\Sigma_{v}>0$ by assumption, it follows that $C^{\mathsf{T}}\Sigma_{v}^{-1}C+P^{-1}>0$ . If we let $P_{*}=(C^{\mathsf{T}}\Sigma_{v}^{-1}C+P^{-1})^{-1}$ , the expression above can be rewritten as

\displaystyle(P_{*}^{-1}x)^{\mathsf{T}}P_{*}(P_{*}^{-1}x)-2(P_{*}^{-1}x)^{\mathsf{T}}P_{*}(C^{\mathsf{T}}\Sigma_{v}^{-1}y+P^{-1}\hat{x})+(y^{\mathsf{T}}\Sigma_{v}^{-1}y+\hat{x}^{\mathsf{T}}P^{-1}\hat{x}).

Completing the square, the latter expression becomes

\displaystyle[P_{*}^{-1}x-(C^{\mathsf{T}}\Sigma_{v}^{-1}y+P^{-1}\hat{x})]^{\mathsf{T}}P_{*}[P_{*}^{-1}x-(C^{\mathsf{T}}\Sigma_{v}^{-1}y+P^{-1}\hat{x})]+c,

where $c$ depends on $\hat{x}$ and $y$ but not on $x$ . Factoring $P_{*}^{-1}$ out of the two terms $P_{*}^{-1}x-(C^{\mathsf{T}}\Sigma_{v}^{-1}y+P^{-1}\hat{x})$ and simplifying yields

\displaystyle[x-P_{*}(C^{\mathsf{T}}\Sigma_{v}^{-1}y+P^{-1}\hat{x})]^{\mathsf{T}}P_{*}^{-1}[x-P_{*}(C^{\mathsf{T}}\Sigma_{v}^{-1}y+P^{-1}\hat{x})]+c.

Hence,

f(x_{t}|\mathcal{Y}_{t})\propto\exp\left(-\frac{1}{2}(x_{t}-\hat{x}_{*})^{\mathsf{T}}P_{*}^{-1}(x_{t}-\hat{x}_{*})\right),

with $\hat{x}_{*}=P_{*}(C^{\mathsf{T}}\Sigma_{v}^{-1}y+P^{-1}\hat{x})$ , where the missing normalization constant does not depend on $x_{t}$ . Hence, $f(x_{t}|\mathcal{Y}_{t})$ is normal with mean $\hat{x}_{t|t}=\hat{x}_{*}$ and covariance matrix $P_{t|t}=P_{*}$ . Using the matrix inversion lemma, $P_{*}=P-PC^{\mathsf{T}}(CPC^{\mathsf{T}}+\Sigma_{v})^{-1}CP$ , which is (2.7). To obtain (2.6), replace $y$ by $(y-C\hat{x})+C\hat{x}$ in the expression of $\hat{x}_{*}$ to get

	$\displaystyle\hat{x}_{*}$	$\displaystyle=P_{*}\left[C^{\mathsf{T}}\Sigma_{v}^{-1}(y-C\hat{x})+C^{\mathsf{T}}\Sigma_{v}^{-1}C\hat{x}+P^{-1}\hat{x}\right]$
		$\displaystyle=P_{}C^{\mathsf{T}}\Sigma_{v}^{-1}(y-C\hat{x})+P_{}(C^{\mathsf{T}}\Sigma_{v}^{-1}C+P^{-1})\hat{x}$
		$\displaystyle=P_{*}C^{\mathsf{T}}\Sigma_{v}^{-1}(y-C\hat{x})+\hat{x}.$

Finally, using the identity $I=(CPC^{\mathsf{T}}+\Sigma_{v})(CPC^{\mathsf{T}}+\Sigma_{v})^{-1}$ (where the inverse exists because $P\geqslant 0$ and $\Sigma_{v}>0$ ) and the definition of $P_{*}$ ,

	$\displaystyle P_{*}C^{\mathsf{T}}\Sigma_{v}^{-1}$	$\displaystyle=P_{*}C^{\mathsf{T}}\Sigma_{v}^{-1}(CPC^{\mathsf{T}}+\Sigma_{v})(CPC^{\mathsf{T}}+\Sigma_{v})^{-1}$
		$\displaystyle=P_{*}(C^{\mathsf{T}}\Sigma_{v}^{-1}CPC^{\mathsf{T}}+C^{\mathsf{T}})(CPC^{\mathsf{T}}+\Sigma_{v})^{-1}$
		$\displaystyle=P_{*}(C^{\mathsf{T}}\Sigma_{v}^{-1}C+P^{-1})PC^{\mathsf{T}}(CPC^{\mathsf{T}}+\Sigma_{v})^{-1}$
		$\displaystyle=PC^{\mathsf{T}}(CP\bar{C}^{\mathsf{T}}+\Sigma_{v})^{-1},$

which leads to (2.6).

To conclude the proof, we need to compute the density $f(x_{t+1}|\mathcal{Y}_{t})$ and prove (2.8)–(2.9). Using the Chapman-Kolmogorov equation, we can compute the density as follows

f(x_{t+1}|\mathcal{Y}_{t})=\int f(x_{t+1}|x_{t},\mathcal{Y}_{t})f(x_{t}|\mathcal{Y}_{t})\text{d}x_{t}.

(A.1)

However, the explicit computation of the integral in (A.1) is a very tedious exercise. We shall instead rely on characteristic functions. Recall that the characteristic function of an $n$ -dimensional Gaussian random vector $\xi$ with mean $\mu$ and covariance $\Sigma\geqslant 0$ is given by $\Psi(z)=\mathbb{E}[\exp{(iz^{\mathsf{T}}\xi)}]=\exp\left(iz^{\mathsf{T}}\mu-\frac{1}{2}z^{\mathsf{T}}\Sigma z\right)$ , where $i\coloneqq\sqrt{-1}$ and $z\in\mathbb{R}^{n}$ . The characteristic function of $x_{t+1}$ given $\mathcal{Y}_{t}$ is then

	$\displaystyle\Psi(z)$	$\displaystyle=\mathbb{E}[\exp(iz^{\mathsf{T}}x_{t+1})\|\mathcal{Y}_{t}]$
		$\displaystyle=\mathbb{E}\left[\mathbb{E}\big{[}\exp\big{(}iz^{\mathsf{T}}(Ax_{t}+Bu_{t}+w_{t})\big{)}\,\big{\|}\,x_{t},\mathcal{Y}_{t}\big{]}\,\big{\|}\,\mathcal{Y}_{t}\right]$
		$\displaystyle=\mathbb{E}\left[\exp\big{(}iz^{\mathsf{T}}Ax_{t}+iz^{\mathsf{T}}Bg_{t}(\mathcal{Y}_{t})\big{)}\right.\times\left.\mathbb{E}\big{[}\exp(iz^{\mathsf{T}}w_{t})\big{\|}x_{t},\mathcal{Y}_{t}\big{]}\big{\|}\mathcal{Y}_{t}\right],$

where we have used system dynamics in (2.1a) and the general definition of the feedback policies (2.4). Since $w_{t}$ is independent of $x_{t}$ and $\mathcal{Y}_{t}$ and it is Gaussian with mean $0$ and covariance $\Sigma_{w}$ , $\mathbb{E}\big{[}\exp(iz^{\mathsf{T}}w_{t})\big{|}x_{t},\mathcal{Y}_{t}\big{]}=\exp\left(-\frac{1}{2}z^{\mathsf{T}}\Sigma_{w}z\right)$ . Therefore the last expression in the chain becomes

\mathbb{E}\left[\exp\left(iz^{\mathsf{T}}Ax_{t}\right)\ |\ \mathcal{Y}_{t}\right]\exp\left(iz^{\mathsf{T}}Bg_{t}(\mathcal{Y}_{t})\right)\exp\left(-\frac{1}{2}z^{\mathsf{T}}\Sigma_{w}z\right).

It was proved above that, conditionally on $\mathcal{Y}_{t}$ , $x_{t}$ is Gaussian with mean ${\hat{x}}_{t|t}$ and covariance matrix $P_{t|t}$ . Hence

	$\displaystyle\mathbb{E}[\exp(iz^{\mathsf{T}}Ax_{t})\|\mathcal{Y}_{t}]$	$\displaystyle=\mathbb{E}\left[\exp\big{(}i(A^{\mathsf{T}}z)^{\mathsf{T}}x_{t}\big{)}\big{\|}\mathcal{Y}_{t}\right]$
		$\displaystyle=\exp\left(i(A^{\mathsf{T}}z)^{\mathsf{T}}{\hat{x}}_{t\|t}-\frac{1}{2}(A^{\mathsf{T}}z)^{\mathsf{T}}P_{t\|t}(A^{\mathsf{T}}z)\right),$

and consequently

	$\displaystyle\Psi(z)$	$\displaystyle=\exp\left(iz^{\mathsf{T}}A\hat{x}_{t\|t}-\frac{1}{2}z^{\mathsf{T}}AP_{t\|t}A^{\mathsf{T}}z\right)\times\exp\left(iz^{\mathsf{T}}Bg_{t}(\mathcal{Y}_{t})\right)\exp\left(-\frac{1}{2}z^{\mathsf{T}}\Sigma_{w}z\right)$
		$\displaystyle=\exp\Bigl{(}iz^{\mathsf{T}}\big{(}A{\hat{x}}_{t\|t}+Bg_{t}(\mathcal{Y}_{t})\big{)}-\frac{1}{2}z^{\mathsf{T}}(AP_{t\|t}A^{\mathsf{T}}+\Sigma_{w})z\Bigr{)},$

which is the characteristic function of a Gaussian random vector with mean $\hat{x}_{t+1|t}=A\hat{x}_{t|t}+Bg_{t}(\mathcal{Y}_{t})$ and covariance matrix $P_{t+1|t}=AP_{t|t}A^{\mathsf{T}}+\Sigma_{w}>0$ .

Appendix B

Lemma 15.

Consider the system (2.1a)-(2.1b), and suppose that $(A,\Sigma_{w}^{1/2})$ be stabilizable and $(A,C)$ be observable. In addition, assume that $P_{0|0}\geqslant 0$ . Then there exist constants $\rho>0$ and an integer $T$ large enough such that

\mathbb{E}_{\mathcal{Y}_{t}}\bigl{[}\left\lVert{x_{t}-\hat{x}_{t}}\right\rVert^{2}\bigr{]}=\mathsf{tr}\!\left(P_{t|t}\right)\leqslant\rho\qquad\text{for all }t\geqslant T.

(B.1)

Proof.

First, observe that

\sum_{i=0}^{\kappa_{1}-1}A^{i}\Sigma_{w}(A^{i})^{\mathsf{T}}=\Bigl{[}\Sigma_{w}^{1/2}\;A\Sigma_{w}^{1/2}\;\cdots\;A^{\kappa_{1}-1}\Sigma_{w}^{1/2}\Bigr{]}\Bigl{[}\Sigma_{w}^{1/2}\;A\Sigma_{w}^{1/2}\;\cdots\;A^{\kappa_{1}-1}\Sigma_{w}^{1/2}\Bigr{]}^{\mathsf{T}},

and since $(A,\Sigma_{w}^{1/2})$ is controllable by Assumption 5-(iii), we see that there exists $\kappa_{1}\in\mathbb{N}_{+}$ such that for all $k\geqslant\kappa_{1}$ the rank of $\Bigl{[}\Sigma_{w}^{1/2}\;A\Sigma_{w}^{1/2}\;\cdots\;A^{\kappa_{1}-1}\Sigma_{w}^{1/2}\Bigr{]}=n$ ; indeed, $\kappa_{1}$ is the reachability index of $(A,\Sigma_{w}^{1/2})$ . Thus, $\sum_{i=0}^{\kappa_{1}-1}A^{i}\Sigma_{w}(A^{i})^{\mathsf{T}}$ is positive definite, and therefore, there exists some $\delta_{1}^{\prime},\delta_{2}^{\prime}>0$ such that

\delta_{1}^{\prime}I\leqslant\sum_{i=0}^{\kappa_{1}-1}A^{i}\Sigma_{w}(A^{i})^{\mathsf{T}}\leqslant\delta_{2}^{\prime}I.

(B.2)

Second, observe that³³3Here $\otimes$ denotes the standard Kronecker product.

\sum_{i=0}^{\kappa_{2}-1}(A^{i})^{\mathsf{T}}C^{\mathsf{T}}\Sigma_{v}^{-1}CA^{i}=\begin{bmatrix}C\\ CA\\ \vdots\\ CA^{\kappa_{2}-1}\end{bmatrix}^{\mathsf{T}}\bigl{(}I_{\kappa_{2}}\otimes\Sigma_{v}^{-1}\bigr{)}\begin{bmatrix}C\\ CA\\ \vdots\\ CA^{\kappa_{2}-1}\end{bmatrix}.

Since $(A,C)$ is observable by assumption, there exists $\kappa_{2}\in\mathbb{N}_{+}$ such that the rank of the matrix $\Bigl{[}C^{\mathsf{T}}\;A^{\mathsf{T}}C^{\mathsf{T}}\;\cdots\;(A^{\kappa_{2}-1})^{\mathsf{T}}C^{\mathsf{T}}\Bigr{]}^{\mathsf{T}}$ is $n$ . The matrix $I_{\kappa_{2}}\otimes\Sigma_{v}^{-1}$ is clearly positive definite by Assumption 5’(iii), and therefore, we see that there exists $\delta_{1}^{\prime\prime},\delta_{2}^{\prime\prime}>0$ such that

\delta_{1}^{\prime\prime}I\leqslant\sum_{i=0}^{\kappa_{2}-1}(A^{i})^{\mathsf{T}}C^{\mathsf{T}}\Sigma_{v}^{-1}CA^{i}\leqslant\delta_{2}^{\prime\prime}I.

(B.3)

Third, the conditions of Lemma 7.1 in [Jazwinski, 1970, pp. 234] are satisfied, as (B.2) and (B.3) hold with $\delta_{1}=\min\{\delta_{1}^{\prime},\delta_{1}^{\prime\prime}\}$ and $\delta_{2}=\max\{\delta_{2}^{\prime},\delta_{2}^{\prime\prime}\}$ , and the bound $P_{t|t}\leqslant\rho^{\prime}I$ for some $\rho^{\prime}>0$ is established for all $t\geqslant T\coloneqq\max\{\kappa_{1},\ \kappa_{2}\}$ . The assertion now follows immediately from:

\mathbb{E}_{\mathcal{Y}_{t}}\bigl{[}\left\lVert{x_{t}-\hat{x}_{t}}\right\rVert^{2}\bigr{]}=\mathsf{tr}\!\left(P_{t|t}\right)\leqslant n\lambda_{\max}(P_{t|t})\leqslant n\rho^{\prime}\eqqcolon\rho.\qed

$\displaystyle\hat{x}_{t+1\|t+1}$	$\displaystyle=\hat{x}_{t+1\|t}$
	$\displaystyle\quad+P_{t+1\|t}C^{\mathsf{T}}(CP_{t+1\|t}C^{\mathsf{T}}+\Sigma_{v})^{-1}(y_{t+1}-C\hat{x}_{t+1\|t}),$	(2.6)
$\displaystyle P_{t+1\|t+1}$	$\displaystyle=P_{t+1\|t}-P_{t+1\|t}C^{\mathsf{T}}(CP_{t+1\|t}C^{\mathsf{T}}+\Sigma_{v})^{-1}CP_{t+1\|t},$	(2.7)

	$\displaystyle\hat{x}_{t+1\|t}$	$\displaystyle=A\hat{x}_{t\|t}+Bg_{t}(\mathcal{Y}_{t}),$		(2.8)
	$\displaystyle P_{t+1\|t}$	$\displaystyle=AP_{t\|t}A^{\mathsf{T}}+\Sigma_{w}.$		(2.9)

	$\displaystyle\mathbb{E}\Bigl{[}\left\lvert{\left\lVert{{\hat{x}}^{(2)}_{\kappa(t+1)}}\right\rVert-\left\lVert{{\hat{x}}^{(2)}_{\kappa t}}\right\rVert}\right\rvert^{4}\,\Big{\|}\,\left\lVert{{\hat{x}}^{(2)}_{\kappa i}}\right\rVert,\;i={\left\lceil{{T}^{\prime}/\kappa}\right\rceil},\ldots,t\Bigr{]}$
	$\displaystyle=\mathbb{E}\Bigl{[}\left\lvert{\left\lVert{{\hat{x}}^{(2)}_{\kappa(t+1)}}\right\rVert-\left\lVert{A_{2}^{\kappa}{\hat{x}}^{(2)}_{\kappa t}}\right\rVert}\right\rvert^{4}\,\Big{\|}\,\left\lVert{{\hat{x}}^{(2)}_{\kappa i}}\right\rVert,\;i={\left\lceil{{T}^{\prime}/\kappa}\right\rceil},\ldots,t\Bigr{]}$
	$\displaystyle\leqslant\mathbb{E}\Bigl{[}\left\lVert{A_{2}^{\kappa}{\hat{x}}^{(2)}_{\kappa t}+\mathfrak{R}_{\kappa}(A_{2},B_{2})u_{\kappa t:\kappa(t+1)-1}+\Xi_{\kappa t}-A_{2}^{\kappa}{\hat{x}}^{(2)}_{\kappa t}}\right\rVert^{4}\,\Big{\|}\,\left\lVert{{\hat{x}}^{(2)}_{\kappa i}}\right\rVert,\;i={\left\lceil{{T}^{\prime}/\kappa}\right\rceil},\ldots,t\Bigr{]}$
	$\displaystyle=\mathbb{E}\Bigl{[}\Bigl{(}\left\lVert{\mathfrak{R}_{\kappa}(A_{2},B_{2})u_{\kappa t:\kappa(t+1)-1}}\right\rVert+\left\lVert{\Xi_{\kappa t}}\right\rVert\Bigr{)}^{4}\,\Big{\|}\,\left\lVert{{\hat{x}}^{(2)}_{\kappa i}}\right\rVert,\;i={\left\lceil{{T}^{\prime}/\kappa}\right\rceil},\ldots,t\Bigr{]}.$

	$\displaystyle\Psi(z)$	$\displaystyle=\mathbb{E}[\exp(iz^{\mathsf{T}}x_{t+1})\|\mathcal{Y}_{t}]$
		$\displaystyle=\mathbb{E}\left[\mathbb{E}\big{[}\exp\big{(}iz^{\mathsf{T}}(Ax_{t}+Bu_{t}+w_{t})\big{)}\,\big{\|}\,x_{t},\mathcal{Y}_{t}\big{]}\,\big{\|}\,\mathcal{Y}_{t}\right]$
		$\displaystyle=\mathbb{E}\left[\exp\big{(}iz^{\mathsf{T}}Ax_{t}+iz^{\mathsf{T}}Bg_{t}(\mathcal{Y}_{t})\big{)}\right.\times\left.\mathbb{E}\big{[}\exp(iz^{\mathsf{T}}w_{t})\big{\|}x_{t},\mathcal{Y}_{t}\big{]}\big{\|}\mathcal{Y}_{t}\right],$

Stochastic receding horizon control with output feedback and bounded control inputs

Abstract.

Key words and phrases:

1. Introduction

Related Works

Organization

Notation

2. Problem Setup

2.1. Dynamics, Performance Index and Estimator

Assumption 1.

Assumption 2.

Proposition 3.

2.2. Optimization Problem and Control Policies

Compact Notation

3. Main Results

Scholium 4.

Assumption 5.

Theorem 6 (Main Result).

Assumption 7.

Corollary 8.

4. Discussion

4.1. Other Constraints and Policies

4.2. Off-Line Computation of the Λ\Lambda Matrices

Proposition 9.

5. Proofs

Fact 10.

Proof of Scholium 4.

Proposition 11 ([Pemantle and Rosenthal, 1999, Theorem 1]).

Lemma 12.

Proof.

Lemma 13.

Proof.

Lemma 14.

Proof.

Proof of Theorem 6.

Proof of Corollary 8.

6. Examples

7. Conclusions

References

Appendix A Proof of Proposition 3

Appendix B

Lemma 15.

Proof.

4.2. Off-Line Computation of the $\Lambda$ Matrices